Survival Analysis

Brenda Parnin

03/02/2022

Libraries

library(lemon)
library(png)
library(EBImage)
library(tinytex)
library(survival)
library(survminer)
knit_print.data.frame <- lemon_print

Krusty the Clown Disease

You are analyzing a 16-day study of patients who have Krusty the Clown disease. Your data include a subject ID, each subject's survival time, and a status: a status of 0 means the subject was alive (censored) and a status of 1 means the subject died of the disease.

a. Construct a survival table for the data, similar to the one in the lecture. Show your work within the table: if there is a division such as 1/5, write it as 1/5 = .2, and if there is a multiplication, show the numbers being multiplied and the result, e.g. .1 * 4 = 0.4.

df_Krusty = read.csv("Exercise 3 Part 1.csv", header = TRUE, fileEncoding = 'UTF-8-BOM', check.names = FALSE)
df_Krusty
##   Time #Risk #died  haz 1-haz            survival
## 1    0    10     0 0/10 10/10                   1
## 2    1    10     1 1/10  9/10          9/10 = 0.9
## 3    4     9     1  1/9   8/9       .9 * 8/9 = .8
## 4    5     8     1  1/8   7/8       .8 * 7/8 = .7
## 5    7     6     1  1/6   5/6     .7 * 5/6 = .583
## 6   12     3     1  1/3   2/3  .583 * 2/3 = .3886
## 7   14     2     1  1/2   1/2 .3886 * 1/2 = .1943
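The table is a hand calculation of the Kaplan-Meier product-limit estimate, S(t) = product over death times t_i <= t of (1 - d_i/n_i). The minimal base-R sketch below recomputes it at full precision from the `#Risk` and `#died` columns; the counts are typed in from the table rather than read from the CSV, and the variable names are illustrative. Carrying full precision gives 0.3889 and 0.1944 for the last two rows, slightly different from the rounded hand values .3886 and .1943.

```r
# Death times and counts copied from the table above; censored subjects
# enter only through the shrinking number at risk
event_time <- c(1, 4, 5, 7, 12, 14)
n_risk     <- c(10, 9, 8, 6, 3, 2)
n_died     <- c(1, 1, 1, 1, 1, 1)

# Conditional probability of surviving each death time: 1 - hazard
one_minus_haz <- 1 - n_died / n_risk

# Kaplan-Meier estimate: running product of the conditional probabilities
km_surv <- cumprod(one_minus_haz)

km_table <- data.frame(Time = event_time, Risk = n_risk, Died = n_died,
                       OneMinusHaz = round(one_minus_haz, 4),
                       Survival = round(km_surv, 4))
print(km_table)
```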

b. Plot a survival curve for the probabilities you generate in part a.

img = readImage("Execise 3 Part 1.png")
display(img, method = "raster")
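The curve can also be drawn directly from the part a estimates instead of loading a saved image. A minimal base-R sketch, assuming the remaining subject was followed to the end of the 16-day study so the last estimate is carried to day 16:

```r
# Kaplan-Meier estimates from part a (time 0 included with S = 1)
km_time <- c(0, 1, 4, 5, 7, 12, 14)
km_surv <- c(1, 0.9, 0.8, 0.7, 7/12, 7/18, 7/36)

# type = "s" draws a right-continuous step function: flat until the next
# death time, then a vertical drop. Day 16 is appended (assumption) so the
# last step is visible.
plot(c(km_time, 16), c(km_surv, tail(km_surv, 1)), type = "s",
     ylim = c(0, 1), xlab = "Time (days)",
     ylab = "Estimated survival probability",
     main = "Kaplan-Meier Curve for Krusty the Clown Disease")
points(km_time, km_surv, pch = 19)
```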

La Traviata Disease

Read in Data

df_survival = read.csv("Data for part 2 of exercise 3.csv", fileEncoding = 'UTF-8-BOM')
names(df_survival)
## [1] "Group" "Time"  "Event"
df_survival$Time = as.numeric(df_survival$Time)
summary(df_survival)
##      Group            Time            Event       
##  Min.   :1.000   Min.   :   1.0   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.: 162.8   1st Qu.:0.0000  
##  Median :2.000   Median : 603.5   Median :1.0000  
##  Mean   :2.059   Mean   : 879.3   Mean   :0.6103  
##  3rd Qu.:3.000   3rd Qu.:1541.8   3rd Qu.:1.0000  
##  Max.   :3.000   Max.   :2640.0   Max.   :1.0000
attach(df_survival)

Survival Analysis

s <- Surv(df_survival$Time, df_survival$Event)
class(s)
## [1] "Surv"
survfit(s~1)
## Call: survfit(formula = s ~ 1)
## 
##        n events median 0.95LCL 0.95UCL
## [1,] 136     83    677     456    1410
survfit(Surv(Time, Event)~1, data=df_survival)
## Call: survfit(formula = Surv(Time, Event) ~ 1, data = df_survival)
## 
##        n events median 0.95LCL 0.95UCL
## [1,] 136     83    677     456    1410
sfit_1 <- survfit(Surv(Time, Event)~1, data=df_survival)
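`survfit()` computes the same product-limit estimate as the hand table in part 1, and the reported `median` is the first time at which the estimated survival falls to 0.5 or below, with 0.95LCL and 0.95UCL bounding that time. As a small check, the sketch below feeds `survfit()` individual records built to match the Krusty table. The censoring times 6, 9, 10 and 16 are hypothetical: the table only implies one censoring between days 5 and 7, two between days 7 and 12, and one at or after day 14, and any times in those windows give the same estimates.

```r
library(survival)

# Hypothetical records consistent with the part 1 table: deaths (status 1)
# at days 1, 4, 5, 7, 12, 14; censoring times (status 0) are assumptions
krusty <- data.frame(
  time   = c(1, 4, 5, 6, 7, 9, 10, 12, 14, 16),
  status = c(1, 1, 1, 0, 1, 0,  0,  1,  1,  0)
)

fit <- survfit(Surv(time, status) ~ 1, data = krusty)

# summary() reports death times only by default; n.risk and surv should
# match the hand-built table
km_summary <- summary(fit)
print(data.frame(time = km_summary$time, n.risk = km_summary$n.risk,
                 n.event = km_summary$n.event,
                 surv = round(km_summary$surv, 4)))

# Median survival: S(t) goes from 0.5833 at day 7 to 0.3889 at day 12,
# so the median is day 12
print(fit)
```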

Kaplan-Meier Plots

a. Draw a survival plot that shows the survival curves for all three drugs.

sfit_2 <- survfit(Surv(Time, Event)~Group, data=df_survival)
ggsurvplot(sfit_2, conf.int=FALSE, pval=FALSE, risk.table=FALSE, 
           legend.labs=c("Group 1", "Group 2", "Group 3"), legend.title="Group: \n",  
           palette=c("blue", "green", "red"), 
           title="Kaplan-Meier Curve for La Traviata Disease", 
           risk.table.height=.15)

b. Test to see if overall there is an effect of any of the drugs on survival taken as a global set.

  • Ho: Survival does not differ among the three drug groups.

  • Ha: Survival differs between at least two of the three drug groups.

sfit <- survdiff(Surv(Time, Event)~Group, data=df_survival)
sfit
## Call:
## survdiff(formula = Surv(Time, Event) ~ Group, data = df_survival)
## 
##          N Observed Expected (O-E)^2/E (O-E)^2/V
## Group=1 37       24     11.8     12.77      15.6
## Group=2 54       25     46.4      9.84      23.3
## Group=3 45       34     24.9      3.33       4.8
## 
##  Chisq= 27.6  on 2 degrees of freedom, p= 1e-06

The log-rank chi-squared statistic is 27.6 on 2 degrees of freedom, and the corresponding p-value (1e-06) is less than .05, so we reject the null hypothesis: there is a survival difference among the three drug groups.
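The p-value printed by `survdiff()` is the upper tail of a chi-squared distribution with (number of groups - 1) degrees of freedom. A minimal sketch that recovers it from the printed statistic; since 27.6 is rounded the result is approximate, and on the fitted object one would use `sfit$chisq` instead. For 2 degrees of freedom the tail has the closed form exp(-x/2), so exp(-27.6/2) is about 1.0e-06, consistent with the printed p = 1e-06.

```r
# Log-rank statistic as printed by survdiff() above (rounded)
chisq    <- 27.6
n_grp    <- 3
deg_free <- n_grp - 1   # 3 drug groups -> 2 degrees of freedom

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(chisq, df = deg_free, lower.tail = FALSE)
print(signif(p_value, 3))

# Decision at the 5% level
alpha <- 0.05
if (p_value < alpha) {
  cat("Reject H0: survival differs between at least two groups\n")
} else {
  cat("Fail to reject H0\n")
}
```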

c. Compare the survival curves for each of the three drugs with each other (three comparisons) and see if any of the curves are different from each other. Be sure to adjust for multiple group comparisons.

  • Compare Group 1 and Group 2: The biggest difference between these two groups is that Group 1 lost most of its participants early, during the time period of 0-300, while Group 2 appears to have slowed the progression of La Traviata disease over the same period.

  • Compare Group 1 and Group 3: Both drugs failed early to slow the progression of La Traviata disease, and most of the participants in both groups died during the time period of 1-300. Group 1 appears to have no participants remaining after time 1200. In Group 3, by contrast, the curve holds steady after the sudden loss at the beginning of the study, and the drug's effect lasted longer, with participants surviving past time 2500.

  • Compare Group 2 and Group 3: Both drugs have long-lasting effects. However, Group 2 seems to be the best choice. Group 2 starts strong, with very few deaths during the time period of 1-300, and although participants in both groups survive past time 2500, more participants remain in Group 2 than in Group 3.
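The comparisons above are read off the plot. Since part c also asks for an adjustment for multiple comparisons, one way to sketch a formal version is three pairwise log-rank tests with `survdiff()` followed by `p.adjust()`. The helper name `pairwise_logrank` is hypothetical, Holm's method is an assumed choice (Bonferroni is more conservative), and the example runs on simulated data with made-up rates and seed so that it is self-contained; with the real data one would call `pairwise_logrank(df_survival)`. survminer, loaded above, also provides `pairwise_survdiff()` for the same task.

```r
library(survival)

# Pairwise log-rank tests between all pairs of groups, with adjusted p-values.
# d needs columns Time, Event (1 = died, 0 = censored) and Group.
pairwise_logrank <- function(d, method = "holm") {
  groups <- sort(unique(d$Group))
  pairs  <- combn(groups, 2)          # one column per pair of groups
  raw_p  <- apply(pairs, 2, function(g) {
    sub  <- d[d$Group %in% g, ]
    test <- survdiff(Surv(Time, Event) ~ Group, data = sub)
    # Two groups -> 1 degree of freedom
    pchisq(test$chisq, df = 1, lower.tail = FALSE)
  })
  data.frame(comparison = paste(pairs[1, ], "vs", pairs[2, ]),
             p_raw = raw_p,
             p_adjusted = p.adjust(raw_p, method = method),
             stringsAsFactors = FALSE)
}

# Simulated stand-in for df_survival (hypothetical values)
set.seed(1)
n <- 40
sim <- data.frame(
  Group = rep(1:3, each = n),
  Time  = ceiling(rexp(3 * n, rate = rep(c(1/300, 1/1200, 1/600), each = n))),
  Event = rbinom(3 * n, 1, 0.7)
)

res <- pairwise_logrank(sim)
print(res)
```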