# LOAD THE NECESSARY PACKAGES
library(survival)
library(survminer)
## Loading required package: ggplot2
## Loading required package: ggpubr
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
library(ggplot2)
# CHOOSE DATASET TO USE
data(veteran)
## Warning in data(veteran): data set 'veteran' not found
# DISPLAY THE STRUCTURE OF THE DATASET
str(veteran)
## 'data.frame': 137 obs. of 8 variables:
## $ trt : num 1 1 1 1 1 1 1 1 1 1 ...
## $ celltype: Factor w/ 4 levels "squamous","smallcell",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ time : num 72 411 228 126 118 10 82 110 314 100 ...
## $ status : num 1 1 1 1 1 1 1 1 1 0 ...
## $ karno : num 60 70 60 60 70 20 40 80 50 70 ...
## $ diagtime: num 7 5 3 9 11 5 10 29 18 6 ...
## $ age : num 69 64 38 63 65 49 69 68 43 70 ...
## $ prior : num 0 10 0 10 10 0 10 0 0 0 ...
# DEFINE THE SURVIVAL OBJECT
surv_object<-Surv(time = veteran$time,event = veteran$status)
# CREATE THE KAPLAN-MEIER ESTIMATE
km_fit_vet<-survfit(surv_object~1,data = veteran)
# PLOT THE CURVE
ggsurvplot(km_fit_vet,data = veteran,
title="KAPLAN-MEIER SURVIVAL CURVE",
xlab="Time(days)",
ylab="Survival Probability",
conf.int = TRUE,
risk.table = TRUE,
ggtheme = theme_minimal())
INTERPRETATION OF THE KAPLAN SURVIVAL CURVE The curve indicates a gradual decline over time indicating that the survival probability decreases as time progresses.The curve is also steep in the first 250 days indicating that a higher number of individuals experience the event within that timeline.From the curve we can also see that the median survival time(i.e 0.50 survival probability) is at 200days.
# CREATE THE NELSON-AALEN ESTIMATE (CUMULATIVE HAZARD FUNCTION)
na_fit_vet<-survfit(surv_object~1,data = veteran,type="fleming-harrington")
# PLOT THE CUMULATIVE HAZARD CURVE
ggsurvplot(na_fit_vet,fun = "cumhaz",conf.int = TRUE,
xlab="Time(Days)",
ylab="Cumulative Hazard",
title="NELSON-AALEN CUMULATIVE HAZARD CURVE",
ggtheme = theme_minimal())
INTERPETATION OF THE NELSON AALEN CUMULATIVE HAZARD CURVE The cumulative
hazard shows the accumulation of risk over time.The curve increases over
time indicating that the accumulated risk associated with the event is
growing as times increases.The first 250 days the curve is relatively
steep indicating at this period the rate of events is high.At above
250days the curve is less steep showing the rate of events has slowed
down.The tighter confidence interval(represented by the grey shaded
area) in the early days shows the estimates are more precise due to more
individuals being at risk.The wider confidence interval at later stages
show high uncertainty due to less individuals being at risk.
# THE LOG RANK TEST(COMPARING SURVIVAL BETWEEN TREATMENT GROUPS)
logrank_test_vet<-survdiff(surv_object~trt,data = veteran)
# PRINTING THE RESULTS FOR THE LOG RANK TEST
print(logrank_test_vet)
## Call:
## survdiff(formula = surv_object ~ trt, data = veteran)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## trt=1 69 64 64.5 0.00388 0.00823
## trt=2 68 64 63.5 0.00394 0.00823
##
## Chisq= 0 on 1 degrees of freedom, p= 0.9
INTERPRETING THE LOG RANK TEST RESULTS The p-value in the results is 0.9 which is greater than the required 0.05 indicating we fail to reject the null hypothesis i.e there is no statistically significant difference in survival between the two treatment groups.Also the chi square of 0 shows the survival curves are almost identical.
# THE COX PROPORTIONAL HAZARD MODEL
cox_model_vet<-coxph(surv_object~trt+age+karno+diagtime+celltype+prior,data = veteran)
# PRINTING THE RESULTS
summary(cox_model_vet)
## Call:
## coxph(formula = surv_object ~ trt + age + karno + diagtime +
## celltype + prior, data = veteran)
##
## n= 137, number of events= 128
##
## coef exp(coef) se(coef) z Pr(>|z|)
## trt 2.946e-01 1.343e+00 2.075e-01 1.419 0.15577
## age -8.706e-03 9.913e-01 9.300e-03 -0.936 0.34920
## karno -3.282e-02 9.677e-01 5.508e-03 -5.958 2.55e-09 ***
## diagtime 8.132e-05 1.000e+00 9.136e-03 0.009 0.99290
## celltypesmallcell 8.616e-01 2.367e+00 2.753e-01 3.130 0.00175 **
## celltypeadeno 1.196e+00 3.307e+00 3.009e-01 3.975 7.05e-05 ***
## celltypelarge 4.013e-01 1.494e+00 2.827e-01 1.420 0.15574
## prior 7.159e-03 1.007e+00 2.323e-02 0.308 0.75794
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## trt 1.3426 0.7448 0.8939 2.0166
## age 0.9913 1.0087 0.9734 1.0096
## karno 0.9677 1.0334 0.9573 0.9782
## diagtime 1.0001 0.9999 0.9823 1.0182
## celltypesmallcell 2.3669 0.4225 1.3799 4.0597
## celltypeadeno 3.3071 0.3024 1.8336 5.9647
## celltypelarge 1.4938 0.6695 0.8583 2.5996
## prior 1.0072 0.9929 0.9624 1.0541
##
## Concordance= 0.736 (se = 0.021 )
## Likelihood ratio test= 62.1 on 8 df, p=2e-10
## Wald test = 62.37 on 8 df, p=2e-10
## Score (logrank) test = 66.74 on 8 df, p=2e-11
INTERPRETING THE RESULTS The p-values of less that 0.05 at the end indicate that the model is highly significant.From the results we can also see the significant predictors of survival.They include the karno,the celltypesmallcell and the celltypeadeno.The karno is significantly higher indicating lower risk of death.The celltypesmallcell have 2.3669times higher hazard than the squamuos cell indicating its very significant.The Adenocarcinoma patients have 3.31 times higher hazard than squamous cell patients indicating the celltypeadeno is also very significant.The other predictors are not significant to impact survival.