Survival analysis focuses on the expected time until occurrence of an event. Some important definition in survival analysis are:
In medical research, \(S(t)\) could be the probability that an individual survives to a future time \(t\) (e.g. after diagnosis of cancer, the probability of being alive at time \(t\)).
Censoring is when the event of interest has not occurred due to some random cause (e.g. patients without an outcome). In medical studies, censoring may arise because a patient:
This is different to truncation where it is incomplete due to a systematic selection. For more details, see link.
#Uncomment to install
#install.packages(c("survival", "survminer"))
library(survival)
library(survminer)
library(ggcorrplot)
DATA <- read.delim("~/Desktop/Interview/IDNT_OUT.CSV", header = TRUE, sep = "," )
head(DATA)
AGE ALBUMIN HEMOGLOB ACR_0 gender DSCR_ESRD DSCR_ESRD_RD ESRD GFR_0 TRT
1 56 3.3 15.0 1790.5 1 0 1462 0 73.59759 0
2 47 4.0 15.3 782.1 1 0 1450 0 76.26230 1
3 58 4.1 10.6 597.2 1 0 1572 0 39.80008 1
4 70 3.9 12.9 393.4 1 0 98 0 49.17549 0
5 67 3.5 13.0 1328.8 1 0 817 0 49.61471 1
6 59 3.0 12.9 5454.5 1 1 1415 1 60.05364 0
The subgroups of interest are:
Albuminuria is a condition where albumin is abnormally present in the urine. Glomerular filtration rate is the filtering rate through the kidney.
# Treatment: Placebo = 0, Drug X = 1
DATA$TRT <- factor(DATA$TRT,
levels = c("0", "1"),
labels = c("Placebo", "DrugX"))
# Gender: Male = 1, Female = 2
DATA$gender <- factor(DATA$gender,
levels = c("1", "2"),
labels = c("MALE", "FEMALE"))
# High Albumin: Albumin >= 1000
DATA <- DATA %>% mutate(ACR_0_group = ifelse(ACR_0 >= 1000, "low", "high"))
DATA$ACR_0_group <- factor(DATA$ACR_0_group)
# High GFR: GFR >= 45
DATA <- DATA %>% mutate(GFR_0_group = ifelse(GFR_0 >= 45, "low", "high"))
DATA$GFR_0_group <- factor(DATA$GFR_0_group)
The Kaplan-Meier statistic is a non-parametric estimator of the survival function see. In this case, the event of interest is Doubling of Serum Creatinine or End-stage renal disease (DSCR_ESRD). The subgroups to compare are the patients under Treatment with Drug X or Placebo (TRT).
#surv_object
surv_object <- Surv(time = DATA$DSCR_ESRD_RD, event = DATA$ESRD)
# group by treatment
fitTRT <- survfit(surv_object ~ TRT, data = DATA)
#summary(fitTRT)
#summary(fitTRT)$table
fitTRT
Call: survfit(formula = surv_object ~ TRT, data = DATA)
1 observation deleted due to missingness
n events median 0.95LCL 0.95UCL
TRT=Placebo 568 100 NA NA NA
TRT=DrugX 579 82 NA NA NA
The median survival time (i.e. when the survival function is halved) is not achieved (NA).
# plot(fitTRT)
# abline(h = 0.5)
# Change color, linetype by strata, risk.table color by strata
ggsurvplot(fitTRT,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
#surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Warning: Vectorized input to `element_text()` is not officially supported.
Results may be unexpected or may change in future versions of ggplot2.
The horizontal axis represents time to DSCR_ESRD event in days, and the vertical axis the probability of ‘surviving’ (the proportion that do not suffer an DSCR_ESRD event). Tick marks mean that a patient was censored. The ‘Number at risk’ are the number of patients who are still ‘alive’.
The log-rank test compares the binary outcome (event) between two groups. It can be used to test whether the difference between the two survival functions is statistically significant.
logrank <- survdiff(surv_object ~ TRT, data = DATA)
logrank
Call:
survdiff(formula = surv_object ~ TRT, data = DATA)
n=1147, 1 observation deleted due to missingness.
N Observed Expected (O-E)^2/E (O-E)^2/V
TRT=Placebo 568 100 88.4 1.51 2.94
TRT=DrugX 579 82 93.6 1.43 2.94
Chisq= 2.9 on 1 degrees of freedom, p= 0.09
The Cox model expresses the hazard function \(h(t)\) as,
\(h(t) = h_0(t) \times exp(b_1x_1 + b_2x_2 + ... + b_px_p)\) see
where,
The quantities \(exp(b_i)\) are called hazard ratios (HRs). The HRs compares the probability of events between two groups. A HR of 1 means that both groups are experiencing an equal number of events at any point in time (no difference in survival). It shows how some factors influence the hazard’s rate (the rate to the ocurrence of the event).
In this case, \(x_1\) = TRT is studied within the ACR_0 and GFR_0 groups, i.e. in subsets of the data given by high/low ACR_0 and GFR_0.
## High Albumin
surv_object <- Surv(time = DATA[ DATA$ACR_0_group=="high", ]$DSCR_ESRD_RD,
event = DATA[ DATA$ACR_0_group=="high" , ]$ESRD)
fitTRT <- survfit(surv_object ~ TRT,
data = DATA[ DATA$ACR_0_group=="high", ])
# ggsurvplot(fitTRT,
# pval = TRUE, conf.int = TRUE,
# risk.table = TRUE, # Add risk table
# risk.table.col = "strata", # Change risk table color by groups
# linetype = "strata", # Change line type by groups
# #surv.median.line = "hv", # Specify median survival
# ggtheme = theme_bw(), # Change ggplot2 theme
# palette = c("#E7B800", "#2E9FDF"))
#fitTRT
fit.coxph <- coxph(surv_object ~ TRT,
data = DATA[ DATA$ACR_0_group=="high", ])
ggforest(model = fit.coxph, data = DATA[ DATA$ACR_0_group=="high", ], fontsize = 1)
summary(fit.coxph)
Call:
coxph(formula = surv_object ~ TRT, data = DATA[DATA$ACR_0_group ==
"high", ])
n= 364, number of events= 12
(60 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
TRTDrugX 0.3786 1.4603 0.5857 0.646 0.518
exp(coef) exp(-coef) lower .95 upper .95
TRTDrugX 1.46 0.6848 0.4633 4.603
Concordance= 0.535 (se = 0.075 )
Likelihood ratio test= 0.42 on 1 df, p=0.5
Wald test = 0.42 on 1 df, p=0.5
Score (logrank) test = 0.42 on 1 df, p=0.5
## Low Albumin
surv_object <- Surv(time = DATA[ DATA$ACR_0_group=="low", ]$DSCR_ESRD_RD,
event = DATA[ DATA$ACR_0_group=="low" , ]$ESRD)
fitTRT <- survfit(surv_object ~ TRT,
data = DATA[ DATA$ACR_0_group=="low", ])
# ggsurvplot(fitTRT,
# pval = TRUE, conf.int = TRUE,
# risk.table = TRUE, # Add risk table
# risk.table.col = "strata", # Change risk table color by groups
# linetype = "strata", # Change line type by groups
# #surv.median.line = "hv", # Specify median survival
# ggtheme = theme_bw(), # Change ggplot2 theme
# palette = c("#E7B800", "#2E9FDF"))
#fitTRT
fit.coxph <- coxph(surv_object ~ TRT,
data = DATA[ DATA$ACR_0_group=="low", ])
ggforest(model = fit.coxph, data = DATA[ DATA$ACR_0_group=="low", ], fontsize = 1)
summary(fit.coxph)
Call:
coxph(formula = surv_object ~ TRT, data = DATA[DATA$ACR_0_group ==
"low", ])
n= 723, number of events= 152
(61 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
TRTDrugX -0.3393 0.7122 0.1635 -2.076 0.0379 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
TRTDrugX 0.7122 1.404 0.517 0.9813
Concordance= 0.54 (se = 0.021 )
Likelihood ratio test= 4.35 on 1 df, p=0.04
Wald test = 4.31 on 1 df, p=0.04
Score (logrank) test = 4.35 on 1 df, p=0.04
# High Glomerular Filtration Rate
surv_object <- Surv(time = DATA[ DATA$GFR_0_group =="high", ]$DSCR_ESRD_RD,
event = DATA[ DATA$GFR_0_group =="high" , ]$ESRD)
fitTRT <- survfit(surv_object ~ TRT,
data = DATA[ DATA$GFR_0_group=="high", ])
# ggsurvplot(fitTRT,
# pval = TRUE, conf.int = TRUE,
# risk.table = TRUE, # Add risk table
# risk.table.col = "strata", # Change risk table color by groups
# linetype = "strata", # Change line type by groups
# #surv.median.line = "hv", # Specify median survival
# ggtheme = theme_bw(), # Change ggplot2 theme
# palette = c("#E7B800", "#2E9FDF"))
#fitTRT
fit.coxph <- coxph(surv_object ~ TRT,
data = DATA[ DATA$GFR_0_group=="high", ])
ggforest(model = fit.coxph, data = DATA[ DATA$GFR_0_group=="high", ], fontsize = 1)
summary(fit.coxph)
Call:
coxph(formula = surv_object ~ TRT, data = DATA[DATA$GFR_0_group ==
"high", ])
n= 562, number of events= 159
(8 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
TRTDrugX -0.3790 0.6845 0.1597 -2.373 0.0176 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
TRTDrugX 0.6845 1.461 0.5006 0.9361
Concordance= 0.548 (se = 0.021 )
Likelihood ratio test= 5.67 on 1 df, p=0.02
Wald test = 5.63 on 1 df, p=0.02
Score (logrank) test = 5.7 on 1 df, p=0.02
# Low Glomerular Filtration Rate
surv_object <- Surv(time = DATA[ DATA$GFR_0_group =="low", ]$DSCR_ESRD_RD,
event = DATA[ DATA$GFR_0_group =="low" , ]$ESRD)
fitTRT <- survfit(surv_object ~ TRT,
data = DATA[ DATA$GFR_0_group=="low", ])
# ggsurvplot(fitTRT,
# pval = TRUE, conf.int = TRUE,
# risk.table = TRUE, # Add risk table
# risk.table.col = "strata", # Change risk table color by groups
# linetype = "strata", # Change line type by groups
# #surv.median.line = "hv", # Specify median survival
# ggtheme = theme_bw(), # Change ggplot2 theme
# palette = c("#E7B800", "#2E9FDF"))
#fitTRT
fit.coxph <- coxph(surv_object ~ TRT,
data = DATA[ DATA$GFR_0_group=="low", ])
ggforest(model = fit.coxph, data = DATA[ DATA$GFR_0_group=="low", ], fontsize = 1)
summary(fit.coxph)
Call:
coxph(formula = surv_object ~ TRT, data = DATA[DATA$GFR_0_group ==
"low", ])
n= 578, number of events= 22
(7 observations deleted due to missingness)
coef exp(coef) se(coef) z Pr(>|z|)
TRTDrugX -0.1642 0.8486 0.4283 -0.383 0.701
exp(coef) exp(-coef) lower .95 upper .95
TRTDrugX 0.8486 1.178 0.3665 1.964
Concordance= 0.531 (se = 0.055 )
Likelihood ratio test= 0.15 on 1 df, p=0.7
Wald test = 0.15 on 1 df, p=0.7
Score (logrank) test = 0.15 on 1 df, p=0.7
The column coef in the output denotes the log hazard ratios, and exp(coef) the corresponding hazard ratios (HR). Increasing by one the coeff translates into increasing the log hazard ratio by coef. The Cox model is a proportional-hazards model, that means that the hazard of the event in any group is a constant multiple of the hazard in any other. This implies that curves cannot cross each other.
sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7600)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggcorrplot_0.1.3 survminer_0.4.6 ggpubr_0.2.5 magrittr_1.5
[5] ggplot2_3.3.0 survival_3.1-8
loaded via a namespace (and not attached):
[1] zoo_1.8-7 tidyselect_1.0.0 xfun_0.13 purrr_0.3.4
[5] splines_3.6.3 lattice_0.20-38 colorspace_1.4-1 vctrs_0.2.4
[9] generics_0.0.2 htmltools_0.4.0 yaml_2.2.1 survMisc_0.5.5
[13] rlang_0.4.5 pillar_1.4.3 glue_1.4.0 withr_2.1.2
[17] lifecycle_0.2.0 stringr_1.4.0 munsell_0.5.0 ggsignif_0.6.0
[21] gtable_0.3.0 evaluate_0.14 labeling_0.3 knitr_1.28
[25] fansi_0.4.1 broom_0.5.6 Rcpp_1.0.4.6 xtable_1.8-4
[29] scales_1.1.0 backports_1.1.6 farver_2.0.3 km.ci_0.5-2
[33] gridExtra_2.3 digest_0.6.25 stringi_1.4.6 dplyr_0.8.5
[37] KMsurv_0.1-5 cowplot_1.0.0 grid_3.6.3 cli_2.0.2
[41] tools_3.6.3 tibble_3.0.0 crayon_1.3.4 tidyr_1.0.2
[45] pkgconfig_2.0.3 ellipsis_0.3.0 Matrix_1.2-18 data.table_1.12.8
[49] assertthat_0.2.1 rmarkdown_2.1 rstudioapi_0.11 R6_2.4.1
[53] nlme_3.1-144 compiler_3.6.3