In cancer studies, most of survival analyses use the following methods:
-Kaplan-Meier plots to visualize survival curves Log-rank test to compare
thesurvival curves of two or more groups
-Cox proportional hazards model.
data("lung")
head(lung,3)
## inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1 3 306 2 74 1 1 90 100 1175 NA
## 2 3 455 2 68 1 0 90 90 1225 15
## 3 3 1010 1 56 1 0 90 90 NA 15
fit <- survfit(Surv(time, status) ~ sex, data = lung)
res.sum <- surv_summary(fit)
## Warning in .get_data(x, data = data): The `data` argument is not provided. Data
## will be extracted from model fit.
head(res.sum,3)
## time n.risk n.event n.censor surv std.err upper lower strata
## 1 11 138 3 0 0.9782609 0.01268978 1.0000000 0.9542301 sex=1
## 2 12 135 1 0 0.9710145 0.01470747 0.9994124 0.9434235 sex=1
## 3 13 134 2 0 0.9565217 0.01814885 0.9911586 0.9230952 sex=1
## sex
## 1 1
## 2 1
## 3 1
d <- data.frame(time = fit$time,
n.risk = fit$n.risk,
n.event = fit$n.event,
n.censor = fit$n.censor,
surv = fit$surv,
upper = fit$upper,
lower = fit$lower)
head(d,3)
## time n.risk n.event n.censor surv upper lower
## 1 11 138 3 0 0.9782609 1.0000000 0.9542301
## 2 12 135 1 0 0.9710145 0.9994124 0.9434235
## 3 13 134 2 0 0.9565217 0.9911586 0.9230952
ggsurvplot(fit,
ggtheme = theme_classic(),
palette = c("#E7B800", "#2E9FDF"))
The plot can be further customized using the following arguments:
conf.int.style = “step” to change the style of confidence interval bands. xlab to
change the x axis label. break.time.by = 200 break x axis in time intervals by 200.
risk.table = “abs_pct”to show both absolute number and percentage of individuals
at risk. risk.table.y.text.col = TRUE and risk.table.y.text = FALSE
to provide bars instead of names in text annotations of the legend
of risk table. ncensor.plot = TRUE to plot the number of censored subjects
at time t. As suggested by Marcin Kosinski, This is a good additional feedback
to survival curves, so that one could realize: how do survival curves look like,
what is the number of risk set AND what is the cause that the risk set
become smaller: is it caused by events or by censored events?
legend.labs to change the legend labels.``
The Kaplan-Meier plot can be interpreted as follow:
The horizontal axis (x-axis) represents time in days,
and the vertical axis (y-axis) shows the probability
of surviving or the proportion of people surviving.
The lines represent survival curves of the two groups.
A vertical drop in the curves indicates an event.
The vertical tick mark on the curves
means that a patient was censored at this time.
ggsurvplot(fit, conf.int = TRUE, risk.table.col = "strata", break.time.by = 200, ggtheme = theme_bw(), palette = c("#E7B800", "#2E9FDF"), xlim = c(0, 600)) ```
surv_diff <- survdiff(Surv(time, status) ~ sex, data = lung)
surv_diff
## Call:
## survdiff(formula = Surv(time, status) ~ sex, data = lung)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 138 112 91.6 4.55 10.3
## sex=2 90 53 73.4 5.68 10.3
##
## Chisq= 10.3 on 1 degrees of freedom, p= 0.001
fit2 <- survfit( Surv(time, status) ~ sex + rx + adhere,
data = colon )
ggsurv <- ggsurvplot(fit2, fun = "event", conf.int = TRUE, ggtheme = theme_bw())
ggsurv$plot + theme_bw() +
theme(legend.position = "right") +
facet_grid(rx ~ adhere)
library(dplyr)
summary(lung$wt.loss)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -24.000 0.000 7.000 9.832 15.750 68.000 14
lung <- lung %>%
mutate(group = case_when(
wt.loss >= quantile(wt.loss, 0.5, na.rm = TRUE) ~ 'wl.high',
wt.loss < quantile(wt.loss, 0.5, na.rm = TRUE) ~ 'wl.low',
TRUE ~ NA_character_
))
# when use quantile or mean, need to add na.rm = TRUE
fit1 <- survfit(Surv(time, status) ~ group, data = lung)
ggsurvplot(fit1,
#pval = TRUE, conf.int = TRUE,
#risk.table = TRUE,
#risk.table.col = "group",
#surv.median.line = "hv",
ggtheme = theme_classic(),
palette = c("#E7B800", "#2E9FDF"))
## Warning: Vectorized input to `element_text()` is not officially supported.
## Results may be unexpected or may change in future versions of ggplot2.