library(tidyverse)
library(survival)
library(ggfortify) # one-line survival plots with ggplot2::autoplot.
library(ggpubr)
As the COO of a hospital, you are interested in readmission patterns, such as the proportion of patients who are readmitted to your hospital within 7 days of discharge.
You have collected historical data on readmissions, and want to see if they can help you answer questions such as the following:
You have data on 15 patients.1 Seven of them were discharged, but not observed to have been readmitted (yet). Thus, we don’t really have the value of the variable days before readmit for these patients. In the stats lingo, these are right-censored data.
# paste from Excel using datapasta add-in:
df1.readmits <-
data.frame(stringsAsFactors=FALSE,
patient_id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L),
days.before.readmit = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 7L, 2L, 12L, 15L, 30L, 2L, 22L, 4L),
not.censored = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
notes = c(NA, NA, NA, NA, NA, NA, NA, NA,
"no observed readmit", "no observed readmit",
"no observed readmit", "no observed readmit",
"no observed readmit", "no observed readmit",
"no observed readmit")
)
View the data:
df1.readmits %>%
kable() %>%
kable_styling(bootstrap_options = c("striped")) %>%
scroll_box(width = "100%", height = "400px")
| patient_id | days.before.readmit | not.censored | notes |
|---|---|---|---|
| 1 | 1 | 1 | NA |
| 2 | 2 | 1 | NA |
| 3 | 2 | 1 | NA |
| 4 | 2 | 1 | NA |
| 5 | 3 | 1 | NA |
| 6 | 3 | 1 | NA |
| 7 | 4 | 1 | NA |
| 8 | 7 | 1 | NA |
| 9 | 2 | 0 | no observed readmit |
| 10 | 12 | 0 | no observed readmit |
| 11 | 15 | 0 | no observed readmit |
| 12 | 30 | 0 | no observed readmit |
| 13 | 2 | 0 | no observed readmit |
| 14 | 22 | 0 | no observed readmit |
| 15 | 4 | 0 | no observed readmit |
Now let’s explore alternative ways of visualizing this data, and what conclusions we can draw from each.
The simplest thing to do is to just plot a histogram or density function of time-to-readmission, for the patients who have been observed as readmits.
p1.density <-
df1.readmits %>%
filter(not.censored == 1) %>%
ggplot(aes(x = days.before.readmit)) +
scale_x_continuous(expand = c(0,0)) +
scale_y_continuous(expand = c(0, 0)) +
geom_bar(fill = "skyblue4") +
labs(x = "Days before readmit",
y = "Number of cases") +
theme_light() +
theme(panel.grid.minor = element_line(colour="grey95"),
panel.grid.major = element_line(colour="grey95"))
p1.density
Without further context, this gives the impression that the vast majority of patients are readmitted in 5 days or less.
Plot a Kaplan-Meier curve:2
# Kaplan-Meier curve:
km1.survival <- survfit(Surv(days.before.readmit,
not.censored) ~ 1,
data = df1.readmits)
# summary(km1.survival)
# ?autoplot.survfit
p2.survival <- autoplot(km1.survival,
conf.int = FALSE,
firsty = 1,
surv.colour = "skyblue4",
surv.size = 1.5,
censor.size = 5,
censor.colour = "firebrick") +
scale_y_continuous(limits = c(0,1),
expand = c(0, 0),
breaks = seq(0, 1, .1)) +
scale_x_continuous(expand = c(0, 0),
breaks = seq(0, 30, 2)) +
labs(x = "Days before readmit",
y = "Survival probability") +
theme_light() +
theme(panel.grid.minor = element_line(colour="grey95"),
panel.grid.major = element_line(colour="grey95"))
p2.survival
ggarrange(p1.density,
p2.survival,
nrow = 2)
There’s a difference between these two:
days to readmission, because they still haven’t been readmitted. A major advantage of the survival analysis is that it can use these data points to inform inferences about how long someone is likely to “survive” - i.e. remain outside of the hospital. For example, even though we don’t know when Patient 10 will be readmitted (or if they ever will be), we know that they have already “survived” past Day 1, and this is important information.
p2.survival +
geom_vline(xintercept = 4,
col = "red") +
geom_hline(yintercept = 0.49,
col = "red")
It’s a small hospital↩
Reference: https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/↩
The survival curve gives the probability that a patient will survive x days or longer, so the probability that a patient will be readmitted within x days is 1 - S(x), where S(x) is the value of the survival curve at x (i.e. the y-axis value)↩