library(tidyverse)
library(survival)
library(ggfortify)  # one-line survival plots with ggplot2::autoplot.
library(ggpubr)

Understanding readmission patterns

As the COO of a hospital, you are interested in readmission patterns, such as the proportion of patients who are readmitted to your hospital within 7 days of discharge.

You have collected historical data on readmissions, and want to see if they can help you answer questions such as the following:

  • When do readmissions occur? For example, is a patient more likely to be readmitted on Day 3 after discharge, or on Day 5, or after Day 10?
  • What proportion of patients are readmitted, and what proportion are not?

Data

You have data on 15 patients.1 Seven of them were discharged, but not observed to have been readmitted (yet). Thus, we don’t really have the value of the variable days before readmit for these patients. In the stats lingo, these are right-censored data.

# paste from Excel using datapasta add-in: 

df1.readmits <- 
    data.frame(stringsAsFactors=FALSE,
                patient_id = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L),
       days.before.readmit = c(1L, 2L, 2L, 2L, 3L, 3L, 4L, 7L, 2L, 12L, 15L, 30L, 2L, 22L, 4L),
              not.censored = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
                     notes = c(NA, NA, NA, NA, NA, NA, NA, NA,
                               "no observed readmit", "no observed readmit",
                               "no observed readmit", "no observed readmit",
                               "no observed readmit", "no observed readmit",
                               "no observed readmit")
    )

View the data:

df1.readmits %>% 
    kable() %>% 
    kable_styling(bootstrap_options = c("striped")) %>%
    scroll_box(width = "100%", height = "400px")
patient_id days.before.readmit not.censored notes
1 1 1 NA
2 2 1 NA
3 2 1 NA
4 2 1 NA
5 3 1 NA
6 3 1 NA
7 4 1 NA
8 7 1 NA
9 2 0 no observed readmit
10 12 0 no observed readmit
11 15 0 no observed readmit
12 30 0 no observed readmit
13 2 0 no observed readmit
14 22 0 no observed readmit
15 4 0 no observed readmit

     

Now let’s explore alternative ways of visualizing this data, and what conclusions we can draw from each.

     

Visualization Option 1: Density of time-to-readmission

The simplest thing to do is to just plot a histogram or density function of time-to-readmission, for the patients who have been observed as readmits.

p1.density <- 
    df1.readmits %>% 
    filter(not.censored == 1) %>% 
    
    ggplot(aes(x = days.before.readmit)) + 
    scale_x_continuous(expand = c(0,0)) +
    scale_y_continuous(expand = c(0, 0)) + 
    geom_bar(fill = "skyblue4") + 
    labs(x = "Days before readmit", 
         y = "Number of cases") + 
    theme_light() + 
    theme(panel.grid.minor = element_line(colour="grey95"), 
          panel.grid.major = element_line(colour="grey95"))
    
p1.density

Without further context, this gives the impression that the vast majority of patients are readmitted in 5 days or less.

     

Visualization Option 2: Survival curve

Plot a Kaplan-Meier curve:2

# Kaplan-Meier curve: 
km1.survival <- survfit(Surv(days.before.readmit, 
                             not.censored) ~ 1,
                        data = df1.readmits)

# summary(km1.survival)
# ?autoplot.survfit

p2.survival <- autoplot(km1.survival,
                        conf.int = FALSE, 
                        firsty = 1, 
                        surv.colour = "skyblue4",
                        surv.size = 1.5, 
                        censor.size = 5, 
                        censor.colour = "firebrick") + 
    scale_y_continuous(limits = c(0,1), 
                       expand = c(0, 0), 
                       breaks = seq(0, 1, .1)) + 
    scale_x_continuous(expand = c(0, 0), 
                       breaks = seq(0, 30, 2)) + 
        labs(x = "Days before readmit",
             y = "Survival probability") + 
    theme_light() + 
    theme(panel.grid.minor = element_line(colour="grey95"), 
          panel.grid.major = element_line(colour="grey95"))

p2.survival


     

What’s the difference?

ggarrange(p1.density, 
          p2.survival, 
          nrow = 2)

There’s a difference between these two:

  • Modeling time-to-readmission, conditional on being readmitted
  • Modeling time-to-readmission, without conditioning on readmission status

Advantages of the survival curve:

  • For the density plot, note that we throw away all data points that don’t have a defined end date for days to readmission, because they still haven’t been readmitted. A major advantage of the survival analysis is that it can use these data points to inform inferences about how long someone is likely to “survive” - i.e. remain outside of the hospital. For example, even though we don’t know when Patient 10 will be readmitted (or if they ever will be), we know that they have already “survived” past Day 1, and this is important information.

 

  • From an operational perspective, at the time of discharge of a patient, we don’t know whether or not they will be readmitted. Therefore, it may be more useful to examine readmission patterns across both populations - those who are readmitted, and those who are not.

 

  • You can directly read off the survival curve to answer questions such as the following: “I just discharged a patient. What’s the probability that they’ll be readmitted within 4 days?” The answer, based on all the data we have available, including the right-censored data, is about 51%.3

 

p2.survival + 
    geom_vline(xintercept = 4, 
               col = "red") + 
    geom_hline(yintercept = 0.49, 
               col = "red")

  • Since the survival curve never reaches the x-axis, this makes it clear that there is a significant proportion of patients who are never readmitted. This is not clear in the histogram/density plot of known time-to-readmission cases.

  1. It’s a small hospital

  2. Reference: https://rviews.rstudio.com/2017/09/25/survival-analysis-with-r/

  3. The survival curve gives the probability that a patient will survive x days or longer, so the probability that a patient will be readmitted within x days is 1 - S(x), where S(x) is the value of the survival curve at x (i.e. the y-axis value)