Setup

This section contains the setup and the various utility functions used throughout.

Libraries used:

library(rstan)
library(survival)
library(flexsurv)
library(data.table)
# See https://github.com/maj-biostat/pwexp
library(pwexp)
library(splines2)
library(muhaz)
rstan::rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
options(mc.cores = 1)

Compiled code (any models used are shown later):

mod1 <- rstan::stan_model("../stan/surv_mspline4.stan", verbose = FALSE)

Demonstration

Commonly, survival data look like this

set.seed(1)
N <- 200

bins = c(0, 50, 150, 250)
rate = c(1/200, 1/100, 1/75, 1/50)

d <- data.table(
  id = 1:(2*N),
  y = c(pwexp::rpw(N, bins, rate),
        pwexp::rpw(N, bins, 2*rate)),
  trt = rep(0:1, N)
)
# Observations censored after 200 days
d[, evt := as.numeric(y < 200)]
d[y >= 200, y := 200]
head(d)
##    id         y trt evt
## 1:  1 155.70809   0   1
## 2:  2 123.85287   1   1
## 3:  3  80.71255   0   1
## 4:  4  19.25642   1   1
## 5:  5 176.32975   0   1
## 6:  6  21.43027   1   1

where the data contain an id which may represent an individual person, but could also be a manufactured product like a lightbulb or a car engine. Here we will assume that id refers to a participant in a clinical trial. The y represents the time (e.g. days) until an event of interest occurred, such as death, trt represents some experimental intervention and evt represents whether we observed the event of interest or whether the experiment finished prior to us being able to observe the event. In the latter case, we have right censoring; we know that the event did not happen up to the censoring time (200 days), but we do not know when it will happen.

However, in some situations, e.g. when using some piecewise survival models, the data needs to be transformed into a counting process format. In this form, the data for each participant time is divided into \(J\) non-overlapping segments \((0, s_1], (s_1, s_2], \dots, (s_{J-1}, s_J]\). The survSplit function in the survival package enables us to do the transformation.

# Just create some arbitrary segment cut points for demo.
dbin <- data.table(
  id_timept = 1:5,
  bins = c(10, 20, 50, 100, 150)
)
d2 <- data.table(
  survSplit(Surv(y, evt) ~ trt, 
            start="tstart", end="tstop", id = "id",
            data=d, cut=dbin$bins)
)
d2 <- merge(d2, dbin, by.x = "tstop", by.y = "bins" )
setkey(d2, id)
setcolorder(d2, c("id", "id_timept", "tstart", "tstop", "evt"))
d2[id %in% c(1, 2, 3, 10), ]
##     id id_timept tstart tstop evt trt
##  1:  1         1      0    10   0   0
##  2:  1         2     10    20   0   0
##  3:  1         3     20    50   0   0
##  4:  1         4     50   100   0   0
##  5:  1         5    100   150   0   0
##  6:  2         1      0    10   0   1
##  7:  2         2     10    20   0   1
##  8:  2         3     20    50   0   1
##  9:  2         4     50   100   0   1
## 10:  3         1      0    10   0   0
## 11:  3         2     10    20   0   0
## 12:  3         3     20    50   0   0
## 13: 10         1      0    10   0   1
## 14: 10         2     10    20   0   1
## 15: 10         3     20    50   0   1
## 16: 10         4     50   100   0   1
## 17: 10         5    100   150   0   1

Notice how each individual now has multiple records for each of the segments that were defined. Also note that the last segment will be either 1 if the event was observed or 0 if the event was not observed up to the maximum possible time (censoring time 200 days).

A related function is tmerge, which is useful if you are introducing time dependent covariates.