Motivation

In some contexts, one may be interested in calculating the survival probabilities (at specific time points) for a population of interest, based on the corresponding life tables. For instance, in relative survival analysis (Pohar-Perme, Stare, and Estève 2012), the so-called ``Pohar-Perme’’ estimator requires the calculation of these probabilities for each individual in the data set. There are also cases where one may be interested in understanding the survival probabilities in a population of interest for specific combinations of individual characteristics (such as age, sex, year of diagnosis, and deprivation level).

The following R code presents a step-by-step illustrative example on the calculation of the survival probabilities of a set of individuals based on life tables from England for the years 2010-2015. The R command spLT from the routines.R source file performs the following steps:

  1. It calculates all the change points from the date of diagnosis (birthdays and new years) and merges them with the time points of interest.

  2. It extracts the hazard rates at these points from the life table.

  3. It calculates the integrals of the step function defined in the previous steps up to each time point of interest. These integrals represent the cumulative hazard function.

  4. It calculates the survival probabilities using the usual relationship between the cumulative hazard function and the survival function \(S(t) = \exp\{-H(t)\}\).

The routines.R source file can be found in the SurvLT GitHub repository.

See also: SimLT

Illustrative example: R code

In this example we will calculate the survival probabilities for \(n=1000\) individuals using a life table from England based on year, sex, deprivation level, age, English region (gor). The life table can be downloaded from SimLT.

Data preparation

rm(list=ls())

# Required packages
#library(devtools)
#install_github("FJRubio67/SimLT")
library(SimLT)
library(msm)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
# https://github.com/FJRubio67/SurvLT
source("routinesSPLT.R")

#------------------------------------------------------------------------
# Life tables
#------------------------------------------------------------------------
LT <- data.table(read.table("England_LT_2010_2015_dep_gor.txt",header = T))
head(LT)
##    year sex dep age      rate gor
## 1: 2010   1   9   0 0.0047365  10
## 2: 2011   1   9   0 0.0046188  10
## 3: 2012   1   9   0 0.0046188  10
## 4: 2013   1   9   0 0.0046188  10
## 5: 2014   1   9   0 0.0046188  10
## 6: 2015   1   9   0 0.0046188  10

Simulated design matrix

We now simulate the individuals’ characteristics using the built-in function simDesMatrix in the R package simLT. These characteristics will be used to extract the hazard rates in the period of interest (piecewise constant hazard), and to calculate the survival probabilities. This function is based on real data on colon and lung (male and female) patients in England. You can use your own design matrix instead.

#------------------------------------------------------------------------
# Design matrix simulation 
#------------------------------------------------------------------------
# Sample size
n <- 1000
# Design matrix
Xcf <- simDesMatrix(seed = 123, n = n, admin.cens = "31-12-2015", scale.age = FALSE,
                    site = "colon", sex = "female")
head(Xcf)
##   ID  date.diag dep gor      age fup.time
## 1  1 2010-05-11   2   2 80.09770 5.639973
## 2  2 2010-12-21   2   3 90.00683 5.026694
## 3  3 2010-03-31   1   2 76.93449 5.752225
## 4  4 2010-03-11   3   6 92.35372 5.806982
## 5  5 2010-12-17   3   6 67.86141 5.037645
## 6  6 2010-05-13   4   7 28.95104 5.634497
Xcf$sex <- rep(2, nrow(Xcf))

Calculating the survival probabilities

Now, we calculate the survival probabilities for the simulated individuals at the time points of interest using the command spLT. The spLT command returns a list with the time points of interest (tps) and the survival probabilities (psurv) at those time points.

#------------------------------------------------------------------------
# Population survival probabilities at specific time points
#------------------------------------------------------------------------

# Time points of interest
tps <- seq(0.1,5, by = 0.5)

# Calculation of the survival probabilities
sps <- spLT(tps = tps, LT = LT, des = Xcf, sdvars = c("dep","gor"), 
     dodn = "date.diag", progress = FALSE, dy = 365.25)

head(sps$psurv)
##           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
## [1,] 0.9960162 0.9763340 0.9574754 0.9367413 0.9155518 0.8934859 0.8708033
## [2,] 0.9869407 0.9243880 0.8645248 0.8013189 0.7411753 0.6802668 0.6234038
## [3,] 0.9977636 0.9851599 0.9729287 0.9595642 0.9462532 0.9314587 0.9167608
## [4,] 0.9836284 0.9057043 0.8297316 0.7573339 0.6860176 0.6180131 0.5502933
## [5,] 0.9990407 0.9937284 0.9883974 0.9830950 0.9773317 0.9709931 0.9646456
## [6,] 0.9999779 0.9998582 0.9997443 0.9996214 0.9994973 0.9993626 0.9992265
##           [,8]      [,9]     [,10]
## [1,] 0.8469903 0.8227859 0.7977380
## [2,] 0.5658484 0.5126087 0.4588520
## [3,] 0.9006056 0.8847351 0.8671474
## [4,] 0.4894324 0.4297070 0.3767983
## [5,] 0.9576627 0.9506758 0.9429900
## [6,] 0.9990782 0.9989299 0.9987651
# Some plots for specific individuals
index <- c(1,200,400,600,800,1000)

# Individual characteristics
Xcf[index,]
##        ID  date.diag dep gor      age fup.time sex
## 1       1 2010-05-11   2   2 80.09770 5.639973   2
## 200   200 2010-04-14   1   4 93.06768 5.713895   2
## 400   400 2010-04-20   5   6 88.40733 5.697467   2
## 600   600 2010-10-30   1   5 71.30269 5.169062   2
## 800   800 2010-09-07   5   7 75.35541 5.314168   2
## 1000 1000 2010-04-29   3   8 84.85043 5.672827   2
# Survival probabilities at times of interest
par(mfrow=c(2,3))
for(i in index){
  plot(sps$tps, sps$psurv[i,], type = "p", ylim = c(0,1), lwd = 2, main = paste("Individual ", i),
       xlab = "time", ylab = "Survival", cex.axis = 1.5, cex.lab = 1.5)
}

Pohar-Perme, M., J. Stare, and J. Estève. 2012. “On Estimation in Relative Survival.” Biometrics 68 (1): 113–20.