In some contexts, one may be interested in calculating the survival probabilities (at specific time points) for a population of interest, based on the corresponding life tables. For instance, in relative survival analysis (Pohar-Perme, Stare, and Estève 2012), the so-called ``Pohar-Perme’’ estimator requires the calculation of these probabilities for each individual in the data set. There are also cases where one may be interested in understanding the survival probabilities in a population of interest for specific combinations of individual characteristics (such as age, sex, year of diagnosis, and deprivation level).
The following R code presents a step-by-step illustrative example on
the calculation of the survival probabilities of a set of individuals
based on life tables from England for the years 2010-2015. The R command
spLT
from the routines.R
source file performs
the following steps:
It calculates all the change points from the date of diagnosis (birthdays and new years) and merges them with the time points of interest.
It extracts the hazard rates at these points from the life table.
It calculates the integrals of the step function defined in the previous steps up to each time point of interest. These integrals represent the cumulative hazard function.
It calculates the survival probabilities using the usual relationship between the cumulative hazard function and the survival function \(S(t) = \exp\{-H(t)\}\).
The routines.R
source file can be found in the SurvLT GitHub
repository.
See also: SimLT
In this example we will calculate the survival probabilities for \(n=1000\) individuals using a life table from England based on year, sex, deprivation level, age, English region (gor). The life table can be downloaded from SimLT.
rm(list=ls())
# Required packages
#library(devtools)
#install_github("FJRubio67/SimLT")
library(SimLT)
library(msm)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
# https://github.com/FJRubio67/SurvLT
source("routinesSPLT.R")
#------------------------------------------------------------------------
# Life tables
#------------------------------------------------------------------------
LT <- data.table(read.table("England_LT_2010_2015_dep_gor.txt",header = T))
head(LT)
## year sex dep age rate gor
## 1: 2010 1 9 0 0.0047365 10
## 2: 2011 1 9 0 0.0046188 10
## 3: 2012 1 9 0 0.0046188 10
## 4: 2013 1 9 0 0.0046188 10
## 5: 2014 1 9 0 0.0046188 10
## 6: 2015 1 9 0 0.0046188 10
We now simulate the individuals’ characteristics using the built-in
function simDesMatrix
in the R package simLT
.
These characteristics will be used to extract the hazard rates in the
period of interest (piecewise constant hazard), and to calculate the
survival probabilities. This function is based on real data on colon and
lung (male and female) patients in England. You can use your own design
matrix instead.
#------------------------------------------------------------------------
# Design matrix simulation
#------------------------------------------------------------------------
# Sample size
n <- 1000
# Design matrix
Xcf <- simDesMatrix(seed = 123, n = n, admin.cens = "31-12-2015", scale.age = FALSE,
site = "colon", sex = "female")
head(Xcf)
## ID date.diag dep gor age fup.time
## 1 1 2010-05-11 2 2 80.09770 5.639973
## 2 2 2010-12-21 2 3 90.00683 5.026694
## 3 3 2010-03-31 1 2 76.93449 5.752225
## 4 4 2010-03-11 3 6 92.35372 5.806982
## 5 5 2010-12-17 3 6 67.86141 5.037645
## 6 6 2010-05-13 4 7 28.95104 5.634497
Xcf$sex <- rep(2, nrow(Xcf))
Now, we calculate the survival probabilities for the simulated
individuals at the time points of interest using the command
spLT
. The spLT
command returns a list with the
time points of interest (tps
) and the survival
probabilities (psurv
) at those time points.
#------------------------------------------------------------------------
# Population survival probabilities at specific time points
#------------------------------------------------------------------------
# Time points of interest
tps <- seq(0.1,5, by = 0.5)
# Calculation of the survival probabilities
sps <- spLT(tps = tps, LT = LT, des = Xcf, sdvars = c("dep","gor"),
dodn = "date.diag", progress = FALSE, dy = 365.25)
head(sps$psurv)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.9960162 0.9763340 0.9574754 0.9367413 0.9155518 0.8934859 0.8708033
## [2,] 0.9869407 0.9243880 0.8645248 0.8013189 0.7411753 0.6802668 0.6234038
## [3,] 0.9977636 0.9851599 0.9729287 0.9595642 0.9462532 0.9314587 0.9167608
## [4,] 0.9836284 0.9057043 0.8297316 0.7573339 0.6860176 0.6180131 0.5502933
## [5,] 0.9990407 0.9937284 0.9883974 0.9830950 0.9773317 0.9709931 0.9646456
## [6,] 0.9999779 0.9998582 0.9997443 0.9996214 0.9994973 0.9993626 0.9992265
## [,8] [,9] [,10]
## [1,] 0.8469903 0.8227859 0.7977380
## [2,] 0.5658484 0.5126087 0.4588520
## [3,] 0.9006056 0.8847351 0.8671474
## [4,] 0.4894324 0.4297070 0.3767983
## [5,] 0.9576627 0.9506758 0.9429900
## [6,] 0.9990782 0.9989299 0.9987651
# Some plots for specific individuals
index <- c(1,200,400,600,800,1000)
# Individual characteristics
Xcf[index,]
## ID date.diag dep gor age fup.time sex
## 1 1 2010-05-11 2 2 80.09770 5.639973 2
## 200 200 2010-04-14 1 4 93.06768 5.713895 2
## 400 400 2010-04-20 5 6 88.40733 5.697467 2
## 600 600 2010-10-30 1 5 71.30269 5.169062 2
## 800 800 2010-09-07 5 7 75.35541 5.314168 2
## 1000 1000 2010-04-29 3 8 84.85043 5.672827 2
# Survival probabilities at times of interest
par(mfrow=c(2,3))
for(i in index){
plot(sps$tps, sps$psurv[i,], type = "p", ylim = c(0,1), lwd = 2, main = paste("Individual ", i),
xlab = "time", ylab = "Survival", cex.axis = 1.5, cex.lab = 1.5)
}