Excess Mortality Hazard and Net Survival

Survival analysis after the diagnosis of cancer is an active research area in cancer epidemiology and a primary interest of many countries. Three typical frameworks adopted to model cancer survival data are: (i) the overall survival setting (where all-cause mortality is studied), (ii) the cause-specific setting (where the cause of death is known), and (iii) the relative survival setting (where the cause of death is unknown or unreliable). The overall survival setting is not the optimal choice for cancer epidemiology when the main interest is on comparing two (or more) populations (e.g. two countries, two periods in the same country, two groups with different deprivation levels within the same country and at the same period, and etcetera), since it is affected by other causes of mortality (which may differ for the populations of interest). In practice when using population-based cancer registry data, the cause of death is typically unavailable or the death certificates are not reliable. Thus, the cause-specific setting is not a reasonable choice either. The relative survival setting represents a useful alternative, where the mortality hazard associated to other causes is approximated by the population hazard \(h_P(\cdot;{\bf z})\), which is typically obtained from life tables based on the available sociodemographic characteristics \({\bf z}\) (e.g. sex, region and deprivation level) in addition to age and year of diagnosis.

In practice, a limitation of quantities derived in the relative survival setting (Perme, Stare, and Estève 2012), such as the excess hazard, is that patients need to be matched to groups of the general population sharing the available characteristics \({\bf z}\) in order to obtain the background mortality rates \(h_P(\cdot;{\bf z})\) from the life tables. And in many countries, the life tables are not stratified according to some important characteristics (e.g. deprivation). We propose an approach to deal with that situation (Rubio et al. 2019).

The classical model, the single-parameter correction model, and the frailty model

We refer the reader to references (Rubio et al. 2018) and (Rubio et al. 2019) for more details on these models.

In the paper associated to this tutorial (Rubio et al. 2019), we compare the classical excess hazard model (M1): \[ h_o(t;{\bf x}) = h_P(A+t;y+t;{\bf z}) + h_E(t;{\bf x}), \] to the single correction model (M2): \[ h_o^{C}(t;{\bf x}) = \gamma h_P(A+t;y+t;{\bf z}) + h_E(t;{\bf x}), \] and the frailty correction model (M3) \[ \tilde{h}_o(t;{\bf x}) = \dfrac{\mu \, h_P(A+t;y+t;{\bf z})}{1+b \left[H_P(A+t;y+t;{\bf z})-H_P(A;y;{\bf z})\right]} + h_E(t;{\bf x}). \] In these models, we adopt the general excess hazard model (GH) proposed in (Rubio et al. 2018): \[ h_{\text{E}}^{\text{GH}}\left(t;{\bf x}_i\right) = h_0\left(t \exp({\bf x}_i^T\beta_1)\right)\exp({\bf x}_i^T\beta_2), \] where the baseline hazard is modelled using the Exponentiated Weibull distribution (see The Exponentiated Weibull distribution for more details). The data are simulated as described in Simulation Scenario I in (Rubio et al. 2019), and the mismatch is induced by simulating a random correction given by \(\gamma \sim Gamma(6.5,10)\). This corresponds to the Wide Mismatch scenario described in (Rubio et al. 2019).

Example using simulated data

The following R code shows how to fit these models using a simulated data set. We can observe that not accounting for mismatches in the life tables leads to an overestimation of the excess hazard, while our approach based on model M3 allows to retrieve the simulated excess hazard (Rubio et al. 2019).

Useful functions and data preparation

rm(list=ls())

# Required packages
library(numDeriv)
library(knitr)
library(compiler)

# Exponentiated Weibull Distribution
# Hazard
hexpw <- function(t,lambda,kappa,alpha){
  pdf0 <-  alpha*dweibull(t,scale=lambda,shape=kappa)*pweibull(t,scale=lambda,shape=kappa)^(alpha-1) 
  cdf0 <- pweibull(t,scale=lambda,shape=kappa)^alpha
  cdf0 <- ifelse(cdf0==1,0.9999999,cdf0)
  return(pdf0/(1-cdf0))
}                                                                                      

# Cumulative hazard
Hexpw <- function(t,lambda,kappa,alpha,log.p=FALSE){
  cdf <- pweibull(t,scale=lambda,shape=kappa)^alpha  
  return(-log(1-cdf))
} 

###########################################################################################
# Function to calculate the normal confidence intervals
# The parameters indicated with "index" are transformed to the real line using log()
###########################################################################################
# FUN   : minus log-likelihood function to be used to calculate the confidence intervals
# MLE   : maximum likelihood estimator of the parameters of interest
# level : confidence level
# index : position of the positive parameters under the original parameterisation

Conf.Int <- function(FUN,MLE,level=0.95){
  sd.int <- abs(qnorm(0.5*(1-level)))
  HESS <- hessian(FUN,x=MLE)
  Fisher.Info <- solve(HESS, tol = 1e-18)
  Sigma <- sqrt(diag(Fisher.Info))
  U<- MLE + sd.int*Sigma
  L<- MLE - sd.int*Sigma
  C.I <- cbind(L,U,MLE, Sigma)
  rownames(C.I)<- paste0("par", seq_along(1:length(MLE)))
  colnames(C.I)<- c("Lower","Upper","Transf MLE", "Std. Error")
  return(C.I)
}

# Reading the simulated data (available at: https://sites.google.com/site/fjavierrubio67/dataFGH.txt)
dat <- read.table("dataFGH.txt", header = FALSE)

# Difference of cumulative hazards
Hp.diff.new <- as.vector(dat[,5])

# Variables used in the modelling
hp.as = as.vector(dat[,4])  # expected rates
ind.cens = as.vector(dat[,6])  # censoring status: 0-alive, 1-dead
surv.time = as.vector(dat[,7]) # survival times
x2 = as.matrix(dat[,1:3]) # design matrix
colnames(x2) <- c("agec","sex","TTT") # names of the covariates
dim.x22 <- dim(x2)[2]

Model M1: Classical Model

log.likewG <- cmpfun(function(par){
    if(anyNA(par)) return(Inf)
    ae0 <- exp(par[1]); be0 <- exp(par[2]); ce0=exp(par[3]); beta0 <- par[4:(4+dim(x2)[2]-1)]; beta1 <- tail(par,n = dim(x2)[2]);
    exp.x.beta0 <- exp(x2%*%beta0)
    exp.x.beta1 <- exp(x2%*%beta1)
    exp.x.dif <- exp(x2%*%(beta1-beta0))
    haz0 <- hp.as + hexpw(surv.time*exp.x.beta0,ae0,be0,ce0)*exp.x.beta1
    CH0 <- Hexpw(surv.time*exp.x.beta0,ae0,be0,ce0)*exp.x.dif + Hp.diff.new
    if(anyNA(CH0)) return(Inf)
    if(anyNA(haz0)) return(Inf)
    else  val <- - ind.cens*log(haz0) + CH0
    return(sum(val))
})
  
#----------------------------------------------------------------------------------------------------
# nlminb within Coordinate Descent algorithm
# Used to calculate a better initial point
#----------------------------------------------------------------------------------------------------
  
NCD <- 1 # Number of coordinate descent iterations
  
init.cd <- c(0,0,0.5,rep(0,3),rep(0,3)) # initial point for coordinate descent
  
for(j in 1:NCD){
    #print(j)
    for(i in 1:length(init.cd)){
      tempf <- Vectorize(function(par){
        val <- replace(init.cd, i, par)
        return(log.likewG(val))
      })
      
      new.val <- nlminb(init.cd[i],tempf)$par
      init.cd <- replace(init.cd, i,  new.val )
    }}  


#---------------------------------------------------------------------------------------------------- 
# Optimisation
  
if(any(is.na(init.cd))) {
    initG <- c(0,0,0.5,rep(0,3),rep(0,3))}  else initG <- init.cd
  
OPTEWG1 <- nlminb(initG, log.likewG,control=list(iter.max=10000))
OPTEWG2 <- optim(initG, log.likewG,control=list(maxit=10000))
  
if(OPTEWG1$objective <=  OPTEWG2$value){
    MLEEWG <- c(exp(OPTEWG1$par[1:3]),OPTEWG1$par[-c(1:3)])
    AICEWG <- 2*OPTEWG1$objective + 2*length(MLEEWG)
}
  
if(OPTEWG1$objective >  OPTEWG2$value){
    MLEEWG <- c(exp(OPTEWG2$par[1:3]),OPTEWG2$par[-c(1:3)])
    AICEWG <- 2*OPTEWG2$value + 2*length(MLEEWG)
}

# MLE
kable(MLEEWG)

x
1.2702134
0.6050082
2.3015349
0.1177166
0.4059126
-0.0005506
0.0631782
0.2857123
0.1850393

# Confidence interval (the parameters of the Exponentiated Weibull baseline hazard are in the log-scale)
CI <- Conf.Int(log.likewG,c(log(MLEEWG[1:3]),MLEEWG[-c(1,2,3)]),level=0.95)
kable(CI)

	Lower	Upper	Transf MLE	Std. Error
par1	-0.0025804	0.4809503	0.2391850	0.1233519
par2	-0.6069121	-0.3981146	-0.5025133	0.0532656
par3	0.6591310	1.0080215	0.8335762	0.0890043
par4	0.1011707	0.1342625	0.1177166	0.0084420
par5	0.1082526	0.7035727	0.4059126	0.1518702
par6	-0.3002885	0.2991873	-0.0005506	0.1529303
par7	0.0599485	0.0664080	0.0631782	0.0016479
par8	0.2256502	0.3457745	0.2857123	0.0306445
par9	0.1249923	0.2450863	0.1850393	0.0306368

Model M2: Single parameter correction

log.likEWC = function(par){
    if(any(is.na(par))) return(Inf)
    alpha <- exp(par[1]); ae0 <- exp(par[2]); be0 <- exp(par[3]); ce0 <- exp(par[4]); beta0 <- par[5:(5+dim(x2)[2]-1)]; beta1 <- tail(par,n = dim(x2)[2]);
    exp.x.beta0 <- exp(x2%*%beta0)
    exp.x.beta1 <- exp(x2%*%beta1)
    exp.x.dif <- exp(x2%*%(beta1-beta0))
    lhaz0= log(alpha*hp.as + hexpw(surv.time*exp.x.beta0,ae0,be0,ce0)*exp.x.beta1)
    lSurv0 = -alpha*(Hp.diff.new) -Hexpw(surv.time*exp.x.beta0,ae0,be0,ce0)*exp.x.dif
    val = sum(- ind.cens*lhaz0 -lSurv0)
    ifelse(is.na(val),return(Inf),return(val))
}
  
# Optimisation
initC <- c(log(3),log(MLEEWG[c(1,2,3)]),MLEEWG[-c(1,2,3)])
OPTC1 = nlminb(initC,log.likEWC,control=list(iter.max=10000))
OPTC2 = optim(initC,log.likEWC,control=list(maxit=10000))
  
if(OPTC1$objective <=  OPTC2$value){
    MLEC <- c(exp(OPTC1$par[1:4]),OPTC1$par[-c(1:4)])
    AICC <- 2*OPTC1$objective + 2*length(MLEC)
}
  
if(OPTC1$objective >  OPTC2$value){
    MLEC <- c(exp(OPTC2$par[1:4]),OPTC2$par[-c(1:4)])
    AICC <- 2*OPTC2$value + 2*length(MLEC)
}

# MLE
kable(MLEC)

x
0.0000015
1.4203225
0.6687226
1.9736752
0.1187587
0.3935385
0.0210009
0.0654359
0.3077478
0.1681506

# Confidence interval
CIC <- Conf.Int(log.likEWC,c(log(MLEC[1:4]),MLEC[-c(1:4)]),level=0.95)
kable(CIC)

	Lower	Upper	Transf MLE	Std. Error
par1	-1939.7829968	1913.0086780	-13.3871594	982.8730796
par2	0.1726343	0.5291336	0.3508839	0.0909454
par3	-0.4881770	-0.3165949	-0.4023860	0.0437717
par4	0.5411638	0.8186310	0.6798974	0.0707837
par5	0.1017150	0.1358024	0.1187587	0.0086959
par6	0.0867579	0.7003192	0.3935385	0.1565236
par7	-0.2873398	0.3293416	0.0210009	0.1573196
par8	0.0624665	0.0684053	0.0654359	0.0015150
par9	0.2535104	0.3619851	0.3077478	0.0276726
par10	0.1138745	0.2224267	0.1681506	0.0276924

Model M3: Frailty Model

log.lik.E = function(par){
    if(any(is.na(par))) return(Inf)
    theta = exp(par[1]); kappa = exp(par[2])/exp(par[1]); ae0 = exp(par[3]); be0 = exp(par[4]); ce0 <- exp(par[5]);
    beta0 <- par[6:(6+dim(x2)[2]-1)]; beta1 <- tail(par,n = dim(x2)[2]);
    exp.x.beta0 <- exp(x2%*%beta0)
    exp.x.beta1 <- exp(x2%*%beta1)
    exp.x.dif <- exp(x2%*%(beta1-beta0))
    haz0= kappa*theta*hp.as/(1+theta*Hp.diff.new) + hexpw(surv.time*exp.x.beta0,ae0,be0,ce0)*exp.x.beta1
    log.Surv0 = -Hexpw(surv.time*exp.x.beta0,ae0,be0,ce0)*exp.x.dif - kappa*log(1+theta*Hp.diff.new)
    val = sum(- ind.cens*log(haz0) - log.Surv0)
    ifelse(is.na(val),return(Inf),return(val))
}
  
# Optimisation step
initE <- c(log(10),log(6.5),log(MLEEWG[c(1,2,3)]),MLEEWG[-c(1,2,3)])
OPTE1 = nlminb(initE,log.lik.E,control=list(iter.max=10000))
OPTE2 = optim(initE,log.lik.E,control=list(maxit=10000))
  
if(OPTE1$objective <=  OPTE2$value){
    MLEE <- c(exp(OPTE1$par[1:5]),OPTE1$par[-c(1:5)])
    AICE <- 2*OPTE1$objective + 2*length(MLEE)
}
  
if(OPTE1$objective >  OPTE2$value){
    MLEE <- c(exp(OPTE2$par[1:5]),OPTE2$par[-c(1:5)])
    AICE <- 2*OPTE2$value + 2*length(MLEE)
}

# MLE
kable(MLEE)

x
9.2479647
5.8119256
1.8308332
0.6255403
2.3633986
0.1042421
0.0805945
0.0920908
0.0490128
0.2013338
0.2620419

# Confidence interval
CIE <- Conf.Int(log.lik.E,c(log(MLEE[1:5]),MLEE[-c(1:5)]),level=0.95)
kable(CIE)

	Lower	Upper	Transf MLE	Std. Error
par1	1.0815227	3.3672843	2.2244035	0.5831132
par2	1.3147741	2.2050497	1.7599119	0.2271153
par3	0.1693553	1.0401870	0.6047711	0.2221550
par4	-0.7166535	-0.2216255	-0.4691395	0.1262850
par5	0.5064020	1.2137993	0.8601007	0.1804618
par6	0.0740148	0.1344694	0.1042421	0.0154224
par7	-0.3952150	0.5564040	0.0805945	0.2427644
par8	-0.3298207	0.5140022	0.0920908	0.2152649
par9	0.0373661	0.0606595	0.0490128	0.0059423
par10	0.0942409	0.3084267	0.2013338	0.0546402
par11	0.1620441	0.3620396	0.2620419	0.0510202

Comparison of the models

# True values of the parameters
beta1 <- c(0.1,0.1,0.1)
beta2 <- c(0.05,0.2,0.25)
ae <- 1.75; be <- 0.6; ce <- 2.5;

# Comparison with the MLEs
MLES <- cbind(c(NA,NA,MLEEWG), c(NA,MLEC), MLEE, c(10,6.5,ae,be,ce,beta1,beta2))
colnames(MLES) <- c("M1", "M2", "M3", "True")
kable(MLES, digits = 2)

M1	M2	M3	True
NA	NA	9.25	10.00
NA	0.00	5.81	6.50
1.27	1.42	1.83	1.75
0.61	0.67	0.63	0.60
2.30	1.97	2.36	2.50
0.12	0.12	0.10	0.10
0.41	0.39	0.08	0.10
0.00	0.02	0.09	0.10
0.06	0.07	0.05	0.05
0.29	0.31	0.20	0.20
0.19	0.17	0.26	0.25

# Comparison using AIC
AICS <- cbind( c("M1", "M2", "M3"), round(c(AICEWG, AICC, AICE),2))
kable(AICS, col.names = c("Model", "AIC"))

Model	AIC
M1	24879.34
M2	24878.27
M3	24866.1

# Comparison of the fitted excess baseline hazards
true.haz <- Vectorize(function(t) hexpw(t,ae,be,ce))
fit1 <- Vectorize(function(t) hexpw(t, MLEEWG[1], MLEEWG[2], MLEEWG[3]))
fit2 <- Vectorize(function(t) hexpw(t, MLEC[2], MLEC[3], MLEC[4]))
fit3 <- Vectorize(function(t) hexpw(t, MLEE[3], MLEE[4], MLEE[5]))

curve(true.haz, 0, 5, lwd = 2, xlab = "Time", ylab ="Baseline Excess Hazard", main = "Fitted Excess Hazards",
      cex.axis = 1.5, cex.lab = 1.5, ylim= c(0.05,0.475), n = 1000)
curve(fit1, 0, 5, lwd = 2, lty = 2, col = "red",add = T, n = 1000)
curve(fit2, 0, 5, lwd = 2, lty = 2, col = "orange",add = T, n = 1000)
curve(fit3, 0, 5, lwd = 2, lty = 2, col = "blue",add = T, n = 1000)
legend(4,0.475, c("True","M1","M2","M3"),
       text.col = c("black","red","orange","blue"), col = c("black", "red", "orange", "blue"), lty = c(1, 2, 2, 2), lwd=2,
       merge = TRUE, bg = "white")

References

Perme, M.P., J. Stare, and J. Estève. 2012. “On Estimation in Relative Survival.” Biometrics 68: 113–20.

Rubio, F.J., B. Rachet, R. Giorgi, C. Maringe, and A. Belot. 2019. “On Models for the Estimation of the Excess Mortality Hazard in Case of Insufficiently Stratified Life Tables.” Biostatistics in press: na–na.

Rubio, F.J., L. Remontet, N.P. Jewell, and A. Belot. 2018. “On a General Structure for Hazard-Based Regression Models: An Application to Population-Based Cancer Research.” Statistical Methods in Medical Research in press: na–na.

Simulation design I: Excess hazard models for insufficiently stratified life tables

F. Javier Rubio and Aurelien Belot

April 14, 2019