Introduction

Survival analyses commonly use the Cox proportional hazards model. However, a parametric approach can improve precision and also yields a smooth survival function. The trade-off is that a parametric form constrains you to the assumptions of that model. Here we look at a Weibull proportional hazards model fit in Stan.

Stan and Survival

You will need to load the following libraries and environment settings.

library(rstan)
library(flexsurv)
library(data.table)
# See https://github.com/maj-biostat/pwexp
library(pwexp)
rstan::rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
options(mc.cores = 1)

The Stan code for a Weibull PH model is shown below.

functions {
  // log density: f(y) = alpha * lambda * y^(alpha-1) * exp(-lambda * y^alpha)
  real weibull2_lpdf(real y, real alpha, real lambda) {
    real prob;
    real lprob;
    prob = alpha * lambda * y^(alpha - 1) * exp(- lambda*y^alpha);
    lprob = log(prob);
    return lprob;
  }
  // log cdf: F(y) = 1 - exp(-lambda * y^alpha)
  real weibull2_lcdf(real y, real alpha, real lambda) {
    real prob;
    real lprob;
    prob = 1-exp(-lambda * y^alpha);
    lprob = log(prob);
    return lprob;
  }
  // log survival: S(y) = exp(-lambda * y^alpha)
  real weibull2_lccdf(real y, real alpha, real lambda) {
    real prob;
    real lprob;
    prob = exp(-lambda * y^alpha);
    lprob = log(prob);
    return lprob;
  }
  // random draw via inversion of the cdf
  real weibull2_rng(real alpha, real lambda) {
    real u;
    real z;
    u = uniform_rng(0, 1);
    z = (-log(u)/lambda)^(1/alpha);
    return z;
  }
  // random draw truncated below at lb
  real weibull2_lb_rng(real alpha, real lambda, real lb) {
    real p = exp(weibull2_lcdf(lb | alpha, lambda));
    real u;
    real z;
    // guard against numerical problems in the cdf evaluation
    if(is_nan(p)) p = 0.999;
    else if (p >= 1) p = 0.999;
    else if (p <= 0) p = 0;
    u = uniform_rng(p, 1);
    z = (-log(1-u)/lambda)^(1/alpha);
    return z;
  }
}
data {
  int<lower=0> N;            // number of observations
  int<lower=0> K;            // number of columns in the design matrix
  real y[N];                 // event or censoring times
  int v[N];                  // event indicator (1 = event, 0 = right-censored)
  matrix[N, K] X;            // design matrix
  real<lower=0> alpha_a0;    // shape of the gamma prior on alpha
  real<lower=0> alpha_b0;    // rate of the gamma prior on alpha
  real<lower=0> beta_sig;    // sd of the normal prior on beta
}
parameters {
  real<lower=0> alpha;
  vector[K] beta;
}
transformed parameters {
  vector[N] lp;
  real shape;
  real scale;
  lp = exp(X*beta);        // per-observation scale, m = exp(X * beta)
  shape = alpha;           // reported to ease comparison with flexsurv
  scale = exp(beta[1]);    // baseline scale
}
model {
  alpha ~ gamma(alpha_a0, alpha_b0);
  beta ~ normal(0, beta_sig);
  for(n in 1:N) {
    if(v[n] == 0) // right-censored
      target += weibull2_lccdf(y[n] | alpha, lp[n]);
    else if(v[n] == 1) // observed event time
      target += weibull2_lpdf(y[n] | alpha, lp[n]);
  }
}
generated quantities {
  // left empty; the posterior predictive checks are done in R below
}

With a Bayesian analysis we are looking to find the joint posterior of the parameters conditional on the model, priors and data. Generically, this is stated as

\[ \begin{aligned} p(\theta|X) \propto p(X|\theta) p(\theta) \end{aligned} \]

which in words just says that the posterior of the parameters given the data \(p(\theta|X)\) is proportional to the product of the likelihood \(p(X|\theta)\) and the prior \(p(\theta)\).

Now, the Weibull distribution in the PH form has density

\[ \begin{aligned} f(y) = \alpha m y^{\alpha-1} \exp(- m y^\alpha) \end{aligned} \]

where the shape parameter is \(\alpha\) and the scale parameter is \(m\). The cdf, survival, cumulative hazard and hazard are

\[ \begin{aligned} F(y) &= 1 - \exp(-m y^\alpha) \\ S(y) &= \exp(-m y^\alpha) \\ H(y) &= m y^\alpha \\ h(y) &= \alpha m y^{\alpha-1} \\ \end{aligned} \]

and the proportional hazards model assumes

\[ \begin{aligned} h_0(y) &= \alpha m_0 y^{\alpha - 1} \\ &= \alpha \exp(\beta_0) y^{\alpha - 1} \\ \\ h_1(y) &= \alpha m_1 y^{\alpha - 1} \\ &= \alpha \exp(\beta_0 + \beta_1) y^{\alpha - 1} \\ &= \alpha \exp(\beta_0)\exp(\beta_1) y^{\alpha - 1} \\ &= h_0(y) \exp(\beta_1) \end{aligned} \]

where the log scale parameter \(\log(m) = X\beta\) has been used to introduce covariates in the form of a linear predictor. We are generally interested in the parameter estimates \(\beta\).
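
As a quick numerical check of the PH property (the shape and coefficient values below are hypothetical, chosen only for illustration), the hazard ratio \(h_1(y)/h_0(y)\) is constant in \(y\) and equals \(\exp(\beta_1)\):

a <- 1.5                      # shape (hypothetical)
b0 <- -3; b1 <- 1             # hypothetical coefficients
yy <- seq(0.1, 10, len = 50)
h0 <- a * exp(b0) * yy^(a - 1)
h1 <- a * exp(b0 + b1) * yy^(a - 1)
range(h1 / h0)                # constant over time
exp(b1)                       # and equal to exp(beta1)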

All of the above functions are implemented in the functions block of the Stan code and used at each iteration of the model block. The log posterior is updated via the log density weibull2_lpdf for events and the log survival weibull2_lccdf for right-censored values. The shape \(\alpha\) is given a gamma prior with user-nominated hyperparameters and the \(\beta\)’s are given normal priors, again with a user-nominated scale.
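
As an aside, the likelihood contributions can be sketched directly in R (a minimal sketch, not part of the fitted model; loglik_weibull_ph and its arguments are hypothetical names): events contribute the log density and right-censored observations contribute the log survival, mirroring the Stan model block.

loglik_weibull_ph <- function(y, v, X, alpha, beta) {
  # per-observation scale, m = exp(X * beta), as in the transformed parameters
  m <- as.vector(exp(X %*% beta))
  # log f(y) = log(alpha) + log(m) + (alpha - 1) * log(y) - m * y^alpha
  ll_evt <- log(alpha) + log(m) + (alpha - 1) * log(y) - m * y^alpha
  # log S(y) = -m * y^alpha
  ll_cns <- -m * y^alpha
  sum(ifelse(v == 1, ll_evt, ll_cns))
}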

Data generation

For the Weibull distribution, when the shape is less than, equal to or greater than one, the hazard will be decreasing, constant or increasing respectively (see the sketch after the next chunk). For example, the code below simulates survival times with an increasing hazard.

set.seed(2)
n <- 1000
trt <- rbinom(n, size = 1, prob = 0.5)
logb0 <- -3
logb1 <- 1
y <- sapply(1:n, function(i){
  if(trt[i] == 0){
    # shape controls whether hazard is decreasing, constant or increasing
    # over time. scale is used to introduce a linear predictor. 
    rweibullPH(1, shape = exp(0.5), scale = exp(logb0))  
  } else {
    rweibullPH(1, shape = exp(0.5), scale = exp(logb0 + logb1))  
  }
})
evt <- rep(1, n)
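
To see the effect of the shape on the hazard, the sketch below plots \(h(y) = \alpha m y^{\alpha - 1}\) for shapes below, at and above one (the scale \(m = \exp(-3)\) and the shapes 0.7, 1 and 1.6 are arbitrary illustrative values).

yy <- seq(0.01, 5, len = 200)
m <- exp(-3)
# decreasing (shape < 1), constant (shape = 1) and increasing (shape > 1)
plot(yy, 0.7 * m * yy^(0.7 - 1), type = "l", ylim = c(0, 0.25),
     ylab = "Hazard", xlab = "Time")
lines(yy, 1.0 * m * yy^(1.0 - 1), lty = 2)
lines(yy, 1.6 * m * yy^(1.6 - 1), lty = 3)
legend("topright", lty = 1:3, legend = c("shape < 1", "shape = 1", "shape > 1"))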

A Weibull PH model can be fit with flexsurv, from which we see that the parameter estimates are close to the values used to simulate the data.

f1 <- flexsurvreg(Surv(y, evt)~trt, dist = "weibullPH")
print(f1)
## Call:
## flexsurvreg(formula = Surv(y, evt) ~ trt, dist = "weibullPH")
## 
## Estimates: 
##        data mean  est      L95%     U95%     se       exp(est)  L95%   
## shape       NA    1.59638  1.52037  1.67618  0.03973       NA        NA
## scale       NA    0.05587  0.04649  0.06713  0.00524       NA        NA
## trt    0.49700    0.95373  0.82084  1.08661  0.06780  2.59536   2.27242
##        U95%   
## shape       NA
## scale       NA
## trt    2.96420
## 
## N = 1000,  Events: 1000,  Censored: 0
## Total time at risk: 4244.379
## Log-likelihood = -2261.398, df = 3
## AIC = 4528.796
f1$coefficients
##     shape     scale       trt 
##  0.467738 -2.884767  0.953726

Modelling

mod1 <- rstan::stan_model("weibullph_002.stan", verbose = F)
# suppressMessages(rstan::expose_stan_functions(mod1))
ld <- list(N = n,
           # number of columns in the design matrix
           K = 2, 
           # design matrix
           X = cbind(rep(1, n), trt),
           y = y,
           v = evt,
           alpha_a0 = 1,
           alpha_b0 = 2,
           beta_sig = 5)

f2 <- rstan::sampling(object  = mod1,
                      data    = ld,
                      chains  = 1,
                      thin    = 3,
                      iter    = 10000,
                      warmup  = 2000,
                      refresh = 0)

The parameter estimates are similar to those from flexsurv (exponentiate the shape coefficient from the flexsurv output to align it with the Stan results).

print(f2, pars = c("alpha", "beta"), digits = 3, probs = c(0.025, 0.5, 0.975))
## Inference for Stan model: weibullph_002.
## 1 chains, each with iter=10000; warmup=2000; thin=3; 
## post-warmup draws per chain=2667, total post-warmup draws=2667.
## 
##           mean se_mean    sd   2.5%    50%  97.5% n_eff  Rhat
## alpha    1.594   0.001 0.040  1.518  1.593  1.671  2023 1.001
## beta[1] -2.878   0.002 0.094 -3.066 -2.879 -2.695  1940 1.001
## beta[2]  0.949   0.001 0.068  0.813  0.951  1.082  2279 1.000
## 
## Samples were drawn using NUTS(diag_e) at Wed Feb 17 12:37:11 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
# flexsurv results for comparison
f1$coefficients
##     shape     scale       trt 
##  0.467738 -2.884767  0.953726
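
To make the shape comparison explicit, exponentiate the flexsurv log-shape coefficient:

exp(f1$coefficients["shape"])  # approx 1.596, close to the posterior mean 1.594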

The posterior predictive distribution is the distribution of unobserved values conditional on those observed. It can be used to sanity check the model fit. In a nutshell, we take draws from the posterior and use those draws to simulate new data. This can usually be done within the generated quantities block of the Stan model, but below I do it in R to give a better view of the process.

post_predictive <- function(ff, nsim = 20){
  post <- rstan::extract(ff)
  # sample (without replacement) which posterior draws to simulate from;
  # alpha extracts as a vector so index by its length
  idx_smpl <- sample(seq_along(post$alpha), replace = FALSE, size = nsim)
  m0 <- matrix(NA, nrow = n, ncol = nsim)
  m1 <- matrix(NA, nrow = n, ncol = nsim)
  for(i in 1:nsim){
    m0[, i] <- rweibullPH(n, 
                         shape = post$shape[idx_smpl[i]], 
                         scale = exp(post$beta[idx_smpl[i], 1]))
    
    m1[, i] <- rweibullPH(n, 
                         shape = post$shape[idx_smpl[i]], 
                         scale = exp(post$beta[idx_smpl[i], 1] +
                                       post$beta[idx_smpl[i], 2]))
  }
  return(list(m0 = m0, m1 = m1))
}
nsim <- 20
pp <- post_predictive(f2, nsim)

Now visualise the posterior predictive relative to the original data. Note that there are fancier ways of doing this, e.g. with bayesplot (https://mc-stan.org/bayesplot/).

xlim <- range(as.numeric(c(pp$m0, pp$m1, y)))
par(mfrow = c(1, 2))
hist(y[trt == 0], prob = TRUE, main = "Control group", xlim = xlim)
for(i in 1:nsim){
  lines(density(pp$m0[, i]), col = 2, lwd = 0.3)
}
hist(y[trt == 1], prob = TRUE, main = "Treatment group", xlim = xlim)
for(i in 1:nsim){
  lines(density(pp$m1[, i]), col = 2, lwd = 0.3)
}

par(mfrow = c(1, 1))

What you are looking for is that the data arising from the posterior predictive are consistent with the original data. If they are not, then you may need to go back to the drawing board.

Misspecification

The Weibull distribution only supports constant or monotonically increasing or decreasing hazards. As such, the Weibull model would not be suitable for data associated with the following hazard function.

tt <- seq(1e-06, 75, len = 250)
bins <- c(0, 10, 20, 30, 40, 50)
rate <- c(1/90, 1/40, 1/10, 1/20, 1/40, 1/100)
hh <- pwexp::hpw(tt, bins, rate)
plot(tt, hh, type = "l", ylim = c(0, max(hh)), ylab = "Hazard", xlab = "Time")

Generate survival times from this hazard

set.seed(2)
n <- 1000
trt <- rbinom(n, size = 1, prob = 0.5)
y <- sapply(1:n, function(i){
  if(trt[i] == 0){
    # piecewise exponential survival times; the rate vector defines the
    # hazard within each bin
    pwexp::rpw(1, bins, rate)  
  } else {
    # treatment multiplies the hazard by a constant factor of 1.2
    pwexp::rpw(1, bins, 1.2 * rate)  
  }
})
evt <- rep(1, n)

Fit the weibull model

ld <- list(N = n,
           # number of columns in the design matrix
           K = 2, 
           # design matrix
           X = cbind(rep(1, n), trt),
           y = y,
           v = evt,
           alpha_a0 = 1,
           alpha_b0 = 2,
           beta_sig = 5)

f3 <- rstan::sampling(object  = mod1,
                      data    = ld,
                      chains  = 1,
                      thin    = 3,
                      iter    = 10000,
                      warmup  = 2000,
                      refresh = 0)

print(f3, pars = c("alpha", "beta"), digits = 3, probs = c(0.025, 0.5, 0.975))
## Inference for Stan model: weibullph_002.
## 1 chains, each with iter=10000; warmup=2000; thin=3; 
## post-warmup draws per chain=2667, total post-warmup draws=2667.
## 
##           mean se_mean    sd   2.5%    50%  97.5% n_eff Rhat
## alpha    1.006   0.000 0.021  0.964  1.006  1.046  2151    1
## beta[1] -3.656   0.002 0.101 -3.849 -3.656 -3.456  2155    1
## beta[2]  0.265   0.001 0.064  0.138  0.265  0.389  2286    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Feb 17 12:37:41 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Explore the posterior predictive. Clearly, there is a lack of consistency between the observed data and what arose from the posterior predictive.

nsim <- 20
pp <- post_predictive(f3, nsim)
xlim <- range(as.numeric(c(pp$m0, pp$m1, y)))
par(mfrow = c(1, 2))
hist(y[trt == 0], prob = TRUE, main = "Control group", xlim = xlim)
for(i in 1:nsim){
  lines(density(pp$m0[, i]), col = 2, lwd = 0.3)
}
hist(y[trt == 1], prob = TRUE, main = "Treatment group", xlim = xlim)
for(i in 1:nsim){
  lines(density(pp$m1[, i]), col = 2, lwd = 0.3)
}

par(mfrow = c(1, 1))
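
Another way to see the misfit (a sketch; survfit comes from the survival package, which flexsurv loads) is to overlay the Weibull survival function \(S(y) = \exp(-m y^\alpha)\), evaluated at the posterior means, on the Kaplan-Meier estimates. Under misspecification you would expect the fitted curves to depart visibly from the empirical ones.

post <- rstan::extract(f3)
a_hat <- mean(post$alpha)
m0_hat <- exp(mean(post$beta[, 1]))            # baseline scale
m1_hat <- m0_hat * exp(mean(post$beta[, 2]))   # treated scale
km <- survival::survfit(survival::Surv(y, evt) ~ trt)
plot(km, col = c(1, 2), xlab = "Time", ylab = "Survival")
yy <- seq(0, max(y), len = 200)
lines(yy, exp(-m0_hat * yy^a_hat), col = 1, lty = 2)  # fitted S(y), control
lines(yy, exp(-m1_hat * yy^a_hat), col = 2, lty = 2)  # fitted S(y), treatment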