Survival analyses commonly use the Cox proportional hazards model. However, a parametric approach can improve precision and also yields a smooth survival function. The drawback of a parametric form is that it constrains you to the assumptions of that model. Here we look at a Weibull proportional hazards model fit in Stan.
You will need to load the following libraries and environment settings.
library(rstan)
library(flexsurv)
library(data.table)
# See https://github.com/maj-biostat/pwexp
library(pwexp)
rstan::rstan_options(auto_write = TRUE)
Sys.setenv(LOCAL_CPPFLAGS = '-march=native')
options(mc.cores = 1)
The Stan code for the Weibull model is shown below.
functions {
  // Weibull log density in the proportional hazards (PH) parameterisation
  real weibull2_lpdf(real y, real alpha, real lambda) {
    real prob;
    real lprob;
    prob = alpha * lambda * y^(alpha - 1) * exp(-lambda * y^alpha);
    lprob = log(prob);
    return lprob;
  }
  // log cdf
  real weibull2_lcdf(real y, real alpha, real lambda) {
    real prob;
    real lprob;
    prob = 1 - exp(-lambda * y^alpha);
    lprob = log(prob);
    return lprob;
  }
  // log survival (complementary cdf)
  real weibull2_lccdf(real y, real alpha, real lambda) {
    real prob;
    real lprob;
    prob = exp(-lambda * y^alpha);
    lprob = log(prob);
    return lprob;
  }
  // draw by inverting the cdf
  real weibull2_rng(real alpha, real lambda) {
    real u;
    real z;
    u = uniform_rng(0, 1);
    z = (-log(u)/lambda)^(1/alpha);
    return z;
  }
  // draw truncated below at lb, guarding against numerical problems
  real weibull2_lb_rng(real alpha, real lambda, real lb) {
    real p = exp(weibull2_lcdf(lb | alpha, lambda));
    real u;
    real z;
    if(is_nan(p)) p = 0.999;
    else if (p >= 1) p = 0.999;
    else if (p <= 0) p = 0;
    u = uniform_rng(p, 1);
    z = (-log(1-u)/lambda)^(1/alpha);
    return z;
  }
}
data {
  int<lower=0> N;          // number of observations
  int<lower=0> K;          // number of columns in the design matrix
  real y[N];               // observed or censored times
  int v[N];                // event indicator (1 = event, 0 = right-censored)
  matrix[N, K] X;          // design matrix
  real<lower=0> alpha_a0;  // gamma prior hyperparameters for the shape
  real<lower=0> alpha_b0;
  real<lower=0> beta_sig;  // sd of the normal prior on the betas
}
parameters {
  real<lower=0> alpha;
  vector[K] beta;
}
transformed parameters {
  vector[N] lp;            // per-observation scale m_i = exp(x_i' beta)
  real shape;
  real scale;
  lp = exp(X*beta);
  shape = alpha;
  scale = exp(beta[1]);
}
model {
  alpha ~ gamma(alpha_a0, alpha_b0);
  beta ~ normal(0, beta_sig);
  for(n in 1:N) {
    if(v[n] == 0) // right-censored
      target += weibull2_lccdf(y[n] | alpha, lp[n]);
    else if(v[n] == 1) // observed event time
      target += weibull2_lpdf(y[n] | alpha, lp[n]);
  }
}
generated quantities {
}
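As an aside, these custom functions can be checked against flexsurv before fitting anything. rstan::expose_stan_functions (used in commented-out form further down) compiles the functions block into the R session. A rough sketch of the idea, assuming the program above is saved as weibullph_002.stan (the file name used when compiling the model later):
# Compile the Stan functions into R and compare the custom log density
# against flexsurv's dweibullPH at a few points; the values should agree.
suppressMessages(rstan::expose_stan_functions("weibullph_002.stan"))
yy <- c(0.5, 1, 2)
sapply(yy, function(y) weibull2_lpdf(y, 1.5, 0.1))
dweibullPH(yy, shape = 1.5, scale = 0.1, log = TRUE)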
With a Bayesian analysis we are looking to find the joint posterior of the parameters conditional on the model, priors and data. Generically, this is stated as
\[ \begin{aligned} p(\theta|X) \propto p(X|\theta) p(\theta) \end{aligned} \]
which in words just says that the posterior of the parameters given the data, \(p(\theta|X)\), is proportional to the product of the likelihood \(p(X|\theta)\) and the prior \(p(\theta)\).
Now, the Weibull distribution in the PH form has density
\[ \begin{aligned} f(y) = \alpha m y^{\alpha-1} \exp(- m y^\alpha) \end{aligned} \]
where the shape parameter is \(\alpha\) and the scale parameter is \(m\). The cdf, survival, cumulative hazard and hazard are
\[ \begin{aligned} F(y) &= 1 - \exp(-m y^\alpha) \\ S(y) &= \exp(-m y^\alpha) \\ H(y) &= m y^\alpha \\ h(y) &= \alpha m y^{\alpha-1} \\ \end{aligned} \]
and the proportional hazards model assumes
\[ \begin{aligned} h_0(y) &= \alpha m_0 y^{\alpha - 1} \\ &= \alpha \exp(\beta_0) y^{\alpha - 1} \\ \\ h_1(y) &= \alpha m_1 y^{\alpha - 1} \\ &= \alpha \exp(\beta_0 + \beta_1) y^{\alpha - 1} \\ &= \alpha \exp(\beta_0)\exp(\beta_1) y^{\alpha - 1} \\ &= h_0(y) \exp(\beta_1) \end{aligned} \]
where the log scale parameter \(\log(m) = X\beta\) has been used to introduce covariates in the form of a linear predictor. We are generally interested in the parameter estimates \(\beta\).
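As a quick check, these quantities are linked by the standard identities \(h(y) = f(y)/S(y)\) and \(H(y) = -\log S(y)\), which recover the expressions above:
\[ \begin{aligned} h(y) &= \frac{f(y)}{S(y)} = \frac{\alpha m y^{\alpha-1} \exp(-m y^\alpha)}{\exp(-m y^\alpha)} = \alpha m y^{\alpha-1} \\ H(y) &= -\log S(y) = m y^\alpha \end{aligned} \]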
All of the above functions are implemented in the functions block of the Stan code and evaluated at each iteration of the model block. The log posterior is incremented via the log density weibull2_lpdf
for observed events and the log survival weibull2_lccdf
for right-censored values. The shape \(\alpha\) is given a gamma prior with user-nominated hyperparameters and the \(\beta\)'s are given normal priors, again with a user-nominated hyperparameter for the scale.
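As an illustration (not part of the original analysis), the shape prior implied by the hyperparameters used below, alpha_a0 = 1 and alpha_b0 = 2, can be visualised directly. Stan's gamma distribution uses the shape/rate parameterisation, which matches R's dgamma:
# Gamma(1, 2) prior on the Weibull shape alpha (shape/rate form, mean 0.5)
curve(dgamma(x, shape = 1, rate = 2), from = 0, to = 5,
      ylab = "Prior density", xlab = expression(alpha))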
For the Weibull distribution, when the shape is less than, equal to, or greater than one, the hazard is decreasing, constant, or increasing respectively. For example, the code below simulates survival times under an increasing hazard.
set.seed(2)
n <- 1000
trt <- rbinom(n, size = 1, prob = 0.5)
logb0 <- -3
logb1 <- 1
y <- sapply(1:n, function(i){
  if(trt[i] == 0){
    # shape controls whether hazard is decreasing, constant or increasing
    # over time. scale is used to introduce a linear predictor.
    rweibullPH(1, shape = exp(0.5), scale = exp(logb0))
  } else {
    rweibullPH(1, shape = exp(0.5), scale = exp(logb0 + logb1))
  }
})
evt <- rep(1, n)
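To confirm that these parameters imply an increasing hazard, the hazard \(h(y) = \alpha m y^{\alpha-1}\) from above can be plotted directly for both groups (an illustrative aside; yy is just a grid of time points):
# With alpha = exp(0.5) > 1 the hazard increases with time in both groups
yy <- seq(0.01, 5, len = 200)
a_true <- exp(0.5)
h0 <- a_true * exp(logb0) * yy^(a_true - 1)
h1 <- a_true * exp(logb0 + logb1) * yy^(a_true - 1)
plot(yy, h1, type = "l", col = 2, ylab = "Hazard", xlab = "Time")
lines(yy, h0)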
A Weibull PH model can be fit with flexsurv,
from which we see that the parameter estimates are close to those used to simulate the data.
f1 <- flexsurvreg(Surv(y, evt)~trt, dist = "weibullPH")
print(f1)
## Call:
## flexsurvreg(formula = Surv(y, evt) ~ trt, dist = "weibullPH")
##
## Estimates:
## data mean est L95% U95% se exp(est) L95%
## shape NA 1.59638 1.52037 1.67618 0.03973 NA NA
## scale NA 0.05587 0.04649 0.06713 0.00524 NA NA
## trt 0.49700 0.95373 0.82084 1.08661 0.06780 2.59536 2.27242
## U95%
## shape NA
## scale NA
## trt 2.96420
##
## N = 1000, Events: 1000, Censored: 0
## Total time at risk: 4244.379
## Log-likelihood = -2261.398, df = 3
## AIC = 4528.796
f1$coefficients
## shape scale trt
## 0.467738 -2.884767 0.953726
mod1 <- rstan::stan_model("weibullph_002.stan", verbose = FALSE)
# suppressMessages(rstan::expose_stan_functions(mod1))
ld <- list(N = n,
           # number of columns in the design matrix (intercept + treatment)
           K = 2,
           # design matrix
           X = cbind(rep(1, n), trt),
           y = y,
           v = evt,
           alpha_a0 = 1,
           alpha_b0 = 2,
           beta_sig = 5)
f2 <- rstan::sampling(object = mod1,
                      data = ld,
                      chains = 1,
                      thin = 3,
                      iter = 10000,
                      warmup = 2000,
                      refresh = 0)
The parameter estimates are similar to those from flexsurv
(exponentiate the shape coefficient from the flexsurv
output to align it with the Stan results).
print(f2, pars = c("alpha", "beta"), digits = 3, probs = c(0.025, 0.5, 0.975))
## Inference for Stan model: weibullph_002.
## 1 chains, each with iter=10000; warmup=2000; thin=3;
## post-warmup draws per chain=2667, total post-warmup draws=2667.
##
## mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
## alpha 1.594 0.001 0.040 1.518 1.593 1.671 2023 1.001
## beta[1] -2.878 0.002 0.094 -3.066 -2.879 -2.695 1940 1.001
## beta[2] 0.949 0.001 0.068 0.813 0.951 1.082 2279 1.000
##
## Samples were drawn using NUTS(diag_e) at Wed Feb 17 12:37:11 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
# flexsurv results for comparison
f1$coefficients
## shape scale trt
## 0.467738 -2.884767 0.953726
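To make the comparison explicit:
# flexsurv reports the Weibull shape on the log scale; exponentiating it
# recovers the natural-scale shape (1.59638 in the flexsurv output above),
# which lines up with the posterior mean of alpha from Stan.
exp(f1$coefficients["shape"])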
The posterior predictive distribution is the distribution of unobserved values conditional on those observed. It can be used to sanity check the model fit. In a nutshell, we take draws from the posterior and use those draws to simulate new data. This can usually be done within the generated quantities
block of the Stan model, but below I do it in R to give a better view of the process.
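For reference, a minimal sketch of what this could look like in the (currently empty) generated quantities block, reusing the weibull2_rng function defined earlier:
generated quantities {
  // one replicated observation per data point, per posterior draw
  real y_rep[N];
  for(n in 1:N)
    y_rep[n] = weibull2_rng(alpha, lp[n]);
}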
post_predictive <- function(ff, nsim = 20){
  # note: uses the sample size n from the global environment
  post <- rstan::extract(ff)
  idx_smpl <- sample(seq_along(post$alpha), replace = FALSE, size = nsim)
  m0 <- matrix(NA, nrow = n, ncol = nsim)
  m1 <- matrix(NA, nrow = n, ncol = nsim)
  for(i in 1:nsim){
    m0[, i] <- rweibullPH(n,
                          shape = post$shape[idx_smpl[i]],
                          scale = exp(post$beta[idx_smpl[i], 1]))
    m1[, i] <- rweibullPH(n,
                          shape = post$shape[idx_smpl[i]],
                          scale = exp(post$beta[idx_smpl[i], 1] +
                                        post$beta[idx_smpl[i], 2]))
  }
  return(list(m0 = m0, m1 = m1))
}
nsim <- 20
pp <- post_predictive(f2, nsim)
Now visualise the posterior predictive relative to the original data. Note that there are fancier ways of doing this, e.g. bayesplot (https://mc-stan.org/bayesplot/).
xlim <- range(as.numeric(c(pp$m0, pp$m1, y)))
par(mfrow = c(1, 2))
hist(y[trt == 0], prob = TRUE, main = "Control group", xlim = xlim)
for(i in 1:nsim){
  lines(density(pp$m0[, i]), col = 2, lwd = 0.3)
}
hist(y[trt == 1], prob = TRUE, main = "Treatment group", xlim = xlim)
for(i in 1:nsim){
  lines(density(pp$m1[, i]), col = 2, lwd = 0.3)
}
par(mfrow = c(1, 1))
What you are looking for is that the data arising from the posterior predictive distribution are consistent with the original data. If they are not, then you may need to go back to the drawing board.
The Weibull distribution only supports hazards that are constant or monotonically increasing or decreasing. As such, the Weibull model would not be suitable for data associated with the following hazard function.
tt <- seq(1e-06, 75, len = 250)
bins <- c(0, 10, 20, 30, 40, 50)
rate <- c(1/90, 1/40, 1/10, 1/20, 1/40, 1/100)
hh <- pwexp::hpw(tt, bins, rate)
plot(tt, hh, type = "l", ylim = c(0, max(hh)), ylab = "Hazard", xlab = "Time")
Generate survival times from this hazard
set.seed(2)
n <- 1000
trt <- rbinom(n, size = 1, prob = 0.5)
y <- sapply(1:n, function(i){
  if(trt[i] == 0){
    pwexp::rpw(1, bins, rate)
  } else {
    # treatment scales the piecewise baseline hazard by a constant 1.2
    pwexp::rpw(1, bins, 1.2 * rate)
  }
})
evt <- rep(1, n)
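Note that multiplying the rates by 1.2 is itself a proportional hazards effect, with a true log hazard ratio of \(\log(1.2) \approx 0.18\). The two hazards can be overlaid for reference (an illustrative aside, reusing tt from above):
# Treatment hazard is a constant 1.2-fold shift of the control hazard
hh0 <- pwexp::hpw(tt, bins, rate)
hh1 <- pwexp::hpw(tt, bins, 1.2 * rate)
plot(tt, hh1, type = "l", col = 2, ylim = c(0, max(hh1)),
     ylab = "Hazard", xlab = "Time")
lines(tt, hh0)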
Fit the Weibull model
ld <- list(N = n,
           # number of columns in the design matrix (intercept + treatment)
           K = 2,
           # design matrix
           X = cbind(rep(1, n), trt),
           y = y,
           v = evt,
           alpha_a0 = 1,
           alpha_b0 = 2,
           beta_sig = 5)
f3 <- rstan::sampling(object = mod1,
                      data = ld,
                      chains = 1,
                      thin = 3,
                      iter = 10000,
                      warmup = 2000,
                      refresh = 0)
print(f3, pars = c("alpha", "beta"), digits = 3, probs = c(0.025, 0.5, 0.975))
## Inference for Stan model: weibullph_002.
## 1 chains, each with iter=10000; warmup=2000; thin=3;
## post-warmup draws per chain=2667, total post-warmup draws=2667.
##
## mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
## alpha 1.006 0.000 0.021 0.964 1.006 1.046 2151 1
## beta[1] -3.656 0.002 0.101 -3.849 -3.656 -3.456 2155 1
## beta[2] 0.265 0.001 0.064 0.138 0.265 0.389 2286 1
##
## Samples were drawn using NUTS(diag_e) at Wed Feb 17 12:37:41 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Explore the posterior predictive. Clearly, there is a lack of consistency between the observed data and the draws from the posterior predictive distribution.
nsim <- 20
pp <- post_predictive(f3, nsim)
xlim <- range(as.numeric(c(pp$m0, pp$m1, y)))
par(mfrow = c(1, 2))
hist(y[trt == 0], prob = TRUE, main = "Control group", xlim = xlim)
for(i in 1:nsim){
  lines(density(pp$m0[, i]), col = 2, lwd = 0.3)
}
hist(y[trt == 1], prob = TRUE, main = "Treatment group", xlim = xlim)
for(i in 1:nsim){
  lines(density(pp$m1[, i]), col = 2, lwd = 0.3)
}
par(mfrow = c(1, 1))