Survival Analysis with Recurrent Events Clustered in Patients Using HMC

The aim of this study is to investigate the effects of several factors on hospital readmission among patients diagnosed with colorectal cancer (J.R. González, 2005). The main outcome is the time to hospital readmission related to colorectal cancer after the surgical procedure, treated as a potentially recurrent event (colorectal cancer patients may have several readmissions after discharge). The first ten observations of the data set are given as follows:

load("DataColonDukesABvsC.rda")
data<-DataColonDukesABvsC[,c(3,5,7:10)]
data$event<-1-data$event
names<-c(1,4:6)
data[,names]<-lapply(data[,names],factor)
attach(data)
data<-data[order(id),]
detach(data)
head(data, n=10)
#>    id time event chemoter dukes distance
#> 1   2  489     0        1     2        1
#> 2   2  693     1        1     2        1
#> 3   3   15     0        1     2        1
#> 4   3  768     1        1     2        1
#> 5   4  163     0        2     1        1
#> 6   4  125     0        2     1        1
#> 7   4  350     0        2     1        1
#> 8   4   48     0        2     1        1
#> 9   4 1362     1        2     1        1
#> 10  5 1134     0        1     2        1
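Before fitting, it can be useful to check the size of the reshaped data; a small sketch using the variables created above:

# Number of distinct patients contributing readmission records
length(unique(data$id))
# Distribution of the censoring indicator passed to cens() below
table(data$event)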

Fitting models with brms

Suppose we want to predict the recurrence time, which is possibly censored. A log-normal mixed-effects model can be used, in which the intercept varies between patients. We may then use the following code:

library(brms)
fit1 <- brm(formula = time | cens(event) ~ chemoter + dukes + distance + (1 | id),
            data = data, family = lognormal(),
            prior = c(set_prior("normal(0, 0.5)", class = "b"),
                      set_prior("cauchy(0, 2)", class = "sd")),
            warmup = 1500, iter = 3000, chains = 4, thin = 3,
            control = list(adapt_delta = 0.99))
  • formula: on the left-hand side of ~, cens() is the internal function that handles censored data, and event is the variable containing the censoring information. On the right-hand side, the predictors chemoter, dukes and distance enter as population-level effects, and the single group-level term (1|id) lets the intercept vary between patients.
  • family: for the distribution of the response variable, lognormal() can be used (among others) for survival regression; alternatives include Gamma, exponential, and weibull.
  • prior: we put the same prior, normal(0, 0.5), on all population-level effects at once. The standard deviation of the group-level intercepts (class sd) is restricted to be non-negative; by default brms assigns it a half Student-t prior with 3 degrees of freedom and a scale of at least 10, which we replace here with a half-Cauchy prior with scale 2. The residual standard deviation sigma of the log-normal model is left at its brms default prior. (The get_prior() sketch after this list shows how to inspect all default priors.)
  • control: to reduce the number of divergent transitions, which would bias the obtained posterior samples, we increase adapt_delta from its default of 0.8 to 0.99.
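To see which parameter classes the priors apply to, and what brms would use by default, the prior structure of the model can be inspected before fitting; a minimal sketch using the same formula, data and family as fit1:

# Inspect the parameter classes and the default priors for this model
get_prior(time | cens(event) ~ chemoter + dukes + distance + (1 | id),
          data = data, family = lognormal())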

Analyzing results and diagnostics

The model fit1 is fitted using 4 chains, each with 3000 iterations of which the first 1500 are warmup to calibrate the sampler. With a thinning rate of 3, this leaves (3000 − 1500)/3 = 500 draws per chain, i.e. 4 × 500 = 2000 post-warmup samples in total. A summary of the fitted model is available using:

summary(fit1)
#>  Family: lognormal 
#>   Links: mu = identity; sigma = identity 
#> Formula: time | cens(event) ~ chemoter + dukes + distance + (1 | id) 
#>    Data: data (Number of observations: 654) 
#> Samples: 4 chains, each with iter = 3000; warmup = 1500; thin = 3;
#>          total post-warmup samples = 2000
#> 
#> Group-Level Effects: 
#> ~id (Number of levels: 327) 
#>               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept)     1.63      0.17     1.31     1.99 1.00     1420     1688
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept     7.39      0.33     6.74     8.04 1.00     1932     1873
#> chemoter2     0.21      0.34    -0.43     0.91 1.00     1871     1696
#> dukes2       -0.61      0.33    -1.26     0.02 1.00     1917     1975
#> distance2    -0.07      0.48    -1.05     0.86 1.00     1807     1933
#> 
#> Family Specific Parameters: 
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     2.20      0.11     2.00     2.43 1.00     1931     1723
#> 
#> Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
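The convergence diagnostics in this table can also be extracted programmatically; a small sketch using the rhat() method that brms provides for fitted models:

# Rhat for every parameter; values above 1.01 would signal convergence problems
rh <- rhat(fit1)
any(rh > 1.01)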

From the summary output, all of the Rhat values are equal to 1, indicating convergence of the 4 chains. (If Rhat were considerably greater than 1, the chains would not yet have converged and it would be necessary to run more iterations and/or set stronger priors.) The Bulk_ESS and Tail_ESS values (effective sample sizes for the bulk of the posterior and for the 5% and 95% quantiles, respectively) are all well above 1000 out of the 2000 retained draws, so the effective sample size can be considered large enough.

  • To visually investigate the posterior distributions, pairs plots show univariate histograms and bivariate scatter plots for selected parameters. In particular, a positive correlation between the coefficients of dukes and chemoter can be identified in these plots.

pairs(fit1,off_diag_args = list(size = 1/3, alpha = 1/3))

Q: how should we interpret the negative relationship between the parameters corresponding to the intercept and chemoter2?
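One way to make that relationship concrete is to compute the posterior correlation between the two coefficients directly from the draws; a minimal sketch (column names follow brms' b_ naming convention):

# Correlation of the posterior draws of the intercept and the chemoter2 effect
draws <- as.data.frame(fit1)
cor(draws$b_Intercept, draws$b_chemoter2)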

  • From the following trace rank ("trank") plots, we can see that the rank histograms of the 4 chains are similar to one another, largely overlapping and staying within the same range, which indicates good mixing:

library(ggplot2)
library(magrittr)
library(bayesplot)

post <- posterior_samples(fit1, add_chain = TRUE)  # draws with a chain column

post %>%
  mcmc_rank_overlay(pars = vars(b_Intercept, b_dukes2, b_chemoter2, b_distance2, sigma)) +
  coord_cartesian(ylim = c(10, NA))

  • Trace plots: these should look like stationary random noise, the classic "fuzzy caterpillar". In this example the chains mix well with one another, do not get stuck around particular values, and do not drift apart from each other:
mcmc_trace(post[, 1:6], facet_args = list(ncol = 2), size = 0.15)

  • Autocorrelation plots: the autocorrelations drop off rapidly with increasing lag (the effective-sample-size ratios sketched after this plot tell the same story):
mcmc_acf(post, vars(b_Intercept, b_dukes2, b_chemoter2, b_distance2, sigma), lags = 5)
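Low autocorrelation should translate into high effective-sample-size ratios. A minimal sketch, assuming the neff_ratio() method that brms provides for brmsfit objects and bayesplot's mcmc_neff():

# Ratio of effective sample size to total post-warmup draws per parameter;
# ratios close to 1 indicate essentially uncorrelated draws
ratios <- neff_ratio(fit1)
mcmc_neff(ratios)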

Hypothesis testing and model comparison

Looking at the population-level effects, the parameter estimates for chemoter, dukes and distance are suspiciously small. To test whether they are equal to 0, we apply the hypothesis() method:

h <- c("chemoter2 = 0", "dukes2 = 0", "distance2 = 0")
hyps <- hypothesis(fit1, h, alpha = 0.05)
plot(hyps, ignore_prior = TRUE)
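Besides the plots, the numerical summary of each hypothesis (posterior estimate, 95% credible interval, and evidence ratio) can be printed:

print(hyps)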

As seen in the three posterior density plots, 0 lies inside each 95% credible interval, which raises the question of whether every coefficient should be kept as a population-level term at all. To answer this question, we fit another model without the chemoter and distance terms:

fit2 <- update(fit1, formula. = ~ . - chemoter - distance)

A good way to compare both models is leave-one-out cross-validation (LOO), which can be called in brms using:

loo(fit1,fit2)
#> Warning: Found 32 observations with a pareto_k > 0.7 in model 'fit1'. It is
#> recommended to set 'moment_match = TRUE' in order to perform moment matching for
#> problematic observations.
#> Warning: Found 24 observations with a pareto_k > 0.7 in model 'fit2'. It is
#> recommended to set 'moment_match = TRUE' in order to perform moment matching for
#> problematic observations.
#> Output of model 'fit1':
#> 
#> Computed from 2000 by 654 log-likelihood matrix
#> 
#>          Estimate    SE
#> elpd_loo  -2434.4  85.2
#> p_loo       113.8   4.8
#> looic      4868.8 170.5
#> ------
#> Monte Carlo SE of elpd_loo is NA.
#> 
#> Pareto k diagnostic values:
#>                          Count Pct.    Min. n_eff
#> (-Inf, 0.5]   (good)     540   82.6%   380       
#>  (0.5, 0.7]   (ok)        82   12.5%   148       
#>    (0.7, 1]   (bad)       29    4.4%   160       
#>    (1, Inf)   (very bad)   3    0.5%   1762      
#> See help('pareto-k-diagnostic') for details.
#> 
#> Output of model 'fit2':
#> 
#> Computed from 2000 by 654 log-likelihood matrix
#> 
#>          Estimate    SE
#> elpd_loo  -2433.5  85.3
#> p_loo       112.8   4.8
#> looic      4867.1 170.5
#> ------
#> Monte Carlo SE of elpd_loo is NA.
#> 
#> Pareto k diagnostic values:
#>                          Count Pct.    Min. n_eff
#> (-Inf, 0.5]   (good)     548   83.8%   463       
#>  (0.5, 0.7]   (ok)        82   12.5%   216       
#>    (0.7, 1]   (bad)       23    3.5%   47        
#>    (1, Inf)   (very bad)   1    0.2%   1796      
#> See help('pareto-k-diagnostic') for details.
#> 
#> Model comparisons:
#>      elpd_diff se_diff
#> fit2  0.0       0.0   
#> fit1 -0.9       0.9

Q: could you please have a look at the warning messages of the loo command?
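One way to act on these warnings, as the messages themselves suggest, is moment matching. A minimal sketch, assuming a brms version that supports save_pars() and the moment_match argument (the model must be refit with all parameter draws saved, so this is not run here):

# Refit while storing the draws of all parameters, which moment matching needs
fit1_mm <- update(fit1, save_pars = save_pars(all = TRUE))
# Recompute LOO, applying moment matching to the problematic observations
loo(fit1_mm, moment_match = TRUE)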

Discussion on LOO diagnostics

First, plot the estimated Pareto shape parameters k for the 654 leave-one-out data sets:

yrep <- posterior_predict(fit1)      # posterior predictive draws, used for the LOO-PIT plot below
loo1 <- loo(fit1, save_psis = TRUE)  # keep the PSIS object for the LOO-PIT weights
loo1
#> 
#> Computed from 2000 by 654 log-likelihood matrix
#> 
#>          Estimate    SE
#> elpd_loo  -2433.4  85.2
#> p_loo       112.9   4.8
#> looic      4866.8 170.4
#> ------
#> Monte Carlo SE of elpd_loo is NA.
#> 
#> Pareto k diagnostic values:
#>                          Count Pct.    Min. n_eff
#> (-Inf, 0.5]   (good)     547   83.6%   305       
#>  (0.5, 0.7]   (ok)        80   12.2%   114       
#>    (0.7, 1]   (bad)       22    3.4%   155       
#>    (1, Inf)   (very bad)   5    0.8%   1857      
#> See help('pareto-k-diagnostic') for details.
plot(loo1)

From the above plot and output, 22 observations fall in the "bad" range (0.7 < k ≤ 1) and 5 in the "very bad" range (k > 1). For 0.7 < k ≤ 1, the variance of the raw importance ratios is infinite; the generalized central limit theorem for stable distributions still holds, but convergence of the estimate is slower, and the variance of the PSIS estimate is finite yet may be large. For the observations with k > 1, the variance and the mean of the raw ratio distribution do not exist, so the PSIS estimate is not reliable; this is also why the Monte Carlo SE of elpd_loo is reported as NA.
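To see which observations drive these warnings, the loo package provides helper functions for the Pareto k values; a minimal sketch:

library(loo)
# Indices of observations whose estimated Pareto k exceeds 0.7
bad_obs <- pareto_k_ids(loo1, threshold = 0.7)
length(bad_obs)
# Inspect the corresponding rows of the data
data[bad_obs, ]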

Now have a look at the LOO-PIT plot. LOO-PIT values are the cumulative probabilities for yi computed using the LOO marginal predictive distributions p(yi | y−i); for a well-calibrated model they should be approximately uniform, which the overlay plot checks:

ppc_loo_pit_overlay(
  y = na.omit(data)$time,
  yrep = yrep,
  lw = weights(loo1$psis_object)
)

Should loo or WAIC be used? For comparison, we also compute the WAIC (note that its output itself recommends trying loo instead):

waic(fit1, fit2)
#> Output of model 'fit1':
#> 
#> Computed from 2000 by 654 log-likelihood matrix
#> 
#>           Estimate    SE
#> elpd_waic  -2427.9  85.2
#> p_waic       107.4   4.5
#> waic        4855.8 170.4
#> 
#> 45 (6.9%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
#> 
#> Output of model 'fit2':
#> 
#> Computed from 2000 by 654 log-likelihood matrix
#> 
#>           Estimate    SE
#> elpd_waic  -2427.8  85.2
#> p_waic       107.0   4.5
#> waic        4855.5 170.4
#> 
#> 44 (6.7%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
#> 
#> Model comparisons:
#>      elpd_diff se_diff
#> fit2  0.0       0.0   
#> fit1 -0.1       0.8

Conclusion: removing the variables chemoter and distance does not improve the predictive accuracy; the elpd difference between fit1 and fit2 is negligible relative to its standard error.