rstanarm feature/survival branch

stan_surv

Explanation

The CRAN version of rstanarm currently lacks capabilities for survival models. However, there is currently a feature branch on survival analyses, which we will demonstrate here.

What is most special about survival time modeling is the existence of the baseline hazard function. Frequentist proportional hazards regression (Cox regression) omits estimation of the baseline hazard function using a partial likelihood, from which the baseline hazard function drops out.

In the Bayesian paradigm, we need a full likelihood, resulting in a need to model it one way or another. This survival feature branch of rstanarm supports the following options for the baseline hazard function modeling.

cubic M-spline
cubic B-spline
exponential
weibull
gomperz

#' @param basehaz A character string indicating which baseline hazard to use
#'   for the event submodel. Current options are:
#'   \itemize{
#'     \item \code{"ms"}: a flexible parametric model using cubic M-splines to
#'     model the baseline hazard. The default locations for the internal knots,
#'     as well as the basis terms for the splines, are calculated with respect
#'     to time. If the model does \emph{not} include any time-dependendent
#'     effects then a closed form solution is available for both the hazard
#'     and cumulative hazard and so this approach should be relatively fast.
#'     On the other hand, if the model does include time-dependent effects then
#'     quadrature is used to evaluate the cumulative hazard at each MCMC
#'     iteration and, therefore, estimation of the model will be slower.
#'     \item \code{"bs"}: a flexible parametric model using cubic B-splines to
#'     model the \emph{log} baseline hazard. The default locations for the
#'     internal knots, as well as the basis terms for the splines, are calculated
#'     with respect to time. A closed form solution for the cumulative hazard
#'     is \strong{not} available regardless of whether or not the model includes
#'     time-dependent effects; instead, quadrature is used to evaluate
#'     the cumulative hazard at each MCMC iteration. Therefore, if your model
#'     does not include any time-dependent effects, then estimation using the
#'     \code{"ms"} baseline hazard will be faster.
#'     \item \code{"exp"}: an exponential distribution for the event times.
#'     (i.e. a constant baseline hazard)
#'     \item \code{"weibull"}: a Weibull distribution for the event times.
#'     \item \code{"gompertz"}: a Gompertz distribution for the event times.
#'   }

Exponential model

stan_surv_exponential <- stan_surv(formula = Surv(time, status) ~ x,
                                   data = leukemia,
                                   basehaz = "exp")
prior_summary(stan_surv_exponential)

## Priors for model 'stan_surv_exponential' 
## ------
## Intercept
##  ~ normal(location = 0, scale = 20)
## 
## Coefficients
##  ~ normal(location = 0, scale = 2.5)
## 
## Auxiliary (NA)
##  ~ flat
## ------
## See help('prior_summary.stanreg') for more details

summary(stan_surv_exponential)

## 
## Model Info:
## 
##  function:        stan_surv
##  baseline hazard: exponential
##  formula:         Surv(time, status) ~ x
##  algorithm:       sampling
##  sample:          4000 (posterior sample size)
##  priors:          see help('prior_summary')
##  observations:    23
##  events:          18 (78.3%)
##  right censored:  5 (21.7%)
##  delayed entry:   no
## 
## Estimates:
##                  mean   sd   10%   50%   90%
## (Intercept)    -4.1    0.4 -4.6  -4.1  -3.7 
## xNonmaintained  0.9    0.5  0.3   0.9   1.6 
## 
## MCMC diagnostics
##                mcse Rhat n_eff
## (Intercept)    0.0  1.0  2226 
## xNonmaintained 0.0  1.0  2512 
## log-posterior  0.0  1.0  1690 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

It looks like the log baseline hazard appears as the “(Intercept)” since the results are comparable to my previous attempt using the Poisson trick: Piecewise constant hazard Cox.

The plot method is designed to give the baseline hazard function

plot(stan_surv_exponential) + coord_cartesian(ylim = c(0, 0.1))

It is a constant in an exponential model.

Weibull model

stan_surv_weibull <- stan_surv(formula = Surv(time, status) ~ x,
                               data = leukemia,
                               basehaz = "weibull")
prior_summary(stan_surv_weibull)

## Priors for model 'stan_surv_weibull' 
## ------
## Intercept
##  ~ normal(location = 0, scale = 20)
## 
## Coefficients
##  ~ normal(location = 0, scale = 2.5)
## 
## Auxiliary (weibull-shape)
##  ~ half-normal(location = 0, scale = 2)
## ------
## See help('prior_summary.stanreg') for more details

summary(stan_surv_weibull)

## 
## Model Info:
## 
##  function:        stan_surv
##  baseline hazard: weibull
##  formula:         Surv(time, status) ~ x
##  algorithm:       sampling
##  sample:          4000 (posterior sample size)
##  priors:          see help('prior_summary')
##  observations:    23
##  events:          18 (78.3%)
##  right censored:  5 (21.7%)
##  delayed entry:   no
## 
## Estimates:
##                  mean   sd   10%   50%   90%
## (Intercept)    -5.3    1.0 -6.7  -5.2  -4.0 
## xNonmaintained  1.2    0.5  0.5   1.2   1.8 
## weibull-shape   1.3    0.2  1.0   1.3   1.6 
## 
## MCMC diagnostics
##                mcse Rhat n_eff
## (Intercept)    0.0  1.0  1050 
## xNonmaintained 0.0  1.0  1303 
## weibull-shape  0.0  1.0  1102 
## log-posterior  0.0  1.0  1232 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

It looks like the Weibull rate parameter appears as the “(Intercept)” and the Weibull shape parameter is handled as an auxiliary parameter with its own prior.

plot(stan_surv_weibull) + coord_cartesian(ylim = c(0, 0.1))

This seems to be an increasing hazard setting with the Weibull fit.

tidybayes::tidy_draws(stan_surv_weibull)

## # A tibble: 4,000 x 12
##    .chain .iteration .draw `(Intercept)` xNonmaintained `weibull-shape` accept_stat__ stepsize__ treedepth__
##     <int>      <int> <int>         <dbl>          <dbl>           <dbl>         <dbl>      <dbl>       <dbl>
##  1      1          1     1         -4.83          0.942            1.15         0.989      0.144           2
##  2      1          2     2         -4.89          0.592            1.31         0.997      0.144           4
##  3      1          3     3         -6.61          2.19             1.42         0.999      0.144           4
##  4      1          4     4         -7.01          2.16             1.53         0.998      0.144           4
##  5      1          5     5         -6.71          1.98             1.53         0.995      0.144           3
##  6      1          6     6         -4.95          0.934            1.26         0.954      0.144           4
##  7      1          7     7         -5.57          1.75             1.30         0.985      0.144           5
##  8      1          8     8         -5.04          0.548            1.34         0.933      0.144           4
##  9      1          9     9         -5.12          0.896            1.22         0.969      0.144           3
## 10      1         10    10         -6.47          1.63             1.43         0.985      0.144           4
## # … with 3,990 more rows, and 3 more variables: n_leapfrog__ <dbl>, divergent__ <dbl>, energy__ <dbl>

Gompertz model

stan_surv_gompertz <- stan_surv(formula = Surv(time, status) ~ x,
                                data = leukemia,
                                basehaz = "gompertz")

## Warning: There were 2 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.

## Warning: Examine the pairs() plot to diagnose sampling problems

prior_summary(stan_surv_gompertz)

## Priors for model 'stan_surv_gompertz' 
## ------
## Intercept
##  ~ normal(location = 0, scale = 20)
## 
## Coefficients
##  ~ normal(location = 0, scale = 2.5)
## 
## Auxiliary (gompertz-scale)
##  ~ half-normal(location = 0, scale = 0.5)
## ------
## See help('prior_summary.stanreg') for more details

summary(stan_surv_gompertz)

## 
## Model Info:
## 
##  function:        stan_surv
##  baseline hazard: gompertz
##  formula:         Surv(time, status) ~ x
##  algorithm:       sampling
##  sample:          4000 (posterior sample size)
##  priors:          see help('prior_summary')
##  observations:    23
##  events:          18 (78.3%)
##  right censored:  5 (21.7%)
##  delayed entry:   no
## 
## Estimates:
##                  mean   sd   10%   50%   90%
## (Intercept)    -4.4    0.4 -4.9  -4.4  -3.9 
## xNonmaintained  1.1    0.5  0.5   1.1   1.8 
## gompertz-scale  0.0    0.0  0.0   0.0   0.0 
## 
## MCMC diagnostics
##                mcse Rhat n_eff
## (Intercept)    0.0  1.0  1188 
## xNonmaintained 0.0  1.0  1565 
## gompertz-scale 0.0  1.0  1306 
## log-posterior  0.0  1.0  1027 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

Gompertz with its default configuration did not converge. More investigation is needed.

Cubic M-spline

stan_surv_mspline <- stan_surv(formula = Surv(time, status) ~ x,
                               data = leukemia,
                               basehaz = "ms")
prior_summary(stan_surv_mspline)

## Priors for model 'stan_surv_mspline' 
## ------
## Intercept
##  ~ normal(location = 0, scale = 20)
## 
## Coefficients
##  ~ normal(location = 0, scale = 2.5)
## 
## Auxiliary (M-spline-coefficients)
##  ~ dirichlet(concentration = [1,1,1,...])
## ------
## See help('prior_summary.stanreg') for more details

summary(stan_surv_mspline)

## 
## Model Info:
## 
##  function:        stan_surv
##  baseline hazard: M-splines on hazard scale
##  formula:         Surv(time, status) ~ x
##  algorithm:       sampling
##  sample:          4000 (posterior sample size)
##  priors:          see help('prior_summary')
##  observations:    23
##  events:          18 (78.3%)
##  right censored:  5 (21.7%)
##  delayed entry:   no
## 
## Estimates:
##                   mean   sd   10%   50%   90%
## (Intercept)     0.7    0.4  0.1   0.8   1.3  
## xNonmaintained  0.9    0.5  0.3   0.9   1.5  
## m-splines-coef1 0.0    0.0  0.0   0.0   0.0  
## m-splines-coef2 0.1    0.0  0.0   0.1   0.1  
## m-splines-coef3 0.4    0.2  0.2   0.4   0.7  
## m-splines-coef4 0.3    0.2  0.0   0.2   0.5  
## m-splines-coef5 0.1    0.1  0.0   0.1   0.3  
## m-splines-coef6 0.1    0.1  0.0   0.1   0.3  
## 
## MCMC diagnostics
##                 mcse Rhat n_eff
## (Intercept)     0.0  1.0  3192 
## xNonmaintained  0.0  1.0  3976 
## m-splines-coef1 0.0  1.0  3396 
## m-splines-coef2 0.0  1.0  3398 
## m-splines-coef3 0.0  1.0  2628 
## m-splines-coef4 0.0  1.0  2623 
## m-splines-coef5 0.0  1.0  3728 
## m-splines-coef6 0.0  1.0  3921 
## log-posterior   0.1  1.0  1004 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

The default configuration of the spline is explained in the basehaz_ops argument.

#' @param basehaz_ops A named list specifying options related to the baseline
#'   hazard. Currently this can include: \cr
#'   \itemize{
#'     \item \code{df}: a positive integer specifying the degrees of freedom
#'     for the M-splines or B-splines. An intercept is included in the spline
#'     basis and included in the count of the degrees of freedom, such that
#'     two boundary knots and \code{df - 4} internal knots are used to generate
#'     the cubic spline basis. The default is \code{df = 6}; that is, two
#'     boundary knots and two internal knots.
#'     \item \code{knots}: An optional numeric vector specifying internal
#'     knot locations for the M-splines or B-splines. Note that \code{knots}
#'     cannot be specified if \code{df} is specified. If \code{knots} are
#'     \strong{not} specified, then \code{df - 4} internal knots are placed
#'     at equally spaced percentiles of the distribution of uncensored event
#'     times.
#'   }
#'   Note that for the M-splines and B-splines - in addition to any internal
#'   \code{knots} - a lower boundary knot is placed at the earliest entry time
#'   and an upper boundary knot is placed at the latest event or censoring time.
#'   These boundary knot locations are the default and cannot be changed by the
#'   user.

plot(stan_surv_mspline) + coord_cartesian(ylim = c(0, 0.1))

With more flexible modeling, the baseline hazard is increasing then decreasing.

Cubic B-spline

stan_surv_bspline <- stan_surv(formula = Surv(time, status) ~ x,
                               data = leukemia,
                               basehaz = "bs")
prior_summary(stan_surv_bspline)

## Priors for model 'stan_surv_bspline' 
## ------
## Intercept
##  ~ normal(location = 0, scale = 20)
## 
## Coefficients
##  ~ normal(location = 0, scale = 2.5)
## 
## Auxiliary (B-spline-coefficients)
##  ~ normal(location = [0,0,0,...], scale = [20,20,20,...])
## ------
## See help('prior_summary.stanreg') for more details

summary(stan_surv_bspline)

## 
## Model Info:
## 
##  function:        stan_surv
##  baseline hazard: B-splines on log hazard scale
##  formula:         Surv(time, status) ~ x
##  algorithm:       sampling
##  sample:          4000 (posterior sample size)
##  priors:          see help('prior_summary')
##  observations:    23
##  events:          18 (78.3%)
##  right censored:  5 (21.7%)
##  delayed entry:   no
## 
## Estimates:
##                   mean   sd    10%   50%   90%
## (Intercept)     -10.8    3.6 -15.6 -10.4  -6.7
## xNonmaintained    0.9    0.5   0.3   0.9   1.5
## b-splines-coef1   8.8    4.4   3.5   8.3  14.6
## b-splines-coef2   5.0    3.3   1.1   4.6   9.5
## b-splines-coef3  14.2    5.4   7.6  13.9  21.3
## b-splines-coef4 -12.2   11.7 -27.5 -11.3   1.9
## b-splines-coef5 -12.6   14.8 -32.9 -10.7   5.1
## 
## MCMC diagnostics
##                 mcse Rhat n_eff
## (Intercept)     0.1  1.0  1082 
## xNonmaintained  0.0  1.0  2641 
## b-splines-coef1 0.1  1.0  1096 
## b-splines-coef2 0.1  1.0  1177 
## b-splines-coef3 0.2  1.0  1214 
## b-splines-coef4 0.3  1.0  1845 
## b-splines-coef5 0.4  1.0  1394 
## log-posterior   0.1  1.0  1282 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).

plot(stan_surv_bspline) + coord_cartesian(ylim = c(0, 0.1))

This baseline hazard estimate with B-spline appears wired. There is a region with a negative hazard, so there is some posterior resurrection happening…

The resurrection issue previously observed seems to have been solved as of 2020-11-12.

Baseline hazard function comparison

bayesplot::bayesplot_grid(plot(stan_surv_exponential),
                          plot(stan_surv_weibull),
                          plot(stan_surv_mspline),
                          plot(stan_surv_bspline),
                          ylim = c(-0.01, 0.15))

Top Page: http://rpubs.com/kaz_yos/
Github: https://github.com/kaz-yos

print(sessionInfo())

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin19.6.0 (64-bit)
## Running under: macOS Catalina 10.15.7
## 
## Matrix products: default
## BLAS:   /usr/local/Cellar/r/4.0.3/lib/R/lib/libRblas.dylib
## LAPACK: /usr/local/Cellar/r/4.0.3/lib/R/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] survival_3.2-7    bayesplot_1.7.2   tidybayes_2.3.1   rstanarm_2.21.2   Rcpp_1.0.5        forcats_0.5.0    
##  [7] stringr_1.4.0     dplyr_1.0.2       purrr_0.3.4       readr_1.4.0       tidyr_1.1.2       tibble_3.0.4     
## [13] ggplot2_3.3.2     tidyverse_1.3.0   doRNG_1.8.2       rngtools_1.5      doParallel_1.0.16 iterators_1.0.13 
## [19] foreach_1.5.1     knitr_1.30       
## 
## loaded via a namespace (and not attached):
##   [1] minqa_1.2.4          colorspace_2.0-0     ellipsis_0.3.1       ggridges_0.5.2       rsconnect_0.8.16    
##   [6] markdown_1.1         base64enc_0.1-3      fs_1.5.0             rstudioapi_0.12      farver_2.0.3        
##  [11] rstan_2.21.2         svUnit_1.0.3         DT_0.16              fansi_0.4.1          lubridate_1.7.9     
##  [16] xml2_1.3.2           splines_4.0.3        codetools_0.2-18     shinythemes_1.1.2    jsonlite_1.7.1      
##  [21] nloptr_1.2.2.2       broom_0.7.2          dbplyr_2.0.0         ggdist_2.3.0         shiny_1.5.0         
##  [26] compiler_4.0.3       httr_1.4.2           backports_1.2.0      Matrix_1.2-18        assertthat_0.2.1    
##  [31] fastmap_1.0.1        cli_2.1.0            later_1.1.0.1        htmltools_0.5.0      prettyunits_1.1.1   
##  [36] tools_4.0.3          igraph_1.2.6         coda_0.19-4          gtable_0.3.0         glue_1.4.2          
##  [41] reshape2_1.4.4       V8_3.4.0             cellranger_1.1.0     vctrs_0.3.4          nlme_3.1-150        
##  [46] crosstalk_1.1.0.1    xfun_0.19            ps_1.4.0             lme4_1.1-25          rvest_0.3.6         
##  [51] mime_0.9             miniUI_0.1.1.1       lifecycle_0.2.0      gtools_3.8.2         statmod_1.4.35      
##  [56] MASS_7.3-53          zoo_1.8-8            scales_1.1.1         colourpicker_1.1.0   hms_0.5.3           
##  [61] promises_1.1.1       inline_0.3.16        shinystan_2.5.0      yaml_2.2.1           curl_4.3            
##  [66] gridExtra_2.3        loo_2.3.1            StanHeaders_2.21.0-6 stringi_1.5.3        dygraphs_1.1.1.6    
##  [71] boot_1.3-25          pkgbuild_1.1.0       rlang_0.4.8          pkgconfig_2.0.3      matrixStats_0.57.0  
##  [76] distributional_0.2.1 evaluate_0.14        lattice_0.20-41      splines2_0.3.1       labeling_0.4.2      
##  [81] rstantools_2.1.1     htmlwidgets_1.5.2    tidyselect_1.1.0     processx_3.4.4       plyr_1.8.6          
##  [86] magrittr_1.5         R6_2.5.0             generics_0.1.0       DBI_1.1.0            pillar_1.4.6        
##  [91] haven_2.3.1          withr_2.3.0          xts_0.12.1           modelr_0.1.8         crayon_1.3.4        
##  [96] arrayhelpers_1.1-0   utf8_1.1.4           rmarkdown_2.5        grid_4.0.3           readxl_1.3.1        
## [101] callr_3.5.1          threejs_0.3.3        reprex_0.3.0         digest_0.6.27        xtable_1.8-4        
## [106] httpuv_1.5.4         RcppParallel_5.0.2   stats4_4.0.3         munsell_0.5.0        shinyjs_2.0.0

## Record execution time and multicore use
end_time <- Sys.time()
diff_time <- difftime(end_time, start_time, units = "auto")
cat("Started  ", as.character(start_time), "\n",
    "Finished ", as.character(end_time), "\n",
    "Time difference of ", diff_time, " ", attr(diff_time, "units"), "\n",
    "Used ", foreach::getDoParWorkers(), " cores\n",
    "Used ", foreach::getDoParName(), " as backend\n",
    sep = "")

## Started  2020-11-12 18:20:48
## Finished 2020-11-12 18:21:36
## Time difference of 47.7399 secs
## Used 8 cores
## Used doParallelMC as backend

rstanarm feature/survival branch

Kazuki Yoshida

2020-11-12

References

Load packages

Load dataset