
This up-to-date document is available at https://rpubs.com/sherloconan/1216724

 

0. Getting Started

 

SETUP

①    Cumulative distribution function \(F(t):=\mathbb{P}(T\leqslant t)\)    and probability density function \(f(t):=\frac{\text{d}}{\text{d}t}F(t),\ t\geqslant0\)

②    Survival function \(S(t):=\mathbb{P}(T>t)=1-F(t)\)

③    Hazard function (failure rate, i.e., the instantaneous risk of an event at any given time) \[h(t):=\lim_{\Delta t\to0}\frac{\mathbb{P}(t\leqslant T<t+\Delta t\ \mid\ T>t)}{\Delta t}=\lim_{\Delta t\to0}\frac{\mathbb{P}(t\leqslant T<t+\Delta t)}{\Delta t\cdot\mathbb{P}(T>t)}=\lim_{\Delta t\to0}\frac{F(t+\Delta t)-F(t)}{\Delta t\cdot S(t)}=\frac{f(t)}{S(t)}=-\frac{\text{d}}{\text{d}t}\ln S(t)\]         and cumulative hazard function \(H(t):=\int_0^t h(u)\text{d}u=-\ln S(t)\)

②    \(S(t)=e^{\ln S(t)}=e^{-\int_0^t h(u)\text{d}u}=e^{-H(t)}\)

①    \(f(t)=h(t)\cdot S(t)\)

 

The likelihood function for right-censored data (where the event has not occurred for some subjects by the end of the study) is as follows.

\[\begin{equation} L=\prod_{i=1}^n \left[f(t_i)\right]^{\delta_i}\cdot \left[S(t_i)\right]^{1-\delta_i}=\prod_{i=1}^n \left[h(t_i)\right]^{\delta_i}\cdot S(t_i), \end{equation}\]

where \(\delta_i=1\) when \(t_i\) is an observed death and \(\delta_i=0\) when \(t_i\) is a censored observation for subject \(i\).

Sometimes, \(h_i(t_k)\) represents the hazard rate for subject \(i\) at time \(t_k\).
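As a quick illustration, here is a minimal R sketch of this right-censored log-likelihood under a simple exponential model (an assumption for illustration: \(h(t)=\lambda\), \(S(t)=e^{-\lambda t}\)), checked against the closed-form MLE \(\hat{\lambda}=\sum_i\delta_i/\sum_i t_i\).

loglik_exp <- function(lambda, time, status) {
  # sum of delta_i * ln h(t_i) + ln S(t_i), with h(t) = lambda and S(t) = exp(-lambda*t)
  sum(status * log(lambda) - lambda * time)
}
time   <- c(4,3,1,1,2,2,3) # the small synthetic data used in Section 1
status <- c(1,1,1,0,1,1,0)
optimize(loglik_exp, c(1e-6, 10), time=time, status=status, maximum=TRUE)$maximum
sum(status) / sum(time) # closed-form MLE: number of events / total time at risk = 0.3125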

 

  • Type I Censoring: The study ends at a fixed, pre-specified time; any subject whose event time exceeds that time is right-censored at it.

  • Type II Censoring: Data are collected until a specified number of events occurs; the remaining subjects are then right-censored.

  • Random Censoring: Each subject may have their own censoring time (\(C_i\perp \!\!\! \perp T_i\)), which is not necessarily the same across all subjects.

   

   

MODEL

  • \(\{(T_i^*,\ \delta_i,\ \mathbf{X}_i^\top)\}_{i=1}^n\): time-to-event data

  • \(T_i^*=\min(T_i,\ C_i)\): observed time (event or censoring time) on study for subject \(i\)

  • \(\delta_i=\mathbf{1}_{T_i\leqslant C_i}\): event indicator; 1 if the event occurred, 0 if censored

  • \(\mathbf{X}_i=(x_{i1},\dotsb,x_{ip})^\top\): covariate vector

  • (semi-parametric) proportional hazards model \(\qquad h(t_i;\mathbf{X}_i)=h_0(t_i)\cdot\psi(\mathbf{X}_i,\boldsymbol{\beta})=h_0(t_i)\cdot e^{\mathbf{X}_i^\top\!\boldsymbol{\beta}}\)

  • hazard ratio (HR; cf. the relative risk) \(\hspace{6em}e^{\beta_j}\) for \(x_{ij}\), \(j\in\{1,\dotsb,p\}\)
    \(\hspace{0.15em}\textit{HR}=1\): no effect; \(\quad\textit{HR}<1\): reduction in hazard; \(\quad\textit{HR}>1\): increase in hazard.

Key assumption: The hazard functions for any two subjects maintain a constant proportion (cannot cross) over time.

 

Suppose \(p=1\).

[A categorical predictor] \(\qquad x=1\) if treated, and \(x=0\) if placebo. \(\frac{h(t\ ;\ x=1)}{h(t\ ;\ x=0)}=\frac{h_0(t)\ \cdot\ e^{\beta\cdot1}}{h_0(t)\ \cdot\ e^{\beta\cdot0}}=e^\beta\).

\(\widehat{\textit{HR}}=e^{\hat{\beta}}=0.2\) suggests that the hazard of the event for treated subjects is estimated to be 0.2 times that for control subjects.

 

[A continuous predictor] \(\qquad\frac{h(t\ ;\ x+1)}{h(t\ ;\ x)}=\frac{h_0(t)\ \cdot\ e^{\beta\cdot(x+1)}}{h_0(t)\ \cdot\ e^{\beta\cdot x}}=e^\beta\).

An HR of 0.2 means that the hazard of the event happening at any point in time is only 20% of what it would be if the predictor were 1 unit lower.

   

   

IMPORTANCE

The baseline hazard function \(h_0(t)>0\) is not specified in the model and does not need to be estimated directly. See ?survival::basehaz.
This is where the concept of partial likelihood comes in (Cox, 1972; Cox, 1975).

\[\begin{equation} \require{cancel} L^*=\prod_{k=1}^K\frac{\cancel{h_0(t_k)}\cdot\exp\{\mathbf{X}_{(k)}^\top\boldsymbol{\beta}\}}{\sum_{r\in R(t_k)}\cancel{h_0(t_k)}\cdot\exp\{\mathbf{X}_r^\top\boldsymbol{\beta}\}}=\prod_{k=1}^K\frac{\exp\{\mathbf{X}_{(k)}^\top\boldsymbol{\beta}\}}{\sum_{r\in R(t_k)}\exp\{\mathbf{X}_r^\top\boldsymbol{\beta}\}} \end{equation}\]

  • \(t_1<\dotsb<t_K\): unique times (no ties); survival::coxph(ties=c("breslow", "efron", "exact"), ...)

  • \(t_k\): time of the \(k\)-th event for \(k\in\{1,\dotsb,K\}\)

  • \(R(t_k)\): set of subjects still at risk at time \(t_k\)

 

Example

| Subject \(i\) | Time \(T_i^*\) | Event Indicator \(\delta_i\) | Event \(k\) | Risk Set \(R(t_k)\) | Categorical Predictor \(x_i\) |
|---|---|---|---|---|---|
| 1 | 5 | 1 | 1 | {1,2,3,4} | 0 |
| 2 | 6 | 1 | 2 | {2,3,4} | 1 |
| 3 | 7 | 0 |  | {4} | 1 |
| 4 | 8 | 1 | 3 | {4} | 0 |

\(n=4,\ p=1,\ K=3\), no tied times, and \(x_i\) is not time-varying.

\[\begin{align} L^*=&\frac{\exp\{\beta\cdot0\}}{\exp\{\beta\cdot0\}+\exp\{\beta\cdot1\}+\exp\{\beta\cdot1\}+\exp\{\beta\cdot0\}} \\ \\ \times&\frac{\exp\{\beta\cdot1\}}{\exp\{\beta\cdot1\}+\exp\{\beta\cdot1\}+\exp\{\beta\cdot0\}} \\ \\ \times&\frac{\exp\{\beta\cdot0\}}{\exp\{\beta\cdot0\}} \end{align}\]

 

The partial likelihood is the likelihood contribution of each subject at the time of their event, considering only the subjects still at risk.
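As a check (a small sketch assuming the four subjects in the table), the toy partial likelihood can be maximized numerically and compared with coxph().

library(survival)
toy <- data.frame(time=c(5,6,7,8), status=c(1,1,0,1), x=c(0,1,1,0))
logPL <- function(beta) { # log L*, written out factor by factor
  log(exp(beta*0) / sum(exp(beta * c(0,1,1,0)))) + # event 1, R(t_1)={1,2,3,4}
    log(exp(beta*1) / sum(exp(beta * c(1,1,0)))) + # event 2, R(t_2)={2,3,4}
    log(exp(beta*0) / exp(beta*0))                 # event 3, R(t_3)={4}
}
optimize(logPL, c(-5, 5), maximum=TRUE)$maximum # ≈ -0.347
coef(coxph(Surv(time, status) ~ x, toy))        # the same estimate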

 


   

1. Methods

 

\[\begin{equation} L=\prod_{i=1}^n \left[f(t_i)\right]^{\delta_i}\cdot \left[S(t_i)\right]^{1-\delta_i}=\prod_{i=1}^n \left[h(t_i)\right]^{\delta_i}\cdot S(t_i)=\prod_{i=1}^n \left[{\color{red}{h_0(t_i)}}\cdot\exp\{\mathbf{X}_i^\top\!\boldsymbol{\beta}\}\right]^{\delta_i}\!\cdot\left[{\color{red}{S_0(t_i)}}\right]^{\exp\{\mathbf{X}_i^\top\!\boldsymbol{\beta}\}}. \end{equation}\]

In the Bayesian paradigm, the full likelihood \(L\) is required, so the baseline hazard must be modeled in some way. The survival feature branch of “rstanarm” offers the following options for modeling the baseline hazard function \(h_0(t)\).

  • cubic M-spline \(\hspace{3.73em}h_0(t)=\sum_\xi b_\xi\cdot M_\xi(t)\quad\) (default)

  • cubic B-spline \(\hspace{3.9em}h_0(t)=\exp\left\{\sum_\xi b_\xi\cdot B_\xi(t)\right\}\)

  • exponential \(\hspace{5em}h_0(t)=\lambda\)

  • two-parameter Weibull \(\hspace{0.76em}h_0(t)=\kappa\lambda^\kappa t^{\kappa-1}\quad\) (The standard one-parameter version sets \(\lambda=1\); \(\lambda\) represents the rate, i.e., the inverse scale.)
    \(\hspace{0.15em}\kappa<1,\ h_0(t)\searrow\qquad\kappa>1,\ h_0(t)\nearrow\qquad\kappa=1, S_0(t)\) is an exponential

  • Gompertz \(\hspace{5.65em}h_0(t)=\lambda\cdot e^{\eta t}\)
    \(\hspace{0.15em}\eta<0,\ h_0(t)\searrow\qquad\eta>0,\ h_0(t)\nearrow\qquad\eta=0, S_0(t)\) is an exponential

Advanced reading: Poisson trick

   

if (!run.large) { # small-sized synthetic data
  dat <- list(time=c(4,3,1,1,2,2,3), 
              status=c(1,1,1,0,1,1,0), 
              x=c(0,1,1,1,1,0,0),
              N=7) # seven subjects
  
} else { # real data (NCCTG Lung Cancer, 1994) in the "survival" package
  dat <- lung[, c("time", "status")]
  dat$status <- ifelse(dat$status==2, 1, 0) # 1=event (165), 0=censored (63)
  dat$x <- ifelse(lung$sex==1, 1, 0) # 1=Male, 0=Female
  dat <- as.list(dat)
  dat$N <- length(dat$time) # 228 subjects (no missing)
}
cat("The exploratory analysis will be conducted on the", ifelse(!run.large, "small", "large"), "data.")
## The exploratory analysis will be conducted on the large data.

 

BIC Approximation

 

Approximating Bayes Factors from the Bayesian Information Criterion (BIC)

   

The BIC for model \(i\), with \(k_i\) free parameters and sample size \(n\), is \(\text{BIC}(\mathcal{M}_i)=-2\ln{\mathcal{L}_i}+k_i\ln{n}\quad\) for \(n\gg k_i\),

where \(\mathcal{L}_i=p(\boldsymbol{y}\mid\hat{\boldsymbol{\theta}}_i,\mathcal{M}_i)\) is the maximum likelihood.

 

Applying a second-order Taylor approximation to the log marginal likelihood of each model yields

\[\begin{equation} \tag{8} \mathit{BF}_{10}\approx\exp\left\{\frac{\text{BIC}(\mathcal{M}_0)-\text{BIC}(\mathcal{M}_1)}{2}\right\} \end{equation}\]

Derivation in Wagenmakers (2007, Appendix B).

   

survival::coxph()

 

The frequentist Cox model is based on the partial likelihood. The BIC() function applies the BIC formula by using the log of the partial likelihood in place of the full likelihood. This approximation is generally effective when comparing models, particularly when the goal is to evaluate the relative fit of models on the same data.

\(\text{BIC}(\mathcal{M}_i)=-2\ln{\mathcal{L}_i^*}+k_i\ln{n^*}\)

Here, \(n^*\) represents the number of events.

 

fit1 <- coxph(Surv(time, status) ~ x, dat) # fit a Cox proportional-hazards regression model
# fit1 <- coxph(Surv(time, status) ~ 1 + x, dat) # same
summary(fit1)
## Call:
## coxph(formula = Surv(time, status) ~ x, data = dat)
## 
##   n= 228, number of events= 165 
## 
##     coef exp(coef) se(coef)     z Pr(>|z|)   
## x 0.5310    1.7007   0.1672 3.176  0.00149 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##   exp(coef) exp(-coef) lower .95 upper .95
## x     1.701      0.588     1.226      2.36
## 
## Concordance= 0.579  (se = 0.021 )
## Likelihood ratio test= 10.63  on 1 df,   p=0.001
## Wald test            = 10.09  on 1 df,   p=0.001
## Score (logrank) test = 10.33  on 1 df,   p=0.001
BIC(fit1)
## [1] 1494.292
fit0 <- coxph(Surv(time, status) ~ 1, dat)
# fit0 <- update(fit1, ~ 1) # same
summary(fit0)
## Call:  coxph(formula = Surv(time, status) ~ 1, data = dat)
## 
## Null model
##   log likelihood= -749.9098 
##   n= 228
BIC(fit0)
## [1] 1499.82
# BF₁₀
cat("The BIC approximation to the Bayes factor supporting the alternative is", exp((BIC(fit0) - BIC(fit1))/2))
## The BIC approximation to the Bayes factor supporting the alternative is 15.86047
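As a quick check that \(n^*\) is indeed the number of events (165) rather than \(n=228\), the BIC above can be reproduced by hand; a sketch assuming the coxph fit stores nevent and that its logLik carries a df attribute:

ll <- logLik(fit1)
-2 * as.numeric(ll) + attr(ll, "df") * log(fit1$nevent) # ≈ 1494.292 = BIC(fit1)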

 


   

eha::weibreg()

 

The coxph() function from the “survival” package does not allow for specifying an explicit parametric form for the baseline hazard function; it uses a semi-parametric approach in which the baseline hazard is left unspecified. However, alternative approaches and packages can incorporate a specific baseline hazard function, such as the Weibull.    eha: event history analysis

Take \(p=1\) (one covariate).

?eha::weibreg (deprecated): Wald test \(\hat{\beta}\mathrel{\dot\sim}\mathcal{N}\!\left(\beta,\ \mathcal{I}^{-1}(\beta)\right)\) under \(\mathcal{H}_0:\beta=0\). \(\quad\mathcal{I}(\beta)=-\mathbb{E}\left[\frac{\partial^2}{\partial\beta^2}\ln{f(T;\beta)}\right]\) under regularity conditions.

?eha::phreg: likelihood ratio test \(\Lambda=2\left(\ln L(\hat{\beta})-\ln L(0)\right)\mathrel{\dot\sim}\chi_1^2\) under \(\mathcal{H}_0:\beta=0\). \(\hspace{16.5em}\)Issue

Note: The parametrization is consistent with coxph, but differs from that used by survreg. Nevertheless, the \(p\)-values and Wald statistics for regression coefficients remain identical.

 

fit1w <- weibreg(Surv(time, status) ~ x, dat) # fit a fully parametric Cox proportional-hazards regression model
# fit1w <- phreg(Surv(time, status) ~ x, dat, dist="weibull")
summary(fit1w)
## Call:
## weibreg(formula = Surv(time, status) ~ x, data = dat)
## 
## Covariate           Mean       Coef Exp(Coef)  se(Coef)    Wald p
## x                   0.562     0.524     1.689     0.167     0.002 
## 
## log(scale)                    6.280   533.650     0.104     0.000 
## log(shape)                    0.281     1.324     0.062     0.000 
## 
## Events                    
## Total time at risk         69593 
## Max. log. likelihood      -1148.7 
## LR test statistic         10.4 
## Degrees of freedom        1 
## Overall p-value           0.00125939
BIC(fit1w)
## [1] 2313.591
fit0w <- weibreg(Surv(time, status) ~ 1, dat)
summary(fit0w)
## Call:
## weibreg(formula = Surv(time, status) ~ 1, data = dat)
## 
## Covariate           Mean       Coef Exp(Coef)  se(Coef)    Wald p
## log(scale)                    6.035   417.759     0.059     0.000 
## log(shape)                    0.275     1.317     0.062     0.000 
## 
## Events                    
## Total time at risk         69593 
## Max. log. likelihood      -1153.9
BIC(fit0w)
## [1] 2318.561
# BF₁₀
cat("The BIC approximation to the Bayes factor supporting the alternative is", exp((BIC(fit0w) - BIC(fit1w))/2))
## The BIC approximation to the Bayes factor supporting the alternative is 12.00053

   

Error messages when all \(x\)’s are 0:

Inverse failed [phexpreg]; info = 1
Error in summary(fit)$coefficients : 
  $ operator is invalid for atomic vectors
In addition: Warning message:
In eha::phreg(eha::Surv(time, status) ~ x, data, shape = 1) :
  Failed with error code  1

 


   

survival::survreg()

 

In the survreg() function of the “survival” package, the model specification is based on an accelerated failure time (AFT) model with a location-scale formulation for the logarithm of survival times \(T_i\).

\[\begin{equation} \ln T_i=b_0+\mathbf{X}_i^\top\mathbf{b}+s\cdot\epsilon_i,\quad\epsilon_i\overset{\text{i.i.d.}}{\sim}\operatorname{Gumbel}(0,1) \end{equation}\]

The PDF of a Gumbel distribution (also known as the type-I generalized extreme value distribution) is given by \(f_G(g;\mu,\sigma)=\frac{1}{\sigma}\exp\left\{c\cdot\frac{g-\mu}{\sigma}-\exp\left\{c\cdot\frac{g-\mu}{\sigma}\right\}\right\},\quad g,\mu\in\mathbb{R},\ \sigma>0\),
where \(c=-1\) and \(c=1\) are for the maximum and the minimum of a sample of random variables, respectively.

  • If \(G\sim\operatorname{Gumbel}(0,1)\),    then \(a+b\cdot G\sim\operatorname{Gumbel}(a,\lvert b\rvert)\).

  • If \(W\sim\operatorname{Weibull}(\lambda,\kappa)\),    then \(c\cdot\ln{W}\sim\operatorname{Gumbel}(-c\cdot\ln{\lambda},\ 1/\kappa)\).

\(s>0\) denotes the scale in the survreg() output. It can be derived that, in this case, the Weibull hazard function is as follows. \[\begin{align} h(t_i):\!&=\kappa\lambda_i^\kappa t_i^{\kappa-1} \\ \kappa&=1/s \\ \lambda_i&=\exp\{-b_0-\mathbf{X}_i^\top\mathbf{b}\} \end{align}\]

Note: \(\kappa\), \(\lambda\), and \(1/\lambda\) represent the shape, rate, and scale parameters of the Weibull distribution, respectively.

 

While a proportional hazards model assumes that a covariate affects the hazard by multiplying it by a constant, an AFT model posits that a covariate accelerates or decelerates the progression of a disease by a constant factor.

Still linear predictors but different interpretation:
In an AFT model, the survival time for subject \(i\) at \(x_{ij}+1\) is \(e^{b_j}\) times the survival time at \(x_{ij}\), controlling for other variables.

  • \(e^{b_j}\): event time ratio (ETR)

 

fit1p <- survreg(Surv(time, status) ~ x, dat) # fit a parametric survival regression model
# fit1p <- survreg(Surv(time, status) ~ 1 + x, dat, dist="weibull") # same
summary(fit1p)
## 
## Call:
## survreg(formula = Surv(time, status) ~ x, data = dat)
##               Value Std. Error     z       p
## (Intercept)  6.2797     0.1044 60.16 < 2e-16
## x           -0.3956     0.1276 -3.10  0.0019
## Log(scale)  -0.2809     0.0619 -4.54 5.7e-06
## 
## Scale= 0.755 
## 
## Weibull distribution
## Loglik(model)= -1148.7   Loglik(intercept only)= -1153.9
##  Chisq= 10.4 on 1 degrees of freedom, p= 0.0013 
## Number of Newton-Raphson Iterations: 5 
## n= 228
BIC(fit1p)
## [1] 2313.591
fit0p <- survreg(Surv(time, status) ~ 1, dat)
# fit0p <- update(fit1p, ~ 1) # same
summary(fit0p)
## 
## Call:
## survreg(formula = Surv(time, status) ~ 1, data = dat)
##               Value Std. Error      z      p
## (Intercept)  6.0349     0.0591 102.05 <2e-16
## Log(scale)  -0.2752     0.0624  -4.41  1e-05
## 
## Scale= 0.759 
## 
## Weibull distribution
## Loglik(model)= -1153.9   Loglik(intercept only)= -1153.9
## Number of Newton-Raphson Iterations: 6 
## n= 228
BIC(fit0p)
## [1] 2318.561
# BF₁₀
cat("The BIC approximation to the Bayes factor supporting the alternative is", exp((BIC(fit0p) - BIC(fit1p))/2))
## The BIC approximation to the Bayes factor supporting the alternative is 12.00053
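Mapping the AFT estimates back to the Weibull proportional-hazards parametrization (\(\kappa=1/s\); the PH log-hazard ratio is \(-b/s\)) should recover the weibreg() results above. A short sketch:

s <- fit1p$scale             # ≈ 0.755
-coef(fit1p)["x"] / s        # PH log-hazard ratio ≈ 0.524, as in weibreg()
exp(-coef(fit1p)["x"] / s)   # hazard ratio ≈ 1.69
1 / s                        # Weibull shape kappa ≈ 1.32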

 


   

Approximate Bayes Factor

 

The test statistic in a Wald test is \(W=\left(\frac{\hat{\theta}-\theta_0}{\text{se}(\hat{\theta})}\right)^2\sim\chi_1^2\) under \(\mathcal{H}_0\), where \(\hat{\theta}\) is the maximum likelihood estimate, and \(\text{se}(\hat{\theta})\) is the standard error.

Wagenmakers (2022, Eq. 6) reviewed the Jeffreys-style approximate Bayes factor (JAB) and proposed a piecewise approximation (WAB) based on the \(p\)-value and the effective sample size \(N\) (Eq. 10).

The complication with JAB is that the approximation is valid only when \(\text{se}(\hat{\theta})\ll\sigma_g\), where \(\sigma_g\) is the scale of the prior distribution \(g(\theta)\). With very small sample sizes this condition does not hold, and Eq. 9 is biased against \(\mathcal{H}_0\).

\[\begin{equation} \tag{9} \textit{JAB}_{01}=A\cdot\sqrt{N}\exp\left\{-\frac{1}{2}W\right\} \end{equation}\]

A normal unit-information prior for \(\theta\) yields \(A=1\). Based on the Jeffreys prior, the general approximation specifies \(A=\sqrt{\pi/2}\approx1.253\).
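Plugging the coxph() fit from above into Eq. 9 (a sketch, taking \(N\) to be the 165 events):

z <- 0.5310 / 0.1672               # Wald z for x from summary(fit1)
JAB01 <- sqrt(165) * exp(-z^2 / 2) # A = 1 (normal unit-information prior)
JAB01                              # ≈ 0.083
1 / JAB01                          # BF10 ≈ 12, close to the parametric BIC result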

 

Wagenmakers’s piecewise approximation is

\[\begin{equation} \tag{10} \textit{WAB}_{01}\approx \begin{cases} p^{\frac{1}{4}}\sqrt{N}, & p>0.5 \\ \\ \sqrt{pN}, & 0.1<p\leqslant0.5\ \text{(simpler)} \\ \\ \frac{4}{3}p^{\frac{2}{3}}\sqrt{N}, & 0.1<p\leqslant0.5\ \text{(more precise)} \\ \\ 3p\sqrt{N}, & p\leqslant0.1 \end{cases} \end{equation}\]
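For the same fit (\(p=0.00149\), 165 events), the \(p\leqslant0.1\) branch gives, as a rough check:

3 * 0.00149 * sqrt(165) # WAB01 ≈ 0.057, i.e., BF10 ≈ 17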

   

   

Asymptotics 101

PDF \(f_Y(y_i;\boldsymbol{\theta})\),    joint likelihood \(L(\boldsymbol{\theta};\boldsymbol{y})\),    log likelihood \(l(\boldsymbol{\theta};\boldsymbol{y})\),    score \(l^\prime(\boldsymbol{\theta};\boldsymbol{y})\)

While \(f(y_i\mid\boldsymbol{\theta})\) and \(f(y_i;\boldsymbol{\theta})\) can represent the same mathematical relationship, the notation signals different interpretations and emphases in statistical analysis.

The expected Fisher information matrix, evaluated at \(\boldsymbol{\theta}_0\), is \(\left[\ \mathcal{I}(\boldsymbol{\theta}_0)\ \right]_{ij}:=\mathbb{E}\left[\left(\frac{\partial}{\partial\theta_i}\ln{f(Y;\boldsymbol{\theta})}\right)\left(\frac{\partial}{\partial\theta_j}\ln{f(Y;\boldsymbol{\theta})}\right)\ \bigg|\ \boldsymbol{\theta}_0\right]\).

Under certain regularity conditions, it is easier to compute \(\left[\ \mathcal{I}(\boldsymbol{\theta}_0)\ \right]_{ij}=-\mathbb{E}\left[\left(\frac{\partial^2}{\partial\theta_i\partial\theta_j}\ln{f(Y;\boldsymbol{\theta})}\right)\ \bigg|\ \boldsymbol{\theta}_0\right]\).

The observed Fisher information matrix is denoted as \(\left[\ I(\hat{\boldsymbol{\theta}})\ \right]_{ij}\).

 


   

Stan

“rstanarm” + “bridgesampling”

 

The “rstanarm” package provides a Bayesian implementation of several applied regression models (arm), including survival models like the Cox proportional-hazards model. Once you have a Bayesian Cox model fit using stan_surv(), you can compute the bridge sampling estimate of the log marginal likelihood, which is essential for calculating the Bayes factor.
\(\textit{BF}_{01}:=\frac{p(\text{Data}\ \mid\ \mathcal{M}_0)}{p(\text{Data}\ \mid\ \mathcal{M}_1)}\)

Note: Not all binary builds on CRAN and GitHub include the stan_surv() function for certain versions of R. Installation Guide

 

Tutorial

fit_rstanarm_o <- stan_surv(Surv(time, status) ~ 1 + x, as.data.frame(dat),
                            basehaz="weibull", # Weibull baseline hazard
                            refresh=0, cores=4, seed=277)
summary(fit_rstanarm_o)
## 
## Model Info:
## 
##  function:        stan_surv
##  baseline hazard: weibull
##  formula:         Surv(time, status) ~ 1 + x
##  algorithm:       sampling
##  sample:          4000 (posterior sample size)
##  priors:          see help('prior_summary')
##  observations:    228
##  events:          165 (72.4%)
##  right censored:  63 (27.6%)
##  delayed entry:   no
## 
## Estimates:
##                 mean   sd   10%   50%   90%
## (Intercept)   -8.3    0.5 -9.0  -8.3  -7.7 
## x              0.5    0.2  0.3   0.5   0.7 
## weibull-shape  1.3    0.1  1.2   1.3   1.4 
## 
## MCMC diagnostics
##               mcse Rhat n_eff
## (Intercept)   0.0  1.0  1251 
## x             0.0  1.0  1595 
## weibull-shape 0.0  1.0  1239 
## log-posterior 0.0  1.0  1305 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
fit_rstanarm <- stan_surv(Surv(time, status) ~ x, as.data.frame(dat),
                          basehaz="weibull",
                          # iter=15000, warmup=5000, chains=1,
                          refresh=0, cores=4, seed=277)
summary(fit_rstanarm) # same
## 
## Model Info:
## 
##  function:        stan_surv
##  baseline hazard: weibull
##  formula:         Surv(time, status) ~ x
##  algorithm:       sampling
##  sample:          4000 (posterior sample size)
##  priors:          see help('prior_summary')
##  observations:    228
##  events:          165 (72.4%)
##  right censored:  63 (27.6%)
##  delayed entry:   no
## 
## Estimates:
##                 mean   sd   10%   50%   90%
## (Intercept)   -8.3    0.5 -9.0  -8.3  -7.7 
## x              0.5    0.2  0.3   0.5   0.7 
## weibull-shape  1.3    0.1  1.2   1.3   1.4 
## 
## MCMC diagnostics
##               mcse Rhat n_eff
## (Intercept)   0.0  1.0  1251 
## x             0.0  1.0  1595 
## weibull-shape 0.0  1.0  1239 
## log-posterior 0.0  1.0  1305 
## 
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
prior_summary(fit_rstanarm) # priors
## Priors for model 'fit_rstanarm' 
## ------
## Intercept
##  ~ normal(location = 0, scale = 20)
## 
## Coefficients
##  ~ normal(location = 0, scale = 2.5)
## 
## Auxiliary (weibull-shape)
##  ~ half-normal(location = 0, scale = 2)
## ------
## See help('prior_summary.stanreg') for more details
# M1_rstanarm <- bridge_sampler(fit_rstanarm, silent=T) # does not work?
# M1_rstanarm$logml

 

Stan Forums

 


   

“brms” + “bridgesampling”

 

The “brms” package fits Bayesian generalized multilevel models using Stan as the backend. Ref1

  • time: the time until an event occurs or until the subject is censored

  • | cens(1 - status): the censoring mechanism; censoring status = 1 - event status

  • ~ x or ~ 1 + x: the model formula; some textbooks refer to \(x=0\) as “the” baseline hazard model

  • family=brmsfamily("cox"): the likelihood family for the model

  • iter: the number of total iterations per chain (including warmup; defaults to 2000)

  • warmup: the default is floor(iter/2)

 

fit_brm_o <- brm(time | cens(1 - status) ~ 1 + x, data=dat, family=brmsfamily("cox"),
                 refresh=0, cores=4, seed=277)
summary(fit_brm_o)
##  Family: cox 
##   Links: mu = log 
## Formula: time | cens(1 - status) ~ 1 + x 
##    Data: dat (Number of observations: 228) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.83      0.19     0.44     1.20 1.00     2472     2335
## x             0.53      0.17     0.20     0.87 1.00     3485     2402
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
fit_brm <- brm(time | cens(1 - status) ~ x, data=dat, family=brmsfamily("cox"),
               # iter=15000, warmup=5000, chains=1,
               refresh=0, cores=4, seed=277) # roughly 27 seconds of run time
summary(fit_brm) # same
##  Family: cox 
##   Links: mu = log 
## Formula: time | cens(1 - status) ~ x 
##    Data: dat (Number of observations: 228) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.83      0.19     0.44     1.20 1.00     2472     2335
## x             0.53      0.17     0.20     0.87 1.00     3485     2402
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
# BIC(fit_brm) # does not work

M1_brm <- bridge_sampler(fit_brm, silent=T)
M1_brm$logml
## [1] -1158.851
fit0_brm <- brm(time | cens(1 - status) ~ 1, data=dat, family=brmsfamily("cox"),
                refresh=0, cores=4, seed=277)
summary(fit0_brm)
##  Family: cox 
##   Links: mu = log 
## Formula: time | cens(1 - status) ~ 1 
##    Data: dat (Number of observations: 228) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.17      0.16     0.88     1.51 1.00     2199     2181
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
M0_brm <- bridge_sampler(fit0_brm, silent=T)
M0_brm$logml
## [1] -1163.244
bridgesampling::bf(M1_brm, M0_brm) # exp(M1_brm$logml - M0_brm$logml) # BF₁₀
## Estimated Bayes factor in favor of M1_brm over M0_brm: 80.85241

 


   

   

Check the Monte Carlo error of estimating the Bayes factor

NSim <- 100
BSBF <- numeric(NSim)
system.time(for (i in 1:NSim) {
  fit1_brm <- brm(time | cens(1 - status) ~ x, data=dat, family=brmsfamily("cox"),
                  refresh=0, cores=4, seed=i)
  fit0_brm <- brm(time | cens(1 - status) ~ 1, data=dat, family=brmsfamily("cox"),
                  refresh=0, cores=4, seed=i)
  
  M1_brm <- bridge_sampler(fit1_brm, silent=T)
  M0_brm <- bridge_sampler(fit0_brm, silent=T)
  
  BSBF[i] <- bridgesampling::bf(M0_brm, M1_brm)$bf # BF₀₁
}) # roughly 2.2 / 2.8 hours (small / large size) of run time

hist(BSBF, prob=T, col="white", yaxt="n", ylab="", main="",
     sub="100 runs",
     xlab="Bayes factor estimates supporting the null") # BF₀₁
lines(density(BSBF), col="red", lwd=2)

 


   

rstan” + Savage–Dickey Density Ratio

 

The Savage–Dickey density ratio is a special form of the Bayes factor for nested models: the posterior density of the parameter under the alternative model, evaluated at the hypothesized value, divided by the prior density under the same model evaluated at the same point.

\[\begin{equation} \tag{11} \textit{BF}_{01}:=\frac{p(\boldsymbol{y}\mid \mathcal{M}_0)}{p(\boldsymbol{y}\mid \mathcal{M}_1)}=\frac{\color{#0055A4}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid\boldsymbol{y},\mathcal{M}_1)}}{\color{#EF4135}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid \mathcal{M}_1)}} \end{equation}\]

 

fit1e <- weibreg(Surv(time, status) ~ x, dat, shape=1) # exponential baseline hazard function
dat$UI <- 1 # assume a multivariate unit-information prior for the vector of slope and log-scale
dat$mu <- coef(fit1e) # MLE for the slope
dat$Sigma <- fit1e$var * sum(dat$status==1) # inverse of the observed Fisher information × number of events

stancodeM1 <- "
data {
  int<lower=1> N;                       // number of observations
  vector[N] time;                       // observed time
  int<lower=0,upper=1> status[N];       // event indicator (1=event, 0=censored)
  vector[N] x;                          // predictor variable
  int<lower=0,upper=1> UI;              // prior option (1=unit information, 0=Gelman et al. (2008))
  vector[2] mu;                         // mean vector for theta if UI==1
  matrix[2,2] Sigma;                    // covariance matrix for theta if UI==1
}

parameters {
  vector[2] theta;                      // combined vector for beta and logScale
}

transformed parameters {
  real beta = theta[1];                   // regression coefficient
  real logScale = theta[2];               // log scale parameter
  real<lower=0> lambda = exp(-logScale);  // baseline hazard (rate parameter for the exponential distribution)
}

model {
  // Priors (JAGS does not support direct if-else statements)
  if (UI == 1) {
    theta ~ multi_normal(mu, Sigma);    // unit information
  } else {
    beta ~ cauchy(0, 2.5);              // <== scale (Gelman et al., 2008)
    lambda ~ gamma(0.1, 0.1);
  }

  // Log Likelihood (fully parametric Cox proportional-hazards regression model)
  for (i in 1:N) {
    if (status[i] == 1) {
      // If event occurred, use the full likelihood
      target += log(lambda) + beta * x[i] - lambda * time[i] * exp(beta * x[i]);
      
    } else {
      // If censored, use the survival function
      target += -lambda * time[i] * exp(beta * x[i]);
    }
  }
}"

stanmodelM1 <- stan_model(model_code=stancodeM1)
stanfitM1 <- sampling(stanmodelM1, data=dat,
                      iter=15000, warmup=5000, chains=3, seed=277, refresh=0)

betaS <- extract(stanfitM1, pars="beta")$beta
dPost <- dlogspline(0, logspline(betaS)) # estimated posterior density
dPrior <- ifelse(dat$UI==1, dnorm(0, dat$mu[1], sqrt(dat$Sigma[1,1])), dcauchy(0, 0, 2.5))
dPost / dPrior # Savage–Dickey density ratio (BF₀₁)
## [1] 0.1361232
rstan::summary(stanfitM1)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
|  | mean | se_mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | n_eff | Rhat |
|---|---|---|---|---|---|---|---|---|---|---|
| theta[1] | 0.5020 | 0.0018 | 0.1674 | 0.1728 | 0.3895 | 0.5011 | 0.6130 | 0.8369 | 8750.098 | 1.0003 |
| theta[2] | 6.3621 | 0.0015 | 0.1376 | 6.1017 | 6.2680 | 6.3601 | 6.4519 | 6.6428 | 8623.613 | 1.0003 |
| beta | 0.5020 | 0.0018 | 0.1674 | 0.1728 | 0.3895 | 0.5011 | 0.6130 | 0.8369 | 8750.098 | 1.0003 |
| logScale | 6.3621 | 0.0015 | 0.1376 | 6.1017 | 6.2680 | 6.3601 | 6.4519 | 6.6428 | 8623.613 | 1.0003 |
| lambda | 0.0017 | 0.0000 | 0.0002 | 0.0013 | 0.0016 | 0.0017 | 0.0019 | 0.0022 | 8704.883 | 1.0003 |
| lp__ | -1158.6107 | 0.0108 | 1.0124 | -1161.3280 | -1159.0022 | -1158.3010 | -1157.8922 | -1157.6250 | 8781.423 | 1.0008 |

   

   

Check the Monte Carlo error of estimating the Bayes factor

NSim <- 100
SDdr <- numeric(NSim)
system.time(for (i in 1:NSim) {
  stanfitM1 <- sampling(stanmodelM1, data=dat, iter=15000, warmup=5000, chains=1, seed=i, refresh=0)
  betaS <- extract(stanfitM1, pars="beta")$beta
  SDdr[i] <- dlogspline(0, logspline(betaS)) / dPrior # Savage–Dickey density ratio (BF₀₁)
}) # roughly 19 / 151 seconds (small / large size) of run time

hist(SDdr, prob=T, col="white", yaxt="n", ylab="", main="",
     sub="100 runs of one chain with 10,000 iterations",
     xlab="Bayes factor estimates supporting the null") # BF₀₁
lines(density(SDdr), col="red", lwd=2)

 


   

R-INLA

 

The R-INLA package (integrated nested Laplace approximation) allows for fitting Bayesian models, including survival models, without relying on Markov chain Monte Carlo (MCMC). We can then extract the marginal likelihood from competing models to compute the Bayes factor.
\(\textit{BF}_{01}:=\frac{p(\text{Data}\ \mid\ \mathcal{M}_0)}{p(\text{Data}\ \mid\ \mathcal{M}_1)}\)

Note: The reliability of INLA results appears to depend on having a large sample size. https://rpubs.com/sherloconan/1198601

 

fit1_INLA <- inla(inla.surv(time, status) ~ x, family="coxph", data=dat)

fit0_INLA <- inla(inla.surv(time, status) ~ 1, family="coxph", data=dat)

bf01 <- exp(fit0_INLA$mlik - fit1_INLA$mlik) # BF₀₁
rownames(bf01) <- c("integration", "Gaussian"); colnames(bf01) <- "BF01.est"
bf01 #  <==
##             BF01.est
## integration 1.223354
## Gaussian    1.227115
fit1o_INLA <- inla(inla.surv(time, status) ~ 1 + x, family="coxph", data=dat)
setNames(c(fit1_INLA$mlik, fit1o_INLA$mlik),
         c("int.", "Gauss", "int. & intercept", "Gauss & intercept"))
##              int.             Gauss  int. & intercept Gauss & intercept 
##         -591.8905         -592.3084         -591.8905         -592.3084

   

Two methods of computing the log marginal likelihood are

  • Integration: This approach computes the mlik by directly integrating the likelihood function over the parameter space. This method is typically more accurate but can be computationally intensive. It’s often used when precision is essential for model comparison.

  • Gaussian: This method approximates mlik using a normal approximation. It is generally faster and less computationally demanding than direct integration, but it might be less accurate, particularly in cases where the true likelihood is not well-approximated by a normal distribution (Rue, Martino, & Chopin, 2009, p. 345).

 


   

2. Simulation

 

  • Weibull-Cox proportional-hazards model: \(h(t_i)=\kappa\ \lambda^\kappa\ t_i^{\kappa-1}\cdot\exp\{\mathbf{X}_i^\top\!\boldsymbol{\beta}\}\)

  • Weibull accelerated failure time model: \(h(t_i)=\frac{1}{s}\ \!\exp\left\{-\frac{b_0}{s}\right\}\ t_i^{\frac{1}{s}-1}\!\cdot\exp\left\{-\frac{1}{s}\mathbf{X}_i^\top\mathbf{b}\right\}\)

  • Exponential: Let \(\kappa=1/s=1\). \(\quad{\color{purple}{\mathit{OR}}}:=\frac{F(t;\ x=1,\ \lambda)\ /\ S(t;\ x=1,\ \lambda)}{F(t;\ x=0,\ \lambda)\ /\ S(t;\ x=0,\ \lambda)}=\frac{e^{\lambda t\cdot{\color{purple}{\mathit{HR}}}}-1}{e^{\lambda t}-1}\)
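A quick numeric check of the last identity (a sketch with arbitrary values \(\lambda=0.1\), \(t=5\), \(\beta=0.5\)):

lambda <- 0.1; t <- 5; HR <- exp(0.5)
F1 <- 1 - exp(-lambda * t * HR); F0 <- 1 - exp(-lambda * t)
(F1 / (1 - F1)) / (F0 / (1 - F0))                  # odds ratio from F/S directly
(exp(lambda * t * HR) - 1) / (exp(lambda * t) - 1) # the closed form; identical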

   

Effect Size

How big is a big hazard ratio? (Table 2; Lu et al., 2023)

   

Data: Ref2, Ref3. Try also “simsurv”.

The expected censoring fraction is \(\mathbb{E}[\mathbb{P}(C<T\mid X)]=\mathbb{E}_{X\sim\operatorname{Bern}}\left[\frac{0.03}{0.03+0.1\cdot\exp\{\beta X\}} \right]=\frac{1}{2}\cdot\frac{0.03}{0.03+0.1}+\frac{1}{2}\cdot\frac{0.03}{0.03+0.1\cdot\exp\{\beta\}}\).
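For example, with \(\beta=0.2\) the expected censoring fraction evaluates to about 0.21:

0.5 * 0.03 / (0.03 + 0.1) + 0.5 * 0.03 / (0.03 + 0.1 * exp(0.2)) # ≈ 0.214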

simData7 <- function(n, beta=NULL, lambda=0.1, cenrate=0.03, maxt=NULL) {
  #' Input - 
  #' n:           number of subjects
  #' beta:        true regression coefficient
  #' lambda:      rate parameter of the exponential baseline hazard function
  #' cenrate:     censoring rate (higher value = more censoring)
  #' maxt:        maximum event time;
  #'              need to input either `cenrate` or `maxt`
  #' 
  #' Output -     a list of time-to-event data
  #' time:        observed time
  #' status:      event indicator (1 if the event occurred, 0 if censored)
  #' x:           binary covariate
  #' N:           number of subjects
  
  # no random seed
  if (is.null(maxt) & !is.null(cenrate)) {
    cenTime <- stats::rexp(n, cenrate) # right-censoring time (non-informative)
  } else if (!is.null(maxt) & is.null(cenrate)) {
    cenTime <- rep(maxt, n) # right-censoring time (fixed-time)
  } else {stop("Bad input!")}
  
  # trt <- stats::rbinom(n, 1, .5) # Ensure that the treatment indicator remains numeric!  <==
  trt <- rep(c(0, 1), c(floor(n/2), ceiling(n/2)))[sample(1:n, n)] # in case of all 0s or all 1s  <==
  survTime <- stats::rexp(n, lambda * exp(beta * trt)) # survival time
  
  list(time=pmin(survTime, cenTime),
       status=as.numeric(survTime <= cenTime),
       x=trt, N=n)
}

# set.seed(277)
# test7 <- simData7(100, 0.2)
# table("status"=test7$status, "x"=test7$x)
# sum(test7$status[test7$x==0]==0) / length(test7$status[test7$x==0]) # ≈ cenrate / (lambda + cenrate)

   

The Bayes Factors

Note: All the Bayes factors are \(\textit{BF}_{01}\).

computeBFs7 <- function(data, prior=1, precise=F) {
  #' Input -
  #' data:     a list of time-to-event data whose variable `x` must be binary
  #' prior:    prior option (1=unit information, 0=Gelman et al. (2008))
  #' precise:  logical; if FALSE (default), use the simpler form for Output 5;
  #'                             otherwise, use the more precise form
  #'
  #' Global -  stanmodelM1
  #' Dependency - eha (v2.11.5), rstan (v2.32.6), polspline (v1.1.25)
  #' 
  #' Output -  a vector of the Bayes factors in favor of the null
  #'           1. Savage–Dickey density ratio using the normal unit-information prior
  #'           2. BIC approximation to the Bayes factor; BIC(fit)
  #'           3. Jeffreys approximate Bayes factor using the normal unit-information prior
  #'           4. Jeffreys approximate Bayes factor using the Jeffreys prior
  #'           5. Wagenmakers approximate Bayes factor based on the p-value and number of events
  #'           6. p-value of the parametric proportional hazards regression
  #'           7. number of events
  
  nEvent <- sum(data$status==1) # number of events

  fit0 <- eha::phreg(eha::Surv(time, status) ~ 1, data, shape=1) # null model fit
  fit1 <- eha::phreg(eha::Surv(time, status) ~ x, data, shape=1) # exponential baseline hazard function
  pVal <- summary(fit1)$coefficients[,5] # p-value of the regression coefficient
  W <- stats::qchisq(pVal, 1, lower.tail=F) # Wald statistic, summary(fit1)$coefficients[,4]^2
  
  data$UI <- prior # assume a multivariate unit-information prior for the vector of slope and log-scale
  data$mu <- stats::coef(fit1) # MLE for the slope
  data$Sigma <- fit1$var * nEvent # (standard error)² × number of events
  if(prior==0) data$x <- data$x - mean(data$x) # center each binary input to have mean 0
  
  stanfitM1 <- rstan::sampling(stanmodelM1, data=data, iter=15000, warmup=5000, chains=2, seed=277, refresh=0)
  betaS <- rstan::extract(stanfitM1, pars="beta")$beta
  dPost <- polspline::dlogspline(0, polspline::logspline(betaS)) # estimated posterior density
  dPrior <- ifelse(data$UI==1, stats::dnorm(0, data$mu[1], sqrt(data$Sigma[1,1])), stats::dcauchy(0, 0, 2.5))
  SDdr01 <- dPost / dPrior # Savage–Dickey density ratio
  
  BICB01 <- exp((stats::BIC(fit1) - stats::BIC(fit0)) / 2)
  
  JAB01_UI <- sqrt(nEvent) * exp(-0.5 * W) # based on the unit-information prior
  
  WAB01 <- ifelse(pVal <= .5,
                  ifelse(pVal > .1,
                         ifelse(precise,
                                4 * pVal^(2/3) * sqrt(nEvent) / 3, # 0.1 < p <= 0.5 (more precise)
                                sqrt(pVal * nEvent)), # 0.1 < p <= 0.5 (simpler)
                         3 * pVal * sqrt(nEvent)), # p <= 0.1
                  pVal^(1/4) * sqrt(nEvent)) # p > 0.5
  
  c("SD"=SDdr01, "BIC_approx"=BICB01, "JAB"=JAB01_UI, "JAB_J"=sqrt(pi/2) * JAB01_UI, "WAB"=WAB01,
    "p"=pVal, "nEvent"=nEvent)
}

# computeBFs7(test7)

   

Simulation Study

We conduct several simulation studies to evaluate how accurately approximate objective Bayes factors convert the \(p\)-values and sample sizes \(n\) reported in scientific articles into Bayesian measures of evidence.

nSim <- 1000 # number of simulation runs for each setting
nObs <- c(10, 20, 30, 100, 500) # number of observations
BETA1 <- c("Null"=0, "Small"=0.2, "Medium"=0.5, "Large"=0.75) # regression coefficient (slope)
# exp(BETA1) # hazard ratio

reportP <- reportBF <- reportBF_avg <- reportBF_sd <- reportERR <- reportERR_avg <- reportERR_sd <- 
  reportRULE <- reportRULE_H1 <- reportRULE_H0 <- reportRULE_inc <-
  setNames(vector(mode="list", length=length(nObs) * length(BETA1)), # pre-allocation
           apply(expand.grid(BETA1, nObs), 1, function(r) paste0("n = ", r[2], ", slope = ", r[1])))

index <- 1 # initial list index

for (n in nObs) {
    
  for (slope in BETA1) {
    set.seed(n+slope+277)
    temp <- t(replicate(nSim, computeBFs7(simData7(n, slope)))) # replicate the Bayes factors
    reportP[[index]] <- temp[, 6:7] # last two columns are p-values and the number of events
    temp <- temp[, 1:5]
    reportBF[[index]] <- temp
    reportBF_avg[[index]] <- colMeans(temp, na.rm=T) # mean Bayes factors in favor of the null
    reportBF_sd[[index]] <- apply(na.omit(temp), 2, sd) # sample standard deviations of them
        
    tempERR <- 100* (temp - temp[,1]) / temp[,1] # percent errors; the first column should be all 0
    reportERR[[index]] <- tempERR # (can take in NA)
    reportERR_avg[[index]] <- colMeans(tempERR, na.rm=T) # mean percent errors
    reportERR_sd[[index]] <- apply(na.omit(tempERR), 2, sd) # sample standard deviations of the percent errors
    
    tempRULE_H1 <- temp < 1/3 # support H1
    tempRULE_H0 <- temp > 3 # support H0
    tempRULE_inc <- temp >= 1/3 & temp <= 3 # inconclusive
    
    # counts of matching decisions for H1
    reportRULE_H1[[index]] <- colSums(tempRULE_H1 * tempRULE_H1[,1], na.rm=T)
    
    # counts of matching decisions for H0
    reportRULE_H0[[index]] <- colSums(tempRULE_H0 * tempRULE_H0[,1], na.rm=T)
    
    # counts of matching inconclusiveness
    reportRULE_inc[[index]] <- colSums(tempRULE_inc * tempRULE_inc[,1], na.rm=T)
    
    tempRULE <- 1*(temp < 1/3) - 1*(temp > 3) # 1: support H1;  0: inconclusive;  -1: support H0
    reportRULE[[index]] <- colSums(tempRULE[,1] == tempRULE, na.rm=T) # counts of matching decisions
    # colMeans(); a value near 1 suggests matching results; the first column should be all 1
     
    index <- index + 1
  }
} # roughly 10.6 hours of run time

# Warning messages:
# 1: There were 1 divergent transitions after warmup. See
# https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
# to find out why this is a problem and how to eliminate them. 
# 2: Examine the pairs() plot to diagnose sampling problems
# 3-4: In polspline::logspline(betaS) :  Not all models could be fitted

 

reshapeW2L2 <- function(list, prob=T, lab=c("SD", "BIC", "JAB", "JAB_J", "WAB")) {
  #' Input -
  #' list:    a list of size length(nObs) * length(BETA1);
  #'          reportBF, reportERR (unaggregated),
  #'          reportRULE, reportRULE_H1, reportRULE_H0, or reportRULE_inc (aggregated)
  #' prob:    logical; if TRUE (default), return the proportions;
  #'                           otherwise, return the original values
  #' lab:     a vector of characters representing the Bayes factor methods
  #' 
  #' Global -
  #' nSim:    number of simulation runs for each setting
  #' nObs:    number of observations
  #' BETA1:   regression coefficient (slope)
  #' 
  #' Output - unlist the report and reshape the wide into the long
  
  len <- length(lab) # five methods
  nRep <- ifelse(!prob, nSim, 1) # nSim if unaggregated; 1 if aggregated
  long <- data.frame("value"=unname(unlist(list)),
                     "method"=factor(rep(rep(1:len, each=nRep), length(list)), labels=lab),
                     "n"=rep(nObs, each=nRep*len*length(BETA1)),
                     "slope"=rep(rep(unname(BETA1), each=nRep*len), length(nObs)))
  
  if (prob) {
    list2 <- lapply(list, function(vec) {
      if (vec[1] == 0) {
        vec * NA
      } else {
        vec / vec[1] # convert counts to proportions
      }
    })
    list3 <- lapply(list, function(vec) rep(vec[1], length(vec))) # denominator
    long$total <- unname(unlist(list3))
    long$prop <- unname(unlist(list2))
  }
  long
}


formatting <- function(val) {
  #' format the value using scientific notation
  if (abs(val) >= 10) {
    paste("~italic(BF)[\"01\"] %~~%", 
          gsub("e\\+0*(\\d+)", "%*%10^\\1", sprintf("%.2e", val)))
  } else if (abs(val) < 1) {
    paste("~italic(BF)[\"01\"] %~~%", 
          gsub("e-0*(\\d+)", "%*%10^{-\\1}", sprintf("%.2e", val)))
  } else {
    paste("~italic(BF)[\"01\"] %~~%", sprintf("%.2f", val))
  }
}

 


   

3. Graphic Results

 

Visualization

Note: The Jeffreys approximate Bayes factor (JAB_J) in Eq. 9, using the Jeffreys prior, will not be plotted.

Boxplot (\(\textit{BF}_{01}\))

df_BF <- reshapeW2L2(reportBF, prob=F)

# boxplot of the BF₀₁
for (beta in BETA1) {
  sub <- subset(df_BF, slope == beta & method != "JAB_J") # subset the long
  print(ggplot(sub, aes(x=method, y=value)) +
          geom_boxplot(alpha=0.3, na.rm=T) +
          facet_wrap(.~n, scales="free_y", nrow=1,
                     labeller=label_bquote(cols=italic(n)==.(n))) +
          labs(x="\nBayes Factor Methods", y=expression(italic("BF")["01"])) +
          ggtitle(paste0(names(BETA1[BETA1==beta]), " Hazard Ratio")) +
          theme_classic() +
          theme(axis.text.x=element_text(angle=90)))
  cat("<br><br><br><br><br>")
}

 


   

Boxplot (Percent Errors)

# percent errors
df7 <- reshapeW2L2(reportERR, prob=F)

index <- 0

# boxplot of the percent errors
for (beta in BETA1) {
  baseBF01 <- sapply(reportBF_avg[seq(1, by=length(BETA1), length.out=length(nObs)) + index], 
                     function(x) x[1]) # baseline is the Savage–Dickey density ratio
  baseBF01_lab <- sapply(baseBF01, formatting) # scientific notation
  index <- index + 1
  
  sub <- subset(df7, !(method %in% c("SD", "JAB_J")) & slope == beta) # subset the long
  anno <- data.frame("label"=baseBF01_lab, "n"=unique(sub$n),
                     "x"=2, "y"=.4 * aggregate(value~n, sub, FUN=max)$value) # for annotation use
  
  print(ggplot(sub, aes(x=method, y=value)) +
          geom_boxplot(alpha=0.3, na.rm=T) +
          geom_text(data=anno, aes(label=label, x=x, y=y), parse=T, col="red") +
          facet_wrap(.~n, scales="free_y", nrow=1,
                     labeller=label_bquote(cols=italic(n)==.(n))) +
          labs(x="\nBayes Factor Methods", y="Percent Error (%)\n") +
          ggtitle(paste0(names(BETA1[BETA1==beta]), " Hazard Ratio")) +
          geom_hline(yintercept=0, linetype="dashed", color="gray") + # reference line 0%
          theme_classic() +
          theme(axis.text.x=element_text(angle=90)))
  cat("<br><br><br><br><br>")
}

 


   

Line Chart (Proportions of Agreement)

# decisions
decis7 <- rbind(reshapeW2L2(reportRULE),
                reshapeW2L2(reportRULE_H1),
                reshapeW2L2(reportRULE_H0),
                reshapeW2L2(reportRULE_inc))
decis7$result <- factor(rep(c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"),
                            each=length(levels(df7$method)) * length(nObs) * length(BETA1)),
                        levels=c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"))
decis7$delta <- paste0("Slope = ", decis7$slope)
decis7 <- decis7[!is.na(decis7[,"prop"]),]
if (F) {
  # compute the pointwise confidence intervals using the normal approximation
  decis7$half.len <- qnorm(1-.05/2) * 
    sqrt(decis7$value * (decis7$total - decis7$value) / decis7$total^3) # 95% CI width
  decis7$lwr <- pmax(0, decis7$value / decis7$total - decis7$half.len) # 95% CI lower bound
  decis7$upr <- pmin(1, decis7$value / decis7$total + decis7$half.len) # 95% CI upper bound
} else { # Jeffreys intervals
  decis7$lwr <- qbeta(.05/2, decis7$value, decis7$total - decis7$value + 1) # 95% CI lower bound
  decis7$upr <- qbeta(1-.05/2, decis7$value + 1, decis7$total - decis7$value) # 95% CI upper bound
  decis7$half.len <- (decis7$upr - decis7$lwr) / 2 # 95% CI width
}

# line chart of the proportions of agreement
ggplotly(ggplot(decis7, aes(n, prop, color=method, label=value)) +
           geom_line(linewidth=1.2, na.rm=T) + geom_point(size=2, na.rm=T) +
           geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=.2) +
           scale_color_manual(values=c("black", "#E69F00", "#CC79A7", "#D55E00", "#0072B2"),
                              labels=c("SD", "BIC", "JAB", "JAB_J", "WAB")) +
           facet_grid(result~delta) +
           labs(x="Number of Observations", y="Proportion of Agreement", color="Method") +
           scale_x_continuous(breaks=c(10, 100, 500)) +
           scale_y_continuous(breaks=c(0, 0.5, 1)) + # min(decis7$prop, na.rm=T)
           theme_minimal()) # interactive plots

     

 


   

4. Finite Sample Correction

 

Centering the normal unit-information prior on the null hypothesized value \(\theta_0\), the customary choice in objective Bayesian testing, yields the following JAB* (Wagenmakers, 2022, Eq. 4).

\[\begin{equation} \textit{JAB}_{01}^*\approx\sqrt{N}\exp{\left\{-\frac{N-1}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]

Alternatively, the JAB in Eq. 9 centers the prior on the maximum likelihood estimate.

\[\begin{equation} \textit{JAB}_{01}\approx\sqrt{N}\exp{\left\{-\frac{N}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]

Using a power transformation, it can be shown that \(\textit{JAB}_{01}^*=\textit{JAB}_{01}^{\frac{N-1}{N}}\cdot N^{\frac{1}{2N}}\).
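To verify, write \(z=(\hat{\theta}-\theta_0)/\sigma_g\), so that \(\textit{JAB}_{01}=\sqrt{N}e^{-Nz^2/2}\); then

\[\begin{equation} \textit{JAB}_{01}^{\frac{N-1}{N}}\cdot N^{\frac{1}{2N}}=N^{\frac{N-1}{2N}}\exp\left\{-\frac{N-1}{2}z^2\right\}\cdot N^{\frac{1}{2N}}=\sqrt{N}\exp\left\{-\frac{N-1}{2}z^2\right\}=\textit{JAB}_{01}^* \end{equation}\]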

 

\(N=10\)

par(mfrow=c(1, length(BETA1)))
for (i in 1:length(BETA1)) {
  reportBF[[i]][,4] <- reportBF[[i]][,3]^(9/10) * 10^(1/20) # number of observations is 10
  boxplot(reportBF[[i]][,-2], names=c("SD", "JAB", "JAB*", "WAB"),
          xlab="", ylab=expression(italic("BF")["01"]), main=paste0(names(BETA1)[i], " Slope"), las=2)
}

 


   

5. eJAB

 

To test the hypotheses \(\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0\) versus \(\mathcal{H}_1:\boldsymbol{\theta}\neq\boldsymbol{\theta}_0\), we define the extended Jeffreys approximate objective Bayes factor (eJAB) as \[\begin{equation} \mathit{eJAB}_{01}=\sqrt{N}\exp\left\{-\frac{1}{2}\frac{N^{1/q}-1}{N^{1/q}}\cdot Q_{\chi^{2}_{q}}(1-p)\right\}, \end{equation}\]

where \(q\) is the size of the parameter vector \(\boldsymbol{\theta}\), \(N\) is the sample size, \(Q_{\chi^{2}_{q}}(\cdot)\) is the quantile function of the chi-squared distribution with \(q\) degrees of freedom, and \(p\) is the \(p\)-value from a null-hypothesis significance test. We further define \(\mathit{eJAB}_{10}=1\ \!/\ \!\mathit{eJAB}_{01}\).

 

The sample size \(N\) is

  • the total number of observations for \(t\)-tests, linear regression, logistic regression, one-way ANOVA, and chi-squared tests.

  • the number of events for Cox models.

  • the number of independent observations for one-way repeated-measures ANOVA (Nathoo & Masson, 2016).

 

The degrees of freedom are

  • \(q=1\) for \(t\)-tests, linear regression, logistic regression, and Cox models, where \(\mathcal{H}_0\) specifies a point-null hypothesis (Wagenmakers, 2022).

  • \(q=I-1\) for one-way ANOVA and one-way repeated-measures ANOVA, where \(I\) is the number of conditions.

  • \(q=(R-1)(C-1)\) for chi-squared tests, where \(R\) and \(C\) are the numbers of rows and columns, respectively.

Since the Bayes factor is the ratio of the marginal likelihoods of two competing models, \(q=p_1-p_0\) also represents the difference in the number of free parameters between the alternative and null models.
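A minimal R sketch of this definition (the helper name eJAB01 is ours, not from an existing package):

eJAB01 <- function(p, N, q=1) {
  sqrt(N) * exp(-0.5 * (N^(1/q) - 1) / N^(1/q) * qchisq(p, q, lower.tail=FALSE))
}
eJAB01(p=0.00149, N=165) # q = 1 for the Cox model in Section 1; ≈ 0.085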

 

For further reading, please visit https://rpubs.com/sherloconan/1361280.

 
