
This up-to-date document is available at https://rpubs.com/sherloconan/1206247

 

0. Getting Started

 

\[\begin{equation} \tag{1} Y_i\mid x_i\sim\text{Bernoulli}(p_i),\quad f(y_i;p_i)=p_i^{y_i}\cdot(1-p_i)^{1-y_i}\quad\text{for }y_i\in\{0,1\}\text{ and }i=1,\dotsb,n \end{equation}\]

\(\mathbb{E}[Y_i\mid x_i]=p_i\) and the most commonly used link function is the log-odds \(\eta_i=\text{logit}(p_i)\equiv\ln\left(\frac{p_i}{1-p_i}\right)\) for \(p_i\in(0,1)\).

\[\begin{align} \tag{2} \mathcal{M}_1:\ \eta_i&=\beta_0+\beta_1 x_i\\ \text{versus}\quad\mathcal{M}_0:\ \eta_i&=\beta_0 \end{align}\]

 

Recall that the logistic sigmoid function is invertible, and its inverse is the logit function.

Thus, \(p_i=\frac{\exp\{\eta_i\}}{\exp\{\eta_i\}+1}=\frac{1}{1+\exp\{-\beta_0-\beta_1 x_i\}}\), which maps any real-valued number into the range \((0,1)\).

Other link functions that can be used include the probit link for modeling normal latent variables
and the complementary log-log link when the probability \(p\) is close to 1 or 0.
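As a quick illustration in base R (a small sketch, not part of the analyses below), plogis() and qlogis() implement the logistic sigmoid and the logit, and the alternative links are available through the binomial() family object:

eta <- 0.7
plogis(eta)                              # logistic sigmoid: maps any real eta into (0, 1)
qlogis(plogis(eta))                      # logit recovers eta, confirming the two are inverses
binomial(link="probit")$linkfun(0.75)    # probit link evaluated at p = 0.75
binomial(link="cloglog")$linkfun(0.75)   # complementary log-log link evaluated at p = 0.75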

   

   

Generalized Linear Model (GLM)

 

  1. Stochastic part

\[\begin{align} Y_i\sim f(y_i;\boldsymbol{\theta}_i)&=\exp\left\{a(y_i)\cdot b(\boldsymbol{\theta}_i)+c(\boldsymbol{\theta}_i)+d(y_i)\right\} \\ a(y_i)&=y_i\quad\text{canonical form}\implies\text{exponential family} \\ b(\boldsymbol{\theta}_i)&\quad\text{natural parameter(s)} \end{align}\]

  2. Systematic part: linear predictors

\[\begin{equation} \boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta} \end{equation}\]

  3. Link function(s)

\[\begin{equation} \eta_i=g\left(\mu_i(\boldsymbol{\theta}_i)\right),\ \text{where }\mu_i(\boldsymbol{\theta}_i)=\mathbb{E}[Y_i\mid\boldsymbol{x}_i] \end{equation}\]
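As a worked instance of these three components (a sketch of the standard derivation), the Bernoulli model in Eq. 1 can be written in canonical exponential-family form:

\[\begin{align} f(y_i;p_i)&=p_i^{y_i}(1-p_i)^{1-y_i}=\exp\left\{y_i\ln\left(\frac{p_i}{1-p_i}\right)+\ln(1-p_i)\right\}, \end{align}\]

so \(a(y_i)=y_i\), the natural parameter is \(b(p_i)=\text{logit}(p_i)\) (which makes the logit the canonical link), \(c(p_i)=\ln(1-p_i)\), and \(d(y_i)=0\).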

 

\[\begin{align} Y&\sim\operatorname{Pois}(\lambda), & \mathbb{E}[Y]&=\text{Var}(Y), & &\text{equidispersion} \\ Y&\sim\textit{B}(n,p), & \mathbb{E}[Y]&>\text{Var}(Y), & &\text{underdispersion} \\ Y&\sim\textit{NB}(r,p), & \mathbb{E}[Y]&<\text{Var}(Y), & &\text{overdispersion} \end{align}\]

   

GLM 101

Why is there no error term in Eq. 2? See Wooldridge (2010, pp. 565–567), the “Econometric Bible”.

 

scroll to top

   

1. Methods

 

“BFpack” R Package

 

Using the “BFpack” Package      https://github.com/jomulder/BFpack

  • Usage rank 5073 (“BFpack”) and rank 818 (“BayesFactor”)

  • The fractional Bayes factor

  • Exploratory hypothesis testing, e.g., \(\theta=0\) versus \(\theta<0\) versus \(\theta>0\)

  • Confirmatory hypothesis testing, e.g., \(\theta_1=\theta_2=\theta_3\) versus \(\theta_1<\theta_2<\theta_3\)

  • Slightly different estimates. For a continuous dependent variable and binary independent variables, the Bayesian (1) two-sample \(t\)-test, (2) one-way analysis of variance (ANOVA) with two conditions, and (3) simple linear regression should yield the same Bayes factor estimates. However, this feature is not implemented in “BFpack”.

  • Same methods. The Bayes factor estimate in “BFpack” appears to be the same as the Jeffreys-style approximate Bayes factor (JAB) using the unit-information prior when conducting (4) logistic regression.

 

Toy Dataset ?mtcars

  • mpg - [continuous] miles per gallon

  • mpg_binary - [binary] (0 = mpg <= 20,   1 = mpg > 20)

  • am - [binary] transmission (0 = automatic,   1 = manual)

  • wt - [continuous] weight (1000 lbs)

32 observations

 

\(t\)-Test

 

The fractional Bayes factor (FBF; O’Hagan, 1995)

The prior adjusted fractional Bayes factor (PAFBF; Mulder, 2014; Mulder & Gu, 2022), used by default

 

test_t <- bain::t_test(mpg ~ am, data=mtcars, var.equal=T) # two-sample t-test
test_aov <- aov(mpg ~ am, mtcars) # one-way ANOVA with two conditions
test_lm <- lm(mpg ~ am, mtcars) # simple linear regression

matrix(
  c(BFpack::BF(test_t, BF.type="FBF")$BFtu_exploratory[1],
    BFpack::BF(test_t, BF.type="AFBF")$BFtu_exploratory[1],
    BFpack::BF(test_aov, BF.type="FBF")$BFtu_exploratory["am", "BF0u"],
    BFpack::BF(test_aov, BF.type="AFBF")$BFtu_exploratory["am", "BF0u"],
    BFpack::BF(test_lm, BF.type="FBF")$BFtu_exploratory["am", "BF0u"],
    BFpack::BF(test_lm, BF.type="AFBF")$BFtu_exploratory["am", "BF0u"]), # BF₀₁
  nrow=2, dimnames=list(c("FBF", "PAFBF"),
                        c("t-Test", "ANOVA", "Regression")))
##            t-Test      ANOVA Regression
## FBF   0.010657874 0.01058254 0.01058254
## PAFBF 0.006941813 0.00677495 0.00677495
# cf. ttestBF
1 / c("γ=medium"=as.data.frame(BayesFactor::ttestBF(formula=mpg~am, data=mtcars, rscale="medium"))$bf, # default
      "γ=wide"=as.data.frame(BayesFactor::ttestBF(formula=mpg~am, data=mtcars, rscale="wide"))$bf,
      "γ=ultrawide"=as.data.frame(BayesFactor::ttestBF(formula=mpg~am, data=mtcars, rscale="ultrawide"))$bf)
##    γ=medium      γ=wide γ=ultrawide 
## 0.011548714 0.010247026 0.009959575

 

\(\mathbb{P}(\mathcal{H}_0)=\mathbb{P}(\mathcal{H}_-)=\mathbb{P}(\mathcal{H}_+)=\frac{1}{3}\)

\(\textit{BF}_{01}:=\frac{\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})\ /\ \left(1-\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})\right)}{\mathbb{P}(\mathcal{H}_0)\ /\ \left(1-\mathbb{P}(\mathcal{H}_0)\right)}=2\cdot\frac{\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})}{1-\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})}\)

\(\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})+\mathbb{P}(\mathcal{H}_-\mid\boldsymbol{y})+\mathbb{P}(\mathcal{H}_+\mid\boldsymbol{y})=1\)
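For instance (a small numeric sketch with a hypothetical posterior probability, not output from the code above), if the exploratory test reported \(\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})=0.6\), then

2 * 0.6 / (1 - 0.6) # BF₀₁ = 3 under the equal prior probabilities of 1/3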

 

scroll to top

   

Regression

 

The FBF (O’Hagan, 1995)

The PAFBF (Mulder, 2014; Mulder & Gu, 2022)

 

fit_lm <- lm(mpg ~ wt, mtcars) # simple linear regression
N <- sum(summary(fit_lm)$df[1:2]) # number of observations, 32
pVal <- summary(fit_lm)$coefficients["wt", "Pr(>|t|)"]
tVal <- summary(fit_lm)$coefficients["wt", "t value"] # qt(pVal/2, N-2) # same

c("BFpack"=BFpack::BF(fit_lm)$BFtu_exploratory["wt", "BF0u"], # prior adjusted fractional Bayes factor
  "JAB_UI_t"=sqrt(N) * exp(-0.5 * tVal * tVal),
  "JAB_UI_p"=sqrt(N) * exp(-0.5 * qchisq(1-pVal, 1)),
  "JAB_t"=sqrt(N*pi/2) * exp(-0.5 * tVal * tVal),
  "JAB_p"=sqrt(N*pi/2) * exp(-0.5 * qchisq(1-pVal, 1))) # BF₀₁
##       BFpack     JAB_UI_t     JAB_UI_p        JAB_t        JAB_p 
## 1.074903e-08 8.140953e-20 6.033415e-09 1.020317e-19 7.561764e-09

 

The approximated adjusted default Bayes factor (Gu et al., 2018)

 

mtcars$mpg_binary <- ifelse(mtcars$mpg > 20, 1, 0)
fit_glm <- glm(mpg_binary ~ wt, mtcars, family="binomial") # simple logistic regression
pVal2 <- summary(fit_glm)$coefficients["wt", "Pr(>|z|)"]
zVal <- summary(fit_glm)$coefficients["wt", "z value"] # qnorm(pVal2/2) # same

c("BFpack"=BFpack::BF(fit_glm)$BFtu_exploratory["wt", "H(=0)"], # approximated adjusted default Bayes factor
  "JAB_UI_z"=sqrt(N) * exp(-0.5 * zVal * zVal),
  "JAB_UI_p"=sqrt(N) * exp(-0.5 * qchisq(1-pVal2, 1)),
  "JAB_z"=sqrt(N*pi/2) * exp(-0.5 * zVal * zVal),
  "JAB_p"=sqrt(N*pi/2) * exp(-0.5 * qchisq(1-pVal2, 1))) # BF₀₁
##    BFpack  JAB_UI_z  JAB_UI_p     JAB_z     JAB_p 
## 0.2655032 0.2655032 0.2655032 0.3327589 0.3327589

 

scroll to top

   

BIC Approximation

 

Approximating Bayes Factors from Bayesian Information Criterion (BIC)

   

The BIC for model \(i\) with the number of free parameters \(k_i\) and the sample size \(n\) is \(\text{BIC}(\mathcal{M}_i)=-2\ln{\mathcal{L}_i}+k_i\ln{n}\quad\) for \(n\gg k_i\),

where \(\mathcal{L}_i=p(\boldsymbol{y}\mid\hat{\boldsymbol{\theta}}_i,\mathcal{M}_i)\) is the maximum likelihood.

 

Eq. 8 follows from a second-order Taylor (Laplace) approximation of the log marginal likelihood of each model.

\[\begin{equation} \tag{8} \mathit{BF}_{10}\approx\exp\left\{\frac{\text{BIC}(\mathcal{M}_0)-\text{BIC}(\mathcal{M}_1)}{2}\right\} \end{equation}\]

Derivation in Wagenmakers (2007, Appendix B).

 

?BIC
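As a quick sketch of Eq. 8 in R (reusing the fit_glm and mpg_binary objects from the Regression section above):

fit_glm0 <- glm(mpg_binary ~ 1, mtcars, family="binomial") # null (intercept-only) model
exp((BIC(fit_glm) - BIC(fit_glm0)) / 2) # BF₀₁ via the BIC approximation in Eq. 8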

 

scroll to top

   

Approximate Bayes Factor

 

The test statistic in a Wald test is \(W=\left(\frac{\hat{\theta}-\theta_0}{\text{se}(\hat{\theta})}\right)^2\sim\chi_1^2\) under \(\mathcal{H}_0\), where \(\hat{\theta}\) is the maximum likelihood estimate, and \(\text{se}(\hat{\theta})\) is the standard error.
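For instance (a quick check reusing zVal and pVal2 from the logistic fit in the Regression section above; both lines should return the same value):

zVal^2               # Wald statistic W computed from the estimate and its standard error
qchisq(1 - pVal2, 1) # the same W recovered from the two-sided p-value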

Wagenmakers (2022, Eq. 6) reviewed the JAB and proposed a piecewise approximation (WAB), given in Eq. 10 below, based on the \(p\)-value and the effective sample size \(N\).

The complication with JAB is that the approximation is valid only when \(\text{se}(\hat{\theta})\ll\sigma_g\), where \(\sigma_g\) is the scale of the prior distribution \(g(\theta)\). With very small sample sizes this assumption does not hold, and Eq. 9 is biased against \(\mathcal{H}_0\).

\[\begin{equation} \tag{9} \textit{JAB}_{01}=A\cdot\sqrt{N}\exp\left\{-\frac{1}{2}W\right\} \end{equation}\]

A normal unit-information prior for \(\theta\) yields \(A=1\). Based on the Jeffreys prior, the general approximation specifies \(A=\sqrt{\pi/2}\approx1.253\).

 

Wagenmakers’s piecewise approximation is

\[\begin{equation} \tag{10} \textit{WAB}_{01}\approx \begin{cases} \ p^{\frac{1}{4}}\sqrt{N}, & p>0.5 \\ \\ \ \sqrt{pN}, & 0.1<p\leqslant0.5\ \text{(simpler)} \\ \\ \ \frac{4}{3}p^{\frac{2}{3}}\sqrt{N}, & 0.1<p\leqslant0.5\ \text{(more precise)} \\ \\ \ 3p\sqrt{N}, & p\leqslant0.1 \end{cases} \end{equation}\]
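A minimal R sketch of Eqs. 9 and 10 (the function names jab01 and wab01 are ours, not from any package):

jab01 <- function(W, N, A=1) A * sqrt(N) * exp(-W / 2) # Eq. 9; A=1 (unit information) or A=sqrt(pi/2) (Jeffreys)
wab01 <- function(p, N) { # Eq. 10
  if (p > 0.5) {
    p^(1/4) * sqrt(N)   # p > 0.5
  } else if (p > 0.1) {
    sqrt(p * N)         # 0.1 < p <= 0.5 (simpler form)
  } else {
    3 * p * sqrt(N)     # p <= 0.1
  }
}
jab01(qchisq(1 - 0.0309, 1), N=30) # e.g., the p-value 0.0309 from the simulated logistic fit in the JAGS section below
wab01(0.0309, N=30)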

 

scroll to top

   

JAGS

 

First, install the JAGS (Just Another Gibbs Sampler) program from http://mcmc-jags.sourceforge.net/

Note that JAGS parameterizes dnorm(mean, precision) and dgamma(shape, rate)

 

Next, we implement the priors as considered in Gelman et al. (2008).

  • Scale \(\boldsymbol{x}\) to have mean 0 and standard deviation 0.5

  • \(\beta_0\sim\text{Cauchy}(0,\ 10)\)

  • \(\beta_1\sim\text{Cauchy}(0,\ 2.5)\)

   

The Savage–Dickey density ratio is a special form of the Bayes factor for nested models: the posterior density of the parameter under the alternative model, evaluated at the hypothesized value, is divided by the prior density under the same model at the same point.

\[\begin{equation} \tag{11} \textit{BF}_{01}:=\frac{p(\boldsymbol{y}\mid \mathcal{M}_0)}{p(\boldsymbol{y}\mid \mathcal{M}_1)}=\frac{\color{#0055A4}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid\boldsymbol{y},\mathcal{M}_1)}}{\color{#EF4135}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid \mathcal{M}_1)}} \end{equation}\]

 

nObs <- 30  # number of observations
beta0 <- -1  # true intercept
beta1 <- 1   # true slope

set.seed(277)
x <- rnorm(nObs)  # predictor variable
p <- 1 / (1 + exp(-beta0 - beta1 * x))  # the probability of y = 1
y <- rbinom(nObs, 1, p)  # outcome variable

x_scale <- c(scale(x)) * 0.5 # scale to have mean 0 and standard deviation 0.5
dat <- data.frame(x_scale, y)
datalist <- list(y=y, x=x_scale, n=nObs, 
                 m0=0, # "m0"=summary(fit)$coefficients[1,1] # center at MLE
                 m1=0) # "m1"=summary(fit)$coefficients[2,1]

fit <- glm(y ~ x_scale, dat, family="binomial")
summary(fit)
## 
## Call:
## glm(formula = y ~ x_scale, family = "binomial", data = dat)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -0.4817     0.4160  -1.158   0.2469  
## x_scale       1.9656     0.9105   2.159   0.0309 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 40.381  on 29  degrees of freedom
## Residual deviance: 34.752  on 28  degrees of freedom
## AIC: 38.752
## 
## Number of Fisher Scoring iterations: 4
model_JAGS <- "model {
  for (i in 1:n) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- beta0 + beta1 * x[i]
  }
  
  beta0 ~ dt(m0, pow(10, -2), 1)  #  <== precision;  no hyperpriors
  beta1 ~ dt(m1, pow(2.5, -2), 1) # Cauchy (Gelman et al., 2008, p. 1366)
}"

fit_JAGS <- jags.model(textConnection(model_JAGS), data=datalist,
                       # inits=function() {list("beta0"=rnorm(1), "beta1"=rnorm(1))},
                       inits=list(list("beta0"=0, "beta1"=1),
                                  list("beta0"=-0.48, "beta1"=1.97),
                                  list("beta0"=1, "beta1"=0)),
                       n.chains=3, n.adapt=5000, quiet=T) # initialize the model
update(fit_JAGS, 5000, progress.bar="none") # run Markov chain to burn-in

samples_JAGS <- coda.samples(fit_JAGS, variable.names=c("beta0", "beta1"), 
                             n.iter=10000, progress.bar="none") # generate posterior samples
beta1S1 <- samples_JAGS[[1]][,2]
beta1S2 <- samples_JAGS[[2]][,2]
beta1S3 <- samples_JAGS[[3]][,2] # posterior samples of β₁

dPostS1 <- dlogspline(0, logspline(beta1S1)) # estimated posterior density from the first chain
dPostS2 <- dlogspline(0, logspline(beta1S2)) # estimated posterior density from the second chain
dPostS3 <- dlogspline(0, logspline(beta1S3)) # estimated posterior density from the third chain

c(dPostS1, dPostS2, dPostS3) / dcauchy(0, datalist$m1, 2.5) # BF₀₁
## [1] 0.3124615 0.2878228 0.3024883

   

As an exercise, check whether this method also performs well on the real mtcars dataset.

 

scroll to top

   

   

Check the Monte Carlo error of estimating the Bayes factor

num <- 100; SD <- numeric(num)

for (i in 1:num) {
  fit_JAGS <- jags.model(textConnection(model_JAGS), data=datalist,
                         n.chains=1, n.adapt=5000, quiet=T) # initialize the model
  update(fit_JAGS, 5000, progress.bar="none") # run Markov chain to burn-in

  samples_JAGS <- coda.samples(fit_JAGS, variable.names="beta1", 
                               n.iter=10000, progress.bar="none") # generate posterior samples
  beta1S <- samples_JAGS[[1]][,"beta1"]
  dPostS <- dlogspline(0, logspline(beta1S)) # estimated posterior density from one chain
  SD[i] <- dPostS / dcauchy(0, datalist$m1, 2.5)
}

hist(SD, prob=T, col="white", yaxt="n", ylab="", main="rjags + SD",
     sub="100 runs of one chain with 10,000 iterations",
     xlab="Bayes factor estimates supporting the null") # BF₀₁
lines(density(SD),col="red",lwd=2)

 

scroll to top

   

Stan

 

  • Compile and fit the model in Stan.

Note that to compute the (log) marginal likelihood for a Stan model, we need to specify the model in a certain way. Instead of using “~” signs for specifying distributions, we need to use the (log) density functions directly. The reason is that when the “~” sign is used, constant terms that are not needed for sampling from the posterior are dropped. However, for computing the marginal likelihood, these constants need to be retained. For instance, instead of writing y ~ normal(mu, sigma); we would need to write target += normal_lpdf(y | mu, sigma); (Gronau, 2021).

   

  • Compute log marginal likelihoods for full and null models via bridge sampling. Then, \(\textit{BF}_{01}:=\frac{p(\boldsymbol{y}\ \mid\ \mathcal{M}_0)}{p(\boldsymbol{y}\ \mid\ \mathcal{M}_1)}\).

Bridge Sampling devtools::install_github("quentingronau/bridgesampling@master")

All terms are conditional on \(\mathcal{M}_i\).

\[\begin{align} \tag{12} \color{#EE1C25}{p(\boldsymbol{y})}=&\frac{\mathbb{E}_{g(\boldsymbol{\theta})}[\hspace{0.1em}p(\boldsymbol{y}\mid\boldsymbol{\theta})\cdot p(\boldsymbol{\theta})\cdot h(\boldsymbol{\theta})]}{\mathbb{E}_\text{post}[\hspace{0.1em}g(\boldsymbol{\theta})\cdot h(\boldsymbol{\theta})]} \\ \\ \approx&\frac{\frac{1}{N_2}\sum_{r=1}^{N_2}p(\boldsymbol{y}\mid\boldsymbol{\theta}_r^{\text{prop}})\cdot p(\boldsymbol{\theta}_r^{\text{prop}})\cdot h(\boldsymbol{\theta}_r^{\text{prop}})}{\frac{1}{N_1}\sum_{s=1}^{N_1}g(\boldsymbol{\theta}_s^\text{post})\cdot h(\boldsymbol{\theta}_s^\text{post})}, \\ \\ &\hspace{15em}\boldsymbol{\theta}_r^{\text{prop}}\sim g(\boldsymbol{\theta}), \\ &\hspace{15em}\boldsymbol{\theta}_s^\text{post}\sim p(\boldsymbol{\theta}\mid\boldsymbol{y}), \end{align}\]

where \(g(\boldsymbol{\theta})\) is the proposal distribution

and \(h(\boldsymbol{\theta})\) is the bridge function, \(h(\boldsymbol{\theta})=\frac{C}{s_1\cdot p(\boldsymbol{y}\mid\boldsymbol{\theta})\cdot p(\boldsymbol{\theta})+s_2\cdot \color{#EE1C25}{p(\boldsymbol{y})}\cdot g(\boldsymbol{\theta})}\) and \(s_j=\frac{N_j}{N_1+N_2}\) for \(j\in\{1,2\}\).

 

Check the use_neff argument in bridgesampling::bridge_sampler.

 

stancodeM1 <- "
data {
  int<lower=1> n;             // number of observations
  vector[n] x;                // predictor variable
  int<lower=0,upper=1> y[n];  // outcome variable
  real m0;                    // center for beta0
  real m1;                    // center for beta1
}

parameters {
  real beta0;                 // intercept
  real beta1;                 // slope
}

model {
  target += cauchy_lpdf(beta0 | m0, 10);
  target += cauchy_lpdf(beta1 | m1, 2.5);  // <== scale
  
  for (i in 1:n) {
    target += bernoulli_logit_lpmf(y[i] | beta0 + beta1 * x[i]);
  }
}"

stanmodelM1 <- stan_model(model_code=stancodeM1)
stanfitM1 <- sampling(stanmodelM1, data=datalist,
                      iter=15000, warmup=5000, chains=3, seed=277, refresh=0)

rstan::summary(stanfitM1)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
           mean  se_mean      sd     2.5%      25%      50%      75%    97.5%    n_eff   Rhat
beta0   -0.4950   0.0028  0.4197  -1.3453  -0.7707  -0.4866  -0.2094   0.3032 22213.30 0.9999
beta1    1.8821   0.0063  0.8985   0.2598   1.2569   1.8314   2.4454   3.7682 20395.86 1.0001
lp__   -24.3457   0.0094  1.0415 -27.1401 -24.7364 -24.0238 -23.6118 -23.3355 12269.44 1.0000
stancodeM0 <- "
data {
  int<lower=1> n;             // number of observations
  int<lower=0,upper=1> y[n];  // outcome variable
  real m0;                    // center for beta0
}

parameters {
  real beta0;                 // intercept
}

model {
  target += cauchy_lpdf(beta0 | m0, 10);

  for (i in 1:n) {
    target += bernoulli_logit_lpmf(y[i] | beta0);
  }
}"

stanmodelM0 <- stan_model(model_code=stancodeM0)
stanfitM0 <- sampling(stanmodelM0, data=datalist[-c(2, 5)],
                      iter=15000, warmup=5000, chains=3, seed=277, refresh=0)

rstan::summary(stanfitM0)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
           mean  se_mean      sd     2.5%      25%      50%      75%    97.5%    n_eff   Rhat
beta0   -0.4167   0.0036  0.3793  -1.1727  -0.6672  -0.4093  -0.1636   0.3134 11194.66 1.0004
lp__   -24.1490   0.0063  0.7288 -26.2092 -24.3111 -23.8672 -23.6889 -23.6398 13273.91 1.0003
M1 <- bridge_sampler(stanfitM1, silent=T)
M0 <- bridge_sampler(stanfitM0, silent=T) #  <== Warning
## Warning: effective sample size cannot be calculated, has been replaced by
## number of samples.
bf(M0, M1) # BF₀₁ via bridge sampling, exp(M0$logml - M1$logml)
## Estimated Bayes factor in favor of M0 over M1: 0.30369
summary(M0)
## 
## Bridge sampling log marginal likelihood estimate 
## (method = "normal", repetitions = 1):
## 
##  -23.6997
## 
## Error Measures:
## 
##  Relative Mean-Squared Error: 2.14592e-08
##  Coefficient of Variation: 0.0001464896
##  Percentage Error: 0%
## 
## Note:
## All error measures are approximate.
summary(M1)
## 
## Bridge sampling log marginal likelihood estimate 
## (method = "normal", repetitions = 1):
## 
##  -22.50794
## 
## Error Measures:
## 
##  Relative Mean-Squared Error: 4.483659e-07
##  Coefficient of Variation: 0.0006696013
##  Percentage Error: 0%
## 
## Note:
## All error measures are approximate.

 

scroll to top

   

   

Check the Monte Carlo error of estimating the Bayes factor

BFBS <- function(seed=277, diagnostics=F) {
  stanfitM0 <- sampling(stanmodelM0, data=datalist[-c(2, 5)],
                        iter=15000, warmup=5000, chains=1, seed=seed, refresh=0)
  
  stanfitM1 <- sampling(stanmodelM1, data=datalist,
                        iter=15000, warmup=5000, chains=1, seed=seed, refresh=0)
  
  beta1S <- extract(stanfitM1, pars="beta1")$beta1
  dPostS <- dlogspline(0, logspline(beta1S)) # estimated posterior density from one chain
  SDdr <- dPostS / dcauchy(0, datalist$m1, 2.5) # Savage–Dickey density ratio (BF₀₁)
  
  set.seed(seed)
  M0 <- bridge_sampler(stanfitM0, silent=T)
  
  set.seed(seed)
  M1 <- bridge_sampler(stanfitM1, silent=T)
  
  if (diagnostics) {
    list("bf"=exp(M0$logml - M1$logml),
         "pererr.M1"=error_measures(M1)$cv,
         "pererr.M0"=error_measures(M0)$cv,
         "stan.M1"=stanfitM1, "stan.M0"=stanfitM0,
         "savage.dickey"=SDdr)
  } else {
    list("bf"=exp(M0$logml - M1$logml),
         "pererr.M1"=error_measures(M1)$cv,
         "pererr.M0"=error_measures(M0)$cv,
         "savage.dickey"=SDdr)
  }
}

num <- 100; BF <- PerErr.M1 <- PerErr.M0 <- SDdr <- numeric(num)
system.time(
for (i in 1:num) { # roughly 139 seconds of run time
  result <- BFBS(seed=i)
  BF[i] <- result$bf
  PerErr.M1[i] <- result$pererr.M1
  PerErr.M0[i] <- result$pererr.M0
  SDdr[i] <- result$savage.dickey
})

# range(PerErr.M1); range(PerErr.M0)

par(mfrow=c(1,2))
hist(BF, prob=T, col="white", yaxt="n", ylab="", main="rstan + BS",
     sub="100 runs of one chain with 10,000 iterations",
     xlab="Bayes Factor Estimates Supporting the Null") # BF₀₁: rstan + BS
lines(density(BF), col="blue", lwd=2)
hist(SDdr, prob=T, col="white", yaxt="n", ylab="", main="rstan + SD",
     sub="100 runs of one chain with 10,000 iterations",
     xlab="Savage–Dickey Density Ratio Supporting the Null") # BF₀₁: rstan + SD
lines(density(SDdr), col="red", lwd=2)

 

scroll to top

   

2. Simulation

 

As \(x\) increases by \(\Delta x\) units, the estimated odds ratio is \(\widehat{\textit{OR}}:=\frac{\hat{p}_2}{1-\hat{p}_2}\ /\ \frac{\hat{p}_1}{1-\hat{p}_1}=e^{\hat{\beta}_1\cdot\Delta x}\).

  • \(\widehat{\textit{OR}}>1\): The event is more likely to occur as \(x\) increases. The predictor is associated with higher odds of the outcome.

  • \(\widehat{\textit{OR}}<1\): The event is less likely to occur as \(x\) increases. The predictor is associated with lower odds of the outcome.

  • \(\widehat{\textit{OR}}=1\): No association between the predictor and the outcome.

The further \(\widehat{\textit{OR}}\) is from 1, the stronger the effect. For example, \(\widehat{\textit{OR}}=2\) indicates that the odds of the outcome double for each unit increase in the predictor.
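As a quick illustration (reusing the fit_glm logistic model of mpg_binary on wt from the Regression section above; wt is in 1,000-lb units):

exp(coef(fit_glm)["wt"])       # estimated odds ratio for a 1,000-lb increase in weight (Δx = 1)
exp(coef(fit_glm)["wt"] * 0.5) # estimated odds ratio for a 500-lb increase (Δx = 0.5)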

 

How big is a big odds ratio? (Chen, Cohen, & Chen, 2010; Chinn, 2000)

   

Warning Messages

set.seed(25)
x <- rnorm(10) # predictor variable of insufficient sample size 10
p <- 1 / (1 + exp(0.5 - 0.4 * x)) # the probability of y = 1
(y <- rbinom(10, 1, p))
##  [1] 0 0 0 0 0 0 1 1 0 0
summary(glm(y ~ x, family=binomial)) # WARNINGS
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## 
## Call:
## glm(formula = y ~ x, family = binomial)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -93.53   80937.33  -0.001    0.999
## x              225.02  197019.25   0.001    0.999
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1.0008e+01  on 9  degrees of freedom
## Residual deviance: 2.1701e-09  on 8  degrees of freedom
## AIC: 4
## 
## Number of Fisher Scoring iterations: 25

 

Data

simData3 <- function(n, beta0=NULL, beta1=NULL, sd.X=1) {
  #' Input - 
  #' n :             number of observations
  #' beta0:          true intercept
  #' beta1:          true coefficient for X
  #' sd.X:           true standard deviation of X
  #' 
  #' Output - binary outcome and predictor variables, y and x
  
  repeat { # loop until the glm() function executes without warnings
    # try-catch block to handle the function execution and capture warnings
    warnings <- tryCatch({
      
      x <- stats::rnorm(n, 0, sd.X) # predictor variable
      p <- 1 / (1 + exp(-beta0 - beta1 * x)) # the probability of y = 1
      y <- stats::rbinom(n, 1, p) # outcome variable
      
      fit1 <- stats::glm(y ~ x, family=binomial) # full model fit
      NULL # no warnings, return NULL
      
    }, warning = function(w) {
      w # return the warning message
    })
    
    # break the loop if there were no warnings
    if (is.null(warnings)) {
      break
    }
  }
  
  data.frame("Outcome"=y, "Predictor"=x)
}

   

The Bayes Factors

Note: All the Bayes factors are \(\textit{BF}_{01}\).

stancode2M1 <- "
data {
  int<lower=1> n;                       // number of observations
  vector[n] x;                          // predictor variable
  int<lower=0,upper=1> y[n];            // outcome variable
  int<lower=0,upper=1> UI;              // prior option (1=unit information, 0=Gelman et al. (2008))
  vector[2] mu;                         // mean vector for theta if UI==1
  matrix[2,2] Sigma;                    // covariance matrix for theta if UI==1
}

parameters {
  vector[2] theta;                      // combined vector for intercept and slope
}

transformed parameters {
  real alpha = theta[1];                // intercept
  real beta = theta[2];                 // slope
}

model {
  // Priors (JAGS does not support direct if-else statements)
  if (UI == 1) {
    theta ~ multi_normal(mu, Sigma);    // unit information
  } else {
    alpha ~ cauchy(0, 10);
    beta ~ cauchy(0, 2.5);              // <== scale (Gelman et al., 2008)
  }
  
  // Likelihood
  for (i in 1:n) {
    target += bernoulli_logit_lpmf(y[i] | alpha + beta * x[i]);
  }
}"

stanmodel2M1 <- rstan::stan_model(model_code=stancode2M1)

computeBFs3 <- function(data, prior=1, precise=F) {
  #' Input -
  #' data:     long format data whose columns must be `Outcome` and `Predictor`
  #' prior:    prior option (1=unit information, 0=Gelman et al. (2008))
  #' precise:  logical; if FALSE (default), use the simpler form for Output 5;
  #'                             otherwise, use the more precise form
  #'
  #' Global -  stanmodel2M1
  #' Dependency - rstan (v2.32.6), polspline (v1.1.25)
  #' 
  #' Output -  a vector of the Bayes factors in favor of the null
  #'           1. Savage–Dickey density ratio (Eq. 11) using the normal unit-information prior
  #'           2. BIC approximation to the Bayes factor in Eq. 8; BIC(fit)
  #'           3. Jeffreys approximate Bayes factor in Eq. 9 using the normal unit-information prior
  #'           4. Jeffreys approximate Bayes factor in Eq. 9 using the Jeffreys prior
  #'           5. Wagenmakers approximate Bayes factor based on the p-value and sample size in Eq. 10
  
  data <- stats::na.omit(data)
  N <- nrow(data) # number of observations
  
  fit0 <- stats::glm(Outcome ~ 1, data, family=binomial) # null model fit
  fit1 <- stats::glm(Outcome ~ Predictor, data, family=binomial) # full model fit
  pVal <- summary(fit1)$coefficients["Predictor", "Pr(>|z|)"] # p-value of the regression coefficient
  W <- stats::qchisq(pVal, 1, lower.tail=F) # Wald statistic, ["Predictor", "z value"]^2
  
  datalist <- list(n=N, x=data$Predictor, y=data$Outcome, UI=prior,
                   mu=stats::coef(fit1), Sigma=stats::vcov(fit1)*N)
  if(prior==0) datalist$x <- c(scale(datalist$x)) * 0.5 # scale to have mean 0 and standard deviation 0.5
  stanfitM1 <- rstan::sampling(stanmodel2M1, data=datalist,
                               iter=15000, warmup=5000, chains=2, seed=277, refresh=0)
  betaS <- rstan::extract(stanfitM1, pars="beta")$beta
  dPost <- polspline::dlogspline(0, polspline::logspline(betaS)) # estimated posterior density
  dPrior <- ifelse(datalist$UI==1, 
                   stats::dnorm(0, datalist$mu[2], sqrt(datalist$Sigma[2,2])), stats::dcauchy(0, 0, 2.5))
  SDdr01 <- dPost / dPrior # Savage–Dickey density ratio
  
  BICB01 <- exp((stats::BIC(fit1) - stats::BIC(fit0)) / 2) # Eq. 8
  
  JAB01_UI <- sqrt(N) * exp(-0.5 * W) # Eq. 9, unit-information prior
  
  WAB01 <- ifelse(pVal <= .5,
                  ifelse(pVal > .1,
                         ifelse(precise,
                                4 * pVal^(2/3) * sqrt(N) / 3, # 0.1 < p <= 0.5 (more precise)
                                sqrt(pVal * N)), # 0.1 < p <= 0.5 (simpler)
                         3 * pVal * sqrt(N)), # p <= 0.1
                  pVal^(1/4) * sqrt(N)) # p > 0.5, Eq. 10
  
  c("SD"=SDdr01, "BIC_approx"=BICB01, "JAB"=JAB01_UI, "JAB_J"=sqrt(pi/2) * JAB01_UI, "WAB"=WAB01)
}

# set.seed(277)
# test3 <- simData3(30, -0.5, 1.5)
# computeBFs3(test3) # BFpack and JAB methods seem to be the same
colnames(dat) <- c("Predictor", "Outcome")
computeBFs3(dat)
##         SD BIC_approx        JAB      JAB_J        WAB 
##  0.2898725  0.3282977  0.5326438  0.6675700  0.5070023

   

Simulation Study

We aim to conduct several simulation studies to evaluate how accurately approximate objective Bayes factors convert the \(p\)-values and sample sizes \(n\) reported in scientific articles into Bayesian measures of evidence.

 

nSim <- 1000 # number of simulation runs for each setting
nObs <- c(10, 20, 30, 100, 500) # number of observations
BETA1 <- c("Null"=0, "Small"=0.4, "Medium"=1.1, "Large"=1.6) # regression coefficient (slope)
# exp(BETA1) # odds ratio

reportBF <- reportBF_avg <- reportBF_sd <- reportERR <- reportERR_avg <- reportERR_sd <- 
  reportRULE <- reportRULE_H1 <- reportRULE_H0 <- reportRULE_inc <-
  setNames(vector(mode="list", length=length(nObs) * length(BETA1)), # pre-allocation
           apply(expand.grid(BETA1, nObs), 1, function(r) paste0("n = ", r[2], ", slope = ", r[1])))

index <- 1 # initial list index

for (n in nObs) {
  
  for (slope in BETA1) {
    set.seed(n+slope+277)
    temp <- t(replicate(nSim, computeBFs3(simData3(n, beta0=-0.5, beta1=slope)))) # replicate the Bayes factors
    reportBF[[index]] <- temp
    reportBF_avg[[index]] <- colMeans(temp, na.rm=T) # mean Bayes factors in favor of the null
    reportBF_sd[[index]] <- apply(na.omit(temp), 2, sd) # sample standard deviations of them
    
    tempERR <- 100* (temp - temp[,1]) / temp[,1] # percent errors; the first column should be all 0
    reportERR[[index]] <- tempERR # (can take in NA)
    reportERR_avg[[index]] <- colMeans(tempERR, na.rm=T) # mean percent errors
    reportERR_sd[[index]] <- apply(na.omit(tempERR), 2, sd) # sample standard deviations of the percent errors
    
    tempRULE_H1 <- temp < 1/3 # support H1
    tempRULE_H0 <- temp > 3 # support H0
    tempRULE_inc <- temp >= 1/3 & temp <= 3 # inconclusive
    
    # counts of matching decisions for H1
    reportRULE_H1[[index]] <- colSums(tempRULE_H1 * tempRULE_H1[,1], na.rm=T)
    
    # counts of matching decisions for H0
    reportRULE_H0[[index]] <- colSums(tempRULE_H0 * tempRULE_H0[,1], na.rm=T)
    
    # counts of matching inconclusiveness
    reportRULE_inc[[index]] <- colSums(tempRULE_inc * tempRULE_inc[,1], na.rm=T)
    
    tempRULE <- 1*(temp < 1/3) - 1*(temp > 3) # 1: support H1;  0: inconclusive;  -1: support H0
    reportRULE[[index]] <- colSums(tempRULE[,1] == tempRULE, na.rm=T) # counts of matching decisions
    # colMeans(); a value near 1 suggests matching results; the first column should be all 1
    
    index <- index + 1
  }
} # roughly 13.4 hours of run time

# There were 50 or more warnings:
# 1: There were 17061 divergent transitions after warmup. See
# https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
# to find out why this is a problem and how to eliminate them.
# 2: Examine the pairs() plot to diagnose sampling problems
# ...

 

reshapeW2L2 <- function(list, prob=T, lab=c("SD", "BIC", "JAB", "JAB_J", "WAB")) {
  #' Input -
  #' list:    a list of size length(nObs) * length(BETA1);
  #'          reportERR (unaggregated),
  #'          reportRULE, reportRULE_H1, reportRULE_H0, or reportRULE_inc (aggregated)
  #' prob:    logical; if TRUE (default), return the proportions;
  #'                           otherwise, return the original values
  #' lab:     a vector of characters representing the Bayes factor methods
  #' 
  #' Global -
  #' nSim:    number of simulation runs for each setting
  #' nObs:    number of observations
  #' BETA1:   regression coefficient (slope)
  #' 
  #' Output - unlist the report and reshape the wide into the long
  
  len <- length(lab) # five methods
  nRep <- ifelse(!prob, nSim, 1) # nSim if unaggregated; 1 if aggregated
  long <- data.frame("value"=unname(unlist(list)),
                     "method"=factor(rep(rep(1:len, each=nRep), length(list)), labels=lab),
                     "n"=rep(nObs, each=nRep*len*length(BETA1)),
                     "slope"=rep(rep(unname(BETA1), each=nRep*len), length(nObs)))
  
  if (prob) {
    list2 <- lapply(list, function(vec) {
      if (vec[1] == 0) {
        vec * NA
      } else {
        vec / vec[1] # convert counts to proportions
      }
    })
    list3 <- lapply(list, function(vec) rep(vec[1], length(vec))) # denominator
    long$total <- unname(unlist(list3))
    long$prop <- unname(unlist(list2))
  }
  long
}


formatting <- function(val) {
  #' format the value using scientific notation
  if (abs(val) >= 10) {
    paste("~italic(BF)[\"01\"] %~~%", 
          gsub("e\\+0*(\\d+)", "%*%10^\\1", sprintf("%.2e", val)))
  } else if (abs(val) < 1) {
    paste("~italic(BF)[\"01\"] %~~%", 
          gsub("e-0*(\\d+)", "%*%10^{-\\1}", sprintf("%.2e", val)))
  } else {
    paste("~italic(BF)[\"01\"] %~~%", sprintf("%.2f", val))
  }
}

 

scroll to top

   

3. Graphic Results

 

Visualization

Note: The Jeffreys approximate Bayes factor (JAB_J) in Eq. 9, using the Jeffreys prior, will not be plotted.

Boxplot (Percent Errors)

# percent errors
df3 <- reshapeW2L2(reportERR, prob=F)

index <- 0

# boxplot of the percent errors
for (beta in BETA1) {
  baseBF01 <- sapply(reportBF_avg[seq(1, by=length(BETA1), length.out=length(nObs)) + index], 
                     function(x) x[1]) # baseline is the Savage–Dickey density ratio
  baseBF01_lab <- sapply(baseBF01, formatting) # scientific notation
  index <- index + 1
  
  sub <- subset(df3, !(method %in% c("SD", "JAB_J")) & slope == beta) # subset the long
  anno <- data.frame("label"=baseBF01_lab, "n"=unique(sub$n),
                     "x"=2, "y"=.8 * aggregate(value~n, sub, FUN=max)$value) # for annotation use
  
  print(ggplot(sub, aes(x=method, y=value)) +
          geom_boxplot(alpha=0.3, na.rm=T) +
          # ylim(quantile(sub$value, probs=c(.01, .99))) + # restricted  <== CAUTION
          geom_text(data=anno, aes(label=label, x=x, y=y), parse=T, col="red") +
          # facet_grid(.~n, labeller=label_bquote(cols=italic(n)==.(n))) + # same scale on y-axis
          facet_wrap(.~n, scales="free_y", nrow=1,
                     labeller=label_bquote(cols=italic(n)==.(n))) +
          labs(x="\nBayes Factor Methods", y="Percent Error (%)\n") +
          ggtitle(paste0(names(BETA1[BETA1==beta]), " Odds Ratio")) +
          geom_hline(yintercept=0, linetype="dashed", color="gray") + # reference line 0%
          theme_classic() +
          theme(axis.text.x=element_text(angle=90)))
  cat("<br><br><br><br><br>")
}

[Figures: boxplots of percent errors (%) by Bayes factor method, faceted by the number of observations n, one figure per odds-ratio condition (Null, Small, Medium, Large), each annotated with the mean Savage–Dickey BF₀₁.]
 

scroll to top

   

Boxplot (\(\textit{BF}_{01}\))

df_BF <- reshapeW2L2(reportBF, prob=F)

# boxplot of the BF₀₁
for (beta in BETA1) {
  sub <- subset(df_BF, slope == beta & method != "JAB_J") # subset the long
  print(ggplot(sub, aes(x=method, y=value)) +
          geom_boxplot(alpha=0.3, na.rm=T) +
          facet_wrap(.~n, scales="free_y", nrow=1,
                     labeller=label_bquote(cols=italic(n)==.(n))) +
          labs(x="\nBayes Factor Methods", y=expression(italic("BF")["01"])) +
          ggtitle(paste0(names(BETA1[BETA1==beta]), " Odds Ratio")) +
          theme_classic() +
          theme(axis.text.x=element_text(angle=90)))
  cat("<br><br><br><br><br>")
}

[Figures: boxplots of the BF₀₁ estimates by Bayes factor method, faceted by the number of observations n, one figure per odds-ratio condition.]
 

scroll to top

   

Line Chart (Proportions of Agreement)

# decisions
decis3 <- rbind(reshapeW2L2(reportRULE),
                reshapeW2L2(reportRULE_H1),
                reshapeW2L2(reportRULE_H0),
                reshapeW2L2(reportRULE_inc))
decis3$result <- factor(rep(c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"),
                            each=length(levels(df3$method)) * length(nObs) * length(BETA1)),
                        levels=c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"))
decis3$delta <- paste0("Slope = ", decis3$slope)
decis3 <- decis3[!is.na(decis3[,"prop"]),]
if (F) {
  # compute the pointwise confidence intervals using the normal approximation
  decis3$half.len <- qnorm(1-.05/2) * 
    sqrt(decis3$value * (decis3$total - decis3$value) / decis3$total^3) # 95% CI width
  decis3$lwr <- pmax(0, decis3$value / decis3$total - decis3$half.len) # 95% CI lower bound
  decis3$upr <- pmin(1, decis3$value / decis3$total + decis3$half.len) # 95% CI upper bound
} else { # Clopper–Pearson (exact) intervals
  decis3$lwr <- qbeta(.05/2, decis3$value, decis3$total - decis3$value + 1) # 95% CI lower bound
  decis3$upr <- qbeta(1-.05/2, decis3$value + 1, decis3$total - decis3$value) # 95% CI upper bound
  decis3$half.len <- (decis3$upr - decis3$lwr) / 2 # 95% CI half-width
}

# line chart of the proportions of agreement
ggplotly(ggplot(decis3, aes(n, prop, color=method, label=value)) +
           geom_line(linewidth=1.2, na.rm=T) + geom_point(size=2, na.rm=T) +
           geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=.2) +
           scale_color_manual(values=c("black", "#E69F00", "#CC79A7", "#D55E00", "#0072B2"),
                              labels=c("SD", "BIC", "JAB", "JAB_J", "WAB")) +
           facet_grid(result~delta) +
           labs(x="Number of Observations", y="Proportion of Agreement", color="Method") +
           scale_x_continuous(breaks=c(10, 100, 500)) +
           scale_y_continuous(breaks=c(0, 0.5, 1)) + # min(decis3$prop, na.rm=T)
           theme_minimal()) # interactive plots

     

 

scroll to top

   

4. Decision Rules

 

Debugging

(mat <- matrix(rep(c(0, -1), 5), nrow=2))
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    0    0    0    0
## [2,]   -1   -1   -1   -1   -1
(res <- mat[mat[,1] == -1, 1] == mat[mat[,1] == -1,]) # only one row ==> column vector
## [1] TRUE TRUE TRUE TRUE TRUE
colSums(res) #  ERROR!
## Error in base::colSums(x, na.rm = na.rm, dims = dims, ...): 'x' must be an array of at least two dimensions
colSums(rbind(rep(F, 5), res)) # counts of matching decisions
## [1] 1 1 1 1 1
colSums((mat == -1) * (mat == -1)[,1]) # counts of matching decisions
## [1] 1 1 1 1 1

 

The counts of matching Bayes factor decisions between the gold standard (the Savage–Dickey density ratio) and the approximate methods are listed below.

reportRULE
## $`n = 10, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        858        805        515        847 
## 
## $`n = 10, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        862        790        540        832 
## 
## $`n = 10, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        906        687        546        697 
## 
## $`n = 10, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        887        545        456        549 
## 
## $`n = 20, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        960        919        807        923 
## 
## $`n = 20, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        966        856        742        877 
## 
## $`n = 20, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        963        598        505        605 
## 
## $`n = 20, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        970        369        297        406 
## 
## $`n = 30, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        974        948        858        967 
## 
## $`n = 30, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        973        892        808        916 
## 
## $`n = 30, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        970        728        614        771 
## 
## $`n = 30, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        979        787        676        827 
## 
## $`n = 100, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        993        983        947        964 
## 
## $`n = 100, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        985        945        867        922 
## 
## $`n = 100, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        998        985        976        990 
## 
## $`n = 100, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000       1000       1000       1000       1000 
## 
## $`n = 500, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        998        995        984        996 
## 
## $`n = 500, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000        996        992        980        992 
## 
## $`n = 500, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000       1000       1000       1000       1000 
## 
## $`n = 500, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000       1000       1000       1000       1000
reportRULE_inc
## $`n = 10, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        820        702        700        410        744 
## 
## $`n = 10, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        806        691        690        439        734 
## 
## $`n = 10, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        696        647        646        505        658 
## 
## $`n = 10, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        545        522        522        433        527 
## 
## $`n = 20, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        380        353        343        232        380 
## 
## $`n = 20, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        486        463        446        334        486 
## 
## $`n = 20, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        461        458        455        396        461 
## 
## $`n = 20, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        274        272        268        253        274 
## 
## $`n = 30, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        276        258        249        163        275 
## 
## $`n = 30, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        406        394        382        305        404 
## 
## $`n = 30, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        327        324        319        293        327 
## 
## $`n = 30, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        136        134        135        123        136 
## 
## $`n = 100, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        118        113        105         71         84 
## 
## $`n = 100, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        402        396        386        334        350 
## 
## $`n = 100, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         17         16         17         17         17 
## 
## $`n = 100, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          0          0          0          0          0 
## 
## $`n = 500, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         45         43         42         32         41 
## 
## $`n = 500, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         70         70         70         67         62 
## 
## $`n = 500, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          0          0          0          0          0 
## 
## $`n = 500, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          0          0          0          0          0
reportRULE_H1
## $`n = 10, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         75         51          0          0          0 
## 
## $`n = 10, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         93         71          0          0          0 
## 
## $`n = 10, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        263        218          0          0          0 
## 
## $`n = 10, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        432        342          0          0          0 
## 
## $`n = 20, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         45         32          1          0          3 
## 
## $`n = 20, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        108         97          4          2         12 
## 
## $`n = 20, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        449        415         53         19         72 
## 
## $`n = 20, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        705        677         80         23        115 
## 
## $`n = 30, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         30         24          5          1          6 
## 
## $`n = 30, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        114         99         30         23         39 
## 
## $`n = 30, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        613        586        349        261        385 
## 
## $`n = 30, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        858        839        646        547        685 
## 
## $`n = 100, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         15         13         11          9         13 
## 
## $`n = 100, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        218        209        179        153        192 
## 
## $`n = 100, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        983        982        968        959        973 
## 
## $`n = 100, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000       1000       1000       1000       1000 
## 
## $`n = 500, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          5          5          3          2          5 
## 
## $`n = 500, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        920        916        912        903        920 
## 
## $`n = 500, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000       1000       1000       1000       1000 
## 
## $`n = 500, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##       1000       1000       1000       1000       1000
reportRULE_H0
## $`n = 10, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        105        105        105        105        103 
## 
## $`n = 10, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        101        100        100        101         98 
## 
## $`n = 10, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         41         41         41         41         39 
## 
## $`n = 10, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         23         23         23         23         22 
## 
## $`n = 20, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        575        575        575        575        540 
## 
## $`n = 20, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        406        406        406        406        379 
## 
## $`n = 20, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         90         90         90         90         72 
## 
## $`n = 20, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         21         21         21         21         17 
## 
## $`n = 30, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        694        692        694        694        686 
## 
## $`n = 30, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        480        480        480        480        473 
## 
## $`n = 30, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         60         60         60         60         59 
## 
## $`n = 30, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          6          6          6          6          6 
## 
## $`n = 100, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        867        867        867        867        867 
## 
## $`n = 100, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        380        380        380        380        380 
## 
## $`n = 100, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          0          0          0          0          0 
## 
## $`n = 100, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          0          0          0          0          0 
## 
## $`n = 500, slope = 0`
##         SD BIC_approx        JAB      JAB_J        WAB 
##        950        950        950        950        950 
## 
## $`n = 500, slope = 0.4`
##         SD BIC_approx        JAB      JAB_J        WAB 
##         10         10         10         10         10 
## 
## $`n = 500, slope = 1.1`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          0          0          0          0          0 
## 
## $`n = 500, slope = 1.6`
##         SD BIC_approx        JAB      JAB_J        WAB 
##          0          0          0          0          0

 

scroll to top

   

5. Finite Sample Correction

 

Centering the normal unit-information prior on the null hypothesized value \(\theta_0\), the customary choice in objective Bayesian testing, yields the following JAB* (Wagenmakers, 2022, Eq. 4).

\[\begin{equation} \textit{JAB}_{01}^*\approx\sqrt{N}\exp{\left\{-\frac{N-1}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]

Alternatively, the JAB in Eq. 9 centers the prior on the maximum likelihood estimate.

\[\begin{equation} \textit{JAB}_{01}\approx\sqrt{N}\exp{\left\{-\frac{N}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]

Using a power transformation, it can be shown that \(\textit{JAB}_{01}^*=\textit{JAB}_{01}^{\frac{N-1}{N}}\cdot N^{\frac{1}{2N}}\).

 

\(N=10\)

par(mfrow=c(1, length(BETA1)))
for (i in 1:length(BETA1)) {
  reportBF[[i]][,4] <- reportBF[[i]][,3]^(9/10) * 10^(1/20) # number of observations is 10
  boxplot(reportBF[[i]][,-2], names=c("SD", "JAB", "JAB*", "WAB"),
          xlab="", ylab=expression(italic("BF")["01"]), main=paste0(names(BETA1)[i], " Slope"), las=2)
}

 

scroll to top

   

6. clogit

 

Simple Conditional Logistic Regression for a 1:1 Matched Case–Control Study

\[\begin{align} &\mathbb{P}(Y_{i1}=1\mid x_{ij},\ Y_{i1}+Y_{i2}=1) \\ =&\frac{\mathbb{P}(Y_{i1}=1,\ Y_{i2}=0\mid x_{ij})}{\mathbb{P}(Y_{i1}=1,\ Y_{i2}=0\mid x_{ij})+\mathbb{P}(Y_{i1}=0,\ Y_{i2}=1\mid x_{ij})} \\ =&\frac{\mathbb{P}(Y_{i1}=1\mid x_{i1})\left(1-\mathbb{P}(Y_{i2}=1\mid x_{i2})\right)}{\mathbb{P}(Y_{i1}=1\mid x_{i1})\left(1-\mathbb{P}(Y_{i2}=1\mid x_{i2})\right)+\mathbb{P}(Y_{i2}=1\mid x_{i2})\left(1-\mathbb{P}(Y_{i1}=1\mid x_{i1})\right)} \\ =&\frac{\operatorname{logistic}(\alpha+\beta x_{i1})\ \!\operatorname{logistic}(-\alpha-\beta x_{i2})}{\operatorname{logistic}(\alpha+\beta x_{i1})\ \!\operatorname{logistic}(-\alpha-\beta x_{i2})+\operatorname{logistic}(\alpha+\beta x_{i2})\ \!\operatorname{logistic}(-\alpha-\beta x_{i1})} \\ =&\operatorname{logistic}\left(\beta (x_{i1}-x_{i2})\right) \end{align}\]

Thus,

\[\begin{equation} \mathbb{P}(Y_{ij}=1\mid x_{ij},\ Y_{i1}+Y_{i2}=1)=\frac{\exp\{\beta x_{ij}\}}{\exp\{\beta x_{i1}\}+\exp\{\beta x_{i2}\}} \end{equation}\]

for the observation \(j\in\{1,2\}\) of the stratum \(i=1,\dotsb,n\).

In the “one case per pair” setting, the matching factors (i.e., a pair-specific intercept \(\alpha\)) are conditioned out, similar to how the baseline hazard is conditioned out in the Cox model. Conditional logistic regression for matched sets is mathematically equivalent to a stratified Cox model (sketched in code after the list below) that treats:

  • each matched set in a case–control study as a risk set in a survival study, and

  • each case as a failure (event) occurring in that risk set.
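A minimal sketch of this equivalence in R (assuming a long-format data frame pairs_df with columns Case, Exposure, and ID, e.g., as generated by simDataC() in the Simulations subsection below):

library(survival)
fit_clogit <- clogit(Case ~ Exposure + strata(ID), data=pairs_df) # conditional logistic regression
fit_coxph <- coxph(Surv(rep(1, nrow(pairs_df)), Case) ~ Exposure + strata(ID),
                   data=pairs_df, method="exact") # stratified Cox model with a constant follow-up time
all.equal(coef(fit_clogit), coef(fit_coxph)) # the log matched odds ratios should agree (up to numerical tolerance)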

 

Stan

 

Consider a single binary exposure, \(x_{ij}\in\{0,1\}\).

A pair is concordant if the case and control have the same exposure status. These pairs do not contribute to estimating the exposure effect (matched odds ratio) — because there is no difference in exposure within the pair, the exposure cannot explain why one is a case and the other is a control.

A pair is discordant if the case and control differ in exposure status. Only these pairs contribute to the estimation of the exposure effect.

Pair-level (Bernoulli) modeling

stancodeCH1 <- "
data {
  int<lower=1> d;                          // number of discordant pairs
  // int<lower=0, upper=1> y[d];              // y[i]=1 if CASE is exposed, 0 if CONTROL is exposed [removed]
  array[d] int<lower=0, upper=1> y;        // [Stan 2.26+ syntax for array declarations]
  real m;                                  // center for beta
  real s;                                  // scale for beta
}

parameters {
  real beta;                               // log matched odds ratio
}

model {
  beta ~ normal(m, s);                     // unit information

  // conditional likelihood for discordant pairs:
  // Pr(case is exposed) = logistic(beta)
  y ~ bernoulli_logit(beta);
}"

stanmodelCH1 <- rstan::stan_model(model_code=stancodeCH1)

Note: Drop concordant pairs before constructing y[d].

 

Aggregated (binomial) modeling

The estimated matched odds ratio can be shown to simplify to the ratio of pairs with an exposed case and unexposed control to those with an unexposed case and exposed control, b / c (McNemar’s test).
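For instance (a small numeric sketch with hypothetical discordant-pair counts):

b <- 20; c <- 8     # hypothetical counts of the two types of discordant pairs
b / c               # estimated matched odds ratio
(b - c)^2 / (b + c) # McNemar chi-squared statistic (without continuity correction)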

stancodeCM1 <- "
data {
  int<lower=0> b;                          // number of pairs with case exposed / control unexposed
  int<lower=0> c;                          // number of pairs with case unexposed / control exposed
  real m;                                  // center for beta
  real s;                                  // scale for beta
}
parameters {
  real beta;                               // log matched odds ratio
}
model {
  beta ~ normal(m, s);                     // unit information
  b ~ binomial_logit(b + c, beta);         // logistic(beta) is Pr(case exposed)
}"

stanmodelCM1 <- rstan::stan_model(model_code=stancodeCM1)

For a single binary exposure, both formulations are equivalent; the Bernoulli version stancodeCH1 is just the unpooled form of the binomial version stancodeCM1. The binomial version is numerically stable and faster when the linear predictor is constant across pairs.

 

scroll to top

   

 

Simulations

 

Data

simDataC <- function(n, beta=NULL) {
  #' Input - 
  #' n :             number of strata
  #' beta:           true log matched odds ratio
  #' 
  #' Output - 1:1 matched case-control pairs with a binary exposure in the long format
  
  x1 <- stats::rbinom(n, 1, .5) # exposure for subject 1 in each pair
  x2 <- stats::rbinom(n, 1, .5) # exposure for subject 2 in each pair (independent Bernoulli)
  p1 <- stats::plogis(beta * (x1 - x2)) # conditional prob that subject 1 is the case
  y1 <- stats::rbinom(n, 1, p1) # 1 if subject 1 is the case (subject 2 is the control)

  data.frame("Case"=c(y1, 1-y1),
             "Exposure"=c(x1, x2),
             "ID"=rep(1:n, 2))
}

   

The Bayes Factors

Note: All the Bayes factors are \(\textit{BF}_{01}\).

computeBFsC <- function(data, precise=F) {
  #' Input -
  #' data:     long format data whose columns must be `Case`, `Exposure`, and `ID`
  #' precise:  logical; if FALSE (default), use the simpler form for Output 5;
  #'                             otherwise, use the more precise form
  #'
  #' Global -  stanmodelCM1 (aggregated binomial modeling)
  #' Dependency - survival (v3.8-3), rstan (v2.32.7), polspline (v1.1.25)
  #' 
  #' Output -  a vector of the Bayes factors in favor of the null
  #'           1. Savage–Dickey density ratio (Eq. 11) using the normal unit-information prior
  #'           2. BIC approximation to the Bayes factor in Eq. 8; BIC(fit)
  #'           3. Jeffreys approximate Bayes factor in Eq. 9 using the normal unit-information prior
  #'           4. Jeffreys approximate Bayes factor in Eq. 9 using the Jeffreys prior
  #'           5. Wagenmakers approximate Bayes factor based on the p-value and sample size in Eq. 10
  #'           6. number of discordant pairs
  #'           7. number of pairs with case exposed / control unexposed
  #'           8. number of pairs with case unexposed / control exposed
  #'           9. p-value of the conditional logistic regression
  
  data <- stats::na.omit(data)
  
  N <- sum(tapply(data$Exposure, data$ID, function(v) length(unique(v))==2)) # number of discordant pairs
  b <- sum(with(data, # number of pairs with case exposed / control unexposed
                tapply(1:nrow(data), ID, 
                       function(i) any(Case[i]==1 & Exposure[i]==1) && any(Case[i]==0 & Exposure[i]==0))))
  c <- sum(with(data, # number of pairs with case unexposed / control exposed
                tapply(1:nrow(data), ID, 
                       function(i) any(Case[i]==1 & Exposure[i]==0) && any(Case[i]==0 & Exposure[i]==1))))
  if (c==0 | b==0)  return(c("SD"=NA, "BIC_approx"=NA, "JAB"=NA, "JAB_J"=NA, "WAB"=NA, 
                             "N"=N, "b"=b, "c"=c, "pVal"=NA))
  
  fit1 <- survival::clogit(Case ~ Exposure + survival::strata(ID), data) # full model fit
  pVal <- summary(fit1)$coefficients[,"Pr(>|z|)"][1] # p-value of the regression coefficient
  W <- stats::qchisq(pVal, 1, lower.tail=F) # Wald statistic, [,"z"]^2
  
  datalist <- list(b=b, c=c, m=fit1$coefficients, s=c(sqrt(fit1$var*N)))
  stanfitH1 <- rstan::sampling(stanmodelCM1, data=datalist,
                               iter=15000, warmup=5000, chains=2, seed=277, refresh=0)
  betaS <- rstan::extract(stanfitH1, pars="beta")$beta
  dPost <- polspline::dlogspline(0, polspline::logspline(betaS)) # estimated posterior density
  dPrior <- stats::dnorm(0, datalist$m, datalist$s)
  SDdr01 <- dPost / dPrior # Savage–Dickey density ratio
  
  BICB01 <- exp(-diff(fit1$loglik) + log(N)/2) # Eq. 8
  
  JAB01_UI <- sqrt(N) * exp(-0.5 * W) # Eq. 9, unit-information prior
  
  WAB01 <- ifelse(pVal <= .5,
                  ifelse(pVal > .1,
                         ifelse(precise,
                                4 * pVal^(2/3) * sqrt(N) / 3, # 0.1 < p <= 0.5 (more precise)
                                sqrt(pVal * N)), # 0.1 < p <= 0.5 (simpler)
                         3 * pVal * sqrt(N)), # p <= 0.1
                  pVal^(1/4) * sqrt(N)) # p > 0.5, Eq. 10
  
  c("SD"=SDdr01, "BIC_approx"=BICB01, "JAB"=JAB01_UI, "JAB_J"=sqrt(pi/2) * JAB01_UI, "WAB"=WAB01,
    "N"=N, "b"=b, "c"=c, "pVal"=pVal)
}

# set.seed(277)
# testC <- simDataC(100, 1)
# computeBFsC(testC)

 

scroll to top

   

7. eJAB

 

To test the hypotheses \(\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0\) versus \(\mathcal{H}_1:\boldsymbol{\theta}\neq\boldsymbol{\theta}_0\), we define the extended Jeffreys approximate objective Bayes factor (eJAB) as \[\begin{equation} \mathit{eJAB}_{01}=\sqrt{N}\exp\left\{-\frac{1}{2}\frac{N^{1/q}-1}{N^{1/q}}\cdot Q_{\chi^{2}_{q}}(1-p)\right\}, \end{equation}\]

where \(q\) is the size of the parameter vector \(\boldsymbol{\theta}\), \(N\) is the sample size, \(Q_{\chi^{2}_{q}}(\cdot)\) is the quantile function of the chi-squared distribution with \(q\) degrees of freedom, and \(p\) is the \(p\)-value from a null-hypothesis significance test. We further define \(\mathit{eJAB}_{10}=1\ \!/\ \!\mathit{eJAB}_{01}\).
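A minimal R sketch of this formula (the function name ejab01 is ours, not from any package):

ejab01 <- function(p, N, q=1) {
  # eJAB₀₁ from a p-value, the (effective) sample size N, and q degrees of freedom
  sqrt(N) * exp(-0.5 * (N^(1/q) - 1) / N^(1/q) * qchisq(1 - p, q))
}
ejab01(p=0.0309, N=30) # e.g., q = 1 with the p-value from the simulated logistic fit in the JAGS section above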

 

The sample size \(N\) is

  • the total number of observations for \(t\)-tests, linear regression, logistic regression, one-way ANOVA, and chi-squared tests.

  • the number of events for Cox models.

  • the number of independent observations for one-way repeated-measures ANOVA (Nathoo & Masson, 2016).

 

The degrees of freedom are

  • \(q=1\) for \(t\)-tests, linear regression, logistic regression, and Cox models, where \(\mathcal{H}_0\) specifies a point-null hypothesis (Wagenmakers, 2022).

  • \(q=I-1\) for one-way ANOVA and one-way repeated-measures ANOVA, where \(I\) is the number of conditions.

  • \(q=(R-1)(C-1)\) for chi-squared tests, where \(R\) and \(C\) are the numbers of rows and columns, respectively.

Since the Bayes factor is the ratio of the marginal likelihoods of two competing models, \(q=p_1-p_0\) also represents the difference in the number of free parameters between the alternative and null models.

 

For further reading, please visit https://rpubs.com/sherloconan/1361280.

 

scroll to top