============================================================================================================
This up-to-date document is available at https://rpubs.com/sherloconan/1206247
\[\begin{equation} \tag{1} Y_i\mid x_i\sim\text{Bernoulli}(p_i)=p_i^{y}\cdot(1-p_i)^{1-y}\quad\text{for }y\in\{0,1\}\text{ and }i=1,\dotsb,n \end{equation}\]
\(\mathbb{E}[Y_i\mid x_i]=p_i\) and the most commonly used link function is the log-odds \(\eta_i=\text{logit}(p_i)\equiv\ln\left(\frac{p_i}{1-p_i}\right)\) for \(p_i\in(0,1)\).
\[\begin{align} \tag{2} \mathcal{M}_1:\ \eta_i&=\beta_0+\beta_1 x_i\\ \text{versus}\quad\mathcal{M}_0:\ \eta_i&=\beta_0 \end{align}\]
Recall that the logistic sigmoid function is invertible, and its inverse is the logit function.
Thus, \(p_i=\frac{\exp\{\eta_i\}}{\exp\{\eta_i\}+1}=\frac{1}{1+\exp\{-\beta_0-\beta_1 x_i\}}\), which maps any real-valued number into the range \((0,1)\).
Other link functions that can be used include the probit link for
modeling normal latent variables
and the complementary log-log link
when the probability \(p\) is close to
1 or 0.
Generalized Linear Model (GLM)
\[\begin{align} Y_i\sim f(y_i;\boldsymbol{\theta}_i)&=\exp\left\{a(y_i)\cdot b(\boldsymbol{\theta}_i)+c(\boldsymbol{\theta}_i)+d(y_i)\right\} \\ a(y_i)&=y_i\quad\text{canonical form}\implies\text{exponential family} \\ b(\boldsymbol{\theta}_i)&\quad\text{natural parameter(s)} \end{align}\]
\[\begin{equation} \boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta} \end{equation}\]
\[\begin{equation} \eta_i=g\left(\mu_i(\boldsymbol{\theta}_i)\right),\ \text{where }\mu_i(\boldsymbol{\theta}_i)=\mathbb{E}[Y_i\mid\boldsymbol{x}_i] \end{equation}\]
\(Y\sim\operatorname{Pois}(\lambda),\hspace{2.1em}\mathbb{E}[Y]=\text{Var}(Y),\hspace{2em}\text{equidispersion}\)
\(Y\sim\textit{B}(n,p),\hspace{2.25em}\mathbb{E}[Y]>\text{Var}(Y),\hspace{2.02em}\text{underdispersion}\)
\(Y\sim\textit{NB}(r,p),\hspace{1.7em}\mathbb{E}[Y]<\text{Var}(Y),\hspace{2em}\text{overdispersion}\)
GLM 101
Why is there no error term in Eq. 2? See Wooldridge (2010, p. 565-567), “Econometric Bible”.
Using the “BFpack” Package https://github.com/jomulder/BFpack
Usage rank 5073 (“BFpack”) and rank 818 (“BayesFactor”)
Exploratory hypothesis testing, e.g., \(\theta=0\) versus \(\theta<0\) versus \(\theta>0\)
Confirmatory hypothesis testing, e.g., \(\theta_1=\theta_2=\theta_3\) versus \(\theta_1<\theta_2<\theta_3\)
Slightly different estimates. For a continuous dependent variable and binary independent variables, the Bayesian (1) two-sample \(t\)-test, (2) one-way analysis of variance (ANOVA) with two conditions, and (3) simple linear regression should yield the same Bayes factor estimates. However, this feature is not implemented in “BFpack”.
Same methods. The Bayes factor estimate in “BFpack” appears to be the same as the Jeffreys-style approximate Bayes factor (JAB) using the unit-information prior when conducting (4) logistic regression.
Toy Dataset ?mtcars
mpg - [continuous] miles per gallon
mpg_binary - [binary] (0 = mpg <= 20, 1 = mpg > 20)
am - [binary] transmission (0 = automatic, 1 = manual)
wt - [continuous] weight (1000 lbs)
32 observations
The fractional Bayes factor (FBF; O’Hagan, 1995)
The prior adjusted fractional Bayes factor (PAFBF; Mulder, 2014; Mulder & Gu, 2022) by default
test_t <- bain::t_test(mpg ~ am, data=mtcars, var.equal=T) # two-sample t-test
test_aov <- aov(mpg ~ am, mtcars) # one-way ANOVA with two conditions
test_lm <- lm(mpg ~ am, mtcars) # simple linear regression
matrix(
c(BFpack::BF(test_t, BF.type="FBF")$BFtu_exploratory[1],
BFpack::BF(test_t, BF.type="AFBF")$BFtu_exploratory[1],
BFpack::BF(test_aov, BF.type="FBF")$BFtu_exploratory["am", "BF0u"],
BFpack::BF(test_aov, BF.type="AFBF")$BFtu_exploratory["am", "BF0u"],
BFpack::BF(test_lm, BF.type="FBF")$BFtu_exploratory["am", "BF0u"],
BFpack::BF(test_lm, BF.type="AFBF")$BFtu_exploratory["am", "BF0u"]), # BF₀₁
nrow=2, dimnames=list(c("FBF", "PAFBF"),
c("t-Test", "ANOVA", "Regression")))
## t-Test ANOVA Regression
## FBF 0.010657874 0.01058254 0.01058254
## PAFBF 0.006941813 0.00677495 0.00677495
# cf. ttestBF
1 / c("γ=medium"=as.data.frame(BayesFactor::ttestBF(formula=mpg~am, data=mtcars, rscale="medium"))$bf, # default
"γ=wide"=as.data.frame(BayesFactor::ttestBF(formula=mpg~am, data=mtcars, rscale="wide"))$bf,
"γ=ultrawide"=as.data.frame(BayesFactor::ttestBF(formula=mpg~am, data=mtcars, rscale="ultrawide"))$bf)
## γ=medium γ=wide γ=ultrawide
## 0.011548714 0.010247026 0.009959575
\(\mathbb{P}(\mathcal{H}_0)=\mathbb{P}(\mathcal{H}_-)=\mathbb{P}(\mathcal{H}_+)=\frac{1}{3}\)
\(\textit{BF}_{01}:=\frac{\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})\ /\ \left(1-\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})\right)}{\mathbb{P}(\mathcal{H}_0)\ /\ \left(1-\mathbb{P}(\mathcal{H}_0)\right)}=2\cdot\frac{\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})}{1-\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})}\)
\(\mathbb{P}(\mathcal{H}_0\mid\boldsymbol{y})+\mathbb{P}(\mathcal{H}_-\mid\boldsymbol{y})+\mathbb{P}(\mathcal{H}_+\mid\boldsymbol{y})=1\)
The FBF (O’Hagan, 1995)
The PAFBF (Mulder, 2014; Mulder & Gu, 2022)
fit_lm <- lm(mpg ~ wt, mtcars) # simple linear regression
N <- sum(summary(fit_lm)$df[1:2]) # number of observations, 32
pVal <- summary(fit_lm)$coefficients["wt", "Pr(>|t|)"]
tVal <- summary(fit_lm)$coefficients["wt", "t value"] # qt(pVal/2, N-2) # same
c("BFpack"=BFpack::BF(fit_lm)$BFtu_exploratory["wt", "BF0u"], # prior adjusted fractional Bayes factor
"JAB_UI_t"=sqrt(N) * exp(-0.5 * tVal * tVal),
"JAB_UI_p"=sqrt(N) * exp(-0.5 * qchisq(1-pVal, 1)),
"JAB_t"=sqrt(N*pi/2) * exp(-0.5 * tVal * tVal),
"JAB_p"=sqrt(N*pi/2) * exp(-0.5 * qchisq(1-pVal, 1))) # BF₀₁
## BFpack JAB_UI_t JAB_UI_p JAB_t JAB_p
## 1.074903e-08 8.140953e-20 6.033415e-09 1.020317e-19 7.561764e-09
The approximated adjusted default Bayes factor (Gu et al., 2018)
mtcars$mpg_binary <- ifelse(mtcars$mpg > 20, 1, 0)
fit_glm <- glm(mpg_binary ~ wt, mtcars, family="binomial") # simple logistic regression
pVal2 <- summary(fit_glm)$coefficients["wt", "Pr(>|z|)"]
zVal <- summary(fit_glm)$coefficients["wt", "z value"] # qnorm(pVal2/2) # same
c("BFpack"=BFpack::BF(fit_glm)$BFtu_exploratory["wt", "H(=0)"], # approximated adjusted default Bayes factor
"JAB_UI_z"=sqrt(N) * exp(-0.5 * zVal * zVal),
"JAB_UI_p"=sqrt(N) * exp(-0.5 * qchisq(1-pVal2, 1)),
"JAB_z"=sqrt(N*pi/2) * exp(-0.5 * zVal * zVal),
"JAB_p"=sqrt(N*pi/2) * exp(-0.5 * qchisq(1-pVal2, 1))) # BF₀₁
## BFpack JAB_UI_z JAB_UI_p JAB_z JAB_p
## 0.2655032 0.2655032 0.2655032 0.3327589 0.3327589
Approximating Bayes Factors from Bayesian Information Criterion (BIC)
The BIC for model \(i\) with the number of free parameters \(k_i\) and the sample size \(n\) is \(\text{BIC}(\mathcal{M}_i)=-2\ln{\mathcal{L}_i}+k_i\ln{n}\quad\) for \(n\gg k_i\),
where \(\mathcal{L}_i=p(\boldsymbol{y}\mid\hat{\boldsymbol{\theta}}_i,\mathcal{M}_i)\) is the maximum likelihood.
Use the second-order Taylor approximation of log-marginal likelihood of each model.
\[\begin{equation} \tag{8} \mathit{BF}_{10}\approx\exp\left\{\frac{\text{BIC}(\mathcal{M}_0)-\text{BIC}(\mathcal{M}_1)}{2}\right\} \end{equation}\]
Derivation in Wagenmakers (2007, Appendix B).
?BIC
The test statistic in a Wald test is \(W=\left(\frac{\hat{\theta}-\theta_0}{\text{se}(\hat{\theta})}\right)^2\sim\chi_1^2\) under \(\mathcal{H}_0\), where \(\hat{\theta}\) is the maximum likelihood estimate, and \(\text{se}(\hat{\theta})\) is the standard error.
Wagenmakers (2022; its Eq. 6) reviewed the JAB and proposed a piecewise approximation (WAB) from \(p\)-value and effective sample size \(N\) in Eq. 10.
The complication with JAB is that the approximation is valid only when \(\text{se}(\hat{\theta})\ll\sigma_g\), where \(\sigma_g\) is the scale of the prior distribution \(g(\theta)\). Note that with very small sample sizes, this assumption is not true and Eq. 9 is biased against \(\mathcal{H}_0\).
\[\begin{equation} \tag{9} \textit{JAB}_{01}=A\cdot\sqrt{N}\exp\left\{-\frac{1}{2}W\right\} \end{equation}\]
A normal unit-information prior for \(\theta\) yields \(A=1\). Based on the Jeffreys prior, the general approximation specifies \(A=\sqrt{\pi/2}\approx1.253\).
Wagenmakers’s piecewise approximation is
\[ \textit{WAB}_{01}\approx \begin{cases} \tag{10} \text{$\ p^{\frac{1}{4}}\sqrt{N}$,} & \text{$p>0.5$} \\ \\ \text{$\ \sqrt{pN}$,} & \text{$0.1<p\leqslant0.5$ (simpler)} \\ \\ \text{$\ \frac{4}{3}p^{\frac{2}{3}}\sqrt{N}$,} & \text{$0.1<p\leqslant0.5$ (more precise)} \\ \\ \text{$\ 3p\sqrt{N}$,} & \text{$p\leqslant0.1$} \end{cases} \]
First, install the JAGS (just another Gibbs sampler) program http://mcmc-jags.sourceforge.net/
Note dnorm(mean, precision) and
dgamma(shape, rate) in JAGS
Next, we implement the priors as considered in Gelman et al. (2008).
Scale \(\boldsymbol{x}\) to have mean 0 and standard deviation 0.5
\(\beta_0\sim\text{Cauchy}(0,\ 10)\)
\(\beta_1\sim\text{Cauchy}(0,\ 2.5)\)
The Savage–Dickey density ratio is a special form of
the Bayes factor for nested models by dividing the
value of
the posterior density over the
parameters for the alternative model evaluated at the hypothesized
value by
the prior for the same
model evaluated at the same point.
\[\begin{equation} \tag{11} \textit{BF}_{01}:=\frac{p(\boldsymbol{y}\mid \mathcal{M}_0)}{p(\boldsymbol{y}\mid \mathcal{M}_1)}=\frac{\color{#0055A4}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid\boldsymbol{y},\mathcal{M}_1)}}{\color{#EF4135}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid \mathcal{M}_1)}} \end{equation}\]
nObs <- 30 # number of observations
beta0 <- -1 # true intercept
beta1 <- 1 # true slope
set.seed(277)
x <- rnorm(nObs) # predictor variable
p <- 1 / (1 + exp(-beta0 - beta1 * x)) # the probability of y = 1
y <- rbinom(nObs, 1, p) # outcome variable
x_scale <- c(scale(x)) * 0.5 # scale to have mean 0 and standard deviation 0.5
dat <- data.frame(x_scale, y)
datalist <- list(y=y, x=x_scale, n=nObs,
m0=0, # "m0"=summary(fit)$coefficients[1,1] # center at MLE
m1=0) # "m1"=summary(fit)$coefficients[2,1]
fit <- glm(y ~ x_scale, dat, family="binomial")
summary(fit)
##
## Call:
## glm(formula = y ~ x_scale, family = "binomial", data = dat)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4817 0.4160 -1.158 0.2469
## x_scale 1.9656 0.9105 2.159 0.0309 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 40.381 on 29 degrees of freedom
## Residual deviance: 34.752 on 28 degrees of freedom
## AIC: 38.752
##
## Number of Fisher Scoring iterations: 4
model_JAGS <- "model {
for (i in 1:n) {
y[i] ~ dbern(p[i])
logit(p[i]) <- beta0 + beta1 * x[i]
}
beta0 ~ dt(m0, pow(10, -2), 1) # <== precision; no hyperpriors
beta1 ~ dt(m1, pow(2.5, -2), 1) # Cauchy (Gelman et al., 2008, p. 1366)
}"
fit_JAGS <- jags.model(textConnection(model_JAGS), data=datalist,
# inits=function() {list("beta0"=rnorm(1), "beta1"=rnorm(1))},
inits=list(list("beta0"=0, "beta1"=1),
list("beta0"=-0.48, "beta1"=1.97),
list("beta0"=1, "beta1"=0)),
n.chains=3, n.adapt=5000, quiet=T) # initialize the model
update(fit_JAGS, 5000, progress.bar="none") # run Markov chain to burn-in
samples_JAGS <- coda.samples(fit_JAGS, variable.names=c("beta0", "beta1"),
n.iter=10000, progress.bar="none") # generate posterior samples
beta1S1 <- samples_JAGS[[1]][,2]
beta1S2 <- samples_JAGS[[2]][,2]
beta1S3 <- samples_JAGS[[3]][,2] # posterior samples of β₁
dPostS1 <- dlogspline(0, logspline(beta1S1)) # estimated posterior density from the first chain
dPostS2 <- dlogspline(0, logspline(beta1S2)) # estimated posterior density from the second chain
dPostS3 <- dlogspline(0, logspline(beta1S3)) # estimated posterior density from the third chain
c(dPostS1, dPostS2, dPostS3) / dcauchy(0, datalist$m1, 2.5) # BF₀₁
## [1] 0.3124615 0.2878228 0.3024883
Please verify if this method is not performing well on the real
dataset mtcars.
Check the Monte Carlo error of estimating the Bayes factor
num <- 100; SD <- numeric(num)
for (i in 1:num) {
fit_JAGS <- jags.model(textConnection(model_JAGS), data=datalist,
n.chains=1, n.adapt=5000, quiet=T) # initialize the model
update(fit_JAGS, 5000, progress.bar="none") # run Markov chain to burn-in
samples_JAGS <- coda.samples(fit_JAGS, variable.names="beta1",
n.iter=10000, progress.bar="none") # generate posterior samples
beta1S <- samples_JAGS[[1]][,"beta1"]
dPostS <- dlogspline(0, logspline(beta1S)) # estimated posterior density from one chain
SD[i] <- dPostS / dcauchy(0, datalist$m1, 2.5)
}
hist(SD, prob=T, col="white", yaxt="n", ylab="", main="rjags + SD",
sub="100 runs of one chain with 10,000 iterations",
xlab="Bayes factor estimates supporting the null") # BF₀₁
lines(density(SD),col="red",lwd=2)
Note that to compute the (log) marginal likelihood for a Stan model,
we need to specify the model in a certain way. Instead of using “~”
signs for specifying distributions, we need to directly use the (log)
density functions. The reason for this is that when using the “~” sign,
constant terms are dropped which are not needed for sampling from the
posterior. However, for computing the marginal likelihood, these
constants need to be retained. For instance, instead of writing
y ~ normal(mu, sigma); we would need to write
target += normal_lpdf(y | mu, sigma); (Gronau,
2021).
Bridge Sampling
devtools::install_github("quentingronau/bridgesampling@master")
All terms are conditional on \(\mathcal{M}_i\).
\[\begin{align} \tag{12} \color{#EE1C25}{p(\boldsymbol{y})}=&\frac{\mathbb{E}_{g(\boldsymbol{\theta})}[\hspace{0.1em}p(\boldsymbol{y}\mid\boldsymbol{\theta})\cdot p(\boldsymbol{\theta})\cdot h(\boldsymbol{\theta})]}{\mathbb{E}_\text{post}[\hspace{0.1em}g(\boldsymbol{\theta})\cdot h(\boldsymbol{\theta})]} \\ \\ \approx&\frac{\frac{1}{N_2}\sum_{r=1}^{N_2}p(\boldsymbol{y}\mid\boldsymbol{\theta}_r^{\text{prop}})\cdot p(\boldsymbol{\theta}_r^{\text{prop}})\cdot h(\boldsymbol{\theta}_r^{\text{prop}})}{\frac{1}{N_1}\sum_{s=1}^{N_1}g(\boldsymbol{\theta}_s^\text{post})\cdot h(\boldsymbol{\theta}_s^\text{post})}, \\ \\ &\hspace{15em}\boldsymbol{\theta}_r^{\text{prop}}\sim g(\boldsymbol{\theta}), \\ &\hspace{15em}\boldsymbol{\theta}_s^\text{post}\sim p(\boldsymbol{\theta}\mid\boldsymbol{y}), \end{align}\]
where \(g(\boldsymbol{\theta})\) is the proposal distribution
and \(h(\boldsymbol{\theta})\) is the bridge function, \(h(\boldsymbol{\theta})=\frac{C}{s_1\cdot p(\boldsymbol{y}\mid\boldsymbol{\theta})\cdot p(\boldsymbol{\theta})+s_2\cdot \color{#EE1C25}{p(\boldsymbol{y})}\cdot g(\boldsymbol{\theta})}\) and \(s_j=\frac{N_j}{N_1+N_2}\) for \(j\in\{1,2\}\).
Check the use_neff argument in
bridgesampling::bridge_sampler.
stancodeM1 <- "
data {
int<lower=1> n; // number of observations
vector[n] x; // predictor variable
int<lower=0,upper=1> y[n]; // outcome variable
real m0; // center for beta0
real m1; // center for beta1
}
parameters {
real beta0; // intercept
real beta1; // slope
}
model {
target += cauchy_lpdf(beta0 | m0, 10);
target += cauchy_lpdf(beta1 | m1, 2.5); // <== scale
for (i in 1:n) {
target += bernoulli_logit_lpmf(y[i] | beta0 + beta1 * x[i]);
}
}"
stanmodelM1 <- stan_model(model_code=stancodeM1)
stanfitM1 <- sampling(stanmodelM1, data=datalist,
iter=15000, warmup=5000, chains=3, seed=277, refresh=0)
rstan::summary(stanfitM1)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
| mean | se_mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | n_eff | Rhat | |
|---|---|---|---|---|---|---|---|---|---|---|
| beta0 | -0.4950 | 0.0028 | 0.4197 | -1.3453 | -0.7707 | -0.4866 | -0.2094 | 0.3032 | 22213.30 | 0.9999 |
| beta1 | 1.8821 | 0.0063 | 0.8985 | 0.2598 | 1.2569 | 1.8314 | 2.4454 | 3.7682 | 20395.86 | 1.0001 |
| lp__ | -24.3457 | 0.0094 | 1.0415 | -27.1401 | -24.7364 | -24.0238 | -23.6118 | -23.3355 | 12269.44 | 1.0000 |
stancodeM0 <- "
data {
int<lower=1> n; // number of observations
int<lower=0,upper=1> y[n]; // outcome variable
real m0; // center for beta0
}
parameters {
real beta0; // intercept
}
model {
target += cauchy_lpdf(beta0 | m0, 10);
for (i in 1:n) {
target += bernoulli_logit_lpmf(y[i] | beta0);
}
}"
stanmodelM0 <- stan_model(model_code=stancodeM0)
stanfitM0 <- sampling(stanmodelM0, data=datalist[-c(2, 5)],
iter=15000, warmup=5000, chains=3, seed=277, refresh=0)
rstan::summary(stanfitM0)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
| mean | se_mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | n_eff | Rhat | |
|---|---|---|---|---|---|---|---|---|---|---|
| beta0 | -0.4167 | 0.0036 | 0.3793 | -1.1727 | -0.6672 | -0.4093 | -0.1636 | 0.3134 | 11194.66 | 1.0004 |
| lp__ | -24.1490 | 0.0063 | 0.7288 | -26.2092 | -24.3111 | -23.8672 | -23.6889 | -23.6398 | 13273.91 | 1.0003 |
M1 <- bridge_sampler(stanfitM1, silent=T)
M0 <- bridge_sampler(stanfitM0, silent=T) # <== Warning
## Warning: effective sample size cannot be calculated, has been replaced by
## number of samples.
bf(M0, M1) # BF₀₁ via bridge sampling, exp(M0$logml - M1$logml)
## Estimated Bayes factor in favor of M0 over M1: 0.30369
summary(M0)
##
## Bridge sampling log marginal likelihood estimate
## (method = "normal", repetitions = 1):
##
## -23.6997
##
## Error Measures:
##
## Relative Mean-Squared Error: 2.14592e-08
## Coefficient of Variation: 0.0001464896
## Percentage Error: 0%
##
## Note:
## All error measures are approximate.
summary(M1)
##
## Bridge sampling log marginal likelihood estimate
## (method = "normal", repetitions = 1):
##
## -22.50794
##
## Error Measures:
##
## Relative Mean-Squared Error: 4.483659e-07
## Coefficient of Variation: 0.0006696013
## Percentage Error: 0%
##
## Note:
## All error measures are approximate.
Check the Monte Carlo error of estimating the Bayes factor
BFBS <- function(seed=277, diagnostics=F) {
stanfitM0 <- sampling(stanmodelM0, data=datalist[-c(2, 5)],
iter=15000, warmup=5000, chains=1, seed=seed, refresh=0)
stanfitM1 <- sampling(stanmodelM1, data=datalist,
iter=15000, warmup=5000, chains=1, seed=seed, refresh=0)
beta1S <- extract(stanfitM1, pars="beta1")$beta1
dPostS <- dlogspline(0, logspline(beta1S)) # estimated posterior density from one chain
SDdr <- dPostS / dcauchy(0, datalist$m1, 2.5) # Savage–Dickey density ratio (BF₀₁)
set.seed(seed)
M0 <- bridge_sampler(stanfitM0, silent=T)
set.seed(seed)
M1 <- bridge_sampler(stanfitM1, silent=T)
if (diagnostics) {
list("bf"=exp(M0$logml - M1$logml),
"pererr.M1"=error_measures(M1)$cv,
"pererr.M0"=error_measures(M0)$cv,
"stan.M1"=stanfitM1, "stan.M0"=stanfitM0,
"savage.dickey"=SDdr)
} else {
list("bf"=exp(M0$logml - M1$logml),
"pererr.M1"=error_measures(M1)$cv,
"pererr.M0"=error_measures(M0)$cv,
"savage.dickey"=SDdr)
}
}
num <- 100; BF <- PerErr.M1 <- PerErr.M0 <- SDdr <- numeric(num)
system.time(
for (i in 1:num) { # roughly 139 seconds of run time
result <- BFBS(seed=i)
BF[i] <- result$bf
PerErr.M1[i] <- result$pererr.M1
PerErr.M0[i] <- result$pererr.M0
SDdr[i] <- result$savage.dickey
})
# range(PerErr.M1); range(PerErr.M0)
par(mfrow=c(1,2))
hist(BF, prob=T, col="white", yaxt="n", ylab="", main="rstan + BS",
sub="100 runs of one chain with 10,000 iterations",
xlab="Bayes Factor Estimates Supporting the Null") # BF₀₁: rstan + BS
lines(density(BF), col="blue", lwd=2)
hist(SDdr, prob=T, col="white", yaxt="n", ylab="", main="rstan + SD",
sub="100 runs of one chain with 10,000 iterations",
xlab="Savage–Dickey Density Ratio Supporting the Null") # BF₀₁: rstan + SD
lines(density(SDdr), col="red", lwd=2)
As \(x\) increases \(\Delta x\) unit, the estimate of odds ratio is \(\widehat{\textit{OR}}:=\frac{\hat{p}_2}{1-\hat{p}_2}\ /\ \frac{\hat{p}_1}{1-\hat{p}_1}=e^{\hat{\beta}_1\cdot\Delta x}\).
\(\widehat{\textit{OR}}>1\): The event is more likely to occur as \(x\) increases. The predictor is associated with higher odds of the outcome.
\(\widehat{\textit{OR}}<1\): The event is less likely to occur as \(x\) increases. The predictor is associated with lower odds of the outcome.
\(\widehat{\textit{OR}}=1\): No association between the predictor and the outcome.
The further \(\widehat{\textit{OR}}\) is from 1, the stronger the effect. For example, \(\widehat{\textit{OR}}=2\) indicates that the odds of the outcome double for each unit increase in the predictor.
How big is a big odds ratio? (Chen, Cohen, & Chen, 2010; Chinn, 2000)
Warning Messages
set.seed(25)
x <- rnorm(10) # predictor variable of insufficient sample size 10
p <- 1 / (1 + exp(0.5 - 0.4 * x)) # the probability of y = 1
(y <- rbinom(10, 1, p))
## [1] 0 0 0 0 0 0 1 1 0 0
summary(glm(y ~ x, family=binomial)) # WARNINGS
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Call:
## glm(formula = y ~ x, family = binomial)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -93.53 80937.33 -0.001 0.999
## x 225.02 197019.25 0.001 0.999
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.0008e+01 on 9 degrees of freedom
## Residual deviance: 2.1701e-09 on 8 degrees of freedom
## AIC: 4
##
## Number of Fisher Scoring iterations: 25
Data
simData3 <- function(n, beta0=NULL, beta1=NULL, sd.X=1) {
#' Input -
#' n : number of observations
#' beta0: true intercept
#' beta1: true coefficient for X
#' sd.X: true standard deviation of X
#'
#' Output - binary outcome and predictor variables, y and x
repeat { # loop until the glm() function executes without warnings
# try-catch block to handle the function execution and capture warnings
warnings <- tryCatch({
x <- stats::rnorm(n, 0, sd.X) # predictor variable
p <- 1 / (1 + exp(-beta0 - beta1 * x)) # the probability of y = 1
y <- stats::rbinom(n, 1, p) # outcome variable
fit1 <- stats::glm(y ~ x, family=binomial) # full model fit
NULL # no warnings, return NULL
}, warning = function(w) {
w # return the warning message
})
# break the loop if there were no warnings
if (is.null(warnings)) {
break
}
}
data.frame("Outcome"=y, "Predictor"=x)
}
The Bayes Factors
Note: All the Bayes factors are \(\textit{BF}_{01}\).
stancode2M1 <- "
data {
int<lower=1> n; // number of observations
vector[n] x; // predictor variable
int<lower=0,upper=1> y[n]; // outcome variable
int<lower=0,upper=1> UI; // prior option (1=unit information, 0=Gelman et al. (2008))
vector[2] mu; // mean vector for theta if UI==1
matrix[2,2] Sigma; // covariance matrix for theta if UI==1
}
parameters {
vector[2] theta; // combined vector for intercept and slope
}
transformed parameters {
real alpha = theta[1]; // intercept
real beta = theta[2]; // slope
}
model {
// Priors (JAGS does not support direct if-else statements)
if (UI == 1) {
theta ~ multi_normal(mu, Sigma); // unit information
} else {
alpha ~ cauchy(0, 10);
beta ~ cauchy(0, 2.5); // <== scale (Gelman et al., 2008)
}
// Likelihood
for (i in 1:n) {
target += bernoulli_logit_lpmf(y[i] | alpha + beta * x[i]);
}
}"
stanmodel2M1 <- rstan::stan_model(model_code=stancode2M1)
computeBFs3 <- function(data, prior=1, precise=F) {
#' Input -
#' data: long format data whose columns must be `Outcome` and `Predictor`
#' prior: prior option (1=unit information, 0=Gelman et al. (2008))
#' precise: logical; if FALSE (default), use the simpler form for Output 5;
#' otherwise, use the more precise form
#'
#' Global - stanmodel2M1
#' Dependency - rstan (v2.32.6), polspline (v1.1.25)
#'
#' Output - a vector of the Bayes factors in favor of the null
#' 1. Savage–Dickey density ratio (Eq. 11) using the normal unit-information prior
#' 2. BIC approximation to the Bayes factor in Eq. 8; BIC(fit)
#' 3. Jeffreys approximate Bayes factor in Eq. 9 using the normal unit-information prior
#' 4. Jeffreys approximate Bayes factor in Eq. 9 using the Jeffreys prior
#' 5. Wagenmakers approximate Bayes factor based on the p-value and sample size in Eq. 10
data <- stats::na.omit(data)
N <- nrow(data) # number of observations
fit0 <- stats::glm(Outcome ~ 1, data, family=binomial) # null model fit
fit1 <- stats::glm(Outcome ~ Predictor, data, family=binomial) # full model fit
pVal <- summary(fit1)$coefficients["Predictor", "Pr(>|z|)"] # p-value of the regression coefficient
W <- stats::qchisq(pVal, 1, lower.tail=F) # Wald statistic, ["Predictor", "z value"]^2
datalist <- list(n=N, x=data$Predictor, y=data$Outcome, UI=prior,
mu=stats::coef(fit1), Sigma=stats::vcov(fit1)*N)
if(prior==0) datalist$x <- c(scale(datalist$x)) * 0.5 # scale to have mean 0 and standard deviation 0.5
stanfitM1 <- rstan::sampling(stanmodel2M1, data=datalist,
iter=15000, warmup=5000, chains=2, seed=277, refresh=0)
betaS <- rstan::extract(stanfitM1, pars="beta")$beta
dPost <- polspline::dlogspline(0, polspline::logspline(betaS)) # estimated posterior density
dPrior <- ifelse(datalist$UI==1,
stats::dnorm(0, datalist$mu[2], sqrt(datalist$Sigma[2,2])), stats::dcauchy(0, 0, 2.5))
SDdr01 <- dPost / dPrior # Savage–Dickey density ratio
BICB01 <- exp((stats::BIC(fit1) - stats::BIC(fit0)) / 2) # Eq. 8
JAB01_UI <- sqrt(N) * exp(-0.5 * W) # Eq. 9, unit-information prior
WAB01 <- ifelse(pVal <= .5,
ifelse(pVal > .1,
ifelse(precise,
4 * pVal^(2/3) * sqrt(N) / 3, # 0.1 < p <= 0.5 (more precise)
sqrt(pVal * N)), # 0.1 < p <= 0.5 (simpler)
3 * pVal * sqrt(N)), # p <= 0.1
pVal^(1/4) * sqrt(N)) # p > 0.5, Eq. 10
c("SD"=SDdr01, "BIC_approx"=BICB01, "JAB"=JAB01_UI, "JAB_J"=sqrt(pi/2) * JAB01_UI, "WAB"=WAB01)
}
# set.seed(277)
# test3 <- simData3(30, -0.5, 1.5)
# computeBFs3(test3) # BFpack and JAB methods seem to be the same
colnames(dat) <- c("Predictor", "Outcome")
computeBFs3(dat)
## SD BIC_approx JAB JAB_J WAB
## 0.2898725 0.3282977 0.5326438 0.6675700 0.5070023
Simulation Study
We aim to conduct several simulation studies to evaluate the accuracy of approximate objective Bayes factors in their ability to convert \(p\)-values and \(n\) reported in scientific articles to Bayesian measures of evidence.
nSim <- 1000 # number of simulation runs for each setting
nObs <- c(10, 20, 30, 100, 500) # number of observations
BETA1 <- c("Null"=0, "Small"=0.4, "Medium"=1.1, "Large"=1.6) # regression coefficient (slope)
# exp(BETA1) # odds ratio
reportBF <- reportBF_avg <- reportBF_sd <- reportERR <- reportERR_avg <- reportERR_sd <-
reportRULE <- reportRULE_H1 <- reportRULE_H0 <- reportRULE_inc <-
setNames(vector(mode="list", length=length(nObs) * length(BETA1)), # pre-allocation
apply(expand.grid(BETA1, nObs), 1, function(r) paste0("n = ", r[2], ", slope = ", r[1])))
index <- 1 # initial list index
for (n in nObs) {
for (slope in BETA1) {
set.seed(n+slope+277)
temp <- t(replicate(nSim, computeBFs3(simData3(n, beta0=-0.5, beta1=slope)))) # replicate the Bayes factors
reportBF[[index]] <- temp
reportBF_avg[[index]] <- colMeans(temp, na.rm=T) # mean Bayes factors in favor of the null
reportBF_sd[[index]] <- apply(na.omit(temp), 2, sd) # sample standard deviations of them
tempERR <- 100* (temp - temp[,1]) / temp[,1] # percent errors; the first column should be all 0
reportERR[[index]] <- tempERR # (can take in NA)
reportERR_avg[[index]] <- colMeans(tempERR, na.rm=T) # mean percent errors
reportERR_sd[[index]] <- apply(na.omit(tempERR), 2, sd) # sample standard deviations of the percent errors
tempRULE_H1 <- temp < 1/3 # support H1
tempRULE_H0 <- temp > 3 # support H0
tempRULE_inc <- temp >= 1/3 & temp <= 3 # inconclusive
# counts of matching decisions for H1
reportRULE_H1[[index]] <- colSums(tempRULE_H1 * tempRULE_H1[,1], na.rm=T)
# counts of matching decisions for H0
reportRULE_H0[[index]] <- colSums(tempRULE_H0 * tempRULE_H0[,1], na.rm=T)
# counts of matching inconclusiveness
reportRULE_inc[[index]] <- colSums(tempRULE_inc * tempRULE_inc[,1], na.rm=T)
tempRULE <- 1*(temp < 1/3) - 1*(temp > 3) # 1: support H1; 0: inconclusive; -1: support H0
reportRULE[[index]] <- colSums(tempRULE[,1] == tempRULE, na.rm=T) # counts of matching decisions
# colMeans(); a value near 1 suggests matching results; the first column should be all 1
index <- index + 1
}
} # roughly 13.4 hours of run time
# There were 50 or more warnings:
# 1: There were 17061 divergent transitions after warmup. See
# https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
# to find out why this is a problem and how to eliminate them.
# 2: Examine the pairs() plot to diagnose sampling problems
# ...
reshapeW2L2 <- function(list, prob=T, lab=c("SD", "BIC", "JAB", "JAB_J", "WAB")) {
#' Input -
#' list: a list of size length(nObs) * length(BETA1);
#' reportERR (unaggregated),
#' reportRULE, reportRULE_H1, reportRULE_H0, or reportRULE_inc (aggregated)
#' prob: logical; if TRUE (default), return the proportions;
#' otherwise, return the original values
#' lab: a vector of characters representing the Bayes factor methods
#'
#' Global -
#' nSim: number of simulation runs for each setting
#' nObs: number of observations
#' BETA1: regression coefficient (slope)
#'
#' Output - unlist the report and reshape the wide into the long
len <- length(lab) # five methods
nRep <- ifelse(!prob, nSim, 1) # nSim if unaggregated; 1 if aggregated
long <- data.frame("value"=unname(unlist(list)),
"method"=factor(rep(rep(1:len, each=nRep), length(list)), labels=lab),
"n"=rep(nObs, each=nRep*len*length(BETA1)),
"slope"=rep(rep(unname(BETA1), each=nRep*len), length(nObs)))
if (prob) {
list2 <- lapply(list, function(vec) {
if (vec[1] == 0) {
vec * NA
} else {
vec / vec[1] # convert counts to proportions
}
})
list3 <- lapply(list, function(vec) rep(vec[1], length(vec))) # denominator
long$total <- unname(unlist(list3))
long$prop <- unname(unlist(list2))
}
long
}
formatting <- function(val) {
#' format the value using scientific notation
if (abs(val) >= 10) {
paste("~italic(BF)[\"01\"] %~~%",
gsub("e\\+0*(\\d+)", "%*%10^\\1", sprintf("%.2e", val)))
} else if (abs(val) < 1) {
paste("~italic(BF)[\"01\"] %~~%",
gsub("e-0*(\\d+)", "%*%10^{-\\1}", sprintf("%.2e", val)))
} else {
paste("~italic(BF)[\"01\"] %~~%", sprintf("%.2f", val))
}
}
Visualization
Note: The Jeffreys approximate Bayes factor (JAB_J) in
Eq. 9, using the Jeffreys prior, will not be plotted.
# percent errors
df3 <- reshapeW2L2(reportERR, prob=F)
index <- 0
# boxplot of the percent errors
for (beta in BETA1) {
baseBF01 <- sapply(reportBF_avg[seq(1, by=length(BETA1), length.out=length(nObs)) + index],
function(x) x[1]) # baseline is the Savage–Dickey density ratio
baseBF01_lab <- sapply(baseBF01, formatting) # scientific notation
index <- index + 1
sub <- subset(df3, !(method %in% c("SD", "JAB_J")) & slope == beta) # subset the long
anno <- data.frame("label"=baseBF01_lab, "n"=unique(sub$n),
"x"=2, "y"=.8 * aggregate(value~n, sub, FUN=max)$value) # for annotation use
print(ggplot(sub, aes(x=method, y=value)) +
geom_boxplot(alpha=0.3, na.rm=T) +
# ylim(quantile(sub$value, probs=c(.01, .99))) + # restricted <== CAUTION
geom_text(data=anno, aes(label=label, x=x, y=y), parse=T, col="red") +
# facet_grid(.~n, labeller=label_bquote(cols=italic(n)==.(n))) + # same scale on y-axis
facet_wrap(.~n, scales="free_y", nrow=1,
labeller=label_bquote(cols=italic(n)==.(n))) +
labs(x="\nBayes Factor Methods", y="Percent Error (%)\n") +
ggtitle(paste0(names(BETA1[BETA1==beta]), " Odds Ratio")) +
geom_hline(yintercept=0, linetype="dashed", color="gray") + # reference line 0%
theme_classic() +
theme(axis.text.x=element_text(angle=90)))
cat("<br><br><br><br><br>")
}
df_BF <- reshapeW2L2(reportBF, prob=F)
# boxplot of the BF₀₁
for (beta in BETA1) {
sub <- subset(df_BF, slope == beta & method != "JAB_J") # subset the long
print(ggplot(sub, aes(x=method, y=value)) +
geom_boxplot(alpha=0.3, na.rm=T) +
facet_wrap(.~n, scales="free_y", nrow=1,
labeller=label_bquote(cols=italic(n)==.(n))) +
labs(x="\nBayes Factor Methods", y=expression(italic("BF")["01"])) +
ggtitle(paste0(names(BETA1[BETA1==beta]), " Odds Ratio")) +
theme_classic() +
theme(axis.text.x=element_text(angle=90)))
cat("<br><br><br><br><br>")
}
# decisions
decis3 <- rbind(reshapeW2L2(reportRULE),
reshapeW2L2(reportRULE_H1),
reshapeW2L2(reportRULE_H0),
reshapeW2L2(reportRULE_inc))
decis3$result <- factor(rep(c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"),
each=length(levels(df3$method)) * length(nObs) * length(BETA1)),
levels=c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"))
decis3$delta <- paste0("Slope = ", decis3$slope)
decis3 <- decis3[!is.na(decis3[,"prop"]),]
if (F) {
# compute the pointwise confidence intervals using the normal approximation
decis3$half.len <- qnorm(1-.05/2) *
sqrt(decis3$value * (decis3$total - decis3$value) / decis3$total^3) # 95% CI width
decis3$lwr <- max(0, decis3$value / decis3$total - decis3$half.len) # 95% CI lower bound
decis3$upr <- min(1, decis3$value / decis3$total + decis3$half.len) # 95% CI upper bound
} else { # Jeffreys intervals
decis3$lwr <- qbeta(.05/2, decis3$value, decis3$total - decis3$value + 1) # 95% CI lower bound
decis3$upr <- qbeta(1-.05/2, decis3$value + 1, decis3$total - decis3$value) # 95% CI upper bound
decis3$half.len <- (decis3$upr - decis3$lwr) / 2 # 95% CI width
}
# line chart of the proportions of agreement
ggplotly(ggplot(decis3, aes(n, prop, color=method, label=value)) +
geom_line(linewidth=1.2, na.rm=T) + geom_point(size=2, na.rm=T) +
geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=.2) +
scale_color_manual(values=c("black", "#E69F00", "#CC79A7", "#D55E00", "#0072B2"),
labels=c("SD", "BIC", "JAB", "JAB_J", "WAB")) +
facet_grid(result~delta) +
labs(x="Number of Observations", y="Proportion of Agreement", color="Method") +
scale_x_continuous(breaks=c(10, 100, 500)) +
scale_y_continuous(breaks=c(0, 0.5, 1)) + # min(decis3$prop, na.rm=T)
theme_minimal()) # interactive plots
Debugging
(mat <- matrix(rep(c(0, -1), 5), nrow=2))
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 0
## [2,] -1 -1 -1 -1 -1
(res <- mat[mat[,1] == -1, 1] == mat[mat[,1] == -1,]) # only one row ==> column vector
## [1] TRUE TRUE TRUE TRUE TRUE
colSums(res) # ERROR!
## Error in base::colSums(x, na.rm = na.rm, dims = dims, ...): 'x' must be an array of at least two dimensions
colSums(rbind(rep(F, 5), res)) # counts of matching decisions
## [1] 1 1 1 1 1
colSums((mat == -1) * (mat == -1)[,1]) # counts of matching decisions
## [1] 1 1 1 1 1
The counts of matching Bayes factors between the gold standard and approximate methods.
reportRULE
## $`n = 10, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 1000 858 805 515 847
##
## $`n = 10, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 1000 862 790 540 832
##
## $`n = 10, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 1000 906 687 546 697
##
## $`n = 10, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 1000 887 545 456 549
##
## $`n = 20, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 1000 960 919 807 923
##
## $`n = 20, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 1000 966 856 742 877
##
## $`n = 20, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 1000 963 598 505 605
##
## $`n = 20, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 1000 970 369 297 406
##
## $`n = 30, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 1000 974 948 858 967
##
## $`n = 30, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 1000 973 892 808 916
##
## $`n = 30, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 1000 970 728 614 771
##
## $`n = 30, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 1000 979 787 676 827
##
## $`n = 100, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 1000 993 983 947 964
##
## $`n = 100, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 1000 985 945 867 922
##
## $`n = 100, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 1000 998 985 976 990
##
## $`n = 100, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 1000 1000 1000 1000 1000
##
## $`n = 500, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 1000 998 995 984 996
##
## $`n = 500, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 1000 996 992 980 992
##
## $`n = 500, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 1000 1000 1000 1000 1000
##
## $`n = 500, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 1000 1000 1000 1000 1000
reportRULE_inc
## $`n = 10, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 820 702 700 410 744
##
## $`n = 10, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 806 691 690 439 734
##
## $`n = 10, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 696 647 646 505 658
##
## $`n = 10, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 545 522 522 433 527
##
## $`n = 20, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 380 353 343 232 380
##
## $`n = 20, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 486 463 446 334 486
##
## $`n = 20, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 461 458 455 396 461
##
## $`n = 20, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 274 272 268 253 274
##
## $`n = 30, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 276 258 249 163 275
##
## $`n = 30, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 406 394 382 305 404
##
## $`n = 30, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 327 324 319 293 327
##
## $`n = 30, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 136 134 135 123 136
##
## $`n = 100, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 118 113 105 71 84
##
## $`n = 100, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 402 396 386 334 350
##
## $`n = 100, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 17 16 17 17 17
##
## $`n = 100, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 0 0 0 0 0
##
## $`n = 500, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 45 43 42 32 41
##
## $`n = 500, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 70 70 70 67 62
##
## $`n = 500, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 0 0 0 0 0
##
## $`n = 500, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 0 0 0 0 0
reportRULE_H1
## $`n = 10, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 75 51 0 0 0
##
## $`n = 10, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 93 71 0 0 0
##
## $`n = 10, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 263 218 0 0 0
##
## $`n = 10, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 432 342 0 0 0
##
## $`n = 20, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 45 32 1 0 3
##
## $`n = 20, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 108 97 4 2 12
##
## $`n = 20, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 449 415 53 19 72
##
## $`n = 20, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 705 677 80 23 115
##
## $`n = 30, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 30 24 5 1 6
##
## $`n = 30, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 114 99 30 23 39
##
## $`n = 30, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 613 586 349 261 385
##
## $`n = 30, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 858 839 646 547 685
##
## $`n = 100, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 15 13 11 9 13
##
## $`n = 100, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 218 209 179 153 192
##
## $`n = 100, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 983 982 968 959 973
##
## $`n = 100, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 1000 1000 1000 1000 1000
##
## $`n = 500, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 5 5 3 2 5
##
## $`n = 500, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 920 916 912 903 920
##
## $`n = 500, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 1000 1000 1000 1000 1000
##
## $`n = 500, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 1000 1000 1000 1000 1000
reportRULE_H0
## $`n = 10, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 105 105 105 105 103
##
## $`n = 10, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 101 100 100 101 98
##
## $`n = 10, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 41 41 41 41 39
##
## $`n = 10, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 23 23 23 23 22
##
## $`n = 20, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 575 575 575 575 540
##
## $`n = 20, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 406 406 406 406 379
##
## $`n = 20, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 90 90 90 90 72
##
## $`n = 20, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 21 21 21 21 17
##
## $`n = 30, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 694 692 694 694 686
##
## $`n = 30, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 480 480 480 480 473
##
## $`n = 30, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 60 60 60 60 59
##
## $`n = 30, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 6 6 6 6 6
##
## $`n = 100, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 867 867 867 867 867
##
## $`n = 100, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 380 380 380 380 380
##
## $`n = 100, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 0 0 0 0 0
##
## $`n = 100, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 0 0 0 0 0
##
## $`n = 500, slope = 0`
## SD BIC_approx JAB JAB_J WAB
## 950 950 950 950 950
##
## $`n = 500, slope = 0.4`
## SD BIC_approx JAB JAB_J WAB
## 10 10 10 10 10
##
## $`n = 500, slope = 1.1`
## SD BIC_approx JAB JAB_J WAB
## 0 0 0 0 0
##
## $`n = 500, slope = 1.6`
## SD BIC_approx JAB JAB_J WAB
## 0 0 0 0 0
Centering the normal unit-information prior on the null hypothesized value \(\theta_0\), the customary choice in objective Bayesian testing yields the following JAB* (its Eq. 4; Wagenmakers, 2022).
\[\begin{equation} \textit{JAB}_{01}^*\approx\sqrt{N}\exp{\left\{-\frac{N-1}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]
Alternatively, the JAB in Eq. 9 centers the prior on the maximum likelihood estimate.
\[\begin{equation} \textit{JAB}_{01}\approx\sqrt{N}\exp{\left\{-\frac{N}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]
Using a power transformation, it can be shown that \(\textit{JAB}_{01}^*=\textit{JAB}_{01}^{\frac{N-1}{N}}\cdot N^{\frac{1}{2N}}\).
\(N=10\)
par(mfrow=c(1, length(BETA1)))
for (i in 1:length(BETA1)) {
reportBF[[i]][,4] <- reportBF[[i]][,3]^(9/10) * 10^(1/20) # number of observations is 10
boxplot(reportBF[[i]][,-2], names=c("SD", "JAB", "JAB*", "WAB"),
xlab="", ylab=expression(italic("BF")["01"]), main=paste0(names(BETA1)[i], " Slope"), las=2)
}
Simple Conditional Logistic Regression for a 1:1 Matched Case–Control Study
\[\begin{align} &\mathbb{P}(Y_{i1}=1\mid x_{ij},\ Y_{i1}+Y_{i2}=1) \\ =&\frac{\mathbb{P}(Y_{i1}=1,\ Y_{i2}=0\mid x_{ij})}{\mathbb{P}(Y_{i1}=1,\ Y_{i2}=0\mid x_{ij})+\mathbb{P}(Y_{i1}=0,\ Y_{i2}=1\mid x_{ij})} \\ =&\frac{\mathbb{P}(Y_{i1}=1\mid x_{i1})\left(1-\mathbb{P}(Y_{i2}=1\mid x_{i2})\right)}{\mathbb{P}(Y_{i1}=1\mid x_{i1})\left(1-\mathbb{P}(Y_{i2}=1\mid x_{i2})\right)+\mathbb{P}(Y_{i2}=1\mid x_{i2})\left(1-\mathbb{P}(Y_{i1}=1\mid x_{i1})\right)} \\ =&\frac{\operatorname{logistic}(\alpha+\beta x_{i1})\ \!\operatorname{logistic}(-\alpha-\beta x_{i2})}{\operatorname{logistic}(\alpha+\beta x_{i1})\ \!\operatorname{logistic}(-\alpha-\beta x_{i2})+\operatorname{logistic}(\alpha+\beta x_{i2})\ \!\operatorname{logistic}(-\alpha-\beta x_{i1})} \\ =&\operatorname{logistic}\left(\beta (x_{i1}-x_{i2})\right) \end{align}\]
Thus,
\[\begin{equation} \mathbb{P}(Y_{ij}=1\mid x_{ij},\ Y_{i1}+Y_{i2}=1)=\frac{\exp\{\beta x_{ij}\}}{\exp\{\beta x_{i1}\}+\exp\{\beta x_{i2}\}} \end{equation}\]
for the observation \(j\in\{1,2\}\) of the stratum \(i=1,\dotsb,n\).
In the “one case per pair” setting, the matching factors (i.e., a pair-specific intercept \(\alpha\)) are conditioned out — similar to how a baseline hazard is conditioned out in the Cox model. Conditional logistic for matched sets is mathematically equivalent to a stratified Cox model:
each matched set in a case–control study as a risk set in a survival study, and
each case as a failure (event) occurring in that risk set.
Consider a single binary exposure, \(x_{ij}\in\{0,1\}\).
A pair is concordant if the case and control have the same exposure status. These pairs do not contribute to estimating the exposure effect (matched odds ratio) — because there is no difference in exposure within the pair, the exposure cannot explain why one is a case and the other is a control.
A pair is discordant if the case and control differ in exposure status. Only these pairs contribute to the estimation of the exposure effect.
Pair-level (Bernoulli) modeling
stancodeCH1 <- "
data {
int<lower=1> d; // number of discordant pairs
// int<lower=0, upper=1> y[d]; // y[i]=1 if CASE is exposed, 0 if CONTROL is exposed [removed]
array[d] int<lower=0, upper=1> y; // [Stan 2.26+ syntax for array declarations]
real m; // center for beta
real s; // scale for beta
}
parameters {
real beta; // log matched odds ratio
}
model {
beta ~ normal(m, s); // unit information
// conditional likelihood for discordant pairs:
// Pr(case is exposed) = logistic(beta)
y ~ bernoulli_logit(beta);
}"
stanmodelCH1 <- rstan::stan_model(model_code=stancodeCH1)
Note: Drop concordant pairs before constructing
y[d].
Aggregated (binomial) modeling
The estimated matched odds ratio can be shown to simplify to the
ratio of pairs with an exposed case and unexposed control to those with
an unexposed case and exposed control, b / c
(McNemar’s test).
stancodeCM1 <- "
data {
int<lower=0> b; // number of pairs with case exposed / control unexposed
int<lower=0> c; // number of pairs with case unexposed / control exposed
real m; // center for beta
real s; // scale for beta
}
parameters {
real beta; // log matched odds ratio
}
model {
beta ~ normal(m, s); // unit information
b ~ binomial_logit(b + c, beta); // logistic(beta) is Pr(case exposed)
}"
stanmodelCM1 <- rstan::stan_model(model_code=stancodeCM1)
For a single binary exposure, both formulations are equivalent; the
Bernoulli version stancodeCH1 is just the unpooled form of
the binomial version stancodeCM1. The binomial version is
numerically stable and fastest when the linear predictor is constant
across pairs.
Data
simDataC <- function(n, beta=NULL) {
#' Input -
#' n : number of strata
#' beta: true log matched odds ratio
#'
#' Output - 1:1 matched case-control pairs with a binary exposure in the long format
x1 <- stats::rbinom(n, 1, .5) # exposure for subject 1 in each pair
x2 <- stats::rbinom(n, 1, .5) # exposure for subject 2 in each pair (independent Bernoulli)
p1 <- stats::plogis(beta * (x1 - x2)) # conditional prob that subject 1 is the case
y1 <- stats::rbinom(n, 1, p1) # 1 if subject 1 is the case (subject 2 is the control)
data.frame("Case"=c(y1, 1-y1),
"Exposure"=c(x1, x2),
"ID"=rep(1:n, 2))
}
The Bayes Factors
Note: All the Bayes factors are \(\textit{BF}_{01}\).
computeBFsC <- function(data, precise=F) {
#' Input -
#' data: long format data whose columns must be `Case`, `Exposure`, and `ID`
#' precise: logical; if FALSE (default), use the simpler form for Output 5;
#' otherwise, use the more precise form
#'
#' Global - stanmodelCM1 (aggregated binomial modeling)
#' Dependency - survival (v3.8-3), rstan (v2.32.7), polspline (v1.1.25)
#'
#' Output - a vector of the Bayes factors in favor of the null
#' 1. Savage–Dickey density ratio (Eq. 11) using the normal unit-information prior
#' 2. BIC approximation to the Bayes factor in Eq. 8; BIC(fit)
#' 3. Jeffreys approximate Bayes factor in Eq. 9 using the normal unit-information prior
#' 4. Jeffreys approximate Bayes factor in Eq. 9 using the Jeffreys prior
#' 5. Wagenmakers approximate Bayes factor based on the p-value and sample size in Eq. 10
#' 6. number of discordant pairs
#' 7. number of pairs with case exposed / control unexposed
#' 8. number of pairs with case unexposed / control exposed
#' 9. p-value of the conditional logistic regression
data <- stats::na.omit(data)
N <- sum(tapply(data$Exposure, data$ID, function(v) length(unique(v))==2)) # number of discordant pairs
b <- sum(with(data, # number of pairs with case exposed / control unexposed
tapply(1:nrow(data), ID,
function(i) any(Case[i]==1 & Exposure[i]==1) && any(Case[i]==0 & Exposure[i]==0))))
c <- sum(with(data, # number of pairs with case unexposed / control exposed
tapply(1:nrow(data), ID,
function(i) any(Case[i]==1 & Exposure[i]==0) && any(Case[i]==0 & Exposure[i]==1))))
if (c==0 | b==0) return(c("SD"=NA, "BIC_approx"=NA, "JAB"=NA, "JAB_J"=NA, "WAB"=NA,
"N"=N, "b"=b, "c"=c, "pVal"=NA))
fit1 <- survival::clogit(Case ~ Exposure + survival::strata(ID), data) # full model fit
pVal <- summary(fit1)$coefficients[,"Pr(>|z|)"][1] # p-value of the regression coefficient
W <- stats::qchisq(pVal, 1, lower.tail=F) # Wald statistic, [,"z"]^2
datalist <- list(b=b, c=c, m=fit1$coefficients, s=c(sqrt(fit1$var*N)))
stanfitH1 <- rstan::sampling(stanmodelCM1, data=datalist,
iter=15000, warmup=5000, chains=2, seed=277, refresh=0)
betaS <- rstan::extract(stanfitH1, pars="beta")$beta
dPost <- polspline::dlogspline(0, polspline::logspline(betaS)) # estimated posterior density
dPrior <- stats::dnorm(0, datalist$m, datalist$s)
SDdr01 <- dPost / dPrior # Savage–Dickey density ratio
BICB01 <- exp(-diff(fit1$loglik) + log(N)/2) # Eq. 8
JAB01_UI <- sqrt(N) * exp(-0.5 * W) # Eq. 9, unit-information prior
WAB01 <- ifelse(pVal <= .5,
ifelse(pVal > .1,
ifelse(precise,
4 * pVal^(2/3) * sqrt(N) / 3, # 0.1 < p <= 0.5 (more precise)
sqrt(pVal * N)), # 0.1 < p <= 0.5 (simpler)
3 * pVal * sqrt(N)), # p <= 0.1
pVal^(1/4) * sqrt(N)) # p > 0.5, Eq. 10
c("SD"=SDdr01, "BIC_approx"=BICB01, "JAB"=JAB01_UI, "JAB_J"=sqrt(pi/2) * JAB01_UI, "WAB"=WAB01,
"N"=N, "b"=b, "c"=c, "pVal"=pVal)
}
# set.seed(277)
# testC <- simDataC(100, 1)
# computeBFsC(testC)
To test the hypotheses \(\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0\) versus \(\mathcal{H}_1:\boldsymbol{\theta}\neq\boldsymbol{\theta}_0\), we define the extended Jeffreys approximate objective Bayes factor (eJAB) as \[\begin{equation} \mathit{eJAB}_{01}=\sqrt{N}\exp\left\{-\frac{1}{2}\frac{N^{1/q}-1}{N^{1/q}}\cdot Q_{\chi^{2}_{q}}(1-p)\right\}, \end{equation}\]
where \(q\) is the size of the parameter vector \(\boldsymbol{\theta}\), \(N\) is the sample size, \(Q_{\chi^{2}_{q}}(\cdot)\) is the quantile function of the chi-squared distribution with \(q\) degrees of freedom, and \(p\) is the \(p\)-value from a null-hypothesis significance test. We further define \(\mathit{eJAB}_{10}=1\ \!/\ \!\mathit{eJAB}_{01}\).
The sample size \(N\) is
the total number of observations for for \(t\)-tests, linear regression, logistic regression, one-way ANOVA, and chi-squared tests.
the number of events for Cox models.
the number of independent observations for one-way repeated-measures ANOVA (Nathoo & Masson, 2016).
The degrees of freedom are
\(q=1\) for \(t\)-tests, linear regression, logistic regression, and Cox models, where \(\mathcal{H}_0\) specifies a point-null hypothesis (Wagenmakers, 2022).
\(q=I-1\) for one-way ANOVA and one-way repeated-measures ANOVA, where \(I\) is the number of conditions.
\(q=(R-1)(C-1)\) for chi-squared tests, where \(R\) and \(C\) are the numbers of rows and columns, respectively.
Since the Bayes factor is the ratio of the marginal likelihoods of two competing models, \(q=p_1-p_0\) also represents the difference in the number of free parameters between the alternative and null models.
For further reading, please visit https://rpubs.com/sherloconan/1361280.