This up-to-date document is available at https://rpubs.com/sherloconan/1203268
\[\begin{align} &Y_{1j}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu+\sigma_\epsilon \delta/2,\ \sigma_\epsilon^2)\quad\text{for }j=1,\dotsb,n_1, \\ &Y_{2j}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu-\sigma_\epsilon \delta/2,\ \sigma_\epsilon^2)\quad\text{for }j=1,\dotsb,n_2,\text{ where }\delta=(\mu_1-\mu_2)/\sigma_\epsilon \tag{$*$} \end{align}\]
\[\begin{equation} \tag{2} \pi(\mu,\sigma_\epsilon^2)\propto 1/\ \sigma_\epsilon^{2} \end{equation}\]
\[\begin{equation} \tag{$\dagger$} \delta\sim\text{Cauchy}(0,\ \gamma),\qquad \pi(\delta)=\frac{\gamma}{\pi\,(\delta^2+\gamma^2)} \end{equation}\]
\(\mathcal{H}_0:\delta=0\) versus \(\mathcal{H}_1:\delta\neq0\)
A Bayesian two-sample \(t\)-test is equivalent to a Bayesian one-way analysis of variance (ANOVA) with two conditions (\(a=2\)).
\[\begin{align} \tag{1} \mathcal{M}_1:\ Y_{ij}&=\mu+\sigma_\epsilon d_i+\epsilon_{ij}\\ \text{versus}\quad\mathcal{M}_0:\ Y_{ij}&=\mu+\epsilon_{ij},\\ \qquad\qquad&\epsilon_{ij}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma_\epsilon^2)\quad\text{ for condition } i=1,\dotsb,a;\text{ (unbalanced) subject } j=1,\dotsb,n_i \end{align}\]
\[\begin{equation} \tag{2} \pi(\mu,\sigma_\epsilon^2)\propto 1/\ \sigma_\epsilon^{2} \end{equation}\]
\[\begin{equation} \tag{3} d_i^\star\mid g\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,g) \end{equation}\]
\[\begin{equation} \tag{4} g\sim\text{Scale-inv-}\chi^2(1,h^2) \end{equation}\]
\[\begin{equation} \tag{5} (d_1^\star,\dotsb,d_{a-1}^\star)=(d_1,\dotsb,d_a)\cdot\mathbf{Q} \end{equation}\]
\[\begin{equation} \tag{6} \mathbf{I}_a-\frac{1}{a}\mathbf{J}_a=\mathbf{Q}\cdot\mathbf{Q}^\top \end{equation}\]
By default, \(h=0.5\) for the fixed effects (in this case) and \(h=1\) for the random effects in Eq. 4.
\(\mathbf{Q}\) is an \(a\times(a-1)\) matrix whose columns are the \(a-1\) unit-length eigenvectors corresponding to the nonzero eigenvalues of the left-hand side of Eq. 6. For example, the projected standardized effect is \(d^\star=(d_1-d_2)/\sqrt{2}\) when \(a=2\). In the other direction (given \(d_1+d_2=0\)), \(d_1=d^\star/\sqrt{2}\) and \(d_2=-d^\star/\sqrt{2}\). Read more: https://rpubs.com/sherloconan/1070217.
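As a quick check of Eqs. 5 and 6, the sketch below constructs \(\mathbf{Q}\) for \(a=2\) by eigendecomposition (object names are illustrative; eigenvectors are determined only up to sign).

a <- 2
P <- diag(a) - matrix(1/a, a, a)                   # left-hand side of Eq. 6
eig <- eigen(P, symmetric=TRUE)
Q <- eig$vectors[, eig$values > 1e-8, drop=FALSE]  # eigenvectors with nonzero eigenvalues
Q %*% t(Q)                                         # recovers P, as in Eq. 6
d <- c(0.3, -0.3)                                  # any d with d1 + d2 = 0
drop(d %*% Q)                                      # projected d* = (d1 - d2)/sqrt(2), up to sign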
The Jeffreys–Zellner–Siow (JZS) Bayes factor for the Bayesian \(t\)-test is computed by integrating across one dimension in Eq. 7 (Ly et al., 2016, p. 24; Rouder et al., 2012, p. 360; Rouder et al., 2009, p. 237; Wei, Nathoo, & Masson, 2023, p. 1762; see also the balanced one-way between-subjects ANOVA in Morey et al., 2011, p. 374), using Gaussian quadrature.
\[\begin{equation} \tag{7} \textit{JZS-BF}_{10}=(2\pi)^{-\frac{1}{2}}\gamma\cdot\left(1+\frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}\int_0^\infty(1+Ng)^{-\frac{1}{2}}\left(1+\frac{t^2}{(1+Ng)\nu}\right)^{-\frac{\nu+1}{2}}g^{-\frac{3}{2}}e^{-\frac{\gamma^2}{2g}}\text{d}g, \end{equation}\]
where the default scale \(\gamma=\sqrt{2}/2\) (cf. \(h=0.5\); version 0.9.2 and later), the degrees of freedom \(\nu=n_1+n_2-2\), the \(t\)-statistic \(t=\frac{M_1-M_2}{s_p\cdot\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\), \(M_i\) are the sample condition means, and \(s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\) is the pooled variance in a two-sample \(t\)-test assuming equal variances between the two groups. The sample sizes are \(n_1\) and \(n_2\), and the sample variances are \(s_1^2\) and \(s_2^2\), respectively.
The effective sample size is \(N=\frac{n_1\cdot n_2}{n_1+n_2}\), i.e., half the harmonic mean of \(n_1\) and \(n_2\).
JZSBF10 <- function(gamma=1/sqrt(2), t, N, nu) {
  #' Input -
  #'   gamma: scale parameter of the Cauchy prior, i.e.,
  #'          `rscale="medium"` or `rscale=1/sqrt(2)` in BayesFactor::ttestBF, or
  #'          `rscaleFixed="medium"` or `rscaleFixed=0.5` in BayesFactor::anovaBF
  #'   t: observed t test statistic
  #'   N: "effective" sample size
  #'   nu: degrees of freedom
  #' Output - the JZS Bayes factor for a t-test in Eq. 7
  coef <- (2*pi)^(-0.5) * gamma * (1+t^2/nu)^((nu+1)/2)
  integrand <- function(g) {
    (1+N*g)^(-0.5) * (1+t^2/((1+N*g)*nu))^(-(nu+1)/2) * g^(-1.5) * exp(-gamma^2/(2*g))
  }
  unname(coef * integrate(integrand, lower=0, upper=Inf)$value)
}
test <- t.test(extra ~ group, data=sleep, var.equal=T) # two-sample t-test
t_obs <- test$statistic # t-statistic
n1 <- table(sleep$group)[1]; n2 <- table(sleep$group)[2] # balanced sample sizes, n1 = n2 = 10
JZSBF10(t=t_obs, N=n1*n2/(n1+n2), nu=n1+n2-2) # verifying the formula of the effective sample size
## [1] 1.265925
BayesFactor::ttestBF(formula=extra ~ group, data=sleep)
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 1.265925 ±0%
##
## Against denominator:
## Null, mu1-mu2 = 0
## ---
## Bayes factor type: BFindepSample, JZS
Unlike the anovaBF function, the ttestBF function of the “BayesFactor” R package yields negligible Monte Carlo errors and does not require specifying the number of Markov chain Monte Carlo iterations.
Default functions return the same Bayes factor estimates for the independent-samples (two-sample) \(t\)-test and one-way ANOVA with two conditions. However, those functions return systematically different estimates for the paired \(t\)-test and one-way repeated-measures ANOVA with two conditions (see discussion and vignettes).
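As a quick check of this equivalence on the sleep data (a sketch; anovaBF estimates its Bayes factor by sampling, so the match holds only up to a small Monte Carlo error):

BayesFactor::anovaBF(extra ~ group, data=sleep) # should be close to the ttestBF value of about 1.27 above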
Caution: Extreme Bayes Factor Estimates
set.seed(277)
(bf <- BayesFactor::ttestBF(rnorm(1000), rnorm(1000, 2.5)))
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 1.23212e+420 ±0%
##
## Against denominator:
## Null, mu1-mu2 = 0
## ---
## Bayes factor type: BFindepSample, JZS
(val <- BayesFactor::extractBF(bf)$bf) # the extracted and stored value is infinity
## [1] Inf
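One way to sidestep the overflow (a sketch using the logbf argument of extractBF) is to work with the Bayes factor on the log scale:

BayesFactor::extractBF(bf, logbf=TRUE)$bf # natural log of BF10; finite even when exp() of it overflows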
Approximating Bayes Factors from the Bayesian Information Criterion (BIC)
The BIC for model \(i\) with the number of free parameters \(k_i\) and the sample size \(n\) is \(\text{BIC}(\mathcal{M}_i)=-2\ln{\mathcal{L}_i}+k_i\ln{n}\quad\) for \(n\gg k_i\),
where \(\mathcal{L}_i=p(\boldsymbol{y}\mid\hat{\boldsymbol{\theta}}_i,\mathcal{M}_i)\) is the maximum likelihood.
The approximation rests on a second-order Taylor (Laplace) approximation of the log-marginal likelihood of each model.
\[\begin{equation} \tag{8} \mathit{BF}_{10}\approx\exp\left\{\frac{\text{BIC}(\mathcal{M}_0)-\text{BIC}(\mathcal{M}_1)}{2}\right\} \end{equation}\]
Derivation in Wagenmakers (2007, Appendix B).
Discussion in Nathoo and Masson (2016).
?BIC
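For instance, a minimal sketch of Eq. 8 with the sleep data from above, using base R's lm and BIC:

fit0 <- lm(extra ~ 1, data=sleep)     # null model
fit1 <- lm(extra ~ group, data=sleep) # model with a group effect
exp((BIC(fit1) - BIC(fit0)) / 2)      # BIC-based BF01 (Eq. 8); about 0.77, i.e., BF10 of roughly 1.3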
The test statistic in a Wald test is \(W=\left(\frac{\hat{\theta}-\theta_0}{\text{se}(\hat{\theta})}\right)^2\sim\chi_1^2\) under \(\mathcal{H}_0\), where \(\hat{\theta}\) is the maximum likelihood estimate, and \(\text{se}(\hat{\theta})\) is the standard error.
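A minimal illustration with the same sleep-data fit (for a linear model, the Wald statistic with an estimated standard error is simply the squared \(t\)-statistic):

fit1 <- lm(extra ~ group, data=sleep)              # same fit as in the BIC sketch above
est <- coef(summary(fit1))["group2", "Estimate"]   # theta_hat
se  <- coef(summary(fit1))["group2", "Std. Error"] # se(theta_hat)
(W <- (est / se)^2)                                # squared t-statistic, about 3.46 for these data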
Wagenmakers (2022, Eq. 6) reviewed the Jeffreys-style approximate Bayes factor (JAB) and proposed a piecewise approximation (WAB), Eq. 10 below, based on the \(p\)-value and the effective sample size \(N\).
The complication with JAB is that the approximation is valid only when \(\text{se}(\hat{\theta})\ll\sigma_g\), where \(\sigma_g\) is the scale of the prior distribution \(g(\theta)\). With very small sample sizes this condition does not hold, and Eq. 9 is biased against \(\mathcal{H}_0\).
\[\begin{equation} \tag{9} \textit{JAB}_{01}=A\cdot\sqrt{N}\exp\left\{-\frac{1}{2}W\right\} \end{equation}\]
A normal unit-information prior for \(\theta\) yields \(A=1\). Based on the Jeffreys prior, the general approximation specifies \(A=\sqrt{\pi/2}\approx1.253\).
The normal unit-information prior is a normal distribution with mean zero and variance equal to the information in a single observation.
Equation 2 specifies the Jeffreys priors for the grand mean and error variance.
Wagenmakers’s piecewise approximation is
\[\begin{equation} \tag{10} \textit{WAB}_{01}\approx \begin{cases} \ p^{\frac{1}{4}}\sqrt{N}, & p>0.5 \\[4pt] \ \sqrt{pN}, & 0.1<p\leqslant0.5 \text{ (simpler)} \\[4pt] \ \tfrac{4}{3}\,p^{\frac{2}{3}}\sqrt{N}, & 0.1<p\leqslant0.5 \text{ (more precise)} \\[4pt] \ 3p\sqrt{N}, & p\leqslant0.1 \end{cases} \end{equation}\]
qchisq(p, 1, lower.tail=F) # inverse CDF of chisq(1): 1-p → W = t² → JAB
qchisq(1-p, 1) # same
Note: The two-sided \(p\)-value of the \(t\)-statistic differs from the \(p\)-value of the Wald statistic, particularly when the sample size is small; the two agree only asymptotically.
# suppose t(𝜈 = 10 + 10 - 2) = 0.77, compute the p-values
c(pt(0.77, 18, lower.tail=F)*2, pchisq(0.77^2, 1, lower.tail=F))
## [1] 0.4512878 0.4412999
# suppose a two-sided p = 0.45, compute the test statistics t² and W
c(qt(0.45/2, 18, lower.tail=F)^2, qchisq(0.45, 1, lower.tail=F)) # try increasing 𝜈
## [1] 0.5963371 0.5706519
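Putting Eqs. 9 and 10 together, here is a small helper (a sketch; the function names are ours) that converts a reported \(p\)-value and sample size \(N\) into JAB and WAB:

jab01 <- function(p, N, A=1) A * sqrt(N) * exp(-0.5 * qchisq(p, 1, lower.tail=FALSE)) # Eq. 9
wab01 <- function(p, N) {                                                             # Eq. 10
  if (p > 0.5) p^(1/4) * sqrt(N)
  else if (p > 0.1) sqrt(p * N)  # simpler form
  else 3 * p * sqrt(N)
}
c(jab01(p=0.45, N=5), wab01(p=0.45, N=5)) # e.g., N = 10*10/(10+10) = 5 from the example above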
Sampling Distributions
If \(X_i\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\) for \(i=1,\dotsb,n\), then \(Z=\sum_{i=1}^n X_i^2\sim\chi_n^2\) with \(\mathbb{E}[Z]=n\) and \(\text{Var}(Z)=2n\).
If \(Z_1\sim\chi_m^2\ \perp \!\!\! \perp\ Z_2\sim\chi_n^2\), then \(F=\frac{Z_1/m}{Z_2/n}\sim F(m,n)\) with \(\mathbb{E}[F]=\frac{n}{n-2}\) for \(n>2\) and \(\text{Var}(F)=\frac{2n^2(m+n-2)}{m(n-2)^2(n-4)}\) for \(n>4\).
If \(X\sim\mathcal{N}(0,1)\ \perp \!\!\! \perp\ Z\sim\chi_n^2\), then \(T=\frac{X}{\sqrt{Z/n}}\sim t_n\) with \(\mathbb{E}[T]=0\) for \(n>1\) and \(\text{Var}(T)=\frac{n}{n-2}\) for \(n>2\).
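A quick Monte Carlo sanity check of the first fact above (a sketch; the sample size, number of replications, and seed are arbitrary):

set.seed(1)
Z <- colSums(matrix(rnorm(5 * 1e5), nrow=5)^2) # 1e5 draws of a sum of 5 squared standard normals
c(mean(Z), var(Z))                             # should be close to n = 5 and 2n = 10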
The Savage–Dickey density ratio is a special form of the Bayes factor for nested models: it divides the posterior density over the parameters of the alternative model, evaluated at the hypothesized value, by the prior density of the same model evaluated at the same point.
\[\begin{equation} \tag{11} \textit{BF}_{01}:=\frac{p(\boldsymbol{y}\mid \mathcal{M}_0)}{p(\boldsymbol{y}\mid \mathcal{M}_1)}=\frac{\color{#0055A4}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid\boldsymbol{y},\mathcal{M}_1)}}{\color{#EF4135}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid \mathcal{M}_1)}} \end{equation}\]
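The Stan program below implements Eq. 11 for a linear model. As a warm-up, here is a toy conjugate example (entirely illustrative: \(y_i\sim\mathcal{N}(\theta,1)\) with prior \(\theta\sim\mathcal{N}(0,1)\), testing \(\mathcal{H}_0:\theta=0\)) in which the posterior is available in closed form, so the Savage–Dickey ratio can be checked against the exact Bayes factor:

set.seed(277)
y <- rnorm(25, mean=0.3, sd=1); n <- length(y)
post_mean <- sum(y) / (n + 1); post_sd <- sqrt(1 / (n + 1))  # conjugate posterior of theta
BF01_SD <- dnorm(0, post_mean, post_sd) / dnorm(0, 0, 1)     # Eq. 11: posterior / prior at theta = 0
BF01_an <- sqrt(n + 1) * exp(-sum(y)^2 / (2 * (n + 1)))      # exact BF01 for this toy model
c(BF01_SD, BF01_an)                                          # identical up to rounding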
stancode1M1 <- "
data {
  int<lower=1> N;       // total number of observations
  vector[N] x;          // predictor variable
  vector[N] y;          // response variable
  vector[2] mu;         // mean vector for theta
  matrix[2,2] Sigma;    // covariance matrix for theta
}
parameters {
  vector[2] theta;      // combined vector for alpha and beta
  real<lower=0> sigma;  // error standard deviation
}
transformed parameters {
  real alpha = theta[1];  // intercept
  real beta = theta[2];   // slope
}
model {
  // Priors
  theta ~ multi_normal(mu, Sigma);  // unit information
  target += -log(sigma);            // Jeffreys prior
  // Likelihood (linear regression model)
  y ~ normal(alpha + beta * x, sigma);
}"
stanmodel1M1 <- stan_model(model_code=stancode1M1)
Data
simData <- function(n1, n2, m1=0, delta=NULL, sd=1, power=NULL) {
  #' Input -
  #'   n1: sample size of the first group
  #'   n2: sample size of the second group
  #'   m1: population mean of the first group
  #'   delta: population mean difference, m2 - m1
  #'   sd: population standard deviation
  #'   power: statistical power;
  #'          need to input either `delta` or `power`
  #'
  #' Dependency - pwr (v1.3-0)
  #' Output - two-sample (unbalanced) data in the long format
  if (is.null(delta) & !is.null(power)) {
    delta <- sd * pwr::pwr.t2n.test(n1=n1, n2=n2, power=power)$d
  } else if (is.null(delta) & is.null(power)) {
    delta <- 0 # simulation under the null
  } else if (!is.null(delta) & !is.null(power)) {stop("Bad input!")}
  # no random seed
  data.frame("resp"=c(stats::rnorm(n1, m1, sd),
                      stats::rnorm(n2, m1 + delta, sd)),
             "grp"=rep(c("grp1", "grp2"), c(n1, n2)),
             stringsAsFactors=T)
}
num <- 1000
p_par <- p_nonpar <- numeric(num)
for (i in 1:num) {
  set.seed(i)
  data <- simData(n1=30, n2=30, power=.8)
  p_par[i] <- t.test(resp ~ grp, data=data, var.equal=T)$p.value
  p_nonpar[i] <- wilcox.test(formula=resp ~ grp, data=data)$p.value # Mann–Whitney 𝑈-test
}
plot(p_par, p_nonpar, xlim=range(c(p_par, p_nonpar)), ylim=range(c(p_par, p_nonpar)),
     xlab=expression(italic(p) * "-Values from " * italic(t) * "-Tests"),
     ylab=expression(italic(p) * "-Values from " * italic(U) * "-Tests"),
     main=paste0("Two-Sample Designs (", num, " Runs)"))
abline(0, 1, col="gray", lty=2, lwd=1.5)
The Bayes Factors
Note: All the Bayes factors are \(\textit{BF}_{01}\).
computeBFs <- function(data, total=F, precise=F) {
  #' Input -
  #'   data: two-sample data in the long format whose columns must be `resp` and `grp`
  #'   total: logical; if FALSE (default), use the effective sample size for Output 5-7;
  #'          otherwise, use the total number of observations
  #'   precise: logical; if FALSE (default), use the simpler form for Output 7;
  #'            otherwise, use the more precise form
  #'
  #' Global - stanmodel1M1
  #' Dependency - BayesFactor (v0.9.12-4.7), rstan (v2.32.6), polspline (v1.1.25)
  #' Output - a vector of the Bayes factors in favor of the null
  #'   1. Savage–Dickey density ratio (Eq. 11) using the normal unit-information prior
  #'   2. ttestBF function (Eq. 7)
  #'   3. BIC approximation to the Bayes factor in Eq. 8; BIC(fit)
  #'   4. SBC approximation to the Bayes factor in Eq. 8; AIC(fit, k=log(N)) <== CAUTION
  #'   5. Jeffreys approximate Bayes factor in Eq. 9 using the normal unit-information prior
  #'   6. Jeffreys approximate Bayes factor in Eq. 9 using the Jeffreys prior
  #'   7. Wagenmakers approximate Bayes factor based on the p-value and sample size in Eq. 10
  data <- stats::na.omit(data)
  grp_sizes <- unname(table(data$grp)) # sample sizes, n1 and n2
  Ntot <- sum(grp_sizes) # total number of observations
  N <- prod(grp_sizes) / Ntot # effective sample size in the two-sample t-test, n1 * n2 / (n1 + n2)
  fit0 <- stats::lm(resp ~ 1, data) # null model fit
  fit1 <- stats::lm(resp ~ grp, data) # full model fit
  pVal <- summary(fit1)$coefficients["grpgrp2", "Pr(>|t|)"] # p-value of the two-sample t-test
  tVal <- stats::qt(pVal/2, Ntot-2) # t-statistic (m2 >= m1), ["grpgrp2", "t value"]
  datalist <- list(N=Ntot, x=as.numeric(data$grp)-1, y=data$resp,
                   mu=stats::coef(fit1), Sigma=stats::vcov(fit1)*Ntot)
  stanfitM1 <- rstan::sampling(stanmodel1M1, data=datalist,
                               iter=15000, warmup=5000, chains=2, seed=277, refresh=0)
  betaS <- rstan::extract(stanfitM1, pars="beta")$beta
  dPost <- polspline::dlogspline(0, polspline::logspline(betaS)) # estimated posterior density
  dPrior <- stats::dnorm(0, datalist$mu[2], sqrt(datalist$Sigma[2,2]))
  SDdr01 <- dPost / dPrior # Savage–Dickey density ratio
  bf <- BayesFactor::ttestBF(formula=resp ~ grp, data=data) # Eq. 7
  BF01 <- ifelse(BayesFactor::extractBF(bf)$error < 0.01, # negligible Monte Carlo errors
                 1 / BayesFactor::extractBF(bf)$bf, NA)
  BICB01 <- exp((stats::BIC(fit1) - stats::BIC(fit0)) / 2) # equiv. AIC(fit, k=log(n1 + n2)); Eq. 8
  SBCB01 <- exp((stats::AIC(fit1, k=log(N)) - stats::AIC(fit0, k=log(N))) / 2) # <== CAUTION
  N <- ifelse(total, Ntot, N)
  # sqrt(N) * exp(-0.5 * stats::qchisq(1-pVal, 1)) # <== CAUTION, tiny p-value --> infinity Wald statistic
  JAB01_UI <- sqrt(N) * exp(-0.5 * tVal * tVal) # Eq. 9, unit-information prior
  WAB01 <- ifelse(pVal <= .5,
                  ifelse(pVal > .1,
                         ifelse(precise,
                                4 * pVal^(2/3) * sqrt(N) / 3, # 0.1 < p <= 0.5 (more precise)
                                sqrt(pVal * N)), # 0.1 < p <= 0.5 (simpler)
                         3 * pVal * sqrt(N)), # p <= 0.1
                  pVal^(1/4) * sqrt(N)) # p > 0.5, Eq. 10
  c("SD"=SDdr01, "ttestBF"=BF01, "BIC_approx"=BICB01, "SBC_approx"=SBCB01,
    "JAB"=JAB01_UI, "JAB_J"=sqrt(pi/2) * JAB01_UI, "WAB"=WAB01)
}
# set.seed(277)
# (test <- simData(10, 15, power=.8))
# computeBFs(test)
Simulation Study
We conduct several simulation studies to evaluate how accurately the approximate objective Bayes factors convert the \(p\)-values and sample sizes \(N\) reported in scientific articles into Bayesian measures of evidence.
Read Cohen’s \(d\): https://en.wikipedia.org/wiki/Effect_size#Cohen's_d
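For reference, Cohen's \(d\) for two independent samples uses the pooled standard deviation \(s_p\) defined under Eq. 7; a small sketch (the function name is ours):

cohens_d <- function(y1, y2) { # (M1 - M2) / s_p
  sp <- sqrt(((length(y1)-1)*var(y1) + (length(y2)-1)*var(y2)) / (length(y1) + length(y2) - 2))
  (mean(y1) - mean(y2)) / sp
}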
useTotalN <- T
nSim <- 1000 # number of simulation runs for each setting
nSubj <- c(10, 20, 30, 100, 500) # sample sizes of the first group
SMD <- c("Null"=0, "Small"=0.2, "Medium"=0.5, "Large"=0.8) # (standardized) mean differences, m2-m1, given sd=1
reportBF <- reportBF_avg <- reportBF_sd <- reportERR <- reportERR_avg <- reportERR_sd <-
  reportRULE <- reportRULE_H1 <- reportRULE_H0 <- reportRULE_inc <-
  setNames(vector(mode="list", length=length(nSubj) * length(SMD)), # pre-allocation
           apply(expand.grid(names(SMD), nSubj), 1, function(r) paste0("n1 = ", r[2], ", delta = ", r[1])))
index <- 1 # initial list index
for (n1 in nSubj) {
  n2 <- round(n1 * 1.2) # set the sample size of the second group <==
  for (delta in SMD) {
    set.seed(n1+delta)
    temp <- t(replicate(nSim, computeBFs(simData(n1, n2, delta=delta), useTotalN))) # replicate the Bayes factors
    reportBF[[index]] <- temp
    reportBF_avg[[index]] <- colMeans(temp, na.rm=T) # mean Bayes factors in favor of the null
    reportBF_sd[[index]] <- apply(na.omit(temp), 2, sd) # sample standard deviations of them
    tempERR <- 100 * (temp - temp[,1]) / temp[,1] # percent errors; the first column should be all 0
    reportERR[[index]] <- tempERR # (can take in NA)
    reportERR_avg[[index]] <- colMeans(tempERR, na.rm=T) # mean percent errors
    reportERR_sd[[index]] <- apply(na.omit(tempERR), 2, sd) # sample standard deviations of the percent errors
    tempRULE_H1 <- temp < 1/3 # support H1
    tempRULE_H0 <- temp > 3 # support H0
    tempRULE_inc <- temp >= 1/3 & temp <= 3 # inconclusive
    # counts of matching decisions for H1
    reportRULE_H1[[index]] <- colSums(tempRULE_H1 * tempRULE_H1[,1], na.rm=T)
    # counts of matching decisions for H0
    reportRULE_H0[[index]] <- colSums(tempRULE_H0 * tempRULE_H0[,1], na.rm=T)
    # counts of matching inconclusiveness
    reportRULE_inc[[index]] <- colSums(tempRULE_inc * tempRULE_inc[,1], na.rm=T)
    tempRULE <- 1*(temp < 1/3) - 1*(temp > 3) # 1: support H1; 0: inconclusive; -1: support H0
    reportRULE[[index]] <- colSums(tempRULE[,1] == tempRULE, na.rm=T) # counts of matching decisions
    # colMeans(); a value near 1 suggests matching results; the first column should be all 1
    index <- index + 1
  }
} # roughly 8.3 hours of run time
# Warning messages:
# 1-10: In polspline::logspline(betaS) : Not all models could be fitted
reshapeW2L <- function(list, prob=T, lab=c("SD", "BF", "BIC", "SBC", "JAB", "JAB_J", "WAB")) {
  #' Input -
  #'   list: a list of size length(nSubj) * length(SMD);
  #'         reportERR (unaggregated),
  #'         reportRULE, reportRULE_H1, reportRULE_H0, or reportRULE_inc (aggregated)
  #'   prob: logical; if TRUE (default), return the proportions;
  #'         otherwise, return the original values
  #'   lab: a vector of characters representing the Bayes factor methods
  #'
  #' Global -
  #'   nSim: number of simulation runs for each setting
  #'   nSubj: sample sizes of the first group
  #'   SMD: (standardized) mean differences, m2 - m1, given sd = 1
  #'
  #' Output - unlist the report and reshape the wide into the long
  len <- length(lab) # seven methods
  nRep <- ifelse(!prob, nSim, 1) # nSim if unaggregated; 1 if aggregated
  long <- data.frame("value"=unname(unlist(list)),
                     "method"=factor(rep(rep(1:len, each=nRep), length(list)), labels=lab),
                     "n1"=rep(nSubj, each=nRep*len*length(SMD)),
                     "delta"=rep(rep(unname(SMD), each=nRep*len), length(nSubj)))
  if (prob) {
    list2 <- lapply(list, function(vec) {
      if (vec[1] == 0) {
        vec * NA
      } else {
        vec / vec[1] # convert counts to proportions
      }
    })
    list3 <- lapply(list, function(vec) rep(vec[1], length(vec))) # denominator
    long$total <- unname(unlist(list3))
    long$prop <- unname(unlist(list2))
  }
  long
}
formatting <- function(val) {
  #' format the value using scientific notation
  if (abs(val) >= 10) {
    paste("~italic(BF)[\"01\"] %~~%",
          gsub("e\\+0*(\\d+)", "%*%10^\\1", sprintf("%.2e", val)))
  } else if (abs(val) < 1) {
    paste("~italic(BF)[\"01\"] %~~%",
          gsub("e-0*(\\d+)", "%*%10^{-\\1}", sprintf("%.2e", val)))
  } else {
    paste("~italic(BF)[\"01\"] %~~%", sprintf("%.2f", val))
  }
}
Visualization
Note: The effective number of observations is used in the BF, SBC, JAB, and WAB methods. The Jeffreys approximate Bayes factor (JAB_J) in Eq. 9, using the Jeffreys prior, will not be plotted.
# percent errors
df <- reshapeW2L(reportERR, prob=F)
index <- 0
# boxplot of the percent errors
for (smd in SMD) {
  baseBF01 <- sapply(reportBF_avg[seq(1, by=length(SMD), length.out=length(nSubj)) + index],
                     function(x) x[1]) # baseline is the Savage–Dickey density ratio
  baseBF01_lab <- sapply(baseBF01, formatting) # scientific notation
  index <- index + 1
  sub <- subset(df, !(method %in% c("SD", "JAB_J")) & delta == smd) # subset the long
  anno <- data.frame("label"=baseBF01_lab, "n1"=unique(sub$n1),
                     "x"=3, "y"=.8 * aggregate(value~n1, sub, FUN=max)$value) # for annotation use
  print(ggplot(sub, aes(x=method, y=value)) +
          geom_boxplot(alpha=0.3, na.rm=T) +
          geom_text(data=anno, aes(label=label, x=x, y=y), parse=T, col="red") +
          facet_wrap(.~n1, scales="free_y", nrow=1,
                     labeller=label_bquote(cols=italic(n)[1]==.(n1))) +
          labs(x="\nBayes Factor Methods", y="Percent Error (%)\n") +
          ggtitle(paste0(names(SMD[SMD==smd]), " Standardized Mean Difference")) +
          geom_hline(yintercept=0, linetype="dashed", color="gray") + # reference line 0%
          theme_classic() +
          theme(axis.text.x=element_text(angle=90)))
  cat("<br><br><br><br><br><br><br>")
}
df_BF <- reshapeW2L(reportBF, prob=F)
# boxplot of the BF₀₁
for (smd in SMD) {
  sub <- subset(df_BF, delta == smd & method != "JAB_J") # subset the long
  print(ggplot(sub, aes(x=method, y=value)) +
          geom_boxplot(alpha=0.3, na.rm=T) +
          facet_wrap(.~n1, scales="free_y", nrow=1,
                     labeller=label_bquote(cols=italic(n)[1]==.(n1))) +
          labs(x="\nBayes Factor Methods", y=expression(italic("BF")["01"])) +
          ggtitle(paste0(names(SMD[SMD==smd]), " Standardized Mean Difference")) +
          theme_classic() +
          theme(axis.text.x=element_text(angle=90)))
  cat("<br><br><br><br><br>")
}
# decisions
decis <- rbind(reshapeW2L(reportRULE),
               reshapeW2L(reportRULE_H1),
               reshapeW2L(reportRULE_H0),
               reshapeW2L(reportRULE_inc))
decis$result <- factor(rep(c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"),
                           each=length(levels(df$method)) * length(nSubj) * length(SMD)),
                       levels=c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"))
decis$delta <- paste0("ES = ", decis$delta)
decis <- decis[!is.na(decis[,"prop"]),]
if (F) {
  # compute the pointwise confidence intervals using the normal approximation
  decis$half.len <- qnorm(1-.05/2) *
    sqrt(decis$value * (decis$total - decis$value) / decis$total^3) # 95% CI half-width
  decis$lwr <- pmax(0, decis$value / decis$total - decis$half.len) # 95% CI lower bound (vectorized)
  decis$upr <- pmin(1, decis$value / decis$total + decis$half.len) # 95% CI upper bound (vectorized)
} else { # Clopper–Pearson (exact) intervals
  decis$lwr <- qbeta(.05/2, decis$value, decis$total - decis$value + 1) # 95% CI lower bound
  decis$upr <- qbeta(1-.05/2, decis$value + 1, decis$total - decis$value) # 95% CI upper bound
  decis$half.len <- (decis$upr - decis$lwr) / 2 # 95% CI half-width
}
# line chart of the proportions of agreement
ggplotly(ggplot(decis, aes(n1, prop, color=method, label=value)) +
           geom_line(linewidth=1.2, na.rm=T) + geom_point(size=2, na.rm=T) +
           geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=.2) +
           scale_color_manual(values=c("black", "gray", "#E69F00", "#009E73", "#CC79A7", "#D55E00", "#0072B2"),
                              labels=c("SD", "BF", "BIC", "SBC", "JAB", "JAB_J", "WAB")) +
           facet_grid(result~delta) +
           labs(x="Sample Size of the First Group", y="Proportion of Agreement", color="Method") +
           scale_x_continuous(breaks=c(10, 100, 500)) +
           scale_y_continuous(breaks=c(0, 0.5, 1)) + # min(decis$prop, na.rm=T)
           theme_minimal()) # interactive plots
Since all the methods have closed-form or numerically exact solutions as functions of the \(t\)-statistic for \(t\)-tests and balanced one-way ANOVA, simulation studies may not be necessary.
BF10 <- function(t) {
  gamma <- 1/sqrt(2) # scale parameter of the Cauchy prior
  Neff <- n1*n2/(n1+n2) # effective sample size
  nu <- n1+n2-2 # degrees of freedom
  coef <- (2*pi)^(-0.5) * gamma * (1+t^2/nu)^((nu+1)/2)
  integrand <- function(g) {
    (1+Neff*g)^(-0.5) * (1+t^2/((1+Neff*g)*nu))^(-(nu+1)/2) * g^(-1.5) * exp(-gamma^2/(2*g))
  }
  unname(coef * integrate(integrand, lower=0, upper=Inf)$value)
}

BIC01 <- function(t) {
  N <- n1+n2 # total number of observations
  nu <- N-2 # degrees of freedom
  sqrt(N * (1 + t^2 / nu)^(-N))
}

SBC01 <- function(t) {
  Neff <- n1*n2/(n1+n2) # effective sample size
  N <- n1+n2 # total number of observations
  nu <- N-2 # degrees of freedom
  sqrt(Neff * (1 + t^2 / nu)^(-N))
}

JAB01 <- function(t) {
  Neff <- n1*n2/(n1+n2) # effective sample size
  sqrt(Neff) * exp(-0.5 * t^2)
}

WAB01 <- function(t) {
  Neff <- n1*n2/(n1+n2) # effective sample size
  pVal <- pt(t, n1+n2-2, lower.tail=F) * 2 # two-sided p-value
  ifelse(pVal <= .5,
         ifelse(pVal > .1,
                ifelse(F,
                       4 * pVal^(2/3) * sqrt(Neff) / 3, # 0.1 < p <= 0.5 (more precise)
                       sqrt(pVal * Neff)), # 0.1 < p <= 0.5 (simpler)
                3 * pVal * sqrt(Neff)), # p <= 0.1
         pVal^(1/4) * sqrt(Neff)) # p > 0.5
}

PBF01 <- function(t) {
  N <- n1+n2 # total number of observations
  a <- 2 # number of groups
  F_ratio <- t^2 # F = t²
  zeta <- -0.5 # adjustment of the shape parameters
  ifelse(N >= 345,
         (N/2)^((a-1)/2) * gamma(1+zeta) *
           ((N-a) / (N-a+(a-1)*F_ratio))^((N-a)/2-1-zeta) / gamma((a-1)/2+1+zeta), # in case of Inf/Inf
         gamma((N-1)/2) * gamma(1+zeta) *
           ((N-a) / (N-a+(a-1)*F_ratio))^((N-a)/2-1-zeta) / (gamma((a-1)/2+1+zeta) * gamma((N-a)/2)))
}

TSBF01 <- function(t) {
  # Johnson (2005), https://doi.org/10.1111/j.1467-9868.2005.00521.x
  t <- abs(t)
  N <- n1+n2 # total number of observations
  m <- N-2
  ifelse(t > 1,
         ((m+1) / (m+t^2))^((m+1)/2) * t,
         1)
}
n1 <- 30 # sample size of the first group
n2 <- 35 # sample size of the second group
ts <- qt(c(.05, .01, .005)/2, n1+n2-2, lower.tail=F) # critical values of t
x <- seq(0, ts[3], by=0.1) # input sequence
num <- length(x)
bf <- bic <- sbc <- jab <- wab <- pbf <- tsbf <- numeric(num)
for (i in 1:num) {
  bf[i] <- BF10(x[i]) # JZS Bayes factor
  bic[i] <- BIC01(x[i]) # BIC approximation
  sbc[i] <- SBC01(x[i]) # BIC approximation using the effective sample size
  jab[i] <- JAB01(x[i]) # Jeffreys approximate Bayes factor
  wab[i] <- WAB01(x[i]) # Wagenmakers approximate Bayes factor
  pbf[i] <- PBF01(x[i]) # Pearson Bayes factor
  tsbf[i] <- TSBF01(x[i]) # Test-based Bayes factor
}
plot(x, bf, ylab=expression(italic(BF)[10]), xlab=expression("|" * italic("t") * "|-Statistic"),
     main=expression("Two-Sample " * italic(t) * "-Test"),
     sub=bquote(italic(n)[1]==.(n1)~~"&"~~italic(n)[2]==.(n2))) # scatter plot
points(x, 1/bic, col="#E69F00", pch=8)
points(x, 1/sbc, col="#009E73", pch=16)
points(x, 1/jab, col="#CC79A7", pch=18)
points(x, 1/wab, col="#0072B2", pch=15)
points(x, 1/pbf, col="#56B4E9", pch=17)
points(x, 1/tsbf, col="#EF4135", pch=3)
abline(h=c(1/3, 3), col="gray")
abline(v=c(ts[1], ts[2]), col="red")
legend("topleft", c("ttestBF", "BIC approx", "SBC approx", "JAB", "WAB", "PBF", "TSBF"),
       col=c("black", "#E69F00", "#009E73", "#CC79A7", "#0072B2", "#56B4E9", "#EF4135"),
       pch=c(1, 8, 16, 18, 15, 17, 3), lty=0, lwd=2, bty="n")

plot(x, 1/bf, ylab=expression(italic(BF)["01"]), xlab=expression("|" * italic("t") * "|-Statistic"),
     main=expression("Two-Sample " * italic(t) * "-Test"),
     sub=bquote(italic(n)[1]==.(n1)~~"&"~~italic(n)[2]==.(n2))) # scatter plot
points(x, bic, col="#E69F00", pch=8)
points(x, sbc, col="#009E73", pch=16)
points(x, jab, col="#CC79A7", pch=18)
points(x, wab, col="#0072B2", pch=15)
points(x, pbf, col="#56B4E9", pch=17)
points(x, tsbf, col="#EF4135", pch=3)
abline(h=c(1/3, 3), col="gray")
abline(v=c(ts[1], ts[2]), col="red")
legend("bottomleft", c("ttestBF", "BIC approx", "SBC approx", "JAB", "WAB", "PBF", "TSBF"),
       col=c("black", "#E69F00", "#009E73", "#CC79A7", "#0072B2", "#56B4E9", "#EF4135"),
       pch=c(1, 8, 16, 18, 15, 17, 3), lty=0, lwd=2, bty="n")
Visualization
Note: The total number of observations is used in the SD, BIC, JAB, and WAB methods. The Jeffreys approximate Bayes factor (JAB_J) in Eq. 9, using the Jeffreys prior, will not be plotted.
Centering the normal unit-information prior on the null-hypothesized value \(\theta_0\), the customary choice in objective Bayesian testing, yields the following JAB* (Wagenmakers, 2022, Eq. 4).
\[\begin{equation} \textit{JAB}_{01}^*\approx\sqrt{N}\exp{\left\{-\frac{N-1}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]
Alternatively, the JAB in Eq. 9 centers the prior on the maximum likelihood estimate.
\[\begin{equation} \textit{JAB}_{01}\approx\sqrt{N}\exp{\left\{-\frac{N}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]
Using a power transformation, it can be shown that \(\textit{JAB}_{01}^*=\textit{JAB}_{01}^{\frac{N-1}{N}}\cdot N^{\frac{1}{2N}}\).
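A quick numeric check of this identity (the values of \(N\) and of \((\hat{\theta}-\theta_0)/\sigma_g\) below are arbitrary):

N <- 22; x <- 0.4                        # x plays the role of (theta_hat - theta_0) / sigma_g
JAB  <- sqrt(N) * exp(-N/2 * x^2)        # JAB01
JABs <- sqrt(N) * exp(-(N-1)/2 * x^2)    # JAB01*
c(JABs, JAB^((N-1)/N) * N^(1/(2*N)))     # the two values agree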
\(n_1=10,\quad n_2=12\)
par(mfrow=c(1, length(SMD)))
for (i in 1:length(SMD)) {
  reportBF[[i]][,6] <- reportBF[[i]][,5]^(21/22) * 22^(1/44) # total number of observations is 10 + 12
  boxplot(reportBF[[i]][,c(1,5:7)], names=c("SD", "JAB", "JAB*", "WAB"),
          xlab="", ylab=expression(italic("BF")["01"]), main=paste0(names(SMD)[i], " SMD"), las=2)
}
To test the hypotheses \(\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0\) versus \(\mathcal{H}_1:\boldsymbol{\theta}\neq\boldsymbol{\theta}_0\), we define the extended Jeffreys approximate objective Bayes factor (eJAB) as \[\begin{equation} \mathit{eJAB}_{01}=\sqrt{N}\exp\left\{-\frac{1}{2}\frac{N^{1/q}-1}{N^{1/q}}\cdot Q_{\chi^{2}_{q}}(1-p)\right\}, \end{equation}\]
where \(q\) is the size of the parameter vector \(\boldsymbol{\theta}\), \(N\) is the sample size, \(Q_{\chi^{2}_{q}}(\cdot)\) is the quantile function of the chi-squared distribution with \(q\) degrees of freedom, and \(p\) is the \(p\)-value from a null-hypothesis significance test. We further define \(\mathit{eJAB}_{10}=1\ \!/\ \!\mathit{eJAB}_{01}\).
The sample size \(N\) is
the total number of observations for \(t\)-tests, linear regression, logistic regression, one-way ANOVA, and chi-squared tests.
the number of events for Cox models.
the number of independent observations for one-way repeated-measures ANOVA (Nathoo & Masson, 2016).
The degrees of freedom are
\(q=1\) for \(t\)-tests, linear regression, logistic regression, and Cox models, where \(\mathcal{H}_0\) specifies a point-null hypothesis (Wagenmakers, 2022).
\(q=I-1\) for one-way ANOVA and one-way repeated-measures ANOVA, where \(I\) is the number of conditions.
\(q=(R-1)(C-1)\) for chi-squared tests, where \(R\) and \(C\) are the numbers of rows and columns, respectively.
Since the Bayes factor is the ratio of the marginal likelihoods of two competing models, \(q=p_1-p_0\) also represents the difference in the number of free parameters between the alternative and null models.
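A minimal sketch of the eJAB formula above (the function name is ours, not from any package):

eJAB01 <- function(p, N, q=1) {
  sqrt(N) * exp(-0.5 * (N^(1/q) - 1) / N^(1/q) * qchisq(1 - p, df=q))
}
eJAB01(p=0.079, N=20, q=1) # e.g., the two-sample sleep t-test above (p ≈ 0.079, N = 20 in total)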
For further reading, please visit https://rpubs.com/sherloconan/1361280.