This up-to-date document is available at https://rpubs.com/sherloconan/1203268
\[\begin{align} &Y_{1j}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu+\sigma_\epsilon \delta/2,\ \sigma_\epsilon^2)\quad\text{for }j=1,\dotsb,n_1, \\ &Y_{2j}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu-\sigma_\epsilon \delta/2,\ \sigma_\epsilon^2)\quad\text{for }j=1,\dotsb,n_2,\text{ where }\delta=(\mu_1-\mu_2)/\sigma_\epsilon \tag{$*$} \end{align}\]
\[\begin{equation} \tag{2} \pi(\mu,\sigma_\epsilon^2)\propto 1/\ \sigma_\epsilon^{2} \end{equation}\]
\[\begin{equation} \tag{$\dagger$} \delta\sim\text{Cauchy}(0,\ \gamma),\qquad \pi(\delta)=\frac{\gamma}{\pi\,(\delta^2+\gamma^2)} \end{equation}\]
\(\mathcal{H}_0:\delta=0\) versus \(\mathcal{H}_1:\delta\neq0\)
A Bayesian two-sample \(t\)-test is equivalent to a Bayesian one-way analysis of variance (ANOVA) with two conditions (\(a=2\)).
\[\begin{align} \tag{1} \mathcal{M}_1:\ Y_{ij}&=\mu+\sigma_\epsilon d_i+\epsilon_{ij}\\ \text{versus}\quad\mathcal{M}_0:\ Y_{ij}&=\mu+\epsilon_{ij},\\ \qquad\qquad&\epsilon_{ij}\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,\sigma_\epsilon^2)\quad\text{ for condition } i=1,\dotsb,a;\text{ (unbalanced) subject } j=1,\dotsb,n_i \end{align}\]
\[\begin{equation} \tag{2} \pi(\mu,\sigma_\epsilon^2)\propto 1/\ \sigma_\epsilon^{2} \end{equation}\]
\[\begin{equation} \tag{3} d_i^\star\mid g\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,g) \end{equation}\]
\[\begin{equation} \tag{4} g\sim\text{Scale-inv-}\chi^2(1,h^2) \end{equation}\]
\[\begin{equation} \tag{5} (d_1^\star,\dotsb,d_{a-1}^\star)=(d_1,\dotsb,d_a)\cdot\mathbf{Q} \end{equation}\]
\[\begin{equation} \tag{6} \mathbf{I}_a-\frac{1}{a}\mathbf{J}_a=\mathbf{Q}\cdot\mathbf{Q}^\top \end{equation}\]
By default, \(h=0.5\) for the fixed effects (in this case) and \(h=1\) for the random effects in Eq. 4.
\(\mathbf{Q}\) is an \(a\times(a-1)\) matrix whose columns are the \(a-1\) unit-length eigenvectors corresponding to the nonzero eigenvalues of the left-hand side of Eq. 6. For example, the projected standardized effect is \(d^\star=(d_1-d_2)/\sqrt{2}\) when \(a=2\). In the other direction (given \(d_1+d_2=0\)), \(d_1=d^\star/\sqrt{2}\) and \(d_2=-d^\star/\sqrt{2}\). Read more: https://rpubs.com/sherloconan/1070217.
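As a quick check of Eqs. 5 and 6, the sketch below constructs \(\mathbf{Q}\) for \(a=2\) by eigendecomposition (object names are illustrative; eigenvectors are determined only up to sign).

a <- 2
P <- diag(a) - matrix(1/a, a, a)                   # left-hand side of Eq. 6
eig <- eigen(P, symmetric=TRUE)
Q <- eig$vectors[, eig$values > 1e-8, drop=FALSE]  # eigenvectors with nonzero eigenvalues
Q %*% t(Q)                                         # recovers P, as in Eq. 6
d <- c(0.3, -0.3)                                  # any d with d1 + d2 = 0
drop(d %*% Q)                                      # projected d* = (d1 - d2)/sqrt(2), up to sign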
The Jeffreys–Zellner–Siow (JZS) Bayes factor for the Bayesian \(t\)-test is computed by integrating across one dimension in Eq. 7 (Ly et al., 2016, p. 24; Rouder et al., 2012, p. 360; Rouder et al., 2009, p. 237; Wei, Nathoo, & Masson, 2023, p. 1762; see also the balanced one-way between-subjects ANOVA in Morey et al., 2011, p. 374), using Gaussian quadrature.
\[\begin{equation} \tag{7} \textit{JZS-BF}_{10}=(2\pi)^{-\frac{1}{2}}\gamma\cdot\left(1+\frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}\int_0^\infty(1+Ng)^{-\frac{1}{2}}\left(1+\frac{t^2}{(1+Ng)\nu}\right)^{-\frac{\nu+1}{2}}g^{-\frac{3}{2}}e^{-\frac{\gamma^2}{2g}}\text{d}g, \end{equation}\]
where the default scale \(\gamma=\sqrt{2}/2\) (cf. \(h=0.5\); version 0.9.2 and later), the degrees of freedom \(\nu=n_1+n_2-2\), the \(t\)-statistic \(t=\frac{M_1-M_2}{s_p\cdot\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\), \(M_i\) are the sample condition means, and \(s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\) is the pooled variance in a two-sample \(t\)-test assuming equal variances between the two groups. The sample sizes are \(n_1\) and \(n_2\), and the sample variances are \(s_1^2\) and \(s_2^2\), respectively.
The effective sample size is \(N=\frac{n_1\cdot n_2}{n_1+n_2}\), i.e., half the harmonic mean of \(n_1\) and \(n_2\).
JZSBF10 <- function(gamma=1/sqrt(2), t, N, nu) {
  #' Input -
  #'   gamma: scale parameter of the Cauchy prior, i.e.,
  #'          `rscale="medium"` or `rscale=1/sqrt(2)` in BayesFactor::ttestBF, or
  #'          `rscaleFixed="medium"` or `rscaleFixed=0.5` in BayesFactor::anovaBF
  #'   t: observed t test statistic
  #'   N: "effective" sample size
  #'   nu: degrees of freedom
  #' Output - the JZS Bayes factor for a t-test in Eq. 7
  coef <- (2*pi)^(-0.5) * gamma * (1+t^2/nu)^((nu+1)/2)
  integrand <- function(g) {
    (1+N*g)^(-0.5) * (1+t^2/((1+N*g)*nu))^(-(nu+1)/2) * g^(-1.5) * exp(-gamma^2/(2*g))
  }
  unname(coef * integrate(integrand, lower=0, upper=Inf)$value)
}
test <- t.test(extra ~ group, data=sleep, var.equal=T) # two-sample t-test
t_obs <- test$statistic # t-statistic
n1 <- table(sleep$group)[1]; n2 <- table(sleep$group)[2] # balanced sample sizes, n1 = n2 = 10
JZSBF10(t=t_obs, N=n1*n2/(n1+n2), nu=n1+n2-2) # verifying the formula of the effective sample size
## [1] 1.265925
BayesFactor::ttestBF(formula=extra ~ group, data=sleep)
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 1.265925 ±0%
##
## Against denominator:
## Null, mu1-mu2 = 0
## ---
## Bayes factor type: BFindepSample, JZS
Unlike the anovaBF function, the ttestBF function of the “BayesFactor” R package yields negligible Monte Carlo errors and does not require specifying the number of Markov chain Monte Carlo iterations.
Default functions return the same Bayes factor estimates for the independent-samples (two-sample) \(t\)-test and one-way ANOVA with two conditions. However, those functions return systematically different estimates for the paired \(t\)-test and one-way repeated-measures ANOVA with two conditions (see discussion and vignettes).
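As a quick check of this equivalence on the sleep data (a sketch; anovaBF estimates its Bayes factor by sampling, so the match holds only up to a small Monte Carlo error):

BayesFactor::anovaBF(extra ~ group, data=sleep) # should be close to the ttestBF value of about 1.27 above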
Caution: Extreme Bayes Factor Estimates
set.seed(277)
(bf <- BayesFactor::ttestBF(rnorm(1000), rnorm(1000, 2.5)))
## Bayes factor analysis
## --------------
## [1] Alt., r=0.707 : 1.23212e+420 ±0%
##
## Against denominator:
## Null, mu1-mu2 = 0
## ---
## Bayes factor type: BFindepSample, JZS
(val <- BayesFactor::extractBF(bf)$bf) # the extracted and stored value is infinity
## [1] Inf
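One way to sidestep the overflow (a sketch using the logbf argument of extractBF) is to work with the Bayes factor on the log scale:

BayesFactor::extractBF(bf, logbf=TRUE)$bf # natural log of BF10; finite even when exp() of it overflows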
Approximating Bayes Factors from the Bayesian Information Criterion (BIC)
The BIC for model \(i\) with the number of free parameters \(k_i\) and the sample size \(n\) is \(\text{BIC}(\mathcal{M}_i)=-2\ln{\mathcal{L}_i}+k_i\ln{n}\quad\) for \(n\gg k_i\),
where \(\mathcal{L}_i=p(\boldsymbol{y}\mid\hat{\boldsymbol{\theta}}_i,\mathcal{M}_i)\) is the maximum likelihood.
The approximation rests on a second-order Taylor (Laplace) approximation of the log-marginal likelihood of each model.
\[\begin{equation} \tag{8} \mathit{BF}_{10}\approx\exp\left\{\frac{\text{BIC}(\mathcal{M}_0)-\text{BIC}(\mathcal{M}_1)}{2}\right\} \end{equation}\]
Derivation in Wagenmakers (2007, Appendix B).
Discussion in Nathoo and Masson (2016).
?BIC
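For instance, a minimal sketch of Eq. 8 with the sleep data from above, using base R's lm and BIC:

fit0 <- lm(extra ~ 1, data=sleep)     # null model
fit1 <- lm(extra ~ group, data=sleep) # model with a group effect
exp((BIC(fit1) - BIC(fit0)) / 2)      # BIC-based BF01 (Eq. 8); about 0.77, i.e., BF10 of roughly 1.3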
The test statistic in a Wald test is \(W=\left(\frac{\hat{\theta}-\theta_0}{\text{se}(\hat{\theta})}\right)^2\sim\chi_1^2\) under \(\mathcal{H}_0\), where \(\hat{\theta}\) is the maximum likelihood estimate, and \(\text{se}(\hat{\theta})\) is the standard error.
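A minimal illustration with the same sleep-data fit (for a linear model, the Wald statistic with an estimated standard error is simply the squared \(t\)-statistic):

fit1 <- lm(extra ~ group, data=sleep)              # same fit as in the BIC sketch above
est <- coef(summary(fit1))["group2", "Estimate"]   # theta_hat
se  <- coef(summary(fit1))["group2", "Std. Error"] # se(theta_hat)
(W <- (est / se)^2)                                # squared t-statistic, about 3.46 for these data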
Wagenmakers (2022, Eq. 6) reviewed the Jeffreys-style approximate Bayes factor (JAB) and proposed a piecewise approximation (WAB), Eq. 10 below, based on the \(p\)-value and the effective sample size \(N\).
The complication with JAB is that the approximation is valid only when \(\text{se}(\hat{\theta})\ll\sigma_g\), where \(\sigma_g\) is the scale of the prior distribution \(g(\theta)\). With very small sample sizes this condition does not hold, and Eq. 9 is biased against \(\mathcal{H}_0\).
\[\begin{equation} \tag{9} \textit{JAB}_{01}=A\cdot\sqrt{N}\exp\left\{-\frac{1}{2}W\right\} \end{equation}\]
A normal unit-information prior for \(\theta\) yields \(A=1\). Based on the Jeffreys prior, the general approximation specifies \(A=\sqrt{\pi/2}\approx1.253\).
The normal unit-information prior is a normal distribution with mean zero and variance equal to the information in a single observation.
Equation 2 specifies the Jeffreys priors for the grand mean and error variance.
Wagenmakers’s piecewise approximation is
\[\begin{equation} \tag{10} \textit{WAB}_{01}\approx \begin{cases} \ p^{\frac{1}{4}}\sqrt{N}, & p>0.5 \\[4pt] \ \sqrt{pN}, & 0.1<p\leqslant0.5 \text{ (simpler)} \\[4pt] \ \tfrac{4}{3}\,p^{\frac{2}{3}}\sqrt{N}, & 0.1<p\leqslant0.5 \text{ (more precise)} \\[4pt] \ 3p\sqrt{N}, & p\leqslant0.1 \end{cases} \end{equation}\]
qchisq(p, 1, lower.tail=F) # inverse CDF of chisq(1): 1-p → W = t² → JAB
qchisq(1-p, 1) # same
Note: The two-sided \(p\)-value of the \(t\)-statistic differs from the \(p\)-value of the Wald statistic, particularly when the sample size is small; the two agree only asymptotically.
# suppose t(𝜈 = 10 + 10 - 2) = 0.77, compute the p-values
c(pt(0.77, 18, lower.tail=F)*2, pchisq(0.77^2, 1, lower.tail=F))
## [1] 0.4512878 0.4412999
# suppose a two-sided p = 0.45, compute the test statistics t² and W
c(qt(0.45/2, 18, lower.tail=F)^2, qchisq(0.45, 1, lower.tail=F)) # try increasing 𝜈
## [1] 0.5963371 0.5706519
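Putting Eqs. 9 and 10 together, here is a small helper (a sketch; the function names are ours) that converts a reported \(p\)-value and sample size \(N\) into JAB and WAB:

jab01 <- function(p, N, A=1) A * sqrt(N) * exp(-0.5 * qchisq(p, 1, lower.tail=FALSE)) # Eq. 9
wab01 <- function(p, N) {                                                             # Eq. 10
  if (p > 0.5) p^(1/4) * sqrt(N)
  else if (p > 0.1) sqrt(p * N)  # simpler form
  else 3 * p * sqrt(N)
}
c(jab01(p=0.45, N=5), wab01(p=0.45, N=5)) # e.g., N = 10*10/(10+10) = 5 from the example above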
Sampling Distributions
If \(X_i\overset{\text{i.i.d.}}{\sim}\mathcal{N}(0,1)\) for \(i=1,\dotsb,n\), then \(Z=\sum_{i=1}^n X_i^2\sim\chi_n^2\) with \(\mathbb{E}[Z]=n\) and \(\text{Var}(Z)=2n\).
If \(Z_1\sim\chi_m^2\ \perp \!\!\! \perp\ Z_2\sim\chi_n^2\), then \(F=\frac{Z_1/m}{Z_2/n}\sim F(m,n)\) with \(\mathbb{E}[F]=\frac{n}{n-2}\) for \(n>2\) and \(\text{Var}(F)=\frac{2n^2(m+n-2)}{m(n-2)^2(n-4)}\) for \(n>4\).
If \(X\sim\mathcal{N}(0,1)\ \perp \!\!\! \perp\ Z\sim\chi_n^2\), then \(T=\frac{X}{\sqrt{Z/n}}\sim t_n\) with \(\mathbb{E}[T]=0\) for \(n>1\) and \(\text{Var}(T)=\frac{n}{n-2}\) for \(n>2\).
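A quick Monte Carlo sanity check of the first fact above (a sketch; the sample size, number of replications, and seed are arbitrary):

set.seed(1)
Z <- colSums(matrix(rnorm(5 * 1e5), nrow=5)^2) # 1e5 draws of a sum of 5 squared standard normals
c(mean(Z), var(Z))                             # should be close to n = 5 and 2n = 10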
The Savage–Dickey density ratio is a special form of the Bayes factor for nested models: it divides the posterior density over the parameters of the alternative model, evaluated at the hypothesized value, by the prior density of the same model evaluated at the same point.
\[\begin{equation} \tag{11} \textit{BF}_{01}:=\frac{p(\boldsymbol{y}\mid \mathcal{M}_0)}{p(\boldsymbol{y}\mid \mathcal{M}_1)}=\frac{\color{#0055A4}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid\boldsymbol{y},\mathcal{M}_1)}}{\color{#EF4135}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid \mathcal{M}_1)}} \end{equation}\]
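The Stan program below implements Eq. 11 for a linear model. As a warm-up, here is a toy conjugate example (entirely illustrative: \(y_i\sim\mathcal{N}(\theta,1)\) with prior \(\theta\sim\mathcal{N}(0,1)\), testing \(\mathcal{H}_0:\theta=0\)) in which the posterior is available in closed form, so the Savage–Dickey ratio can be checked against the exact Bayes factor:

set.seed(277)
y <- rnorm(25, mean=0.3, sd=1); n <- length(y)
post_mean <- sum(y) / (n + 1); post_sd <- sqrt(1 / (n + 1))  # conjugate posterior of theta
BF01_SD <- dnorm(0, post_mean, post_sd) / dnorm(0, 0, 1)     # Eq. 11: posterior / prior at theta = 0
BF01_an <- sqrt(n + 1) * exp(-sum(y)^2 / (2 * (n + 1)))      # exact BF01 for this toy model
c(BF01_SD, BF01_an)                                          # identical up to rounding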
stancode1M1 <- "
data {
  int<lower=1> N;       // total number of observations
  vector[N] x;          // predictor variable
  vector[N] y;          // response variable
  vector[2] mu;         // mean vector for theta
  matrix[2,2] Sigma;    // covariance matrix for theta
}
parameters {
  vector[2] theta;      // combined vector for alpha and beta
  real<lower=0> sigma;  // error standard deviation
}
transformed parameters {
  real alpha = theta[1];  // intercept
  real beta = theta[2];   // slope
}
model {
  // Priors
  theta ~ multi_normal(mu, Sigma);  // unit information
  target += -log(sigma);            // Jeffreys prior
  // Likelihood (linear regression model)
  y ~ normal(alpha + beta * x, sigma);
}"
stanmodel1M1 <- stan_model(model_code=stancode1M1)
Data
simData <- function(n1, n2, m1=0, delta=NULL, sd=1, power=NULL) {
  #' Input -
  #'   n1: sample size of the first group
  #'   n2: sample size of the second group
  #'   m1: population mean of the first group
  #'   delta: population mean difference, m2 - m1
  #'   sd: population standard deviation
  #'   power: statistical power;
  #'          need to input either `delta` or `power`
  #'
  #' Dependency - pwr (v1.3-0)
  #' Output - two-sample (unbalanced) data in the long format
  if (is.null(delta) & !is.null(power)) {
    delta <- sd * pwr::pwr.t2n.test(n1=n1, n2=n2, power=power)$d
  } else if (is.null(delta) & is.null(power)) {
    delta <- 0 # simulation under the null
  } else if (!is.null(delta) & !is.null(power)) {stop("Bad input!")}
  # no random seed
  data.frame("resp"=c(stats::rnorm(n1, m1, sd),
                      stats::rnorm(n2, m1 + delta, sd)),
             "grp"=rep(c("grp1", "grp2"), c(n1, n2)),
             stringsAsFactors=T)
}
num <- 1000
p_par <- p_nonpar <- numeric(num)
for (i in 1:num) {
  set.seed(i)
  data <- simData(n1=30, n2=30, power=.8)
  p_par[i] <- t.test(resp ~ grp, data=data, var.equal=T)$p.value
  p_nonpar[i] <- wilcox.test(formula=resp ~ grp, data=data)$p.value # Mann–Whitney 𝑈-test
}
plot(p_par, p_nonpar, xlim=range(c(p_par, p_nonpar)), ylim=range(c(p_par, p_nonpar)),
     xlab=expression(italic(p) * "-Values from " * italic(t) * "-Tests"),
     ylab=expression(italic(p) * "-Values from " * italic(U) * "-Tests"),
     main=paste0("Two-Sample Designs (", num, " Runs)"))
abline(0, 1, col="gray", lty=2, lwd=1.5)
The Bayes Factors
Note: All the Bayes factors are \(\textit{BF}_{01}\).
computeBFs <- function(data, total=F, precise=F) {
  #' Input -
  #'   data: two-sample data in the long format whose columns must be `resp` and `grp`
  #'   total: logical; if FALSE (default), use the effective sample size for Output 5-7;
  #'          otherwise, use the total number of observations
  #'   precise: logical; if FALSE (default), use the simpler form for Output 7;
  #'            otherwise, use the more precise form
  #'
  #' Global - stanmodel1M1
  #' Dependency - BayesFactor (v0.9.12-4.7), rstan (v2.32.6), polspline (v1.1.25)
  #' Output - a vector of the Bayes factors in favor of the null
  #'   1. Savage–Dickey density ratio (Eq. 11) using the normal unit-information prior
  #'   2. ttestBF function (Eq. 7)
  #'   3. BIC approximation to the Bayes factor in Eq. 8; BIC(fit)
  #'   4. SBC approximation to the Bayes factor in Eq. 8; AIC(fit, k=log(N)) <== CAUTION
  #'   5. Jeffreys approximate Bayes factor in Eq. 9 using the normal unit-information prior
  #'   6. Jeffreys approximate Bayes factor in Eq. 9 using the Jeffreys prior
  #'   7. Wagenmakers approximate Bayes factor based on the p-value and sample size in Eq. 10
  data <- stats::na.omit(data)
  grp_sizes <- unname(table(data$grp)) # sample sizes, n1 and n2
  Ntot <- sum(grp_sizes) # total number of observations
  N <- prod(grp_sizes) / Ntot # effective sample size in the two-sample t-test, n1 * n2 / (n1 + n2)
  fit0 <- stats::lm(resp ~ 1, data) # null model fit
  fit1 <- stats::lm(resp ~ grp, data) # full model fit
  pVal <- summary(fit1)$coefficients["grpgrp2", "Pr(>|t|)"] # p-value of the two-sample t-test
  tVal <- stats::qt(pVal/2, Ntot-2) # t-statistic (m2 >= m1), ["grpgrp2", "t value"]
  datalist <- list(N=Ntot, x=as.numeric(data$grp)-1, y=data$resp,
                   mu=stats::coef(fit1), Sigma=stats::vcov(fit1)*Ntot)
  stanfitM1 <- rstan::sampling(stanmodel1M1, data=datalist,
                               iter=15000, warmup=5000, chains=2, seed=277, refresh=0)
  betaS <- rstan::extract(stanfitM1, pars="beta")$beta
  dPost <- polspline::dlogspline(0, polspline::logspline(betaS)) # estimated posterior density
  dPrior <- stats::dnorm(0, datalist$mu[2], sqrt(datalist$Sigma[2,2]))
  SDdr01 <- dPost / dPrior # Savage–Dickey density ratio
  bf <- BayesFactor::ttestBF(formula=resp ~ grp, data=data) # Eq. 7
  BF01 <- ifelse(BayesFactor::extractBF(bf)$error < 0.01, # negligible Monte Carlo errors
                 1 / BayesFactor::extractBF(bf)$bf, NA)
  BICB01 <- exp((stats::BIC(fit1) - stats::BIC(fit0)) / 2) # equiv. AIC(fit, k=log(n1 + n2)); Eq. 8
  SBCB01 <- exp((stats::AIC(fit1, k=log(N)) - stats::AIC(fit0, k=log(N))) / 2) # <== CAUTION
  N <- ifelse(total, Ntot, N)
  # sqrt(N) * exp(-0.5 * stats::qchisq(1-pVal, 1)) # <== CAUTION, tiny p-value --> infinity Wald statistic
  JAB01_UI <- sqrt(N) * exp(-0.5 * tVal * tVal) # Eq. 9, unit-information prior
  WAB01 <- ifelse(pVal <= .5,
                  ifelse(pVal > .1,
                         ifelse(precise,
                                4 * pVal^(2/3) * sqrt(N) / 3, # 0.1 < p <= 0.5 (more precise)
                                sqrt(pVal * N)), # 0.1 < p <= 0.5 (simpler)
                         3 * pVal * sqrt(N)), # p <= 0.1
                  pVal^(1/4) * sqrt(N)) # p > 0.5, Eq. 10
  c("SD"=SDdr01, "ttestBF"=BF01, "BIC_approx"=BICB01, "SBC_approx"=SBCB01,
    "JAB"=JAB01_UI, "JAB_J"=sqrt(pi/2) * JAB01_UI, "WAB"=WAB01)
}
# set.seed(277)
# (test <- simData(10, 15, power=.8))
# computeBFs(test)
Simulation Study
We conduct several simulation studies to evaluate how accurately the approximate objective Bayes factors convert the \(p\)-values and sample sizes \(N\) reported in scientific articles into Bayesian measures of evidence.
Read Cohen’s \(d\): https://en.wikipedia.org/wiki/Effect_size#Cohen's_d
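For reference, Cohen's \(d\) for two independent samples uses the pooled standard deviation \(s_p\) defined under Eq. 7; a small sketch (the function name is ours):

cohens_d <- function(y1, y2) { # (M1 - M2) / s_p
  sp <- sqrt(((length(y1)-1)*var(y1) + (length(y2)-1)*var(y2)) / (length(y1) + length(y2) - 2))
  (mean(y1) - mean(y2)) / sp
}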
useTotalN <- T
nSim <- 1000 # number of simulation runs for each setting
nSubj <- c(10, 20, 30, 100, 500) # sample sizes of the first group
SMD <- c("Null"=0, "Small"=0.2, "Medium"=0.5, "Large"=0.8) # (standardized) mean differences, m2-m1, given sd=1
reportBF <- reportBF_avg <- reportBF_sd <- reportERR <- reportERR_avg <- reportERR_sd <-
  reportRULE <- reportRULE_H1 <- reportRULE_H0 <- reportRULE_inc <-
  setNames(vector(mode="list", length=length(nSubj) * length(SMD)), # pre-allocation
           apply(expand.grid(names(SMD), nSubj), 1, function(r) paste0("n1 = ", r[2], ", delta = ", r[1])))
index <- 1 # initial list index
for (n1 in nSubj) {
  n2 <- round(n1 * 1.2) # set the sample size of the second group <==
  for (delta in SMD) {
    set.seed(n1+delta)
    temp <- t(replicate(nSim, computeBFs(simData(n1, n2, delta=delta), useTotalN))) # replicate the Bayes factors
    reportBF[[index]] <- temp
    reportBF_avg[[index]] <- colMeans(temp, na.rm=T) # mean Bayes factors in favor of the null
    reportBF_sd[[index]] <- apply(na.omit(temp), 2, sd) # sample standard deviations of them
    tempERR <- 100 * (temp - temp[,1]) / temp[,1] # percent errors; the first column should be all 0
    reportERR[[index]] <- tempERR # (can take in NA)
    reportERR_avg[[index]] <- colMeans(tempERR, na.rm=T) # mean percent errors
    reportERR_sd[[index]] <- apply(na.omit(tempERR), 2, sd) # sample standard deviations of the percent errors
    tempRULE_H1 <- temp < 1/3 # support H1
    tempRULE_H0 <- temp > 3 # support H0
    tempRULE_inc <- temp >= 1/3 & temp <= 3 # inconclusive
    # counts of matching decisions for H1
    reportRULE_H1[[index]] <- colSums(tempRULE_H1 * tempRULE_H1[,1], na.rm=T)
    # counts of matching decisions for H0
    reportRULE_H0[[index]] <- colSums(tempRULE_H0 * tempRULE_H0[,1], na.rm=T)
    # counts of matching inconclusiveness
    reportRULE_inc[[index]] <- colSums(tempRULE_inc * tempRULE_inc[,1], na.rm=T)
    tempRULE <- 1*(temp < 1/3) - 1*(temp > 3) # 1: support H1; 0: inconclusive; -1: support H0
    reportRULE[[index]] <- colSums(tempRULE[,1] == tempRULE, na.rm=T) # counts of matching decisions
    # colMeans(); a value near 1 suggests matching results; the first column should be all 1
    index <- index + 1
  }
} # roughly 8.3 hours of run time
# Warning messages:
# 1-10: In polspline::logspline(betaS) : Not all models could be fitted
reshapeW2L <- function(list, prob=T, lab=c("SD", "BF", "BIC", "SBC", "JAB", "JAB_J", "WAB")) {
  #' Input -
  #'   list: a list of size length(nSubj) * length(SMD);
  #'         reportERR (unaggregated),
  #'         reportRULE, reportRULE_H1, reportRULE_H0, or reportRULE_inc (aggregated)
  #'   prob: logical; if TRUE (default), return the proportions;
  #'         otherwise, return the original values
  #'   lab: a vector of characters representing the Bayes factor methods
  #'
  #' Global -
  #'   nSim: number of simulation runs for each setting
  #'   nSubj: sample sizes of the first group
  #'   SMD: (standardized) mean differences, m2 - m1, given sd = 1
  #'
  #' Output - unlist the report and reshape the wide into the long
  len <- length(lab) # seven methods
  nRep <- ifelse(!prob, nSim, 1) # nSim if unaggregated; 1 if aggregated
  long <- data.frame("value"=unname(unlist(list)),
                     "method"=factor(rep(rep(1:len, each=nRep), length(list)), labels=lab),
                     "n1"=rep(nSubj, each=nRep*len*length(SMD)),
                     "delta"=rep(rep(unname(SMD), each=nRep*len), length(nSubj)))
  if (prob) {
    list2 <- lapply(list, function(vec) {
      if (vec[1] == 0) {
        vec * NA
      } else {
        vec / vec[1] # convert counts to proportions
      }
    })
    list3 <- lapply(list, function(vec) rep(vec[1], length(vec))) # denominator
    long$total <- unname(unlist(list3))
    long$prop <- unname(unlist(list2))
  }
  long
}
formatting <- function(val) {
  #' format the value using scientific notation
  if (abs(val) >= 10) {
    paste("~italic(BF)[\"01\"] %~~%",
          gsub("e\\+0*(\\d+)", "%*%10^\\1", sprintf("%.2e", val)))
  } else if (abs(val) < 1) {
    paste("~italic(BF)[\"01\"] %~~%",
          gsub("e-0*(\\d+)", "%*%10^{-\\1}", sprintf("%.2e", val)))
  } else {
    paste("~italic(BF)[\"01\"] %~~%", sprintf("%.2f", val))
  }
}
Visualization
Note: The effective number of observations is used in the BF, SBC, JAB, and WAB methods. The Jeffreys approximate Bayes factor (JAB_J) in Eq. 9, using the Jeffreys prior, will not be plotted.
# percent errors
df <- reshapeW2L(reportERR, prob=F)
index <- 0
# boxplot of the percent errors
for (smd in SMD) {
  baseBF01 <- sapply(reportBF_avg[seq(1, by=length(SMD), length.out=length(nSubj)) + index],
                     function(x) x[1]) # baseline is the Savage–Dickey density ratio
  baseBF01_lab <- sapply(baseBF01, formatting) # scientific notation
  index <- index + 1
  sub <- subset(df, !(method %in% c("SD", "JAB_J")) & delta == smd) # subset the long
  anno <- data.frame("label"=baseBF01_lab, "n1"=unique(sub$n1),
                     "x"=3, "y"=.8 * aggregate(value~n1, sub, FUN=max)$value) # for annotation use
  print(ggplot(sub, aes(x=method, y=value)) +
          geom_boxplot(alpha=0.3, na.rm=T) +
          geom_text(data=anno, aes(label=label, x=x, y=y), parse=T, col="red") +
          facet_wrap(.~n1, scales="free_y", nrow=1,
                     labeller=label_bquote(cols=italic(n)[1]==.(n1))) +
          labs(x="\nBayes Factor Methods", y="Percent Error (%)\n") +
          ggtitle(paste0(names(SMD[SMD==smd]), " Standardized Mean Difference")) +
          geom_hline(yintercept=0, linetype="dashed", color="gray") + # reference line 0%
          theme_classic() +
          theme(axis.text.x=element_text(angle=90)))
  cat("<br><br><br><br><br><br><br>")
}
df_BF <- reshapeW2L(reportBF, prob=F)
# boxplot of the BF₀₁
for (smd in SMD) {
  sub <- subset(df_BF, delta == smd & method != "JAB_J") # subset the long
  print(ggplot(sub, aes(x=method, y=value)) +
          geom_boxplot(alpha=0.3, na.rm=T) +
          facet_wrap(.~n1, scales="free_y", nrow=1,
                     labeller=label_bquote(cols=italic(n)[1]==.(n1))) +
          labs(x="\nBayes Factor Methods", y=expression(italic("BF")["01"])) +
          ggtitle(paste0(names(SMD[SMD==smd]), " Standardized Mean Difference")) +
          theme_classic() +
          theme(axis.text.x=element_text(angle=90)))
  cat("<br><br><br><br><br>")
}
# decisions
decis <- rbind(reshapeW2L(reportRULE),
               reshapeW2L(reportRULE_H1),
               reshapeW2L(reportRULE_H0),
               reshapeW2L(reportRULE_inc))
decis$result <- factor(rep(c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"),
                           each=length(levels(df$method)) * length(nSubj) * length(SMD)),
                       levels=c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"))
decis$delta <- paste0("ES = ", decis$delta)
decis <- decis[!is.na(decis[,"prop"]),]
if (F) {
  # compute the pointwise confidence intervals using the normal approximation
  decis$half.len <- qnorm(1-.05/2) *
    sqrt(decis$value * (decis$total - decis$value) / decis$total^3) # 95% CI half-width
  decis$lwr <- pmax(0, decis$value / decis$total - decis$half.len) # 95% CI lower bound (vectorized)
  decis$upr <- pmin(1, decis$value / decis$total + decis$half.len) # 95% CI upper bound (vectorized)
} else { # Clopper–Pearson (exact) intervals
  decis$lwr <- qbeta(.05/2, decis$value, decis$total - decis$value + 1) # 95% CI lower bound
  decis$upr <- qbeta(1-.05/2, decis$value + 1, decis$total - decis$value) # 95% CI upper bound
  decis$half.len <- (decis$upr - decis$lwr) / 2 # 95% CI half-width
}
# line chart of the proportions of agreement
ggplotly(ggplot(decis, aes(n1, prop, color=method, label=value)) +
           geom_line(linewidth=1.2, na.rm=T) + geom_point(size=2, na.rm=T) +
           geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=.2) +
           scale_color_manual(values=c("black", "gray", "#E69F00", "#009E73", "#CC79A7", "#D55E00", "#0072B2"),
                              labels=c("SD", "BF", "BIC", "SBC", "JAB", "JAB_J", "WAB")) +
           facet_grid(result~delta) +
           labs(x="Sample Size of the First Group", y="Proportion of Agreement", color="Method") +
           scale_x_continuous(breaks=c(10, 100, 500)) +
           scale_y_continuous(breaks=c(0, 0.5, 1)) + # min(decis$prop, na.rm=T)
           theme_minimal()) # interactive plots
Since all the methods have closed-form or numerically exact solutions as functions of the \(t\)-statistic for \(t\)-tests and balanced one-way ANOVA, simulation studies may not be necessary.
BF10 <- function(t) {
  gamma <- 1/sqrt(2) # scale parameter of the Cauchy prior
  Neff <- n1*n2/(n1+n2) # effective sample size
  nu <- n1+n2-2 # degrees of freedom
  coef <- (2*pi)^(-0.5) * gamma * (1+t^2/nu)^((nu+1)/2)
  integrand <- function(g) {
    (1+Neff*g)^(-0.5) * (1+t^2/((1+Neff*g)*nu))^(-(nu+1)/2) * g^(-1.5) * exp(-gamma^2/(2*g))
  }
  unname(coef * integrate(integrand, lower=0, upper=Inf)$value)
}

BIC01 <- function(t) {
  N <- n1+n2 # total number of observations
  nu <- N-2 # degrees of freedom
  sqrt(N * (1 + t^2 / nu)^(-N))
}

SBC01 <- function(t) {
  Neff <- n1*n2/(n1+n2) # effective sample size
  N <- n1+n2 # total number of observations
  nu <- N-2 # degrees of freedom
  sqrt(Neff * (1 + t^2 / nu)^(-N))
}

JAB01 <- function(t) {
  Neff <- n1*n2/(n1+n2) # effective sample size
  sqrt(Neff) * exp(-0.5 * t^2)
}

WAB01 <- function(t) {
  Neff <- n1*n2/(n1+n2) # effective sample size
  pVal <- pt(t, n1+n2-2, lower.tail=F) * 2 # two-sided p-value
  ifelse(pVal <= .5,
         ifelse(pVal > .1,
                ifelse(F,
                       4 * pVal^(2/3) * sqrt(Neff) / 3, # 0.1 < p <= 0.5 (more precise)
                       sqrt(pVal * Neff)), # 0.1 < p <= 0.5 (simpler)
                3 * pVal * sqrt(Neff)), # p <= 0.1
         pVal^(1/4) * sqrt(Neff)) # p > 0.5
}

PBF01 <- function(t) {
  N <- n1+n2 # total number of observations
  a <- 2 # number of groups
  F_ratio <- t^2 # F = t²
  zeta <- -0.5 # adjustment of the shape parameters
  ifelse(N >= 345,
         (N/2)^((a-1)/2) * gamma(1+zeta) *
           ((N-a) / (N-a+(a-1)*F_ratio))^((N-a)/2-1-zeta) / gamma((a-1)/2+1+zeta), # in case of Inf/Inf
         gamma((N-1)/2) * gamma(1+zeta) *
           ((N-a) / (N-a+(a-1)*F_ratio))^((N-a)/2-1-zeta) / (gamma((a-1)/2+1+zeta) * gamma((N-a)/2)))
}

TSBF01 <- function(t) {
  # Johnson (2005), https://doi.org/10.1111/j.1467-9868.2005.00521.x
  t <- abs(t)
  N <- n1+n2 # total number of observations
  m <- N-2
  ifelse(t > 1,
         ((m+1) / (m+t^2))^((m+1)/2) * t,
         1)
}
n1 <- 30 # sample size of the first group
n2 <- 35 # sample size of the second group
ts <- qt(c(.05, .01, .005)/2, n1+n2-2, lower.tail=F) # critical values of t
x <- seq(0, ts[3], by=0.1) # input sequence
num <- length(x)
bf <- bic <- sbc <- jab <- wab <- pbf <- tsbf <- numeric(num)
for (i in 1:num) {
  bf[i] <- BF10(x[i]) # JZS Bayes factor
  bic[i] <- BIC01(x[i]) # BIC approximation
  sbc[i] <- SBC01(x[i]) # BIC approximation using the effective sample size
  jab[i] <- JAB01(x[i]) # Jeffreys approximate Bayes factor
  wab[i] <- WAB01(x[i]) # Wagenmakers approximate Bayes factor
  pbf[i] <- PBF01(x[i]) # Pearson Bayes factor
  tsbf[i] <- TSBF01(x[i]) # Test-based Bayes factor
}
plot(x, bf, ylab=expression(italic(BF)[10]), xlab=expression("|" * italic("t") * "|-Statistic"),
     main=expression("Two-Sample " * italic(t) * "-Test"),
     sub=bquote(italic(n)[1]==.(n1)~~"&"~~italic(n)[2]==.(n2))) # scatter plot
points(x, 1/bic, col="#E69F00", pch=8)
points(x, 1/sbc, col="#009E73", pch=16)
points(x, 1/jab, col="#CC79A7", pch=18)
points(x, 1/wab, col="#0072B2", pch=15)
points(x, 1/pbf, col="#56B4E9", pch=17)
points(x, 1/tsbf, col="#EF4135", pch=3)
abline(h=c(1/3, 3), col="gray")
abline(v=c(ts[1], ts[2]), col="red")
legend("topleft", c("ttestBF", "BIC approx", "SBC approx", "JAB", "WAB", "PBF", "TSBF"),
       col=c("black", "#E69F00", "#009E73", "#CC79A7", "#0072B2", "#56B4E9", "#EF4135"),
       pch=c(1, 8, 16, 18, 15, 17, 3), lty=0, lwd=2, bty="n")

plot(x, 1/bf, ylab=expression(italic(BF)["01"]), xlab=expression("|" * italic("t") * "|-Statistic"),
     main=expression("Two-Sample " * italic(t) * "-Test"),
     sub=bquote(italic(n)[1]==.(n1)~~"&"~~italic(n)[2]==.(n2))) # scatter plot
points(x, bic, col="#E69F00", pch=8)
points(x, sbc, col="#009E73", pch=16)
points(x, jab, col="#CC79A7", pch=18)
points(x, wab, col="#0072B2", pch=15)
points(x, pbf, col="#56B4E9", pch=17)
points(x, tsbf, col="#EF4135", pch=3)
abline(h=c(1/3, 3), col="gray")
abline(v=c(ts[1], ts[2]), col="red")
legend("bottomleft", c("ttestBF", "BIC approx", "SBC approx", "JAB", "WAB", "PBF", "TSBF"),
       col=c("black", "#E69F00", "#009E73", "#CC79A7", "#0072B2", "#56B4E9", "#EF4135"),
       pch=c(1, 8, 16, 18, 15, 17, 3), lty=0, lwd=2, bty="n")
Visualization
Note: The total number of observations is used in the SD, BIC, JAB, and WAB methods. The Jeffreys approximate Bayes factor (JAB_J) in Eq. 9, using the Jeffreys prior, will not be plotted.
Centering the normal unit-information prior on the null-hypothesized value \(\theta_0\), the customary choice in objective Bayesian testing, yields the following JAB* (Wagenmakers, 2022, Eq. 4).
\[\begin{equation} \textit{JAB}_{01}^*\approx\sqrt{N}\exp{\left\{-\frac{N-1}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]
Alternatively, the JAB in Eq. 9 centers the prior on the maximum likelihood estimate.
\[\begin{equation} \textit{JAB}_{01}\approx\sqrt{N}\exp{\left\{-\frac{N}{2}\left(\frac{\hat{\theta}-\theta_0}{\sigma_g}\right)^2\right\}} \end{equation}\]
Using a power transformation, it can be shown that \(\textit{JAB}_{01}^*=\textit{JAB}_{01}^{\frac{N-1}{N}}\cdot N^{\frac{1}{2N}}\).
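A quick numeric check of this identity (the values of \(N\) and of \((\hat{\theta}-\theta_0)/\sigma_g\) below are arbitrary):

N <- 22; x <- 0.4                        # x plays the role of (theta_hat - theta_0) / sigma_g
JAB  <- sqrt(N) * exp(-N/2 * x^2)        # JAB01
JABs <- sqrt(N) * exp(-(N-1)/2 * x^2)    # JAB01*
c(JABs, JAB^((N-1)/N) * N^(1/(2*N)))     # the two values agree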
\(n_1=10,\quad n_2=12\)
par(mfrow=c(1, length(SMD)))
for (i in 1:length(SMD)) {
  reportBF[[i]][,6] <- reportBF[[i]][,5]^(21/22) * 22^(1/44) # total number of observations is 10 + 12
  boxplot(reportBF[[i]][,c(1,5:7)], names=c("SD", "JAB", "JAB*", "WAB"),
          xlab="", ylab=expression(italic("BF")["01"]), main=paste0(names(SMD)[i], " SMD"), las=2)
}
To test the hypotheses \(\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0\) versus \(\mathcal{H}_1:\boldsymbol{\theta}\neq\boldsymbol{\theta}_0\), we define the extended Jeffreys approximate objective Bayes factor (eJAB) as \[\begin{equation} \mathit{eJAB}_{01}=\sqrt{N}\exp\left\{-\frac{1}{2}\frac{N^{1/q}-1}{N^{1/q}}\cdot Q_{\chi^{2}_{q}}(1-p)\right\}, \end{equation}\]
where \(q\) is the size of the parameter vector \(\boldsymbol{\theta}\), \(N\) is the sample size, \(Q_{\chi^{2}_{q}}(\cdot)\) is the quantile function of the chi-squared distribution with \(q\) degrees of freedom, and \(p\) is the \(p\)-value from a null-hypothesis significance test. We further define \(\mathit{eJAB}_{10}=1\ \!/\ \!\mathit{eJAB}_{01}\).
The sample size \(N\) is
the total number of observations for \(t\)-tests, linear regression, logistic regression, one-way ANOVA, and chi-squared tests.
the number of events for Cox models.
the number of independent observations for one-way repeated-measures ANOVA (Nathoo & Masson, 2016).
The degrees of freedom are
\(q=1\) for \(t\)-tests, linear regression, logistic regression, and Cox models, where \(\mathcal{H}_0\) specifies a point-null hypothesis (Wagenmakers, 2022).
\(q=I-1\) for one-way ANOVA and one-way repeated-measures ANOVA, where \(I\) is the number of conditions.
\(q=(R-1)(C-1)\) for chi-squared tests, where \(R\) and \(C\) are the numbers of rows and columns, respectively.
Since the Bayes factor is the ratio of the marginal likelihoods of two competing models, \(q=p_1-p_0\) also represents the difference in the number of free parameters between the alternative and null models.
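A minimal sketch of the eJAB formula above (the function name is ours, not from any package):

eJAB01 <- function(p, N, q=1) {
  sqrt(N) * exp(-0.5 * (N^(1/q) - 1) / N^(1/q) * qchisq(1 - p, df=q))
}
eJAB01(p=0.079, N=20, q=1) # e.g., the two-sample sleep t-test above (p ≈ 0.079, N = 20 in total)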
For further reading, please visit https://rpubs.com/sherloconan/1361280.