This up-to-date document is available at https://rpubs.com/sherloconan/1212352
\(\chi^2\)-Tests for Goodness of Fit and Independence
# Is the die fair?
rolls <- c(22, 23, 20, 25, 27, 33) # number of rolls for each face
chisq.test(rolls)
##
## Chi-squared test for given probabilities
##
## data: rolls
## X-squared = 4.24, df = 5, p-value = 0.5154
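As a quick check, the statistic can be reproduced by hand from \(\chi^2=\sum(O-E)^2/E\), since each face has expected count \(150/6=25\) under a fair die (a minimal sketch using the rolls vector above).
E <- sum(rolls) / length(rolls)      # expected count per face under a fair die: 150/6 = 25
sum((rolls - E)^2 / E)               # 4.24, matching X-squared above
pchisq(sum((rolls - E)^2 / E), df=length(rolls)-1, lower.tail=FALSE)   # p-value 0.5154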
For each cell of the contingency table, calculate the expected frequency and then the test statistic as
\[\begin{align} E_{ij}&=\frac{O_{i\centerdot}\cdot O_{\centerdot j}}{N},\quad\text{where }N=\sum_{i=1}^R\sum_{j=1}^C O_{ij},\quad O_{i\centerdot}=\sum_{j=1}^C O_{ij},\quad\text{and }O_{\centerdot j}=\sum_{i=1}^R O_{ij} \\ \chi^2&=\sum_{i=1}^R\sum_{j=1}^C\frac{(O_{ij}-E_{ij})^2}{E_{ij}}\overset{\mathcal{H}_0}{\sim}\chi^2_{(R-1)(C-1)} \end{align}\]
These calculations also apply to product multinomial sampling, where one variable’s marginals are fixed.
# Does the seat belt make a difference? (Jobson, 1982, p. 18)
belt <- matrix(c(12813, 647, 359, 42,
65963, 4000, 2642, 303), nrow=2, byrow=T,
dimnames=list(c("yesbelt", "nahbelt"), c("none", "minimal", "minor", "major")))
chisq.test(belt)
##
## Pearson's Chi-squared test
##
## data: belt
## X-squared = 59.224, df = 3, p-value = 8.61e-13
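As a sanity check, the expected frequencies and the Pearson statistic defined above can be computed directly (a minimal sketch; it should reproduce the X-squared value reported by chisq.test).
E <- outer(rowSums(belt), colSums(belt)) / sum(belt)   # expected frequencies E_ij = O_i. * O_.j / N
sum((belt - E)^2 / E)                                  # Pearson chi-squared statistic, cf. 59.224 above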
# https://richarddmorey.github.io/BayesFactor/#ctables
BayesFactor::contingencyTableBF(belt, sampleType="jointMulti") # fixed total
## Bayes factor analysis
## --------------
## [1] Non-indep. (a=1) : 2186082 ±0%
##
## Against denominator:
## Null, independence, a = 1
## ---
## Bayes factor type: BFcontingencyTable, joint multinomial
BayesFactor::contingencyTableBF(belt, sampleType="indepMulti", fixedMargin="rows") # fixed row sums
## Bayes factor analysis
## --------------
## [1] Non-indep. (a=1) : 6455475 ±0%
##
## Against denominator:
## Null, independence, a = 1
## ---
## Bayes factor type: BFcontingencyTable, independent multinomial
Assumption: Expected frequency in each cell \(\geqslant5\)
Warning Message
mat <- matrix(c(12, 8, 11, 5, 5, 10, 16, 8, 15), nrow=3) # small cell sizes
chisq.test(mat) # WARNING
## Warning in chisq.test(mat): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: mat
## X-squared = 1.899, df = 4, p-value = 0.7543
chisq.test(mat, simulate.p.value=T)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: mat
## X-squared = 1.899, df = NA, p-value = 0.7576
fisher.test(mat) # Fisher's exact test
##
## Fisher's Exact Test for Count Data
##
## data: mat
## p-value = 0.7517
## alternative hypothesis: two.sided
Gûnel and Dickey (1974) proposed the symmetric Dirichlet prior for the cell probabilities (\(\sum_{k=1}^K\pi_k=1\)) under the alternative hypothesis of association, noting that it is conjugate to the multinomial distribution.
\[\begin{equation} \tag{1} (\pi_1,\dotsb,\pi_K)^\top\sim\operatorname{Dir}(a)=\frac{\Gamma(Ka)}{(\Gamma(a))^K}\prod_{k=1}^K\pi_k^{a-1} \end{equation}\]
The key parameter is the shared concentration \(a>0\).
A (default) value of \(a=1\) specifies a uniform prior over the \((K-1)\)-simplex.
Values \(0<a<1\) lead to a prior that concentrates mass in the corners of the simplex (favoring extreme probabilities).
Values \(a>1\) lead to a prior that concentrates mass in the center of the simplex (favoring equal probabilities); in the limit \(a\to\infty\), the prior degenerates to the point \(\pi_k=1/K\).
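To see how the concentration parameter shapes the prior, here is a small sketch that draws from a symmetric Dirichlet via normalized Gamma variates (the helper rdirichlet is defined here only for illustration): the per-coordinate spread shrinks toward \(\pi_k=1/K\) as \(a\) grows.
rdirichlet <- function(n, alpha) {                          # n draws from Dir(alpha) via normalized Gamma draws
  g <- matrix(rgamma(n * length(alpha), shape=alpha), nrow=n, byrow=T)
  g / rowSums(g)
}
set.seed(277)
apply(rdirichlet(1e4, rep(0.5, 3)), 2, sd)                  # a < 1: large spread (mass near the corners)
apply(rdirichlet(1e4, rep(10, 3)), 2, sd)                   # a > 1: small spread (mass concentrated near 1/3 each)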
Total Sum Fixed (Joint Multinomial)
\[\begin{equation} \tag{2} (O_{11},\dotsb,O_{RC})^\top\mid N,(\pi_{11},\dotsb,\pi_{RC})^\top\sim\operatorname{Multinomial}\left(N,(\pi_{11},\dotsb,\pi_{RC})^\top\right) \end{equation}\]
\[\begin{equation} \tag{3} \mathcal{H}_1:\ (\pi_{11},\dotsb,\pi_{RC})^\top\sim\operatorname{Dir}(a) \end{equation}\]
\[\begin{align} \tag{4} \text{versus}\quad\mathcal{H}_0:\ &(\pi_{1\centerdot},\dotsb,\pi_{R\centerdot})^\top\sim\operatorname{Dir}(Ca-C+1) \\ &(\pi_{\centerdot 1},\dotsb,\pi_{\centerdot C})^\top\sim\operatorname{Dir}(Ra-R+1) \\ &\pi_{ij}=\pi_{i\centerdot}\cdot\pi_{\centerdot j} \end{align}\]
Under these priors, the conditional Bayes factor for independence given \(N\) becomes a product/ratio of Dirichlet normalizing constants (beta functions) for the row and column margins versus the full \(K\)-dimensional Dirichlet (Good, 1950, p. 99).
Row Sums Fixed (Independent Multinomial Rows)
For \(i=1,\dotsb,R\),
\[\begin{equation} \tag{5} (O_{i1},\dotsb,O_{iC})^\top\mid O_{i\centerdot},(w_{1\mid i},\dotsb,w_{C\mid i})^\top\sim\operatorname{Multinomial}(O_{i\centerdot},(w_{1\mid i},\dotsb,w_{C\mid i})^\top) \end{equation}\]
\[\begin{equation} \tag{6} \mathcal{H}_1:\ (w_{1\mid i},\dotsb,w_{C\mid i})^\top\overset{\text{i.i.d.}}{\sim}\operatorname{Dir}(a) \end{equation}\]
\[\begin{align} \tag{7} \text{versus}\quad\mathcal{H}_0:\ &(\pi_{\centerdot 1},\dotsb,\pi_{\centerdot C})^\top\sim\operatorname{Dir}(Ra-R+1) \\ &(w_{1\mid i},\dotsb,w_{C\mid i})^\top=(\pi_{\centerdot 1},\dotsb,\pi_{\centerdot C})^\top \end{align}\]
Under these priors, the conditional Bayes factor for independence given the row totals reduces to a product/ratio of Dirichlet normalizing constants across rows versus the shared column vector (Good, 1950, p. 100).
Approximating Bayes Factors from Bayesian Information Criterion (BIC)
The BIC for model \(\mathcal{M}_i\), with \(k_i\) free parameters and sample size \(N\) (assuming \(N\gg k_i\)), is \(\text{BIC}(\mathcal{M}_i)=-2\ln{\mathcal{L}_i}+k_i\ln{N}\),
where \(\mathcal{L}_i=p(\boldsymbol{y}\mid\hat{\boldsymbol{\theta}}_i,\mathcal{M}_i)\) is the maximum likelihood.
Using a second-order Taylor approximation of the log-marginal likelihood of each model,
\[\begin{equation} \tag{8} \mathit{BF}_{10}\approx\exp\left\{\frac{\text{BIC}(\mathcal{M}_0)-\text{BIC}(\mathcal{M}_1)}{2}\right\} \end{equation}\]
Derivation in Wagenmakers (2007, Appendix B).
The maximum likelihood estimates for the proportions are \(\hat{\pi}_{ij}=O_{ij}\ /\ N\) under \(\mathcal{H}_1\) and \(\hat{\pi}_{ij}=O_{i\centerdot}\cdot O_{\centerdot j}\ /\ N^2\) under \(\mathcal{H}_0\).
And the numbers of parameters are \(k_1=R\cdot C-1\) under \(\mathcal{H}_1\) and \(k_0=R+C-2\) under \(\mathcal{H}_0\), assuming a joint multinomial distribution.
total <- sum(belt) # grand total
row_mar <- rowSums(belt) # row sums
col_mar <- colSums(belt) # column sums
expected <- outer(row_mar, col_mar) / total # outer product / grand total = expected frequency
# `prob` normalization is not necessary ↓
logML_joint_H1 <- dmultinom(c(belt), total, c(belt), log=T) # log maximum likelihood under H₁
logML_joint_H0 <- dmultinom(c(belt), total, c(expected), log=T) # log maximum likelihood under H₀
BIC_joint_H1 <- -2 * logML_joint_H1 + (2*4-1) * log(total)
BIC_joint_H0 <- -2 * logML_joint_H0 + (2+4-2) * log(total)
exp((BIC_joint_H1 - BIC_joint_H0) / 2) # BF₀₁
## [1] 5.80304e-07
The maximum likelihood estimates for the proportions are \(\hat{\pi}_{ij}=O_{ij}\ /\ O_{i\centerdot}\) under \(\mathcal{H}_1\) and \(\hat{\pi}_{ij}=O_{\centerdot j}\ /\ N\) under \(\mathcal{H}_0\).
And the numbers of parameters are \(k_1=R\cdot(C-1)\) under \(\mathcal{H}_1\) and \(k_0=C-1\) under \(\mathcal{H}_0\), assuming independent multinomial distributions.
logML_indep_H1 <- unname(dmultinom(belt[1,], row_mar[1], belt[1,], log=T) +
dmultinom(belt[2,], row_mar[2], belt[2,], log=T)) # log maximum likelihood under H₁
logML_indep_H0 <- unname(dmultinom(belt[1,], row_mar[1], col_mar, log=T) +
dmultinom(belt[2,], row_mar[2], col_mar, log=T)) # log maximum likelihood under H₀
BIC_indep_H1 <- -2 * logML_indep_H1 + (2*(4-1)) * log(total)
BIC_indep_H0 <- -2 * logML_indep_H0 + (4-1) * log(total)
exp((BIC_indep_H1 - BIC_indep_H0) / 2) # BF₀₁ (same) <==
## [1] 5.80304e-07
\[\begin{align} \text{BIC}(\mathcal{M}_1)-\text{BIC}(\mathcal{M}_0)&=-2\ln{\mathcal{L}_1}+k_1\ln{N}+2\ln{\mathcal{L}_0}-k_0\ln{N} \\ &=-2(\ln{\mathcal{L}_1}-\ln{\mathcal{L}_0})+(k_1-k_0)\ln{N} \\ &=-\Lambda+(k_1-k_0)\ln{N} \\ \\ \mathit{BF}_{01}&\approx\exp\left\{\frac{\text{BIC}(\mathcal{M}_1)-\text{BIC}(\mathcal{M}_0)}{2}\right\} \\ &=\exp\left\{\frac{-\Lambda+(k_1-k_0)\ln{N}}{2}\right\} \\ &=N^{\frac{(R-1)(C-1)}{2}}\!\cdot\exp\left\{-\frac{\Lambda}{2}\right\} \tag{$8^\prime$} \end{align}\]
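As a quick numerical check of Eq. (8′), reuse the log-likelihoods already computed for the seat-belt table; the likelihood-ratio statistic \(\Lambda\) gives the same \(\mathit{BF}_{01}\) as the BIC difference.
Lambda <- 2 * (logML_joint_H1 - logML_joint_H0)   # likelihood-ratio statistic
total^((2-1)*(4-1)/2) * exp(-Lambda / 2)          # Eq. 8'; should reproduce 5.80304e-07 above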
Test on a Single Parameter
The test statistic in a Wald test is \(W=\left(\frac{\hat{\theta}-\theta_0}{\text{se}(\hat{\theta})}\right)^2\sim\chi_1^2\) under \(\mathcal{H}_0\), where \(\hat{\theta}\) is the maximum likelihood estimate, and \(\text{se}(\hat{\theta})\) is the standard error.
Wagenmakers (2022, Eq. 6) reviewed the Jeffreys-style approximate Bayes factor (JAB) and proposed a piecewise approximation (WAB), given in Eq. 10, based on the \(p\)-value and the effective sample size \(N\).
The complication with JAB is that the approximation is valid only when \(\text{se}(\hat{\theta})\ll\sigma_g\), where \(\sigma_g\) is the scale of the prior distribution \(g(\theta)\). Note that with very small sample sizes, this assumption is not true and Eq. 9 is biased against \(\mathcal{H}_0\).
\[\begin{equation} \tag{9} \textit{JAB}_{01}=A\cdot\sqrt{N}\exp\left\{-\frac{1}{2}W\right\} \end{equation}\]
A normal unit-information prior for \(\theta\) yields \(A=1\). Based on the Jeffreys prior, the general approximation specifies \(A=\sqrt{\pi/2}\approx1.253\).
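Eq. 9 can be coded directly (a minimal sketch; W, N, and A are supplied by the user).
JAB01 <- function(W, N, A=1) A * sqrt(N) * exp(-W / 2)   # Eq. 9; A = 1 (unit information) or sqrt(pi/2) (Jeffreys)
JAB01(W=qchisq(0.95, 1), N=100)                          # e.g., a Wald statistic at the 5% critical value with N = 100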
Wagenmakers’s piecewise approximation is
\[ \textit{WAB}_{01}\approx \begin{cases} \tag{10} \text{$\ p^{\frac{1}{4}}\sqrt{N}$,} & \text{$p>0.5$} \\ \\ \text{$\ \sqrt{pN}$,} & \text{$0.1<p\leqslant0.5$ (simpler)} \\ \\ \text{$\ \frac{4}{3}p^{\frac{2}{3}}\sqrt{N}$,} & \text{$0.1<p\leqslant0.5$ (more precise)} \\ \\ \text{$\ 3p\sqrt{N}$,} & \text{$p\leqslant0.1$} \end{cases} \]
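This piecewise rule can likewise be written as a small helper (a sketch; the precise argument switches the \(0.1<p\leqslant0.5\) branch).
WAB01 <- function(p, N, precise=F) {       # Eq. 10
  if (p > 0.5) p^(1/4) * sqrt(N)
  else if (p > 0.1) { if (precise) 4/3 * p^(2/3) * sqrt(N) else sqrt(p * N) }
  else 3 * p * sqrt(N)
}
WAB01(0.04, 100)                           # 3 * 0.04 * sqrt(100) = 1.2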
Wagenmakers (2022, p. 28-30) demonstrated how to compare two proportions in a \(2\times2\) contingency table (odds ratio representation).
We have encountered an issue with BIC approximation-based methods in chi-squared tests for independence.
Consider a scenario where the total number of observations is 200, and the contingency table is \(3\times4\). A \(p\)-value of 0.01 corresponds to an original \(\mathit{JAB}_{01}\) value of 1865 in favor of the null hypothesis, while a \(p\)-value of 0.0001 corresponds to an original \(\mathit{JAB}_{01}\) value of 7.66, also in favor of the null. These results seem paradoxical.
pVal <- .0001; N <- 200; R <- 3; C <- 4; df <- (R-1) * (C-1)
N^(df/2) * exp(-0.5 * qchisq(1-pVal, df) * (N-1) / N) # original JAB₀₁
## [1] 7.663144
sqrt(N) * exp(-0.5 * qchisq(1-pVal, df) * (N^(1/df)-1) / N^(1/df)) # extended JAB₀₁
## [1] 0.004008032
\(\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0\) versus \(\mathcal{H}_1:\boldsymbol{\theta}\neq\boldsymbol{\theta}_0\)
To reduce the extreme bias towards the null hypothesis, we assume the prior \(\boldsymbol{\theta}\sim\boldsymbol{\mathcal{N}}_q\!\left(\hat{\boldsymbol{\theta}},\ N^{1/q}\cdot I^{-1}(\hat{\boldsymbol{\theta}})\right)\), which, for \(q=1\), corresponds to the normal unit-information prior. For \(q>1\), the prior becomes more informative, placing more weight on the maximum likelihood estimate.
Bayes Factors Based on the Sampling Distributions of Test Statistics
Johnson (2005) developed an approach to computing Bayes factors based on test statistics. These Bayes factors are derived by modeling the test statistic and considering its distribution under both the null and alternative hypotheses. The resulting Bayes factor is an approximation of a full Bayes factor. Additionally, priors for the parameter under test are chosen to maximize the marginal density of the data (the test statistic in this context) under the alternative hypothesis, thereby making the test statistic Bayes factor (TSBF) an upper bound on the weight of evidence against the null hypothesis.
Unlike JAB/WAB, the proposed TSBFs are unable to express evidence in favor of the null hypothesis. That is to say, the range of these measures of evidence is \(1\leqslant\textit{BF}_{10}<\infty\).
(See also the Dirichlet-prior material at https://github.com/zhengxiaoUVic/Bayesian.)
TSBFs provide the least conservative measure of evidence against the null hypothesis, yielding relatively larger values of \(\mathit{TSBF}_{10}\).
TSBFs may lack the property of coherence (\(\textit{BF}_{12}=\textit{BF}_{10}\cdot\textit{BF}_{02}\)) due to the different transformations applied to the data.
\[\mathit{TSBF}_{10}= \begin{cases}\ \tag{14a} \left(\frac{K-1}{\chi^2}\right)^{\frac{K-1}{2}}\!\cdot\exp\left\{\frac{\chi^2-K+1}{2}\right\}, & \text{if } \chi^2 > K-1 \\ \\ \ 1, & \text{otherwise}\end{cases} \]
TSBF10a <- function(K, chisq) {
#' Input -
#' K: number of mutually exclusive events (integer)
#' chisq: chi-squared test statistic
#' Output - test-based Bayes factor in favor of the alternative
ifelse(chisq > K-1,
((K-1) / chisq)^((K-1)/2) * exp((chisq-K+1)/2),
1) # TSBF₁₀ (Johnson, 2005, p. 693)
}
curve(TSBF10a(K=3, x), xlab=expression(italic(χ)^2), ylab=expression(italic(TSBF)[10]),
main="Test-Based Bayes Factors for Goodness of Fit", from=0, to=8, lwd=2,
sub=bquote(italic(K)==3))
\[\mathit{TSBF}_{10}= \begin{cases}\ \tag{14b} \left(\frac{(R-1)(C-1)}{\chi^2}\right)^{\frac{(R-1)(C-1)}{2}}\!\cdot\exp\left\{\frac{\chi^2-(R-1)(C-1)}{2}\right\}, & \text{if } \chi^2 > (R-1)(C-1) \\ \\ \ 1, & \text{otherwise}\end{cases} \]
Assume two factors with \(R\) and \(C\) levels, respectively. The data are organized in an \(R\times C\) table, representing the number of sampled units falling into each category of the row and column factors.
The null hypothesis is that the two factors are independent. In other words, the occurrence of one factor does not affect the occurrence of the other. \(\mathcal{H}_0:\pi_{ij}=\pi_{i\centerdot}\cdot\pi_{\centerdot j}\)
The alternative hypothesis is that the two factors are not independent. This means there is some association between the two factors.
TSBF10b <- function(R, C, chisq) {
#' Input -
#' R: number of levels in the row factor
#' C: number of levels in the column factor
#' chisq: chi-squared test statistic
#' Output - test-based Bayes factor in favor of the alternative
ifelse(chisq > (R-1)*(C-1),
((R-1)*(C-1) / chisq)^((R-1)*(C-1)/2) * exp((chisq-(R-1)*(C-1))/2),
1) # TSBF₁₀ (Johnson, 2005, p. 694)
}
curve(TSBF10b(R=3, C=3, x), xlab=expression(italic(χ)^2), ylab=expression(italic(TSBF)[10]),
main="Test-Based Bayes Factors for Independence", from=0, to=10, lwd=2,
sub=bquote(italic(R)==3~~"&"~~italic(C)==3))
We can fit a generalized linear model (GLM) on contingency table data using the Poisson family (and the log link). This approach treats the counts in the contingency table as the response variable and models them as Poisson-distributed. A Poisson regression model is sometimes referred to as a log-linear model, particularly when used to model contingency tables.
We test for independence by comparing full (saturated) and null (independence) models.
\[\begin{align} O_{ij}&\overset{\text{ind.}}{\sim}\operatorname{Pois}(\mu_{ij}) \\ \mathcal{M}_1:\ \ln{\mu_{ij}}&=\mu+\alpha_i+\beta_j+(\alpha\beta)_{ij} \\ \text{versus}\quad\mathcal{M}_0:\ \ln{\mu_{ij}}&=\mu+\alpha_i+\beta_j\qquad\text{for}\ i=1,\dotsb,R\ \text{and}\ j=1,\dotsb,C \end{align}\]
belt_long <- data.frame("fasten"=rep(c("yesbelt", "nahbelt"), 4),
"injure"=rep(c("none", "minimal", "minor", "major"), each=2),
# "count"=c(12813, 65963, 647, 4000, 359, 2642, 42, 303), # (Jobson, 1982, p. 18)
"count"=c(100, 100, 100, 100, 100, 100, 100, 100),
stringsAsFactors=T) # long format
# ↓ row ↓ column
fit1 <- glm(count ~ fasten * injure, belt_long, family=poisson(link="log")) # full model fit
fit0 <- glm(count ~ fasten + injure, belt_long, family=poisson(link="log")) # null model fit
anova(fit0, fit1, test="Chisq") # likelihood ratio test
## Analysis of Deviance Table
##
## Model 1: count ~ fasten + injure
## Model 2: count ~ fasten * injure
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 3 4.2633e-14
## 2 0 2.1316e-14 3 2.1316e-14 1
pchisq(fit0$deviance, fit0$df.residual, lower.tail=F) # examining the deviance of the null model
## [1] 1
summary(fit1)
##
## Call:
## glm(formula = count ~ fasten * injure, family = poisson(link = "log"),
## data = belt_long)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.605e+00 1.000e-01 46.05 <2e-16 ***
## fastenyesbelt -1.848e-31 1.414e-01 0.00 1
## injureminimal -4.936e-31 1.414e-01 0.00 1
## injureminor -1.627e-31 1.414e-01 0.00 1
## injurenone -8.248e-31 1.414e-01 0.00 1
## fastenyesbelt:injureminimal 5.719e-32 2.000e-01 0.00 1
## fastenyesbelt:injureminor 1.381e-31 2.000e-01 0.00 1
## fastenyesbelt:injurenone -3.001e-15 2.000e-01 0.00 1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 0.0000e+00 on 7 degrees of freedom
## Residual deviance: 2.1316e-14 on 0 degrees of freedom
## AIC: 67.558
##
## Number of Fisher Scoring iterations: 2
summary(fit0)
##
## Call:
## glm(formula = count ~ fasten + injure, family = poisson(link = "log"),
## data = belt_long)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.605e+00 7.906e-02 58.25 <2e-16 ***
## fastenyesbelt -8.629e-17 7.071e-02 0.00 1
## injureminimal 1.487e-31 1.000e-01 0.00 1
## injureminor 4.840e-32 1.000e-01 0.00 1
## injurenone -9.207e-16 1.000e-01 0.00 1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 0.0000e+00 on 7 degrees of freedom
## Residual deviance: 4.2633e-14 on 3 degrees of freedom
## AIC: 61.558
##
## Number of Fisher Scoring iterations: 2
Both approaches assume that the Poisson model is appropriate for the data (equidispersion, i.e., variance equals the mean).
If the data exhibit overdispersion (variance greater than the mean), consider using a quasi-Poisson or negative binomial model.
If the data exhibit underdispersion (variance less than the mean), a binomial model may be more suitable.
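For reference, a rough dispersion check on the null fit, together with the alternative families mentioned above (a sketch; the quasi-Poisson and negative binomial lines are illustrative and not run elsewhere in this document):
sum(residuals(fit0, type="pearson")^2) / fit0$df.residual   # Pearson dispersion statistic; well above 1 suggests over-, well below 1 underdispersion
# glm(count ~ fasten + injure, belt_long, family=quasipoisson(link="log"))   # overdispersion
# MASS::glm.nb(count ~ fasten + injure, data=belt_long)                      # negative binomial alternative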
Both the row and column factors are categorical, so the regression model uses dummy variables. In this example belt_long, the row factor \(A\) has two levels, while the column factor \(B\) has four levels.
\[\begin{align} \ln{\mu_{ij}}=b_0& \\ +b_1&\cdot D_{A_2} \\ +b_2&\cdot D_{B_2}+b_3\cdot D_{B_3}+b_4\cdot D_{B_4} \\ +b_5&\cdot D_{A_2B_2}+b_6\cdot D_{A_2B_3}+b_7\cdot D_{A_2B_4} \end{align}\]
\(b_0\) is the intercept (when both factors are at their reference levels \(A_1\) and \(B_1\)).
\(b_1\) is the effect of being at level \(A_2\) (relative to \(A_1\)), adjusting for the column factor.
\(b_2\) is the effect of being at level \(B_2\) (relative to \(B_1\)), adjusting for the row factor.
…
\(b_5\) is the effect of the interaction between \(A_2\) and \(B_2\).
…
Thus, the test-relevant parameters are \(\boldsymbol{\theta}=(b_5,\ b_6,\ b_7)^\top\).
The table below enumerates \(\ln{\mu_{ij}}\) for each factor-level combination.
| | \(B_1\) | \(B_2\) | \(B_3\) | \(B_4\) |
|---|---|---|---|---|
| \(A_1\) | \(\mu+\alpha_1+\beta_1+(\alpha\beta)_{11}=b_0\) | \(\mu+\alpha_1+\beta_2+(\alpha\beta)_{12}=b_0+b_2\) | \(\mu+\alpha_1+\beta_3+(\alpha\beta)_{13}=b_0+b_3\) | \(\mu+\alpha_1+\beta_4+(\alpha\beta)_{14}=b_0+b_4\) |
| \(A_2\) | \(\mu+\alpha_2+\beta_1+(\alpha\beta)_{21}=b_0+b_1\) | \(\mu+\alpha_2+\beta_2+(\alpha\beta)_{22}=b_0+b_1+b_2+b_5\) | \(\mu+\alpha_2+\beta_3+(\alpha\beta)_{23}=b_0+b_1+b_3+b_6\) | \(\mu+\alpha_2+\beta_4+(\alpha\beta)_{24}=b_0+b_1+b_4+b_7\) |
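The same dummy coding can be inspected directly from the long-format data (a quick check; R's default treatment contrasts pick reference levels alphabetically, so the column labels may differ from the \(A_1\)/\(B_1\) convention above).
model.matrix(~ fasten * injure, belt_long)   # dummy-coded design matrix for the saturated log-linear model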
R <- 2; C <- 4
mat <- matrix(0, nrow=R*C, ncol=R*C); i <- 1; j <- 0
for (c in 1:C) {
for (r in 1:R) {
if(r > 1) mat[i, r] <- 1
if(c > 1) mat[i, R+c-1] <- 1
if(r > 1 & c > 1) {
mat[i, R+C+j] <- 1
j <- j + 1
}
i <- i + 1
}
}
mat[, 1] <- 1 # design matrix
\[ (\ln{\mu_{11}},\ \ln{\mu_{21}},\ \ln{\mu_{12}},\ \ln{\mu_{22}},\ \ln{\mu_{13}},\ \ln{\mu_{23}},\ \ln{\mu_{14}},\ \ln{\mu_{24}})^\top= \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{pmatrix} \cdot(b_0,\ b_1,\ b_2,\ b_3,\ b_4,\ b_5,\ b_6,\ b_7)^\top \]
Note: In Poisson regression, the sample size corresponds to the number of cells \(K=R\times C\).
stancode6M <- "
data {
int<lower=4> K; // number of cells, R * C
int<lower=0> Y[K]; // observed counts by column
int<lower=1> S; // number of parameters (size of coefficient vector)
matrix[K,S] X; // design matrix for the log-linear model
vector[S] mu; // mean vector for the multivariate normal prior
matrix[S,S] Sigma; // covariance matrix for the multivariate normal prior
}
parameters {
vector[S] b; // coefficient vector for the log-linear model
}
model {
// Prior on coefficients
target += multi_normal_lpdf(b | mu, Sigma);
// Likelihood (Poisson model with log-linear mean)
target += poisson_lpmf(Y | exp(X * b));
}"
stanmodel6M <- stan_model(model_code=stancode6M)
datalist1 <- list(K=R*C, Y=belt_long$count, S=R*C, X=mat,
mu=coef(fit1), # MLE for the coefficients
Sigma=vcov(fit1) * (R*C)^(1/((R-1)*(C-1))))
# inverse of the observed Fisher information × number of cells
stanfitM1 <- sampling(stanmodel6M, data=datalist1,
iter=15000, warmup=5000, chains=3, seed=277, refresh=0)
rstan::summary(stanfitM1)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
| | mean | se_mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | n_eff | Rhat |
|---|---|---|---|---|---|---|---|---|---|---|
| b[1] | 4.6045 | 0.0009 | 0.0815 | 4.4424 | 4.5498 | 4.6050 | 4.6595 | 4.7643 | 8481.021 | 1.0001 |
| b[2] | -0.0016 | 0.0012 | 0.1154 | -0.2273 | -0.0789 | -0.0020 | 0.0757 | 0.2270 | 8645.935 | 1.0000 |
| b[3] | -0.0022 | 0.0011 | 0.1156 | -0.2295 | -0.0799 | -0.0014 | 0.0761 | 0.2247 | 10881.324 | 1.0001 |
| b[4] | -0.0018 | 0.0011 | 0.1152 | -0.2265 | -0.0805 | -0.0008 | 0.0765 | 0.2244 | 10147.156 | 1.0000 |
| b[5] | -0.0017 | 0.0011 | 0.1161 | -0.2314 | -0.0791 | -0.0009 | 0.0765 | 0.2266 | 10342.893 | 1.0000 |
| b[6] | 0.0023 | 0.0016 | 0.1635 | -0.3173 | -0.1083 | 0.0031 | 0.1126 | 0.3203 | 10733.894 | 1.0000 |
| b[7] | 0.0015 | 0.0016 | 0.1625 | -0.3159 | -0.1083 | 0.0028 | 0.1112 | 0.3162 | 10316.407 | 1.0000 |
| b[8] | 0.0018 | 0.0016 | 0.1638 | -0.3223 | -0.1081 | 0.0019 | 0.1118 | 0.3236 | 10494.821 | 1.0001 |
| lp__ | -21.5045 | 0.0188 | 2.0185 | -26.2650 | -22.6331 | -21.1723 | -20.0312 | -18.5715 | 11511.420 | 1.0002 |
datalist0 <- list(K=R*C, Y=belt_long$count, S=R+C-1, X=mat[, 1:(R+C-1)],
mu=coef(fit0),
Sigma=vcov(fit0) * (R*C)^(1/((R-1)*(C-1))))
stanfitM0 <- sampling(stanmodel6M, data=datalist0,
iter=15000, warmup=5000, chains=3, seed=277, refresh=0)
rstan::summary(stanfitM0)[[1]] %>% kable(digits=4) %>% kable_classic(full_width=F)
| | mean | se_mean | sd | 2.5% | 25% | 50% | 75% | 97.5% | n_eff | Rhat |
|---|---|---|---|---|---|---|---|---|---|---|
| b[1] | 4.6039 | 0.0006 | 0.0645 | 4.4759 | 4.5609 | 4.6040 | 4.6478 | 4.7296 | 12297.48 | 1.0002 |
| b[2] | 0.0000 | 0.0004 | 0.0580 | -0.1139 | -0.0394 | 0.0002 | 0.0393 | 0.1132 | 17924.18 | 1.0002 |
| b[3] | -0.0002 | 0.0007 | 0.0817 | -0.1611 | -0.0552 | 0.0000 | 0.0543 | 0.1596 | 14973.87 | 1.0001 |
| b[4] | -0.0002 | 0.0007 | 0.0812 | -0.1584 | -0.0559 | -0.0001 | 0.0542 | 0.1585 | 14906.61 | 1.0001 |
| b[5] | -0.0004 | 0.0007 | 0.0813 | -0.1602 | -0.0547 | -0.0003 | 0.0540 | 0.1590 | 14682.10 | 1.0000 |
| lp__ | -21.3608 | 0.0140 | 1.5743 | -25.2353 | -22.1841 | -21.0439 | -20.1993 | -19.2832 | 12720.81 | 1.0000 |
Note that to compute the (log) marginal likelihood for a Stan model, we need to specify the model in a certain way. Instead of using "~" signs for specifying distributions, we need to use the (log) density functions directly. The reason is that when the "~" sign is used, constant terms are dropped because they are not needed for sampling from the posterior. For computing the marginal likelihood, however, these constants need to be retained. For instance, instead of writing y ~ normal(mu, sigma); we would need to write target += normal_lpdf(y | mu, sigma); (Gronau, 2021).
The Savage–Dickey density ratio is a special form of the Bayes factor for nested models: the posterior density of the test-relevant parameters under the alternative model, evaluated at the hypothesized value, divided by the prior density for the same model evaluated at the same point.
\[\begin{equation} \tag{11} \textit{BF}_{01}:=\frac{p(\boldsymbol{y}\mid \mathcal{M}_0)}{p(\boldsymbol{y}\mid \mathcal{M}_1)}=\frac{\color{#0055A4}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid\boldsymbol{y},\mathcal{M}_1)}}{\color{#EF4135}{p(\boldsymbol{\theta}=\boldsymbol{\theta}_0\mid \mathcal{M}_1)}} \end{equation}\]
thetaS <- extract(stanfitM1, pars="b")$b[,-c(1:(R+C-1))] # posterior samples
dPost <- predict(ks::kde(thetaS), x=rep(0, (R-1)*(C-1))) # estimated posterior density
dPrior <- mvtnorm::dmvnorm(rep(0, (R-1)*(C-1)),
datalist1$mu[-c(1:(R+C-1))],
datalist1$Sigma[-c(1:(R+C-1)),-c(1:(R+C-1))])
c("Numerator"=dPost, "Denominator"=dPrior, "BF₀₁"=dPost / dPrior) # Savage–Dickey density ratio
## Numerator Denominator BF₀₁
## 17.610813 3.968355 4.437812
Bridge Sampling
devtools::install_github("quentingronau/bridgesampling@master")
All terms are conditional on \(\mathcal{M}_i\).
\[\begin{align} \tag{12} \color{#EE1C25}{p(\boldsymbol{y})}=&\frac{\mathbb{E}_{g(\boldsymbol{\theta})}[\hspace{0.1em}p(\boldsymbol{y}\mid\boldsymbol{\theta})\cdot p(\boldsymbol{\theta})\cdot h(\boldsymbol{\theta})]}{\mathbb{E}_\text{post}[\hspace{0.1em}g(\boldsymbol{\theta})\cdot h(\boldsymbol{\theta})]} \\ \\ \approx&\frac{\frac{1}{N_2}\sum_{r=1}^{N_2}p(\boldsymbol{y}\mid\boldsymbol{\theta}_r^{\text{prop}})\cdot p(\boldsymbol{\theta}_r^{\text{prop}})\cdot h(\boldsymbol{\theta}_r^{\text{prop}})}{\frac{1}{N_1}\sum_{s=1}^{N_1}g(\boldsymbol{\theta}_s^\text{post})\cdot h(\boldsymbol{\theta}_s^\text{post})}, \\ \\ &\hspace{15em}\boldsymbol{\theta}_r^{\text{prop}}\sim g(\boldsymbol{\theta}), \\ &\hspace{15em}\boldsymbol{\theta}_s^\text{post}\sim p(\boldsymbol{\theta}\mid\boldsymbol{y}), \end{align}\]
where \(g(\boldsymbol{\theta})\) is the proposal distribution
and \(h(\boldsymbol{\theta})\) is the bridge function, \(h(\boldsymbol{\theta})=\frac{C}{s_1\cdot p(\boldsymbol{y}\mid\boldsymbol{\theta})\cdot p(\boldsymbol{\theta})+s_2\cdot \color{#EE1C25}{p(\boldsymbol{y})}\cdot g(\boldsymbol{\theta})}\) and \(s_j=\frac{N_j}{N_1+N_2}\) for \(j\in\{1,2\}\).
M1 <- bridge_sampler(stanfitM1, silent=T)
M0 <- bridge_sampler(stanfitM0, silent=T)
bf(M0, M1) # BF₀₁ via bridge sampling, exp(M0$logml - M1$logml)
## Estimated Bayes factor in favor of M0 over M1: 5.19461
summary(M1)
##
## Bridge sampling log marginal likelihood estimate
## (method = "normal", repetitions = 1):
##
## -30.17263
##
## Error Measures:
##
## Relative Mean-Squared Error: 1.461749e-07
## Coefficient of Variation: 0.0003823282
## Percentage Error: 0%
##
## Note:
## All error measures are approximate.
summary(M0)
##
## Bridge sampling log marginal likelihood estimate
## (method = "normal", repetitions = 1):
##
## -28.52501
##
## Error Measures:
##
## Relative Mean-Squared Error: 3.087154e-08
## Coefficient of Variation: 0.000175703
## Percentage Error: 0%
##
## Note:
## All error measures are approximate.
BFBS <- function(seed=277, diagnostics=F) {
stanfitM0 <- sampling(stanmodel6M, data=datalist0,
iter=15000, warmup=5000, chains=1, seed=seed, refresh=0)
stanfitM1 <- sampling(stanmodel6M, data=datalist1,
iter=15000, warmup=5000, chains=1, seed=seed, refresh=0)
thetaS <- extract(stanfitM1, pars="b")$b[,-c(1:(R+C-1))] # posterior samples
dPost <- predict(ks::kde(thetaS), x=rep(0, (R-1)*(C-1))) # estimated posterior density
dPrior <- mvtnorm::dmvnorm(rep(0, (R-1)*(C-1)),
datalist1$mu[-c(1:(R+C-1))],
datalist1$Sigma[-c(1:(R+C-1)),-c(1:(R+C-1))])
SDdr <- dPost / dPrior # Savage–Dickey density ratio (BF₀₁)
set.seed(seed)
M0 <- bridge_sampler(stanfitM0, silent=T)
set.seed(seed)
M1 <- bridge_sampler(stanfitM1, silent=T)
if (diagnostics) {
list("bf"=exp(M0$logml - M1$logml),
"pererr.M1"=error_measures(M1)$cv,
"pererr.M0"=error_measures(M0)$cv,
"stan.M1"=stanfitM1, "stan.M0"=stanfitM0,
"savage.dickey"=SD)
} else {
list("bf"=exp(M0$logml - M1$logml),
"pererr.M1"=error_measures(M1)$cv,
"pererr.M0"=error_measures(M0)$cv,
"savage.dickey"=SDdr)
}
}
num <- 100; BF <- PerErr.M1 <- PerErr.M0 <- SDdr <- numeric(num)
system.time(
for (i in 1:num) { # roughly 411 seconds of run time
result <- BFBS(seed=i)
BF[i] <- result$bf
PerErr.M1[i] <- result$pererr.M1
PerErr.M0[i] <- result$pererr.M0
SDdr[i] <- result$savage.dickey
})
# range(PerErr.M1); range(PerErr.M0)
par(mfrow=c(1,2))
hist(BF, prob=T, col="white", yaxt="n", ylab="", main="rstan + BS",
sub="100 runs of one chain with 10,000 iterations",
xlab="Bayes Factor Estimates Supporting the Null") # BF₀₁: rstan + BS
lines(density(BF), col="blue", lwd=2)
hist(SDdr, prob=T, col="white", yaxt="n", ylab="", main="rstan + SD",
sub="100 runs of one chain with 10,000 iterations",
xlab="Savage–Dickey Density Ratio Supporting the Null") # BF₀₁: rstan + SD
lines(density(SDdr), col="red", lwd=2)
The distribution of the chi-squared statistic under the alternative hypothesis follows a non-central chi-squared distribution. The non-centrality parameter can be consistently estimated using the chi-squared statistic itself.
cat("chisq =", qchisq(1-0.05, df=4), " < chisq_ncp =", qchisq(1-0.05, df=4, ncp=qchisq(1-0.05, df=4)))
## chisq = 9.487729 < chisq_ncp = 26.03659
In the expression for eJAB, what would happen if we replaced the chi-squared quantile function with the quantile function of the non-central chi-squared distribution, using the non-centrality parameter estimated by the chi-squared statistic? Would this provide a more accurate eJAB for chi-squared tests?
Experimental Design
When simulating a contingency table, understanding the context of the experimental design is crucial because it determines how the data are generated and interpreted. Consider two factors: exposure (e.g., smoking or not) and outcome (e.g., lung cancer or not).
Fix the grand total and simulate all cell counts jointly with rmultinom(n, size, prob); check the sampleType="jointMulti" argument in ?BayesFactor::contingencyTableBF.
Prospective: start with a cohort of subjects classified by their exposure status, then follow them over time to see who develops the outcome.
Experimental: assign subjects to different interventions (rows) and then measure the outcome, as in a clinical trial.
In these designs, fix the row totals (exposure groups) and simulate the outcomes (columns); check the sampleType="indepMulti" and fixedMargin="rows" arguments in ?BayesFactor::contingencyTableBF (see the sketch below).
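For instance, a minimal sketch of the two sampling plans with assumed cell probabilities p (hypothetical values, for illustration only):
p <- matrix(c(.20, .10, .10, .05,
              .30, .10, .10, .05), nrow=2, byrow=T)                       # assumed 2 x 4 cell probabilities
matrix(rmultinom(1, size=1000, prob=p), nrow=2)                           # jointMulti: grand total fixed
t(apply(p / rowSums(p), 1, function(w) rmultinom(1, size=500, prob=w)))   # indepMulti: each row total fixed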
Further reading: The equivalence between Poisson and multinomial sampling models for contingency tables.
Effect Size
One measure of effect size for \(\chi^2\)-tests is Cohen’s omega, \(\omega=\sqrt{\sum_{i=1}^R\sum_{j=1}^C\frac{(\pi_{1ij}-\pi_{0ij})^2}{\pi_{0ij}}}\), where \(\pi_{0ij}\) and \(\pi_{1ij}\) are the proportions of the \((i,j)\) cell in an \(R\times C\) contingency table under \(\mathcal{H}_0\) and \(\mathcal{H}_1\), respectively.
We consider the uniform proportion \(\pi_{0ij}=\frac{1}{RC}\) under \(\mathcal{H}_0\).
Under \(\mathcal{H}_1\), we set the proportion \(\pi_{1ij}=\frac{1}{RC}+(-1)^{i+j}\cdot\delta\cdot(1-\boldsymbol{1}_{i=R,\ R\mod2=1})\cdot(1-\boldsymbol{1}_{j=C,\ C\mod2=1})\) for \(i\leqslant R\) and \(j\leqslant C\).
Thus, Cohen’s \(\omega=2\delta\sqrt{RC\left\lfloor\frac{R}{2}\right\rfloor\left\lfloor\frac{C}{2}\right\rfloor}\).
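A quick numerical check of this closed form (assumed values \(R=3\), \(C=4\), \(\delta=0.02\); the helper objects dlt, pd, and p1 are for illustration only):
dlt <- 0.02
pd  <- outer(1:3, 1:4, function(i, j) (-1)^(i+j)); pd[3, ] <- 0     # zero out the unpaired last row (R is odd)
p1  <- 1/12 + dlt * pd                                              # cell proportions under H1
c("direct"=sqrt(sum((p1 - 1/12)^2 / (1/12))),
  "closed"=2 * dlt * sqrt(3*4*floor(3/2)*floor(4/2)))               # both equal 2 * delta * sqrt(24)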
Design (note that each row of probabilities may not be normalized if sampleType="indepMulti").
Cell probabilities under \(\mathcal{H}_1\) (row margins in the last column, column margins in the last row):

| \(\frac{1}{RC}+\delta\) | \(\frac{1}{RC}-\delta\) | \(\dotsb\) | \(\frac{1}{RC}\) | \(\frac{1}{R}\) |
|---|---|---|---|---|
| \(\frac{1}{RC}-\delta\) | \(\frac{1}{RC}+\delta\) | \(\dotsb\) | \(\frac{1}{RC}\) | \(\frac{1}{R}\) |
| \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | \(\vdots\) |
| \(\frac{1}{RC}\) | \(\frac{1}{RC}\) | \(\dotsb\) | \(\frac{1}{RC}\) | \(\frac{1}{R}\) |
| \(\frac{1}{C}\) | \(\frac{1}{C}\) | \(\dotsb\) | \(\frac{1}{C}\) | 1 |

Under \(\mathcal{H}_0\), every cell probability is \(\frac{1}{RC}\), with row margins \(\frac{1}{R}\) and column margins \(\frac{1}{C}\).
Data
simData6 <- function(size, R, C, omega=NULL, simType=c("jointMulti", "indepMulti")) {
#' Input -
#' size: total number of objects;
#' the number of objects per row is assumed to be balanced if `simType="indepMulti"`
#' R: number of rows in a contingency table
#' C: number of columns in a contingency table
#' omega: Cohen's effect size
#' simType: simulation plan;
#' "jointMulti" (joint multinomial), total number is fixed;
#' "indepMulti" (independent multinomial), each row margin is fixed
#'
#' Output - R × C contingency table matrix
K <- R * C # number of cells
delta <- .5 * omega / sqrt(K * floor(R/2) * floor(C/2)) # designated proportion difference
if (1/K + delta > 1 | 1/K - delta < 0) stop("Bad input!")
prob_delta <- outer(1:R, 1:C, function(i,j) (-1)^(i+j)) # effect size design matrix
if (R %% 2 == 1) prob_delta[R,] <- 0 # when the number of rows is not even
if (C %% 2 == 1) prob_delta[,C] <- 0 # when the number of columns is not even
prob_H1 <- matrix(1/K, nrow=R, ncol=C) + delta * prob_delta # probabilities for the classes under an effect
# no random seed
if (simType == "indepMulti") {
# normalization not necessary for generation
t(apply(prob_H1, 1, function(p) stats::rmultinom(1, size/R, p))) # can take in non-integer size
} else if (simType == "jointMulti") {
matrix(stats::rmultinom(1, size, prob_H1), nrow=R, ncol=C) # counts are returned in the column-major order of prob_H1
} else {stop("Bad input!")}
}
The Bayes Factors
Note: All the Bayes factors are \(\textit{BF}_{01}\).
computeBFs6 <- function(data, sampleType=c("jointMulti", "indepMulti"), ...) {
#' Input -
#' data: R × C contingency table matrix
#' sampleType: sampling plan;
#' "jointMulti" (joint multinomial), total number is fixed;
#' "indepMulti" (independent multinomial), each row (or column) margin is fixed
#' ... more esoteric options, such as `fixedMargin`, in BayesFactor::contingencyTableBF
#'
#' Dependency - BayesFactor (v0.9.12-4.7)
#' Output - a vector of the Bayes factors in favor of the null
#' 1. contingencyTableBF function
#' 2. BIC approximation to the Bayes factor in Eq. 8
#' 3. Jeffreys approximate Bayes factor in Eq. 9 using the normal unit-information prior
#' 4. Jeffreys approximate Bayes factor in Eq. 9* using the extended normal unit-information prior
#' 5. Test statistic Bayes factor based on the chi-squared statistic and df in Eq. 14b
#' 6. Jeffreys approximate Bayes factor in 4 using the non-central chi-squared statistic
R <- nrow(data) # number of levels in the row factor
C <- ncol(data) # number of levels in the column factor
# bf <- BayesFactor::contingencyTableBF(data, sampleType="jointMulti")
# bf <- BayesFactor::contingencyTableBF(data, sampleType="indepMulti", fixedMargin="rows")
bf <- BayesFactor::contingencyTableBF(data, sampleType=sampleType, posterior=F, ...)
BF01 <- 1 / BayesFactor::extractBF(bf)$bf
total <- sum(data) # grand total
row_mar <- rowSums(data) # row sums
col_mar <- colSums(data) # column sums
logML_H1 <- stats::dmultinom(c(data), total, c(data), log=T) # log ML under H₁
logML_H0 <- stats::dmultinom(c(data), total, c(outer(row_mar, col_mar)), log=T) # log ML under H₀
BIC_H1 <- -2 * logML_H1 + (R*C-1) * log(total)
BIC_H0 <- -2 * logML_H0 + (R+C-2) * log(total)
BICB01 <- exp((BIC_H1 - BIC_H0) / 2) # Eq. 8
chisq <- unname(stats::chisq.test(data)$statistic) # chi-squared statistic
df <- (R-1) * (C-1) # degrees of freedom
TSBF01 <- ifelse(chisq > df,
(chisq / (df))^(df/2) * exp((df-chisq)/2),
1) # the reciprocal of Eq. 14b
JAB01_UI <- total^(df/2) * exp(-0.5 * chisq * (total-1) / total) # Eq. 8' with the finite sample correction
JAB01_EXT <- sqrt(total) * exp(-0.5 * chisq * (total^(1/df)-1) / (total^(1/df))) # Eq. 9*
pVal <- stats::chisq.test(data)$p.value # p-value
chisq_ncp <- stats::qchisq(1-pVal, df, ncp=stats::qchisq(1-pVal, df)) # non-central chi-squared statistic
JAB01_NCP <- sqrt(total) * exp(-0.5 * chisq_ncp * (total^(1/df)-1) / (total^(1/df)))
c("ctBF"=BF01, "BIC_approx"=BICB01, "JAB"=JAB01_UI, "eJAB"=JAB01_EXT, "TSBF"=TSBF01, "ncJAB"=JAB01_NCP)
}
# set.seed(277)
# (test6 <- simData6(240, R=2, C=4, omega=0.2, simType="indepMulti"))
# computeBFs6(test6, sampleType="indepMulti", fixedMargin="rows")
Simulation Study
We aim to conduct several simulation studies to evaluate the accuracy of approximate objective Bayes factors in their ability to convert \(p\)-values and \(N\) reported in scientific articles to Bayesian measures of evidence.
design <- "jointMulti"
# design <- "indepMulti"
nSim <- 1000 # number of simulation runs for each setting
nObs <- c(45, 90, 300, 750, 1500) # total number of objects in a contingency table
OMEGA <- c("Null"=0, "Small"=0.1, "Medium"=0.3, "Large"=0.5) # Cohen's effect size
nR <- 3; nC <- 3 # numbers of rows and columns in a contingency table
reportBF <- reportBF_avg <- reportBF_sd <- reportERR <- reportERR_avg <- reportERR_sd <-
reportRULE <- reportRULE_H1 <- reportRULE_H0 <- reportRULE_inc <-
setNames(vector(mode="list", length=length(nObs) * length(OMEGA)), # pre-allocation
apply(expand.grid(names(OMEGA), nObs), 1, function(r) paste0("n = ", r[2], ", omega = ", r[1])))
index <- 1 # initial list index
for (n in nObs) {
for (omega in OMEGA) {
set.seed(n+omega+277)
temp <- t(replicate(nSim, computeBFs6(simData6(n, R=nR, C=nC, omega=omega, simType=design),
sampleType=design,
fixedMargin=if (design=="indepMulti") "rows" else NULL)))
reportBF[[index]] <- temp
reportBF_avg[[index]] <- colMeans(temp, na.rm=T) # mean Bayes factors in favor of the null
reportBF_sd[[index]] <- apply(na.omit(temp), 2, sd) # sample standard deviations of them
tempERR <- 100* (temp - temp[,1]) / temp[,1] # percent errors; the first column should be all 0
reportERR[[index]] <- tempERR # (can take in NA)
reportERR_avg[[index]] <- colMeans(tempERR, na.rm=T) # mean percent errors
reportERR_sd[[index]] <- apply(na.omit(tempERR), 2, sd) # sample standard deviations of the percent errors
tempRULE_H1 <- temp < 1/3 # support H1
tempRULE_H0 <- temp > 3 # support H0
tempRULE_inc <- temp >= 1/3 & temp <= 3 # inconclusive
# counts of matching decisions for H1
reportRULE_H1[[index]] <- colSums(tempRULE_H1 * tempRULE_H1[,1], na.rm=T)
# counts of matching decisions for H0
reportRULE_H0[[index]] <- colSums(tempRULE_H0 * tempRULE_H0[,1], na.rm=T)
# counts of matching inconclusiveness
reportRULE_inc[[index]] <- colSums(tempRULE_inc * tempRULE_inc[,1], na.rm=T)
tempRULE <- 1*(temp < 1/3) - 1*(temp > 3) # 1: support H1; 0: inconclusive; -1: support H0
reportRULE[[index]] <- colSums(tempRULE[,1] == tempRULE, na.rm=T) # counts of matching decisions
# colMeans(); a value near 1 suggests matching results; the first column should be all 1
index <- index + 1
}
} # roughly 28 seconds of run time
# Warning messages:
# In chisq.test(data) : Chi-squared approximation may be incorrect
reshapeW2L4 <- function(list, prob=T, lab=c("ctBF", "BIC", "JAB", "eJAB", "TSBF", "ncJAB")) {
#' Input -
#' list: a list of size length(nObs) * length(OMEGA);
#' reportERR (unaggregated),
#' reportRULE, reportRULE_H1, reportRULE_H0, or reportRULE_inc (aggregated)
#' prob: logical; if TRUE (default), return the proportions;
#' otherwise, return the original values
#' lab: a vector of characters representing the Bayes factor methods
#'
#' Global -
#' nSim: number of simulation runs for each setting
#' nObs: total numbers of objects in a contingency table
#' OMEGA: Cohen's effect sizes
#'
#' Output - unlist the report and reshape the wide into the long
len <- length(lab) # six methods
nRep <- ifelse(!prob, nSim, 1) # nSim if unaggregated; 1 if aggregated
long <- data.frame("value"=unname(unlist(list)),
"method"=factor(rep(rep(1:len, each=nRep), length(list)), labels=lab),
"n"=rep(nObs, each=nRep*len*length(OMEGA)),
"omega"=rep(rep(unname(OMEGA), each=nRep*len), length(nObs)))
if (prob) {
list2 <- lapply(list, function(vec) {
if (vec[1] == 0) {
vec * NA
} else {
vec / vec[1] # convert counts to proportions
}
})
list3 <- lapply(list, function(vec) rep(vec[1], length(vec))) # denominator
long$total <- unname(unlist(list3))
long$prop <- unname(unlist(list2))
}
long
}
formatting <- function(val) {
#' format the value using scientific notation
if (abs(val) >= 10) {
paste("~italic(BF)[\"01\"] %~~%",
gsub("e\\+0*(\\d+)", "%*%10^\\1", sprintf("%.2e", val)))
} else if (abs(val) < 1) {
paste("~italic(BF)[\"01\"] %~~%",
gsub("e-0*(\\d+)", "%*%10^{-\\1}", sprintf("%.2e", val)))
} else {
paste("~italic(BF)[\"01\"] %~~%", sprintf("%.2f", val))
}
}
Visualization (Fixed Total; jointMulti)
df6_BF <- reshapeW2L4(reportBF, prob=F)
# boxplot of the BF₀₁
for (es in OMEGA) {
sub <- subset(df6_BF, omega == es) # subset the long
print(ggplot(sub, aes(x=method, y=log(value))) +
geom_boxplot(alpha=0.3, na.rm=T) +
geom_hline(yintercept=c(-log(3), 0, log(3)), linetype="dashed", color="red") +
facet_wrap(.~n, scales="free_y", nrow=1,
labeller=label_bquote(cols=italic(N)==.(n))) +
labs(x="\nBayes Factor Methods", y=expression("ln("~italic("BF")["01"]~")"),
caption=paste0("Data: ", nR ," × ", nC, " Contingency Table")) +
ggtitle(paste0(names(OMEGA[OMEGA==es]), " Proportion Difference")) +
theme_classic() +
theme(axis.text.x=element_text(angle=90, hjust=0.9)))
cat("<br><br><br><br><br>")
}
When simulated under the alternative, some methods produce extremely small values supporting the null, which are rounded down to zero. This results in their log values being reported as negative infinity, and consequently, no boxes are shown in the figure.
# percent errors
df6 <- reshapeW2L4(reportERR, prob=F)
index <- 0
# boxplot of the percent errors
for (es in OMEGA) {
baseBF01 <- sapply(reportBF_avg[seq(1, by=length(OMEGA), length.out=length(nObs)) + index],
function(x) x[1]) # baseline is contingencyTableBF
baseBF01_lab <- sapply(baseBF01, formatting) # scientific notation
index <- index + 1
sub <- subset(df6, method != "ctBF" & omega == es) # subset the long (the loop variable must not mask the omega column)
anno <- data.frame("label"=baseBF01_lab, "n"=unique(sub$n),
"x"=3, "y"=.4 * aggregate(value~n, sub, FUN=max)$value) # for annotation use
print(ggplot(sub, aes(x=method, y=value)) +
geom_boxplot(alpha=0.3, na.rm=T) +
geom_text(data=anno, aes(label=label, x=x, y=y), parse=T, col="red") +
facet_wrap(.~n, scales="free_y", nrow=1,
labeller=label_bquote(cols=italic(N)==.(n))) +
labs(x="\nBayes Factor Methods", y="Percent Error (%)\n",
caption=paste0("Data: ", nR ," × ", nC, " Contingency Table")) +
ggtitle(paste0(names(OMEGA[OMEGA==es]), " Proportion Difference")) +
geom_hline(yintercept=0, linetype="dashed", color="gray") + # reference line 0%
theme_classic() +
theme(axis.text.x=element_text(angle=90, hjust=0.9)))
cat("<br><br><br><br><br><br><br>")
}
# decisions
decis6 <- rbind(reshapeW2L4(reportRULE),
reshapeW2L4(reportRULE_H1),
reshapeW2L4(reportRULE_H0),
reshapeW2L4(reportRULE_inc))
decis6$result <- factor(rep(c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"),
each=length(levels(df6$method)) * length(nObs) * length(OMEGA)),
levels=c("Overall", "BF01 < 1/3", "BF01 > 3", "[1/3, 3]"))
decis6$omega <- paste0("ES = ", decis6$omega)
decis6 <- decis6[!is.na(decis6[,"prop"]),]
if (F) {
# compute the pointwise confidence intervals using the normal approximation
decis6$half.len <- qnorm(1-.05/2) *
sqrt(decis6$value * (decis6$total - decis6$value) / decis6$total^3) # 95% CI width
decis6$lwr <- max(0, decis6$value / decis6$total - decis6$half.len) # 95% CI lower bound
decis6$upr <- min(1, decis6$value / decis6$total + decis6$half.len) # 95% CI upper bound
} else { # Clopper–Pearson (exact) intervals
decis6$lwr <- qbeta(.05/2, decis6$value, decis6$total - decis6$value + 1) # 95% CI lower bound
decis6$upr <- qbeta(1-.05/2, decis6$value + 1, decis6$total - decis6$value) # 95% CI upper bound
decis6$half.len <- (decis6$upr - decis6$lwr) / 2 # 95% CI width
}
# line chart of the proportions of agreement
ggplotly(ggplot(decis6, aes(n, prop, color=method, label=value)) +
geom_line(linewidth=1.2, na.rm=T) + geom_point(size=2, na.rm=T) +
geom_ribbon(aes(ymin=lwr, ymax=upr), alpha=.2) +
scale_color_manual(values=c("black", "#E69F00", "#D55E00", "#CC79A7", "#EF4135", "#56B4E9"),
labels=c("ctBF", "BIC", "JAB", "eJAB", "TSBF", "ncJAB")) +
facet_grid(result~omega) +
labs(x="Total Number of Observations", y="Proportion of Agreement", color="Method") +
scale_x_continuous(breaks=c(45, 300, 1500)) +
scale_y_continuous(breaks=c(0, 0.5, 1)) + # min(decis6$prop, na.rm=T)
theme_minimal()) # interactive plots
Visualization (Fixed Row Sums; indepMulti)
When simulated under the alternative, some methods produce extremely small values supporting the null, which are rounded down to zero. This results in their log values being reported as negative infinity, and consequently, no boxes are shown in the figure.
Visualization (Fixed Total; jointMulti)
When simulated under the alternative, some methods produce extremely small values supporting the null, which are rounded down to zero. This results in their log values being reported as negative infinity, and consequently, no boxes are shown in the figure.
Visualization (Fixed Row Sums; indepMulti)
When simulated under the alternative, some methods produce extremely small values supporting the null, which are rounded down to zero. This results in their log values being reported as negative infinity, and consequently, no boxes are shown in the figure.
Visualization (Fixed Total; jointMulti)
We also conducted a \(10\times10\) contingency table analysis.
The extended JAB method may introduce a new issue: its values tend to fall within the inconclusive range when simulated under the null with large dimensions.
However, it effectively resolves the original JAB’s extreme bias towards the null when simulated under the alternative.
When simulated under the alternative, some methods produce extremely small values supporting the null, which are rounded down to zero. This results in their log values being reported as negative infinity, and consequently, no boxes are shown in the figure.
Visualization (Fixed Total; jointMulti)
If the table is \(2\times2\) (which corresponds to 1 degree of freedom), there is a point null hypothesis for testing the odds ratio. This scenario is addressed in the original JAB and WAB methods (Wagenmakers, 2022, pp. 28–30; but the definition of effective sample size for logistic regression or contingency tables is more complicated than simply the total number of cases or counts).
In this case, we can also implement the Markov chain Monte Carlo method and the Savage–Dickey density ratio method.
body(computeBFs6) <- bquote({ # re-assemble a new body
.(body(computeBFs6))
WAB01 <- ifelse(pVal <= .5,
ifelse(pVal > .1,
ifelse(F,
4 * pVal^(2/3) * sqrt(total) / 3, # 0.1 < p <= 0.5 (more precise)
sqrt(pVal * total)), # 0.1 < p <= 0.5 (simpler)
3 * pVal * sqrt(total)), # p <= 0.1
pVal^(1/4) * sqrt(total)) # p > 0.5, Eq. 10
c("ctBF"=BF01, "BIC_approx"=BICB01, "JAB"=JAB01_UI, "eJAB"=JAB01_EXT, "TSBF"=TSBF01, "ncJAB"=JAB01_NCP,
"WAB"=WAB01) # use the total number of objects
})
When simulated under the alternative, some methods produce extremely small values supporting the null, which are rounded down to zero. This results in their log values being reported as negative infinity, and consequently, no boxes are shown in the figure.
To test the hypotheses \(\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0\) versus \(\mathcal{H}_1:\boldsymbol{\theta}\neq\boldsymbol{\theta}_0\), we define the extended Jeffreys approximate objective Bayes factor (eJAB) as \[\begin{equation} \mathit{eJAB}_{01}=\sqrt{N}\exp\left\{-\frac{1}{2}\frac{N^{1/q}-1}{N^{1/q}}\cdot Q_{\chi^{2}_{q}}(1-p)\right\}, \end{equation}\]
where \(q\) is the size of the parameter vector \(\boldsymbol{\theta}\), \(N\) is the sample size, \(Q_{\chi^{2}_{q}}(\cdot)\) is the quantile function of the chi-squared distribution with \(q\) degrees of freedom, and \(p\) is the \(p\)-value from a null-hypothesis significance test. We further define \(\mathit{eJAB}_{10}=1\ \!/\ \!\mathit{eJAB}_{01}\).
The sample size \(N\) is
the total number of observations for \(t\)-tests, linear regression, logistic regression, one-way ANOVA, and chi-squared tests.
the number of events for Cox models.
the number of independent observations for one-way repeated-measures ANOVA (Nathoo & Masson, 2016).
The degrees of freedom are
\(q=1\) for \(t\)-tests, linear regression, logistic regression, and Cox models, where \(\mathcal{H}_0\) specifies a point-null hypothesis (Wagenmakers, 2022).
\(q=I-1\) for one-way ANOVA and one-way repeated-measures ANOVA, where \(I\) is the number of conditions.
\(q=(R-1)(C-1)\) for chi-squared tests, where \(R\) and \(C\) are the numbers of rows and columns, respectively.
Since the Bayes factor is the ratio of the marginal likelihoods of two competing models, \(q=p_1-p_0\) also represents the difference in the number of free parameters between the alternative and null models.
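Putting the definition into code (a minimal sketch; the call reproduces the extended JAB value from the \(3\times4\) example earlier):
eJAB01 <- function(p, N, q) sqrt(N) * exp(-0.5 * (N^(1/q) - 1) / N^(1/q) * qchisq(1-p, df=q))
eJAB01(p=0.0001, N=200, q=6)   # about 0.004, matching the extended JAB computed above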
For further reading, please visit https://rpubs.com/sherloconan/1361280.
Debugging: There was a bug in plotting the boxplots of the Bayes factors in versions of this R Markdown prior to 0.9.9000.
DD <- c("Null"=0, "Small"=1, "Medium"=2, "Large"=3)
myData <- data.frame("Index"=1:100,
"dd"=rep(0:3, each=25)) # try also data.table::data.table()
for (dd in DD) {
subData <- subset(myData, dd==dd)
print(range(subData$Index))
} # WRONG!
## [1] 1 100
## [1] 1 100
## [1] 1 100
## [1] 1 100
for (i in DD) {
subData <- subset(myData, dd==i)
print(range(subData$Index))
} # correct
## [1] 1 25
## [1] 26 50
## [1] 51 75
## [1] 76 100
for (dd in DD) {
subData <- subset(myData, dd==get("dd", 1L))
print(range(subData$Index))
} # also works
## [1] 1 25
## [1] 26 50
## [1] 51 75
## [1] 76 100
Discussion: https://stackoverflow.com/q/21658893