Bayes Factor: The Three Cases

Introduction

The Bayes factor is a fundamental tool in Bayesian hypothesis testing. It quantifies the evidence in the data \(y\) in favor of one model (or hypothesis) against another. The general definition is:

\[ B_{01}(y) = \frac{f(y \mid M_0)}{f(y \mid M_1)} \]

where \(f(y \mid M_j) = \int_{\Theta_j} f(y \mid \theta_j) \, \pi_j(\theta_j) \, d\theta_j\) is the marginal likelihood under model \(M_j\).

The specific form of the Bayes factor depends on whether each hypothesis is simple (a single point) or composite (a range of values). The three cases are summarized below.
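The marginal likelihood integral defined above can be approximated numerically when no closed form is at hand. The following is a minimal sketch, not a definitive implementation: the helper name `marginal_likelihood`, the midpoint rule, and the binomial model with a uniform prior are all assumptions chosen for illustration, because that model has a known exact answer to compare against.

```python
import math

def marginal_likelihood(likelihood, prior_pdf, lo, hi, n=20_000):
    """Approximate the integral of likelihood(theta) * prior_pdf(theta)
    over [lo, hi] with the midpoint rule on n equal-width cells."""
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * width
        total += likelihood(theta) * prior_pdf(theta)
    return total * width

# Illustrative model (an assumption, not from the text): k successes in
# n_trials Bernoulli trials, with theta the success probability.
def binomial_likelihood(k, n_trials):
    coeff = math.comb(n_trials, k)
    return lambda theta: coeff * theta**k * (1.0 - theta) ** (n_trials - k)

def uniform_prior(theta):
    return 1.0  # density of Uniform(0, 1) on its support

m = marginal_likelihood(binomial_likelihood(7, 10), uniform_prior, 0.0, 1.0)
print(m)  # close to the exact value 1 / (10 + 1)
```

For a uniform prior on the success probability, the binomial marginal likelihood is exactly \(1 / (n + 1)\) whatever the number of successes, which makes this a convenient sanity check for the quadrature.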

The Three Cases

Case	\(H_0\) (Null)	\(H_1\) (Alternative)	Bayes Factor \(B_{01}(y)\)
1	Simple: \(\theta = \theta_0\)	Simple: \(\theta = \theta_1\)	\(\frac{f(y \mid \theta_0)}{f(y \mid \theta_1)}\)
2	Simple: \(\theta = \theta_0\)	Composite: \(\theta \in \Theta_1\)	\(\frac{f(y \mid \theta_0)}{\int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta}\)
3	Composite: \(\theta \in \Theta_0\)	Composite: \(\theta \in \Theta_1\)	\(\frac{\int_{\Theta_0} f(y \mid \theta) \, \pi_0(\theta) \, d\theta}{\int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta}\)

Detailed Explanations

Case 1: Simple vs. Simple

Hypotheses:
- \(H_0: \theta = \theta_0\)
- \(H_1: \theta = \theta_1\)

Explanation: Both hypotheses specify a single value for the parameter. Under \(H_0\), the prior \(\pi_0(\theta)\) is a point mass at \(\theta_0\). Therefore, the marginal likelihood is simply the likelihood evaluated at that point:

\[f(y \mid M_0) = f(y \mid \theta_0)\]

Similarly, under \(H_1\):

\[f(y \mid M_1) = f(y \mid \theta_1)\]

Thus, the Bayes factor reduces to a simple likelihood ratio:

\[B_{01}(y) = \frac{f(y \mid \theta_0)}{f(y \mid \theta_1)}\]

Interpretation:
- If \(B_{01} > 1\), the data support \(H_0\) over \(H_1\).
- If \(B_{01} < 1\), the data support \(H_1\) over \(H_0\).

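Numerically, this is the same likelihood ratio that the Neyman-Pearson test of two simple hypotheses uses as its test statistic. The difference lies in interpretation: here the ratio is read directly as the relative evidence for \(H_0\), rather than compared with a critical value chosen to control an error rate.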
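A minimal sketch of the simple-vs-simple case, assuming independent observations \(y_i \sim N(\theta, 1)\). The function names and the extension to several observations are illustrative assumptions; the ratio is accumulated on the log scale to avoid underflow with larger samples.

```python
import math

def normal_logpdf(y, mean, sd=1.0):
    """Log density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def bayes_factor_simple(ys, theta0, theta1):
    """B_01 for H0: theta = theta0 vs. H1: theta = theta1,
    assuming independent y_i ~ N(theta, 1)."""
    log_b01 = sum(normal_logpdf(y, theta0) - normal_logpdf(y, theta1) for y in ys)
    return math.exp(log_b01)

b01 = bayes_factor_simple([2.5], theta0=0.0, theta1=1.0)
print(b01)  # below 1, so this single observation favors H1
```

No prior over \(\theta\) enters the computation, since each hypothesis fixes \(\theta\) completely.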

Case 2: Simple vs. Composite

Hypotheses:
- \(H_0: \theta = \theta_0\) (point null)
- \(H_1: \theta \in \Theta_1\) (composite alternative, e.g., \(\theta \neq \theta_0\) or \(\theta > \theta_0\))

Explanation:

Under \(H_0\), as before: \(f(y \mid M_0) = f(y \mid \theta_0)\).

Under \(H_1\), \(\theta\) is unknown but follows a prior distribution \(\pi_1(\theta)\) over the set \(\Theta_1\). To obtain the marginal likelihood, we average the likelihood over this prior:

\[f(y \mid M_1) = \int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta\]

Hence:

\[B_{01}(y) = \frac{f(y \mid \theta_0)}{\int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta}\]

Why integrate? Under a composite hypothesis, the parameter is unknown. The marginal likelihood represents the average predictive performance of \(H_1\) over all possible \(\theta\) values, weighted by how plausible each is a priori. This ensures a fair comparison with the simple null hypothesis.

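Important note: This form is the most common in practice (e.g., testing \(\theta = 0\) vs. \(\theta \neq 0\)). The value of \(B_{01}\) depends on the choice of \(\pi_1(\theta)\), and this dependence does not vanish in large samples. In particular, a very diffuse \(\pi_1\) puts most of its mass on values of \(\theta\) that predict the data poorly, which shrinks the denominator and pushes \(B_{01}\) toward \(H_0\). This is a feature, not a bug: \(H_1\) is judged by the predictions its prior actually makes, so \(\pi_1\) should be chosen to reflect genuine prior uncertainty about \(\theta\).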
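The dependence on \(\pi_1\) can be illustrated with a short sketch. It assumes, purely for illustration, that \(y \mid \theta \sim N(\theta, 1)\), that the point null is \(\theta = 0\), and that the alternative prior is \(N(0, \tau^2)\). In that conjugate setup the marginal distribution of \(y\) under \(H_1\) is \(N(0, 1 + \tau^2)\), so no numerical integration is required. The function names and the observation are hypothetical.

```python
import math

def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y (var is a variance, not a standard deviation)."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bayes_factor_point_null(y, prior_var):
    """B_01 for H0: theta = 0 vs. H1: theta ~ N(0, prior_var),
    with y | theta ~ N(theta, 1).

    Under H1 the marginal distribution of y is N(0, 1 + prior_var).
    """
    return normal_pdf(y, 0.0, 1.0) / normal_pdf(y, 0.0, 1.0 + prior_var)

y = 2.5  # hypothetical observation
for prior_var in (1.0, 10.0, 100.0, 1000.0, 10000.0):
    print(prior_var, bayes_factor_point_null(y, prior_var))
```

The data are held fixed throughout, yet the printed Bayes factor changes with the prior variance, and once the prior is diffuse enough it keeps growing in favor of \(H_0\).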

Case 3: Composite vs. Composite

Hypotheses:
- \(H_0: \theta \in \Theta_0\)
- \(H_1: \theta \in \Theta_1\)

Explanation: Both hypotheses are composite. Under each, we average the likelihood over the respective prior distribution:

\[f(y \mid M_0) = \int_{\Theta_0} f(y \mid \theta) \, \pi_0(\theta) \, d\theta\]

\[f(y \mid M_1) = \int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta\]

Thus:

\[B_{01}(y) = \frac{\int_{\Theta_0} f(y \mid \theta) \, \pi_0(\theta) \, d\theta}{\int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta}\]

This is the most general form. Cases 1 and 2 are special cases of it:
- If \(\pi_0(\theta)\) is a point mass at \(\theta_0\), the numerator integral collapses to \(f(y \mid \theta_0)\), recovering Case 2.
- If both \(\pi_0\) and \(\pi_1\) are point masses, we recover Case 1.

Interpretation: The Bayes factor compares the average likelihood under the null prior to the average likelihood under the alternative prior. It answers: “How much more probable is the observed data under the null model (averaged over its prior) than under the alternative model (averaged over its prior)?”
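The point-mass reduction noted above can be made concrete with a worked limit. As an illustration only (the normal model and the prior \(N(\theta_0, \varepsilon^2)\) are assumptions, not part of the general setup), let \(y \mid \theta \sim N(\theta, 1)\) and let the null prior concentrate around \(\theta_0\). The marginal distribution of \(y\) is then \(N(\theta_0, 1 + \varepsilon^2)\), so

\[\int f(y \mid \theta) \, \pi_0(\theta) \, d\theta = \frac{1}{\sqrt{2\pi(1 + \varepsilon^2)}} \exp\left(-\frac{(y - \theta_0)^2}{2(1 + \varepsilon^2)}\right) \longrightarrow \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(y - \theta_0)^2}{2}\right) = f(y \mid \theta_0) \quad \text{as } \varepsilon \to 0\]

In this example the composite numerator tends to the point evaluation \(f(y \mid \theta_0)\) as the prior narrows, which is the sense in which Case 2 is a limiting form of Case 3.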

Summary Table of Key Differences

Feature	Case 1	Case 2	Case 3
\(H_0\) type	Simple	Simple	Composite
\(H_1\) type	Simple	Composite	Composite
Numerator	\(f(y \mid \theta_0)\)	\(f(y \mid \theta_0)\)	\(\int_{\Theta_0} f(y \mid \theta) \pi_0(\theta) d\theta\)
Denominator	\(f(y \mid \theta_1)\)	\(\int_{\Theta_1} f(y \mid \theta) \pi_1(\theta) d\theta\)	\(\int_{\Theta_1} f(y \mid \theta) \pi_1(\theta) d\theta\)
Integrals needed?	No	One (under \(H_1\))	Two (under both)

Practical Example (Optional)

Suppose \(y \sim N(\theta, 1)\) and we observe \(y = 2.5\).

Case 1: \(H_0: \theta = 0\), \(H_1: \theta = 1\)

\[B_{01} = \frac{\phi(2.5)}{\phi(1.5)} \approx \frac{0.0175}{0.1295} \approx 0.135\]

Evidence against \(H_0\).

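Case 2: \(H_0: \theta = 0\), \(H_1: \theta \sim N(0, 10)\), where 10 is the prior variance

Under \(H_1\) the marginal distribution of \(y\) is \(N(0, 11)\), so the integral has a closed form:

\[B_{01} = \frac{\phi(2.5)}{\int \phi(2.5 - \theta) \, \frac{1}{\sqrt{10}} \phi(\theta / \sqrt{10}) \, d\theta} = \frac{\phi(2.5)}{\frac{1}{\sqrt{11}} \phi(2.5 / \sqrt{11})} \approx \frac{0.0175}{0.0905} \approx 0.19\]

Weaker evidence against \(H_0\) than in Case 1, because \(H_1\) averages the likelihood over many \(\theta\) values, most of which predict \(y = 2.5\) less well than \(\theta = 1\) does.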

Case 3: \(H_0: \theta \sim N(0, 1)\), \(H_1: \theta \sim N(0, 10)\)

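Both numerator and denominator are integrals. In general they are computed numerically, but in this normal model they have closed forms, because a \(N(0, v)\) prior on \(\theta\) gives the marginal \(y \sim N(0, 1 + v)\):

\[B_{01} = \frac{\frac{1}{\sqrt{2}} \phi(2.5 / \sqrt{2})}{\frac{1}{\sqrt{11}} \phi(2.5 / \sqrt{11})} \approx \frac{0.0591}{0.0905} \approx 0.65\]

The data mildly favor \(H_1\), whose wider prior gives more weight to values of \(\theta\) near 2.5.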
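The three example Bayes factors can be reproduced by direct numerical integration. The sketch below is one possible way to do it, not a definitive implementation: it reads \(N(0, 10)\) as mean 0 and variance 10 (an assumption about the notation), uses a composite Simpson rule, and truncates each integral at 12 prior standard deviations. The helper names `phi`, `simpson`, and `marginal` are hypothetical.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def marginal(y, prior_var):
    """Integral of phi(y - theta) * N(theta; 0, prior_var) over theta."""
    sd = math.sqrt(prior_var)
    integrand = lambda t: phi(y - t) * phi(t / sd) / sd
    # Truncate at +/- 12 prior standard deviations, where the integrand is negligible.
    return simpson(integrand, -12.0 * sd, 12.0 * sd)

y = 2.5
case1 = phi(y - 0.0) / phi(y - 1.0)           # simple vs. simple
case2 = phi(y - 0.0) / marginal(y, 10.0)      # simple vs. composite
case3 = marginal(y, 1.0) / marginal(y, 10.0)  # composite vs. composite
print(case1, case2, case3)
```

The same `marginal` routine would apply to non-normal priors, where no closed form is available, provided the integration limits are adapted to the prior's support.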

Conclusion

The Bayes factor adapts its mathematical form to the nature of the hypotheses being compared. The general principle is always the same: compare marginal likelihoods. Whether those marginal likelihoods simplify to point evaluations or remain as integrals depends entirely on whether the priors under each hypothesis are point masses or continuous distributions.

Understanding these three forms is essential for correctly implementing and interpreting Bayesian hypothesis tests.