The Bayes factor is a fundamental tool in Bayesian hypothesis testing. It quantifies the evidence in the data \(y\) in favor of one model (or hypothesis) against another. The general definition is:
\[ B_{01}(y) = \frac{f(y \mid M_0)}{f(y \mid M_1)} \]
where \(f(y \mid M_j) = \int_{\Theta_j} f(y \mid \theta_j) \, \pi_j(\theta_j) \, d\theta_j\) is the marginal likelihood under model \(M_j\).
The specific form of the Bayes factor depends on whether each hypothesis is simple (a single point) or composite (a range of values). The three cases are summarized below.
| Case | \(H_0\) (Null) | \(H_1\) (Alternative) | Bayes Factor \(B_{01}(y)\) |
|---|---|---|---|
| 1 | Simple: \(\theta = \theta_0\) | Simple: \(\theta = \theta_1\) | \(\frac{f(y \mid \theta_0)}{f(y \mid \theta_1)}\) |
| 2 | Simple: \(\theta = \theta_0\) | Composite: \(\theta \in \Theta_1\) | \(\frac{f(y \mid \theta_0)}{\int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta}\) |
| 3 | Composite: \(\theta \in \Theta_0\) | Composite: \(\theta \in \Theta_1\) | \(\frac{\int_{\Theta_0} f(y \mid \theta) \, \pi_0(\theta) \, d\theta}{\int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta}\) |
Case 1: simple vs. simple

Hypotheses:

- \(H_0: \theta = \theta_0\)
- \(H_1: \theta = \theta_1\)

Explanation: Both hypotheses specify a single value for the parameter. Under \(H_0\), the prior \(\pi_0(\theta)\) is a point mass at \(\theta_0\). Therefore, the marginal likelihood is simply the likelihood evaluated at that point:
\[f(y \mid M_0) = f(y \mid \theta_0)\]
Similarly, under \(H_1\):
\[f(y \mid M_1) = f(y \mid \theta_1)\]
Thus, the Bayes factor reduces to a simple likelihood ratio:
\[B_{01}(y) = \frac{f(y \mid \theta_0)}{f(y \mid \theta_1)}\]
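Interpretation:

- If \(B_{01} > 1\), the data support \(H_0\) over \(H_1\).
- If \(B_{01} < 1\), the data support \(H_1\) over \(H_0\).

This is the same quantity as the likelihood ratio used in the frequentist (Neyman-Pearson) test of two simple hypotheses, but it is interpreted differently: as a direct measure of relative evidence rather than as a statistic to be compared against a critical value.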
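For a sample of independent observations the likelihood is a product, so the simple-vs-simple Bayes factor is usually computed on the log scale as a sum. The following is a minimal sketch, assuming a normal likelihood with unit variance; the function names and the data values are hypothetical and chosen only for illustration.

```python
import math


def normal_logpdf(y, mean, sd=1.0):
    """Log density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_bf01_simple(data, theta0, theta1, logpdf=normal_logpdf):
    """Log Bayes factor for H0: theta = theta0 vs. H1: theta = theta1.

    With independent observations the likelihood factorizes, so the log
    Bayes factor is a sum of pointwise log-likelihood differences.
    """
    return sum(logpdf(y, theta0) - logpdf(y, theta1) for y in data)


# Hypothetical data, used only to illustrate the call.
data = [0.3, -0.5, 1.2]
log_b01 = log_bf01_simple(data, theta0=0.0, theta1=1.0)
print(f"log B01 = {log_b01:.4f}, B01 = {math.exp(log_b01):.4f}")
```

For this toy sample the log Bayes factor works out to 0.5, so \(B_{01} \approx 1.65\) and the data mildly favor \(H_0\). Working with logs avoids numerical underflow when the sample is large.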
Case 2: simple vs. composite

Hypotheses:

- \(H_0: \theta = \theta_0\) (point null)
- \(H_1: \theta \in \Theta_1\) (composite alternative, e.g., \(\theta \neq \theta_0\) or \(\theta > \theta_0\))

Explanation:
Under \(H_0\), as before: \(f(y \mid M_0) = f(y \mid \theta_0)\).
Under \(H_1\), \(\theta\) is unknown but follows a prior distribution \(\pi_1(\theta)\) over the set \(\Theta_1\). To obtain the marginal likelihood, we average the likelihood over this prior:
\[f(y \mid M_1) = \int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta\]
Hence:
\[B_{01}(y) = \frac{f(y \mid \theta_0)}{\int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta}\]
Why integrate? Under a composite hypothesis, the parameter is unknown. The marginal likelihood represents the average predictive performance of \(H_1\) over all possible \(\theta\) values, weighted by how plausible each is a priori. This ensures a fair comparison with the simple null hypothesis.
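Important note: This is the form most often met in practice (e.g., testing \(\theta = 0\) vs. \(\theta \neq 0\)). The value of \(B_{01}\) depends on the choice of \(\pi_1(\theta)\), even in large samples: as \(\pi_1\) is made more and more diffuse, \(B_{01}\) eventually grows without bound in favor of \(H_0\), so an improper prior cannot be used under \(H_1\). This is a feature, not a bug: it reflects prior uncertainty.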
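Both points, the averaging over the prior and the dependence on \(\pi_1\), can be illustrated with a simple Monte Carlo estimate: draw \(\theta\) from the prior and average the likelihood over the draws. This is a sketch under stated assumptions, not a recommended estimator: it assumes the unit-variance normal model \(y \sim N(\theta, 1)\) with \(y = 2.5\) from the numerical example in this text and a zero-mean normal prior under \(H_1\). The function names, the seed, and the number of draws are hypothetical choices.

```python
import math
import random


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def marginal_likelihood_mc(y, prior_sd, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the integral of f(y | theta) * pi_1(theta).

    Draws theta from the prior N(0, prior_sd^2) and averages the
    likelihood f(y | theta) = N(y; theta, 1) over the draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(0.0, prior_sd)
        total += normal_pdf(y, mean=theta)
    return total / n_draws


def bf01_point_null(y, theta0, prior_sd, **kwargs):
    """B01 for H0: theta = theta0 against H1: theta ~ N(0, prior_sd^2)."""
    return normal_pdf(y, mean=theta0) / marginal_likelihood_mc(y, prior_sd, **kwargs)


y = 2.5
for prior_var in (1.0, 10.0, 100.0):
    b01 = bf01_point_null(y, theta0=0.0, prior_sd=math.sqrt(prior_var))
    print(f"prior variance {prior_var:6.1f}: B01 ~ {b01:.3f}")
```

Up to Monte Carlo error, the three values should be close to 0.30, 0.19, and 0.46, which are the closed-form results for this conjugate setup. The same observation therefore yields noticeably different Bayes factors depending on how diffuse the prior under \(H_1\) is.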
Case 3: composite vs. composite

Hypotheses:

- \(H_0: \theta \in \Theta_0\)
- \(H_1: \theta \in \Theta_1\)

Explanation: Both hypotheses are composite. Under each, we average the likelihood over the respective prior distribution:
\[f(y \mid M_0) = \int_{\Theta_0} f(y \mid \theta) \, \pi_0(\theta) \, d\theta\]
\[f(y \mid M_1) = \int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta\]
Thus:
\[B_{01}(y) = \frac{\int_{\Theta_0} f(y \mid \theta) \, \pi_0(\theta) \, d\theta}{\int_{\Theta_1} f(y \mid \theta) \, \pi_1(\theta) \, d\theta}\]
This is the most general form. Cases 1 and 2 are special cases:

- If \(\pi_0(\theta)\) is a point mass at \(\theta_0\), the numerator integral collapses to \(f(y \mid \theta_0)\), recovering Case 2.
- If both \(\pi_0\) and \(\pi_1\) are point masses, we recover Case 1.

Interpretation: The Bayes factor compares the average likelihood under the null prior to the average likelihood under the alternative prior. It answers: “How much more probable are the observed data under the null model (averaged over its prior) than under the alternative model (averaged over its prior)?”
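The general form, and the way it collapses toward Case 2 as the null prior concentrates on a point, can be sketched numerically. The code below is an illustration under stated assumptions rather than a general-purpose implementation: it assumes the likelihood \(y \sim N(\theta, 1)\), normal priors under both hypotheses (each given as a mean and a standard deviation), and Simpson's rule on a range truncated to 10 prior standard deviations. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def marginal_likelihood(y, prior_mean, prior_sd):
    """Integral over theta of N(y; theta, 1) * N(theta; prior_mean, prior_sd^2).

    The integration range is truncated to prior_mean +/- 10 prior
    standard deviations, outside of which the prior mass is negligible.
    """
    def integrand(theta):
        return normal_pdf(y, mean=theta) * normal_pdf(theta, prior_mean, prior_sd)

    return simpson(integrand, prior_mean - 10 * prior_sd, prior_mean + 10 * prior_sd)


def bf01_composite(y, prior0, prior1):
    """B01 when both priors are normal, each given as (mean, sd)."""
    return marginal_likelihood(y, *prior0) / marginal_likelihood(y, *prior1)


y = 2.5
prior1 = (0.0, math.sqrt(10.0))  # assumed alternative prior, variance 10
for sd0 in (1.0, 0.1, 0.01):
    numerator = marginal_likelihood(y, 0.0, sd0)
    b01 = bf01_composite(y, (0.0, sd0), prior1)
    print(f"sd0 = {sd0:5.2f}: numerator = {numerator:.5f}, B01 = {b01:.4f}")
print(f"point-null likelihood f(y | 0) = {normal_pdf(y):.5f}")
```

As the standard deviation of the null prior shrinks, the numerator approaches the point evaluation \(f(y \mid \theta_0)\) and the Bayes factor approaches its Case 2 value, which is the collapse described above.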
| Feature | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| \(H_0\) type | Simple | Simple | Composite |
| \(H_1\) type | Simple | Composite | Composite |
| Numerator | \(f(y \mid \theta_0)\) | \(f(y \mid \theta_0)\) | \(\int_{\Theta_0} f(y \mid \theta) \pi_0(\theta) d\theta\) |
| Denominator | \(f(y \mid \theta_1)\) | \(\int_{\Theta_1} f(y \mid \theta) \pi_1(\theta) d\theta\) | \(\int_{\Theta_1} f(y \mid \theta) \pi_1(\theta) d\theta\) |
| Integrals needed? | No | One (under \(H_1\)) | Two (under both) |
Suppose \(y \sim N(\theta, 1)\) and we observe \(y = 2.5\).
Case 1: \(H_0: \theta = 0\), \(H_1: \theta = 1\)
\[B_{01} = \frac{\phi(2.5)}{\phi(1.5)} \approx \frac{0.0175}{0.1295} \approx 0.135\]
Evidence against \(H_0\).
Case 2: \(H_0: \theta = 0\), \(H_1: \theta \sim N(0, 10)\) (prior variance 10)
\[B_{01} = \frac{\phi(2.5)}{\int \phi(2.5 - \theta) \, \frac{1}{\sqrt{10}} \, \phi(\theta / \sqrt{10}) \, d\theta} \approx \frac{0.0175}{0.0905} \approx 0.194\]
Weaker evidence against \(H_0\) than in Case 1, because \(H_1\) averages over many \(\theta\) values.
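The Case 2 denominator can be checked without numerical integration. As a sketch, assume the prior under \(H_1\) is \(N(0, \tau^2)\), where \(\tau^2\) is notation introduced here for the prior variance. Then \(y\) is the sum of two independent normal terms, so marginally \(y \sim N(0, 1 + \tau^2)\) and:
\[f(y \mid M_1) = \frac{1}{\sqrt{2\pi (1 + \tau^2)}} \exp\left(-\frac{y^2}{2 (1 + \tau^2)}\right), \qquad B_{01}(y) = \sqrt{1 + \tau^2} \, \exp\left(-\frac{y^2}{2} \cdot \frac{\tau^2}{1 + \tau^2}\right)\]
With \(y = 2.5\) and \(\tau^2 = 10\), this gives \(f(y \mid M_1) \approx 0.0905\) and \(B_{01} \approx 0.194\).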
Case 3: \(H_0: \theta \sim N(0, 1)\), \(H_1: \theta \sim N(0, 10)\)
Both numerator and denominator are integrals, computed numerically.
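For these particular priors the two integrals also have closed forms, which gives a check on any numerical routine. As a sketch, assuming both \(N(0, 1)\) and \(N(0, 10)\) are parameterized by their variance, \(y\) is marginally \(N(0, 2)\) under \(H_0\) and \(N(0, 11)\) under \(H_1\), so:
\[B_{01} = \frac{\frac{1}{\sqrt{2\pi \cdot 2}} \exp\left(-\frac{2.5^2}{2 \cdot 2}\right)}{\frac{1}{\sqrt{2\pi \cdot 11}} \exp\left(-\frac{2.5^2}{2 \cdot 11}\right)} \approx \frac{0.0591}{0.0905} \approx 0.65\]
The value is below 1 but close to it, so the data only mildly favor the more diffuse prior under \(H_1\).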
The Bayes factor adapts its mathematical form to the nature of the hypotheses being compared. The general principle is always the same: compare marginal likelihoods. Whether those marginal likelihoods simplify to point evaluations or remain as integrals depends entirely on whether the priors under each hypothesis are point masses or continuous distributions.
Understanding these three forms is essential for correctly implementing and interpreting Bayesian hypothesis tests.