Bayes Factors

Jake

28/11/2022

Bayesian Hypothesis Testing

  • Considering a test statistic \(T = T(x_1,...,x_n)\), we can calculate the posterior probability of a hypothesis given the observed statistic \(T\). For \(H_0\):

\[ P(H_0|T) = \frac{P(T|H_0)P(H_0)}{P(T|H_0)P(H_0)+P(T|H_1)P(H_1)}\]

  • To avoid computing the normalisation constant, we can instead compute the posterior odds ratio
    • If the ratio is \(>1\), accept \(H_0\) (or, in general, the numerator hypothesis)
      • Accepting means it is more probable than the alternative

  • Posterior odds for simple hypothesis \(H_0 : \theta = \theta_0, H_1 : \theta = \theta_1\)

\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\times\frac{P(T|H_0)}{P(T|H_1)}\]

  • Posterior odds for composite hypothesis \(H_0 : \theta = \theta_0, H_1 : \theta\neq \theta_0\)

\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\times\frac{P(T|H_0,\theta_0)}{\int P(T|H_1,\theta)\pi_1(\theta)d\theta}\]

Example

  • Test Statistic

\[ T = \frac{1}{n}\sum^n_{i=1}X_i,\quad X_i\sim N(\theta,1) \]

  • Null hypothesis \(H_0:\theta = \theta_0\) (here \(\theta_0 = 0\)):

\[ T|H_0\sim N(0,1/n)\]

  • Alternative Hypothesis \(H_1:\theta\neq\theta_0\):

\[ T|H_1,\theta\sim N(\theta,1/n),\quad\theta\sim N(1,1)\]

  • Therefore the posterior odds ratio is (a numerical sketch follows the formula):

\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\times\frac{N(0,1/n)}{\int N(\theta,1/n)\,N(1,1)\,d\theta} \]
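  • As a minimal numerical sketch of this example (in Python, assuming \(\theta_0 = 0\), equal prior odds \(P(H_0)=P(H_1)\), and simulated data; these choices are mine, not prescribed above), note that the denominator integral is available in closed form: convolving \(N(\theta,1/n)\) with the \(N(1,1)\) prior gives \(T|H_1\sim N(1,\,1+1/n)\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50
x = rng.normal(0.0, 1.0, size=n)      # simulate data under H0 (theta = 0)
T = x.mean()                          # test statistic: sample mean

# P(T | H0): T ~ N(theta_0, 1/n) with theta_0 = 0
p_T_H0 = stats.norm.pdf(T, loc=0.0, scale=np.sqrt(1.0 / n))

# P(T | H1) = int N(T; theta, 1/n) N(theta; 1, 1) dtheta = N(T; 1, 1 + 1/n)
p_T_H1 = stats.norm.pdf(T, loc=1.0, scale=np.sqrt(1.0 + 1.0 / n))

prior_odds = 1.0                      # assume P(H0) = P(H1)
posterior_odds = prior_odds * p_T_H0 / p_T_H1
print(f"T = {T:.3f}, posterior odds for H0 = {posterior_odds:.2f}")
```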

Bayes Factors

Encompassing Model

  • We consider an encompassing model that contains all models of interest, allowing formal comparison of models via posterior model probabilities.

  • Likelihood of model \(m\in \{1,...,M\}\)

\[ L_m(x|\theta_m,m),\quad\theta_m\in\Theta_m \]

  • Prior for model \(m\)’s parameters \(\theta_m\):

\[ \pi_m(\theta_m|m)\]

  • Encompassing model can then be thought of as being indexed by \(\theta = (m,\theta_m)\).
    • \(\Theta\) is the entire model parameter space
    • \(\Theta\) is therefore the union of the individual model parameter spaces

\[ \Theta = \bigcup^M_{m=1}\{m\}\times\Theta_m\]

  • Encompassing model prior:

\[ \pi(\theta) = \pi(m,\theta_m) = \pi_m(\theta_m|m)\pi(m)\]

Posterior Inference

  • The posterior distribution \(\pi(\theta_m,m|x)\) factorises as shown below, where:
    • \(\pi_m(\theta_m|x)\) is the posterior of \(\theta_m\)
    • \(\pi(m|x)\) is the posterior of model \(m\)

\[\begin{align}\pi(\theta_m,m|x)&=\frac{L_m(x|\theta_m)\pi_m(\theta_m|m)\pi(m)}{\pi(x)}\\ &=\frac{L_m(x|\theta_m)\pi_m(\theta_m|m)}{m_m(x)}\times\frac{\pi(m)m_m(x)}{\pi(x)}\\ &=\pi_m(\theta_m|x)\pi(m|x)\end{align}\]

  • Where \(m_m(x)\) is the marginal distribution of the data under model \(m\), aka the marginal likelihood for model \(m\):

\[ m_m(x) = \pi(x|m) = \int\pi_m(x,\theta_m)d\theta_m = \int L_m(x|\theta_m)\pi_m(\theta_m|m)d\theta_m\]

  • To avoid computing the normalising constant \(\pi(x)\), posterior model probabilities are often compared via their posterior odds:
    • Note the Bayes factor \(B_{ij} = m_i(x)/m_j(x)\)

\[ \text{Posterior Odds} = \frac{\pi(m=i|x)}{\pi(m=j|x)} = \frac{m_i(x)\pi(m=i)}{m_j(x)\pi(m=j)} = \frac{\pi(m=i)}{\pi(m=j)}\times B_{ij} \]

  • Bayes factors can then be used to compute the normalised posterior model probabilities (a numerical sketch follows):

\[ \pi(m=i|x) = \left(1+\sum_{j\neq i}\frac{\pi(m=j)}{\pi(m=i)}B_{ji}\right)^{-1}\]
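  • A quick sketch of this identity (the marginal likelihoods below are made-up numbers, purely for illustration):

```python
import numpy as np

m_x = np.array([0.012, 0.030, 0.006])    # hypothetical marginal likelihoods m_m(x)
prior = np.array([1 / 3, 1 / 3, 1 / 3])  # prior model probabilities pi(m)

post = np.empty(len(m_x))
for i in range(len(m_x)):
    B_ji = m_x / m_x[i]                  # Bayes factors B_{ji}; B_{ii} = 1
    # the j = i term contributes the leading 1 in the identity above
    post[i] = 1.0 / np.sum((prior / prior[i]) * B_ji)

print(post, post.sum())  # equals m_x * prior / sum(m_x * prior); sums to 1
```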

Nested Model Bayes Factors

  • We can use Bayes factors when one model is a strict subset of another
    • \(L_0(x|\theta)=L_1(x|\theta,\phi=\phi_0),\quad L_1(x|\theta,\phi)\)
  • Simplify by defining:

\[ \pi_0(\theta) = \pi_1(\theta|\phi=\phi_0)\]

  • Therefore the Bayes factor \(B_{01}\) reduces to a ratio of the model-1 posterior and prior densities of \(\phi\) at \(\phi_0\) (the Savage–Dickey density ratio; a numerical check follows):

\[ B_{01} = \frac{m_0(x)}{m_1(x)} = \frac{\pi_1(\phi_0|x)}{\pi_1(\phi_0)}\]
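  • A minimal numerical check of this identity, with a toy model of my own choosing (not from the notes): \(x_i \sim N(\phi, 1)\) with prior \(\phi \sim N(0, v_0)\) under model 1, and \(\phi_0 = 0\). The conjugate posterior for \(\phi\) is available in closed form, and the density ratio is compared against a direct quadrature of \(m_0(x)/m_1(x)\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, v0, phi0 = 30, 2.0, 0.0
x = rng.normal(0.3, 1.0, size=n)

# Conjugate posterior for phi under model 1: precisions add, mean shrinks
vn = 1.0 / (1.0 / v0 + n)
mn = vn * n * x.mean()

# Density-ratio form of B_01, evaluated at phi = phi_0
B01_ratio = (stats.norm.pdf(phi0, mn, np.sqrt(vn))
             / stats.norm.pdf(phi0, 0.0, np.sqrt(v0)))

# Direct marginal-likelihood ratio m_0(x) / m_1(x), via quadrature for m_1
m0 = np.prod(stats.norm.pdf(x, phi0, 1.0))
grid = np.linspace(-5.0, 5.0, 4001)
like = np.array([np.prod(stats.norm.pdf(x, p, 1.0)) for p in grid])
m1 = np.sum(like * stats.norm.pdf(grid, 0.0, np.sqrt(v0))) * (grid[1] - grid[0])

print(B01_ratio, m0 / m1)  # the two agree up to quadrature error
```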

Bayes Factors and Improper Priors

  • Consider the improper uniform prior \(\pi(\phi)\propto 1\) as the limit as \(c\rightarrow\infty\) of the proper uniform distribution:

\[ \pi(\phi)=\frac{1}{2c},\quad -c\leq\phi\leq c\]

  • Then the marginal likelihood for model \(i\) is:
    • Noting that for most problems \(L_i(x|\phi)\) is finite and approaches \(0\) as \(|\phi|\rightarrow\infty\)

\[ m_i(x) = \int L_i(x|\phi)\pi_i(\phi)d\phi = \frac{1}{2c}\int^c_{-c}L_i(x|\phi)d\phi\]

  • Therefore as \(c\rightarrow\infty\), \(m_i(x)\rightarrow 0\), resulting in an undefined Bayes factor.
    • For the case of a nested model, \(B_{ij}\rightarrow\infty\)
  • Therefore, Bayes factors struggle with weak priors (large \(c\), i.e. high variance): the factor depends on the arbitrary ratio \(c_1/c_2\) of the two priors' normalising constants and is effectively undefined (a numerical sketch follows this list).
    • Note that there is an exception for a parameter that is present in all models and has the same prior under each model, in which case the constant cancels perfectly in the Bayes factor ratio.
  • Bayes factors can also struggle with particularly strong data
    • The likelihood is negligible outside a small interval around its maximum at \(\phi = \hat{\phi}\)
    • The Bayes factor then becomes directly proportional to the prior value \(\pi_i(\hat{\phi})\), so any perturbation of the prior that alters its value at \(\phi = \hat{\phi}\) rescales the factor, since:

\[ m_i(x) \approx \pi_i(\hat{\phi})\int L_i(x|\phi)d\phi\]
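  • A quick numerical illustration of the weak-prior problem, using a toy normal model of my own choosing: once \(c\) is large enough to cover the likelihood's support, the integral stops changing and \(m_i(x)\) simply decays like \(1/2c\).

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.5, 1.0, size=20)

def marginal_uniform(x, c, step=0.01):
    """m(x) = (1/2c) * integral_{-c}^{c} L(x|phi) dphi, by quadrature."""
    grid = np.arange(-c, c, step)
    # log-likelihood of x_i ~ N(phi, 1), evaluated on the whole grid at once
    ll = (-0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
          - len(x) / 2 * np.log(2 * np.pi))
    return np.exp(ll).sum() * step / (2 * c)

for c in [1.0, 10.0, 100.0, 1000.0]:
    print(c, marginal_uniform(x, c))
# m(x) -> 0 like 1/(2c): with two such priors, B_12 inherits the
# arbitrary ratio c_1/c_2
```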

Bayes Factor Alternatives for Improper Priors

Partial Bayes Factors

  • Split data into \((x_T,x_R)\), with training data providing improved prior information
    • Diminishes prior sensitivity
  • The partial Bayes factor is therefore (a sketch follows the definitions below):
    • Both Bayes factors in the fraction contain the same problematic prior normalising constant, which therefore cancels in the partial Bayes factor.

\[ B^{R|T}_{12} = \frac{B_{12}}{B_{12}^T}\]

  • Where:

\[ B^{R|T}_{12} = \frac{m_1(x_R|x_T)}{m_2(x_R|x_T)},\quad B^T_{12} = \frac{m_1(x_T)}{m_2(x_T)}\]
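  • A sketch of why this works, using toy nested normal models of my own choosing (model 1 fixes \(\phi = 0\); model 2 puts a \(\text{Uniform}(-c,c)\) prior on \(\phi\)): \(B_{12}\) and \(B^T_{12}\) both scale linearly in the arbitrary constant \(c\), but their ratio \(B^{R|T}_{12}\) is stable.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.4, 1.0, size=40)
x_T, x_R = x[:5], x[5:]                # one (arbitrary) training split

def m1(x):                             # model 1: phi fixed at 0
    return np.exp(-0.5 * np.sum(x ** 2) - len(x) / 2 * np.log(2 * np.pi))

def m2(x, c, step=0.01):               # model 2: phi ~ Uniform(-c, c)
    grid = np.arange(-c, c, step)
    ll = (-0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
          - len(x) / 2 * np.log(2 * np.pi))
    return np.exp(ll).sum() * step / (2 * c)

for c in [10.0, 100.0, 1000.0]:
    B12 = m1(x) / m2(x, c)             # full-data Bayes factor: grows with c
    B12_T = m1(x_T) / m2(x_T, c)       # training-data factor: also grows with c
    print(c, B12 / B12_T)              # partial Bayes factor: stable in c
```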

  • However, choosing how to split the data becomes an issue. Two solutions have been proposed:

Intrinsic Bayes Factors

  • Average the partial Bayes factor over all combinations of minimal training samples (a sketch follows the two formulas below)
    • \(n_T\) is the size of the minimal training set
  • Arithmetic:

\[ B_{12}^{AI} = \left(\begin{matrix}n\\n_T\end{matrix}\right)^{-1}\sum_{x_T}B^{R|T}_{12}(x_T)\]

  • Geometric:

\[ B_{12}^{GI} = \left(\prod_{x_T}B^{R|T}_{12}(x_T)\right)^{\left(\begin{matrix}n\\n_T\end{matrix}\right)^{-1}}\]
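  • A sketch of both averages for the same toy nested models (redefined here so the block is self-contained), with minimal training sets of size \(n_T = 1\); enumerating all \(\binom{n}{n_T}\) subsets is only feasible for small problems like this one.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
x = rng.normal(0.4, 1.0, size=12)      # keep n small: we enumerate subsets
n, n_T, c = len(x), 1, 100.0

def m1(x):                             # model 1: phi fixed at 0
    return np.exp(-0.5 * np.sum(x ** 2) - len(x) / 2 * np.log(2 * np.pi))

def m2(x, step=0.01):                  # model 2: phi ~ Uniform(-c, c)
    grid = np.arange(-c, c, step)
    ll = (-0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
          - len(x) / 2 * np.log(2 * np.pi))
    return np.exp(ll).sum() * step / (2 * c)

def partial_bf(idx_T):
    """B_12^{R|T} = B_12 / B_12^T for the training indices idx_T."""
    x_T = x[list(idx_T)]
    return (m1(x) / m2(x)) / (m1(x_T) / m2(x_T))

pbfs = np.array([partial_bf(t) for t in combinations(range(n), n_T)])
print("arithmetic IBF:", pbfs.mean())
print("geometric  IBF:", np.exp(np.log(pbfs).mean()))
```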

Fractional Bayes Factors

  • No explicit choice of \(x_T\); instead the fraction \(b\) corresponds to an ‘idealised’ training sample:

\[ B^b_{12}=\frac{m^*_1(x)}{m^*_2(x)}\]

  • Where:
    • Numerator is the typical \(m_m(x)\)

\[ m_m^*(x) = \frac{\int L_m(x|\theta_m)\pi_m(\theta_m)d\theta_m}{\int [L_m(x|\theta_m)]^b\pi_m(\theta_m)d\theta_m},\quad 0<b<1\]

  • This formula cancels out the unknown normalising constants that were the problem, resulting in:
    • Writing \(\pi_m(\theta_m)=c_mg_m(\theta_m)\), the constants \(c_m\) cancel

\[ m_m^*(x) = \frac{\int L_m(x|\theta_m)g_m(\theta_m)d\theta_m}{\int [L_m(x|\theta_m)]^b g_m(\theta_m)d\theta_m}\]

  • However, \(b\) must still be chosen (a sketch follows)
    • \(b > \frac{n_{min}}{n}\) is a rule of thumb, with \(n_{min}\) the minimal training-sample size
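  • A sketch of the fractional Bayes factor for the same toy nested models as above, taking \(b = 1/n\) (the minimal-training-sample fraction with \(n_{min} = 1\)); as in the formula above, the improper prior enters only through \(g_m\), so the unknown constant cancels.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.4, 1.0, size=40)
n, b, c = len(x), 1.0 / 40, 100.0

# log L(x | phi) for x_i ~ N(phi, 1), evaluated on a grid of phi values
def log_like(phi):
    return (-0.5 * ((x[:, None] - phi[None, :]) ** 2).sum(axis=0)
            - n / 2 * np.log(2 * np.pi))

# model 1 (phi fixed at 0): m*_1 = L_1(x) / L_1(x)^b = L_1(x)^(1 - b)
ll0 = -0.5 * np.sum(x ** 2) - n / 2 * np.log(2 * np.pi)
m1_star = np.exp((1 - b) * ll0)

# model 2 (phi ~ Uniform(-c, c)): the 1/(2c) constant cancels in the
# ratio, so we can integrate against g(phi) = 1 directly
step = 0.01
grid = np.arange(-c, c, step)
ll = log_like(grid)
m2_star = (np.exp(ll).sum() * step) / (np.exp(b * ll).sum() * step)

print("fractional Bayes factor B^b_12:", m1_star / m2_star)
```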