Bayes Factors

Jake

28/11/2022

Bayesian Hypothesis Testing

  • Considering a test statistic \(T = T(x_1,...,x_n)\), we can calculate the posterior probability of a hypothesis given the observed statistic \(T\). For \(H_0\):

\[ P(H_0|T) = \frac{P(T|H_0)P(H_0)}{P(T|H_0)P(H_0)+P(T|H_1)P(H_1)}\]

  • To avoid computing the normalisation constant, we can instead compute the posterior odds ratio
    • If the ratio is \(>1\), accept \(H_0\) (or, in general, the numerator hypothesis)
      • Accepting means it is more probable than the alternative

  • Posterior odds for simple hypothesis \(H_0 : \theta = \theta_0, H_1 : \theta = \theta_1\)

\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\times\frac{P(T|H_0)}{P(T|H_1)}\]

  • Posterior odds for composite hypothesis \(H_0 : \theta = \theta_0, H_1 : \theta\neq \theta_0\)

\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\times\frac{P(T|H_0,\theta_0)}{\int P(T|H_1,\theta)\pi_1(\theta)d\theta}\]

Example

  • Test Statistic

\[ T = \frac{1}{n}\sum^n_{i=1}X_i,\quad X_i\sim N(\theta,1) \]

  • Null hypothesis \(H_0:\theta = \theta_0\) (here \(\theta_0 = 0\)):

\[ T|H_0\sim N(0,1/n)\]

  • Alternative Hypothesis \(H_1:\theta\neq\theta_0\):

\[ T|H_1,\theta\sim N(\theta,1/n),\quad\theta\sim N(1,1)\]

  • Therefore the posterior odds ratio is (a numerical sketch follows the formula):

\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\times\frac{N(0,1/n)}{\int N(\theta,1/n)\,N(1,1)\,d\theta} \]
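  • As a minimal numerical sketch of this example (in Python, assuming \(\theta_0 = 0\), equal prior odds \(P(H_0)=P(H_1)\), and simulated data; these choices are mine, not prescribed above), note that the denominator integral is available in closed form: convolving \(N(\theta,1/n)\) with the \(N(1,1)\) prior gives \(T|H_1\sim N(1,\,1+1/n)\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50
x = rng.normal(0.0, 1.0, size=n)      # simulate data under H0 (theta = 0)
T = x.mean()                          # test statistic: sample mean

# P(T | H0): T ~ N(theta_0, 1/n) with theta_0 = 0
p_T_H0 = stats.norm.pdf(T, loc=0.0, scale=np.sqrt(1.0 / n))

# P(T | H1) = int N(T; theta, 1/n) N(theta; 1, 1) dtheta = N(T; 1, 1 + 1/n)
p_T_H1 = stats.norm.pdf(T, loc=1.0, scale=np.sqrt(1.0 + 1.0 / n))

prior_odds = 1.0                      # assume P(H0) = P(H1)
posterior_odds = prior_odds * p_T_H0 / p_T_H1
print(f"T = {T:.3f}, posterior odds for H0 = {posterior_odds:.2f}")
```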

Bayes Factors

Encompassing Model

  • We consider an encompassing model that contains all models of interest, allowing formal comparison of models via posterior model probabilities.

  • Likelihood of model \(m\in \{1,...,M\}\)

\[ L_m(x|\theta_m,m),\quad\theta_m\in\Theta_m \]

  • Prior for model \(m\)’s parameters \(\theta_m\):

\[ \pi_m(\theta_m|m)\]

  • Encompassing model can then be thought of as being indexed by \(\theta = (m,\theta_m)\).
    • \(\Theta\) is the entire model parameter space
    • \(\Theta\) is therefore the union of the individual model parameter spaces

\[ \Theta = \bigcup^M_{m=1}\{m\}\times\Theta_m\]

  • Encompassing model prior:

\[ \pi(\theta) = \pi(m,\theta_m) = \pi_m(\theta_m|m)\pi(m)\]

Posterior Inference

  • The posterior distribution \(\pi(\theta_m,m|x)\) factorises as shown below, where:
    • \(\pi_m(\theta_m|x)\) is the posterior of \(\theta_m\)
    • \(\pi(m|x)\) is the posterior of model \(m\)

\[\begin{align}\pi(\theta_m,m|x)&=\frac{L_m(x|\theta_m)\pi_m(\theta_m|m)\pi(m)}{\pi(x)}\\ &=\frac{L_m(x|\theta_m)\pi_m(\theta_m|m)}{m_m(x)}\times\frac{\pi(m)m_m(x)}{\pi(x)}\\ &=\pi_m(\theta_m|x)\pi(m|x)\end{align}\]

  • Where \(m_m(x)\) is the marginal distribution of the data under model \(m\), aka the marginal likelihood for model \(m\):

\[ m_m(x) = \pi(x|m) = \int\pi_m(x,\theta_m)d\theta_m = \int L_m(x|\theta_m)\pi_m(\theta_m|m)d\theta_m\]

  • To avoid computing the normalising constant \(\pi(x)\), posterior model probabilities are often compared via their posterior odds:
    • Note the Bayes factor \(B_{ij} = m_i(x)/m_j(x)\)

\[ \text{Posterior Odds} = \frac{\pi(m=i|x)}{\pi(m=j|x)} = \frac{m_i(x)\pi(m=i)}{m_j(x)\pi(m=j)} = \frac{\pi(m=i)}{\pi(m=j)}\times B_{ij} \]

  • Bayes factors can then be used to compute the normalised posterior model probabilities (a numerical sketch follows):

\[ \pi(m=i|x) = \left(1+\sum_{j\neq i}\frac{\pi(m=j)}{\pi(m=i)}B_{ji}\right)^{-1}\]
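  • A quick sketch of this identity (the marginal likelihoods below are made-up numbers, purely for illustration):

```python
import numpy as np

m_x = np.array([0.012, 0.030, 0.006])    # hypothetical marginal likelihoods m_m(x)
prior = np.array([1 / 3, 1 / 3, 1 / 3])  # prior model probabilities pi(m)

post = np.empty(len(m_x))
for i in range(len(m_x)):
    B_ji = m_x / m_x[i]                  # Bayes factors B_{ji}; B_{ii} = 1
    # the j = i term contributes the leading 1 in the identity above
    post[i] = 1.0 / np.sum((prior / prior[i]) * B_ji)

print(post, post.sum())  # equals m_x * prior / sum(m_x * prior); sums to 1
```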

Nested Model Bayes Factors

  • We can use Bayes factors when one model is a strict subset of another
    • \(L_0(x|\theta)=L_1(x|\theta,\phi=\phi_0),\quad L_1(x|\theta,\phi)\)
  • Simplify by defining:

\[ \pi_0(\theta) = \pi_1(\theta|\phi=\phi_0)\]

  • Therefore the Bayes factor \(B_{01}\) reduces to a ratio of the model-1 posterior and prior densities of \(\phi\) at \(\phi_0\) (the Savage–Dickey density ratio; a numerical check follows):

\[ B_{01} = \frac{m_0(x)}{m_1(x)} = \frac{\pi_1(\phi_0|x)}{\pi_1(\phi_0)}\]
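  • A minimal numerical check of this identity, with a toy model of my own choosing (not from the notes): \(x_i \sim N(\phi, 1)\) with prior \(\phi \sim N(0, v_0)\) under model 1, and \(\phi_0 = 0\). The conjugate posterior for \(\phi\) is available in closed form, and the density ratio is compared against a direct quadrature of \(m_0(x)/m_1(x)\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, v0, phi0 = 30, 2.0, 0.0
x = rng.normal(0.3, 1.0, size=n)

# Conjugate posterior for phi under model 1: precisions add, mean shrinks
vn = 1.0 / (1.0 / v0 + n)
mn = vn * n * x.mean()

# Density-ratio form of B_01, evaluated at phi = phi_0
B01_ratio = (stats.norm.pdf(phi0, mn, np.sqrt(vn))
             / stats.norm.pdf(phi0, 0.0, np.sqrt(v0)))

# Direct marginal-likelihood ratio m_0(x) / m_1(x), via quadrature for m_1
m0 = np.prod(stats.norm.pdf(x, phi0, 1.0))
grid = np.linspace(-5.0, 5.0, 4001)
like = np.array([np.prod(stats.norm.pdf(x, p, 1.0)) for p in grid])
m1 = np.sum(like * stats.norm.pdf(grid, 0.0, np.sqrt(v0))) * (grid[1] - grid[0])

print(B01_ratio, m0 / m1)  # the two agree up to quadrature error
```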

Bayes Factors and Improper Priors

  • Consider the improper uniform prior \(\pi(\phi)\propto 1\) as the limit as \(c\rightarrow\infty\) of the proper uniform distribution:

\[ \pi(\phi)=\frac{1}{2c},\quad -c\leq\phi\leq c\]

  • Then the marginal likelihood for model \(i\) is:
    • Noting that for most problems \(L_i(x|\phi)\) is finite and approaches \(0\) as \(|\phi|\rightarrow\infty\)

\[ m_i(x) = \int L_i(x|\phi)\pi_i(\phi)d\phi = \frac{1}{2c}\int^c_{-c}L_i(x|\phi)d\phi\]

  • Therefore as \(c\rightarrow\infty\), \(m_i(x)\rightarrow 0\), resulting in an undefined Bayes factor.
    • For the case of a nested model, \(B_{ij}\rightarrow\infty\)
  • Therefore, Bayes factors struggle with weak priors (large \(c\), i.e. high variance): the factor depends on the arbitrary ratio \(c_1/c_2\) of the two priors' normalising constants and is effectively undefined (a numerical sketch follows this list).
    • Note that there is an exception for a parameter that is present in all models and has the same prior under each model, in which case the constant cancels perfectly in the Bayes factor ratio.
  • Bayes factors can also struggle with particularly strong data
    • The likelihood is negligible outside a small interval around its maximum at \(\phi = \hat{\phi}\)
    • The Bayes factor then becomes directly proportional to the prior value \(\pi_i(\hat{\phi})\), so any perturbation of the prior that alters its value at \(\phi = \hat{\phi}\) rescales the factor, since:

\[ m_i(x) \approx \pi_i(\hat{\phi})\int L_i(x|\phi)d\phi\]
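  • A quick numerical illustration of the weak-prior problem, using a toy normal model of my own choosing: once \(c\) is large enough to cover the likelihood's support, the integral stops changing and \(m_i(x)\) simply decays like \(1/2c\).

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.5, 1.0, size=20)

def marginal_uniform(x, c, step=0.01):
    """m(x) = (1/2c) * integral_{-c}^{c} L(x|phi) dphi, by quadrature."""
    grid = np.arange(-c, c, step)
    # log-likelihood of x_i ~ N(phi, 1), evaluated on the whole grid at once
    ll = (-0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
          - len(x) / 2 * np.log(2 * np.pi))
    return np.exp(ll).sum() * step / (2 * c)

for c in [1.0, 10.0, 100.0, 1000.0]:
    print(c, marginal_uniform(x, c))
# m(x) -> 0 like 1/(2c): with two such priors, B_12 inherits the
# arbitrary ratio c_1/c_2
```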

Bayes Factor Alternatives for Improper Priors

Partial Bayes Factors

  • Split data into \((x_T,x_R)\), with training data providing improved prior information
    • Diminishes prior sensitivity
  • The partial Bayes factor is therefore (a sketch follows the definitions below):
    • Both Bayes factors in the fraction contain the same problematic prior normalising constant, which therefore cancels in the partial Bayes factor.

\[ B^{R|T}_{12} = \frac{B_{12}}{B_{12}^T}\]

  • Where:

\[ B^{R|T}_{12} = \frac{m_1(x_R|x_T)}{m_2(x_R|x_T)},\quad B^T_{12} = \frac{m_1(x_T)}{m_2(x_T)}\]
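  • A sketch of why this works, using toy nested normal models of my own choosing (model 1 fixes \(\phi = 0\); model 2 puts a \(\text{Uniform}(-c,c)\) prior on \(\phi\)): \(B_{12}\) and \(B^T_{12}\) both scale linearly in the arbitrary constant \(c\), but their ratio \(B^{R|T}_{12}\) is stable.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.4, 1.0, size=40)
x_T, x_R = x[:5], x[5:]                # one (arbitrary) training split

def m1(x):                             # model 1: phi fixed at 0
    return np.exp(-0.5 * np.sum(x ** 2) - len(x) / 2 * np.log(2 * np.pi))

def m2(x, c, step=0.01):               # model 2: phi ~ Uniform(-c, c)
    grid = np.arange(-c, c, step)
    ll = (-0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
          - len(x) / 2 * np.log(2 * np.pi))
    return np.exp(ll).sum() * step / (2 * c)

for c in [10.0, 100.0, 1000.0]:
    B12 = m1(x) / m2(x, c)             # full-data Bayes factor: grows with c
    B12_T = m1(x_T) / m2(x_T, c)       # training-data factor: also grows with c
    print(c, B12 / B12_T)              # partial Bayes factor: stable in c
```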

  • However, choosing how to split the data becomes an issue. Two solutions have been proposed:

Intrinsic Bayes Factors

  • Average the partial Bayes factor over all combinations of minimal training samples (a sketch follows the two formulas below)
    • \(n_T\) is the size of the minimal training set
  • Arithmetic:

\[ B_{12}^{AI} = \left(\begin{matrix}n\\n_T\end{matrix}\right)^{-1}\sum_{x_T}B^{R|T}_{12}(x_T)\]

  • Geometric:

\[ B_{12}^{GI} = \left(\prod_{x_T}B^{R|T}_{12}(x_T)\right)^{\left(\begin{matrix}n\\n_T\end{matrix}\right)^{-1}}\]
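  • A sketch of both averages for the same toy nested models (redefined here so the block is self-contained), with minimal training sets of size \(n_T = 1\); enumerating all \(\binom{n}{n_T}\) subsets is only feasible for small problems like this one.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
x = rng.normal(0.4, 1.0, size=12)      # keep n small: we enumerate subsets
n, n_T, c = len(x), 1, 100.0

def m1(x):                             # model 1: phi fixed at 0
    return np.exp(-0.5 * np.sum(x ** 2) - len(x) / 2 * np.log(2 * np.pi))

def m2(x, step=0.01):                  # model 2: phi ~ Uniform(-c, c)
    grid = np.arange(-c, c, step)
    ll = (-0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
          - len(x) / 2 * np.log(2 * np.pi))
    return np.exp(ll).sum() * step / (2 * c)

def partial_bf(idx_T):
    """B_12^{R|T} = B_12 / B_12^T for the training indices idx_T."""
    x_T = x[list(idx_T)]
    return (m1(x) / m2(x)) / (m1(x_T) / m2(x_T))

pbfs = np.array([partial_bf(t) for t in combinations(range(n), n_T)])
print("arithmetic IBF:", pbfs.mean())
print("geometric  IBF:", np.exp(np.log(pbfs).mean()))
```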

Fractional Bayes Factors

  • No explicit choice of \(x_T\); instead the fraction \(b\) corresponds to an ‘idealised’ training sample:

\[ B^b_{12}=\frac{m^*_1(x)}{m^*_2(x)}\]

  • Where:
    • Numerator is the typical \(m_m(x)\)

\[ m_m^*(x) = \frac{\int L_m(x|\theta_m)\pi_m(\theta_m)d\theta_m}{\int [L_m(x|\theta_m)]^b\pi_m(\theta_m)d\theta_m},\quad 0<b<1\]

  • This formula cancels out the unknown normalising constants that were the problem, resulting in:
    • Writing \(\pi_m(\theta_m)=c_mg_m(\theta_m)\), the constants \(c_m\) cancel

\[ m_m^*(x) = \frac{\int L_m(x|\theta_m)g_m(\theta_m)d\theta_m}{\int [L_m(x|\theta_m)]^b g_m(\theta_m)d\theta_m}\]

  • However, \(b\) must still be chosen (a sketch follows)
    • \(b > \frac{n_{min}}{n}\) is a rule of thumb, with \(n_{min}\) the minimal training-sample size
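  • A sketch of the fractional Bayes factor for the same toy nested models as above, taking \(b = 1/n\) (the minimal-training-sample fraction with \(n_{min} = 1\)); as in the formula above, the improper prior enters only through \(g_m\), so the unknown constant cancels.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.4, 1.0, size=40)
n, b, c = len(x), 1.0 / 40, 100.0

# log L(x | phi) for x_i ~ N(phi, 1), evaluated on a grid of phi values
def log_like(phi):
    return (-0.5 * ((x[:, None] - phi[None, :]) ** 2).sum(axis=0)
            - n / 2 * np.log(2 * np.pi))

# model 1 (phi fixed at 0): m*_1 = L_1(x) / L_1(x)^b = L_1(x)^(1 - b)
ll0 = -0.5 * np.sum(x ** 2) - n / 2 * np.log(2 * np.pi)
m1_star = np.exp((1 - b) * ll0)

# model 2 (phi ~ Uniform(-c, c)): the 1/(2c) constant cancels in the
# ratio, so we can integrate against g(phi) = 1 directly
step = 0.01
grid = np.arange(-c, c, step)
ll = log_like(grid)
m2_star = (np.exp(ll).sum() * step) / (np.exp(b * ll).sum() * step)

print("fractional Bayes factor B^b_12:", m1_star / m2_star)
```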