Bayesian Hypothesis Testing
- Considering a test statistic \(T = T(x_1,...,x_n)\), we can calculate the posterior probability of a hypothesis given the observed value of \(T\). For \(H_0\):
\[ P(H_0|T) = \frac{P(T|H_0)P(H_0)}{P(T|H_0)P(H_0)+P(T|H_1)P(H_1)}\]
- To avoid computing the normalisation constant, we can instead compute the posterior odds ratio
  - If the ratio is \(>1\), accept \(H_0\) (or, in general, the numerator hypothesis)
    - Accepting means only that it is more probable than the alternative
- Posterior odds for simple hypotheses \(H_0 : \theta = \theta_0\) vs \(H_1 : \theta = \theta_1\):
\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\cdot\frac{P(T|H_0)}{P(T|H_1)}\]
- Posterior odds for a composite alternative, \(H_0 : \theta = \theta_0\) vs \(H_1 : \theta\neq \theta_0\), where \(\pi_1(\theta)\) is the prior on \(\theta\) under \(H_1\):
\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\cdot\frac{P(T|H_0,\theta_0)}{\int P(T|H_1,\theta)\pi_1(\theta)d\theta}\]

## Example
- Test Statistic
\[ T = \frac{1}{n}\sum^n_{i=1}X_i,\quad X_i\sim N(\theta,1) \]
- Null hypothesis \(H_0:\theta = \theta_0\) (here \(\theta_0 = 0\)):
\[ T|H_0\sim N(0,1/n)\]
- Alternative Hypothesis \(H_1:\theta\neq\theta_0\):
\[ T|H_1,\theta\sim N(\theta,1/n),\quad\theta\sim N(1,1)\]
- Therefore the posterior odds ratio is:
\[ \frac{P(H_0|T)}{P(H_1|T)} = \frac{P(H_0)}{P(H_1)}\cdot\frac{N(0,1/n)}{\int N(\theta,1/n)\cdot N(1,1)d\theta} \]
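A minimal numerical sketch of this example (assumed values: \(\theta_0=0\), equal prior odds, \(n=50\), data simulated with true \(\theta=0.2\)). Each \(N(\cdot,\cdot)\) above denotes the corresponding density evaluated at the observed \(T\); for Gaussians the denominator integral has the closed form \(T|H_1\sim N(1,\,1+1/n)\):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 50
x = rng.normal(loc=0.2, scale=1.0, size=n)  # true theta = 0.2 (assumed)
T = x.mean()

# Density of T under H0: T ~ N(0, 1/n); norm takes the standard deviation.
p_T_H0 = norm.pdf(T, loc=0.0, scale=np.sqrt(1 / n))

# Under H1, theta ~ N(1, 1) and T | theta ~ N(theta, 1/n), so the
# denominator integral reduces to the density of T ~ N(1, 1 + 1/n).
p_T_H1 = norm.pdf(T, loc=1.0, scale=np.sqrt(1 + 1 / n))

prior_odds = 1.0  # P(H0)/P(H1), assumed equal prior probabilities
posterior_odds = prior_odds * p_T_H0 / p_T_H1
print(f"posterior odds P(H0|T)/P(H1|T): {posterior_odds:.3f}")
```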
Bayes Factors
Encompassing Model
We consider an encompassing model that contains all models of interest, so that models can be compared formally via posterior model probabilities.
- Likelihood of model \(m\in \{1,...,M\}\):
\[ L_m(x|\theta_m,m),\quad\theta_m\in\Theta_m \]
- Prior for model \(m\)’s parameters \(\theta_m\):
\[ \pi_m(\theta_m|m)\]
- The encompassing model can then be thought of as being indexed by \(\theta = (m,\theta_m)\).
- \(\Theta\) is the entire model parameter space
- \(\Theta\) is therefore the union of the individual models' parameter spaces
\[ \Theta = \bigcup^M_{m=1}\{m\}\times\Theta_m\]
- Encompassing model prior:
\[ \pi(\theta) = \pi(m,\theta_m) = \pi_m(\theta_m|m)\pi(m)\]
Posterior Inference
- The posterior distribution \(\pi(\theta|x)\) factorises as follows:
- \(\pi_m(\theta_m|x)\) is the posterior of \(\theta_m\)
- \(\pi(m|x)\) is the posterior of model \(m\)
\[\begin{align}\pi(\theta_m,m|x)&=\frac{L_m(x|\theta_m)\pi_m(\theta_m|m)\pi(m)}{\pi(x)}\\ &=\frac{L_m(x|\theta_m)\pi_m(\theta_m|m)}{m_m(x)}\cdot\frac{\pi(m)m_m(x)}{\pi(x)}\\ &=\pi_m(\theta_m|x)\pi(m|x)\end{align}\]
- Where \(m_m(x)\) is the marginal distribution of the data under model \(m\), aka the marginal likelihood for model \(m\):
\[ m_m(x) = \pi(x|m) = \int\pi_m(x,\theta_m)d\theta_m = \int L_m(x|\theta_m)\pi_m(\theta_m|m)d\theta_m\]
- To avoid computing the normalising constant \(\pi(x)\), models are often compared via the posterior odds:
  - Note the Bayes factor \(B_{ij}\) below
\[ \text{Posterior Odds} = \frac{\pi(m=i|x)}{\pi(m=j|x)} = \frac{m_i(x)\pi(m=i)}{m_j(x)\pi(m=j)} = \frac{\pi(m=i)}{\pi(m=j)}\cdot B_{ij} \]
- Bayes factors can then be used to compute the normalised posterior model probabilities:
\[ \pi(m=i|x) = \left(1+\sum_{j\neq i}\frac{\pi(m=j)}{\pi(m=i)}B_{ji}\right)^{-1}\]
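A small sketch of this formula (the log marginal likelihoods and the uniform model prior are assumed values for illustration):

```python
import numpy as np

# Hypothetical log marginal likelihoods log m_m(x) for M = 3 models,
# and a uniform model prior pi(m).
log_m = np.array([-104.2, -101.7, -103.1])
prior = np.array([1 / 3, 1 / 3, 1 / 3])

def posterior_model_prob(i):
    """pi(m=i|x) = (1 + sum_{j != i} (pi_j/pi_i) * B_ji)^(-1)."""
    # Bayes factors B_ji = m_j(x)/m_i(x), computed stably in log space.
    B_ji = np.exp(log_m - log_m[i])
    mask = np.arange(len(log_m)) != i
    return 1.0 / (1.0 + np.sum(prior[mask] / prior[i] * B_ji[mask]))

probs = np.array([posterior_model_prob(i) for i in range(len(log_m))])
print(np.round(probs, 4), "sum:", probs.sum())  # probabilities sum to 1
```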
Nested Model Bayes Factors
- We can use Bayes factors when one model is a strict subset of another
- Model 1 has likelihood \(L_1(x|\theta,\phi)\); model 0 fixes \(\phi = \phi_0\), so \(L_0(x|\theta)=L_1(x|\theta,\phi=\phi_0)\)
- Simplify by defining:
\[ \pi_0(\theta) = \pi_1(\theta|\phi=\phi_0)\]
- Therefore the Bayes factor \(B_{01}\) reduces to the ratio of the posterior to the prior density of \(\phi\) at \(\phi_0\) under model 1 (the Savage–Dickey density ratio):
\[ B_{01} = \frac{m_0(x)}{m_1(x)} = \frac{\pi_1(\phi_0|x)}{\pi_1(\phi_0)}\]
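A sketch checking this identity on a conjugate example (assumptions: \(x_i\sim N(\phi,1)\), \(M_1\) places \(\phi\sim N(0,\tau^2)\), \(M_0\) fixes \(\phi_0=0\); there is no shared \(\theta\), so the density ratio can be evaluated directly):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, tau2 = 30, 2.0
x = rng.normal(0.3, 1.0, size=n)   # data generated with phi = 0.3 (assumed)
xbar = x.mean()

# Direct route: xbar is sufficient, xbar|M0 ~ N(0, 1/n) and
# marginally under M1 (phi ~ N(0, tau2)): xbar ~ N(0, tau2 + 1/n).
B01_direct = (norm.pdf(xbar, 0.0, np.sqrt(1 / n))
              / norm.pdf(xbar, 0.0, np.sqrt(tau2 + 1 / n)))

# Density-ratio route: conjugate posterior phi|x ~ N(post_mean, post_var),
# and B01 = pi_1(phi_0|x) / pi_1(phi_0) with phi_0 = 0.
post_var = tau2 / (n * tau2 + 1)
post_mean = n * tau2 * xbar / (n * tau2 + 1)
B01_ratio = (norm.pdf(0.0, post_mean, np.sqrt(post_var))
             / norm.pdf(0.0, 0.0, np.sqrt(tau2)))

print(B01_direct, B01_ratio)   # the two routes agree
```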
Bayes Factors and Improper Priors
- Consider the improper uniform prior \(\pi(\phi)\propto 1\) as the limit, as \(c\rightarrow\infty\), of the proper uniform distribution:
\[ \pi(\phi)=\frac{1}{2c},\quad -c\leq\phi\leq c\]
- Then the marginal likelihood for model \(i\) is:
  - Noting that for most problems \(L_i(x|\phi)\) is finite and approaches \(0\) as \(\phi\rightarrow\pm\infty\)
\[ m_i(x) = \int L_i(x|\phi)\pi_i(\phi)d\phi = \frac{1}{2c}\int^c_{-c}L_i(x|\phi)d\phi\]
- Therefore as \(c\rightarrow\infty\), \(m_i(x)\rightarrow 0\)
- For a nested comparison where only the larger model carries the improper prior, this means \(B_{ij}\rightarrow\infty\) in favour of the smaller model, regardless of the data
- Bayes factors therefore struggle with weak (high-variance) priors: if both models have improper priors, the factor depends on the arbitrary ratio \(c_1/c_2\) and is undefined
- Note that there is an exception for a parameter that is present in all models and has the same prior under each model: its arbitrary constant then cancels perfectly in the Bayes factor ratio
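A toy numeric illustration of the \(c\rightarrow\infty\) pathology (assumptions: \(x_i\sim N(\phi,1)\), \(M_0\) fixes \(\phi=0\), \(M_1\) uses a \(\text{Uniform}(-c,c)\) prior): \(B_{01}\) grows roughly linearly in \(c\), favouring the null ever more strongly even though the data were generated with \(\phi=0.5\):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 20
x = rng.normal(0.5, 1.0, size=n)        # data actually favour phi != 0
xbar, s = x.mean(), np.sqrt(1 / n)

m0 = norm.pdf(xbar, 0.0, s)             # density of xbar under M0: phi = 0

for c in [1, 10, 100, 1000]:
    # m_1(x) = (1/2c) * integral over [-c, c] of N(xbar; phi, 1/n) dphi,
    # which reduces to a difference of normal CDFs in phi.
    m1 = (norm.cdf(c, xbar, s) - norm.cdf(-c, xbar, s)) / (2 * c)
    print(f"c = {c:5d}   B01 = {m0 / m1:10.2f}")
```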
- Also note that Bayes factors can struggle with particularly strong data too
- Likelihood is negligible outside of a small interval around its maximum at \(\phi = \hat{\phi}\)
- The marginal likelihood then satisfies the approximation below, so the Bayes factor remains proportional to the prior density at \(\hat{\phi}\): any perturbation of \(\pi(\phi)\) that alters its value at \(\phi = \hat{\phi}\) changes the factor accordingly
\[ m_i(x) \approx \pi_i(\hat{\phi})\int L_i(x|\phi)d\phi\]
Bayes Factor Alternatives for Improper Priors
Partial Bayes Factors
- Split the data into \((x_T,x_R)\), with the training data \(x_T\) providing improved (proper) prior information for analysing the remainder \(x_R\)
- Diminishes prior sensitivity
- The partial Bayes factor is therefore:
  - Both factors in the fraction contain the same problematic normalising constant, which therefore cancels in the partial factor
\[ B^{R|T}_{12} = \frac{B_{12}}{B_{12}^T}\]
- Where:
\[ B^{R|T}_{12} = \frac{m_1(x_R|x_T)}{m_2(x_R|x_T)},\quad B^T_{12} = \frac{m_1(x_T)}{m_2(x_T)}\]
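A sketch of the cancellation (assumed running example: \(x_i\sim N(\phi,1)\), \(M_0:\phi=0\) vs \(M_1\) with the improper flat prior on \(\phi\)). The partial Bayes factor is computable even though \(B_{01}\) alone is not: conditioning on \(x_T\) turns the flat prior into the proper posterior \(\phi|x_T\sim N(\bar{x}_T,1/n_T)\), and both conditional marginals reduce to normal densities of \(\bar{x}_R\):

```python
import numpy as np
from scipy.stats import norm

def partial_bf01(x_T, x_R):
    """B_01^{R|T} for M0: phi = 0 vs M1: flat improper prior on phi,
    with x_i ~ N(phi, 1). Training data give phi|x_T ~ N(mean(x_T), 1/t),
    so the arbitrary constant in the flat prior cancels."""
    t, r = len(x_T), len(x_R)
    # Common likelihood factors of x_R cancel in the ratio, leaving:
    m0 = norm.pdf(x_R.mean(), 0.0, np.sqrt(1 / r))
    m1 = norm.pdf(x_R.mean(), x_T.mean(), np.sqrt(1 / r + 1 / t))
    return m0 / m1

rng = np.random.default_rng(3)
x = rng.normal(0.4, 1.0, size=25)
print(partial_bf01(x[:1], x[1:]))       # a single training observation
```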
- However, choosing how to split the data becomes an issue. Two solutions have been proposed:
Intrinsic Bayes Factors
- Average the partial Bayes factor over all combinations of minimal training samples
- \(n_T\) is the size of a minimal training set (the smallest \(x_T\) for which the marginal likelihoods are finite)
- Arithmetic:
\[ B_{12}^{AI} = \binom{n}{n_T}^{-1}\sum_{x_T}B^{R|T}_{12}(x_T)\]
- Geometric:
\[ B_{12}^{GI} = \left(\prod_{x_T}B^{R|T}_{12}(x_T)\right)^{\binom{n}{n_T}^{-1}}\]
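A sketch continuing the partial Bayes factor example above (it reuses `partial_bf01` and the simulated `x` from that block); for the assumed location model the minimal training set is a single observation, so \(n_T = 1\) and there are \(\binom{n}{1}=n\) training sets to average over:

```python
from itertools import combinations
import numpy as np

# Partial Bayes factor for every minimal training set (n_T = 1),
# reusing partial_bf01 and x from the previous sketch.
bfs = np.array([partial_bf01(x[list(T)], np.delete(x, list(T)))
                for T in combinations(range(len(x)), 1)])

B_AI = bfs.mean()                        # arithmetic intrinsic Bayes factor
B_GI = np.exp(np.log(bfs).mean())        # geometric intrinsic Bayes factor
print(f"B01_AI = {B_AI:.3f}, B01_GI = {B_GI:.3f}")
```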
Fractional Bayes Factors
- No explicit choice of \(x_T\); instead \(b\) corresponds to an 'idealised' training sample:
\[ B^b_{12}=\frac{m^*_1(x)}{m^*_2(x)}\]
- Where:
- The numerator is the usual marginal likelihood \(m_m(x)\)
\[ m_m^*(x) = \frac{\int L_m(x|\theta_m)\pi_m(\theta_m)d\theta_m}{\int [L_m(x|\theta_m)]^b\pi_m(\theta_m)d\theta_m},\quad 0<b<1\]
- This construction cancels the unknown normalising constants that were the problem: writing \(\pi_m(\theta_m)=c_mg_m(\theta_m)\), the \(c_m\) cancel, resulting in:
\[ m_m^*(x) = \frac{\int L_m(x|\theta_m)g_m(\theta_m)d\theta_m}{\int [L_m(x|\theta_m)]^b g_m(\theta_m)d\theta_m}\]
- However, \(b\) still has to be chosen
- A rule of thumb is \(b > \frac{n_{\min}}{n}\), where \(n_{\min}\) is the minimal training-sample size
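A sketch of the fractional Bayes factor on the same assumed running example (\(x_i\sim N(\phi,1)\); \(M_0:\phi=0\) vs \(M_1\) with a flat prior on \(\phi\)). Under these assumptions both \(m^*_m(x)\) have closed forms from Gaussian integrals, the flat prior's constant cancels within each ratio, and no enumeration of training sets is needed, unlike the intrinsic version:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.4, 1.0, size=25)
n, xbar = len(x), x.mean()

# Fraction b at (roughly) the rule-of-thumb scale n_min/n, with a
# minimal training sample of one observation for this location model.
b = 1 / n

# Closed form derived for this model:
#   B01^b = exp(-(1 - b) * n * xbar^2 / 2) / sqrt(b)
B01_frac = np.exp(-(1 - b) * n * xbar**2 / 2) / np.sqrt(b)
print(f"b = {b:.3f}, B01^b = {B01_frac:.3f}")
```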