Bayes Theorem
\[ P(A|B) = \frac{P(A,B)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}\]
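For example, with made-up numbers: if \(P(A)=0.01\), \(P(B|A)=0.9\) and \(P(B)=0.05\), then
\[ P(A|B) = \frac{0.9\times 0.01}{0.05} = 0.18\]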
Bayes Theorem and Distributions
\[ P(\theta|x) = \frac{P(x|\theta)P(\theta)}{P(x)}\]
\[ \pi(\theta|x) = \frac{L(x|\theta)\pi(\theta)}{\pi(x)}\propto L(x|\theta)\pi(\theta)\]
Definitions
Note: the main idea is to treat model parameters as random variables with distributions, rather than as the fixed values used in frequentist statistics.
Prior Distribution - \(\pi(\theta)\)
- Describes knowledge about model parameters we have before we have seen the data
- Can be uninformative, or can be based on expert opinion
Posterior Distribution - \(\pi(\theta|x)\)
- Describes knowledge about model parameters we have after we have seen the data
- \(\pi(\theta)\rightarrow\pi(\theta|x)\) describes how our belief about the model parameters has changed after observing \(x\)
Likelihood Function - \(L(x|\theta)\)
- Likelihood of the observed data \(x\) under the model \(f(x|\theta)\)
Normalisation Constant - \(\pi(x)\)
- Can often be ignored, since \(\pi(x)=\int_\theta L(x|\theta)\pi(\theta)d\theta\) does not depend on \(\theta\)
Bayesian Updating Steps
- 1 - Specify a likelihood model \(L(x|\theta)\)
- 2 - Determine/elicit a suitable prior dist \(\pi(\theta)\)
- 3 - Calculate the posterior distribution through Bayes Theorem
\[ \pi(\theta|x) = \frac{L(x|\theta)\pi(\theta)}{\pi(x)}\propto L(x|\theta)\pi(\theta)\]
- 4 - Draw inference from the posterior
- Point estimates, credible intervals, hypothesis tests, etc.
Calculating Posterior Example
- Likelihood
\[ x_i\sim Poi(\theta),\quad\therefore\quad f(x_i|\theta) = \frac{\theta^{x_i}}{x_i!}\exp(-\theta),\quad\therefore\quad L(x|\theta) = \frac{\theta^{\sum_i x_i}\exp(-n\theta)}{\prod_i x_i!}\]
- Prior
\[ \theta\sim Gamma(\alpha,\beta)\quad\therefore\quad\pi(\theta)=\frac{\beta^\alpha}{\Gamma(\alpha)}\theta^{\alpha-1}\exp(-\beta\theta)\]
- Posterior Calculation:
\[ \pi(\theta|x)\propto \frac{\theta^{\sum_i x_i}\exp(-n\theta)}{\color{red}{\prod_ix_i!}}\times\frac{\color{red}{\beta^\alpha}}{\color{red}{\Gamma(\alpha)}}\theta^{\alpha -1}\exp(-\beta\theta)\]
- Terms in red are constant with respect to \(\theta\), so they can be dropped:
\[ \propto \theta^{\sum_i x_i}\exp(-n\theta)\times\theta^{\alpha -1}\exp(-\beta\theta)\]
\[ \propto \theta^{\sum_i x_i+\alpha-1}\exp(-(n+\beta)\theta)\]
- This is the kernel of a Gamma density, so
\[ \theta|x\sim Gamma\left(\alpha+\sum^n_{i=1}x_i,\ \beta+n\right)\]
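A minimal Python sketch of this conjugate update, assuming made-up data and prior hyperparameters (\(\alpha=2\), \(\beta=1\)); it also computes the posterior mean, variance and equal-tailed \(95\%\) credible interval covered in the next section:

```python
import numpy as np
from scipy import stats

# Illustrative data and prior hyperparameters (made up for this sketch)
x = np.array([3, 1, 4, 2, 5, 2])       # x_i ~ Poisson(theta)
alpha, beta = 2.0, 1.0                  # theta ~ Gamma(alpha, beta)

# Conjugate update: theta | x ~ Gamma(alpha + sum(x), beta + n)
alpha_post = alpha + x.sum()
beta_post = beta + len(x)
posterior = stats.gamma(a=alpha_post, scale=1.0 / beta_post)

print("posterior mean:", posterior.mean())      # (alpha + sum x) / (beta + n)
print("posterior variance:", posterior.var())

# Equal-tailed 95% credible interval: solve the two quantile equations numerically
a, b = posterior.ppf([0.025, 0.975])
print("95% credible interval:", (a, b))
```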
Posterior Inference
- Point Estimate (Mean) of \(\theta\):
\[ \bar{\theta}=\mathbb{E}_\pi[\theta]=\int_\theta\theta\pi(\theta|x)d\theta \]
- Variance of \(\theta\):
\[ \mathrm{Var}(\theta) = \mathbb{E}_\pi[\theta^2] - (\mathbb{E}_\pi[\theta])^2\]
- \(95\%\) Credible Interval - Solve Numerically:
- Assuming \(0\) is lower bound of distribution
\[ \int^a_0 \pi(\theta|x)d\theta = 0.025,\quad \int^b_0 \pi(\theta|x)d\theta = 0.975\]
- Predictive Distribution of Future Data:
- Note \(y\) has the same distribution as \(x\), but with parameter \(\theta\)
\[ p(y|x) = \int_\theta\pi(y|\theta)\pi(\theta|x)d\theta\]
Posterior Inference - Multivariate
Consider a parameter vector \((\theta_1,\theta_2,\theta_3)\)
Marginal Distribution of \(\theta_1\):
\[ \pi(\theta_1|x) = \int_{\theta_2}\int_{\theta_3}\pi(\theta_1,\theta_2,\theta_3|x)d\theta_2d\theta_3\]
- Point estimate (mean) of \(\theta_1\):
\[ \bar{\theta}_1 = \int_{\theta_1}\theta_1\left(\int_{\theta_2}\int_{\theta_3}\pi(\theta_1,\theta_2,\theta_3|x)d\theta_2d\theta_3\right)d\theta_1\]
- \(95\%\) Credible Interval - Solve Numerically:
- Assuming \(0\) is lower bound of distribution
\[ \int^a_0 \left(\int_{\theta_2}\int_{\theta_3}\pi(\theta_1,\theta_2,\theta_3|x)d\theta_2d\theta_3\right)d\theta_1 = 0.025 \]
\[ \int^b_0\left( \int_{\theta_2}\int_{\theta_3}\pi(\theta_1,\theta_2,\theta_3|x)d\theta_2d\theta_3\right)d\theta_1 = 0.975\]
- Predictive Distribution of Future Data:
\[ p(y|x) = \int_{\theta_1}\int_{\theta_2}\int_{\theta_3}\pi(y|\theta_1,\theta_2,\theta_3)\pi(\theta_1,\theta_2,\theta_3|x)d\theta_1d\theta_2d\theta_3\]
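A minimal sketch of this kind of marginalisation by numerical integration, simplified to two parameters on a grid; the joint density used here is an arbitrary stand-in, not a real posterior:

```python
import numpy as np

# Grid over (theta1, theta2); the joint density is an arbitrary stand-in
t1 = np.linspace(0.01, 10, 500)
t2 = np.linspace(0.01, 10, 500)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
joint = np.exp(-0.5 * ((T1 - 3) ** 2 + (T2 - 5) ** 2))  # unnormalised pi(theta1, theta2 | x)

# Marginalise theta2 by integrating it out along axis 1
marginal_t1 = np.trapz(joint, t2, axis=1)
marginal_t1 /= np.trapz(marginal_t1, t1)                # normalise

# Point estimate (mean) of theta1
mean_t1 = np.trapz(t1 * marginal_t1, t1)

# Equal-tailed 95% credible interval from the cumulative distribution
cdf = np.cumsum(marginal_t1) * (t1[1] - t1[0])
a = t1[np.searchsorted(cdf, 0.025)]
b = t1[np.searchsorted(cdf, 0.975)]
print(mean_t1, (a, b))
```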
Credible Intervals
- Confidence Intervals:
- In the long run, \(95\%\) of CIs will contain the parameter
- Each CI either does or doesn’t contain the parameter
- Credible Interval:
- There is a \(95\%\) chance that the parameter lies in the credible interval
- Highest density region (HDR) intervals are the credible intervals with the shortest width for a given \(\alpha\) level
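A common sample-based sketch for finding a highest-density (shortest) interval: among all windows containing \(95\%\) of the sorted posterior draws, take the narrowest. The Gamma draws here are just illustrative:

```python
import numpy as np

def hdr_interval(samples, alpha=0.05):
    """Shortest interval containing (1 - alpha) of the samples."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.ceil((1 - alpha) * n))      # number of points inside the interval
    widths = s[k - 1:] - s[: n - k + 1]    # width of every candidate window
    i = np.argmin(widths)
    return s[i], s[i + k - 1]

rng = np.random.default_rng(0)
draws = rng.gamma(shape=19, scale=1 / 7, size=100_000)  # e.g. a Gamma posterior
print(hdr_interval(draws))
```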
Monte Carlo Integration Basics
The basis of Monte Carlo integration is approximating expectations by averaging over samples drawn from the target distribution.
Approximation of mean:
\[ \mathbb{E}_\pi[\theta] = \int_\theta\theta\pi(\theta|x)d\theta\approx\frac{1}{N}\sum^N_{i=1}\theta^{(i)},\quad\theta^{(i)}\sim\pi(\theta|x)\]
- Can approximate other quantities similarly:
\[ P(\theta <-1) = \int_\theta\mathbf{1}(\theta <-1)\pi(\theta|x)d\theta = \mathbb{E}_\pi[\mathbf{1}(\theta <-1)]\]
\[ \approx \frac{1}{N}\sum^N_{i=1}\mathbf{1}(\theta^{(i)}<-1),\quad\theta^{(i)}\sim\pi(\theta|x)\]
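A minimal sketch of both approximations, using a standard Normal as a stand-in posterior so that \(P(\theta<-1)\) is non-trivial:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in posterior: theta | x ~ N(0, 1)
theta = rng.normal(loc=0.0, scale=1.0, size=100_000)

# E[theta] approximated by the sample mean
print("mean estimate:", theta.mean())

# P(theta < -1) approximated by the fraction of samples below -1
print("P(theta < -1) estimate:", (theta < -1).mean())  # exact value ~0.1587
```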
Monte Carlo Posterior Predictive Distribution:
- Generate samples from \(p(y|x)\)
\[ p(y|x) = \int_\theta\pi(y|\theta)\pi(\theta|x)d\theta\]
Method:
Sample \(\theta^{(i)}\sim\pi(\theta|x)\)
For each \(\theta^{(i)}\), generate \(y^{(i)}\sim\pi(y|\theta^{(i)})\)
- This gives us joint samples \((\theta^{(i)},y^{(i)})\sim\pi(y|\theta)\pi(\theta|x)\)
To get samples from \(p(y|x)\), discard the \(\theta^{(i)}\) values to leave \(y^{(i)}\sim p(y|x)\)
- This is analogous to marginalising/integrating out \(\theta\)
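A minimal sketch of this method for the earlier Poisson-Gamma example; the posterior hyperparameters here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_post, beta_post = 19.0, 7.0   # Gamma posterior hyperparameters (made up)

# Step 1: sample theta^(i) from the posterior
theta = rng.gamma(shape=alpha_post, scale=1 / beta_post, size=100_000)

# Step 2: for each theta^(i), generate y^(i) ~ Poisson(theta^(i))
y = rng.poisson(theta)

# Discarding theta leaves y^(i) ~ p(y | x) -- the posterior predictive
print("predictive mean:", y.mean())
```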
Multivariate Monte Carlo
- Computing marginal distributions from joint distribution:
\[ p(\theta_1|x)=\int_{\theta_2}\pi(\theta_1,\theta_2|x)d\theta_2\]
Method:
Generate samples \((\theta^{(i)}_1,\theta^{(i)}_2)\sim\pi(\theta_1,\theta_2|x)\)
‘Integrate’ over \(\theta^{(i)}_2\) by discarding \(\theta^{(i)}_2\) values
Construct a histogram of \(\theta_1^{(1)},...,\theta_1^{(N)}\)
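A minimal sketch, using a correlated bivariate Normal as a stand-in for the joint posterior:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Stand-in joint posterior: correlated bivariate Normal
cov = [[1.0, 0.6], [0.6, 2.0]]
samples = rng.multivariate_normal(mean=[0.0, 3.0], cov=cov, size=100_000)

# 'Integrate out' theta2 by keeping only the theta1 column
theta1 = samples[:, 0]
plt.hist(theta1, bins=100, density=True)  # histogram approximates pi(theta1 | x)
plt.show()
```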
Distributions of Functions of Parameters
Method:
Generate samples \((\theta^{(i)}_1,\theta^{(i)}_2)\sim\pi(\theta_1,\theta_2|x)\)
Compute \((\theta^{(i)}_1,\theta^{(i)}_2)\rightarrow g(\theta^{(i)}_1,\theta^{(i)}_2)\)
Construct a histogram of \(g(\theta^{(1)}_1,\theta^{(1)}_2), ..., g(\theta^{(N)}_1,\theta^{(N)}_2)\)
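Continuing the same stand-in joint posterior, for e.g. the ratio \(g(\theta_1,\theta_2)=\theta_1/\theta_2\):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean=[2.0, 3.0],
                                  cov=[[1.0, 0.6], [0.6, 2.0]],
                                  size=100_000)

# Transform each joint draw; the histogram approximates the distribution of g
g = samples[:, 0] / samples[:, 1]   # g(theta1, theta2) = theta1 / theta2
plt.hist(g, bins=200, range=(-2, 6), density=True)
plt.show()
```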
Monte Carlo Error
- Estimating quantities from random samples gives random estimates, so there is variability in the estimates
- Known as Monte Carlo Error
- The standard error decreases \(\propto \frac{1}{\sqrt{N}}\)
- Normally a target precision is set, and sampling continues until it is reached.
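A quick sketch of the \(1/\sqrt{N}\) scaling: the spread of repeated estimates roughly halves each time \(N\) quadruples.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in [100, 400, 1600, 6400]:
    # 1000 independent estimates of E[theta] for a N(0, 1) 'posterior'
    estimates = rng.normal(size=(1000, n)).mean(axis=1)
    print(n, estimates.std())  # standard error ~ 1 / sqrt(n)
```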