Intro_Bayesian_Inference

Jake

01/10/2022

Bayes' Theorem

\[ P(A|B) = \frac{P(A,B)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}\]

Bayes' Theorem and Distributions

\[ P(\theta|x) = \frac{P(x|\theta)P(\theta)}{P(x)}\]

\[ \pi(\theta|x) = \frac{L(x|\theta)\pi(\theta)}{\pi(x)}\propto L(x|\theta)\pi(\theta)\]

Definitions

  • Note: the key idea is to treat parameters as random variables with distributions, rather than as the fixed values used in frequentist statistics.

  • Prior Distribution - \(\pi(\theta)\)

    • Describes our knowledge about the model parameters before seeing the data
    • Can be uninformative, or can be based on expert opinion
  • Posterior Distribution - \(\pi(\theta|x)\)

    • Describes our knowledge about the model parameters after seeing the data
    • \(\pi(\theta)\rightarrow\pi(\theta|x)\) describes how our beliefs about the model parameters have changed after observing \(x\)
  • Likelihood Function - \(L(x|\theta)\)

    • Likelihood of observed data, \(x\sim f(X)\)
  • Normalisation Constant - \(\pi(x)\)

    • Can often be ignored, since it does not depend on \(\theta\)

Bayesian Updating Steps

  • 1 - Specify a likelihood model \(L(x|\theta)\)
  • 2 - Determine/elicit a suitable prior distribution \(\pi(\theta)\)
  • 3 - Calculate the posterior distribution through Bayes' Theorem

\[ \pi(\theta|x) = \frac{L(x|\theta)\pi(\theta)}{\pi(x)}\propto L(x|\theta)\pi(\theta)\]

  • 4 - Draw inferences from the posterior
    • Point estimates, credible intervals, hypothesis tests, etc.

Calculating Posterior Example

  • Likelihood

\[ x_i\sim Poi(\theta),\quad\therefore\quad \pi(x_i|\theta) = \frac{\theta^{x_i}}{x_i!}\exp(-\theta),\quad\therefore\quad L(x|\theta) = \frac{\theta^{\sum_i x_i}\exp(-n\theta)}{\prod_i x_i!}\]

  • Prior

\[ \theta\sim Gamma(\alpha,\beta)\quad\therefore\quad\pi(\theta)=\frac{\beta^\alpha}{\Gamma(\alpha)}\theta^{\alpha-1}\exp(-\beta\theta)\]

  • Posterior Calculation:

\[ \pi(\theta|x)\propto \frac{\theta^{\sum_i x_i}\exp(-n\theta)}{\color{red}{\prod_ix_i!}}\times\frac{\color{red}{\beta^\alpha}}{\color{red}{\Gamma(\alpha)}}\theta^{\alpha -1}\exp(-\beta\theta)\] \[ \propto \theta^{\sum_i x_i}\exp(-n\theta)\times\theta^{\alpha -1}\exp(-\beta\theta)\]

\[ \propto \theta^{\sum_i x_i+\alpha-1}\exp(-(n+\beta)\theta)\]

\[ \Rightarrow\quad\theta|x\sim Gamma\left(\alpha+\sum^n_{i=1}x_i,\,\beta+n\right)\]
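
A minimal sketch of this conjugate update in Python, assuming illustrative data and prior hyperparameters \(\alpha = 2\), \(\beta = 1\) (all values are hypothetical):

```python
import numpy as np
from scipy import stats

# Illustrative data and prior hyperparameters (assumed for this sketch)
x = np.array([3, 1, 4, 2, 5])   # observed Poisson counts
alpha, beta = 2.0, 1.0          # Gamma(alpha, beta) prior

# Conjugate update: posterior is Gamma(alpha + sum(x), beta + n)
alpha_post = alpha + x.sum()
beta_post = beta + len(x)

# scipy parameterises the Gamma by shape and scale = 1/rate
posterior = stats.gamma(a=alpha_post, scale=1 / beta_post)
print(f"Posterior: Gamma({alpha_post:.0f}, {beta_post:.0f})")
print(f"Posterior mean: {posterior.mean():.3f}")  # (alpha + sum x)/(beta + n)
```

Because the Gamma prior is conjugate to the Poisson likelihood, the posterior is available in closed form and no numerical integration is needed.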

Posterior Inference

  • Point Estimate (Mean) of \(\theta\):

\[ \bar{\theta}=\mathbb{E}_\pi[\theta]=\int_\theta\theta\pi(\theta|x)d\theta \]

  • Variance of \(\theta\):

\[ \mathrm{Var}(\theta) = \mathbb{E}_\pi[\theta^2] - (\mathbb{E}_\pi[\theta])^2\]

  • \(95\%\) Credible Interval - Solve Numerically:
    • Assuming \(0\) is lower bound of distribution

\[ \int^a_0 \pi(\theta|x)d\theta = 0.025,\quad \int^b_0 \pi(\theta|x)d\theta = 0.975\]
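
Here \(a\) and \(b\) are the \(2.5\%\) and \(97.5\%\) posterior quantiles, so they can be read off the inverse CDF. A sketch continuing the Poisson-Gamma example, with the hypothetical \(Gamma(17, 6)\) posterior from above:

```python
from scipy import stats

# Posterior from the Poisson-Gamma sketch above (assumed values)
posterior = stats.gamma(a=17.0, scale=1 / 6.0)  # Gamma(alpha + sum x, beta + n)

# Solve the two integral equations via the inverse CDF (quantile function)
a, b = posterior.ppf([0.025, 0.975])
print(f"Posterior mean {posterior.mean():.3f}, variance {posterior.var():.3f}")
print(f"95% equal-tailed credible interval: ({a:.3f}, {b:.3f})")
```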

  • Predictive Distribution of Future Data:
    • Note \(y\) has the same distribution as \(x\), but with parameter \(\theta\)

\[ p(y|x) = \int_\theta\pi(y|\theta)\pi(\theta|x)d\theta\]

Posterior Inference - Multivariate

  • Consider a parameter vector \((\theta_1,\theta_2,\theta_3)\)

  • Marginal Distribution of \(\theta_1\):

\[ \pi(\theta_1|x) = \int_{\theta_2}\int_{\theta_3}\pi(\theta_1,\theta_2,\theta_3|x)d\theta_2d\theta_3\]

  • Point estimate (mean) of \(\theta_1\):

\[ \bar{\theta}_1 = \int_{\theta_1}\theta_1\left(\int_{\theta_2}\int_{\theta_3}\pi(\theta_1,\theta_2,\theta_3|x)d\theta_2d\theta_3\right)d\theta_1\]

  • \(95\%\) Credible Interval - Solve Numerically:
    • Assuming \(0\) is lower bound of distribution

\[ \int^a_0 \left(\int_{\theta_2}\int_{\theta_3}\pi(\theta_1,\theta_2,\theta_3|x)d\theta_2d\theta_3\right)d\theta_1 = 0.025 \]

\[ \int^b_0\left( \int_{\theta_2}\int_{\theta_3}\pi(\theta_1,\theta_2,\theta_3|x)d\theta_2d\theta_3\right)d\theta_1 = 0.975\]

  • Predictive Distribution of Future Data:

\[ p(y|x) = \int_{\theta_1}\int_{\theta_2}\int_{\theta_3}\pi(y|\theta_1,\theta_2,\theta_3)\pi(\theta_1,\theta_2,\theta_3|x)d\theta_1d\theta_2d\theta_3\]

Credible Intervals

  • Confidence Intervals:
    • In the long run, \(95\%\) of CIs will contain the parameter
      • Each CI either does or doesn’t contain the parameter
  • Credible Interval:
    • There is a \(95\%\) chance that the parameter lies in the credible interval
    • Highest-density intervals are the credible intervals with the shortest width for a given \(\alpha\) level (sketched below)
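
A minimal sketch of finding a highest-density interval numerically, by grid-searching over all intervals that contain \(95\%\) of the posterior mass and keeping the shortest; the `hdi` helper and the Gamma posterior are assumptions for illustration:

```python
import numpy as np
from scipy import stats

def hdi(dist, mass=0.95, grid=10_000):
    """Shortest interval containing `mass` probability, found by scanning
    the lower-tail probability of every candidate equal-mass interval."""
    lower_tails = np.linspace(0, 1 - mass, grid)
    lo = dist.ppf(lower_tails)          # interval lower endpoints
    hi = dist.ppf(lower_tails + mass)   # matching upper endpoints
    i = np.argmin(hi - lo)              # index of the shortest interval
    return lo[i], hi[i]

posterior = stats.gamma(a=17.0, scale=1 / 6.0)  # assumed example posterior
print(hdi(posterior))  # slightly narrower than the equal-tailed interval
```

For a right-skewed posterior like this Gamma, the HDI typically sits a little to the left of the equal-tailed interval.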

Monte Carlo Integration Basics

  • The basis of Monte Carlo integration is approximating expectations by averaging over samples drawn from the target distribution.

  • Approximation of mean:

\[ \mathbb{E}_\pi[\theta] = \int_\theta\theta\pi(\theta|x)d\theta\approx\frac{1}{N}\sum^N_{i=1}\theta^{(i)},\quad\theta^{(i)}\sim\pi(\theta|x)\]

  • Can approximate other quantities similarly:

\[ P(\theta <-1) = \int_\theta\mathbf{1}(\theta <-1)\pi(\theta|x)d\theta = \mathbb{E}_\pi[\mathbf{1}(\theta <-1)]\]

\[ \approx \frac{1}{N}\sum^N_{i=1}\mathbf{1}(\theta^{(i)}<-1),\quad\theta^{(i)}\sim\pi(\theta|x)\]
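
A sketch of both approximations, using a standard normal as a stand-in for \(\pi(\theta|x)\) so the Monte Carlo answers can be checked against known values (the stand-in distribution is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Stand-in for pi(theta|x): a standard normal, so the truth is known
theta = rng.standard_normal(N)

# E[theta] approximated by the sample mean
print(f"E[theta]    ~ {theta.mean():.4f}   (exact 0)")

# P(theta < -1) approximated by the mean of the indicator function
print(f"P(theta<-1) ~ {(theta < -1).mean():.4f}   (exact ~0.1587)")
```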

Monte Carlo Posterior Predictive Distribution

  • Generate samples from \(p(y|x)\)

\[ p(y|x) = \int_\theta\pi(y|\theta)\pi(\theta|x)d\theta\]

  • Method:

  • Sample \(\theta^{(i)}\sim\pi(\theta|x)\)

  • For each \(\theta^{(i)}\), generate \(y^{(i)}\sim\pi(y|\theta^{(i)})\)

    • This gives us joint samples \((\theta^{(i)},y^{(i)})\sim\pi(y|\theta)\pi(\theta|x)\)
  • To get samples from \(p(y|x)\), discard the \(\theta^{(i)}\) values to leave \(y^{(i)}\sim p(y|x)\)

    • This is analogous to marginalising/integrating out \(\theta\); see the sketch below
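
A sketch of this two-step sampler for the Poisson-Gamma example, using the hypothetical \(Gamma(17, 6)\) posterior from earlier:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000

# Step 1: sample theta^(i) from the posterior (assumed Gamma(17, 6))
theta = rng.gamma(shape=17.0, scale=1 / 6.0, size=N)

# Step 2: for each theta^(i), generate y^(i) ~ Poisson(theta^(i))
y = rng.poisson(theta)

# Discarding theta leaves y^(i) ~ p(y|x); summarise the predictive draws
print(f"Predictive mean {y.mean():.3f}, predictive variance {y.var():.3f}")
```

The predictive variance exceeds the predictive mean because parameter uncertainty is layered on top of the Poisson variability.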

Multivariate Monte Carlo

  • Computing marginal distributions from joint distribution:

\[ p(\theta_1|x)=\int_{\theta_2}\pi(\theta_1,\theta_2|x)d\theta_2\]

  • Method:

  • Generate samples \((\theta^{(i)}_1,\theta^{(i)}_2)\sim\pi(\theta_1,\theta_2|x)\)

  • ‘Integrate’ over \(\theta^{(i)}_2\) by discarding \(\theta^{(i)}_2\) values

  • Construct a histogram of \(\theta_1^{(1)},...,\theta^{(n)}_1\) (see the sketch below)
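
A sketch with an assumed bivariate normal stand-in for the joint posterior; keeping only the first coordinate of each joint sample yields draws from the marginal of \(\theta_1\):

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed joint posterior: a correlated bivariate normal stand-in
mean = [0.0, 1.0]
cov = [[1.0, 0.5],
       [0.5, 2.0]]
samples = rng.multivariate_normal(mean, cov, size=50_000)

theta1 = samples[:, 0]  # keep theta_1, discard theta_2
hist, edges = np.histogram(theta1, bins=50, density=True)  # ~ p(theta_1|x)
print(f"Marginal mean of theta_1 ~ {theta1.mean():.3f} (exact 0)")
```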

Distributions of Functions of Parameters

  • Method:

  • Generate samples \((\theta^{(i)}_1,\theta^{(i)}_2)\sim\pi(\theta_1,\theta_2|x)\)

  • Compute \((\theta^{(i)}_1,\theta^{(i)}_2)\rightarrow g(\theta^{(i)}_1,\theta^{(i)}_2)\)

  • Construct a histogram of \(g(\theta^{(1)}_1,\theta^{(1)}_2), ..., g(\theta^{(n)}_1,\theta^{(n)}_2)\) (sketched below)
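
The same joint samples give the posterior distribution of any function of the parameters. A sketch for the hypothetical choice \(g(\theta_1,\theta_2)=\theta_1/\theta_2\), with assumed independent Gamma marginals standing in for joint posterior samples:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50_000

# Assumed stand-in for joint posterior samples (independent Gammas)
theta1 = rng.gamma(shape=3.0, scale=1.0, size=N)
theta2 = rng.gamma(shape=5.0, scale=1.0, size=N)

# Push each joint sample through g; a histogram of g approximates its posterior
g = theta1 / theta2
print(f"E[g] ~ {g.mean():.3f}, 95% interval "
      f"({np.quantile(g, 0.025):.3f}, {np.quantile(g, 0.975):.3f})")
```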

Monte Carlo Error

  • Estimating quantities from random samples gives random estimates, so there will be variability in the estimates
    • Known as Monte Carlo Error
    • The variance of an estimate decreases \(\propto \frac{1}{N}\), so the standard error decreases \(\propto\frac{1}{\sqrt{N}}\)
    • Normally have a target precision, with sampling done until this is reached (see the sketch below)
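
A sketch of this decay: repeating a Monte Carlo mean estimate many times and measuring its spread shows the standard error roughly halving each time \(N\) quadruples (standard normal stand-in, assumed):

```python
import numpy as np

rng = np.random.default_rng(5)

# Spread of the MC mean over repeated runs, for growing sample sizes N
for N in [100, 400, 1600, 6400]:
    estimates = [rng.standard_normal(N).mean() for _ in range(2000)]
    print(f"N={N:5d}  Monte Carlo SE ~ {np.std(estimates):.4f}")  # ~1/sqrt(N)
```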