2024-09-23

Introduction

Statistical inference is the process of inferring information about a population based on samples taken from that population. Bayesian inference is a type of statistical inference that relies on the application of Bayes’ theorem. At the core of Bayesian inference is the notion that, when making an inference, we start out with certain initial assumptions or prior knowledge that we can update and improve when presented with new observations.

Bayes’ Theorem

The mathematical definition of conditional probability is as follows:

\[P(A|B)=\frac{P(A\cap B)}{P(B)}\]

This formula states that the probability of A given B is equal to the probability of A and B, divided by the probability of B (where A and B are both events). Relying on the commutativity of the intersection operator \(\scriptsize(P(A\cap B)=P(B\cap A))\), we can write the numerator as \(\scriptsize P(A\cap B)=P(B|A)P(A)\); substituting this into the conditional probability formula yields Bayes’ theorem:

\[P(A|B)=\frac{P(B|A)P(A)}{P(B)}\]
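
As a quick numeric check of the theorem, here is a short R sketch using hypothetical values (the probabilities below are made up purely for illustration):

p_A <- 0.3          # P(A): hypothetical marginal probability of A
p_B_given_A <- 0.8  # P(B|A): hypothetical conditional probability
p_B <- 0.5          # P(B): hypothetical marginal probability of B

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
(p_B_given_A * p_A) / p_B
## [1] 0.48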

Bayes’ Theorem

In the context of inference, the probabilities in Bayes’ theorem are probability functions (PDFs or PMFs) rather than single values. They represent the four essential components of the Bayesian inference framework: the prior, the posterior, the likelihood, and the evidence. In this context, Bayes’ theorem is also often written with a slightly different convention:

\[\color{purple}{p(\theta|x)} = \frac{\color{teal}{p(x|\theta)}\color{olive}{p(\theta)}}{\color{orange}{p(x)}}\]

\(\scriptsize\color{olive}{p(\theta)}\) is the prior: the probability of our prior belief or knowledge, represented by the parameter \(\scriptsize\theta\)

\(\scriptsize\color{purple}{p(\theta|x)}\) is the posterior: the probability of our updated knowledge, i.e., the probability of our prior belief given the observed data \(\scriptsize x\)

\(\scriptsize\color{teal}{p(x|\theta)}\) is the likelihood: the probability of the observed data given a value for the parameter

\(\scriptsize\color{orange}{p(x)}\) is the evidence: the probability of the observed data independent of the parameter

Computing Bayes’ Theorem

The evidence, \(\scriptsize p(x)\), is also known as the marginal likelihood. When the parameter \(\scriptsize\theta\) is continuous, it can be computed by integrating the likelihood function times the probability of the parameter over all possible values of the parameter:

\[p(x)=\int_{\theta}p(x|\theta)p(\theta)d\theta\]
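
For a continuous parameter, this integral can often be approximated numerically. As a minimal sketch (the Normal prior and Normal likelihood below are hypothetical, chosen only to illustrate the computation), R’s integrate function can evaluate the evidence at a given observation:

# Hypothetical model: theta ~ Normal(0, 1), x | theta ~ Normal(theta, 1)
# Evidence p(x) at the observation x = 1
integrand <- function(theta)
{
  dnorm(x = 1, mean = theta, sd = 1) * dnorm(x = theta, mean = 0, sd = 1)
}
integrate(integrand, lower = -Inf, upper = Inf)$value  # ~0.2197 = dnorm(1, 0, sqrt(2))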

When the parameter \(\scriptsize\theta\) is discrete, it can be computed by summing the likelihood function times the probability of the parameter over all possible values of the parameter:

\[p(x)=\sum_{\theta}p(x|\theta)p(\theta)\]

Bayesian Inference Example: Introduction

We will now go through a simple example of Bayesian inference in which we investigate the relationship between a person’s sex and their height. In particular, we want to obtain the probability that a person is male or female given their height (this is the posterior). Our parameter \(\scriptsize\theta\) represents a person’s sex (whether they are male or female), and our data \(\scriptsize x\) represents a person’s observed height.

Bayesian Inference Example: The Prior

We need to generate a function that represents our prior, \(\scriptsize p(\theta)\). Because \(\scriptsize \theta\) is a discrete variable, this will be a probability mass function. Our initial belief is that there is an equal probability of any random person chosen from the population being either male or female. Our prior is therefore given by a discrete uniform distribution function:

\[p(\theta)=\frac{1}{n}=\frac{1}{2}\]
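
As a minimal R sketch (the function name prior_pmf and the string encoding of \(\scriptsize\theta\) are our own), this prior can be written as:

prior_pmf <- function(theta)
{
  # Discrete uniform prior over the two possible values of theta
  ifelse(theta %in% c("male", "female"), 0.5, 0)
}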

Bayesian Inference Example: The Likelihood

Now we will generate a function that represents the likelihood, \(\scriptsize p(x|\theta)\). Because height is a continuous variable, this will be a probability density function. We will assume, for the purposes of this demonstration, that our observed height data (in inches) can be represented by a normal curve, with a mean of 70 inches for males, a mean of 64 inches for females, and a standard deviation of 3 inches for both:

\[p(x|\theta=male)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^{2}}=\frac{1}{3\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-70}{3})^{2}}\]

\[p(x|\theta=female)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^{2}}=\frac{1}{3\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-64}{3})^{2}}\]
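
In R, these likelihoods correspond directly to dnorm. A minimal sketch (the function names are our own; the code later in this example calls dnorm inline instead):

likelihood_pdf_male <- function(x)
{
  # Normal density for male heights: mean 70 inches, sd 3 inches
  dnorm(x = x, mean = 70, sd = 3)
}

likelihood_pdf_female <- function(x)
{
  # Normal density for female heights: mean 64 inches, sd 3 inches
  dnorm(x = x, mean = 64, sd = 3)
}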

Bayesian Inference Example: The Likelihood

Here, we see the probability distribution of height for both \(\scriptsize\theta=male\) and \(\scriptsize\theta=female\):
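
The original figure is not reproduced here; a base R sketch that generates an equivalent plot (axis range and colors are our own choices):

curve(dnorm(x, mean = 70, sd = 3), from = 52, to = 82, col = "blue",
      xlab = "height (inches)", ylab = "density")
curve(dnorm(x, mean = 64, sd = 3), add = TRUE, col = "red")
legend("topright", legend = c("male", "female"),
       col = c("blue", "red"), lty = 1)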

Bayesian Inference Example: The Evidence

Since \(\scriptsize\theta\), representing a person’s sex, is a discrete variable, we can compute the evidence (marginal likelihood) function using the summation formula:

\[\small p(x)=\sum_{\theta}p(x|\theta)p(\theta) = \left(\frac{1}{3\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-70}{3})^{2}}\right)(0.5) + \left(\frac{1}{3\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-64}{3})^{2}}\right)(0.5)\]

In R, we create the function evidence_pdf to accomplish this:

evidence_pdf <- function(x)
{
  prior <- 0.5  # p(theta) = 0.5 for each sex
  likelihood_male <- dnorm(x = x, mean = 70, sd = 3)    # p(x | theta = male)
  likelihood_female <- dnorm(x = x, mean = 64, sd = 3)  # p(x | theta = female)
  # Sum likelihood * prior over both values of theta
  return((likelihood_male * prior) + (likelihood_female * prior))
}

Bayesian Inference Example: The Evidence

Here, we see the probability distribution of height independent of the value of the parameter \(\scriptsize\theta\):
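
The original figure is not reproduced here; an equivalent plot can be generated with a short base R sketch (axis range is our own choice):

curve(evidence_pdf(x), from = 52, to = 82,
      xlab = "height (inches)", ylab = "density")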

Bayesian Inference Example: The Posterior

In order to find the posterior, and thereby achieve our initial objective of determining the probability that a person is male or female given their height, we create two functions in R: posterior_pdf_male computes \(\scriptsize p(\theta=male|x)\), and posterior_pdf_female computes \(\scriptsize p(\theta=female|x)\):

posterior_pdf_male <- function(x)
{
  prior <- 0.5
  likelihood_male <- dnorm(x = x, mean = 70, sd = 3)
  # Bayes' theorem: p(theta = male | x) = p(x | male) * p(male) / p(x)
  return((likelihood_male * prior) / evidence_pdf(x))
}

posterior_pdf_female <- function(x)
{
  prior <- 0.5
  likelihood_female <- dnorm(x = x, mean = 64, sd = 3)
  # Bayes' theorem: p(theta = female | x) = p(x | female) * p(female) / p(x)
  return((likelihood_female * prior) / evidence_pdf(x))
}

Bayesian Inference Example: The Posterior

Here, we see the probability distribution for the posterior. Note that we have chosen to compute the probability for both \(\scriptsize\theta=male\) and \(\scriptsize\theta=female\) given height \(\scriptsize x\). This was easy to do here, but in a more complex problem with a much larger parameter space, it could prove far more difficult.
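
The original posterior plot is not reproduced here; a base R sketch that generates an equivalent figure (axis range and colors are our own choices):

curve(posterior_pdf_male(x), from = 52, to = 82, col = "blue",
      xlab = "height (inches)", ylab = "posterior probability")
curve(posterior_pdf_female(x), add = TRUE, col = "red")
legend("right", legend = c("male", "female"),
       col = c("blue", "red"), lty = 1)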

Bayesian Inference Example: The Posterior

Let’s say, for example, that we want to know the likely sex of a person whose height is 71.1 inches. We can refer to the graph shown in the last slide, or we can compute the probabilities using our posterior probability functions.

posterior_pdf_male(71.1)
## [1] 0.9389651
posterior_pdf_female(71.1)
## [1] 0.06103485

There is about a 94% chance that a person who is 71.1 inches tall is male, and about a 6% chance that they are female. Naturally, these two probabilities add up to 1, as they cover every value in the parameter space of our problem.
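
We can verify this directly:

posterior_pdf_male(71.1) + posterior_pdf_female(71.1)
## [1] 1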
