Introduction to Bayesian Statistics

2025-04-13

Why Bayesian

Traditional (frequentist) statistics treats parameters as fixed but unknown quantities
Bayesian statistics treats parameters as uncertain and describes that uncertainty with probability distributions
Bayesian inference updates beliefs as data arrive and can incorporate prior knowledge

Bayes’ Theorem

\[ P(\theta \mid D) = \frac{P(D \mid \theta) \cdot P(\theta)}{P(D)} \]

\(P(\theta)\): prior
\(P(D \mid \theta)\): likelihood
\(P(\theta \mid D)\): posterior
\(P(D)\): marginal likelihood
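
To make the four terms concrete, here is a minimal sketch of Bayes' theorem with only two candidate values of \(\theta\). The candidate values, the equal prior weights, and the data (7 heads in 10 tosses) are illustrative assumptions, not values taken from the slides.

```r
# Two hypothetical candidate values for theta = P(heads); illustrative only
theta <- c(fair = 0.5, biased = 0.8)

# P(theta): equal prior weight on each candidate (an assumption)
prior <- c(fair = 0.5, biased = 0.5)

# P(D | theta): binomial likelihood of the assumed data, 7 heads in 10 tosses
likelihood <- dbinom(7, size = 10, prob = theta)

# P(D): marginal likelihood, the likelihood averaged over the prior
marginal <- sum(likelihood * prior)

# P(theta | D): posterior, Bayes' theorem applied term by term
posterior <- likelihood * prior / marginal

print(round(posterior, 3))
```

Dividing by the marginal likelihood is what makes the posterior weights sum to 1.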

Bayesian Components

Prior: what you believe before seeing data
Likelihood: model for the data
Posterior: updated belief after seeing data

\[ \text{Posterior} \propto \text{Likelihood} \times \text{Prior} \]

Simulating Bayes

This plot shows how the prior and likelihood combine to form the posterior distribution. The prior is a symmetric Beta(2,2) distribution, reflecting initial uncertainty about the parameter \(\theta\); the likelihood is skewed, with the shape of a Beta(5,1) density.
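
The combination described above can be sketched numerically on a grid. This is not necessarily the code behind the plot; it assumes the likelihood curve is exactly a Beta(5,1) density and uses a simple Riemann sum to normalize. Under that assumption the product of the two curves is proportional to \(\theta^5 (1 - \theta)\), which is the kernel of a Beta(6,2) density, so the sketch compares the grid result against that.

```r
# Grid of theta values strictly inside (0, 1)
theta <- seq(0.001, 0.999, length.out = 999)
step <- theta[2] - theta[1]

prior <- dbeta(theta, 2, 2)        # symmetric prior
likelihood <- dbeta(theta, 5, 1)   # skewed likelihood curve (assumed Beta(5,1) shape)

# Posterior is proportional to likelihood * prior; normalize so it integrates to 1
unnormalized <- prior * likelihood
posterior <- unnormalized / sum(unnormalized * step)

# Largest pointwise gap between the grid posterior and the Beta(6,2) density
max_diff <- max(abs(posterior - dbeta(theta, 6, 2)))
print(max_diff)
```

The vectors `prior`, `likelihood`, and `posterior` can be passed to any plotting function to draw the three curves on the same axes.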

Coin Toss Simulation

Taking Bayes’ theorem to the real world, let's estimate the probability of heads in a coin toss. The prior is Beta(1,1), a uniform distribution that treats every value of the probability as equally plausible. After observing 7 heads and 3 tails, the posterior becomes Beta(1 + 7, 1 + 3) = Beta(8,4).
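
The update above is the standard Beta-Binomial conjugate rule: add the heads to the first Beta parameter and the tails to the second. A minimal sketch follows; the variable names are illustrative, and the 95% credible interval is an added summary that the slide does not mention.

```r
# Prior Beta(1, 1): uniform over the probability of heads
a_prior <- 1
b_prior <- 1

# Observed data from the slide
heads <- 7
tails <- 3

# Conjugate update: the posterior is Beta(a_prior + heads, b_prior + tails)
a_post <- a_prior + heads
b_post <- b_prior + tails

# Posterior mean of a Beta(a, b) distribution is a / (a + b)
post_mean <- a_post / (a_post + b_post)

# Equal-tailed 95% credible interval from the Beta quantile function
cred_int <- qbeta(c(0.025, 0.975), a_post, b_post)

cat("Posterior: Beta(", a_post, ",", b_post, ")\n")
cat("Posterior mean:", round(post_mean, 3), "\n")
cat("95% credible interval:", round(cred_int, 3), "\n")
```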

3D Posterior Viz

This plot visualizes a joint posterior distribution over two parameters, each of which has its own prior. Each horizontal axis is a parameter, and the height of the surface is the joint posterior density at that pair of values. Many real-world problems involve several uncertain parameters at once, and a multi-parameter Bayesian model lets us visualize how plausible each combination of values is.

Code to Simulate Inference for Two Unknown Probabilities

library(tidyr)
library(ggplot2)

# Prior parameters
alpha1 <- 2; beta1 <- 2 # Coin A
alpha2 <- 3; beta2 <- 3 # Coin B

# Observed data
headsA <- 8; tailsA <- 2
headsB <- 4; tailsB <- 6

# Posterior parameters
post_alpha1 <- alpha1 + headsA
post_beta1 <- beta1 + tailsA
post_alpha2 <- alpha2 + headsB
post_beta2 <- beta2 + tailsB

# Build a 100 x 100 grid over both parameters
theta1 <- seq(0, 1, length.out = 100)
theta2 <- seq(0, 1, length.out = 100)
posterior_grid <- expand.grid(theta1 = theta1, theta2 = theta2)

# Joint posterior = product of two independent Beta densities
posterior_grid$z <-
  dbeta(posterior_grid$theta1, post_alpha1, post_beta1) *
  dbeta(posterior_grid$theta2, post_alpha2, post_beta2)
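
One thing a joint posterior grid like this makes possible is a direct comparison of the two coins. The sketch below is an illustrative extension, not part of the original code: it rebuilds the grid on cell midpoints (an assumption made so that each cell has equal area), estimates the posterior probability that coin A is more likely to land heads than coin B, and checks the result against a Monte Carlo estimate.

```r
# Posterior parameters, recomputed from the priors and data above
post_alpha1 <- 2 + 8; post_beta1 <- 2 + 2   # Coin A: Beta(10, 4)
post_alpha2 <- 3 + 4; post_beta2 <- 3 + 6   # Coin B: Beta(7, 9)

# Midpoints of 100 equal-width cells on [0, 1] (assumed grid layout)
theta <- seq(0.005, 0.995, length.out = 100)
grid <- expand.grid(theta1 = theta, theta2 = theta)

# Joint posterior density under independence
grid$z <- dbeta(grid$theta1, post_alpha1, post_beta1) *
  dbeta(grid$theta2, post_alpha2, post_beta2)

# Midpoint-rule check: the density should integrate to roughly 1
cell_area <- 0.01^2
total_mass <- sum(grid$z) * cell_area

# P(theta1 > theta2): cells on the diagonal count half
weight <- (grid$theta1 > grid$theta2) + 0.5 * (grid$theta1 == grid$theta2)
prob_grid <- sum(grid$z * weight) / sum(grid$z)

# Monte Carlo cross-check using independent posterior draws
set.seed(1)
draws_a <- rbeta(1e5, post_alpha1, post_beta1)
draws_b <- rbeta(1e5, post_alpha2, post_beta2)
prob_mc <- mean(draws_a > draws_b)

cat("Grid estimate of P(theta1 > theta2):", round(prob_grid, 3), "\n")
cat("Monte Carlo estimate:", round(prob_mc, 3), "\n")
```

The two estimates should agree closely; a large gap would suggest the grid is too coarse.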