We want to sample from the posterior distribution:
\[ \pi(\theta \mid y) \propto \pi(\theta) \times L(\theta) \]
But this posterior is not a standard distribution we know how to sample from directly (not Normal, Beta, Gamma, etc.).
The proposal distribution \(q(\theta' \mid \theta)\) is our tool for exploring the parameter space. Think of it as:
We chose \(q(\theta' \mid \theta) = \pi(\theta')\) (the prior) as our proposal distribution.
Why would we do this? Because:
| Step | What happens |
|---|---|
| Current state | We’re at \(\theta\) |
| Propose | Use \(q\) to suggest a new \(\theta'\) |
| Accept/Reject | Use MH formula to decide whether to move there |
| Repeat | Eventually, the samples approximate \(\pi(\theta \mid y)\) |
Imagine you’re trying to map a mountain (the posterior):
You could choose: - Random jumps anywhere (independence sampler - what we did) - Small steps from current position (random walk) - Guided by gradient (Hamiltonian Monte Carlo)
The MH algorithm corrects for your exploration strategy so that you end up sampling from the true posterior.
You can’t explore the space! You need some rule for generating candidate values to try. The proposal distribution is that rule.
The proposal distribution is our chosen exploration strategy. We can pick any convenient distribution, and MH tells us how to accept/reject proposals so we eventually sample from the correct posterior.
Observations: \(y_1, \ldots, y_n \stackrel{\text{i.i.d.}}{\sim} \text{Bernoulli}(\theta)\)
Prior: \(\log\left( \frac{\theta}{1-\theta} \right) \sim \mathcal{N}(0, 1)\)
Prior density: \[ \pi(\theta) = \frac{1}{\sqrt{2\pi} \, \theta(1-\theta)} \exp\left[ -\frac12 \left( \log\frac{\theta}{1-\theta} \right)^2 \right], \quad 0<\theta<1 \]
Posterior: \[ \pi(\theta \mid y) \propto \pi(\theta) \times \theta^S (1-\theta)^{n-S} \] where \(S = \sum_{i=1}^n y_i\).
We choose \(q(\theta' \mid \theta) = \pi(\theta')\) (the prior).
To draw from this proposal: 1. Draw \(z \sim \mathcal{N}(0, 1)\) 2. Set \(\theta' = \frac{e^z}{1 + e^z}\)
\[ \alpha(\theta, \theta') = \min\left\{ 1, \frac{(\theta')^{S} (1-\theta')^{n-S}}{\theta^{S} (1-\theta)^{n-S}} \right\} \]
Initialization: 1. Draw \(z^{(0)} \sim \mathcal{N}(0, 1)\) 2. Set \(\theta^{(0)} = \frac{e^{z^{(0)}}}{1 + e^{z^{(0)}}}\)
For \(t = 1, 2, \ldots, T\):
# Generate synthetic data
set.seed(123)
n <- 100
true_theta <- 0.3
y <- rbinom(n, 1, true_theta)
S <- sum(y)
cat("Observed data: n =", n, ", S =", S, "\n")
## Observed data: n = 100 , S = 29
cat("True theta =", true_theta, "\n")
## True theta = 0.3
# Function to draw from prior (proposal distribution)
draw_from_prior <- function() {
z <- rnorm(1, 0, 1)
theta <- exp(z) / (1 + exp(z))
return(theta)
}
# Metropolis-Hastings independence sampler
mh_independence <- function(S, n, n_iter) {
# Initialize: draw from prior
theta_current <- draw_from_prior()
# Store samples
samples <- numeric(n_iter)
samples[1] <- theta_current
acceptance_count <- 0
# MCMC loop
for (t in 2:n_iter) {
# Propose from prior (independent of current)
theta_proposed <- draw_from_prior()
# Compute likelihood ratio
log_r <- S * (log(theta_proposed) - log(theta_current)) +
(n-S) * (log(1-theta_proposed) - log(1-theta_current))
r <- exp(min(log_r, 700)) # Cap to avoid Inf
alpha <- min(1, r)
# Accept/reject
if (runif(1) < alpha) {
theta_current <- theta_proposed
acceptance_count <- acceptance_count + 1
}
samples[t] <- theta_current
}
acceptance_rate <- acceptance_count / (n_iter - 1)
return(list(samples = samples, acceptance_rate = acceptance_rate))
}
# Run the sampler
n_iter <- 10000
results <- mh_independence(S, n, n_iter)
cat("\nAcceptance rate:", round(results$acceptance_rate * 100, 2), "%\n")
##
## Acceptance rate: 18.69 %
cat("Posterior mean:", round(mean(results$samples), 4), "\n")
## Posterior mean: 0.2979
cat("MLE:", round(S/n, 4), "\n")
## MLE: 0.29
# Trace plot
plot(results$samples, type = "l",
main = "Trace Plot of θ",
xlab = "Iteration", ylab = "θ",
col = "steelblue")
abline(h = true_theta, col = "red", lwd = 2, lty = 2)
# Histogram (after burn-in)
burn_in <- 1000
post_burnin <- results$samples[-(1:burn_in)]
hist(post_burnin, breaks = 50,
main = "Posterior Distribution of θ",
xlab = "θ", ylab = "Frequency",
col = "lightblue", border = "white")
abline(v = true_theta, col = "red", lwd = 2, lty = 2)
abline(v = mean(post_burnin), col = "darkgreen", lwd = 2)
# Autocorrelation
acf(post_burnin, main = "Autocorrelation of θ (after burn-in)")
This algorithm allows us to sample from an intractable posterior using only: - Easy draws from the prior - Likelihood calculations - The MH acceptance formula
| Method | Needs Proposal Distribution? | What It Uses Instead |
|---|---|---|
| Rejection Sampling | ✅ Yes | Proposal distribution \(g(x)\) (plus a constant \(M\)) |
| Metropolis-Hastings | ✅ Yes | Proposal distribution \(q(\theta' \mid \theta)\) |
| Importance Sampling | ✅ Yes | Importance distribution \(g(x)\) (for weighting) |
| Gibbs Sampling | ❌ No | Full conditional distributions (derived from model) |
Proposal: \(g(x)\) (easy to sample from)
How it works: - Sample \(x \sim g(x)\) - Sample \(u \sim \text{Uniform}(0, M g(x))\) - Accept if \(u < f(x)\) (target)
Why needed: You need some distribution to propose candidates from, then reject some to match target \(f(x)\).
Example (R code):
# Target: Beta(2,5) distribution
f <- function(x) dbeta(x, 2, 5)
# Proposal: Uniform(0,1)
g <- function(x) dunif(x, 0, 1)
# Find M (maximum of f/g)
M <- optimize(function(x) f(x)/g(x), interval=c(0,1), maximum=TRUE)$objective
# Rejection sampling
n_samples <- 10000
samples <- numeric(n_samples)
accepted <- 0
while(accepted < n_samples) {
x <- runif(1) # Sample from proposal
u <- runif(1, 0, M * g(x))
if(u < f(x)) {
accepted <- accepted + 1
samples[accepted] <- x
}
}
hist(samples, breaks=50, main="Rejection Sampling from Beta(2,5)",
xlab="x", col="lightblue", probability=TRUE)
curve(dbeta(x,2,5), add=TRUE, col="red", lwd=2)
Proposal: \(q(\theta' \mid \theta)\) (your exploration strategy)
How it works: - Propose \(\theta' \sim q(\cdot \mid \theta)\) - Accept with probability \(\alpha = \min\left(1, \frac{f(\theta') q(\theta \mid \theta')}{f(\theta) q(\theta' \mid \theta)}\right)\)
Why needed: You need a way to explore the parameter space. The proposal can be anything (random walk, independence sampler, etc.).
Example (Random Walk MH):
# Target: N(0,1)
target <- function(x) dnorm(x, 0, 1)
# Proposal: Random walk N(current, 0.5^2)
propose <- function(current) {
rnorm(1, current, 0.5)
}
# MH algorithm
n_iter <- 10000
samples <- numeric(n_iter)
samples[1] <- 0 # Initial value
accept_count <- 0
for(t in 2:n_iter) {
# Propose
theta_prime <- propose(samples[t-1])
# Compute acceptance probability
alpha <- min(1, target(theta_prime) / target(samples[t-1]))
# Accept/reject
if(runif(1) < alpha) {
samples[t] <- theta_prime
accept_count <- accept_count + 1
} else {
samples[t] <- samples[t-1]
}
}
cat("Acceptance rate:", accept_count/(n_iter-1)*100, "%\n")
## Acceptance rate: 84.48845 %
hist(samples, breaks=50, main="MH Sampling from N(0,1)",
xlab="x", col="lightblue", probability=TRUE)
curve(dnorm(x), add=TRUE, col="red", lwd=2)
Proposal: \(g(x)\) (importance distribution)
How it works: - Sample \(x_i \sim g(x)\) - Compute weights \(w_i = f(x_i) / g(x_i)\) - Estimate \(E[h(X)] = \frac{\sum w_i h(x_i)}{\sum w_i}\)
Why needed: You sample from \(g\) instead of target \(f\), then reweight. No acceptance/rejection.
Example:
# Target: Beta(2,5)
f <- function(x) dbeta(x, 2, 5)
# Proposal: Uniform(0,1)
g <- function(x) dunif(x, 0, 1)
# Importance sampling
n_samples <- 10000
x <- runif(n_samples) # Sample from proposal
weights <- f(x) / g(x)
weights <- weights / sum(weights) # Normalize
# Estimate mean
est_mean <- sum(x * weights)
true_mean <- 2/(2+5) # Beta(2,5) mean
cat("True mean:", true_mean, "\n")
## True mean: 0.2857143
cat("Importance sampling estimate:", est_mean, "\n")
## Importance sampling estimate: 0.2843986
# Weighted histogram
hist(x, breaks=50, main="Importance Sampling from Beta(2,5)",
xlab="x", col="lightblue", probability=TRUE, weights=weights)
## Warning in plot.window(xlim, ylim, log, ...): "weights" is not a graphical
## parameter
## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...):
## "weights" is not a graphical parameter
## Warning in axis(1, ...): "weights" is not a graphical parameter
## Warning in axis(2, at = yt, ...): "weights" is not a graphical parameter
curve(dbeta(x,2,5), add=TRUE, col="red", lwd=2)
What it uses: Full conditional distributions
How it works (for parameters \(\theta_1, \theta_2, \ldots, \theta_k\)):
For \(t = 1, 2, \ldots\): - Draw \(\theta_1^{(t)} \sim \pi(\theta_1 \mid \theta_2^{(t-1)}, \theta_3^{(t-1)}, \ldots, y)\) - Draw \(\theta_2^{(t)} \sim \pi(\theta_2 \mid \theta_1^{(t)}, \theta_3^{(t-1)}, \ldots, y)\) - Draw \(\theta_3^{(t)} \sim \pi(\theta_3 \mid \theta_1^{(t)}, \theta_2^{(t)}, \ldots, y)\) - And so on…
Why no proposal needed: - You sample directly from full conditionals - Every sample is accepted automatically (acceptance probability = 1) - But you need to be able to sample from these conditionals
Example (Bivariate Normal):
# Target: Bivariate Normal with correlation
# Using Gibbs sampling
n_iter <- 5000
n_burnin <- 1000
samples <- matrix(0, nrow=n_iter, ncol=2)
colnames(samples) <- c("x", "y")
# Parameters
mu_x <- 0
mu_y <- 0
sigma_x <- 1
sigma_y <- 1
rho <- 0.8
# Initialize
samples[1,] <- c(0, 0)
# Gibbs sampling
for(t in 2:n_iter) {
# Sample x given y
mu_x_given_y <- mu_x + rho * (sigma_x/sigma_y) * (samples[t-1,2] - mu_y)
sigma_x_given_y <- sigma_x * sqrt(1 - rho^2)
samples[t,1] <- rnorm(1, mu_x_given_y, sigma_x_given_y)
# Sample y given x
mu_y_given_x <- mu_y + rho * (sigma_y/sigma_x) * (samples[t,1] - mu_x)
sigma_y_given_x <- sigma_y * sqrt(1 - rho^2)
samples[t,2] <- rnorm(1, mu_y_given_x, sigma_y_given_x)
}
# Plot results
par(mfrow=c(1,2))
plot(samples[(n_burnin+1):n_iter,], pch=20, cex=0.5,
main="Gibbs Samples from Bivariate Normal",
xlab="x", ylab="y", col="steelblue")
hist(samples[(n_burnin+1):n_iter,1], breaks=50,
main="Marginal of x", xlab="x", col="lightblue", probability=TRUE)
curve(dnorm(x, mu_x, sigma_x), add=TRUE, col="red", lwd=2)
Rejection Sampling:
Target f(x) ←→ Proposal g(x) [NEEDED]
↓
Accept/Reject
Metropolis-Hastings:
Target f(θ) ←→ Proposal q(θ'|θ) [NEEDED]
↓
Accept with α
Importance Sampling:
Target f(x) ←→ Proposal g(x) [NEEDED]
↓
Weight samples
Gibbs Sampling:
Target π(θ|y) ←→ Full conditionals [NO PROPOSAL]
↓
Direct sampling
| Method | Best when… |
|---|---|
| Rejection Sampling | Target is bounded, proposal is close to target, low dimension |
| Metropolis-Hastings | General purpose, any target, any dimension |
| Importance Sampling | Estimating expectations, target hard to sample but easy to evaluate |
| Gibbs Sampling | Full conditionals are known and easy to sample from |
Rejection, MH, and Importance all need a proposal/importance distribution because they sample from something other than the target
Gibbs samples directly from pieces of the target (the conditionals), so no proposal needed
The choice of proposal distribution dramatically affects:
Gibbs is the special one that doesn’t require a user-specified proposal distribution - but it requires deriving full conditionals!
| Method | Proposal Needed? | Acceptance Step? | Can Sample Any Target? |
|---|---|---|---|
| Rejection Sampling | Yes | Yes (accept/reject) | No (needs bounded f/g) |
| Metropolis-Hastings | Yes | Yes (probabilistic) | Yes (with good proposal) |
| Importance Sampling | Yes | No (uses weights) | Yes (but high variance possible) |
| Gibbs Sampling | No | No (always accept) | No (needs conditional forms) |