This document explains the Metropolis-Hastings (M-H) algorithm, focusing on:
We want to sample from a target distribution \(\pi(\theta) = p(\theta \mid y)\), which is often known only up to a normalizing constant:
\[p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)} \propto p(y \mid \theta) p(\theta)\]
The denominator \(p(y)\) is usually intractable, so we cannot sample directly.
The M-H algorithm constructs a Markov chain that has \(p(\theta \mid y)\) as its stationary distribution.
Initialize \(\theta^{(0)}\)
For \(t = 1, 2, \ldots, m\):
\[\alpha(\theta^{*}, \theta^{(t-1)}) = \min\left(1, \frac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)} \times \frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})}\right)\]
The acceptance ratio consists of two parts:
\[\frac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)} = \frac{p(y \mid \theta^{*}) p(\theta^{*})}{p(y \mid \theta^{(t-1)}) p(\theta^{(t-1)})}\]
This tells us whether the proposed \(\theta^{*}\) is more or less probable than the current \(\theta^{(t-1)}\) given the data \(y\).
\[\frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})}\]
This corrects for any asymmetry in the proposal distribution.
A proposal distribution is symmetric if:
\[q(\theta^{*} \mid \theta^{(t-1)}) = q(\theta^{(t-1)} \mid \theta^{*}) \quad \text{for all } \theta^{*}, \theta^{(t-1)}\]
\[q(\theta^{*} \mid \theta^{(t-1)}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\theta^{*} - \theta^{(t-1)})^2}{2\sigma^2}\right)\]
\[q(\theta^{(t-1)} \mid \theta^{*}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\theta^{(t-1)} - \theta^{*})^2}{2\sigma^2}\right)\]
Since \((\theta^{*} - \theta^{(t-1)})^2 = (\theta^{(t-1)} - \theta^{*})^2\), we have:
\[q(\theta^{*} \mid \theta^{(t-1)}) = q(\theta^{(t-1)} \mid \theta^{*})\]
Therefore:
\[\frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})} = 1\]
For symmetric proposals:
\[\alpha(\theta^{*}, \theta^{(t-1)}) = \min\left(1, \frac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)}\right)\]
This special case is called the Metropolis algorithm.
| Proposal Distribution | Symmetric? | Notes |
|---|---|---|
| Normal\((\theta^{(t-1)}, \sigma^2)\) | ✅ Yes | Most common |
| Uniform\((\theta^{(t-1)}-\delta, \theta^{(t-1)}+\delta)\) | ✅ Yes | Random walk |
| Cauchy\((\theta^{(t-1)}, \gamma)\) | ✅ Yes | Heavy tails |
| Gamma\((a, b)\) | ❌ No | Asymmetric |
| Beta\((\alpha, \beta)\) | ❌ No | Asymmetric |
Even when the proposal ratio cancels, the proposal distribution remains essential:
| Role | Description | Does it cancel? |
|---|---|---|
| Generating candidates | \(\theta^{*} \sim q(\cdot \mid \theta^{(t-1)})\) | Never - without this, no candidates! |
| Hastings correction | \(\frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})}\) | Only for symmetric proposals |
Observed frequencies of 197 animals across 4 categories:
| Category | Frequency (\(y_i\)) | Probability \(p(y_i \mid \pi)\) |
|---|---|---|
| 1 | 125 | \(\frac{1}{2} + \frac{\pi}{4} = \frac{2+\pi}{4}\) |
| 2 | 18 | \(\frac{1}{4}(1-\pi)\) |
| 3 | 20 | \(\frac{1}{4}(1-\pi)\) |
| 4 | 34 | \(\frac{\pi}{4}\) |
\[p(y \mid \pi) = \left(\frac{2+\pi}{4}\right)^{125} \left(\frac{1-\pi}{4}\right)^{38} \left(\frac{\pi}{4}\right)^{34}\]
\[= \frac{(2+\pi)^{125} (1-\pi)^{38} \pi^{34}}{4^{197}}\]
We use a \(\text{Uniform}(0,1)\) prior: \(p(\pi) = 1\) for \(\pi \in (0,1)\).
\[p(\pi \mid y) \propto (2+\pi)^{125} (1-\pi)^{38} \pi^{34}, \quad \pi \in (0,1)\]
\[q(\pi^{*} \mid \pi^{(t-1)}) \sim \text{Normal}(\pi^{(t-1)}, \sigma^2 = 0.01)\]
This is symmetric, so the Hastings ratio = 1.
\[r = \frac{p(\pi^{*} \mid y)}{p(\pi^{(t-1)} \mid y)} = \frac{(2+\pi^{*})^{125} (1-\pi^{*})^{38} (\pi^{*})^{34}}{(2+\pi^{(t-1)})^{125} (1-\pi^{(t-1)})^{38} (\pi^{(t-1)})^{34}}\]
# Data
y <- c(125, 18, 20, 34)
# M-H Algorithm parameters
pi.sim <- c()
pi.sim[1] <- 0.5 # Starting value
acceptance <- 0 # Counter for accepted proposals
sigma <- 0.1 # Proposal SD (tuning parameter)
m <- 10000 # Number of iterations
# Run M-H algorithm
for (i in 2:m) {
# Step 1: Propose a new value
pi.star <- rnorm(n = 1, mean = pi.sim[i-1], sd = sigma)
# Step 2: Compute acceptance ratio (posterior ratio only, since proposal is symmetric)
# Note: If pi.star is outside (0,1), set r = 0 to reject automatically
if (pi.star <= 0 || pi.star >= 1) {
r <- 0
} else {
r.numerator <- (2 + pi.star)^y[1] *
(1 - pi.star)^(y[2] + y[3]) *
pi.star^y[4]
r.denominator <- (2 + pi.sim[i-1])^y[1] *
(1 - pi.sim[i-1])^(y[2] + y[3]) *
pi.sim[i-1]^y[4]
r <- r.numerator / r.denominator
}
# Step 3: Accept or reject
u <- runif(n = 1, min = 0, max = 1)
if (u < r) {
pi.sim[i] <- pi.star
acceptance <- acceptance + 1
} else {
pi.sim[i] <- pi.sim[i-1]
}
}
# Discard burn-in
t0 <- 2000
pi.sim <- pi.sim[-c(1:t0)]
# Results
cat("Posterior mean of pi:", mean(pi.sim), "\n")
## Posterior mean of pi: 0.620068
cat("Posterior variance of pi:", var(pi.sim), "\n")
## Posterior variance of pi: 0.002474039
cat("Acceptance rate:", acceptance / (m - 1), "\n")
## Acceptance rate: 0.5126513
par(mfrow = c(1, 2))
# Trace plot
plot(pi.sim, type = "l", col = "steelblue",
xlab = "Iteration (after burn-in)",
ylab = expression(pi),
main = "Trace Plot of MCMC Chain")
# Histogram with density estimate
hist(pi.sim, breaks = 30, col = "lightblue",
xlab = expression(pi), ylab = "Frequency",
main = "Posterior Distribution of pi", prob = TRUE)
lines(density(pi.sim), col = "red", lwd = 2)
Suppose we used a Gamma proposal (asymmetric). Then we would need the full ratio:
# Asymmetric proposal: Gamma(shape = pi.sim[i-1] * 100, rate = 100)
# This requires the Hastings correction!
for (i in 2:m) {
# Propose from Gamma (asymmetric!)
pi.star <- rgamma(1, shape = pi.sim[i-1] * 100, rate = 100)
# Posterior ratio
posterior_ratio <- ((2 + pi.star)^125 * (1 - pi.star)^38 * pi.star^34) /
((2 + pi.sim[i-1])^125 * (1 - pi.sim[i-1])^38 * pi.sim[i-1]^34)
# Hastings ratio (CORRECTION FACTOR - does NOT cancel!)
q_ratio <- dgamma(pi.sim[i-1], shape = pi.star * 100, rate = 100) /
dgamma(pi.star, shape = pi.sim[i-1] * 100, rate = 100)
# Full M-H ratio
r <- posterior_ratio * q_ratio
# Rest of algorithm same...
}
The proposal distribution critically affects algorithm efficiency:
| \(\sigma\) | Acceptance Rate | Chain Behavior |
|---|---|---|
| 0.01 | ~99% | Slow exploration, high autocorrelation |
| 0.1 | ~35-50% | Optimal mixing |
| 0.5 | ~15% | Moderate rejection |
| 1.0 | ~5% | Most proposals rejected |
# Compare different proposal scales
sigmas <- c(0.01, 0.05, 0.1, 0.2, 0.5, 1.0)
accept_rates <- c()
for (sig in sigmas) {
pi.chain <- numeric(5000)
pi.chain[1] <- 0.5
acc <- 0
for (i in 2:5000) {
pi.star <- rnorm(1, pi.chain[i-1], sig)
# Reject invalid proposals (outside (0,1))
if (pi.star <= 0 || pi.star >= 1) {
r <- 0
} else {
r_num <- (2 + pi.star)^125 * (1 - pi.star)^38 * pi.star^34
r_den <- (2 + pi.chain[i-1])^125 * (1 - pi.chain[i-1])^38 * pi.chain[i-1]^34
r <- r_num / r_den
}
if (runif(1) < r) {
pi.chain[i] <- pi.star
acc <- acc + 1
} else {
pi.chain[i] <- pi.chain[i-1]
}
}
accept_rates <- c(accept_rates, acc/4999)
}
plot(sigmas, accept_rates, type = "b", pch = 19, col = "blue",
xlab = expression(sigma), ylab = "Acceptance Rate",
main = "Effect of Proposal Scale on Acceptance Rate")
abline(h = 0.44, col = "red", lty = 2)
text(0.6, 0.45, "Optimal ~0.44 for 1D", col = "red")
The full M-H acceptance ratio includes the Hastings correction \(\frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})}\)
For symmetric proposals, this ratio equals 1, simplifying to just the posterior ratio \(\frac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)}\)
The proposal distribution NEVER cancels in its role as a candidate generator - you always need it to propose new values
Even when symmetric, the proposal’s scale (\(\sigma\)) dramatically affects efficiency
Many author’s simplify the ratio for symmetric proposal (Normal proposal is symmetric),we should note the cancellation
Important: When using a normal proposal for parameters with bounded support (like \(\pi \in (0,1)\)), you must reject proposals outside the valid range by setting \(r = 0\)