Introduction

This document explains the Metropolis-Hastings (M-H) algorithm, focusing on:

  • The full acceptance ratio formula
  • Why the proposal density ratio cancels for symmetric proposals
  • The role of the proposal distribution even when it cancels
  • A complete worked example (genetic linkage model)

The Goal: Sampling from a Posterior Distribution

We want to sample from a target distribution \(\pi(\theta) = p(\theta \mid y)\), which is often known only up to a normalizing constant:

\[p(\theta \mid y) = \frac{p(y \mid \theta) p(\theta)}{p(y)} \propto p(y \mid \theta) p(\theta)\]

The denominator \(p(y)\) is usually intractable, so we cannot sample directly.

The Metropolis-Hastings Algorithm

The M-H algorithm constructs a Markov chain that has \(p(\theta \mid y)\) as its stationary distribution.

Algorithm Steps

  1. Initialize \(\theta^{(0)}\)

  2. For \(t = 1, 2, \ldots, m\):

    • Propose \(\theta^{*} \sim q(\theta^{*} \mid \theta^{(t-1)})\)
    • Compute the acceptance probability:

    \[\alpha(\theta^{*}, \theta^{(t-1)}) = \min\left(1, \frac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)} \times \frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})}\right)\]

    • Draw \(u \sim \text{Uniform}(0, 1)\)
    • If \(u < \alpha\), accept: \(\theta^{(t)} = \theta^{*}\)
    • Else reject: \(\theta^{(t)} = \theta^{(t-1)}\)

The Two Key Components of the Acceptance Ratio

The acceptance ratio consists of two parts:

Part 1: The Posterior Ratio

\[\frac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)} = \frac{p(y \mid \theta^{*}) p(\theta^{*})}{p(y \mid \theta^{(t-1)}) p(\theta^{(t-1)})}\]

This tells us whether the proposed \(\theta^{*}\) is more or less probable than the current \(\theta^{(t-1)}\) given the data \(y\).

Part 2: The Proposal (Hastings) Ratio

\[\frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})}\]

This corrects for any asymmetry in the proposal distribution.

Symmetric Proposals: Why the Ratio Cancels

A proposal distribution is symmetric if:

\[q(\theta^{*} \mid \theta^{(t-1)}) = q(\theta^{(t-1)} \mid \theta^{*}) \quad \text{for all } \theta^{*}, \theta^{(t-1)}\]

Example: Normal Proposal

\[q(\theta^{*} \mid \theta^{(t-1)}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\theta^{*} - \theta^{(t-1)})^2}{2\sigma^2}\right)\]

\[q(\theta^{(t-1)} \mid \theta^{*}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\theta^{(t-1)} - \theta^{*})^2}{2\sigma^2}\right)\]

Since \((\theta^{*} - \theta^{(t-1)})^2 = (\theta^{(t-1)} - \theta^{*})^2\), we have:

\[q(\theta^{*} \mid \theta^{(t-1)}) = q(\theta^{(t-1)} \mid \theta^{*})\]

Therefore:

\[\frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})} = 1\]

The Simplified Acceptance Probability

For symmetric proposals:

\[\alpha(\theta^{*}, \theta^{(t-1)}) = \min\left(1, \frac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)}\right)\]

This special case is called the Metropolis algorithm.

Common Symmetric Proposals

Proposal Distribution Symmetric? Notes
Normal\((\theta^{(t-1)}, \sigma^2)\) ✅ Yes Most common
Uniform\((\theta^{(t-1)}-\delta, \theta^{(t-1)}+\delta)\) ✅ Yes Random walk
Cauchy\((\theta^{(t-1)}, \gamma)\) ✅ Yes Heavy tails
Gamma\((a, b)\) ❌ No Asymmetric
Beta\((\alpha, \beta)\) ❌ No Asymmetric

Important Distinction: The Proposal Has TWO Roles

Even when the proposal ratio cancels, the proposal distribution remains essential:

Role Description Does it cancel?
Generating candidates \(\theta^{*} \sim q(\cdot \mid \theta^{(t-1)})\) Never - without this, no candidates!
Hastings correction \(\frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})}\) Only for symmetric proposals

Worked Example: Genetic Linkage Model

The Data

Observed frequencies of 197 animals across 4 categories:

Category Frequency (\(y_i\)) Probability \(p(y_i \mid \pi)\)
1 125 \(\frac{1}{2} + \frac{\pi}{4} = \frac{2+\pi}{4}\)
2 18 \(\frac{1}{4}(1-\pi)\)
3 20 \(\frac{1}{4}(1-\pi)\)
4 34 \(\frac{\pi}{4}\)

The Likelihood

\[p(y \mid \pi) = \left(\frac{2+\pi}{4}\right)^{125} \left(\frac{1-\pi}{4}\right)^{38} \left(\frac{\pi}{4}\right)^{34}\]

\[= \frac{(2+\pi)^{125} (1-\pi)^{38} \pi^{34}}{4^{197}}\]

The Prior

We use a \(\text{Uniform}(0,1)\) prior: \(p(\pi) = 1\) for \(\pi \in (0,1)\).

The Posterior

\[p(\pi \mid y) \propto (2+\pi)^{125} (1-\pi)^{38} \pi^{34}, \quad \pi \in (0,1)\]

The Proposal Distribution

\[q(\pi^{*} \mid \pi^{(t-1)}) \sim \text{Normal}(\pi^{(t-1)}, \sigma^2 = 0.01)\]

This is symmetric, so the Hastings ratio = 1.

The Acceptance Ratio

\[r = \frac{p(\pi^{*} \mid y)}{p(\pi^{(t-1)} \mid y)} = \frac{(2+\pi^{*})^{125} (1-\pi^{*})^{38} (\pi^{*})^{34}}{(2+\pi^{(t-1)})^{125} (1-\pi^{(t-1)})^{38} (\pi^{(t-1)})^{34}}\]

R Implementation

# Data
y <- c(125, 18, 20, 34)

# M-H Algorithm parameters
pi.sim <- c()
pi.sim[1] <- 0.5        # Starting value
acceptance <- 0         # Counter for accepted proposals
sigma <- 0.1            # Proposal SD (tuning parameter)
m <- 10000              # Number of iterations

# Run M-H algorithm
for (i in 2:m) {
  # Step 1: Propose a new value
  pi.star <- rnorm(n = 1, mean = pi.sim[i-1], sd = sigma)
  
  # Step 2: Compute acceptance ratio (posterior ratio only, since proposal is symmetric)
  # Note: If pi.star is outside (0,1), set r = 0 to reject automatically
  if (pi.star <= 0 || pi.star >= 1) {
    r <- 0
  } else {
    r.numerator <- (2 + pi.star)^y[1] * 
                   (1 - pi.star)^(y[2] + y[3]) * 
                   pi.star^y[4]
    
    r.denominator <- (2 + pi.sim[i-1])^y[1] * 
                     (1 - pi.sim[i-1])^(y[2] + y[3]) * 
                     pi.sim[i-1]^y[4]
    
    r <- r.numerator / r.denominator
  }
  
  # Step 3: Accept or reject
  u <- runif(n = 1, min = 0, max = 1)
  
  if (u < r) {
    pi.sim[i] <- pi.star
    acceptance <- acceptance + 1
  } else {
    pi.sim[i] <- pi.sim[i-1]
  }
}

# Discard burn-in
t0 <- 2000
pi.sim <- pi.sim[-c(1:t0)]

# Results
cat("Posterior mean of pi:", mean(pi.sim), "\n")
## Posterior mean of pi: 0.620068
cat("Posterior variance of pi:", var(pi.sim), "\n")
## Posterior variance of pi: 0.002474039
cat("Acceptance rate:", acceptance / (m - 1), "\n")
## Acceptance rate: 0.5126513

Visualizing the Results

par(mfrow = c(1, 2))

# Trace plot
plot(pi.sim, type = "l", col = "steelblue",
     xlab = "Iteration (after burn-in)", 
     ylab = expression(pi),
     main = "Trace Plot of MCMC Chain")

# Histogram with density estimate
hist(pi.sim, breaks = 30, col = "lightblue", 
     xlab = expression(pi), ylab = "Frequency",
     main = "Posterior Distribution of pi", prob = TRUE)
lines(density(pi.sim), col = "red", lwd = 2)

What If We Used an Asymmetric Proposal?

Suppose we used a Gamma proposal (asymmetric). Then we would need the full ratio:

# Asymmetric proposal: Gamma(shape = pi.sim[i-1] * 100, rate = 100)
# This requires the Hastings correction!

for (i in 2:m) {
  # Propose from Gamma (asymmetric!)
  pi.star <- rgamma(1, shape = pi.sim[i-1] * 100, rate = 100)
  
  # Posterior ratio
  posterior_ratio <- ((2 + pi.star)^125 * (1 - pi.star)^38 * pi.star^34) /
                     ((2 + pi.sim[i-1])^125 * (1 - pi.sim[i-1])^38 * pi.sim[i-1]^34)
  
  # Hastings ratio (CORRECTION FACTOR - does NOT cancel!)
  q_ratio <- dgamma(pi.sim[i-1], shape = pi.star * 100, rate = 100) /
             dgamma(pi.star, shape = pi.sim[i-1] * 100, rate = 100)
  
  # Full M-H ratio
  r <- posterior_ratio * q_ratio
  
  # Rest of algorithm same...
}

Why the Proposal Distribution Still Matters (Even When Symmetric)

The proposal distribution critically affects algorithm efficiency:

\(\sigma\) Acceptance Rate Chain Behavior
0.01 ~99% Slow exploration, high autocorrelation
0.1 ~35-50% Optimal mixing
0.5 ~15% Moderate rejection
1.0 ~5% Most proposals rejected
# Compare different proposal scales
sigmas <- c(0.01, 0.05, 0.1, 0.2, 0.5, 1.0)
accept_rates <- c()

for (sig in sigmas) {
  pi.chain <- numeric(5000)
  pi.chain[1] <- 0.5
  acc <- 0
  
  for (i in 2:5000) {
    pi.star <- rnorm(1, pi.chain[i-1], sig)
    
    # Reject invalid proposals (outside (0,1))
    if (pi.star <= 0 || pi.star >= 1) {
      r <- 0
    } else {
      r_num <- (2 + pi.star)^125 * (1 - pi.star)^38 * pi.star^34
      r_den <- (2 + pi.chain[i-1])^125 * (1 - pi.chain[i-1])^38 * pi.chain[i-1]^34
      r <- r_num / r_den
    }
    
    if (runif(1) < r) {
      pi.chain[i] <- pi.star
      acc <- acc + 1
    } else {
      pi.chain[i] <- pi.chain[i-1]
    }
  }
  accept_rates <- c(accept_rates, acc/4999)
}

plot(sigmas, accept_rates, type = "b", pch = 19, col = "blue",
     xlab = expression(sigma), ylab = "Acceptance Rate",
     main = "Effect of Proposal Scale on Acceptance Rate")
abline(h = 0.44, col = "red", lty = 2)
text(0.6, 0.45, "Optimal ~0.44 for 1D", col = "red")

Key Takeaways

  1. The full M-H acceptance ratio includes the Hastings correction \(\frac{q(\theta^{(t-1)} \mid \theta^{*})}{q(\theta^{*} \mid \theta^{(t-1)})}\)

  2. For symmetric proposals, this ratio equals 1, simplifying to just the posterior ratio \(\frac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)}\)

  3. The proposal distribution NEVER cancels in its role as a candidate generator - you always need it to propose new values

  4. Even when symmetric, the proposal’s scale (\(\sigma\)) dramatically affects efficiency

  5. Many author’s simplify the ratio for symmetric proposal (Normal proposal is symmetric),we should note the cancellation

  6. Important: When using a normal proposal for parameters with bounded support (like \(\pi \in (0,1)\)), you must reject proposals outside the valid range by setting \(r = 0\)