Gibbs Sampling Framework

A Gibbs sampler is an MCMC algorithm where parameters are sampled sequentially from their full conditional distributions. For parameters \(\theta_1, \theta_2, \ldots, \theta_k\), we sample:

\[ \begin{aligned} \theta_1^{(t+1)} &\sim p(\theta_1 \mid \theta_2^{(t)}, \theta_3^{(t)}, \ldots, \theta_k^{(t)}, y) \\ \theta_2^{(t+1)} &\sim p(\theta_2 \mid \theta_1^{(t+1)}, \theta_3^{(t)}, \ldots, \theta_k^{(t)}, y) \\ \vdots \\ \theta_k^{(t+1)} &\sim p(\theta_k \mid \theta_1^{(t+1)}, \theta_2^{(t+1)}, \ldots, \theta_{k-1}^{(t+1)}, y) \end{aligned} \]

Each step samples from the exact conditional posterior distribution, not an approximation.

Our Model

We have data \(y_1, \ldots, y_n \stackrel{\text{iid}}{\sim} N(\theta, \sigma^2)\) with \(\sigma^2 = 1/\lambda^2\).

Priors:

\(\theta \sim N(\mu_0, \sigma^2 / m)\) where \(m = 1\) and \(\mu_0 = \bar{y}\)
\(\lambda^2 \sim \text{Gamma}(a, b)\) where \(a = 2\), \(b = 1\)

The Two Full Conditional Distributions

Conditional for \(\theta\) (Equation 4.18)

From the joint posterior, the conditional distribution of \(\theta\) given \(\sigma^2\) (or \(\lambda^2\)) and the data is:

\[ \theta \mid \sigma^2, y \sim N\left(\mu_p, \frac{\sigma^2}{n + m}\right) \]

where the posterior mean is:

\[ \mu_p = \frac{n\bar{y} + m\mu_0}{n + m} \]

This is a normal distribution because: - The likelihood is normal in \(\theta\) - The prior is normal in \(\theta\) - Normal \(\times\) Normal = Normal

Conditional for \(\sigma^2\) (Equation 4.15)

From the joint posterior, the conditional distribution of \(\lambda^2 = 1/\sigma^2\) given \(\theta\) and the data is:

\[ \lambda^2 \mid \theta, y \sim \text{Gamma}\left(\frac{n}{2} + a, \; b + \frac{1}{2}\sum_{i=1}^n (y_i - \theta)^2 + \frac{m}{2}(\theta - \mu_0)^2\right) \]

This is a Gamma distribution because: - The likelihood is normal, which gives a sum of squares term - The prior is Gamma - Normal likelihood \(\times\) Gamma prior = Gamma posterior for the precision

The Gibbs Sampling Loop

At iteration \(t\), we have current values \((\theta^{(t)}, \sigma^{2(t)})\). Then:

Step 1: Sample \(\theta^{(t+1)}\)

We draw from the conditional distribution:

\[ \theta^{(t+1)} \sim N\left(\mu_p, \frac{\sigma^{2(t)}}{n + m}\right) \]

Notice we condition on \(\sigma^{2(t)}\) — the current value from the previous iteration.

Step 2: Sample \(\sigma^{2(t+1)}\)

We first sample the precision:

\[ \lambda^{2(t+1)} \sim \text{Gamma}\left(\frac{n}{2} + a, \; b + \frac{1}{2}\sum_{i=1}^n (y_i - \theta^{(t+1)})^2 + \frac{m}{2}(\theta^{(t+1)} - \mu_0)^2\right) \]

Then set \(\sigma^{2(t+1)} = 1 / \lambda^{2(t+1)}\).

Notice we condition on \(\theta^{(t+1)}\) — the newly sampled value from Step 1.

Why This Satisfies the Gibbs Sampler Definition

Gibbs Sampler Requirement	How Our Code Meets It
Sample from full conditional distributions	We sample \(\theta\) from \(p(\theta \mid \sigma^2, y)\) and \(\sigma^2\) from \(p(\sigma^2 \mid \theta, y)\)
Sequential updates	We update \(\theta\) first, then \(\sigma^2\) using the new \(\theta\)
Use most recent values	\(\theta^{(t+1)}\) uses \(\sigma^{2(t)}\); \(\sigma^{2(t+1)}\) uses \(\theta^{(t+1)}\)
Exact sampling (not Metropolis)	We use `rnorm()` and `rgamma()` which draw directly from the exact distributions
Creates a Markov chain	Each draw \((\theta^{(t+1)}, \sigma^{2(t+1)})\) depends only on \((\theta^{(t)}, \sigma^{2(t)})\)

The Markov Chain Property

The Gibbs sampler generates a sequence:

\[ (\theta^{(0)}, \sigma^{2(0)}) \rightarrow (\theta^{(1)}, \sigma^{2(1)}) \rightarrow (\theta^{(2)}, \sigma^{2(2)}) \rightarrow \cdots \]

where each new state depends only on the previous state. This is a Markov chain whose stationary distribution is the joint posterior:

\[ p(\theta, \sigma^2 \mid y) \]

After a sufficient number of iterations (burn-in), the draws are approximately from the posterior distribution.

Posterior Estimation

After discarding the first \(B\) burn-in iterations, we estimate:

\[ \begin{aligned} \hat{E}[\theta \mid y] &= \frac{1}{N-B} \sum_{t=B+1}^N \theta^{(t)} \\ \hat{E}[\sigma^2 \mid y] &= \frac{1}{N-B} \sum_{t=B+1}^N \sigma^{2(t)} \\ \widehat{\text{Var}}[\theta \mid y] &= \frac{1}{N-B-1} \sum_{t=B+1}^N (\theta^{(t)} - \hat{E}[\theta \mid y])^2 \end{aligned} \]

The coefficient of variation is estimated as:

\[ \hat{E}\left[\frac{\sigma}{\theta} \mid y\right] = \frac{1}{N-B} \sum_{t=B+1}^N \frac{\sigma^{2(t)}}{\theta^{(t)}} \]

R Implementation: Gibbs Sampler for New York Air Pollution Data

# Gibbs Sampler for Normal Distribution with Both Parameters Unknown
# New York Air Pollution Data

# Data
ny_pollution <- c(
  45.1, 48.3, 54.7, 45.0, 43.9, 55.4, 51.1, 44.1, 44.8, 41.2,
  45.9, 45.9, 48.9, 46.3, 50.0, 48.2, 45.6, 39.9, 54.3, 57.6,
  48.3, 46.1, 53.0, 48.2, 44.4, 39.2, 52.8, 52.4
)

# Hyperparameters
m <- 1
mu0 <- mean(ny_pollution)  # prior mean = sample mean
a <- 2
b <- 1

# Data statistics
n <- length(ny_pollution)
y_bar <- mean(ny_pollution)

# Gibbs sampling
N_iter <- 10000
burn_in <- 1000

theta_samples <- numeric(N_iter)
sigma2_samples <- numeric(N_iter)

# Initial values
theta_current <- y_bar
sigma2_current <- var(ny_pollution)

set.seed(123)

for (iter in 1:N_iter) {
  # Sample θ
  mu_n <- (n * y_bar + m * mu0) / (n + m)
  var_n <- sigma2_current / (n + m)
  theta_current <- rnorm(1, mean = mu_n, sd = sqrt(var_n))
  
  # Sample σ²
  shape <- n/2 + a
  ss <- sum((ny_pollution - theta_current)^2)
  rate <- b + 0.5 * ss + 0.5 * m * (theta_current - mu0)^2
  sigma2_current <- 1 / rgamma(1, shape = shape, rate = rate)
  
  theta_samples[iter] <- theta_current
  sigma2_samples[iter] <- sigma2_current
}

# Remove burn-in
theta_post <- theta_samples[(burn_in+1):N_iter]
sigma2_post <- sigma2_samples[(burn_in+1):N_iter]

# Calculate estimates
E_theta <- mean(theta_post)
Var_theta <- var(theta_post)
E_sigma2 <- mean(sigma2_post)
Var_sigma2 <- var(sigma2_post)

# Coefficient of variation
cv_samples <- sqrt(sigma2_post) / theta_post
E_cv <- mean(cv_samples)
lower_cv <- quantile(cv_samples, 0.025)
upper_cv <- quantile(cv_samples, 0.975)

# Credible intervals
theta_lower <- quantile(theta_post, 0.025)
theta_upper <- quantile(theta_post, 0.975)
sigma2_lower <- quantile(sigma2_post, 0.025)
sigma2_upper <- quantile(sigma2_post, 0.975)

Results

Gibbs Sampler Results for New York Air Pollution Data
Parameter	Quantity	Value
θ	Posterior Mean	47.869
θ	Posterior Variance	0.694
θ	95% Credible Interval	(46.247, 49.522)
σ²	Posterior Mean	20.567
σ²	Posterior Variance	31.132
σ²	95% Credible Interval	(12.399, 33.981)
σ/θ	Posterior Mean	0.094
σ/θ	95% Credible Interval	(0.073, 0.122)
σ/θ	Posterior Variance	0.000156

Summary

The code implements a Gibbs sampler because:

It alternates between sampling \(\theta\) and \(\sigma^2\)
Each sample is from the exact conditional posterior (equations 4.15 and 4.18)
It uses the most recent values — \(\theta\) uses current \(\sigma^2\), then \(\sigma^2\) uses the new \(\theta\)
The sequence \(\{(\theta^{(t)}, \sigma^{2(t)})\}\) forms a Markov chain converging to \(p(\theta, \sigma^2 \mid y)\)