A Gibbs sampler is an MCMC algorithm where parameters are sampled sequentially from their full conditional distributions. For parameters \(\theta_1, \theta_2, \ldots, \theta_k\), we sample:
\[ \begin{aligned} \theta_1^{(t+1)} &\sim p(\theta_1 \mid \theta_2^{(t)}, \theta_3^{(t)}, \ldots, \theta_k^{(t)}, y) \\ \theta_2^{(t+1)} &\sim p(\theta_2 \mid \theta_1^{(t+1)}, \theta_3^{(t)}, \ldots, \theta_k^{(t)}, y) \\ \vdots \\ \theta_k^{(t+1)} &\sim p(\theta_k \mid \theta_1^{(t+1)}, \theta_2^{(t+1)}, \ldots, \theta_{k-1}^{(t+1)}, y) \end{aligned} \]
Each step samples from the exact conditional posterior distribution, not an approximation.
We have data \(y_1, \ldots, y_n \stackrel{\text{iid}}{\sim} N(\theta, \sigma^2)\) with \(\sigma^2 = 1/\lambda^2\).
Priors:
From the joint posterior, the conditional distribution of \(\theta\) given \(\sigma^2\) (or \(\lambda^2\)) and the data is:
\[ \theta \mid \sigma^2, y \sim N\left(\mu_p, \frac{\sigma^2}{n + m}\right) \]
where the posterior mean is:
\[ \mu_p = \frac{n\bar{y} + m\mu_0}{n + m} \]
This is a normal distribution because: - The likelihood is normal in \(\theta\) - The prior is normal in \(\theta\) - Normal \(\times\) Normal = Normal
From the joint posterior, the conditional distribution of \(\lambda^2 = 1/\sigma^2\) given \(\theta\) and the data is:
\[ \lambda^2 \mid \theta, y \sim \text{Gamma}\left(\frac{n}{2} + a, \; b + \frac{1}{2}\sum_{i=1}^n (y_i - \theta)^2 + \frac{m}{2}(\theta - \mu_0)^2\right) \]
This is a Gamma distribution because: - The likelihood is normal, which gives a sum of squares term - The prior is Gamma - Normal likelihood \(\times\) Gamma prior = Gamma posterior for the precision
At iteration \(t\), we have current values \((\theta^{(t)}, \sigma^{2(t)})\). Then:
Step 1: Sample \(\theta^{(t+1)}\)
We draw from the conditional distribution:
\[ \theta^{(t+1)} \sim N\left(\mu_p, \frac{\sigma^{2(t)}}{n + m}\right) \]
Notice we condition on \(\sigma^{2(t)}\) — the current value from the previous iteration.
Step 2: Sample \(\sigma^{2(t+1)}\)
We first sample the precision:
\[ \lambda^{2(t+1)} \sim \text{Gamma}\left(\frac{n}{2} + a, \; b + \frac{1}{2}\sum_{i=1}^n (y_i - \theta^{(t+1)})^2 + \frac{m}{2}(\theta^{(t+1)} - \mu_0)^2\right) \]
Then set \(\sigma^{2(t+1)} = 1 / \lambda^{2(t+1)}\).
Notice we condition on \(\theta^{(t+1)}\) — the newly sampled value from Step 1.
| Gibbs Sampler Requirement | How Our Code Meets It |
|---|---|
| Sample from full conditional distributions | We sample \(\theta\) from \(p(\theta \mid \sigma^2, y)\) and \(\sigma^2\) from \(p(\sigma^2 \mid \theta, y)\) |
| Sequential updates | We update \(\theta\) first, then \(\sigma^2\) using the new \(\theta\) |
| Use most recent values | \(\theta^{(t+1)}\) uses \(\sigma^{2(t)}\); \(\sigma^{2(t+1)}\) uses \(\theta^{(t+1)}\) |
| Exact sampling (not Metropolis) | We use rnorm() and rgamma() which draw
directly from the exact distributions |
| Creates a Markov chain | Each draw \((\theta^{(t+1)}, \sigma^{2(t+1)})\) depends only on \((\theta^{(t)}, \sigma^{2(t)})\) |
The Gibbs sampler generates a sequence:
\[ (\theta^{(0)}, \sigma^{2(0)}) \rightarrow (\theta^{(1)}, \sigma^{2(1)}) \rightarrow (\theta^{(2)}, \sigma^{2(2)}) \rightarrow \cdots \]
where each new state depends only on the previous state. This is a Markov chain whose stationary distribution is the joint posterior:
\[ p(\theta, \sigma^2 \mid y) \]
After a sufficient number of iterations (burn-in), the draws are approximately from the posterior distribution.
After discarding the first \(B\) burn-in iterations, we estimate:
\[ \begin{aligned} \hat{E}[\theta \mid y] &= \frac{1}{N-B} \sum_{t=B+1}^N \theta^{(t)} \\ \hat{E}[\sigma^2 \mid y] &= \frac{1}{N-B} \sum_{t=B+1}^N \sigma^{2(t)} \\ \widehat{\text{Var}}[\theta \mid y] &= \frac{1}{N-B-1} \sum_{t=B+1}^N (\theta^{(t)} - \hat{E}[\theta \mid y])^2 \end{aligned} \]
The coefficient of variation is estimated as:
\[ \hat{E}\left[\frac{\sigma}{\theta} \mid y\right] = \frac{1}{N-B} \sum_{t=B+1}^N \frac{\sigma^{2(t)}}{\theta^{(t)}} \]
# Gibbs Sampler for Normal Distribution with Both Parameters Unknown
# New York Air Pollution Data
# Data
ny_pollution <- c(
45.1, 48.3, 54.7, 45.0, 43.9, 55.4, 51.1, 44.1, 44.8, 41.2,
45.9, 45.9, 48.9, 46.3, 50.0, 48.2, 45.6, 39.9, 54.3, 57.6,
48.3, 46.1, 53.0, 48.2, 44.4, 39.2, 52.8, 52.4
)
# Hyperparameters
m <- 1
mu0 <- mean(ny_pollution) # prior mean = sample mean
a <- 2
b <- 1
# Data statistics
n <- length(ny_pollution)
y_bar <- mean(ny_pollution)
# Gibbs sampling
N_iter <- 10000
burn_in <- 1000
theta_samples <- numeric(N_iter)
sigma2_samples <- numeric(N_iter)
# Initial values
theta_current <- y_bar
sigma2_current <- var(ny_pollution)
set.seed(123)
for (iter in 1:N_iter) {
# Sample θ
mu_n <- (n * y_bar + m * mu0) / (n + m)
var_n <- sigma2_current / (n + m)
theta_current <- rnorm(1, mean = mu_n, sd = sqrt(var_n))
# Sample σ²
shape <- n/2 + a
ss <- sum((ny_pollution - theta_current)^2)
rate <- b + 0.5 * ss + 0.5 * m * (theta_current - mu0)^2
sigma2_current <- 1 / rgamma(1, shape = shape, rate = rate)
theta_samples[iter] <- theta_current
sigma2_samples[iter] <- sigma2_current
}
# Remove burn-in
theta_post <- theta_samples[(burn_in+1):N_iter]
sigma2_post <- sigma2_samples[(burn_in+1):N_iter]
# Calculate estimates
E_theta <- mean(theta_post)
Var_theta <- var(theta_post)
E_sigma2 <- mean(sigma2_post)
Var_sigma2 <- var(sigma2_post)
# Coefficient of variation
cv_samples <- sqrt(sigma2_post) / theta_post
E_cv <- mean(cv_samples)
lower_cv <- quantile(cv_samples, 0.025)
upper_cv <- quantile(cv_samples, 0.975)
# Credible intervals
theta_lower <- quantile(theta_post, 0.025)
theta_upper <- quantile(theta_post, 0.975)
sigma2_lower <- quantile(sigma2_post, 0.025)
sigma2_upper <- quantile(sigma2_post, 0.975)
| Parameter | Quantity | Value |
|---|---|---|
| θ | Posterior Mean | 47.869 |
| θ | Posterior Variance | 0.694 |
| θ | 95% Credible Interval | (46.247, 49.522) |
| σ² | Posterior Mean | 20.567 |
| σ² | Posterior Variance | 31.132 |
| σ² | 95% Credible Interval | (12.399, 33.981) |
| σ/θ | Posterior Mean | 0.094 |
| σ/θ | 95% Credible Interval | (0.073, 0.122) |
| σ/θ | Posterior Variance | 0.000156 |
The code implements a Gibbs sampler because: