Suppose you want to sample from a joint distribution \(p(a, b)\), but it’s difficult to sample directly. If you can compute:
then you can sample alternately from these conditional distributions to simulate from the joint.
As \(t \to \infty\), the samples \((a^{(t)}, b^{(t)})\) approximate the target joint distribution.
# Number of iterations
n_iter <- 6000
# Storage vectors
a_vals <- numeric(n_iter)
b_vals <- numeric(n_iter)
# Initial value (within support)
b_vals[1] <- 0.1
# Gibbs sampler
for (t in 2:n_iter) {
# Sample a | b
a_max <- 1 - 3 * b_vals[t - 1]
a_vals[t] <- runif(1, min = 0, max = a_max)
# Sample b | a
b_max <- (1 - a_vals[t]) / 3
b_vals[t] <- runif(1, min = 0, max = b_max)
}
# Plotting trace and scatter
par(mfrow = c(2, 2))
plot(a_vals, type = 'l', col = 'blue', main = 'Trace plot of a')
plot(b_vals, type = 'l', col = 'darkgreen', main = 'Trace plot of b')
plot(a_vals, b_vals, pch = 16, cex = 0.5, col = 'purple',
main = 'Samples in (a,b)', xlab = 'a', ylab = 'b')
hist(a_vals, breaks = 30, col = 'skyblue', main = 'Histogram of a')
Estimate E(A) and E(B)
Early draws might still reflect the arbitrary starting point. After some iterations, the chain forgets where it started and starts producing samples from the true joint distribution. Burn-in ensures we only keep samples that reflect the long-run behavior.
burn_in <- 1000 # discard initial values (burn-in period)
# Use post-burn-in samples
mean_a <- mean(a_vals[(burn_in + 1):n_iter])
mean_b <- mean(b_vals[(burn_in + 1):n_iter])
cat("Estimated E(A):", round(mean_a, 4), "\n")
## Estimated E(A): 0.333
cat("Estimated E(B):", round(mean_b, 4), "\n")
## Estimated E(B): 0.1119
The averages lie comfortably within those bounds, and reflect the triangular region’s skewed shape.
Estimate E(AB) : we need to Compute the elementwise product of A and B and average it over the post-burn-in samples:
# Post-burn-in product
ab_vals <- a_vals[(burn_in + 1):n_iter] * b_vals[(burn_in + 1):n_iter]
mean_ab <- mean(ab_vals)
cat("Estimated E(AB):", round(mean_ab, 4), "\n")
## Estimated E(AB): 0.0277
AR sampling lets you simulate from a complex distribution by:
Sampling from an easy one,
Keeping only samples that fall under the shape of the target distribution.
## Acceptance-Rejection Estimates
# Generate uniform proposals in bounding box
ar_sample <- runif(2 * 5000, 0, c(1, 1/3)) |>
matrix(5000, 2, byrow = TRUE,
dimnames = list(draws = 1:5000, variables = c("A", "B"))) |>
as.data.frame() |>
subset(A + 3 * B < 1)
# Estimate expectations from accepted draws
ar_EA <- mean(ar_sample$A)
ar_EB <- mean(ar_sample$B)
ar_EAB <- mean(ar_sample$A * ar_sample$B)
cat("AR Estimated E(A):", round(ar_EA, 4), "\n")
## AR Estimated E(A): 0.3338
cat("AR Estimated E(B):", round(ar_EB, 4), "\n")
## AR Estimated E(B): 0.1102
cat("AR Estimated E(AB):", round(ar_EAB, 4), "\n")
## AR Estimated E(AB): 0.0273
The differences are tiny, which is what you’d expect if both methods are working correctly.Slight variation is normal due to randomness.
If repeated with more iterations, the estimates would likely converge even closer.
Both methods are unbiased, but Gibbs is often more efficient in higher dimensions, while AR is easier in bounded regions like your triangle.
| Assignment.1..Q2. | Assignment.2..Q2. |
|---|---|
| Assumes normal data with known variance | Generalizes to unknown variance |
| Uses normal prior on \(\mu\) | Adds exponential prior on \(\sigma^{-2}\) |
| Derives posterior for \(\mu \mid y\) | Derives joint posterior for \((\mu, \sigma^2) \mid y\) |
| Closed-form result for \(\mu \mid y\) | Uses normal-inverse gamma posterior structure |
| Mentions flat prior as alternative | Uses prior structure to simulate in Bayesian hierarchy |
# Load libraries
library(ggplot2)
library(extraDistr) # for dinvgamma()
library(pracma) #Normalize prior
# Set true parameters and sample size
set.seed(123)
n <- 10
mu_true <- 2
sigma2_true <- 1 # true value of sigma^2
# Simulate data
y <- rnorm(n, mean = mu_true, sd = sqrt(sigma2_true))
y_bar <- mean(y)
s2 <- var(y)
# Prior for sigma^2: exponential on precision --> p(sigma^2) = (1/sigma^2)^2 * exp(-1/sigma^2)
prior_density <- function(s2) {
(1 / s2^2) * exp(-1 / s2)
}
# Posterior parameters (Inverse-Gamma)
shape_post <- (n + 2) / 2
rate_post <- 0.5 * (2 + (n - 1) * s2 + (n / (n + 1)) * y_bar^2)
# Define range for plotting
s2_vals <- seq(0.01, 4, length.out = 500)
# Densities
prior_vals <- prior_density(s2_vals)
posterior_vals <- dinvgamma(s2_vals, shape_post, rate_post)
# Normalize prior (optional)
prior_vals <- prior_vals / trapz(s2_vals, prior_vals)
# Create data frame
df <- data.frame(
s2 = s2_vals,
Prior = prior_vals,
Posterior = posterior_vals
)
# Plot
ggplot(df, aes(x = s2)) +
geom_line(aes(y = Prior), color = "blue", linewidth = 1, linetype = "dashed") +
geom_line(aes(y = Posterior), color = "darkgreen", linewidth = 1) +
geom_vline(xintercept = sigma2_true, color = "red", linetype = "dotted", linewidth = 1) +
labs(
title = expression("Prior and Posterior of " ~ sigma^2),
x = expression(sigma^2),
y = "Density"
) +
theme_minimal() +
annotate("text", x = sigma2_true + 0.2, y = max(posterior_vals),
label = expression(sigma[true]^2), color = "red")
The true value lies close to the mode of the posterior distribution, indicating that the observed data (even with only \(n=10\)) was informative enough to shift the posterior away from the prior and center it near the actual generating value. This demonstrates effective Bayesian learning from data.
Drawing from the Marginal Posterior of \(\mu\)
We draw samples from the marginal posterior of \(\mu\) by numerically integrating out \(\sigma^2\) using sampling:
That is, we proceed as follows:
Sample \(\sigma^2 \sim p(\sigma^2 \mid y)\), which is an inverse-gamma distribution: \[ \sigma^2 \mid y \sim \mathcal{G}^{-1}\left( \frac{n+2}{2},\ \frac{1}{2} \left[2 + (n - 1)s^2 + \frac{n}{n+1} \bar{y}^2 \right] \right) \]
Then sample: \[ \mu \mid y, \sigma^2 \sim \mathcal{N}\left( \frac{n}{n+1} \bar{y},\ \frac{1}{n+1} \sigma^2 \right) \]
By repeating these two steps many times, we generate Monte Carlo draws from the marginal posterior distribution \(p(\mu \mid y)\).
# Gibbs-like sampling:
set.seed(123)
# Number of samples
N <- 5000
# Preallocate vector for mu draws
mu_draws <- numeric(N)
# Mean and variance of conditional posterior of mu given sigma^2
mu_mean_factor <- n / (n + 1)
mu_var_factor <- 1 / (n + 1)
# Sample from posterior of sigma^2 and then from conditional of mu | sigma^2
for (i in 1:N) {
sigma2_sample <- rinvgamma(1, shape_post, rate_post)
mu_draws[i] <- rnorm(1, mean = mu_mean_factor * y_bar,
sd = sqrt(mu_var_factor * sigma2_sample))
}
# Prior for mu: mu ~ N(0, sigma^2), but marginal prior is heavy-tailed.
# For plotting, we approximate prior with N(0, mean posterior sigma^2)
prior_mu_sd <- sqrt(mean(rinvgamma(N, shape_post, rate_post))) # rough approx
mu_vals <- seq(-1, 5, length.out = 300)
prior_mu_density <- dnorm(mu_vals, mean = 0, sd = prior_mu_sd)
# Plot posterior and prior of mu
ggplot() +
geom_histogram(aes(x = mu_draws, y = after_stat(density)), bins = 60,
fill = "darkgreen", alpha = 0.4, color = "darkgreen") +
geom_line(aes(x = mu_vals, y = prior_mu_density),
color = "blue", linetype = "dashed", linewidth = 1) +
geom_vline(xintercept = mu_true, color = "red", linetype = "dotted", linewidth = 1) +
labs(title = expression("Prior and Posterior Distribution of " * mu),
x = expression(mu), y = "Density") +
annotate("text", x = mu_true + 0.3, y = max(density(mu_draws)$y),
label = expression(mu[true]), color = "red") +
theme_minimal()
The plot above shows the posterior distribution of \(\mu\), obtained by integrating out the uncertainty in \(\sigma^2\) using Monte Carlo sampling.
Where does the true value \(\mu_{\text{true}} = 2\) lie?
The true value lies very close to the center of the posterior distribution, which suggests that the Bayesian model has successfully learned the true mean from the data — even when the variance \(\sigma^2\) is unknown.
This result highlights that integrating over \(\sigma^2\) properly accounts for variance uncertainty, yet still produces a posterior that is sharply concentrated around the true value of \(\mu\), demonstrating effective Bayesian inference.
comparing:
The empirical posterior \(p(\mu \mid y)\) from simulation (Part b)
With the analytical posterior \(t_(n+2)\) \((mean,scale^2)\)
# --- Part (c): Compare posterior simulation vs. analytical t-distribution ---
# Analytical posterior parameters
mu_mean <- (n / (n + 1)) * y_bar
mu_var <- (2 + (n - 1) * s2 + (n / (n + 1)) * y_bar^2) / ((n + 1) * (n + 2))
mu_sd <- sqrt(mu_var)
# Generate density for analytical posterior (t-distribution with df = n + 2)
mu_vals <- seq(min(mu_draws) - 0.5, max(mu_draws) + 0.5, length.out = 300)
analytical_density <- dt((mu_vals - mu_mean) / mu_sd, df = n + 2) / mu_sd
# Create data frame for plotting
df_compare <- data.frame(mu = mu_vals, Analytical = analytical_density)
# Plot: simulated posterior vs analytical t-density
ggplot() +
geom_histogram(aes(x = mu_draws, y = after_stat(density)), bins = 60,
fill = "darkgreen", alpha = 0.4, color = "darkgreen") +
geom_line(data = df_compare, aes(x = mu, y = Analytical), color = "blue", linewidth = 1) +
geom_vline(xintercept = mu_true, color = "red", linetype = "dotted") +
labs(
title = expression("Posterior of " * mu * ": Simulation vs Analytical"),
x = expression(mu), y = "Density"
) +
theme_minimal() +
annotate("text", x = mu_true + 0.2, y = max(analytical_density),
label = expression(mu[true]), color = "red")
Comparison with Analytical Posterior
The histogram shows the posterior distribution of \(\mu\) obtained via Monte Carlo sampling from the marginal posterior \(p(\mu \mid y)\), while the blue line represents the analytical posterior, which is a Student-\(t\) distribution with \(n + 2\) degrees of freedom:
\[ \mu \mid y \sim t_{n+2} \left( \frac{n}{n+1} \bar{y},\ \frac{2 + (n-1)s^2 + \frac{n}{n+1} \bar{y}^2}{(n+1)(n+2)} \right) \]
The simulated posterior closely matches the analytical posterior, both in shape and center.
The true value \(\mu_{\text{true}} = 2\) lies close to the posterior peak, demonstrating good estimation.
This strong agreement confirms the correctness of the Monte Carlo method used in Part (b), and also illustrates the theoretical result: the marginal posterior of \(\mu \mid y\) is a scaled and shifted Student-\(t\) distribution when variance is integrated out with an exponential prior on precision.
The small discrepancy in the tails is expected due to the finite number of samples, but overall the match is excellent.
# Compute analytical posterior moments
mu_analytical_mean <- (n / (n + 1)) * y_bar
mu_analytical_var <- (2 + (n - 1) * s2 + (n / (n + 1)) * y_bar^2) / (n * (n + 1))
mu_analytical_sd <- sqrt(mu_analytical_var)
# Compute empirical posterior moments from simulation
mu_sim_mean <- mean(mu_draws)
mu_sim_sd <- sd(mu_draws)
mu_sim_var <- var(mu_draws)
# Show both in a data frame
comparison <- data.frame(
Moment = c("Posterior Mean", "Posterior SD", "Posterior Variance"),
Analytical = c(mu_analytical_mean, mu_analytical_sd, mu_analytical_var),
Simulated = c(mu_sim_mean, mu_sim_sd, mu_sim_var)
)
# Display the comparison table
knitr::kable(comparison, digits = 4, caption = "Comparison of Posterior Moments: Analytical vs Simulated")
| Moment | Analytical | Simulated |
|---|---|---|
| Posterior Mean | 1.8860 | 1.8853 |
| Posterior SD | 0.3580 | 0.3551 |
| Posterior Variance | 0.1282 | 0.1261 |
Posterior Moment Comparison
The table above presents the analytical and simulated posterior moments for \(\mu\). The results show excellent agreement:
This confirms that the posterior approximation obtained via Monte Carlo sampling is both accurate and consistent with the exact analytical posterior derived from the \(t_{n+2}\) distribution.
Such agreement reinforces the correctness of the numerical approach and validates the Bayesian model’s inference under uncertainty in variance.
We implement a two-block Gibbs sampler for an AR(2) model with normal errors and an intercept. The sampler is based on the conditional posterior distributions outlined in slides 5, 6, and 9.
The model is applied to U.S. real GDP growth data. The function
gibbs_ar2() takes the observed time series and
hyperparameters \(\mathbf{b}_0\), \(\mathbf{B}_0\), \(c_0\), and \(C_0\), and returns a \(10{,}000 \times 4\) matrix of posterior
samples for \((\beta_0, \beta_1, \beta_2,
\sigma^2)\).
Gibbs Sampler for AR(2) Model with Intercept and Normal Errors
We’ll implement a Gibbs sampler to estimate the parameters of the AR(2) model:
\[ y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, \sigma^2) \]
with priors:
We wrote a function that does the following:
We also used the following priors:
# Load required packages
library(MASS) # for multivariate operations
library(mvtnorm) # for rmvnorm
library(tidyverse) # for convenience
# Define Gibbs sampler function
gibbs_ar2 <- function(y, b0, B0, c0, C0, n_iter = 10000) {
# --- Step 0: Prepare AR(2) regression data ---
T <- length(y)
Y <- y[3:T] # dependent variable
X <- cbind(1, y[2:(T - 1)], y[1:(T - 2)]) # intercept, y_{t-1}, y_{t-2}
N <- nrow(X)
d <- ncol(X)
# --- Step 1: Initialize containers ---
draws <- matrix(NA, nrow = n_iter, ncol = d + 1) # beta0, beta1, beta2, sigma^2
colnames(draws) <- c("beta0", "beta1", "beta2", "sigma2")
sigma2 <- 1 # initialize sigma2
B0_inv <- solve(B0)
# --- Step 2: Gibbs sampling loop ---
for (iter in 1:n_iter) {
# 2a , y ~ N(b_N, B_N)
BN_inv <- B0_inv + t(X) %*% X / sigma2
BN <- solve(BN_inv)
bN <- BN %*% (B0_inv %*% b0 + (t(X) %*% Y) / sigma2)
beta <- as.vector(rmvnorm(1, mean = bN, sigma = BN))
# 2b , y ~ Inv-Gamma(c_N, C_N)
e <- Y - X %*% beta
cN <- c0 + N / 2
CN <- C0 + 0.5 * sum(e^2)
sigma2 <- 1 / rgamma(1, shape = cN, rate = CN)
# Store current draw
draws[iter, ] <- c(beta, sigma2)
}
return(draws) # return all 10,000 draws (including burn-in)
}
We ran the Gibbs sampler twice using the same data but different prior beliefs:
First run: A diffuse
prior
\[
\mathbf{B}_0 = 10000 \cdot \mathbf{I}_3
\] This means we are not very confident in our prior and let the
data drive the results.
Second run: A tight prior
\[
\mathbf{B}_0 = \frac{1}{10000} \cdot \mathbf{I}_3
\] This reflects strong prior belief that \(\boldsymbol{\beta} = \mathbf{0}\).
After both runs, we plotted the posterior densities for all parameters to examine the effect of prior choice.
We then assessed whether \(p(\boldsymbol{\beta} \mid \mathbf{y})\) is sensitive to the prior.
# Load necessary libraries
library(tidyverse)
library(readr)
# Read real GDP dataset (assume it's cleaned)
gdp <- read_csv("C:/Users/mahna/OneDrive/Documents/WU-quantitative Finance/second year/S2-P1/BE/exercises/2/GDPC1.csv")
# Convert to GDP growth in %
gdp$growth <- 100 * c(NA, diff(log(gdp$GDPC1)))
gdp_growth <- na.omit(gdp$growth) # drop first NA
# --- Hyperparameters ---
b0 <- c(0, 0, 0)
c0 <- 3
C0 <- 2
I3 <- diag(3)
# --- Run sampler for Prior 1 (Diffuse Prior) ---
B0_large <- 10000 * I3
samples_large <- gibbs_ar2(gdp_growth, b0, B0_large, c0, C0)
# --- Run sampler for Prior 2 (Tight Prior) ---
B0_small <- (1 / 10000) * I3
samples_small <- gibbs_ar2(gdp_growth, b0, B0_small, c0, C0)
# Combine into tidy format
library(reshape2)
samples_df <- bind_rows(
as.data.frame(samples_large) %>% mutate(Prior = "Diffuse (10000 * I3)"),
as.data.frame(samples_small) %>% mutate(Prior = "Tight (1/10000 * I3)")
)
# Plot posterior densities
samples_df %>%
pivot_longer(cols = beta0:sigma2) %>%
ggplot(aes(x = value, fill = Prior)) +
geom_density(alpha = 0.4) +
facet_wrap(~name, scales = "free", ncol = 2) +
theme_minimal() +
labs(title = "Posterior Densities: Diffuse vs. Tight Prior",
x = "Parameter Value", y = "Density")
This plot compares the posterior distributions of the AR(2) parameters under two different prior choices:
We observe that:
Conclusion:
Yes, \(p(\boldsymbol{\beta} \mid
\mathbf{y})\) is sensitive to the prior choice. Informative
(tight) priors can dominate the data and indirectly affect other
parameters like \(\sigma^2\). With
diffuse priors, the data plays a more central role in shaping the
posterior.