In many Bayesian models, the likelihood \(p(x \mid \theta)\) is not directly available in closed form. Instead, it appears as an integral over latent (unobserved) variables:
\[p(x \mid \theta) = \int p(x \mid z, \theta) \, p(z \mid \theta) \, dz\]
where:
This structure arises naturally in:
The joint posterior for parameters of interest \(\theta\) and latent variables \(z\) is:
\[p(\theta, z \mid x) = \frac{p(x \mid z, \theta) \, p(z \mid \theta) \, p(\theta)}{p(x)}\]
The marginal posterior for \(\theta\) (integrating out nuisance parameter \(z\)) is:
\[p(\theta \mid x) = \int p(\theta, z \mid x) \, dz = \frac{\left[\int p(x \mid z, \theta) \, p(z \mid \theta) \, dz\right] p(\theta)}{\int \left[\int p(x \mid z, \theta) \, p(z \mid \theta) \, dz\right] p(\theta) \, d\theta}\]
Suppose we want to estimate the mean \(\mu\) of a population, but we cannot measure the true values \(z_i\) directly. Instead, we observe noisy measurements \(x_i\):
Latent process (true values):
\[z_i \mid \mu, \tau^2 \sim N(\mu, \tau^2), \quad i = 1, \ldots, n\]
Measurement process (observation error):
\[x_i \mid z_i, \sigma^2 \sim N(z_i, \sigma^2)\]
Here:
The marginal likelihood for a single observation is:
\[p(x_i \mid \mu, \tau^2, \sigma^2) = \int_{-\infty}^{\infty} \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - z_i)^2}{2\sigma^2}}}_{\text{measurement}} \underbrace{\frac{1}{\sqrt{2\pi\tau^2}} e^{-\frac{(z_i - \mu)^2}{2\tau^2}}}_{\text{latent process}} \, dz_i\]
This integral has a closed-form solution:
\[p(x_i \mid \mu, \tau^2, \sigma^2) = \frac{1}{\sqrt{2\pi(\tau^2 + \sigma^2)}} \exp\left(-\frac{(x_i - \mu)^2}{2(\tau^2 + \sigma^2)}\right)\]
For \(n\) independent observations, the full likelihood is:
\[p(x \mid \mu, \tau^2, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi(\tau^2 + \sigma^2)}} \exp\left(-\frac{(x_i - \mu)^2}{2(\tau^2 + \sigma^2)}\right)\]
Using a prior \(p(\mu, \tau^2)\), the posterior is:
\[p(\mu, \tau^2 \mid x) \propto p(x \mid \mu, \tau^2, \sigma^2) \, p(\mu, \tau^2)\]
Let’s simulate data and perform Bayesian inference using Stan, which handles latent variables automatically.
library(ggplot2)
# Set parameters
set.seed(123)
n <- 50
mu_true <- 5
tau_true <- 2
sigma <- 1 # known measurement error
# Generate latent true values
z_true <- rnorm(n, mean = mu_true, sd = tau_true)
# Generate observed data with measurement error
x_obs <- rnorm(n, mean = z_true, sd = sigma)
# Plot
df <- data.frame(
index = 1:n,
true = z_true,
observed = x_obs
)
ggplot(df) +
geom_segment(aes(x = index, xend = index, y = true, yend = observed),
alpha = 0.5, color = "gray50") +
geom_point(aes(x = index, y = observed, color = "Observed"), size = 2) +
geom_point(aes(x = index, y = true, color = "True (latent)"), size = 2) +
labs(title = "Latent True Values vs Noisy Observations",
x = "Observation Index", y = "Value") +
theme_minimal() +
scale_color_manual(values = c("Observed" = "red", "True (latent)" = "blue"))
Here’s the Stan code that explicitly models the latent variables \(z_i\):
data {
int<lower=0> n;
vector[n] x;
real<lower=0> sigma;
}
parameters {
real mu;
real<lower=0> tau;
vector[n] z; // latent variables (nuisance parameters)
}
model {
// Priors
mu ~ normal(0, 10);
tau ~ cauchy(0, 5);
// Latent process (prior on z)
z ~ normal(mu, tau);
// Measurement process (likelihood given latent z)
x ~ normal(z, sigma);
}
library(rstan)
options(mc.cores = parallel::detectCores())
stan_data <- list(
n = n,
x = x_obs,
sigma = sigma
)
fit <- stan(
model_code = stan_model_latent,
data = stan_data,
iter = 2000,
chains = 4
)
print(fit, pars = c("mu", "tau"))
For demonstration without Stan installed, we can implement a simple Gibbs sampler:
# Simple Gibbs sampler for the measurement error model
# (conditional conjugacy makes this efficient)
gibbs_measurement_error <- function(x, sigma2, n_iter = 10000, burnin = 5000) {
n <- length(x)
# Initialize
mu <- mean(x)
tau2 <- var(x)
z <- x # start with observed values
# Storage
mu_samples <- numeric(n_iter)
tau2_samples <- numeric(n_iter)
for (iter in 1:n_iter) {
# 1. Sample z_i | mu, tau2, x_i, sigma2
# Posterior for z_i is normal with:
# precision = 1/tau2 + 1/sigma2
# mean = (mu/tau2 + x_i/sigma2) / (1/tau2 + 1/sigma2)
prec_z <- 1/tau2 + 1/sigma2
mean_z <- (mu/tau2 + x/sigma2) / prec_z
z <- rnorm(n, mean = mean_z, sd = sqrt(1/prec_z))
# 2. Sample mu | z, tau2
# Posterior: normal with mean = mean(z), variance = tau2/n
mu_mean <- mean(z)
mu_var <- tau2 / n
mu <- rnorm(1, mean = mu_mean, sd = sqrt(mu_var))
# 3. Sample tau2 | z, mu
# Posterior: Inverse-Gamma
ss <- sum((z - mu)^2)
shape <- n/2
rate <- ss/2
tau2 <- 1 / rgamma(1, shape = shape, rate = rate)
mu_samples[iter] <- mu
tau2_samples[iter] <- tau2
}
list(
mu = mu_samples[-(1:burnin)],
tau2 = tau2_samples[-(1:burnin)]
)
}
# Run Gibbs sampler
sigma2 <- sigma^2
samples <- gibbs_measurement_error(x_obs, sigma2)
# Results
cat("Posterior mean of mu:", round(mean(samples$mu), 3), "\n")
## Posterior mean of mu: 5.212
cat("95% CI for mu:", round(quantile(samples$mu, c(0.025, 0.975)), 3), "\n")
## 95% CI for mu: 4.631 5.778
cat("\nPosterior mean of tau^2:", round(mean(samples$tau2), 3), "\n")
##
## Posterior mean of tau^2: 3.23
cat("95% CI for tau^2:", round(quantile(samples$tau2, c(0.025, 0.975)), 3), "\n")
## 95% CI for tau^2: 1.829 5.318
cat("\nTrue values: mu =", mu_true, ", tau^2 =", tau_true^2)
##
## True values: mu = 5 , tau^2 = 4
posterior_df <- data.frame(
mu = samples$mu,
tau2 = samples$tau2
)
# Trace plots
p1 <- ggplot(posterior_df, aes(x = 1:nrow(posterior_df), y = mu)) +
geom_line(alpha = 0.5) +
geom_hline(yintercept = mu_true, color = "red", linetype = "dashed") +
labs(title = "Trace Plot for mu", x = "Iteration", y = "mu") +
theme_minimal()
p2 <- ggplot(posterior_df, aes(x = 1:nrow(posterior_df), y = tau2)) +
geom_line(alpha = 0.5) +
geom_hline(yintercept = tau_true^2, color = "red", linetype = "dashed") +
labs(title = "Trace Plot for tau^2", x = "Iteration", y = "tau^2") +
theme_minimal()
# Histograms
p3 <- ggplot(posterior_df, aes(x = mu)) +
geom_histogram(fill = "skyblue", color = "black", alpha = 0.7, bins = 40) +
geom_vline(xintercept = mu_true, color = "red", linetype = "dashed", size = 1) +
labs(title = "Posterior Distribution of mu", x = "mu", y = "Frequency") +
theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p4 <- ggplot(posterior_df, aes(x = tau2)) +
geom_histogram(fill = "lightgreen", color = "black", alpha = 0.7, bins = 40) +
geom_vline(xintercept = tau_true^2, color = "red", linetype = "dashed", size = 1) +
labs(title = "Posterior Distribution of tau^2", x = "tau^2", y = "Frequency") +
theme_minimal()
# Arrange plots
gridExtra::grid.arrange(p1, p2, p3, p4, ncol = 2)
A more complex hierarchical model adds another level. Suppose observations come from \(J\) groups:
Group-level parameters:
\[\mu_j \mid \mu_0, \tau_0^2 \sim N(\mu_0, \tau_0^2), \quad j = 1, \ldots, J\]
Within-group latent variables:
\[z_{ij} \mid \mu_j, \sigma_z^2 \sim N(\mu_j, \sigma_z^2)\]
Observations:
\[x_{ij} \mid z_{ij}, \sigma_x^2 \sim N(z_{ij}, \sigma_x^2)\]
The likelihood then involves nested integrals:
\[p(x_{ij} \mid \mu_0, \tau_0^2, \sigma_z^2, \sigma_x^2) = \iint p(x_{ij} \mid z_{ij}, \sigma_x^2) \, p(z_{ij} \mid \mu_j, \sigma_z^2) \, p(\mu_j \mid \mu_0, \tau_0^2) \, dz_{ij} \, d\mu_j\]
This is the essence of hierarchical Bayesian modeling — uncertainty propagates through all levels of the hierarchy.