1. Introduction

In many Bayesian models, the likelihood \(p(x \mid \theta)\) is not directly available in closed form. Instead, it appears as an integral over latent (unobserved) variables:

\[p(x \mid \theta) = \int p(x \mid z, \theta) \, p(z \mid \theta) \, dz\]

where:

  • \(x\) = observed data (continuous)
  • \(\theta\) = parameters of interest
  • \(z\) = latent variables (nuisance parameters)

This structure arises naturally in:

  • Hierarchical models
  • State-space models
  • Measurement error models
  • Models with missing data

2. General Bayesian Framework with Latent Variables

The joint posterior for parameters of interest \(\theta\) and latent variables \(z\) is:

\[p(\theta, z \mid x) = \frac{p(x \mid z, \theta) \, p(z \mid \theta) \, p(\theta)}{p(x)}\]

The marginal posterior for \(\theta\) (integrating out nuisance parameter \(z\)) is:

\[p(\theta \mid x) = \int p(\theta, z \mid x) \, dz = \frac{\left[\int p(x \mid z, \theta) \, p(z \mid \theta) \, dz\right] p(\theta)}{\int \left[\int p(x \mid z, \theta) \, p(z \mid \theta) \, dz\right] p(\theta) \, d\theta}\]

3. Concrete Example: Measurement Error Model

3.1 Model Specification

Suppose we want to estimate the mean \(\mu\) of a population, but we cannot measure the true values \(z_i\) directly. Instead, we observe noisy measurements \(x_i\):

Latent process (true values):

\[z_i \mid \mu, \tau^2 \sim N(\mu, \tau^2), \quad i = 1, \ldots, n\]

Measurement process (observation error):

\[x_i \mid z_i, \sigma^2 \sim N(z_i, \sigma^2)\]

Here:

  • Parameters of interest: \(\theta = (\mu, \tau^2)\)
  • Nuisance parameters: \(z = (z_1, \ldots, z_n)\)
  • Known measurement variance: \(\sigma^2 = 1\) (for simplicity)

3.2 Likelihood as an Integral

The marginal likelihood for a single observation is:

\[p(x_i \mid \mu, \tau^2, \sigma^2) = \int_{-\infty}^{\infty} \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - z_i)^2}{2\sigma^2}}}_{\text{measurement}} \underbrace{\frac{1}{\sqrt{2\pi\tau^2}} e^{-\frac{(z_i - \mu)^2}{2\tau^2}}}_{\text{latent process}} \, dz_i\]

This integral has a closed-form solution:

\[p(x_i \mid \mu, \tau^2, \sigma^2) = \frac{1}{\sqrt{2\pi(\tau^2 + \sigma^2)}} \exp\left(-\frac{(x_i - \mu)^2}{2(\tau^2 + \sigma^2)}\right)\]

For \(n\) independent observations, the full likelihood is:

\[p(x \mid \mu, \tau^2, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi(\tau^2 + \sigma^2)}} \exp\left(-\frac{(x_i - \mu)^2}{2(\tau^2 + \sigma^2)}\right)\]

3.3 Bayesian Inference

Using a prior \(p(\mu, \tau^2)\), the posterior is:

\[p(\mu, \tau^2 \mid x) \propto p(x \mid \mu, \tau^2, \sigma^2) \, p(\mu, \tau^2)\]

4. Simulation Example in R

Let’s simulate data and perform Bayesian inference using Stan, which handles latent variables automatically.

library(ggplot2)
# Set parameters

set.seed(123)
n <- 50
mu_true <- 5
tau_true <- 2
sigma <- 1  # known measurement error

# Generate latent true values
z_true <- rnorm(n, mean = mu_true, sd = tau_true)

# Generate observed data with measurement error
x_obs <- rnorm(n, mean = z_true, sd = sigma)

# Plot
df <- data.frame(
  index = 1:n,
  true = z_true,
  observed = x_obs
)

ggplot(df) +
  geom_segment(aes(x = index, xend = index, y = true, yend = observed),
               alpha = 0.5, color = "gray50") +
  geom_point(aes(x = index, y = observed, color = "Observed"), size = 2) +
  geom_point(aes(x = index, y = true, color = "True (latent)"), size = 2) +
  labs(title = "Latent True Values vs Noisy Observations",
       x = "Observation Index", y = "Value") +
  theme_minimal() +
  scale_color_manual(values = c("Observed" = "red", "True (latent)" = "blue"))

4.1 Stan Model with Explicit Latent Variables

Here’s the Stan code that explicitly models the latent variables \(z_i\):

data {
  int<lower=0> n;
  vector[n] x;
  real<lower=0> sigma;
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[n] z;  // latent variables (nuisance parameters)
}

model {
  // Priors
  mu ~ normal(0, 10);
  tau ~ cauchy(0, 5);
  
  // Latent process (prior on z)
  z ~ normal(mu, tau);
  
  // Measurement process (likelihood given latent z)
  x ~ normal(z, sigma);
}

4.2 Fitting the Model

library(rstan)
options(mc.cores = parallel::detectCores())

stan_data <- list(
  n = n,
  x = x_obs,
  sigma = sigma
)

fit <- stan(
  model_code = stan_model_latent,
  data = stan_data,
  iter = 2000,
  chains = 4
)

print(fit, pars = c("mu", "tau"))

For demonstration without Stan installed, we can implement a simple Gibbs sampler:

# Simple Gibbs sampler for the measurement error model
# (conditional conjugacy makes this efficient)

gibbs_measurement_error <- function(x, sigma2, n_iter = 10000, burnin = 5000) {
  n <- length(x)
  
  # Initialize
  mu <- mean(x)
  tau2 <- var(x)
  z <- x  # start with observed values
  
  # Storage
  mu_samples <- numeric(n_iter)
  tau2_samples <- numeric(n_iter)
  
  for (iter in 1:n_iter) {
    # 1. Sample z_i | mu, tau2, x_i, sigma2
    # Posterior for z_i is normal with:
    # precision = 1/tau2 + 1/sigma2
    # mean = (mu/tau2 + x_i/sigma2) / (1/tau2 + 1/sigma2)
    prec_z <- 1/tau2 + 1/sigma2
    mean_z <- (mu/tau2 + x/sigma2) / prec_z
    z <- rnorm(n, mean = mean_z, sd = sqrt(1/prec_z))
    
    # 2. Sample mu | z, tau2
    # Posterior: normal with mean = mean(z), variance = tau2/n
    mu_mean <- mean(z)
    mu_var <- tau2 / n
    mu <- rnorm(1, mean = mu_mean, sd = sqrt(mu_var))
    
    # 3. Sample tau2 | z, mu
    # Posterior: Inverse-Gamma
    ss <- sum((z - mu)^2)
    shape <- n/2
    rate <- ss/2
    tau2 <- 1 / rgamma(1, shape = shape, rate = rate)
    
    mu_samples[iter] <- mu
    tau2_samples[iter] <- tau2
  }
  
  list(
    mu = mu_samples[-(1:burnin)],
    tau2 = tau2_samples[-(1:burnin)]
  )
}

# Run Gibbs sampler
sigma2 <- sigma^2
samples <- gibbs_measurement_error(x_obs, sigma2)

# Results
cat("Posterior mean of mu:", round(mean(samples$mu), 3), "\n")
## Posterior mean of mu: 5.212
cat("95% CI for mu:", round(quantile(samples$mu, c(0.025, 0.975)), 3), "\n")
## 95% CI for mu: 4.631 5.778
cat("\nPosterior mean of tau^2:", round(mean(samples$tau2), 3), "\n")
## 
## Posterior mean of tau^2: 3.23
cat("95% CI for tau^2:", round(quantile(samples$tau2, c(0.025, 0.975)), 3), "\n")
## 95% CI for tau^2: 1.829 5.318
cat("\nTrue values: mu =", mu_true, ", tau^2 =", tau_true^2)
## 
## True values: mu = 5 , tau^2 = 4

4.3 Posterior Visualization

posterior_df <- data.frame(
  mu = samples$mu,
  tau2 = samples$tau2
)

# Trace plots
p1 <- ggplot(posterior_df, aes(x = 1:nrow(posterior_df), y = mu)) +
  geom_line(alpha = 0.5) +
  geom_hline(yintercept = mu_true, color = "red", linetype = "dashed") +
  labs(title = "Trace Plot for mu", x = "Iteration", y = "mu") +
  theme_minimal()

p2 <- ggplot(posterior_df, aes(x = 1:nrow(posterior_df), y = tau2)) +
  geom_line(alpha = 0.5) +
  geom_hline(yintercept = tau_true^2, color = "red", linetype = "dashed") +
  labs(title = "Trace Plot for tau^2", x = "Iteration", y = "tau^2") +
  theme_minimal()

# Histograms
p3 <- ggplot(posterior_df, aes(x = mu)) +
  geom_histogram(fill = "skyblue", color = "black", alpha = 0.7, bins = 40) +
  geom_vline(xintercept = mu_true, color = "red", linetype = "dashed", size = 1) +
  labs(title = "Posterior Distribution of mu", x = "mu", y = "Frequency") +
  theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
p4 <- ggplot(posterior_df, aes(x = tau2)) +
  geom_histogram(fill = "lightgreen", color = "black", alpha = 0.7, bins = 40) +
  geom_vline(xintercept = tau_true^2, color = "red", linetype = "dashed", size = 1) +
  labs(title = "Posterior Distribution of tau^2", x = "tau^2", y = "Frequency") +
  theme_minimal()

# Arrange plots
gridExtra::grid.arrange(p1, p2, p3, p4, ncol = 2)

5. Hierarchical Extension: Group-Level Structure

A more complex hierarchical model adds another level. Suppose observations come from \(J\) groups:

Group-level parameters:

\[\mu_j \mid \mu_0, \tau_0^2 \sim N(\mu_0, \tau_0^2), \quad j = 1, \ldots, J\]

Within-group latent variables:

\[z_{ij} \mid \mu_j, \sigma_z^2 \sim N(\mu_j, \sigma_z^2)\]

Observations:

\[x_{ij} \mid z_{ij}, \sigma_x^2 \sim N(z_{ij}, \sigma_x^2)\]

The likelihood then involves nested integrals:

\[p(x_{ij} \mid \mu_0, \tau_0^2, \sigma_z^2, \sigma_x^2) = \iint p(x_{ij} \mid z_{ij}, \sigma_x^2) \, p(z_{ij} \mid \mu_j, \sigma_z^2) \, p(\mu_j \mid \mu_0, \tau_0^2) \, dz_{ij} \, d\mu_j\]

This is the essence of hierarchical Bayesian modeling — uncertainty propagates through all levels of the hierarchy.

6. Key Takeaways

  • Latent variables \(z\) are unobserved but part of the data-generating process
  • They become nuisance parameters when interest lies in \(\theta\) only
  • The likelihood \(p(x \mid \theta)\) is an integral over latent variables
  • Bayesian inference handles this naturally by:
    • Assigning priors to latent variables
    • Sampling from the joint posterior \(p(\theta, z \mid x)\)
    • Marginalizing to obtain \(p(\theta \mid x)\)
  • Computational methods (MCMC, Gibbs sampling, Stan) make these models practical