The importance sampling estimator for the posterior expectation is:
\[ \hat{E}_{\pi}[b(\theta) \mid y] = \frac{\sum_{j=1}^{N} b(\theta^{(j)}) w^{(j)}}{\sum_{j=1}^{N} w^{(j)}} \]
where:
\[ w^{(j)} = \frac{h(\theta^{(j)})}{g(\theta^{(j)})} \]
\(E_{\pi}\) means expectation taken with respect to the posterior distribution \(\pi(\theta \mid y)\) — this is what we want but cannot compute directly
\(E_g\) means expectation taken with respect to the importance sampling distribution \(g(\theta)\) — this is what we can compute because we can sample from \(g\)
From the derivation in your text, the posterior expectation equals:
\[ E_{\pi}[b(\theta) \mid y] = \frac{E_g[b(\theta) w(\theta)]}{E_g[w(\theta)]} \]
This is a ratio of two expectations under \(g\):
Numerator: \(E_g[b(\theta) w(\theta)]\) — expected value of \(b(\theta) \times w(\theta)\)
Denominator: \(E_g[w(\theta)]\) — expected value of just \(w(\theta)\)
We draw \(N\) independent samples from \(g(\theta)\):
\[ \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(N)} \sim g(\theta) \]
Then we approximate each expectation by the sample average:
Numerator estimate:
\[ \hat{E}_g[b(\theta) w(\theta)] = \frac{1}{N} \sum_{j=1}^{N} b(\theta^{(j)}) w^{(j)} \]
Denominator estimate:
\[ \hat{E}_g[w(\theta)] = \frac{1}{N} \sum_{j=1}^{N} w^{(j)} \]
Taking the ratio (the \(1/N\) cancels out):
\[ \hat{E}_{\pi}[b(\theta) \mid y] = \frac{\frac{1}{N} \sum_{j=1}^{N} b(\theta^{(j)}) w^{(j)}}{\frac{1}{N} \sum_{j=1}^{N} w^{(j)}} = \frac{\sum_{j=1}^{N} b(\theta^{(j)}) w^{(j)}}{\sum_{j=1}^{N} w^{(j)}} \]
Think of the weights \(w^{(j)}\) as importance weights that correct for sampling from \(g\) instead of \(\pi\):
If \(\theta^{(j)}\) is likely under \(\pi\) but unlikely under \(g\), then \(w^{(j)}\) is large (boosting its contribution)
If \(\theta^{(j)}\) is unlikely under \(\pi\) but likely under \(g\), then \(w^{(j)}\) is small (down-weighting its contribution)
The denominator \(\sum w^{(j)}\) ensures the weights are properly normalized (sum to 1).
| Component | Description |
|---|---|
| \(\pi(\theta \mid y)\) | Target posterior distribution (difficult to sample from) |
| \(g(\theta)\) | Proposal/importance distribution (easy to sample from) |
| \(w^{(j)} = h(\theta^{(j)})/g(\theta^{(j)})\) | Importance weight (unnormalized) |
| \(b(\theta)\) | Function of interest (e.g., \(\theta\) for posterior mean) |
| \(\hat{E}_{\pi}[b(\theta) \mid y]\) | Importance sampling estimator |
We simulate n = 25 observations from N(θ = 1, 1) distribution.
With random seed fixed at 44, the resulting sample mean is ȳ =
0.847.
We perform importance sampling to estimate the posterior mean: - Prior: Standard Cauchy distribution - Proposal distribution: Standard Cauchy (same as prior) - Target distribution: Posterior ∝ Prior × Likelihood
We draw N = 10,000 samples from the standard Cauchy distribution:
# Set parameters
set.seed(44) # for reproducibility
n <- 25
true_theta <- 1
sigma <- 1
# Generate n observations from N(theta=1, variance=1)
y <- rnorm(n, mean = true_theta, sd = sigma)
# Calculate sample mean
ybar <- mean(y)
N <- 10000
# Draw from standard Cauchy (location=0, scale=1)
theta <- rcauchy(N, location = 0, scale = 1)
# Display first 10 samples
head(theta, 10)
## [1] 0.03761153 0.22766209 1.37844753 -1.62301391 -0.38655000 -4.73897934
## [7] -0.20646008 3.70470032 -4.29933192 0.45876771
The weight for each sample \(\theta^{(j)}\) is:
\[ w^{(j)} = \exp\left(-\frac{n}{2}(\theta^{(j)} - \bar{y})^2\right) \]
Since the prior (Cauchy) and proposal (Cauchy) cancel, the weight simplifies to just the likelihood.
# Calculate weights
weights <- exp(- (n/2) * (theta - ybar)^2)
# Display first 10 weights
head(weights, 10)
## [1] 2.779912e-04 8.279142e-03 2.927328e-02 7.604073e-34 5.496678e-09
## [6] 4.081481e-170 9.460315e-07 4.628972e-45 1.680587e-144 1.520421e-01
# Summary of weights
summary(weights)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000000 0.000000 0.000000 0.093150 0.005565 1.000000
The importance sampling estimator for the posterior mean is:
\[ \hat{E}_{\pi}[b(\theta) \mid y]=\bar{\theta}_{IS} = \frac{\sum_{j=1}^{N} \theta^{(j)} w^{(j)}}{\sum_{j=1}^{N} w^{(j)}} \]
# Calculate weighted average
posterior_mean <- sum(theta * weights) / sum(weights)
# Display result
posterior_mean
## [1] 0.8110299
The estimated posterior mean is 0.811.
Due to the heavy tails of the Cauchy distribution, many samples receive very small weights. The effective sample size tells us how many independent samples our weighted sample is equivalent to:
# Normalize weights
weights_norm <- weights / sum(weights)
# Calculate ESS
ESS <- 1 / sum(weights_norm^2)
# Display ESS
ESS
## [1] 1336.264
# Efficiency percentage
efficiency <- (ESS / N) * 100
efficiency
## [1] 13.36264
Only about 13.4% of the samples are effectively contributing to the estimate.
The posterior mean (0.811) differs from the sample mean (0.847) because:
With n = 25, the likelihood is fairly strong, but the Cauchy prior still exerts some influence, resulting in shrinkage from 0.847 to 0.811.
Here’s all the code together for easy copying:
set.seed(44)
n <- 25
true_theta <- 1
sigma <- 1
# Generate n observations from N(theta=1, variance=1)
y <- rnorm(n, mean = true_theta, sd = sigma)
# Calculate sample mean
ybar <- mean(y)
N <- 10000
# Step 1: Draw from Cauchy
theta <- rcauchy(N, 0, 1)
# Step 2: Calculate weights
weights <- exp(- (n/2) * (theta - ybar)^2)
# Step 3: Compute posterior mean
post_mean <- sum(theta * weights) / sum(weights)
print(post_mean) # Should be ~0.811
## [1] 0.8110299
# Optional: Check effective sample size
weights_norm <- weights / sum(weights)
ESS <- 1 / sum(weights_norm^2)
print(ESS)
## [1] 1336.264