Rejection sampling is a method for generating independent samples from a target posterior distribution \(\pi(\theta \mid y)\) using a proposal density \(g(\theta)\). This document provides:
| Symbol | Meaning |
|---|---|
| \(\pi(\theta \mid y)\) | Target posterior distribution |
| \(h(\theta)\) | Unnormalized posterior: Likelihood \(\times\) Prior |
| \(g(\theta)\) | Proposal (importance) density |
| \(M\) | \(\sup_{\theta} \frac{h(\theta)}{g(\theta)}\) (maximum ratio) |
| \(\alpha(\theta)\) | Acceptance probability: \(\frac{h(\theta)}{M g(\theta)}\) |
| \(C\) | Normalizing constant: \(\int h(\theta) d\theta\) |
| \(U\) | Uniform random variable on \((0,1)\) |
The target posterior is:
\[ \pi(\theta \mid y) = \frac{h(\theta)}{\int h(\theta) d\theta} = \frac{h(\theta)}{C} \]
where \(C = \int h(\theta) d\theta\) is the normalizing constant (often unknown).
The algorithm has the following steps:
The quantity \(\alpha(\theta)\) is called the acceptance probability of a candidate sample \(\theta\) generated from the importance sampling distribution.
Note that to implement the method we do not need the normalizing constant in \(\pi(\theta \mid y)\).
We now show that a sample drawn using the rejection sampling method has the exact distribution \(\pi(\theta \mid y)\).
Define:
For a given \(\theta\):
\[ P(\text{retain} \mid \theta) = \alpha(\theta) = \frac{h(\theta)}{M g(\theta)} \]
By the law of total probability:
\[ \begin{aligned} P(\text{retain}) &= \int_{-\infty}^{\infty} P(\text{retain} \mid \theta) \, g(\theta) \, d\theta \\ &= \int_{-\infty}^{\infty} \frac{h(\theta)}{M g(\theta)} \, g(\theta) \, d\theta \\ &= \frac{1}{M} \int_{-\infty}^{\infty} h(\theta) \, d\theta \\ &= \frac{C}{M} \end{aligned} \]
where \(C = \int_{-\infty}^{\infty} h(\theta) d\theta\).
This step explicitly uses the uniform random variable \(U\):
The event {\(\theta \leq c\) and retain} means: - \(\theta \leq c\) (from the first draw) - \(U \leq \alpha(\theta)\) (acceptance condition)
Since \(\theta\) and \(U\) are independent, their joint density is \(g(\theta) \times 1\) (because \(U\) has density 1 on \([0,1]\)).
Thus:
\[ \begin{aligned} P(\theta \leq c \text{ and retain}) &= \int_{\theta \leq c} \int_{u=0}^{\alpha(\theta)} g(\theta) \cdot 1 \, du \, d\theta \\ &= \int_{\theta \leq c} g(\theta) \cdot \alpha(\theta) \, d\theta \\ &= \int_{\theta \leq c} g(\theta) \cdot \frac{h(\theta)}{M g(\theta)} \, d\theta \\ &= \frac{1}{M} \int_{\theta \leq c} h(\theta) \, d\theta \end{aligned} \]
By definition of conditional probability:
\[ \begin{aligned} P(\theta \leq c \mid \text{retain}) &= \frac{P(\theta \leq c \text{ and retain})}{P(\text{retain})} \\ &= \frac{\frac{1}{M} \int_{\theta \leq c} h(\theta) \, d\theta}{\frac{C}{M}} \\ &= \frac{\int_{\theta \leq c} h(\theta) \, d\theta}{C} \end{aligned} \]
The cumulative distribution function of \(\pi(\theta \mid y) = h(\theta)/C\) is:
\[ \int_{\theta \leq c} \pi(\theta \mid y) \, d\theta = \int_{\theta \leq c} \frac{h(\theta)}{C} \, d\theta = \frac{\int_{\theta \leq c} h(\theta) \, d\theta}{C} \]
Therefore:
\[ P(\theta \leq c \mid \text{retain}) = \int_{\theta \leq c} \pi(\theta \mid y) \, d\theta \]
The retained \(\theta\) has exactly the target distribution \(\pi(\theta \mid y)\). This derivation shows that the rejection sampling method generates exact draws from \(\pi(\theta \mid y)\) as claimed.
Notice that \(C = \int h(\theta) d\theta\) cancels out completely in the final expression. We only need:
This is powerful because the normalizing constant \(C\) is often impossible to compute directly, especially in high-dimensional Bayesian problems.
For example, if the prior distribution \(\pi(\theta)\) is taken as the importance sampling distribution \(g(\theta)\), then:
\[ M = \sup_{\theta} \frac{h(\theta)}{g(\theta)} = \sup_{\theta} \frac{\text{Likelihood} \times \text{Prior}}{\text{Prior}} = \max_{\theta} \text{Likelihood} \]
This strategy works for low-dimensional problems where it is easy to find the maximum likelihood estimate.
Suppose we want to sample from: - Target: Beta(\(a=3\), \(b=5\)) posterior distribution - Proposal: Uniform(0,1) which equals Beta(1,1) - \(h(\theta)\): \(\theta^{a-1}(1-\theta)^{b-1} = \theta^{2}(1-\theta)^{4}\) (unnormalized Beta)
Since \(g(\theta)=1\) on \([0,1]\):
\[ M = \sup_{\theta} \frac{h(\theta)}{g(\theta)} = \max_{\theta \in [0,1]} \theta^{2}(1-\theta)^{4} \]
Find the maximum by taking the derivative:
Let \(f(\theta) = \theta^{2}(1-\theta)^{4}\)
\[ \begin{aligned} f'(\theta) &= 2\theta(1-\theta)^{4} - 4\theta^{2}(1-\theta)^{3} \\ &= 2\theta(1-\theta)^{3}[(1-\theta) - 2\theta] \\ &= 2\theta(1-\theta)^{3}(1-3\theta) \end{aligned} \]
Set \(f'(\theta)=0\): \(\theta=0\), \(\theta=1\), or \(\theta=1/3\)
The maximum is at \(\theta=1/3\):
\[ f(1/3) = (1/3)^{2}(2/3)^{4} = \frac{1}{9} \times \frac{16}{81} = \frac{16}{729} \approx 0.02195 \]
Thus \(M = 16/729 \approx 0.02195\)
\[ \alpha(\theta) = \frac{h(\theta)}{M g(\theta)} = \frac{\theta^{2}(1-\theta)^{4}}{16/729} = \frac{729}{16} \theta^{2}(1-\theta)^{4} \]
Note: \(\alpha(\theta) \leq 1\) for all \(\theta\) (by definition of \(M\))
Let’s manually simulate one candidate \(\theta = 0.5\):
# Unnormalized target at θ = 0.5
h_theta <- (0.5)^2 * (1-0.5)^4
h_theta
## [1] 0.015625
# M value
M <- 16/729
# Proposal density (Uniform)
g_theta <- 1
# Acceptance probability
alpha <- h_theta / (M * g_theta)
alpha
## [1] 0.7119141
Since \(\alpha (0.5)=0.142<1\), only about 14.2% of \(U\) draws would accept this \(\theta\).
ow let’s manually run through the algorithm for 5 candidates, showing each step explicitly:
set.seed(123) # for reproducibility
# Parameters
M <- 16/729
# Function to compute alpha
alpha <- function(theta) {
h_theta <- theta^2 * (1-theta)^4
return(h_theta / M)
}
# Manual rejection sampling for 5 candidates
n_candidates <- 5
results <- data.frame(
candidate = 1:n_candidates,
theta = numeric(n_candidates),
U = numeric(n_candidates),
alpha = numeric(n_candidates),
accepted = logical(n_candidates)
)
for(i in 1:n_candidates) {
# Step 1: Draw θ from Uniform(0,1)
theta_i <- runif(1)
# Step 2: Draw U from Uniform(0,1)
U_i <- runif(1)
# Step 3: Compute α(θ)
alpha_i <- alpha(theta_i)
# Step 4: Accept if U ≤ α(θ)
accept_i <- (U_i <= alpha_i)
results[i, ] <- list(i, theta_i, U_i, alpha_i, accept_i)
}
results
## candidate theta U alpha accepted
## 1 1 0.2875775 0.7883051 0.9706616548 TRUE
## 2 2 0.4089769 0.8830174 0.9298714364 TRUE
## 3 3 0.9404673 0.0455565 0.0005061948 FALSE
## 4 4 0.5281055 0.8924190 0.6301277611 FALSE
## 5 5 0.5514350 0.4566147 0.5609158393 TRUE
Now let’s generate many accepted samples and compare them to the true Beta(3,5) density:
rejection_sampling <- function(n_accept, max_iter = 100000) {
accepted_theta <- numeric(n_accept)
accepted_count <- 0
iter <- 0
while(accepted_count < n_accept && iter < max_iter) {
iter <- iter + 1
# Draw θ and U
theta <- runif(1)
U <- runif(1)
# Compute acceptance probability
alpha_val <- (theta^2 * (1-theta)^4) / M
# Accept if U ≤ α(θ)
if(U <= alpha_val) {
accepted_count <- accepted_count + 1
accepted_theta[accepted_count] <- theta
}
}
return(accepted_theta)
}
# Generate 5000 accepted samples
set.seed(456)
samples <- rejection_sampling(5000)
# Compare with true Beta(3,5) density
hist(samples, breaks = 30, freq = FALSE,
main = "Rejection Sampling vs True Beta(3,5)",
xlab = expression(theta), col = "lightblue", border = "white")
# Add true Beta(3,5) density
curve(dbeta(x, 3, 5), add = TRUE, col = "red", lwd = 2)
# Add legend
legend("topright", legend = c("Rejection samples", "True Beta(3,5)"),
fill = c("lightblue", NA), col = c(NA, "red"), lwd = c(NA, 2))
cat("Sample mean:", mean(samples), "\n")
## Sample mean: 0.3747278
cat("True mean (Beta(3,5)):", 3/(3+5), "\n\n")
## True mean (Beta(3,5)): 0.375
cat("Sample variance:", var(samples), "\n")
## Sample variance: 0.02560176
cat("True variance (Beta(3,5)):", (3*5)/((3+5)^2*(3+5+1)), "\n")
## True variance (Beta(3,5)): 0.02604167
The histogram shows that the rejection sampling method successfully generates samples that follow the true Beta(3,5) distribution.
| Step | Operation | Result |
|---|---|---|
| 1 | Compute \(P(\text{retain} \mid \theta)\) | \(\frac{h(\theta)}{M g(\theta)}\) |
| 2 | Compute joint \(P(\theta \leq c, \text{retain})\) | \(\frac{1}{M} \int_{\theta \leq c} h(\theta) d\theta\) |
| 3 | Compute marginal \(P(\text{retain})\) | \(\frac{C}{M}\) |
| 4 | Divide: \(P(\theta \leq c \mid \text{retain})\) | \(\frac{\int_{\theta \leq c} h(\theta) d\theta}{C}\) |