Rejection Sampling: Proof and Implementation

Introduction

Rejection sampling is a method for generating independent samples from a target posterior distribution \(\pi(\theta \mid y)\) using a proposal density \(g(\theta)\). This document provides:

A detailed step-by-step proof that retained samples follow the target distribution
Manual experiments to verify the proof
R code implementation of rejection sampling

Notation

Symbol	Meaning
\(\pi(\theta \mid y)\)	Target posterior distribution
\(h(\theta)\)	Unnormalized posterior: Likelihood \(\times\) Prior
\(g(\theta)\)	Proposal (importance) density
\(M\)	\(\sup_{\theta} \frac{h(\theta)}{g(\theta)}\) (maximum ratio)
\(\alpha(\theta)\)	Acceptance probability: \(\frac{h(\theta)}{M g(\theta)}\)
\(C\)	Normalizing constant: \(\int h(\theta) d\theta\)
\(U\)	Uniform random variable on \((0,1)\)

Target Distribution

The target posterior is:

\[ \pi(\theta \mid y) = \frac{h(\theta)}{\int h(\theta) d\theta} = \frac{h(\theta)}{C} \]

where \(C = \int h(\theta) d\theta\) is the normalizing constant (often unknown).

The Rejection Sampling Algorithm

The algorithm has the following steps:

Draw a random sample \(\theta \sim g(\theta)\)
Draw a random sample \(U \sim \text{Uniform}(0,1)\) independently
Compute \(\alpha(\theta) = \frac{h(\theta)}{M g(\theta)}\)
If \(U \leq \alpha(\theta)\): retain \(\theta\) as a sample from \(\pi(\theta \mid y)\)
Otherwise: reject and repeat from step 1

The quantity \(\alpha(\theta)\) is called the acceptance probability of a candidate sample \(\theta\) generated from the importance sampling distribution.

Note that to implement the method we do not need the normalizing constant in \(\pi(\theta \mid y)\).

Detailed Proof

We now show that a sample drawn using the rejection sampling method has the exact distribution \(\pi(\theta \mid y)\).

Step 1: Setup

Define:

\(M = \sup_{\theta} \frac{h(\theta)}{g(\theta)}\) (a known constant)
\(\alpha(\theta) = \frac{h(\theta)}{M g(\theta)}\) (acceptance probability)

Step 2: Probability a candidate is retained given \(\theta\)

For a given \(\theta\):

\[ P(\text{retain} \mid \theta) = \alpha(\theta) = \frac{h(\theta)}{M g(\theta)} \]

Step 3: Marginal probability of retention

By the law of total probability:

\[ \begin{aligned} P(\text{retain}) &= \int_{-\infty}^{\infty} P(\text{retain} \mid \theta) \, g(\theta) \, d\theta \\ &= \int_{-\infty}^{\infty} \frac{h(\theta)}{M g(\theta)} \, g(\theta) \, d\theta \\ &= \frac{1}{M} \int_{-\infty}^{\infty} h(\theta) \, d\theta \\ &= \frac{C}{M} \end{aligned} \]

where \(C = \int_{-\infty}^{\infty} h(\theta) d\theta\).

Step 4: Joint probability (\(\theta \leq c\) AND retained)

This step explicitly uses the uniform random variable \(U\):

The event {\(\theta \leq c\) and retain} means: - \(\theta \leq c\) (from the first draw) - \(U \leq \alpha(\theta)\) (acceptance condition)

Since \(\theta\) and \(U\) are independent, their joint density is \(g(\theta) \times 1\) (because \(U\) has density 1 on \([0,1]\)).

Thus:

\[ \begin{aligned} P(\theta \leq c \text{ and retain}) &= \int_{\theta \leq c} \int_{u=0}^{\alpha(\theta)} g(\theta) \cdot 1 \, du \, d\theta \\ &= \int_{\theta \leq c} g(\theta) \cdot \alpha(\theta) \, d\theta \\ &= \int_{\theta \leq c} g(\theta) \cdot \frac{h(\theta)}{M g(\theta)} \, d\theta \\ &= \frac{1}{M} \int_{\theta \leq c} h(\theta) \, d\theta \end{aligned} \]

Step 5: Conditional probability (\(\theta \leq c\) | retained)

By definition of conditional probability:

\[ \begin{aligned} P(\theta \leq c \mid \text{retain}) &= \frac{P(\theta \leq c \text{ and retain})}{P(\text{retain})} \\ &= \frac{\frac{1}{M} \int_{\theta \leq c} h(\theta) \, d\theta}{\frac{C}{M}} \\ &= \frac{\int_{\theta \leq c} h(\theta) \, d\theta}{C} \end{aligned} \]

Step 6: Recognize the target distribution

The cumulative distribution function of \(\pi(\theta \mid y) = h(\theta)/C\) is:

\[ \int_{\theta \leq c} \pi(\theta \mid y) \, d\theta = \int_{\theta \leq c} \frac{h(\theta)}{C} \, d\theta = \frac{\int_{\theta \leq c} h(\theta) \, d\theta}{C} \]

Therefore:

\[ P(\theta \leq c \mid \text{retain}) = \int_{\theta \leq c} \pi(\theta \mid y) \, d\theta \]

Conclusion

The retained \(\theta\) has exactly the target distribution \(\pi(\theta \mid y)\). This derivation shows that the rejection sampling method generates exact draws from \(\pi(\theta \mid y)\) as claimed.

Key Insight: Why We Don’t Need the Normalizing Constant

Notice that \(C = \int h(\theta) d\theta\) cancels out completely in the final expression. We only need:

\(M\) - the maximum ratio \(h(\theta)/g(\theta)\)
The ratio \(h(\theta)/g(\theta)\) - not the individual values

This is powerful because the normalizing constant \(C\) is often impossible to compute directly, especially in high-dimensional Bayesian problems.

For example, if the prior distribution \(\pi(\theta)\) is taken as the importance sampling distribution \(g(\theta)\), then:

\[ M = \sup_{\theta} \frac{h(\theta)}{g(\theta)} = \sup_{\theta} \frac{\text{Likelihood} \times \text{Prior}}{\text{Prior}} = \max_{\theta} \text{Likelihood} \]

This strategy works for low-dimensional problems where it is easy to find the maximum likelihood estimate.

Manual Experiment: Beta-Binomial Example

Problem Setup

Suppose we want to sample from: - Target: Beta(\(a=3\), \(b=5\)) posterior distribution - Proposal: Uniform(0,1) which equals Beta(1,1) - \(h(\theta)\): \(\theta^{a-1}(1-\theta)^{b-1} = \theta^{2}(1-\theta)^{4}\) (unnormalized Beta)

Step 1: Find M

Since \(g(\theta)=1\) on \([0,1]\):

\[ M = \sup_{\theta} \frac{h(\theta)}{g(\theta)} = \max_{\theta \in [0,1]} \theta^{2}(1-\theta)^{4} \]

Find the maximum by taking the derivative:

Let \(f(\theta) = \theta^{2}(1-\theta)^{4}\)

\[ \begin{aligned} f'(\theta) &= 2\theta(1-\theta)^{4} - 4\theta^{2}(1-\theta)^{3} \\ &= 2\theta(1-\theta)^{3}[(1-\theta) - 2\theta] \\ &= 2\theta(1-\theta)^{3}(1-3\theta) \end{aligned} \]

Set \(f'(\theta)=0\): \(\theta=0\), \(\theta=1\), or \(\theta=1/3\)

The maximum is at \(\theta=1/3\):

\[ f(1/3) = (1/3)^{2}(2/3)^{4} = \frac{1}{9} \times \frac{16}{81} = \frac{16}{729} \approx 0.02195 \]

Thus \(M = 16/729 \approx 0.02195\)

Step 2: Acceptance probability function

\[ \alpha(\theta) = \frac{h(\theta)}{M g(\theta)} = \frac{\theta^{2}(1-\theta)^{4}}{16/729} = \frac{729}{16} \theta^{2}(1-\theta)^{4} \]

Note: \(\alpha(\theta) \leq 1\) for all \(\theta\) (by definition of \(M\))

Step 3: Manual calculation for one candidate

Let’s manually simulate one candidate \(\theta = 0.5\):

# Unnormalized target at θ = 0.5
h_theta <- (0.5)^2 * (1-0.5)^4
h_theta

## [1] 0.015625

# M value
M <- 16/729

# Proposal density (Uniform)
g_theta <- 1

# Acceptance probability
alpha <- h_theta / (M * g_theta)
alpha

## [1] 0.7119141

Since \(\alpha (0.5)=0.142<1\), only about 14.2% of \(U\) draws would accept this \(\theta\).

Step 4: Complete manual experiment for 5 candidates

ow let’s manually run through the algorithm for 5 candidates, showing each step explicitly:

set.seed(123)  # for reproducibility

# Parameters
M <- 16/729

# Function to compute alpha
alpha <- function(theta) {
  h_theta <- theta^2 * (1-theta)^4
  return(h_theta / M)
}

# Manual rejection sampling for 5 candidates
n_candidates <- 5
results <- data.frame(
  candidate = 1:n_candidates,
  theta = numeric(n_candidates),
  U = numeric(n_candidates),
  alpha = numeric(n_candidates),
  accepted = logical(n_candidates)
)

for(i in 1:n_candidates) {
  # Step 1: Draw θ from Uniform(0,1)
  theta_i <- runif(1)
  
  # Step 2: Draw U from Uniform(0,1)
  U_i <- runif(1)
  
  # Step 3: Compute α(θ)
  alpha_i <- alpha(theta_i)
  
  # Step 4: Accept if U ≤ α(θ)
  accept_i <- (U_i <= alpha_i)
  
  results[i, ] <- list(i, theta_i, U_i, alpha_i, accept_i)
}

results

##   candidate     theta         U        alpha accepted
## 1         1 0.2875775 0.7883051 0.9706616548     TRUE
## 2         2 0.4089769 0.8830174 0.9298714364     TRUE
## 3         3 0.9404673 0.0455565 0.0005061948    FALSE
## 4         4 0.5281055 0.8924190 0.6301277611    FALSE
## 5         5 0.5514350 0.4566147 0.5609158393     TRUE

Step 5: Verify distribution of accepted samples

Now let’s generate many accepted samples and compare them to the true Beta(3,5) density:

rejection_sampling <- function(n_accept, max_iter = 100000) {
  accepted_theta <- numeric(n_accept)
  accepted_count <- 0
  iter <- 0
  
  while(accepted_count < n_accept && iter < max_iter) {
    iter <- iter + 1
    
    # Draw θ and U
    theta <- runif(1)
    U <- runif(1)
    
    # Compute acceptance probability
    alpha_val <- (theta^2 * (1-theta)^4) / M
    
    # Accept if U ≤ α(θ)
    if(U <= alpha_val) {
      accepted_count <- accepted_count + 1
      accepted_theta[accepted_count] <- theta
    }
  }
  
  return(accepted_theta)
}

# Generate 5000 accepted samples
set.seed(456)
samples <- rejection_sampling(5000)

# Compare with true Beta(3,5) density
hist(samples, breaks = 30, freq = FALSE, 
     main = "Rejection Sampling vs True Beta(3,5)",
     xlab = expression(theta), col = "lightblue", border = "white")

# Add true Beta(3,5) density
curve(dbeta(x, 3, 5), add = TRUE, col = "red", lwd = 2)

# Add legend
legend("topright", legend = c("Rejection samples", "True Beta(3,5)"),
       fill = c("lightblue", NA), col = c(NA, "red"), lwd = c(NA, 2))

cat("Sample mean:", mean(samples), "\n")

## Sample mean: 0.3747278

cat("True mean (Beta(3,5)):", 3/(3+5), "\n\n")

## True mean (Beta(3,5)): 0.375

cat("Sample variance:", var(samples), "\n")

## Sample variance: 0.02560176

cat("True variance (Beta(3,5)):", (3*5)/((3+5)^2*(3+5+1)), "\n")

## True variance (Beta(3,5)): 0.02604167

The histogram shows that the rejection sampling method successfully generates samples that follow the true Beta(3,5) distribution.

Summary of the Proof Structure

Step	Operation	Result
1	Compute \(P(\text{retain} \mid \theta)\)	\(\frac{h(\theta)}{M g(\theta)}\)
2	Compute joint \(P(\theta \leq c, \text{retain})\)	\(\frac{1}{M} \int_{\theta \leq c} h(\theta) d\theta\)
3	Compute marginal \(P(\text{retain})\)	\(\frac{C}{M}\)
4	Divide: \(P(\theta \leq c \mid \text{retain})\)	\(\frac{\int_{\theta \leq c} h(\theta) d\theta}{C}\)