Generating Random Samples using the Rejection Method (Project-2)

Problem Description
First 25 values generated and accepted
Acceptance and Rejection Rates
Histogram of generated values
Mean, standard deviation, and probability X < 2
Discussion
Code

Problem Description

Use the Rejection Method to generate n = 10000 values for a random variable X that has the pdf of \(f(x) = \frac{x^{2}}{9}, 0 < x < 3\).
Create a density histogram for your generated values.
Superimpose the probability density function of the target distribution to the histogram.
Use the generated values to estimate the mean, standard deviation, and the probability that X < 2.
Compare these estimated values with the theoretic ones.
Report the rejection rate.

First 25 values generated and accepted

##  [1] 1.656967 2.437208 1.639676 1.260304 1.647290 2.968692 1.773963 1.792726
##  [9] 2.881719 2.502081 2.199943 2.720863 2.719279 1.552379 2.913350 2.897970
## [17] 2.749219 1.057162 2.985264 1.654954 2.214784 1.739763 1.416390 2.019748
## [25] 2.851132

Acceptance and Rejection Rates

The Acceptance Rate:

## [1] 0.3341799

The Rejection Rate:

## [1] 0.6658201

Histogram of generated values

Mean, standard deviation, and probability X < 2

Theoretical mean:

\(\mu = E(X) = \int_{0}^{3}xf(x)dx = \frac{9}{4} = 2.25\)

Estimated mean from experiment:

## [1] 2.254027

Theoretical standard deviation:

\(\sigma = sqrt(Var(X)) = sqrt(\int_{0}^{3}x^2f(x)dx - \mu^2) = sqrt( [\left. \frac{1}{45}x^5 \right|_{0}^{3}]-(\frac{9}{4})^2 ) = sqrt(\frac{243}{720}) = 0.58095\)

Estimated standard deviation:

## [1] 0.5795161

Theoretical probability X < 2:

\(P(X < 2) = \int_{0}^{2}f(x)dx = \left. \frac{1}{27}x^3 \right|_{0}^{2} = \frac{8}{27} = 0.2963\)

Estimated probability X < 2:

## [1] 0.2901

Discussion

The Rejection Method we used here is not the most efficient way (from a computer’s perspective) to generate values. This is illustrated by the acceptance and rejection rates, where we generate nearly 30,000 values, but only accept 10,000 of them. However, generating values in this manner is much more intuitive from a statistician or programmer’s perspective. We use the rejection method when we are given a simple probability density function (pdf) and the CDF is complicated.

As we generate more and more values, we can use these values to estimate the mean, standard deviation, and probability. Comparing the theoretical and estimated mean, std. dev., and probability that X is less than 2 from above, we see that generating 10,000 values has given us estimates very close to the theoretical ones.

The same process can be repeated for generating X from any pdf, no matter how complicated. We don’t need to know the theoretical values, as long as we generate a large # of values.

Code

set.seed(100)

# number of samples to generate
n = 10000
n_trials = 0 # number of trials it takes to accept n samples

# pdf
f = function(x){
  return( (x*x)/9 )
}

# lower/upper bounds for X
a = 0
b = 3
# max. f(X) value
c = 1 

# initialize empty list to store samples after accepted
x <- c()

# run loop until n=10,000 generated samples are "accepted"
for( i in 1:n ){
  while(TRUE){
    
    # generate a point (X,Y) and determine whether to accept or reject
    X = runif( n = 1, min = a, max = b )
    Y = runif( n = 1, min = 0, max = c )
    
    # increment the # of trials, no matter if point is acc. or rej.
    n_trials = n_trials + 1
    
    # Accept a point if Y <= f(X), if point is within area under the pdf curve
    if( Y <= f(X) ){
      break # When point is accepted, break the while loop
    } 
    # else: reject the point, do not break while loop: we need to regenerate a new point.
  }
  
  # after point is accepted, we need to store the accepted value
  x[i] <- X
}

# first 25 "accepted" values
head(x, 25)

# accpetance rate: number of trials accepted of the total number of trials
n / n_trials

# rejection rate: umber of trials that weren't accepted 
(n_trials - n) / n_trials

# histogram of x
hist( x, prob = TRUE )
curve( f, add = TRUE, col = "red", lty = 2 )
legend(0.25, 0.75, c(expression(paste("pdf: f(x) = ", frac(x^2,9)))), text.col = "red", bty = "n")

# Estimates
mean(x)
sd(x)
mean(x < 2) # P(X < 2)