(Instructor : Nishant Panda) (Additional Reference (IMC) : Introducing Monte Carlo Methods with R, Christian P. Robert & George Casella, Springer)

Introduction

Simulation is an important technique in applied mathematics. It typically means using computers to conduct experiments! In statistics, it is a very useful way to check if a statistical model has the necessary properties and this involves sampling from probability distributions. In most real life situations, analytical derivations of statistical properties for your model is rarely possible and one usually works with “Large sample approximations”. To test properties in finite samples simulation studies are extremely crucial!

Random Variable generation

The core step in simulation is drawing samples from probability distributions. This will be the topic of interest in this lecture.

How does R generate samples from a “named” distribution? For example rnorm generates samples from a normal distribution.

# Generates 10,000 samples from a normal distribution
# with mean 2 and standard deviation 0.5
norm.samples <- rnorm(10000, mean = 2, sd = 0.5)
hist(norm.samples, breaks = 21, col = "steelblue")

# Plot the "density"
plot(density(norm.samples))
polygon(density(norm.samples), col="steelblue", border="red")

As it turns out, all that is needed to generate samples from any distribution is a uniform number generator. Let us assume that this is God Given (i.e God wrote the runif).

unif.samples <- runif(100000, 0, 2)
plot(density(unif.samples))
polygon(density(unif.samples), col="steelblue", border="red")

Note, that there is no such thing as a “random” number generator! The computer generates numbers deterministically that satistisfies all the properties of the distribution. This can be seen through set.seed method in R.

set.seed(1)
runif(5)
## [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
set.seed(1)
runif(5)
## [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819

Methods that produce an endless supply of random samples from distributions are termed as Monte Carlo methods.

For a nice overview see Wiki https://en.wikipedia.org/wiki/Monte_Carlo_method

Inverse Transform method

The inverse transform theorem is a technical result in probability theory which states that if \(X\) is a random variable with the C.D.F \(F(x)\), then if \(u\) is a random sample from the uniform(0,1) then \(F^{-1}(u)\) is a random sample of \(X\).

(Technically \(F^{-1}\) is NOT the inverse of the C.D.F but for the purpose of this course thinking of it as an inverse function will suffice!)

If we want to generate random samples of \(X\), what this means is the following:
1. First generate random samples \(u_i\) from uniform \(U(0,1)\).
2. Let \(x_i = F^{-1}(u_i)\). Then \(x_i\)’s are random samples from \(X\).

(Home Assignment!) Let \(X \sim\) exponentially with rate 1. Show that the C.D.F of \(X\) is

\[ F(x) = 1 - e^{-x} \]

Example 1: Using the inverse transform method, generate 10000 samples from Exponential Distribution with rate 1 i.e generate samples from \(X\sim\) Exp(1).

First we need to get the inverse function \(F^{-1}\). This means that if \(F(x) = y\), then we solve for \(x\). From Home Assignment 1 in this Lecture, we know that \(F(x) = 1 - e^{-x}\). Thus solving for \(y\), \[ \begin{aligned} &y = 1 - e^{-x} \\ & \implies e^{-x} = 1 - y \\ & \implies -x = log(1-y)\\ & \implies x = -log(1-y) \end{aligned} \] Hence, the inverse function is given by \(F^{-1}(y) = -log(1-y)\).

# Step 1: generate n samples from uniform random variable
n <- 10000
u <- runif(n)

# Step 2: compute F^-1(u)
my.exp.samples <- sapply(u, function(u) -1.0*log(1-u))

Let us test this visually with a plot.

plot(my.exp.samples)

Oops What happened! This doesn’t look like an exponential distribution, does it? We are plotting samples and not the distribution! Let us look at the histogram!

# Plot histogram
hist(my.exp.samples, breaks = 21, col = "steelblue", main = "Histogram of Exp(1) samples" )

# Plot density
plot(density(my.exp.samples), xlim = c(0.5, 5), main = "Density plot for Exp(1)")
polygon(density(my.exp.samples), col="steelblue", border="red", xlim = c(0.5, 5))

Let us look at the R version

# Plot density
plot(density(rexp(10000)), xlim = c(0.5, 5), main = "Density plot for Exp(1) from rexp")
polygon(density(rexp(10000)), col="steelblue", border="red")

(From IMC) “The generation of uniform random variables is therefore a key determinant of the behavior of simulation methods for other probability distributions since those distributions can be represented as a deterministic transformation of uniform random variables.”

(Home Assignment!) Let \(X\) be the Logistic distribution with the C.D.F given by \[F(x) = \frac{1}{1 + e^{-(\frac{(x-\mu)}{\beta})}}\] for two parameters \(\mu\) and \(\beta\). Generate 1000 random samples using the inverse transform method. Take \(\mu = 5\) and \(\beta = 2\).

R uses the inverse transform method to generate samples from the Normal distribution. Can we use the inverse transform method to generate samples from discrete distributions like Binomial etc? Yes we can! But, we won’t delve further into this.

Accept-Reject simulation method

Typically, a C.D.F won’t come up in nice closed expression for us to invert! When we come up against such a wall, the next simulation method of choice is almost always going to be the Accept-Reject method. Accept-Reject method fall under a large class of simulation methods called Indirect methods.

Again, for a great read see Wiki https://en.wikipedia.org/wiki/Rejection_sampling

With indirect methods we generate samples from a candidate (simpler) random variable and only accept it as the sample of our target random variable subject to passing a test.

Notation:

  1. \(X\sim f\) is the random variable of interest and \(f(x)\) is called the target density.
  2. Let \(g(x)\) be another density from which we (or R!) can generate samples having the same support as \(f(x)\). In other words, \(g(x) > 0\) exactly for those \(x\) for which \(f(x) > 0\). We call \(g\) a candidate density.
  3. Let \(e(x)\) be the envelope function that serves as an upper bound for \(f\) having the property \(e(x) = Mg(x) \geq f(x)\), where \(M \geq 1\) is some constant.

Goal: Draw a sample of \(X\) i.e simulate from \(f\).

The accept-reject algorithm can be compactly stated as follows:
1. Generate \(Y\sim g\) and \(U\sim U(0,1)\).
2. Set \(X = Y\) if \[ U < \frac{f(Y)}{e(Y)}, \]
3. else, go back to 1.

We see that picking an envelope (i.e picking \(M\) and a candidate density) is crucial for this method to work. Let us illustrate this with the following example.

Example 2: Generate 1000 samples from \(Beta(4, 3)\) using the accept-reject method.

We know that if \(X \sim Beta(\alpha, \beta)\), then its density \(f\) is given by, \[ f(x\lvert \alpha,\beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha - 1}(1-x)^{\beta - 1} \]

Plugging in \(\alpha = 4\) and \(\beta = 3\), we get \[ f(x) = 60x^3(1-x)^2 \]

What is the support of \(f\)?

Any \(Beta\) distribution is supported on \(\left[0, 1\right]\). Let us plot this density.

x.seq <- seq(from = 0, to = 1, length.out = 101)
f <- sapply(x.seq, function(x) 60*x^3*(1-x)^2)
library(ggplot2)
# put the x and y values in a data frame
plot_data <- data.frame("x" = x.seq, "f" = f)


# create the base layer of ggplot with geom_Line
plot_f <- ggplot(plot_data, aes(x = x, y = f)) + 
  geom_line(color = "red", size = 1.5) + 
  labs(x = "x", y = "f(x| 4,3)")

plot_f

Now, our candidate density function \(g\) should have the same support as \(f\) and should be a valid density. Lets take \(g\) to be uniform density \(U(0,1)\).

A naive choice for the envelope \(e(x) = Mg(x)\), would be to take \(M = \max(f(x))\). Thus, we can guarantee that \(e(x) \geq f(x)\). This naive choice works whenver we have a compact support for \(f(x)\) and can find a global maxima. Lets find \(M\). We can use the optimize function in R to find the maxima!

max_p <- optimize(function(x) 60*x^3*(1-x)^2, c(0.05, 0.95), maximum = TRUE)
round(max_p$maximum, 4)
## [1] 0.6

Thus, \(M = f(0.6)\).

# Lets create our beta density f
beta.f <- function(x) {
   60*x^3*(1-x)^2
} 

M <- beta.f(0.6)
print(M)
## [1] 2.0736

Our candidate density function \(g(x) = 1\) for any \(x\) in \(\left[0, 1\right]\) because we it to be uniform(0,1). Hence, our envelope function \(e(y) = M\). Now, let us write a simple accept-reject algorithm for this example

num.samp <- 1000
beta.samples <- rep(0, num.samp) 
count.samp <- 0

# keep running the accept-reject method until you have drawn 1000 samples

while(count.samp < num.samp) {
  # Step 1: generat sample y from g and u from unif(0,1)
  y <- runif(1) # g is U(0,1)
  u <- runif(1)
  
  # Step 2: Check the test
  if (u < beta.f(y)/M) {
    # increase the counter
    count.samp <- count.samp + 1
    
    # set X = Y
    beta.samples[count.samp] <- y
  }
}

Let us check if this worked

hist(beta.samples,prob=T,ylab="f(x)",xlab="x",ylim=c(0,max(beta.f(x.seq))), 
     main="Histogram of draws from Beta(4,3)" )
lines(x.seq,f,lty=2,col = "red")

Let us visualize which samples were accepted and which were rejected.

# Total number of samples
num.sim <- 3000

# testing condition : u < f/e and hence ue < f

# e is just M, so ue is u * M
ue <- M*runif(num.sim)

# y is a sample from g and g is U(0,1)
y <- runif(num.sim)
# create all the pairs y and ue
mat <- cbind(y, ue)
# put them in a data frame
plot_data <- data.frame(mat)
colnames(plot_data) <- c("y", "ue")
# create a factor variable to see which were accepted
plot_data$Accepted <- ue < beta.f(y)

# show the first 5 pairs
head(plot_data,5)
##           y        ue Accepted
## 1 0.5521227 0.6982361     TRUE
## 2 0.8818690 1.0105790    FALSE
## 3 0.9236169 1.3849691    FALSE
## 4 0.2549313 0.9364871    FALSE
## 5 0.1082588 0.5137490    FALSE
# we will create a scatter plot
# "x" values will be y (i.e samples from g)
# "y" values will be ue (u * e)

# If (u*e < f(y)) then samples are accepted otherwise rejected
# Hence, in scatter plot we will color by the Accepted variable

plot_ar <- ggplot(plot_data, aes(x=y, y=ue)) + 
  geom_point(shape=20, aes(color=Accepted)) + 
  stat_function(fun = beta.f) + 
  geom_vline(xintercept = 0.0) + 
  geom_vline(xintercept = 1.0) +
  geom_hline(yintercept = M) + 
  geom_hline(yintercept = 0)


# beautify : Increase font size in axes 
plot_ar + theme(axis.text.x = 
                 element_text(face = "bold",size = 12),
               axis.text.y = 
                 element_text(face = "bold", size = 12),
               axis.line = element_line(colour = "black", 
                      size = 1, linetype = "solid"))

Let us check what proportion of the samples were accepted

round(sum(plot_data$Accepted)/nrow(plot_data),2)
## [1] 0.48
round((1/M),2)
## [1] 0.48

This is not a coincidence! \(\frac{1}{M}\) is the expected proportion of candidates that are accepted. If we find a better \(M\) (i.e a tighter envelope) we will on avergae have fewer rejections.

A good envelope should have the following properties:

  1. Exceeds the target \(f\) everywhere
  2. Easy to sample from \(g\).
  3. Generete few rejections.

If \(f\) is supported on a set \(\left[a, b\right]\), then a naive but good choice for \(e\) is \(\max(f(x)) * \frac{1}{b-a}\) i.e we take the candidate density to be uniform \(U(a,b)\) and \(M = \max(f(x))\). Will this method work if \(f \sim N(\mu,\sigma)\)?

(Home Assignment!) Consider the strange density \[f(x) = c e^{-x^2/2}\left(sin(6x)^2 + 3cos(x)^2sin(4x)^2 + 1\right).\] Use the Accept-Reject algorithm to sample from \(f\) by filling out the following steps: 1. We don’t need to know the normalizing constant \(c\) to sample from \(f\)! We can disregard \(c\) in this question! The support of \(f\) is \((-\infty, \infty)\). Take \(g(x)\) to be the standard normal density. 2. Find a suitable \(M\) (Hint, use optimize!). 3. Plot \(Mg(x)\) and \(f\)(without the c). 4. Generate 1000 samples from \(f\)(without the c!). 5. Mimic the plot in this lecture to show the accepted and rejected samples.
f <- function(x){
  exp(-x^2/2)*(sin(6*x)^2 + 3*cos(x)^2 * sin(4*x)^2+ 1)
}
opt.f <-optimize(f, c(-2,2), maximum = TRUE)
norm <- function(x){
  exp(-x^2/2)
}

M = 1.05*f(opt.f$maximum)

x.seq = seq(from = -10, to = 10, length.out = 201)
f.seq = sapply(x.seq, f)
norm.seq = sapply(x.seq, norm)
plot(x.seq, f.seq, type = "l")
lines(x.seq, M*norm.seq)