Introduction to Monte Carlo Methods

Monte Carlo integration is concerned with approximating expectations of the form

\[E_f\Big[h(X)\Big] = \int_{x} h(x)f(x)dx\]

Using a sample from \(f(x)\), \(X_1, X_2, ..., X_m\), one can approximate this by the empirical average:

\[\bar{h} = \frac{1}{m} \sum_{j=1}^{m} h(x_j)\]

\(\bar{h}\) converges almost surely to \(E_f[h(X)]\) by the Strong Law of Large Numbers.
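
As a quick illustration (this example is not from the notes; the choices \(h(x)=\cos(x)\) and \(X \sim N(0,1)\) are arbitrary), the empirical average can be checked against a known expectation, \(E[\cos(X)]=e^{-1/2}\):

m <- 100000
x <- rnorm(m)      #sample X_1,...,X_m from f
mean(cos(x))       #empirical average of h(X) = cos(X)
exp(-0.5)          #exact value, for comparison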

Monte Carlo Methods are framed as solving one of two problems:

Problem 1: to generate samples \(x_r, r = 1,...,R\) from a given probability distribution \(P(x)\)

Problem 2: to estimate expectations of functions under this distribution, e.g.

\[\Phi = \int\phi(x)P(x)d^{N}x\]

where \(N\) refers to the dimension of the distribution.

Example:

\(X_1,X_2,...\) is a sequence of random variables, where \(X_i\) is exponentially distributed with mean \(\mu = X_{i-1}\).

We assume that the mean of \(X_1\), \(E[X_1]\), is a known constant, say \(x_0\).

We are interested in the distribution of \(S=\sum_{i=1}^N X_i\), where \(N\) is itself a random variable, say, a negative binomial, with parameters \(p\) and \(r\).

Negative Binomial PMF: \(X \sim NB(r;p)\)

i.e. The probability of \(k\) successes \(Pr(X=k)\) is given by:

\[f(k;r,p)\equiv Pr(X=k)=\Big(\frac {(k+r-1)!}{k!(r-1)!} \Big)p^k(1-p)^r\]

for \(k=0,1,2,...\) where \(k\) is the number of successes, \(r\) is the number of failures and \(p\) is the probability of success.
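
As a quick sanity check (the values of \(r\), \(p\) and \(k\) below are arbitrary), this pmf can be compared against R's dnbinom; note that relative to the formula above, dnbinom must be called with size = r and prob = 1 - p, since R's parametrization swaps the roles of \(p\) and \(1-p\):

r <- 3; p <- 0.4; k <- 0:5
manual <- choose(k+r-1, k) * p^k * (1-p)^r
all.equal(manual, dnbinom(k, size=r, prob=1-p))   #should be TRUE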

This is our workflow in Pseudocode:

Set:

\[SSQ=0\] \[Sum=0\]

for (\(i\) in \(1\rightarrow BIG\))

\(\{BEGIN \space FOR-LOOP \space 1\)

Generate an \(N\) from \(N\sim NB(r;p)\)

Set:

\[Mean=x_0\space\space(given)\]

\[S=0\]

for(\(j\) in \(1\rightarrow N\))

\(\{BEGIN \space FOR-LOOP \space 2\)

Generate an \(X\) from \(X\sim Exp(\mu=Mean)\)

Set:

\[S=S+X\] \[Mean=X\]

\(END \space FOR-LOOP\space 2\}\)

Set:

\[SSQ=SSQ+\big((S-E[S])\times(S-E[S])\big)\]

where \(E[S]\) is assumed known

if \(S>c\)

\(BEGIN \space IF-STATEMENT\)

Set:

\[Sum = Sum+1\]

\(END \space IF-STATEMENT\)

\(END \space FOR-LOOP\space 1\}\)

\[V_{Estimate} = \frac{SSQ}{BIG}\]

\[P_{Estimate}=\frac{Sum}{BIG}\]
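
A minimal R sketch of this workflow is given below; the parameter values (\(r\), \(p\), \(x_0\) and the threshold \(c\)) are placeholders, not taken from the notes, and rnbinom is called with prob = 1 - p so that it matches the negative binomial pmf given above.

#sketch of the workflow above; r, p, x0 and c are assumed placeholder values
BIG <- 10000
r <- 3; p <- 0.4           #negative binomial parameters (assumed)
x0 <- 2                    #known mean of X_1 (assumed)
c.thresh <- 10             #threshold c in P(S > c) (assumed)
E.S <- x0*r*p/(1-p)        #E[S] assumed known; here E[S] = E[N]*x0, since each X_i has marginal mean x0

SSQ <- 0
Sum <- 0
for(i in 1:BIG){
  N <- rnbinom(1, size=r, prob=1-p)   #matches the pmf above (counts successes, each with prob p)
  Mean <- x0
  S <- 0
  if(N > 0){
    for(j in 1:N){
      X <- rexp(1, rate=1/Mean)       #exponential with mean equal to Mean
      S <- S + X
      Mean <- X                       #the next X has mean equal to this one
    }
  }
  SSQ <- SSQ + (S - E.S)^2
  if(S > c.thresh) Sum <- Sum + 1
}

V.Estimate <- SSQ/BIG      #Monte Carlo estimate of Var(S), treating E[S] as known
P.Estimate <- Sum/BIG      #Monte Carlo estimate of P(S > c)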

Use of the probability integral transform:

  1. Generate a realization, \(u\), from the uniform distribution on \([0,1]\)
  2. Solve for \(x\) in the (generally non-linear) equation \(F(x) = u\); use this as a realization of \(X\)

i.e. \(x = F^{-1}(u)\), where \(F^{-1}\) is the inverse of the distribution function \(F\).

The probability that \(X\) is less than or equal to any real value \(x\) is precisely equal to the probability that the random number \(u\) is less than or equal to \(F(x)\)

\(Pr[U \leq a]=a\) for any \(0\leq a \leq 1\). Thus:

\[Pr[X \leq x] = Pr[U \leq F(x)] = F(x)\]

For a discrete random variable, let’s implement this:

Let’s assume \(X\) takes on non-negative integer values only, \(Pr[X=k]=p_{k}\) for any non-negative integer \(k\).

So \(F(0)=p_{0}\), while for \(k>0\), \(F(k)=F(k-1)+p_{k}\); for any real \(x\) such that \(k \leq x<k+1\), \(F(x)=F(k)\). The algorithm to generate \(X\) is then (an R sketch follows the two steps):

  1. Generate \(u\) from \(u \sim U(0,1)\), set \(k=0\) (integer) and \(F=p_{0}\) (real).

  2. If \(u \leq F\), EXIT, returning \(X=k\); otherwise, set \(k=k+1\), \(F=F+p_k\) and repeat this step.
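
A minimal R sketch of these two steps follows; the helper name r.discrete.inv and the Poisson(3) target pmf are purely illustrative assumptions.

#discrete inverse-transform sampler; pmf is any function returning Pr[X=k] for k=0,1,2,...
r.discrete.inv <- function(pmf){
  u <- runif(1)                #step 1
  k <- 0
  Fk <- pmf(0)
  while(u > Fk){               #step 2: accumulate probabilities until Fk >= u
    k <- k + 1
    Fk <- Fk + pmf(k)
  }
  return(k)
}

#illustration with an assumed target: Poisson with mean 3
x <- replicate(10000, r.discrete.inv(function(k) dpois(k, lambda=3)))
mean(x)   #should be close to 3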

For continuous random variables:

We explicitly have to solve \(F(x)=u\)

e.g. Exponential distribution:

X has a probability density function defined by:

\[f(x)=\frac{1}{\mu}e^{\frac{-x}{\mu}}\] The distribution function is then:

\[F(x)=1-e^{\frac{-x}{\mu}}\]

And the solution to \(F(x)=u\) is then \(x=-\mu \, ln(1-u)\). Note that since \(1-U\) is also uniform on \([0,1]\), \(x=-\mu \, ln(u)\) is an equally valid realization (this shortcut is used in the R code further down).

The Box-Muller method for the Normal Distribution:

There is no closed-form expression for the CDF of the Normal distribution, so the inversion method cannot be applied directly. Instead, consider a pair of independent standard normal deviates, \(X\) and \(Y\); their joint PDF is given by:

\[f(x,y)=\frac{1}{2\pi} exp\Big[-\frac{1}{2}(x^2+y^2)\Big]\]

Now, transform to two random variables, \(V\) and \(\Phi\), defined such that \(X=\sqrt{2V}cos(\Phi)\) and \(Y=\sqrt{2V}sin(\Phi)\), so that the joint pdf of \(V\) and \(\Phi\) is given by:

\[f(v,\phi)=\frac{1}{2\pi} e^{-v}\]

over the region defined by \(0 < v < \infty\) and \(0<\phi<2\pi\)

\(V\) and \(\Phi\) are independent random variables, \(V\sim Exp(\lambda =1)\) and \(\Phi \sim U(0,2\pi)\)

We generate realizations of \(V\) and \(\Phi\) and transform them to the desired \(X\) and \(Y\).
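
A minimal R sketch of this construction (the sample size n is an arbitrary choice):

#Box-Muller: transform V ~ Exp(1) and Phi ~ U(0,2*pi) into two independent N(0,1) deviates
n <- 100000
V <- rexp(n, rate=1)
Phi <- runif(n, min=0, max=2*pi)
X <- sqrt(2*V)*cos(Phi)
Y <- sqrt(2*V)*sin(Phi)

hist(X, prob=T, breaks=100)
curve(dnorm(x), add=T, col="red", lwd=2)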

Acceptance-Rejection Methods

To generate a random variable \(X\) from a distribution with pdf \(f(x)\) that does not allow an inverse transformation, we need another distribution \(h(x)\) from which we can generate random numbers, and which has the property that \(h(x)=0\) only when \(f(x)=0\).

The density \(h(x)\) should be “similar” to \(f(x)\) in some known sense, so that when we generate values from \(h(x)\), we can reject a certain proportion of the values, to compensate for the differences.

First, let’s define two quantities:

\[C=\max_{\{x\,:\,h(x)>0\}}\frac{f(x)}{h(x)}\]

and

\[g(x)=\frac{f(x)}{C.h(x)}\]

Assuming the following:

\[h(x)>0; 1\leq C; 0 \leq g(x) \leq 1\]

The algorithm proceeds as follows:

  1. Generate a random variable \(Y\) from \(h(y)\), and an independent random variable \(U\) from \(U \sim U(0,1)\); let \(y\) and \(u\) be the values generated.

  2. If \(u \leq g(y)\), ACCEPT the value, i.e. set \(X=y\); otherwise REJECT both values and repeat the previous step.

This generates random variable \(Y\) with the conditional distribution for \(Y\) given the event

\(\{U \leq g(Y)\}\)

The conditional density is given from Bayes’ theorem as follows: \(h(y|U \leq g(Y))=\frac{h(y)Pr[U \leq g(Y)|Y=y]}{Pr[U \leq g(Y)]}\)

Since \(Pr[U \leq g(Y)|Y=y]=g(y)\) and \(Pr[U \leq g(Y)]=1/C\), this simplifies to:

\[h(y|U \leq g(Y))=C.h(y).g(y)=f(y)\]

which is the desired density.

The probability of acceptance is \(\frac{1}{C}\). The number of iterations required until the first acceptance has a geometric distribution with parameter \(1/C\), and therefore an expected value of \(C\).

It is important to ensure that the tails of \(h(.)\) are at least as heavy as the tails of \(f(.)\)

Example:

We want to generate values from a gamma distribution with pdf:

\[f(x)=\frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)}\]

  • For integer values of \(\alpha\), \(x\) can be generated as the sum of \(\alpha\) exponentially distributed random variables

  • For half-integer values of \(\alpha\), \(x\) can be generated as half the sum of squares of \(2 \alpha\) standard normal random variables

  • This becomes computationally expensive for moderately large \(\alpha\) values, and only works if \(\alpha\) is integer or half-integer

So, let’s rather use Acceptance-Rejection:

For the density \(h(x)\), we would use the exponential distribution with the same mean as \(f(x)\), i.e. \(\alpha\), thus

\[h(x)=\frac{1}{\alpha}exp(\frac{-x}{\alpha})\]
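
A hedged R sketch of this choice of envelope follows; \(\alpha=2.5\) is an arbitrary illustration value, and the constant uses the fact that \(f(x)/h(x)\) is maximised at \(x=\alpha\), so that \(C=\frac{\alpha^{\alpha}e^{-(\alpha-1)}}{\Gamma(\alpha)}\).

#acceptance-rejection for the gamma density, using an exponential envelope with mean alpha
alpha <- 2.5                                            #assumed illustration value
f <- function(x){ x^(alpha-1)*exp(-x)/gamma(alpha) }    #target gamma pdf
h <- function(x){ (1/alpha)*exp(-x/alpha) }             #exponential envelope with mean alpha
C <- alpha^alpha*exp(-(alpha-1))/gamma(alpha)           #max of f(x)/h(x), attained at x = alpha
g <- function(x){ f(x)/(C*h(x)) }

n <- 10000
samples <- c()
while(length(samples) < n){
  y <- rexp(n, rate=1/alpha)            #candidates from h(x)
  u <- runif(n)
  samples <- c(samples, y[u <= g(y)])   #accept each y with probability g(y)
}
samples <- samples[1:n]

hist(samples, prob=T, breaks=100)
curve(dgamma(x, shape=alpha), add=T, col="red", lwd=2)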

In any case, let’s see how we can implement all this theory in R:

nsamp<-1000 #number of samples drawn
lambda <- 5
x <- rexp(nsamp,lambda) #sample from exponential distribution with lambda=5
hist(x,prob=T) #plot the random draw of 1000 values from Exp(lambda=5)

mean(x)#mean of samples
## [1] 0.1992264
#should be roughly equal to 1/lambda
1/lambda
## [1] 0.2

Exercise 1:

Calculate \(E(X^2)\) i.e. \(\int x^2 f(x)dx\), where \(X \sim Exp(\lambda = 5)\)

nsamp=10000
lambda=5
x.square<-rexp(nsamp,lambda)^2 #NOTE the square!
hist(x.square,prob=T,sub=paste0("Mean of samples = ",mean(x.square)))

Exercise 2: Solution using integrate

hx<-function(x){ (x^2)*5*exp(-5*x) } #define an exponential distribution function with lambda=5, multiplied by x^2, to get the expected value of X^2
integrate(hx, lower=0, upper=Inf) #integrate from 0 to infinity
## 0.08 with absolute error < 8.2e-06

Exponential distribution example (continued)

Draw 5 observations from \(Exp(\lambda = 5)\), i.e. a sample with n=5

mean(rexp(5,5))
## [1] 0.178771
1/5
## [1] 0.2
#increase sample size

mean(rexp(10,5))
## [1] 0.1729375
1/5
## [1] 0.2
#hundred

mean(rexp(100,5))
## [1] 0.1743087
1/5
## [1] 0.2
#thousand observations:

mean(rexp(1000,5))
## [1] 0.2082827
1/5
## [1] 0.2
#100 thousand observations:

mean(rexp(100000,5))
## [1] 0.2001176
1/5
## [1] 0.2

We are more likely to get close to the expected value of X, with a larger sample size:

n_sizes<-seq(from=5, by=15, length.out=1000)

draw_exp <- function(size){
  mean(rexp(size, 5))
}

means <- sapply(n_sizes, draw_exp,simplify="vector")

hist(means,breaks=1000)

plot(means~n_sizes,type="l")
abline(h=1/lambda,col="red")

As one would expect, the sample means are approximately normally distributed (by the Central Limit Theorem), and as the sample size increases, the means converge on the expected value of 0.2 (by the Law of Large Numbers).

n_sizes <- rep(5,5000)
meanOf5samples <- sapply(n_sizes, draw_exp,simplify="vector")
hist(meanOf5samples)

nsize<-c(5, 10, 300, 1500)

for(i in nsize){
  n_sizes <- rep(i,5000)
  means <- sapply(n_sizes, draw_exp,simplify="vector")
  hist(means,main=paste0("Sample size = ",i),breaks=1000,border="blue")
  abline(v=1/lambda,col="red",lwd=5)
}

Exercise 3:

Assume \(X\) is a random variable:

\[X \sim Exp(\lambda = 5)\]

Use Monte Carlo simulation methods to obtain the variance of \(X\), and \(Pr(0.1 < X < 0.3)\):

lambda=5
#define pdf of exp(x,5)
f.x <- function(x, lambda=5){
  return(lambda*exp(-lambda*x))
}

#define the CDF for the same distribution:

F.x <- function(lambda=5,x){
  return(1-exp(-lambda*x))
}

#define inverse CDF

F.inv <- function(lambda,p){
  return(-log(1-p)/lambda)
}

#so if F(x)=U

#we get the x value associated with a generated U, using this function

get.x <- function(lambda=5,U){
  
  mu <- 1/lambda
  
  x <- -mu*(log(U))
  
}

#the variance sigma^2 is the integral of (x-mu)^2 * f(x); with a sample drawn from f(x)
#it is estimated by the average of (x-mu)^2 over the sampled values

sigma.est <- function(x, lambda=5){
  mu <- 1/lambda
  return((x-mu)^2)
}

count <- 0
sigma <- c()
#varX <- c()

while(count<10000){
  #Probability integral transform

  #generate 10000 realizations from the uniform distribution
  U <- runif(10000)

  #get x as an inverse function of U
  x <- get.x(U=U)
  #x <- mean(x)
  # varX <- c(varX,var(x))

  #calculate the estimated sigma squared value using this x and add it to the results vector
  sigma <- c(sigma,mean(sigma.est(x=x)))

  #increment loop variable
  count <- count+1
}

#plot the sigma squared estimates; the red line is their mean, the blue line the true variance 1/lambda^2
plot(sigma)
abline(h=mean(sigma),col="red",lwd=3)

abline(h=1/(lambda^2),col="blue",lwd=3)

#As a check without simulation, approximate E[X] and Var(X) by numerical integration on a grid

x <- seq(0,1,length.out = 1000000)
dx <- x[2]-x[1]

#integral of x*f(x) over [0,1]; the tail beyond x=1 contributes very little when lambda=5
mu <- sum(x*f.x(x=x))*dx
mu

plot(x=x,y=x*f.x(x=x),type="l")
abline(v=mu,col="blue")
abline(v=1/lambda,col="red")

plot(x=x,y=f.x(x=x),type="l")

plot(x=x,y=F.x(x=x),type="l")

#integral of (x-mu)^2*f(x) over [0,1] approximates the variance 1/lambda^2
sigma2 <- sum((x-mu)^2*f.x(x=x))*dx
sigma2
U <- runif(1000000)
x <- get.x(U=U)

hist(x)

x.in.range <- x[x>0.1&x<0.3]

prob.x.in.range <- length(x.in.range)/length(x)

prob.x.in.range
## [1] 0.384018
#Test:

F.x(x=0.3)-F.x(x=0.1)
## [1] 0.3834005

More Monte Carlo:

Suppose \(X \sim Exp(\lambda)\)

\[f(x)=\lambda e^{-\lambda x}\]

\[F(x)=1-e^{-\lambda x}\] \[F^{-1}(x)=-\frac{1}{\lambda}log(1-x)\]

F. <- function(x,lambda){
  return(1-exp(-lambda*x))
}

F.plot.vals <- F.(x=seq(1,30,length.out = 1000),lambda=2)

par(mfrow=c(3,3))

curve(F.(lambda=1,x),from=1,to=30)
curve(F.(lambda=2,x),from=1,to=30)
curve(F.(lambda=5,x),from=1,to=30)
curve(F.(lambda=10,x),from=1,to=30)
curve(F.(lambda=20,x),from=1,to=30)
curve(F.(lambda=0.9,x),from=1,to=30)
curve(F.(lambda=0.5,x),from=1,to=30)
curve(F.(lambda=0.2,x),from=1,to=30)
curve(F.(lambda=0.1,x),from=1,to=30)

Probability integral transform:

U <- runif(10000)

F.inv <- function(x,lambda){
  return(-1/lambda*log(1-x))
}

X <- F.inv(x=U,lambda=5)

f. <- function(x,lambda){
  return(lambda*exp(-lambda*x))
}

Y <- f.(X,5)

x <- seq(0,1,length.out = 10000)

density.x <- dexp(x=x,rate=5)
par(mfrow=c(1,1))
plot(x=x,y=density.x,type="l",lwd=5)
points(x=X,y=Y,col="red",cex=0.1)

Accept-Reject Method:

We want to sample from the following distribution:

\[f(x) \propto e^{-x^2/2}sin(2x)^2\]

We cannot use probability integral transform, since we don’t know the inverse CDF

So:

Steps:

  1. Generate a candidate value \(y\) from \(h(x)\)
  2. Calculate \(C = max(\frac{f(x)}{h(x)})\)
  3. Define \(g(y)= \frac{f(y)}{C.h(y)} \space \leftarrow\) proportion to accept
  4. Generate \(u \sim U(0,1)\): if \(u \leq g(y)\), accept (set \(X=y\)); otherwise reject
  5. Repeat all steps until you have \(N\) accepted values.

In R code:

#define target distribution f(x)
f. <- function(x){
  return(exp((-x^2)/2)*sin(2*x)^2)
}

#define candidate distribution h(x)

h. <- function(x){
  
  return((1/sqrt(2*pi))*exp((-x^2)/2))
  
}

#define f(x)/h(x):

f.x_h.x <- function(x){
  return(sqrt(2*pi)*(sin(2*x))^2)
}

C <- max(f.x_h.x(x=-100:100))

plot(x=-100:100,y=f.x_h.x(-100:100),type="l")
abline(h=C,col="red",lwd=5)
abline(h=sqrt(2*pi),col="yellow",lwd=2)

#this should be roughly equal to the sqrt(2*pi) as can be seen from the plot

#So we set C equal to the ideal value

C <- sqrt(2*pi)

#define the ratio f(x)/h(x); dividing by C below gives g(x)=f(x)/(C.h(x))

g. <- function(x){
  return(f.(x)/h.(x))
}
#sample a u
h <- c()

count <-1

x <- seq(-5,5,length.out = 100000)

fx <- f.(x)

plot(x,fx)

integrated.fx <- integrate(f.,-Inf,Inf)

gx <- g.(x)

plot(x,gx)

hx <- h.(x)

plot(x,hx)

x. <- c()

#########################################################################

#sample size
N <- 1000000

#scaling factor
C <- max(gx)

#get a sample of Y values from h(x)
Y <- rnorm(N)

gy <- g.(Y)/C

U <- runif(N)

index <- which(U <= gy)

keep_y <- Y[index]

length(keep_y)/N
## [1] 0.500035
hist(keep_y,breaks=1000,prob=T)

lines(x,f.(x)/integrated.fx$value,col="red",lwd=5)
