Purpose:

I just want to recall the reasons behind variance reduction techniques. Monte Carlo integration typically has an error variance of the form \(\frac{\sigma^2}{n}\). We can get a better answer by sampling with a larger value of n, but the computing time grows with n. Sometimes we can instead find a way to reduce \(\sigma\): we construct a new Monte Carlo problem with the same answer as our original one but with a lower \(\sigma\). Methods that do this are known as variance reduction techniques.
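As a minimal sketch of this \(\frac{\sigma}{\sqrt{n}}\) error scale (assuming the toy problem of estimating \(E[U^2]\) for \(U \sim U(0,1)\), true value 1/3), note that quadrupling n only halves the error:

set.seed(1)
est <- function(n) mean(runif(n)^2)   # one crude Monte Carlo estimate of E[U^2]
sd(replicate(1000, est(100)))         # empirical error, roughly sigma / 10
sd(replicate(1000, est(400)))         # roughly sigma / 20: 4x the work, half the error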

Question 1

In this problem, you will implement and investigate a series of variance reduction procedures for Monte Carlo methods by estimating the expected value of a cost function c(x) which depends on a D-dimensional random variable x.

The cost function you will use is:

\[ c(x)=\frac{1}{(2\pi)^{D/2}}\, e^{-\frac{1}{2} x^T x} \]

where

\[x_i \sim U(-5,5) \text{ for } i=1..D\]

That is, x is a D-dimensional random variable (i.e., a D×1 column vector) and each component of x is uniformly distributed between -5 and 5. Note that c(x) conveniently takes the form of a standard multivariate normal density, with no correlation between the components.

Goal: estimate E[c(x)] - the expected value of c(x) - using Monte Carlo methods and see how it compares to the real value, which you are able to find by hand.
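As a quick sanity check of the hand calculation for D = 1 (where the density of each component is \(p(x_i) = \frac{1}{10}\) on \((-5,5)\)):

# E[c(x)] for D = 1 is (1/10) * P(-5 < Z < 5) for Z ~ N(0,1)
(1/10) * (pnorm(5) - pnorm(-5))   # 0.099999994, essentially 1/10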

Answer 1

a) Crude Monte Carlo

For sample sizes of 1000 to 10000 (in increments of 1000), obtain 100 estimates for E[c(x)] when D = 1, using crude Monte Carlo sampling. Calculate the average value of the 100 estimates as well as their standard deviation, and plot them. In your plot, include a line showing the analytical value for E[c(x)].

First, define the cost function as an R function:

cost_func <- function(x, D=1)
{
  # standard multivariate normal density evaluated at the D-vector x
  quad <- exp(-0.5 * t(x) %*% x)   # exp(-x'x / 2)
  cost_func_results <- (1 / ((2 * pi)^(D/2))) * quad
  return (cost_func_results)
}
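A quick check of the function at the mode:

cost_func(0)   # should equal 1/sqrt(2*pi) ~= 0.3989423 for D = 1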

Next we will build a data frame with a row for each sample size N, applying column-wise functions to record the mean, standard deviation, and coefficient of variation of the cost evaluations at each N. First we will record the analytic values for D = 1, 2, 4, 6, and 8.

library(knitr)
######
# ref:  http://www.integral-calculator.com/
AnalyticValue_D1 = 0.099999994
AnalyticValue_D2 = 0.03989420516865691
AnalyticValue_D4 = 0.006349359953313988
AnalyticValue_D6 = 0.0010105320220309647
AnalyticValue_D8 = 0.0001600831166402579

CMC <- function(n, d = 1)
{
  # crude Monte Carlo: n independent d-dimensional uniform draws
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- runif(d, -5, 5)
    t[i] <- cost_func(x)   # evaluated with cost_func's default D = 1 normalizer
  }
  return (t)
}
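For D = 1 the same crude estimator can also be written without the loop; a vectorized sketch (a hypothetical alternative, equivalent in distribution to CMC(n, 1)):

CMC_vec <- function(n)
{
  x <- runif(n, -5, 5)                    # n one-dimensional draws at once
  (1 / sqrt(2 * pi)) * exp(-0.5 * x^2)    # cost function applied element-wise
}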
#########
MC_loop <- function(d, fun)
{
  # run the sampler `fun` for N = 1000, 2000, ..., 20000 and record the
  # mean, standard deviation, and coefficient of variation at each N
  CMC.result <- data.frame(mean=c(), stdev=c(), n=c())
  for(n in seq(1000, 20000, by=1000))
  {
    res <- fun(n=n, d=d)
    CMC.result <- rbind(CMC.result, data.frame(mean=mean(res), stdev=sd(res), N=n))
  }
  #CMC.result$EcActual <- (1/10)^d
  CMC.result$CV <- CMC.result$stdev / CMC.result$mean
  return (CMC.result)
}

CMC.D1 <- MC_loop(d=1, fun=CMC)

Let's show the CMC results and visualize them for D = 1

#####
kable(CMC.D1, caption = 'Monte Carlo Estimation D= 1')
Monte Carlo Estimation D= 1
mean stdev N CV
0.0981998 0.1361343 1000 1.386300
0.1059722 0.1391388 2000 1.312975
0.1024486 0.1354279 3000 1.321911
0.0997539 0.1346572 4000 1.349894
0.1000528 0.1348032 5000 1.347321
0.0991720 0.1345330 6000 1.356562
0.0991800 0.1341254 7000 1.352343
0.0986359 0.1336084 8000 1.354561
0.0998855 0.1344078 9000 1.345619
0.0970901 0.1347123 10000 1.387498
0.0993453 0.1340295 11000 1.349127
0.0996995 0.1351484 12000 1.355558
0.1005037 0.1357187 13000 1.350384
0.0994147 0.1345373 14000 1.353294
0.0976972 0.1332197 15000 1.363597
0.0993604 0.1342501 16000 1.351142
0.0998262 0.1350334 17000 1.352685
0.1006510 0.1357473 18000 1.348693
0.1004996 0.1353388 19000 1.346660
0.1007590 0.1350156 20000 1.339986
library(ggplot2)
D1 <- ggplot(CMC.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 1") 
D1

Let's show the CMC results and visualize them for D = 2

CMC.D2 <- MC_loop(d=2, fun=CMC)
kable(CMC.D2, caption = 'Monte Carlo Estimation D= 2')
Monte Carlo Estimation D= 2
mean stdev N CV
0.0254051 0.0663428 1000 2.611401
0.0238710 0.0640611 2000 2.683631
0.0256491 0.0666433 3000 2.598267
0.0266108 0.0692082 4000 2.600755
0.0243074 0.0652812 5000 2.685648
0.0246965 0.0666069 6000 2.697016
0.0244762 0.0656951 7000 2.684035
0.0259938 0.0676566 8000 2.602797
0.0250898 0.0666876 9000 2.657958
0.0247470 0.0656335 10000 2.652180
0.0239399 0.0643529 11000 2.688099
0.0255236 0.0671404 12000 2.630526
0.0258621 0.0674592 13000 2.608420
0.0244511 0.0654314 14000 2.676004
0.0251191 0.0661890 15000 2.635004
0.0259076 0.0676854 16000 2.612568
0.0257010 0.0672579 17000 2.616935
0.0247881 0.0655405 18000 2.644033
0.0245196 0.0654146 19000 2.667848
0.0258598 0.0674840 20000 2.609612
library(ggplot2)
D2 <- ggplot(CMC.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 2") 
D2

Let's show the CMC results and visualize them for D = 4

CMC.D4 <- MC_loop(d=4, fun=CMC)
kable(CMC.D4, caption = 'Monte Carlo Estimation D= 4')
Monte Carlo Estimation D= 4
mean stdev N CV
0.0026106 0.0183441 1000 7.026809
0.0014728 0.0108799 2000 7.387433
0.0013510 0.0094267 3000 6.977682
0.0019340 0.0158778 4000 8.209950
0.0018646 0.0142016 5000 7.616306
0.0015641 0.0114340 6000 7.310103
0.0015704 0.0118809 7000 7.565402
0.0016110 0.0123754 8000 7.681689
0.0014636 0.0114820 9000 7.845081
0.0016180 0.0129828 10000 8.024164
0.0017087 0.0137720 11000 8.059947
0.0015932 0.0129686 12000 8.139858
0.0018852 0.0148507 13000 7.877676
0.0016631 0.0128511 14000 7.727118
0.0015141 0.0121979 15000 8.056110
0.0015022 0.0116179 16000 7.734011
0.0017511 0.0132175 17000 7.548127
0.0015433 0.0117087 18000 7.586978
0.0016732 0.0124422 19000 7.435980
0.0015759 0.0124120 20000 7.876149
library(ggplot2)
D4 <- ggplot(CMC.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 4") 
D4

CMC.D6 <- MC_loop(d=6, fun=CMC)
kable(CMC.D6, caption = 'Monte Carlo Estimation D= 6')
Monte Carlo Estimation D= 6
mean stdev N CV
0.0000835 0.0014630 1000 17.53040
0.0000821 0.0013203 2000 16.08263
0.0000868 0.0016866 3000 19.44090
0.0001078 0.0018750 4000 17.40077
0.0000922 0.0016049 5000 17.40323
0.0001011 0.0025173 6000 24.90937
0.0000564 0.0008839 7000 15.66245
0.0000932 0.0020616 8000 22.12910
0.0001159 0.0026584 9000 22.94546
0.0001003 0.0021627 10000 21.56102
0.0000978 0.0020746 11000 21.21784
0.0000819 0.0016432 12000 20.07264
0.0000875 0.0018530 13000 21.18115
0.0001373 0.0030473 14000 22.20084
0.0001270 0.0023447 15000 18.46454
0.0000958 0.0020694 16000 21.61016
0.0000865 0.0016872 17000 19.50311
0.0000625 0.0010298 18000 16.48372
0.0001457 0.0031924 19000 21.91200
0.0001085 0.0023251 20000 21.42078
library(ggplot2)
D6 <- ggplot(CMC.D6) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D6), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 6") 
D6

As we can observe, as we increase the value of D our estimate drifts further from the analytic value.

b) Quasi-Random Numbers

We now investigate the variance reduction properties of quasi-random numbers using Sobol numbers. To see how these Sobol numbers differ from normal random numbers, generate and plot 100 pairs of both. Comment on the differences.

First let's use runif to generate n = 100 pairs of pseudo-random numbers and plot them:

n <- 100
uniFrom <- as.data.frame(matrix(runif(n * 2), ncol=2))
colnames(uniFrom) <- c("x", "y")

unif <- ggplot(data=as.data.frame(uniFrom)) + 
  geom_point(aes(x=x, y=y)) + 
  labs(title="Uniform for  n=100 x 2") 
unif

Next we will use sobol to generate n = 100 quasi-random pairs and plot them.

library(randtoolbox)
## Loading required package: rngWELL
## This is randtoolbox. For overview, type 'help("randtoolbox")'.
sobol_Rnum <- as.data.frame(sobol(n, d=2))
colnames(sobol_Rnum) <- c("x", "y")

sobol_p2 <- ggplot(data=as.data.frame(sobol_Rnum)) + 
  geom_point(aes(x=x, y=y)) + 
  labs(title="Sobol Quasi-Random Numbers n=100 x 2 ") 
sobol_p2 
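Comparing the two scatter plots, the Sobol points fill the unit square far more evenly: they form a low-discrepancy (quasi-random) sequence designed to avoid the clumps and gaps that are visible in the pseudo-random runif sample.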

# Sobol Monte Carlo function
sobolMC <- function(n, d = 1, init=TRUE) 
{ 
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- as.vector(sobol(n=1, d=d, init=init))
    x <- -5 + (x * 10)   # rescale the Sobol point from [0,1]^d to [-5,5]^d
    t[i] <- cost_func(x)
    init <- FALSE # turn off init: keep advancing the same Sobol sequence between calls
  }
  return (t)
}


Let's show the Sobol Monte Carlo results and visualize them for D = 1

sobolMc.D1 <- MC_loop(d=1, fun=sobolMC)
kable(sobolMc.D1, caption = 'Sobol Monte Carlo Estimation D= 1')
Sobol Monte Carlo Estimation D= 1
mean stdev N CV
0.1000078 0.1350043 1000 1.349938
0.1000000 0.1349762 2000 1.349762
0.1000002 0.1349645 3000 1.349642
0.0999999 0.1349594 4000 1.349595
0.1000007 0.1349560 5000 1.349550
0.1000000 0.1349538 6000 1.349538
0.1000003 0.1349523 7000 1.349519
0.0999999 0.1349510 8000 1.349511
0.1000014 0.1349487 9000 1.349468
0.1000000 0.1349493 10000 1.349493
0.1000000 0.1349484 11000 1.349484
0.0999999 0.1349482 12000 1.349482
0.1000004 0.1349475 13000 1.349470
0.0999999 0.1349474 14000 1.349474
0.1000001 0.1349471 15000 1.349470
0.0999999 0.1349468 16000 1.349468
0.1000005 0.1349460 17000 1.349453
0.1000000 0.1349463 18000 1.349463
0.1000000 0.1349460 19000 1.349460
0.0999999 0.1349459 20000 1.349460
library(ggplot2)
DS1 <- ggplot(sobolMc.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 1") 
DS1

Let's show the Sobol Monte Carlo results and visualize them for D = 2

sobolMc.D2 <- MC_loop(d=2, fun=sobolMC)
kable(sobolMc.D2, caption = 'Sobol Monte Carlo Estimation D= 2')
Sobol Monte Carlo Estimation D= 2
mean stdev N CV
0.0252972 0.0670778 1000 2.651591
0.0248737 0.0658396 2000 2.646959
0.0249524 0.0658683 3000 2.639755
0.0250161 0.0659642 4000 2.636868
0.0251183 0.0662144 5000 2.636098
0.0251252 0.0662073 6000 2.635097
0.0250746 0.0661084 7000 2.636471
0.0250514 0.0660554 8000 2.636799
0.0250939 0.0663065 9000 2.642330
0.0251324 0.0663249 10000 2.639022
0.0250553 0.0661434 11000 2.639900
0.0251099 0.0663046 12000 2.640572
0.0250408 0.0659899 13000 2.635296
0.0250999 0.0661518 14000 2.635543
0.0250528 0.0660462 15000 2.636278
0.0250648 0.0660833 16000 2.636496
0.0250663 0.0661090 17000 2.637361
0.0250603 0.0661020 18000 2.637718
0.0250577 0.0661014 19000 2.637966
0.0250991 0.0662334 20000 2.638879
library(ggplot2)
DS2 <- ggplot(sobolMc.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 2") 
DS2

Let's show the Sobol Monte Carlo results and visualize them for D = 4

sobolMc.D4 <- MC_loop(d=4, fun=sobolMC)
kable(sobolMc.D4, caption = 'Sobol Monte Carlo Estimation D= 4')
Sobol Monte Carlo Estimation D= 4
mean stdev N CV
0.0020512 0.0185322 1000 9.034808
0.0019187 0.0167684 2000 8.739384
0.0018435 0.0157903 3000 8.565195
0.0019194 0.0165702 4000 8.633011
0.0017874 0.0152083 5000 8.508425
0.0016878 0.0141846 6000 8.404088
0.0016170 0.0133921 7000 8.282330
0.0015693 0.0127941 8000 8.152662
0.0015889 0.0129440 9000 8.146647
0.0016370 0.0133957 10000 8.183101
0.0016648 0.0137198 11000 8.240937
0.0016845 0.0139399 12000 8.275303
0.0016590 0.0136365 13000 8.219491
0.0016375 0.0133230 14000 8.135959
0.0016169 0.0130736 15000 8.085829
0.0015945 0.0127932 16000 8.023545
0.0015681 0.0125222 17000 7.985796
0.0015569 0.0123451 18000 7.929454
0.0015347 0.0121085 19000 7.889822
0.0015302 0.0119699 20000 7.822379
library(ggplot2)
DS4 <- ggplot(sobolMc.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 4") 
DS4

Again, as we increase the value of D our estimate drifts further from the analytic value; note, though, how tightly the Sobol means converge compared with the crude Monte Carlo means.

c) Antithetic Variates

A simple method that is sometimes used for reducing the variance of the estimate is to induce negative correlation between random draws. When the random variable being generated is uniform, or is a function of a uniform random variable, one can induce negative correlation with a simple procedure: pair each draw x with a reflected partner (below, 1 - x) and average the two cost evaluations.

First letโ€™s create our antithetic Monte Carlo function

##############################
antithetic_MC <- function(n,  d = 1)
{ 
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- runif(d, -5, 5)
    x2 <- 1 - x   # negatively correlated partner: reflection of x about 1/2
    t[i] <- (cost_func(x) + cost_func(x2)) / 2   # average the paired evaluations
  }
  return (t)
}
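For reference, here is a minimal sketch of the canonical antithetic pairing on the underlying \(U(0,1)\) scale (the implementation above instead reflects x about 1/2 directly on the \((-5,5)\) scale); the toy estimand \(E[e^U]\), with a monotone integrand, is an assumption chosen to make the variance reduction visible:

set.seed(1)
u <- runif(10000)
crude <- exp(u)                      # plain draws
anti  <- (exp(u) + exp(1 - u)) / 2   # u and 1 - u are negatively correlated
c(sd_crude = sd(crude), sd_anti = sd(anti))   # the antithetic sd is markedly smaller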


###############################

Let's show the antithetic Monte Carlo results and visualize them for D = 1

antithetic_MC.D1 <- MC_loop(d=1, fun=antithetic_MC)
kable(antithetic_MC.D1, caption = 'Antithetic Monte Carlo Estimation D= 1')
Antithetic Monte Carlo Estimation D= 1
mean stdev N CV
0.0946337 0.1213087 1000 1.281877
0.0965587 0.1200370 2000 1.243150
0.0997019 0.1232915 3000 1.236602
0.0992320 0.1226721 4000 1.236216
0.0972665 0.1217437 5000 1.251651
0.0990116 0.1219937 6000 1.232115
0.0993223 0.1219714 7000 1.228037
0.1006994 0.1245233 8000 1.236584
0.1003271 0.1225045 9000 1.221051
0.0994185 0.1228737 10000 1.235924
0.1003738 0.1227318 11000 1.222747
0.0993619 0.1217504 12000 1.225322
0.0988115 0.1220463 13000 1.235142
0.1000180 0.1229179 14000 1.228958
0.0995190 0.1228303 15000 1.234240
0.0989344 0.1220141 16000 1.233283
0.1008799 0.1226262 17000 1.215567
0.0991722 0.1224676 18000 1.234898
0.0994139 0.1227754 19000 1.234992
0.0991846 0.1228981 20000 1.239085
library(ggplot2)
AMC1 <- ggplot(antithetic_MC.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D= 1") 
AMC1

Let's show the antithetic Monte Carlo results and visualize them for D = 2

antithetic_MC.D2 <- MC_loop(d=2, fun=antithetic_MC)
kable(antithetic_MC.D2, caption = 'Antithetic Monte Carlo Estimation D= 2')
library(ggplot2)
AMC2 <- ggplot(antithetic_MC.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D= 2") 
AMC2

Let's show the antithetic Monte Carlo results and visualize them for D = 4

antithetic_MC.D4 <- MC_loop(d=4, fun=antithetic_MC)
kable(antithetic_MC.D4, caption = 'Antithetic Monte Carlo Estimation D = 4')
library(ggplot2)
AMC4 <- ggplot(antithetic_MC.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D = 4") 
AMC4

Again, as we increase the value of D our estimate drifts further from the analytic value.

d) Latin Hypercube Sampling

Latin Hypercube Sampling is a variance reduction procedure that is based on the idea of stratified sampling. In this approach, you divide the interval into K equally probable subintervals, and generate a uniform sample from within each subinterval.

latinHypercube_MC <- function(n, d = 1, K=10)
{
  min <- -5
  max <- 5
  size <- (max - min) / K   # width of each of the K equally probable strata
  samples <- n / K          # each estimate consumes K draws
  
  Y <- rep(NA, samples)
  for(i in 1:samples)
  {
    # loop through the K strata; note the same stratum k is used for all d coordinates
    vec <- rep(NA, K)
    for(k in 1:K)
    {
      unif <- runif(d,
                    min + ((k - 1) * size),
                    min + (k * size))
      vec[k] <- cost_func(unif)
    }
    Y[i] <- mean(vec)
  }
  return (Y)
}
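Note that the loop above reuses the same stratum index k for every coordinate. For comparison, here is a minimal sketch of a full Latin hypercube design (a hypothetical helper, not used below), in which each dimension gets its own random permutation of the K strata:

lhs_sample <- function(K, d, min = -5, max = 5)
{
  size <- (max - min) / K
  X <- matrix(NA, nrow = K, ncol = d)
  for (j in 1:d)
  {
    perm <- sample(K)                             # random stratum order for dimension j
    X[, j] <- min + (perm - 1 + runif(K)) * size  # one point per stratum, per dimension
  }
  return (X)
}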

Let's show the Latin Hypercube Sampling results and visualize them for D = 1

latinHypercubeMc.D1 <- MC_loop(d=1, fun=latinHypercube_MC)
kable(latinHypercubeMc.D1,caption = ' Latin Hypercube Sampling Estimation D = 1'  )
Latin Hypercube Sampling Estimation D = 1
mean stdev N CV
0.1011382 0.0113785 1000 0.1125045
0.1012569 0.0107048 2000 0.1057191
0.0995613 0.0106339 3000 0.1068072
0.1001634 0.0103042 4000 0.1028741
0.1002135 0.0101784 5000 0.1015672
0.0992301 0.0102730 6000 0.1035272
0.0999507 0.0099905 7000 0.0999546
0.1000021 0.0101731 8000 0.1017285
0.0994530 0.0104530 9000 0.1051048
0.0997668 0.0104803 10000 0.1050478
0.1002104 0.0104709 11000 0.1044891
0.1002008 0.0105730 12000 0.1055186
0.1000000 0.0103324 13000 0.1033240
0.0997822 0.0107929 14000 0.1081647
0.0998053 0.0104872 15000 0.1050769
0.0999606 0.0105485 16000 0.1055267
0.1003504 0.0108123 17000 0.1077458
0.1002694 0.0106498 18000 0.1062119
0.0999437 0.0108851 19000 0.1089127
0.1001995 0.0103313 20000 0.1031070
library(ggplot2)
LHMC1 <- ggplot(latinHypercubeMc.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 1") 
LHMC1

Let's show the Latin Hypercube Sampling results and visualize them for D = 2

latinHypercubeMc.D2 <- MC_loop(d=2, fun=latinHypercube_MC)
kable(latinHypercubeMc.D2,caption = ' Latin Hypercube Sampling Estimation D = 2')
Latin Hypercube Sampling Estimation D = 2
mean stdev N CV
0.0663606 0.0086655 1000 0.1305821
0.0682323 0.0093198 2000 0.1365892
0.0680283 0.0090032 3000 0.1323444
0.0684487 0.0096270 4000 0.1406460
0.0678750 0.0090206 5000 0.1328999
0.0679240 0.0093050 6000 0.1369914
0.0681422 0.0091867 7000 0.1348164
0.0676344 0.0091002 8000 0.1345498
0.0684262 0.0089636 9000 0.1309969
0.0680151 0.0089289 10000 0.1312775
0.0680866 0.0093895 11000 0.1379056
0.0680275 0.0093537 12000 0.1374985
0.0681882 0.0093006 13000 0.1363954
0.0678959 0.0092469 14000 0.1361926
0.0676701 0.0090321 15000 0.1334729
0.0680897 0.0091529 16000 0.1344241
0.0677603 0.0092902 17000 0.1371042
0.0681556 0.0093221 18000 0.1367764
0.0681890 0.0092636 19000 0.1358515
0.0676736 0.0094901 20000 0.1402330
library(ggplot2)
LHMC2 <- ggplot(latinHypercubeMc.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 2") 
LHMC2

Let's show the Latin Hypercube Sampling results and visualize them for D = 4

latinHypercubeMc.D4 <- MC_loop(d=4, fun=latinHypercube_MC)
kable(latinHypercubeMc.D4,caption = ' Latin Hypercube Sampling Estimation D = 4'  )
Latin Hypercube Sampling Estimation D = 4
mean stdev N CV
0.0428856 0.0083432 1000 0.1945454
0.0438387 0.0081845 2000 0.1866956
0.0436227 0.0085193 3000 0.1952946
0.0430651 0.0084867 4000 0.1970675
0.0436693 0.0087182 5000 0.1996412
0.0442685 0.0087111 6000 0.1967796
0.0433512 0.0088641 7000 0.2044717
0.0442068 0.0088902 8000 0.2011053
0.0440514 0.0090101 9000 0.2045351
0.0441966 0.0088146 10000 0.1994412
0.0436382 0.0085834 11000 0.1966942
0.0437254 0.0087255 12000 0.1995517
0.0438595 0.0087161 13000 0.1987280
0.0439454 0.0090017 14000 0.2048393
0.0436612 0.0084336 15000 0.1931607
0.0437465 0.0086988 16000 0.1988446
0.0438383 0.0085813 17000 0.1957485
0.0438346 0.0086257 18000 0.1967789
0.0437257 0.0085573 19000 0.1957031
0.0441372 0.0087772 20000 0.1988615
library(ggplot2)
LHMC4 <- ggplot(latinHypercubeMc.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 4") 
LHMC4

e) Importance Sampling

Recall: the goal of importance sampling is to estimate \(E_p[c(x)]\) by sampling from a second, easier-to-sample density q(x) and reweighting:

\[E_p[c(x)] = \int c(x)\, p(x)\, dx = \int c(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx = \int c(x)\, w(x)\, q(x)\, dx = E_q[c(x)\, w(x)]\]

where the importance weight is

\[w(x)=\frac{p(x)}{q(x)}\]

The variance of the weighted estimator is minimized by the (normally unavailable) zero-variance choice

\[q(x)=\frac{c(x)\, p(x)}{E_p[c(x)]}\]

which is what the sampler below targets, using the analytic value of \(E_p[c(x)]\) as the normalizer.
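As a minimal 1-D sketch of the idea (assuming the proposal q = N(0,1), which is proportional to c(x) here), the weighted integrand \(c(x)w(x)\) is nearly constant, so the variance collapses:

set.seed(1)
xs <- rnorm(10000)                   # draws from the proposal q
w  <- dunif(xs, -5, 5) / dnorm(xs)   # importance weights w = p/q
est <- sapply(xs, cost_func) * w     # weighted integrand c(x) * w(x)
c(mean = mean(est), sd = sd(est))    # mean ~0.1, sd ~0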

Next, we will create our functions q(x), p(x), and w(x):

#################################################
qx <- function(x, d)
{
  # (near-)optimal proposal: q(x) = c(x) p(x) / E_p[c(x)]
  cx <- cost_func(x)
  Ecx <- (1/10)^d   # analytic value of E[c(x)] used as the normalizer
  res <- (cx * px(x)) / Ecx
  return(res)
}
px <- function(x)
{
  # U(-5, 5) density, evaluated component-wise
  return (dunif(x, min = -5, max = 5))
}
wx <- function(x, d)
{
  # importance weight w(x) = p(x) / q(x), taken as a product over the
  # independently sampled components
  w <- prod(px(x) / sapply(x, function(xi) qx(xi, 1)))
  return(w)
}

############
# Helper function: rejection-sample a single draw from the density `fun`
helperMethod <- function(fun)
{
  min <- -5
  max <- 5
  M <- 5   # envelope constant: need f(x) <= M * g(x); here max(f) ~ 0.399 and g = 0.1
  accepted <- FALSE
  while(!accepted)
  {
    # Draw a candidate from the uniform proposal g(x)
    r <- runif(1, min, max)
    # Sample u from U(0,1) (the uniform distribution over the unit interval)
    u <- runif(1, 0, 1)
    gx <- dunif(r, min, max)
    # Accept the candidate when u < f(x) / (M g(x))
    if(u < fun(r, 1) / (M * gx))
    {
      accepted = TRUE
    }
  }
  return(r)
}
importance_MC <- function(n, d = 1)
{ 
  theta.hat <- rep(NA, n)
  for(i in 1:n)
  {
    x <- rep(NA, d)
    for(j in 1:d)
    {
      x[j] <- helperMethod(qx)   # one rejection-sampled draw per coordinate
    }
    xw <- wx(x, d)
    theta.hat[i] <- cost_func(x) * xw
  }
  return (theta.hat)
}

Let's show the Importance Sampling results and visualize them for D = 1

importance_MC.D1 <- MC_loop(d=1, fun=importance_MC)

kable(importance_MC.D1,caption = 'Importance Sampling Estimation D = 1'  )
Importance Sampling Estimation D = 1
mean stdev N CV
0.1 0 1000 0
0.1 0 2000 0
0.1 0 3000 0
0.1 0 4000 0
0.1 0 5000 0
0.1 0 6000 0
0.1 0 7000 0
0.1 0 8000 0
0.1 0 9000 0
0.1 0 10000 0
0.1 0 11000 0
0.1 0 12000 0
0.1 0 13000 0
0.1 0 14000 0
0.1 0 15000 0
0.1 0 16000 0
0.1 0 17000 0
0.1 0 18000 0
0.1 0 19000 0
0.1 0 20000 0
library(ggplot2)
IMPD1 <- ggplot(importance_MC.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 1") 
IMPD1

Let's show the Importance Sampling results and visualize them for D = 2

importance_MC.D2 <- MC_loop(d=2, fun=importance_MC)

kable(importance_MC.D2,caption = 'Importance Sampling Estimation D = 2'  )
Importance Sampling Estimation D = 2
mean stdev N CV
0.0085731 0.0012143 1000 0.1416426
0.0085592 0.0012215 2000 0.1427140
0.0085616 0.0012298 3000 0.1436388
0.0085488 0.0012173 4000 0.1423922
0.0085300 0.0012166 5000 0.1426225
0.0085795 0.0012070 6000 0.1406798
0.0085678 0.0012063 7000 0.1407903
0.0085535 0.0012036 8000 0.1407123
0.0085472 0.0012130 9000 0.1419129
0.0085516 0.0012178 10000 0.1424017
0.0085543 0.0012113 11000 0.1415963
0.0085606 0.0012117 12000 0.1415460
0.0085605 0.0012179 13000 0.1422706
0.0085579 0.0012129 14000 0.1417326
0.0085622 0.0012100 15000 0.1413210
0.0085481 0.0012150 16000 0.1421339
0.0085614 0.0012087 17000 0.1411845
0.0085495 0.0012176 18000 0.1424133
0.0085573 0.0012178 19000 0.1423064
0.0085716 0.0012099 20000 0.1411526
library(ggplot2)
IMPD2 <- ggplot(importance_MC.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 2") 
IMPD2

Let's show the Importance Sampling results and visualize them for D = 4

importance_MC.D4 <- MC_loop(d=4, fun=importance_MC)

kable(importance_MC.D4,caption = 'Importance Sampling Estimation D = 4'  )
Importance Sampling Estimation D = 4
mean stdev N CV
6.23e-05 1.51e-05 1000 0.2427272
6.27e-05 1.54e-05 2000 0.2447866
6.31e-05 1.56e-05 3000 0.2479409
6.27e-05 1.55e-05 4000 0.2479586
6.25e-05 1.56e-05 5000 0.2494539
6.26e-05 1.55e-05 6000 0.2474235
6.27e-05 1.55e-05 7000 0.2474750
6.25e-05 1.57e-05 8000 0.2514339
6.27e-05 1.54e-05 9000 0.2457493
6.29e-05 1.57e-05 10000 0.2491272
6.27e-05 1.55e-05 11000 0.2476801
6.24e-05 1.54e-05 12000 0.2469352
6.24e-05 1.56e-05 13000 0.2500206
6.27e-05 1.55e-05 14000 0.2468886
6.29e-05 1.54e-05 15000 0.2443543
6.27e-05 1.56e-05 16000 0.2482902
6.28e-05 1.56e-05 17000 0.2480723
6.25e-05 1.56e-05 18000 0.2489033
6.24e-05 1.55e-05 19000 0.2482173
6.26e-05 1.56e-05 20000 0.2491884
library(ggplot2)
IMPD4 <- ggplot(importance_MC.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 4") 
IMPD4

Summary: the visualization below overlays the mean estimates from each of the methods, showing the variance reduction, as follows: Crude Monte Carlo = blue, Sobol = black, Antithetic = yellow, Latin Hypercube = red, Importance = brown, Actual = green.

We can clearly see that the Sobol and Importance methods estimated the mean very well, while the Crude Monte Carlo and Antithetic estimates fluctuate much more around the true value.

CMC.D1 <- MC_loop(d=1, fun=CMC)
sobolMc.D1 <- MC_loop(d=1, fun=sobolMC)
antithetic_MC.D1 <- MC_loop(d=1, fun=antithetic_MC)
latinHypercubeMc.D1 <- MC_loop(d=1, fun=latinHypercube_MC)
importance_MC.D1 <- MC_loop(d=1, fun=importance_MC)

summary_res <- data.frame('N' = CMC.D1$N,
                          'CMC' = CMC.D1$mean,
                          'Sobol' = sobolMc.D1$mean,
                          'Antithetic' = antithetic_MC.D1$mean,
                          'LatinHypercube' = latinHypercubeMc.D1$mean,
                          'Importance' = importance_MC.D1$mean,
                          'Actual' = AnalyticValue_D1
)
library(ggplot2)

summary_res1 <- ggplot(summary_res) + 
  geom_line(aes(x=N, y=CMC), colour="blue") + 
  geom_point(aes(x=N, y=Sobol), colour="black") +
  geom_line(aes(x=N, y=Antithetic), colour="yellow") + 
  geom_line(aes(x=N, y=LatinHypercube), colour="red") + 
  geom_point(aes(x=N, y=Importance), colour="brown") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean", title="All Methods Estimations Summary at D = 1") 
summary_res1

Question 6.3

Plot the power curves for the t-test in Example 6.9 for sample sizes 10, 20, 30, 40, and 50, but omit the standard error bars. Plot the curves on the same graph, each in a different color or different line type, and include a legend. Comment on the relation between power and sample size.

Answer 6.3

################################
library(ggplot2)
N<- c(10, 20, 30, 40, 50)
m <- 1000
mu0 <- 500
sigma <- 100
mu <- c(seq(450, 650, 10)) #alternatives
M <- length(mu)
df <- data.frame(Sample_size=c(), mu=c(), power=c())
for(n in N)
{
  for(i in 1:M)
  {
    mu1 <- mu[i]
    pvalues <- replicate(m, expr={
      # simulate under alternative mu
      x <- rnorm(n, mean=mu1, sd=sigma)
      ttest <- t.test(x, alternative="greater", mu=mu0)
      ttest$p.value
    })
    df <- rbind(df, data.frame(Sample_size=n, mu=mu1, power=mean(pvalues <= 0.05)))
  }
}

df$Sample_size <- as.factor(df$Sample_size)
p63 <- ggplot(df) + 
  geom_path(aes(x=mu, y=power, colour=Sample_size)) + 
  labs(title="Power Curves", x="mu alternatives", y='Powers' )
p63

From the power curves shown above, we observe that the empirical power is small when \(\theta\) is close to \(\theta_0 = 500\) and increases as \(\theta\) moves further away from \(\theta_0\), approaching 1 as \(\theta \to \infty\). For any fixed alternative, the power also increases with the sample size: the n = 50 curve climbs toward 1 the fastest and the n = 10 curve the slowest.

Question 6.4

Suppose that \(X_1,...,X_n\) are a random sample from a lognormal distribution with unknown parameters. Construct a 95% confidence interval for the parameter \(\mu\). Use a Monte Carlo method to obtain an empirical estimate of the confidence level.

Answer 6.4

We estimate \(\mu\) by the sample mean of the logged data, \(\bar Y = \frac{Y_1+...+Y_n}{n}\), where \(Y_i = \log X_i \sim N(\mu, \sigma^2)\). Since \(\sigma\) is unknown, we replace it by the sample standard deviation \(S\) of the \(Y_i\) and use the pivot:

\[h(X_1,...,X_n, \mu) = \frac {\bar Y - \mu} {\frac{S}{\sqrt{n}}} \sim t_{n-1}\]

Therefore:

\[P\left( t_{\frac \alpha 2}(n-1) \le \frac {\bar Y - \mu} {\frac{S}{\sqrt{n}}} \le t_{1 - \frac \alpha 2}(n-1) \right) = 0.95\]

Now let's simulate lognormal samples to construct the 95% confidence interval for \(\mu\) and estimate the empirical confidence level.

n <- 2500
m <- 1000   # number of Monte Carlo replicates
mu <- 1
sigma <- 0.5
alpha <- 0.05
res <- replicate(m, expr={
  x <- rlnorm(n, meanlog=mu, sdlog=sigma)
  log.mean <- mean(log(x))
  log.sd <- sd(log(x))
  log.se <- log.sd / sqrt(n)
  t.val <- qt(1 - alpha / 2, n - 1)
  CI <- c(log.mean, log.mean - (t.val * log.se), log.mean + (t.val * log.se))
  CI
 
})
theCI <- c(mean(res[2,]), mean(res[3,]))
theCI
## [1] 0.9800012 1.0192280
theCL <- mean(res[2,] < mu & mu < res[3,])   # fraction of intervals covering the true mu
theCL
## [1] 0.942

Question 7.1

Compute the jackknife estimate of the bias and the standard error of the correlation statistic in Example 7.2.

Answer 7.1
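Recall the jackknife formulas used below, where \(\hat\theta_{(i)}\) denotes the statistic recomputed with the i-th observation left out and \(\overline{\hat\theta_{(\cdot)}}\) is their average:

\[\widehat{bias}_{jack} = (n-1)\left(\overline{\hat\theta_{(\cdot)}} - \hat\theta\right), \qquad \widehat{se}_{jack} = \sqrt{\frac{n-1}{n}\sum_{i=1}^n\left(\hat\theta_{(i)} - \overline{\hat\theta_{(\cdot)}}\right)^2}\]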

Jackknife estimate of bias:

############################
library(bootstrap)
n <- nrow(law)
lsat <- law$LSAT
gpa <- law$GPA
theta.hat <- mean(lsat) / mean(gpa)
print (theta.hat)
## [1] 1.939681
################
theta.jack <- numeric(n)
for (i in 1:n)
  theta.jack[i] <- mean(lsat[-i]) / mean(gpa[-i])
bias <- (n - 1) * (mean(theta.jack) - theta.hat)
print(bias) #jackknife estimate of bias
## [1] 0.0002506089

Jackknife estimate of standard error:

se <- sqrt((n-1) * mean((theta.jack - mean(theta.jack))^2))
print(se)
## [1] 0.02525517

Question 7.4

Refer to the air-conditioning data set aircondit provided in the boot package. The 12 observations are the times in hours between failures of air-conditioning equipment. [63, Example 1.1]: \[3,5,7,18,43,85,91,98,100,130,230,487\]

Assume that the times between failures follow an exponential model \(Exp(\lambda)\). Obtain the MLE of the hazard rate \(\lambda\) and use bootstrap to estimate the bias and standard error of the estimate.

Answer 7.4
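For \(X_i \sim Exp(\lambda)\), the log-likelihood is \(\ell(\lambda) = n\log\lambda - \lambda\sum_i x_i\), which is maximized at \(\hat\lambda = \frac{n}{\sum_i x_i} = \frac{1}{\bar x} = \frac{12}{1297} \approx 0.00925212\); this is the MLE used as theta.hat below.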

The bootstrap estimate for the standard error of the estimate is:

library(boot)

B <- 200
n <- 12
R <- numeric(B)
for (b in 1:B) {
  i <- sample(1:n, size = n, replace = TRUE)
  hours <- aircondit$hours[i]
  R[b] <- length(hours)/sum(hours)
}
print(se.R <- sd(R))
## [1] 0.003830035

The bootstrap estimate for the bias of the estimate is:

theta.hat <- length(aircondit$hours) / sum(aircondit$hours)   # MLE on the original sample, ~0.00925212
B <- 2000 
n <- 12
theta.b <- numeric(B)
for (b in 1:B) {
  i <- sample(1:n, size = n, replace = TRUE)
  hours <- aircondit$hours[i]
  theta.b[b] <- length(hours)/sum(hours)
}
bias <- mean(theta.b - theta.hat)
bias
## [1] 0.001407882