Purpose:

I just want to recall the reasons behind variance reduction techniques. Monte Carlo integration typically has an error variance of the form \(\frac{\sigma^2}{n}\). We can get a better answer by sampling with a larger value of n, but the computing time grows with n. Sometimes we can instead find a way to reduce \(\sigma\): we construct a new Monte Carlo problem with the same answer as our original one but with a lower \(\sigma\). Methods that do this are known as variance reduction techniques.
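As a minimal sketch of this \(\frac{\sigma}{\sqrt{n}}\) error scale (assuming the toy problem of estimating \(E[U^2]\) for \(U \sim U(0,1)\), true value 1/3), note that quadrupling n only halves the error:

set.seed(1)
est <- function(n) mean(runif(n)^2)   # one crude Monte Carlo estimate of E[U^2]
sd(replicate(1000, est(100)))         # empirical error, roughly sigma / 10
sd(replicate(1000, est(400)))         # roughly sigma / 20: 4x the work, half the error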

Question 1

In this problem, you will implement and investigate a series of variance reduction procedures for Monte Carlo methods by estimating the expected value of a cost function c(x) which depends on a D-dimensional random variable x.

The cost function you will use is:

\[ c(x)=\frac{1}{(2\pi)^{D/2}}\, e^{-\frac{1}{2} x^T x} \]

where

\[x_i \sim U(-5,5) \text{ for } i=1..D\]

That is, x is a D-dimensional random variable (i.e., a D×1 column vector) and each component of x is uniformly distributed between -5 and 5. Note that c(x) conveniently takes the form of a standard multivariate normal density, with no correlation between the components.

Goal: estimate E[c(x)] - the expected value of c(x) - using Monte Carlo methods and see how it compares to the real value, which you are able to find by hand.
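As a quick sanity check of the hand calculation for D = 1 (where the density of each component is \(p(x_i) = \frac{1}{10}\) on \((-5,5)\)):

# E[c(x)] for D = 1 is (1/10) * P(-5 < Z < 5) for Z ~ N(0,1)
(1/10) * (pnorm(5) - pnorm(-5))   # 0.099999994, essentially 1/10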

Answer 1

a) Crude Monte Carlo

For sample sizes of 1000 to 10000 (in increments of 1000), obtain 100 estimates for E[c(x)] when D = 1, using crude Monte Carlo sampling. Calculate the average value of the 100 estimates as well as their standard deviation, and plot them. In your plot, include a line showing the analytical value for E[c(x)].

First, define the cost function as an R function:

cost_func <- function(x, D=1)
{
  # standard multivariate normal density evaluated at the D-vector x
  quad <- exp(-0.5 * t(x) %*% x)   # exp(-x'x / 2)
  cost_func_results <- (1 / ((2 * pi)^(D/2))) * quad
  return (cost_func_results)
}
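A quick check of the function at the mode:

cost_func(0)   # should equal 1/sqrt(2*pi) ~= 0.3989423 for D = 1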

Next we will build a data frame with a row for each sample size N, applying column-wise functions to record the mean, standard deviation, and coefficient of variation of the cost evaluations at each N. First we will record the analytic values for D = 1, 2, 4, 6, and 8.

library(knitr)
######
# ref:  http://www.integral-calculator.com/
AnalyticValue_D1 = 0.099999994
AnalyticValue_D2 = 0.03989420516865691
AnalyticValue_D4 = 0.006349359953313988
AnalyticValue_D6 = 0.0010105320220309647
AnalyticValue_D8 = 0.0001600831166402579

CMC <- function(n, d = 1)
{
  # crude Monte Carlo: n independent d-dimensional uniform draws
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- runif(d, -5, 5)
    t[i] <- cost_func(x)   # evaluated with cost_func's default D = 1 normalizer
  }
  return (t)
}
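For D = 1 the same crude estimator can also be written without the loop; a vectorized sketch (a hypothetical alternative, equivalent in distribution to CMC(n, 1)):

CMC_vec <- function(n)
{
  x <- runif(n, -5, 5)                    # n one-dimensional draws at once
  (1 / sqrt(2 * pi)) * exp(-0.5 * x^2)    # cost function applied element-wise
}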
#########
MC_loop <- function(d, fun)
{
  # run the sampler `fun` for N = 1000, 2000, ..., 20000 and record the
  # mean, standard deviation, and coefficient of variation at each N
  CMC.result <- data.frame(mean=c(), stdev=c(), n=c())
  for(n in seq(1000, 20000, by=1000))
  {
    res <- fun(n=n, d=d)
    CMC.result <- rbind(CMC.result, data.frame(mean=mean(res), stdev=sd(res), N=n))
  }
  #CMC.result$EcActual <- (1/10)^d
  CMC.result$CV <- CMC.result$stdev / CMC.result$mean
  return (CMC.result)
}

CMC.D1 <- MC_loop(d=1, fun=CMC)

Let's show the CMC results and visualize them for D = 1

#####
kable(CMC.D1, caption = 'Monte Carlo Estimation D= 1')
Monte Carlo Estimation D= 1
mean stdev N CV
0.0981998 0.1361343 1000 1.386300
0.1059722 0.1391388 2000 1.312975
0.1024486 0.1354279 3000 1.321911
0.0997539 0.1346572 4000 1.349894
0.1000528 0.1348032 5000 1.347321
0.0991720 0.1345330 6000 1.356562
0.0991800 0.1341254 7000 1.352343
0.0986359 0.1336084 8000 1.354561
0.0998855 0.1344078 9000 1.345619
0.0970901 0.1347123 10000 1.387498
0.0993453 0.1340295 11000 1.349127
0.0996995 0.1351484 12000 1.355558
0.1005037 0.1357187 13000 1.350384
0.0994147 0.1345373 14000 1.353294
0.0976972 0.1332197 15000 1.363597
0.0993604 0.1342501 16000 1.351142
0.0998262 0.1350334 17000 1.352685
0.1006510 0.1357473 18000 1.348693
0.1004996 0.1353388 19000 1.346660
0.1007590 0.1350156 20000 1.339986
library(ggplot2)
D1 <- ggplot(CMC.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 1") 
D1

Let's show the CMC results and visualize them for D = 2

CMC.D2 <- MC_loop(d=2, fun=CMC)
kable(CMC.D2, caption = 'Monte Carlo Estimation D= 2')
Monte Carlo Estimation D= 2
mean stdev N CV
0.0254051 0.0663428 1000 2.611401
0.0238710 0.0640611 2000 2.683631
0.0256491 0.0666433 3000 2.598267
0.0266108 0.0692082 4000 2.600755
0.0243074 0.0652812 5000 2.685648
0.0246965 0.0666069 6000 2.697016
0.0244762 0.0656951 7000 2.684035
0.0259938 0.0676566 8000 2.602797
0.0250898 0.0666876 9000 2.657958
0.0247470 0.0656335 10000 2.652180
0.0239399 0.0643529 11000 2.688099
0.0255236 0.0671404 12000 2.630526
0.0258621 0.0674592 13000 2.608420
0.0244511 0.0654314 14000 2.676004
0.0251191 0.0661890 15000 2.635004
0.0259076 0.0676854 16000 2.612568
0.0257010 0.0672579 17000 2.616935
0.0247881 0.0655405 18000 2.644033
0.0245196 0.0654146 19000 2.667848
0.0258598 0.0674840 20000 2.609612
library(ggplot2)
D2 <- ggplot(CMC.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 2") 
D2

Let's show the CMC results and visualize them for D = 4

CMC.D4 <- MC_loop(d=4, fun=CMC)
kable(CMC.D4, caption = 'Monte Carlo Estimation D= 4')
Monte Carlo Estimation D= 4
mean stdev N CV
0.0026106 0.0183441 1000 7.026809
0.0014728 0.0108799 2000 7.387433
0.0013510 0.0094267 3000 6.977682
0.0019340 0.0158778 4000 8.209950
0.0018646 0.0142016 5000 7.616306
0.0015641 0.0114340 6000 7.310103
0.0015704 0.0118809 7000 7.565402
0.0016110 0.0123754 8000 7.681689
0.0014636 0.0114820 9000 7.845081
0.0016180 0.0129828 10000 8.024164
0.0017087 0.0137720 11000 8.059947
0.0015932 0.0129686 12000 8.139858
0.0018852 0.0148507 13000 7.877676
0.0016631 0.0128511 14000 7.727118
0.0015141 0.0121979 15000 8.056110
0.0015022 0.0116179 16000 7.734011
0.0017511 0.0132175 17000 7.548127
0.0015433 0.0117087 18000 7.586978
0.0016732 0.0124422 19000 7.435980
0.0015759 0.0124120 20000 7.876149
library(ggplot2)
D4 <- ggplot(CMC.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 4") 
D4

CMC.D6 <- MC_loop(d=6, fun=CMC)
kable(CMC.D6, caption = 'Monte Carlo Estimation D= 6')
Monte Carlo Estimation D= 6
mean stdev N CV
0.0000835 0.0014630 1000 17.53040
0.0000821 0.0013203 2000 16.08263
0.0000868 0.0016866 3000 19.44090
0.0001078 0.0018750 4000 17.40077
0.0000922 0.0016049 5000 17.40323
0.0001011 0.0025173 6000 24.90937
0.0000564 0.0008839 7000 15.66245
0.0000932 0.0020616 8000 22.12910
0.0001159 0.0026584 9000 22.94546
0.0001003 0.0021627 10000 21.56102
0.0000978 0.0020746 11000 21.21784
0.0000819 0.0016432 12000 20.07264
0.0000875 0.0018530 13000 21.18115
0.0001373 0.0030473 14000 22.20084
0.0001270 0.0023447 15000 18.46454
0.0000958 0.0020694 16000 21.61016
0.0000865 0.0016872 17000 19.50311
0.0000625 0.0010298 18000 16.48372
0.0001457 0.0031924 19000 21.91200
0.0001085 0.0023251 20000 21.42078
library(ggplot2)
D6 <- ggplot(CMC.D6) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D6), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 6") 
D6

As we can observe, as we increase the value of D our estimate drifts further from the analytic value.

b) Quasi-Random Numbers

We now investigate the variance reduction properties of quasi-random numbers using Sobol numbers. To see how these Sobol numbers differ from normal random numbers, generate and plot 100 pairs of both. Comment on the differences.

First let's use runif to generate n = 100 pairs of pseudo-random numbers and plot them:

n <- 100
uniFrom <- as.data.frame(matrix(runif(n * 2), ncol=2))
colnames(uniFrom) <- c("x", "y")

unif <- ggplot(data=as.data.frame(uniFrom)) + 
  geom_point(aes(x=x, y=y)) + 
  labs(title="Uniform for  n=100 x 2") 
unif

Next we will use sobol to generate n = 100 quasi-random pairs and plot them.

library(randtoolbox)
## Loading required package: rngWELL
## This is randtoolbox. For overview, type 'help("randtoolbox")'.
sobol_Rnum <- as.data.frame(sobol(n, d=2))
colnames(sobol_Rnum) <- c("x", "y")

sobol_p2 <- ggplot(data=as.data.frame(sobol_Rnum)) + 
  geom_point(aes(x=x, y=y)) + 
  labs(title="Sobol Quasi-Random Numbers n=100 x 2 ") 
sobol_p2 
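Comparing the two scatter plots, the Sobol points fill the unit square far more evenly: they form a low-discrepancy (quasi-random) sequence designed to avoid the clumps and gaps that are visible in the pseudo-random runif sample.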

# Sobol Monte Carlo function
sobolMC <- function(n, d = 1, init=TRUE) 
{ 
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- as.vector(sobol(n=1, d=d, init=init))
    x <- -5 + (x * 10)   # rescale the Sobol point from [0,1]^d to [-5,5]^d
    t[i] <- cost_func(x)
    init <- FALSE # turn off init: keep advancing the same Sobol sequence between calls
  }
  return (t)
}


Let's show the Sobol Monte Carlo results and visualize them for D = 1

sobolMc.D1 <- MC_loop(d=1, fun=sobolMC)
kable(sobolMc.D1, caption = 'Sobol Monte Carlo Estimation D= 1')
Sobol Monte Carlo Estimation D= 1
mean stdev N CV
0.1000078 0.1350043 1000 1.349938
0.1000000 0.1349762 2000 1.349762
0.1000002 0.1349645 3000 1.349642
0.0999999 0.1349594 4000 1.349595
0.1000007 0.1349560 5000 1.349550
0.1000000 0.1349538 6000 1.349538
0.1000003 0.1349523 7000 1.349519
0.0999999 0.1349510 8000 1.349511
0.1000014 0.1349487 9000 1.349468
0.1000000 0.1349493 10000 1.349493
0.1000000 0.1349484 11000 1.349484
0.0999999 0.1349482 12000 1.349482
0.1000004 0.1349475 13000 1.349470
0.0999999 0.1349474 14000 1.349474
0.1000001 0.1349471 15000 1.349470
0.0999999 0.1349468 16000 1.349468
0.1000005 0.1349460 17000 1.349453
0.1000000 0.1349463 18000 1.349463
0.1000000 0.1349460 19000 1.349460
0.0999999 0.1349459 20000 1.349460
library(ggplot2)
DS1 <- ggplot(sobolMc.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 1") 
DS1

Let's show the Sobol Monte Carlo results and visualize them for D = 2

sobolMc.D2 <- MC_loop(d=2, fun=sobolMC)
kable(sobolMc.D2, caption = 'Sobol Monte Carlo Estimation D= 2')
Sobol Monte Carlo Estimation D= 2
mean stdev N CV
0.0252972 0.0670778 1000 2.651591
0.0248737 0.0658396 2000 2.646959
0.0249524 0.0658683 3000 2.639755
0.0250161 0.0659642 4000 2.636868
0.0251183 0.0662144 5000 2.636098
0.0251252 0.0662073 6000 2.635097
0.0250746 0.0661084 7000 2.636471
0.0250514 0.0660554 8000 2.636799
0.0250939 0.0663065 9000 2.642330
0.0251324 0.0663249 10000 2.639022
0.0250553 0.0661434 11000 2.639900
0.0251099 0.0663046 12000 2.640572
0.0250408 0.0659899 13000 2.635296
0.0250999 0.0661518 14000 2.635543
0.0250528 0.0660462 15000 2.636278
0.0250648 0.0660833 16000 2.636496
0.0250663 0.0661090 17000 2.637361
0.0250603 0.0661020 18000 2.637718
0.0250577 0.0661014 19000 2.637966
0.0250991 0.0662334 20000 2.638879
library(ggplot2)
DS2 <- ggplot(sobolMc.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 2") 
DS2

Let's show the Sobol Monte Carlo results and visualize them for D = 4

sobolMc.D4 <- MC_loop(d=4, fun=sobolMC)
kable(sobolMc.D4, caption = 'Sobol Monte Carlo Estimation D= 4')
Sobol Monte Carlo Estimation D= 4
mean stdev N CV
0.0020512 0.0185322 1000 9.034808
0.0019187 0.0167684 2000 8.739384
0.0018435 0.0157903 3000 8.565195
0.0019194 0.0165702 4000 8.633011
0.0017874 0.0152083 5000 8.508425
0.0016878 0.0141846 6000 8.404088
0.0016170 0.0133921 7000 8.282330
0.0015693 0.0127941 8000 8.152662
0.0015889 0.0129440 9000 8.146647
0.0016370 0.0133957 10000 8.183101
0.0016648 0.0137198 11000 8.240937
0.0016845 0.0139399 12000 8.275303
0.0016590 0.0136365 13000 8.219491
0.0016375 0.0133230 14000 8.135959
0.0016169 0.0130736 15000 8.085829
0.0015945 0.0127932 16000 8.023545
0.0015681 0.0125222 17000 7.985796
0.0015569 0.0123451 18000 7.929454
0.0015347 0.0121085 19000 7.889822
0.0015302 0.0119699 20000 7.822379
library(ggplot2)
DS4 <- ggplot(sobolMc.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 4") 
DS4

Again, as we increase the value of D our estimate drifts further from the analytic value; note, though, how tightly the Sobol means converge compared with the crude Monte Carlo means.

c) Antithetic Variates

A simple method that is sometimes used for reducing the variance of the estimate is to induce negative correlation between random draws. When the random variable being generated is uniform, or is a function of a uniform random variable, one can induce negative correlation with a simple procedure: pair each draw x with a reflected partner (below, 1 - x) and average the two cost evaluations.

First letโ€™s create our antithetic Monte Carlo function

##############################
antithetic_MC <- function(n,  d = 1)
{ 
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- runif(d, -5, 5)
    x2 <- 1 - x   # negatively correlated partner: reflection of x about 1/2
    t[i] <- (cost_func(x) + cost_func(x2)) / 2   # average the paired evaluations
  }
  return (t)
}
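For reference, here is a minimal sketch of the canonical antithetic pairing on the underlying \(U(0,1)\) scale (the implementation above instead reflects x about 1/2 directly on the \((-5,5)\) scale); the toy estimand \(E[e^U]\), with a monotone integrand, is an assumption chosen to make the variance reduction visible:

set.seed(1)
u <- runif(10000)
crude <- exp(u)                      # plain draws
anti  <- (exp(u) + exp(1 - u)) / 2   # u and 1 - u are negatively correlated
c(sd_crude = sd(crude), sd_anti = sd(anti))   # the antithetic sd is markedly smaller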


###############################

Let's show the antithetic Monte Carlo results and visualize them for D = 1

antithetic_MC.D1 <- MC_loop(d=1, fun=antithetic_MC)
kable(antithetic_MC.D1, caption = 'Antithetic Monte Carlo Estimation D= 1')
Antithetic Monte Carlo Estimation D= 1
mean stdev N CV
0.0946337 0.1213087 1000 1.281877
0.0965587 0.1200370 2000 1.243150
0.0997019 0.1232915 3000 1.236602
0.0992320 0.1226721 4000 1.236216
0.0972665 0.1217437 5000 1.251651
0.0990116 0.1219937 6000 1.232115
0.0993223 0.1219714 7000 1.228037
0.1006994 0.1245233 8000 1.236584
0.1003271 0.1225045 9000 1.221051
0.0994185 0.1228737 10000 1.235924
0.1003738 0.1227318 11000 1.222747
0.0993619 0.1217504 12000 1.225322
0.0988115 0.1220463 13000 1.235142
0.1000180 0.1229179 14000 1.228958
0.0995190 0.1228303 15000 1.234240
0.0989344 0.1220141 16000 1.233283
0.1008799 0.1226262 17000 1.215567
0.0991722 0.1224676 18000 1.234898
0.0994139 0.1227754 19000 1.234992
0.0991846 0.1228981 20000 1.239085
library(ggplot2)
AMC1 <- ggplot(antithetic_MC.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D= 1") 
AMC1

Let's show the antithetic Monte Carlo results and visualize them for D = 2

antithetic_MC.D2 <- MC_loop(d=2, fun=antithetic_MC)
kable(antithetic_MC.D2, caption = 'Antithetic Monte Carlo Estimation D= 2')
library(ggplot2)
AMC2 <- ggplot(antithetic_MC.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D= 2") 
AMC2

Let's show the antithetic Monte Carlo results and visualize them for D = 4

antithetic_MC.D4 <- MC_loop(d=4, fun=antithetic_MC)
kable(antithetic_MC.D4, caption = 'Antithetic Monte Carlo Estimation D = 4')
library(ggplot2)
AMC4 <- ggplot(antithetic_MC.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D = 4") 
AMC4

Again, as we increase the value of D our estimate drifts further from the analytic value.

d) Latin Hypercube Sampling

Latin Hypercube Sampling is a variance reduction procedure that is based on the idea of stratified sampling. In this approach, you divide the interval into K equally probable subintervals, and generate a uniform sample from within each subinterval.

latinHypercube_MC <- function(n, d = 1, K=10)
{
  min <- -5
  max <- 5
  size <- (max - min) / K   # width of each of the K equally probable strata
  samples <- n / K          # each estimate consumes K draws
  
  Y <- rep(NA, samples)
  for(i in 1:samples)
  {
    # loop through the K strata; note the same stratum k is used for all d coordinates
    vec <- rep(NA, K)
    for(k in 1:K)
    {
      unif <- runif(d,
                    min + ((k - 1) * size),
                    min + (k * size))
      vec[k] <- cost_func(unif)
    }
    Y[i] <- mean(vec)
  }
  return (Y)
}
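Note that the loop above reuses the same stratum index k for every coordinate. For comparison, here is a minimal sketch of a full Latin hypercube design (a hypothetical helper, not used below), in which each dimension gets its own random permutation of the K strata:

lhs_sample <- function(K, d, min = -5, max = 5)
{
  size <- (max - min) / K
  X <- matrix(NA, nrow = K, ncol = d)
  for (j in 1:d)
  {
    perm <- sample(K)                             # random stratum order for dimension j
    X[, j] <- min + (perm - 1 + runif(K)) * size  # one point per stratum, per dimension
  }
  return (X)
}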

Let's show the Latin Hypercube Sampling results and visualize them for D = 1

latinHypercubeMc.D1 <- MC_loop(d=1, fun=latinHypercube_MC)
kable(latinHypercubeMc.D1,caption = ' Latin Hypercube Sampling Estimation D = 1'  )
Latin Hypercube Sampling Estimation D = 1
mean stdev N CV
0.1011382 0.0113785 1000 0.1125045
0.1012569 0.0107048 2000 0.1057191
0.0995613 0.0106339 3000 0.1068072
0.1001634 0.0103042 4000 0.1028741
0.1002135 0.0101784 5000 0.1015672
0.0992301 0.0102730 6000 0.1035272
0.0999507 0.0099905 7000 0.0999546
0.1000021 0.0101731 8000 0.1017285
0.0994530 0.0104530 9000 0.1051048
0.0997668 0.0104803 10000 0.1050478
0.1002104 0.0104709 11000 0.1044891
0.1002008 0.0105730 12000 0.1055186
0.1000000 0.0103324 13000 0.1033240
0.0997822 0.0107929 14000 0.1081647
0.0998053 0.0104872 15000 0.1050769
0.0999606 0.0105485 16000 0.1055267
0.1003504 0.0108123 17000 0.1077458
0.1002694 0.0106498 18000 0.1062119
0.0999437 0.0108851 19000 0.1089127
0.1001995 0.0103313 20000 0.1031070
library(ggplot2)
LHMC1 <- ggplot(latinHypercubeMc.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 1") 
LHMC1

Let's show the Latin Hypercube Sampling results and visualize them for D = 2

latinHypercubeMc.D2 <- MC_loop(d=2, fun=latinHypercube_MC)
kable(latinHypercubeMc.D2,caption = ' Latin Hypercube Sampling Estimation D = 2')
Latin Hypercube Sampling Estimation D = 2
mean stdev N CV
0.0663606 0.0086655 1000 0.1305821
0.0682323 0.0093198 2000 0.1365892
0.0680283 0.0090032 3000 0.1323444
0.0684487 0.0096270 4000 0.1406460
0.0678750 0.0090206 5000 0.1328999
0.0679240 0.0093050 6000 0.1369914
0.0681422 0.0091867 7000 0.1348164
0.0676344 0.0091002 8000 0.1345498
0.0684262 0.0089636 9000 0.1309969
0.0680151 0.0089289 10000 0.1312775
0.0680866 0.0093895 11000 0.1379056
0.0680275 0.0093537 12000 0.1374985
0.0681882 0.0093006 13000 0.1363954
0.0678959 0.0092469 14000 0.1361926
0.0676701 0.0090321 15000 0.1334729
0.0680897 0.0091529 16000 0.1344241
0.0677603 0.0092902 17000 0.1371042
0.0681556 0.0093221 18000 0.1367764
0.0681890 0.0092636 19000 0.1358515
0.0676736 0.0094901 20000 0.1402330
library(ggplot2)
LHMC2 <- ggplot(latinHypercubeMc.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 2") 
LHMC2

Let's show the Latin Hypercube Sampling results and visualize them for D = 4

latinHypercubeMc.D4 <- MC_loop(d=4, fun=latinHypercube_MC)
kable(latinHypercubeMc.D4,caption = ' Latin Hypercube Sampling Estimation D = 4'  )
Latin Hypercube Sampling Estimation D = 4
mean stdev N CV
0.0428856 0.0083432 1000 0.1945454
0.0438387 0.0081845 2000 0.1866956
0.0436227 0.0085193 3000 0.1952946
0.0430651 0.0084867 4000 0.1970675
0.0436693 0.0087182 5000 0.1996412
0.0442685 0.0087111 6000 0.1967796
0.0433512 0.0088641 7000 0.2044717
0.0442068 0.0088902 8000 0.2011053
0.0440514 0.0090101 9000 0.2045351
0.0441966 0.0088146 10000 0.1994412
0.0436382 0.0085834 11000 0.1966942
0.0437254 0.0087255 12000 0.1995517
0.0438595 0.0087161 13000 0.1987280
0.0439454 0.0090017 14000 0.2048393
0.0436612 0.0084336 15000 0.1931607
0.0437465 0.0086988 16000 0.1988446
0.0438383 0.0085813 17000 0.1957485
0.0438346 0.0086257 18000 0.1967789
0.0437257 0.0085573 19000 0.1957031
0.0441372 0.0087772 20000 0.1988615
library(ggplot2)
LHMC4 <- ggplot(latinHypercubeMc.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 4") 
LHMC4

e) Importance Sampling

Recall: the goal of importance sampling is to estimate \(E_p[c(x)]\) by sampling from a second, easier-to-sample density q(x) and reweighting:

\[E_p[c(x)] = \int c(x)\, p(x)\, dx = \int c(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx = \int c(x)\, w(x)\, q(x)\, dx = E_q[c(x)\, w(x)]\]

where the importance weight is

\[w(x)=\frac{p(x)}{q(x)}\]

The variance of the weighted estimator is minimized by the (normally unavailable) zero-variance choice

\[q(x)=\frac{c(x)\, p(x)}{E_p[c(x)]}\]

which is what the sampler below targets, using the analytic value of \(E_p[c(x)]\) as the normalizer.
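As a minimal 1-D sketch of the idea (assuming the proposal q = N(0,1), which is proportional to c(x) here), the weighted integrand \(c(x)w(x)\) is nearly constant, so the variance collapses:

set.seed(1)
xs <- rnorm(10000)                   # draws from the proposal q
w  <- dunif(xs, -5, 5) / dnorm(xs)   # importance weights w = p/q
est <- sapply(xs, cost_func) * w     # weighted integrand c(x) * w(x)
c(mean = mean(est), sd = sd(est))    # mean ~0.1, sd ~0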

Next, we will create our functions q(x), p(x), and w(x):

#################################################
qx <- function(x, d)
{
  # (near-)optimal proposal: q(x) = c(x) p(x) / E_p[c(x)]
  cx <- cost_func(x)
  Ecx <- (1/10)^d   # analytic value of E[c(x)] used as the normalizer
  res <- (cx * px(x)) / Ecx
  return(res)
}
px <- function(x)
{
  # U(-5, 5) density, evaluated component-wise
  return (dunif(x, min = -5, max = 5))
}
wx <- function(x, d)
{
  # importance weight w(x) = p(x) / q(x), taken as a product over the
  # independently sampled components
  w <- prod(px(x) / sapply(x, function(xi) qx(xi, 1)))
  return(w)
}

############
# Helper function: rejection-sample a single draw from the density `fun`
helperMethod <- function(fun)
{
  min <- -5
  max <- 5
  M <- 5   # envelope constant: need f(x) <= M * g(x); here max(f) ~ 0.399 and g = 0.1
  accepted <- FALSE
  while(!accepted)
  {
    # Draw a candidate from the uniform proposal g(x)
    r <- runif(1, min, max)
    # Sample u from U(0,1) (the uniform distribution over the unit interval)
    u <- runif(1, 0, 1)
    gx <- dunif(r, min, max)
    # Accept the candidate when u < f(x) / (M g(x))
    if(u < fun(r, 1) / (M * gx))
    {
      accepted = TRUE
    }
  }
  return(r)
}
importance_MC <- function(n, d = 1)
{ 
  theta.hat <- rep(NA, n)
  for(i in 1:n)
  {
    x <- rep(NA, d)
    for(j in 1:d)
    {
      x[j] <- helperMethod(qx)   # one rejection-sampled draw per coordinate
    }
    xw <- wx(x, d)
    theta.hat[i] <- cost_func(x) * xw
  }
  return (theta.hat)
}

Let's show the Importance Sampling results and visualize them for D = 1

importance_MC.D1 <- MC_loop(d=1, fun=importance_MC)

kable(importance_MC.D1,caption = 'Importance Sampling Estimation D = 1'  )
Importance Sampling Estimation D = 1
mean stdev N CV
0.1 0 1000 0
0.1 0 2000 0
0.1 0 3000 0
0.1 0 4000 0
0.1 0 5000 0
0.1 0 6000 0
0.1 0 7000 0
0.1 0 8000 0
0.1 0 9000 0
0.1 0 10000 0
0.1 0 11000 0
0.1 0 12000 0
0.1 0 13000 0
0.1 0 14000 0
0.1 0 15000 0
0.1 0 16000 0
0.1 0 17000 0
0.1 0 18000 0
0.1 0 19000 0
0.1 0 20000 0
library(ggplot2)
IMPD1 <- ggplot(importance_MC.D1) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 1") 
IMPD1

Let's show the Importance Sampling results and visualize them for D = 2

importance_MC.D2 <- MC_loop(d=2, fun=importance_MC)

kable(importance_MC.D2,caption = 'Importance Sampling Estimation D = 2'  )
Importance Sampling Estimation D = 2
mean stdev N CV
0.0085731 0.0012143 1000 0.1416426
0.0085592 0.0012215 2000 0.1427140
0.0085616 0.0012298 3000 0.1436388
0.0085488 0.0012173 4000 0.1423922
0.0085300 0.0012166 5000 0.1426225
0.0085795 0.0012070 6000 0.1406798
0.0085678 0.0012063 7000 0.1407903
0.0085535 0.0012036 8000 0.1407123
0.0085472 0.0012130 9000 0.1419129
0.0085516 0.0012178 10000 0.1424017
0.0085543 0.0012113 11000 0.1415963
0.0085606 0.0012117 12000 0.1415460
0.0085605 0.0012179 13000 0.1422706
0.0085579 0.0012129 14000 0.1417326
0.0085622 0.0012100 15000 0.1413210
0.0085481 0.0012150 16000 0.1421339
0.0085614 0.0012087 17000 0.1411845
0.0085495 0.0012176 18000 0.1424133
0.0085573 0.0012178 19000 0.1423064
0.0085716 0.0012099 20000 0.1411526
library(ggplot2)
IMPD2 <- ggplot(importance_MC.D2) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 2") 
IMPD2

Let's show the Importance Sampling results and visualize them for D = 4

importance_MC.D4 <- MC_loop(d=4, fun=importance_MC)

kable(importance_MC.D4,caption = 'Importance Sampling Estimation D = 4'  )
Importance Sampling Estimation D = 4
mean stdev N CV
6.23e-05 1.51e-05 1000 0.2427272
6.27e-05 1.54e-05 2000 0.2447866
6.31e-05 1.56e-05 3000 0.2479409
6.27e-05 1.55e-05 4000 0.2479586
6.25e-05 1.56e-05 5000 0.2494539
6.26e-05 1.55e-05 6000 0.2474235
6.27e-05 1.55e-05 7000 0.2474750
6.25e-05 1.57e-05 8000 0.2514339
6.27e-05 1.54e-05 9000 0.2457493
6.29e-05 1.57e-05 10000 0.2491272
6.27e-05 1.55e-05 11000 0.2476801
6.24e-05 1.54e-05 12000 0.2469352
6.24e-05 1.56e-05 13000 0.2500206
6.27e-05 1.55e-05 14000 0.2468886
6.29e-05 1.54e-05 15000 0.2443543
6.27e-05 1.56e-05 16000 0.2482902
6.28e-05 1.56e-05 17000 0.2480723
6.25e-05 1.56e-05 18000 0.2489033
6.24e-05 1.55e-05 19000 0.2482173
6.26e-05 1.56e-05 20000 0.2491884
library(ggplot2)
IMPD4 <- ggplot(importance_MC.D4) + 
  geom_point(aes(x=N, y=mean), colour="blue") + 
  geom_line(aes(x=N, y=mean), colour="blue") + 
  geom_point(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=stdev), colour="red") + 
  geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") + 
  labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 4") 
IMPD4

Summary: the visualization below overlays the mean estimates from each of the methods, showing the variance reduction, as follows: Crude Monte Carlo = blue, Sobol = black, Antithetic = yellow, Latin Hypercube = red, Importance = brown, Actual = green.

We can clearly see that the Sobol and Importance methods estimated the mean very well, while the Crude Monte Carlo and Antithetic estimates fluctuate much more around the true value.

CMC.D1 <- MC_loop(d=1, fun=CMC)
sobolMc.D1 <- MC_loop(d=1, fun=sobolMC)
antithetic_MC.D1 <- MC_loop(d=1, fun=antithetic_MC)
latinHypercubeMc.D1 <- MC_loop(d=1, fun=latinHypercube_MC)
importance_MC.D1 <- MC_loop(d=1, fun=importance_MC)

summary_res <- data.frame('N' = CMC.D1$N,
                          'CMC' = CMC.D1$mean,
                          'Sobol' = sobolMc.D1$mean,
                          'Antithetic' = antithetic_MC.D1$mean,
                          'LatinHypercube' = latinHypercubeMc.D1$mean,
                          'Importance' = importance_MC.D1$mean,
                          'Actual' = AnalyticValue_D1
)
library(ggplot2)

summary_res1 <- ggplot(summary_res) + 
  geom_line(aes(x=N, y=CMC), colour="blue") + 
  geom_point(aes(x=N, y=Sobol), colour="black") +
  geom_line(aes(x=N, y=Antithetic), colour="yellow") + 
  geom_line(aes(x=N, y=LatinHypercube), colour="red") + 
  geom_point(aes(x=N, y=Importance), colour="brown") + 
  geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") + 
  labs(x="N", y= " Mean", title="All Methods Estimations Summary at D = 1") 
summary_res1

Question 6.3

Plot the power curves for the t-test in Example 6.9 for sample sizes 10, 20, 30, 40, and 50, but omit the standard error bars. Plot the curves on the same graph, each in a different color or different line type, and include a legend. Comment on the relation between power and sample size.

Answer 6.3

################################
library(ggplot2)
N<- c(10, 20, 30, 40, 50)
m <- 1000
mu0 <- 500
sigma <- 100
mu <- c(seq(450, 650, 10)) #alternatives
M <- length(mu)
df <- data.frame(Sample_size=c(), mu=c(), power=c())
for(n in N)
{
  for(i in 1:M)
  {
    mu1 <- mu[i]
    pvalues <- replicate(m, expr={
      # simulate under alternative mu
      x <- rnorm(n, mean=mu1, sd=sigma)
      ttest <- t.test(x, alternative="greater", mu=mu0)
      ttest$p.value
    })
    df <- rbind(df, data.frame(Sample_size=n, mu=mu1, power=mean(pvalues <= 0.05)))
  }
}

df$Sample_size <- as.factor(df$Sample_size)
p63 <- ggplot(df) + 
  geom_path(aes(x=mu, y=power, colour=Sample_size)) + 
  labs(title="Power Curves", x="mu alternatives", y='Powers' )
p63

From the power curves shown above, we observe that the empirical power is small when \(\theta\) is close to \(\theta_0 = 500\) and increases as \(\theta\) moves further away from \(\theta_0\), approaching 1 as \(\theta \to \infty\). For any fixed alternative, the power also increases with the sample size: the n = 50 curve climbs toward 1 the fastest and the n = 10 curve the slowest.

Question 6.4

Suppose that \(X_1,...,X_n\) are a random sample from a lognormal distribution with unknown parameters. Construct a 95% confidence interval for the parameter \(\mu\). Use a Monte Carlo method to obtain an empirical estimate of the confidence level.

Answer 6.4

We estimate \(\mu\) by the sample mean of the logged data, \(\bar Y = \frac{Y_1+...+Y_n}{n}\), where \(Y_i = \log X_i \sim N(\mu, \sigma^2)\). Since \(\sigma\) is unknown, we replace it by the sample standard deviation \(S\) of the \(Y_i\) and use the pivot:

\[h(X_1,...,X_n, \mu) = \frac {\bar Y - \mu} {\frac{S}{\sqrt{n}}} \sim t_{n-1}\]

Therefore:

\[P\left( t_{\frac \alpha 2}(n-1) \le \frac {\bar Y - \mu} {\frac{S}{\sqrt{n}}} \le t_{1 - \frac \alpha 2}(n-1) \right) = 0.95\]

Now let's simulate lognormal samples to construct the 95% confidence interval for \(\mu\) and estimate the empirical confidence level.

n <- 2500
m <- 1000   # number of Monte Carlo replicates
mu <- 1
sigma <- 0.5
alpha <- 0.05
res <- replicate(m, expr={
  x <- rlnorm(n, meanlog=mu, sdlog=sigma)
  log.mean <- mean(log(x))
  log.sd <- sd(log(x))
  log.se <- log.sd / sqrt(n)
  t.val <- qt(1 - alpha / 2, n - 1)
  CI <- c(log.mean, log.mean - (t.val * log.se), log.mean + (t.val * log.se))
  CI
 
})
theCI <- c(mean(res[2,]), mean(res[3,]))
theCI
## [1] 0.9800012 1.0192280
theCL <- mean(res[2,] < mu & mu < res[3,])   # fraction of intervals covering the true mu
theCL
## [1] 0.942

Question 7.1

Compute the jackknife estimate of the bias and the standard error of the correlation statistic in Example 7.2.

Answer 7.1
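Recall the jackknife formulas used below, where \(\hat\theta_{(i)}\) denotes the statistic recomputed with the i-th observation left out and \(\overline{\hat\theta_{(\cdot)}}\) is their average:

\[\widehat{bias}_{jack} = (n-1)\left(\overline{\hat\theta_{(\cdot)}} - \hat\theta\right), \qquad \widehat{se}_{jack} = \sqrt{\frac{n-1}{n}\sum_{i=1}^n\left(\hat\theta_{(i)} - \overline{\hat\theta_{(\cdot)}}\right)^2}\]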

Jackknife estimate of bias:

############################
library(bootstrap)
n <- nrow(law)
lsat <- law$LSAT
gpa <- law$GPA
theta.hat <- mean(lsat) / mean(gpa)
print (theta.hat)
## [1] 1.939681
################
theta.jack <- numeric(n)
for (i in 1:n)
  theta.jack[i] <- mean(lsat[-i]) / mean(gpa[-i])
bias <- (n - 1) * (mean(theta.jack) - theta.hat)
print(bias) #jackknife estimate of bias
## [1] 0.0002506089

Jackknife estimate of standard error:

se <- sqrt((n-1) * mean((theta.jack - mean(theta.jack))^2))
print(se)
## [1] 0.02525517

Question 7.4

Refer to the air-conditioning data set aircondit provided in the boot package. The 12 observations are the times in hours between failures of air-conditioning equipment. [63, Example 1.1]: \[3,5,7,18,43,85,91,98,100,130,230,487\]

Assume that the times between failures follow an exponential model \(Exp(\lambda)\). Obtain the MLE of the hazard rate \(\lambda\) and use bootstrap to estimate the bias and standard error of the estimate.

Answer 7.4
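For \(X_i \sim Exp(\lambda)\), the log-likelihood is \(\ell(\lambda) = n\log\lambda - \lambda\sum_i x_i\), which is maximized at \(\hat\lambda = \frac{n}{\sum_i x_i} = \frac{1}{\bar x} = \frac{12}{1297} \approx 0.00925212\); this is the MLE used as theta.hat below.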

The bootstrap estimate for the standard error of the estimate is:

library(boot)

B <- 200
n <- 12
R <- numeric(B)
for (b in 1:B) {
  i <- sample(1:n, size = n, replace = TRUE)
  hours <- aircondit$hours[i]
  R[b] <- length(hours)/sum(hours)
}
print(se.R <- sd(R))
## [1] 0.003830035

The bootstrap estimate for the bias of the estimate is:

theta.hat <- length(aircondit$hours) / sum(aircondit$hours)   # MLE on the original sample, ~0.00925212
B <- 2000 
n <- 12
theta.b <- numeric(B)
for (b in 1:B) {
  i <- sample(1:n, size = n, replace = TRUE)
  hours <- aircondit$hours[i]
  theta.b[b] <- length(hours)/sum(hours)
}
bias <- mean(theta.b - theta.hat)
bias
## [1] 0.001407882