Purpose:
I just want to recall the reasons behind variance reduction techniques. Monte Carlo integration typically has an error variance of the form \(\frac {\sigma^2} {n}\). We get a better answer by sampling with a larger value of n, but the computing time grows with n. Sometimes we can find a way to reduce \(\sigma\) instead. To do this, we construct a new Monte Carlo problem with the same answer as our original one but with a lower \(\sigma\). Methods that do this are known as variance reduction techniques.
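Concretely, the crude Monte Carlo estimator of \(\mu = E[c(x)]\) based on n independent draws is
\[\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} c(x_i), \qquad \operatorname{Var}(\hat{\mu}_n) = \frac{\sigma^2}{n}, \qquad \sigma^2 = \operatorname{Var}\bigl(c(x)\bigr),\]
so the root-mean-square error shrinks only like \(n^{-1/2}\): halving the error requires four times as many samples, which is why reducing \(\sigma\) itself is attractive.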
In this problem, you will implement and investigate a series of variance reduction procedures for Monte Carlo methods by estimating the expected value of a cost function c(x) which depends on a D-dimensional random variable x.
The cost function you will use is:
\[ c(x)=\frac{1}{(2\pi)^{D/2}}\, e^{-\frac{1}{2} x^T x} \]
where
\[x_i \sim U(-5,5) \text{ for } i=1,\dots,D\]
That is, x is a D-dimensional random variable (i.e., a D×1 column vector) and each component of x is uniformly distributed between -5 and 5. Note that c(x) conveniently takes the form of a standard multivariate normal density, with no correlation between the components.
Goal: estimate E[c(x)] - the expected value of c(x) - using Monte Carlo methods and see how it compares to the real value, which you are able to find by hand.
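For D = 1 the analytic value can be computed directly, since c(x) is the standard normal density and the U(-5, 5) density equals 1/10 on the interval:
\[E[c(x)] = \int_{-5}^{5} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,\frac{1}{10}\,dx = \frac{\Phi(5)-\Phi(-5)}{10} \approx 0.1,\]
which is the reference line used in the D = 1 plots below.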
For sample sizes of 1000 to 10000 (in increments of 1000), obtain 100 estimates for E[c(x)] when D = 1, using crude Monte Carlo sampling. Calculate the average value of the 100 estimates as well as their standard deviation, and plot them. In your plot, include a line showing the analytical value for E[c(x)].
First, define the cost function as an R function:
cost_func <- function(x, D=1)
{
  # unnormalized Gaussian kernel exp(-0.5 * x'x)
  kernel <- exp(-0.5 * t(x) %*% x)
  # multiply by the multivariate normal constant 1 / (2*pi)^(D/2)
  cost_func_results <- (1 / ((2 * pi)^(D/2))) * kernel
  return (cost_func_results)
}
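As a quick sanity check, evaluating the function at the origin should return the maximum of the normal density, \((2\pi)^{-D/2}\):
cost_func(0)             # 1/sqrt(2*pi), about 0.3989, for D = 1
cost_func(c(0, 0), D=2)  # 1/(2*pi), about 0.1592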
Next we will loop over sample sizes N from 1000 to 20000 (in increments of 1000). For each N we draw N samples, evaluate the cost function on each, and record the mean, standard deviation, and coefficient of variation of those evaluations in a data frame. First we will record the analytic values for D = 1, 2, 4, 6, and 8.
library(knitr)
## Warning: package 'knitr' was built under R version 3.3.1
######
# ref: http://www.integral-calculator.com/
AnalyticValue_D1 = 0.099999994
AnalyticValue_D2 = 0.03989420516865691
AnalyticValue_D4 = 0.006349359953313988
AnalyticValue_D6 = 0.0010105320220309647
AnalyticValue_D8 = 0.0001600831166402579
CMC <- function(n, d = 1)
{
  # crude Monte Carlo: draw n points uniformly on (-5, 5)^d and evaluate the cost at each
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- runif(d, -5, 5)
    t[i] <- cost_func(x)
  }
  return (t)
}
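Since for D = 1 the cost function is just the standard normal density, a vectorized one-liner gives the same crude estimate and is a useful cross-check (this shortcut assumes d = 1 only):
mean(dnorm(runif(10000, -5, 5)))  # crude MC estimate of E[c(x)] for D = 1, close to 0.1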
#########
MC_loop <- function(d, fun)
{
  # run the sampler `fun` for N = 1000, 2000, ..., 20000 and summarize its output
  CMC.result <- data.frame(mean=c(), stdev=c(), n=c())
  for(n in seq(1000, 20000, by=1000))
  {
    res <- fun(n=n, d=d)
    CMC.result <- rbind(CMC.result, data.frame(mean=mean(res), stdev=sd(res), N=n))
  }
  #CMC.result$EcActual <- (1/10)^d
  CMC.result$CV <- CMC.result$stdev / CMC.result$mean  # coefficient of variation
  return (CMC.result)
}
CMC.D1 <- MC_loop(d=1, fun=CMC)
Let's show the CMC and visualize the results for D=1
#####
kable(CMC.D1, caption = 'Monte Carlo Estimation D= 1')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0981998 | 0.1361343 | 1000 | 1.386300 |
| 0.1059722 | 0.1391388 | 2000 | 1.312975 |
| 0.1024486 | 0.1354279 | 3000 | 1.321911 |
| 0.0997539 | 0.1346572 | 4000 | 1.349894 |
| 0.1000528 | 0.1348032 | 5000 | 1.347321 |
| 0.0991720 | 0.1345330 | 6000 | 1.356562 |
| 0.0991800 | 0.1341254 | 7000 | 1.352343 |
| 0.0986359 | 0.1336084 | 8000 | 1.354561 |
| 0.0998855 | 0.1344078 | 9000 | 1.345619 |
| 0.0970901 | 0.1347123 | 10000 | 1.387498 |
| 0.0993453 | 0.1340295 | 11000 | 1.349127 |
| 0.0996995 | 0.1351484 | 12000 | 1.355558 |
| 0.1005037 | 0.1357187 | 13000 | 1.350384 |
| 0.0994147 | 0.1345373 | 14000 | 1.353294 |
| 0.0976972 | 0.1332197 | 15000 | 1.363597 |
| 0.0993604 | 0.1342501 | 16000 | 1.351142 |
| 0.0998262 | 0.1350334 | 17000 | 1.352685 |
| 0.1006510 | 0.1357473 | 18000 | 1.348693 |
| 0.1004996 | 0.1353388 | 19000 | 1.346660 |
| 0.1007590 | 0.1350156 | 20000 | 1.339986 |
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
D1 <- ggplot(CMC.D1) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 1")
D1
Let's show the CMC and visualize the results for D=2
CMC.D2 <- MC_loop(d=2, fun=CMC)
kable(CMC.D2, caption = 'Monte Carlo Estimation D= 2')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0254051 | 0.0663428 | 1000 | 2.611401 |
| 0.0238710 | 0.0640611 | 2000 | 2.683631 |
| 0.0256491 | 0.0666433 | 3000 | 2.598267 |
| 0.0266108 | 0.0692082 | 4000 | 2.600755 |
| 0.0243074 | 0.0652812 | 5000 | 2.685648 |
| 0.0246965 | 0.0666069 | 6000 | 2.697016 |
| 0.0244762 | 0.0656951 | 7000 | 2.684035 |
| 0.0259938 | 0.0676566 | 8000 | 2.602797 |
| 0.0250898 | 0.0666876 | 9000 | 2.657958 |
| 0.0247470 | 0.0656335 | 10000 | 2.652180 |
| 0.0239399 | 0.0643529 | 11000 | 2.688099 |
| 0.0255236 | 0.0671404 | 12000 | 2.630526 |
| 0.0258621 | 0.0674592 | 13000 | 2.608420 |
| 0.0244511 | 0.0654314 | 14000 | 2.676004 |
| 0.0251191 | 0.0661890 | 15000 | 2.635004 |
| 0.0259076 | 0.0676854 | 16000 | 2.612568 |
| 0.0257010 | 0.0672579 | 17000 | 2.616935 |
| 0.0247881 | 0.0655405 | 18000 | 2.644033 |
| 0.0245196 | 0.0654146 | 19000 | 2.667848 |
| 0.0258598 | 0.0674840 | 20000 | 2.609612 |
library(ggplot2)
D2 <- ggplot(CMC.D2) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 2")
D2
Let's show the CMC and visualize the results for D=4
CMC.D4 <- MC_loop(d=4, fun=CMC)
kable(CMC.D4, caption = 'Monte Carlo Estimation D= 4')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0026106 | 0.0183441 | 1000 | 7.026809 |
| 0.0014728 | 0.0108799 | 2000 | 7.387433 |
| 0.0013510 | 0.0094267 | 3000 | 6.977682 |
| 0.0019340 | 0.0158778 | 4000 | 8.209950 |
| 0.0018646 | 0.0142016 | 5000 | 7.616306 |
| 0.0015641 | 0.0114340 | 6000 | 7.310103 |
| 0.0015704 | 0.0118809 | 7000 | 7.565402 |
| 0.0016110 | 0.0123754 | 8000 | 7.681689 |
| 0.0014636 | 0.0114820 | 9000 | 7.845081 |
| 0.0016180 | 0.0129828 | 10000 | 8.024164 |
| 0.0017087 | 0.0137720 | 11000 | 8.059947 |
| 0.0015932 | 0.0129686 | 12000 | 8.139858 |
| 0.0018852 | 0.0148507 | 13000 | 7.877676 |
| 0.0016631 | 0.0128511 | 14000 | 7.727118 |
| 0.0015141 | 0.0121979 | 15000 | 8.056110 |
| 0.0015022 | 0.0116179 | 16000 | 7.734011 |
| 0.0017511 | 0.0132175 | 17000 | 7.548127 |
| 0.0015433 | 0.0117087 | 18000 | 7.586978 |
| 0.0016732 | 0.0124422 | 19000 | 7.435980 |
| 0.0015759 | 0.0124120 | 20000 | 7.876149 |
library(ggplot2)
D4 <- ggplot(CMC.D4) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 4")
D4
CMC.D6 <- MC_loop(d=6, fun=CMC)
kable(CMC.D6, caption = 'Monte Carlo Estimation D= 6')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0000835 | 0.0014630 | 1000 | 17.53040 |
| 0.0000821 | 0.0013203 | 2000 | 16.08263 |
| 0.0000868 | 0.0016866 | 3000 | 19.44090 |
| 0.0001078 | 0.0018750 | 4000 | 17.40077 |
| 0.0000922 | 0.0016049 | 5000 | 17.40323 |
| 0.0001011 | 0.0025173 | 6000 | 24.90937 |
| 0.0000564 | 0.0008839 | 7000 | 15.66245 |
| 0.0000932 | 0.0020616 | 8000 | 22.12910 |
| 0.0001159 | 0.0026584 | 9000 | 22.94546 |
| 0.0001003 | 0.0021627 | 10000 | 21.56102 |
| 0.0000978 | 0.0020746 | 11000 | 21.21784 |
| 0.0000819 | 0.0016432 | 12000 | 20.07264 |
| 0.0000875 | 0.0018530 | 13000 | 21.18115 |
| 0.0001373 | 0.0030473 | 14000 | 22.20084 |
| 0.0001270 | 0.0023447 | 15000 | 18.46454 |
| 0.0000958 | 0.0020694 | 16000 | 21.61016 |
| 0.0000865 | 0.0016872 | 17000 | 19.50311 |
| 0.0000625 | 0.0010298 | 18000 | 16.48372 |
| 0.0001457 | 0.0031924 | 19000 | 21.91200 |
| 0.0001085 | 0.0023251 | 20000 | 21.42078 |
library(ggplot2)
D6 <- ggplot(CMC.D6) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D6), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Monte Carlo Estimation D= 6")
D6
As we can observe, as we increase the value of D our estimate gets further from the actual (analytic) value.
We now investigate the variance reduction properties of quasi-random numbers using Sobol numbers. To see how these Sobol numbers differ from ordinary pseudo-random numbers, generate and plot 100 pairs of both. Comment on the differences.
First let's use runif to generate n = 100 pairs of uniform random numbers and plot them:
n <- 100
uniFrom <- as.data.frame(matrix(runif(n * 2), ncol=2))
colnames(uniFrom) <- c("x", "y")
unif <- ggplot(data=as.data.frame(uniFrom)) +
geom_point(aes(x=x, y=y)) +
labs(title="Uniform for n=100 x 2")
unif
Next we will use sobol to generate n = 100 pairs of quasi-random numbers and plot them.
library(randtoolbox)
## Warning: package 'randtoolbox' was built under R version 3.3.1
## Loading required package: rngWELL
## Warning: package 'rngWELL' was built under R version 3.3.1
## This is randtoolbox. For overview, type 'help("randtoolbox")'.
sobol_Rnum <- as.data.frame(sobol(n, d=2))
colnames(sobol_Rnum) <- c("x", "y")
sobol_p2 <- ggplot(data=as.data.frame(sobol_Rnum)) +
geom_point(aes(x=x, y=y)) +
labs(title="Sobol Quasi-Random Numbers n=100 x 2 ")
sobol_p2
The Sobol points cover the unit square much more evenly (low discrepancy), while the pseudo-random pairs show visible clumping and empty regions; this more even coverage is what drives the variance reduction below.
# Sobol Monte Carlo function
sobolMC <- function(n, d = 1, init=TRUE)
{
  t <- rep(NA, n)
  for(i in 1:n)
  {
    # next point of the d-dimensional Sobol sequence, rescaled from (0, 1) to (-5, 5)
    x <- as.vector(sobol(n=1, d=d, init=init))
    x <- -5 + (x * 10)
    t[i] <- cost_func(x)
    init <- FALSE # turn off init; we don't want to re-initialize the sequence on every call
  }
  return (t)
}
Let's show the Sobol Monte Carlo and visualize the results for D=1
sobolMc.D1 <- MC_loop(d=1, fun=sobolMC)
kable(sobolMc.D1, caption = 'Sobol Monte Carlo Estimation D= 1')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.1000078 | 0.1350043 | 1000 | 1.349938 |
| 0.1000000 | 0.1349762 | 2000 | 1.349762 |
| 0.1000002 | 0.1349645 | 3000 | 1.349642 |
| 0.0999999 | 0.1349594 | 4000 | 1.349595 |
| 0.1000007 | 0.1349560 | 5000 | 1.349550 |
| 0.1000000 | 0.1349538 | 6000 | 1.349538 |
| 0.1000003 | 0.1349523 | 7000 | 1.349519 |
| 0.0999999 | 0.1349510 | 8000 | 1.349511 |
| 0.1000014 | 0.1349487 | 9000 | 1.349468 |
| 0.1000000 | 0.1349493 | 10000 | 1.349493 |
| 0.1000000 | 0.1349484 | 11000 | 1.349484 |
| 0.0999999 | 0.1349482 | 12000 | 1.349482 |
| 0.1000004 | 0.1349475 | 13000 | 1.349470 |
| 0.0999999 | 0.1349474 | 14000 | 1.349474 |
| 0.1000001 | 0.1349471 | 15000 | 1.349470 |
| 0.0999999 | 0.1349468 | 16000 | 1.349468 |
| 0.1000005 | 0.1349460 | 17000 | 1.349453 |
| 0.1000000 | 0.1349463 | 18000 | 1.349463 |
| 0.1000000 | 0.1349460 | 19000 | 1.349460 |
| 0.0999999 | 0.1349459 | 20000 | 1.349460 |
library(ggplot2)
DS1 <- ggplot(sobolMc.D1) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 1")
DS1
Let's show the Sobol Monte Carlo and visualize the results for D=2
sobolMc.D2 <- MC_loop(d=2, fun=sobolMC)
kable(sobolMc.D2, caption = 'Sobol Monte Carlo Estimation D= 2')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0252972 | 0.0670778 | 1000 | 2.651591 |
| 0.0248737 | 0.0658396 | 2000 | 2.646959 |
| 0.0249524 | 0.0658683 | 3000 | 2.639755 |
| 0.0250161 | 0.0659642 | 4000 | 2.636868 |
| 0.0251183 | 0.0662144 | 5000 | 2.636098 |
| 0.0251252 | 0.0662073 | 6000 | 2.635097 |
| 0.0250746 | 0.0661084 | 7000 | 2.636471 |
| 0.0250514 | 0.0660554 | 8000 | 2.636799 |
| 0.0250939 | 0.0663065 | 9000 | 2.642330 |
| 0.0251324 | 0.0663249 | 10000 | 2.639022 |
| 0.0250553 | 0.0661434 | 11000 | 2.639900 |
| 0.0251099 | 0.0663046 | 12000 | 2.640572 |
| 0.0250408 | 0.0659899 | 13000 | 2.635296 |
| 0.0250999 | 0.0661518 | 14000 | 2.635543 |
| 0.0250528 | 0.0660462 | 15000 | 2.636278 |
| 0.0250648 | 0.0660833 | 16000 | 2.636496 |
| 0.0250663 | 0.0661090 | 17000 | 2.637361 |
| 0.0250603 | 0.0661020 | 18000 | 2.637718 |
| 0.0250577 | 0.0661014 | 19000 | 2.637966 |
| 0.0250991 | 0.0662334 | 20000 | 2.638879 |
library(ggplot2)
DS2 <- ggplot(sobolMc.D2) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 2")
DS2
Let's show the Sobol Monte Carlo and visualize the results for D=4
sobolMc.D4 <- MC_loop(d=4, fun=sobolMC)
kable(sobolMc.D4, caption = 'Sobol Monte Carlo Estimation D= 4')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0020512 | 0.0185322 | 1000 | 9.034808 |
| 0.0019187 | 0.0167684 | 2000 | 8.739384 |
| 0.0018435 | 0.0157903 | 3000 | 8.565195 |
| 0.0019194 | 0.0165702 | 4000 | 8.633011 |
| 0.0017874 | 0.0152083 | 5000 | 8.508425 |
| 0.0016878 | 0.0141846 | 6000 | 8.404088 |
| 0.0016170 | 0.0133921 | 7000 | 8.282330 |
| 0.0015693 | 0.0127941 | 8000 | 8.152662 |
| 0.0015889 | 0.0129440 | 9000 | 8.146647 |
| 0.0016370 | 0.0133957 | 10000 | 8.183101 |
| 0.0016648 | 0.0137198 | 11000 | 8.240937 |
| 0.0016845 | 0.0139399 | 12000 | 8.275303 |
| 0.0016590 | 0.0136365 | 13000 | 8.219491 |
| 0.0016375 | 0.0133230 | 14000 | 8.135959 |
| 0.0016169 | 0.0130736 | 15000 | 8.085829 |
| 0.0015945 | 0.0127932 | 16000 | 8.023545 |
| 0.0015681 | 0.0125222 | 17000 | 7.985796 |
| 0.0015569 | 0.0123451 | 18000 | 7.929454 |
| 0.0015347 | 0.0121085 | 19000 | 7.889822 |
| 0.0015302 | 0.0119699 | 20000 | 7.822379 |
library(ggplot2)
DS4 <- ggplot(sobolMc.D4) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Sobol Monte Carlo Estimation D= 4")
DS4
Again, as we increase the value of D, our estimate gets further from the actual (analytic) value.
A simple method that is sometimes used for reducing the variance of the estimate is to induce negative correlation between random draws. When the random variable being generated is uniform, or is a function of a uniform random variable, one can induce negative correlation with the following simple procedure: if a draw is generated from u ~ U(0,1), then 1 - u is also U(0,1) and is perfectly negatively correlated with u, so each estimate is formed by averaging the cost function evaluated at the two paired draws.
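Averaging a pair of identically distributed, negatively correlated evaluations \(Y_1, Y_2\) gives
\[\operatorname{Var}\left(\frac{Y_1 + Y_2}{2}\right) = \frac{\operatorname{Var}(Y_1)}{2}\,(1 + \rho),\]
so any negative correlation \(\rho < 0\) pushes the variance below that of two independent draws.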
First let's create our antithetic Monte Carlo function
##############################
antithetic_MC <- function(n, d = 1)
{
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- runif(d, -5, 5)
    x2 <- 1 - x  # paired draw, applying the unit-interval rule u -> 1 - u directly to x
    t[i] <- (cost_func(x) + cost_func(x2)) / 2  # average the antithetic pair
  }
  return (t)
}
###############################
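For reference, here is a minimal sketch of the textbook antithetic pairing for a U(a, b) draw, where the partner of x is a + b - x (on (-5, 5) this is simply -x, which keeps the pair inside the support). The function name is hypothetical and this is only a reference sketch, not the implementation used for the tables below:
antithetic_MC_ref <- function(n, d = 1)
{
  t <- rep(NA, n)
  for(i in 1:n)
  {
    x <- runif(d, -5, 5)
    x2 <- -x  # antithetic partner: (-5) + 5 - x
    t[i] <- (cost_func(x) + cost_func(x2)) / 2
  }
  return (t)
}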
Let's show the antithetic Monte Carlo and visualize the results for D=1
antithetic_MC.D1 <- MC_loop(d=1, fun=antithetic_MC)
kable(antithetic_MC.D1, caption = 'Antithetic Monte Carlo Estimation D= 1')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0946337 | 0.1213087 | 1000 | 1.281877 |
| 0.0965587 | 0.1200370 | 2000 | 1.243150 |
| 0.0997019 | 0.1232915 | 3000 | 1.236602 |
| 0.0992320 | 0.1226721 | 4000 | 1.236216 |
| 0.0972665 | 0.1217437 | 5000 | 1.251651 |
| 0.0990116 | 0.1219937 | 6000 | 1.232115 |
| 0.0993223 | 0.1219714 | 7000 | 1.228037 |
| 0.1006994 | 0.1245233 | 8000 | 1.236584 |
| 0.1003271 | 0.1225045 | 9000 | 1.221051 |
| 0.0994185 | 0.1228737 | 10000 | 1.235924 |
| 0.1003738 | 0.1227318 | 11000 | 1.222747 |
| 0.0993619 | 0.1217504 | 12000 | 1.225322 |
| 0.0988115 | 0.1220463 | 13000 | 1.235142 |
| 0.1000180 | 0.1229179 | 14000 | 1.228958 |
| 0.0995190 | 0.1228303 | 15000 | 1.234240 |
| 0.0989344 | 0.1220141 | 16000 | 1.233283 |
| 0.1008799 | 0.1226262 | 17000 | 1.215567 |
| 0.0991722 | 0.1224676 | 18000 | 1.234898 |
| 0.0994139 | 0.1227754 | 19000 | 1.234992 |
| 0.0991846 | 0.1228981 | 20000 | 1.239085 |
library(ggplot2)
AMC1 <- ggplot(antithetic_MC.D1) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D= 1")
AMC1
Let's show the antithetic Monte Carlo and visualize the results for D=2
antithetic_MC.D2 <- MC_loop(d=2, fun=antithetic_MC)
kable(antithetic_MC.D2, caption = 'Antithetic Monte Carlo Estimation D= 2')
library(ggplot2)
AMC2 <- ggplot(antithetic_MC.D2) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D= 2")
AMC2
Let's show the antithetic Monte Carlo and visualize the results for D=4
antithetic_MC.D4 <- MC_loop(d=4, fun=antithetic_MC)
kable(antithetic_MC.D4, caption = 'Antithetic Monte Carlo Estimation D = 4')
library(ggplot2)
AMC4 <- ggplot(antithetic_MC.D4) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Antithetic Monte Carlo Estimation D = 4")
AMC4
Again, as we increase the value of D, our estimate gets further from the actual (analytic) value.
Latin Hypercube Sampling is a variance reduction procedure that is based on the idea of stratified sampling. In this approach, you divide the interval into K equally probable subintervals, and generate a uniform sample from within each subinterval.
latinHypercube_MC <- function(n, d = 1, K=10)
{
  min <- -5
  max <- 5
  size <- (max - min) / K  # width of each stratum
  samples <- n / K         # each stratified estimate uses K draws
  Y <- rep(NA, samples)
  for(i in 1:samples)
  {
    # draw one point from each of the K strata and average the cost over them
    vec <- rep(NA, K)
    for(k in 1:K)
    {
      unif <- runif(d,
                    min + ((k - 1) * size),
                    min + (k * size))
      vec[k] <- cost_func(unif)
    }
    Y[i] <- mean(vec)
  }
  return (Y)
}
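As a side note, for d = 1 the K stratified draws can also be generated in a single vectorized call, since runif recycles its min and max arguments (a sketch assuming K = 10 equal-width, and for the uniform equally probable, strata on (-5, 5)):
K <- 10
edges <- seq(-5, 5, length.out = K + 1)
strat <- runif(K, min = edges[-(K + 1)], max = edges[-1])  # one uniform draw per stratum
mean(dnorm(strat))  # one stratified estimate of E[c(x)] for D = 1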
Let's show the Latin Hypercube Sampling and visualize the results for D=1
latinHypercubeMc.D1 <- MC_loop(d=1, fun=latinHypercube_MC)
kable(latinHypercubeMc.D1,caption = ' Latin Hypercube Sampling Estimation D = 1' )
| mean | stdev | N | CV |
|---|---|---|---|
| 0.1011382 | 0.0113785 | 1000 | 0.1125045 |
| 0.1012569 | 0.0107048 | 2000 | 0.1057191 |
| 0.0995613 | 0.0106339 | 3000 | 0.1068072 |
| 0.1001634 | 0.0103042 | 4000 | 0.1028741 |
| 0.1002135 | 0.0101784 | 5000 | 0.1015672 |
| 0.0992301 | 0.0102730 | 6000 | 0.1035272 |
| 0.0999507 | 0.0099905 | 7000 | 0.0999546 |
| 0.1000021 | 0.0101731 | 8000 | 0.1017285 |
| 0.0994530 | 0.0104530 | 9000 | 0.1051048 |
| 0.0997668 | 0.0104803 | 10000 | 0.1050478 |
| 0.1002104 | 0.0104709 | 11000 | 0.1044891 |
| 0.1002008 | 0.0105730 | 12000 | 0.1055186 |
| 0.1000000 | 0.0103324 | 13000 | 0.1033240 |
| 0.0997822 | 0.0107929 | 14000 | 0.1081647 |
| 0.0998053 | 0.0104872 | 15000 | 0.1050769 |
| 0.0999606 | 0.0105485 | 16000 | 0.1055267 |
| 0.1003504 | 0.0108123 | 17000 | 0.1077458 |
| 0.1002694 | 0.0106498 | 18000 | 0.1062119 |
| 0.0999437 | 0.0108851 | 19000 | 0.1089127 |
| 0.1001995 | 0.0103313 | 20000 | 0.1031070 |
library(ggplot2)
LHMC1 <- ggplot(latinHypercubeMc.D1) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 1")
LHMC1
Let's show the Latin Hypercube Sampling and visualize the results for D=2
latinHypercubeMc.D2 <- MC_loop(d=2, fun=latinHypercube_MC)
kable(latinHypercubeMc.D2,caption = ' Latin Hypercube Sampling Estimation D = 2')
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0663606 | 0.0086655 | 1000 | 0.1305821 |
| 0.0682323 | 0.0093198 | 2000 | 0.1365892 |
| 0.0680283 | 0.0090032 | 3000 | 0.1323444 |
| 0.0684487 | 0.0096270 | 4000 | 0.1406460 |
| 0.0678750 | 0.0090206 | 5000 | 0.1328999 |
| 0.0679240 | 0.0093050 | 6000 | 0.1369914 |
| 0.0681422 | 0.0091867 | 7000 | 0.1348164 |
| 0.0676344 | 0.0091002 | 8000 | 0.1345498 |
| 0.0684262 | 0.0089636 | 9000 | 0.1309969 |
| 0.0680151 | 0.0089289 | 10000 | 0.1312775 |
| 0.0680866 | 0.0093895 | 11000 | 0.1379056 |
| 0.0680275 | 0.0093537 | 12000 | 0.1374985 |
| 0.0681882 | 0.0093006 | 13000 | 0.1363954 |
| 0.0678959 | 0.0092469 | 14000 | 0.1361926 |
| 0.0676701 | 0.0090321 | 15000 | 0.1334729 |
| 0.0680897 | 0.0091529 | 16000 | 0.1344241 |
| 0.0677603 | 0.0092902 | 17000 | 0.1371042 |
| 0.0681556 | 0.0093221 | 18000 | 0.1367764 |
| 0.0681890 | 0.0092636 | 19000 | 0.1358515 |
| 0.0676736 | 0.0094901 | 20000 | 0.1402330 |
library(ggplot2)
LHMC2 <- ggplot(latinHypercubeMc.D2) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 2")
LHMC2
Let's show the Latin Hypercube Sampling and visualize the results for D=4
latinHypercubeMc.D4 <- MC_loop(d=4, fun=latinHypercube_MC)
kable(latinHypercubeMc.D4,caption = ' Latin Hypercube Sampling Estimation D = 4' )
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0428856 | 0.0083432 | 1000 | 0.1945454 |
| 0.0438387 | 0.0081845 | 2000 | 0.1866956 |
| 0.0436227 | 0.0085193 | 3000 | 0.1952946 |
| 0.0430651 | 0.0084867 | 4000 | 0.1970675 |
| 0.0436693 | 0.0087182 | 5000 | 0.1996412 |
| 0.0442685 | 0.0087111 | 6000 | 0.1967796 |
| 0.0433512 | 0.0088641 | 7000 | 0.2044717 |
| 0.0442068 | 0.0088902 | 8000 | 0.2011053 |
| 0.0440514 | 0.0090101 | 9000 | 0.2045351 |
| 0.0441966 | 0.0088146 | 10000 | 0.1994412 |
| 0.0436382 | 0.0085834 | 11000 | 0.1966942 |
| 0.0437254 | 0.0087255 | 12000 | 0.1995517 |
| 0.0438595 | 0.0087161 | 13000 | 0.1987280 |
| 0.0439454 | 0.0090017 | 14000 | 0.2048393 |
| 0.0436612 | 0.0084336 | 15000 | 0.1931607 |
| 0.0437465 | 0.0086988 | 16000 | 0.1988446 |
| 0.0438383 | 0.0085813 | 17000 | 0.1957485 |
| 0.0438346 | 0.0086257 | 18000 | 0.1967789 |
| 0.0437257 | 0.0085573 | 19000 | 0.1957031 |
| 0.0441372 | 0.0087772 | 20000 | 0.1988615 |
library(ggplot2)
LHMC4 <- ggplot(latinHypercubeMc.D4) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Latin Hypercube Sampling Estimation D = 4")
LHMC4
Recall: the goal of importance sampling is to estimate \(E_p[c(x)]\) by rewriting the expectation under a second density q(x):
\[E_p[c(x)] = \int c(x)\,p(x)\,dx = \int c(x)\,p(x)\,\frac{q(x)}{q(x)}\,dx = \int c(x)\,w(x)\,q(x)\,dx, \qquad w(x)=\frac{p(x)}{q(x)}\]
The variance-minimizing (optimal) importance density is proportional to \(c(x)p(x)\):
\[q(x)=\frac{c(x)\,p(x)}{E_p[c(x)]}\]
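To make this concrete for D = 1, here is a minimal standalone sketch. The assumptions are mine: p is the U(-5, 5) density and q is a standard normal truncated to (-5, 5), which is proportional to c(x)p(x); because this q is essentially the optimal density, the weighted evaluations are constant and the estimator has (near-)zero variance:
set.seed(1)
n <- 10000
# draws from N(0,1) truncated to (-5, 5), via the inverse CDF
x <- qnorm(runif(n, pnorm(-5), pnorm(5)))
P <- pnorm(5) - pnorm(-5)              # normalizing constant of the truncated normal
w <- dunif(x, -5, 5) / (dnorm(x) / P)  # w(x) = p(x) / q(x)
mean(dnorm(x) * w)                     # every term equals 0.1 * P, so the estimate is ~0.1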
Next, we will create our functions q(x), p(x), and w(x):
#################################################
qx <- function(x, d)
{
  # importance density q(x) = c(x) * p(x) / E_p[c(x)]
  cx <- cost_func(x)
  Ecx <- (1/10)^d
  res <- (cx %*% t(px(x))) / Ecx
  return(res)
}
px <- function(x)
{
  # nominal density p(x) (dunif with its default unit-interval support)
  return (dunif(x))
}
wx <- function(x,d)
{
  # importance weight w(x) = p(x) / q(x), evaluated on the first component
  w <- px(x[1]) / qx(x[1], d)
  return(w)
}
############
# Helper function: acceptance-rejection sampling from the target density `fun`,
# using U(-5, 5) as the proposal g(x) and envelope constant M
helpeMethod <- function(fun)
{
  min <- -5
  max <- 5
  M <- 2
  accepted <- FALSE
  while(!accepted)
  {
    # sample a candidate r from the proposal g(x) and u from U(0, 1)
    r <- runif(1, min, max)
    u <- runif(1, 0, 1)
    gx <- dunif(r, min, max)
    # accept r when u < f(r) / (M * g(r))
    if(u < fun(r, 1) / (M * gx))
    {
      accepted = TRUE
    }
  }
  return(r)
}
importance_MC <- function(n, d = 1)
{
  theta.hat <- rep(NA, n)
  for(i in 1:n)
  {
    # draw each component of x from the importance density q(x) by acceptance-rejection
    x <- rep(NA, d)
    for(j in 1:d)
    {
      x[j] <- helpeMethod(qx)
    }
    # importance-sampling estimate: c(x) times the weight w(x)
    xw <- wx(x, d)
    theta.hat[i] <- cost_func(x) * xw
  }
  return (theta.hat)
}
Let's show the Importance Sampling and visualize the results for D=1
importance_MC.D1 <- MC_loop(d=1, fun=importance_MC)
kable(importance_MC.D1,caption = 'Importance Sampling Estimation D = 1' )
| mean | stdev | N | CV |
|---|---|---|---|
| 0.1 | 0 | 1000 | 0 |
| 0.1 | 0 | 2000 | 0 |
| 0.1 | 0 | 3000 | 0 |
| 0.1 | 0 | 4000 | 0 |
| 0.1 | 0 | 5000 | 0 |
| 0.1 | 0 | 6000 | 0 |
| 0.1 | 0 | 7000 | 0 |
| 0.1 | 0 | 8000 | 0 |
| 0.1 | 0 | 9000 | 0 |
| 0.1 | 0 | 10000 | 0 |
| 0.1 | 0 | 11000 | 0 |
| 0.1 | 0 | 12000 | 0 |
| 0.1 | 0 | 13000 | 0 |
| 0.1 | 0 | 14000 | 0 |
| 0.1 | 0 | 15000 | 0 |
| 0.1 | 0 | 16000 | 0 |
| 0.1 | 0 | 17000 | 0 |
| 0.1 | 0 | 18000 | 0 |
| 0.1 | 0 | 19000 | 0 |
| 0.1 | 0 | 20000 | 0 |
AnalyticValue_D1=1/10
library(ggplot2)
IMPD1 <- ggplot(importance_MC.D1) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 1")
IMPD1
Let's show the Importance Sampling and visualize the results for D=2
importance_MC.D2 <- MC_loop(d=2, fun=importance_MC)
kable(importance_MC.D2,caption = 'Importance Sampling Estimation D = 2' )
| mean | stdev | N | CV |
|---|---|---|---|
| 0.0085731 | 0.0012143 | 1000 | 0.1416426 |
| 0.0085592 | 0.0012215 | 2000 | 0.1427140 |
| 0.0085616 | 0.0012298 | 3000 | 0.1436388 |
| 0.0085488 | 0.0012173 | 4000 | 0.1423922 |
| 0.0085300 | 0.0012166 | 5000 | 0.1426225 |
| 0.0085795 | 0.0012070 | 6000 | 0.1406798 |
| 0.0085678 | 0.0012063 | 7000 | 0.1407903 |
| 0.0085535 | 0.0012036 | 8000 | 0.1407123 |
| 0.0085472 | 0.0012130 | 9000 | 0.1419129 |
| 0.0085516 | 0.0012178 | 10000 | 0.1424017 |
| 0.0085543 | 0.0012113 | 11000 | 0.1415963 |
| 0.0085606 | 0.0012117 | 12000 | 0.1415460 |
| 0.0085605 | 0.0012179 | 13000 | 0.1422706 |
| 0.0085579 | 0.0012129 | 14000 | 0.1417326 |
| 0.0085622 | 0.0012100 | 15000 | 0.1413210 |
| 0.0085481 | 0.0012150 | 16000 | 0.1421339 |
| 0.0085614 | 0.0012087 | 17000 | 0.1411845 |
| 0.0085495 | 0.0012176 | 18000 | 0.1424133 |
| 0.0085573 | 0.0012178 | 19000 | 0.1423064 |
| 0.0085716 | 0.0012099 | 20000 | 0.1411526 |
AnalyticValue_D2=0.03989420516865691
library(ggplot2)
IMPD2 <- ggplot(importance_MC.D2) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D2), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 2")
IMPD2
Let's show the Importance Sampling and visualize the results for D=4
importance_MC.D4 <- MC_loop(d=4, fun=importance_MC)
kable(importance_MC.D4,caption = 'Importance Sampling Estimation D = 4' )
| mean | stdev | N | CV |
|---|---|---|---|
| 6.23e-05 | 1.51e-05 | 1000 | 0.2427272 |
| 6.27e-05 | 1.54e-05 | 2000 | 0.2447866 |
| 6.31e-05 | 1.56e-05 | 3000 | 0.2479409 |
| 6.27e-05 | 1.55e-05 | 4000 | 0.2479586 |
| 6.25e-05 | 1.56e-05 | 5000 | 0.2494539 |
| 6.26e-05 | 1.55e-05 | 6000 | 0.2474235 |
| 6.27e-05 | 1.55e-05 | 7000 | 0.2474750 |
| 6.25e-05 | 1.57e-05 | 8000 | 0.2514339 |
| 6.27e-05 | 1.54e-05 | 9000 | 0.2457493 |
| 6.29e-05 | 1.57e-05 | 10000 | 0.2491272 |
| 6.27e-05 | 1.55e-05 | 11000 | 0.2476801 |
| 6.24e-05 | 1.54e-05 | 12000 | 0.2469352 |
| 6.24e-05 | 1.56e-05 | 13000 | 0.2500206 |
| 6.27e-05 | 1.55e-05 | 14000 | 0.2468886 |
| 6.29e-05 | 1.54e-05 | 15000 | 0.2443543 |
| 6.27e-05 | 1.56e-05 | 16000 | 0.2482902 |
| 6.28e-05 | 1.56e-05 | 17000 | 0.2480723 |
| 6.25e-05 | 1.56e-05 | 18000 | 0.2489033 |
| 6.24e-05 | 1.55e-05 | 19000 | 0.2482173 |
| 6.26e-05 | 1.56e-05 | 20000 | 0.2491884 |
AnalyticValue_D4=0.006349359953313988
library(ggplot2)
IMPD4 <- ggplot(importance_MC.D4) +
geom_point(aes(x=N, y=mean), colour="blue") +
geom_line(aes(x=N, y=mean), colour="blue") +
geom_point(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=stdev), colour="red") +
geom_line(aes(x=N, y=AnalyticValue_D4), colour="green") +
labs(x="N", y= " Mean & StdDev", title="Importance Sampling Estimation D = 4")
IMPD4
Summary: the visualization below compares the mean estimates from each method at D = 1, colored as follows: Crude Monte Carlo = blue, Sobol = black, Antithetic = yellow, Latin Hypercube = red, Importance = brown, Actual = green.
We can clearly see that the Sobol and Importance methods estimate the mean very well, while Crude Monte Carlo and Antithetic perform comparatively poorly.
CMC.D1 <- MC_loop(d=1, fun=CMC)
sobolMc.D1 <- MC_loop(d=1, fun=sobolMC)
antithetic_MC.D1 <- MC_loop(d=1, fun=antithetic_MC)
latinHypercubeMc.D1 <- MC_loop(d=1, fun=latinHypercube_MC)
importance_MC.D1 <- MC_loop(d=1, fun=importance_MC)
summary_res<- data.frame('N'= CMC.D1$N,
'CMC'= CMC.D1$mean,
'Sobol' = sobolMc.D1$mean,
'Antithetic' = antithetic_MC.D1$mean,
'LatinHypercube' = latinHypercubeMc.D1$mean,
'Importance' = importance_MC.D1$mean,
'Actual' = AnalyticValue_D1
)
library(ggplot2)
summary_res1 <- ggplot(summary_res) +
geom_line(aes(x=N, y=CMC), colour="blue") +
geom_point(aes(x=N, y=Sobol), colour="black") +
geom_line(aes(x=N, y=Antithetic), colour="yellow") +
geom_line(aes(x=N, y=LatinHypercube), colour="red") +
geom_point(aes(x=N, y=Importance), colour="brown") +
geom_line(aes(x=N, y=AnalyticValue_D1), colour="green") +
labs(x="N", y= " Mean", title="All Methods Estimations Summary at D = 1")
summary_res1
Plot the power curves for the t-test in Example 6.9 for sample sizes 10, 20, 30, 40, and 50, but omit the standard error bars. Plot the curves on the same graph, each in a different color or different line type, and include a legend. Comment on the relation between power and sample size.
################################
library(ggplot2)
N<- c(10, 20, 30, 40, 50)
m <- 1000
mu0 <- 500
sigma <- 100
mu <- c(seq(450, 650, 10)) #alternatives
M <- length(mu)
df <- data.frame(Sample_size=c(), mu=c(), power=c())
for(n in N)
{
for(i in 1:M)
{
mu1 <- mu[i]
pvalues <- replicate(m, expr={
# simulate under alternative mu
x <- rnorm(n, mean=mu1, sd=sigma)
ttest <- t.test(x, alternative="greater", mu=mu0)
ttest$p.value
})
df <- rbind(df, data.frame(Sample_size=n, mu=mu1, power=mean(pvalues <= 0.05)))
}
}
df$Sample_size <- as.factor(df$Sample_size)
p63 <- ggplot(df) +
geom_path(aes(x=mu, y=power, colour=Sample_size)) +
labs(title="Power Curves", x="mu alternatives", y='Powers' )
p63
From the power curves shown above, we observe that the empirical power is small when \(\theta\) is close to \(\theta_0 = 500\) and increases as \(\theta\) moves further away from \(\theta_0\), approaching 1 as \(\theta\to\infty\). For a fixed alternative, power also increases with sample size: the curves for larger n rise more steeply and reach high power at alternatives closer to \(\theta_0\).
Suppose that \(X_1,\dots,X_n\) are a random sample from a lognormal distribution with unknown parameters. Construct a 95% confidence interval for the parameter \(\mu\). Use a Monte Carlo method to obtain an empirical estimate of the confidence level.
Since \(Y_i = \log X_i \sim N(\mu, \sigma^2)\), we estimate \(\mu\) by \(\bar Y = \frac{Y_1 + \dots + Y_n}{n}\) and use the pivot
\[h(X_1,\dots,X_n, \mu) = \frac{\bar Y - \mu}{\frac{S}{\sqrt{n}}} \sim t_{n-1},\] where \(S\) is the sample standard deviation of the \(Y_i\). Therefore \[P\left( -t_{1-\frac \alpha 2,\,n-1} \le \frac{\bar Y - \mu}{\frac{S}{\sqrt{n}}} \le t_{1-\frac \alpha 2,\,n-1} \right) = 0.95,\] which gives the interval \(\bar Y \pm t_{1-\frac \alpha 2,\,n-1}\,\frac{S}{\sqrt{n}}\).
Now let's use simulated lognormal random variables to check the empirical coverage of the 95% confidence interval for \(\mu\).
n <- 2500
m <- 1000  # number of Monte Carlo replicates
mu <- 1
sigma <- 0.5
alpha <- 0.05
res <- replicate(m, expr={
x <- rlnorm(n, meanlog=mu, sdlog=sigma)
log.mean <- mean(log(x))
log.sd <- sd(log(x))
log.se <- log.sd / sqrt(n)
t.val <- qt(1 - alpha / 2, n - 1)
CI <- c(log.mean, log.mean - (t.val * log.se), log.mean + (t.val * log.se))
CI
})
theCI <- c(mean(res[2,]), mean(res[3,]))
theCI
## [1] 0.9800012 1.0192280
theCL <- mean(res[2,] < (0.999) & (1.001) < res[3,])  # proportion of intervals covering (0.999, 1.001) around the true mu = 1
theCL
## [1] 0.942
Compute the jackknife estimate of the bias and the standard error of the correlation statistic in Example 7.2.
Jackknife estimate of bias:
############################
library(bootstrap)
## Warning: package 'bootstrap' was built under R version 3.3.1
n <- nrow(law)
lsat <- law$LSAT
gpa <- law$GPA
theta.hat <- mean(lsat) / mean(gpa)
print (theta.hat)
## [1] 1.939681
################
theta.jack <- numeric(n)
for (i in 1:n)
theta.jack[i] <- mean(lsat[-i]) / mean(gpa[-i])
bias <- (n - 1) * (mean(theta.jack) - theta.hat)
print(bias) #jackknife estimate of bias
## [1] 0.0002506089
Jackknife estimate of standard error:
se <- sqrt((n-1) * mean((theta.jack - mean(theta.jack))^2))
print(se)
## [1] 0.02525517
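The same jackknife recipe applies directly to the correlation statistic itself; here is a minimal sketch (assuming the law data and n already loaded above, and using cor()):
r.hat <- cor(law$LSAT, law$GPA)
r.jack <- numeric(n)
for (i in 1:n)
  r.jack[i] <- cor(law$LSAT[-i], law$GPA[-i])
(n - 1) * (mean(r.jack) - r.hat)  # jackknife estimate of bias
sqrt((n - 1) * mean((r.jack - mean(r.jack))^2))  # jackknife estimate of standard error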
Refer to the air-conditioning data set aircondit provided in the boot package. The 12 observations are the times in hours between failures of air-conditioning equipment. [63, Example 1.1]: \[3,5,7,18,43,85,91,98,100,130,230,487\]
Assume that the times between failures follow an exponential model \(Exp(\lambda)\). Obtain the MLE of the hazard rate \(\lambda\) and use bootstrap to estimate the bias and standard error of the estimate.
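For the exponential model the log-likelihood of the observed times \(x_1,\dots,x_n\) is \(\ell(\lambda) = n\log\lambda - \lambda\sum_i x_i\), so the MLE of the hazard rate is
\[\hat\lambda = \frac{n}{\sum_i x_i} = \frac{12}{1297} \approx 0.00925.\]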
The bootstrap estimate for the standard error of the estimate is:
library(boot)
B <- 200
n <- 12
R <- numeric(B)
for (b in 1:B) {
i <- sample(1:n, size = n, replace = TRUE)
hours <- aircondit$hours[i]
R[b] <- length(hours)/sum(hours)
}
print(se.R <- sd(R))
## [1] 0.003830035
The bootstrap estimate for the bias of the estimate is:
theta.hat <- length(aircondit$hours) / sum(aircondit$hours)  # MLE of lambda: n / sum(x), about 0.00925212
B <- 2000
n <- 12
theta.b <- numeric(B)
for (b in 1:B) {
i <- sample(1:n, size = n, replace = TRUE)
hours <- aircondit$hours[i]
theta.b[b] <- length(hours)/sum(hours)
}
bias <- mean(theta.b - theta.hat)
bias
## [1] 0.001407882