Ronald WESONGA (Ph.D)
October 2022
Monte Carlo (MC) methods refer to a broad collection of tools for approximating quantities based on artificially generated random samples. They include Monte Carlo integration, the inversion technique for generating the required samples, and Markov chain Monte Carlo methods. In principle, the approximation can be made as good as required by taking the sample size sufficiently large. These methods are a very useful tool in Bayesian inference. As an example, consider Buffon's needle problem, where a needle of length 10 cm is dropped randomly onto a floor with parallel lines 10 cm apart. What is the probability p that the needle crosses a line? The exact value can be worked out analytically as \(p = \frac{2}{\pi} \approx 0.63662\).
If the analytical solution were unavailable, we could instead estimate p using Monte Carlo methods. The simplest way would be to toss the needle onto the floor 1,000 times. If the needle crosses a line 843 times (say), then the Monte Carlo estimate of p is simply 843/1,000 = 0.843. Alternatively, we could simulate each drop on a computer and each time determine whether the virtual needle has crossed a virtual line. Although faster and more accurate, this approach requires at least some mathematical work to identify exactly what the parameters of each drop are and which configurations of those parameters correspond to the needle crossing a line.
The method of Monte Carlo integration is popularly used for estimating means. Suppose \(\mu\) is the mean of some distribution defined by a density or mass function \(f(x)\) with cumulative distribution function \(F(x)\), but we are unable to calculate \(\mu\) exactly via the usual formula \[\mu=E[X]=\int xf(x)dx \quad \text{or} \quad \mu=\sum_x xf(x),\] depending on whether \(X\) is continuous or discrete. Suppose, further, that we are able to generate a random sample from the distribution in question, \(x_1, \ldots, x_J \sim iid ~ f(x)\). Then the sample mean \[\bar{x}=\frac{1}{J}\sum_{j=1}^J x_j\] is an unbiased MC estimate of \(\mu\), since \(E[\bar{x}]=\mu\). A \(1 - \alpha\) confidence interval (CI) for \(\mu\) is given by \[CI=\left(\bar{x}\pm z_{\alpha/2}\frac{s}{\sqrt{J}}\right),\] where \(s^2=\frac{1}{J-1}\sum_{j=1}^{J}(x_j-\bar{x})^2\).
# MC integration: estimate theta = integral of exp(-x) over (2,6)
m <- 10000
x <- runif(m,2,6)                      # sample uniformly on (2,6)
thetaESt <- mean(exp(-x))*4            # multiply by the interval length 6-2 = 4
thetaEXa <- exp(-2)-exp(-6)            # exact value for comparison
print(c(thetaESt, thetaEXa),digits=3) ## [1] 0.133 0.133
# MC estimate of the Gamma(3,2) mean, with a 95% CI (J = 100)
options(digits=4)
J = 100
set.seed(221)
xv=rgamma(J,3,2)
xbar=mean(xv)
s=sd(xv)
ci=xbar + c(-1,1)*qnorm(0.975)*s/sqrt(J)
c(xbar,s,s^2,ci,ci[2]-ci[1])## [1] 1.5170 0.8320 0.6921 1.3539 1.6800 0.3261
# Same estimate with J = 1000: the CI narrows roughly as 1/sqrt(J)
J = 1000
set.seed(231)
xv=rgamma(J,3,2)
xbar=mean(xv)
s=sd(xv)
ci=xbar + c(-1,1)*qnorm(0.975)*s/sqrt(J)
c(xbar,s,s^2,ci,ci[2]-ci[1]) ## [1] 1.5199 0.8722 0.7607 1.4658 1.5740 0.1081
Estimating the value of \(x\) such that \(F(x) = p\) gives the \(p\)-quantile \[q_p = F^{-1}(p),\] which can be estimated by the corresponding sample quantile. For example, the estimate of the median \(q_{1/2}\), in terms of the order statistics \(x_{(1)} \le \cdots \le x_{(J)}\), is \[\hat{q}_{1/2}=\begin{cases} x_{((J+1)/2)}, &\text{if } J \text{ is odd}; \\ \frac{1}{2}\left(x_{(J/2)}+x_{(J/2+1)}\right), &\text{if } J \text{ is even}. \end{cases}\]
The \((1-\alpha)\) CDR of \(x\), \((q_{\alpha/2}, q_{1-\alpha/2})\), is estimated by \((\hat{q}_{\alpha/2},\hat{q}_{1-\alpha/2})\).
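As a quick illustration, here is a minimal sketch of MC quantile and CDR estimation; the Gamma(3,2) distribution, seed, and sample size are illustrative choices, not part of the original notes.

# Sketch: MC estimates of quantiles and an 80% CDR for a Gamma(3,2) sample
set.seed(101)
J=1000
xv=rgamma(J,3,2)
qhat=quantile(xv,c(0.1,0.5,0.9))    # sample 10%, 50% (median), 90% quantiles
qtrue=qgamma(c(0.1,0.5,0.9),3,2)    # exact quantiles for comparison
rbind(MC=qhat,true=qtrue)           # (qhat[1], qhat[3]) estimates the 80% CDR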
Now consider some function of \(x\), say \(y = g(x)\), and suppose we wish to estimate \[\psi=E[y]= \int yf(y)dy = E[g(x)]=\int g(x) f(x)dx.\] We calculate \(y_j=g(x_j)\) for each \(j=1,\ldots,J\) to obtain a random sample \(y_1,\ldots,y_J \sim iid~f(y)\), and then apply the MC method as before. Hence \[\hat{\psi}= \bar{y}= \frac{1}{J} \sum_{j=1}^J y_j.\]
The density \(f(x)\) can be estimated by smoothing a probability histogram of \(x_1,\ldots,x_J\). Likewise, the density \(f(y)\) can be estimated by smoothing a probability histogram of \(y_1,\ldots,y_J\), which is especially useful for complicated functions of \(x\).
Suppose that \(x \sim G(3, 2)\). Use MC methods and a sample of size \(J = 1000\) to estimate the mean \(E[x]\) (with a \(95\%\) CI), the \(80\%\) CDR, and the density \(f(x)\), together with the same quantities for \(y = g(x) = \frac{x^2 e^{-x}}{1 + x + 1/x}\).
Present your results graphically, and wherever possible show the true values of the quantities being estimated. Then repeat everything but using a Monte Carlo sample size of \(J=10000\).
options(digits=4)
J = 1000
set.seed(221)
xv=rgamma(J,3,2)
xbar=mean(xv)
xci=xbar + c(-1,1)*qnorm(0.975)*sd(xv)/sqrt(J)
xcdr=quantile(xv,c(0.1,0.9)); xden=density(xv)
yv=xv^2 * exp(-xv) / ( 1 + xv + 1/xv )
ybar=mean(yv)
yci=ybar + c(-1,1)*qnorm(0.975)*sd(yv)/sqrt(J)
ycdr=quantile(yv,c(0.1,0.9))
yden=density(yv)
hist(xv,prob=T,breaks=seq(0,7,0.25),xlim=c(0,7),ylim=c(0,0.6),xlab="x", main="")
lines(xden,lty=2,lwd=2)
xvec=seq(0,10,0.01)
lines(xvec,dgamma(xvec,3,2),lty=1,lwd=2)
abline(v= c(xbar, xci, xcdr), lty=2, lwd=2)
abline(v=c(3/2,qgamma(c(0.1,0.9),3,2)), lty=1,lwd=2)
legend(4,0.6,c("MC estimates","True values"),lty=c(2,1),lwd=c(2,2))
hist(yv,prob=T,breaks=seq(0,0.2,0.005),xlim=c(0,0.2),ylim=c(0,30),xlab="y", main="")
lines(yden,lty=2,lwd=2)
abline(v= c(ybar, yci, ycdr), lty=2, lwd=2)
legend(0.12,30,c("MC estimates","True values"),lty=c(2,1),lwd=c(2,2))   # legend placed within the y-plot's axis limits
## Solution q2 b
options(digits=4)
J = 10000
set.seed(221)
xv=rgamma(J,3,2)
xbar=mean(xv)
xci=xbar + c(-1,1)*qnorm(0.975)*sd(xv)/sqrt(J)
xcdr=quantile(xv,c(0.1,0.9)); xden=density(xv)
yv=xv^2 * exp(-xv) / ( 1 + xv + 1/xv )
ybar=mean(yv)
yci=ybar + c(-1,1)*qnorm(0.975)*sd(yv)/sqrt(J)
ycdr=quantile(yv,c(0.1,0.9))
yden=density(yv)
hist(xv,prob=T,breaks=seq(0,9,0.25),xlim=c(0,7),ylim=c(0,0.6),xlab="x", main="")
lines(xden,lty=2,lwd=2)
xvec=seq(0,10,0.01)
lines(xvec,dgamma(xvec,3,2),lty=1,lwd=2)
abline(v= c(xbar, xci, xcdr), lty=2, lwd=2)
abline(v=c(3/2,qgamma(c(0.1,0.9),3,2)), lty=1,lwd=2)
legend(4,0.6,c("MC estimates","True values"),lty=c(2,1),lwd=c(2,2))
hist(yv,prob=T,breaks=seq(0,0.2,0.005),xlim=c(0,0.2),ylim=c(0,30),xlab="y", main="")
lines(yden,lty=2,lwd=2)
abline(v= c(ybar, yci, ycdr), lty=2, lwd=2)
legend(0.12,30,c("MC estimates","True values"),lty=c(2,1),lwd=c(2,2))   # legend placed within the y-plot's axis limits

To estimate \[\psi= E[g(x)]=\int g(x) f(x)dx,\] we sample from a proposal density \(h(x)\), which should be as similar to \(f(x)\) as possible. We then write \[\psi= E[g(x)]=\int \left( g(x) \frac{f(x)}{h(x)}\right)h(x)dx = \int w(x)h(x)dx, \quad \text{where } w(x)=\frac{g(x)f(x)}{h(x)}.\] Suppose now that \(f(x)\) is known only up to a multiplicative constant, that is, \(f(x) = \frac{k(x)}{c}\), where the kernel \(k(x)\) is known exactly but it is too difficult or impossible to evaluate the normalising constant \(c = \int k(x)dx\).
In that case we write \[\psi=\int g(x)f(x)dx = \frac{\int g(x)k(x)dx}{\int k(x)dx} = \frac{\int \frac{g(x)k(x)}{h(x)}h(x)dx}{\int \frac{k(x)}{h(x)}h(x)dx} = \frac{\int w(x)h(x)dx}{\int u(x) h(x)dx},\] where now \(w(x)=\frac{g(x)k(x)}{h(x)}\) and \(u(x)=\frac{k(x)}{h(x)}\). We then sample \(x_1,\ldots,x_J\sim iid~h(x)\) and apply MC estimation to the means of \(w(x)\) and \(u(x)\) to obtain \[\hat{\psi}=\frac{\bar{w}}{\bar{u}} = \frac{\frac{1}{J}\sum_j w_j}{\frac{1}{J}\sum_j u_j},\] where \(w_j = w(x_j)\) and \(u_j=u(x_j)\).
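For the first case above, where \(f(x)\) is fully known, a minimal sketch is the following; the Gamma(3,2) target and exponential proposal are illustrative choices, not part of the original notes.

# Sketch: importance sampling for psi = E[x] with x ~ Gamma(3,2),
# using the standard exponential h(x) = exp(-x) as the proposal.
set.seed(314)
J=100000
xh=rexp(J,1)                        # draws from h(x)
w=xh*dgamma(xh,3,2)/dexp(xh,1)      # w(x) = g(x) f(x)/h(x), with g(x) = x
mean(w)                             # MC estimate of psi; the true value is 3/2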
Use Monte Carlo methods and importance sampling to estimate the mean \(\mu=E[X]\), where \(x\) has density \[f(x) \propto \frac{e^{-x}}{x+1},~x > 0.\]
Let \(k(x)=\frac{e^{-x}}{x+1}\) and \(h(x)=e^{-x},~~ x>0,\) the standard exponential or \(Gamma(1,1)\) density. So, \[\mu = \int_0^{\infty} xf(x)dx = \frac{\int\frac{x}{x+1}h(x)dx}{\int\frac{1}{x+1}h(x)dx}\] and the MC estimate of \(\mu\) is \[\hat{\mu}=\frac{\sum_1^J\frac{x_j}{x_j+1}}{\sum_1^J\frac{1}{x_j+1}}\]
options(digits=10)
kfun=function(x){ exp(-x)/(x+1) }                 # kernel k(x)
cval=integrate(f=kfun,lower=0,upper=Inf)$value    # normalising constant c
xffun= function(x){ x*(1/cval)*exp(-x)/(x+1) }    # x*f(x), with f(x) = k(x)/c
mu = integrate(f=xffun,lower=0,upper=Inf)$value   # 'true' mu by numerical integration
J=100000
set.seed(413)
xv=rgamma(J,1,1)            # draws from h(x) = exp(-x), i.e. Gamma(1,1)
num=mean(xv/(xv+1))         # MC estimate of the numerator integral
den=mean(1/(xv+1))          # MC estimate of the denominator integral
est=num/den                 # importance-sampling estimate of mu
c(num, den, est) ## [1] 0.4034510685 0.5965489315 0.6763084254
Suppose that we have a random sample from the bivariate distribution of two random variables \(x\) and \(y\), denoted \((x_1, y_1),\ldots,(x_J , y_J ) \sim iid ~f(x,y)\), and that we are interested in some function of \(x\) and \(y\), say \(r = g(x, y)\). We calculate \(r_j =g(x_j,y_j)\) and perform MC inference on the resulting sample \(r_1,\ldots,r_J \sim iid~f(r)\). If \(x\) and \(y\) are independent, we simply sample \(x_j \sim f(x)\) and \(y_j \sim f(y)\). If they are dependent, it may not be obvious how to generate \((x_j,y_j)\sim f(x,y)\), in which case we may apply the composition method, as outlined below:
To sample a pair \((x_j, y_j ) \sim f(x, y)\), one way is to first sample \(x_j \sim f(x)\) and then sample \(y_j \sim f(y \mid x_j)\). This works because of the identity \(f(x,y) = f(x)f(y|x)\). Alternatively, first sample \(y_j \sim f(y)\) and then sample \(x_j \sim f(x \mid y_j)\), which works by the identity \(f(x,y) = f(y)f(x|y)\).
One way to sample a triplet \((x_j,y_j,z_j) \sim f(x,y,z)\) is to first sample \(y_j \sim f(y)\), then sample \(x_j \sim f(x \mid y_j)\), and finally sample \(z_j \sim f(z \mid x_j , y_j)\). This works because of the identity \(f(x,y,z) = f(y)f(x|y)f(z|x,y)\).
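For instance, here is a minimal sketch of the triplet case; the particular conditional distributions are illustrative assumptions, not part of the original notes.

# Sketch of the composition method for a triplet (x, y, z)
set.seed(7)
J=1000
yv=rgamma(J,3,2)          # y_j ~ f(y)
xv=rnorm(J,yv,1)          # x_j ~ f(x|y_j), here N(y_j, 1)
zv=rnorm(J,xv*yv,1)       # z_j ~ f(z|x_j, y_j), here N(x_j*y_j, 1)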
Suppose that we are interested in the distribution of a random variable defined by \(r = \frac{y}{x+\sqrt{|y|}}\), where \(x\) and \(y\) have a joint distribution defined by the pdf \(f(x,y)= f(x)f(y|x)\), with \(x\sim G(3,2)\) and \((y|x) \sim N(x,x)\) (mean \(x\), variance \(x\)). Use the R functions rgamma() and rnorm() to generate a sample of size \(J = 1000\) from the joint distribution of \(x\) and \(y\).
Then use the MC method to estimate \(\psi= E[r]\) and report a \(95\%\) CI for \(\psi\).
Also estimate the \(80\%\) CDR for \(r\) and the density \(f(r)\).
Present the results both graphically and numerically.
options(digits=4)
J = 1000
set.seed(221)
xv=rgamma(J,3,2)
yv=rnorm(J,xv,sqrt(xv))   # (y|x) ~ N(x,x): mean xv, standard deviation sqrt(xv)
rv=yv/(xv+sqrt(abs(yv)))
rbar=mean(rv)
rci=rbar + c(-1,1)*qnorm(0.975)*sd(rv)/sqrt(J)
rcdr=quantile(rv,c(0.1,0.9))
rden=density(rv)
c(rbar,rci,rcdr)
hist(rv,prob=T,breaks=seq(floor(min(rv)),ceiling(max(rv)),0.1),xlab="r",main="")   # breaks chosen to cover the sample range
lines(rden,lty=1,lwd=2)
abline(v= c(rbar, rci, rcdr), lty=2, lwd=2)

To estimate a binomial proportion \(p\), we may interpret \(p\) as the mean \(\mu\) of a Bernoulli distribution and directly apply the method of Monte Carlo in the usual way. Suppose we are able to generate \(x_1,\ldots,x_J \sim iid ~ Bern(p)\). Then the MC estimate of \(p\) is \(\hat{p}=\bar{x}=\frac{1}{J}\sum_{j=1}^J x_j\). Since \(x_j^2=x_j\) for Bernoulli variables, \(s^2=\frac{1}{J-1}\left(\sum_{j=1}^J x_j^{2}-J\bar{x}^2\right)=\frac{1}{J-1}\left(J\bar{x}-J\bar{x}^2\right)=\frac{J}{J-1}\bar{x}(1-\bar{x}),\) so the MC SE is \(\frac{s}{\sqrt{J}}=\sqrt{\frac{\bar{x}(1-\bar{x})}{J-1}}.\) The MC \(1-\alpha\) CI for \(p\) is \(\hat{p}\pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{J}}.\)
Other CIs for \(p\) may also be considered.
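For instance (a sketch; these particular alternatives are not named in the notes), base R's binom.test() gives an exact (Clopper-Pearson) interval and prop.test() a score interval, illustrated here with the needle-tossing counts from the earlier example:

# Sketch: alternative CIs for a proportion, using 843 crossings in 1,000 tosses
binom.test(843,1000)$conf.int   # exact (Clopper-Pearson) interval
prop.test(843,1000)$conf.int    # score interval with continuity correction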
Use MC to estimate \(p=P\left(\sqrt\frac{x}{x+1}>0.3e^x\right)\) where \(x \sim Gamma(3,2).\)
Hint: With \(J = 20000\), we sample \(x_1,\ldots, x_J \sim iid ~ G(3,2)\), let \[r_j=I\left(\sqrt{\frac{x_j}{x_j+1}}>0.3e^{x_j}\right)\] and compute \[\hat{p}=\frac{1}{J}\sum_{j=1}^J r_j.\]
options(digits=4)
J=20000
set.seed(162)
xv=rgamma(J,3,2)
yv= sqrt(xv)*exp(-xv) / sqrt(xv+1)   # the event sqrt(x/(x+1)) > 0.3*exp(x) is equivalent to yv > 0.3
ct=sum(yv > 0.3)                     # vectorised count of the indicators r_j
phat=ct/J
ci=phat+c(-1,1)*qnorm(0.975)*sqrt(phat*(1-phat)/J)
c(phat,ci) ## [1] 0.2117 0.2060 0.2173
hist(yv,prob=T,breaks=seq(0,0.5,0.005),xlim=c(0,0.4),xlab="y",main=" ")
abline(v=0.3,lwd=3)
lines(density(yv),lwd=3)

A needle of length 10 cm is dropped randomly onto a floor with lines on it that are parallel and 10 cm apart.
Let:
X = perpendicular distance from the centre of the needle to the nearest line, in units of 5 cm;
Y = acute angle between the lines and the needle, in radians;
C = 'the needle crosses a line'.
Then \[X \sim U (0,1)\Longrightarrow f(x)=1, \quad 0<x<1\] \[Y \sim U (0,\tfrac{\pi}{2})\Longrightarrow f(y)=\frac{2}{\pi}, \quad 0<y<\frac{\pi}{2}.\] Since \(X \perp Y\), \[f(x,y)=\frac{2}{\pi}, \quad 0<x<1,~0<y<\frac{\pi}{2},\] and \(C=\{X<\sin(Y)\}=\{(x,y):x<\sin(y)\}.\)
\[\therefore ~~~~~~ p=P(C)=P(X< \sin(Y))\] \[p=\int\!\!\int_C f(x,y)\,dx\,dy = \frac{2}{\pi}\int_{y=0}^{\pi/2}\left(\int_{x=0}^{\sin(y)}dx\right)dy\] \[=\frac{2}{\pi}\int_{y=0}^{\pi/2}\sin(y)\,dy=\frac{2}{\pi}\big[-\cos(y)\big]_{0}^{\pi/2}\] \[\therefore~ p=\frac{2}{\pi}\]
We make use of the analysis in (a), whereby \(C=\{(x,y):x < \sin(y)\}\), with \(x \sim U(0,1)\), \(y \sim U(0,\frac{\pi}{2})\), and \(X \perp Y\). We now sample \(x_1,\ldots,x_J \sim iid ~ U(0,1)\) and \(y_1,\ldots,y_J \sim iid ~ U(0,\frac{\pi}{2})\) (all independent of one another). Next, we obtain the indicators defined by \[r_j=I(x_j < \sin(y_j)) = \begin{cases} 1, & x_j < \sin(y_j); \\ 0, & \text{otherwise}. \end{cases}\] The MC estimate of \(p\) is \(\hat{p}=\bar{r}=\frac{1}{J}\sum_{j=1}^{J} r_j = \frac{r_T}{J}\), where \(r_T\) is the total number of crossings. The \(95\%\) CI for \(p\) is \(CI=\left( \hat{p}\pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{J}}\right).\)
plot(seq(0, pi/2, 0.01), sin(seq(0, pi/2, 0.01)), type="l", lwd=3, xlab="y", ylab="x")
abline(v=c(0,pi/2),lty=3)
abline(h=c(0,1),lty=3)
text(0.2,0.4,"x = sin(y)")
text(1,0.4,"C")
text(0.35,0.8,"Complement of C")
text(1.52,0.06,"pi/2")
J=1000
set.seed(213)
xv=runif(J,0,1)
yv=runif(J,0,pi/2)
options(digits=4)
rv=as.numeric(xv<sin(yv))   # indicators r_j = I(x_j < sin(y_j)), vectorised
phat=mean(rv)
z=qnorm(0.975)
pci=phat+c(-1,1)*z*sqrt(phat*(1-phat)/J)
c(phat,pci,pci[2]-pci[1]) ## [1] 0.61800 0.58789 0.64811 0.06023
J=10000
set.seed(215)
xv=runif(J,0,1)
yv=runif(J,0,pi/2)
rv=as.numeric(xv<sin(yv))   # indicators, as before
phat=mean(rv)
z=qnorm(0.975)
pci=phat+c(-1,1)*z*sqrt(phat*(1-phat)/J)
c(phat,pci,pci[2]-pci[1])## [1] 0.63320 0.62375 0.64265 0.01889
Using the R function rgamma(), generate a random sample of size J = 100 from the gamma distribution with parameters 3 and 2 and mean \(\mu=3/2.\) Then use the method of Monte Carlo to estimate \(\mu.\) In your estimation, include a \(95\%\) CI for \(\mu\) and the width of this CI. Also report whether the CI contains the true value of \(\mu.\)
Repeat (a) but with J = 200, 500, 1000, 10000 and 100000, respectively. Report the widths of the resulting CIs and, for each CI, state whether it contains \(\mu.\) Discuss any patterns that you see.
Repeat (a) M = 100 times and report the proportion of the resulting M \(95\%\) MC CIs which contain the true value of the mean. (In each case use J = 100.) Hence calculate a \(95\%\) CI for p, the true coverage probability of the \(95\%\) MC CI for \(\mu\) based on a MC sample of size J = 100 from the Gamma(3,2) distribution.
Repeat (c), but with M = 200, 500, 1000 and 10000, respectively. Discuss any patterns that you see.
options(digits=5)
J = 100
set.seed(221)
xv=rgamma(J,3,2)
xbar=mean(xv)
ci=xbar + c(-1,1)*qnorm(0.975)*sd(xv)/sqrt(J)
c(xbar,ci)## [1] 1.5170 1.3539 1.6800
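The code used for part (b) is not shown in these notes; the following sketch would produce a table of the form reported below (the seed handling across the different J values is an assumption, so exact values may differ):

# Sketch for part (b): CI widths across increasing MC sample sizes J
Jvec=c(100,200,500,1000,10000,100000)
xbarvec=LBvec=UBvec=Wvec=numeric(length(Jvec))
set.seed(221)
for(i in seq_along(Jvec)){
  xv=rgamma(Jvec[i],3,2)
  xbarvec[i]=mean(xv)
  ci=xbarvec[i]+c(-1,1)*qnorm(0.975)*sd(xv)/sqrt(Jvec[i])
  LBvec[i]=ci[1]; UBvec[i]=ci[2]; Wvec[i]=ci[2]-ci[1]
}
rbind(Jvec,xbarvec,LBvec,UBvec,Wvec)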
## [,1] [,2] [,3] [,4] [,5] [,6]
## Jvec 100.00 200.00 500.00 1000.00 1.0e+04 1.0e+05
## xbarvec 1.52 1.47 1.43 1.47 1.5e+00 1.5e+00
## LBvec 1.35 1.35 1.36 1.42 1.5e+00 1.5e+00
## UBvec 1.68 1.59 1.50 1.53 1.5e+00 1.5e+00
## Wvec 0.33 0.25 0.14 0.11 3.4e-02 1.1e-02
J=100
M=100
ct=0
set.seed(442)
for(m in 1:M)
{
xv=rgamma(J,3,2)
xbar=mean(xv)
ci=xbar + c(-1,1)*qnorm(0.975)*sd(xv)/sqrt(J)
if((ci[1]<=1.5)&&(1.5<=ci[2]))
ct = ct + 1
}
p=ct/M
ci=p+c(-1,1)*qnorm(0.975)*sqrt(p*(1-p)/M)   # SE uses M, the number of replicate CIs (here M = J = 100)
c(ct,p,ci)## [1] 93.00000 0.93000 0.87999 0.98001
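The code for part (d) is likewise not shown; a sketch is given below (the seed placement is an assumption, so exact values may differ from those reported):

# Sketch for part (d): repeat the coverage study for larger M
J=100
for(M in c(200,500,1000,10000)){
  set.seed(442)
  ct=0
  for(m in 1:M){
    xv=rgamma(J,3,2)
    ci=mean(xv)+c(-1,1)*qnorm(0.975)*sd(xv)/sqrt(J)
    if((ci[1]<=1.5)&&(1.5<=ci[2])) ct=ct+1
  }
  p=ct/M
  pci=p+c(-1,1)*qnorm(0.975)*sqrt(p*(1-p)/M)
  print(c(M,p,pci,pci[2]-pci[1]))
}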
For \(M = 10000\), the reported output (M, coverage \(\hat{p}\), CI bounds, and CI width) was:
## [1] 1.00e+04 9.44e-01 9.40e-01 9.49e-01 9.00e-03