In this section we focus on making confidence intervals for an unknown parameter \(\theta\) based on our estimate \(\hat{\theta}\). There are a number of ways of doing this: exact and approximate methods, as well as a technique we call bootstrapping. We focus on the latter two, since being able to make exact CI's for a parameter is actually the exception, and generally not possible.

Approximate Methods

We will first introduce approximate methods for making confidence intervals. We will build upon what we established in the last section, Large Sample Theory (LST), and use its results to accomplish this. According to LST, the distribution of \(\sqrt{nI(\theta_0)}(\hat{\theta} - \theta_0)\) is approximately standard normal. Now since \(\theta_0\) is not known we will approximate it with \(\hat{\theta}\) and use \(I(\hat{\theta})\) in place of \(I(\theta_0)\). Furthermore, since we know that the standard normal is centered around zero, we have the following:

\[P\Big( -z_{\alpha/2} \leq \sqrt{nI(\hat{\theta})}(\hat{\theta} - \theta_0) \leq z_{\alpha/2} \Big) \approx 1-\alpha\] Manipulating this algebraically we can get to: \[P\Big(\hat{\theta} - \frac{z_{\alpha/2}}{\sqrt{nI(\hat{\theta})}} \leq \theta_0 \leq \hat{\theta} + \frac{z_{\alpha/2}}{\sqrt{nI(\hat{\theta})}}\Big) \approx 1 - \alpha\] and so our \((1-\alpha)100\%\) Confidence Interval becomes: \[\hat{\theta} \pm z_{\alpha/2}\Big(\frac{1}{\sqrt{nI(\hat{\theta})}}\Big)\]

Let’s work through an example to illustrate the procedure.


Example (Poisson Distribution)

Although it is possible to derive exact confidence intervals for \(\lambda\) in the Poisson, we illustrate how it would be done with large samples. Let \(f(x|\lambda)\) be the mass function of the Poisson with parameter \(\lambda\). The first thing we want to do is calculate \(I(\lambda)\). There are two equivalent ways of doing this, as we showed in a Lemma in the LST section. We apply the second method, namely:

\[I(\lambda) = -E\Big[\frac{\partial^2}{\partial\lambda^2} \log f(x|\lambda)\Big]\]

Moreover we have that \[\log f(x|\lambda) = x\ln(\lambda) - \lambda - \ln(x!)\]

Now we find the first and second derivatives of this: \[l'(\lambda) = \frac{x}{\lambda} - 1\] \[l''(\lambda) = -\frac{x}{\lambda^2}\] and so we have that \[I(\lambda) = -E\Big[l''(\lambda)\Big] = E(X)/\lambda^2 = \lambda/\lambda^2 = 1/\lambda\] Now recall that we found the mle of the Poisson previously, namely: \[\hat{\lambda} = \bar{X}\] Applying the approximate CI we have: \[\bar{X} \pm z_{\alpha/2} \sqrt{\frac{\bar{X}}{n}}\]
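
To make this concrete, here is a minimal R sketch of the interval. The data x and the 95% level are our own choices for illustration:

# simulate Poisson data and compute the approximate 95% CI for lambda
x <- rpois(500, 2)                                # illustrative sample with lambda = 2
lambdaHat <- mean(x)                              # the mle of lambda
lambdaHat + c(-1, 1) * qnorm(.975) * sqrt(lambdaHat / length(x))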


Let’s illustrate one more example, this time dealing with a parameter estimated from multinomial counts. Although the counts are not i.i.d., it can be shown that \[Var(\hat{\theta}) \approx \frac{1}{E[l'(\theta_0)^2]} = -\frac{1}{E[l''(\theta_0)]}\] and moreover that the mle is approximately normally distributed.

Example (Hardy-Weinberg Equilibrium)

If gene frequencies are in equilibrium, the genotypes \(AA, Aa,\) and \(aa\) occur in a population with frequencies \((1-\theta)^2\), \(2\theta(1-\theta)\) and \(\theta^2\) according to the Hardy-Weinberg Law. In a sample from the Chinese population of Hong Kong in 1937, blood types occurred with the following frequencies, where \(M\) and \(N\) are erythrocyte antigens:

Blood Type      M     MN      N   Total
Frequency     342    500    187    1029

In order to obtain large sample confidence intervals we need to obtain our estimate of \(\theta\). As we have seen in a previous section, the log-likelihood function for a multinomial sample is given by: \[l(p_1, ..., p_m) = \ln(n!) - \sum \ln(x_i!) + \sum x_i\ln(p_i)\] In the case where the multinomial cell probabilities \(p_i\) are functions of another unknown parameter \(\theta\) we use: \[l(\theta) = \ln(n!) - \sum \ln(x_i!) + \sum x_i\ln(p_i(\theta))\] This is indeed our case, so let’s apply it. \[l(\theta) = \ln(n!) - \sum\ln(X_i!) + (2X_1 + X_2)\ln(1-\theta) + (X_2 + 2X_3)\ln(\theta) + X_2\ln(2)\] Taking the derivative of this and setting it to 0 we get: \[l'(\theta) = -\frac{2X_1 + X_2}{1-\theta} + \frac{X_2 + 2X_3}{\theta} = 0\] Solving for the parameter \(\theta\) gives \[\hat{\theta} = \frac{X_2 + 2X_3}{2n}\] and using the data provided we get \[\hat{\theta} = \frac{500 + 2(187)}{2(1029)} = .4247\]
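
As a quick sanity check, we can compute this in R from the counts in the table above (the variable names are our own):

# estimate theta from the blood-type counts
counts <- c(342, 500, 187)                        # M, MN, N
n <- sum(counts)                                  # 1029
thetaHat <- (counts[2] + 2 * counts[3]) / (2 * n)
thetaHat                                          # 0.4247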

Our next task is to find the asymptotic variance to create our CI’s. We will use the second formula above to achieve this; namely, we want to find \(E[l''(\theta)]\).

\[l''(\theta) = -\frac{2X_1 + X_2}{(1-\theta)^2} - \frac{X_2 + 2X_3}{\theta^2}\] \[\Rightarrow E[l''(\theta)] = -\frac{2E[X_1] + E[X_2]}{(1-\theta)^2} - \frac{E[X_2] + 2E[X_3]}{\theta^2}\] Now each of \(X_1, X_2, X_3\) is binomial, so it follows that: \[E[X_1] = n(1-\theta)^2, \quad E[X_2] = 2n\theta(1-\theta), \quad E[X_3] = n\theta^2\]

Plugging these in we have the following: \[E[l''(\theta)] = -\frac{2n}{\theta(1 - \theta)}\] And so the asymptotic variance is given by: \[\frac{\theta(1 - \theta)}{2n}\] Now using our estimate \(\hat{\theta}\) we find the standard error \[s_{\hat{\theta}} = \sqrt{\frac{\hat{\theta}(1 - \hat{\theta})}{2n}} = .011\]

And so the confidence interval for \(\theta\) is: \[\hat{\theta} \pm 1.96s_{\hat{\theta}} = .4247 \pm 1.96(.011) = (.403, .446)\]
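
Continuing the R sketch above, the standard error and the interval follow in two lines:

seTheta <- sqrt(thetaHat * (1 - thetaHat) / (2 * n)) # asymptotic standard error, about .011
thetaHat + c(-1, 1) * qnorm(.975) * seTheta          # approximate 95% CI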


Bootstrap Methods

Suppose that \(\hat{\theta}\) is an estimate of the parameter \(\theta\) whose true unknown value is \(\theta_0\); moreover, suppose we know the distribution of \(\Delta = \hat{\theta} - \theta_0\). Denote the \(\alpha/2\) and \(1-\alpha/2\) quantiles of \(\Delta\) by \(\delta_*\) and \(\delta^*\) respectively, such that we have the following:

\[P(\hat{\theta} - \theta_0 \leq \delta_*) = \alpha/2\] \[P(\hat{\theta} - \theta_0 \leq \delta^*) = 1 - \alpha/2\] then it follows that \[P(\delta_* \leq \hat{\theta} - \theta_0 \leq \delta^* ) = 1 - \alpha\] \[\Rightarrow P(\hat{\theta} - \delta^* \leq \theta_0 \leq \hat{\theta} - \delta_* ) = 1 - \alpha\]

Now recall that we assumed the distribution of \(\Delta = \hat{\theta} - \theta_0\) is known; typically this is not the case. If \(\theta_0\) were known, we could generate many samples from this distribution, calculate \(\hat{\theta} - \theta_0\) for each, and from these formulate both \(\delta_*\) and \(\delta^*\).

As we have done before, since we do not know \(\theta_0\) we will use \(\hat{\theta}\) in its place. We will then generate many samples from this distribution, and from each one estimate \(\theta\); call this estimate \(\theta^*\). We then approximate \(\Delta\) by calculating \(\theta^* - \hat{\theta}\), and formulate the desired quantiles to make our approximate confidence interval.

Let’s apply this in an example using R.


Example (Poisson)

We already derived an approximate CI for the Poisson above, so let us use the bootstrap method and compare the two. Much like we would in a non-academic example, we will use our estimate of the true parameter \(\lambda_0\), which we derived to be \(\bar{X}\), and apply the methodology developed above.

First let us generate some data from the Poisson; we call it populationData. Here we take \(\lambda_0 = .33\) and a sample size of 1000, matching the Python version below:
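
populationData <- rpois(1000, 0.33) # n = 1000 draws; lambda_0 = .33 is the assumed true value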

Next we calculate the \(\hat{\lambda}\) we derived previously

lambdaHat <- sum(populationData)/length(populationData)

Next we want to generate the approximation to \(\Delta\) by taking many samples of the Poisson with \(\hat{\lambda}\) as the parameter, and estimating \(\lambda\) from these.

# some global parameters
NoIterations <- 1000
deltaHat <- numeric(NoIterations)

# main loop
for (i in 1:NoIterations) {
  bootPoisson <- rpois(1000, lambdaHat) # using mle to generate data
  lambdaStar <- sum(bootPoisson)/length(bootPoisson) # estimating lambda
  deltaHat[i] <- lambdaStar - lambdaHat
}

Now we can find the confidence interval based on this \(\hat{\Delta}\)

CIPoiss <- lambdaHat - quantile(deltaHat, c(.975, .025)) # lower = lambdaHat - delta^*, upper = lambdaHat - delta_*

And so the 95% Confidence Interval for \(\lambda\) is given by (0.294, 0.368)
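
For comparison, we can apply the large-sample interval derived earlier to the same data; for a sample of this size the two intervals should land very close together:

# the approximate CI from the large sample formula, for comparison
lambdaHat + c(-1, 1) * qnorm(.975) * sqrt(lambdaHat / length(populationData))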

As a bonus, here is the same bootstrap in Python.

import numpy as np

# first we generate the simulated data 
realPoisson = np.random.poisson(.33, 1000)

# next we wish to calculate the mle for this 
hatLambda = np.mean(realPoisson)

totalIterations = 1000
hatDelta = np.zeros(totalIterations)

for i in range(totalIterations): # start at 0 so every entry of hatDelta is filled
  poissonStrap = np.random.poisson(hatLambda, 1000) # resample using the mle
  starLambda = np.mean(poissonStrap)                # estimate lambda from the resample
  hatDelta[i] = starLambda - hatLambda              # approximate Delta
  
ciForPoisson = np.percentile(hatDelta, [97.5, 2.5])
l = hatLambda - ciForPoisson[0]
u = hatLambda - ciForPoisson[1]
print("The 95 percent CI is given by: %f,%f " %(l,u))
## The 95 percent CI is given by: 0.288000,0.358000