So let’s develop a more practical use of the central limit theorem.
Recall that \(\bar{X}\) is approximately normal with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).
In this distribution, \(\mu+2(\sigma/\sqrt{n})\) is pretty far out in the right tail: there is only a \(2.5\%\) chance of a normal being more than two standard deviations above its mean.
Similarly, \(\mu-2(\sigma/\sqrt{n})\) is pretty far out in the left tail: there is only a \(2.5\%\) chance of a normal being more than two standard deviations below its mean.
So the probability that \(\bar{X}\) is bigger than \(\mu+2\sigma/\sqrt{n}\) or smaller than \(\mu-2\sigma/\sqrt{n}\) is \(5\%\).
Equivalently, the probability that \(\bar{X}\) is between these limits is \(95\%\).
We can reverse the roles of \(\bar{X}\) and \(\mu\) without changing the probability inequalities and get that the interval \(\bar{X}\pm2\sigma/\sqrt{n}\) contains \(\mu\) with probability \(95\%\).
So \(\bar{X}\pm2\sigma/\sqrt{n}\) is called a \(95\%\) confidence interval for \(\mu\).
Remember that here we’re treating the interval \(\bar{X}\pm2\sigma/\sqrt{n}\) as random, while \(\mu\) is fixed; that is why we talk about the probability that the interval contains \(\mu\).
The actual interpretation is that if we were to repeatedly take samples of size \(n\) from this population and construct the confidence interval in each case, about \(95\%\) of the intervals obtained would contain \(\mu\), the parameter that we’re trying to estimate.
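A minimal sketch of that interpretation (the normal population, sample size, and number of simulations here are arbitrary choices made purely for illustration):
# Simulate many samples from a N(mu, sigma^2) population and check how often
# the interval xbar +/- 2 * sigma / sqrt(n) captures mu
set.seed(1)
mu <- 0; sigma <- 1; n <- 50; nosim <- 1000
xbars <- replicate(nosim, mean(rnorm(n, mean = mu, sd = sigma)))
ll <- xbars - 2 * sigma/sqrt(n)
ul <- xbars + 2 * sigma/sqrt(n)
# proportion of intervals that cover mu; it should come out close to 0.95
mean(ll < mu & ul > mu)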
I would note that I get the 2 by rounding up the \(97.5\)th quantile, which is closer to \(1.96\). If you wanted a \(90\%\) interval, you need \(5\%\) in either tail, so you would replace the 2 with \(1.645\).
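You can check those quantiles directly in R:
qnorm(0.975)
## [1] 1.959964
qnorm(0.95)
## [1] 1.644854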
Let’s give a confidence interval for the average height of the sons in Galton’s data.
Let’s go through this simple example. Consider the father.son data from the UsingR package. Defining x to be the sons’ heights, I’m simply going to take the mean of \(x\) plus or minus the \(0.975\)th normal quantile times the standard error of the mean: the standard deviation of \(x\) divided by the square root of \(n\), where \(n\) is the length of the vector x. I divide by \(12\) so that my confidence interval will be in feet rather than inches. Finally I get the confidence interval \(5.710\) to \(5.738\).
library(UsingR)
data(father.son)
x <- father.son$sheight
(mean(x) + c(-1, 1) * qnorm(0.975) * sd(x) / sqrt(length(x)))/12
## [1] 5.709670 5.737674
So if we are willing to assume that the sons in these data were an iid draw from a population of interest, then the confidence interval for the average height of the sons would be \(5.71\) to \(5.74\) feet.
Now let’s consider coin flips and develop a confidence interval for the success probability of the coin.
Remember that in this case the standard error of the mean is \(\sqrt{\frac{p(1-p)}{n}}\), so the interval takes the form
\[ \hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}\]
We don’t know \(p\); it’s exactly what we want to estimate, so we don’t know the standard error either. In practice we replace \(p\) with \(\hat{p}\) in the standard error.
It also turns out that \(p(1-p)\) is as large as it can be when \(p = 1/2\), so that \[\sqrt{\frac{p(1-p)}{n}}\leq\sqrt{\frac{\frac{1}{2}(1-\frac{1}{2})}{n}}=\frac{1}{2\sqrt{n}}\]
So when we multiply that bound by the \(2\) in the \(95\%\) interval, the twos cancel out:
\[2\sqrt{\frac{p(1-p)}{n}}\leq2\frac{1}{2\sqrt{n}}=\frac{1}{\sqrt{n}}\]
And we get that:
\[ \hat{p} \pm \frac{1}{\sqrt{n}} \]
is a quick \(CI\) estimate for \(p\).
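As a quick numerical sanity check of that bound (this little check isn’t in the original code):
# p * (1 - p) evaluated over a grid of p; its maximum is 1/4, attained at p = 0.5
p <- seq(0, 1, by = 0.01)
max(p * (1 - p))
## [1] 0.25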
Imagine you were running for political office and:
Your campaign advisor told you that in a random sample of \(100\) likely voters, \(56\) intend to vote for you.
Can you relax? Do you have this race in the bag? Is \(56\) out of \(100\) sampled enough evidence to conclude that you’ll likely get more than \(50\%\) of the vote?
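A back-of-the-envelope answer comes from the quick interval \(\hat{p} \pm 1/\sqrt{n}\):
round(0.56 + c(-1, 1)/sqrt(100), 2)
## [1] 0.46 0.66
Since the interval reaches below \(0.50\), you can’t rule out getting half the vote or less.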
Not enough for you to relax, better go do more campaigning!
A general rough guideline is that you need \(100\) observations for \(1\) decimal place of accuracy in a binomial experiment, \(10,000\) for \(2\), and \(1,000,000\) for \(3\).
Here I just show \(1\) over the square root of a bunch of powers of \(10\) to illustrate that.
round(1/sqrt(10^(1:6)), 3)
## [1] 0.316 0.100 0.032 0.010 0.003 0.001
Here I just go through the calculation where I plug \(0.56\) directly into the standard error formula.
0.56 + c(-1, 1) * qnorm(0.975) * sqrt(0.56 * 0.44/100)
## [1] 0.4627099 0.6572901
This yields the interval \(0.46\) to \(0.66\), just like the quick interval before.
Also, the function \(binom.test\) with \(56\) successes out of \(100\) trials, grabbing the confidence interval component, yields a very similar interval, \(0.46\) to \(0.66\).
binom.test(56, 100)$conf.int
## [1] 0.4571875 0.6591640
## attr(,"conf.level")
## [1] 0.95
The \(binom.test\) function is exact in the sense that it guarantees \(95\%\) coverage or higher, regardless of the sample size \(n\). These so-called exact procedures are a nice complement to large-sample procedures. They tend to involve computations that cannot be done by hand, and they can be conservative (that is, give wider intervals), but they do not rely on the Central Limit Theorem.
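As a usage note, \(binom.test\) also takes a conf.level argument if you want a coverage level other than \(95\%\); for example, a \(90\%\) exact interval:
binom.test(56, 100, conf.level = 0.90)$conf.int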
Let’s consider a simulation. Here I’m going to flip a coin with a given success probability over and over again, and then calculate the percentage of times my Wald interval covers the true coin probability that we used to generate the data.
# In each of my simulations I'm going to do 20 coin flips.
n <- 20
# True values of success probability of the coin that I'd like to look at vary between 0.1 and 0.9, where I step through them by value 0.05.
pvals <- seq(0.1, 0.9, by = 0.05)
# I'm going to do a thousand simulations
nosim <- 1000
# And then I'm going to loop over these true success probabilities.
coverage <- sapply(pvals, function(p) {
# For each true success probability I'm going to generate 1,000 sets
# of 20 coin flips where I take the sample proportion in each case.
phats <- rbinom(nosim, prob = p, size = n)/n
# Then I'm going to calculate the lower limit of the confidence
# interval in each of those cases.
ll <- phats - qnorm(0.975) * sqrt(phats * (1 - phats)/n)
# And also the upper limit of the confidence interval in each of
# those cases.
ul <- phats + qnorm(0.975) * sqrt(phats * (1 - phats)/n)
# Finally I'm going to calculate the proportion of times that they
# can cover that true value of p that I used to simulate the data.
mean(ll < p & ul > p)
})
Now what I’d like to do is plot coverage as a function of the true success probability I used to simulate the data.
plot(x = pvals, y = coverage, type = "o", xlab = "pvals", ylim = range(0:1))
abline(h = 0.95, col = "red")
Here I show the plot of the coverage by the true values of p used for the simulation. So, for example, at the value \(0.5\) I flipped the coin \(20\) times, took the sample proportion, found the confidence interval, and recorded whether or not it covered the value \(0.5\); I did that \(1,000\) times. In this case, over \(95\%\) of the time it did in fact cover that probability. So it’s basically saying that if the true value of \(p\) was \(0.5\) and I do a confidence interval, the coverage is actually a little bit better than \(95\%\).
There is some Monte Carlo error, right? I didn’t do an infinite number of simulations, only \(1,000\), but \(1,000\) is pretty good: as we saw a couple of slides ago, that’s giving us a little better than one decimal place of accuracy.
Now, here it’s pretty clear that I’m getting nowhere near the \(95\%\) confidence level for the case where \(p\) is around \(12\%\) or so. Why is a \(95\%\) confidence interval procedure giving us coverage below \(95\%\)? It’s simply because the central limit theorem isn’t as accurate as we need it to be for this specific value of \(n\) for coins with this specific true probability.
So I’ll give you a quick fix to address this problem for smaller values of \(n\).
\(n\) isn’t large enough for the CLT to be applicable for many of the values of \(p\)
Quick fix: form the interval with
\[ \frac{X + 2}{n + 4} \]
(Add two successes and two failures, Agresti/Coull interval)
Then you just churn through the confidence interval procedure as normal, but with this new sample proportion that’s been shrunk towards \(0.5\) a little bit. This is called the Agresti/Coull interval, and in a minute we’ll show that it performs a little bit better.
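As a quick sketch of the calculation, here is that idea applied to the earlier voting example of \(56\) out of \(100\) (this particular computation isn’t in the original notes, and it keeps the same standard error form used in the simulations below):
# Shrink the sample proportion towards 0.5 by adding 2 successes and 2 failures
ptilde <- (56 + 2)/(100 + 4)
round(ptilde + c(-1, 1) * qnorm(0.975) * sqrt(ptilde * (1 - ptilde)/100), 3)
The result is nearly identical to the Wald interval here, since \(n = 100\) is already fairly large; the correction matters more for small \(n\).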
Before I show the results for adding two successes and adding two failures, I want to show that the performance is better when \(n\) is larger.
First let’s show that coverage gets better with \(n\)
# Here I've done exactly the same simulation, but turn n into 100 rather than 20.
n <- 100
pvals <- seq(0.1, 0.9, by = 0.05)
nosim <- 1000
coverage2 <- sapply(pvals, function(p) {
phats <- rbinom(nosim, prob = p, size = n)/n
ll <- phats - qnorm(0.975) * sqrt(phats * (1 - phats)/n)
ul <- phats + qnorm(0.975) * sqrt(phats * (1 - phats)/n)
mean(ll < p & ul > p)
})
plot(x = pvals, y = coverage2, type = "o", xlab = "pvals", ylim = range(0:1))
abline(h = 0.95, col = "red")
For each specific true value of p, I simulated a hundred coin flips, took the sample proportion, created the confidence interval, and saw if it contained the true value of p. In over \(95\%\) of the instances that was the case. The \(95\%\) level is shown by a horizontal red line, and the coverage probability is fairly close to it throughout; it never dips below \(90\%\) like it did in the \(n = 20\) case.
Now let’s look at \(n = 20\) but adding \(2\) successes and \(2\) failures.
n <- 20
pvals <- seq(0.1, 0.9, by = 0.05)
nosim <- 1000
coverage3 <- sapply(pvals, function(p) {
# when I calculate the confidence interval, I am adding two successes and two failures.
phats <- (rbinom(nosim, prob = p, size = n) + 2) / (n + 4)
ll <- phats - qnorm(0.975) * sqrt(phats * (1 - phats)/n)
ul <- phats + qnorm(0.975) * sqrt(phats * (1 - phats)/n)
mean(ll < p & ul > p)
})
Let’s see the performance of that.
plot(x = pvals, y = coverage3, type = "o", xlab = "pvals", ylim = range(0:1))
abline(h = 0.95, col = "red")
And you can see that the add-two-successes-and-two-failures interval tends to cover the true success probability a higher percentage of the time than \(95\%\). That is certainly better than the extremely poor coverage of the Wald interval for certain values of the true probability. However, being too conservative, in other words having too high a coverage rate, is also not necessarily a good thing, as it implies that the interval is probably too wide. Nonetheless, I have done some thinking about this specific problem and I can give the categorical recommendation that the add-\(2\)-successes-and-add-\(2\)-failures interval should generally be used instead of the Wald interval.
Let’s create a Poisson interval using the usual form \(\text{Estimate} \pm z_{1-\alpha/2}\,\text{SE}_{\text{est}}\).
The \(z\) quantile comes from the Central Limit Theorem, though it’s maybe a little less clear how the Central Limit Theorem applies in this case.
The quantile is chosen so that \((100\% - \text{coverage})/2\) is left in each tail; for a \(95\%\) interval that’s the \(97.5\)th percentile of the standard normal, about \(1.96\).
A nuclear pump failed \(5\) times in \(94.32\) days of monitoring; give a \(95\%\) confidence interval for the failure rate per day.
\(X \sim Poisson(\lambda t) -\) I assume that the number of failures is Poisson
Estimate \(\hat{\lambda} = X/t -\) My estimate of the failure rate is the number of failures divided by the total monitoring time
\(Var(\hat{\lambda}) = \lambda/t\) \(-\) the variance of our estimate
\(\hat{\lambda}/t\) is our empirical variance estimate, so \(\sqrt{\hat{\lambda}/t}\) is the standard error we use
# defines the number of events
x <- 5
# monitoring time
t <- 94.32
# the estimate of the rate
lambda <- x/t
# the confidence interval estimate as the rate estimate plus or minus the
# relevant standard normal quantile times the standard error, and then I
# round it so that I only get three decimal place accuracy.
round(lambda + c(-1, 1) * qnorm(0.975) * sqrt(lambda/t), 3)
## [1] 0.007 0.099
In addition to doing a large sample interval, we can do an exact Poisson interval, and R has a function for doing it.
# the parameters are the number of events and the monitoring time.
# Then you can just grab the confidence interval part
poisson.test(x, T = 94.32)$conf.int
## [1] 0.01721254 0.12371005
## attr(,"conf.level")
## [1] 0.95
What I mean by exact is that it guarantees the coverage, but that might be conservative, it might give you a wider interval than necessary, but it will guarantee \(95\%\) coverage.
Because we’re very interested in seeing how confidence intervals perform under repeated sampling (since that defines our idea of what a confidence interval is), let’s repeat the process that we did for the coin for the Poisson coverage rate.
Let’s see how this interval performs for lambda values near the one we’re estimating.
# Let's pick a set of lambda values, ranging from quite small up to values
# near the one from our example.
lambdavals <- seq(0.005, 0.1, by = 0.01)
# Let's do a thousand simulations
nosim <- 1000
# Let's set our monitoring time as 100 instead of 94, just to make it simple.
t <- 100
# And then I'm going to define coverage in the same way.
coverage4 <- sapply(lambdavals, function(lambda) {
# I'm going to simulate a bunch of Poissons with this particular
# rate, and then I'm going to divide it by the monitoring time to get
# a vector of lambda hats over 1,000 simulations.
lhats <- rpois(nosim, lambda = lambda * t)/t
# I'm then going to create my 95% confidence interval by constructing
# the lower limit, where I subtract off the standard normal quantile
# times the standard error
ll <- lhats - qnorm(0.975) * sqrt(lhats/t)
# And then a bunch of upper limits, where I add the standard normal
# quantile times the standard error
ul <- lhats + qnorm(0.975) * sqrt(lhats/t)
# And then I'm going to take the percentage of times that my interval
# contains the true lambda used for simulation
mean(ll < lambda & ul > lambda)
})
Here’s the plot of the lambda values by the Monte Carlo estimated coverage.
(Gets really bad for small values of lambda)
plot(x = lambdavals, y = coverage4, type = "o", xlab = "lambda", ylim = range(0:1))
abline(h = 0.95, col = "red")
So here’s the plot of our lambda values by the coverage. So every point in this plot is an instance where we simulated and repeatedly generated Poisson confidence intervals and took the percentage of time that those intervals contained the true lambda value used for the simulation.
We can see that as the lambda values get larger, the coverage gets closer to \(95\%\). There is of course some Monte Carlo error, because we didn’t do an infinite number of simulations, only \(1,000\). We also see that as the true value of lambda gets smaller, the coverage gets very poor, less than \(50\%\), so your purported \(95\%\) interval is only giving you \(50\%\) actual coverage.
So our brief simulation here suggests that for very small values of lambda, which correspond to relatively few events over a large amount of monitoring time, we shouldn’t be using this asymptotic interval.
Since it’s not immediately clear how the Central Limit Theorem works in the Poisson case, I just wanted to put up one more simulation to show what has to go to infinity for the Poisson case to be well approximated by the Central Limit Theorem. Basically, as the monitoring time goes to infinity, the coverage will converge to \(95\%\). So in this simulation I changed \(t\) from \(100\) to \(1,000\), so that we’re monitoring not for a total of \(94.32\) days but for a total monitoring time of \(1,000\) days, and reran the same coverage calculation.
t <- 1000
# Re-run the same coverage simulation with the longer monitoring time
coverage5 <- sapply(lambdavals, function(lambda) {
    lhats <- rpois(nosim, lambda = lambda * t)/t
    ll <- lhats - qnorm(0.975) * sqrt(lhats/t)
    ul <- lhats + qnorm(0.975) * sqrt(lhats/t)
    mean(ll < lambda & ul > lambda)
})
plot(x = lambdavals, y = coverage5, type = "o", xlab = "lambda", ylim = range(0:1))
abline(h = 0.95, col = "red")
And here we see that the coverage, if we look at the reference line at \(95\%\), is much better for most values of lambda. There is some poor coverage down near the smaller values of lambda, but we know that this interval has problems with small values of lambda. Remember, though, that in this case we do have the exact Poisson interval as an option.
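As a sketch of that option (using a hypothetical count of \(2\) events over \(100\) days, not the pump data), the exact interval is the safer bet when events are rare:
# Wald interval for a very small count: the lower limit dips below zero,
# which is impossible for a rate
lambda <- 2/100
lambda + c(-1, 1) * qnorm(0.975) * sqrt(lambda/100)
# Exact interval: stays positive and guarantees at least 95% coverage
poisson.test(2, T = 100)$conf.int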
The Law of Large Numbers states that averages of \(iid\) samples converge to the population means that they are estimating.
The Central Limit Theorem states that averages are approximately normal, with distributions
Centered at the population mean \(-\) which we knew regardless of the Central Limit Theorem
With standard deviation equal to the standard error of the mean \(-\) which we also already knew without the Central Limit Theorem
However the Central Limit Theorem also tells us that they’re approximately normal.
However, the Central Limit Theorem gives no guarantee that \(n\) is large enough for this approximation to be accurate, and we’ve seen some instances with confidence intervals where it’s very accurate and instances where it’s less accurate.
Taking the mean and adding and subtracting the relevant normal quantile times the \(SE\) yields a confidence interval for the mean. This process of taking an estimate plus or minus some quantile times its \(SE\) is our default way of creating confidence intervals; whether it’s in this context, in our regression class, or when we’re talking about generalized linear models and more complex subjects, we tend to use these so-called Wald intervals.
Confidence intervals get wider as the coverage increases (if you’re staying within a single technique). Why?
Imagine if someone were to put a gun to your head and say, unless this \(CI\) contains the true parameter it’s estimating, I’m going to pull the trigger. It’s a ridiculous circumstance, but humor me: what would you do? Well, you certainly don’t want them to pull the trigger, so what you would do is make the confidence interval as wide as you possibly could, because you want to be very sure that the interval contains the parameter. That’s the same thing the mathematics does. The more sure you want to be that the interval will contain the parameter, the wider the procedure makes the interval. If you don’t care much at all, say you only want a \(2\%\) interval, then you’re going to get an interval that’s very close to the mean itself.
The Poisson and binomial cases have exact intervals that don’t require the \(CLT\), which is useful because the \(CLT\) sometimes doesn’t approximate their distributions well.