MATH 138: Confidence Intervals

The Big Idea:

The sampling distribution tells us how close the sample mean, \(\bar{x}\), is likely to be to the population mean, \(\mu\).

All confidence intervals that we will construct in this class have a form similar to this:

\[\text{estimate} \pm \text{margin of error}\]

The margin of error is the product of the critical value and the standard error:

\[\text{estimate} \pm \text{critical value} \times \text{standard error}\]

To calculate a confidence interval for a population mean when the population standard deviation, \(\sigma\), is known, the calculation looks like:

\[ \bar{x} \pm z^*_{1-\alpha/2} \times \frac{\sigma}{\sqrt{n}}\]

where \(z^*_{1-\alpha/2}\) is the critical value: the point on the standard normal distribution with area \(1-\alpha/2\) to its left. It reflects the level of confidence we have that the interval we have constructed “captures” the population mean.

Significance Levels and Confidence Levels

The significance level sets a threshold for finding statistical significance (more on this next week!). We use \(\alpha\) (alpha) to represent the significance level.

The confidence level of an interval is set to be \((1-\alpha)\times 100\%\). This \(1-\alpha\) represents the proportion of area in the middle of the sampling distribution, which is centered at the population mean, \(\mu\).
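
As a worked example, taking \(\alpha = 0.05\) (the 95% confidence level), the areas split up like this:

\[ 1-\alpha = 0.95, \qquad \frac{\alpha}{2} = 0.025, \qquad 1-\frac{\alpha}{2} = 0.975 \]

The middle 95% leaves 2.5% in each tail, so the critical value is the point with area 0.975 to its left.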

Common Confidence Levels

The common confidence levels are 80%, 90%, 95%, and 99%, where 95% is the most common/default confidence level.

Finding the Critical Value

We can use R to find the critical value. Since we are given the population standard deviation, \(\sigma\), we calculate the critical value from the standard normal distribution (denoted with a Z).

# STEP 1: Set the alpha significance level 
alpha=0.05

# STEP 2: Calculate the area to the left
conf<-1-alpha/2

# STEP 3: Calculate the critical value
qnorm(conf)

## [1] 1.959964
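
The same three steps can be run for all four common confidence levels at once. This is a minimal sketch; the variable names `conf_levels`, `left_area`, and `z_star` are illustrative and not part of the original notes.

```r
# Common confidence levels named in the text
conf_levels <- c(0.80, 0.90, 0.95, 0.99)

# STEP 1: the matching significance levels
alpha <- 1 - conf_levels

# STEP 2: area to the left of each critical value
left_area <- 1 - alpha / 2

# STEP 3: critical values from the standard normal distribution
z_star <- qnorm(left_area)
names(z_star) <- paste0(conf_levels * 100, "%")
round(z_star, 3)
```

Rounded to three decimals, the critical values are about 1.282, 1.645, 1.960, and 2.576: a higher confidence level needs a larger critical value, and so a wider interval.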

Example 1: Hummingbirds

Suppose a small group of 15 Allen’s hummingbirds (Selasphorus sasin) has been under study in Arizona. The average weight for these 15 birds is 3.15 grams. Based on previous studies, we can assume that weights of all Allen’s hummingbirds have a Normal distribution with standard deviation of 0.33 grams.

The researcher of this study hopes to estimate the true mean weight of Allen’s hummingbirds.

Calculate the 99% confidence interval for the true mean weight of Allen’s hummingbirds.

Step 1: State the problem. This will help you plan your next move!

We wish to estimate the true mean, \(\mu\), weight of Allen’s hummingbirds.

Step 2: Take an inventory of the information given to you and decide what procedure is best to use. (In this chapter we are talking about the “z-methods”.)

\(\bar{x} = 3.15\) g
\(n = 15\)
\(\sigma\) is known to be 0.33 g

Step 3: Check conditions.

SRS - CHECK!
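Normal distribution, stated in the problem (with only \(n=15\) birds we rely on the population of weights being Normal, not on the CLT) - CHECK!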
\(\sigma\) is known - CHECK!

Step 4: If conditions are satisfied, do calculations.

\[ \bar{x} \pm z^*_{1-\alpha/2} \times \frac{\sigma}{\sqrt{n}}\]

# Information given in problem 
xbar<- 3.15
sigma<- 0.33
n <- 15

# STEP 1: Set the alpha significance level 
alpha=0.01

# STEP 2: Calculate the area to the left
conf<-1-alpha/2

# STEP 3: Calculate the critical value
qnorm(conf)

## [1] 2.575829

\[ 3.15 \pm 2.576 \times \frac{0.33}{\sqrt{15}}\]

xbar+c(-1,1)*qnorm(conf)*sigma/sqrt(n)

## [1] 2.930525 3.369475
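
Worked by hand, the same arithmetic runs as follows (intermediate values are rounded, so the last digit can differ slightly from the R output):

\[ \frac{\sigma}{\sqrt{n}} = \frac{0.33}{\sqrt{15}} \approx 0.0852 \]

\[ \text{margin of error} \approx 2.576 \times 0.0852 \approx 0.2195 \]

\[ 3.15 \pm 0.2195 = (2.9305,\ 3.3695) \]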

Step 5: State the conclusion in the context of the problem:

We are 99% confident that the true mean weight of Allen’s hummingbirds is between 2.931 and 3.369 grams.
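
Since every z-interval in these notes uses the same steps, the calculation can be wrapped in a small helper. The function `z_interval` below is a hypothetical convenience, not part of the original notes or of base R; it is a sketch that assumes \(\sigma\) is known and the conditions in Step 3 hold.

```r
# Hypothetical helper (not from the original notes): z-interval for a mean
# when the population standard deviation sigma is known.
z_interval <- function(xbar, sigma, n, conf_level = 0.95) {
  alpha <- 1 - conf_level              # significance level
  z_star <- qnorm(1 - alpha / 2)       # critical value
  margin <- z_star * sigma / sqrt(n)   # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Example 1: 99% interval for the mean hummingbird weight
hummingbird_ci <- z_interval(xbar = 3.15, sigma = 0.33, n = 15, conf_level = 0.99)
round(hummingbird_ci, 3)
```

Rounded to three decimals this reproduces the interval from Step 4, about 2.931 to 3.369 grams.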

Example 2: Triathletes BPM

n<-9 

# Swimming
xbar1<-188
sigma1<-7.2

# Biking
xbar2<-186
sigma2<-8.5

# Running 
xbar3<-194
sigma3<-7.8

Step A: Assuming that the heart-rate distribution for each event is approximately Normal, construct a 95% confidence interval for the true mean heart rate for male triathletes for each event.

# STEP 1: Set the alpha significance level 
alpha=0.05

# STEP 2: Calculate the area to the left
conf<-1-alpha/2

# STEP 3: Calculate the critical value
qnorm(conf)

## [1] 1.959964

# Swimming
xbar1+c(-1,1)*qnorm(conf)*sigma1/sqrt(n)

## [1] 183.2961 192.7039

# Biking
xbar2+c(-1,1)*qnorm(conf)*sigma2/sqrt(n)

## [1] 180.4468 191.5532

# Running 
xbar3+c(-1,1)*qnorm(conf)*sigma3/sqrt(n)

## [1] 188.9041 199.0959

Step B: Interpret the confidence interval for the running event in the context of the problem.

We are 95% confident that the true mean maximum heart rate for the running event is between 188.904 and 199.096 bpm.

Step C: Do the intervals overlap? Based on the computed intervals, do you think there is evidence that the mean maximum heart rate is higher for running than for the other two events? Explain.

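Yes, the intervals overlap: the running interval (188.904, 199.096) shares values with both the swimming interval (183.296, 192.704) and the biking interval (180.447, 191.553). Overlap alone does not settle the question, though; to test for a significant difference we’d have to use a hypothesis test.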
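
The overlap can also be checked in R rather than by eye. This is a minimal sketch: the endpoints are copied from the output in Step A, and `intervals_overlap` is a hypothetical helper introduced here for illustration.

```r
# Interval endpoints copied from the Step A output
swimming <- c(183.2961, 192.7039)
biking   <- c(180.4468, 191.5532)
running  <- c(188.9041, 199.0959)

# Hypothetical helper: two intervals overlap when each one
# starts before the other one ends
intervals_overlap <- function(a, b) {
  a[1] <= b[2] && b[1] <= a[2]
}

overlap_swim <- intervals_overlap(running, swimming)
overlap_bike <- intervals_overlap(running, biking)
c(swimming = overlap_swim, biking = overlap_bike)
```

Both comparisons come back `TRUE`, because the lower end of the running interval (188.9041) sits below the upper ends of the other two intervals.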

SAT Demo Activity

“SAT standard deviation is calculated so that 68% of students score within one standard deviation of the mean, 95% of students score within two standard deviations of the mean, and 99+% of students score within three standard deviations of the mean.”

We construct the following simulation to show how confidence intervals are constructed.

For this exercise we do NOT know the mean, but we want to estimate it and construct intervals to contain it.

The standard deviation, \(\sigma\) is given to be 210 points.

sigma <- 210

Now we run a simulation that creates 1000 different 95% confidence intervals to check their calibration, that is, whether about 95% of them capture the true mean:

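nsim <- 1000
coverage <- rep(0, nsim)   # 1 if the interval captures mu, 0 otherwise
xbars <- rep(NA, nsim)     # sample mean from each simulated sample

# NOTE: mu (the true mean) must already be defined for this code to run;
# these notes do not show its value.
n <- 50
for (i in 1:nsim) {
  scores <- rnorm(n = n, mean = mu, sd = sigma)
  xbars[i] <- mean(scores)
  # 95% z-interval around this sample's mean
  lower <- xbars[i] - qnorm(0.975) * sigma / sqrt(n)
  upper <- xbars[i] + qnorm(0.975) * sigma / sqrt(n)
  if (mu > lower && mu < upper) {
    coverage[i] <- 1
  }
}
# Proportion of the intervals that captured mu
mean(coverage)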

## [1] 0.947
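
Because \(\sigma\) and \(n\) are fixed, every interval in the simulation has the same width; only its center, the sample mean, changes from run to run. A short sketch of that shared margin of error, using only the values given above (the name `margin` is illustrative):

```r
sigma <- 210   # population standard deviation, given in the activity
n <- 50        # sample size used in the simulation

# Margin of error shared by every 95% interval in the simulation
margin <- qnorm(0.975) * sigma / sqrt(n)
round(margin, 1)
```

Each simulated interval therefore reaches roughly 58 points on either side of its sample mean.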

hist(xbars)

What’s your best guess for the true mean?