Foundations for Statistical Inference: Confidence Intervals

The population

In this lab we treat the following Pew Research finding as the truth about the population of US adults:

62% say climate change is currently affecting their local community.
38% say it is not affecting their local community.

For easier computation we work with a synthetic population of 100,000 adults where 62,000 answer “Yes” and 38,000 answer “No”.

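# Packages used throughout this lab: tidyverse (tibble, dplyr, ggplot2) and infer
library(tidyverse)
library(infer)

us_adults <- tibble(
  climate_change_affects = c(
    rep("Yes", 62000),
    rep("No", 38000)
  )
)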

Population distribution

us_adults %>%
  count(climate_change_affects) %>%
  mutate(p = n / sum(n))

## # A tibble: 2 × 3
##   climate_change_affects     n     p
##   <chr>                  <int> <dbl>
## 1 No                     38000  0.38
## 2 Yes                    62000  0.62

ggplot(us_adults, aes(x = climate_change_affects)) +
  geom_bar() +
  labs(
    x = "",
    y = "",
    title = "Do you think climate change is affecting your local community?"
  ) +
  coord_flip()

The output above confirms that 62% of the population said “Yes” and 38% said “No”.

A single random sample

Now we pretend we do not know the population and only get to see a simple random sample of 60 adults.

n <- 60

samp <- us_adults %>%
  sample_n(size = n)

samp

## # A tibble: 60 × 1
##    climate_change_affects
##    <chr>                 
##  1 Yes                   
##  2 No                    
##  3 Yes                   
##  4 No                    
##  5 Yes                   
##  6 Yes                   
##  7 No                    
##  8 Yes                   
##  9 No                    
## 10 Yes                   
## # ℹ 50 more rows
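Because the sample is drawn at random, rerunning the chunk above gives a different set of 60 adults each time. If a reproducible sample is wanted, one option is to set a seed before sampling. The sketch below is only an illustration: it uses base R's sample() on a plain vector rather than sample_n(), and the seed value 2024 is an arbitrary choice, not something from the lab.

```r
# Rebuild the population as a plain character vector (same 62,000 / 38,000 split).
population <- c(rep("Yes", 62000), rep("No", 38000))

# Setting the same seed before each draw reproduces the same sample.
set.seed(2024)  # arbitrary seed, chosen only for illustration
samp_a <- sample(population, size = 60)

set.seed(2024)
samp_b <- sample(population, size = 60)

identical(samp_a, samp_b)  # TRUE: same seed, same draw

# Without resetting the seed, the next draw is a new random sample.
samp_c <- sample(population, size = 60)
table(samp_c)
```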

Exercise 1

What percent of the adults in your sample think climate change affects their local community?

We can compute the sample proportion who say “Yes”.

samp_summary <- samp %>%
  count(climate_change_affects) %>%
  mutate(p_hat = n / sum(n))

samp_summary

## # A tibble: 2 × 3
##   climate_change_affects     n p_hat
##   <chr>                  <int> <dbl>
## 1 No                        23 0.383
## 2 Yes                       37 0.617

Answer (Exercise 1):
The table above shows the number and proportion of “Yes” and “No” responses in my sample of 60. The p_hat value in the “Yes” row is 0.617, so about 61.7% of the adults in this sample (37 of 60) think climate change affects their local community. This value is my point estimate of the unknown population proportion.

Exercise 2

Would you expect another student’s sample proportion to be identical to yours? Would you expect it to be similar? Why or why not?

Answer (Exercise 2):
I would not expect another student’s sample proportion to be exactly the same as mine. We are both taking random samples of 60 adults, so by chance our samples will include different people and slightly different mixes of “Yes” and “No” responses. However, because we are sampling from the same population where the true proportion is 0.62, I would expect our sample proportions to be similar overall and usually fairly close to 0.62, not wildly different from each other.
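To make the answer above concrete, here is a small simulation sketch of how much sample proportions vary from one sample of 60 to the next. It assumes the same synthetic population, uses base R instead of the tidyverse pipeline, and the seed and the choice of 1,000 repeated samples are arbitrary.

```r
set.seed(606)  # arbitrary seed for reproducibility

population <- c(rep("Yes", 62000), rep("No", 38000))

# Draw 1,000 independent samples of size 60 and record each sample proportion.
p_hats <- replicate(1000, mean(sample(population, size = 60) == "Yes"))

mean(p_hats)   # centre of the sample proportions, expected to sit near 0.62
sd(p_hats)     # typical distance of one sample proportion from the centre
range(p_hats)  # smallest and largest proportions seen

# Share of samples with exactly 37 "Yes" responses, matching my sample.
mean(round(p_hats * 60) == 37)
```

Only a small share of samples should reproduce my count of 37 exactly, while most proportions should land within a few standard deviations of 0.62.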

A 95% confidence interval

We now build a 95% confidence interval for the population proportion of US adults who think climate change affects their local community using bootstrapping with the infer package.

ci_95 <- samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.95)

ci_95

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
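## 1      0.5    0.733

Interpreting this interval: we are 95% confident that the proportion of US adults who think climate change affects their local community is between 0.5 and 0.733.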
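As a rough sketch of what the infer pipeline above does, the same kind of interval can be built by hand in base R. This assumes that get_ci() is using its default percentile method, and it rebuilds the sample from the counts in samp_summary (37 “Yes”, 23 “No”). The seed is arbitrary, so the bounds will not match ci_95 exactly.

```r
set.seed(1)  # arbitrary seed

# The observed sample: 37 "Yes" (coded 1) and 23 "No" (coded 0).
samp_yes <- c(rep(1, 37), rep(0, 23))

# Resample the 60 responses with replacement 1,000 times,
# recording the proportion of "Yes" in each resample.
boot_props <- replicate(1000, mean(sample(samp_yes, replace = TRUE)))

# A 95% percentile interval keeps the middle 95% of the bootstrap proportions.
ci_by_hand <- quantile(boot_props, probs = c(0.025, 0.975))
ci_by_hand
```

The bootstrap never looks at the population; it only reuses the 60 observed responses, which is why the interval is centred near the sample proportion rather than at 0.62.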

Exercise 3

In the interpretation above, we used the phrase “95% confident”. What does “95% confidence” mean?

Answer (Exercise 3):
A 95% confidence level means that if we repeated this entire process many times (taking a new random sample of 60 and building a new bootstrap confidence interval from each sample), then about 95% of those intervals would contain the true population proportion. Any single interval either covers the true value or it does not. The 95% describes the long-run success rate of the method, not the probability that one specific, already-computed interval contains the true value.

Exercise 4

Does your confidence interval capture the true population proportion of US adults who think climate change affects their local community? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

The true population proportion in our synthetic population is 0.62. We can check whether it falls between the lower and upper bounds of the interval we just computed.

true_p <- 0.62

ci_95 %>%
  mutate(
    captures_true_p = (lower_ci <= true_p & upper_ci >= true_p)
  )

## # A tibble: 1 × 3
##   lower_ci upper_ci captures_true_p
##      <dbl>    <dbl> <lgl>          
## 1      0.5    0.733 TRUE

Answer (Exercise 4):
In the output above, captures_true_p is TRUE, so my 95% confidence interval (0.5 to 0.733) does capture the true population proportion of 0.62. Had it been FALSE, this interval would have been one of the few (about 5% in the long run) that miss the true value. Similarly, each classmate’s interval may or may not contain 0.62, but overall we would expect roughly 95% of everyone’s 95% intervals to capture the true proportion.

Many confidence intervals

Next we explore what happens when we construct many confidence intervals.

We will:

1. Draw a random sample of 60 adults from the population.
2. From that sample, construct a 95% bootstrap confidence interval for the proportion who say “Yes”.
3. Repeat steps 1–2 a total of 50 times.
4. Check what proportion of these 50 intervals capture the true population proportion 0.62.

set.seed(606)

B <- 50           # number of intervals
n_boot <- 1000    # number of bootstrap resamples for each interval

cis_95 <- tibble(
  sim = 1:B,
  lower_ci = NA_real_,
  upper_ci = NA_real_
)

for (i in 1:B) {
  samp_i <- us_adults %>%
    sample_n(size = n)
  
  ci_i <- samp_i %>%
    specify(response = climate_change_affects, success = "Yes") %>%
    generate(reps = n_boot, type = "bootstrap") %>%
    calculate(stat = "prop") %>%
    get_ci(level = 0.95)
  
  cis_95$lower_ci[i] <- ci_i$lower_ci
  cis_95$upper_ci[i] <- ci_i$upper_ci
}

# Did each interval capture the true p?
cis_95 <- cis_95 %>%
  mutate(captures_true_p = (lower_ci <= true_p & upper_ci >= true_p))

# Proportion of intervals that capture the true p
prop_capture_95 <- mean(cis_95$captures_true_p)
prop_capture_95

## [1] 1

We can also visualize all 50 intervals on one plot.

ggplot(cis_95, aes(x = sim, ymin = lower_ci, ymax = upper_ci)) +
  geom_linerange() +
  geom_hline(yintercept = true_p, linetype = "dashed") +
  labs(
    x = "Interval index",
    y = "Proportion saying 'Yes'",
    title = "50 bootstrap 95% confidence intervals"
  )

Exercise 5

Each student should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population proportion? Why?

Answer (Exercise 5):
Because we are using a 95% confidence level, we expect the method to produce intervals that include the true population proportion about 95% of the time. So if many students each compute a 95% confidence interval from independent random samples, we would expect around 95% of those intervals to capture the true population proportion. A few intervals (around 5%) will miss the true value purely due to sampling variability.

Exercise 6

Given a sample size of 60, 1000 bootstrap samples for each interval, and 50 confidence intervals constructed, what proportion of your confidence intervals include the true population proportion? Is this proportion exactly equal to the confidence level? If not, explain why. Make sure to include your plot in your answer.

Answer (Exercise 6):
In my run, prop_capture_95 is 1: all 50 of my intervals contain the true proportion 0.62, so every interval in the plot above crosses the dashed line. This is not exactly equal to the confidence level of 0.95. In fact, with 50 intervals the capture proportion must be a multiple of 0.02, so it cannot equal 0.95 exactly (that would require 47.5 intervals). More generally, because we construct only a finite number of intervals, random sampling variation means that sometimes a few more or a few fewer intervals than expected will capture the true value, even though the long-run capture rate of the method is 95%.
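To gauge how far the capture proportion can drift from 0.95 with only 50 intervals, one can treat the number of capturing intervals as a binomial count. The sketch below assumes an idealized model in which each interval captures the true proportion independently with probability exactly 0.95. The actual coverage of a bootstrap percentile interval at n = 60 may differ somewhat from the nominal level.

```r
B <- 50             # number of intervals, as in the simulation above
conf_level <- 0.95  # idealized capture probability for each interval

# Under the idealized model, the number of capturing intervals
# follows a Binomial(B, conf_level) distribution.
captures <- 0:B
prob <- dbinom(captures, size = B, prob = conf_level)

# Chance that all 50 intervals capture the true proportion.
p_all <- prob[captures == B]
p_all

# Chance that at least 45 of the 50 intervals capture it.
p_45_plus <- sum(prob[captures >= 45])
p_45_plus
```

Under this idealization p_all equals 0.95^50, which is roughly 0.08, so a run in which every interval captures 0.62 is uncommon but not alarming.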

Changing the confidence level

Exercise 7

Choose a different confidence level than 95%. Would you expect a confidence interval at this level to be wider or narrower than the confidence interval you calculated at the 95% confidence level? Explain your reasoning.

Answer (Exercise 7):
Suppose I choose a 90% confidence level. A 90% interval should be narrower than a 95% interval because it does not need to capture the true parameter as often. To achieve a higher confidence level we must stretch the interval farther in both directions, which makes intervals wider. Reducing the confidence level lets us shorten the interval but accept a lower long run capture rate.
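The reasoning above can be checked with a by-hand sketch that reads 90%, 95%, and 99% percentile intervals off a single bootstrap distribution. It assumes the counts from samp_summary (37 “Yes”, 23 “No”) and uses base R rather than infer. The helper percentile_ci() is a hypothetical name introduced here, and the seed is arbitrary.

```r
set.seed(7)  # arbitrary seed

samp_yes <- c(rep(1, 37), rep(0, 23))  # 37 "Yes", 23 "No", as in samp_summary
boot_props <- replicate(1000, mean(sample(samp_yes, replace = TRUE)))

# Hypothetical helper: percentile interval at a given confidence level,
# always read from the same bootstrap distribution.
percentile_ci <- function(level) {
  alpha <- 1 - level
  quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

conf_levels <- c(0.90, 0.95, 0.99)
bounds <- t(sapply(conf_levels, percentile_ci))  # one row per level
widths <- bounds[, 2] - bounds[, 1]

data.frame(level = conf_levels, lower = bounds[, 1],
           upper = bounds[, 2], width = widths)
```

Because each higher level cuts further into the tails of the same distribution, the intervals are nested and the widths cannot decrease as the level rises.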

Exercise 8

Using code from the infer package and data from the one sample you have (samp), find a confidence interval for the proportion of US adults who think climate change is affecting their local community with a confidence level of your choosing (other than 95%) and interpret it.

We construct a 90% confidence interval from our original sample.

ci_90 <- samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.90)

ci_90

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1    0.517    0.717

Answer (Exercise 8):
Based on my sample of 60 adults, the 90% bootstrap confidence interval for the proportion who think climate change affects their local community runs from 0.517 to 0.717. I am 90% confident that the true proportion of all US adults who would answer “Yes” falls between these two values.

Exercise 9

Using the app, calculate 50 confidence intervals at the confidence level you chose in the previous question, and plot all intervals on one plot, and calculate the proportion of intervals that include the true population proportion. How does this percentage compare to the confidence level selected for the intervals?

Instead of using the app, we can use code very similar to what we wrote earlier, but now using a 90% confidence level.

cis_90 <- tibble(
  sim = 1:B,
  lower_ci = NA_real_,
  upper_ci = NA_real_
)

for (i in 1:B) {
  samp_i <- us_adults %>%
    sample_n(size = n)
  
  ci_i <- samp_i %>%
    specify(response = climate_change_affects, success = "Yes") %>%
    generate(reps = n_boot, type = "bootstrap") %>%
    calculate(stat = "prop") %>%
    get_ci(level = 0.90)
  
  cis_90$lower_ci[i] <- ci_i$lower_ci
  cis_90$upper_ci[i] <- ci_i$upper_ci
}

cis_90 <- cis_90 %>%
  mutate(captures_true_p = (lower_ci <= true_p & upper_ci >= true_p))

prop_capture_90 <- mean(cis_90$captures_true_p)
prop_capture_90

## [1] 0.88

ggplot(cis_90, aes(x = sim, ymin = lower_ci, ymax = upper_ci)) +
  geom_linerange() +
  geom_hline(yintercept = true_p, linetype = "dashed") +
  labs(
    x = "Interval index",
    y = "Proportion saying 'Yes'",
    title = "50 bootstrap 90% confidence intervals"
  )

Answer (Exercise 9):
In my run, prop_capture_90 is 0.88, meaning 44 of the 50 intervals at the 90% level contain the true proportion. This is close to, but not exactly, 0.90 (which would require 45 captures). As with the 95% intervals, this number will not usually equal the confidence level exactly because we are only drawing a finite number of samples and intervals. Still, 0.88 is consistent with the chosen confidence level of 90%.

Exercise 10

Lastly, try one more (different) confidence level. First, state how you expect the width of this interval to compare to previous ones you calculated. Then, calculate the bounds of the interval using the infer package and data from samp and interpret it. Finally, use the app to generate many intervals and calculate the proportion of intervals that capture the true population proportion.

Here we choose a 99% confidence level.

ci_99 <- samp %>%
  specify(response = climate_change_affects, success = "Yes") %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "prop") %>%
  get_ci(level = 0.99)

ci_99

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1     0.45    0.767

cis_99 <- tibble(
  sim = 1:B,
  lower_ci = NA_real_,
  upper_ci = NA_real_
)

for (i in 1:B) {
  samp_i <- us_adults %>%
    sample_n(size = n)
  
  ci_i <- samp_i %>%
    specify(response = climate_change_affects, success = "Yes") %>%
    generate(reps = n_boot, type = "bootstrap") %>%
    calculate(stat = "prop") %>%
    get_ci(level = 0.99)
  
  cis_99$lower_ci[i] <- ci_i$lower_ci
  cis_99$upper_ci[i] <- ci_i$upper_ci
}

cis_99 <- cis_99 %>%
  mutate(captures_true_p = (lower_ci <= true_p & upper_ci >= true_p))

prop_capture_99 <- mean(cis_99$captures_true_p)
prop_capture_99

## [1] 1

Answer (Exercise 10):
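A 99% confidence interval should be wider than both the 95% and the 90% intervals, because it must stretch farther to capture the true proportion in 99% of repeated samples. The output confirms this: ci_99 runs from 0.45 to 0.767 (width about 0.317), compared with 0.5 to 0.733 (width about 0.233) at 95% and 0.517 to 0.717 (width 0.2) at 90%. I am 99% confident that the true proportion of all US adults who would answer “Yes” is between 0.45 and 0.767. In my run, prop_capture_99 is 1, so all 50 of the 99% intervals include the true proportion; over many more intervals this proportion should settle close to 0.99.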
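The three simulation loops above differ only in the confidence level, so the repeated code could be wrapped in a function. The sketch below is one possible version in base R, not the lab's method: capture_rate() is a hypothetical helper, it swaps the infer pipeline for a by-hand percentile bootstrap, and it reuses the same seed for every level so that all levels see the same samples. Its results will therefore not match prop_capture_90, prop_capture_95, or prop_capture_99 exactly.

```r
population <- c(rep("Yes", 62000), rep("No", 38000))

# Hypothetical helper: share of B percentile-bootstrap intervals
# that capture the true proportion at a given confidence level.
capture_rate <- function(level, B = 50, n = 60, n_boot = 1000,
                         true_p = 0.62, seed = 606) {
  set.seed(seed)  # same seed for every level, so the samples are identical
  alpha <- 1 - level
  captured <- logical(B)
  for (i in seq_len(B)) {
    # New random sample of n adults, coded 1 for "Yes" and 0 for "No".
    samp_i <- as.integer(sample(population, size = n) == "Yes")
    # Bootstrap distribution of the sample proportion.
    boot_props <- replicate(n_boot, mean(sample(samp_i, replace = TRUE)))
    ci_i <- quantile(boot_props, probs = c(alpha / 2, 1 - alpha / 2),
                     names = FALSE)
    captured[i] <- ci_i[1] <= true_p && true_p <= ci_i[2]
  }
  mean(captured)
}

rates <- sapply(c(0.90, 0.95, 0.99), capture_rate)
rates
```

Sharing the seed across levels makes the intervals nested, so the capture rate can only stay the same or rise as the confidence level increases.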

Effects of sample size and number of bootstrap samples

Exercise 11

Using the app, experiment with different sample sizes and comment on how the widths of intervals change as sample size changes (increases and decreases).

Answer (Exercise 11):
As the sample size increases, the confidence intervals become narrower. Larger samples give more information about the population and reduce the standard error, so our estimate is more precise. When the sample size decreases, the intervals become wider, reflecting greater uncertainty when we have less data.
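The link between sample size and interval width can be sketched with the usual standard error formula for a sample proportion, sqrt(p(1 - p) / n). Note that this swaps in a normal-approximation interval in place of the bootstrap, purely to show the scaling. The sample sizes below are illustrative choices, and the finite size of the synthetic population is ignored.

```r
p <- 0.62                        # true proportion in the synthetic population
n_values <- c(15, 60, 240, 960)  # each sample size is 4 times the previous one

# Standard error of the sample proportion at each sample size.
se <- sqrt(p * (1 - p) / n_values)

# Approximate width of a 95% interval under the normal approximation.
approx_width <- 2 * qnorm(0.975) * se

data.frame(n = n_values, se = round(se, 4),
           approx_width = round(approx_width, 3))

# Ratio of each standard error to the previous one:
# quadrupling the sample size halves the standard error.
se_ratio <- se[-1] / se[-length(se)]
se_ratio
```

Since the width shrinks with the square root of n, halving the width of an interval takes about four times as much data.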

Exercise 12

Finally, given a sample size (say, 60), how does the width of the interval change as you increase the number of bootstrap samples? Hint: Does changing the number of bootstrap samples affect the standard error?

Answer (Exercise 12):
For a fixed sample size, increasing the number of bootstrap resamples (for example from 500 to 1000 or 2000) does not systematically change the width of the confidence interval. The standard error is determined by the sample size and the variability in the original sample, not by how many bootstrap resamples we draw. Using more bootstrap samples simply makes the bootstrap distribution smoother and the estimated interval bounds a bit more stable from run to run, but the typical width of the interval stays about the same.
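A small experiment along these lines is sketched below. It assumes the counts from samp_summary (37 “Yes”, 23 “No”), builds percentile intervals by hand in base R rather than with infer, and rebuilds the interval 20 times at each number of resamples. The helper ci_width(), the grid of resample counts, and the seed are all illustrative choices.

```r
set.seed(12)  # arbitrary seed

samp_yes <- c(rep(1, 37), rep(0, 23))  # the observed sample: 37 "Yes", 23 "No"

# Hypothetical helper: width of one 95% percentile interval
# built from `reps` bootstrap resamples.
ci_width <- function(reps) {
  boot_props <- replicate(reps, mean(sample(samp_yes, replace = TRUE)))
  diff(quantile(boot_props, probs = c(0.025, 0.975), names = FALSE))
}

reps_grid <- c(100, 1000, 5000)

# Rebuild the interval 20 times at each setting to see run-to-run variation.
# The result is a 20 x 3 matrix with one column per setting.
widths <- sapply(reps_grid, function(reps) replicate(20, ci_width(reps)))

summary_tbl <- data.frame(
  reps       = reps_grid,
  mean_width = colMeans(widths),     # typical width at each setting
  sd_width   = apply(widths, 2, sd)  # run-to-run variability of the width
)
summary_tbl
```

If the answer above is right, mean_width should be about the same in every row, and any effect of extra resamples should show up in sd_width rather than in the typical width.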