2025-11-12

Interval Estimation — Overview

  • Topic: Interval Estimation: how to estimate population parameters with a margin of uncertainty
  • Focus: Confidence Intervals (CIs) for means and proportions
  • Examples use the built-in mtcars dataset and a small binary example
  • We’ll also see how sample size and confidence level change the interval width

What is a Confidence Interval?

A \(100(1-\alpha)\%\) confidence interval for a parameter \(\theta\) is

\[ [L(X),\,U(X)] \]

constructed from data \(X\) such that

\[ \Pr_\theta\{\,L(X)\le\theta\le U(X)\,\}=1-\alpha. \]

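Interpretation: the probability refers to the procedure, not to any single computed interval. Across repeated samples, about \(100(1-\alpha)\%\) of the intervals constructed this way would contain the true \(\theta\).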
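A quick simulation makes the repeated-sampling interpretation concrete. This is a sketch under assumed values: the population is normal with a hypothetical mean of 20 and SD of 6, and we draw many samples of size 32, build a 95% t-interval from each, and record how often the interval contains the true mean.

```r
# Coverage check: simulate many samples from a known normal population
set.seed(1)                      # arbitrary seed, for reproducibility
mu <- 20; sigma <- 6; n <- 32    # hypothetical "true" population values
alpha <- 0.05
reps <- 10000

covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  me <- qt(1 - alpha/2, df = n - 1) * sd(x) / sqrt(n)   # margin of error
  (mean(x) - me <= mu) && (mu <= mean(x) + me)          # does it contain mu?
})

mean(covered)   # proportion of intervals containing mu; should be near 0.95
```

Any single interval either contains \(\mu\) or it does not; the 95% describes the long-run proportion.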

CI for a Mean with Unknown \(\sigma\)

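If \(X_1,\dots,X_n\) are an independent sample from a normal population with mean \(\mu\) and unknown variance \(\sigma^2\) (or \(n\) is large enough for the central limit theorem to apply), the interval is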

\[ \bar{X} \pm t_{1-\alpha/2,\;n-1}\;\frac{S}{\sqrt{n}} \]

where
- \(\bar{X}\): sample mean
- \(S\): sample standard deviation
- \(t_{1-\alpha/2,\;n-1}\): critical value from the t-distribution

We will compute this for car mileage data (mtcars$mpg).
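The t critical value is what accounts for estimating \(\sigma\) by \(S\): for small \(n\) it is noticeably larger than the normal value \(z_{0.975} \approx 1.96\), and it approaches that value as \(n\) grows. A small comparison, where the sample sizes are arbitrary illustrative choices:

```r
# Two-sided 95% critical values: t with n - 1 df versus the standard normal z
alpha <- 0.05
n <- c(5, 10, 32, 100, 1000)
crit <- data.frame(
  n = n,
  t = qt(1 - alpha/2, df = n - 1),   # shrinks as the degrees of freedom grow
  z = qnorm(1 - alpha/2)             # constant, about 1.96
)
round(crit, 3)
```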

Setup (Packages & Data)

  • The dot shows the sample mean \(\bar{x}\).
  • The horizontal line shows the 95% confidence interval.
  • So we’re 95% confident, in the repeated-sampling sense, that the true mean MPG lies between the endpoints.

How Confidence Level Affects Width

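Observation: As the confidence level increases, the critical value \(t_{1-\alpha/2,\;n-1}\) increases, so the interval gets wider.
To capture the true mean in a larger share of repeated samples, the procedure has to cast a wider net.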
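The same effect can be computed directly for the mpg data. This sketch uses the t-interval formula from earlier; the list of confidence levels is an arbitrary choice.

```r
# Width of the t-interval for mean mpg at several confidence levels
x <- mtcars$mpg
n <- length(x)
conf_levels <- c(0.80, 0.90, 0.95, 0.99)     # illustrative choices
width <- sapply(conf_levels, function(cl) {
  tcrit <- qt(1 - (1 - cl)/2, df = n - 1)    # critical value grows with cl
  2 * tcrit * sd(x) / sqrt(n)                # full width = 2 * margin of error
})
data.frame(level = conf_levels, width = round(width, 2))
```

Only the critical value changes between rows; \(\bar{x}\), \(S\), and \(n\) are the same throughout.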

Interactive: CIs from Many Resamples

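Each line represents a 95% confidence interval for the mean, computed from one resample of the mpg data.
Most of them contain the dashed red line (the overall sample mean, which plays the role of the "true" mean for the resamples), while a few miss it. That mix of hits and misses is what the 95% coverage statement predicts.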
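The plot's underlying computation can be sketched without plotly. This is a hypothetical re-creation: it assumes the resamples are bootstrap resamples (drawn with replacement, same size as the data), and the seed and the number of resamples `B` are arbitrary choices, not the ones used for the plot.

```r
# Hypothetical re-creation: 95% t-intervals from bootstrap resamples of mpg
set.seed(42)                    # arbitrary seed
x <- mtcars$mpg
target <- mean(x)               # overall sample mean (the dashed red line)
B <- 100                        # number of resamples (assumed)

contains <- replicate(B, {
  xb <- sample(x, replace = TRUE)                                 # one resample
  me <- qt(0.975, df = length(xb) - 1) * sd(xb) / sqrt(length(xb))
  abs(mean(xb) - target) <= me                  # does this interval cover the line?
})

sum(contains)   # how many of the B intervals contain the overall mean
```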

CI for a Proportion (Formulas)

For a binary variable with sample proportion \(\hat{p}\) and sample size \(n\):

  • Wald CI:
    \[ \hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

  • Wilson CI (coverage closer to the nominal level for small \(n\) or extreme \(\hat{p}\)), with \(z = z_{1-\alpha/2}\):
    \[ \frac{\hat{p} + \frac{z^2}{2n} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}} {1 + \frac{z^2}{n}}. \]

We will compare both on a binary variable derived from the car data.
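The difference between the two formulas is easiest to see at an extreme. When \(\hat{p} = 0\), the Wald standard error is zero, so the Wald interval collapses to the single point 0, while the Wilson interval still reports a plausible range. In this sketch `prop_ci` is a hypothetical helper name, and 0 successes in 10 trials is a made-up example.

```r
# Wald and Wilson intervals for x successes out of n trials
prop_ci <- function(x, n, alpha = 0.05) {
  z <- qnorm(1 - alpha/2)
  phat <- x / n
  se <- sqrt(phat * (1 - phat) / n)
  wald <- c(phat - z * se, phat + z * se)
  center <- phat + z^2 / (2 * n)                            # Wilson numerator center
  rad <- z * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2))  # Wilson half-width term
  wilson <- c(center - rad, center + rad) / (1 + z^2 / n)
  list(Wald = wald, Wilson = wilson)
}

# Hypothetical small sample with no successes: phat = 0
prop_ci(x = 0, n = 10)
```

Calling `prop_ci(6, 32)` applies the same formulas to the fuel-efficiency example.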

Example: CI for a Proportion

# df holds the mtcars data (32 cars); defined here so the slide runs on its own
df <- mtcars
# Define "success" as mpg >= 25 (fuel-efficient car)
y <- as.integer(df$mpg >= 25)
n_p  <- length(y)
phat <- mean(y)
alpha <- 0.05
z <- qnorm(1 - alpha/2)

# Wald CI
wald <- c(
  phat - z * sqrt(phat*(1-phat)/n_p),
  phat + z * sqrt(phat*(1-phat)/n_p)
)

# Wilson CI
wilson_num <- phat + z^2/(2*n_p)
wilson_rad <- z*sqrt(phat*(1-phat)/n_p + z^2/(4*n_p^2))
wilson_den <- 1 + z^2/n_p
wilson <- c((wilson_num - wilson_rad)/wilson_den,
            (wilson_num + wilson_rad)/wilson_den)

list(n = n_p, phat = phat, Wald = wald, Wilson = wilson)
## $n
## [1] 32
## 
## $phat
## [1] 0.1875
## 
## $Wald
## [1] 0.05226615 0.32273385
## 
## $Wilson
## [1] 0.08889545 0.35309155

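In this example the Wilson interval is slightly narrower (width about 0.264 vs 0.270 for Wald) and is shifted toward 0.5. Its main advantages are coverage closer to the nominal 95% when \(n\) is small or \(\hat{p}\) is near 0 or 1, and bounds that always stay inside \([0, 1]\).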
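Base R offers a cross-check: `prop.test()` with `correct = FALSE` (no continuity correction) reports the Wilson score interval, so it should reproduce the hand-computed Wilson bounds.

```r
# Cross-check the hand-computed Wilson interval with base R's prop.test().
# With correct = FALSE, prop.test() reports the Wilson score interval.
successes <- sum(mtcars$mpg >= 25)   # 6 fuel-efficient cars
n <- nrow(mtcars)                    # 32 cars
ci <- prop.test(successes, n, correct = FALSE)$conf.int
ci   # compare with the Wilson bounds from the slide output
```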

What to Report

When writing up confidence interval results, include:

  • Point estimate (like \(\bar{x}\) or \(\hat{p}\))
  • Confidence level and the interval bounds
  • Method used: t-interval, Wald, or Wilson
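  • Assumptions: independent observations, plus approximate normality of the mean (t-interval) or a binomial model (proportion intervals)
  • Sample size: all else equal, larger \(n\) gives a more precise (narrower) interval, with width shrinking roughly in proportion to \(1/\sqrt{n}\)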
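To see the \(1/\sqrt{n}\) behavior, hold the sample SD fixed and vary \(n\). This sketch assumes \(S\) stays at the mtcars mpg value while \(n\) takes hypothetical values, each four times the previous one.

```r
# Margin of error of a 95% t-interval as n grows, holding the SD fixed.
# Assumption: s is fixed at the mtcars mpg SD; the n values are hypothetical.
s <- sd(mtcars$mpg)
n <- c(8, 32, 128, 512)                  # each step quadruples n
me <- qt(0.975, df = n - 1) * s / sqrt(n)
data.frame(n = n, margin = round(me, 2))
```

Quadrupling \(n\) roughly halves the margin; it shrinks a bit more than half for small \(n\) because the t critical value also decreases.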

Recreate the Mean CI (Code-Only Reference)

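x <- mtcars$mpg                       # car mileage (miles per US gallon)
n <- length(x)                        # sample size (32)
xbar <- mean(x); s <- sd(x)           # sample mean and sample SD
alpha <- 0.05                         # 95% confidence level
tcrit <- qt(1 - alpha/2, df = n - 1)  # t critical value with n - 1 df
ME <- tcrit * s / sqrt(n)             # margin of error
c(lower = xbar - ME, upper = xbar + ME)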
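As a sanity check, base R's `t.test()` computes the same one-sample t-interval, so the manual and built-in bounds should agree.

```r
# Manual t-interval versus t.test() for mean mpg
x <- mtcars$mpg
n <- length(x)
me <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
manual <- c(mean(x) - me, mean(x) + me)
builtin <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)
all.equal(manual, builtin)   # TRUE: both give about 17.92 to 22.26
```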

Takeaways

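  • Confidence intervals quantify estimation uncertainty; the confidence level describes the long-run success rate of the procedure, not the probability that one particular interval contains the true value.
  • Intervals widen with higher confidence, smaller samples, or more variable data.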
  • Use t-intervals for means when \(\sigma\) is unknown.
  • Use Wilson for proportions when \(n\) is small or \(\hat{p}\) is near 0 or 1.
  • Always interpret intervals in words, not just numbers.