hw3

2025-11-12

Interval Estimation — Overview

- Topic: interval estimation, i.e. how to estimate population parameters with a margin of uncertainty
- Focus: confidence intervals (CIs) for means and proportions
- Examples use the built-in mtcars dataset and a small binary example
- We’ll also see how sample size and confidence level change the interval width

What is a Confidence Interval? (LaTeX #1)

A $100(1-\alpha)\%$ confidence interval for a parameter $\theta$ is

\[ [L(X),\,U(X)] \]

constructed from data $X$ such that

\[ \Pr_\theta\{\,L(X)\le\theta\le U(X)\,\}=1-\alpha. \]

Interpretation: The interval procedure would capture the true value in about $100(1-\alpha)\%$ of repeated samples.
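
The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population values `mu` and `sigma`, the sample size, and the number of repetitions are hypothetical choices, not taken from the slides. It draws many normal samples, builds a 95% t-interval from each, and records how often the interval contains the true mean.

```r
# Simulate repeated sampling from a known population and count how often
# the 95% t-interval captures the true mean.
set.seed(1)                  # arbitrary seed, for reproducibility only
mu <- 20; sigma <- 6         # hypothetical population mean and sd
n <- 32; alpha <- 0.05       # sample size and 1 - confidence level
reps <- 5000                 # number of simulated samples

covered <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  me <- qt(1 - alpha/2, df = n - 1) * sd(x) / sqrt(n)   # margin of error
  (mean(x) - me <= mu) && (mu <= mean(x) + me)          # TRUE if mu is inside
})

coverage <- mean(covered)    # proportion of intervals that contain mu
coverage                     # expected to be close to 0.95
```

Any single interval either contains $\mu$ or it does not; the 95% describes the long-run hit rate of the procedure.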

CI for a Mean with Unknown $\sigma$ (LaTeX #2)

If we sample $X_1,\dots,X_n$ from a population with mean $\mu$ and unknown variance $\sigma^2$:

\[ \bar{X} \pm t_{1-\alpha/2,\;n-1}\;\frac{S}{\sqrt{n}} \]

where
- $\bar{X}$: sample mean
- $S$: sample standard deviation
- $t_{1-\alpha/2,\;n-1}$: critical value from the t-distribution

We will compute this for car mileage data (mtcars$mpg).

Setup (Packages & Data)

The dot shows the sample mean $\bar{x}$.
The horizontal line shows the 95% confidence interval.
So we’re 95% confident the true mean MPG lies between the endpoints.

How Confidence Level Affects Width (ggplot #2)

Observation: As the confidence level increases, the interval gets wider,
because a larger critical value is needed for the procedure to capture the true mean more often.
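
The effect can also be seen numerically. This is a minimal sketch, assuming the same t-interval as above applied to `mtcars$mpg`; the four confidence levels are illustrative choices and may differ from the ones used in the plot.

```r
# t-interval for mean mpg at several confidence levels
x  <- mtcars$mpg
n  <- length(x)
se <- sd(x) / sqrt(n)                        # estimated standard error

conf_levels <- c(0.80, 0.90, 0.95, 0.99)     # illustrative levels
tcrit <- qt(1 - (1 - conf_levels)/2, df = n - 1)

widths <- data.frame(
  level = conf_levels,
  lower = mean(x) - tcrit * se,
  upper = mean(x) + tcrit * se,
  width = 2 * tcrit * se                     # grows with the critical value
)
widths
```

Only the critical value changes across rows; the sample mean and standard error stay fixed, so the width increases strictly with the confidence level.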

Interactive: CIs from Many Resamples (plotly ≥1)

Each line represents a 95% confidence interval for the mean.
Most of them contain the dashed red line (the mean of the full sample, which stands in for the true mean here), illustrating the coverage idea behind the confidence interval method.
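
The numbers behind such a plot could be produced along the following lines. This is a sketch under assumptions: the resampling scheme on the slide is not shown, so drawing `B = 50` resamples of size 32 with replacement is a guess, and `ci_one` is a hypothetical helper name.

```r
# Build a 95% t-interval from each resample of the mpg values
set.seed(42)                       # arbitrary seed, for reproducibility only
x <- mtcars$mpg
n <- length(x)
xbar_all <- mean(x)                # reference value (the dashed line)
B <- 50                            # number of resamples (assumed)

# Hypothetical helper: t-interval for the mean of one sample
ci_one <- function(s, alpha = 0.05) {
  me <- qt(1 - alpha/2, df = length(s) - 1) * sd(s) / sqrt(length(s))
  c(lower = mean(s) - me, upper = mean(s) + me)
}

# One row per resample, columns "lower" and "upper"
cis <- t(replicate(B, ci_one(sample(x, n, replace = TRUE))))

# Which intervals contain the overall sample mean?
hits <- cis[, "lower"] <= xbar_all & xbar_all <= cis[, "upper"]
mean(hits)                         # share of intervals covering the reference
```

The matrix `cis` has one interval per row, which is the shape a segment plot would typically need.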

CI for a Proportion (Formulas)

For a binary variable with sample proportion $\hat{p}$ and sample size $n$:

Wald CI:
\[ \hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Wilson CI (more accurate for small $n$ or extreme $\hat{p}$):
\[ \frac{\hat{p} + \frac{z^2}{2n} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}} {1 + \frac{z^2}{n}}. \]

We will test both on a small binary example from the car data.

Example: CI for a Proportion (Code Slide – Visible)

# Define "success" as mpg >= 25 (fuel-efficient car)
# df is the data frame from the Setup slide (the mtcars data, 32 rows)
y <- as.integer(df$mpg >= 25)
n_p  <- length(y)
phat <- mean(y)
alpha <- 0.05
z <- qnorm(1 - alpha/2)

# Wald CI
wald <- c(
  phat - z * sqrt(phat*(1-phat)/n_p),
  phat + z * sqrt(phat*(1-phat)/n_p)
)

# Wilson CI
wilson_num <- phat + z^2/(2*n_p)
wilson_rad <- z*sqrt(phat*(1-phat)/n_p + z^2/(4*n_p^2))
wilson_den <- 1 + z^2/n_p
wilson <- c((wilson_num - wilson_rad)/wilson_den,
            (wilson_num + wilson_rad)/wilson_den)

list(n = n_p, phat = phat, Wald = wald, Wilson = wilson)

## $n
## [1] 32
## 
## $phat
## [1] 0.1875
## 
## $Wald
## [1] 0.05226615 0.32273385
## 
## $Wilson
## [1] 0.08889545 0.35309155

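The Wilson interval is often slightly narrower and better centered when proportions are near 0 or 1. Here it is shifted toward 0.5 relative to the Wald interval and is marginally narrower (about 0.264 wide versus 0.270).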
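
As a cross-check, base R's `prop.test()` with `correct = FALSE` reports the Wilson score interval, so it should reproduce the hand-computed Wilson bounds. The sketch below uses `mtcars` directly, on the assumption that it is the same data as `df`.

```r
# Wilson interval via prop.test(), without continuity correction
y  <- as.integer(mtcars$mpg >= 25)          # 1 = fuel-efficient car
pt <- prop.test(sum(y), length(y),
                conf.level = 0.95, correct = FALSE)

wilson_check <- as.numeric(pt$conf.int)     # drop the conf.level attribute
wilson_check                                # compare with $Wilson above
```

Leaving `correct` at its default of `TRUE` applies a continuity correction and gives a somewhat wider interval, so the two results would no longer match.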

What to Report

When writing up confidence interval results, include:

- Point estimate (like $\bar{x}$ or $\hat{p}$)
- Confidence level and the interval bounds
- Method used: t-interval, Wald, or Wilson
- Assumptions: approximate normality or binomial model as appropriate
- Sample size: larger $n$ means a more precise (narrower) interval
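
To make the sample-size point concrete, the sketch below holds $\hat{p}$ fixed at the value from the example and recomputes the Wald margin of error for larger, purely hypothetical sample sizes.

```r
# Wald margin of error for a fixed phat at increasing sample sizes
phat <- 0.1875                       # sample proportion from the example
z <- qnorm(0.975)                    # critical value for 95% confidence
n_values <- c(32, 128, 512)          # 32 is the real n; the others are hypothetical

margin <- z * sqrt(phat * (1 - phat) / n_values)
data.frame(n = n_values, margin = margin)
```

Because the margin scales with $1/\sqrt{n}$, each fourfold increase in $n$ halves it.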

Recreate the Mean CI (Code-Only Reference)

x <- mtcars$mpg
n <- length(x)
xbar <- mean(x); s <- sd(x)
alpha <- 0.05
tcrit <- qt(1 - alpha/2, df = n - 1)
ME <- tcrit * s / sqrt(n)
c(lower = xbar - ME, upper = xbar + ME)
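
The manual computation can be checked against base R's `t.test()`, which returns the same one-sample t-interval in its `conf.int` element. A minimal sketch:

```r
# Compare the hand-computed t-interval with t.test()
x <- mtcars$mpg
n <- length(x)

manual  <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
builtin <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

all.equal(manual, builtin)   # TRUE when the two intervals agree
```

Using `t.test()` in practice avoids arithmetic slips, while the manual version makes each term of the formula visible.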

Takeaways

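- Confidence intervals quantify uncertainty about a parameter. The confidence level describes the long-run coverage of the procedure, not the probability that one particular interval contains the parameter.
- Intervals get wider with a higher confidence level or a smaller sample.
- Use t-intervals for means when $\sigma$ is unknown.
- Use Wilson for proportions when $n$ is small or $\hat{p}$ is near 0 or 1.
- Always interpret intervals in words, not just numbers.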