Interval Estimation

What is Interval Estimation?

Instead of giving a single value for an unknown parameter, interval estimation gives us a range of plausible values.

For example: “We are 95% confident the true mean lies between 48.2 and 53.8”

Main ideas:

Point Estimate: one best guess (like the sample mean)
Confidence Interval: a range built around that estimate
Confidence Level: the long-run proportion of intervals, over repeated samples, that capture the true value (commonly 90%, 95%, or 99%)

The Formula

For estimating a population mean \(\mu\) when the population standard deviation \(\sigma\) is known, a confidence interval looks like this:

\[\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]

where:

\(\bar{x}\) is the sample mean
\(z_{\alpha/2}\) is the critical value from a standard normal table
\(\sigma\) is the population standard deviation
\(n\) is the sample size

For a 95% CI we use \(z_{\alpha/2} = 1.96\), and for 99% we use \(z_{\alpha/2} = 2.576\).
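The critical values quoted above can be checked with R's `qnorm`, and the formula can be wrapped in a small helper. This is only a minimal sketch: the function name `z_interval` and the inputs (a sample mean of 51, \(\sigma = 10\), \(n = 49\)) are made up for illustration and do not come from real data.

```r
# Critical values for a two-sided interval: z_{alpha/2} = qnorm(1 - alpha/2)
z_95 <- qnorm(0.975)  # approximately 1.96
z_99 <- qnorm(0.995)  # approximately 2.576

# Hypothetical helper: z-interval for a mean when sigma is known
z_interval <- function(xbar, sigma, n, conf = 0.95) {
  alpha <- 1 - conf
  z <- qnorm(1 - alpha / 2)
  margin <- z * sigma / sqrt(n)
  c(lower = xbar - margin, upper = xbar + margin)
}

# Illustrative inputs (not real data)
ci <- z_interval(xbar = 51, sigma = 10, n = 49)
round(ci, 1)
```

With these made-up inputs the margin of error is about 2.8, so the interval rounds to 48.2 and 53.8.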

What if we don’t know \(\sigma\)?

In practice we rarely know the true population standard deviation. In that case we replace \(\sigma\) with the sample standard deviation \(s\) and switch to the t-distribution with \(n - 1\) degrees of freedom:

\[\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}\]

The \(t\)-distribution has heavier tails than the normal, which accounts for the extra uncertainty from estimating \(\sigma\).

As \(n\) gets large, the t-distribution gets closer and closer to the standard normal.
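One way to see this convergence is to compare the 95% critical values from `qt` and `qnorm` as the sample size grows. The sample sizes below are arbitrary choices for illustration.

```r
# t critical values for a 95% interval at increasing sample sizes
n_values <- c(5, 10, 30, 100, 1000)
t_values <- qt(0.975, df = n_values - 1)
z_value <- qnorm(0.975)

# Side-by-side comparison; the gap shrinks toward zero as n grows
data.frame(
  n = n_values,
  t_crit = round(t_values, 3),
  gap_vs_z = round(t_values - z_value, 3)
)
```

The t critical value is always larger than the normal one, which is why a t-interval is wider than a z-interval built from the same standard error.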

Example: Weekly Study Hours

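Let’s say we collect data from 30 students about how many hours they study each week. Here the data are simulated from a normal distribution with mean 15 and standard deviation 4. Since we treat \(\sigma\) as unknown, we build a 95% t-interval.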

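set.seed(42)  # make the simulated sample reproducible
study_hours <- rnorm(30, mean = 15, sd = 4)
n <- length(study_hours)
xbar <- mean(study_hours)  # point estimate of mu
s <- sd(study_hours)       # sample standard deviation
# Critical value for a 95% CI: upper 2.5% point of t with n - 1 df
t_crit <- qt(0.975, df = n - 1)
margin <- t_crit * (s / sqrt(n))  # margin of error
lower <- xbar - margin
upper <- xbar + margin
cat("mean:", round(xbar, 2), " lower:", round(lower, 2), " upper:", round(upper, 2))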

## mean: 15.27  lower: 13.4  upper: 17.15
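As a cross-check, base R's `t.test` reports the same one-sample t-interval in its `conf.int` component. The sketch below regenerates the simulated sample so that it runs on its own, and the names `ci_builtin` and `ci_manual` are just illustrative.

```r
set.seed(42)
study_hours <- rnorm(30, mean = 15, sd = 4)

# Built-in one-sample 95% t-interval
ci_builtin <- t.test(study_hours, conf.level = 0.95)$conf.int

# Manual calculation: xbar +/- t * s / sqrt(n)
n <- length(study_hours)
margin <- qt(0.975, df = n - 1) * sd(study_hours) / sqrt(n)
ci_manual <- mean(study_hours) + c(-1, 1) * margin

# TRUE if both approaches agree up to numerical tolerance
all.equal(as.numeric(ci_builtin), ci_manual)
```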

Histogram of the Sample

How Sample Size Affects CI Width
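The relationship can be sketched numerically from the t-interval formula: the full width is \(2 \cdot t_{\alpha/2,\, n-1} \cdot s / \sqrt{n}\). The code below holds the sample standard deviation fixed at an assumed value of 4 (in a real study \(s\) would change from sample to sample) and uses an arbitrary grid of sample sizes.

```r
# Assumed sample standard deviation, held fixed for illustration
s_assumed <- 4
n_grid <- c(10, 30, 100, 300, 1000)

# Width of a 95% t-interval: 2 * t_{alpha/2, n-1} * s / sqrt(n)
ci_width <- 2 * qt(0.975, df = n_grid - 1) * s_assumed / sqrt(n_grid)

data.frame(n = n_grid, width = round(ci_width, 2))
```

The width shrinks roughly in proportion to \(1/\sqrt{n}\), so cutting the width in half takes about four times as many observations.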

3D Plot: CI Width vs n and Confidence Level
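The same surface can be tabulated by computing the t-interval width over a grid of sample sizes and confidence levels. This is a rough sketch under assumptions: the standard deviation is fixed at 4, the grid values are arbitrary, and `width_fn` is a hypothetical helper name.

```r
s_assumed <- 4                      # assumed sample standard deviation
n_grid <- c(10, 30, 100)            # sample sizes (rows)
conf_levels <- c(0.90, 0.95, 0.99)  # confidence levels (columns)

# Width of a t-interval for a given n and confidence level
width_fn <- function(n, conf) {
  2 * qt(1 - (1 - conf) / 2, df = n - 1) * s_assumed / sqrt(n)
}

# outer() evaluates width_fn on every (n, conf) combination
width_grid <- outer(n_grid, conf_levels, width_fn)
dimnames(width_grid) <- list(
  paste0("n=", n_grid),
  paste0(conf_levels * 100, "%")
)
round(width_grid, 2)
```

Reading down a column shows the width falling as \(n\) grows. Reading across a row shows it rising with the confidence level.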

Common Misunderstanding

A 95% CI does not mean there is a 95% probability that \(\mu\) is in the interval.

\(\mu\) is a fixed number; we just don’t know it. After we compute an interval, the true mean either is or isn’t in it.

The correct way to think about it:

If we took many samples and built a CI each time, about 95% of those intervals would contain the true \(\mu\).

This is a statement about the procedure, not any single interval.
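The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a normal population with a true mean of 15 and standard deviation of 4 (known here only because the data are simulated), draws many samples of size 30, and counts how often the 95% t-interval contains the true mean. The seed and the number of repetitions are arbitrary.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
mu_true <- 15      # assumed true mean
sigma_true <- 4    # assumed true standard deviation
n <- 30
n_reps <- 10000

# For each simulated sample, build a 95% t-interval and
# record whether it covers mu_true
covered <- replicate(n_reps, {
  x <- rnorm(n, mean = mu_true, sd = sigma_true)
  margin <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  abs(mean(x) - mu_true) <= margin
})

# Proportion of intervals containing the true mean (expected near 0.95)
mean(covered)
```

Each individual interval either covers 15 or misses it. The 95% figure describes the fraction of intervals that cover it across all repetitions.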

Summary

Quantity	Formula
Z-interval	\(\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\)
t-interval	\(\bar{x} \pm t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}}\)
Margin of Error	\(E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\)
Sample size needed	\(n = \left(\frac{z_{\alpha/2} \cdot \sigma}{E}\right)^2\)

Main takeaways: a larger sample gives a narrower interval, a higher confidence level gives a wider interval, and the t-interval is the one to use when \(\sigma\) is unknown.
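The sample-size formula from the summary table can be applied directly. In this sketch the planning values are hypothetical: a guessed population standard deviation of 4, a target margin of error of 1, and 95% confidence.

```r
# Hypothetical planning inputs
sigma_guess <- 4    # assumed population standard deviation
E_target <- 1       # desired margin of error
z <- qnorm(0.975)   # critical value for 95% confidence

# n = (z * sigma / E)^2, rounded up to a whole number of observations
n_needed <- ceiling((z * sigma_guess / E_target)^2)
n_needed

# Check: the margin of error at n_needed should not exceed the target
z * sigma_guess / sqrt(n_needed)
```

With these inputs the formula gives about 61.5, which rounds up to 62. Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the target.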