Foundations of Statistical Inference
Harriet Goers
Why We Need Statistical Inference
Suppose women in a survey sample give Democrats an average rating of 54, while men give 49.
Is the 5-point gap real, or just sampling noise?
Inference tells us how confident we can be that a sample result reflects the population.
Key Concepts
Population
: All cases (e.g., all U.S. voters)
Parameter
: True population value (unknown)
Sample
: Subset we actually study
Statistic
: Estimate of the parameter based on the sample
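The four terms above can be lined up in a short simulation. This is only a sketch: the population is made up (100,000 simulated ratings on a 0-100 scale), and the names `population`, `parameter`, `sample`, and `statistic` are illustrative. In real research the full population list is never available, which is exactly why the parameter is unknown.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 made-up ratings, clipped to the 0-100 scale.
population = [min(100, max(0, random.gauss(50, 20))) for _ in range(100_000)]

# Parameter: the true population mean (unknown in practice).
parameter = statistics.mean(population)

# Sample: a simple random sample of 1,000 cases, drawn without replacement.
sample = random.sample(population, k=1_000)

# Statistic: the sample mean, our estimate of the parameter.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

The two printed values should be close but not identical; the gap between them is what the rest of the deck is about.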
Why Random Sampling Matters
Only random samples give every member of the population an equal chance of selection.
Random Sampling Error
Even with good samples, estimates vary from one sample to the next.
This variation is random sampling error, not a mistake.
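To see random sampling error directly, one can draw several random samples from the same population and compare the estimates. The sketch below assumes a made-up population of ratings spread evenly between 0 and 100; the sample size of 500 and the five repetitions are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 0-100 ratings (made-up numbers).
population = [random.uniform(0, 100) for _ in range(50_000)]
true_mean = statistics.mean(population)

# Draw five independent random samples of 500 and estimate the mean each time.
estimates = [statistics.mean(random.sample(population, k=500)) for _ in range(5)]

for i, est in enumerate(estimates, start=1):
    print(f"sample {i}: estimate = {est:.2f}, error = {est - true_mean:+.2f}")
```

Every sample was drawn correctly, yet each estimate misses the true mean by a slightly different amount.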
The Standard Error
Measures how much a statistic is expected to vary from sample to sample due to sampling error.
Decreases as sample size increases: for a mean, SE = s / √n.
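For a sample mean, the standard error is usually estimated as the sample standard deviation divided by the square root of the sample size. A minimal sketch, using ten made-up ratings and a hypothetical helper named `standard_error`:

```python
import math
import statistics

def standard_error(values):
    """Estimated standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(values))

ratings = [54, 61, 47, 70, 38, 55, 49, 66, 52, 58]  # made-up ratings
se = standard_error(ratings)
print(f"n = {len(ratings)}, mean = {statistics.mean(ratings)}, SE = {se:.2f}")
```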
Central Limit Theorem
The sampling distribution of the sample mean is approximately normal (a bell curve).
Holds regardless of the population's shape, provided n is large enough.
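A quick simulation can make this concrete. The sketch below assumes a strongly right-skewed, made-up population (exponential with mean near 10), draws 2,000 samples of size 50, and checks two things the theorem predicts: the spread of the sample means is close to sd / √n, and roughly 95% of them land within 1.96 SEs of the population mean. All names and settings here are illustrative.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical, strongly right-skewed population (exponential, mean about 10).
population = [random.expovariate(1 / 10) for _ in range(100_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

n = 50        # size of each sample
reps = 2_000  # number of repeated samples

# random.choices samples with replacement, which is fine for a large population.
sample_means = [statistics.mean(random.choices(population, k=n)) for _ in range(reps)]

# Theory: sample means centre on the population mean with spread sd / sqrt(n).
theoretical_se = pop_sd / n ** 0.5
observed_se = statistics.stdev(sample_means)

# If the sampling distribution is roughly normal, about 95% of the sample
# means should sit within 1.96 SEs of the population mean.
share_within = sum(
    abs(m - pop_mean) <= 1.96 * theoretical_se for m in sample_means
) / reps

print(f"theoretical SE: {theoretical_se:.3f}")
print(f"observed SE:    {observed_se:.3f}")
print(f"share within 1.96 SEs: {share_within:.3f}")
```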
Z Scores and the Normal Curve
A Z score tells us how many SEs a value lies from the mean: Z = (value − mean) / SE
Helps us use the empirical rule:
About 68% of values fall within 1 SE of the mean
About 95% fall within 2 SEs (more precisely, 1.96)
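The empirical-rule percentages can be checked against the normal curve with Python's standard library. In this sketch `z_score` and `share_within` are hypothetical helper names, and the example values (an estimate of 54, a mean of 50, an SE of 2) are made up.

```python
from statistics import NormalDist

def z_score(value, mean, se):
    """Number of standard errors between a value and the mean."""
    return (value - mean) / se

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k SEs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(f"within 1 SE:     {share_within(1):.3f}")
print(f"within 2 SEs:    {share_within(2):.3f}")
print(f"within 1.96 SEs: {share_within(1.96):.3f}")

# Made-up example: an estimate of 54 when the mean is 50 and the SE is 2.
print(f"Z = {z_score(54, mean=50, se=2):.1f}")
```

The "2 SEs" in the rule of thumb is a rounding of 1.96, the value that captures 95% of the curve.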
Confidence Intervals
95% CI = sample statistic ± 1.96 × SE
Gives a range of plausible values for the population parameter: about 95% of intervals built this way contain it
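As a worked example of the formula, the sketch below builds a 95% interval from summary statistics. The mean of 54 echoes the rating from the opening example, but the standard deviation (20) and sample size (400) are assumptions chosen to keep the arithmetic easy; `ci_95` is a hypothetical helper name.

```python
import math

def ci_95(mean, sd, n):
    """95% confidence interval for a mean: mean ± 1.96 × SE, with SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    margin = 1.96 * se
    return mean - margin, mean + margin

# Assumed values: sample mean 54, sample SD 20, n = 400 (so SE = 1).
lower, upper = ci_95(mean=54, sd=20, n=400)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```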
Sample Size and Diminishing Returns
More data = less error, but the gains shrink as n grows: SE falls with √n, so halving it takes four times the sample
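Because the SE of a mean is s / √n, each fourfold increase in sample size only halves the error. A small sketch, assuming a standard deviation of 20 (a made-up value) and a hypothetical helper `se_for`:

```python
import math

SD = 20  # assumed standard deviation of the ratings (hypothetical)

def se_for(n, sd=SD):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

for n in (100, 400, 1_600, 6_400):
    print(f"n = {n:>5}: SE = {se_for(n):.2f}, 95% margin = +/-{1.96 * se_for(n):.2f}")
```

Going from 100 to 400 respondents halves the SE; getting the same improvement again takes 1,600.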
The t-Distribution for Small Samples
For small samples, use the t-distribution instead of the normal
It has thicker tails, so CIs are wider
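One way to see why the wider interval is needed is to simulate many small samples and count how often each interval captures the true mean. The sketch below assumes samples of n = 5 from a made-up normal population with mean 50 and SD 20. The value 2.776 is the standard table value of the 95% t critical value with 4 degrees of freedom; it is hard-coded because Python's standard library has no t-distribution.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

Z_CRIT = 1.96   # normal critical value for 95%
T_CRIT = 2.776  # t critical value for 95% with n - 1 = 4 degrees of freedom

TRUE_MEAN = 50  # hypothetical population mean
n = 5
reps = 10_000

def covers(sample, crit):
    """True if mean ± crit × SE contains the true mean."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return abs(mean - TRUE_MEAN) <= crit * se

z_hits = t_hits = 0
for _ in range(reps):
    sample = [random.gauss(TRUE_MEAN, 20) for _ in range(n)]
    z_hits += covers(sample, Z_CRIT)
    t_hits += covers(sample, T_CRIT)

z_coverage = z_hits / reps
t_coverage = t_hits / reps
print(f"coverage using 1.96 (normal): {z_coverage:.3f}")
print(f"coverage using 2.776 (t):     {t_coverage:.3f}")
```

With so few cases, the interval built from 1.96 should capture the true mean noticeably less often than 95% of the time, while the t-based interval should come close to the nominal rate.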
Wrap-Up
Random sampling lets us infer from sample to population
Standard error and confidence intervals express uncertainty
Larger samples and less variation = more precise estimates
Use the t-distribution when sample sizes are small