Central Limit Theorem

2026-03-09

The Central Limit Theorem states:

For a population with finite variance, the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the original distribution.
The sample must be random, and each observation must be independent.
\[ SE = \frac{\sigma}{\sqrt{n}} \] shows that as the sample size increases, the standard error (how much sample means fluctuate around the true mean) decreases.
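The standard error formula can be checked with a small simulation. The sketch below is illustrative only: the exponential population (rate 1, so \(\sigma = 1\)), the sample sizes, and the 2000 repetitions are assumptions chosen for the example, not part of the original slides.

```r
# Sketch: compare the theoretical standard error sigma / sqrt(n)
# with the observed spread of simulated sample means.
set.seed(1)

# Assumed population: exponential with rate 1 (right-skewed, mean 1, sd 1).
sigma <- 1
sample_sizes <- c(5, 30, 100, 500)

# For each n, draw 2000 samples and take the standard deviation of their means.
observed_se <- sapply(sample_sizes, function(n) {
  sd(replicate(2000, mean(rexp(n, rate = 1))))
})

theoretical_se <- sigma / sqrt(sample_sizes)

se_table <- data.frame(n = sample_sizes,
                       theoretical_se = theoretical_se,
                       observed_se = observed_se)
print(se_table)
```

Both columns should shrink together as n grows, roughly halving each time the sample size is quadrupled.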

Why is this helpful?

As the sampling distribution of the mean becomes approximately normal, we can apply statistical methods that rely on normality.
Confidence intervals and hypothesis tests are examples of such methods.
With a sufficiently large sample size, the sample means are approximately normal even when the underlying data are skewed.
\[z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}\] is the z-score formula for a sample mean, which lets us apply these normal-based methods.
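As a worked example of the z-score formula, the sketch below plugs in made-up numbers. The values of mu, sigma, n, and x_bar are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Sketch: z-score of one sample mean, using hypothetical numbers.
mu    <- 50   # assumed population mean (hypothetical)
sigma <- 12   # assumed population standard deviation (hypothetical)
n     <- 36   # sample size (hypothetical)
x_bar <- 54   # observed sample mean (hypothetical)

se <- sigma / sqrt(n)      # standard error: 12 / 6 = 2
z  <- (x_bar - mu) / se    # z-score: (54 - 50) / 2 = 2

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z))

# 95% confidence interval for the mean, centered on x_bar
ci <- x_bar + c(-1, 1) * qnorm(0.975) * se

print(c(z = z, p_value = p_value))
print(ci)
```

Here the sample mean lies two standard errors above the assumed population mean, which is the kind of statement a hypothesis test or confidence interval is built on.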

Non-Normal frequency graph, raw data

The next couple slides:

In the next slides, we will visualize how the distribution of sample means approaches a normal shape even though the raw data are not normal, starting with 200 sample means and then 1000.

Taking 200 samples of 30 from the data set

Taking 1000 samples of 30 from the data set

R Code for taking 1000 samples of 30

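library(ggplot2)  # provides ggplot() and the mpg data set

set.seed(1)  # make the random draws reproducible

# Draw 1000 samples of 30 highway-mileage values (with replacement) and keep each sample's mean
sample_means <- replicate(1000, mean(sample(mpg$hwy, 30, replace = TRUE)))

# Histogram of the 1000 sample means
ggplot(data.frame(sample_means), aes(x = sample_means)) +
  geom_histogram(binwidth = .5, color = "blue")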
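One way to check the simulation against the theorem numerically is sketched below. It assumes the ggplot2 package is installed (for the mpg data set) and treats the full hwy column as the population. The checks themselves are an illustrative addition, not part of the original slides.

```r
library(ggplot2)  # provides the mpg data set used above

set.seed(1)
n <- 30
sample_means <- replicate(1000, mean(sample(mpg$hwy, n, replace = TRUE)))

# Center: the mean of the sample means should sit near the data set's mean.
print(c(raw_mean = mean(mpg$hwy), mean_of_means = mean(sample_means)))

# Spread: the SD of the sample means should be near sigma / sqrt(n).
# sd() uses the n - 1 denominator; the difference is small for a data set this size.
print(c(predicted_se = sd(mpg$hwy) / sqrt(n), observed_se = sd(sample_means)))

# Shape: points close to the reference line suggest approximate normality.
qqnorm(sample_means)
qqline(sample_means)
```

If the two pairs of numbers agree closely and the Q-Q points follow the line, the simulated sample means behave as the Central Limit Theorem predicts.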

Thanks for viewing!