2026-03-09

The Central Limit Theorem states:

  • Regardless of the shape of the original distribution (as long as it has a finite variance), the distribution of sample means approaches a normal distribution as the sample size increases.
  • Samples must be random, and each observation must be independent.
  • The standard error \[ SE = \frac{\sigma}{\sqrt{n}} \] shows that as the sample size n increases, the standard error (the typical fluctuation of the sample mean around the true mean) decreases.
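The SE formula above can be checked by simulation. The sketch below is illustrative only and assumes an Exponential(rate = 1) population (right-skewed, with σ exactly 1), which is not a data set from these slides. It compares the standard deviation of many simulated sample means with σ/√n for several sample sizes.

```r
# Sketch: check SE = sigma / sqrt(n) by simulation.
# Assumption: the population is Exponential(rate = 1), a right-skewed
# distribution whose standard deviation sigma is exactly 1.
set.seed(1)
sigma <- 1

# Standard deviation of `reps` simulated sample means, each from a sample of size n
simulated_se <- function(n, reps = 10000) {
  means <- replicate(reps, mean(rexp(n, rate = 1)))
  sd(means)
}

sizes <- c(5, 30, 100)
result <- data.frame(
  n = sizes,
  simulated = sapply(sizes, simulated_se),
  formula = sigma / sqrt(sizes)
)
print(result)
```

The simulated and formula columns should agree closely, and both shrink as n grows.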

Why is this helpful?

  • As the sampling distribution of the mean becomes approximately normal, we can apply statistical methods that rely on normality.

  • Confidence intervals and hypothesis tests are examples of such methods.

  • With a sufficiently large sample size, the sample means drawn from even a skewed data set are approximately normally distributed (the raw data themselves remain skewed).

  • The z-score of a sample mean, \[z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}\], standardizes it so that standard normal methods (tables, critical values, p-values) can be applied.
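A small R sketch of the z-score formula in use. The values μ = 50, σ = 10, n = 36 and x̄ = 53 are made-up illustration numbers, not data from these slides, and σ is assumed known. The second half assumes an Exponential(rate = 1) population (μ = σ = 1) to show that a z-based 95% interval still covers the true mean about 95% of the time for skewed data with n = 30.

```r
# Sketch: the z-score formula and a z-based 95% confidence interval.
# Assumption: mu = 50, sigma = 10, n = 36 and x_bar = 53 are made-up values.
z_score <- function(x_bar, mu, sigma, n) {
  (x_bar - mu) / (sigma / sqrt(n))
}

z <- z_score(x_bar = 53, mu = 50, sigma = 10, n = 36)
p_two_sided <- 2 * pnorm(-abs(z))            # two-sided p-value from the standard normal

z_crit <- qnorm(0.975)                       # about 1.96 for a 95% interval
ci <- 53 + c(-1, 1) * z_crit * 10 / sqrt(36) # x_bar +/- z_crit * sigma / sqrt(n)

cat("z =", z, " p =", round(p_two_sided, 4), "\n")
cat("95% CI:", round(ci, 2), "\n")

# Coverage check on skewed data: many samples of 30 from an
# Exponential(rate = 1) population (mu = 1, sigma = 1). The interval
# contains mu exactly when |z| <= z_crit.
set.seed(2)
covered <- replicate(10000, {
  x_bar <- mean(rexp(30, rate = 1))
  abs(z_score(x_bar, mu = 1, sigma = 1, n = 30)) <= z_crit
})
cat("Coverage:", mean(covered), "\n")
```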

Non-normal frequency graph of the raw data

The next couple of slides:

In the next slides, we visualize how the distribution of sample means becomes approximately normal, starting with 200 sample means and then 1000.

Taking 200 samples of size 30 from the data set

Taking 1000 samples of size 30 from the data set

R code for taking 1000 samples of size 30

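library(ggplot2)  # provides ggplot() and the mpg data set

set.seed(1)  # make the random sampling reproducible

# 1000 sample means, each from a sample of 30 highway-mpg values drawn with replacement
sample_means <- replicate(1000, mean(sample(mpg$hwy, 30, replace = TRUE)))

# Histogram of the 1000 sample means
ggplot(data.frame(sample_means), aes(x = sample_means)) +
  geom_histogram(binwidth = 0.5, color = "blue")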
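The slides show code only for the 1000-sample case. The sketch below repeats the experiment for both 200 and 1000 sample means and reports numbers instead of plots. To run with base R alone, it assumes a simulated right-skewed stand-in population (a hypothetical `population` drawn from a Gamma distribution) in place of `mpg$hwy`. It compares the spread of the sample means with σ/√30 and their skewness with that of the population.

```r
# Sketch: repeat the slides' experiment with 200 and then 1000 sample means.
# Assumption: `population` is a simulated, right-skewed stand-in for mpg$hwy
# (Gamma with shape 2), used so the example runs with base R alone.
set.seed(1)
population <- rgamma(5000, shape = 2, rate = 0.1)

# Sample skewness: near 0 for symmetric (e.g. normal) data
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# k sample means, each from a random sample of 30 drawn with replacement
simulate_means <- function(k, sample_size = 30) {
  replicate(k, mean(sample(population, sample_size, replace = TRUE)))
}

# One row per experiment: number of means, their mean, spread and skewness
results <- t(sapply(c(200, 1000), function(k) {
  means <- simulate_means(k)
  c(k = k, mean = mean(means), se = sd(means), skew = skewness(means))
}))

print(round(results, 3))
cat("population mean:", round(mean(population), 3),
    " sigma/sqrt(30):", round(sd(population) / sqrt(30), 3),
    " population skewness:", round(skewness(population), 3), "\n")
```

Replacing `population` with `mpg$hwy` (after `library(ggplot2)`) should reproduce the setting of the slides; the sample means stay centered on the population mean while their skewness drops well below that of the raw data.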

Thanks for viewing!