We will use two examples to better understand the Central Limit Theorem: flipping coins and rolling dice.

Flipping coins

Let’s look at how a normal distribution can emerge from sampling from populations with underlying non-normal distributions. The Central Limit Theorem states that the sampling distribution of the means of repeated samples drawn from the same population will approach normality as the sample size increases (roughly n > 30), regardless of the underlying distribution of the population we are sampling from.

We take coin flipping as an example. Each coin flip has only two possible outcomes (heads or tails) and therefore follows a binomial distribution (a single flip is also known as a Bernoulli trial). Let’s simulate this:

We start by defining the space of possible outcomes:

possible_outcomes <- c(0,1)

where 0 = tails and 1 = heads.

We then define the probability \(P\) of each outcome – in this case it’s 50/50, as we assume the coin is not rigged:

P <- 0.5

Then we decide how many times we want to flip the coin. Let’s do it 10 times:

n <- 10

Now we are ready to do the simulation. We use the sample() function to take 10 random samples from our space of possible outcomes, with replacement. I have built a very simple function called flip_coin that simulates the flipping for me (see whether you can figure out what the function does):

flip_coin <- function() {
  sample(possible_outcomes, size = n, prob = c(P, 1 - P), replace = TRUE)
}

flips <- flip_coin()
flips
##  [1] 1 1 0 0 1 1 1 0 1 0
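
As a quick check, we can tally the outcomes with base R’s table():

table(flips)
## flips
## 0 1
## 4 6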

We get 4 tails and 6 heads, that is, a proportion of 0.6 heads. Let’s plot this mean value:

plot(mean(flips))

Now let’s repeat this sampling process (i.e., the 10 coin flips) a bunch of times, say, five times. Each time, we take the mean (i.e., the proportion of heads) and plot the means in a histogram:

hist(replicate(5, mean(flip_coin())))

As we increase the number of replications (and therefore the number of sample means in the sampling distribution), the histogram starts looking very much like a normal distribution:

hist(replicate(100, mean(flip_coin())))

hist(replicate(1000, mean(flip_coin())))

hist(replicate(10000, mean(flip_coin())), breaks = 100)

What does the sampling distribution of the sample means tell us? It tells us that every time we flip a coin 10 times, we have pretty good chances of getting a balanced ratio of heads and tails: most often 50% heads, sometimes 40% tails and 60% heads (or the opposite), and so on. It is much rarer to get more unbalanced results such as 20% tails and 80% heads, but this is still possible. Notice that the “true mean”, that is, the true ratio of heads and tails in the “population”, is 50/50. We know this theoretically but can’t observe it directly; the Central Limit Theorem tells us that the mean of the sampling distribution of the sample means (in our case, 0.5047) approximates the true mean of the population (0.5) if the sample size is large enough.
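
We can make this concrete by tabulating the simulated sample means ourselves. A minimal sketch (the exact frequencies will vary from run to run, since the flips are random):

sample_means <- replicate(10000, mean(flip_coin()))
round(table(sample_means) / length(sample_means), 3)  # relative frequency of each mean
mean(sample_means)  # lands very close to the true mean of 0.5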

How large is “large enough”? The literature tells us that this means a sample size of approximately n > 30. Let’s see what happens if we again run 10,000 replications of the coin flips, this time with n = 30 instead of n = 10:

n <- 30
hist(replicate(10000, mean(flip_coin())), breaks = 100)

We get a sampling distribution of the sample means that is much more tightly centered around the true mean of 0.5.
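
We can quantify this narrowing directly by comparing the standard deviations of the two sampling distributions (a sketch; the exact values vary from run to run):

n <- 10
sd(replicate(10000, mean(flip_coin())))  # roughly 0.16

n <- 30
sd(replicate(10000, mean(flip_coin())))  # roughly 0.09, about half as wide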

Rolling dice

Until now, we have dealt with a population that follows a binomial distribution (= only two possible outcomes), but the Central Limit Theorem works with any underlying population distribution. Let’s try rolling a die instead of flipping a coin. A die roll theoretically follows a (discrete) uniform distribution, where the values 1 through 6 are all equally likely on any roll.

hist(purrr::rdunif(10000, 1, 6))
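
(If you prefer not to depend on purrr, base R’s sample() draws from the same discrete uniform distribution:)

hist(sample(1:6, size = 10000, replace = TRUE))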

Let’s simulate rolling one die using the same procedure as above, but changing our parameters accordingly:

possible_outcomes <- 1:6
P <- 1/6  # probability of each face; sample() assumes equal probabilities by default
n <- 10
rolls <- sample(possible_outcomes, size = n, replace = TRUE)
rolls
##  [1] 1 1 5 4 4 2 3 2 6 2

Nice. We got 2 x 1, 3 x 2, 1 x 3, 2 x 4, 1 x 5, and 1 x 6. The mean of these rolls is 3. Let’s plot it:

plot(mean(rolls))

Let’s replicate these 10 rolls 10,000 times!

die_roll <- function() {
  sample(possible_outcomes, size = n, replace = TRUE)
}
hist(replicate(10000, as.integer(mean(die_roll()))))

Although this is technically a discrete distribution (only integers are allowed, since we coerce the means with as.integer()), we still get a normally shaped curve, where a mean of 3 is much more likely than a mean of 1 or 6.
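
For comparison, here is the same simulation without coercing the means to integers. The means of 10 rolls are still discrete (multiples of 0.1), but the histogram traces the bell shape more finely:

hist(replicate(10000, mean(die_roll())), breaks = 50)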

What are the implications of this?

We can use the logic behind the Central Limit Theorem to quantify how well our sample mean \(\bar{x}\) approximates the true mean of the population \(\mu\). Let’s look back at the distribution of the means of our coin flipping:

possible_outcomes <- c(0,1)
P <- 0.5
n <- 30
hist(replicate(10000, mean(flip_coin())), breaks = 100)

This distribution is called the sampling distribution of the sample means. When the standard deviation of this distribution is small (i.e., the distribution is less flat and more pointy), the mean is a good model of the data, that is, the mean is a good approximation of the true mean in the population. (Notice how this follows the same logic as the fact that the sample standard deviation \(s\) can be used as a measure of how good a model the mean \(\bar{x}\) is for our sample data.) The standard deviation of the sampling distribution of the sample means is called the standard error, and it has this formula:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

This formula would allow us to calculate the standard deviation of the sampling distribution of the sample means, thus telling us something about how well different samples drawn from the same population represent the population they came from.
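
We can sanity-check the formula against our simulation. For a fair coin, the population standard deviation is \(\sigma = \sqrt{P(1 - P)} = 0.5\), so with n = 30 the formula predicts a standard error of about 0.091. A quick sketch (the empirical value varies slightly from run to run):

sigma <- sqrt(P * (1 - P))  # true sd of a single fair coin flip: 0.5
sigma / sqrt(n)             # analytic standard error: ~0.091
sd(replicate(10000, mean(flip_coin())))  # empirical counterpart, should be close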

However, there is a catch: oftentimes, we only have access to one sample from one population. When we only have one sample, we also only have one mean, and thus no sampling distribution of the sample means. What do we do? The good news is that if our sample size \(n\) is relatively large, and if our sample is relatively normally distributed, then we can use our sample standard deviation \(s\) as an approximation of \(\sigma\) in the formula above. This means that we can calculate the standard error of the mean relying only on quantities we know from our sample:

\[ \sigma_{\bar{x}} \approx \frac{s}{\sqrt{n}} \]

Let’s use our personality test data as an example. Say that we want to figure out the standard error of the mean of the breath hold variable. We know the standard deviation of the variable, and we know our sample size:

sd(data$breathhold)
## [1] 22.49044
length(data$breathhold)
## [1] 48

Following the formula above, we can use these two parameters to compute the standard error of the mean:

standard_error <- sd(data$breathhold)/sqrt(length(data$breathhold))
standard_error
## [1] 3.246215

The standard error of the mean for this variable is 3.25! We can use this number as a measure of the statistical accuracy of our sample mean as an estimate of the true population mean. Cool!