Overview
The Central Limit Theorem is one of the most important theorems in all of statistics. In this report, we will see it in action by simulating random variables from the exponential distribution and observing that the distribution of their sample means is approximately normal.
Simulation
- We will simulate random variables from the exponential distribution
- The PDF for the exponential distribution for given rate \(\lambda\) is \(f \left(x, \lambda\right) = \lambda e ^ {- \lambda x}\) for \(x \geq 0\)
- Note that the mean is \(\mu = E[X] = \frac1\lambda\) and the standard deviation is \(\sigma = \frac1\lambda\)
- Let’s first simulate some random variables from this distribution to get a sense of what it looks like.
- We will use \(\lambda = 0.2\) for our simulations
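library(ggplot2)  # after_stat() and linewidth assume ggplot2 >= 3.4.0
# Draw 10,000 exponential variates with rate 0.2
dat <- data.frame(x = rexp(10000, rate = 0.2))
g <- ggplot(dat, aes(x = x))
# Density-scaled histogram with the theoretical PDF overlaid
g + geom_histogram(aes(y = after_stat(density)), binwidth = .5) +
  stat_function(fun = dexp, args = list(rate = 0.2), linewidth = 1) +
  ggtitle("Random Variables from Exponential Distribution")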
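- As a rough numerical check of the formulas above, the following sketch compares `dexp` with \(\lambda e^{-\lambda x}\) and compares the simulated mean and standard deviation with \(\frac1\lambda\). The seed and the variable names are assumptions made only for this illustration.

```r
# Minimal sketch, assuming lambda = 0.2 as in the text.
set.seed(1)  # hypothetical seed, used only to make this sketch repeatable
lambda <- 0.2

# dexp() should agree with the closed-form PDF lambda * exp(-lambda * x)
x_grid <- seq(0, 30, by = 0.5)
pdf_matches <- isTRUE(all.equal(dexp(x_grid, rate = lambda),
                                lambda * exp(-lambda * x_grid)))

# The simulated mean and standard deviation should both be near 1 / lambda
draws <- rexp(10000, rate = lambda)
c(mean = mean(draws), sd = sd(draws), theory = 1 / lambda)
```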

- Now that we have a sense of what the exponential distribution looks like, let’s simulate some sample means
- We expect the distribution of sample means to be approximately normal
- For this simulation, we will draw 1000 samples, each consisting of 40 exponential random variables, and take the mean of each sample
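# Each row of the 1000 x 40 matrix is one sample; take the mean of each row
dat <- data.frame(
  x = rowMeans(matrix(rexp(40000, rate = 0.2), nrow = 1000)))
g <- ggplot(dat, aes(x = x))
g + geom_histogram(aes(y = after_stat(density)), bins = 30) +
  ggtitle("Distribution of Sample Means")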
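- To make the layout of the simulation explicit, the sketch below builds the same kind of matrix and confirms that it has one row per sample and one column per draw. It also shows that `apply(..., 1, mean)` and `rowMeans` give the same row means. The seed and variable names are assumptions for illustration.

```r
# Minimal sketch of the simulation layout, assuming 1000 samples of size 40.
set.seed(2)  # hypothetical seed, used only to make this sketch repeatable
n_sims <- 1000
n <- 40

samples <- matrix(rexp(n_sims * n, rate = 0.2), nrow = n_sims)
dim(samples)  # rows are samples, columns are draws within a sample

# Two equivalent ways to compute one mean per sample (per row)
means_apply <- apply(samples, 1, mean)
means_row <- rowMeans(samples)
all.equal(means_apply, means_row)
```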

Sample Mean vs. Theoretical Mean
- Let’s calculate the mean of our simulated sample means and compare it to the population mean
- We know that the distribution of sample means is centered at the population mean
- In other words, the mean of our simulated sample means should be close to \(E[\bar{x}] = \frac1\lambda = 5\)
smean <- round(mean(dat$x), 2)
- So our expected value is \(E[\bar{x}] = 5\) and our sample mean is approximately 4.97
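- One way to judge whether 4.97 is reasonably close to 5 is to compare the gap with the standard error of the overall mean. The sketch below assumes the 1000 sample means are independent, so their average is the mean of \(1000 \times 40\) draws with standard error \(\sigma / \sqrt{40000}\). The variable names are illustrative, and `reported_mean` is simply the value quoted above.

```r
# Minimal sketch, assuming lambda = 0.2, samples of size 40, and 1000 samples.
lambda <- 0.2
n <- 40
n_sims <- 1000

mu <- 1 / lambda                        # theoretical mean
sigma <- 1 / lambda                     # theoretical standard deviation
se_grand_mean <- sigma / sqrt(n * n_sims)

reported_mean <- 4.97                   # value quoted in the text
# Gap between theory and simulation, in standard-error units
gap_in_se <- (mu - reported_mean) / se_grand_mean
c(standard_error = se_grand_mean, gap_in_se = gap_in_se)
```

- A gap of roughly one standard error is well within what random variation alone would produce.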
Sample Variance vs. Theoretical Variance
- Now let’s calculate the variance of our simulated sample means and compare it to the theoretical variance of the sample mean
- We know that the variance of the sample mean is given by \(Var[\bar{x}] = \frac {\sigma^2}{n}\), where \(\sigma^2\) is the population variance and \(n\) is the sample size
- So for our simulation, \(Var[\bar{x}] = \frac {\sigma^2}{n} = \frac{1 / \lambda^2}{40} = \frac{1 / 0.2^2}{40} = .625\)
svar <- round(var(dat$x), 2)
- So our expected variance is \(Var[\bar{x}] = .625\) and our sample variance is approximately 0.61
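- The arithmetic above can be reproduced directly in R and set next to a fresh simulation. This is a sketch under the same assumptions as the text (\(\lambda = 0.2\), \(n = 40\), 1000 samples). The seed and variable names are hypothetical, so the simulated value will differ slightly from the 0.61 reported above.

```r
# Minimal sketch comparing the theoretical and simulated variance of the sample mean.
lambda <- 0.2
n <- 40

sigma2 <- 1 / lambda^2     # population variance of the exponential distribution
var_xbar <- sigma2 / n     # theoretical variance of the sample mean

set.seed(3)  # hypothetical seed, used only to make this sketch repeatable
means <- rowMeans(matrix(rexp(1000 * n, rate = lambda), nrow = 1000))
c(theoretical = var_xbar, simulated = var(means))
```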
Applying the Central Limit Theorem
- The Central Limit Theorem states that, once normalized, the distribution of the sample mean \(\bar X_n\) converges to a standard normal distribution as the sample size \(n\) grows
- Considering that \(\mu = \frac1\lambda\) and \(\sigma = \frac1\lambda\), our normalized sample mean can be rewritten as follows:
\[\frac{\bar X_n - \mu}{\sigma / \sqrt{n}}=
\frac{\sqrt n (\bar X_n - \mu)}{\sigma}=
\frac{\sqrt n (\bar X_n - 1 / \lambda)}{1 / \lambda}=
\sqrt n (\bar X_n - 1 / \lambda) \lambda
\]
- Let’s use this equation to normalize our distribution of sample means and compare it to that of a standard normal.
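# Normalize one sample's mean: sqrt(n) * (mean(x) - 1/lambda) * lambda, with lambda = 0.2
cfunc <- function(x, n){sqrt(n) * (mean(x) - 1/(.2)) * (.2)}
dat <- data.frame(
  x = apply(matrix(rexp(40000, .2), 1000), 1, cfunc, 40))
g <- ggplot(dat, aes(x = x))
# Overlay the standard normal density on the normalized sample means
g + geom_histogram(aes(y = after_stat(density)), bins = 30) +
  stat_function(fun = dnorm) +
  ggtitle("Normalized Distribution of Sample Means")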

- We can tell visually that this distribution is approximately normal
- Additionally, the distribution is now centered at \(0\) and its variance should be close to \(1\), as we would expect for a standard normal
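- The visual impression can be backed up with a few numbers. The sketch below repeats the normalization on a fresh simulation and reports the mean, the variance, and the share of values within \(\pm 1.96\), which would be about 0.95 for an exact standard normal. The seed and variable names are assumptions for illustration, and with \(n = 40\) some residual skew from the exponential distribution is to be expected.

```r
# Minimal sketch, assuming lambda = 0.2, samples of size 40, and 1000 samples.
set.seed(4)  # hypothetical seed, used only to make this sketch repeatable
lambda <- 0.2
n <- 40

samples <- matrix(rexp(1000 * n, rate = lambda), nrow = 1000)
# Normalized sample means: sqrt(n) * (xbar - 1/lambda) * lambda
z <- sqrt(n) * (rowMeans(samples) - 1 / lambda) * lambda

c(mean = mean(z), variance = var(z))
share_within <- mean(abs(z) < 1.96)  # roughly 0.95 if z is close to standard normal
share_within
```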