Overview

The Central Limit Theorem is one of the most important theorems in all of statistics. In this report, we will see it in action by simulating random variables from the exponential distribution and observing that the distribution of their sample means is approximately normal.

Simulation

We will simulate random variables from the exponential distribution.
The PDF of the exponential distribution with rate \(\lambda\) is \(f \left(x, \lambda\right) = \lambda e ^ {- \lambda x}\) for \(x \geq 0\).
Note that \(\mu = E[X] = \frac1\lambda\) and \(\sigma = \frac1\lambda\).
Let’s first simulate some random variables from this distribution to get a sense of what it looks like.
We will use \(\lambda = 0.2\) for our simulations.

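library(ggplot2)

# 10,000 draws from an exponential distribution with rate lambda = 0.2
dat <- data.frame(x = rexp(10000, 0.2))
g <- ggplot(dat, aes(x = x))
# Density-scaled histogram with the theoretical exponential PDF overlaid
g + geom_histogram(aes(y = after_stat(density)), binwidth = .5) +
  stat_function(fun = dexp, args = list(rate = 0.2), linewidth = 1) +
  ggtitle("Random Variables from Exponential Distribution")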
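As a quick check of the moments quoted above, the sketch below integrates against the exponential density numerically with base R's integrate(). It assumes the same rate \(\lambda = 0.2\) used in our simulations; both the mean and the standard deviation should come out at \(\frac1\lambda = 5\), up to integration error.

```r
# Check the exponential moments numerically for lambda = 0.2
# (the rate used throughout this report).
lambda <- 0.2

# E[X]: integrate x * f(x) over [0, Inf)
mu <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

# E[X^2], then Var[X] = E[X^2] - E[X]^2
ex2 <- integrate(function(x) x^2 * dexp(x, rate = lambda), 0, Inf)$value
sigma <- sqrt(ex2 - mu^2)

round(c(mu = mu, sigma = sigma, one_over_lambda = 1 / lambda), 4)
```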

Now that we have a sense of what the exponential distribution looks like, let’s simulate some sample means.
We expect the distribution of sample means to look approximately normal.
For this simulation, we will draw 1000 samples of 40 random variables each and take the mean of each sample.

# 40000 draws arranged as a 1000 x 40 matrix: each row is one sample of size 40
dat <- data.frame(
  x = apply(matrix(rexp(40000, .2), 1000), 1, mean))
g <- ggplot(dat, aes(x = x))
g + geom_histogram(aes(y = after_stat(density))) +
  ggtitle("Distribution of Sample Means")
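The one-liner above packs a few steps together. This sketch unpacks them under the same settings (1000 samples of size 40, \(\lambda = 0.2\)): matrix() fills its 40000 draws column by column into 1000 rows, so each row is one sample, and apply(..., 1, mean) averages each row. The seed is an arbitrary addition for reproducibility, and rowMeans() is shown only as an equivalent, vectorized alternative.

```r
set.seed(42)    # arbitrary seed, added only so the run is reproducible
n_sims <- 1000  # number of simulated samples
n <- 40         # observations per sample
lambda <- 0.2

# matrix() fills column by column; with nrow = 1000 the 40000 draws form
# a 1000 x 40 matrix in which each row is one sample of size 40.
draws <- matrix(rexp(n_sims * n, rate = lambda), nrow = n_sims)

# Row-wise means via apply(), as in the report, and via rowMeans()
means_apply <- apply(draws, 1, mean)
means_rows  <- rowMeans(draws)

dim(draws)
isTRUE(all.equal(means_apply, means_rows))
```

rowMeans() is usually faster for plain means; apply() remains useful when the per-row summary is a custom function, as with cfunc in the normalization step.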

Sample Mean vs. Theoretical Mean

Let’s calculate the mean of our simulated sample means and compare it to the population mean.
The distribution of sample means is centered at the population mean, since \(E[\bar{x}] = \mu\).
In other words, the mean of our simulated sample means should be close to \(E[\bar{x}] = \frac1\lambda = 5\).

# Mean of the 1000 simulated sample means, rounded to 2 decimals
smean <- round(mean(dat$x), 2)

So the expected value is \(E[\bar{x}] = 5\), and the mean of our simulated sample means is approximately 4.97.
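The small gap between 4.97 and 5 is consistent with ordinary Monte Carlo error. As a rough sketch: the average of 1000 sample means has standard error \(\sqrt{0.625/1000} = 0.025\), so 4.97 sits about 1.2 standard errors below 5. The value 4.97 is copied from the run reported above; since no seed was given, a rerun will give a slightly different number.

```r
lambda <- 0.2
n <- 40          # observations per sample
n_sims <- 1000   # number of sample means being averaged

theo_mean <- 1 / lambda               # E[x-bar] = 5
theo_var  <- (1 / lambda^2) / n       # Var[x-bar] = sigma^2 / n = 0.625
mc_se     <- sqrt(theo_var / n_sims)  # standard error of the average of 1000 sample means

# Number of standard errors between the reported value 4.97 and 5
z <- (4.97 - theo_mean) / mc_se
round(c(mc_se = mc_se, z = z), 3)
```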

Sample Variance vs. Theoretical Variance

Now let’s calculate the variance of our simulated sample means and compare it to its theoretical value.
The variance of the distribution of sample means is given by \(Var[\bar{x}] = \frac {\sigma^2}{n}\).
So for our simulation, \(Var[\bar{x}] = \frac {\sigma^2}{n} = \frac{1 / \lambda^2}{40} = \frac{1 / 0.2^2}{40} = 0.625\).

# Variance of the 1000 simulated sample means, rounded to 2 decimals
svar <- round(var(dat$x), 2)

So the theoretical variance is \(Var[\bar{x}] = 0.625\), and the variance of our simulated sample means is approximately 0.61.
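To see the \(\frac{\sigma^2}{n}\) relationship beyond the single case \(n = 40\), this sketch repeats the simulation for a few sample sizes and compares the variance of the simulated means with \(\frac{1/\lambda^2}{n}\). The sample sizes 10, 40 and 160, the 10000 replicates, and the seed are illustrative assumptions, not values from the original analysis.

```r
set.seed(2015)         # arbitrary seed, for reproducibility only
lambda <- 0.2
n_sims <- 10000        # more replicates than the report uses, to reduce noise
ns <- c(10, 40, 160)   # illustrative sample sizes (assumed)

# For each n, simulate n_sims sample means and take their variance
sim_var <- sapply(ns, function(n) {
  draws <- matrix(rexp(n_sims * n, rate = lambda), nrow = n_sims)
  var(rowMeans(draws))
})

theory_var <- (1 / lambda^2) / ns   # sigma^2 / n
data.frame(n = ns, simulated = round(sim_var, 4), theory = theory_var)
```

Quadrupling \(n\) should cut the variance of the sample means by roughly a factor of four.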

Applying the Central Limit Theorem

The Central Limit Theorem states that, once normalized, the distribution of sample means \(\bar{x}\) converges to the standard normal distribution as the sample size \(n\) grows.
Considering that \(\mu = \frac1\lambda\) and \(\sigma = \frac1\lambda\), our normalized sample mean can be rewritten as follows:
\[\frac{\bar X_n - \mu}{\sigma / \sqrt{n}}= \frac{\sqrt n (\bar X_n - \mu)}{\sigma}= \frac{\sqrt n (\bar X_n - 1 / \lambda)}{1 / \lambda}= \sqrt n (\bar X_n - 1 / \lambda) \lambda \]
Let’s use this equation to normalize our distribution of sample means and compare it to that of a standard normal.

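# Normalize one sample x of size n: sqrt(n) * (x-bar - 1/lambda) * lambda, with lambda = 0.2
cfunc <- function(x, n) {
  sqrt(n) * (mean(x) - 1 / 0.2) * 0.2
}
# Apply cfunc to each row (one sample of 40) of a fresh 1000 x 40 matrix
dat <- data.frame(
  x = apply(matrix(rexp(40000, .2), 1000), 1, cfunc, 40))
g <- ggplot(dat, aes(x = x))
# Density-scaled histogram with the standard normal PDF overlaid
g + geom_histogram(aes(y = after_stat(density))) +
  stat_function(fun = dnorm) +
  ggtitle("Normalized Distribution of Sample Means")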
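A normal Q-Q plot is a somewhat sharper visual check than a histogram. This sketch applies the same normalization as cfunc to a fresh batch of 1000 samples and plots the sorted values against standard normal quantiles with ggplot2's stat_qq(); the seed is an arbitrary addition. Points close to the line y = x indicate agreement with N(0, 1), while the right skew of the exponential may show up as mild curvature in the tails.

```r
library(ggplot2)

set.seed(7)  # arbitrary seed, for reproducibility only
lambda <- 0.2
n <- 40

# Same normalization as cfunc, applied to each row of a 1000 x 40 matrix
z <- apply(matrix(rexp(1000 * n, rate = lambda), nrow = 1000), 1,
           function(x) sqrt(n) * (mean(x) - 1 / lambda) * lambda)

# Sample quantiles vs. standard normal quantiles, with reference line y = x
p <- ggplot(data.frame(z = z), aes(sample = z)) +
  stat_qq() +
  geom_abline(intercept = 0, slope = 1) +
  ggtitle("Normal Q-Q Plot of Normalized Sample Means")
print(p)
```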

Visually, the normalized distribution closely follows the standard normal density.
Additionally, it is now centered at \(x = 0\), and its spread is consistent with the unit variance of a standard normal.
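The claims about center and spread can also be checked numerically. This sketch recomputes the normalized means with a vectorized form of cfunc (rowMeans() in place of apply()) and reports their mean, their variance, and the fraction falling within ±1.96; for a standard normal these should be close to 0, 1 and 0.95 respectively. The seed is an arbitrary assumption for reproducibility.

```r
set.seed(123)  # arbitrary seed, for reproducibility only
lambda <- 0.2
n <- 40
n_sims <- 1000

draws <- matrix(rexp(n_sims * n, rate = lambda), nrow = n_sims)
# Vectorized form of cfunc: sqrt(n) * (x-bar - 1/lambda) * lambda
z <- sqrt(n) * (rowMeans(draws) - 1 / lambda) * lambda

summary_stats <- c(
  mean = mean(z),                    # near 0 for N(0, 1)
  var = var(z),                      # near 1 for N(0, 1)
  within_1.96 = mean(abs(z) < 1.96)  # near 0.95 for N(0, 1)
)
round(summary_stats, 3)
```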

Demonstration of the Central Limit Theorem Through Simulation

Lucas McLaughlin

Thursday, June 18, 2015
