Visualization of The Central Limit Theorem

Project Overview

The purpose of this exercise is to compare the mean of samples of 40 random exponentials with rate 0.2 to the theoretical mean (1/rate), which is 5. The average of the sample means is calculated from a simulation of 1000 samples. We then compare the variance of the sample means to the theoretical variance. Finally, we compare the distribution of the means to an associated normal distribution to visualize the Central Limit Theorem.

Sample Mean and Theoretical Mean Comparison

To begin, let’s obtain our simulation data. We need a sample of 40 exponential random variables with rate .2, and we want 1000 of these samples.

set.seed(010101)
lambda <- .2 #rate
mu <- 1/lambda #mean
n_sims <- 1000
n_exp <- 40
#Find sample means of 40 exponentials, simulated 1000 times:
mns <- numeric(n_sims) #preallocate instead of growing the vector
for(i in seq_len(n_sims)) mns[i] <- mean(rexp(n_exp, lambda))

avg <- data.frame(mean(mns), mu, row.names = "Means")
colnames(avg) <- c("Sample", "Theoretical")
avg
##         Sample Theoretical
## Means 4.968124           5
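
The same simulation can also be written without an explicit loop. The following is a minimal sketch, not the code used for the results above: the names draws and mns_alt are introduced here for illustration, and because the random draws are laid out differently the individual means will not match the loop version number for number.

```r
set.seed(10101)   # arbitrary seed for this sketch
lambda <- 0.2     # rate
n_sims <- 1000    # number of simulated samples
n_exp  <- 40      # exponentials per sample

# One row per simulated sample, one column per exponential draw
draws <- matrix(rexp(n_sims * n_exp, rate = lambda),
                nrow = n_sims, ncol = n_exp)

# Row means give the 1000 sample means in a single vectorized call
mns_alt <- rowMeans(draws)

length(mns_alt)
mean(mns_alt)
```

Either form should give an average of the sample means close to 5. The matrix form is mainly convenient when the raw draws are needed again later.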

We observe that the sample mean and the theoretical mean are very close. Each sample mean is an unbiased estimate of 1/lambda, and averaging 1000 of them shrinks the sampling error further, so this agreement is what the law of large numbers leads us to expect. The Central Limit Theorem concerns the shape of the distribution of the means, which we examine below.
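
To put a number on "very close", the gap between the two means can be compared with the standard error we would expect. This is a rough sketch: the variable names are introduced here for illustration, and the value 4.968124 is copied from the output above.

```r
lambda <- 0.2
n_exp  <- 40
n_sims <- 1000
sigma  <- 1 / lambda   # standard deviation of one exponential

# Standard error of a single sample mean of 40 exponentials
se_sample_mean <- sigma / sqrt(n_exp)

# Standard error of the average of 1000 such sample means
se_grand_mean <- sigma / sqrt(n_exp * n_sims)

# Observed gap, measured in standard errors
observed_gap <- abs(4.968124 - 5)
observed_gap / se_grand_mean
```

A gap of only a standard error or two is consistent with ordinary sampling noise.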

par(mfrow = c(1,2), mar = c(2,4,2,4))
hist(mns, prob = TRUE, breaks = seq(from = 2, to = 8, by = .2), main = "Distribution of\n Sample Means of\n 40 Exponentials \n With 1000 Simulations", xlab = "Sample Means")
lines(density(mns))
hist(mns, breaks = seq(from = 2, to = 8, by = .2), main = "Histogram of\n Sample Means of\n 40 Exponentials \n With 1000 Simulations", xlab = "Sample Means")

The figure on the right is a histogram of the sample means from our simulation. Means close to the theoretical mean of 5 occur much more frequently than means far from it.

The figure on the left shows the same data on a density scale, with a kernel density estimate overlaid. The distribution of the means is close to the Gaussian distribution. This is a key result for a later conclusion.

Variability of Sample Compared to Theoretical Variance

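Next, we want to compare the variability of our sample means with the theoretical variance. The variance of an exponential is 1/lambda^2, which is 25 here. The variance of a mean of n_exp such draws is smaller by a factor of n_exp, so we multiply the variance of the sample means by n_exp to put it on the same scale as the population variance.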

true_var <- 1/lambda^2 #population variance of one exponential
sample_var <- var(mns)*n_exp #rescale: var(mean) = population variance / n_exp
vars <- data.frame(sample_var, true_var, row.names = "Variances")
colnames(vars) <- c("Sample", "Theoretical")
vars
##             Sample Theoretical
## Variances 24.03121          25

Once again, we observe that the rescaled sample variance (24.03) and the theoretical variance (25) are close, thanks to our large number of simulations.

Note that var(mns) measures the variability of the distribution of the sample means in our simulation, not of individual exponentials. With a large number of simulations it should approach the population variance divided by n_exp, which is why the value multiplied by n_exp is close to 25.
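
The factor of n_exp in the code comes from the variance of a mean of independent draws. As a short worked derivation, with n standing for n_exp = 40, sigma^2 for the population variance 1/lambda^2, and X-bar for one sample mean (notation introduced here for illustration):

```latex
\[
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{1}{n\lambda^2}
  = \frac{25}{40}
  = 0.625
\]
```

The second equality relies on the draws being independent. The simulated value is var(mns) = 24.03121/40, or about 0.60, against this theoretical 0.625.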

Comparing Distribution of Means to Corresponding Normal Distribution

Finally, we compare the distribution of our 1000 sample means with a normal curve. If the Central Limit Theorem applies at a sample size of 40 exponentials, the two should look alike.

#Compare to normal
par(mfrow = c(1,1))
hist(mns, prob = TRUE, breaks = seq(from = 2, to = 8, by = .2), main = "Comparing Distribution of Means (red) \nto Normal Distribution (blue)", xlab = "Sample Means")
lines(density(mns), col = "red")
xfit <- seq(min(mns), max(mns), length.out = 1000)
yfit <- dnorm(xfit, mean = mean(mns), sd = sd(mns))
lines(xfit, yfit, col = "blue", lwd = 2)

#Normal curve code above sourced from http://www.statmethods.net/graphs/density.html

The blue curve is a normal density with the same mean and standard deviation as our sample means. The red curve is the kernel density estimate of the sample means. The two curves track each other closely, so the distribution of means in our simulation is very nearly Gaussian.
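
Because the blue curve is fitted from the sample itself, a complementary check is to overlay the normal curve the Central Limit Theorem predicts, with mean 1/lambda and standard deviation (1/lambda)/sqrt(40), and to draw a normal Q-Q plot. The following is a sketch under those assumptions. It regenerates its own means with an arbitrary seed, so the numbers will not necessarily match the run above, and the names mu and se are chosen here for illustration.

```r
set.seed(10101)   # arbitrary seed for this sketch
lambda <- 0.2
n_exp  <- 40
n_sims <- 1000
mns <- replicate(n_sims, mean(rexp(n_exp, lambda)))

mu <- 1 / lambda                   # theoretical mean of the sample means
se <- (1 / lambda) / sqrt(n_exp)   # theoretical sd of the sample means

par(mfrow = c(1, 2))

# Histogram with the CLT-predicted normal density overlaid
hist(mns, prob = TRUE, breaks = 30,
     main = "Sample Means vs.\nTheoretical Normal", xlab = "Sample Means")
curve(dnorm(x, mean = mu, sd = se), add = TRUE, col = "darkgreen", lwd = 2)

# Q-Q plot: points near the line indicate approximate normality
qqnorm(mns)
qqline(mns)
```

With only 40 exponentials per sample, some curvature in the upper tail of the Q-Q plot would not be surprising, since the exponential distribution is right-skewed.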