Overview

This report investigates the exponential distribution and compares simulated results with what the Central Limit Theorem predicts.
The Central Limit Theorem (CLT) states that the sampling distribution of the sample mean (X-bar) is approximately normal if the sample size is large enough, whatever the shape of the underlying distribution, provided it has a finite variance.
If you take many samples, calculate the mean of each, and plot the distribution of those means, the plot looks more and more like a normal distribution as the sample size grows. Taking more samples only makes the histogram smoother. This plot is called the Sampling Distribution of the Sample Mean. Also, the bigger the sample size, the smaller the variance of the sample means around the population mean.
You can use this to approximate the distribution of X-bar even when the underlying population is not normally distributed.
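
The effect of sample size on the spread of the sample means can be sketched as follows. This is a minimal illustration rather than part of the original analysis: the seed, the sample sizes 10, 40 and 160, and the helper name sim_means are assumptions chosen for the example.

```r
# Illustrative sketch: spread of sample means for growing sample sizes.
# The seed, the sizes, and the helper name are assumptions made for this example.
set.seed(1)
lambda <- 0.2

# Draw `reps` samples of size n and return the mean of each one.
sim_means <- function(n, reps = 1000) {
    replicate(reps, mean(rexp(n, lambda)))
}

sizes <- c(10, 40, 160)
vars <- sapply(sizes, function(n) var(sim_means(n)))

# Compare the simulated variances with the theoretical value 1 / (lambda^2 * n).
data.frame(n = sizes, simulated = vars, theoretical = 1 / (lambda^2 * sizes))
```

Each fourfold increase in the sample size should cut the variance of the sample means to roughly a quarter, in line with the 1/n factor in the theoretical value.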

Simulation

Generate Sample

Here we generate a sample of exponentially distributed values, using the R function rexp to draw from an exponential distribution with the following parameters:
lambda = 0.2
n (sample size) = 40

In this graph you can see that the distribution is not Gaussian (normal), i.e. it is not bell-shaped. The blue line marks the sample mean and the red line the theoretical mean.

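set.seed(770317)                        # fix the seed so the sample is reproducible
expon <- rexp(40, 0.2)                  # n = 40 draws with rate lambda = 0.2
hist(expon, main = "Sample Distribution")
abline(v = mean(expon), col = "blue")   # sample mean
abline(v = 1 / 0.2, col = "red")        # theoretical mean, 1/lambda = 5
# lty = 1 draws line keys in the legend, matching the lines added by abline()
legend("topright", lty = 1, col = c("blue", "red"), legend = c("Sample Mean", "Theoretical Mean"))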

Findings

The theoretical mean is 1/lambda = 1/0.2 = 5.00, and the mean of the generated sample is 4.29.

Generate Sample Means

We now create a vector of 1000 sample means, each computed from a fresh sample from the same distribution (lambda = 0.2 and n = 40), and plot a histogram to see whether the distribution is different. We also calculate the mean and variance of these sample means to see whether they are closer to the theoretical mean and variance.

The distribution is now approximately normal: it is bell-shaped and approximately symmetric around the mean (indicated by the red line).

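# Preallocate the result vector instead of growing it inside the loop
exponmns <- numeric(1000)
for (i in 1:1000) {
    # Mean of a fresh sample of 40 draws with rate lambda = 0.2
    exponmns[i] <- mean(rexp(40, 0.2))
}

hist(exponmns, main = "Sampling Distribution of the Sample Means")
# Mark the mean of the sample means, unrounded so the line sits at the actual value
abline(v = mean(exponmns), col = "red", lwd = 3)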
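
One way to judge how close the histogram is to a normal shape is to overlay the normal density that the CLT suggests. The following is a minimal sketch under stated assumptions: the seed, the variable names, and the choice to regenerate the sample means inside the block are illustrative and not taken from the original analysis.

```r
# Illustrative sketch: overlay the CLT normal approximation on the histogram.
# The seed is an assumption used only to make this example reproducible.
set.seed(1)
lambda <- 0.2
n <- 40
means <- replicate(1000, mean(rexp(n, lambda)))

# Normal approximation: mean 1/lambda, standard deviation (1/lambda) / sqrt(n).
mu <- 1 / lambda
sigma <- (1 / lambda) / sqrt(n)

# freq = FALSE puts the histogram on the density scale so the curve is comparable.
hist(means, freq = FALSE, main = "Sample Means with Normal Approximation")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)
```

If the approximation is reasonable, the bars should follow the red curve closely, with at most a slight skew to the right left over from the exponential distribution.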

Sample Mean vs. Theoretical Mean

The mean of the sampling distribution is 5.05.

The mean of the 1000 sample means (5.05) is much closer to the theoretical mean of 5.00 than the mean of the first sample we generated (4.29).
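
The comparison can be sketched in a few lines of R. This is an illustration only: the seed and variable names are assumptions, so the printed values will not match the figures reported above.

```r
# Illustrative sketch: compare one sample mean and the mean of many sample
# means with the theoretical mean. The seed is an assumption for this example.
set.seed(1)
lambda <- 0.2
n <- 40

single_mean <- mean(rexp(n, lambda))              # mean of one sample of size n
means <- replicate(1000, mean(rexp(n, lambda)))   # 1000 sample means

theoretical_mean <- 1 / lambda
c(theoretical = theoretical_mean,
  single_sample = single_mean,
  mean_of_means = mean(means))
```

The mean of 1000 sample means is expected to sit closer to 5 because its standard error is about sqrt(0.625/1000), roughly 0.025, compared with about sqrt(0.625), roughly 0.79, for a single sample mean.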

Sample Variance vs. Theoretical Variance

The variance of a single exponential observation is 1/lambda^2, so the theoretical variance of the mean of n observations is (1/lambda^2)/n. In this case that is (1/0.2^2)/40 = 25/40 = 0.625, or about 0.63.
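
The arithmetic for the theoretical variance of the sample mean can be checked directly. The sketch below assumes the standard result that one exponential observation has variance 1/lambda^2, so the mean of n independent observations has that variance divided by n. The variable names are illustrative.

```r
# Worked arithmetic for the theoretical variance of the sample mean.
lambda <- 0.2
n <- 40

pop_var <- 1 / lambda^2      # variance of one exponential observation: 25
mean_var <- pop_var / n      # variance of the mean of n observations: 0.625
mean_sd <- sqrt(mean_var)    # standard error of the mean, about 0.79

c(pop_var = pop_var, mean_var = mean_var, mean_sd = mean_sd)
```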

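The corresponding estimate from the first sample is 0.39. This figure appears to be the sample variance divided by n, which estimates the variance of the sample mean. The raw variance of 40 exponential draws would be expected to lie near 1/lambda^2 = 25.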

The variance of the sampling distribution is 0.61, which is much closer to the theoretical variance of 0.63 than the value from the first sample.

There is also far less variation among the 1000 sample means than among the individual observations in the first sample we generated.
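
The different variance quantities are easy to confuse, so the sketch below computes them side by side. It is an illustration under stated assumptions: the seed and variable names are not from the original analysis, and the printed values will differ from the figures reported above.

```r
# Illustrative sketch: three variance quantities next to the theoretical value.
# The seed is an assumption used only to make this example reproducible.
set.seed(1)
lambda <- 0.2
n <- 40

first_sample <- rexp(n, lambda)
means <- replicate(1000, mean(rexp(n, lambda)))

c(raw_sample_variance = var(first_sample),         # estimates 1/lambda^2 = 25
  estimated_var_of_mean = var(first_sample) / n,   # one-sample estimate of 0.625
  variance_of_means = var(means),                  # estimate of 0.625 from 1000 means
  theoretical = 1 / (lambda^2 * n))
```

The first entry describes the spread of individual observations, while the last three all describe the spread of the sample mean.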

Conclusion

The mean and variance of the 1000 sample means are much closer to their theoretical values than the estimates from the single sample, and the sample means vary far less than the individual observations. These results are consistent with the Central Limit Theorem (CLT), which states that the sampling distribution of the sample mean (X-bar) is approximately normal if the sample size is large enough. They also show that the sample means can be used to approximate the mean of a population. The theory further implies that the variance of the sample means shrinks as the sample size grows.

Central Limit Theorem Simulation

Johann Raath

10 July 2017