Comparison of Theoretical and Sample Means

Overview

This simulation compares the mean and variance of a large number of sample means drawn from an exponential distribution with the values that theory predicts. This goes to the heart of the Central Limit Theorem. Even though each sample is relatively small and comes from an exponential distribution, the theorem tells us that the distribution of the sample means, taken over a large enough number of samples, should be approximately normal (Gaussian), and the mean of those sample means should closely match the theoretical mean of the population.

Background

What is the Central Limit Theorem?

The Central Limit Theorem is a foundation of inferential statistics. It describes the relationship between the sampling distribution of sample means and the population from which the samples are taken. The distribution of sample means has the same mean as the population. So even if the samples are drawn from a population that does not have a normal distribution, for large enough samples the distribution of the sample means should approximate a normal distribution. The mean of this sampling distribution should be very close to that of the original population, and its variance should be close to the population variance divided by the sample size.
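Stated a little more formally, for independent draws \(X_1, \dots, X_n\) from a population with mean \(\mu\) and finite variance \(\sigma^2\), the theorem can be written as below. The symbols \(\mu\), \(\sigma\), and \(\bar{X}_n\) are notation introduced here for illustration.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

In practical terms, for large \(n\) the sample mean is approximately normal with mean \(\mu\) and variance \(\sigma^2 / n\).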

What is an Exponential Distribution?

The exponential distribution describes the waiting time between successive events in a sequence of independent events that occur at a constant average rate. (This definition is not important to this project, but it is nice to know a little about the population from which we will be sampling.)
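As a quick sanity check on the population itself, the mean and variance of an exponential distribution can be recovered by numerically integrating its density. This is a minimal sketch using base R's `dexp()` and `integrate()`. The names `popMean` and `popVar` are illustrative, and the rate of 0.2 is the one used later in this report.

```r
# Rate used throughout this report.
lambda <- 0.2

# Mean: integral of x * f(x) over [0, Inf), where f is the exponential density.
popMean <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

# Variance: integral of (x - mean)^2 * f(x) over [0, Inf).
popVar <- integrate(function(x) (x - popMean)^2 * dexp(x, rate = lambda),
                    0, Inf)$value

print(c(mean = popMean, variance = popVar))
```

The two values should come out close to \(\frac{1}{\lambda}\) = 5 and \(\frac{1}{\lambda^2}\) = 25, up to numerical integration error.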

Initialize Simulation Settings

lambda <- 0.2
sampleSize <- 40
numberOfSamples <- 1000

Simulations

To investigate the Central Limit Theorem and to see whether or not the characteristics (mean, variance, and standard deviation) of repeated samples taken from an exponential distribution match the expected characteristics, we draw 1000 random samples of size N = 40 from an exponential distribution with rate \(\lambda\) = 0.2 and compute the mean of each sample. We then look at the distribution of these sample means and compare the mean of the sample means to the theoretical mean. We will also see that the distribution exhibits a large degree of normality.

The specified rate, \(\lambda\), for this simulation is 0.2. (\(\lambda\) is the average number of events per unit of time.) The expected (theoretical) mean for the population is \(\frac{1}{\lambda}\) = \(\frac{1}{0.2}\) = 5. The population standard deviation of an exponential distribution is also \(\frac{1}{\lambda}\) = 5.

Sample Mean versus Theoretical Mean

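# Compute the mean of each of numberOfSamples samples of size sampleSize.
means <- replicate(numberOfSamples, mean(rexp(sampleSize, lambda)))
# Plot on the density scale so the density curve overlays the bars correctly.
hist(means, breaks=40, freq=FALSE)
lines(density(means))
abline(v=1/lambda, col="red") # The red line indicates the theoretical mean.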

The mean of the sample means is 5.0236023, approximately equal to the theoretical mean of 5. This is what we expect, since the sampling distribution of the mean is centered on the population mean.
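The mean of the sample means can be computed directly from the `means` vector. A minimal, self-contained sketch follows. The `set.seed(1)` call and the name `meanOfMeans` are assumptions added here for reproducibility and are not part of the original run, so the value will differ slightly from the one reported above.

```r
set.seed(1)  # hypothetical seed, chosen only so the sketch is repeatable

lambda <- 0.2
sampleSize <- 40
numberOfSamples <- 1000

# One mean per simulated sample.
means <- replicate(numberOfSamples, mean(rexp(sampleSize, lambda)))

# Compare the average of the sample means with the theoretical mean 1/lambda.
meanOfMeans <- mean(means)
print(c(meanOfMeans = meanOfMeans,
        theoretical = 1 / lambda,
        difference  = abs(meanOfMeans - 1 / lambda)))
```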

Sample Variance versus Theoretical Variance

Variance

The theoretical variance of the distribution of sample means is \(\frac{(\frac{1}{\lambda})^2}{n}\) = \(\frac{(\frac{1}{0.2})^2}{40}\) = \(\frac{25}{40}\) = 0.625.
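The division by \(n\) follows from the independence of the draws. Writing \(\bar{X}_n\) for the mean of \(n\) independent draws \(X_1, \dots, X_n\), each with variance \(\sigma^2\) (notation introduced here for illustration), and using \(\sigma = \frac{1}{\lambda}\) for the exponential distribution, a short derivation is:

```latex
\operatorname{Var}(\bar{X}_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{25}{40} = 0.625
```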

The variance of the sample means is 0.7056078, about 13% above the theoretical value of 0.625. As with the mean of the sample means, the observed variance is reasonably close to the expected theoretical variance.

Standard Deviation

The theoretical standard deviation of the sample means (the standard error) is \(\frac{\frac{1}{\lambda}}{\sqrt{n}} = \frac{\frac{1}{0.2}}{\sqrt{40}} = 0.7905694\).

The standard deviation of the sample means is 0.8400046, about 6% above the theoretical value of 0.7905694, so this result is also approximately equal to the theoretical standard deviation.
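The observed and theoretical spread can be placed side by side with `var()` and `sd()`. The sketch below is self-contained. The seed and the `comparison` data frame are assumptions made for illustration, so the observed figures will not match those quoted above exactly.

```r
set.seed(1)  # hypothetical seed, not the one behind the reported figures

lambda <- 0.2
sampleSize <- 40
numberOfSamples <- 1000

means <- replicate(numberOfSamples, mean(rexp(sampleSize, lambda)))

# Observed spread of the sample means next to the values theory predicts.
comparison <- data.frame(
  statistic   = c("variance", "standard deviation"),
  observed    = c(var(means), sd(means)),
  theoretical = c((1 / lambda)^2 / sampleSize,
                  (1 / lambda) / sqrt(sampleSize))
)
print(comparison)
```

Rerunning with different seeds gives a feel for how much the observed variance moves from one set of 1000 samples to the next.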

Distribution

To examine how closely this distribution of sample means approximates a normal distribution, we can look at the Quantile-Quantile plot, which plots the quantiles of the sample means against the quantiles of a standard normal distribution. The closer the points lie to the reference line, the better the fit. Here we see that the plot suggests a high degree of normality. From the Central Limit Theorem, this is what we expect to see.

qqnorm(means)
qqline(means)
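A complementary check, offered as a sketch and not as part of the original analysis, is to overlay the normal curve that the Central Limit Theorem predicts on the histogram and to count how many sample means fall within 1.96 standard errors of the theoretical mean. The seed and the names `mu`, `se`, and `coverage` are assumptions introduced here.

```r
set.seed(1)  # hypothetical seed for repeatability

lambda <- 0.2
sampleSize <- 40
numberOfSamples <- 1000

means <- replicate(numberOfSamples, mean(rexp(sampleSize, lambda)))

mu <- 1 / lambda                       # theoretical mean of the sample means
se <- (1 / lambda) / sqrt(sampleSize)  # theoretical standard error

# Histogram on the density scale with the predicted normal curve overlaid.
hist(means, breaks = 40, freq = FALSE)
curve(dnorm(x, mean = mu, sd = se), add = TRUE, col = "blue")

# Share of sample means within 1.96 standard errors of mu; an exactly
# normal distribution would put about 95% of them in that range.
coverage <- mean(abs(means - mu) < 1.96 * se)
print(coverage)
```

A coverage near 0.95 is consistent with approximate normality, although it is a coarser check than the Quantile-Quantile plot because it looks at only one interval.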

Conclusions

Looking at the 1000 samples taken from an exponential distribution, we see that the mean and variance of the sample means are close to the theoretical mean and variance. Additionally, the distribution of the sample means closely approximates a normal distribution. These results are what would be expected from the Central Limit Theorem.