A Comparison of the Exponential Distribution with the Central Limit Theorem

Synopsis

In this report I aim to investigate the exponential distribution in R and compare it with the Central Limit Theorem. The report will illustrate the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages 40 exponentials. It will illustrate the insight that a hypothetical distribution of sample means would be distributed normally in what is commonly referred to as a “bell curve”.

Simulations

This report uses the rexp(n, lambda) function in R to calculate the mean of a random sample of 40 exponentials and repeat for one thousand simulations. The sample mean and variance of the resulting distribution of averages will be compared to the theoretical mean and variance.

lambda <- 0.2 # the rate parameter of rexp function
mu <- 1/lambda # theoretical mean of the exponential distribution
sd <- 1/lambda # theoretical standard deviation of the exponential distribution
n <- 40 # number of sample exponentials
se <- sd / sqrt(n) # 
svar <- sd^2 / n # theoretical variance
nosim <- 1000 # number of simulations

set.seed(1) # to make the result reproducible
x <- NULL
for (i in 1:nosim) x = c(x, mean(rexp(n, lambda))) # create the distribution of a large collection (1000)
# of averages of 40 # exponentials

Sample Mean versus Theoretical Mean

The normal distribution has the property that it is symmetrical about the mean. The sample mean and the theoretical mean are both measures of central tendency. Comparing the sample mean to the theoretical mean demonstrates that the sample mean is an estimate of the population mean.

round(c(mean(x), mu), 2)

## [1] 4.99 5.00

Sample Variance versus Theoretical Variance

The normal distribution also has a predictable area under the curve within specified distances of the mean. Approximately 68%, 95% and 99% of the normal density lies within 1, 2 and 3 standard deviations from the mean respectively. Comparing the sample variance to the theoretical variance demonstrates that the sample variance is an estimate of the population variance.

round(c(var(x), svar), 2)

## [1] 0.61 0.62

Distribution

# create the X-axis grid to overlay the normal curve
x2 <- seq(min(x), max(x), length = length(x))

# create the Normal curve
fun <- dnorm(x2, mean = mean(x), sd = sd(x))

# create the Histogram of the simulation
par(mfrow = c(1,2))
hist(x, prob = TRUE, # probability = true produces a density plot 
     ylim = c(0, max(fun)),
     main = "", xlab = "")
lines(x2, fun, col = 2, lwd = 2)
abline(v = mean(x), col="red", lwd=3, lty=2)
abline(v = mean(mu), col="blue", lwd=3, lty=5)

set.seed(1)
y <- rexp(nosim, lambda) # create the distribution of a large collection (1000) of random exponentials
hist(y, prob = TRUE, freq = FALSE, main = "", xlab = "")
curve(dexp(x, rate=0.2, log=FALSE), add = TRUE)

Figure 1: On the left, a histogram of the distribution of averages of 40 exponentials over 1000 simulations. Fitted with a normal curve. The blue and red vertical lines represent the theoretical and sample means respectively. On the right is the distribution of a large collection of random exponentials

The Central Limit Theorem states that the distribution of averages of iid variables properly normalised becomes that of a standard normal as the sample size increases. This report has used the simulation of random exponentials to produce the distributions in Figure 1. One can determine that the distribution on the left is normally distributed because we can see it is shaped like a “bell curve”. The normal distribution can be characterised by its mean and standard deviation and I have shown by calculating the sample mean and variance and comparing how close the results are to the theoretical mean and variance that they are consistent with the population mean and variance.