Distribution of Exponential Means and The Central Limit Theorem

Overview

This report investigates the distribution of means of exponential random variables in R and compares it with what the Central Limit Theorem (CLT) predicts. Specifically, it compares the distribution of 1000 means, each computed from 40 random exponentials, with a normal curve.

Simulations

The code below plots a histogram of 40 random exponentials (lambda = 0.2) and shows that they are not normally distributed.

lambda <- 0.2; n <- 40
hist(rexp(n, lambda), xlab=NULL,main=NULL)
title("Distribution of Exponentials", xlab=paste0("Exponentials \n (mean= ", 1/lambda, ", sd= ", 1/lambda, ")"))

The code below plots a histogram of 1000 means of 40 random exponentials (lambda = 0.2). It then superimposes a density curve (green) that looks approximately normal, as the CLT predicts. It also superimposes a normal curve with the same mean and sd as the simulated means (blue), along with red lines marking the mean and 1, 2 and 3 standard deviations on either side of it. The mean and sd of the distribution are shown at the bottom of the figure.

## Calculate 1000 means, each from 40 random exponentials
lambda <- 0.2; n <- 40
exp_mean <- replicate(1000, mean(rexp(n, lambda)))

## Calculate the mean and sd of the distribution of means 
mymean <- mean(exp_mean)
mysd <- sd(exp_mean)

## Create a histogram for the means of the exponential simulations 
myhist <- hist(exp_mean, xlab=NULL,main=NULL, ylim = c(0, 300))
title("Distribution of Exponential Means", xlab=paste0("Exponential means \n (mean= ", round(mymean,3), ", sd= ", round(mysd,3), ")"))

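## Fit a corresponding density curve for the distribution
## (the histogram shows counts, so densities are rescaled by count / density,
## which is the same for every non-empty bin when the breaks are equally spaced)
multiplier <- myhist$counts / myhist$density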
mydensity <- density(exp_mean)
mydensity$y <- mydensity$y * multiplier[1]
lines(mydensity, col = "green", lwd = 4)

## Plot a theoretical normal curve (CLT) over it using the same mean and sd of the distribution
myx <- seq(min(exp_mean), max(exp_mean), length.out= 100)
normal <- dnorm(x = myx, mean = mymean, sd = mysd)
lines(myx, normal * multiplier[1], col = "blue", lwd = 4)


## Add the (lower and upper) boundaries for the standard deviations around the distribution mean
sd_x <- seq(mymean - 3 * mysd, mymean + 3 * mysd, by = mysd)
sd_y <- dnorm(x = sd_x, mean = mymean, sd = mysd) * multiplier[1]
segments(x0 = sd_x, y0= 0, x1 = sd_x, y1 = sd_y, col = "red", lwd = 5)

Sample Mean versus Theoretical Mean

The mean of the simulated distribution of means (5.0169574) is very close to the theoretical mean of the population, 1/lambda = 1/0.2 = 5.
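
The comparison can be sketched as a short standalone check. The seed and the names theoretical_mean and sample_mean are assumptions added for illustration; the figures quoted in this report came from an unseeded run, so the simulated value here will differ slightly.

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable
lambda <- 0.2; n <- 40

## Re-simulate 1000 means of 40 exponentials so the sketch runs on its own
exp_mean <- replicate(1000, mean(rexp(n, lambda)))

theoretical_mean <- 1 / lambda   # mean of an exponential with rate lambda
sample_mean <- mean(exp_mean)    # centre of the simulated means

c(theoretical = theoretical_mean, sample = sample_mean)
```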

Sample Variance versus Theoretical Variance

If we compare the first figure above with the second, we can see that the exponentials in the first figure are more widely spread, so they have a larger variance. The distribution of exponential means in the second figure is concentrated more tightly around the mean (i.e. the variance is smaller), as expected when averaging 40 values. The standard deviation of a single exponential is 1/lambda (where lambda = 0.2), so the population variance is (1/lambda)^2 = 25. The theoretical variance of the mean of 40 exponentials is that value divided by n, 25/40 = 0.625, while the sample variance of the simulated means is 0.5763472.
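
A minimal sketch of the variance comparison, re-simulating the means so it runs on its own. The seed and the names population_var, theoretical_var and sample_var are illustrative assumptions, and the simulated value will not match the unseeded run reported above exactly.

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable
lambda <- 0.2; n <- 40
exp_mean <- replicate(1000, mean(rexp(n, lambda)))

population_var  <- (1 / lambda)^2       # variance of a single exponential: 25
theoretical_var <- population_var / n   # variance of a mean of n draws: 0.625
sample_var      <- var(exp_mean)        # variance observed in the simulation

c(population = population_var, theoretical = theoretical_var, sample = sample_var)
```

Dividing by n is what makes the histogram of means so much narrower than the histogram of raw exponentials.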

Distribution

As the second figure above shows, the density curve of the exponential means (green) approximates the normal curve (blue), as the Central Limit Theorem predicts. The red lines mark steps of one standard deviation (sd of about 0.759) on either side of the mean. Under the normal approximation, about 68% of the distribution falls within 1 sd of the mean (between 4.2577821 and 5.7761327), about 95% within 2 sd (between 3.4986068 and 6.5353081), and about 99.7% within 3 sd (between 2.7394314 and 7.2944834).
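
One way to check these proportions against the simulation itself, rather than reading them off the normal curve, is to count the simulated means inside each band. This is a sketch under assumptions: the seed and the names coverage and normal_coverage are illustrative, and the exact proportions depend on the random draw.

```r
set.seed(1)  # assumed seed, only so the sketch is repeatable
lambda <- 0.2; n <- 40
exp_mean <- replicate(1000, mean(rexp(n, lambda)))
mymean <- mean(exp_mean)
mysd <- sd(exp_mean)

## Share of simulated means lying within k standard deviations of the mean
coverage <- sapply(1:3, function(k) mean(abs(exp_mean - mymean) <= k * mysd))

## Share a normal distribution places within the same bands
normal_coverage <- pnorm(1:3) - pnorm(-(1:3))

round(rbind(simulated = coverage, normal = normal_coverage), 3)
```

If the normal approximation holds, the two rows should agree closely. Small gaps are expected because means of 40 exponentials are still slightly right-skewed.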