Overview

According to Caffo (2014), the Central Limit Theorem states that the distribution of averages of iid variables (properly normalized) becomes that of a standard normal as the sample size increases. This report examines the claim by simulating averages of exponential random variables and comparing their distribution with the theoretical one.
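In symbols, a standard statement of the theorem for iid $X_1, \dots, X_n$ with mean $\mu$ and finite standard deviation $\sigma$ is given here. "Properly normalized" means subtracting $\mu$ and dividing by the standard error $\sigma/\sqrt{n}$. For exponentials with rate $\lambda$, both $\mu$ and $\sigma$ equal $1/\lambda$.

```latex
\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar X_n - \mu}{\sigma / \sqrt{n}} \;\xrightarrow{d}\; N(0, 1) \quad \text{as } n \to \infty
```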

Simulations

First, let’s set the required parameters for the simulations.

# number of simulations
nosim <- 1000
# rate parameter (lambda) of the exponential distribution
lambda <- 0.2
# number of exponentials averaged in each simulation
n <- 40
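These parameters already fix the theoretical quantities the report compares against. The following is a minimal sketch that computes them. It re-assigns `lambda` and `n` so it runs on its own, and the names `mu`, `sigma`, `se` and `v_avg` are illustrative, not from the original code.

```r
# Theoretical moments implied by the simulation parameters
# (mu, sigma, se and v_avg are illustrative names)
lambda <- 0.2   # rate of the exponential distribution
n <- 40         # number of exponentials averaged per simulation

mu    <- 1 / lambda        # mean of one exponential: 1/lambda = 5
sigma <- 1 / lambda        # standard deviation of one exponential: also 5
se    <- sigma / sqrt(n)   # standard deviation of an average of n draws, about 0.791
v_avg <- sigma^2 / n       # variance of an average of n draws: 25/40 = 0.625

round(c(mu = mu, sigma = sigma, se = se, var_avg = v_avg), 3)
```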

Next, let’s simulate 1,000 samples of 40 exponentials each and record the average of each sample.

# run simulations: each one averages n exponentials with rate lambda
exps <- replicate(nosim, mean(rexp(n, lambda)))
# double-check that the simulations have produced results
head(exps)
## [1] 5.518720 3.916157 3.968112 5.130657 5.014523 5.452589
length(exps)
## [1] 1000
# running (cumulative) mean of the averages after each simulation
means <- cumsum(exps)/(1:nosim)
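The `cumsum(...)/(1:nosim)` idiom computes all running means in one vectorized step: its k-th element is the mean of the first k values. The following sketch checks this against an explicit loop. The vector `x` is hypothetical stand-in data drawn with an arbitrary seed.

```r
# Running mean via cumsum, checked against an explicit loop
set.seed(1)
x <- rexp(10, rate = 0.2)   # hypothetical stand-in data

# vectorized: element k equals mean(x[1:k])
running_vec <- cumsum(x) / seq_along(x)

# explicit loop for comparison
running_loop <- numeric(length(x))
for (k in seq_along(x)) running_loop[k] <- mean(x[1:k])

all.equal(running_vec, running_loop)   # TRUE up to floating-point tolerance
```

The vectorized form does one pass over the data. The loop recomputes each prefix mean from scratch, which costs quadratic time.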

Sample Mean versus Theoretical Mean

With the simulations completed, we can compare the sample mean with the theoretical mean, 1/λ = 5. Each simulation averages 40 exponentials, giving 1,000 averages in total. The code below plots the cumulative mean of these averages in blue and the theoretical mean in red. Over the first few simulations the cumulative mean ranges between 4.6 and 5.4. As the number of simulations increases, it converges toward the theoretical mean of 5 (red line) and varies only minimally. The overall sample mean (green line) is very close to 5.

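# calculate and print the sample and theoretical means
smean <- mean(exps)   # overall mean of the 1,000 simulated averages
tmean <- 1/lambda     # theoretical mean of an exponential
round(smean, 2)
round(tmean, 2)
## [1] 5
# plot the cumulative sample mean against the sample and theoretical means
plot(means, type="l", lwd=1.5, col="blue",
     main = "Sample Mean versus Theoretical Mean",
     xlab = "Simulations",
     ylab = "Cumulative mean")
abline(h=tmean, col="red", lwd=1.5)
abline(h=smean, col="green", lwd=1.5)
legend("topright", legend=c("Cumulative mean", "Theoretical mean", "Sample mean"),
       col=c("blue", "red", "green"), lwd=1.5, cex=0.8)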
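A quick consistency check is that the last cumulative mean equals the overall sample mean. By contrast, averaging all of the cumulative means gives a different, weighted quantity that gives more weight to the early simulations. The following is a sketch under an arbitrary seed. The names `sim_avgs` and `run_means` are illustrative and chosen so the report's own variables are not overwritten.

```r
# Illustrative names; the seed is arbitrary
set.seed(2)
lambda <- 0.2; n <- 40; nosim <- 1000
sim_avgs  <- replicate(nosim, mean(rexp(n, lambda)))  # 1,000 averages of 40 exponentials
run_means <- cumsum(sim_avgs) / seq_len(nosim)        # running mean after each simulation

final_running   <- run_means[nosim]  # running mean after the last simulation
overall         <- mean(sim_avgs)    # overall sample mean
mean_of_running <- mean(run_means)   # weighted average of sim_avgs, not the sample mean

all.equal(final_running, overall)    # TRUE up to floating-point tolerance
round(c(sample_mean = overall, mean_of_running = mean_of_running,
        theoretical = 1 / lambda), 3)
```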

Sample Variance versus Theoretical Variance

Just as the cumulative mean settles near its theoretical value, the cumulative sample variance of the averages should approach the theoretical variance of an average of 40 exponentials, (1/λ)²/n = 0.625, as the number of simulations increases. The plot below shows the cumulative sample variance fluctuating initially and then plateauing around the theoretical variance (red line). The green line marks the sample variance of all 1,000 averages (0.60), which is close to the theoretical variance (0.62 after rounding).
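The theoretical variance used in the code (`tvar`) follows from the variance of a single exponential and the independence of the draws. Variances of independent terms add, so averaging n draws divides the variance by n:

```latex
\operatorname{Var}(X_i) = \frac{1}{\lambda^2} = \frac{1}{0.2^2} = 25,
\qquad
\operatorname{Var}(\bar X_n) = \frac{\operatorname{Var}(X_i)}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625
```

The printed output shows this value rounded to 0.62.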

# calculate and print the sample and theoretical variances
# running variance of the averages (divides by k rather than k - 1)
vars <- cumsum(exps^2)/(1:nosim) - means^2
svar <- var(exps)          # sample variance of all 1,000 averages
tvar <- (1/lambda)^2/n     # theoretical variance of an average of n exponentials
round(svar, 2)
## [1] 0.6
round(tvar, 2)
## [1] 0.62
# plot the cumulative sample variance against the theoretical variance
plot(vars, type="l", lwd=1.5,
     main = "Sample Variance versus Theoretical Variance",
     xlab = "Number of simulations",
     ylab = "Cumulative variance")
abline(h=tvar, col="red", lwd=1.5)
abline(h=svar, col="green", lwd=1.5)
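The running variance formula `cumsum(x^2)/k - (cumsum(x)/k)^2` gives the divide-by-k (population) variance of the first k values. R's `var()` divides by k - 1 instead. The following sketch confirms the identity by rescaling `var()`. The data `x` is hypothetical, drawn with an arbitrary seed.

```r
# Running variance via cumsum versus a direct computation
set.seed(3)
x <- rexp(50, rate = 0.2)   # hypothetical stand-in data
k <- seq_along(x)

running_mean <- cumsum(x) / k
running_var  <- cumsum(x^2) / k - running_mean^2   # same form as the report's vars

# direct: rescale var() (denominator j - 1) to denominator j; one value has variance 0
direct_var <- sapply(k, function(j) if (j == 1) 0 else var(x[1:j]) * (j - 1) / j)

all.equal(running_var, direct_var)   # TRUE up to floating-point tolerance
```

With 1,000 simulations the factor (k - 1)/k is essentially 1 by the end of the run. The two conventions therefore agree closely where the plot plateaus.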

Distribution

The analysis above suggests that the cumulative sample mean and variance gravitate toward their theoretical values. If the Central Limit Theorem holds, the distribution of exponential averages should be approximately normal, with mean 1/λ = 5 and standard deviation 1/(λ√n) ≈ 0.79 (a standard normal once normalized). It should not resemble the right-skewed distribution of the exponentials themselves. That distribution should be centered around the theoretical mean.

# draw 1,000 random exponentials for comparison
set.seed(73)
texp <- rexp(nosim, lambda)
# plot the two distributions side by side, saving the old layout
op <- par(mfrow=c(1, 2))
# left: distribution of averages, with the CLT normal density in blue
hist(exps, probability=TRUE, breaks=30,
     main="Averages of 40 Exponentials",
     xlab=NULL, cex.main=.85)
xa <- seq(min(exps), max(exps), length.out=100)
lines(xa, dnorm(xa, mean=1/lambda, sd=1/lambda/sqrt(n)),
      col="blue", lwd=1.5)
abline(v=1/lambda, col="red", lwd=2)
# right: distribution of raw exponentials, with the exponential density in blue
hist(texp, probability=TRUE, breaks=30,
     main="Random Exponentials",
     xlab=NULL, cex.main=.85)
xe <- seq(min(texp), max(texp), length.out=100)
lines(xe, dexp(xe, lambda), col="blue", lwd=1.5)
abline(v=1/lambda, col="red", lwd=1.5)
par(op)
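The histograms compare shapes on the original scale. A complementary check normalizes the averages as in the Central Limit Theorem, (average − μ)/(σ/√n) with μ = σ = 1/λ, and compares the result with a standard normal. The following is a sketch using an arbitrary seed and the illustrative names `sim_avgs` and `z`. It uses a normal Q-Q plot, which is not part of the original analysis.

```r
# Normalize simulated averages and compare with a standard normal
set.seed(4)
lambda <- 0.2; n <- 40; nosim <- 1000
sim_avgs <- replicate(nosim, mean(rexp(n, lambda)))  # illustrative name

# z = (average - mu) / (sigma / sqrt(n)), with mu = sigma = 1/lambda
z <- (sim_avgs - 1 / lambda) / (1 / lambda / sqrt(n))

round(c(mean = mean(z), sd = sd(z)), 2)   # should be near 0 and 1

# normal Q-Q plot: points near the reference line indicate approximate normality
qqnorm(z, main = "Normal Q-Q plot of normalized averages")
qqline(z, col = "red")
```

With n = 40 some right skew from the exponential may remain visible in the upper tail of the Q-Q plot. The points should still lie close to the line.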