Overview

The purpose of this study is to compare the distribution of averages of exponential random variables with the normal distribution predicted by the Central Limit Theorem. We simulate the average of forty exponentials one thousand times, then compare the simulated averages with theory by examining their mean, variance, and distribution shape.


Simulation

# Load Library Packages
library(stats)
library(ggplot2)
library(pastecs)
library(knitr)
# Simulate the average of forty exponentials a thousand times
set.seed(1234)                          # set seed for reproducibility
n <- 40                                 # sample size: exponentials averaged per simulation
s <- 1000                               # number of simulations
lambda <- 0.2                           # rate parameter

mns <- data.frame(mean = numeric(s))    # create a blank data frame to hold the averages

for (i in 1:s) {
    mns[i, 1] <- mean(rexp(n, lambda))
}

# Show various statistics for mns
options(scipen = 999)
kable(stat.desc(mns$mean), digits = 3)
nbr.val 1000.000
nbr.null 0.000
nbr.na 0.000
min 3.170
max 7.390
range 4.220
sum 4974.239
median 4.938
mean 4.974
SE.mean 0.024
CI.mean.0.95 0.047
var 0.571
std.dev 0.755
coef.var 0.152

Analysis: Here we have 1000 simulated means, each the average of forty exponentials. The mean (4.974) and median (4.938) of these simulated means are both close to 5.
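
The loop above can also be written without explicit indexing. The following is a minimal sketch of one vectorized alternative, not the approach used in this report: it draws all n * s exponentials at once, arranges them into a matrix with one simulated sample per row, and averages each row. The names `draws` and `sim_means` are illustrative and do not appear elsewhere in the report.

```r
# Vectorized sketch of the same simulation (illustrative names, not from the report)
set.seed(1234)                           # same seed as the report
n <- 40                                  # exponentials per sample
s <- 1000                                # number of simulated samples
lambda <- 0.2                            # rate parameter

# One row per simulated sample, one column per exponential draw
draws <- matrix(rexp(n * s, rate = lambda), nrow = s, ncol = n, byrow = TRUE)

# Average each row to get s sample means
sim_means <- rowMeans(draws)

length(sim_means)
summary(sim_means)
```

Filling the matrix by row keeps the draws in the same order as the loop, so with the same seed the results should match the loop's up to floating-point rounding. That equivalence is an assumption worth checking with all.equal() before relying on it.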


1. Show the sample mean and compare it to the theoretical mean of the distribution.

# Theoretical mean
theoMean <- 1 / lambda
theoMean
## [1] 5
# Sample mean
samMean <- mean(mns$mean)
samMean
## [1] 4.974239
# Chart a histogram of the simulations with the theoretical and sample means displayed
m <- ggplot(mns, aes(x = mean))
m + geom_histogram(binwidth = .25, color = "black", fill = "#59ABE3") +
    ggtitle("Distribution of Sample Means") +
    geom_vline(aes(xintercept = samMean), color = "#D91E18", linetype = "dashed", 
               size = 1) + # red dashed-line
    geom_vline(aes(xintercept = theoMean), color = "#1E824C", linetype = "solid", 
               size = 1) + # green solid line
    annotate("text", x = samMean - 0.65, y = 135, label = "Sample Mean", color = "#D91E18", 
             fontface = "bold") +
    annotate("text", x = theoMean + 0.82, y = 135, label = "Theoretical Mean", color = "#1E824C", 
             fontface = "bold") +
    scale_x_continuous(breaks = seq(3, 7, 1))

Analysis: The theoretical mean is exactly 5, while the mean of our simulated sample means is 4.974. Overlaying both on the histogram shows that they nearly coincide and sit at the center of the distribution. This is what we expect: the sample mean is an unbiased estimator of the population mean, and the Central Limit Theorem predicts that the distribution of sample means is centered at the theoretical mean.
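
One way to put a number on "close" is a confidence interval for the mean of the simulated averages. The sketch below re-runs the simulation so that it is self-contained and then uses t.test() from base R. The variable names `sim_means`, `theo_mean` and `ci` are assumptions made for this illustration.

```r
# Self-contained sketch: 95% confidence interval for the mean of the simulated averages
set.seed(1234)
n <- 40
s <- 1000
lambda <- 0.2

# Same simulation as in the report, written with vapply()
sim_means <- vapply(seq_len(s), function(i) mean(rexp(n, lambda)), numeric(1))

theo_mean <- 1 / lambda                  # theoretical mean of the exponential
ci <- t.test(sim_means)$conf.int         # 95% interval by default

ci
theo_mean >= ci[1] && theo_mean <= ci[2] # TRUE if the interval covers the theoretical mean
```

If the interval covers 1 / lambda, the gap between the sample and theoretical means is within ordinary simulation noise.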


2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

# Theoretical variance of the mean of n exponentials: sigma^2 / n, where sigma = 1 / lambda
theoVar <- (1 / lambda)^2 / n
theoVar
## [1] 0.625
# Sample Variance
samVar <- var(mns$mean)
samVar
## [1] 0.5706551

Analysis: The theoretical variance of the mean of forty exponentials is 0.625, while the variance of our simulated means is 0.571. These are not as close as the theoretical and sample means, but the gap is plausible for 1000 simulations. With more simulations, the sample variance should tend to move closer to the theoretical variance.
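
The effect of running more simulations can be checked directly. The sketch below, which is an illustration rather than part of the original analysis, repeats the simulation for several simulation counts and reports the variance of the sample means next to the theoretical value (1 / lambda)^2 / n. The helper `var_of_means()` and the chosen counts are hypothetical.

```r
# Sketch: variance of the sample means for increasing numbers of simulations
set.seed(1234)
n <- 40
lambda <- 0.2
theo_var <- (1 / lambda)^2 / n           # sigma^2 / n = 25 / 40

# Hypothetical helper: simulate s sample means and return their variance
var_of_means <- function(s) {
    draws <- matrix(rexp(n * s, rate = lambda), nrow = s, byrow = TRUE)
    var(rowMeans(draws))
}

sim_counts <- c(1000, 10000, 100000)     # illustrative simulation counts
sim_vars <- vapply(sim_counts, var_of_means, numeric(1))

data.frame(simulations = sim_counts,
           sample_var  = sim_vars,
           abs_error   = abs(sim_vars - theo_var))
```

Any single run can deviate, so the error need not shrink at every step, but it should tend to shrink roughly in proportion to 1 / sqrt(s).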


3. Show that the distribution is approximately normal.

# Chart the distribution
g <- ggplot(mns, aes(x = mean))
g + geom_histogram(aes(y = ..density.., fill = ..count..), binwidth = .5, color = "black") +
    scale_fill_gradient("Count", low = "#DCDCDC", high = "#7C7C7C") +   # fill is mapped to the bin count
    ggtitle("Sample Distribution vs Normal Distribution") +
    geom_density() +
    stat_function(fun = dnorm, args = list(mean = samMean, sd = sd(mns$mean)), 
                  color = "red") +
    annotate("text", x = 6.8, y = .55, label = "Distr. of Sample Means", 
             fontface = "bold") +
    annotate("text", x = 6.65, y = .52, label = "Theoretical Density", color = "red", 
             fontface = "bold") +
    scale_x_continuous(breaks = seq(3, 8, 1))

Analysis: In the chart above, the black line is the kernel density estimate of the sample means, while the red line is a normal density with the same mean and standard deviation as the sample means. Given how close the black line is to the red line, it is safe to say that the distribution of sample means is approximately normal.
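
A numeric complement to the chart is to compare quantiles of the simulated means with the quantiles of the normal distribution that the Central Limit Theorem predicts, with mean 1 / lambda and standard deviation (1 / lambda) / sqrt(n). This is a minimal sketch; the choice of probabilities and the names below are assumptions for illustration.

```r
# Sketch: empirical quantiles of the sample means vs. CLT-predicted normal quantiles
set.seed(1234)
n <- 40
s <- 1000
lambda <- 0.2

sim_means <- vapply(seq_len(s), function(i) mean(rexp(n, lambda)), numeric(1))

probs <- c(0.05, 0.25, 0.50, 0.75, 0.95) # illustrative probabilities
empirical <- quantile(sim_means, probs = probs)
predicted <- qnorm(probs, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n))

quantile_table <- data.frame(prob       = probs,
                             empirical  = unname(empirical),
                             predicted  = predicted,
                             difference = unname(empirical) - predicted)
quantile_table
```

Small differences across the table support approximate normality, while differences that grow in the tails would point to residual skew inherited from the exponential. Calling qqnorm(sim_means) followed by qqline(sim_means) gives a graphical version of the same comparison.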


Conclusion

Does the distribution of means of 40 exponentials behave as predicted by the Central Limit Theorem?

The Central Limit Theorem states that the distribution of the mean of iid variables with finite variance approaches a normal distribution as the sample size increases; after standardizing, it approaches the standard normal distribution. We showed that the theoretical mean (5) and sample mean (4.974) are very close together, and that the theoretical variance (0.625) and sample variance (0.571) are fairly close. In the chart “Sample Distribution vs Normal Distribution” above, the black line (sample distribution) closely follows the red line (normal distribution). Given all this, it is safe to conclude that the distribution of means of 40 exponentials behaves as predicted by the Central Limit Theorem.
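
For reference, the Central Limit Theorem as it applies to this simulation can be written out as follows. The notation ($X_i$, $\bar{X}_n$, $\mu$, $\sigma$) is introduced here for illustration and is not used elsewhere in the report.

```latex
\begin{gather*}
X_1, \dots, X_n \ \text{iid} \sim \operatorname{Exp}(\lambda), \qquad
\mu = \frac{1}{\lambda}, \qquad \sigma = \frac{1}{\lambda} \\
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad
\operatorname{E}[\bar{X}_n] = \mu, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n} \\
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)
\quad \text{as } n \to \infty \\
\lambda = 0.2,\ n = 40: \qquad
\mu = 5, \qquad \frac{\sigma^2}{n} = \frac{25}{40} = 0.625
\end{gather*}
```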