Comparing the R Exponential Distribution with Central Limit Theorem (CLT)

by Sandy Sng
15 May 2018

Overview

In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem. Simulated samples are used to illustrate the properties of the distribution of the mean of 40 exponentials in the following ways:

  1. Show the sample mean and compare it to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show that the distribution is approximately normal.

Simulations

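The exponential distribution can be simulated in R with rexp(n, lambda), where n is the number of observations to draw and lambda is the rate parameter. Both the mean and the standard deviation of the distribution are 1/lambda.

We will set lambda = 0.2 and investigate the distribution of averages of 40 exponentials over a thousand simulations.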

We start by simulating a thousand sets of 40 exponentials using lambda 0.2 and calculate the mean for each set.

lambda <- 0.2   # Set lambda
n <- 40         # Set 40 exponentials
nosim <- 1000   # Set no. of simulations to 1000

set.seed(123)   # Set seed for consistent simulations

# Create a 1000 x 40 matrix of exponentials from rexp(n, lambda)
simulatedData <- matrix(rexp(n*nosim, lambda), nrow = nosim, ncol = n)

# Calculate the average/mean of each row (40 exponentials) -- i.e. sample mean
simMeans <- apply(simulatedData, 1, mean)
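
As a quick sanity check on the simulated data, here is a minimal sketch that is not part of the original analysis. The names checkDraws and rowMeansAlt are illustrative. Pooling all 40,000 draws should give a mean and standard deviation both near 1/lambda, and rowMeans() is a vectorised alternative to apply(..., 1, mean).

```r
lambda <- 0.2
n <- 40
nosim <- 1000

set.seed(123)
simulatedData <- matrix(rexp(n * nosim, lambda), nrow = nosim, ncol = n)
simMeans <- apply(simulatedData, 1, mean)

# All 40,000 draws pooled: mean and sd should both be close to 1/lambda = 5
checkDraws <- as.vector(simulatedData)
c(mean = mean(checkDraws), sd = sd(checkDraws))

# rowMeans() computes the same row averages without an explicit apply()
rowMeansAlt <- rowMeans(simulatedData)
all.equal(simMeans, rowMeansAlt)
```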

Sample Mean versus Theoretical Mean

Show the sample mean and compare it to the theoretical mean of the distribution.

tm <- 1/lambda       # Calculate theoretical mean
sm <- mean(simMeans) # Calculate average sample mean
tm; sm               # Print both values
## [1] 5
## [1] 5.011911

The average sample mean (solid green line in the histogram) is 5.01, very close to the theoretical mean of 5 (dashed purple line).
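
To put the gap between the two means in context, one option is to express it in standard-error units. The sketch below is added for illustration and assumes the 1000 sample means are independent, so their average has standard error (1/lambda)/sqrt(n * nosim). The names seAvg and zGap are illustrative.

```r
lambda <- 0.2
n <- 40
nosim <- 1000

set.seed(123)
simulatedData <- matrix(rexp(n * nosim, lambda), nrow = nosim, ncol = n)
simMeans <- apply(simulatedData, 1, mean)

tm <- 1/lambda
sm <- mean(simMeans)

# Standard error of the average of nosim independent sample means
seAvg <- (1/lambda) / sqrt(n * nosim)

# Gap between simulated and theoretical mean, in standard-error units
zGap <- (sm - tm) / seAvg
c(gap = sm - tm, se = seAvg, z = zGap)
```

Here seAvg works out to 0.025, so the reported gap of roughly 0.012 is less than one standard error.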

Sample Variance versus Theoretical Variance

Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

The theoretical variance of the mean of n exponentials is the variance of the distribution, (1/lambda)^2, divided by the number of observations in each sample (n = 40):

tv <- ((1/lambda)^2)/n  # Calculate theoretical variance of the sample mean
sv <- var(simMeans)     # Calculate variance of the simulated sample means
tv; sv                  # Print both values
## [1] 0.625
## [1] 0.6088292

The variance of the sample means, 0.609, is very close to the theoretical variance of 0.625.
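
The 1/n scaling behind the theoretical variance can also be checked empirically. The sketch below is an illustration, not part of the original analysis. It repeats the simulation for a few sample sizes and compares the variance of the simulated means with (1/lambda)^2/n. The sizes 10 and 160 are arbitrary choices, and varOfMeans is a hypothetical helper.

```r
lambda <- 0.2
nosim <- 1000
set.seed(123)

# Variance of nosim simulated means, each taken over `size` exponentials
varOfMeans <- function(size) {
  draws <- matrix(rexp(size * nosim, lambda), nrow = nosim, ncol = size)
  var(rowMeans(draws))
}

sizes <- c(10, 40, 160)
comparison <- data.frame(
  n = sizes,
  simulated = sapply(sizes, varOfMeans),
  theoretical = (1/lambda)^2 / sizes
)
comparison
```

Because the random stream is consumed in a different order here, the n = 40 row is not expected to reproduce 0.6088 exactly. Only the agreement between the two columns matters.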

Distribution

Show that the distribution is approximately normal.
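
Normality can also be checked numerically. In this minimal sketch, which is not part of the original analysis, the sample means are standardised with the CLT mean and standard deviation. The share falling within 1.96 standard deviations should then be close to 95% if the distribution is approximately normal. The names zMeans, coverage and probs are illustrative.

```r
lambda <- 0.2
n <- 40
nosim <- 1000

set.seed(123)
simulatedData <- matrix(rexp(n * nosim, lambda), nrow = nosim, ncol = n)
simMeans <- apply(simulatedData, 1, mean)

# Standardise using the CLT values: mean 1/lambda, sd (1/lambda)/sqrt(n)
zMeans <- (simMeans - 1/lambda) / ((1/lambda) / sqrt(n))

# Share of standardised means inside +/- 1.96; about 0.95 under normality
coverage <- mean(abs(zMeans) < 1.96)
coverage

# Selected quantiles next to their standard normal counterparts
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
rbind(simulated = quantile(zMeans, probs), normal = qnorm(probs))
```

A visual version of the same check is qqnorm(simMeans) followed by qqline(simMeans).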

Conclusion

The histogram of the 1000 simulated means, each taken over 40 random exponentials, is roughly symmetric around the mean and bell-shaped. This approximately normal distribution of the mean of 40 exponentials is consistent with the Central Limit Theorem.
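
To see what the averaging contributes, it can help to contrast the skewness of the raw exponentials with that of the sample means. This sketch is added for illustration. sampleSkewness is a hypothetical helper implementing the simple moment estimator, since base R has no skewness function. An exponential distribution has skewness 2, while the mean of n independent exponentials has skewness 2/sqrt(n), roughly 0.32 for n = 40.

```r
lambda <- 0.2
n <- 40
nosim <- 1000

set.seed(123)
simulatedData <- matrix(rexp(n * nosim, lambda), nrow = nosim, ncol = n)
simMeans <- apply(simulatedData, 1, mean)

# Moment-based sample skewness (hypothetical helper, not from a package)
sampleSkewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

skewRaw <- sampleSkewness(as.vector(simulatedData))  # raw exponentials
skewMeans <- sampleSkewness(simMeans)                # means of 40 exponentials
c(raw = skewRaw, means = skewMeans, theory_means = 2 / sqrt(n))
```

The raw draws stay strongly right-skewed, while the means are far closer to symmetric, which is why the normal approximation is described as approximate rather than exact.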

Appendix

Code for Figure “Histogram of 1000 Means of 40 Sample Exponentials”, under Sample Mean versus Theoretical Mean


# Mean distribution of 1000 simulations 
hist(simMeans, 
     main = "Histogram of 1000 Means of 40 Sample Exponentials", 
     xlab = "Sample Means from 1000 Simulations", 
     ylab = "Frequency")

# Highlight the 2 means we are comparing
abline(v = sm, col = "green", lwd = 3, lty = 1)
abline(v = tm, col = "purple", lwd = 3, lty = 2)

# Add a legend
legend('topright', 
       c("Average Sample Mean","Theoretical Mean"), 
       col = c("green", "purple"), lwd = 3, lty = c(1,2), bty = "n")

Code for Figure “Distribution of Sample Means vs Theoretical Density of Exponentials”, under Distribution

# Density histogram of the 1000 sample means
hist(simMeans, prob = TRUE,
     main = "Distribution of Sample Means\nvs Theoretical Density of Exponentials",
     xlab = "Sample Means from 1000 Simulations",
     ylab = "Density")

# Normal density predicted by the CLT: mean 1/lambda, sd (1/lambda)/sqrt(n)
xfit <- seq(min(simMeans), max(simMeans), length.out = 100)
yfit <- dnorm(xfit, mean = 1/lambda, sd = (1/lambda)/sqrt(n))
lines(xfit, yfit, col = "orange", lwd = 3, lty = 1)

# Theoretical Mean - Orange line
abline(v = tm, col = 'orange', lwd = 1)

# Density of the simulated sample means - Blue line
lines(density(simMeans), col = "blue", lwd = 3, lty = 2)

# Legend
legend('topright', c("Simulation", "Theoretical"),
       col = c("blue", "orange"), lwd = 3, lty = c(2, 1))