Part 1: Simulation Exercise

Overview: In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
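The theoretical values that the simulations are compared against follow directly from the mean and standard deviation stated above. Averaging n independent draws leaves the mean unchanged and divides the variance by n, so for n = 40 and lambda = 0.2:

```latex
X_1,\dots,X_{40} \overset{\text{iid}}{\sim} \operatorname{Exp}(\lambda), \quad \lambda = 0.2, \quad \bar X = \tfrac{1}{40}\sum_{i=1}^{40} X_i

E[\bar X] = \frac{1}{\lambda} = 5

\operatorname{Var}(\bar X) = \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625

\operatorname{SD}(\bar X) = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791
```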

set.seed(12345) #to reproduce simulations
lambda <- 0.2 #set lambda for all simulations
#the distribution of 1000 averages of 40 exponentials
avg <- numeric(1000) #preallocate instead of growing the vector with c() on every iteration
for (i in 1:1000) avg[i] <- mean(rexp(40, lambda))

This code runs 1000 simulations. Each one draws 40 exponentials with rate lambda = 0.2 and stores their mean in avg, so avg holds the distribution of 1000 averages of 40 exponentials.
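A loop is not strictly needed. As a sketch, the same kind of simulation can be vectorised by generating all 1000 * 40 draws in one call, arranging them as a 1000 x 40 matrix with one simulation per row, and taking row means. The name avg_vec is illustrative. Because byrow = TRUE puts 40 consecutive draws in each row, this is expected to reproduce the loop's values under the same seed, but that is worth confirming with all.equal() rather than assuming.

```r
set.seed(12345)  # same seed as the loop version
lambda <- 0.2

# one row per simulation, 40 consecutive exponential draws per row
draws <- matrix(rexp(1000 * 40, lambda), nrow = 1000, ncol = 40, byrow = TRUE)
avg_vec <- rowMeans(draws)  # mean of each simulation

length(avg_vec)   # number of simulated averages
summary(avg_vec)  # quick look at the sampling distribution
```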

1. Show the sample mean and compare it to the theoretical mean of the distribution.

mean(avg) 
## [1] 4.971972
print(paste("mean based on simulations =", round(mean(avg),2)))
## [1] "mean based on simulations = 4.97"
t_mean <- 1/lambda 
print(paste("theoretical mean = ", round(t_mean,2)))
## [1] "theoretical mean =  5"
#histogram of the simulated averages, with the sample mean in red and the theoretical mean in blue
hist(avg, xlab = "mean", main="Distribution of 1000 Means of 40 Exponentials", col="light gray")
abline(v=mean(avg), col="red", lwd = 8)
abline(v=t_mean, col="blue", lwd=3)

The mean of the 1000 simulated averages (4.97) is very close to the theoretical mean 1/lambda = 5.
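"Very close" can be made a little more precise. A minimal sketch: treat the 1000 averages as a sample, build a normal-approximation 95% confidence interval for their mean, and check whether the theoretical mean 1/lambda falls inside it. The names se and ci are illustrative, and replicate() draws the averages in the same order as the loop above.

```r
set.seed(12345)
lambda <- 0.2
avg <- replicate(1000, mean(rexp(40, lambda)))  # same simulation as the loop

# standard error of the mean of the 1000 simulated averages
se <- sd(avg) / sqrt(length(avg))
# approximate 95% confidence interval for the mean of the sampling distribution
ci <- mean(avg) + c(-1, 1) * qnorm(0.975) * se
ci
# TRUE if the theoretical mean 1/lambda lies inside the interval
ci[1] <= 1 / lambda && 1 / lambda <= ci[2]
```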

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

var(avg)
## [1] 0.5954369
print(paste("variance based on simulations =", round(var(avg),2)))
## [1] "variance based on simulations = 0.6"
t_var <- (1/lambda)^2/40
print(paste("theoretical variance =", round(t_var,2)))
## [1] "theoretical variance = 0.62"

The variance of the simulated averages (0.60) is close to the theoretical variance (1/lambda)^2/40 = 0.625 (printed as 0.62 after rounding).
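The same comparison can be made on the standard deviation scale, which is in the same units as the averages themselves. This sketch computes the theoretical standard deviation (1/lambda)/sqrt(40) and the relative difference between the simulated and theoretical variance; t_sd is an illustrative name.

```r
set.seed(12345)
lambda <- 0.2
avg <- replicate(1000, mean(rexp(40, lambda)))

# theoretical standard deviation of a mean of 40 exponentials
t_sd <- (1 / lambda) / sqrt(40)
c(simulated = sd(avg), theoretical = t_sd)

# relative difference between the simulated and theoretical variance
(var(avg) - t_sd^2) / t_sd^2
```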

3. Show that the distribution is approximately normal.

#overlay a normal curve on the histogram of simulated averages
h <- hist(avg, breaks = 60, xlab = "mean", main="Normal Curve on Histogram of Simulated Means")
xfit <- seq(min(avg), max(avg), length.out = 100)
yfit <- dnorm(xfit, mean=mean(avg), sd=sd(avg))
#convert the density to expected counts: multiply by bin width and number of averages
yfit <- yfit*diff(h$mids[1:2])*length(avg)
lines(xfit, yfit, col="purple", lwd=2)
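Multiplying by the bin width and length(avg) converts the normal density into expected counts per bin. A minimal alternative sketch plots the histogram on the density scale (freq = FALSE) instead, so the normal density can be drawn with curve() without rescaling. The dashed blue curve, added for comparison, uses the theoretical mean 1/lambda and standard deviation (1/lambda)/sqrt(40) rather than the sample estimates.

```r
set.seed(12345)
lambda <- 0.2
avg <- replicate(1000, mean(rexp(40, lambda)))

# histogram on the density scale: bar areas sum to 1, so no rescaling is needed
h <- hist(avg, breaks = 60, freq = FALSE, xlab = "mean",
          main = "Normal Curve on Histogram of Simulated Means (density scale)")
# normal density using the simulated mean and standard deviation
curve(dnorm(x, mean = mean(avg), sd = sd(avg)), add = TRUE, col = "purple", lwd = 2)
# normal density implied by theory: mean 1/lambda, sd (1/lambda)/sqrt(40)
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(40)),
      add = TRUE, col = "blue", lty = 2, lwd = 2)
```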

#q-q plot 
qqnorm(avg)
qqline(avg, col = "magenta", lwd=2)
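A numeric complement to the plots is the 68-95-99.7 rule: if the averages are approximately normal, about 68.3%, 95.4% and 99.7% of them should lie within 1, 2 and 3 standard deviations of their mean. This sketch compares those fractions for the simulated averages; within and expected are illustrative names.

```r
set.seed(12345)
lambda <- 0.2
avg <- replicate(1000, mean(rexp(40, lambda)))

m <- mean(avg)
s <- sd(avg)
# fraction of simulated averages within 1, 2 and 3 standard deviations of their mean
within <- sapply(1:3, function(k) mean(abs(avg - m) <= k * s))
# normal-theory fractions for comparison (about 0.683, 0.954, 0.997)
expected <- pnorm(1:3) - pnorm(-(1:3))
round(rbind(simulated = within, normal = expected), 3)
```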

The histogram closely follows the overlaid normal curve, and the points in the Q-Q plot lie close to a straight line, so the distribution of the 1000 averages is approximately normal.
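"Approximately" normal can be quantified with skewness. A single exponential has skewness 2, while the mean of n iid exponentials has skewness 2/sqrt(n), about 0.32 for n = 40. The remaining asymmetry is small but nonzero. This sketch writes out the sample skewness by hand (the helper skew is illustrative) so that no extra package is needed.

```r
set.seed(12345)
lambda <- 0.2
avg <- replicate(1000, mean(rexp(40, lambda)))

# sample skewness: third central moment divided by the cubed standard deviation
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# simulated skewness vs. theory for a mean of 40 exponentials and for one exponential
c(simulated = skew(avg), theoretical = 2 / sqrt(40), single_exponential = 2)
```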

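This illustrates the Central Limit Theorem: for large n, the distribution of averages of iid variables is approximately normal with mean mu and variance sigma^2/n. Equivalently, the standardized average (Xbar - mu)/(sigma/sqrt(n)) approaches the standard normal distribution as n increases.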
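The standardized form of the statement can be checked directly. In this sketch, each average is standardized with the theoretical mean and standard error (mu = sigma = 1/lambda for the exponential). The resulting z values should have mean near 0 and standard deviation near 1, and their Q-Q plot should follow the identity line rather than only some straight line. The variable names n and z are illustrative.

```r
set.seed(12345)
lambda <- 0.2
n <- 40
avg <- replicate(1000, mean(rexp(n, lambda)))

# z = (xbar - mu) / (sigma / sqrt(n)), with mu = sigma = 1/lambda
z <- (avg - 1 / lambda) / ((1 / lambda) / sqrt(n))
c(mean = mean(z), sd = sd(z))  # expected to be close to 0 and 1

# compare the standardized averages with the standard normal directly
qqnorm(z)
abline(0, 1, col = "blue", lwd = 2)  # identity line: standard normal quantiles
```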