Overview: In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials, using 1000 simulations.
set.seed(12345) #to reproduce simulations
lambda <- .2 #set lambda for all simulations
# distribution of 1000 averages of 40 exponentials
avg <- numeric(1000)  # preallocate instead of growing the vector with c()
for (i in 1:1000) avg[i] <- mean(rexp(40, lambda))
This code runs 1000 simulations. Each simulation draws 40 exponentials with rate lambda = 0.2 and records their average, which produces the sampling distribution of the mean.
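The same distribution can also be built without a loop. This is a minimal vectorized sketch: it draws all 40,000 exponentials at once and fills a 1000 x 40 matrix row by row, so that each row is one simulation. The names `n`, `nsim`, `draws`, `avg_vec`, and `avg_loop` are illustrative and do not come from the original analysis. `rexp()` appears to consume the random stream one draw at a time. If so, filling the matrix with `byrow = TRUE` should reproduce the loop's averages up to floating-point rounding. The last line checks this rather than assuming it.

```r
set.seed(12345)                 # same seed as the loop version
lambda <- 0.2
n <- 40                         # exponentials per average (illustrative name)
nsim <- 1000                    # number of simulated averages (illustrative name)

# each row holds one simulation's 40 draws; byrow = TRUE keeps the draws in loop order
draws <- matrix(rexp(nsim * n, lambda), nrow = nsim, byrow = TRUE)
avg_vec <- rowMeans(draws)      # one average per row

# rebuild the loop-based averages from the same seed for comparison
set.seed(12345)
avg_loop <- numeric(nsim)
for (i in seq_len(nsim)) avg_loop[i] <- mean(rexp(n, lambda))
all.equal(avg_vec, avg_loop)
```

The matrix form trades memory (all 40,000 draws held at once) for speed. At this size that cost is negligible.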
mean(avg)
## [1] 4.971972
print(paste("mean based on simulations =", round(mean(avg),2)))
## [1] "mean based on simulations = 4.97"
t_mean <- 1/lambda
print(paste("theoretical mean =", round(t_mean,2)))
## [1] "theoretical mean = 5"
#histogram of distribution with sample mean in red and theoretical mean in blue
hist(avg, xlab = "mean", main="Exponential Distribution from Simulations", col="light gray")
abline(v=mean(avg), col="red", lwd = 8)
abline(v=t_mean, col="blue", lwd=3)
The mean of the simulated averages (4.97) is very close to the theoretical mean (5.00).
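To put "very close" on a scale, you can express the gap between the simulated and theoretical means in standard errors of `mean(avg)`. This is a minimal sketch that rebuilds `avg` and `t_mean` with the same seed so that the block runs on its own. The names `se` and `z` are illustrative. Using the variance reported in this document (about 0.595 over 1000 simulations), the standard error is roughly 0.024. The gap of about 0.03 is therefore on the order of one standard error, which is ordinary sampling noise.

```r
set.seed(12345)
lambda <- 0.2
avg <- replicate(1000, mean(rexp(40, lambda)))  # same simulation as above
t_mean <- 1 / lambda

se <- sd(avg) / sqrt(length(avg))   # standard error of mean(avg)
z <- (mean(avg) - t_mean) / se      # gap between means, measured in standard errors
round(c(se = se, z = z), 3)
```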
var(avg)
## [1] 0.5954369
print(paste("variance based on simulations =", round(var(avg),2)))
## [1] "variance based on simulations = 0.6"
t_var <- (1/lambda)^2/40
print(paste("theoretical variance =", round(t_var,2)))
## [1] "theoretical variance = 0.62"
The variance of the simulated averages (0.60) is close to the theoretical variance of the sample mean (0.62).
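The theoretical variance in `t_var` follows from independence. The variance of a mean of n iid draws is the single-draw variance divided by n. For the exponential, the single-draw variance is (1/lambda)^2. With lambda = 0.2 and n = 40:

```latex
\operatorname{Var}(\bar X_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
```

Rounded to two decimals, this is the 0.62 printed above.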
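# overlay a normal density curve on the histogram
h <- hist(avg, breaks = 60, xlab = "mean", main = "Normal Curve on Exponential Histogram")
xfit <- seq(min(avg), max(avg), length.out = 40)        # x grid for the curve
yfit <- dnorm(xfit, mean = mean(avg), sd = sd(avg))     # normal density using the sample mean and sd
yfit <- yfit * diff(h$mids[1:2]) * length(avg)          # rescale density to counts: bin width * number of averages
lines(xfit, yfit, col = "purple", lwd = 2)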
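Multiplying `dnorm()` by the bin width and the number of averages converts a density into expected counts, so the curve matches a count histogram. A minimal alternative plots the histogram on the density scale with `freq = FALSE` and adds curves with `curve(..., add = TRUE)`, which removes the rescaling step. The second, dashed curve is the theoretical normal with mean 1/lambda and variance (1/lambda)^2/40 that the CLT predicts. It is an addition for comparison and is not part of the original analysis.

```r
set.seed(12345)
lambda <- 0.2
avg <- replicate(1000, mean(rexp(40, lambda)))  # same simulation as above

# histogram on the density scale: bar areas sum to 1
h <- hist(avg, breaks = 60, freq = FALSE, xlab = "mean",
          main = "Normal Curves on Exponential Histogram")

# normal curve fitted to the simulated averages
curve(dnorm(x, mean = mean(avg), sd = sd(avg)), add = TRUE, col = "purple", lwd = 2)

# theoretical normal from the CLT: mean 1/lambda, sd (1/lambda)/sqrt(40)
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(40)),
      add = TRUE, col = "blue", lwd = 2, lty = 2)
```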
#q-q plot
qqnorm(avg)
qqline(avg, col = "magenta", lwd=2)
The normal curve overlaid on the histogram follows the bars closely, and the points in the Q-Q plot fall nearly on a straight line. Both plots indicate that the distribution of averages is approximately normal.
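This demonstrates the Central Limit Theorem: as n increases, the distribution of averages of iid variables with finite variance approaches a normal distribution with mean mu and variance sigma^2/n. Equivalently, the standardized average (mean - mu)/(sigma/sqrt(n)) approaches the standard normal distribution.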
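The standardized form of the theorem can be checked directly. This minimal sketch standardizes each simulated average with the theoretical mean and standard error, then compares the result with N(0, 1). It rebuilds `avg` with the same seed, and `z_avg` is an illustrative name. If the normal approximation is good, the standardized averages should have a mean near 0 and a standard deviation near 1, and about 95% of them should fall within plus or minus 1.96.

```r
set.seed(12345)
lambda <- 0.2
n <- 40
avg <- replicate(1000, mean(rexp(n, lambda)))   # same simulation as above

# standardize with the theoretical mean 1/lambda and standard error (1/lambda)/sqrt(n)
z_avg <- (avg - 1 / lambda) / ((1 / lambda) / sqrt(n))

c(mean = mean(z_avg), sd = sd(z_avg))  # should be close to 0 and 1
mean(abs(z_avg) < qnorm(0.975))        # share inside the central 95% band of N(0, 1)
```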