Overview

In this report we investigate the exponential distribution in R and compare the behaviour of its sample averages with the predictions of the Central Limit Theorem (CLT). An exponential distribution with rate lambda has theoretical mean 1/lambda and theoretical standard deviation 1/lambda; we use lambda = 0.2 throughout the simulation. We simulate the mean of 40 exponentials 1000 times, compare the sample mean and variance of these averages with their theoretical values, and show that their distribution is approximately normal.
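Plugging lambda = 0.2 and n = 40 into the standard formulas gives the theoretical targets used later in the report. For an average of n iid exponentials, the mean is unchanged and the variance shrinks by a factor of n:

```latex
\begin{aligned}
E[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad \operatorname{SD}(X) = \frac{1}{\lambda} = 5,\\
E[\bar{X}_n] &= \frac{1}{\lambda} = 5, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625, \qquad
\operatorname{SE}(\bar{X}_n) = \sqrt{0.625} \approx 0.791.
\end{aligned}
```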

Simulations

We first define the variables used in the simulation: the number of exponentials in each average, the number of simulations, and the rate lambda.
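n <- 40        # number of exponentials in each average
nosim <- 1000  # number of simulated averages
lambda <- 0.2  # rate parameter of the exponential distribution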
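As a quick sanity check on parameterization, note that R's `rexp()` takes the rate lambda (not the mean) as its second argument. The sketch below draws a large sample and compares its mean and standard deviation with 1/lambda; the seed and the sample size of 100000 are arbitrary choices for illustration.

```r
# rexp(n, rate) parameterizes the exponential by its rate, so rate = lambda
# should give a mean and a standard deviation of about 1/lambda = 5.
set.seed(2015)  # arbitrary seed, chosen only for reproducibility
lambda <- 0.2
big <- rexp(1e5, rate = lambda)
print(c(mean = mean(big), sd = sd(big), theory = 1 / lambda))
```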
We then simulate the mean of 40 exponentials, repeat this 1000 times, and store the resulting averages in a data frame for the analysis that follows.
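# Each element of xmean is the mean of n = 40 draws from Exp(rate = lambda)
xmean <- replicate(nosim, mean(rexp(n, lambda)))
# Label every average with its sample size (used as the histogram fill)
size <- factor(rep(n, nosim))
dat <- data.frame(xmean, size)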
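The simulation above sets no random seed, so the numbers reported below will change slightly from run to run. A minimal reproducible variant, assuming an arbitrary seed of 1234 (not part of the original analysis), looks like this:

```r
# Same simulation as above, with a seed so results can be reproduced exactly.
set.seed(1234)  # arbitrary seed (assumption; the original analysis set none)
n <- 40
nosim <- 1000
lambda <- 0.2
xmean <- replicate(nosim, mean(rexp(n, lambda)))
dat <- data.frame(xmean, size = factor(rep(n, nosim)))
str(dat)
```

Re-running `set.seed(1234)` followed by the same `replicate()` call returns exactly the same averages.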

Sample Mean versus Theoretical Mean

As noted above, the theoretical mean is 1/lambda. We compare it with the mean of the 1000 simulated averages.
tmean <- 1/lambda
print(tmean)
## [1] 5
smean <- mean(dat$xmean)
print(smean)
## [1] 5.003922
The two means are nearly identical (5 versus 5.003922). In the supporting figure at the end of this analysis, the theoretical mean is shown as a vertical black line and the sample mean as a vertical red line, and the two lines nearly coincide. This is consistent with the CLT, which states that for a large sample size the distribution of averages of iid variables is approximately normal and centred on the population mean (equivalently, the standardized averages are approximately standard normal). Here it holds for the exponential distribution, even though the exponential itself is strongly skewed.
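How close is "nearly identical"? The grand mean of 1000 simulated averages is itself a random quantity, with a Monte Carlo standard error of sqrt(0.625 / 1000) = 0.025. With the reported sample mean of 5.003922, the gap from 5 is about 0.16 standard errors, well within simulation noise. The sketch below computes this standard error and the standardized gap for a fresh, seeded simulation; the seed and the "|z| below about 2" rule of thumb are assumptions for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 40
nosim <- 1000
lambda <- 0.2
xmean <- replicate(nosim, mean(rexp(n, lambda)))
tmean <- 1 / lambda
# Monte Carlo standard error of the grand mean of nosim simulated averages:
# each average has variance (1/lambda)^2 / n, and we average nosim of them.
mc_se <- sqrt((1 / lambda)^2 / n / nosim)
# Standardized gap between the simulated and theoretical means;
# |z| below about 2 is consistent with pure simulation noise.
z <- (mean(xmean) - tmean) / mc_se
print(c(mc_se = mc_se, z = z))
```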

Sample Variance versus Theoretical Variance

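As noted above, the standard deviation of a single exponential is also 1/lambda, so its variance is (1/lambda)^2. The variance of an average of n iid exponentials is this variance divided by n, that is (1/lambda)^2 / n, which we compare with the sample variance of the simulated averages.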
tvar <- ((1/lambda)^2)/n
print(tvar)
## [1] 0.625
svar <- var(dat$xmean)
print(svar)
## [1] 0.6006893
The two variances are close (0.625 versus 0.6006893). This again agrees with the CLT prediction for averages of exponentials at this sample size. In the supporting figure at the end of this analysis, the theoretical normal curve nearly overlays the density plot of the simulated averages, which shows qualitatively that their spreads are similar.
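The theoretical variance (1/lambda)^2 / n predicts that quadrupling n should cut the variance of the averages by a factor of four. The sketch below checks this by repeating the simulation for a few sample sizes; the choice of n = 10, 40, 160 and the seed are illustrative assumptions, not part of the original analysis.

```r
set.seed(7)  # arbitrary seed for reproducibility
lambda <- 0.2
nosim <- 1000
sizes <- c(10, 40, 160)  # illustrative sample sizes (assumption)
# Empirical variance of nosim averages, for each sample size n
svars <- sapply(sizes, function(n) var(replicate(nosim, mean(rexp(n, lambda)))))
# Theoretical variance of an average: sigma^2 / n with sigma = 1/lambda
tvars <- (1 / lambda)^2 / sizes
print(data.frame(n = sizes, simulated = svars, theoretical = tvars,
                 ratio = svars / tvars))
```

Ratios close to 1 for every n indicate that the 1/n scaling, not just the single value at n = 40, matches the simulation.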

Distribution

The histogram of the simulated averages is shown below. The black curve is the theoretical normal density with mean 1/lambda and variance (1/lambda)^2 / n, and the red dashed curve is a density estimate of the simulated averages. The vertical black and red lines mark the theoretical and sample means. Since the two curves are very close, we conclude that the distribution of averages is approximately normal.
library(ggplot2)
ggplot(dat, aes(x = xmean, fill = size)) +
  # Histogram of the simulated averages, scaled to a density
  geom_histogram(aes(y = after_stat(density)), alpha = 0.5, binwidth = 0.2, colour = "black") +
  # Theoretical normal curve: mean 1/lambda, variance (1/lambda)^2 / n
  stat_function(fun = dnorm, args = list(mean = tmean, sd = sqrt(tvar))) +
  # Density estimate of the simulated averages (not of fresh normal draws)
  geom_density(aes(x = xmean), inherit.aes = FALSE, linetype = "dashed", colour = "red") +
  # Theoretical mean (black) and sample mean (red)
  geom_vline(xintercept = tmean) +
  geom_vline(xintercept = smean, colour = "red")
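A normal Q-Q plot is a complementary check that is often easier to read than overlaid curves: if the averages are approximately normal, the points fall close to a straight line. The sketch below draws the plot with base R's `qqnorm()` and `qqline()` and summarizes its straightness by the correlation between sample and normal quantiles; the seed is an arbitrary assumption.

```r
set.seed(99)  # arbitrary seed for reproducibility
n <- 40
nosim <- 1000
lambda <- 0.2
xmean <- replicate(nosim, mean(rexp(n, lambda)))
# qqnorm() draws the plot and invisibly returns the theoretical (x)
# and sample (y) quantiles it plotted.
qq <- qqnorm(xmean, main = "Normal Q-Q plot of simulated averages")
qqline(xmean, col = "red")
# Correlation near 1 means a nearly straight Q-Q plot, i.e. approximate normality
print(cor(qq$x, qq$y))
```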

To show the CLT at work, we first plot the histogram of 40 random exponentials, which is clearly nowhere near a normal distribution: it is strongly right-skewed.
hist(rexp(n, lambda))

Now compare it with the histogram of 1000 averages of 40 random exponentials, which is much closer to a normal distribution.
mns <- replicate(nosim, mean(rexp(n, lambda)))  # 1000 averages of 40 exponentials
hist(mns)
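The contrast between the two histograms can also be put into numbers with skewness. A single exponential has skewness 2, and the skewness of an average of n iid variables is the original skewness divided by sqrt(n), here 2 / sqrt(40), or about 0.32. The helper `skewness()` below is a hypothetical function written for this sketch (moment-based, not from any package), and the seed is arbitrary.

```r
# Hypothetical helper: moment-based sample skewness,
# i.e. third central moment divided by the cubed (population) SD.
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

set.seed(2024)  # arbitrary seed for reproducibility
n <- 40
nosim <- 1000
lambda <- 0.2
raw <- rexp(1e5, lambda)                        # many single exponentials
mns <- replicate(nosim, mean(rexp(n, lambda)))  # 1000 averages of 40 exponentials
print(c(raw = skewness(raw), averages = skewness(mns), theory_averages = 2 / sqrt(n)))
```

The raw draws should show skewness near 2, while the averages should be much closer to the symmetric value 0 of a normal distribution.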

Conclusion

The Central Limit Theorem states that, as the sample size grows, the distribution of averages of iid variables approaches a normal distribution. Our simulation demonstrates that this holds for averages of 40 exponentials, even though the exponential distribution itself is strongly skewed.