This work investigates the exponential distribution in R and compares it with the Central Limit Theorem. We investigate the distribution of averages of 40 exponentials. In total we run 1000 simulations.

Simulate the data

n_sims = 1000
n_exp = 40
lambda = 0.2
data = as.data.frame(matrix(rexp(n_sims*n_exp, lambda), nrow = n_sims, ncol = n_exp))
means = rowMeans(data)
hist(means, breaks=20, col="thistle3", ylab="Density", xlab = "Means", main="Exponential distribution")

1. Show the sample mean and compare it to the theoretical mean of the distribution.

theoretical_mean = 1/lambda
sample_mean = mean(means)
hist(means, breaks=20, col="thistle3", ylab="Density", xlab = "Means", main="Exponential distribution")
abline(v = theoretical_mean, lty=2, col = "red", lwd=4)
abline(v = sample_mean, lty = 3, col = "blue", lwd=4)

Mean: theoretical - 5, sample 4.9772221
So we see that the real data converges very nicely to the theoretical value. The more simulations we take the more the actual value converges to the theoretical.

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

theoretical_variance = (1/lambda)^2/n_exp
sample_variance = var(means)
theoretical_sd = (1/lambda)/sqrt(n_exp)
sample_sd = sd(means)

Variance: theoretical - 0.625, sample - 0.6420786
SD: theoretical - 0.7905694, sample - 0.8012981

3. Show that the distribution is approximately normal.

To show that this distribution is approxmately normal we will try to fit the normal curve to this distribution by using the stat_function of ggplot2 library. I have to hide the code for the sake of the 3-page limit, but you can enjoy the full version on RPubs.

library(ggplot2)
g = ggplot(as.data.frame(means), aes(x=means)) +
        geom_histogram(aes(y=..density..), binwidth = max(means)/n_exp, color="black",fill="thistle3") +
        stat_function( 
                fun = dnorm, 
                args = with(as.data.frame(means), c(mean = mean(means), sd = sd(means))),
                lwd=2,
                col = "thistle4")+
        labs(title = "Fitting the normal curve to the distribution", y = "Density", x="Means")
g