This work investigates the exponential distribution in R and compares it with the Central Limit Theorem. We investigate the distribution of averages of 40 exponentials. In total we run 1000 simulations.
n_sims = 1000
n_exp = 40
lambda = 0.2
data = as.data.frame(matrix(rexp(n_sims*n_exp, lambda), nrow = n_sims, ncol = n_exp))
means = rowMeans(data)
hist(means, breaks=20, col="thistle3", ylab="Density", xlab = "Means", main="Exponential distribution")
theoretical_mean = 1/lambda
sample_mean = mean(means)
hist(means, breaks=20, col="thistle3", ylab="Density", xlab = "Means", main="Exponential distribution")
abline(v = theoretical_mean, lty=2, col = "red", lwd=4)
abline(v = sample_mean, lty = 3, col = "blue", lwd=4)
Mean: theoretical - 5, sample 4.9772221
So we see that the real data converges very nicely to the theoretical value. The more simulations we take the more the actual value converges to the theoretical.
theoretical_variance = (1/lambda)^2/n_exp
sample_variance = var(means)
theoretical_sd = (1/lambda)/sqrt(n_exp)
sample_sd = sd(means)
Variance: theoretical - 0.625, sample - 0.6420786
SD: theoretical - 0.7905694, sample - 0.8012981
To show that this distribution is approxmately normal we will try to fit the normal curve to this distribution by using the stat_function of ggplot2 library. I have to hide the code for the sake of the 3-page limit, but you can enjoy the full version on RPubs.
library(ggplot2)
g = ggplot(as.data.frame(means), aes(x=means)) +
geom_histogram(aes(y=..density..), binwidth = max(means)/n_exp, color="black",fill="thistle3") +
stat_function(
fun = dnorm,
args = with(as.data.frame(means), c(mean = mean(means), sd = sd(means))),
lwd=2,
col = "thistle4")+
labs(title = "Fitting the normal curve to the distribution", y = "Density", x="Means")
g