Investigation of the Central Limit Theorem Applied to the Exponential Distribution

Overview

This study examines whether averages of exponentially distributed random variables behave as the Central Limit Theorem (CLT) predicts.
The overall objective is therefore to check whether the distribution of averages of 40 random exponentials is approximately normal.

This will be performed through the following steps:

Show the sample mean and compare it to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show that the distribution of the sample means is approximately normal.

Our dataset for this project is the result of 1000 simulations, each yielding the average of 40 exponentials.
(All exponentials are generated with lambda set to 0.2.)

Setting up the Simulation…

We create our sample by generating 1000 simulations (nbsim) of 40 exponentials (n).
We start by setting the variables to the values given in the instructions.

nbsim  <- 1000 ; n <- 40
lambda <- .2
mx <- matrix(rexp(n*nbsim, lambda), nbsim, n)  # build matrix of 1000 rows of 40 random exponentials each
means <- apply(mx, 1, mean)                    # computing the mean of each row
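
The simulation above draws fresh random numbers on every run, so the figures reported later will vary slightly from one run to the next. A minimal sketch of a reproducible variant follows. The seed value 2024 and the names `mx_seeded` and `means_seeded` are assumptions made for illustration, not part of the original analysis, and the resulting numbers will not match the table shown later. `rowMeans` is used here as an equivalent of `apply(mx, 1, mean)`.

```r
# Hypothetical reproducible variant of the simulation (seed chosen arbitrarily)
set.seed(2024)
nbsim  <- 1000   # number of simulated samples
n      <- 40     # exponentials per sample
lambda <- 0.2    # rate parameter of the exponential distribution

# One row per simulation, one column per exponential draw
mx_seeded    <- matrix(rexp(n * nbsim, rate = lambda), nrow = nbsim, ncol = n)
means_seeded <- rowMeans(mx_seeded)   # same result as apply(mx_seeded, 1, mean)
```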

Calculate Statistics

We can now compute the principal statistics that are going to be useful (mean and variance).
In case of doubt, the suffix '_smp' stands for sample and '_th' for theoretical.

mean_th  <- mean_exp <- sd_exp <- 1/lambda
mean_smp <- mean(means)
var_th   <- sd_exp^2 / n
var_smp  <- var(means)
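
The theoretical values used above follow from standard properties of the exponential distribution and of averages of independent draws. As a brief worked derivation (with $X_1, \dots, X_n$ denoting the $n = 40$ independent exponentials in one sample and $\bar{X}$ their average, notation introduced here for illustration):

```latex
\begin{aligned}
E[X_i] &= \frac{1}{\lambda} = 5, \qquad \operatorname{Var}(X_i) = \frac{1}{\lambda^2} = 25,\\
E[\bar{X}] &= \frac{1}{\lambda} = 5,\\
\operatorname{Var}(\bar{X}) &= \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625, \qquad
\operatorname{sd}(\bar{X}) = \sqrt{0.625} \approx 0.7906.
\end{aligned}
```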

Comparisons, Exploration & Discoveries

We can store all our new statistics in a data frame to display them as a table.

print(data.frame(type=c("theory", "sample"), mean=c(mean_th, mean_smp), 
                 var=c(var_th, var_smp), sd=c(sqrt(var_th), sqrt(var_smp))))

##     type     mean       var        sd
## 1 theory 5.000000 0.6250000 0.7905694
## 2 sample 5.040758 0.6166482 0.7852695

We can already see that the theoretical and sample means are very close (5 ≈ 5.0407582).
The same is true for the variances (0.625 ≈ 0.6166482).
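
To put "very close" on a numeric footing, one can express the gaps as relative differences and compare the gap in means with the standard error expected from 1000 simulated averages. The sketch below hard-codes the values from the table above so that it runs on its own. The variable names (`rel_diff_mean`, `se_mean`, and so on) are introduced here for illustration.

```r
# Values copied from the table above
mean_th  <- 5;        var_th  <- 0.625
mean_smp <- 5.040758; var_smp <- 0.6166482
nbsim    <- 1000

# Relative differences between sample and theoretical values
rel_diff_mean <- abs(mean_smp - mean_th) / mean_th   # about 0.8 %
rel_diff_var  <- abs(var_smp  - var_th)  / var_th    # about 1.3 %

# Standard error of the average of nbsim sample means, and the gap in those units
se_mean <- sqrt(var_th / nbsim)                      # 0.025
z_mean  <- (mean_smp - mean_th) / se_mean            # about 1.6 standard errors

print(round(c(rel_diff_mean = rel_diff_mean, rel_diff_var = rel_diff_var,
              se_mean = se_mean, z_mean = z_mean), 4))
```

A gap of roughly 1.6 standard errors is within the range that random variation alone would be expected to produce.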

Graphical Comparisons

First, we simply plot the distribution of 1000 random exponentials.
This has no other purpose than to show the contrast with the second plot.

par(mar=c(4,4,1.5,0.5))
hist(rexp(nbsim, lambda), freq=0, main="distrib of 1000 exp", xlab="random exponentials")

Now we can continue with our main graphic about the distribution of 1000 averages of 40 exponentials. This histogram is a bit more complex as it displays several additional elements:

the theoretical mean (1/lambda = 5) of the exponential distribution (in magenta)
the sample mean (5.0407582) computed from our sample (in blue)
the density curve of our sample means (also in blue)
the Gaussian curve of the normal distribution with the theoretical mean and standard deviation (in red), which prepares the normality check

par(mar=c(4,4,1.5,0.5))
hist(means, freq=0,   col='grey',  lty=0, main="distribution of 1000 averages of 40 random exponentials")
lines(density(means), col='blue',  lwd=2)                           # curve of sample distrib
curve(add=TRUE, dnorm(x, mean_th, sqrt(var_th)), lwd=1, col='red')  # gaussian curve of normal distrib
abline(v=mean_smp, col='blue',     lwd=2, lty=2)                    # sample mean
abline(v=mean_th,  col='magenta',  lwd=1, lty=2)                    # theoretical mean of the exp distrib
legend("topright", legend=c("sample distrib", "sample mean", "theoretical mean", "normal distrib"), 
       col=c("blue", "blue", "magenta", "red"), lty=c(1,2,2,1), cex=.8)

The distribution of our sample means is centered near the theoretical mean (5.0407582 ≈ 5).
We are also pleased to see that it looks Gaussian: its density (in blue) closely follows the bell curve of the normal distribution (in red).

Checking normality

We have already seen on the last graphic, thanks to the Gaussian curve we drew, that the distribution of our sample means is very close to a normal one (i.e. the red and blue curves are similar).
We can complete our normality check with a useful tool provided by R: 'qqnorm', which displays a Q-Q (quantile-quantile) plot. Such a plot sets the quantiles of two distributions against each other, here those of the data passed as a parameter against those of the normal distribution.

par(mar=c(4,4,1.5,0.5))
qqnorm(means, pch=3, lty=2, col='grey')
qqline(means, col='blue', lwd=2)

The plot shows that the quantiles of our distribution closely follow the normal ones: the points lie near the blue reference line, which 'qqline' draws through the first and third quartiles and which normally distributed data would follow.
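
To make explicit what the Q-Q plot compares, the sketch below rebuilds its coordinates by hand: the sorted sample values are paired with normal quantiles taken at evenly spread probabilities. It regenerates its own sample of means with an arbitrary seed (an assumption, so the values differ from those above), and names such as `theo_q` and `samp_q` are illustrative only.

```r
set.seed(1)                                   # arbitrary seed, for illustration only
lambda <- 0.2; n <- 40; nbsim <- 1000
means_demo <- rowMeans(matrix(rexp(n * nbsim, lambda), nbsim, n))

# Hand-built Q-Q coordinates
probs  <- ppoints(nbsim)                      # plotting positions in (0, 1)
theo_q <- qnorm(probs)                        # standard normal quantiles (x axis)
samp_q <- sort(means_demo)                    # ordered sample values (y axis)

# Coordinates computed by qqnorm(), without drawing, for comparison
qq <- qqnorm(means_demo, plot.it = FALSE)
print(all.equal(sort(qq$x), theo_q))
print(all.equal(sort(qq$y), samp_q))

# Slope and intercept of the line through the quartile pairs,
# which is the line qqline() draws by default
slope     <- diff(quantile(means_demo, c(0.25, 0.75))) / diff(qnorm(c(0.25, 0.75)))
intercept <- quantile(means_demo, 0.25) - slope * qnorm(0.25)
```

For data that are close to normal, the slope of this line approximates the standard deviation and the intercept approximates the mean.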

Conclusions

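We carefully checked that the Central Limit Theorem applies to our case, and can conclude that the distribution of our sample means is approximately normal. We could get even closer to a normal distribution by averaging more exponentials in each sample (a larger n); running more simulations would only smooth the histogram.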
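
One way to see how the approximation depends on the number of exponentials per average is to track skewness, which is 0 for a normal distribution. The average of n independent exponentials follows a gamma distribution whose skewness is 2/sqrt(n), so it shrinks as n grows. The sketch below compares that value with simulated skewness for a few sample sizes. The helper `skewness_of`, the seed, the sizes 10, 40 and 160, and the use of 10000 simulations are all assumptions chosen for illustration.

```r
set.seed(123)                                  # arbitrary seed, for illustration only
lambda <- 0.2
nbsim  <- 10000                                # more simulations than above, to steady the estimate

# Hypothetical helper: moment-based sample skewness
skewness_of <- function(x) mean((x - mean(x))^3) / sd(x)^3

sizes <- c(10, 40, 160)                        # exponentials per average
skew_sim <- sapply(sizes, function(n) {
  means_n <- rowMeans(matrix(rexp(n * nbsim, lambda), nbsim, n))
  skewness_of(means_n)
})
skew_theory <- 2 / sqrt(sizes)                 # skewness of a gamma with shape n

print(data.frame(n = sizes, simulated = skew_sim, theoretical = skew_theory))
```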