Overview

This study uses simulation in R to investigate the exponential distribution and compares the behaviour of its sample means with what the Central Limit Theorem (CLT) predicts. Samples are drawn with the function ‘rexp()’, with the arguments n (sample size) and rate (the rate parameter, lambda) set to 40 and 0.2 respectively. The investigation centres on the distribution of averages of 40 exponentials, simulated a thousand times.

Simulations

The first part of the simulation runs the function ‘rexp()’ with arguments 40 and 0.2 a thousand times and stores the mean of each run in the vector ‘mean_exp’.

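# Collect the mean of each of the 1000 simulated samples
mean_exp <- NULL
for (i in 1:1000) {
  # Mean of 40 exponentials drawn with rate 0.2
  mean_exp <- c(mean_exp, mean(rexp(40, 0.2)))
}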
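The same simulation can also be written without growing a vector inside a loop. The following is a minimal sketch rather than the method used in this study: the names ‘mean_exp_alt’, ‘n_sims’, ‘n’ and ‘lambda’ are hypothetical, and the seed value is an arbitrary assumption added only so that the run can be repeated.

```r
# Assumption: the seed is arbitrary; it only makes the run repeatable.
set.seed(2024)

n_sims <- 1000   # number of simulated samples
n      <- 40     # exponentials per sample
lambda <- 0.2    # rate parameter passed to rexp()

# replicate() evaluates the expression n_sims times and
# returns the results as a numeric vector of length n_sims.
mean_exp_alt <- replicate(n_sims, mean(rexp(n, rate = lambda)))

length(mean_exp_alt)
```

Because no seed is set in the study itself, the values produced by this sketch will not match the figures reported here exactly.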

Next, a histogram of the values in ‘mean_exp’ is drawn on the density scale using the ‘hist’ function. Onto the histogram, a normal curve (in red) with mean 5 and standard deviation 5/sqrt(40) (about 0.79) is drawn, followed by a curve (in blue) showing the estimated density of the simulated means.

hist(mean_exp, main = "Distribution of Simulated Sample Means",
     xlab = "Sample Mean",
     ylab = "Density",
     prob = TRUE,
     ylim = c(0, 0.6))
curve(dnorm(x, 5, 5/sqrt(40)), col = "red", add = TRUE)
lines(density(mean_exp), col = "blue")
legend(6, 0.6, c("Normal Curve", "Sample Curve"), lty = c(1, 1), lwd = c(1, 1), col = c("red", "blue"))

The mean and variance of the simulated averages are now calculated, together with the values that theory gives for the mean of 40 exponentials:

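mean_pop <- 1/0.2           # theoretical mean, 1/lambda
mean_sam <- mean(mean_exp)  # mean of the simulated averages
var_pop <- 5^2/40           # theoretical variance of the mean of 40 exponentials
var_sam <- var(mean_exp)    # variance of the simulated averages

The sample and population (theoretical) means are 4.994111 and 5 respectively.

The sample and population (theoretical) variances are 0.6377933 and 0.625 respectively.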
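For reference, the theoretical values 5 and 0.625 follow from the standard moments of the exponential distribution. The short derivation below uses notation that does not appear elsewhere in this study: X_i for a single exponential draw and \bar{X} for the mean of n = 40 independent draws.

```latex
\begin{align*}
E[X_i] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5,
&
\operatorname{Var}(X_i) &= \frac{1}{\lambda^2} = 25, \\
E[\bar{X}] &= E[X_i] = 5,
&
\operatorname{Var}(\bar{X}) &= \frac{\operatorname{Var}(X_i)}{n} = \frac{25}{40} = 0.625, \\
& &
\operatorname{sd}(\bar{X}) &= \frac{5}{\sqrt{40}} \approx 0.79 .
\end{align*}
```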

Interpretations and Conclusions

  1. The mean of the simulated averages (4.994111) lies very close to the population mean of 5. This agrees with the CLT, under which the distribution of the sample mean is centred at the population mean. Running more simulations would bring the two closer still, by the law of large numbers.
  2. The same holds for the variance. The variance of the simulated averages (0.6377933) is close to 5^2/40 = 0.625, which is the variance sigma^2/n that the CLT assigns to the mean of n = 40 observations.
  3. Since the density curve of the simulated averages (in blue) follows the normal curve (in red) closely, the distribution of the sample means can reasonably be regarded as approximately normal.
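The visual comparison of the two curves can be backed up with a simple numerical check. The sketch below is illustrative and not part of the original analysis: it re-runs the simulation under an arbitrary seed, standardises the averages using the theoretical mean and standard deviation, and computes the share that falls within 1.96 standard deviations of 5, which should be close to 0.95 if the normal approximation is reasonable. The names ‘mean_exp_chk’, ‘z’, ‘coverage’ and ‘running_mean’ are hypothetical.

```r
# Assumption: the seed is arbitrary; it only makes the run repeatable.
set.seed(1)

# Re-run the simulation: 1000 averages of 40 exponentials with rate 0.2
mean_exp_chk <- replicate(1000, mean(rexp(40, rate = 0.2)))

mu <- 1/0.2              # theoretical mean of the sample mean
se <- (1/0.2)/sqrt(40)   # theoretical standard deviation of the sample mean

# Standardise the simulated averages
z <- (mean_exp_chk - mu)/se

# Share of averages within 1.96 standard deviations of the mean;
# roughly 0.95 is expected under a normal approximation.
coverage <- mean(abs(z) <= 1.96)

# Running mean of the averages, to see how it settles as
# more simulated averages accumulate.
running_mean <- cumsum(mean_exp_chk)/seq_along(mean_exp_chk)

coverage
tail(running_mean, 1)
```

A normal quantile plot, for example ‘qqnorm(mean_exp_chk)’ followed by ‘qqline(mean_exp_chk)’, would give a complementary graphical check.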