knitr::opts_chunk$set(echo = TRUE)
The purpose of this exercise is to compare the simulated distribution of averages of 40 exponential random variables with the predictions of the Central Limit Theorem (CLT). The CLT states that, as the sample size increases, the distribution of the average of independent, identically distributed variables with finite variance approaches a normal distribution centred at the population mean with variance $\sigma^2/n$; equivalently, the standardised average approaches the standard normal. The CLT complements the Law of Large Numbers, which states that as the sample size increases, the sample mean converges to the population mean.
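Stated more formally, in the standard textbook form (the notation here is added for clarity and is not part of the original exercise): for independent, identically distributed $X_1, \dots, X_n$ with mean $\mu$ and finite variance $\sigma^2$,

$$
\frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty,
\qquad \text{so for large } n, \quad
\bar X_n \approx N\!\left(\mu, \frac{\sigma^2}{n}\right).
$$

For the exponential distribution with rate λ, $\mu = \sigma = 1/\lambda$, so with λ = 0.2 and n = 40 the approximation is $\bar X_{40} \approx N(5,\ 25/40) = N(5,\ 0.625)$.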
To investigate the Central Limit Theorem for the exponential distribution, I first created an empty object called ExpMeans to store the results of my simulations. Each value stored in this object will represent the mean of a sample of 40 exponential random variables.
I then used the function rexp() to generate samples from an exponential distribution with rate parameter λ = 0.2. In each iteration of the simulation, rexp(40,0.2) produces 40 independent exponential values. I calculated the mean of these 40 values using the mean() function. To build up a distribution of sample means, I repeated this process 1000 times using a for loop. Each iteration produces one sample mean, which is appended to the vector. After the loop finishes, ExpMeans contains 1000 sample means, each based on a sample size of 40.
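# Start with an empty object; each iteration appends one sample mean
ExpMeans <- NULL
for (i in 1:1000) {
  # Draw 40 exponential values with rate 0.2 and store their mean
  ExpMeans <- c(ExpMeans, mean(rexp(40, 0.2)))
}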
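The same simulation can be written without growing a vector inside a loop. The sketch below uses base R's `replicate()`; the seed is an arbitrary choice (not part of the original analysis) so that results are reproducible, and `sim_means`, `n_sims`, `n` and `lambda` are hypothetical names chosen for illustration. Its exact numbers will therefore differ slightly from those reported in this document.

```r
# Arbitrary seed (an assumption, not from the original analysis) for reproducibility
set.seed(1234)

n_sims <- 1000   # number of simulated samples
n      <- 40     # observations per sample
lambda <- 0.2    # exponential rate parameter

# replicate() evaluates the expression n_sims times and simplifies the
# results into a numeric vector: one sample mean per simulated sample
sim_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

length(sim_means)  # should be 1000
head(sim_means)
```

`replicate()` keeps the loop's logic (one mean per iteration) but avoids copying the whole vector with `c()` on every pass. A matrix approach such as `rowMeans(matrix(rexp(40 * 1000, 0.2), nrow = 1000))` is another common option.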
The average of the 1000 sample means stored in ExpMeans is my empirical estimate of the population mean. The theoretical population mean of an exponential distribution with rate λ = 0.2 is 1/λ = 5. I compare the empirical mean to this theoretical value to check that the simulated sampling distribution is centred where theory predicts.
To visualise the distribution of these 1000 sample means, I plotted a histogram. This allows me to assess whether the distribution appears approximately normal, as predicted by the Central Limit Theorem.
I added two vertical lines: a red line at the theoretical population mean of the exponential distribution (1/λ = 5) and a blue line at the mean of the 1000 simulated sample means. This allows me to compare the centre of the simulated distribution with the theoretical value. The closeness of the two lines shows how well the sample means approximate the population mean.
The two lines lie almost exactly on top of each other, indicating that the average of the simulated sample means is very close to the theoretical population mean. This agreement is expected: each sample mean is an unbiased estimator of the population mean, so by the Law of Large Numbers the average of the sample means converges to the population mean as the number of simulations increases. The plot therefore provides visual evidence that the simulation behaves as predicted by theory.
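# Histogram of the 1000 simulated sample means
hist(ExpMeans)
# Red: theoretical mean 1/lambda = 5; blue: mean of the simulated sample means
abline(v = 1/0.2, col = "red", lwd = 2)
abline(v = mean(ExpMeans), col = "blue", lwd = 2)
legend("topright",
       legend = c("Theoretical mean", "Simulated mean"),
       col = c("red", "blue"),
       lwd = 2)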
To quantify the difference, I calculated both means and subtracted the theoretical mean from the simulated one.
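SimMean <- mean(ExpMeans)   # mean of the 1000 simulated sample means
TheoMean <- 1/0.2           # theoretical mean, 1/lambda
SimMean - TheoMean          # difference between simulated and theoretical means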
The simulated mean is 5.005, which is extremely close to the theoretical value of 5; the difference is only about 0.005.
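Whether a difference of about 0.005 is "small" can be judged against the Monte Carlo standard error of the average of the 1000 sample means. Its theoretical value is $(\sigma/\sqrt{n})/\sqrt{1000} = 5/\sqrt{40 \times 1000} = 5/200 = 0.025$, so a difference of 0.005 is only about a fifth of one standard error. The sketch below regenerates the sample means with an arbitrary seed (an assumption; `sim_means`, `mc_se` and `theo_se` are hypothetical names, and ExpMeans from above could be used instead).

```r
# Arbitrary seed (an assumption) so the sketch is reproducible
set.seed(1234)
sim_means <- replicate(1000, mean(rexp(40, rate = 0.2)))

theo_mean <- 1 / 0.2                       # theoretical mean, 1/lambda = 5
diff_mean <- mean(sim_means) - theo_mean   # simulated minus theoretical

# Estimated standard error of mean(sim_means):
# sd of the sample means divided by sqrt(number of simulations)
mc_se <- sd(sim_means) / sqrt(length(sim_means))

# Theoretical counterpart: (sigma / sqrt(n)) / sqrt(n_sims) = 5 / sqrt(40 * 1000)
theo_se <- (1 / 0.2) / sqrt(40) / sqrt(1000)

c(difference = diff_mean, mc_se = mc_se, theo_se = theo_se)
```

A difference within roughly two standard errors is consistent with pure simulation noise.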
I overlaid the theoretical normal density curve on a density-scaled histogram of the 1000 simulated sample means. The curve uses the theoretical mean $1/\lambda = 5$ and the theoretical standard deviation of the sample mean, $\sigma/\sqrt{n} = 1/(\lambda\sqrt{n}) = \sqrt{1/(n\lambda^2)} \approx 0.791$ for $n = 40$. The normal curve aligns closely with the shape and spread of the histogram, indicating that the sampling distribution of the mean is approximately normal.
This visual agreement supports the CLT: even though the exponential distribution is highly skewed, the distribution of sample means is approximately normal once the sample size is large enough, and n = 40 appears to be sufficient here. The close match between the simulated variance (0.6086) and the theoretical variance (0.625) further confirms that the spread of the sampling distribution behaves as predicted.
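# Density-scaled histogram so it can be compared with a probability density
hist(ExpMeans, prob = TRUE,
     main = "Sampling Distribution of the Mean of 40 Exponentials",
     xlab = "Sample Means",
     ylim = c(0, 0.7))  # headroom so the normal curve's peak (about 0.50) is not cut off
# Theoretical normal density: mean 1/lambda, sd sqrt(1/(n * lambda^2))
curve(dnorm(x, mean = 1/0.2, sd = sqrt(1/(40 * 0.2^2))),
      col = "darkgreen", lwd = 2, add = TRUE)
abline(v = 1/0.2, col = "red", lwd = 2)
abline(v = mean(ExpMeans), col = "blue", lwd = 2)
legend("topright",
       legend = c("Theoretical mean", "Simulated mean", "Theoretical normal curve"),
       col = c("red", "blue", "darkgreen"),
       lwd = 2)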
I also calculated the skewness of the 1000 simulated sample means to assess the symmetry of the sampling distribution. The exponential distribution is highly right-skewed, with a theoretical skewness of 2, but the skewness of the sample means was only about 0.3 (0.291 in one run and 0.362 in the output shown below; the exact value varies between runs because no random seed is set). This much smaller value indicates that the sampling distribution is close to symmetric. The reduction in skewness is consistent with the CLT, which states that the distribution of sample means becomes increasingly normal as the sample size increases. The low skewness therefore provides additional evidence that the sampling distribution of the mean of 40 exponentials is approximately normal.
library(moments)
skewness(ExpMeans)
## [1] 0.3622451
skewness(rexp(100000, 0.2))
## [1] 2.000132
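The skewness values can also be checked against theory without the moments package. In the sketch below, `skew_manual` is a hypothetical helper (not from the original) implementing the moment coefficient of skewness, $m_3/m_2^{3/2}$ with central moments divided by n, which is intended to match the definition used by `moments::skewness`. It also uses the fact that the mean of n independent Exponential(λ) variables follows a Gamma distribution with shape n, whose skewness is $2/\sqrt{n} \approx 0.316$ for n = 40, independent of λ. Both values reported in the text (0.291 and 0.362) lie near this figure. The seed is arbitrary.

```r
# Moment coefficient of skewness: third central moment divided by
# the second central moment raised to the power 3/2 (both divide by n)
skew_manual <- function(x) {
  d  <- x - mean(x)
  m2 <- mean(d^2)   # second central moment
  m3 <- mean(d^3)   # third central moment
  m3 / m2^(3/2)
}

# The mean of n i.i.d. Exponential(lambda) variables is Gamma(shape = n, rate = n * lambda),
# whose skewness is 2 / sqrt(n)
n <- 40
theo_skew_mean <- 2 / sqrt(n)

set.seed(1234)                                    # arbitrary seed (assumption)
sim_means <- replicate(1000, mean(rexp(n, rate = 0.2)))

c(simulated = skew_manual(sim_means), theoretical = theo_skew_mean)
```

With only 1000 simulated means, the estimated skewness has a sampling error of several hundredths, so run-to-run values such as 0.291 and 0.362 are both compatible with the theoretical 0.316.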
The simulation results provide clear evidence that the sampling distribution of the mean of 40 exponential random variables behaves as predicted by the CLT. The theoretical population mean of the exponential distribution with rate λ = 0.2 is $1/\lambda = 5$. The simulated mean of the 1000 sample means was 5.005489, and in the figure its line lies almost exactly on top of the theoretical value. This demonstrates that the centre of the sampling distribution closely matches the theoretical expectation.
The theoretical variance of the sample mean is $\sigma^2/n = 1/(n\lambda^2) = 1/(40 \times 0.2^2) = 0.625$. The simulated variance of the 1000 sample means was 0.609, which is very close to the theoretical value. The small difference is attributable to random sampling variation and would be expected to shrink with a larger number of simulations. This agreement confirms that the spread of the sampling distribution matches theory.
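The variance comparison is reported without the code that produced it. One way it could be computed is sketched below, assuming base R's `var()` (which divides by n - 1) and an arbitrary seed, so the simulated value will differ slightly from 0.609. The names `sim_means`, `theo_var` and `sim_var` are hypothetical.

```r
set.seed(1234)                                    # arbitrary seed (assumption)
n      <- 40
lambda <- 0.2
sim_means <- replicate(1000, mean(rexp(n, rate = lambda)))

# Theoretical variance of the sample mean: sigma^2 / n = 1 / (n * lambda^2)
theo_var <- 1 / (n * lambda^2)
# Sample variance of the 1000 simulated means (denominator 1000 - 1)
sim_var  <- var(sim_means)

c(simulated = sim_var, theoretical = theo_var,
  sim_sd = sqrt(sim_var), theo_sd = sqrt(theo_var))
```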
Finally, the skewness of the simulated sample means was about 0.3 (0.291 in one run and 0.362 in the output shown above). This is substantially smaller than the theoretical skewness of the exponential distribution, which is 2, indicating that the sampling distribution is far more symmetric than the underlying population. This reduction in skewness provides additional evidence that the distribution of sample means is approaching normality.
Taken together, the close agreement in mean, variance, and skewness, supported visually by the histogram and the theoretical normal curve, demonstrates that the sampling distribution of the mean of 40 exponential variables is approximately normal, centred at the theoretical mean, and with the correct theoretical spread. These results are fully consistent with the CLT.