In this report I aim to investigate the exponential distribution in R and compare it with the Central Limit Theorem. The report will illustrate the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages 40 exponentials. It will illustrate the insight that a hypothetical distribution of sample means would be distributed normally in what is commonly referred to as a “bell curve”.
This report uses the rexp(n, lambda) function in R to calculate the mean of a random sample of 40 exponentials and repeat for one thousand simulations. The sample mean and variance of the resulting distribution of averages will be compared to the theoretical mean and variance.
lambda <- 0.2 # the rate parameter of rexp function
mu <- 1/lambda # theoretical mean of the exponential distribution
sd <- 1/lambda # theoretical standard deviation of the exponential distribution
n <- 40 # number of sample exponentials
se <- sd / sqrt(n) #
svar <- sd^2 / n # theoretical variance
nosim <- 1000 # number of simulations
set.seed(1) # to make the result reproducible
x <- NULL
for (i in 1:nosim) x = c(x, mean(rexp(n, lambda))) # create the distribution of a large collection (1000)
# of averages of 40 # exponentials
The normal distribution has the property that it is symmetrical about the mean. The sample mean and the theoretical mean are both measures of central tendency. Comparing the sample mean to the theoretical mean demonstrates that the sample mean is an estimate of the population mean.
round(c(mean(x), mu), 2)
## [1] 4.99 5.00
The normal distribution also has a predictable area under the curve within specified distances of the mean. Approximately 68%, 95% and 99% of the normal density lies within 1, 2 and 3 standard deviations from the mean respectively. Comparing the sample variance to the theoretical variance demonstrates that the sample variance is an estimate of the population variance.
round(c(var(x), svar), 2)
## [1] 0.61 0.62
# create the X-axis grid to overlay the normal curve
x2 <- seq(min(x), max(x), length = length(x))
# create the Normal curve
fun <- dnorm(x2, mean = mean(x), sd = sd(x))
# create the Histogram of the simulation
par(mfrow = c(1,2))
hist(x, prob = TRUE, # probability = true produces a density plot
ylim = c(0, max(fun)),
main = "", xlab = "")
lines(x2, fun, col = 2, lwd = 2)
abline(v = mean(x), col="red", lwd=3, lty=2)
abline(v = mean(mu), col="blue", lwd=3, lty=5)
set.seed(1)
y <- rexp(nosim, lambda) # create the distribution of a large collection (1000) of random exponentials
hist(y, prob = TRUE, freq = FALSE, main = "", xlab = "")
curve(dexp(x, rate=0.2, log=FALSE), add = TRUE)
Figure 1: On the left, a histogram of the distribution of averages of 40 exponentials over 1000 simulations. Fitted with a normal curve. The blue and red vertical lines represent the theoretical and sample means respectively. On the right is the distribution of a large collection of random exponentials
The Central Limit Theorem states that the distribution of averages of iid variables properly normalised becomes that of a standard normal as the sample size increases. This report has used the simulation of random exponentials to produce the distributions in Figure 1. One can determine that the distribution on the left is normally distributed because we can see it is shaped like a “bell curve”. The normal distribution can be characterised by its mean and standard deviation and I have shown by calculating the sample mean and variance and comparing how close the results are to the theoretical mean and variance that they are consistent with the population mean and variance.