A simulation exercise of the exponential distribution and compare it with the Central Limit Theorem. (part 1)
The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should:
Show the sample mean and compare it to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show that the distribution is approximately normal. Focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
First, set the seed of the random number generator, so the outcome is reproducible and define the parameters of the exponential distribution:
set.seed(123)
lambda <- 0.2 # rate parameter
n <- 40 # nr of exponentials
nosim <- 1000 # nr of simulations
Now run the simulation 1000 times using the exponential distribution (rexp(n, lambda)).
simulations <- apply(matrix(rexp(nosim*n, rate = lambda), nosim), 1, mean)
The theoretical mean of the exponential distribution is:
meanExp <- 1/lambda
meanExp
## [1] 5
The sample mean of the simulations is now calculated and the difference between the sample mean and the theoretical mean:
meanSim <- mean(simulations)
meanSim
## [1] 5.011911
diff <- meanExp - meanSim
round(diff, 3)
## [1] -0.012
hist(simulations, main = "Histogram of Sample Means", col = "lightblue", breaks = 100, xlab="Means")
abline(v = meanSim, col = "red")
abline(v = meanExp, col = "blue")
legend("topright", c("Sample Mean", "Theoretical Mean"), fill=c("red", "blue"))
First show the theoretical variance of the exponential distribution. The standard deviation is (1/lambda)/sqrt(n).
sdExp <- (1/lambda)/sqrt(n)
varExp <- sdExp^2
varExp # variance of exponential distribution
## [1] 0.625
The sample variance is:
sdSim <- sd(simulations)
varSim <- sdSim^2
varSim # sample variance of simulations
## [1] 0.6088292
vardiff <- varExp - varSim # difference
vardiff
## [1] 0.01617077
The difference in the sample variance and theoretical variance is very small, the conclusion is that 1000 simulations is a good estimator for the theoretical variance of the distribution.
Below is the normal curve added to the distribution of the sample means. This plot demonstrates that the distribution of the sample means is approximately normal.
h <- hist(simulations, main = "Histogram of Sample Means", col = "lightblue", breaks = 100, xlab="Means")
xfit <- seq(min(simulations), max(simulations), length=100)
yfit <- dnorm(xfit, mean=meanSim, sd = sdSim)
yfit <- yfit * diff(h$mids[1:2]*length(simulations))
lines(xfit, yfit, col="black", lwd=2)
legend("topright", "Normal Curve", fill="black")