This project explores, via simulation and comparison with theory, the asymptotic convergence to normality of a distribution of means of 40 random iid draws from an exponential distribution with rate \(\lambda = 0.2\).
The simulation consists of taking 1,000 means of 40 randomly generated exponentials. We explore the mean and variance of this distribution of means.
As expected from the Central Limit Theorem (CLT), the resulting distribution is approximately normal, despite its origins in the positively-skewed exponential distribution.
Since we expect, per the CLT, that the distribution of means will be normal:
Since we will be using the rate \(\lambda = 0.2\), the theoretical values in this particular simulation are:
We perform a unique simulation of 1,000 means of 40 random exponentials and plot the distribution of these simulated means against the theoretical normal distribution using the theoretical values.
As can be seen, in this simulation, the sample mean is just shy of the theoretical mean. Most similar simulations yield closer means – the set.seed() value (0134) was chosen so as to provide a perhaps meaningful difference and challenge normality in part 3 of the project.
A quick summary call shows the variability of the mean in our simulation. The error in estimating the mean can get as high as 58.18 % (max. value), but half of the error is contained around 21% of the mean value.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.063 4.412 4.898 4.958 5.465 7.909
The following table contrasts the theoretical values we defined in part 1 with the simulated values from our distribution of means:
| Mean | Variance | Standard Deviation | |
|---|---|---|---|
| Theoretical | \(\mu = 5\) | \(\sigma^2/n = 0.625\) | \(\sigma/\sqrt{n} = 0.791\) |
| Sample | \(\bar X = 4.958\) | \(s^2/n = 0.601\) | \(s/\sqrt{n} = 0.775\) |
| Difference (value) | \(0.042\) | \(0.024\) | \(0.016\) |
| Relative Difference | \(0.834 \%\) | \(3.84 \%\) | \(2.023 \%\) |
As expected, measures of spread deviate further than measures of center from their respective theoretical values, in an almost 1:2:4 ratio for mean:sd:var estimates (respectively).
The set.seed value chosen generates a distribution of means that is only approximately normal, retaining some of the skew present in the exponential distribution from which the iid draws originated. For comparison, here is the p.d.f. of the exponential distribution with \(\lambda = 0.2\), the density of a simulation of 1,000 random exponentials from this distrubution, and the previous density of the 1,000 means of 40 random exponentials and its associated normal curve.
As the CLT predicts, the distribution of means of 40 exponentials approximates the normal distribution. Further proof of this asymptotic behavior can be examined in the following normality plots:
# theoretical values
n <- 40; lambda <- 0.2
theory_mean <- 1/lambda
theory_var <- (1/lambda^2)/n
theory_sd <- (1/lambda)/sqrt(n)
# simulation 1
sims <- 1000
set.seed(0134)
simulation <- apply(matrix(rexp(sims*40, lambda), sims), 1, mean)
sample_mean <- mean(simulation)
sample_var <- var(simulation)
sample_sd <- sd(simulation)
# density plot 1
par(mar=c(2, 4, 3, 1))
plot(density(simulation), col="red", main="Simulated Vs. Theoretical Distribution of
Means", ylab="density", xlab="", cex.main=0.9)
abline(v=sample_mean, col="red", lty="dashed")
curve(dnorm(x, mean=theory_mean, sd=theory_sd), add=TRUE, col="black")
abline(v=theory_mean, col="black", lty="dashed")
text(x=3.4, y=.4, labels="Sample Simulation", col="red", cex=0.8)
text(x=6, y=.4, labels="Normal p.d.f.", col="black", cex=0.7)
text(x=4.8, y=.18, expression(bar(x) == 4.958), col="red", srt=90)
text(x=5.15, y=.14, expression(mu == 5), col="black", srt=90)
summary(simulation)
# simulation 2
set.seed(0134)
simulation2 <- rexp(sims, lambda)
sample2_mean <- mean(simulation2)
# density plot 2
par(mar=c(2, 4, 2, 0.5))
plot(density(simulation2), col="blue", lwd=2, main="Exponentials and Means of
Exponentials", cex.main=0.8, cex.axis=0.7, ylab="density", xlab="",
xlim=c(-0.05,15), ylim=c(0,0.55))
curve(dexp(x, lambda), add=TRUE, col="black")
lines(density(simulation), col="red", lwd=2)
curve(dnorm(x, mean=5, sd=theory_sd), add=TRUE, col="black")
legend(10, 0.5, legend=c("Exponentials", "Means of Exponentials", "Theoretical Curves"),
col=c("blue", "red", "black"), cex=0.66, lty=1)
text(x=0.2, y=0.22, labels="Exponential", cex=0.5)
text(x=6.2, y=0.44, labels="Normal", cex=0.5)
# q-q plots
par(mfrow=c(1,2))
qqnorm(simulation2, pch = 1, frame = FALSE, main="Exponentials", cex.main=0.9,
cex.axis=0.6, cex.lab=0.6, col="blue", cex=0.5)
qqline(simulation2, probs = c(0.001, 0.999), col = "steelblue", lwd = 2)
qqnorm(simulation, pch = 1, frame = FALSE, main="Means of Exponentials", cex.main=0.9,
cex.axis=0.6, cex.lab=0.6, col="red", cex=0.5)
qqline(simulation, probs = c(0.001, 0.999), col = "steelblue", lwd = 2)