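This project investigates the exponential distribution in R and compares the distribution of its sample means with the predictions of the Central Limit Theorem (CLT). For an exponential distribution with rate $\lambda$, the theoretical mean and standard deviation are both $1/\lambda$, so the variance is $1/\lambda^2$. The analysis compares these theoretical values with the results of 1000 simulations, each drawing 40 random exponentials. We expect the average of the sample means and the average of the sample variances to be close to their theoretical values, and the distribution of the sample means to be approximately normal.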
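For reference, the theoretical quantities being estimated follow from the exponential density $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. The second line applies the standard result for the mean $\bar{X}_n$ of $n$ independent draws, with $\lambda = 0.2$ and $n = 40$ as used below.

```latex
\begin{aligned}
E[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad
\operatorname{Var}(X) = \frac{1}{\lambda^2} = 25, \qquad
\operatorname{SD}(X) = \frac{1}{\lambda} = 5, \\
E[\bar{X}_n] &= \frac{1}{\lambda} = 5, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625, \qquad
\operatorname{SD}(\bar{X}_n) = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791.
\end{aligned}
```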
First, we set the parameters: rate lambda = 0.2, sample size n = 40 exponentials per simulation, and sim = 1000 simulations. We also set the random seed for reproducibility.
lambda <- 0.2
n <- 40
sim <- 1000
set.seed(738923)
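Next, we run 1000 simulations, each drawing 40 exponentials, and record the mean of each sample. The histogram of the 1000 sample means is saved to figure/mean_histogram.png, with the average of the sample means (blue) and the theoretical mean (red) marked.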
# Mean of each of the sim samples of size n
means <- replicate(sim, mean(rexp(n, lambda)))
exp_mean <- mean(means)
dir.create("figure", showWarnings = FALSE)  # png() fails if the folder does not exist
png("figure/mean_histogram.png", width = 480, height = 480)
hist(means,
     breaks = 20,
     main = "Histogram of Sample Mean for Exponential Distribution",
     xlab = "Sample Mean",
     ylab = "Frequency")
abline(v = exp_mean, col = "blue")    # average of the simulated means
abline(v = 1 / lambda, col = "red")   # theoretical mean, 1/lambda = 5
legend("topright", legend = c("Sample Mean", "Expected Mean"), col = c("blue", "red"), lty = 1)
dev.off()
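The simulation above draws one sample of 40 at a time. As a sketch of an alternative vectorized approach (not the method used for the results reported here), all `sim * n` draws can be generated in a single call, arranged so that each row of a matrix is one sample of 40, and averaged with `rowMeans()`. The values are not guaranteed to match the loop's `means` exactly, and the names `draws` and `means_vec` are illustrative only.

```r
lambda <- 0.2
n <- 40
sim <- 1000
set.seed(738923)

# Draw all sim * n exponentials at once; with byrow = TRUE,
# row i holds the i-th sample of n values
draws <- matrix(rexp(sim * n, rate = lambda), nrow = sim, ncol = n, byrow = TRUE)

# One sample mean per row (hypothetical name, not used elsewhere)
means_vec <- rowMeans(draws)

length(means_vec)   # one mean per simulation
mean(means_vec)     # should be near the theoretical mean 1/lambda = 5
```

Drawing everything at once avoids growing a vector inside a loop, which matters more as `sim` grows.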
The theoretical mean is $1/\lambda = 1/0.2 = 5$. For comparison, the average of the 1000 sample means is 4.939668, about 1.2% below the theoretical value, so the two are very close.
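Next, we run a second set of 1000 simulations of 40 exponentials (fresh draws, not the samples used above) and record the variance of each sample. The histogram of the 1000 sample variances is saved to figure/variance_histogram.png, with the average sample variance (blue) and the theoretical variance (red) marked.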
# Variance of each of the sim samples of size n
# (named vars to avoid reusing the name of the var() function)
vars <- replicate(sim, var(rexp(n, lambda)))
exp_var <- mean(vars)
png("figure/variance_histogram.png", width = 480, height = 480)
hist(vars,
     breaks = 30,
     main = "Histogram of Sample Variance for Exponential Distribution",
     xlab = "Sample Variance",
     ylab = "Frequency")
abline(v = exp_var, col = "blue")       # average of the simulated variances
abline(v = 1 / lambda^2, col = "red")   # theoretical variance, (1/lambda)^2 = 25
legend("topright", legend = c("Mean Sample Variance", "Expected Variance"), col = c("blue", "red"), lty = 1)
dev.off()
The theoretical variance is $(1/\lambda)^2 = (1/0.2)^2 = 25$. For comparison, the average sample variance over the 1000 simulations is 24.144393, about 3.4% below the theoretical value, which is reasonably close.
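The average sample variance is expected to land near 25, rather than systematically below it, because R's `var()` divides by $n - 1$, which makes it an unbiased estimator of the population variance. A small sketch confirming the divisor on a single sample of 40 (the seed and variable names are illustrative, not part of the original analysis):

```r
lambda <- 0.2
n <- 40
set.seed(1)  # arbitrary seed for this illustration only
x <- rexp(n, rate = lambda)

# var() divides the sum of squared deviations by n - 1, not n
manual_unbiased <- sum((x - mean(x))^2) / (n - 1)
manual_biased   <- sum((x - mean(x))^2) / n

all.equal(var(x), manual_unbiased)   # TRUE
manual_biased / manual_unbiased      # (n - 1) / n = 0.975
```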
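The Central Limit Theorem states that, for a sufficiently large sample size $n$, the distribution of the sample mean is approximately normal, here with mean $1/\lambda = 5$ and standard deviation $(1/\lambda)/\sqrt{n} = 5/\sqrt{40} \approx 0.79$, even though the exponential distribution itself is strongly right-skewed. To check that the simulated sample means are approximately normal, we first overlay on their histogram a normal curve with the same mean and standard deviation as the simulated means.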
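h <- hist(means,
          breaks = 50,
          main = "Histogram of Sample Mean for Exponential Distribution",
          xlab = "Sample Mean",
          ylab = "Frequency")
# Normal density using the mean and SD of the simulated means
xfit <- seq(min(means), max(means), length.out = 40)
yfit <- dnorm(xfit, mean = mean(means), sd = sd(means))
# Rescale from density to counts: bin width times number of simulations
yfit <- yfit * diff(h$mids[1:2]) * length(means)
lines(xfit, yfit, col = "blue", lwd = 2)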
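Two further checks complement the fitted curve. The CLT predicts that the standard deviation of the sample means is close to $(1/\lambda)/\sqrt{n} \approx 0.79$, and a normal Q-Q plot gives another visual test of normality. The sketch below reruns the mean simulation with the same seed so that it is self-contained; `theo_se` and `qq` are illustrative names, and the correlation of the Q-Q points is used only as a rough summary of how straight the plot is, not as a formal test.

```r
lambda <- 0.2
n <- 40
sim <- 1000
set.seed(738923)
means <- replicate(sim, mean(rexp(n, lambda)))

# Theoretical standard deviation of a sample mean of n exponentials
theo_se <- (1 / lambda) / sqrt(n)
c(simulated = sd(means), theoretical = theo_se)

# Normal Q-Q plot: points near the reference line suggest approximate normality
qqnorm(means, main = "Normal Q-Q Plot of Sample Means")
qqline(means, col = "red")

# Rough numeric summary of how straight the Q-Q points are
qq <- qqnorm(means, plot.it = FALSE)
cor(qq$x, qq$y)
```

Some curvature in the upper tail would not be surprising, since the means of only 40 exponentials retain a little of the parent distribution's right skew.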
Another way to compare the distribution of the simulated means with a normal distribution is to check how many of them fall within one standard deviation of their average. For a normal distribution, about 68% of values lie within one standard deviation above or below the mean, so we expect roughly 68% of the 1000 sample means to fall in that range.
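# Count means more than one SD below or above their average
less <- sum(means < (exp_mean - sd(means)))
more <- sum(means > (exp_mean + sd(means)))
# Proportion of simulations within one SD of the average
one_sd <- (sim - less - more) / sim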
Here one_sd is 0.675, so 67.5% of the simulated means fall within one standard deviation of their average, very close to the 68% expected for a normal distribution. This is consistent with the distribution of the sample means being approximately normal.
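The 68% figure is the standard normal probability $P(-1 < Z < 1)$, which `pnorm()` gives directly. To put "very close" on a numeric footing, one reasonable yardstick (an addition here, not part of the original analysis) is the binomial standard error of a proportion estimated from 1000 simulations. The names `p_within` and `se_prop` are illustrative.

```r
# Probability that a normal variable falls within one SD of its mean
p_within <- pnorm(1) - pnorm(-1)
p_within                                   # about 0.6827

# Standard error of a proportion estimated from sim = 1000 simulations
sim <- 1000
se_prop <- sqrt(p_within * (1 - p_within) / sim)
se_prop                                    # about 0.0147

# Observed proportion reported above, expressed in standard errors
one_sd <- 0.675
(one_sd - p_within) / se_prop              # about -0.52
```

A gap of about half a standard error is well within what simulation noise alone would produce.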