Overview:

This project investigates the exponential distribution in R and compares the distribution of its sample averages with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. All simulations use lambda = 0.2. The analysis compares the distribution of 1000 averages, each taken over 40 exponentials, with the theoretical calculation.
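
As a quick sanity check of the stated mean and standard deviation, the following minimal sketch draws a large exponential sample and compares its moments with 1/lambda. The sample size of 100000, the seed, and the variable names are illustrative assumptions and are not part of the analysis reported here.

```r
# Illustrative check (assumed sample size and seed) that rexp() draws
# have a mean and standard deviation close to 1/lambda
lambda <- 0.2
set.seed(1)
draws <- rexp(100000, rate = lambda)

theoretical <- 1 / lambda   # equals 5 for lambda = 0.2
sample_mean <- mean(draws)
sample_sd   <- sd(draws)

print(c(theoretical = theoretical, mean = sample_mean, sd = sample_sd))
```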

Simulations: Create a 1000 member vector where each member is the average of 40 exponentials

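lambda <- 0.2             # rate parameter
num_exponentials <- 40    # number of exponentials per average
set.seed(9999)            # fixed seed so the results are reproducible

# Each element of AvgExp is the mean of 40 exponential draws
AvgExp <- numeric(1000)
for (i in 1:1000) {
   AvgExp[i] <- mean(rexp(num_exponentials, lambda))
}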
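
The same simulation can also be written without an explicit loop. The sketch below is an alternative formulation, not the code used for the results reported here, and the names num_simulations and AvgExpAlt are hypothetical. It fills a 1000 x 40 matrix row by row, so each row holds 40 consecutive draws, and then takes the row means. With the same seed this is expected to match the loop's values up to floating-point rounding.

```r
lambda <- 0.2
num_exponentials <- 40
num_simulations <- 1000   # hypothetical name for the number of averages
set.seed(9999)

# One row per average, filled row by row so that each row contains
# 40 consecutive exponential draws
draws <- matrix(rexp(num_simulations * num_exponentials, lambda),
                nrow = num_simulations, byrow = TRUE)

# Mean of each row gives one average of 40 exponentials
AvgExpAlt <- rowMeans(draws)
print(length(AvgExpAlt))
```
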

  1. Show how the theoretical mean of the distribution compares to the sample mean of the distribution.
Theoretical Mean of the distribution:
1/lambda = 5

Sample Mean of the distribution:
mean(AvgExp) = 4.979165

The sample mean of 4.979165 is a good approximation of the theoretical mean of 5: the two differ by about 0.02, or roughly 0.4%.
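
One way to judge whether a gap of this size is plausible is to express it in standard errors. The sketch below uses only the values quoted above; the variable names are illustrative. The standard error of a mean of 1000 averages is the standard deviation of one average divided by sqrt(1000), and the observed gap works out to less than one standard error.

```r
# Values taken from the text above
lambda <- 0.2
num_exponentials <- 40
num_simulations <- 1000
sample_mean <- 4.979165

# Standard deviation of a single average of 40 exponentials
sd_of_average <- (1 / lambda) / sqrt(num_exponentials)

# Standard error of the mean of 1000 such averages
se_of_sample_mean <- sd_of_average / sqrt(num_simulations)

# Distance of the sample mean from the theoretical mean, in standard errors
z <- (sample_mean - 1 / lambda) / se_of_sample_mean
print(c(se = se_of_sample_mean, z = z))
```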

  2. Show how the theoretical variance of the distribution compares to the sample variance of the distribution.
Theoretical Variance of the distribution:
((1/lambda)/(sqrt(num_exponentials)))^2 = 0.625

Sample Variance of the distribution:
var(AvgExp) = 0.5970658

The sample variance of 0.5970658 is a reasonable approximation of the theoretical variance of 0.625: it is lower by about 0.028, or roughly 4.5%.
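
The theoretical value follows from the variance of a mean of independent draws. The notation here is introduced only for illustration: n stands for num_exponentials, X_i for a single exponential draw, and \bar{X}_n for one average of n draws.

```latex
\operatorname{Var}(\bar{X}_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{1}{n}\cdot\frac{1}{\lambda^{2}}
  = \frac{1}{40}\cdot\frac{1}{0.2^{2}}
  = \frac{25}{40} = 0.625
```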

Theoretical Standard Deviation of the distribution:
(1/lambda)/(sqrt(num_exponentials)) = 0.7905694

Sample Standard Deviation of the distribution:
sd(AvgExp) = 0.7727003

The sample standard deviation of 0.7727003 is a good approximation of the theoretical standard deviation of 0.7905694: it is lower by about 0.018, or roughly 2.3%.

  3. Show that the distribution is approximately normal.
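# qqnorm() and qqline() are base R graphics functions
qqnorm(AvgExp, col="blue")   # sample quantiles against normal quantiles
qqline(AvgExp, lwd=2)        # reference line through the quartiles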

The quantile-quantile (Q-Q) plot shows that the sample quantiles lie very close to the theoretical normal quantiles. The exceptions are at the extremes of the plot, and these represent only a small fraction of the data. The Q-Q plot therefore indicates that the sample data are very close to normally distributed.
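
The departures at the extremes can also be summarized numerically through the skewness. The sketch below is an illustration under stated assumptions: it re-simulates the averages with its own seed and a larger number of averages (10000) so the estimate is stable, and sample_skewness is a hypothetical helper defined here rather than a library function. It relies on the fact that an exponential distribution has skewness 2, so a mean of n independent exponentials has skewness 2/sqrt(n).

```r
# Illustrative re-simulation (assumed seed and number of averages)
lambda <- 0.2
num_exponentials <- 40
set.seed(1)
averages <- replicate(10000, mean(rexp(num_exponentials, lambda)))

# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

observed_skew <- sample_skewness(averages)
expected_skew <- 2 / sqrt(num_exponentials)   # skewness of a mean of 40 exponentials
print(c(observed = observed_skew, expected = expected_skew))
```

A normal distribution has skewness 0, so a small positive value is consistent with a Q-Q plot that follows the line in the middle and bends away from it in the tails.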

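An approximate 95% interval for an average of 40 exponentials, centered on the sample mean and using the theoretical standard deviation, is: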

mean(AvgExp)+c(-1,1) * 1.96 * (1/lambda)/(sqrt(num_exponentials))
## [1] 3.429649 6.528681

The theoretical mean of 5 falls well within this interval, and under the normal approximation about 95% of the simulated averages are expected to fall within it.
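
To see how the interval relates to the simulated data, one can count the fraction of averages that land inside it. The sketch below re-creates the simulation, with replicate() standing in for the loop as an assumption of this example, and the names half_width, interval and coverage are illustrative. If the normal approximation is reasonable, the fraction should be close to 0.95.

```r
lambda <- 0.2
num_exponentials <- 40
set.seed(9999)
AvgExp <- replicate(1000, mean(rexp(num_exponentials, lambda)))

# Interval of +/- 1.96 theoretical standard deviations around the sample mean
half_width <- 1.96 * (1 / lambda) / sqrt(num_exponentials)
interval <- mean(AvgExp) + c(-1, 1) * half_width

# Fraction of simulated averages that fall inside the interval
coverage <- mean(AvgExp >= interval[1] & AvgExp <= interval[2])
print(interval)
print(coverage)
```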

library(ggplot2)

AvgExpdata <- as.data.frame(AvgExp)
# Histogram of the averages with the theoretical normal curve (blue)
# and the kernel density estimate of the sample (red)
ggplot(data = AvgExpdata, aes(x = AvgExp)) +
  geom_histogram(aes(y = ..density..), fill = "light green", binwidth = 0.2, color = "red") +
  stat_function(fun = dnorm, args = list(mean = 5, sd = 5 / sqrt(num_exponentials)), color = "blue", size = 1) +
  geom_vline(xintercept = mean(AvgExp), size = 1, color = "red") +   # sample mean
  geom_vline(xintercept = 5, size = 1, color = "blue") +             # theoretical mean
  geom_density(color = "red", size = 1)

This plot compares the distribution of the 1000 sample averages to the normal distribution. The normal distribution's mean and density curve are in blue. The sample mean and sample density curve are in red. The plot shows that the sample distribution is very close to the normal distribution but not a perfect match: a slight skew is visible.
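
Some skew is expected even without sampling noise, because the mean of n independent exponentials with rate lambda follows a gamma distribution with shape n and rate n * lambda. The following sketch compares that exact density with the normal approximation on a grid; the grid range and the variable names are assumptions made for illustration. The gamma density peaks at (n - 1)/(n * lambda) = 4.875, slightly to the left of the normal peak at 5.

```r
lambda <- 0.2
num_exponentials <- 40

# Grid of values covering most of the mass of the averages (assumed range)
x <- seq(2, 9, by = 0.01)

# Exact density of the mean of 40 independent exponential draws:
# gamma with shape = n and rate = n * lambda
exact_density <- dgamma(x, shape = num_exponentials,
                        rate = num_exponentials * lambda)

# Normal approximation suggested by the Central Limit Theorem
normal_density <- dnorm(x, mean = 1 / lambda,
                        sd = (1 / lambda) / sqrt(num_exponentials))

# Location of the peak of each curve on the grid
mode_exact <- x[which.max(exact_density)]
mode_normal <- x[which.max(normal_density)]
print(c(exact = mode_exact, normal = mode_normal))
```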

set.seed(9999)

# Start from a fresh vector so that AvgExp holds exactly 100000 averages
AvgExp <- numeric(100000)
for (i in 1:100000) {
   AvgExp[i] <- mean(rexp(num_exponentials, lambda))
}


library(ggplot2)

AvgExpdata <- as.data.frame(AvgExp)
# Same plot as before, now for the 100000 averages
ggplot(data = AvgExpdata, aes(x = AvgExp)) +
  geom_histogram(aes(y = ..density..), fill = "light green", binwidth = 0.2, color = "red") +
  stat_function(fun = dnorm, args = list(mean = 5, sd = 5 / sqrt(num_exponentials)), color = "blue", size = 1) +
  geom_vline(xintercept = mean(AvgExp), size = 1, color = "red") +   # sample mean
  geom_vline(xintercept = 5, size = 1, color = "blue") +             # theoretical mean
  geom_density(color = "red", size = 1)

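This plot compares the distribution of 100000 sample averages to the normal distribution. The normal distribution's mean and density curve are in blue. The sample mean and sample density curve are in red. Comparing this plot to the previous one shows the sample distribution settling closer to the theoretical curve as the number of simulated averages increases. Any skew that remains comes from averaging only 40 exponentials; by the Central Limit Theorem it shrinks as that number grows, not as more averages are simulated.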