This project investigates the exponential distribution in R and compares the distribution of its sample means with the prediction of the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. lambda = 0.2 is used for all of the simulations. The investigation covers the distribution of 1000 simulated averages of 40 exponentials each, compared with the theoretical values.
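# Rate parameter used for every simulation
lambda <- 0.2
# Vector that collects the simulated averages
AvgExp <- NULL
# Number of exponentials averaged in each simulation
num_exponentials <- 40
# Fix the seed so the results are reproducible
set.seed(9999)
for (i in 1:1000) {
  AvgExp <- c(AvgExp, mean(rexp(num_exponentials, lambda)))
}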
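The loop above can also be written without growing a vector one element at a time. The following is a minimal sketch under the same seed and parameters; `num_simulations` and `avg_exp_vec` are hypothetical names introduced here for illustration. Because the random draws are requested in the same order, it should yield the same averages as the loop.

```r
lambda <- 0.2
num_exponentials <- 40
num_simulations <- 1000  # hypothetical name for the number of averages

set.seed(9999)
# replicate() evaluates the expression num_simulations times and
# returns the results as a numeric vector
avg_exp_vec <- replicate(num_simulations,
                         mean(rexp(num_exponentials, lambda)))

length(avg_exp_vec)
```

For 1000 simulations the difference in speed is negligible; it becomes noticeable when the number of simulations is large.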
Theoretical Mean of the distribution:
1/lambda = 5
Sample Mean of the distribution:
mean(AvgExp) = 4.979165
The sample mean of 4.979165 is a good approximation of the theoretical mean of 5: the two differ by about 0.4%.
Theoretical Variance of the distribution:
((1/lambda)/(sqrt(num_exponentials)))^2 = 0.625
Sample Variance of the distribution:
var(AvgExp) = 0.5970658
The sample variance of 0.5970658 is a reasonable approximation of the theoretical variance of 0.625: the two differ by about 4.5%.
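The theoretical variance used above follows from the independence of the 40 exponentials in each average. As a short derivation, writing n for num_exponentials and using the fact that each exponential has variance 1/lambda^2:

```latex
\[
\operatorname{Var}(\bar{X}_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{1}{n\lambda^{2}}
  = \frac{1}{40 \times 0.2^{2}}
  = 0.625
\]
```

This is the same quantity as ((1/lambda)/sqrt(n))^2, the square of the theoretical standard deviation of the average.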
Theoretical Standard Deviation of the distribution:
(1/lambda)/(sqrt(num_exponentials)) = 0.7905694
Sample Standard Deviation of the distribution:
sd(AvgExp) = 0.7727003
The sample standard deviation of 0.7727003 is a good approximation of the theoretical standard deviation of 0.7905694: the two differ by about 2.3%.
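The three comparisons above can be gathered into one table. This is a minimal sketch that re-runs the simulation with the same seed; the data frame `comparison` and its column names are hypothetical and introduced only for illustration.

```r
lambda <- 0.2
num_exponentials <- 40

set.seed(9999)
AvgExp <- replicate(1000, mean(rexp(num_exponentials, lambda)))

# Theoretical values for the average of num_exponentials exponentials,
# next to the corresponding sample statistics
comparison <- data.frame(
  statistic   = c("mean", "variance", "sd"),
  theoretical = c(1 / lambda,
                  (1 / lambda)^2 / num_exponentials,
                  (1 / lambda) / sqrt(num_exponentials)),
  sample      = c(mean(AvgExp), var(AvgExp), sd(AvgExp))
)

# Relative error of each sample statistic with respect to theory
comparison$relative_error <-
  abs(comparison$sample - comparison$theoretical) / comparison$theoretical

print(comparison)
```

A relative error gives a more concrete meaning to "quite close" than a visual comparison of the raw numbers.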
# Normal Q-Q plot of the simulated averages (base R graphics)
qqnorm(AvgExp, col = "blue")
qqline(AvgExp, lwd = 2)
The quantile-quantile (Q-Q) plot shows that the sample quantiles are very close to the theoretical normal quantiles. The exceptions are at the extremes of the plot, and these represent only a small fraction of the data. The Q-Q plot therefore indicates that the sample data are very close to normally distributed.
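A numeric counterpart to the Q-Q plot is to compare a few sample quantiles with the matching quantiles of the theoretical normal distribution. The following is a minimal sketch that re-runs the simulation with the same seed; the probabilities in `probs` and the name `quantile_check` are arbitrary choices made here for illustration.

```r
lambda <- 0.2
num_exponentials <- 40

set.seed(9999)
AvgExp <- replicate(1000, mean(rexp(num_exponentials, lambda)))

# Probabilities at which the two sets of quantiles are compared
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)

quantile_check <- data.frame(
  prob   = probs,
  sample = unname(quantile(AvgExp, probs)),
  normal = qnorm(probs,
                 mean = 1 / lambda,
                 sd   = (1 / lambda) / sqrt(num_exponentials))
)
quantile_check$difference <- quantile_check$sample - quantile_check$normal

print(quantile_check)
```

Larger differences at the outer probabilities than at the median would correspond to the departures seen at the extremes of the plot.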
An approximate 95% interval for the average of 40 exponentials, centered on the sample mean and using the theoretical standard deviation, is:
mean(AvgExp) + c(-1, 1) * 1.96 * (1/lambda)/(sqrt(num_exponentials))
## [1] 3.429649 6.528681
The theoretical mean of 5 falls well within this interval.
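One way to check the interval above is to count how many of the simulated averages fall inside it. If the normal approximation is reasonable, the proportion should be close to 95%. This is a minimal sketch that re-runs the simulation with the same seed; `interval` and `coverage` are hypothetical names introduced here.

```r
lambda <- 0.2
num_exponentials <- 40

set.seed(9999)
AvgExp <- replicate(1000, mean(rexp(num_exponentials, lambda)))

# Interval centered on the sample mean, using the theoretical
# standard deviation of an average of num_exponentials exponentials
interval <- mean(AvgExp) +
  c(-1, 1) * 1.96 * (1 / lambda) / sqrt(num_exponentials)

# Proportion of simulated averages that lie inside the interval
coverage <- mean(AvgExp >= interval[1] & AvgExp <= interval[2])

print(interval)
print(coverage)
```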
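library(ggplot2)
AvgExpdata <- as.data.frame(AvgExp)
ggplot(data = AvgExpdata, aes(x = AvgExp)) +
  # Histogram of the simulated averages on the density scale
  geom_histogram(aes(y = ..density..), fill = I("light green"), binwidth = 0.2, color = I("red")) +
  # Theoretical normal density with mean 5 and sd 5/sqrt(num_exponentials)
  stat_function(fun = dnorm, args = list(mean = 5, sd = (5)/(sqrt(num_exponentials))), color = "blue", size = 1) +
  # Sample mean (red) and theoretical mean (blue)
  geom_vline(xintercept = mean(AvgExp), size = 1, color = "red") +
  geom_vline(xintercept = 5, size = 1, color = "blue") +
  # Kernel density estimate of the simulated averages
  geom_density(color = "red", size = 1)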
This plot compares the 1000 simulated averages with the normal distribution. The normal distribution mean and density curve are in blue. The sample distribution mean and density curve are in red. The plot shows that the sample distribution is very close to the normal distribution but not identical to it: a slight skew remains visible.
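The skew seen in the plot can be quantified. The average of n independent exponentials follows a gamma distribution with shape n, whose skewness is 2/sqrt(n), so some right skew is expected for n = 40. The following is a minimal sketch; `sample_skewness` is a hypothetical helper defined here (base R has no skewness function), and the choice of 10000 simulations is arbitrary.

```r
lambda <- 0.2
num_exponentials <- 40

set.seed(9999)
avgs <- replicate(10000, mean(rexp(num_exponentials, lambda)))

# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 3/2
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Skewness of the simulated averages
print(sample_skewness(avgs))
# Theoretical skewness of an average of num_exponentials exponentials
print(2 / sqrt(num_exponentials))
```

The theoretical skewness shrinks as the number of exponentials per average grows, which is the sense in which the Central Limit Theorem brings the distribution closer to normal.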
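# Start from a fresh, preallocated vector so that exactly 100000 averages are collected
AvgExp <- numeric(100000)
set.seed(9999)
for (i in 1:100000) {
  AvgExp[i] <- mean(rexp(num_exponentials, lambda))
}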
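library(ggplot2)
AvgExpdata <- as.data.frame(AvgExp)
ggplot(data = AvgExpdata, aes(x = AvgExp)) +
  # Histogram of the simulated averages on the density scale
  geom_histogram(aes(y = ..density..), fill = I("light green"), binwidth = 0.2, color = I("red")) +
  # Theoretical normal density with mean 5 and sd 5/sqrt(num_exponentials)
  stat_function(fun = dnorm, args = list(mean = 5, sd = (5)/(sqrt(num_exponentials))), color = "blue", size = 1) +
  # Sample mean (red) and theoretical mean (blue)
  geom_vline(xintercept = mean(AvgExp), size = 1, color = "red") +
  geom_vline(xintercept = 5, size = 1, color = "blue") +
  # Kernel density estimate of the simulated averages
  geom_density(color = "red", size = 1)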
This plot compares 100000 simulated averages with the normal distribution. The normal distribution mean and density curve are in blue. The sample distribution mean and density curve are in red. Comparing this plot with the previous one shows that the sample density becomes smoother and follows the theoretical curve more closely as the number of simulated averages increases. Any skew that remains comes from averaging only 40 exponentials, not from the number of simulations.