In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem. We examine how the sample mean and sample variance compare with the theoretical mean and variance of the distribution. Finally, we show that the distribution of the sample means is approximately normal.
The exponential distribution is simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations. We investigate the distribution of averages of 40 exponentials over a thousand simulations.
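As a quick sanity check of the statement that both the mean and the standard deviation equal 1/lambda, the following minimal sketch draws a large sample with rexp() and compares the empirical values with 1/lambda. The seed and the sample size of 100000 are assumptions chosen only for illustration; they are not part of the project parameters.

```r
# Sketch: check that rexp() draws have mean and sd close to 1/lambda
set.seed(1)                            # arbitrary seed, assumed for illustration
lambda <- 0.2                          # rate used throughout this report
draws <- rexp(100000, rate = lambda)   # large sample; the size is an assumption

# empirical mean and sd next to the theoretical value 1/lambda
c(mean = mean(draws), sd = sd(draws), theoretical = 1 / lambda)
```

Both empirical values should land close to 5, up to sampling noise.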
The R code used to calculate the results and draw the plots can be found in the Appendix section.
Set the parameters according to the course project instructions:
n <- 40
lambda <- 0.2
sim <- 1000
Run the simulation 1000 times and calculate the mean of each sample of 40 exponentials using the following R code: simMeans = NULL; for (i in 1 : sim) {simMeans = c(simMeans, mean(rexp(n,lambda)))}
sampleMean <- mean(simMeans)
## sample Mean: 4.998511
theoMean <- 1/lambda
## theoretical mean: 5
As the output above shows, the sample mean, 4.998511, is very close to the theoretical mean, 5, of the exponential distribution.
In the histogram of simulated means (Plot 1), the red vertical line marks the mean, 4.998511, of the 1000 simulated averages of 40 exponentials.
sampleVar <- var(simMeans)
## sample variance: 0.6113794
theoVar <- (1/lambda)^2/n
## theoretical variance: 0.625
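The theoretical variance of the sample mean follows from the independence of the 40 draws within each sample. Writing X_1, ..., X_n for the draws (notation introduced here for illustration), each with variance (1/lambda)^2, a short derivation is:

```latex
\operatorname{Var}(\bar{X}_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{(1/\lambda)^2}{n}
  = \frac{5^2}{40}
  = 0.625
```

This matches the expression (1/lambda)^2/n used for theoVar with lambda = 0.2 and n = 40.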
The graph shows that the sampling distribution of the sample means of our exponential distribution is approximately normal, in accordance with the Central Limit Theorem. Increasing the number of simulations (currently 1000) would give a smoother histogram of that sampling distribution; it is increasing the sample size (currently 40) that would bring the distribution closer to a normal distribution with mean 1/lambda and variance (1/lambda)^2/n.
The plot above shows that the histogram can be approximated by the normal distribution.
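How the shape of the sampling distribution depends on the sample size can be sketched by measuring the skewness of the simulated means for a few sample sizes. This is an illustration, not part of the original analysis: the seed, the 5000 simulations per size, the sizes 5 and 400, and the helper skewness() are all assumptions introduced here.

```r
# Sketch: skewness of the sample mean for several sample sizes
set.seed(2)                  # arbitrary seed, assumed for illustration
lambda <- 0.2                # rate used throughout this report
sims <- 5000                 # simulated means per sample size (assumed)
sizes <- c(5, 40, 400)       # 40 is the size used in this report; 5 and 400 are assumed

# moment-based sample skewness; close to 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_by_size <- sapply(sizes, function(size) {
  # each row of the matrix holds one sample of `size` exponentials
  draws <- matrix(rexp(sims * size, rate = lambda), nrow = sims)
  skewness(rowMeans(draws))
})
names(skew_by_size) <- sizes
skew_by_size
```

For a mean of n independent exponentials the theoretical skewness is 2/sqrt(n), so the values should fall towards 0 as the sample size grows.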
## sample confidence interval: 4.756 5.241
## theoretical confidence interval: 4.755 5.245
The two intervals nearly coincide, which indicates that the mean and variance of the simulated sample means are very close to the theoretical values implied by the normal approximation.
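A complementary check, which is not part of the original analysis, is to compare the central 95% range of the simulated means with the range the normal approximation predicts. The sketch below assumes the same parameters and seed as the Appendix and uses replicate() in place of the loop; no output is quoted because it has not been run as part of this report.

```r
# Sketch: empirical 2.5% and 97.5% quantiles of the simulated means
# versus the quantiles of the approximating normal distribution
set.seed(25)
n <- 40
lambda <- 0.2
sim <- 1000
simMeans <- replicate(sim, mean(rexp(n, lambda)))

probs <- c(0.025, 0.975)
empirical <- quantile(simMeans, probs)
# normal approximation: mean 1/lambda, sd (1/lambda)/sqrt(n)
normal <- qnorm(probs, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n))

rbind(empirical, normal)
```

If the normal approximation is adequate, the two rows should be close; a small shift to the right in the empirical row would reflect the remaining skewness of the exponential.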
The Central Limit Theorem (CLT) tells us that, for a sufficiently large sample size, the sampling distribution of the sample mean is approximately normal, regardless of the distribution from which the sample is drawn, provided that distribution has a finite variance. In this study we confirmed that the distribution of the sample mean of 40 draws from an exponential distribution is approximately normal.
#load required packages
library(ggplot2)
#set seed for reproducibility
set.seed(25)
#set parameters according to course project instructions
n <- 40
lambda <- 0.2
sim <- 1000
simMeans = NULL
#run simulations 1000 times and calculate their mean
for (i in 1 : sim) {simMeans = c(simMeans, mean(rexp(n,lambda)))}
head(simMeans)
## [1] 4.483931 5.007794 5.731865 4.465404 4.098390 5.855495
simMeansDf <- as.data.frame(simMeans)
#calculate the mean of the simulated exponential distributions
sampleMean <- mean(simMeans)
cat("sample Mean: ", sampleMean)
## sample Mean: 4.998511
#calculate the theoretical mean of exponential distribution
theoMean <- 1/lambda
cat("theoretical mean: ", theoMean)
## theoretical mean: 5
g <- ggplot(simMeansDf, aes(x=simMeans))
g <- g + geom_histogram(binwidth = .2, color="black", fill="gray") +
  geom_vline(xintercept = sampleMean, color="red", size=1, linetype=1) +
  labs(x="Simulated Mean", y="Frequency", title="Plot 1 - Distribution of simulated means")
g
sampleVar <- var(simMeans)
cat("sample variance: ", sampleVar)
## sample variance: 0.6113794
theoVar <- (1/lambda)^2/n
cat("theoretical variance: ", theoVar)
## theoretical variance: 0.625
g <- ggplot(simMeansDf, aes(x=simMeans))
g <- g + geom_histogram(binwidth = .2, color="black", fill="gray" , aes(y=..density..))+
stat_function(fun=dnorm, args=list(mean=theoMean, sd=sd(simMeans)),
color="red", size =1) +
labs(x="Simulated Mean", y= "density",
title="Plot 2 - Distribution of Simulated Means vs Normal Distribution")
g
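#interval reported here: mean +/- 1.96*sd(simMeans)/sqrt(n)
#note: sd(simMeans) already estimates the standard error of a mean of n exponentials,
#so the further division by sqrt(n) makes this narrower than the central 95% range of the simulated means
sampleConInterval <- round(mean(simMeans) + c(-1,1)*1.96*sd(simMeans)/sqrt(n),3)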
cat("sample confidence interval: ", sampleConInterval)
## sample confidence interval: 4.756 5.241
#same construction using the theoretical mean and variance
theoConInterval <- theoMean + c(-1,1)*1.96*sqrt(theoVar)/sqrt(n)
cat("theoretical confidence interval: ", theoConInterval)
## theoretical confidence interval: 4.755 5.245