Overview

This project investigates the exponential distribution in R and compares its behavior with the predictions of the Central Limit Theorem (CLT). The rate parameter is set to lambda = 0.2 for all simulations. The distribution of averages of 40 exponentials is examined over one thousand simulations.

For our purposes, the CLT states that the distribution of averages of iid variables, properly normalized, approaches that of a standard normal as the sample size increases.
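Written out for the exponential case, the normalization mentioned above can be sketched as follows. The symbols \bar{X}_n, \mu, \sigma and n are notation introduced here for illustration and do not appear in the original text; n is the number of exponentials averaged.

```latex
\[
Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \;\xrightarrow{d}\; N(0, 1)
\quad \text{as } n \to \infty,
\qquad \mu = \sigma = \frac{1}{\lambda} = 5 \ \text{ for } \lambda = 0.2 .
\]
```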

Simulations

Exponential samples are drawn with rexp(), and replicate() repeats the draw once per simulation, giving a matrix with one column per simulation.

set.seed(2017)
lambda <- 0.2 
exponentials <- 40 
simulations <- 1000

expDistSimulations <- replicate(simulations, rexp(n=exponentials ,rate=lambda))
expDistMeans <- apply(expDistSimulations , 2, mean)
# Standardize the means: subtract their mean and divide by their standard deviation
NormEDM <- (expDistMeans-mean(expDistMeans))/sd(expDistMeans)

m <- rbind(c(1, 1), c(2, 3))
layout(m)
par(mar = c(3, 3, 0, 0))


hist(expDistSimulations, main="", ylim=c(0,0.2),col=4, prob=TRUE)
lines(density(expDistSimulations), lwd=3, col="red")
hist(expDistMeans, main="", col=4, prob=TRUE)
lines(density(expDistMeans), lwd=3, col="red")
hist(NormEDM, main="", col=4, prob=TRUE)
lines(density(NormEDM), lwd=3, col="red")

The figure above shows, at the top, the distribution of all simulated exponential values; on the lower left, the distribution of the means of the simulations; and on the lower right, the distribution of the standardized means.
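The standardization above uses the sample mean and sample standard deviation of the simulated means. As a complementary check, the sketch below standardizes with the theoretical mean and standard error instead, which is the normalization the CLT refers to. It is a minimal, self-contained illustration; the names simMeans, mu, se and zTheory are hypothetical and not part of the original analysis.

```r
# Sketch: standardize the simulated means with the theoretical mean and
# standard error instead of the sample estimates.
set.seed(2017)
lambda <- 0.2
n <- 40          # exponentials per simulation
nsim <- 1000     # number of simulations

# One mean per simulation
simMeans <- replicate(nsim, mean(rexp(n, rate = lambda)))

mu <- 1 / lambda                  # theoretical mean of the exponential
se <- (1 / lambda) / sqrt(n)      # theoretical standard error of the mean
zTheory <- (simMeans - mu) / se   # hypothetical name, for illustration only

# If the normal approximation is reasonable, these should be near 0 and 1
c(mean = mean(zTheory), sd = sd(zTheory))

# Overlay the standard normal density on the histogram
hist(zTheory, prob = TRUE, main = "", col = 4)
curve(dnorm(x), add = TRUE, lwd = 3, col = "red")
```

If the histogram tracks the red curve closely, the simulated means behave as the CLT predicts without relying on estimates taken from the same sample.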

Sample Mean versus Theoretical Mean

# Mean of the simulated averages versus the mean of the exponential distribution
sampleMean <- mean(expDistMeans)
theoreticalMean <- 1/lambda
means<-c(sampleMean,theoreticalMean)
names(means) <- c("Sample Mean ","Theoretical Mean ")
means
     Sample Mean  Theoretical Mean  
         4.982863          5.000000 

Sample Variance versus Theoretical Variance

sampleVar <- var(expDistMeans)
# Variance of the mean of n iid draws: sigma^2 / n, with sigma = 1/lambda
theoreticalVar <- (1/lambda)^2 / exponentials
vars<-c(sampleVar,theoreticalVar)
names(vars) <- c("Sample Variance ","Theoretical Variance ")
vars
     Sample Variance  Theoretical Variance  
            0.6267826             0.6250000 
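The theoretical value printed above follows from the variance of a mean of independent draws. As a short worked derivation, with \bar{X}_n, \sigma and n introduced here as notation (n = 40 exponentials per simulation, \sigma = 1/\lambda = 5):

```latex
\[
\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}
= \frac{(1/\lambda)^2}{n}
= \frac{5^2}{40}
= \frac{25}{40}
= 0.625 .
\]
```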

Distribution

library(ggplot2)

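ggplot(data.frame(means = expDistMeans), aes(x = means)) +
        # Histogram on the density scale so the density curve is comparable
        geom_histogram(aes(y = after_stat(density)), binwidth = 0.08, fill = "blue",
                       alpha = .6) +
        geom_density(col = 4, linewidth = 1) +
        # Vertical line at the theoretical mean, 1/lambda = 5
        geom_vline(xintercept = 1/lambda, col = 2) +
        labs(title = "Sampling distribution of the means for 1000 simulations",
             x = "Means", y = "Density")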

According to the Central Limit Theorem, the distribution of simulated means should be nearly normal. The mean of the sampling distribution should be approximately equal to the population mean (5), and the standard error (the standard deviation of the simulated means) should be approximately equal to the population SD divided by the square root of the sample size:

(5)/sqrt(40)
## [1] 0.7905694
sqrt(sampleVar)
## [1] 0.791696
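Agreement in mean and standard error does not by itself show that the shape is normal. One possible way to look at the shape is a normal Q-Q plot together with a side-by-side comparison of quantiles. The sketch below is self-contained and re-runs the simulation; the names simMeans, probs and cmp are hypothetical, and the choice of quantiles is an assumption made for illustration.

```r
# Sketch: compare the simulated means with the normal distribution
# suggested by the CLT, N(1/lambda, (1/lambda)^2 / n).
set.seed(2017)
lambda <- 0.2
n <- 40          # exponentials per simulation
nsim <- 1000     # number of simulations

simMeans <- replicate(nsim, mean(rexp(n, rate = lambda)))

# Points close to the line indicate approximate normality
qqnorm(simMeans, main = "")
qqline(simMeans, col = "red", lwd = 2)

# Empirical quantiles next to the matching normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
cmp <- rbind(
  empirical = quantile(simMeans, probs),
  normal    = qnorm(probs, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n))
)
round(cmp, 3)
```

Because the exponential distribution is right-skewed, some skew can remain in the tails at a sample size of 40, so small departures from the line at the extremes are to be expected.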