Exponential Distribution and the Central Limit Theorem

SONJA OFFWOOD

Introduction

This document investigates the exponential distribution and uses it to illustrate the Central Limit Theorem. The analysis shows that the sample mean of independent draws from a distribution tends to be normally distributed. Here, the underlying random variable is exponentially distributed.

Exponential Distribution

Let's first look at the exponential distribution. The following code sets the rate parameter, the number of simulations and the sample size used in the rest of the document. To make the code fully reproducible, we set the seed and load the package required for the analysis. We then simulate 1000 exponentially distributed values. A histogram of this data is plotted further down.

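lambda=0.2   # rate parameter of the exponential distribution
sim=1000     # number of simulations
Xsim=40      # number of exponential values averaged in each mean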

library(ggplot2)
set.seed(100)
simu=rexp(sim,lambda)
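
As a quick sanity check on the simulated draws, the sketch below compares their sample mean and standard deviation with the theoretical value 1/lambda. It repeats the setup so that it runs on its own; the name `check` is introduced here purely for illustration.

```r
# Standalone sketch: repeats the setup so it can be run on its own.
lambda <- 0.2
sim <- 1000
set.seed(100)
simu <- rexp(sim, lambda)

# For an exponential distribution, the mean and the standard deviation
# are both equal to 1/lambda.
check <- c(theoretical = 1 / lambda,
           sample_mean = mean(simu),
           sample_sd   = sd(simu))
print(check)
```

Both sample statistics should land near 5, with some simulation noise.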

Distribution of Mean of Exponential Distribution

We now create a random variable defined as the mean of 40 exponentially distributed values. We want to examine the distribution of this random variable (and compare it to the exponential distribution itself). After simulating 1000 means (i.e. 1000 times the mean of 40 simulated exponential random variables), we can look at the mean and the variance of this distribution. We compare the theoretical mean of the exponential distribution to the mean of the sample of means. We expect these to be similar in magnitude.

# Draw sim sample means, each computed from Xsim exponential values
mns = NULL
for (i in 1 : sim) mns = c(mns, mean(rexp(Xsim,lambda)))
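
The same simulation can be sketched without growing a vector inside a loop. The version below is an alternative, not the method used in this document: it draws all values at once, arranges them in a matrix with one row per simulation, and takes row means. The names `draws` and `mns_vec` are hypothetical, and because the seed is reset here, the values will not match the ones printed in this document.

```r
# Standalone sketch of a vectorised alternative to the loop.
lambda <- 0.2
sim <- 1000   # number of simulated means
Xsim <- 40    # exponential values per mean
set.seed(100)

# One row per simulation, Xsim exponential draws per row.
draws <- matrix(rexp(sim * Xsim, lambda), nrow = sim, ncol = Xsim, byrow = TRUE)

# Each row mean is one realisation of the sample mean.
mns_vec <- rowMeans(draws)
print(length(mns_vec))
print(mean(mns_vec))
```

Preallocating the data in a matrix avoids repeatedly copying the result vector, which matters more as the number of simulations grows.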

Theoretical vs Sample Mean

mean_theo =  1/lambda
mean_sim= mean(mns)

print(mean_theo)
## [1] 5
print(mean_sim)
## [1] 4.997191

The theoretical mean of the exponential distribution is 5 and the mean of the simulated sample means is 4.9971912, which is close to the theoretical value, as expected.
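
To put "close" on a scale, the sketch below computes the standard error of the average of 1000 simulated means and expresses the observed gap in those units. It uses the value 4.997191 printed above as a constant; the names `se_mean` and `z` are introduced here for illustration, and the calculation assumes the simulated means are independent.

```r
# Standalone sketch: how far is the simulated mean from the theoretical one,
# measured in standard errors?
lambda <- 0.2
sim <- 1000
Xsim <- 40
mean_theo <- 1 / lambda
mean_sim <- 4.997191          # value printed earlier in this document

# Each simulated mean has variance 1/(lambda^2 * Xsim); averaging sim of them
# divides the variance by sim again.
se_mean <- sqrt((1 / lambda^2) / (Xsim * sim))
z <- abs(mean_sim - mean_theo) / se_mean
print(se_mean)
print(z)
```

The standard error is 0.025, so the observed gap of about 0.0028 is roughly 0.11 standard errors, well within what simulation noise alone would produce.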

Let's view this on plots to compare the exponential distribution (first graph) with the distribution of the mean of the exponential distribution (second graph). The theoretical mean of the distribution is indicated on the first histogram, and the sample mean is indicated on the second histogram.

qplot(simu, geom="histogram", xlab="Simulations", ylab="Frequency",fill=I("red"), 
      alpha=I(.5))+geom_vline(xintercept=mean_theo, col="red", lty=2, lwd=2)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(mns, geom="histogram", xlab="Simulations", ylab="Frequency",fill=I("blue"), 
      alpha=I(.5))+geom_vline(xintercept=mean_sim, col="blue", lty=2, lwd=2)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Theoretical vs Sample Variance

We perform a similar analysis with the variance. We compare the theoretical variance of the exponential distribution to the variance of the sample of means of the exponential distribution. We also compare the variance of the sample of means to the theoretical variance of the equivalent distribution given by the Central Limit Theorem. We expect the variance of the sample means to be the variance of the exponential distribution divided by 40, the number of values in each mean. Note that the standard deviation of the exponential distribution is 1/lambda, so its variance is 1/lambda^2.

var_theo =  1/lambda^2
var_sim = var(mns)
var_simCLT = var_theo/Xsim

print(var_sim)
## [1] 0.6481455
print(var_simCLT)
## [1] 0.625

The theoretical variance of the exponential distribution is 25. More importantly, we can see here that the sample variance of the distribution of means is 0.6481455. By the CLT we expect this variance to be 0.625, so the simulated value is in line with our expectations.
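
The expected value of 0.625 follows from a short derivation, sketched here for reference. The notation is introduced for this sketch only: n is the number of values in each mean (Xsim in the code), the X_i are assumed to be independent exponential draws with rate lambda, and sigma^2 is their common variance.

```latex
\begin{align*}
\operatorname{Var}(\bar{X}_n)
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{\sigma^2}{n}
   = \frac{1}{\lambda^2 n} \\
  &= \frac{1}{0.2^2 \times 40} = \frac{25}{40} = 0.625
\end{align*}
```

The simulated variance of 0.6481455 differs from this value by about 3.7%.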

Central Limit Theorem

To verify the Central Limit Theorem using the data above, we compare the histogram of the newly produced random variable (shown in blue above) with the normal density implied by the theorem.

The histogram already has the shape of a Gaussian distribution. We overlay the normal density, with mean 1/lambda and variance var_simCLT, onto the previous histogram.

data_plot=data.frame(mns)
p <- ggplot(data_plot,aes(x = mns))+geom_histogram(aes(y=..density..), fill="blue",alpha=I(.5))
p<-p+labs(title="Distribution of Means of Exponential Distribution", y="Density")
p<-p +stat_function(fun=dnorm,args=list( mean=1/lambda, sd=sqrt(var_simCLT)),color = "red", size = 1.0)
print(p)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
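
Alongside the visual overlay, a numerical comparison of quantiles can serve as a rough check of normality. The sketch below is an illustration only: it regenerates the means with its own seed, so its values differ from the ones plotted here, and the names `probs` and `quant_cmp` are hypothetical.

```r
# Standalone sketch: compare empirical quantiles of the simulated means
# with the quantiles of the normal distribution suggested by the CLT.
lambda <- 0.2
sim <- 1000
Xsim <- 40
set.seed(100)   # note: gives different draws from the main analysis
mns <- replicate(sim, mean(rexp(Xsim, lambda)))

probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
quant_cmp <- data.frame(
  prob      = probs,
  empirical = as.numeric(quantile(mns, probs)),
  normal    = qnorm(probs, mean = 1 / lambda, sd = sqrt(1 / (lambda^2 * Xsim)))
)
print(quant_cmp)
```

Small differences in the tails are expected, because the mean of 40 exponential values is still slightly right-skewed.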

From the preceding analysis we have shown that

  • The sample mean of exponentially distributed random variables is approximately normally distributed.
  • This is evidence of the Central Limit Theorem.
  • If we increase the sample size from 40, the distribution of the sample mean will move closer to a normal distribution; increasing the number of simulations from 1000 only gives a smoother histogram of that distribution.
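
How quickly the distribution of the mean approaches normality can be illustrated by varying how many exponential values go into each mean. The sketch below is an illustration under assumptions, not part of the original analysis: the sample sizes 5, 40 and 200 are chosen arbitrarily, `skewness` is a helper defined here to avoid extra packages, and skewness is used as a simple measure of departure from the symmetric normal shape. For the mean of n exponential values, the theoretical skewness is 2/sqrt(n).

```r
# Standalone sketch: skewness of the sample mean for several sample sizes.
lambda <- 0.2
sim <- 1000
set.seed(100)

# Sample skewness (third standardised moment); hypothetical helper.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

sizes <- c(5, 40, 200)   # illustrative sample sizes
skew_by_size <- sapply(sizes, function(n) {
  skewness(replicate(sim, mean(rexp(n, lambda))))
})
names(skew_by_size) <- sizes

print(round(skew_by_size, 3))     # simulated skewness
print(round(2 / sqrt(sizes), 3))  # theoretical skewness 2/sqrt(n)
```

The simulated skewness should shrink towards zero as the sample size grows, which is the sense in which the distribution becomes "more normal".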