Overview

In this report we investigate the exponential distribution and compare the distribution of its sample averages with what the Central Limit Theorem (CLT) predicts.

The CLT states that the distribution of the sum (or average) of a large number of independent, identically distributed variables with finite variance will be approximately normal, regardless of the shape of the underlying distribution. The exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.

The mean of the exponential distribution is \(\mu = \frac{1}{\lambda}\) and the standard deviation is also \(\sigma = \frac{1}{\lambda}\), where \(\lambda\) is the rate parameter.
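As a quick sanity check of these two formulas, the moments can be integrated numerically. This is only a sketch: the variable names (`rate`, `exp.mean`, `exp.second.moment`, `exp.sd`) are illustrative and not part of the original analysis, and the rate 0.2 is the value used later in this report.

```r
# Numerically check the mean and standard deviation of an exponential
# distribution with rate 0.2 (illustrative names, not from the report).
rate <- 0.2

# E[X]: integral of x * f(x) over [0, Inf)
exp.mean <- integrate(function(x) x * dexp(x, rate = rate), 0, Inf)$value

# E[X^2], needed for Var(X) = E[X^2] - E[X]^2
exp.second.moment <- integrate(function(x) x^2 * dexp(x, rate = rate), 0, Inf)$value

exp.sd <- sqrt(exp.second.moment - exp.mean^2)

# Both values should agree with 1 / rate up to numerical error
c(mean = exp.mean, sd = exp.sd, theoretical = 1 / rate)
```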

With \(\lambda = 0.2\), we will investigate the distribution of the averages of 40 exponentials over a thousand simulations.

Simulation

library(ggplot2)

lambda <- 0.2  
number.exponentials <- 40 
simulations.count <- 1000

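# Run the simulation: each row of the matrix holds one simulation of 40 exponentials.
# No seed is set, so the exact numbers vary from run to run.
simulation.data <- matrix(data = rexp(n = number.exponentials * simulations.count, rate = lambda),
                          nrow = simulations.count)

# Sampling distribution: the mean of each row (one average per simulation)
sampling.dist <- data.frame(mean = apply(simulation.data, 1, mean))
# Mean of the 1000 simulated averages
sample.mean <- mean(sampling.dist[, 1])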
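The simulation above does not fix a random seed. A reproducible variant might look like the sketch below. The seed value 2024 and the name `sample.means` are assumptions made for illustration, so the numbers it produces will not match those quoted in this report.

```r
# Hypothetical seed: the original report does not state one, so results
# from this sketch differ from the values shown in the report.
set.seed(2024)

lambda <- 0.2
number.exponentials <- 40
simulations.count <- 1000

simulation.data <- matrix(rexp(number.exponentials * simulations.count, rate = lambda),
                          nrow = simulations.count)

# rowMeans() gives the same result as apply(simulation.data, 1, mean)
sample.means <- rowMeans(simulation.data)

dim(simulation.data)   # 1000 rows (simulations) by 40 columns (exponentials)
mean(sample.means)     # should land near 1 / lambda
```

Using `rowMeans()` instead of `apply(..., 1, mean)` is a stylistic choice; both give the same averages.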

Sample Mean versus Theoretical Mean

The theoretical mean is \(\mu = \frac{1}{\lambda}\), which is:

1/lambda
## [1] 5

and the calculated sample mean is:

sample.mean  #output the sample mean
## [1] 4.975964

The sample mean is very close to the theoretical mean: the two differ by about 0.024. Below is a plot of the sampling distribution, with the sample mean marked by the dashed line.

# Histogram of the simulated averages; the dashed line marks the sample mean
ggplot(sampling.dist, aes(x=mean)) + 
  geom_histogram(binwidth=.1, colour="black", fill="#99c5ff") + 
  geom_vline(xintercept=sample.mean, size = 1.3, color = '#cc0000', linetype="dashed")+ 
  labs(title="Sampling Distribution with Mean", x="Average of 40 exponentials", y="Count")
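One way to put the gap between the sample mean and the theoretical mean on a scale is to express it in standard errors. The sketch below does this using the sample mean reported above as a literal value. The names `se.single`, `se.of.sample.mean` and `z` are illustrative and do not appear in the original analysis.

```r
lambda <- 0.2
number.exponentials <- 40
simulations.count <- 1000
sample.mean <- 4.975964            # value reported above

# Standard error of a single average of 40 exponentials
se.single <- (1 / lambda) / sqrt(number.exponentials)

# Standard error of the mean of 1000 such averages
se.of.sample.mean <- se.single / sqrt(simulations.count)

# Distance from the theoretical mean, measured in standard errors
z <- (sample.mean - 1 / lambda) / se.of.sample.mean
z
```

With these inputs the standard error of the sample mean is 0.025, so the observed mean sits a little less than one standard error below the theoretical value of 5, which is well within ordinary sampling variation.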

Sample Variance versus Theoretical Variance

The variance is the square of the standard deviation: Var = \(\sigma^2\).
For the average of \(n\) independent exponentials with rate \(\lambda\), the expected standard deviation (the standard error of the mean) is \(\sigma_{\bar{x}} = \frac{1/\lambda}{\sqrt{n}}\).
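For reference, the \(\sqrt{n}\) factor follows from the independence of the draws. A short derivation, writing \(X_1, \dots, X_n\) for the individual exponentials and \(\bar{X}_n\) for their average (notation introduced here for illustration):

\[
\operatorname{Var}(\bar{X}_n)
= \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{\sigma^2}{n}
= \frac{1}{\lambda^2 n}
\]

With \(\lambda = 0.2\) and \(n = 40\) this gives \(\frac{1}{0.04 \times 40} = 0.625\), and taking the square root recovers \(\frac{1/\lambda}{\sqrt{n}} \approx 0.79\).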

The theoretical variance for the sampling distribution is:

# Calculate the theoretical standard deviation 
theoretical.standard.deviation <- 1/lambda/sqrt(number.exponentials)  
# Calculate the theoretical variance
theoretical.variance <-  theoretical.standard.deviation ^2
theoretical.variance
## [1] 0.625

The variance of the simulated sampling distribution is:

#Calculate the standard deviation
sampling.dist.standard.deviation <- sd(sampling.dist$mean)  
#Calculate the variance
variance <- sampling.dist.standard.deviation ^2
variance
## [1] 0.6634679

The difference between the theoretical variance and the calculated variance is:

theoretical.variance - variance
## [1] -0.03846793

As we can see, the two variances are close: they differ by about 0.038.
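To put that gap in proportion, the difference can be expressed relative to the theoretical variance. This small sketch uses the two values reported above as literals, and `relative.difference` is an illustrative name.

```r
theoretical.variance <- 0.625      # value reported above
variance <- 0.6634679              # value reported above

# Difference as a fraction of the theoretical variance
relative.difference <- (variance - theoretical.variance) / theoretical.variance

round(100 * relative.difference, 1)   # expressed in percent
```

The simulated variance is roughly 6% above the theoretical one for this particular run; a different run would give a different gap.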

Distribution

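The graph below overlays the histogram of the sampling distribution with its estimated density (green) and the normal density with mean \(\frac{1}{\lambda}\) and standard deviation \(\frac{1/\lambda}{\sqrt{n}}\) (blue). The dashed line marks the theoretical mean. The sampling distribution closely resembles the normal distribution.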

# Parameters of the normal distribution predicted by the CLT
mu <- 1/lambda
sigma.mean <- 1/lambda/sqrt(number.exponentials)  # named to avoid masking stats::sd

ggplot(sampling.dist, aes(x=mean)) + 
  geom_histogram(binwidth=.1, aes(y=..density..),  colour="black", fill="#99c5ff") + 
  # the parameter is `args`; a misspelled `arg` is not passed on to dnorm in current ggplot2
  stat_function(fun = dnorm, args = list(mean = mu, sd = sigma.mean), colour = "blue", size=1) +
  geom_density(colour="green", fill="green", alpha=0.3, size=1) +
  geom_vline(xintercept = mu, size=1, colour="#CC0000", linetype="dashed") + 
  labs(title="Comparing to a Normal Distribution") +
  labs(x="", y="")
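A normal Q-Q plot is another common way to check how closely the averages follow a normal distribution. The sketch below is not part of the original analysis. It regenerates the averages with a hypothetical seed (2024), standardizes them with the theoretical mean and standard error, and uses base R graphics. The names `n`, `sample.means` and `z` are illustrative.

```r
set.seed(2024)  # hypothetical seed, not taken from the report

lambda <- 0.2
n <- 40
sample.means <- rowMeans(matrix(rexp(n * 1000, rate = lambda), nrow = 1000))

# Standardize using the theoretical mean and standard error
z <- (sample.means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# If the averages were exactly normal, the points would fall on the line y = x
qqnorm(z, main = "Normal Q-Q Plot of Standardized Averages")
abline(0, 1, col = "red", lty = 2)
```

Points close to the dashed line support the normal approximation. Because the exponential distribution is right-skewed, some departure in the upper tail may still be visible with only 40 draws per average.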