In this report we investigate the exponential distribution and compare the distribution of its sample means with what the Central Limit Theorem (CLT) predicts.
The CLT states that the distribution of the sum (or average) of a large number of independent, identically distributed variables will be approximately normal, whatever the shape of the underlying distribution, provided it has finite variance. The exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.
The mean of the exponential distribution is \(\mu = \frac{1}{\lambda}\) and its standard deviation is also \(\sigma = \frac{1}{\lambda}\).
With \(\lambda = 0.2\), we will investigate the distribution of the averages of 40 exponentials over a thousand simulations.
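As a quick illustration of the two formulas above, the following sketch draws a large exponential sample and compares its mean and standard deviation with \(\frac{1}{\lambda}\). It is not part of the original analysis: the seed, the sample size of 100000 and the `check.*` names are assumptions chosen only for this example.

```r
# Illustrative check only; seed, sample size and names are assumptions.
set.seed(1)
check.lambda <- 0.2
check.draws <- rexp(n = 100000, rate = check.lambda)

# Both the sample mean and the sample standard deviation should be near 1/lambda = 5
c(mean = mean(check.draws),
  sd = sd(check.draws),
  theoretical = 1 / check.lambda)
```

With a sample this large, both estimates should land within a few hundredths of 5, although the exact values depend on the seed.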
library(ggplot2)
lambda <- 0.2
number.exponentials <- 40
simulations.count <- 1000
# Run the simulations and store them in a matrix:
# one row per simulation, number.exponentials draws per row
simulation.data <- matrix(data = rexp(n = number.exponentials * simulations.count, rate = lambda),
nrow = simulations.count)
# Compute the sampling distribution: the mean of each row (one mean per simulation)
sampling.dist <- data.frame(mean = apply(simulation.data, 1, mean))
sample.mean <- mean(sampling.dist[,1])
Given that the theoretical mean is \(\mu = \frac{1}{\lambda}\), which is:
1/lambda
## [1] 5
and the calculated sample mean is:
sample.mean #output the sample mean
## [1] 4.975964
We can conclude that the sample mean is indeed very close to the theoretical mean (4.976 versus 5). Below is a plot of the sampling distribution, with its mean marked by a dashed line.
ggplot(sampling.dist, aes(x=mean)) +
geom_histogram(binwidth=.1, colour="black", fill="#99c5ff") +
geom_vline(xintercept=sample.mean, size = 1.3, color = '#cc0000', linetype="dashed")+
labs(title="Sampling Distribution with Mean") +
labs(x="", y="")
The variance is the square of the standard deviation: Var = \(\sigma^2\).
For the mean of \(n\) independent exponentials with rate \(\lambda\), the expected standard deviation (the standard error of the mean) is \(\sigma_{\bar{x}} = \frac{1/\lambda}{\sqrt{n}}\).
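The \(\sqrt{n}\) factor follows from the independence of the draws. A short derivation, writing \(\bar{X}\) for the mean of \(n\) independent exponentials \(X_1, \dots, X_n\) (notation introduced here for illustration), and then substituting \(\lambda = 0.2\) and \(n = 40\):

```latex
\[
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{1}{\lambda^2 n}
\]
\[
\operatorname{Var}(\bar{X}) = \frac{1}{0.2^2 \times 40} = \frac{25}{40} = 0.625,
\qquad
\sigma_{\bar{X}} = \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79
\]
```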
The theoretical variance for the sampling distribution is:
# Calculate the theoretical standard deviation
theoretical.standard.deviation <- 1/lambda/sqrt(number.exponentials)
# Calculate the theoretical variance
theoretical.variance <- theoretical.standard.deviation ^2
theoretical.variance
## [1] 0.625
The calculated variance of the sampling distribution is:
#Calculate the standard deviation
sampling.dist.standard.deviation <- sd(sampling.dist$mean)
#Calculate the variance
variance <- sampling.dist.standard.deviation ^2
variance
## [1] 0.6634679
The difference between the theoretical variance and the calculated variance is:
theoretical.variance - variance
## [1] -0.03846793
As we can see, the two variances are close: they differ by about 0.038, which is roughly 6% of the theoretical value.
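To put the gap on a relative scale, the sketch below recomputes it as a fraction of the theoretical variance. It copies the two variance values printed above instead of rerunning the simulation, and the `*.check` names are hypothetical ones introduced for this example.

```r
# Values copied from the printed output above; names are illustrative.
theoretical.variance.check <- 0.625
observed.variance.check <- 0.6634679

# Difference expressed as a fraction of the theoretical variance
relative.difference <- (observed.variance.check - theoretical.variance.check) /
  theoretical.variance.check
round(100 * relative.difference, 1)  # percentage difference
```

A different run of the simulation would give a different gap, since the observed variance is itself an estimate from 1000 sample means.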
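The graph below overlays the sampling distribution (histogram and green density estimate) with the theoretical normal density (blue curve). We can see that the sampling distribution resembles a normal distribution.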
mu <- 1/lambda
sd <- 1/lambda/sqrt(number.exponentials)
ggplot(sampling.dist, aes(x=mean)) +
geom_histogram(binwidth=.1, aes(y=..density..), colour="black", fill="#99c5ff") +
stat_function(fun = dnorm, args = list(mean = mu, sd = sd), colour = "blue", size=1) +
geom_density(colour="green", fill="green", alpha=0.3, size=1) +
geom_vline(xintercept = mu, size=1, colour="#CC0000", linetype="dashed") +
labs(title="Comparing to a Normal Distribution") +
labs(x="", y="")
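The visual comparison can be complemented with a quantile check. The sketch below is one possible approach, not part of the original analysis: it reruns the simulation with an assumed seed, standardises the sample means using the theoretical mean and standard error, and compares a few quantiles with those of the standard normal. The `qq.*` names and the chosen probabilities are assumptions made for this example.

```r
# Illustrative normality check; seed, names and probabilities are assumptions.
set.seed(1)
qq.lambda <- 0.2
qq.n <- 40
qq.sims <- 1000

# One sample mean per simulation (row)
qq.means <- rowMeans(matrix(rexp(qq.n * qq.sims, rate = qq.lambda), nrow = qq.sims))

# Standardise with the theoretical mean and standard error
qq.z <- (qq.means - 1 / qq.lambda) / (1 / qq.lambda / sqrt(qq.n))

# Normal Q-Q plot: points near the reference line suggest approximate normality
qqnorm(qq.z)
qqline(qq.z)

# Side-by-side comparison of selected quantiles
qq.probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
round(rbind(simulated = quantile(qq.z, qq.probs), normal = qnorm(qq.probs)), 2)
```

Some departure in the upper tail is to be expected: the mean of 40 exponentials follows a scaled gamma distribution with skewness \(\frac{2}{\sqrt{40}} \approx 0.32\), so the normal approximation is good but not exact at this sample size.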