In this project, we investigate the exponential distribution and compare the distribution of its sample means to the normal distribution predicted by the Central Limit Theorem. The theorem states that, for independent draws from any distribution with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size grows. We compare the observed mean and variance of the sample means to their theoretical values to show that means of exponential samples can, in fact, be modeled by a normal distribution.
First, the exponential data are generated. An exponential distribution is a probability distribution that describes the time between events in a Poisson process. It is defined by a single parameter, the rate lambda, which determines both the mean and the standard deviation. For this investigation, it is given that the mean and standard deviation (sd) of the exponential distribution are both 5. Thus, lambda is 0.2, since mean = sd = 1/lambda = 5. We carry out 1000 simulations, each with 40 draws from an exponential random variable with mean and standard deviation equal to 5, and take the mean of each simulation.
sims<-1000 #number of simulations of our random exp variable
n<-40 #number of draws per simulation
lambda<-0.2 #lambda = 1/mean = 1/sd
set.seed(557) #to create reproducibility
expdata<-matrix(rexp(sims*n,lambda),sims,n)
expmeans<-rowMeans(expdata)
Now we plot the means of the samples using a histogram.
hist(expmeans,
main = "Sample Mean Frequencies",
xlab = "Sample Mean", ylab = "Number of Samples",
col = "green")
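The histogram can be contrasted with the raw draws, which remain strongly right-skewed. The following is a minimal sketch, not part of the original analysis: it regenerates the data with the same seed and compares a moment-based skewness for the 40,000 individual draws against the 1000 sample means. The helper `skewness` is a hypothetical name introduced here for illustration; the theoretical reference values (2 for an exponential, 2/sqrt(n) for the mean of n draws) are standard results.

```r
# Regenerate the simulated data exactly as in the code above
sims <- 1000
n <- 40
lambda <- 0.2
set.seed(557)
expdata <- matrix(rexp(sims * n, lambda), sims, n)
expmeans <- rowMeans(expdata)

# Hypothetical helper (not from the original analysis):
# moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

rawSkew <- skewness(as.vector(expdata)) # theory for an exponential: 2
meanSkew <- skewness(expmeans)          # theory for means of 40 draws: 2/sqrt(40)
c(raw = rawSkew, means = meanSkew)
```

If the means are close to normal, their skewness should be far smaller than that of the raw draws, though not exactly zero at n = 40.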
It was given that the theoretical mean of the exponential distribution is 5.
theoreticalMean<-5
Now we calculate the mean of the 1000 sample means and compare it to the theoretical mean.
sampleMean<-mean(expmeans)
sampleMean
## [1] 5.055293
The sample mean and the theoretical mean differ only by:
require(scales) #to use the percent() function
## Loading required package: scales
percent(abs(theoreticalMean-sampleMean)/theoreticalMean)
## [1] "1.11%"
The theoretical variance of the sample mean is the square of the population standard deviation divided by the sample size:
theoreticalVar<-(1/lambda)^2/n
theoreticalVar
## [1] 0.625
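The value 0.625 follows from the independence of the 40 draws within each simulation. As a short worked derivation, writing sigma = 1/lambda for the population standard deviation (the symbols below are notation introduced here, not taken from the original report):

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^{2}}{n}
  = \frac{(1/\lambda)^{2}}{n}
  = \frac{5^{2}}{40}
  = 0.625
```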
The observed variance of the sample means is:
sampleVar<-var(expmeans)
sampleVar
## [1] 0.6188408
The observed variance and the theoretical variance differ only by:
percent(abs(theoreticalVar-sampleVar)/theoreticalVar)
## [1] "0.985%"
Finally, to compare the distribution of the means of the samples to the normal distribution, the following plot is created:
#density plot of the sample means
plot(density(expmeans),
main="Distribution of Sample Means vs. Normal Curve",
xlab="sample mean",ylab="density",
type="l", col="red", lwd=3)
#normal density with the mean and standard error predicted by the CLT
xnorm<-seq(min(expmeans),max(expmeans),length.out=50)
ynorm<-dnorm(xnorm,mean=1/lambda,sd=(1/lambda)/sqrt(n))
lines(xnorm,ynorm, col="blue", lwd=3)
legend("topright", lwd=3, lty=1, col=c("red","blue"),
legend=c("sample distribution","normal distribution"))
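A numeric complement to the visual comparison is to standardize the sample means and count how many fall within the central 95% range of a standard normal. This is a minimal sketch under the same simulation settings, not part of the original analysis; the names `z` and `coverage` are introduced here for illustration, and the exact result depends on the seed and R version.

```r
# Regenerate the sample means exactly as in the code above
sims <- 1000
n <- 40
lambda <- 0.2
set.seed(557)
expmeans <- rowMeans(matrix(rexp(sims * n, lambda), sims, n))

# Standardize using the theoretical mean and standard error
z <- (expmeans - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardized means inside the central 95% normal range
coverage <- mean(abs(z) < qnorm(0.975))
coverage
```

A share close to 0.95 is what a normal approximation would predict; a large departure would suggest that 40 draws per sample is not yet enough.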
The analysis shows that the means of exponential samples of size 40 behave approximately as a normal random variable, as the Central Limit Theorem predicts. The mean of the sample means differed from the theoretical mean by just over 1%, and the variance of the sample means differed from the theoretical variance by just under 1%. Finally, the density of the sample means closely followed the corresponding normal density. Therefore, the means of sufficiently large random samples of an exponential variable (here, of size 40) can indeed be modeled by a normal distribution; this is an example of the Central Limit Theorem in action.