In this project I investigate the exponential distribution and compare it with the Central Limit Theorem (CLT). Specifically, I study the distribution of averages of 40 exponentials and illustrate it via simulation and associated explanatory figures.
The exponential distribution describes the time between events that occur at a constant rate λ. Its expected value is 1/λ, and its standard deviation is also 1/λ.
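As a quick check that both the mean and the standard deviation equal 1/λ, here is a minimal sketch that integrates the exponential density numerically with base R's `integrate()` and `dexp()`. It assumes λ = 0.2, the rate used throughout this project; the names `expMean` and `expVar` are illustrative only.

```r
# Rate parameter used throughout this project (assumed value)
lambda <- 0.2

# E[X] = integral of x * f(x) over [0, Inf), where f is the exponential density
expMean <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

# Var[X] = integral of (x - E[X])^2 * f(x) over [0, Inf)
expVar <- integrate(function(x) (x - expMean)^2 * dexp(x, rate = lambda), 0, Inf)$value

# Both the mean and the standard deviation should be close to 1/lambda = 5
print(c(mean = expMean, sd = sqrt(expVar), theory = 1 / lambda))
```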
The exponential distribution is used to model the time between the occurrence of events in an interval of time, or the distance between events in space. The exponential distribution may be useful to model events such as
If the below conditions are true, then X is an exponential random variable, and the distribution of X is an exponential distribution.
First, I generate histograms of random exponential samples of size 20, 100, 500 and 1000, arranged in a single 2×2 figure, to get a sense of the four sample distributions. Their shapes closely resemble the exponential PDF curve shown on Wikipedia. In each panel, the sample mean is drawn as a red vertical line and the sample standard deviation as a blue one. For sizes 20 and 100, the sample mean is close to the theoretical mean of 5 (1/0.2 = 5); for sizes 500 and 1000, it is extremely close. The sample standard deviation likewise approaches the theoretical standard deviation (also 1/0.2 = 5) as n increases.
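lambda <- 0.2          # rate parameter
set.seed(5555)         # make the simulation reproducible
par(mfrow = c(2, 2))   # 2 x 2 grid of histograms

# Draw n exponential values, plot their histogram, and mark the sample
# mean (red) and the sample standard deviation (blue) with vertical lines
z.plot <- function(n) {
  r <- rexp(n, lambda)
  hist(r, main = paste("Exponential distribution with size", n), xlab = "x")
  abline(v = mean(r), col = "red")
  abline(v = sd(r), col = "blue")
  return(r)
}

r20   <- z.plot(20)
r100  <- z.plot(100)
r500  <- z.plot(500)
r1000 <- z.plot(1000)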
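The trend described above can also be summarised numerically instead of being read off the histograms. Below is a minimal sketch, assuming the same λ = 0.2 and seed 5555. Because it resets the seed and draws the four samples in the same order, it should reproduce the plotted samples, although that is not essential to the point. The `stats` table name is illustrative.

```r
lambda <- 0.2
set.seed(5555)

sizes <- c(20, 100, 500, 1000)

# For each sample size, draw an exponential sample and record its mean and SD
stats <- t(sapply(sizes, function(n) {
  x <- rexp(n, lambda)
  c(n = n, mean = mean(x), sd = sd(x))
}))

# Absolute distance of each statistic from its theoretical value 1/lambda = 5
stats <- cbind(stats,
               meanErr = abs(stats[, "mean"] - 1 / lambda),
               sdErr   = abs(stats[, "sd"] - 1 / lambda))
print(round(stats, 3))
```

The error columns should tend to shrink as n grows, though a single random sample need not decrease monotonically.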
According to the Central Limit Theorem, for large enough n the distribution of the sample average Sₙ is approximately normal with mean µ and variance σ²/n, where µ and σ² are the mean and variance of the underlying distribution. Here n = 40. The code simulates 1000 sample averages, each of 40 exponentials, and compares the sample statistics with the theoretical ones.
## generate simulated data: 1000 samples (rows) of 40 exponentials each
r <- matrix(rexp(40 * 1000, lambda), nrow = 1000, ncol = 40)
## calculate the mean of each sample of 40
mns <- rowMeans(r)
par(mfrow = c(1, 1))  # back to a single plot
hist(mns, main = "Simulated Averages", prob = TRUE, ylab = "Density", xlab = "Simulated Averages")
## calculate the simulated mean and variance
simulatedMean <- mean(mns)
simulatedVar  <- var(mns)
## calculate the theoretical mean (1/lambda) and variance ((1/lambda)^2 / n)
theoreticalMean <- 1 / lambda
theoreticalVar  <- (1 / lambda)^2 / 40
## dashed line for the simulated mean, solid line for the theoretical mean
abline(v = simulatedMean, col = "blue", lty = 2)
abline(v = theoreticalMean, col = "purple", lty = 1)
legend("right", c("simulated", "theoretical"), lty = c(2, 1), col = c("blue", "purple"))
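These theoretical values follow from the theorem: for an exponential distribution with rate λ, both the mean and the standard deviation equal 1/λ. With λ = 0.2 and n = 40:

```latex
\begin{aligned}
\mu &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad \sigma^2 = \frac{1}{\lambda^2} = 25,\\
\operatorname{Var}(S_{40}) &= \frac{\sigma^2}{n} = \frac{25}{40} = 0.625, \qquad
\operatorname{SD}(S_{40}) = \sqrt{0.625} \approx 0.791.
\end{aligned}
```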
The simulated mean is 4.9707122 and the theoretical mean is 5. They are very close, which agrees with the CLT.
The simulated variance is 0.616453 and the theoretical variance is 0.625. These are also very close, which agrees with the CLT.
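The same comparison can be written more compactly with `replicate()` and expressed as relative errors. This is a minimal sketch, assuming the project's settings (λ = 0.2, 1000 simulated averages of 40 exponentials). The seed 5555 is reused only for reproducibility; the draws are consumed in a different order, so the values will generally differ from the ones reported above. The names `avgs` and `relErr` are illustrative.

```r
lambda <- 0.2
n <- 40       # exponentials per average
nsim <- 1000  # number of simulated averages
set.seed(5555)

# Each replicate is the mean of one fresh sample of n exponentials
avgs <- replicate(nsim, mean(rexp(n, lambda)))

theoryMean <- 1 / lambda
theoryVar  <- (1 / lambda)^2 / n

# Relative error of the simulated statistics against the CLT values
relErr <- c(mean = abs(mean(avgs) - theoryMean) / theoryMean,
            var  = abs(var(avgs) - theoryVar) / theoryVar)
print(round(relErr, 4))
```

The relative error in the variance is typically larger than that in the mean, because a sample variance from 1000 values is itself a noisier estimate.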
Next, I overlay a normal density curve on the histogram of simulated averages, using the theoretical mean (5) and the theoretical standard deviation (√0.625 ≈ 0.79). If the curve matches the histogram well, the distribution of the averages is approximately normal.
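## histogram of the simulated averages with the CLT normal density overlaid
hist(mns, main = "Simulated Averages", prob = TRUE, ylab = "Density", xlab = "Simulated Averages")
curve(dnorm(x, mean = theoreticalMean, sd = sqrt(theoreticalVar)), from = 0, to = 9, add = TRUE, col = "violet")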
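The overlay is a visual check. A numeric complement is to compare the share of simulated averages within one and two theoretical standard deviations of 5 against the normal probabilities from `pnorm()` (about 0.683 and 0.954). This minimal sketch assumes the project's settings and regenerates its own 1000 averages so that it runs on its own. The names `avgs`, `observed` and `expected` are illustrative.

```r
lambda <- 0.2
n <- 40
set.seed(5555)

# 1000 simulated averages of 40 exponentials (row means of a 1000 x 40 matrix)
avgs <- rowMeans(matrix(rexp(n * 1000, lambda), nrow = 1000))

mu <- 1 / lambda
s  <- sqrt((1 / lambda)^2 / n)  # theoretical SD of the average, about 0.791

# Observed share of averages within k SDs of the mean vs. the normal prediction
k <- c(1, 2)
observed <- sapply(k, function(kk) mean(abs(avgs - mu) <= kk * s))
expected <- pnorm(k) - pnorm(-k)
print(round(rbind(k = k, observed = observed, normal = expected), 3))
```

If the averages are approximately normal, the `observed` row should sit close to the `normal` row.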
Based on the graphs and the numbers above, the simulated data agree with the Central Limit Theorem.