Author: Farhad.M
Date: Sep. 23, 2015
This project is to investigate the exponential distribution in R and compare it with the Central Limit Theorem via simulation. The investigation includes simulation of collections of exponential distributions. The sample and theoritical mean and variance are compared to those of theoretical, and it is finally shown that the distribution is approximately normal.
The exponential distribution is simulated in R with rexp(n, lambda) where lambda is the rate parameter, where both the mean and standard deviation of the distribution are 1/lambda. Lambda is set to 0.2 for all of the simulations in which the distribution of averages of 40 exponentials with 1000 simulations are investigated. The R code for initiating simulation is as follows.
set.seed(12321)
lambda<-0.2
n_dist<-40
n_sim<-10000
means = NULL
vars= NULL
for (i in 1 : n_sim) means = c(means, mean(rexp(n_dist, lambda)))
for (i in 1 : n_sim) vars = c(vars, var(rexp(n_dist, lambda)))
To answer the first question the mean of 40 random exponential distribution for 1000 collections is plotted as follows (The code can be found in Appendix). Also, the sample and theoretical means are reported for the comparison.
## Theoretical Sample
## Mean 5 4.99
Conclusion: Referring to the theoretical and sample mean values, the two values very close.
To answer the second question, the mean of variances of 40 random exponential distribution for 1000 simulation runs is plotted as follows (The code can be found in Appendix).
## Theoretical Sample
## Variance 25 24.88
Conclusion: It is observable in the graph that the sample variance is very close to the theoritical variance.
To answer the third question for investigating whether the distibution looks approximately like normal distribution, two different simulations are run and corresponding graphs are plotted. The first graph shows the distrbution of 1000 random expnoentilas, and the second graph shows the everage of 1000 simulations of 40 exponentials as follws.
Conclution: As it can be seen in the last two diagrams, both graphs have similar sample-means around 5. However, the average of 40 exponentials for a large number of simulations (corresponding to the right-side plot) looks like normal distribution while the collection of random exponentials (left-side graph) looks different from a normal distribution.
This simulation confirms the Central Limit Theorem (CLT), as the distribution of averages of a distribution behaves like a normal distibution for the sufficiently large smaple sizes.
Simulation setup:
set.seed(12321)
lambda<-0.2
n_dist<-40
n_sim<-10000
means = NULL
vars= NULL
for (i in 1 : n_sim) means = c(means, mean(rexp(n_dist, lambda)))
for (i in 1 : n_sim) vars = c(vars, var(rexp(n_dist, lambda)))
Sample mean versus theoretical mean
sample_mean <- round(mean(means), 2)
theoretical_mean <- 1/lambda
hist(means, main="Distribution of 1000 averages of 40 exponential distributions", xlab= "Mean", ylab="Density", breaks= 50, col="cyan4")
abline(v= sample_mean, col="red",lwd= 5)
matrix(data=c(theoretical_mean, sample_mean), nrow=1, ncol=2, byrow=TRUE, dimnames=list(c("Mean"), c("Theoretical", "Sample")))
Sample variance versus theoretical vriance
sample_var <- round(mean(vars), 2)
theoretical_var <- (1/lambda)^2
hist(vars, main="Distribution of 1000 variances of 40 exponential distributions", xlab= "Mean of Variances", ylab="Density", breaks= 50, col="blue4")
abline(v= sample_var, col="red",lwd= 5)
matrix(data=c(theoretical_var, sample_var), nrow=1, ncol=2, byrow=TRUE, dimnames=list(c("Variance"), c("Theoretical", "Sample")))
Comparing average of 40 exponentials with a large collection of exponentials
set.seed(12321)
re<- rexp(n_sim, lambda)
m<- mean(re)
par(mfrow=c(1,2))
hist(re, main="Distribution of 1000 exponentials", xlab= "Random Exponentials", ylab="Density", breaks= 1000, col="Green3")
abline(v= m, col="red",lwd= 5)
hg<- hist(means,breaks= 50,col="grey",main="", xlab= "Distribution of 1000 collections of avergaes of 40 exponentials", ylab="Density")
x_fit<-seq(min(means),max(means),length= 50)
y_fit<-dnorm(x_fit,mean=mean(means), sd=sd(means))
y_fit<-y_fit*diff(hg$mids[1:2])*length(means)
abline(v=mean(means),col="red",lwd=2)
lines(x_fit, y_fit,col="blue",lwd=4)