Simulation of the Exponential Distribution in R and Comparison with the Central Limit Theorem.

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean and the standard deviation of the exponential distribution are both 1/lambda. Lambda is set to 0.2 for all of the simulations. We investigate the distribution of averages of 40 exponentials over 1,000 simulations.
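As a quick sanity check of the two facts above, the following sketch draws a large sample with rexp and compares its mean and standard deviation with 1/lambda. The sample size of 100,000, the seed, and the variable names are illustrative choices made here, not part of the original analysis.

```r
# Illustrative check: the mean and sd of an exponential sample should both be near 1/lambda.
set.seed(1)                 # arbitrary seed, used only so the sketch is reproducible
lambda <- 0.2
x <- rexp(100000, lambda)   # 100,000 draws is an assumption made for this sketch
check <- c(theoretical = 1 / lambda, sample_mean = mean(x), sample_sd = sd(x))
print(round(check, 2))
```

Both sample statistics should land close to 5, with small deviations that shrink as the number of draws grows.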

Comparing the Sample Mean with the Theoretical Mean of the distribution.

Running 1,000 simulations of 40 exponentials and calculating their means and variances

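library(ggplot2)  # needed for ggplot(), geom_histogram(), geom_vline(), labs()
# DRGcolorstran is a vector of fill colours that is not defined in this section; it must exist before plotting
set.seed(2017)
simulations1 <- 1000
n <- 40
lambda <- 0.2
# One simulation per row: 1,000 rows of 40 exponential draws
sims <- matrix(rexp(n * simulations1, lambda), simulations1)
mns <- apply(sims, 1, mean)  # mean of each simulated sample
vr <- apply(sims, 1, var)    # variance of each simulated sample
dat <- data.frame(Means = mns, Variance = vr)
# Histogram of the sample means; the red line marks their average
g4 <- ggplot(dat, aes(x = Means)) +
  geom_histogram(binwidth = .3, fill = DRGcolorstran[2], colour = "black") +
  geom_vline(xintercept = mean(dat$Means), size = 1, color = "red")
g4 + labs(title = "Figure 1. Histogram of Simulated Means", x = "Simulation Means", y = "Frequency")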

SampleMean<-round(mean(dat$Means),2)
TheoriticalMean<-1/lambda

Sample Mean: 4.98. Theoretical Mean: 5. The sample mean is very close to the theoretical mean.
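To put the gap between 4.98 and 5 in context, one could compare it with the standard error of the grand mean. With 1,000 x 40 = 40,000 independent draws, that standard error is (1/lambda)/sqrt(40000) = 0.025, so the reported gap is within roughly one standard error. The sketch below repeats the simulation settings used above so that it runs on its own; se_grand and z are names introduced here for illustration.

```r
# Sketch: how many standard errors separate the grand mean from 1/lambda?
set.seed(2017)
simulations1 <- 1000
n <- 40
lambda <- 0.2
sims <- matrix(rexp(n * simulations1, lambda), simulations1)
mns <- rowMeans(sims)                               # one mean per simulated sample
se_grand <- (1 / lambda) / sqrt(n * simulations1)   # sd of one draw over sqrt(total draws)
z <- (mean(mns) - 1 / lambda) / se_grand            # gap in units of standard errors
print(c(grand_mean = mean(mns), se = se_grand, z = z))
```

A value of z well inside the range -2 to 2 is what one would expect if the simulation agrees with theory.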

Comparing the Sample Variance with the Theoretical Variance of the distribution.

g3 <- ggplot(dat, aes(x = Variance)) +
  geom_histogram(binwidth = 2, fill = DRGcolorstran[1], colour = "black") +
  geom_vline(xintercept = mean(dat$Variance), size = 1, color = "red")
g3 + labs(title = "Figure 2. Histogram of Simulated Variances", x = "Simulation Variances", y = "Frequency")

SampleVariance<-round(mean(dat$Variance),2)
TheoriticalVariance<-(1/lambda)^2

Sample Variance: 24.95. Theoretical Variance: 25. The average of the sample variances is very close to the theoretical variance.
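A related check, which is not part of the original write-up, looks at the variance of the 1,000 sample means themselves. For averages of n = 40 independent exponentials, theory gives (1/lambda)^2 / n = 25/40 = 0.625. The sketch below repeats the simulation settings so that it runs on its own; the names var_of_means and theoretical_var_of_means are introduced here.

```r
# Sketch: variance of the sample means versus (1/lambda)^2 / n.
set.seed(2017)
simulations1 <- 1000
n <- 40
lambda <- 0.2
sims <- matrix(rexp(n * simulations1, lambda), simulations1)
mns <- rowMeans(sims)                            # one mean per simulated sample
var_of_means <- var(mns)                         # spread of the 1,000 means
theoretical_var_of_means <- (1 / lambda)^2 / n   # = 0.625
print(round(c(simulated = var_of_means, theoretical = theoretical_var_of_means), 3))
```

This distinguishes two quantities that are easy to confuse: the variance within each sample of 40 (about 25) and the much smaller variance between the sample means (about 0.625).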

Showing that the distribution of 1,000 averages of 40 simulated exponentials is approximately normal.

The red line marks the sample mean, the black curve is the normal density with mean 1/lambda and standard deviation 1/(lambda*sqrt(n)), and the blue curve is the density estimate of the simulated values.

library(grid)       # textGrob(), gpar()
library(gridExtra)  # grid.arrange()
set.seed(2017)
simulations2 <- 40000
dat2 <- data.frame(random = rexp(simulations2, lambda))
# Left panel: density histogram of the 1,000 sample means
g1 <- ggplot(dat, aes(x = Means)) +
  geom_histogram(aes(y = ..density..), binwidth = .2, fill = DRGcolorstran[2], colour = "black") +
  geom_density(size = 1, color = "blue") +
  geom_vline(xintercept = mean(dat$Means), size = 1, color = "red")
# Black curve: normal density with mean 1/lambda and sd 1/(lambda*sqrt(n))
g1 <- g1 + stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = 1/(lambda*sqrt(n))), size = 1, color = "black") +
  labs(x = "Simulation Means", y = "Density")
# Right panel: density histogram of 40,000 individual exponentials with the same reference curve
g2 <- ggplot(dat2, aes(x = random)) +
  geom_histogram(aes(y = ..density..), binwidth = .6, fill = DRGcolorstran[2], colour = "black") +
  geom_density(size = 1, color = "blue") +
  geom_vline(xintercept = mean(dat2$random), size = 1, color = "red")
g2 <- g2 + stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = 1/(lambda*sqrt(n))), size = 1, color = "black") +
  labs(x = "Simulation Exponentials", y = "")
title1 <- textGrob("Figure 3. Difference between the distribution \n of 40,000 random exponentials (right) \n and the distribution of 1,000 averages of 40 exponentials (left)", gp = gpar(fontface = "bold", fontsize = 10))
grid.arrange(g1, g2, ncol = 2, top = title1)

We see that the distribution of the means of our samples of 40 exponentials appears to follow a normal distribution, as the Central Limit Theorem predicts, whereas the distribution of the individual simulated exponentials does not.
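The visual impression can be backed by a simple number. A normal distribution has skewness 0, an exponential distribution has skewness 2, and the mean of n independent exponentials has skewness 2/sqrt(n), about 0.32 for n = 40. The sketch below is one possible check under the same simulation settings; the helper skewness() is defined here for illustration and is not part of the original report or of base R.

```r
# Sketch: sample skewness of the averages versus the raw exponentials.
# skewness() is a hypothetical helper: third central moment over sd cubed.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(2017)
n <- 40
lambda <- 0.2
mns <- rowMeans(matrix(rexp(n * 1000, lambda), 1000))  # 1,000 averages of 40 exponentials
raw <- rexp(40000, lambda)                             # 40,000 individual exponentials
print(round(c(means = skewness(mns), raw = skewness(raw)), 2))
```

The averages should show a skewness much closer to 0 than the raw draws, which is consistent with the histograms in Figure 3.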