Investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials.
set.seed(1)
library(ggplot2)
sim.n<-1000
lambda<-0.2
n<-40
# Each row holds one simulation of n exponentials
sim.matrix<-matrix(rexp(sim.n*n,rate = lambda),nrow = sim.n,ncol = n)
sim.mean<-apply(sim.matrix,1,mean)
hist(sim.mean,col="lightblue")
sample.mn<-mean(sim.mean)
sample.sd<-sd(sim.mean)
theory.mn<-1/lambda
theory.sd<-(1/lambda)/sqrt(n) # standard deviation of the mean of n exponentials, comparable to sample.sd
sample.mn;theory.mn
## [1] 4.990025
## [1] 5
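The center of the simulated distribution is 4.990025, whereas the theoretical mean is 1/lambda = 5 for lambda = 0.2. The two values agree closely, which is expected: the sample mean is an unbiased estimator of the population mean, and averaging 1000 simulated means leaves little simulation noise.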
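One way to put a number on "close" is to compare the gap between the simulated and theoretical means with the Monte Carlo standard error of the simulated mean. The following is a minimal sketch that re-runs the simulation with the same seed and settings so it can be executed on its own; the names `se.mn` and `gap` are introduced here for illustration and are not part of the original analysis.

```r
# Re-create the simulation so this block runs on its own
set.seed(1)
lambda <- 0.2
n <- 40
sim.n <- 1000
sim.mean <- rowMeans(matrix(rexp(sim.n * n, rate = lambda), nrow = sim.n, ncol = n))

theory.mn <- 1 / lambda                        # theoretical mean, 5

# Standard error of the average of sim.n simulated means (illustrative name)
se.mn <- (1 / lambda) / sqrt(n) / sqrt(sim.n)

# Absolute gap between simulated and theoretical mean (illustrative name)
gap <- abs(mean(sim.mean) - theory.mn)

# Report the gap in units of the standard error
cat("gap:", gap, " standard error:", se.mn, " ratio:", gap / se.mn, "\n")
```

A ratio well below 2 would suggest that the gap is no larger than what simulation noise alone produces.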
sample.var<-var(sim.mean)
theory.var<-(1/lambda)^2/n
sample.var;theory.var
## [1] 0.6177072
## [1] 0.625
The variance of the simulated means is 0.6177072, while the theoretical variance of the mean of 40 exponentials is (1/lambda)^2/n = 0.625. The two values are close.
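For reference, the theoretical values follow from the 40 draws being independent and identically distributed. A short derivation, writing $\bar{X}$ for the mean of $n = 40$ exponentials with rate $\lambda = 0.2$ (the symbols $\bar{X}$ and $X_i$ are notation introduced here):

```latex
\begin{align*}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{\lambda} = 5, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{(1/\lambda)^{2}}{n} = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}) &= \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791.
\end{align*}
```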
sim.mean<-as.data.frame(sim.mean)
sim.mean.n<-as.numeric(unlist(sim.mean))
ggplot(sim.mean,aes(x=sim.mean))+
  geom_histogram(aes(y=..density..),fill="lightgreen",col="black",alpha=0.5,position = "dodge",bins = 20)+
  geom_density(aes(y=..density..),col='red',alpha=0.5)
The distribution of the simulated means looks approximately normal: it is roughly symmetric and bell-shaped, centered near 4.99 with a variance of about 0.62.
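The visual impression can also be checked against the normal distribution that the CLT predicts, N(1/lambda, (1/lambda)^2/n). The sketch below is one possible way to do that, not the method used in the analysis: it uses base graphics instead of ggplot2 to avoid extra dependencies, re-runs the simulation so it is self-contained, and introduces the illustrative names `means`, `mu`, `sigma` and `coverage`.

```r
# Re-create the simulated means so this block runs on its own
set.seed(1)
lambda <- 0.2
n <- 40
sim.n <- 1000
means <- rowMeans(matrix(rexp(sim.n * n, rate = lambda), nrow = sim.n, ncol = n))

mu <- 1 / lambda                    # mean predicted by theory
sigma <- (1 / lambda) / sqrt(n)     # standard deviation predicted by theory

# Density-scaled histogram with the N(mu, sigma^2) curve drawn on top
hist(means, breaks = 20, freq = FALSE, col = "lightgreen",
     main = "Means of 40 exponentials", xlab = "mean")
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)

# Share of simulated means within 1.96 theoretical standard deviations of mu;
# a normal distribution puts about 95% of its mass in that interval
coverage <- mean(abs(means - mu) <= 1.96 * sigma)
coverage
```

If the normal approximation is reasonable, `coverage` should land near 0.95.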
qqnorm(sim.mean$sim.mean);qqline(sim.mean$sim.mean)
Based on the Q-Q plot, the simulated means appear approximately normally distributed, since the data points lie close to the qqline.
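The same comparison the Q-Q plot makes visually can be tabulated for a few quantiles. This is a minimal sketch under the same simulation settings, re-run here so the block is self-contained; the names `means`, `probs`, `sample.q`, `normal.q` and `q.table` are illustrative and do not appear in the original analysis.

```r
# Re-create the simulated means so this block runs on its own
set.seed(1)
lambda <- 0.2
n <- 40
sim.n <- 1000
means <- rowMeans(matrix(rexp(sim.n * n, rate = lambda), nrow = sim.n, ncol = n))

# Probabilities at which to compare the two distributions
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)

# Empirical quantiles of the simulated means
sample.q <- quantile(means, probs = probs)

# Quantiles of a normal distribution with the same mean and standard deviation
normal.q <- qnorm(probs, mean = mean(means), sd = sd(means))

# Side-by-side table of sample and normal quantiles
q.table <- data.frame(prob = probs, sample = as.numeric(sample.q), normal = normal.q)
print(q.table)
```

Small differences in the tails are to be expected, because the mean of 40 exponentials is still slightly right-skewed.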