Synopsis - This report simulates the exponential distribution and compares the results with the Central Limit Theorem (CLT). The simulation draws 40 exponential random variables, repeats this 1000 times, and calculates the mean of each set of 40. The report then analyzes the distribution of the 1000 means thus obtained.
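# Preallocate a numeric vector to hold the 1000 sample means
ExpMean <- numeric(1000)
# Each iteration draws 40 exponential variables (rate 0.2) and stores their mean
for (i in 1:1000) { ExpMean[i] <- mean(rexp(40, 0.2)) }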
The code above generates a vector ExpMean of length 1000, where each element is the mean of 40 random exponential variables with lambda = 0.2.
Below is what the distribution of these means looks like:
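The same simulation can be written without an explicit loop. The following is a minimal sketch, not the code used for the results in this report: the seed value and the names ExpMeanAlt, n_sims, n_obs and rate are assumptions introduced here for illustration, and a different seed will not reproduce the figures quoted in this report.

```r
# Sketch: loop-free version of the simulation (illustrative names, arbitrary seed)
set.seed(2024)     # assumption: any fixed seed, used only to make the run repeatable
n_sims <- 1000     # number of simulated samples
n_obs  <- 40       # exponential draws per sample
rate   <- 0.2      # lambda of the exponential distribution

# replicate() evaluates the expression n_sims times and collects the results
ExpMeanAlt <- replicate(n_sims, mean(rexp(n_obs, rate)))

length(ExpMeanAlt) # 1000
```

Using replicate() avoids growing or indexing a vector by hand and keeps the simulation parameters in one place.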
library(ggplot2)
# Collect the sample means in a data frame; the label is recycled across all rows
dist <- data.frame(Exp = ExpMean, label = "Mean Exp")
# Histogram of the sample means with the sample mean (dashed green) and theoretical mean (solid orange)
ggplot(dist, aes(x=Exp)) +
  geom_histogram(fill="red", colour="black", alpha=0.5, binwidth=0.5) +
  geom_vline(xintercept=mean(ExpMean), lwd=0.5, linetype="dashed", colour="darkgreen") +
  geom_vline(xintercept=5, lwd=0.5, linetype="solid", colour="orange")
The mean of this sampling distribution, shown by the dashed green line, is 4.96.
The theoretical mean (\(1/\lambda\)), shown by the solid orange line, is 5.
Inference - The sample mean and the theoretical mean are in agreement.
The theoretical variance (\(\sigma^2\)) of the exponential distribution is \((1/\lambda)^2\), i.e. 25.
Assuming the sample consists of iid variables, the variance of the sample mean should be \(\sigma^2/n\) = 25/40 = 0.625, or about 0.62.
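The \(\sigma^2/n\) value follows from the iid assumption. A short derivation, writing \(\bar{X}\) for the mean of \(n\) iid draws \(X_1, \dots, X_n\) (notation introduced here for illustration):

\[
\operatorname{Var}(\bar{X})
= \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{n\,\sigma^2}{n^2}
= \frac{\sigma^2}{n}
\]

The second equality uses independence (the variance of a sum of independent variables is the sum of their variances). With \(\sigma^2 = 25\) and \(n = 40\) this gives 0.625.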
The variance of the distribution of sample means, given by round(var(ExpMean),2), is 0.61.
Inference - The variance returned by the var function (0.61) and the variance predicted by \(\sigma^2/n\) (about 0.62) are in line with each other, so the sample means are spread around 5 with a variance of roughly 0.62. Agreement of the variances alone does not establish that the distribution is normal, so its shape is examined next.
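The comparison of theoretical and simulated values can be collected in one small table. This is a hedged sketch under stated assumptions: it reruns the simulation with an arbitrary seed, and the names sim_means, theo_mean, theo_var and comparison are introduced here for illustration, so the simulated column will differ slightly from the 4.96 and 0.61 reported above.

```r
# Sketch: theoretical vs. simulated mean and variance of the sample means
set.seed(1)                       # assumption: arbitrary seed for repeatability
lambda <- 0.2                     # rate of the exponential distribution
n      <- 40                      # observations per sample

sim_means <- replicate(1000, mean(rexp(n, lambda)))

theo_mean <- 1 / lambda           # theoretical mean of the sample mean: 5
theo_var  <- (1 / lambda)^2 / n   # theoretical variance of the sample mean: 0.625

comparison <- data.frame(
  quantity    = c("mean", "variance"),
  theoretical = c(theo_mean, theo_var),
  simulated   = c(mean(sim_means), var(sim_means))
)
print(comparison)
```

Both rows should show simulated values close to the theoretical ones, with small differences due to sampling noise.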
To evaluate this, let us compare the distribution of sample means with the distribution of 1000 individual random exponential variables and plot their densities together to see how their shapes compare. To do this, create another data frame with 1000 random exponential variables with lambda 0.2 and label it 'Random Exp'.
# 1000 individual exponential draws (rate 0.2), as described in the text
RandDist <- data.frame(Exp = rexp(1000, 0.2), label = "Random Exp")
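# Row-bind the two data frames so both distributions can be plotted together
dist <- rbind(dist, RandDist)
# Overlaid densities; dashed lines mark the mean of the sample means (red)
# and the mean of the individual draws (green)
ggplot(dist, aes(x=Exp, fill=label)) +
  geom_density(alpha=0.2) +
  geom_vline(xintercept=c(mean(ExpMean), mean(RandDist$Exp)), colour=c("red", "green"), linetype="dashed", lwd=1)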
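This is a clear application of the CLT, according to which the distribution of the mean of iid samples is approximately normal when the sample size is large enough. The individual exponential variables are strongly right-skewed, while the means of 40 such variables are concentrated in a roughly bell-shaped curve around 5.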
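A numerical check can complement the visual comparison. The sketch below is an illustration rather than part of the original analysis: it standardizes freshly simulated sample means and compares them with the standard normal distribution. The seed and the names clt_means, z, coverage and probs are assumptions introduced here.

```r
# Sketch: how close are the standardized sample means to a standard normal?
set.seed(42)                      # assumption: arbitrary seed for repeatability
lambda <- 0.2
n      <- 40

clt_means <- replicate(1000, mean(rexp(n, lambda)))

# Standardize using the theoretical mean (1/lambda) and standard error ((1/lambda)/sqrt(n))
z <- (clt_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardized means within the central 95% normal interval;
# a value near 0.95 is what the normal approximation predicts
coverage <- mean(abs(z) < qnorm(0.975))
coverage

# Selected quantiles of z next to the corresponding standard normal quantiles
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
round(rbind(simulated = quantile(z, probs), normal = qnorm(probs)), 2)
```

With n = 40 the simulated quantiles are expected to track the normal ones closely, with a slight right skew inherited from the exponential distribution.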