Project Description

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
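As a quick sanity check (an illustrative sketch, not part of the required simulation), a single large draw from rexp() should have a mean and standard deviation close to 1/lambda = 5:

set.seed(1)                   ## arbitrary seed, only for reproducibility of this sketch
x <- rexp(10000, rate = 0.2)  ## 10,000 exponentials with lambda = 0.2
mean(x) ## close to the theoretical mean 1/lambda = 5
sd(x)   ## close to the theoretical standard deviation 1/lambda = 5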

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should

  1. Show the sample mean and compare it to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show that the distribution is approximately normal.

Project Simulation

Sample Mean vs Theoretical Mean

We first analyze the sample mean and compare it to the theoretical mean.

library(ggplot2)
set.seed(25082502)
lambda <- 0.2 ## set lambda as per the instructions
nexp <- 40    ## number of exponentials per sample
nsim <- 1000  ## number of simulations
mns <- NULL   ## vector of sample means
for (i in 1:nsim) mns <- c(mns, mean(rexp(nexp, lambda)))
hist(mns, col = "blue", main = "Distribution of Means of rexp", xlab = "Sample mean")
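For visual reference, the theoretical mean could be marked on the histogram; a minimal sketch, to be run immediately after the hist() call above:

abline(v = 1/lambda, col = "red", lwd = 2) ## vertical line at the theoretical mean 1/lambda = 5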

The histogram is centered around the sample mean of 4.9900252, which is very close to the theoretical mean 1/lambda = 5; the distribution of sample means is therefore centered at the theoretical mean.
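The comparison can also be made explicitly in code; a minimal sketch using the mns vector computed above:

mean(mns)            ## sample mean of the 1000 simulated averages
1/lambda             ## theoretical mean
mean(mns) - 1/lambda ## difference between the two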

Sample Variance vs Theoretical Variance

varxp <- ((1 / lambda)^2) / nexp ## theoretical variance of the sample mean
varmean <- var(mns)              ## observed variance of the sample means

The theoretical variance of the sample mean is (1/lambda)^2 / nexp = 0.625, while the observed variance of the simulated means is 0.6111165; the two values are close.
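The same comparison can be made on the standard-deviation scale; a small sketch, assuming the objects computed above:

sqrt(varxp)   ## theoretical standard deviation of the sample mean, (1/lambda)/sqrt(nexp)
sqrt(varmean) ## observed standard deviation of the simulated means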

Comparing to a Normal Distribution

According to the above histogram, the distribution is "similar" to a normal distribution, which is what the Central Limit Theorem states: "In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution."

plotdata <- data.frame(mns)
plot1 <- ggplot(plotdata, aes(x = mns))
plot1 <- plot1 + geom_histogram(aes(y = ..density..), colour = "black", fill = "green")
plot1 <- plot1 + labs(title = "Distribution of Means of rexp", y = "Density")
plot1 <- plot1 + stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = sqrt(varxp)), color = "red", size = 1.0)
plot1 <- plot1 + stat_function(fun = dnorm, args = list(mean = mean(mns), sd = sqrt(varmean)), color = "black", size = 1.0)
print(plot1)

So the distribution of sample means compares well to a normal distribution (the black curve is the normal density with the observed mean and variance, and the red curve is the normal density with the theoretical mean and variance).
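As an additional, optional check of normality (not part of the original analysis), a normal Q-Q plot of the simulated means can be drawn; a minimal sketch:

qqnorm(mns, main = "Normal Q-Q Plot of Sample Means") ## points close to a straight line indicate approximate normality
qqline(mns, col = "red")                              ## reference line through the first and third quartiles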

What would happen with a larger number of simulations? Let's say 100,000:

set.seed(25082502)
lambda <- 0.2   ## set lambda as per the instructions
nexp <- 40      ## number of exponentials per sample
nsim <- 100000  ## number of simulations
mns <- NULL     ## vector of sample means
for (i in 1:nsim) mns <- c(mns, mean(rexp(nexp, lambda)))
varxp <- ((1 / lambda)^2) / nexp ## theoretical variance of the sample mean
varmean <- var(mns)              ## observed variance of the sample means

plotdata <- data.frame(mns)
plot1 <- ggplot(plotdata, aes(x = mns))
plot1 <- plot1 + geom_histogram(aes(y = ..density..), colour = "black", fill = "green")
plot1 <- plot1 + labs(title = "Distribution of Means of rexp", y = "Density")
plot1 <- plot1 + stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = sqrt(varxp)), color = "red", size = 1.0)
plot1 <- plot1 + stat_function(fun = dnorm, args = list(mean = mean(mns), sd = sqrt(varmean)), color = "black", size = 1.0)
print(plot1)
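Growing mns with c() inside a loop becomes slow for 100,000 simulations; a faster, statistically equivalent alternative (a sketch, not the original code) draws all the values at once and takes row means. Note that even with the same seed it will not reproduce the exact same numbers as the loop, because the draws are arranged differently:

set.seed(25082502)
sim <- matrix(rexp(nsim * nexp, lambda), nrow = nsim) ## one simulation per row, 40 exponentials each
mns <- rowMeans(sim)                                  ## 100,000 sample means in a single step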

Conclusion

With a larger number of simulations (100,000 in this example), the distribution of sample means approximates the normal distribution even more closely.