Exponential Distribution Compared with the Central Limit Theorem

Overview

In this project we investigate the Central Limit Theorem (CLT) for the exponential distribution. According to the CLT, under certain conditions the arithmetic mean of a sufficiently large number of independent random variables, each with a well-defined expected value and a well-defined variance, is approximately normally distributed, regardless of the underlying distribution of the variables. We test the theorem by simulating 1000 samples of size 40 from an exponential distribution and comparing the mean and variance of the resulting sample means with their theoretical values.
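As a compact reference for the quantities used in the rest of the report, the CLT approximation and its specialization to an exponential distribution with rate lambda = 0.2 and sample size n = 40 can be written as follows. The symbols \bar{X}_n, \mu and \sigma^2 are notation introduced here for illustration; they do not appear in the original code.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\bar{X}_n \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right)

% Exponential distribution with rate \lambda = 0.2 and n = 40:
\mu = \frac{1}{\lambda} = 5,
\qquad
\sigma^2 = \frac{1}{\lambda^2} = 25,
\qquad
\frac{\sigma^2}{n} = \frac{25}{40} = 0.625
```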

Simulations

Sample Mean vs Theoretical Mean

We run 1000 simulations to create a data set for comparison with theory. Each simulation draws 40 observations from the exponential distribution with “rexp(40, 0.2)” and records their mean.

Known values: lambda = 0.2 (rate), n = 40 (observations per sample), simulations = 1000

library(ggplot2)
set.seed(259)
lambda <- 0.2 
nexp <- 40 
nsim <- 1000 
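# Preallocate one row per simulation: its index and its sample mean
mns <- data.frame(Index = seq_len(nsim), Mean = numeric(nsim))
for (i in seq_len(nsim))
{
  # Mean of nexp exponential draws with rate lambda
  mns$Mean[i] <- mean(rexp(nexp, lambda))
}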
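The same simulation can also be sketched without an explicit loop. This is only an illustrative alternative, not the code used for the results reported here: the names draws and means_vec are hypothetical, and because the draws are arranged into a matrix column by column, the individual sample means will generally differ from those produced by the loop even with the same seed.

```r
# Illustrative vectorized variant (draws and means_vec are hypothetical names)
set.seed(259)
lambda <- 0.2   # rate of the exponential distribution
nexp <- 40      # observations per sample
nsim <- 1000    # number of simulated samples

# One row per simulated sample, one column per observation
draws <- matrix(rexp(nsim * nexp, rate = lambda), nrow = nsim, ncol = nexp)

# Row means give the nsim sample means in a single call
means_vec <- rowMeans(draws)

length(means_vec)
mean(means_vec)
```

Drawing everything at once and using rowMeans avoids growing or indexing a data frame inside a loop, which tends to matter more as the number of simulations increases.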

Mean of the 1000 sample means

sample_mean <- mean(mns$Mean)
sample_mean
## [1] 5.038664

Theoretical mean of the exponential distribution

theor_mean <- 1/lambda
theor_mean
## [1] 5

Plot

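hist(mns$Mean,col="grey",breaks=100,main="Distribution of Means of rexp",xlab="Sample mean")
abline(v = theor_mean,col=3,lwd=2)   # theoretical mean in green
abline(v = sample_mean,col=2,lwd=2)  # sample mean in red
# Legend colours follow the order of the labels: red = sample, green = theoretical
legend('topright', c("Sample Mean", "Theoretical Mean"),bty = "n",lty = c(1,1),
       lwd = 2, col = c(2, 3))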

The histogram, with the sample mean and the theoretical mean marked, shows that the distribution of the sample means is centered close to the theoretical mean: 5.039 versus 5.
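To put a rough scale on how close the two means are, one can express the gap in standard errors. This is a back-of-the-envelope check rather than part of the original analysis; it assumes the 1000 sample means are independent and uses the theoretical variance of a single sample mean, (1/lambda)^2 / 40 = 0.625.

```latex
\mathrm{SE} = \sqrt{\frac{0.625}{1000}} = \sqrt{0.000625} = 0.025,
\qquad
\frac{5.038664 - 5}{0.025} \approx 1.55
```

Under these assumptions the observed gap is about one and a half standard errors, which is within the range of ordinary simulation noise.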

Sample Variance versus Theoretical Variance

Now we compare the variance of the sample means from the 1000 simulations with the variance that theory predicts.

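The variance of the sample means also gives an estimate of the population variance: multiply the variance of the 1000 entries in the means vector by the sample size, 40. That is, $\sigma^2 \approx \mathrm{Var}(\text{sample means}) \times n$. Equivalently, the theoretical variance of the sample mean is $\sigma^2 / n = (1/\lambda)^2 / n$, which is the quantity computed below as varxp.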

varxp <- ((1/lambda)^2)/nexp 
varmean <- var(mns$Mean) 

Theoretical Variance of the Sample Mean

varxp 
## [1] 0.625

Variance of the Means

varmean
## [1] 0.6141231
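The same comparison can be made on the scale of the population variance by multiplying the variance of the means by the sample size. The short sketch below uses the value reported above rather than re-running the simulation; varmean_reported and pop_var_est are hypothetical names introduced only for this illustration.

```r
# Rescale the reported variance of the sample means to the population scale
lambda <- 0.2
nexp <- 40
varmean_reported <- 0.6141231   # variance of the 1000 sample means, as printed above

pop_var_est <- varmean_reported * nexp   # estimate of the population variance
pop_var_theor <- (1 / lambda)^2          # theoretical population variance

pop_var_est
pop_var_theor
```

The estimate comes out at roughly 24.56 against a theoretical value of 25, the same relative agreement as 0.614 against 0.625.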

Plot

hist(mns$Mean,
     breaks = 100,
     prob = TRUE,
     main = "Distribution of 1000 Sample Means (n = 40)",
     xlab = "Sample mean")
lines(density(mns$Mean))                      # simulated density, solid black
abline(v = 1/lambda, col = 3)                 # theoretical mean, green
xfit <- seq(min(mns$Mean), max(mns$Mean), length.out = 100)
yfit <- dnorm(xfit, mean = 1/lambda, sd = (1/lambda)/sqrt(nexp))
lines(xfit, yfit, col = 4, lty = 2)           # theoretical normal density, dashed blue
legend('topright', c("Simulated Density", "Theoretical Normal Density"),
       bty = "n", lty = c(1,2), col = c(1, 4))

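The simulated density (solid black line) closely follows the normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(40) (dashed blue line), and the variance of the sample means, 0.614, is close to the theoretical value, 0.625.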

Show that the distribution is approximately normal

The Q-Q plot below supports approximate normality: the sample quantiles of the means closely follow the theoretical normal quantiles. Together with the comparisons above, this suggests that the distribution of the sample means is approximately normal.

qqnorm(mns$Mean, main = "Normal Q-Q Plot")
qqline(mns$Mean, col = 3)
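A numerical counterpart to the Q-Q plot is to tabulate a few empirical quantiles of the sample means next to the quantiles of the normal distribution predicted by the CLT. The sketch below is a self-contained illustration, not part of the original analysis: it regenerates the means with replicate, and the names means, probs and q_compare, as well as the choice of probabilities, are assumptions made for the example.

```r
# Compare empirical quantiles of the sample means with CLT-predicted normal quantiles
set.seed(259)
lambda <- 0.2
nexp <- 40
nsim <- 1000

# nsim sample means, each from nexp exponential draws
means <- replicate(nsim, mean(rexp(nexp, lambda)))

probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)   # illustrative choice of probabilities
q_compare <- data.frame(
  prob      = probs,
  empirical = unname(quantile(means, probs)),
  normal    = qnorm(probs, mean = 1 / lambda, sd = (1 / lambda) / sqrt(nexp))
)
q_compare
```

If the normal approximation is reasonable, the two quantile columns should be close in the middle of the distribution; any remaining right skew of the exponential would tend to show up in the tails.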