In this project we investigate the Central Limit Theorem (CLT) for the exponential distribution. The CLT states that, under certain conditions, the arithmetic mean of a sufficiently large number of independent random variables, each with a well-defined expected value and a well-defined variance, is approximately normally distributed, regardless of the underlying distribution of the variables. We test the theorem on the exponential distribution by simulating 1000 samples of size 40 and comparing the mean and variance of the resulting sample means with their theoretical values.
We run 1000 simulations to build a data set for comparison with theory. Each simulation draws 40 observations from the exponential distribution with rexp(40, 0.2).
Known values: lambda = 0.2, n = 40 (observations per simulation), simulations = 1000
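For reference, the claim being tested can be written out as follows. The notation ($X_i$, $\bar{X}_n$, $\mu$, $\sigma$) is introduced here for illustration and is not part of the original write-up.

```latex
% Exponential distribution with rate \lambda: mean and standard deviation
\mu = \frac{1}{\lambda}, \qquad \sigma = \frac{1}{\lambda}

% CLT: for large n the mean of n independent draws is approximately normal
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
  \;\overset{\text{approx.}}{\sim}\;
  \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right)
```

With lambda = 0.2, both the mean and the standard deviation of a single observation equal 5.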
library(ggplot2)
set.seed(259)
lambda <- 0.2
nexp <- 40
nsim <- 1000
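# Preallocate one row per simulation: its index and the mean of its sample
mns <- data.frame(Index = seq_len(nsim), Mean = numeric(nsim))
for (i in seq_len(nsim))
{
# Mean of nexp exponential draws with rate lambda
mns[i, "Mean"] <- mean(rexp(nexp, lambda))
}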
sample_mean <- mean(mns$Mean)
sample_mean
## [1] 5.038664
theor_mean <- 1/lambda
theor_mean
## [1] 5
hist(mns$Mean,col="grey",breaks=100,main="Distribution of Means of rexp",xlab="Sample mean")
abline(v = theor_mean,col=3,lwd=2)
abline(v = sample_mean,col=2,lwd=2)
# Legend colours match the lines above: sample mean is red (2), theoretical mean is green (3)
legend('topright', c("Sample Mean", "Theoretical Mean"),bty = "n",lty = c(1,1),
col = c(2, 3))
The histogram, with the sample mean and the theoretical mean marked, shows that the distribution of means is centered on the theoretical mean: the sample mean of 5.038664 is close to the theoretical value of 5.
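The simulation loop can also be expressed without explicit indexing. The following is a minimal sketch, not the code used to produce the results in this report; the names means_vec, se_mean and z are hypothetical. It also expresses the gap between the simulated and theoretical means in units of the standard error, which is one way to make "close to" concrete.

```r
set.seed(259)
lambda <- 0.2
nexp <- 40     # observations per simulation
nsim <- 1000   # number of simulations

# One mean per simulation; replicate() re-evaluates the expression nsim times
means_vec <- replicate(nsim, mean(rexp(nexp, lambda)))

theor_mean <- 1 / lambda

# Standard error of the average of nsim sample means:
# the sd of one sample mean is (1/lambda)/sqrt(nexp), divided again by sqrt(nsim)
se_mean <- (1 / lambda) / sqrt(nexp) / sqrt(nsim)

# Distance between simulated and theoretical mean, in standard errors
z <- (mean(means_vec) - theor_mean) / se_mean
print(z)
```

Here se_mean is 0.025, so the sample mean reported above, 5.038664, lies about 1.5 standard errors from 5. A value within roughly plus or minus 2 is consistent with sampling noise alone.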
Next we compare the variance of the sample means from the 1000 simulations with its theoretical value.
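The theoretical variance of the mean of 40 observations is the population variance divided by the sample size, $\sigma^2/n = (1/\lambda)^2/40$. Equivalently, the variance of the 1000 entries in the means vector times the sample size, 40, estimates the variance of the population. That is, $\sigma^2 = \mathrm{Var}(\text{sample means}) \times n$.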
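The step from the population variance to the variance of the mean can be spelled out as below. This is a short derivation that assumes the 40 draws in each sample are independent; the symbols $X_i$ and $\bar{X}$ are notation added here.

```latex
\mathrm{Var}(\bar{X})
  = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
```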
varxp <- ((1/lambda)^2)/nexp
varmean <- var(mns$Mean)
Theoretical Variance
varxp
## [1] 0.625
Variance of the Means
varmean
## [1] 0.6141231
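To make the comparison concrete, the two variances can be put side by side, and the variance of the means can be scaled back up to an estimate of the population variance. This is a sketch under the same settings as above; it regenerates the means so that it runs on its own, and the names means_vec, pop_var_est and rel_diff are hypothetical.

```r
set.seed(259)
lambda <- 0.2
nexp <- 40
nsim <- 1000
means_vec <- replicate(nsim, mean(rexp(nexp, lambda)))

var_theory <- (1 / lambda)^2 / nexp   # sigma^2 / n = 0.625
var_sim <- var(means_vec)             # variance of the simulated means

# Scaling by the sample size gives an estimate of the population variance,
# to be compared with (1/lambda)^2 = 25
pop_var_est <- var_sim * nexp

# Relative difference between simulated and theoretical variance of the mean
rel_diff <- (var_sim - var_theory) / var_theory
print(c(theoretical = var_theory, simulated = var_sim,
        population_estimate = pop_var_est, relative_difference = rel_diff))
```

For the values reported above (0.625 and 0.6141231), the relative difference is about -1.7%, and the implied population variance is about 24.56 against a theoretical 25.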
hist(mns$Mean,
breaks = 100,
prob = TRUE,
main = "Means of 40 Exponentials, 1000 Simulations",
xlab = "Sample mean")
# Kernel density estimate of the simulated means (solid black)
lines(density(mns$Mean))
# Theoretical mean (green vertical line)
abline(v = 1/lambda, col = 3)
xfit <- seq(min(mns$Mean), max(mns$Mean), length.out = 100)
# Normal density predicted by the CLT (dashed blue)
yfit <- dnorm(xfit, mean = 1/lambda, sd = (1/lambda/sqrt(nexp)))
lines(xfit, yfit, col = 4, lty = 2)
legend('topright', c("Simulated Values", "Theoretical Values"),
bty = "n", lty = c(1,2), col = c(1, 4))
The simulated distribution compares well with a normal distribution: the solid black curve is the density estimated from the simulated means, and the dashed blue curve is the theoretical normal density.
The Q-Q plot below also suggests normality: the sample quantiles match the theoretical quantiles closely. Together, these comparisons indicate that the distribution of the means is approximately normal.
qqnorm(mns$Mean,main ="Normal Q-Q Plot")
qqline(mns$Mean,col = 3)
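A numerical complement to the plots is to check how many simulated means fall inside the central 95% interval of the theoretical normal distribution. The sketch below assumes the same settings as the rest of the report and regenerates the means so that it runs on its own; means_vec, bounds and coverage are hypothetical names.

```r
set.seed(259)
lambda <- 0.2
nexp <- 40
nsim <- 1000
means_vec <- replicate(nsim, mean(rexp(nexp, lambda)))

mu <- 1 / lambda                          # theoretical mean of the sample mean
sigma_mean <- (1 / lambda) / sqrt(nexp)   # theoretical sd of the sample mean

# Central 95% interval of N(mu, sigma_mean^2)
bounds <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma_mean)

# Share of simulated means inside that interval
coverage <- mean(means_vec >= bounds[1] & means_vec <= bounds[2])
print(coverage)
```

If the normal approximation is good, the printed share should be near 0.95. Because the exponential distribution is right-skewed, a sample size of 40 may still leave slightly more mass in the upper tail than in the lower one.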