A Study of the Exponential Distribution and Central Limit Theorem

Mar. 2015 by Alfred Lu

Overview

This report is trying to explore the asymptotics of the exponential distribution and compare with normal distribution. The method we used is Monte Carlo simulating under a given explonential distribution, dividing into small chuncks (n=40), and calculating sample mean and sample variance (unbaised with Bessel’s correction) for further interrogation.

Simulation

The following code generates the samples, for the demonstration, the rate of the exponential distribution is set to 0.2

means <- NULL
vars <- NULL
lambda <- 0.2
set.seed(11)
nC <- 40
nSim <- 5000
meanPop <- 1/lambda
varPop <- 1/lambda^2
smps <- rexp(n=nSim*nC,rate=lambda)

Results

Comparing the sample mean to the theoretical mean of distribution

According to central limit theorem (CLT), the sample mean follows a normal distribution with \(\mu\) = \(\mu_{0}\) = \(\frac{1}{\lambda}\) = 5, \(\sigma^{2}\) = \(\frac{\sigma^{2}_{0}}{n}\) = \(\frac{1}{n\lambda^{2}}\) = 0.625, let’s explore the distribution of sample means, and compare the mean of sample means to the theoretical calculation. Finally, a quantile-quantile plot of first 100 sample means against the theoretical quantiles of a normal distribution (\(\mu_{0},\sigma^{2}_{0}\)) with 97.5% confidence period.

library(ggplot2)
library(car)
means <- apply(matrix(smps,nrow=nSim),1,mean)

ggplot() +
    geom_histogram(aes(x=means, y=..density..),
                   binwidth=0.1, colour="black", fill="white") +
    geom_density(aes(x=means,y=..density..), alpha=.1, colour="blue",
                 size=0.3) +
    geom_vline(aes(xintercept=meanPop), color="red", linetype="dashed")+
    theme_bw()

mean(means)
[1] 4.988261
var(means)
[1] 0.6353672
qqPlot(means[1:100],distribution="norm", mean=5, sd=0.625, envelope=.975)

Comparing the sample variance to the theoretical variance of ditribution

Regarding to the variance of sample variance, its expected value is the population variance, but its distribution mostly is not a normal one (more likly follows a chi-squared distribution).

vars <- apply(matrix(smps,nrow=nSim),1,var)

ggplot() +
    geom_histogram(aes(x=vars, y=..density..),
                   binwidth=2, colour="black", fill="white") +
    geom_density(aes(x=vars,y=..density..), alpha=.1, colour="blue",
                 size=0.3) +   
    geom_vline(aes(xintercept=varPop), color="red", linetype="dashed")+
    theme_bw()

mean(vars)
## [1] 24.88456

Comparing the distribution to a normal distribution

We perform a hypothesis test, with H0: \(\bar{X}\) ~ N(\(\mu_{0}\),\(\frac{\sigma^{2}_{0}}{n}\)), to evaluate if the mean of sample means following a normal distribution we estimated. Considering the total number of sample means we have is large, we are safly use Z score directly.

Calculation shows the if null hypothesis is correct, and confidencal level is set to with 20% probability to commit a type I error (a corresponding low probabilty to commit a type II error, since we don’t want to miss the detection of alternative hypothesis), the simulated means of sample means is still fall into this confidencal period, and we can’t reject the hypothesis.

Z_score <- (mean(means)-meanPop)/sqrt(varPop/nC)
qnorm(p=0.90,mean=0,sd=1,lower.tail=T) < abs(Z_score)
[1] FALSE