Mar. 2015 by Alfred Lu
This report is trying to explore the asymptotics of the exponential distribution and compare with normal distribution. The method we used is Monte Carlo simulating under a given explonential distribution, dividing into small chuncks (n=40), and calculating sample mean and sample variance (unbaised with Bessel’s correction) for further interrogation.
The following code generates the samples, for the demonstration, the rate of the exponential distribution is set to 0.2
means <- NULL
vars <- NULL
lambda <- 0.2
set.seed(11)
nC <- 40
nSim <- 5000
meanPop <- 1/lambda
varPop <- 1/lambda^2
smps <- rexp(n=nSim*nC,rate=lambda)
According to central limit theorem (CLT), the sample mean follows a normal distribution with \(\mu\) = \(\mu_{0}\) = \(\frac{1}{\lambda}\) = 5, \(\sigma^{2}\) = \(\frac{\sigma^{2}_{0}}{n}\) = \(\frac{1}{n\lambda^{2}}\) = 0.625, let’s explore the distribution of sample means, and compare the mean of sample means to the theoretical calculation. Finally, a quantile-quantile plot of first 100 sample means against the theoretical quantiles of a normal distribution (\(\mu_{0},\sigma^{2}_{0}\)) with 97.5% confidence period.
library(ggplot2)
library(car)
means <- apply(matrix(smps,nrow=nSim),1,mean)
ggplot() +
geom_histogram(aes(x=means, y=..density..),
binwidth=0.1, colour="black", fill="white") +
geom_density(aes(x=means,y=..density..), alpha=.1, colour="blue",
size=0.3) +
geom_vline(aes(xintercept=meanPop), color="red", linetype="dashed")+
theme_bw()
mean(means)
[1] 4.988261
var(means)
[1] 0.6353672
qqPlot(means[1:100],distribution="norm", mean=5, sd=0.625, envelope=.975)
Regarding to the variance of sample variance, its expected value is the population variance, but its distribution mostly is not a normal one (more likly follows a chi-squared distribution).
vars <- apply(matrix(smps,nrow=nSim),1,var)
ggplot() +
geom_histogram(aes(x=vars, y=..density..),
binwidth=2, colour="black", fill="white") +
geom_density(aes(x=vars,y=..density..), alpha=.1, colour="blue",
size=0.3) +
geom_vline(aes(xintercept=varPop), color="red", linetype="dashed")+
theme_bw()
mean(vars)
## [1] 24.88456
We perform a hypothesis test, with H0: \(\bar{X}\) ~ N(\(\mu_{0}\),\(\frac{\sigma^{2}_{0}}{n}\)), to evaluate if the mean of sample means following a normal distribution we estimated. Considering the total number of sample means we have is large, we are safly use Z score directly.
Calculation shows the if null hypothesis is correct, and confidencal level is set to with 20% probability to commit a type I error (a corresponding low probabilty to commit a type II error, since we don’t want to miss the detection of alternative hypothesis), the simulated means of sample means is still fall into this confidencal period, and we can’t reject the hypothesis.
Z_score <- (mean(means)-meanPop)/sqrt(varPop/nC)
qnorm(p=0.90,mean=0,sd=1,lower.tail=T) < abs(Z_score)
[1] FALSE