Central Limit Theorem: Simulated vs Theoretical Means of the Exponential Distribution

Author: Wesley

Overview

We explore the distribution of the mean of draws from an exponential distribution with lambda = 0.2 by drawing 40 exponentials and taking their mean, repeated 1000 times. The resulting distribution of means is comparable to a normal distribution, which is consistent with the Central Limit Theorem (CLT).
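For reference, the claim being checked can be written out as below. This is the standard CLT approximation applied to the exponential case. The symbols $X_i$, $\bar{X}_m$, $\mu$, $\sigma^2$ and $m$ are notation introduced here for illustration, with $m$ playing the role of the 40 draws per simulation.

```latex
\begin{align*}
X_1, \dots, X_m &\overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda),
  \qquad \mu = E[X_i] = \frac{1}{\lambda},
  \qquad \sigma^2 = \mathrm{Var}(X_i) = \frac{1}{\lambda^2} \\
\bar{X}_m &= \frac{1}{m} \sum_{i=1}^{m} X_i
  \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu, \frac{\sigma^2}{m}\right)
  \quad \text{for large } m
\end{align*}
```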

Simulations:

# dplyr provides %>% and select(); ggplot2 provides the plots
library(dplyr)
library(ggplot2)

Initial Parameters

set.seed(11)
lambda                 = 0.2
# No. of draws from an exponential distribution for each iteration of the simulation
ndraws                 = 40
#Number of iterations for simulation
n                      = 1000

Simulation

df = do.call(rbind,lapply(1:n, function(sim){
        values = rexp(ndraws, lambda)
        data.frame(
                  simNum=sim,                    # i-th iteration
                  values= values,                # the values
                  simulationMean = mean(values)) # the mean of the values
}))

# drop the raw draws, keeping one row (simNum, simulationMean) per simulation
newDF = df %>% select(-values) %>% unique
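Since only the per-simulation means are used afterwards, the same quantity can be computed without building the long data frame. The following is a minimal sketch under the same parameters, not the method used in this report. The names `draws` and `sim_means` are hypothetical and introduced only for this example.

```r
set.seed(11)
lambda <- 0.2
ndraws <- 40     # draws per simulation
n      <- 1000   # number of simulations

# One row per simulation, one column per draw; filling by row keeps
# each simulation's draws contiguous in the random stream
draws <- matrix(rexp(n * ndraws, rate = lambda),
                nrow = n, ncol = ndraws, byrow = TRUE)

# Mean of each row, i.e. one sample mean per simulation
sim_means <- rowMeans(draws)

length(sim_means)   # one mean per simulation
mean(sim_means)     # should be near 1/lambda
var(sim_means)      # should be near (1/lambda)^2 / ndraws
```

The data frame approach above keeps the raw draws available for inspection, while the matrix version trades that for less memory and fewer steps.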

1. Sample Mean versus Theoretical Mean

sample_mean      = mean(newDF$simulationMean)
theoretical_mean = 1/lambda

2. Sample Variance versus Theoretical Variance

theoretical_variance = ((1/lambda)/sqrt(ndraws))^2
sample_variance      = var(newDF$simulationMean)

Parameter	Theoretical	Sample
Mean	5	4.9871567
Variance	0.625	0.6009383
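The theoretical column follows from the moments of the exponential distribution. The short derivation below uses $m = 40$ for the number of draws per simulation and $\bar{X}_m$ for their mean. Both symbols are notation introduced here rather than names from the code.

```latex
\begin{align*}
E[\bar{X}_m] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\mathrm{Var}(\bar{X}_m) &= \frac{\sigma^2}{m}
  = \frac{(1/\lambda)^2}{m}
  = \left(\frac{1/\lambda}{\sqrt{m}}\right)^2
  = \frac{25}{40} = 0.625 \\
\mathrm{sd}(\bar{X}_m) &= \frac{5}{\sqrt{40}} \approx 0.7906
\end{align*}
```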

Distribution

With lambda set to 0.2, the mean of the 1000 sample means of 40 exponentials, 4.9871567, is very close to, but not exactly the same as, the theoretical mean of 5. Likewise, the sample variance of those means, 0.6009383, is close to the theoretical variance of 0.625.
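One way to judge whether this gap is small is to express it in units of the standard error of an average of 1000 sample means. The sketch below hardcodes the values from the table above. The names `se_of_mean` and `z` are hypothetical and introduced only for this example.

```r
theoretical_mean     <- 5
theoretical_variance <- 0.625       # variance of a single sample mean
sample_mean          <- 4.9871567   # value reported in the table
n                    <- 1000        # number of simulated sample means

# Standard error of the average of n independent sample means
se_of_mean <- sqrt(theoretical_variance / n)

# Gap between simulated and theoretical mean, in standard errors
z <- (sample_mean - theoretical_mean) / se_of_mean

se_of_mean   # 0.025
z            # about -0.51
```

A gap of roughly half a standard error is well within what random variation alone would produce.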

3. Show that the distribution is approximately normal

ggplot(newDF, aes(simulationMean))                            +
geom_histogram(aes(y=..density..), binwidth=0.2,
               color='white', fill="grey")                    +
geom_density()                                                +
# dashed line: theoretical normal density
stat_function(fun=dnorm,
              args=list(mean=theoretical_mean,
                        sd=sqrt(theoretical_variance)),
              linetype='dashed')                              +
geom_vline(xintercept=sample_mean)

Fig: With lambda = 0.2, the distribution of sample means of 40 exponentials (n = 1000) is approximately normal, with mean 4.9871567. This is very close to, but not exactly the same as, the theoretical mean of 5. The dashed line follows the theoretical normal distribution, and the vertical line marks the sample mean.

qqplot(x=newDF$simulationMean, y=rnorm(n=1000, mean=theoretical_mean, sd = sqrt(theoretical_variance)))
abline(a=0, b=1, lty='dashed') # line of identity

Fig: A quantile-quantile (q-q) plot comparing the quantiles of two distributions: the distribution of means of 40 draws from an exponential distribution (lambda = 0.2), and a normal distribution with the theoretical mean and variance. The points lie close to the line of identity, indicating that the distribution of means is approximately normal, as the CLT predicts.
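The q-q comparison can also be made against exact normal quantiles instead of a random normal sample, which removes one source of noise from the plot. The following is a minimal sketch that regenerates the sample means so it runs on its own. The names `sim_means`, `theo_q` and `qq_cor` are hypothetical and introduced only for this example.

```r
set.seed(11)
lambda <- 0.2
ndraws <- 40
n      <- 1000

# One sample mean per simulation
sim_means <- replicate(n, mean(rexp(ndraws, rate = lambda)))

# Exact quantiles of the theoretical normal at n evenly spread probabilities
theo_q <- qnorm(ppoints(n),
                mean = 1 / lambda,
                sd   = (1 / lambda) / sqrt(ndraws))

plot(theo_q, sort(sim_means),
     xlab = "Theoretical normal quantiles",
     ylab = "Quantiles of simulated means")
abline(a = 0, b = 1, lty = "dashed")   # line of identity

# Correlation between the two sets of quantiles; values near 1
# indicate close agreement with the normal distribution
qq_cor <- cor(theo_q, sort(sim_means))
qq_cor
```

Because the mean of 40 exponentials follows a scaled gamma distribution with a slight right skew, small departures from the line in the tails are to be expected even when the overall agreement is good.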

sessionInfo()

## R version 3.1.1 (2014-07-10)
## Platform: x86_64-apple-darwin13.1.0 (64-bit)
## 
## locale:
## [1] C
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_1.0.0 dplyr_0.4.1  
## 
## loaded via a namespace (and not attached):
##  [1] DBI_0.3.1        MASS_7.3-35      Rcpp_0.11.3      assertthat_0.1  
##  [5] colorspace_1.2-4 digest_0.6.7     evaluate_0.5.5   formatR_1.0     
##  [9] grid_3.1.1       gtable_0.1.2     htmltools_0.2.6  knitr_1.9.4     
## [13] labeling_0.3     lazyeval_0.1.10  magrittr_1.5     munsell_0.4.2   
## [17] parallel_3.1.1   plyr_1.8.1       proto_0.3-10     reshape2_1.4.1  
## [21] rmarkdown_0.4.2  scales_0.2.4     stringr_0.6.2    tools_3.1.1