Author: Wesley
We explore the distribution of the mean of an exponential distribution with lambda = 0.2 by drawing 40 exponentials and taking their mean, repeated 1000 times. We observe that the resulting distribution of means closely resembles a normal distribution, as predicted by the Central Limit Theorem (CLT).
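```r
# Load packages: dplyr for %>% and select(), ggplot2 for plotting
library(dplyr)
library(ggplot2)

set.seed(11)  # fix the RNG seed so the results are reproducible
lambda = 0.2  # rate parameter of the exponential distribution
# No. of draws from an exponential distribution for each iteration of the simulation
ndraws = 40
# Number of iterations for simulation
n = 1000
# One row per draw: the iteration number, the drawn value and that iteration's mean
df = do.call(rbind, lapply(1:n, function(sim){
  values = rexp(ndraws, lambda)
  data.frame(
    simNum = sim,                   # i-th iteration
    values = values,                # the values
    simulationMean = mean(values))  # the mean of the values
}))
# Drop the individual draws, keeping one row (simNum, simulationMean) per iteration
newDF = df %>% select(-values) %>% unique
sample_mean = mean(newDF$simulationMean)
# An exponential with rate lambda has mean 1/lambda and sd 1/lambda,
# so the mean of ndraws draws has variance (1/lambda)^2 / ndraws
theoretical_mean = 1/lambda
theoretical_variance = ((1/lambda)/sqrt(ndraws))^2
sample_variance = var(newDF$simulationMean)
```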
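The simulation above builds a 40,000-row data frame only to collapse it back to one mean per iteration. A more compact sketch, assuming only base R, computes the 1000 means directly with `replicate()`. Because it calls `rexp(ndraws, lambda)` the same number of times and in the same order after `set.seed(11)`, it should yield the same means as `newDF$simulationMean`; the name `sim_means` is introduced here for illustration only.

```r
# Base-R sketch of the same simulation: one mean of ndraws exponentials per iteration
set.seed(11)
lambda <- 0.2
ndraws <- 40
n <- 1000

# replicate() evaluates the expression n times and returns a numeric vector of means
sim_means <- replicate(n, mean(rexp(ndraws, lambda)))

c(sample_mean = mean(sim_means), sample_variance = var(sim_means))
```

Since `sim_means` is a plain vector, it can be wrapped as `data.frame(simulationMean = sim_means)` if the plotting code below needs a data frame.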
| Parameter | Theoretical | Sample |
|---|---|---|
| Mean | 5 | 4.9871567 |
| Variance | 0.625 | 0.6009383 |
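The Theoretical column follows from the moments of a single exponential draw. For independent $X_1, \dots, X_{40} \sim \mathrm{Exp}(\lambda)$ with sample mean $\bar{X} = \frac{1}{40}\sum_{i=1}^{40} X_i$:

$$
\begin{aligned}
\mathbb{E}[X_i] &= \frac{1}{\lambda}, & \operatorname{Var}(X_i) &= \frac{1}{\lambda^2},\\
\mathbb{E}[\bar{X}] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, & \operatorname{Var}(\bar{X}) &= \frac{1}{40\,\lambda^2} = \frac{25}{40} = 0.625.
\end{aligned}
$$

This agrees with `theoretical_variance = ((1/lambda)/sqrt(ndraws))^2` in the code, since $\left(\frac{1/\lambda}{\sqrt{40}}\right)^2 = \frac{1}{40\,\lambda^2}$. By the CLT, $\bar{X}$ is then approximately $N(5,\ 0.625)$.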
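With lambda set to 0.2, the mean of the 1000 simulated sample means (each the mean of 40 exponentials) is 4.9871567, which is very similar to, but not exactly the same as, the theoretical mean of 5. Likewise, the sample variance of 0.6009383 is close to the theoretical variance of 0.625.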
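Whether a gap of about 0.013 between 4.9871567 and 5 is surprising can be judged against the standard error of the average of 1000 independent sample means, $\sqrt{0.625/1000} = 0.025$. A minimal sketch, plugging in the values from the results table (the names `se` and `z` are illustrative):

```r
# Values taken from the results table
theoretical_mean <- 5
theoretical_variance <- 0.625
sample_mean <- 4.9871567
n <- 1000  # number of simulated means

# Standard error of the average of n independent sample means
se <- sqrt(theoretical_variance / n)

# How many standard errors the observed grand mean lies from the theoretical mean
z <- (sample_mean - theoretical_mean) / se
round(c(se = se, z = z), 3)
```

This gives `se` = 0.025 and `z` of roughly -0.51, i.e. the observed mean is about half a standard error below 5, which is well within what sampling noise alone would produce.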
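```r
ggplot(newDF, aes(simulationMean)) +
  # histogram of the simulated means on the density scale
  # (in ggplot2 >= 3.4, write after_stat(density) instead of ..density..)
  geom_histogram(aes(y = ..density..), color = 'white', fill = "grey") +
  # solid curve: kernel density estimate of the simulated means
  geom_density() +
  # dashed curve: theoretical normal density N(theoretical_mean, theoretical_variance)
  stat_function(fun = dnorm,
                args = list(mean = theoretical_mean, sd = sqrt(theoretical_variance)),
                linetype = 'dashed') +
  # vertical line at the mean of the simulated means
  geom_vline(xintercept = sample_mean)
```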
Fig: Given lambda = 0.2, the distribution of the sample means of 40 exponentials (n = 1000 simulations) is approximately normal, with mean 4.9871567 (solid vertical line). This is very close to, but not exactly the same as, the theoretical mean of 5. The solid curve is the density of the simulated means; the dashed curve follows the theoretical normal distribution.
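The visual match in the figure can be backed by a rough numerical check. A normal distribution places about 68.3%, 95.4% and 99.7% of its mass within one, two and three standard deviations of its mean. This sketch, assuming base R and the same seed and parameters as above, re-simulates the means and compares those proportions using the theoretical mean and standard deviation; `within_k` is a hypothetical helper introduced for illustration.

```r
set.seed(11)
lambda <- 0.2
ndraws <- 40
n <- 1000
sim_means <- replicate(n, mean(rexp(ndraws, lambda)))

mu <- 1 / lambda                      # theoretical mean of the sample means
sigma <- (1 / lambda) / sqrt(ndraws)  # theoretical sd of the sample means

# Observed share of simulated means within k theoretical sds of mu
within_k <- function(k) mean(abs(sim_means - mu) <= k * sigma)
observed <- sapply(1:3, within_k)
# Share a normal distribution places within k sds of its mean: P(|Z| <= k)
expected <- pnorm(1:3) - pnorm(-(1:3))

round(rbind(observed, expected), 3)
```

Small differences between the two rows are expected from sampling noise and from the slight right skew that the mean of only 40 exponentials still carries.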
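```r
qqplot(x = newDF$simulationMean, y = qnorm(ppoints(n), mean = theoretical_mean, sd = sqrt(theoretical_variance)))  # simulated vs theoretical normal quantiles
abline(a = 0, b = 1, lty = "dashed")  # line of identity
```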
Fig: A quantile-quantile (Q-Q) plot comparing the quantiles of (1) the simulated distribution of means of 40 exponentials (lambda = 0.2) with those of (2) a normal distribution with the theoretical mean and variance. The points lie close to the line of identity, showing that the distribution of sample means is approximately normal, as the CLT predicts.
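A Q-Q plot pairs the i-th smallest simulated mean with the normal quantile at probability `ppoints(n)[i]`; if the two distributions match, the pairs fall on the line y = x. The sketch below, assuming base R and the same seed and parameters as above, builds those pairs explicitly and summarises them with their correlation (how straight the points are) and the slope of a least-squares fit (whether the spreads agree). The variable names are illustrative.

```r
set.seed(11)
lambda <- 0.2
ndraws <- 40
n <- 1000
sim_means <- replicate(n, mean(rexp(ndraws, lambda)))

# Sample quantiles: the simulated means in increasing order
sample_q <- sort(sim_means)
# Theoretical quantiles of N(1/lambda, (1/lambda)^2 / ndraws) at the plotting positions
theory_q <- qnorm(ppoints(n), mean = 1 / lambda, sd = (1 / lambda) / sqrt(ndraws))

# Correlation near 1: points lie close to a straight line.
# Slope near 1: that line has the same spread as the theoretical distribution.
fit <- lm(sample_q ~ theory_q)
c(correlation = cor(sample_q, theory_q), slope = unname(coef(fit)[2]))
```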
```r
sessionInfo()
```

```
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-apple-darwin13.1.0 (64-bit)
##
## locale:
## [1] C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_1.0.0 dplyr_0.4.1
##
## loaded via a namespace (and not attached):
## [1] DBI_0.3.1 MASS_7.3-35 Rcpp_0.11.3 assertthat_0.1
## [5] colorspace_1.2-4 digest_0.6.7 evaluate_0.5.5 formatR_1.0
## [9] grid_3.1.1 gtable_0.1.2 htmltools_0.2.6 knitr_1.9.4
## [13] labeling_0.3 lazyeval_0.1.10 magrittr_1.5 munsell_0.4.2
## [17] parallel_3.1.1 plyr_1.8.1 proto_0.3-10 reshape2_1.4.1
## [21] rmarkdown_0.4.2 scales_0.2.4 stringr_0.6.2 tools_3.1.1
```