Overview
This report demonstrates the Law of Large Numbers and the Central Limit
Theorem. It includes R code that simulates a large data set of exponential
random variables, calculates sample means and variances, compares these
statistics with the corresponding population parameters, and plots the
distribution of the sample means to reveal the bell-shaped curve. The code
and analysis were created using R 3.3.1 on August 14, 2016.
Simulations
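This R code uses rexp to create 1000 simulated samples, each of size 40, and
records the mean and standard deviation of each sample (the variances are
obtained later by squaring the standard deviations). The rate parameter is
lambda = 0.02, so the population mean is 1/lambda = 50 and the population
standard deviation is also 1/lambda = 50.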
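For reference, the population quantities that the code stores as mean, sigma, popvar and semean follow from the standard moments of the exponential distribution with rate lambda, evaluated here at lambda = 0.02 and n = 40:

$$
\mu = E[X] = \frac{1}{\lambda} = 50, \qquad
\sigma^2 = \operatorname{Var}(X) = \frac{1}{\lambda^2} = 2500, \qquad
\sigma = 50,
$$

$$
\operatorname{SE}(\bar X) = \frac{\sigma}{\sqrt{n}} = \frac{50}{\sqrt{40}} \approx 7.91, \qquad
\operatorname{Var}(\bar X) = \frac{\sigma^2}{n} = \frac{2500}{40} = 62.5.
$$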
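lambda <- 0.02            # rate of the exponential distribution
mean <- 1/lambda          # population mean = 50 (calls to mean() still find the base function)
sigma <- 1/lambda         # population standard deviation = 50
nosims <- 1000            # number of simulated samples
n <- 40                   # size of each sample
semean <- sigma/sqrt(n)   # standard error of the mean, about 7.91
popvar <- sigma^2         # population variance = 2500
set.seed(10109)
# Draw each sample once and record both its mean and its standard deviation,
# so that both statistics describe the same 1000 samples
mns <- numeric(nosims)
stdev <- numeric(nosims)
for (i in 1:nosims) {
  x <- rexp(n, lambda)
  mns[i] <- mean(x)
  stdev[i] <- sd(x)
}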
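As a quick sanity check on the moments assumed above, the sketch below draws one large exponential sample and compares its mean and standard deviation with 1/lambda. The sample size (1e6), the seed (1) and the name big_sample are arbitrary choices, not part of the original analysis.

```r
# Sanity check (sketch): for an exponential with rate lambda, both the mean
# and the standard deviation should be close to 1/lambda = 50.
# The sample size (1e6) and the seed (1) are arbitrary choices.
lambda <- 0.02
set.seed(1)
big_sample <- rexp(1e6, rate = lambda)
c(sample_mean = mean(big_sample),
  sample_sd = sd(big_sample),
  theory = 1 / lambda)
```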
Sample Mean vs. Theoretical Mean
This first chart compares the sample mean with the theoretical mean to
demonstrate the Law of Large Numbers: the average of independent draws
approaches the quantity it estimates as the number of draws increases. The
chart takes the 1000 means computed from the simulated samples of 40
exponentials and plots their running (cumulative) average against the number
of simulations included, together with a horizontal line at the population
mean 1/lambda = 50.
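library(ggplot2)
# Running average of the first k sample means, for k = 1, ..., nosims
means <- cumsum(mns)/(1:nosims)
g <- ggplot(data.frame(x = 1:nosims, y = means),
            aes(x = x, y = y))
# Horizontal reference line at the population mean (50);
# `size` sets the line width (ggplot2 >= 3.4.0 prefers `linewidth`)
g <- g + geom_hline(yintercept = mean) +
  geom_line(size = 2)
g <- g + labs(x = "Number of simulations",
              y = "Cumulative mean")
g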
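To put numbers on the convergence shown in the chart, the following sketch re-runs the simulation with the report's seed and parameters and prints the running mean and its distance from 50 at a few checkpoints. Drawing all samples at once into a matrix is a swapped-in vectorized alternative to the report's loop, and the checkpoints and the names samples, sim_means and running_mean are illustrative.

```r
# Sketch: running mean of the simulated sample means at a few checkpoints.
# Seed and parameters mirror the report; checkpoints are arbitrary.
lambda <- 0.02
n <- 40
nosims <- 1000
set.seed(10109)
# Each row of the matrix is one simulated sample of size n
samples <- matrix(rexp(nosims * n, lambda), nrow = nosims, byrow = TRUE)
sim_means <- rowMeans(samples)
# Running average of the first k sample means, k = 1, ..., nosims
running_mean <- cumsum(sim_means) / seq_len(nosims)
checkpoints <- c(10, 100, 1000)
data.frame(simulations = checkpoints,
           running_mean = running_mean[checkpoints],
           abs_error = abs(running_mean[checkpoints] - 1 / lambda))
```

The absolute error at 1000 simulations should typically be well under one unit, since the standard deviation of the grand mean of 40,000 draws is 50/sqrt(40000) = 0.25.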

Sample Variance vs. Theoretical Variance
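The Central Limit Theorem also states that, besides being approximately
normal, the averages are centered at the population mean and their standard
deviation equals the standard error of the mean, sigma/sqrt(n). The code that
follows checks a related property of the variance. Each sample variance (the
squared standard deviation, computed by sd with divisor n - 1) is an unbiased
estimate of the population variance, so by the Law of Large Numbers the
running average of the 1000 sample variances should settle near
(1/lambda)^2 = 2500 as the number of simulations increases. Note that 2500 is
the variance of a single exponential, not the variance of the sample mean,
which is (standard error of the mean)^2 = 2500/40 = 62.5.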
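# Running average of the first k sample variances (standard deviations squared)
variances <- cumsum(stdev^2)/(1:nosims)
g <- ggplot(data.frame(x = 1:nosims, y = variances),
            aes(x = x, y = y))
# Horizontal reference line at the population variance (2500)
g <- g + geom_hline(yintercept = popvar) +
  geom_line(size = 2)
g <- g + labs(x = "Number of simulations",
              y = "Cumulative variance")
g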
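The difference between the population variance and the variance of the sample mean can be checked directly. The sketch below draws 1000 samples of size 40 with the report's seed (in a single matrix, a vectorized stand-in for the loop), then compares the average sample variance with sigma^2 = 2500 and the variance of the sample means with sigma^2/n = 62.5. The variable names are hypothetical.

```r
# Sketch: from the same simulated samples, compare
#   - the average sample variance   (estimates sigma^2     = 2500)
#   - the variance of the means     (estimates sigma^2 / n = 62.5)
lambda <- 0.02
n <- 40
nosims <- 1000
set.seed(10109)
samples <- matrix(rexp(nosims * n, lambda), nrow = nosims, byrow = TRUE)
sample_vars <- apply(samples, 1, var)   # one sample variance per row
sample_means <- rowMeans(samples)       # one sample mean per row
c(mean_sample_variance = mean(sample_vars),
  population_variance = 1 / lambda^2,
  variance_of_means = var(sample_means),
  theoretical_variance_of_mean = 1 / lambda^2 / n)
```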

Distribution
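The Central Limit Theorem states that the distribution of the average of
independent and identically distributed variables approaches a normal
distribution as the sample size increases; after standardizing (subtracting
the population mean and dividing by the standard error), the limit is the
standard normal. To demonstrate this, the following histogram plots the
distribution of the means calculated from 1000 samples of 40 random
exponentials. The histogram is bell-shaped, even though the exponential
distribution itself is strongly right-skewed, which suggests that the sample
means are approximately normally distributed.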
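In symbols, a standard statement of the theorem for i.i.d. variables X_1, ..., X_n with mean mu and finite variance sigma^2 is:

$$
\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
Z_n = \frac{\bar X_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1) \quad (n \to \infty).
$$

With mu = sigma = 50 and n = 40, this suggests the sample means are roughly normal with mean 50 and standard deviation 50/sqrt(40), about 7.91, which is the benchmark the histogram is compared against.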
par(mfrow = c(1, 1))
hist(mns, main = "Means of 1000 Exponential Samples",
     sub = "Output of Simulations with n=40",
     xlab = "Theoretical Mean = 1/lambda = 50",
     cex.main = 0.75, cex.sub = 0.75, cex.lab = 0.75, cex.axis = 0.75)
# Red: population mean; green: one standard error (sigma/sqrt(n)) either side
abline(v = mean, col = "red")
abline(v = mean + sigma/sqrt(n), col = "green")
abline(v = mean - sigma/sqrt(n), col = "green")
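A rough numeric check of the bell shape is to count how many simulated means fall within one and two standard errors of 50 and compare with the normal benchmarks of about 68% and 95%. This sketch mirrors the report's seed and parameters, uses hypothetical names (sim_means, within1, within2), and should be read with the caveat that the skewed parent distribution can shift these shares slightly at n = 40.

```r
# Sketch: share of simulated means within 1 and 2 standard errors of mu,
# compared with the corresponding N(0, 1) probabilities.
lambda <- 0.02
n <- 40
nosims <- 1000
mu <- 1 / lambda
se <- (1 / lambda) / sqrt(n)   # sigma / sqrt(n), about 7.91
set.seed(10109)
sim_means <- rowMeans(matrix(rexp(nosims * n, lambda),
                             nrow = nosims, byrow = TRUE))
within1 <- mean(abs(sim_means - mu) <= se)
within2 <- mean(abs(sim_means - mu) <= 2 * se)
c(within_1_se = within1, normal_1_se = pnorm(1) - pnorm(-1),
  within_2_se = within2, normal_2_se = pnorm(2) - pnorm(-2))
```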

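Finally, the following code checks the approximate normality of the sample
means by standardizing them and comparing the mean and standard deviation of
the result with those of the standard normal N(0, 1), namely 0 and 1. Because
the code subtracts the average of the simulated means itself, the mean of the
standardized values is zero by construction (the printed 2.16e-16 is
floating-point rounding). The informative check is the standard deviation:
the code divides by the theoretical standard error sigma/sqrt(n), and the
result, 0.989, is close to 1.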
samplingmean <- mean(mns)   # average of the 1000 simulated means
semean <- sigma/sqrt(n)     # theoretical standard error of the mean
normalizedsample <- (mns-samplingmean)/semean
mean(normalizedsample)
## [1] 2.162354e-16
sd(normalizedsample)
## [1] 0.9894667
hist(normalizedsample)
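For a check in which neither statistic is fixed by construction, the sketch below standardizes the simulated means with the theoretical mean 1/lambda and standard error sigma/sqrt(n), then compares the result with N(0, 1) on a normal QQ plot using base R's qqnorm and qqline. The names sim_means and z are hypothetical, and the seed and parameters mirror the report.

```r
# Sketch: standardize with theoretical (not empirical) mean and SE,
# then compare with N(0, 1) via a normal QQ plot.
lambda <- 0.02
n <- 40
nosims <- 1000
set.seed(10109)
sim_means <- rowMeans(matrix(rexp(nosims * n, lambda),
                             nrow = nosims, byrow = TRUE))
z <- (sim_means - 1 / lambda) / ((1 / lambda) / sqrt(n))
c(mean = mean(z), sd = sd(z))
qqnorm(z, main = "Standardized means vs. N(0, 1)")
qqline(z, col = "red")   # reference line through the quartiles
```

Points lying close to the red line indicate approximate normality; mild curvature in the upper tail would be consistent with the right skew inherited from the exponential distribution.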
