Overview

This report demonstrates the Law of Large Numbers and the Central Limit
Theorem. It includes code to simulate a large data set of exponential random
variables; calculate the sample means and variances; compare these statistics
to the population parameters; and plot the distributions to reveal the
bell-shaped curve. The code and analysis were created using R 3.3.1 on
August 14, 2016.

Simulations

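This R code uses rexp to create 1000 simulated samples, each of size 40, and
calculates the mean of each sample. A second set of 1000 samples is then drawn
to calculate the sample standard deviations. The population rate is
lambda = 0.02, so the population mean = 1/lambda = 50 and the population
standard deviation = 1/lambda = 50.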

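lambda <- 0.02             # rate of the exponential distribution
mean   <- 1/lambda         # population mean (calls to mean() still find the function)
sigma  <- 1/lambda         # population standard deviation
nosims <- 1000             # number of simulated samples
n      <- 40               # size of each sample
semean <- sigma/sqrt(n)    # standard error of the mean
popvar <- sigma^2          # population variance
set.seed(10109)
# mean of each of nosims samples of size n
mns    <- replicate(nosims, mean(rexp(n, lambda)))
# standard deviation of each of nosims freshly drawn samples of size n
stdev  <- replicate(nosims, sd(rexp(n, lambda)))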
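
For reference, the theoretical quantities defined in the code above work out
as follows for lambda = 0.02 and n = 40. This is only the arithmetic implied by
the definitions in this section; the symbol \bar{X} for the mean of one sample
is notation introduced here.

```latex
\begin{aligned}
\mu &= \frac{1}{\lambda} = \frac{1}{0.02} = 50, \\
\sigma &= \frac{1}{\lambda} = 50, \qquad \sigma^2 = 2500, \\
\mathrm{SE}(\bar{X}) &= \frac{\sigma}{\sqrt{n}} = \frac{50}{\sqrt{40}} \approx 7.906, \\
\mathrm{Var}(\bar{X}) &= \frac{\sigma^2}{n} = \frac{2500}{40} = 62.5.
\end{aligned}
```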

Sample Mean vs. Theoretical Mean

This first chart compares the sample mean with the theoretical mean to
demonstrate the Law of Large Numbers: an average approaches the quantity it
estimates as the number of observations increases. The chart plots the
cumulative average of 1000 means, each calculated from a random sample of 40
exponential variables, against the population mean = 1/lambda = 50, which is
drawn as a horizontal line.

means <- cumsum(mns)/(1:nosims)
library(ggplot2)
g     <- ggplot(data.frame(x=1:nosims, y=means), 
                aes(x=x, y=y))
g     <- g + geom_hline(yintercept = mean) +
  geom_line(size=2)
g     <- g + labs(x = "Number of observations", 
                  y= "cumulative mean")
g
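
The convergence shown in the chart can also be checked numerically. The sketch
below is a self-contained illustration, not the report's own code: the seed,
the checkpoints, and all names ending in _demo are assumptions chosen for the
example. It prints the gap between the running average and the theoretical
mean after 10, 100, and 1000 simulated samples.

```r
# Hypothetical names; parameters mirror the report (lambda = 0.02, n = 40).
set.seed(1)                        # arbitrary seed, assumed for reproducibility
lambda_demo <- 0.02
n_demo      <- 40
nosims_demo <- 1000
mu_demo     <- 1 / lambda_demo     # theoretical mean, 50

# One mean per simulated sample of n_demo exponentials
sim_means <- replicate(nosims_demo, mean(rexp(n_demo, lambda_demo)))

# Running average of the sample means after 1, 2, ..., nosims_demo samples
running_mean <- cumsum(sim_means) / seq_along(sim_means)

# Absolute gap to the theoretical mean at a few checkpoints
checkpoints <- c(10, 100, 1000)
gap <- abs(running_mean[checkpoints] - mu_demo)
print(data.frame(samples      = checkpoints,
                 running_mean = running_mean[checkpoints],
                 gap          = gap))
```

The gap need not shrink monotonically from one checkpoint to the next; the Law
of Large Numbers only says it tends to zero as the number of samples grows.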

Sample Variance vs. Theoretical Variance

The Central Limit Theorem states not only that averages are approximately
normal, but also that they are centered on the population mean, with a
standard deviation equal to the standard error of the mean, sigma/sqrt(n).
The code that follows examines the variance of the underlying data instead: it
plots the cumulative average of the 1000 sample variances, each computed from
a sample of size 40. Because the sample variance is an unbiased estimate of
the population variance, this average hovers around
sigma^2 = (1/lambda)^2 = 2500 as the number of simulated samples increases.

variances <- cumsum(stdev^2)/(1:nosims)
g         <- ggplot(data.frame(x=1:nosims, y=variances), 
                    aes(x=x, y=y))
g         <- g + geom_hline(yintercept = popvar) +
  geom_line(size=2)
g         <- g + labs(x = "Number of observations", 
                      y= "cumulative variance")
g
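
Two different variances are easy to confuse here: the population variance
sigma^2 = 2500, which the average sample variance estimates, and the variance
of the sample mean sigma^2/n = 62.5, which is the square of the standard
error. The sketch below is an illustration under assumed names and an
arbitrary seed, and unlike the report it draws a single set of samples and
computes both quantities from it.

```r
# Hypothetical names; parameters mirror the report (lambda = 0.02, n = 40).
set.seed(2)                          # arbitrary seed, assumed for reproducibility
lambda_demo <- 0.02
n_demo      <- 40
nosims_demo <- 1000
sigma_demo  <- 1 / lambda_demo       # population standard deviation, 50

# Matrix with one simulated sample per row
samples <- matrix(rexp(nosims_demo * n_demo, lambda_demo),
                  nrow = nosims_demo, ncol = n_demo)

sample_means <- rowMeans(samples)        # mean of each sample
sample_vars  <- apply(samples, 1, var)   # variance of each sample

avg_sample_var <- mean(sample_vars)      # estimates sigma^2     = 2500
var_of_means   <- var(sample_means)      # estimates sigma^2 / n = 62.5

cat("average sample variance:", avg_sample_var,
    "(theory:", sigma_demo^2, ")\n")
cat("variance of sample means:", var_of_means,
    "(theory:", sigma_demo^2 / n_demo, ")\n")
```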

Distribution

The Central Limit Theorem states that the distribution of averages of
independent and identically distributed variables, once centered on the
population mean and scaled by the standard error, approaches a standard normal
as the sample size increases. To demonstrate this, the following histogram
plots the distribution of means calculated from 1000 samples of 40 random
exponentials. The histogram is bell-shaped, suggesting that the means follow
an approximately normal distribution. The red line marks the theoretical mean
and the green lines mark one standard error on either side of it.

par(mfrow=c(1,1))
hist(mns, main="Means of 1000 Exponential Samples",
     sub = "Output of Simulations with n=40", 
     xlab = "Theoretical Mean = 1/lambda = 50", 
     cex.main=0.75, cex.sub=0.75, cex.lab=0.75, cex.axis=0.75)
abline(v=mean, col="red")
abline(v=mean+sigma/sqrt(n), col="green")
abline(v=mean-sigma/sqrt(n), col="green")
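
One way to make the bell-shape comparison more direct is to draw the histogram
on the density scale and overlay the normal curve that the Central Limit
Theorem predicts, with mean 1/lambda and standard deviation sigma/sqrt(n). The
following is a minimal sketch under assumed names and an arbitrary seed; the
choice of 30 breaks is only a suggestion to hist.

```r
# Hypothetical names; parameters mirror the report (lambda = 0.02, n = 40).
set.seed(3)                                   # arbitrary seed
lambda_demo <- 0.02
n_demo      <- 40
nosims_demo <- 1000
mu_demo     <- 1 / lambda_demo                    # theoretical mean, 50
se_demo     <- (1 / lambda_demo) / sqrt(n_demo)   # standard error of the mean

sim_means <- replicate(nosims_demo, mean(rexp(n_demo, lambda_demo)))

# freq = FALSE puts the histogram on the density scale so it is
# comparable with the normal density curve
h <- hist(sim_means, freq = FALSE, breaks = 30,
          main = "Sample means with normal overlay",
          xlab = "Sample mean")

# Normal density predicted by the Central Limit Theorem
curve(dnorm(x, mean = mu_demo, sd = se_demo),
      add = TRUE, col = "blue", lwd = 2)
```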

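Finally, the following code checks the approximate normality of the sample
means by normalizing them and comparing the resulting mean and standard
deviation with those of the standard normal, 0 and 1 respectively. Because the
means are centered on their own average, the normalized mean is zero up to
floating-point error by construction; the standard deviation close to 1 is the
informative part of the check.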

samplingmean <- mean(mns)
semean       <- sigma/sqrt(n)
normalizedsample <- (mns-samplingmean)/semean
mean(normalizedsample)
## [1] 2.162354e-16
sd(normalizedsample)
## [1] 0.9894667
hist(normalizedsample)
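
A complementary check, sketched here under assumed names and an arbitrary
seed, standardizes the means with the theoretical mean and standard error
(rather than the sample mean used above) and then looks at two things: the
share of standardized means that fall within the central 95% range of a
standard normal, and a normal Q-Q plot.

```r
# Hypothetical names; parameters mirror the report (lambda = 0.02, n = 40).
set.seed(4)                                   # arbitrary seed
lambda_demo <- 0.02
n_demo      <- 40
nosims_demo <- 1000
mu_demo     <- 1 / lambda_demo                    # theoretical mean, 50
se_demo     <- (1 / lambda_demo) / sqrt(n_demo)   # standard error of the mean

sim_means <- replicate(nosims_demo, mean(rexp(n_demo, lambda_demo)))

# Standardize with the theoretical mean and standard error
z <- (sim_means - mu_demo) / se_demo

# Share of standardized means inside the central 95% normal interval
coverage <- mean(abs(z) < qnorm(0.975))
cat("share within +/-", round(qnorm(0.975), 2), ":", coverage, "\n")

# Points close to the reference line indicate approximate normality
qqnorm(z)
qqline(z)
```

With n = 40 the means of exponentials are still slightly right-skewed, so
small departures from the line in the upper tail are to be expected.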