Overview

This exercise uses simulation to illustrate the Central Limit Theorem (CLT) for exponential distributions. We randomly generate 1,000 sample means for a given lambda and compare their mean and variance with the values expected from theory. Spoiler: the simulation agrees closely with the theory!

Simulations

We want to run 1,000 simulations of 40 exponentials each, with lambda = 0.2. A ‘for’ loop stores the mean of each simulation in a vector named mns. I was also curious about the standard deviations of each simulation, so a second loop stores those in sds. For an exponential distribution, both the mean and the standard deviation have a theoretical value of 1/lambda, or 5. The standard error (SE) of the distribution of means shrinks as n grows: SE = (1/lambda)/sqrt(n), so the theoretical variance of the means is (1/lambda^2)/n.
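For reference, the theoretical values the simulation is checked against follow from the moments of the exponential distribution and from the fact that, for n independent draws, the variance of the sample mean is the population variance divided by n:

```latex
\begin{aligned}
\mu &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad \sigma = \frac{1}{\lambda} = 5 \\
\operatorname{Var}(\bar{X}_{n}) &= \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625 \\
\mathrm{SE} &= \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791
\end{aligned}
```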

library(ggplot2)


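sim <- 1000      # number of simulations
n <- 40          # exponentials per simulation
lambda <- 0.2    # rate parameter
set.seed(500)    # for reproducibility

# Mean of each simulated sample of 40 exponentials
mns <- numeric(sim)
for (i in 1:sim)
  mns[i] <- mean(rexp(n, lambda))

# Standard deviation of each simulated sample (a separate set of draws)
sds <- numeric(sim)
for (i in 1:sim)
  sds[i] <- sd(rexp(n, lambda))

# Theoretical mean of the sample means, and their theoretical variance sigma^2 / n
t_mean <- 1/lambda
t_variance <- (1/lambda^2)/n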
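The two loops above can also be written with base R's replicate(), which evaluates its expression sim times in sequence. As far as I can tell, it consumes the random number stream in the same order as the loops, so with the same seed it should give the same mns and sds. This sketch also summarizes sds, which the report computes but never uses afterwards. The sample standard deviation is a downward-biased estimator of sigma, so its average can be expected to fall somewhat below 1/lambda = 5. No output is shown here because it was not part of the original report.

```r
sim <- 1000      # number of simulations
n <- 40          # exponentials per simulation
lambda <- 0.2    # rate parameter
set.seed(500)

# Same order of draws as the for loops: first all the means, then all the sds
mns <- replicate(sim, mean(rexp(n, lambda)))
sds <- replicate(sim, sd(rexp(n, lambda)))

# Compare the average simulated standard deviation with the theoretical 1/lambda
mean(sds)
1 / lambda
summary(sds)
```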

For the seed I have chosen, the mean of the sample means is 5.01 (compared to the theoretical mean of 5) and their variance is 0.620 (compared to the theoretical variance of 0.625). Pretty close!

mean(mns)

## [1] 5.010562

t_mean

## [1] 5

var(mns)

## [1] 0.6201215

t_variance

## [1] 0.625
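The claim that the SE depends on n can be checked by repeating the simulation at a few sample sizes. This sketch goes beyond the original exercise, and the sample sizes 10, 40, and 160 are arbitrary illustrative choices. Each simulated variance of the means should sit near (1/lambda^2)/n and shrink by roughly a factor of four each time n is multiplied by four.

```r
lambda <- 0.2
sim <- 1000
set.seed(500)

ns <- c(10, 40, 160)  # illustrative sample sizes (assumption, not from the original)
# For each sample size, simulate sim sample means and take their variance
sim_var <- sapply(ns, function(k) var(replicate(sim, mean(rexp(k, lambda)))))
# Theoretical variance of the sample mean, (1/lambda^2)/n: about 2.5, 0.625, 0.156
theory_var <- (1/lambda^2) / ns
data.frame(n = ns, simulated = sim_var, theoretical = theory_var)
```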

Sample Mean vs. Theoretical Mean

We have already shown above that the means are approximately equal. Below, we provide a chart of the same values to show this visually. The theoretical mean is yellow, and the sample mean is black.

m.table <- data.frame(mns)
m <- ggplot(m.table, aes(x=mns))
m <- m + geom_histogram(bins = 50, fill='#A4A4A4', color="darkred")
m <- m + xlab("Sample Means") + ylab("Count of Means") + labs(title = "Validating the CLT with Exponential Distributions")
m <- m + geom_vline(xintercept = mean(mns), color="black") + geom_vline(xintercept = t_mean, color="yellow")
m

Sample Variance vs. Theoretical Variance

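We have already shown above that the variances are approximately equal. Below, we show this visually by overlaying a normal curve with the theoretical mean (5) and theoretical variance (0.625) on a histogram of the sample means. If the simulated variance matches the theory, the histogram and the curve should have the same spread.

NOTE: dnorm() takes a standard deviation, not a variance, so the curve uses sd = sqrt(t_variance). The histogram is drawn on the density scale rather than as counts so that it shares a vertical scale with the normal curve.

v.table <- data.frame(mns)
v <- ggplot(v.table, aes(x=mns))
# Plot densities instead of counts (for ggplot2 before 3.3.0, use y = ..density..)
v <- v + geom_histogram(aes(y = after_stat(density)), bins = 50, fill='#A4A4A4', color="darkred")
v <- v + xlab("Sample Means") + ylab("Density") + labs(title = "Validating the CLT with Exponential Distributions")
# Theoretical normal curve: mean 1/lambda, standard deviation sqrt((1/lambda^2)/n)
v <- v + stat_function(fun = dnorm, args = list(mean = t_mean, sd = sqrt(t_variance)))
v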

Distribution

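To check normality, I do two things. First, I convert each sample mean to a Z score by subtracting the theoretical mean and dividing by the theoretical standard error. The Z scores should then have a mean of about 0 and a standard deviation of about 1, which is what we find (0.013 and 0.996). This confirms the center and spread predicted by the CLT. Second, to check the shape, I create a Q-Q plot of the sample means against normal quantiles. Points lying close to a straight line suggest normality, and that is the case here.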

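# Standardize each sample mean with the theoretical mean and standard error
normvar <- (mns - t_mean) / sqrt(t_variance)

# Q-Q plot of the sample means against theoretical normal quantiles
q <- ggplot(m.table, aes(sample=mns))
q <- q + geom_point(stat = "qq") + labs(title = "Q-Q Plot Suggesting Normality")
q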

mean(normvar)

## [1] 0.01335958

sd(normvar)

## [1] 0.9960895
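Another quick check of normality, which is an addition and not part of the original analysis, compares the share of standardized means within 1 and 2 units of 0 with the standard normal proportions of about 68.3% and 95.4%. The sketch regenerates mns and normvar with the same seed so that it runs on its own. No results are quoted because they were not part of the original report.

```r
sim <- 1000
n <- 40
lambda <- 0.2
set.seed(500)

# Regenerate the sample means and standardize them with the theoretical values
mns <- replicate(sim, mean(rexp(n, lambda)))
normvar <- (mns - 1/lambda) / sqrt((1/lambda^2)/n)

# Share of standardized means within 1 and 2 units of zero
observed <- c(within1 = mean(abs(normvar) < 1), within2 = mean(abs(normvar) < 2))
# The corresponding standard normal probabilities (about 0.683 and 0.954)
expected <- c(within1 = pnorm(1) - pnorm(-1), within2 = pnorm(2) - pnorm(-2))
rbind(observed, expected)
```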

Statistical Inference Project

Brad Allen

January 20, 2016
