Overview

We run 1000 simulations, each drawing 40 random values from an exponential distribution with the rexp() function. We collect the mean of each 40-value sample in a vector, then compute the mean and the standard deviation of those sample means. Finally, we check whether the sample means follow a Gaussian distribution, as an investigation of the Central Limit Theorem (“CLT”).

Simulation

We create a 1000 x 40 matrix that holds the simulation: 1000 rows, each containing 40 randomly generated values from an exponential distribution with lambda = 0.2. We then create a vector of 1000 sample means by taking the mean of each row of the simulation matrix.

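set.seed(5927)  # fix the seed so the results are reproducible
ExpSim <- matrix(rexp(40*1000, .2), 1000, 40)  # 1000 samples (rows) of 40 draws each
SMeans <- apply(ExpSim, 1, mean)  # mean of each row, i.e. of each sample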
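
As a quick sanity check, the sketch below rebuilds the simulation with named parameters and confirms that rowMeans() gives the same result as the apply() call. The names n, nsim and SMeans2 are introduced here for illustration only and are not part of the original analysis.

```r
# Illustrative names (n, nsim, SMeans2) are assumptions, not from the original report
set.seed(5927)
n <- 40          # draws per sample
nsim <- 1000     # number of simulated samples
lambda <- 0.2    # rate of the exponential distribution

ExpSim <- matrix(rexp(n * nsim, lambda), nsim, n)
SMeans <- apply(ExpSim, 1, mean)   # row means via apply()
SMeans2 <- rowMeans(ExpSim)        # vectorised equivalent

dim(ExpSim)                            # rows x columns of the simulation matrix
isTRUE(all.equal(SMeans, SMeans2))     # TRUE if both approaches agree
```

rowMeans() is typically faster than apply() on large matrices, which is the only reason to prefer it here.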

Analysis of the Sample Mean

We compare the theoretical mean, 1/lambda, with the mean of the 1000 sample means.

lambda <- .2                # rate of the exponential distribution
ActMean <- 1/lambda         # theoretical mean
SampleMean <- mean(SMeans)  # mean of the simulated sample means
data.frame(ActMean, SampleMean)
##   ActMean SampleMean
## 1       5   5.031298

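The mean of the sample means (5.031) is close to the theoretical mean of 5. This is expected: the sample mean is an unbiased estimator of 1/lambda, so the average of 1000 sample means should land near that value.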
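
For reference, the theoretical value follows from linearity of expectation, assuming the 40 draws \(X_1, \dots, X_n\) in a sample are exponential with rate \(\lambda\). The symbols \(n\) and \(\bar{X}\) are notation added here, with \(n = 40\) and \(\bar{X}\) the mean of one sample:

\[
E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n} \cdot \frac{n}{\lambda} = \frac{1}{\lambda} = \frac{1}{0.2} = 5
\]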

Analysis of the Sample Standard Deviation

SampleSD <- sd(SMeans); ActSD <- 1/lambda/sqrt(40)
SampleVar <- SampleSD^2; ActVar <- ActSD^2
data.frame(ActSD, SampleSD,ActVar, SampleVar)
##       ActSD  SampleSD ActVar SampleVar
## 1 0.7905694 0.7913945  0.625 0.6263052

The standard deviation of the sample means (0.7914) is very close to its theoretical value (0.7906). Similarly, the sample variance (0.6263) is close to the theoretical variance (0.625).
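
The theoretical values used in the code can be derived as follows, assuming the 40 draws in a sample are independent and each has variance \(1/\lambda^2\). The symbols \(n = 40\) and \(\bar{X}\) (the mean of one sample) are notation added here:

\[
\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) = \frac{1}{n\lambda^2} = \frac{1}{40 \times 0.2^2} = 0.625,
\qquad
\mathrm{SD}(\bar{X}) = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\]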

Sample.CI <- SampleMean + c(-1,1)*qnorm(.975)*SampleSD
Act.CI <- ActMean + c(-1,1)*qnorm(.975)*ActSD
data.frame(Act.CI, Sample.CI)
##     Act.CI Sample.CI
## 1 3.450512  3.480193
## 2 6.549488  6.582402

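Each interval is a mean plus or minus 1.96 standard deviations, so under a normal approximation it should contain roughly 95% of the sample means. The interval computed from the simulated values is a close approximation of the theoretical one.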
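
One way to check the interval empirically is to count how many of the simulated sample means fall inside the theoretical bounds. The sketch below repeats the simulation so that it runs on its own; the name coverage is introduced here for illustration. A value near 0.95 would be consistent with the normal approximation, though no particular figure is claimed.

```r
# Self-contained re-run of the simulation; `coverage` is an illustrative name
set.seed(5927)
lambda <- 0.2
SMeans <- rowMeans(matrix(rexp(40 * 1000, lambda), 1000, 40))

ActMean <- 1 / lambda              # theoretical mean
ActSD <- 1 / lambda / sqrt(40)     # theoretical SD of the sample mean
Act.CI <- ActMean + c(-1, 1) * qnorm(0.975) * ActSD

# Proportion of sample means inside the theoretical interval
coverage <- mean(SMeans >= Act.CI[1] & SMeans <= Act.CI[2])
coverage
```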

Distribution of the Sample Means and the Gaussian Distribution

library(ggplot2)  # after_stat() and linewidth assume ggplot2 >= 3.4.0
# Density-scaled histogram of the 1000 sample means
g <- ggplot(data.frame(SMeans), aes(x = SMeans)) + geom_histogram(binwidth=.3, 
        fill="#0066CC", colour = "#003399", aes(y = after_stat(density))) + 
        labs(title = "Figure 1: Sample Means with Overlay Plot of Gaussian Distribution", 
             x = "Sample Means", y = "Density")
# Red: theoretical mean; orange: sample mean; green: sample interval bounds;
# black: theoretical normal density; purple: estimated density of the sample means
g <- g + geom_vline(xintercept = ActMean, colour = "red") + 
        geom_vline(xintercept = SampleMean, colour = "orange") +
        geom_vline(xintercept = Sample.CI, colour = "green") + 
        stat_function(fun = dnorm, args = list(mean = ActMean, sd = ActSD), linewidth = 1) + 
        geom_density(colour = "purple", linewidth = 1)
g

The histogram was created from the vector of 1000 sample means. The overlaid Gaussian curve (black) has mean \(= \frac{1}{\lambda}=5\) and standard deviation \(= \frac{1}{\lambda\sqrt{40}}=0.7905694\). The purple line is the estimated density of the sample means. The red line marks the theoretical mean of the distribution and the orange line marks the sample mean. The sample means are roughly symmetric around the theoretical mean, and the most frequently sampled values lie close to it. The green lines mark the bounds of the sample confidence interval.

Quantile Plot

qqnorm(SMeans, main = "Figure 2: Normal Q-Q Plot"); qqline(SMeans)

This graph plots the sample quantiles against the theoretical quantiles of a normal distribution, which gives an indication of the normality of the sample. The reference line drawn by qqline() passes through the first and third quartiles, so the more points that fall on or near this line, the closer the sample is to a Gaussian distribution. As we can see, many points lie close to or on the line.
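
The same comparison can be read off numerically at a few probabilities. The sketch below repeats the simulation so that it runs on its own and tabulates sample quantiles next to the quantiles of the theoretical normal distribution; the names probs and qtab and the chosen probabilities are assumptions made for illustration.

```r
# Self-contained re-run; `probs` and `qtab` are illustrative names
set.seed(5927)
lambda <- 0.2
SMeans <- rowMeans(matrix(rexp(40 * 1000, lambda), 1000, 40))

ActMean <- 1 / lambda
ActSD <- 1 / lambda / sqrt(40)

probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
qtab <- data.frame(
  Prob   = probs,
  Sample = unname(quantile(SMeans, probs)),          # empirical quantiles
  Normal = qnorm(probs, mean = ActMean, sd = ActSD)  # theoretical normal quantiles
)
qtab
```

Small gaps between the two columns correspond to points lying near the line in the Q-Q plot, while larger gaps in the tails would show up as points bending away from it.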

Conclusions

Given the above analysis, the distribution of the sample means behaves as predicted by the CLT. The sample mean is close to the theoretical mean, and the same is true for the standard deviation and variance. The histogram of the sample means appears approximately normal when compared with the theoretical curve, and the quantile plot supports an approximately Gaussian distribution.