Overview

This is part 1 of the Statistical Inference Course Project on Coursera. The goal of this project is to use simulation to explore the properties of the exponential distribution and compare them with what the Central Limit Theorem predicts. We simulate 1000 samples of 40 exponentials each, take the mean of each sample, and compare the mean and variance of those sample means with their theoretical values. The Central Limit Theorem predicts that the sample means are approximately normally distributed around the population mean 1/lambda with variance (1/lambda)^2/n, and as the number of simulations increases, the observed mean and variance of the sample means should approach these theoretical values.

Simulations

Here we run 1000 simulations, each drawing a sample of 40 values from an exponential distribution. We set the seed so that the results can be reproduced. The rate lambda is set to 0.2, the number of simulations to 1000, and n, the number of observations in each simulation, to 40.

set.seed(2018)
lambda <- 0.2
n <- 40
simulations <- 1000

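# Each column of the resulting matrix is one simulated sample of n exponentials
expSimulation <- replicate(simulations, rexp(n, lambda))
# Mean of each column, giving one sample mean per simulation
simulationMeans <- apply(expSimulation, 2, mean)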
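
As a quick sanity check on the simulation step, the sketch below confirms the shape of the simulated matrix and shows that colMeans() gives the same per-simulation means as the apply() call. The name altMeans is introduced here only for illustration and is not part of the original analysis.

```r
# Re-create the simulation so this block runs on its own
set.seed(2018)
lambda <- 0.2
n <- 40
simulations <- 1000

expSimulation <- replicate(simulations, rexp(n, lambda))
# Rows are observations within a sample, columns are simulations
dim(expSimulation)

simulationMeans <- apply(expSimulation, 2, mean)
# colMeans() is an equivalent way to get one mean per column
altMeans <- colMeans(expSimulation)
all.equal(simulationMeans, altMeans)
```

Both approaches return a vector with one entry per simulation, so either can feed the comparisons that follow.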

1. Show the sample mean and compare it to the theoretical mean of the distribution.

sampleMean <- mean(simulationMeans)
sampleMean
## [1] 5.020107
theoreticalMean <- 1/lambda
theoreticalMean
## [1] 5
hist(simulationMeans, main = "Theoretical vs sample mean", col = "azure", breaks = 20, xlab = "Means")
abline(v = sampleMean, lwd = 2, col = "red")
abline(v = theoreticalMean, lwd = 2, col = "purple")
text(6.5,110, paste("Sample mean = ", round(sampleMean, 2)), col = "red")
text(6.5, 95, paste("Theoretical mean = ", theoreticalMean), col = "purple")
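
One way to judge whether the gap between the two means is small is to express it in standard-error units. The sketch below assumes the simulated means are independent, so the average of 1000 of them has standard error (1/lambda)/sqrt(n)/sqrt(simulations). The names sdOfOneMean, seOfGrandMean and gapInSE are made up for this illustration.

```r
# Re-create the simulation so this block runs on its own
set.seed(2018)
lambda <- 0.2
n <- 40
simulations <- 1000
simulationMeans <- colMeans(replicate(simulations, rexp(n, lambda)))

theoreticalMean <- 1 / lambda
# Standard deviation of a single mean of n exponentials
sdOfOneMean <- (1 / lambda) / sqrt(n)
# Standard error of the average of all simulated means
seOfGrandMean <- sdOfOneMean / sqrt(simulations)
seOfGrandMean  # 5 / sqrt(40 * 1000) = 0.025

# Gap between simulated and theoretical mean, in standard-error units
gapInSE <- (mean(simulationMeans) - theoreticalMean) / seOfGrandMean
gapInSE
```

With the values reported above, the gap of about 0.02 is less than one standard error of 0.025, which is well within ordinary simulation noise.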

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

sampleVar <- var(simulationMeans)
sampleVar
## [1] 0.6261326
theoreticalVar <- (1/lambda)^2/n
theoreticalVar
## [1] 0.625
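
The two variances can also be compared as a relative difference and on the standard deviation scale, which is in the same units as the means. This is a minimal sketch under the same settings as above. relDiffPct and sdComparison are illustrative names, not part of the original analysis.

```r
# Re-create the simulation so this block runs on its own
set.seed(2018)
lambda <- 0.2
n <- 40
simulations <- 1000
simulationMeans <- colMeans(replicate(simulations, rexp(n, lambda)))

sampleVar <- var(simulationMeans)
# Variance of a mean of n exponentials: sigma^2 / n with sigma = 1/lambda
theoreticalVar <- (1 / lambda)^2 / n

# Relative difference between simulated and theoretical variance, in percent
relDiffPct <- 100 * (sampleVar - theoreticalVar) / theoreticalVar
relDiffPct

# The same comparison on the standard deviation scale
sdComparison <- c(sample = sd(simulationMeans),
                  theoretical = sqrt(theoreticalVar))
sdComparison
```

Using the values reported above (0.6261326 against 0.625), the relative difference is roughly 0.2 percent.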

3. Show that the distribution is approximately normal.

qqnorm(simulationMeans)
qqline(simulationMeans, col = "red")
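
As a complement to the Q-Q plot, the sketch below overlays the normal density suggested by the Central Limit Theorem on a density-scaled histogram of the sample means, and checks what share of the means falls within 1.96 theoretical standard deviations of the theoretical mean. For a normal distribution that share would be about 95 percent. The name coverage and the plot styling are choices made here for illustration.

```r
# Re-create the simulation so this block runs on its own
set.seed(2018)
lambda <- 0.2
n <- 40
simulations <- 1000
simulationMeans <- colMeans(replicate(simulations, rexp(n, lambda)))

theoreticalMean <- 1 / lambda
theoreticalSd <- (1 / lambda) / sqrt(n)

# Histogram on the density scale so the normal curve is comparable
hist(simulationMeans, freq = FALSE, breaks = 20, col = "azure",
     main = "Sample means vs normal density", xlab = "Means")
curve(dnorm(x, mean = theoreticalMean, sd = theoreticalSd),
      add = TRUE, lwd = 2, col = "red")

# Share of sample means within 1.96 theoretical standard deviations
coverage <- mean(abs(simulationMeans - theoreticalMean) <= 1.96 * theoreticalSd)
coverage
```

A histogram that tracks the red curve and a coverage close to 0.95 both point to approximate normality. Some right skew may remain because each mean is built from only 40 exponentials.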