Overview

Part I: Simulation Exercise
In this part of the project, I investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts.
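For reference, the standard results being compared against can be written out as below. The notation is added here for convenience and is not from the original analysis: n is the number of exponentials averaged (40 in this report) and lambda is the rate (0.2).

```latex
X_1, \dots, X_n \overset{\text{iid}}{\sim} \operatorname{Exp}(\lambda), \qquad
\mathbb{E}[X_i] = \frac{1}{\lambda}, \qquad
\operatorname{Var}(X_i) = \frac{1}{\lambda^2}

\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
\;\overset{\text{approx.}}{\sim}\;
\mathcal{N}\!\left(\frac{1}{\lambda},\ \frac{1}{\lambda^2 n}\right)
```

With lambda = 0.2 and n = 40, this gives a mean of 5, a variance of 25 for individual exponentials, and a variance of 25/40 = 0.625 for the mean of 40.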

Part I

## load libraries and set constants 
library(RColorBrewer)
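lam = 0.2     # rate parameter (lambda) of the exponential distribution
nsim = 1000   # number of simulated (bootstrap) samples
reps = 40     # number of observations per sample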
## Sample vs theoretical distribution Mean
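# draw one sample of 40 exponentials and record its mean
samDist = rexp(reps, lam)
samMean = mean(samDist)
# bootstrap: resample the 40 observations with replacement, one resample per row
sim = matrix(sample(samDist, nsim*reps, replace = TRUE), nrow = nsim, ncol = reps)
simMeans = apply(sim, 1, mean)
# average of the bootstrap means (the exact theoretical mean is 1/lam = 5)
theorMean = mean(simMeans)
cols = brewer.pal(n = 11, name = "RdBu")
par(mfrow=c(1,1))
hist(simMeans, col=cols, main="Theoretical Distribution of Means", xlab=" ")
# black line marks the mean of the original sample
abline(v=samMean,col="black", lwd=4)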

  1. I drew a sample of 40 exponentials with lambda = 0.2 and found its mean to be 5.4730362. I then bootstrapped that sample, resampling with replacement to generate 1000 means of 40 observations each (1000*40 observations in total); the average of these means was 5.4776322. For comparison, the exact theoretical mean of the exponential distribution is 1/lambda = 5. The histogram above shows this simulated ("theoretical") distribution of the means, and the black line marks the sample mean.
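As a cross-check on the bootstrap result, the means can also be compared directly with the known theoretical values. The sketch below is not part of the original analysis: it draws fresh exponentials with `rexp()` instead of resampling the 40 observations, and the names `directSim`, `directMeans`, `theoryMean`, `theorySE`, and the seed value are assumptions chosen for illustration.

```r
# Illustrative sketch: means of fresh exponential draws vs. theoretical values.
set.seed(1)                      # arbitrary seed, only for reproducibility
lam  <- 0.2                      # rate parameter, as in the report
nsim <- 1000                     # number of simulated means
reps <- 40                       # exponentials per mean

# Each row holds one sample of 40 exponentials drawn directly from rexp()
directSim   <- matrix(rexp(nsim * reps, lam), nrow = nsim, ncol = reps)
directMeans <- rowMeans(directSim)

theoryMean <- 1 / lam                    # theoretical mean: 5
theorySE   <- (1 / lam) / sqrt(reps)     # theoretical sd of the mean: about 0.79

cat("mean of simulated means:", mean(directMeans), " theory:", theoryMean, "\n")
cat("sd of simulated means:  ", sd(directMeans),   " theory:", theorySE, "\n")
```

Drawing directly from the distribution centers the simulated means near 5, whereas the bootstrap centers them near the mean of the one observed sample.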
## Sample vs theoretical distribution variability
# variance of the original sample
samVar = var(samDist)
# average of the variances of the bootstrap resamples
theorVar = mean(apply(sim, 1, var))
hist(apply(sim, 1, var), col=cols, main="Theoretical Distribution of Variance", xlab=" ")
abline(v=samVar,col="black", lwd=4)

  1. Next, I compared variability by computing the sample variance (22.9756467) and the average variance across the simulated samples (22.5124696). For comparison, the exact theoretical variance of the exponential distribution is 1/lambda^2 = 25. The histogram above shows the simulated ("theoretical") distribution of the variances, and the black line marks the sample variance.
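The variance comparison can be checked against theory in the same way. This is a hedged sketch rather than the original method: it again uses fresh `rexp()` draws instead of bootstrap resamples, and `directSim`, `rowVars`, `theoryVar`, `theoryVarOfMean`, and the seed are illustrative names and values.

```r
# Illustrative sketch: sample variances of fresh exponential draws vs. theory.
set.seed(2)                      # arbitrary seed, only for reproducibility
lam  <- 0.2
nsim <- 1000
reps <- 40

directSim <- matrix(rexp(nsim * reps, lam), nrow = nsim, ncol = reps)

# Sample variance of each row of 40 exponentials
rowVars <- apply(directSim, 1, var)

theoryVar       <- 1 / lam^2         # variance of one exponential: 25
theoryVarOfMean <- theoryVar / reps  # variance of the mean of 40: 0.625

cat("average sample variance:", mean(rowVars), " theory:", theoryVar, "\n")
cat("variance of row means:  ", var(rowMeans(directSim)),
    " theory:", theoryVarOfMean, "\n")
```

The two theoretical quantities answer different questions: 25 is the spread of individual exponentials, while 0.625 is the spread of the means of 40.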
## Show dist is approx normal
par(mfrow=c(1,2))
hist(simMeans, col=cols, main="Theoretical population mean \n distribution", xlab=" ", xlim=range(simMeans))
hist(rexp(nsim, lam), col=cols, main="Distribution of 1000 random \n exponential values", xlab=" ")

  1. Finally, I compared the distribution of the simulated means with the distribution of 1000 random exponentials drawn with lambda = 0.2. The histograms above show that the simulated means are roughly symmetric and bell-shaped (approximately Gaussian), whereas the raw exponential values are strongly right-skewed, as the Central Limit Theorem predicts.
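The visual comparison can be backed by a number. One option, shown here as a sketch and not as part of the original analysis, is to compare sample skewness: an exponential distribution has skewness 2, while the mean of 40 exponentials has skewness 2/sqrt(40), about 0.32. The helper `skewness()`, the variable names, and the seed are assumptions introduced for illustration; the means come from fresh `rexp()` draws, not from the bootstrap.

```r
# Illustrative sketch: skewness of raw exponentials vs. means of 40 exponentials.
set.seed(3)                      # arbitrary seed, only for reproducibility
lam  <- 0.2
nsim <- 1000
reps <- 40

# Hypothetical helper: third central moment divided by the variance^(3/2)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

rawDraws    <- rexp(nsim, lam)                                     # 1000 raw values
directMeans <- rowMeans(matrix(rexp(nsim * reps, lam), nrow = nsim)) # 1000 means of 40

skewRaw   <- skewness(rawDraws)      # theory: 2
skewMeans <- skewness(directMeans)   # theory: 2 / sqrt(40)

cat("skewness of raw exponentials:", skewRaw, "\n")
cat("skewness of means of 40:     ", skewMeans, "\n")
```

A skewness much closer to zero for the means is consistent with the bell shape seen in the histogram. For a visual check, a normal density could also be overlaid by plotting with `hist(..., freq = FALSE)` and adding `curve(dnorm(x, mean, sd), add = TRUE)`.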