In this project I investigate the exponential distribution in R and compare it with the predictions of the Central Limit Theorem. The exponential distribution is simulated in R with rexp(n, rate), where rate is the rate parameter lambda, set here to 0.2. The mean of an exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. I investigate the distribution of averages of 40 exponentials over a thousand simulations.
n <- 1000
samples <- 40
lambda <- 0.2
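First, I generate the simulated data: a table with one row per simulation (1000 rows) and one column per exponential draw (40 columns).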
dataRaw <- matrix(data = rexp(n = n * samples, rate = lambda), nrow = n, ncol = samples)
dataRaw <- as.data.frame(dataRaw)
Next, I compute the average of the 40 exponentials in each simulation (one mean per row), which gives the distribution of the averages.
dataMean <- apply(X = dataRaw, MARGIN = 1, FUN = mean)
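The row-wise averaging step can be checked on a small example. The sketch below uses a hypothetical 3 x 4 matrix (the name `toy` and its values are illustrative assumptions, not part of the analysis) and shows that `apply(..., MARGIN = 1, FUN = mean)` returns one mean per row, the same result as base R's `rowMeans()`.

```r
# Toy data: 3 "simulations" (rows) of 4 draws each (columns).
# The values are made up for illustration only.
toy <- matrix(c(1, 2, 3, 4,
                2, 4, 6, 8,
                0, 0, 5, 5),
              nrow = 3, ncol = 4, byrow = TRUE)

# MARGIN = 1 applies the function to each row
viaApply <- apply(X = toy, MARGIN = 1, FUN = mean)

# rowMeans() computes the same row averages directly
viaRowMeans <- rowMeans(toy)

print(viaApply)
print(viaRowMeans)
```

Both calls give one average per simulation, so either could be used for the averaging step; `rowMeans()` is usually faster on large matrices.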
theoreticalMean <- 1 / lambda
theoreticalMean
## [1] 5
sampleMean <- mean(dataMean)
sampleMean
## [1] 4.993059
The sample mean of the averages (4.993) is close to the theoretical mean (5).
theoreticalVariance <- (1 / lambda)^2/samples
theoreticalVariance
## [1] 0.625
sampleVariance <- var(dataMean)
sampleVariance
## [1] 0.6166396
The sample variance of the averages (0.617) is close to the theoretical variance (0.625).
theoreticalSD <- 1/lambda/sqrt(samples)
theoreticalSD
## [1] 0.7905694
standardDeviation <- sd(dataMean)
standardDeviation
## [1] 0.785264
The sample standard deviation of the averages (0.785) is close to the theoretical standard deviation (0.791).
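The theoretical values used above follow from standard properties of the mean of independent draws. A short derivation is sketched here, writing m = 40 for the number of exponentials per average (`samples` in the code; the symbol m is introduced only to avoid a clash with `n`, which the code uses for the number of simulations). It assumes the `amsmath` and `amssymb` packages.

```latex
\begin{align*}
X_1, \dots, X_m &\overset{\text{iid}}{\sim} \operatorname{Exp}(\lambda),
  \qquad \mathbb{E}[X_i] = \frac{1}{\lambda},
  \qquad \operatorname{Var}(X_i) = \frac{1}{\lambda^2}, \\
\bar{X} &= \frac{1}{m} \sum_{i=1}^{m} X_i, \\
\mathbb{E}[\bar{X}] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{m^2} \sum_{i=1}^{m} \operatorname{Var}(X_i)
  = \frac{1}{m \lambda^2} = \frac{25}{40} = 0.625, \\
\operatorname{SD}(\bar{X}) &= \frac{1}{\lambda \sqrt{m}} = \frac{5}{\sqrt{40}} \approx 0.7906.
\end{align*}
```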
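library(ggplot2)
# Histogram and density of the averages; dashed line = sample mean, red line = theoretical mean
# after_stat(density) replaces the deprecated ..density.. notation (needs ggplot2 >= 3.3.0)
ggplot(data = as.data.frame(dataMean), aes(dataMean)) +
  geom_histogram(aes(y = after_stat(density)), fill = "white", col = "lightblue", alpha = .3, bins = 30) +
  geom_density(col = "lightblue3", lwd = 2) +
  geom_vline(xintercept = sampleMean, col = "lightblue2", linetype = "dashed", lwd = 2) +
  geom_vline(xintercept = theoreticalMean, col = "red") +
  labs(x = "Mean", y = "Density") +
  theme_light()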
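As the plot shows, the distribution of averages of 40 independent and identically distributed (IID) exponentials is approximately normal, centered near the theoretical mean of 5 with a standard deviation close to 0.79, as the Central Limit Theorem predicts. It is not a standard normal: the averages would first have to be centered and scaled, as (mean - 5) / 0.79, to be approximately standard normal.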
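One way to check the normality claim beyond the histogram is to compare quantiles of the simulated averages with those of the normal distribution the Central Limit Theorem predicts. The following is a minimal sketch, not part of the original analysis: it re-creates the simulation with an arbitrary seed (an assumption, since the original run states none, so the numbers will differ slightly from those above), and the names `cltMean`, `cltSD`, `probs` and `comparison` are introduced here for illustration.

```r
# Re-create the simulation. The seed is an assumption made for
# reproducibility of this sketch only.
set.seed(1)
n <- 1000        # number of simulations
samples <- 40    # exponentials per average
lambda <- 0.2    # rate parameter
dataMean <- rowMeans(matrix(rexp(n * samples, rate = lambda), nrow = n))

# Parameters of the normal distribution predicted by the CLT
cltMean <- 1 / lambda
cltSD <- 1 / lambda / sqrt(samples)

# Empirical quantiles of the averages next to the matching normal quantiles
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
comparison <- data.frame(
  probability = probs,
  empirical = as.numeric(quantile(dataMean, probs)),
  normal = qnorm(probs, mean = cltMean, sd = cltSD)
)
print(comparison)

# Q-Q plot: points lying close to the reference line suggest approximate normality
qqnorm(dataMean)
qqline(dataMean)
```

If the two quantile columns agree closely and the Q-Q points follow the line, the normal approximation is reasonable. Small departures in the tails are expected, because the exponential distribution is right-skewed and 40 draws per average do not remove that skew entirely.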