This exercise uses simulation to illustrate the Central Limit Theorem (CLT) for exponential distributions. We generate 1,000 sample means, each from 40 random exponentials with a given lambda. We then compare the mean and variance of those sample means with their theoretical values. Spoiler: the simulation agrees closely with the theory!
We want to run 1,000 simulations of 40 exponentials each, with lambda = 0.2. We store the mean of each simulation in a vector named mns, filled in a ‘for’ loop. I was also curious about the standard deviation of each simulation. For an exponential distribution, both the mean and the standard deviation equal 1/lambda, which is 5 here. The standard error (SE) of the distribution of sample means depends on n: it is (1/lambda)/sqrt(n), about 0.79 for n = 40. The variance of the means should therefore be (1/lambda^2)/n = 0.625.
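To make "the SE depends on n" concrete, the short R sketch below evaluates the theoretical standard error (1/lambda)/sqrt(n) for a few sample sizes. Only n = 40 is used in this exercise; the values 10 and 160 are arbitrary illustrative choices.

```r
lambda <- 0.2
n_values <- c(10, 40, 160)           # illustrative sample sizes; 40 is the one used here
se <- (1 / lambda) / sqrt(n_values)  # theoretical standard error of the sample mean
data.frame(n = n_values, se = round(se, 3))
```

Each fourfold increase in n halves the standard error, which is why the sample means cluster more tightly around 5 as n grows.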
library(ggplot2)
sim <- 1000
n <- 40
lambda <- 0.2
set.seed(500)
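mns <- numeric(sim)  # mean of each simulation
sds <- numeric(sim)  # standard deviation of each simulation
for (i in 1:sim) {
  draws <- rexp(n, lambda)  # one simulation: 40 exponentials
  mns[i] <- mean(draws)
  sds[i] <- sd(draws)       # computed from the same draws as mns[i]
}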
t_mean <- 1/lambda
t_variance <- (1/lambda^2)/n
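The two theoretical values coded above follow from the exponential distribution's mean and variance, together with the variance of a mean of n independent draws:

$$
\begin{aligned}
\mu &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad \sigma^2 = \frac{1}{\lambda^2} = 25,\\
\operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625, \qquad
\operatorname{SE}(\bar{X}) = \sqrt{0.625} \approx 0.791.
\end{aligned}
$$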
For the seed I have chosen, the average of the 1,000 sample means is 5.01 (theoretical mean: 5) and their variance is 0.620 (theoretical variance: 0.625). Pretty close!
mean(mns)
## [1] 5.010562
t_mean
## [1] 5
var(mns)
## [1] 0.6201215
t_variance
## [1] 0.625
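To put "pretty close" on a scale, each printed summary can be compared with its own sampling noise. The average of 1,000 sample means has a standard error of sqrt(0.625/1000) = 0.025. The variance of 1,000 values has a standard error of roughly sigma^2 * sqrt(2/(sim - 1)). That second formula is a normal-theory approximation and is an assumption here, because the sample means are still slightly right-skewed. The sketch below plugs in the values printed above:

```r
sim <- 1000
t_mean <- 5
t_variance <- 0.625
obs_mean <- 5.010562      # mean(mns) as printed above
obs_variance <- 0.6201215 # var(mns) as printed above

# Standard error of the average of `sim` sample means
se_mean <- sqrt(t_variance / sim)
# Approximate standard error of a sample variance (normal-theory approximation)
se_variance <- t_variance * sqrt(2 / (sim - 1))

# How many standard errors each observed summary sits from its theoretical value
z_mean <- (obs_mean - t_mean) / se_mean
z_variance <- (obs_variance - t_variance) / se_variance
round(c(z_mean = z_mean, z_variance = z_variance), 2)
```

Both discrepancies come out well under one standard error. The gaps from theory are therefore about the size that sampling noise alone would produce with 1,000 simulations.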
The numbers above show that the sample and theoretical means are approximately equal. The chart below shows the same comparison visually: the theoretical mean is the yellow line and the sample mean is the black line.
m.table <- data.frame(mns)
m <- ggplot(m.table, aes(x=mns))
m <- m + geom_histogram(bins = 50, fill='#A4A4A4', color="darkred")
m <- m + xlab("Sample Means") + ylab("Count of Means") + labs(title = "Validating the CLT with Exponential Distributions")
m <- m + geom_vline(xintercept = mean(mns), color="black") + geom_vline(xintercept = t_mean, color="yellow")
m
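The setup also collects a standard deviation for each simulation (sds), which the text expects to be near 1/lambda = 5, but the report never summarizes it. The self-contained sketch below regenerates the draws with the same seed, so its numbers need not match the sds vector above exactly. Because the sample standard deviation is a downward-biased estimator of sigma, the average should land near 5 and typically a little below it.

```r
set.seed(500)
sim <- 1000
n <- 40
lambda <- 0.2

# One row per simulation, 40 exponential draws per row
draws <- matrix(rexp(sim * n, lambda), nrow = sim, byrow = TRUE)
sds <- apply(draws, 1, sd)  # sample standard deviation of each simulation

mean(sds)   # average per-simulation standard deviation
1 / lambda  # theoretical population standard deviation
```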
The numbers above also show that the sample and theoretical variances are approximately equal. To see this visually, the chart below draws the histogram of the sample means on the density scale and overlays, in yellow, the normal density that the CLT predicts (mean 5, variance 0.625). If the simulated means have the predicted center and spread, the curve should follow the outline of the histogram.
NOTE: Two details matter for the curve to line up with the bars. First, the histogram has to be on the density scale (y = ..density..) rather than counts, because dnorm returns a density. Second, dnorm takes a standard deviation, so the variance t_variance must be passed through sqrt().
v.table <- data.frame(mns)
v <- ggplot(v.table, aes(x=mns))
v <- v + geom_histogram(aes(y = ..density..), bins = 50, fill='#A4A4A4', color="darkred")
v <- v + xlab("Sample Means") + ylab("Density") + labs(title = "Validating the CLT with Exponential Distributions")
v <- v + stat_function(fun = dnorm, args = list(mean = t_mean, sd = sqrt(t_variance)), color = "yellow")
v
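To draw a separate curve for the simulated distribution next to the theoretical normal curve, one option is to layer two curves over a density-scaled histogram. The first is a kernel density estimate of the simulated means (geom_density) and the second is the theoretical normal density (stat_function with dnorm). The sketch below assumes the same ggplot2 setup and seed as above, and it keeps the document's convention of black for the sample and yellow for theory. Recent ggplot2 versions prefer after_stat(density) over ..density..; the older spelling is used here to match the R 3.2-era setup.

```r
library(ggplot2)

set.seed(500)
sim <- 1000
n <- 40
lambda <- 0.2
# Same draws as a loop of 1,000 rexp(n, lambda) calls, one row per simulation
mns <- apply(matrix(rexp(sim * n, lambda), nrow = sim, byrow = TRUE), 1, mean)
t_mean <- 1 / lambda
t_sd <- sqrt((1 / lambda^2) / n)  # dnorm() takes a standard deviation, not a variance

p <- ggplot(data.frame(mns), aes(x = mns)) +
  # Histogram on the density scale so it shares a y-axis with the curves
  geom_histogram(aes(y = ..density..), bins = 50, fill = "#A4A4A4", color = "darkred") +
  # Kernel density estimate of the simulated means (sample, black)
  geom_density(color = "black") +
  # Theoretical normal density with mean t_mean and sd t_sd (theory, yellow)
  stat_function(fun = dnorm, args = list(mean = t_mean, sd = t_sd), color = "yellow") +
  xlab("Sample Means") + ylab("Density") +
  labs(title = "Simulated Means vs. Theoretical Normal Density")
p
```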
To check normality, I do two things. First, I convert the sample means to Z scores by subtracting the theoretical mean and dividing by the theoretical standard error. The result should have a mean of about 0 and a standard deviation of about 1, which is what we find. Second, I create a Q-Q plot, in which points falling along a straight line suggest normality. The points do fall close to a straight line.
normvar <- (mns - t_mean) / sqrt(t_variance)  # Z scores using the theoretical mean and standard error
q <- ggplot(m.table, aes(sample=mns))
q <- q + geom_point(stat = "qq") + labs(title = "Q-Q Plot Suggesting Normality")
mean(normvar)
## [1] 0.01335958
sd(normvar)
## [1] 0.9960895
q
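A mean near 0 and a standard deviation near 1 are necessary for normality but not sufficient, since any distribution standardized with the right mean and variance passes that check. A complementary check, sketched here as an illustrative addition rather than part of the original analysis, counts how many standardized means fall within one and two standard errors of zero. For a normal distribution those shares are about 68.3% and 95.4%.

```r
set.seed(500)
sim <- 1000
n <- 40
lambda <- 0.2
mns <- apply(matrix(rexp(sim * n, lambda), nrow = sim, byrow = TRUE), 1, mean)

# Standardize with the theoretical mean and standard error
z <- (mns - 1 / lambda) / sqrt((1 / lambda^2) / n)

# Share of standardized means within 1 and 2 units of 0, next to the
# standard normal probabilities for the same intervals
observed <- c(within_1 = mean(abs(z) < 1), within_2 = mean(abs(z) < 2))
expected <- c(within_1 = pnorm(1) - pnorm(-1), within_2 = pnorm(2) - pnorm(-2))
round(rbind(observed, expected), 3)
```

If the observed shares sit close to the expected ones, that agreement supports the visual impression from the Q-Q plot.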