For this part of the project we’re going to investigate the exponential distribution in R and compare it with the Central Limit Theorem. The Central Limit Theorem specifies that the means drawn from multiple samples will resemble the bell-shaped normal curve.
The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Using lambda = 0.2 for all of the simulations, we’re goint to investigate the distribution of averages of 40 exponentials, running one thousand (1000) simulations.
library(ggplot2)
set.seed(999)
num_sims <- 1000
n <- 40
lambda <- 0.2
To run the simulations, a Matrix of num_sims by n (1000x40) was created and populated with the R function rexp.
ed_matrix <- matrix(rexp(n*num_sims, lambda), num_sims, n)
Now, the mean of every simulation was calculated using rowMeans. Note that the result was stored as data frame, this is to allow the use of ggplot2 for our plots.
ed_mean <- as.data.frame(rowMeans(ed_matrix))
names(ed_mean) <- 'SampleMeans'
The theoretical Mean, as explained before, is 1/lambda:
theoretical_mean <- 1/lambda
theoretical_mean
## [1] 5
While the Sample Mean is the Mean of our simulations:
sample_mean <- mean(ed_mean$SampleMeans)
sample_mean
## [1] 5.029028
Plotting the distribution of samples, along with the values of the Means, we can see the bell-shaped form we expected, as well as how close the computation of the means (Theoretical and Sample) are.
plot1 <- ggplot(ed_mean, aes(SampleMeans)) +
geom_histogram(breaks=seq(2, 8, by=0.25), fill="gold") +
geom_vline(aes(xintercept = theoretical_mean, color="Theoretical")) +
geom_vline(aes(xintercept = sample_mean, color="Sample"), linetype=5) +
scale_color_manual(name = "Mean", values = c(Theoretical = "black", Sample = "red")) +
labs(title="Sample Means Distribution")
print(plot1)
The Theoretical Variance for the exponential distribution is:
theoretical_var <- (1/lambda)^2/n
theoretical_var
## [1] 0.625
The Sample variance is:
sample_var <- var(ed_mean$SampleMeans)
sample_var
## [1] 0.5930372
Confidence Intervals can be calculated with the formula \[\ \bar{X} \pm 1.96\sigma/\sqrt{n} \]
Confidence interval for both theoretical and sample values are calculated below:
theoretical_ci <- theoretical_mean + c(-1,1)*1.96*sqrt(theoretical_var)/sqrt(n)
theoretical_ci
## [1] 4.755 5.245
sample_ci <- sample_mean + c(-1,1)*1.96*sqrt(sample_var)/sqrt(n)
sample_ci
## [1] 4.790375 5.267681
A QQ-Plot is very useful to show how a sample distribution is close to a normal distribution.
plot2 <- ggplot(ed_mean, aes(sample=SampleMeans)) +
geom_point(stat="qq", shape=21, size=3) +
geom_abline(intercept=sample_mean, slope=sqrt(sample_var), color="red") +
labs(title="Normal QQ Plot")
print(plot2)
The mean, variance and confidence intervals obtained from our simulation showed values very close to the theoretical values. Likewise, the graphs are consistent with what is expected from a normal distribution, confirming the Central Limit Theorem.