In Part 1 of this project, we investigate the exponential distribution in R and compare the distribution of its sample averages with the predictions of the Central Limit Theorem. Using a rate of lambda = 0.2 in every simulation, we examine the distribution of the averages of 40 exponentials over 1000 simulations.
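The simulation is checked against reference values that follow from standard properties of the exponential distribution. An Exp(lambda) variable has mean 1/lambda and standard deviation 1/lambda. By the Central Limit Theorem, the average of n such variables is therefore approximately normal, with mean 1/lambda and variance (1/lambda)^2/n. The following minimal sketch computes these values for lambda = 0.2 and n = 40. The variable names (`pop.mean`, `se.mean`, and so on) are illustrative and do not appear in the analysis.

```r
# Theoretical moments for the average of n exponentials (illustrative names)
lambda <- 0.2
n <- 40

pop.mean <- 1 / lambda        # mean of a single Exp(lambda) draw
pop.sd <- 1 / lambda          # standard deviation of a single draw
var.mean <- pop.sd^2 / n      # variance of the average of n draws
se.mean <- pop.sd / sqrt(n)   # standard deviation of the average of n draws

# CLT approximation: the average is roughly Normal(pop.mean, var.mean)
c(mean = pop.mean, variance = var.mean, sd = se.mean)
```

For lambda = 0.2 these come to a mean of 5 and a variance of 25/40 = 0.625.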
# Setting pre-defined parameters
lambda <- 0.2
n <- 40
sims <- 1:1000
set.seed(123)
# Loading necessary R packages
if(!require(ggplot2)){install.packages('ggplot2')}; library(ggplot2)
## Loading required package: ggplot2
# Simulating the population
population <- data.frame(x=sapply(sims, function(x) {mean(rexp(n, lambda))}))
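On each iteration, the `sapply()` call draws 40 exponentials and keeps their mean. There are two equivalent ways to write the same simulation. One uses `replicate()`. The other draws all 40 × 1000 values at once, fills a matrix with one row per simulation, and takes the row means. The sketch below assumes the same seed and parameters as above, and `n.sims` is an illustrative name. `rexp()` consumes the random stream sequentially, so all three versions should yield the same averages up to floating-point rounding.

```r
lambda <- 0.2
n <- 40
n.sims <- 1000   # illustrative name for the number of simulations

# Approach used above: one mean per iteration
set.seed(123)
means.sapply <- sapply(1:n.sims, function(i) mean(rexp(n, lambda)))

# replicate() evaluates the same expression n.sims times
set.seed(123)
means.replicate <- replicate(n.sims, mean(rexp(n, lambda)))

# Vectorised: fill a matrix row by row (one row per simulation), then average each row
set.seed(123)
draws <- matrix(rexp(n * n.sims, lambda), nrow = n.sims, byrow = TRUE)
means.matrix <- rowMeans(draws)
```

The matrix version calls `rexp()` only once, which is usually faster for large simulations.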
# Plotting the histogram
hist.pop <- ggplot(population, aes(x = x)) +
  geom_histogram(aes(y = after_stat(count), fill = after_stat(count)), bins = 30) +
  labs(title = "Histogram of Averages of 40 Exponentials over 1000 Simulations", y = "Frequency", x = "Mean")
hist.pop
# Tabulating the sample mean and theoretical mean
sample.mean <- mean(population$x)
theoretical.mean <- 1/lambda
cbind(sample.mean, theoretical.mean)
## sample.mean theoretical.mean
## [1,] 5.011911 5
# Checking the 95% confidence interval for the population mean
t.test(population$x)$conf.int
## [1] 4.963824 5.059998
## attr(,"conf.level")
## [1] 0.95
As shown above, the sample mean (5.012) is very close to the theoretical mean (1/lambda = 5). The 95% confidence interval for the mean, 4.96 to 5.06, contains the theoretical mean.
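`t.test()` reports the usual one-sample t interval, mean ± t(0.975, df = 999) · s/sqrt(1000). The sketch below reproduces it by hand. It reruns the simulation under the same seed so that the block stands alone, and the variable names are illustrative. If loading packages consumed random numbers in the original session, the exact values may differ slightly from those shown above.

```r
lambda <- 0.2
n <- 40
set.seed(123)
averages <- sapply(1:1000, function(i) mean(rexp(n, lambda)))

m <- length(averages)                # number of simulated averages (1000)
x.bar <- mean(averages)
std.err <- sd(averages) / sqrt(m)    # standard error of the mean of the averages
t.crit <- qt(0.975, df = m - 1)      # two-sided 95% critical value

manual.ci <- x.bar + c(-1, 1) * t.crit * std.err
manual.ci

# Does the interval cover the theoretical mean 1/lambda?
manual.ci[1] <= 1 / lambda && 1 / lambda <= manual.ci[2]
```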
# Tabulating the sample variance and theoretical variance
sample.variance <- var(population$x)
theoretical.variance <- ((1/lambda)^2)/n
cbind(sample.variance, theoretical.variance)
## sample.variance theoretical.variance
## [1,] 0.6004928 0.625
As shown above, the sample variance of the averages (0.600) is close to the theoretical variance of the mean of 40 exponentials, (1/lambda)^2/n = 0.625.
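The theoretical variance (1/lambda)^2/n shrinks in proportion to 1/n. The following rough check varies the sample size and compares the simulated variance of the averages with 25/n. It assumes illustrative sample sizes of 10, 40, and 160, with 1000 simulations for each.

```r
lambda <- 0.2
set.seed(123)
sizes <- c(10, 40, 160)   # illustrative sample sizes

# Variance of 1000 simulated averages for each sample size
sim.var <- sapply(sizes, function(k) {
  averages <- replicate(1000, mean(rexp(k, lambda)))
  var(averages)
})
theory.var <- (1 / lambda)^2 / sizes

# Ratios near 1 indicate agreement with (1/lambda)^2 / n
round(data.frame(n = sizes, simulated = sim.var, theoretical = theory.var,
                 ratio = sim.var / theory.var), 4)
```

With 1000 simulations per size, the ratios should fall within a few percent of 1.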
# Overlaying the empirical density (blue) and the CLT normal density (red);
# dashed lines mark the sample mean (blue) and the theoretical mean (red)
gg <- ggplot(population, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density), fill = after_stat(density)), bins = 30) +
  labs(title = "Histogram of Averages of 40 Exponentials over 1000 Simulations", y = "Density", x = "Mean") +
  geom_density(colour = "blue") +
  geom_vline(xintercept = sample.mean, colour = "blue", linetype = "dashed") +
  stat_function(fun = dnorm, args = list(mean = theoretical.mean, sd = sqrt(theoretical.variance)), colour = "red") +
  geom_vline(xintercept = theoretical.mean, colour = "red", linetype = "dashed")
gg
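As shown above, the sample mean of the averages (blue dashed line) nearly coincides with the theoretical mean (red dashed line). The histogram and empirical density (blue curve) track the normal density predicted by the Central Limit Theorem (red curve), which has mean 1/lambda and standard deviation (1/lambda)/sqrt(n). This suggests that the distribution of averages of 40 exponentials is approximately normal, even though the exponential distribution itself is strongly right-skewed.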
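Matching centres alone does not establish normality; the shape has to match as well. The sketch below adds two base-R checks. The first is a normal Q-Q plot of the averages. The second computes the share of averages that fall within 1.96 theoretical standard deviations of 1/lambda, which should be close to 95% if the normal approximation holds. The 1.96 cut-off and the variable names are illustrative choices, not part of the original analysis.

```r
lambda <- 0.2
n <- 40
set.seed(123)
averages <- sapply(1:1000, function(i) mean(rexp(n, lambda)))

mu <- 1 / lambda                  # theoretical mean of the averages
sigma <- (1 / lambda) / sqrt(n)   # theoretical standard deviation of the averages

# Points close to the reference line indicate approximately normal quantiles
qqnorm(averages, main = "Normal Q-Q plot of averages of 40 exponentials")
qqline(averages, col = "red")

# Share of averages within mu +/- 1.96 * sigma; about 0.95 under normality
coverage <- mean(abs(averages - mu) < 1.96 * sigma)
coverage
```

A slight bend in the upper tail of the Q-Q plot would reflect the residual right skew that remains after averaging only 40 exponentials.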