The purpose of this experiment is to investigate the exponential distribution in R and compare the behavior of its sample means with what the Central Limit Theorem predicts. You can simulate the exponential distribution in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. I will set lambda = 0.2 for all of the simulations. I will investigate the distribution of averages of 40 exponentials. A thousand simulations will be performed.
The properties of the distribution of the mean of 40 exponentials will be described and illustrated via graphical plots.
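As a quick sanity check of the claim that both the mean and the standard deviation equal 1/lambda, the following is a minimal sketch. The seed and the sample size of 100,000 are arbitrary choices made for this illustration and are not part of the experiment itself.

```r
# Sketch: check empirically that exponential draws have mean and sd near 1/lambda.
set.seed(1)                           # arbitrary seed, only for reproducibility
lambda <- 0.2
draws <- rexp(100000, rate = lambda)  # large sample so the estimates are stable

# Both estimates should land close to 1 / lambda = 5
c(mean = mean(draws), sd = sd(draws), theoretical = 1 / lambda)
```

With a sample this large, both estimates should sit close to 5, although the exact digits depend on the seed.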
The three specific things I will illustrate are:
1) The sample mean and how it compares to the theoretical mean of the distribution.
2) The variability of the sample (via variance) compared to the theoretical variance of the distribution.
3) That the distribution of the sample means is approximately normal.
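The theoretical values used for these comparisons follow from standard properties of the mean of independent draws. The derivation below is a short sketch; the symbols X_i and \bar{X} are notation introduced here for illustration, with n = 40 and lambda = 0.2 as stated above.

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad X_i \overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda) \\
\mathbb{E}[\bar{X}] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n\lambda^{2}} = \frac{25}{40} = 0.625 \\
\operatorname{SD}(\bar{X}) &= \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\end{align*}
```

The Central Limit Theorem then suggests that \bar{X} is approximately normal with this mean and variance.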
To perform the simulation I will use the replicate() wrapper to evaluate the rexp() function repeatedly. Once I have performed the simulations I will use apply() to calculate the mean of each one.
Once we have the means of the sample exponentials we can calculate their mean, standard deviation and variance so they can be compared to the theoretical equivalents.
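To make the shape of the simulated data concrete, here is a small sketch using deliberately tiny, made-up sizes (3 simulations of 4 draws each) rather than the 1000 by 40 used in the experiment. It shows that replicate() returns one simulation per column, which is why the means are taken over columns.

```r
# Sketch with tiny illustrative sizes (not the sizes used in the experiment).
set.seed(1)                                   # arbitrary seed
small <- replicate(3, rexp(4, rate = 0.2))    # 3 simulations of 4 draws each

dim(small)                        # rows are draws, columns are simulations
col_means <- apply(small, 2, mean)  # MARGIN = 2 means "apply over columns"
col_means
colMeans(small)                   # base R shortcut that gives the same values
```

Either apply(x, 2, mean) or colMeans(x) works here; colMeans() is simply the more compact form.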
The sample mean is: 4.996344
The theoretical mean is: 5
The sample standard deviation is: 0.7968283
The theoretical standard deviation is: 0.7905694
The sample variance is: 0.6349353
The theoretical variance is: 0.625
The sample and theoretical values are nearly identical: the mean differs by less than 0.1%, the standard deviation by about 0.8%, and the variance by about 1.6%. These differences are negligible.
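The closeness of the two sets of values can be quantified as a relative difference. The sketch below simply re-enters the six numbers reported above; the variable names are chosen for this illustration.

```r
# Sketch: relative difference between the reported sample and theoretical values.
sample_vals <- c(mean = 4.996344, sd = 0.7968283, var = 0.6349353)
theory_vals <- c(mean = 5,        sd = 0.7905694, var = 0.625)

# Absolute difference as a percentage of the theoretical value
rel_diff_pct <- 100 * abs(sample_vals - theory_vals) / theory_vals
round(rel_diff_pct, 2)
```

All three relative differences come out below 2%, with the mean being the closest match.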
Below are two plots, each sharing the same histogram created using the sample data. On the left I have overlaid a normal curve built from the sample mean and standard deviation in blue, with the sample mean in magenta. The plot on the right has the theoretical equivalents overlaid in red and green, respectively. As you can see, the distribution is approximately normal.
To more easily illustrate the similarity of the sample and theoretical results I have also included a third plot below them, with the sample and theoretical curves overlaid on each other without the histogram in the background.
They are so close it is hard to tell them apart, and I had to draw one of the mean lines dashed just so it could be seen.
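The visual impression of normality can be complemented with a simple numerical check. The sketch below is not part of the original experiment: it standardizes the simulated means with the theoretical mean and standard deviation and measures how many fall within 1.96 standard deviations, which would be about 95% for a normal distribution. It regenerates the means with the same parameters so that it can be run on its own.

```r
# Sketch: numerical normality check on the means of 40 exponentials.
set.seed(827)                      # same seed as the experiment; any seed would do
n <- 40
sims <- 1000
lambda <- 0.2
expo_means <- colMeans(replicate(sims, rexp(n, lambda)))

# Standardize using the theoretical mean and standard deviation of the mean
z <- (expo_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardized means within +/- 1.96; near 0.95 if roughly normal
coverage <- mean(abs(z) <= 1.96)
coverage
```

A Q-Q plot, for example qqnorm(expo_means) followed by qqline(expo_means), gives a graphical version of the same check.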
Below you will find the code used to perform this experiment.
library(ggplot2)
library(grid)
library(gridExtra)
# set seed
set.seed(827)
# 40 exponentials
n <- 40
# a thousand simulations
sims <- 1000
# use 0.2 for lambda
lambda <- 0.2
# execute simulations: returns an n x sims matrix, one simulation per column
expo_sims <- replicate(sims, rexp(n, lambda))
# calculate the mean of each simulation (column)
expo_means <- apply(expo_sims, 2, mean)
# calculate sample and theoretical means, standard deviations and variances
sample_mean <- mean(expo_means)
theoretical_mean <- 1 / lambda
sample_sd <- sd(expo_means)
theoretical_sd <- (1 / lambda) * (1 / sqrt(n))
sample_var <- var(expo_means)
theoretical_var <- theoretical_sd^2
# use ggplot2 and grid.arrange to produce plots visualizing results
data <- data.frame(expo_means)
figure <- ggplot(data, aes(x=expo_means))
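# after_stat(density) replaces the deprecated ..density.. notation (needs ggplot2 >= 3.3.0)
figure_hist <- figure + geom_histogram(binwidth = lambda, fill="orange", color="black", aes(y = after_stat(density)))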
sample_figure <- figure_hist + stat_function(fun=dnorm, size=1, args=list(mean=sample_mean, sd=sample_sd), color="blue") + labs(title="Simulated", x="Mean of Exponentials", y="Density") + geom_vline(xintercept=sample_mean, size=1.0, color="magenta")
theoretical_figure <- figure_hist + stat_function(fun=dnorm, size=1, args=list(mean=theoretical_mean, sd=theoretical_sd), color = "red") + labs(title="Theoretical", x="Mean of Exponentials", y="Density") + geom_vline(xintercept=theoretical_mean, size=1.0, color="green")
nohist_figure <- figure + stat_function(fun=dnorm, size=1, args=list(mean=sample_mean, sd=sample_sd), color="blue") + geom_vline(xintercept=sample_mean, size=1.0, color="magenta") + stat_function(fun=dnorm, size=1, args=list(mean=theoretical_mean, sd=theoretical_sd), color = "red") + geom_vline(xintercept=theoretical_mean, size=1.0, color="green", linetype="dashed") + labs(title="Sample Vs. Theoretical", x="Mean", y="Density")
grid.arrange(sample_figure, theoretical_figure, ncol=2)
print(nohist_figure)