The purpose of this data analysis is to investigate the exponential distribution and compare the distribution of its sample means with what the Central Limit Theorem predicts.
To achieve our goal, we will take the following steps:
1. We will show the sample mean and compare it to the theoretical mean of the distribution.
2. We will show how variable the sample means are (via variance) and compare this to the theoretical variance of the sample mean.
3. We will show that the distribution of sample means is approximately normal.
For this analysis, the rate parameter lambda is set to 0.2 for all of the simulations. The investigation examines the distribution of averages of 40 exponentials over 1000 simulations.
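As a reference point for the steps above, the Central Limit Theorem can be written for this setup as follows. This is a sketch in standard notation; the symbols X_i, mu, sigma, and n are introduced here for illustration and do not appear in the original code (n corresponds to the variable exponentials).

```latex
% X_1, ..., X_n i.i.d. exponential with rate \lambda, so \mu = \sigma = 1/\lambda
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\left(\bar{X}_n - \mu\right)}{\sigma}
\;\xrightarrow{d}\; \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
% With \lambda = 0.2: \mu = \sigma = 5, and the analysis uses n = 40.
```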
First, we will set the simulation variables lambda, exponentials, and seed.
set.seed(25)
lambda = 0.2
exponentials = 40
Next, we run the simulations with these variables.
# Draw 40 exponentials and take their mean, repeated 1000 times
simMeans = replicate(1000, mean(rexp(exponentials, lambda)))
Calculating the mean of the simulated means gives the sample mean.
mean(simMeans)
## [1] 4.998511
hist(simMeans, breaks=40, xlim = c(2,9), main="Exponential Function Simulation Means Distribution", col = "blue")
The theoretical mean of an exponential distribution is: lambda^-1.
lambda^-1
## [1] 5
theoretical.mean<-lambda^-1
We can see that the sample mean of the simulations and the theoretical mean of the exponential distribution are very similar. The difference between them is:
abs(mean(simMeans)-lambda^-1)
## [1] 0.001488939
In percentage (%):
abs((((mean(simMeans)-(lambda^-1))/(mean(simMeans)))*100))
## [1] 0.02978765
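To put the size of this difference in context, one option is to compare it with the Monte Carlo standard error of the simulated mean. The sketch below assumes that each simulated mean has variance 1/(lambda^2 * n), the variance of an average of n independent exponentials, and hard-codes the sample mean reported above; the names n_sims, observed_mean, se_of_mean, and z are introduced here for illustration only.

```r
lambda <- 0.2
exponentials <- 40
n_sims <- 1000                 # number of simulated means in the analysis
observed_mean <- 4.998511      # sample mean reported above

theoretical_mean <- 1 / lambda

# Standard error of the average of n_sims simulated means
se_of_mean <- sqrt(1 / (lambda^2 * exponentials) / n_sims)

# Difference expressed in standard errors
z <- (observed_mean - theoretical_mean) / se_of_mean

print(c(standard_error = se_of_mean, z = z))
```

A value of z well inside the range -2 to 2 would suggest that the observed difference is consistent with ordinary simulation noise.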
Calculating the variance of the simulated means gives the sample variance.
var(simMeans)
## [1] 0.6113794
The exponential distribution itself has variance lambda^-2, so the theoretical variance of the mean of n independent exponentials is lambda^-2 / n = (lambda * sqrt(n))^-2.
(lambda * sqrt(exponentials))^-2
## [1] 0.625
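The value 0.625 follows from the usual rule for the variance of an average of independent draws. A short derivation, written with the illustrative symbols X_i and n (n corresponds to exponentials in the code):

```latex
\operatorname{Var}\!\left(\bar{X}_n\right)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}} \sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n^{2}} \cdot \frac{n}{\lambda^{2}}
  = \frac{1}{\lambda^{2} n}
  = \left(\lambda \sqrt{n}\right)^{-2}
% With \lambda = 0.2 and n = 40: 1 / (0.04 \cdot 40) = 1 / 1.6 = 0.625
```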
We can see that the sample variance of the simulated means and the theoretical variance of the sample mean are also very similar. The difference between the variances is only:
abs(var(simMeans)-(lambda * sqrt(exponentials))^-2)
## [1] 0.01362062
In percentage (%):
abs((((var(simMeans)-(lambda * sqrt(exponentials))^-2))/(var(simMeans)))*100)
## [1] 2.227851
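The plot below is a density histogram of the 1000 simulated means. The solid blue curve is a kernel density estimate of the sample means, while the dashed red curve is the normal density with mean lambda^-1 and standard deviation (lambda * sqrt(n))^-1. The dashed vertical lines mark the sample mean (blue) and the theoretical mean (red). As we can see, the distribution of the means of our sampled exponentials is very close to a normal distribution, as the Central Limit Theorem predicts. If we increased the number of exponentials averaged in each sample, the distribution would be even closer to normal; increasing the number of simulations would only make the histogram smoother.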
library(ggplot2)
dist<-ggplot(data.frame(y=simMeans), aes(x=y)) +
  # Histogram scaled to density so it is comparable with the curves
  geom_histogram(aes(y=..density..), binwidth=0.2, fill="#0072B2",
                 color="black") +
  # Normal density predicted by the Central Limit Theorem
  # (size is passed once; passing it twice triggered a duplicated-aesthetics warning)
  stat_function(fun = dnorm, color = "red", linetype = "dashed", size = 1,
                args = list(mean = lambda^-1, sd=(lambda*sqrt(exponentials))^-1)) +
  # Vertical markers for the sample mean (blue) and theoretical mean (red)
  geom_vline(xintercept=mean(simMeans), colour="blue", linetype="dashed") +
  geom_vline(xintercept=theoretical.mean, colour="red", linetype="dashed") +
  # Kernel density estimate of the simulated means
  geom_density(alpha=.2, color = "blue") +
  labs(title="Simulations vs normal distribution", x="Simulation Mean")
dist
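One way to probe how quickly the distribution of means approaches normality is to track its skewness as the number of exponentials per sample grows. The sketch below is illustrative rather than part of the original analysis: the sample sizes in sizes, the larger simulation count n_sims, and the helper skewness are assumptions made here. It relies on the fact that the mean of n independent exponentials has skewness 2 / sqrt(n), while a normal distribution has skewness 0.

```r
set.seed(25)                    # seed reused from the analysis; any seed works
lambda <- 0.2
n_sims <- 10000                 # assumption: more simulations than the 1000 used above
sizes  <- c(10, 40, 160, 640)   # hypothetical sample sizes for illustration

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Simulate n_sims means for each sample size and record their skewness
observed <- sapply(sizes, function(n) {
  means <- replicate(n_sims, mean(rexp(n, lambda)))
  skewness(means)
})

# Theoretical skewness of the mean of n exponentials
theory <- 2 / sqrt(sizes)

print(data.frame(n = sizes, observed = observed, theory = theory))
```

If the observed skewness shrinks toward 0 as n increases, that supports the reading of the plot above. A normal Q-Q plot of the simulated means, for example with qqnorm(simMeans) followed by qqline(simMeans), is another common visual check.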