Overview

The purpose of this data analysis is to investigate the exponential distribution and compare the distribution of its sample means with what the Central Limit Theorem predicts.

To achieve our goal, we will take the following steps:

1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample means are (via their variance) and compare it to the theoretical variance of the mean.
3. Show that the distribution of the sample means is approximately normal.

For this analysis, the rate parameter lambda is set to 0.2 for all of the simulations. The investigation looks at the distribution of averages of 40 exponentials over 1000 simulations.

Simulations

First, we will set the simulation variables lambda, exponentials, and seed.

set.seed(25)
lambda = 0.2
exponentials = 40

Run Simulations with variables

# Simulate 1000 means, each the average of 40 exponentials drawn with rate lambda
simMeans = replicate(1000, mean(rexp(exponentials, lambda)))
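
An alternative, vectorized way to produce the 1000 means is to draw all values at once and average by row. This is a minimal sketch, not the code used for the results in this report; the names nSim, simMatrix and simMeansAlt are introduced here for illustration only.

```r
set.seed(25)
lambda <- 0.2
exponentials <- 40
nSim <- 1000  # number of simulations (illustrative name)

# Draw all values at once; each row holds one simulated sample of 40 exponentials
simMatrix <- matrix(rexp(nSim * exponentials, lambda),
                    nrow = nSim, byrow = TRUE)

# One mean per row, giving 1000 sample means
simMeansAlt <- rowMeans(simMatrix)
length(simMeansAlt)
```

Storing the raw draws in a matrix also keeps the individual exponentials available, which is convenient if they need to be inspected later.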

Sample Mean vs Theoretical Mean

Sample Mean

Calculating the mean of the simulated means gives the sample mean.

mean(simMeans)
## [1] 4.998511

Means Histogram

hist(simMeans, breaks=40, xlim = c(2,9), main="Exponential Function Simulation Means Distribution", col = "blue")

Theoretical Mean

The theoretical mean of an exponential distribution is: lambda^-1.

lambda^-1
## [1] 5
theoretical.mean<-lambda^-1

Comparison (sample mean - theoretical mean).

We can see that the sample mean of the simulations and the theoretical mean of the exponential distribution are very similar. The difference between them is:

abs(mean(simMeans)-lambda^-1)
## [1] 0.001488939

In percentage (%):

abs((((mean(simMeans)-(lambda^-1))/(mean(simMeans)))*100))
## [1] 0.02978765
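
To judge whether a difference of this size is plausible, it can help to compare it with the standard error of the average of the 1000 simulated means. The sketch below assumes the simulation settings used above and re-creates the simulation so that it runs on its own; the names nSim, sdOfMean and seOfAverage are illustrative and not part of the original analysis.

```r
set.seed(25)
lambda <- 0.2
exponentials <- 40
nSim <- 1000  # number of simulations (illustrative name)

simMeans <- replicate(nSim, mean(rexp(exponentials, lambda)))

# Standard deviation of a single mean of 40 exponentials
sdOfMean <- 1 / (lambda * sqrt(exponentials))

# Standard error of the average of nSim such means
seOfAverage <- sdOfMean / sqrt(nSim)
seOfAverage

# Observed difference expressed in units of that standard error
abs(mean(simMeans) - 1 / lambda) / seOfAverage
```

With these settings the standard error is 0.025, so the observed difference of about 0.0015 is well under one standard error.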

Sample Variance versus Theoretical Variance

Sample Variance

Calculating the variance of the simulated means gives the sample variance.

var(simMeans)
## [1] 0.6113794

Theoretical Variance

The variance of a single exponential is lambda^-2. The theoretical variance of the mean of n exponentials is therefore lambda^-2 / n, which equals (lambda * sqrt(n))^-2.

(lambda * sqrt(exponentials))^-2
## [1] 0.625
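
The value 0.625 is the variance of the mean of n = 40 independent exponentials, not of a single exponential (which would be 1/0.2^2 = 25). As a short worked derivation, assuming the draws X_1, ..., X_n are independent with rate lambda:

```latex
\operatorname{Var}(\bar{X}_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{1}{n^2}\cdot\frac{n}{\lambda^2}
  = \frac{1}{\lambda^2 n}
  = \left(\lambda\sqrt{n}\right)^{-2}
```

With lambda = 0.2 and n = 40 this gives 1 / (0.04 * 40) = 0.625.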

Comparison

We can see that the sample variance of the simulated means and the theoretical variance of the mean are also very similar. The difference between the two variances is only:

abs(var(simMeans)-(lambda * sqrt(exponentials))^-2)
## [1] 0.01362062

In percentage (%):

abs((((var(simMeans)-(lambda * sqrt(exponentials))^-2))/(var(simMeans)))*100)
## [1] 2.227851

Distribution

This is a density histogram of the means from the 1000 simulations. The blue curve is a kernel density estimate of the sample means, while the red dashed curve is the normal density with mean lambda^-1 and standard deviation (lambda * sqrt(n))^-1. The dashed vertical lines mark the sample mean (blue) and the theoretical mean (red). As we can see, the distribution of the means of our sampled exponentials is very close to a normal distribution, as the Central Limit Theorem predicts. If we increased the number of exponentials averaged in each simulation, the distribution of the means would be even closer to a normal distribution.

library(ggplot2)
dist <- ggplot(data.frame(y = simMeans), aes(x = y)) +
  geom_histogram(aes(y = ..density..), binwidth = 0.2, fill = "#0072B2",
                 color = "black") +
  # Normal density predicted by the Central Limit Theorem
  stat_function(fun = dnorm, color = "red", linetype = "dashed", size = 1,
                args = list(mean = lambda^-1, sd = (lambda * sqrt(exponentials))^-1)) +
  geom_vline(xintercept = mean(simMeans), colour = "blue", linetype = "dashed") +
  geom_vline(xintercept = theoretical.mean, colour = "red", linetype = "dashed") +
  geom_density(alpha = .2, color = "blue") +
  labs(title = "Simulations vs normal distribution", x = "Simulation Mean")
dist
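
The quality of the normal approximation depends on how many exponentials are averaged in each simulation, not on the number of simulations. A rough way to see this is to compare the skewness of the simulated means for several sample sizes: a normal distribution has skewness 0, while the mean of n exponentials has theoretical skewness 2 / sqrt(n). The sketch below is illustrative only; the sample sizes 5, 40 and 200 and the helper sampleSkewness are assumptions made here and are not part of the original analysis.

```r
set.seed(25)
lambda <- 0.2
nSim <- 1000  # number of simulations per sample size (illustrative name)

# Sample skewness: third central moment divided by the cubed standard deviation
sampleSkewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Hypothetical sample sizes chosen for illustration
sampleSizes <- c(5, 40, 200)

# Simulate nSim means for each sample size and record their skewness
skewBySize <- sapply(sampleSizes, function(n) {
  means <- replicate(nSim, mean(rexp(n, lambda)))
  sampleSkewness(means)
})
names(skewBySize) <- sampleSizes
skewBySize

# Theoretical skewness for comparison
2 / sqrt(sampleSizes)
```

The estimated skewness should shrink toward 0 as the sample size grows, which is the sense in which the means become "more normal".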