Overview

In this project I investigate the exponential distribution in R and compare it with the predictions of the Central Limit Theorem (CLT). The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. The variable lambda is set to 0.2 for all of the simulations. I investigate the distribution of averages of 40 exponentials, using one thousand simulations. The aim of the project is to demonstrate the CLT applied to the exponential distribution.
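
Stated symbolically, and using notation introduced here only for convenience ($X_i$ for the individual draws, $\bar X_n$ for their average, $\mu$ and $\sigma$ for the population mean and standard deviation), the CLT for independent, identically distributed draws reads:

$$\bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \xrightarrow{\;d\;} N(0, 1) \quad \text{as } n \to \infty$$

For the exponential distribution $\mu = \sigma = 1/\lambda$, so the averages of $n$ exponentials should be approximately $N\!\left(1/\lambda,\; 1/(\lambda^2 n)\right)$ when $n$ is reasonably large.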

Simulations

The first step is to load the libraries used in the analysis:

# Load libraries
library(knitr)
library(ggplot2)

Next, declare the variables and parameters needed:

lambda <- 0.2     # Rate parameter of the exponential distribution
n <- 40           # Number of exponentials per average
nOfSimul <- 1000  # Number of simulations
set.seed(10)      # Set the seed for reproducibility

Now simulate the exponential draws, one row per simulation, and take the average of the n values in each row:

distExponential <- matrix(data = rexp(n * nOfSimul, lambda), nrow = nOfSimul)
meanDistExponential <- data.frame(means = apply(distExponential, 1, mean))
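
As a quick sanity check on the layout, the sketch below rebuilds the same matrix and computes the row means in two ways. The names draws, meansApply and meansRow are illustrative and not part of the original analysis; rowMeans() is offered only as a vectorised alternative to apply(), not as the method used in this report.

```r
# Sketch: same simulation layout, with the row means computed two ways
set.seed(10)
lambda <- 0.2
n <- 40
nOfSimul <- 1000

# One row per simulation, one column per exponential draw
draws <- matrix(rexp(n * nOfSimul, lambda), nrow = nOfSimul)

meansApply <- apply(draws, 1, mean)  # approach used in the report
meansRow <- rowMeans(draws)          # vectorised equivalent

dim(draws)                           # 1000 rows, 40 columns
all.equal(meansApply, meansRow)      # compares within floating-point tolerance
```

Each of the 1000 rows holds one simulated sample of 40 exponentials, so each row mean is one observation from the sampling distribution that the CLT describes.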

Once the data are simulated, we can summarise and plot them.

Sample Mean versus Theoretical Mean

First, calculate the theoretical mean mu of an exponential distribution with rate lambda:

mu <- 1/lambda
print(mu)
## [1] 5

Next, calculate xBar, the average of the 1000 sample means simulated in the previous steps:

xBar <- mean(meanDistExponential$means)
print(xBar)
## [1] 5.04506
qqnorm(meanDistExponential$means)
qqline(meanDistExponential$means)

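From these calculations, the theoretical mean (5) and the average of the sample means (5.04506) are very close, as expected. The normal Q-Q plot gives a first visual check of how closely the sample means follow a normal distribution.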
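
One way to make "very close" concrete is to express the gap between xBar and mu in units of the standard error of xBar. Since xBar averages n * nOfSimul independent draws, that standard error is (1/lambda)/sqrt(n * nOfSimul). The sketch below re-creates the simulation so that it runs on its own; the names means, seMC, gap and zGap are illustrative additions.

```r
# Sketch: size of the gap between xBar and mu, in standard-error units
set.seed(10)
lambda <- 0.2
n <- 40
nOfSimul <- 1000
means <- rowMeans(matrix(rexp(n * nOfSimul, lambda), nrow = nOfSimul))

mu <- 1 / lambda
xBar <- mean(means)

# xBar averages n * nOfSimul independent exponential draws
seMC <- (1 / lambda) / sqrt(n * nOfSimul)  # 5 / 200 = 0.025
gap <- xBar - mu
zGap <- gap / seMC                         # gap in standard-error units

c(gap = gap, seMC = seMC, zGap = zGap)
```

With the values printed above (5.04506 against 5), the gap is about 0.045, or roughly 1.8 standard errors, which is the size of deviation one would expect from simulation noise alone.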

Sample Variance versus Theoretical Variance

Next, calculate the theoretical standard deviation of the sample mean (the standard error, sigma/sqrt(n)):

stdDev <- 1/lambda/sqrt(n)
print(stdDev)
## [1] 0.7905694

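Now we get the theoretical variance of the sample mean:

varTheory <- stdDev^2 # Named varTheory to avoid shadowing the base function var()
print(varTheory)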
## [1] 0.625
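
These two values follow directly from the independence of the draws. As a short derivation, with $X_i$ and $\bar X_n$ used here as convenience notation for the individual exponentials and their average:

$$\operatorname{Var}(\bar X_n) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625, \qquad \operatorname{sd}(\bar X_n) = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906$$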

Now calculate the standard deviation and variance of the 1000 simulated sample means from the previous steps:

stdDevX <- sd(meanDistExponential$means)
print(stdDevX)
## [1] 0.80881
varX <- var(meanDistExponential$means)
print(varX)
## [1] 0.6541736

Again, as expected, the theoretical and simulated values are close to each other: 0.7906 versus 0.8088 for the standard deviation, and 0.625 versus 0.6542 for the variance.

Distribution

Now let's compare the distribution of the simulated sample means (blue dashed lines) with a normal distribution built from the theoretical values (red lines). For easier reading, I've added vertical lines marking the calculated and expected means:
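
A comparison of this kind can be sketched as follows. The sketch uses base graphics rather than ggplot2 so that it runs without extra packages, so it is an illustration under that assumption and not the original figure code; the names means and se, the number of histogram breaks, and the titles are illustrative choices.

```r
# Sketch: histogram of simulated means with empirical and theoretical densities
set.seed(10)
lambda <- 0.2
n <- 40
nOfSimul <- 1000
means <- rowMeans(matrix(rexp(n * nOfSimul, lambda), nrow = nOfSimul))

mu <- 1 / lambda              # theoretical mean
se <- (1 / lambda) / sqrt(n)  # theoretical sd of the sample mean

# Density-scaled histogram of the simulated means
hist(means, breaks = 30, freq = FALSE,
     main = "Averages of 40 exponentials", xlab = "Sample mean")

# Empirical density (blue, dashed) and theoretical normal density (red)
lines(density(means), col = "blue", lty = 2, lwd = 2)
curve(dnorm(x, mean = mu, sd = se), add = TRUE, col = "red", lwd = 2)

# Vertical lines at the calculated and expected means
abline(v = mean(means), col = "blue", lty = 2)
abline(v = mu, col = "red")
```

Setting freq = FALSE puts the histogram on the density scale, which is what allows the two curves to be overlaid on it directly.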

In conclusion, the graphic and calculations above show that the distribution of the means of the randomly sampled exponential data almost overlaps the theoretical normal distribution. This illustrates the Central Limit Theorem (CLT) for an exponential distribution.