Overview:

This project compares the distribution of averages of exponential random variables, simulated in R, with the prediction of the central limit theorem. The mean of the exponential distribution is \(\frac{1}{\lambda}\), and its standard deviation is also \(\frac{1}{\lambda}\). The project focuses on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.
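For reference, the central limit theorem prediction that the rest of the report checks can be written as below. The notation is introduced here for illustration only: \(X_1, \dots, X_n\) are independent exponential draws with rate \(\lambda\), and \(n = 40\) is the sample size.

\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
\;\overset{\text{approx.}}{\sim}\;
N\!\left(\frac{1}{\lambda},\ \frac{1}{\lambda^{2} n}\right)
\]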

Simulations:

The exponential distribution is simulated in R with rexp(n, lambda), where n is the number of observations and lambda is the rate parameter. All simulations use lambda = 0.2.
First we load the ggplot2 plotting library.

library(ggplot2)

Initialize the variables that control the simulation.

noSim <- 1000
sampSize <- 40
lambda <- 0.2

Set the seed of the random number generator so that the analysis is reproducible.

set.seed(3)

Create a matrix with 1000 rows, one per simulation, and 40 columns, one per random exponential draw within a simulation.

simulationMatrix <- matrix(rexp(n = noSim * sampSize, rate = lambda), noSim, sampSize)

Create a vector of length 1000 containing the mean of each row of simulationMatrix.

simulationMean <- rowMeans(simulationMatrix)
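As a quick sanity check on the two steps above, the following sketch rebuilds the matrix, confirms its shape, and verifies that rowMeans agrees with an explicit row-by-row mean. The names m, means, and meansByApply are illustrative and not part of the original analysis.

```r
# Illustrative check only; m, means and meansByApply are hypothetical names.
noSim <- 1000
sampSize <- 40
lambda <- 0.2
set.seed(3)

# rexp() returns noSim * sampSize draws; matrix() fills them column by column.
m <- matrix(rexp(n = noSim * sampSize, rate = lambda), nrow = noSim, ncol = sampSize)

means <- rowMeans(m)               # one average per simulated sample of 40
meansByApply <- apply(m, 1, mean)  # same quantity, computed row by row

dim(m)                             # number of rows and columns
length(means)                      # one mean per row
all.equal(means, meansByApply)     # agreement up to floating-point tolerance
```

rowMeans is the idiomatic choice here; the apply version is shown only to make explicit what is being averaged.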

Create a data frame containing the simulated values together with the row means.

simulationData <- data.frame(cbind(simulationMatrix, simulationMean))

Plot the simulation data.

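# Refer to the column by name inside aes() rather than with simulationData$...
ggplot(data = simulationData, aes(x = simulationMean)) + 
  geom_histogram(breaks = seq(2, 9, by = 0.2), col = "blue", aes(fill = after_stat(count))) + 
  labs(title = "Histogram of Mean Distribution", x = "Simulation Means", y = "Frequency") + 
  # The vertical line is a constant, so it is set outside aes();
  # linewidth needs ggplot2 >= 3.4.0 (older versions use size)
  geom_vline(xintercept = mean(simulationData$simulationMean), color = "red", 
             linetype = "dashed", linewidth = 1)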

Sample Mean Versus Theoretical Mean:

The mean of the 1000 simulated sample means is 4.9866197, calculated by:

actualMean <- mean(simulationMean)

Theoretical mean = 5:

theoreticalMean <- (1 / lambda)

The mean of the simulated sample means (4.9866197) is close to the theoretical mean (5) of the underlying exponential distribution.
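With the values reported above, the gap is 5 - 4.9866197 = 0.0133803, about 0.27% of the theoretical mean. The sketch below shows one way this comparison could be computed, together with the Monte Carlo standard error of the simulated mean as a yardstick for how large a gap to expect. The names absDiff, relDiffPercent, and mcStdError are illustrative additions, not part of the original analysis.

```r
# Illustrative comparison; absDiff, relDiffPercent and mcStdError are hypothetical names.
noSim <- 1000
sampSize <- 40
lambda <- 0.2
set.seed(3)

simulationMean <- rowMeans(matrix(rexp(n = noSim * sampSize, rate = lambda), noSim, sampSize))

actualMean <- mean(simulationMean)
theoreticalMean <- 1 / lambda

absDiff <- abs(actualMean - theoreticalMean)        # absolute gap
relDiffPercent <- 100 * absDiff / theoreticalMean   # gap as a percentage of 1/lambda

# Standard error of the average of noSim sample means
mcStdError <- sd(simulationMean) / sqrt(noSim)

c(actual = actualMean, theoretical = theoreticalMean,
  absDiff = absDiff, relDiffPercent = relDiffPercent, mcStdError = mcStdError)
```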

Sample Variance Versus Theoretical Variance:

The variance of the simulated sample means is 0.6257575, calculated by:

actualVariance <- var(simulationMean)

Theoretical variance of the mean of 40 exponentials = 0.625:

theoreticalVariance <- ((1 / lambda) ^ 2) / sampSize

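Hence the variance of the simulated sample means (0.6257575) is close to the theoretical variance of the sample mean (0.625), which is the variance of the original distribution, \(\frac{1}{\lambda^2}\), divided by the sample size.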
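The theoretical value follows from the independence of the 40 draws. As a short worked derivation, with notation introduced here for illustration (\(\bar{X}_n\) is the mean of \(n = 40\) independent draws \(X_i\), each with variance \(\sigma^2 = \frac{1}{\lambda^2}\), and \(\lambda = 0.2\)):

\[
\operatorname{Var}\!\left(\bar{X}_n\right)
= \frac{1}{n^{2}} \sum_{i=1}^{n} \operatorname{Var}(X_i)
= \frac{\sigma^{2}}{n}
= \frac{(1/\lambda)^{2}}{n}
= \frac{25}{40}
= 0.625
\]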

Distribution:

To show that the distribution is approximately normal, create an approximate normal distribution and study how the sample data aligns with it. Then compare the 95% confidence intervals of the simulated mean sample data and the theoretical normally distributed data, and plot the quantiles.

Step 1: Create an approximate normal distribution and see how the sample data aligns with it.

# ggplot() replaces the deprecated qplot(); after_stat() replaces the ..density.. notation
ggplot(simulationData, aes(x = simulationMean)) + 
  geom_line(aes(y = after_stat(density), colour = 'Empirical'), stat = 'density', linewidth = 1) + 
  # Normal density with the theoretical mean and standard deviation of the sample mean
  stat_function(fun = dnorm, args = list(mean = (1/lambda), sd = ((1/lambda)/sqrt(sampSize))), 
                aes(colour = 'Normal'), linewidth = 1) + 
  geom_histogram(aes(y = after_stat(density), fill = after_stat(density)), alpha = 0.4, 
                 breaks = seq(2, 9, by = 0.2), col = 'red') + 
  scale_fill_gradient("Density", low = "yellow", high = "red") + 
  scale_color_manual(name = 'Density', values = c('brown', 'blue')) + 
  theme(legend.position = c(0.85, 0.60)) + 
  labs(title = "Mean Density Distribution", x = "Simulation Means", y = "Density")

The histogram above shows that the distribution of the simulated sample means is closely approximated by the normal distribution.
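The visual agreement can also be checked numerically. As a minimal sketch, the code below standardizes the simulated means with the theoretical mean and standard deviation and compares the share that falls within 1 and 1.96 standard deviations against the shares a normal distribution would give. The names z, coverage, and nominal are illustrative and not part of the original analysis.

```r
# Illustrative check; z, coverage and nominal are hypothetical names.
noSim <- 1000
sampSize <- 40
lambda <- 0.2
set.seed(3)

simulationMean <- rowMeans(matrix(rexp(n = noSim * sampSize, rate = lambda), noSim, sampSize))

theoreticalMean <- 1 / lambda
theoreticalSd <- (1 / lambda) / sqrt(sampSize)

# Standardize with the theoretical parameters of the sample mean
z <- (simulationMean - theoreticalMean) / theoreticalSd

# Share of simulated means within 1 and 1.96 standard deviations
coverage <- c(within1sd = mean(abs(z) <= 1), within1.96sd = mean(abs(z) <= 1.96))

# Shares expected under an exact normal distribution
nominal <- c(within1sd = pnorm(1) - pnorm(-1), within1.96sd = pnorm(1.96) - pnorm(-1.96))

rbind(simulated = coverage, normal = nominal)
```

If the normal approximation holds, the two rows of the printed table should be close to each other.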

Step 2: Compare the 95% confidence intervals of the simulated mean sample data and the theoretical normally distributed data.

actualConfInterval <- actualMean+c(-1,1)*1.96*sqrt(actualVariance)/sqrt(sampSize)
theoreticalConfInterval <- theoreticalMean+c(-1,1)*1.96*
  sqrt(theoreticalVariance)/sqrt(sampSize)

The 95% interval computed from the simulated values is [4.7414712, 5.2317681], and the interval computed from the theoretical values is [4.755, 5.245]. The two intervals are approximately the same.

Step 3: Q-Q plot for quantiles.

qqnorm(simulationMean)
qqline(simulationMean)

The sample quantiles are almost identical to the theoretical normal quantiles, so we can conclude that the distribution of the sample means is approximately normal.
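The overview contrasts a large collection of raw exponentials with a large collection of averages of 40 exponentials. One hedged way to put a number on that contrast is sample skewness: an exponential distribution has skewness 2, a normal distribution has skewness 0, and the mean of n independent draws has skewness 2/sqrt(n), roughly 0.32 for n = 40. The sketch below uses a moment-based helper, sampleSkewness, which is a hypothetical function written for this illustration and not part of the original analysis.

```r
# Illustrative contrast; sampleSkewness, skewRaw and skewMeans are hypothetical names.
noSim <- 1000
sampSize <- 40
lambda <- 0.2
set.seed(3)

simulationMatrix <- matrix(rexp(n = noSim * sampSize, rate = lambda), noSim, sampSize)
simulationMean <- rowMeans(simulationMatrix)

# Moment-based sample skewness: third central moment over variance^(3/2)
sampleSkewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

skewRaw <- sampleSkewness(as.vector(simulationMatrix))  # all 40000 raw exponentials
skewMeans <- sampleSkewness(simulationMean)             # the 1000 averages of 40

c(raw = skewRaw, means = skewMeans, theoreticalMeans = 2 / sqrt(sampSize))
```

A raw skewness near 2 alongside a much smaller skewness for the averages is consistent with the averages being far closer to normal than the individual draws.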