Compare Exponential Distribution with Central Limit Theorem

Overview

In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(\(n\), \(\lambda\)), where \(\lambda\) is the rate parameter. The mean of the exponential distribution is \(1/\lambda\), and its standard deviation \(\sigma\) is also \(1/\lambda\). We set \(\lambda\) = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials.
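As a quick sanity check of the stated mean and standard deviation, the following sketch draws a large sample with rexp and compares the empirical values with \(1/\lambda\). The sample size of 100000 and the seed are arbitrary choices for illustration, not part of the project's simulation settings.

```r
# Illustrative check; sample size and seed are arbitrary assumptions
set.seed(1)
lambda <- 0.2
x <- rexp(1e5, rate = lambda)

# Both values should be close to 1 / lambda = 5
check <- c(mean = mean(x), sd = sd(x))
print(check)
```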

Simulations

First, we simulate 1000 random samples of 40 exponentials each, with \(\lambda\) = 0.2.

## Set up parameters for the exponentials
lambda <- 0.2   # rate parameter
nosim <- 1000   # number of simulated samples
n <- 40         # size of each sample
## Set seed for reproducibility
set.seed(12345)
## Draw all nosim * n values in one call
samp <- data.frame(x = rexp(nosim * n, lambda))

Next, we calculate the sample mean \(\bar{x}\) of each sample of size n = 40:

## Arrange the draws as a nosim x n matrix (one sample per row) and average each row
samp_means <- data.frame(x = apply(matrix(samp$x, nrow = nosim), 1, mean))
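The same design can also be written so that each sample is drawn and averaged in one step, which makes the "one mean per sample of 40" structure explicit. This is an alternative sketch in base R, not the code used for the results in this report. Because it consumes the random numbers in a different order, its values will differ slightly from those reported, even with the same seed. The name means_alt is introduced here for illustration.

```r
set.seed(12345)
lambda <- 0.2
nosim <- 1000
n <- 40

# Draw n exponentials and average them, repeated nosim times
means_alt <- replicate(nosim, mean(rexp(n, lambda)))

length(means_alt)  # one mean per simulated sample
mean(means_alt)    # should be near 1 / lambda
```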

We will analyze the simulated samples to:
1. compare the sample mean with the theoretical mean of the distribution.
2. measure how variable the sample means are (via their variance) and compare this with the theoretical variance of the distribution.
3. show that the distribution of sample means is approximately normal.

Analysis

Sample Mean

Consider the distribution of the mean of 40 exponentials. Because the sample mean is an unbiased and consistent estimator, this distribution is centered at the population mean of the underlying exponential distribution. The theoretical mean of this distribution is therefore \(\mu\) = 1/\(\lambda\) = 5.
We calculate the simulated sample mean:

sm <- mean(samp_means$x)
sm

## [1] 4.971972

This is very close to the theoretical mean of the exponential distribution, \(\mu\) = 5.
Next, we plot the density of the sample means (Figure 1; code in the Appendix).
In the figure, the red line marks the average of the sample means of 40 exponential random variables with rate \(\lambda\) = 0.2; it lies very close to the theoretical mean of the exponential distribution, \(\mu\) = 5.
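To put a number on "very close", one can compare the gap between the simulated and theoretical means with the standard error of an average of 1000 sample means, which is \(\sigma/\sqrt{n \cdot 1000}\). The sketch below is an added illustration. It hard-codes the simulated mean reported in this section so that it runs on its own, and the names se and gap_in_se are hypothetical.

```r
lambda <- 0.2
n <- 40
nosim <- 1000
sm <- 4.971972  # simulated mean reported in this section

# Standard error of the average of nosim means, each based on n draws
se <- (1 / lambda) / sqrt(n * nosim)

# Gap between simulated and theoretical mean, in standard-error units
gap_in_se <- (sm - 1 / lambda) / se
print(c(se = se, gap_in_se = gap_in_se))
```

Here the gap is a little over one standard error, which is the size of deviation that ordinary sampling variation would be expected to produce.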

Sample Variance

The theoretical value of the variance of the distribution of sample means of size n is:
\(\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{1}{40\lambda^2} = 0.625\)
We calculate the simulated variance of the sample means:

var(samp_means$x)

## [1] 0.6157926

The variance of the sample means is close to, but not exactly equal to, the theoretical variance.
We can repeat the simulation 100 times and plot the distribution of the resulting sample means and sample variances (Figure 2; code in the Appendix).
The mean of the sample means is 4.9999331 and the mean of the sample variances is 0.6246563, both of which are very close to the theoretical mean (5) and variance (0.625).
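The theoretical variance quoted at the start of this section follows from the independence of the 40 draws. As a short derivation, with \(X_1, \dots, X_n\) denoting the iid exponentials in one sample (notation introduced here for illustration):

```latex
\[
\mathrm{Var}(\bar{X})
  = \mathrm{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
  = \frac{n\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
  = \frac{1}{n\lambda^2}
  = \frac{1}{40 \times 0.2^2}
  = 0.625
\]
```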

Distribution

The Central Limit Theorem (CLT) states that the distribution of means of independent and identically distributed (iid) variables approaches a normal distribution as the sample size increases; equivalently, the standardized mean approaches a standard normal.
Using the simulated sample mean from the Sample Mean section, we test the null hypothesis \(H_0\): the averages of 40 exponentials are normally distributed with mean \(\mu\) = 1/\(\lambda\) and standard deviation \(\sigma/\sqrt{n}\) = 1/(\(\lambda\sqrt{40}\)).
The test statistic \(Z\) and p-value are:

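lambda <- 0.2
n <- 40
mu <- 1 / lambda      # theoretical mean
sigma <- 1 / lambda   # theoretical standard deviation of one exponential

## Two-sided z-test of the simulated mean sm against mu
Z <- abs((sm - mu) / (sigma / sqrt(n)))
p <- 2 * pnorm(Z, lower.tail = FALSE)
c(Z, p)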

## [1] 0.03545297 0.97171855

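The high p-value means that we fail to reject the null hypothesis: the simulated mean is consistent with a normal distribution centered at \(\mu\) = 5 with standard deviation \(\sigma/\sqrt{n}\) ≈ 0.79. Failing to reject does not prove the null, and this test checks only the location of the distribution, so the result supports, but does not by itself establish, that the distribution of sample means is approximately normal.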
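Because a test on the mean says little about shape, a more direct check of normality is to standardize the simulated means and compare them with standard normal benchmarks. The sketch below is an added illustration under the same settings as the report (\(\lambda\) = 0.2, n = 40, 1000 simulations, seed 12345). It regenerates the means so that it runs on its own, and the names means, z, coverage, and qq are introduced here.

```r
set.seed(12345)
lambda <- 0.2
n <- 40
nosim <- 1000

# One row per simulated sample of n exponentials
means <- rowMeans(matrix(rexp(nosim * n, lambda), nrow = nosim))

# Standardize with the theoretical mean and standard error
z <- (means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Share of standardized means within +/- 1.96 (about 0.95 under normality)
coverage <- mean(abs(z) <= 1.96)

# Empirical quantiles next to standard normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
qq <- cbind(empirical = quantile(z, probs), normal = qnorm(probs))
print(coverage)
print(round(qq, 3))
```

If the normal approximation is good, coverage should be near 0.95 and the two quantile columns should roughly agree. A mild right skew inherited from the exponential is to be expected at n = 40.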

Appendix - R code for graphs

R code for Figure 1

library(ggplot2)
## Requires samp_means and sm from the sections above.
## after_stat() and linewidth assume ggplot2 >= 3.4.0.
ggplot(samp_means, aes(x = x)) +
        geom_histogram(
                mapping = aes(y = after_stat(density)),
                fill = "blue",
                color = "blue",
                alpha = 0.2,
                bins = 50) +
        ## Red line: mean of the simulated sample means
        geom_vline(xintercept = sm, linewidth = 1.5, color = "red") +
        labs(
                title = "Fig. 1: Distribution of Sample Mean",
                x = "Value of Sample Mean (n=40)",
                y = "Density"
        ) +
        theme(legend.position = "none", plot.title = element_text(hjust = 0.5))

R code for Figure 2

library(ggplot2)
library(gridExtra)
set.seed(12345)
nosim  <- 1000
n      <- 40
lambda <- 0.2  # rate parameter
## Not used below, but this draw advances the random number stream;
## removing it would change the simulated values.
data  <- data.frame(x = rexp(nosim * n, lambda))

## Repeat the simulation 100 times, keeping the mean and variance
## of the nosim sample means from each repetition
sample.mean <- NULL
sample.var  <- NULL
for (i in 1:100) {
        x <- apply(matrix(rexp(nosim * n, lambda), nosim), 1, mean)
        sample.mean <- c(sample.mean, mean(x))
        sample.var  <- c(sample.var, var(x))
}
dat <- data.frame(sample.mean, sample.var)
## linewidth assumes ggplot2 >= 3.4.0
plot1 <- ggplot(dat, aes(sample.mean)) + geom_density() +
        geom_vline(xintercept = mean(dat$sample.mean), linewidth = 1, color = "red")
plot2 <- ggplot(dat, aes(sample.var)) + geom_density() +
        geom_vline(xintercept = mean(dat$sample.var), linewidth = 1, color = "red")
grid.arrange(plot1, plot2, nrow = 2, top = "Fig. 2: Distribution of Sample Mean and Sample Variance")