Exponential Distribution and Central Limit Theorem

Simulations

We first define the variables to be used in the simulation, in this case the number of exponentials, the number of simulations, and the lambda value.

n <- 40
nosim <- 1000
lambda <- 0.2

We then simulate the mean of 40 exponentials, repeat 1000 times, and collect the distribution in a data frame. We are now ready to do our analysis.

xmean <- replicate(nosim, mean(rexp(n, lambda)))
size <- factor(rep(n, nosim))
dat <- data.frame(xmean,size)
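
Because the draws are random, the values printed in the following sections change from run to run. A minimal sketch of a reproducible variant is shown here, assuming an arbitrary seed; the seed value 2015 and the names `xmean_seeded` and `xmean_again` are illustrative and are not the ones behind the reported output.

```r
# Fix the random number generator state so that reruns give identical draws.
# The seed value is arbitrary (an assumption for illustration only).
set.seed(2015)

n <- 40          # number of exponentials per average
nosim <- 1000    # number of simulated averages
lambda <- 0.2    # rate parameter of the exponential distribution

xmean_seeded <- replicate(nosim, mean(rexp(n, lambda)))

# Resetting the same seed and rerunning reproduces the same vector.
set.seed(2015)
xmean_again <- replicate(nosim, mean(rexp(n, lambda)))
print(identical(xmean_seeded, xmean_again))
```

Setting the seed once at the top of the analysis would be enough to make every later number repeatable.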

Sample Mean versus Theoretical Mean

The theoretical mean of an exponential distribution with rate lambda is 1/lambda. We compare it to the mean of the distribution of averages that we created from the simulation.

tmean <- 1/lambda
print(tmean)

## [1] 5

smean <- mean(dat$xmean)
print(smean)

## [1] 5.003922

The two means are nearly identical. In the figure in the Distribution section, the theoretical mean is drawn as a vertical black line and the sample mean as a vertical red line, and the two nearly coincide. This is consistent with the CLT, which states that the distribution of averages of iid variables approaches a normal distribution centered at the population mean as the sample size grows (and a standard normal once the averages are centered and scaled). The exponential distribution is no exception.
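
One way to quantify how close the two means are is to express their difference in units of the standard error of the simulated mean. The sketch below uses the sample mean reported above as a hard-coded value; the names `smean_reported`, `se_of_smean` and `z` are introduced here for illustration only.

```r
lambda <- 0.2
n <- 40
nosim <- 1000

tmean <- 1 / lambda
smean_reported <- 5.003922   # sample mean printed above

# Each simulated average has variance (1/lambda)^2 / n, so the mean of
# nosim such averages has standard error sqrt((1/lambda)^2 / n / nosim).
se_of_smean <- sqrt((1 / lambda)^2 / n / nosim)
print(se_of_smean)   # 0.025

# Difference between sample and theoretical mean, in standard errors.
z <- (smean_reported - tmean) / se_of_smean
print(z)
```

A value of `z` well inside the range from -2 to 2 indicates that the gap is the size one would expect from simulation noise alone.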

Sample Variance versus Theoretical Variance

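The standard deviation of the exponential distribution is also 1/lambda, so its variance is (1/lambda)^2. The variance of the mean of n such variables is that value divided by n, and this is the theoretical variance we compare to the variance of the simulated means.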

tvar <- ((1/lambda)^2)/n
print(tvar)

## [1] 0.625

svar <- var(dat$xmean)
print(svar)

## [1] 0.6006893

The two variances are close (about 0.601 versus 0.625). This is again consistent with the CLT for the exponential distribution at this sample size. In the figure in the Distribution section, the normal curve almost overlays the density plot of the simulated means, which qualitatively shows the similarity of the variances.
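
The theoretical value of 0.625 can be traced step by step. The derivation below is a sketch in notation introduced here, not in the original analysis: $X_1, \dots, X_n$ are the iid exponential draws, $\bar{X}_n$ is their average, and $\sigma = 1/\lambda$ is the standard deviation of a single draw.

```latex
\[
\operatorname{Var}(\bar{X}_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
\]
```

The second equality relies on the independence of the draws, which is what allows the variance of the sum to be split into a sum of variances.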

Distribution

The histogram of the simulated means is shown below. The black curve represents the theoretical normal distribution, whereas the red dashed curve is the density plot of the simulated distribution. Since the two curves are very close, we can conclude that the distribution of the means is approximately normal.

library(ggplot2)

ggplot(dat, aes(x = xmean, fill = size)) + 
  geom_histogram(alpha = .50, binwidth = .2, colour = "black", aes(y = ..density..)) + 
  # theoretical normal density (solid black curve)
  stat_function(fun = dnorm, args = list(mean = tmean, sd = sqrt(tvar))) +
  # kernel density of the simulated means (red dashed curve)
  geom_density(aes(x = xmean), inherit.aes = FALSE, linetype = "dashed", colour = "red") +
  # theoretical mean (black) and sample mean (red)
  geom_vline(xintercept = tmean) +
  geom_vline(xintercept = smean, colour = "red")
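
A normal Q-Q plot is another common way to judge approximate normality, complementing the overlaid density curves. The following is a minimal sketch using base R graphics; it regenerates the simulated means with an arbitrary seed, so its points will not match the figure above exactly, and the names `xmean_qq`, `qq` and `qq_cor` are illustrative.

```r
set.seed(1)   # arbitrary seed, chosen only for repeatability

# Regenerate 1000 averages of 40 exponentials with rate 0.2.
xmean_qq <- replicate(1000, mean(rexp(40, 0.2)))

# qqnorm() draws the plot and invisibly returns the theoretical (x)
# and sample (y) quantiles; qqline() adds a reference line.
qq <- qqnorm(xmean_qq, main = "Normal Q-Q plot of simulated means")
qqline(xmean_qq, col = "red")

# Correlation between theoretical and sample quantiles; values near 1
# indicate that the points lie close to a straight line.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```

Points that bend away from the line in the upper tail would reflect the right skew that the averages inherit from the exponential distribution.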

To show the CLT at work, we first plot a histogram of 40 random exponentials, which is clearly far from a normal distribution.

hist(rexp(40,0.2))

Now compare it to the histogram of 1000 averages of 40 random exponentials, which is much closer to a normal distribution.

mns <- replicate(1000, mean(rexp(40, 0.2)))
hist(mns)
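
The effect of the sample size can also be seen numerically. The sketch below compares the skewness of simulated averages for several sample sizes with the value 2/sqrt(n), which is the skewness of a mean of n iid exponentials (their sum follows a gamma distribution with shape n). The helper `skewness`, the seed, and the chosen sizes are assumptions made for illustration.

```r
set.seed(42)   # arbitrary seed, chosen only for repeatability
lambda <- 0.2
nosim <- 1000

# Moment-based sample skewness (hypothetical helper, not from a package).
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

sizes <- c(1, 5, 40, 200)

# For each sample size k, simulate nosim averages of k exponentials
# and measure how skewed their distribution is.
skew_by_n <- sapply(sizes, function(k) {
  skewness(replicate(nosim, mean(rexp(k, lambda))))
})

result <- data.frame(n = sizes,
                     simulated = skew_by_n,
                     theoretical = 2 / sqrt(sizes))
print(result)
```

A normal distribution has zero skewness, so the shrinking values in both columns show the averages moving toward normality as n grows.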

Marowen Ng

Thursday, June 18, 2015

Conclusion

The Central Limit Theorem states that, as the sample size grows, the distribution of averages of iid variables becomes closer to a normal distribution. Through our simulation, we have demonstrated that this is also the case for the exponential distribution.