Overview

This project investigates the exponential distribution in R and compare it with the Central Limit Theorem. Properties of the distribution of the mean of n exponentials are illustrated via simulation.

Simulations

We will simulate exponential distribution with rexp(n, lambda), where 1/lambda equals the mean and also the standard deviation.

First, we set the required parameter, lambda.

#lambda is set to 0.2
lambda <- 0.2

Let’s fix the sample size and number of simulations as well.

#Sample size is set to 40
n <- 40
#Doing 1000 simulations
nsims <- 1000

To see the distribution of the sample mean, let’s do n exponentials, take the mean, and repeat it for nsims times.

mns = NULL

# Simulate data
for (i in 1 : nsims) mns = c(mns, mean(rexp(n, lambda)))

We know that (theoretically) the sample means are centered at the population mean

1/lambda
## [1] 5

, with standard deviation of

1/lambda * 1/sqrt(n)
## [1] 0.7905694

Let’s verify that it’s approximately centered at 1/lambda = 5 and has variance of (close to) 1/lambda * 1/sqrt(n) = 0.79

mean(mns)
## [1] 5.018998
sd(mns)
## [1] 0.8096596

Next, we convert the data to standard normal and overlay a Gaussian plot over the histogram.

#Convert to standard normal
mns <- sqrt(n) * (mns - 1/lambda) * lambda 

#Plotting
library(ggplot2)
g <- ggplot(data.frame(X = mns), aes(x = X))
g <- g + geom_histogram(alpha = .20, binwidth=.3, colour = "black", aes(y = ..density..))
g <- g + stat_function(fun = dnorm, size = 2)
g

As evident from the chart, the distribution of sample means is approximately Gaussian, as predicted by the Central Limit Theorem.

Appendix

Sample Mean versus Theoretical Mean

We can begin by simulating 1000 exponentials.

pexp <- rexp(1000, lambda)
hist(pexp, main = paste("Histogram of", 1000, "exponentials"))

The mean of the above sample is

mean(pexp)
## [1] 4.961201

The sample mean tries to estimate the population mean. With 1000 observations, it should be quite close to the theoretical mean which is given by

1/lambda
## [1] 5

According to the Law of Large Numbers, the estimated mean will more likely be closer to the theoretical mean as the sample size increases. Let’s test this with 10000 simulations.

library(ggplot2)
ntest <- 10000
means <- cumsum(rexp(ntest, lambda))/(1:ntest)
g <- ggplot(data.frame(x = 1:ntest, y = means), aes(x = x, y = y))
g <- g + geom_hline(yintercept = 0) + geom_line(size = 2)
g <- g + labs(x = "Sample size", y = "Cumulative mean")
g

The sample mean converges to the theoretical mean with increasing sample size as predicted by the LLN.

Sample Variance versus Theoretical Variance

Since the exponential distribution has theoretical standard deviation that is equal to the theoretical mean, the variance is

1/lambda * 1/lambda
## [1] 25

Recall our first simulation above stored in pexp. This has variance of

var(pexp)
## [1] 25.77046

Again, with 1000 observations, we know from the Law of Large Numbers that it should be quite close to the theoretical variance. A plot of the cumulative variance as sample size increases will illustrate this clearly.

ntest <- 10000
sim <- rexp(ntest, lambda)
vars <- NULL
for(i in 1:ntest) vars <- c(vars,var(sim[1:i]))

g <- ggplot(data.frame(x = 2:ntest, y = vars[2:ntest]), aes(x = x, y = y))
g <- g + geom_hline(yintercept = 0) + geom_line(size = 2)
g <- g + labs(x = "Sample size", y = "Cumulative variance")
g

Again, we see that the sample variance converges to the theoretical variance of 25 as sample size increases. This agrees with the Law of Large Numbers.