Overview

In this project we investigate the exponential distribution and compare it with the predictions of the Central Limit Theorem. In particular, we examine the distribution of averages of 40 exponentials.

Simulations

The mean of the exponential distribution is \(\frac{1}{\lambda}\) and the standard deviation is also \(\frac{1}{\lambda}\).
So with \(\lambda = 0.2\) the theoretical mean is 5 and the variance is \(\frac{1}{\lambda^2} = 25\). Let’s check it. (See code in Appendix 1)

The mean (red line) of this distribution is 4.996 and the variance is 24.96, both very close to the theoretical values of 5 and 25.
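
The comparison above can also be made numerically, without the plot. The following is a minimal sketch, not the code used for the report: the seed and the smaller number of draws are assumptions made so that it runs quickly, and the exact digits will differ from those quoted.

```r
# Compare simulated moments of Exp(lambda) with the theoretical ones.
set.seed(1)            # assumption: seed chosen only for reproducibility
lambda <- 0.2
nosim  <- 10^5         # assumption: fewer draws than the 10^6 in Appendix 1

x <- rexp(nosim, rate = lambda)

theoretical <- c(mean = 1 / lambda, variance = 1 / lambda^2)
simulated   <- c(mean = mean(x),    variance = var(x))

# One row per source, one column per moment
print(rbind(theoretical, simulated))
```

Both rows should agree to within simulation noise, which shrinks as `nosim` grows.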

Sample Mean versus Theoretical Mean

Now consider the distribution of 1 000 000 averages of 40 exponentials. In accordance with the Law of Large Numbers (LLN), we expect this distribution to be centered around \(\frac{1}{\lambda}\), so the mean of the sample averages should be close to 5. (See code in Appendix 2)

The mean of this distribution is 5.0002, very close to the theoretical value of 5.
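
To see the LLN at work, one can track how the mean of the averages settles as more averages are included. This is an illustrative sketch only; the seed, the reduced total of 10^5 averages and the names `counts` and `running` are assumptions, not part of the original analysis.

```r
# Mean of the first k sample averages for growing k.
set.seed(2)            # assumption: seed chosen only for reproducibility
lambda      <- 0.2
sample_size <- 40
total       <- 10^5    # assumption: fewer averages than the 10^6 in Appendix 2

# Each row is one sample of 40 exponentials; rowMeans() gives its average
averages <- rowMeans(matrix(rexp(total * sample_size, rate = lambda),
                            nrow = total))

counts  <- 10^(1:5)
running <- sapply(counts, function(k) mean(averages[seq_len(k)]))

print(data.frame(n_averages       = counts,
                 mean_of_averages = running,
                 abs_error        = abs(running - 1 / lambda)))
```

The absolute error tends to shrink as `n_averages` grows, although not necessarily at every step.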

Sample Variance versus Theoretical Variance

We know that the sample variance is an unbiased estimator of the population variance. So the distribution of 1 000 000 variances of 40 exponentials should be centered around \(\frac{1}{\lambda^2} = 25\). Let’s check it. (See code in Appendix 3)

The mean of this distribution is 25.004, very close to the theoretical value of 25.
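
The centering at 25 relies on `var()` dividing by \(n - 1\). A small sketch, under the assumption of a fixed seed and only 10^4 simulations, contrasts this with the version that divides by \(n\), whose expected value is \(25 \times \frac{39}{40} = 24.375\). The names `unbiased` and `biased` are introduced here for illustration.

```r
# Sample variance with denominator n - 1 versus denominator n.
set.seed(3)            # assumption: seed chosen only for reproducibility
lambda      <- 0.2
sample_size <- 40
nosim       <- 10^4    # assumption: fewer simulations than in Appendix 3

samples <- matrix(rexp(nosim * sample_size, rate = lambda), nrow = nosim)

unbiased <- apply(samples, 1, var)                       # var() divides by n - 1
biased   <- unbiased * (sample_size - 1) / sample_size   # same sums, divided by n

print(c(theoretical   = 1 / lambda^2,
        mean_unbiased = mean(unbiased),
        mean_biased   = mean(biased)))
```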

Distribution of sample averages

The Central Limit Theorem (CLT) states that the distribution of averages of iid variables, properly normalized, becomes that of a standard normal as the sample size increases. So we expect that the distribution of 1 000 000 averages of 40 exponentials will be approximately normal with mean \(\frac{1}{\lambda} = 5\) and variance \(\frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625\): \[\bar X_n \sim N(\mu, \sigma^2 / n) = N(5, 0.625)\]
Let’s check it. (See code in Appendix 4)
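
The variance \(\sigma^2 / n\) of the averages follows from independence. As a short worked derivation, for iid \(X_1, \dots, X_n\) with variance \(\sigma^2 = \frac{1}{\lambda^2}\):

\[\operatorname{Var}(\bar X_n) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625\]

so the standard deviation of the averages is \(\sqrt{0.625} \approx 0.79\).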

The mean of this distribution is 5.0003 and the variance is 0.6254, both very close to the theoretical values.
The blue line is the obtained distribution; the green line is the normal distribution with mean 5 and variance 0.625, N(5, 0.625). The obtained distribution is slightly asymmetric relative to the normal: its peak lies a little to the left of 5 and its right tail is longer (a small positive skew).
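
The size of this asymmetry can be estimated without simulation. An average of \(n\) iid exponentials with rate \(\lambda\) follows a Gamma distribution with shape \(n\) and rate \(n\lambda\); this is a standard result that the text above does not state, so the sketch below is an addition rather than part of the original analysis. The variable names are illustrative.

```r
# Exact Gamma distribution of the average versus its normal approximation.
lambda <- 0.2
n      <- 40
mu         <- 1 / lambda                 # mean of the averages
sigma_mean <- 1 / (lambda * sqrt(n))     # sd of the averages, sqrt(0.625)

# Properties of Gamma(shape = n, rate = n * lambda)
mode_exact <- (n - 1) / (n * lambda)     # mode is (shape - 1) / rate
skewness   <- 2 / sqrt(n)                # skewness is 2 / sqrt(shape)

# Probability of exceeding the mean by more than two standard deviations
upper       <- mu + 2 * sigma_mean
tail_exact  <- pgamma(upper, shape = n, rate = n * lambda, lower.tail = FALSE)
tail_normal <- pnorm(upper, mean = mu, sd = sigma_mean, lower.tail = FALSE)

print(c(mode_exact  = mode_exact,
        skewness    = skewness,
        tail_exact  = tail_exact,
        tail_normal = tail_normal))
```

The mode evaluates to 4.875, slightly left of 5, and the skewness to about 0.32, so the right tail of the exact distribution is heavier than that of the normal curve.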

Appendix 1

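library(ggplot2)  # linewidth and after_stat() below need ggplot2 >= 3.4.0

lambda <- 0.2   # rate of the exponential distribution
nosim <- 10^6   # number of simulations

# Histogram of the simulated exponentials with the theoretical density (blue)
# and the sample mean (red dashed line)
data <- data.frame(x = rexp(nosim, lambda))
g <- ggplot(data, aes(x = x)) +
    geom_histogram(alpha = .4, binwidth = 1, colour = "black",
                   aes(y = after_stat(density))) +
    scale_x_continuous(breaks = seq(from = 0, to = 100, by = 5),
                       name = 'random exp') +
    stat_function(fun = dexp, args = list(rate = lambda),
                  linewidth = 1, colour = "blue") +
    # a constant xintercept draws one line instead of one per data row
    geom_vline(xintercept = mean(data$x), linewidth = 1,
               colour = "red", linetype = "longdash") +
    ggtitle('Distribution of 1 000 000 random exponentials with lambda=0.2')
print(g)
# Sample mean and variance of the simulated values
c(mean = mean(data$x), variance = var(data$x))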

Appendix 2

sample_size <- 40
# One row per simulation; rowMeans() gives the average of each sample of 40
data <- data.frame(x = rowMeans(matrix(rexp(nosim * sample_size, lambda),
                                       nosim)))
g <- ggplot(data, aes(x = x)) +
    geom_histogram(alpha = .4, binwidth = .1, colour = "black",
                   aes(y = after_stat(density))) +
    scale_x_continuous(breaks = seq(from = 0, to = 100, by = .5),
                       name = 'average of 40 exponentials') +
    stat_density(linewidth = 1, adjust = 1, geom = "line", colour = "blue") +
    # a constant xintercept draws one line instead of one per data row
    geom_vline(xintercept = mean(data$x), linewidth = 1,
               colour = "red", linetype = "longdash") +
    ggtitle('Distribution of 1 000 000 averages of 40 exponentials with lambda=0.2')
print(g)
mean(data$x)  # mean of the sample averages

Appendix 3

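# One row per simulation; var() gives the sample variance of each sample of 40
data <- data.frame(x = apply(matrix(rexp(nosim * sample_size, lambda), nosim),
                             1, var))
g <- ggplot(data, aes(x = x)) +
    geom_histogram(alpha = .4, binwidth = 1, colour = "black",
                   aes(y = after_stat(density))) +
    scale_x_continuous(breaks = seq(from = 0, to = 100, by = 5),
                       name = 'variance of 40 exponentials') +
    stat_density(linewidth = 1, adjust = 1, geom = "line", colour = "blue") +
    # a constant xintercept draws one line instead of one per data row
    geom_vline(xintercept = mean(data$x), linewidth = 1,
               colour = "red", linetype = "longdash") +
    # zoom to [0, 100] without dropping the values above 100 from the estimates
    coord_cartesian(xlim = c(0, 100)) +
    ggtitle('Distribution of 1 000 000 variances of 40 exponentials with lambda=0.2')
print(g)
mean(data$x)  # mean of the sample variances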

Appendix 4

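# One row per simulation; rowMeans() gives the average of each sample of 40
data <- data.frame(x = rowMeans(matrix(rexp(nosim * sample_size, lambda),
                                       nosim)))
g <- ggplot(data, aes(x = x)) +
    geom_histogram(alpha = .4, binwidth = .1, colour = "black",
                   aes(y = after_stat(density))) +
    scale_x_continuous(breaks = seq(from = 0, to = 100, by = .5),
                       name = 'average of 40 exponentials') +
    stat_density(linewidth = 1, adjust = 1, geom = "line", colour = "blue") +
    # normal density N(5, 0.625) predicted by the CLT (green)
    stat_function(fun = dnorm,
                  args = list(mean = 1 / lambda,
                              sd = 1 / (lambda * sqrt(sample_size))),
                  linewidth = 1, colour = "green") +
    # a constant xintercept draws one line instead of one per data row
    geom_vline(xintercept = mean(data$x), linewidth = 1,
               colour = "red", linetype = "longdash") +
    ggtitle('Distribution of 1 000 000 averages of 40 exponentials with lambda=0.2')
print(g)
# Sample mean and variance of the averages
c(mean = mean(data$x), variance = var(data$x))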