Overview

In this report, we simulate the Central Limit Theorem (CLT) applied to the exponential distribution. The theorem states that the distribution of averages of iid variables, properly normalized, approaches a standard normal distribution as the sample size increases. We generate samples and show the behaviour described by the theorem.
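In symbols, the statement above is the classical (Lindeberg–Lévy) form of the theorem. Here \(X_1, \dots, X_n\) are iid with mean \(\mu\) and variance \(0 < \sigma^2 < \infty\), which holds for the exponential distribution:

```latex
\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty
```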

Simulations

First, as is good practice in statistical reports that rely on random number generation, we set the random seed so that the results are reproducible.

set.seed(1234)

Next, we set the parameters required for this report. We calculate 1000 averages of groups of n = 40 exponentially distributed random numbers. As stated in the project instructions, we use a rate of lambda = 0.2. These variables are used throughout the report.

repetitions <- 1000
n <- 40
lambda <- 0.2

Before studying the Central Limit Theorem, we look at the exponential distribution itself through a histogram of 1000 samples drawn from an exponential distribution with lambda = 0.2 (figure available in the Appendix).

The frequencies are highest near 0 and decay exponentially as the values grow, which gives the distribution its strongly right-skewed shape.
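To make "decay exponentially" concrete, the sketch below evaluates the exponential density \(f(x) = \lambda e^{-\lambda x}\) on an evenly spaced grid with base R's `dexp`, whose second argument is the rate \(\lambda\), not the mean. The grid of points is an arbitrary choice for illustration.

```r
# Evaluate the Exp(rate) density f(x) = rate * exp(-rate * x) on an evenly spaced grid
rate <- 0.2                 # same rate as the report; dexp() expects a rate, not a mean
grid <- seq(0, 20, by = 5)  # arbitrary illustration points
dens <- dexp(grid, rate)

# Ratio between consecutive density values: each step of 5 units
# multiplies the density by exp(-rate * 5) = exp(-1)
ratios <- dens[-1] / dens[-length(dens)]

print(round(dens, 4))
print(round(ratios, 4))
```

The density equals \(\lambda = 0.2\) at zero, and every ratio equals \(e^{-1} \approx 0.368\): the density shrinks by the same factor over every interval of equal length.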

Now that we have seen how the exponential distribution behaves, we proceed with the study of the Central Limit Theorem. At this point, we generate 1000 means of groups of 40 numbers drawn from an exponential distribution with lambda = 0.2.

# Each of the `repetitions` evaluations averages n draws from Exp(lambda)
means <- replicate(repetitions, mean(rexp(n, lambda)))
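The simulation above calls `rexp` once per group. The same kind of simulation can also be written with a single call to `rexp`, arranging the draws in a matrix with one group of \(n\) per row and averaging each row with `rowMeans`. The helper `simulate_means` is a hypothetical name used only for this sketch; it consumes its own random draws, so its output will differ from the report's `means`.

```r
# Hypothetical helper: simulate `reps` averages of `n` draws from Exp(rate)
simulate_means <- function(reps, n, rate) {
  # One vectorized call generates all reps * n draws
  draws <- rexp(reps * n, rate)
  # byrow = TRUE puts n consecutive draws in each row (one row per group)
  groups <- matrix(draws, nrow = reps, ncol = n, byrow = TRUE)
  rowMeans(groups)
}

# Usage with the report's parameters: 1000 averages of 40 draws, rate 0.2
sim <- simulate_means(1000, 40, 0.2)
summary(sim)
```

Both forms are fine at this scale; the matrix form mainly avoids calling the generator once per group.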

As the histogram in the Distribution section shows, the means appear to follow a normal distribution. Theoretically, if the original distribution has population mean \(\mu\) and variance \(\sigma^2\), then for large \(n\) the average of groups of \(n\) members, \(\bar{X}_n\), is approximately distributed as \(N(\mu, \frac{\sigma^2}{n})\). Since an exponentially distributed random variable with rate \(\lambda\), \(X \sim Exp(\lambda)\), has expected value \(E[X] = \frac{1}{\lambda}\) and variance \(Var(X) = \sigma^2 = \frac{1}{\lambda^2}\), the resulting distribution should be approximately \(N(\frac{1}{\lambda}, \frac{1}{\lambda^2 \cdot n})\). We store these values in variables for later use.
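The two moments quoted above can be checked numerically. The sketch below integrates \(x f(x)\) and \(x^2 f(x)\) over \([0, \infty)\) with base R's `integrate` and then applies \(Var(X) = E[X^2] - E[X]^2\). It is a sanity check added here, not part of the original analysis.

```r
rate <- 0.2  # the report's lambda

# E[X]: integral of x * f(x) over [0, Inf)
ex <- integrate(function(x) x * dexp(x, rate), lower = 0, upper = Inf)$value
# E[X^2], then Var(X) = E[X^2] - E[X]^2
ex2 <- integrate(function(x) x^2 * dexp(x, rate), lower = 0, upper = Inf)$value
vx <- ex2 - ex^2

# Should be close to 1/rate = 5 and 1/rate^2 = 25
print(c(mean = ex, variance = vx))
```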

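# Mean and standard deviation of Exp(lambda): both equal 1/lambda
org_mean <- 1/lambda
org_sd <- 1/lambda

# Mean and standard deviation predicted by the CLT for averages of n draws
sim_mean <- org_mean
sim_sd <- org_sd / sqrt(n)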
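With the report's values \(\lambda = 0.2\) and \(n = 40\), these quantities (`org_mean`, `org_sd`, `sim_sd` and its square) evaluate as follows:

```latex
\mu = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad
\sigma = \frac{1}{\lambda} = 5, \qquad
\frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791, \qquad
\frac{\sigma^2}{n} = \frac{25}{40} = 0.625
```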

Sample Mean versus Theoretical Mean

In this section we compare the sample mean, i.e. the average of the 1000 simulated means, with the theoretical mean, i.e. the value derived from the definition of the exponential distribution and the Central Limit Theorem.

sample_mean <- mean(means)
theore_mean <- sim_mean

The sample mean is 4.974 and the theoretical mean is 5. As the plot shows, the two values are very close to each other, and the histogram is centered around them.
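One way to quantify "very close", which is an addition and not part of the original analysis, is to compare the gap with the standard error of an average of 1000 simulated means, each having standard deviation \(\sigma/\sqrt{n} \approx 0.791\). Using the rounded sample mean reported above:

```latex
\mathrm{SE} = \frac{\sigma/\sqrt{n}}{\sqrt{1000}} \approx \frac{0.791}{31.62} \approx 0.025,
\qquad
\frac{|4.974 - 5|}{\mathrm{SE}} \approx \frac{0.026}{0.025} \approx 1.0
```

A gap of roughly one standard error is well within ordinary simulation noise.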

Sample Variance versus Theoretical Variance

In this section we compare the sample variance, i.e. the variance of the 1000 simulated means, with the theoretical variance \(\frac{\sigma^2}{n} = \frac{1}{\lambda^2 n}\), the value derived from the definition of the exponential distribution and the Central Limit Theorem.

sample_var <- var(means)
theore_var <- sim_sd^2

The sample variance (0.571) and the theoretical variance (0.625) are quite close. By the Law of Large Numbers, the sample variance of the simulated means converges to the theoretical value as the number of simulated averages (here 1000) grows.
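This convergence can be observed by rerunning the simulation with more repetitions while keeping \(n = 40\) fixed. The helper `var_of_means` is a hypothetical name for this sketch, and the printed values vary from run to run because no seed is set.

```r
# Hypothetical helper: variance of `reps` simulated averages of n draws from Exp(rate)
var_of_means <- function(reps, n = 40, rate = 0.2) {
  var(replicate(reps, mean(rexp(n, rate))))
}

theoretical <- (1 / 0.2)^2 / 40  # sigma^2 / n = 25 / 40 = 0.625

# With more repetitions the sample variance tends to settle closer to 0.625
for (reps in c(100, 1000, 10000)) {
  cat(reps, "repetitions: sample variance =", round(var_of_means(reps), 3),
      "| theoretical =", theoretical, "\n")
}
```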

Distribution

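Using these values, we compare the simulated averages with the normal distribution predicted by the Central Limit Theorem. The histogram below overlays the density of \(N(\frac{1}{\lambda}, \frac{1}{\lambda^2 n})\) on the 1000 simulated means. Since subtracting \(\frac{1}{\lambda}\) and dividing by \(\frac{1}{\lambda \sqrt{n}}\) maps this normal to the standard normal, the close fit illustrates that “the distribution of averages of iid variables (properly normalized) becomes that of a standard normal as the sample size increases”, as stated by the Central Limit Theorem.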

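# Histogram of the simulated means with the CLT-predicted normal density overlaid
hist(means, prob = TRUE, ylim = c(0, 0.6), xlab = "Sample mean",
     main = "Histogram of simulated means")
curve(dnorm(x, mean = sim_mean, sd = sim_sd), col = "darkblue", lwd = 3, add = TRUE)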
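The report plots the means on their original scale. The sketch below applies the normalization the CLT refers to, \(Z = (\bar{X}_n - \mu)/(\sigma/\sqrt{n})\), and overlays the standard normal density instead. It simulates its own averages, so the names `rate`, `n_obs`, `reps`, `xbar` and `z` belong to this sketch rather than to the report.

```r
# Assumed parameters, matching the report: rate 0.2, groups of 40, 1000 averages
rate <- 0.2
n_obs <- 40
reps <- 1000
mu <- 1 / rate     # population mean of Exp(rate)
sigma <- 1 / rate  # population standard deviation of Exp(rate)

xbar <- replicate(reps, mean(rexp(n_obs, rate)))

# Normalize: subtract the mean, divide by the standard error sigma / sqrt(n)
z <- (xbar - mu) / (sigma / sqrt(n_obs))

hist(z, prob = TRUE, breaks = 30, ylim = c(0, 0.5), xlab = "Normalized mean",
     main = "Histogram of normalized means")
curve(dnorm(x), col = "darkblue", lwd = 3, add = TRUE)  # standard normal density

# Share of normalized means inside +/-1.96; near 0.95 if the approximation holds
mean(abs(z) < 1.96)
```

A histogram centered at 0 that tracks the standard normal curve is the "properly normalized" form of the statement.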

Appendix

Figure showing a histogram of 1000 samples drawn from an exponential distribution with lambda = 0.2.

# 1000 draws from Exp(lambda), shown with 20 histogram breaks
hist(rexp(repetitions, lambda),
     main = paste0("Exponential distribution (lambda = ", lambda, ")"),
     xlab = "Exponential samples", breaks = 20)