Exponential Distribution vs Central Limit Theorem

Overview

We are going to demonstrate how the distribution of the means of exponential samples can be approximated by a normal distribution. To do so, we compare the empirical characteristics of a simulated sample of means with the theoretical values given by the Central Limit Theorem.

Sample of data

We parameterize the data generation and set a seed so that the experiment can be reproduced. We then calculate the mean of 40 numbers generated by R's rexp function, and repeat this calculation 10,000 times to obtain a sample of means large enough to show the desired results.

Initial values:
- seed = 13
- lambda = 0.2
- sample = 40
- number of observations (nobs) = 10,000

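set.seed(13)            # fix the seed so the experiment is reproducible
lambda <- 0.2           # rate of the exponential distribution
sample <- 40            # number of draws averaged into each mean
nobs <- 10000           # number of means to simulate
data <- numeric(nobs)   # preallocate instead of growing the vector
for (i in seq_len(nobs))
    data[i] <- mean(rexp(sample, lambda))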
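The same simulation can also be written without an explicit loop. The following is a minimal sketch using base R's replicate; the name data_vec is introduced here only for illustration and is not part of the original analysis. Since replicate evaluates the expression once per repetition, in order, it should draw the same random numbers as the loop when started from the same seed.

```r
set.seed(13)    # same seed as in the loop version
lambda <- 0.2   # rate of the exponential distribution
sample <- 40    # number of draws averaged into each mean
nobs <- 10000   # number of means to simulate

# replicate() evaluates the expression nobs times and collects the results
# in a numeric vector (data_vec is a hypothetical name for this sketch)
data_vec <- replicate(nobs, mean(rexp(sample, lambda)))

length(data_vec)
```

This form avoids managing the index by hand; whether it is clearer than the loop is a matter of taste.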

We compare the means. Theoretical vs Empirical.

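We calculate the theoretical mean. By the Central Limit Theorem, the sample means are approximately normally distributed around the mean of the exponential distribution, 1 / lambda.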

1 / lambda

## [1] 5

We calculate the empirical mean of the 10,000 simulated means.

mean(data)

## [1] 4.993761

The two values are almost the same. The simulated means are centered on the theoretical mean, as expected for the approximating normal distribution.
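To put "almost the same" on a scale, the difference can be expressed in standard errors. This is a rough sketch, not part of the original analysis: it reuses the value printed by mean(data) as a literal, and the names theo_mean, emp_mean, se and z are illustrative. It assumes the variance of a single mean of 40 draws is 1 / (lambda^2 * 40), so the average of 10,000 such means has that variance divided by 10,000.

```r
lambda <- 0.2
sample <- 40
nobs <- 10000

theo_mean <- 1 / lambda     # theoretical mean, 5
emp_mean <- 4.993761        # value printed by mean(data) in this report

# Standard error of the average of nobs simulated means
se <- sqrt(1 / (lambda^2 * sample) / nobs)

# Difference between empirical and theoretical mean, in standard errors
z <- (emp_mean - theo_mean) / se
z
```

The result is smaller than one standard error in absolute value, which is well within what random variation alone would produce.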

We compare the variance. Theoretical vs Empirical.

We calculate the theoretical variance of the sample mean: the variance of the exponential distribution, 1 / lambda^2, divided by the sample size.

(lambda * sqrt(sample)) ^ -2

## [1] 0.625
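For reference, the theoretical mean and variance used in this report can be derived in a few lines. The notation is an assumption made for this sketch: n stands for the sample size (the variable sample in the code) and \bar{X} for the mean of one sample of n draws.

```latex
\begin{align*}
X_1, \dots, X_n &\overset{\text{iid}}{\sim} \operatorname{Exp}(\lambda), \qquad
\mathbb{E}[X_i] = \frac{1}{\lambda}, \qquad
\operatorname{Var}(X_i) = \frac{1}{\lambda^2} \\
\mathbb{E}[\bar{X}] &= \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[X_i]
  = \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n \lambda^2} = \frac{1}{(\lambda \sqrt{n})^2}
  = \frac{1}{40 \cdot 0.2^2} = 0.625
\end{align*}
```

The last line is the expression evaluated in the code, (lambda * sqrt(sample)) ^ -2.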

We calculate the empirical variance of the simulated means.

var(data)

## [1] 0.6180696

As expected, the two values are again nearly the same. The simulated means match the approximating normal distribution in variance as well as in mean.

Graphic demonstration

We plot the distribution of the simulated means as a density histogram (gray), together with the normal density defined by the theoretical values (red) and the normal density defined by the empirical values (green).

library(ggplot2)
# Density histogram of the simulated means
g <- ggplot(data.frame(variable = data), aes(x = variable))
g <- g + geom_histogram(aes(y = after_stat(density)), binwidth = 0.2, fill = 'gray', color = 'black')
# Normal curve with the theoretical mean and standard deviation (red)
g <- g + stat_function(fun = dnorm, args = list(mean = lambda^-1, sd = (lambda*sqrt(sample))^-1), linewidth = 1, col = "red")
# Normal curve with the empirical mean and standard deviation (green)
g <- g + stat_function(fun = dnorm, args = list(mean = mean(data), sd = sd(data)), linewidth = 1, col = "green")
g <- g + labs(title = "Exponential Distribution vs Normal", x = "Means", y = "Density")
g

We can observe that the two curves are practically superimposed. The distribution of the 10,000 means of 40 exponential draws each therefore behaves as the Central Limit Theorem predicts, and we can approximate it with a great deal of accuracy by a normal distribution. If we increased the sample size beyond 40, which we do not attempt here, the distribution of the means would come closer still to the normal distribution.
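The effect of the sample size on the quality of the approximation can be checked numerically. The sketch below is an illustration only, not part of the original experiment: the sample sizes 5, 40 and 200 and the helper skewness are assumptions chosen for the example. It measures the skewness of the simulated means, which is 0 for a normal distribution and 2 / sqrt(n) for the mean of n exponential draws.

```r
set.seed(13)
lambda <- 0.2
nobs <- 10000

# Sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

sizes <- c(5, 40, 200)   # hypothetical sample sizes chosen for illustration
skew <- sapply(sizes, function(n) {
    means <- replicate(nobs, mean(rexp(n, lambda)))
    skewness(means)
})

# Simulated skewness next to the theoretical value 2 / sqrt(n)
data.frame(n = sizes, simulated = skew, theoretical = 2 / sqrt(sizes))
```

The skewness shrinks toward 0 as the sample size grows, so the distribution of the means becomes more symmetric and closer to normal, although for any finite sample size it remains slightly right-skewed.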