Introduction

The Central Limit Theorem (CLT) tells us that if \(X_i\), \(i = 1 \dots n\), are independent, identically distributed (iid) random variables with mean \(\mu\) and finite standard deviation \(\sigma\), then for large \(n\) the distribution of their average is approximately Normal with mean \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\).

Formally written: let \(X^{*} = \frac{\sum_{i=1}^{n} X_i}{n}\). Then

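\[ P\left( \frac{\sqrt{n}\,(X^{*} - \mu)}{\sigma} < x \right) \rightarrow \Phi(x), \quad \mathrm{as} \quad n \rightarrow \infty \]

where \(\Phi(x)\) is the distribution function of the standard normal distribution \(N(0, 1)\). Equivalently, for large \(n\) the distribution of \(X^{*}\) is approximately \(N(\mu, \frac{\sigma^2}{n})\).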
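As a rough numerical check of this statement, the sketch below compares the empirical probability \(P(X^{*} < x)\) with the normal distribution function at a few points. The choice of the exponential distribution, the seed, the number of simulations and the names `lambda_chk`, `check_n`, `check_sims`, `check_avg`, `x_grid` and `cdf_check` are illustrative assumptions, not part of the theorem.

```r
# Illustrative check (assumed settings): averages of exponential variables
set.seed(2024)                      # arbitrary seed, only for reproducibility
lambda_chk <- 0.2                   # rate, so mu = sigma = 1 / lambda_chk
check_n <- 40                       # number of variables in each average
check_sims <- 5000                  # number of simulated averages

mu <- 1 / lambda_chk
sigma <- 1 / lambda_chk

# Each row holds one sample of size check_n; rowMeans gives the averages
check_avg <- rowMeans(matrix(rexp(check_sims * check_n, lambda_chk),
                             nrow = check_sims))

# Compare the empirical and the normal distribution function on a small grid
x_grid <- c(4, 4.5, 5, 5.5, 6)
empirical <- sapply(x_grid, function(x) mean(check_avg < x))
normal <- pnorm(x_grid, mean = mu, sd = sigma / sqrt(check_n))

cdf_check <- data.frame(x = x_grid, empirical = empirical, normal = normal)
print(cdf_check)
```

The two columns should be close but not identical: part of the gap is simulation noise, and part is the remaining skewness of the average at a finite sample size.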

The CLT holds regardless of the distribution of the original iid \(X_i\), \(i = 1 \dots n\), as long as their variance is finite. The aim of this document is to illustrate the theorem by investigating the distribution of averages of exponential random variables.

Simulation

The exponential distribution can be simulated in R with rexp(n, \(\lambda\)), where \(\lambda\) is the rate parameter. The mean of the exponential distribution is \(\frac{1}{\lambda}\), and its standard deviation is also \(\frac{1}{\lambda}\). In our simulations \(\lambda = 0.2\), and we investigate the distribution of averages of 40 exponentials using one thousand simulations.
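For these settings, the theoretical quantities can be worked out directly from the values stated above (\(\lambda = 0.2\), \(n = 40\)):

\[ \mu = \sigma = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906, \qquad \frac{\sigma^2}{n} = \frac{25}{40} = 0.625 \]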

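set.seed(1234)   # fix the random seed so the results are reproducible
sim <- 1000      # number of simulated averages
n <- 40          # number of exponentials in each average
lambda <- 0.2    # rate parameter of the exponential distribution
# One row per simulation, one column per draw
# (named exp_draws to avoid masking the base function exp)
exp_draws <- matrix(rexp(sim*n, lambda), nrow=sim, ncol=n)
# Row-wise means: one average of n exponentials per simulation
avg <- apply(exp_draws, 1, mean)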

The vector avg contains the 1000 averages of 40 exponentials with rate \(\lambda = 0.2\). The mean of these averages should be approximately \(\mu = 5\), and their variance approximately \(\frac{\sigma^2}{n} = 0.625\); the more averages we simulate, the closer the simulated values get to the theoretical ones. The table below summarizes these values.

Theoretical and empirical mean, standard deviation and variance of the Exponential Distribution and the Average
Statistic	Exponential Distribution	Theoretical Average	Simulated Average
Mean	5	5	4.9742388
Standard Deviation	5	0.7905694	0.7713431
Variance	25	0.625	0.5949702

Indeed, the simulated mean and standard deviation of the averages are very close to their theoretical counterparts. Since the variance is the square of the standard deviation, a small relative difference in the standard deviation roughly doubles in the variance: here the simulated standard deviation is about 2.4% below the theoretical value, while the simulated variance is about 4.8% below it.
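The code that produced the table is not shown, so the following is only a sketch of one way such a summary could be computed from the simulation settings above. The names `draws`, `th_mean`, `th_sd` and `summary_table` are assumptions, as is the use of `sd()` and `var()` (which divide by the number of observations minus one) for the simulated column.

```r
# Settings taken from the simulation section
set.seed(1234)
sim <- 1000
n <- 40
lambda <- 0.2

# One row per simulation, one column per exponential draw
draws <- matrix(rexp(sim * n, lambda), nrow = sim, ncol = n)
avg <- apply(draws, 1, mean)

# Theoretical mean and standard deviation of a single exponential
th_mean <- 1 / lambda
th_sd <- 1 / lambda

# Columns mirror the table: single exponential, theoretical and simulated average
summary_table <- data.frame(
  Statistic = c("Mean", "Standard Deviation", "Variance"),
  Exponential = c(th_mean, th_sd, th_sd^2),
  TheoreticalAverage = c(th_mean, th_sd / sqrt(n), th_sd^2 / n),
  SimulatedAverage = c(mean(avg), sd(avg), var(avg))
)
print(summary_table)
```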

Density functions

library(ggplot2)

First we investigate the density function of the exponential distribution with rate 0.2. Figure 1 shows the histogram of 10,000 simulated values from this distribution (salmon color) together with the theoretical density function (black line). As expected, the theoretical line and the histogram agree very well.

To demonstrate the Central Limit Theorem, Figure 2 shows the histogram of the simulated averages (salmon color) together with the density function of the normal distribution whose mean and standard deviation are those of the theoretical average (black line). The green and blue lines mark the theoretical mean and the mean of the simulated averages, respectively. The figure shows how closely the distribution of the averages follows the normal density, and the two means are again very close to each other.

set.seed(123)
s <- 10000   # number of simulated exponential values
data <- data.frame(x = rexp(s, lambda))

# Histogram on the density scale, so it is comparable with dexp
g <- ggplot(data = data, mapping = aes(x=x))
g <- g + geom_histogram(binwidth=.7, colour = "black", fill = "salmon", 
                        aes(y = after_stat(density)))
# args must be a list; the rate argument of dexp is called "rate"
g <- g + stat_function(fun = dexp, args = list(rate = lambda), linewidth=1)
g <- g + labs(title = "Figure 1: Exponential Distribution") + theme_bw()
g

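# Quantities used in the plot below. Their definitions are not shown in the
# original, so these are assumed from the description of Figure 2.
thmean <- 1 / lambda               # theoretical mean of the averages
avgsd <- (1 / lambda) / sqrt(n)    # theoretical standard deviation of the averages
avgmean <- mean(avg)               # mean of the simulated averages

g2 <- ggplot(data = as.data.frame(avg), mapping = aes(x=avg))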
g2 <- g2 + geom_histogram(binwidth=.2, colour = "black", fill = "salmon",
                          aes(y = after_stat(density)))
g2 <- g2 + stat_function(fun = dnorm, args= list( mean=thmean, sd=avgsd), 
                         linewidth=1, aes(colour = "Normal Density"))
g2 <- g2 + 
  geom_vline( aes(xintercept = thmean, colour = "Theoretical Mean"), linewidth = 1) +
  geom_vline( aes(xintercept = avgmean, colour = "Simulated Mean") , linewidth = 1)
g2 <- g2 + scale_colour_manual(name = "Legend", 
                      values = c("Theoretical Mean" = "green3", 
                                 "Simulated Mean" = "blue", 
                                 "Normal Density" = "black"))
g2 <- g2 + labs(title = "Figure 2: Normal Distribution Function and the Histogram of the Averages", 
       x = "x") + theme_bw()

g2
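The visual agreement in Figure 2 can also be put into a number. The sketch below rebuilds the averages, computes the histogram on the density scale without plotting, and reports the largest gap between the bar heights and the normal density at the bin midpoints. The bin edges and the names `h`, `normal_dens` and `max_gap` are assumptions for illustration; the bins need not coincide with the ones ggplot2 draws.

```r
# Rebuild the simulated averages with the settings from the simulation section
set.seed(1234)
sim <- 1000
n <- 40
lambda <- 0.2
avg <- apply(matrix(rexp(sim * n, lambda), nrow = sim, ncol = n), 1, mean)

# Histogram with bin width 0.2, computed but not drawn
bin_edges <- seq(floor(min(avg)), ceiling(max(avg)), by = 0.2)
h <- hist(avg, breaks = bin_edges, plot = FALSE)

# Normal density with the theoretical mean and standard deviation of the average
normal_dens <- dnorm(h$mids, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n))

# Largest absolute difference between histogram density and normal density
max_gap <- max(abs(h$density - normal_dens))
print(max_gap)
```

A small value relative to the peak of the normal density (about 0.5 here) is consistent with what the figure shows; the exact number depends on the seed and on the bin width.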

Exponential Distribution and the Central Limit Theorem

rxvrg
