The Central Limit Theorem and Exponential Distributions

Synopsis

This project report studies the characteristics of the exponential distribution using the Central Limit Theorem (CLT). In a simulation exercise, the report compares the sample mean and sample variance of exponentially distributed samples with their theoretical mean and variance. The simulation shows that the sample mean and sample variance are both centered around their theoretical values. It can therefore be concluded that the CLT is applicable to exponential distributions. All associated R code is provided in the appendix of the report.

Simulation Exercise

The central limit theorem is one of the most important results in statistics. For the analysis below, the report adopts the definition provided in class, namely: the CLT states that the distribution of averages of iid variables (properly normalized) becomes that of a standard normal as the sample size increases (Brian Caffo, Coursera). Starting from this general definition, this report applies the concept to the exponential distribution. An exponential distribution with rate \(\lambda\) has a mean and a standard deviation that are both equal to \(\frac{1}{\lambda}\). Before looking at the distribution of the sample mean, however, let us first consider how non-normal the distribution of 1000 exponential draws looks.

The Figure above depicts the distribution of 1000 exponential draws with \(\lambda = 0.2\). The mean of the draws is 5.1063445 and the standard deviation is 4.8898732. Although both values are close to the theoretical value of 5, the overlaid normal density centered at the theoretical mean of 5 clearly shows that this is not a normal distribution.
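As a rough numerical companion to the figure, the sketch below draws 1000 exponential values in base R and compares their mean and standard deviation with \(\frac{1}{\lambda}\). The seed and the variable names (`lambda`, `draws`) are assumptions made for illustration, so the printed values will not match the figures quoted above exactly.

```r
# Minimal sketch: check that 1000 exponential draws have mean and sd near 1/lambda.
set.seed(1)                # assumed seed, only to make this sketch reproducible
lambda <- 0.2
draws  <- rexp(1000, rate = lambda)

theoretical <- 1 / lambda  # mean and sd of the exponential are both 1/lambda
print(c(mean = mean(draws), sd = sd(draws), theoretical = theoretical))

# A right-skewed distribution has a mean above its median; for the exponential
# the theoretical median is log(2)/lambda, well below the mean of 1/lambda.
print(c(median = median(draws), theoretical_median = log(2) / lambda))
```

A median well below the mean is one simple numerical sign of the asymmetry that the histogram shows visually.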

The report will now proceed by:

Generating 1000 random exponential samples, each with a sample size of 40 and a \(\lambda\) level of 0.2.
Taking the mean and the variance of each of these samples, which are random variables themselves.
Comparing the characteristics of these two random variables with the theoretical mean of \(\frac{1}{\lambda}\) and the theoretical variance of \(\frac{1}{\lambda^{2}}\), i.e. 5 and 25 respectively for the given \(\lambda\) level.
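The three steps above can be sketched compactly in base R. This is only an illustrative outline, not the code used for the figures (that code is in the appendix); the seed, the matrix layout, and the names `samples`, `sample_means` and `sample_vars` are assumptions introduced here.

```r
# Minimal sketch of the three steps, using base R only.
set.seed(2)                       # assumed seed for this sketch
lambda <- 0.2; n <- 40; n_sim <- 1000

# Step 1: each column of `samples` is one exponential sample of size n.
samples <- matrix(rexp(n * n_sim, rate = lambda), nrow = n, ncol = n_sim)

# Step 2: mean and variance of every sample (one value per column).
sample_means <- colMeans(samples)
sample_vars  <- apply(samples, 2, var)

# Step 3: compare the centers of both random variables with theory.
comparison <- rbind(
  mean     = c(simulated = mean(sample_means), theoretical = 1 / lambda),
  variance = c(simulated = mean(sample_vars),  theoretical = 1 / lambda^2)
)
print(comparison)
```

Storing all draws in one matrix avoids growing a vector inside a loop, at the cost of holding 40,000 values in memory, which is negligible here.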

Sample Mean of Exponential Distribution

The visualization above reveals that the expectations from the CLT about the mean are met. The CLT stipulates that the sample mean \(\bar{X}\) is approximately normally distributed, where the second parameter below is the standard deviation of the sample mean (its standard error): \[ \begin{aligned} \bar{X} &\sim \mathcal{N} \left(\mu,\frac{\sigma}{\sqrt{n}}\right)\\ \bar{X} &\sim \mathcal{N} \left(\frac{1}{\lambda},\,\frac{\frac{1}{\lambda}}{\sqrt{n}}\right)\\ \bar{X} &\sim \mathcal{N} \left(\frac{1}{0.2},\frac{\frac{1}{0.2}}{\sqrt{40}}\right)\\ \bar{X} &\sim \mathcal{N} \left(5,0.79\right) \end{aligned} \]
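The arithmetic in the last line of the derivation can be checked directly. The sketch below is a plain calculation with illustrative variable names (`mu`, `sigma`, `se`); the 95% interval at the end is an added illustration of what the normal approximation implies, not a result taken from the report.

```r
# Theoretical parameters of the sample mean for lambda = 0.2 and n = 40.
lambda <- 0.2
n      <- 40
mu     <- 1 / lambda        # theoretical mean of the exponential
sigma  <- 1 / lambda        # theoretical standard deviation
se     <- sigma / sqrt(n)   # standard error of the sample mean

print(round(c(mu = mu, se = se), 2))   # mu = 5, se = 0.79

# Under the normal approximation, about 95% of sample means should fall
# within mu +/- 1.96 standard errors, i.e. roughly between 3.45 and 6.55.
interval <- mu + c(-1, 1) * qnorm(0.975) * se
print(round(interval, 2))
```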

The distribution of the sample mean is fairly normal, as shown by the overlaid Gaussian density centered at the theoretical mean of 5. The simulated sample means have a mean of 4.9473092 and a standard deviation of 0.7999399, both close to the theoretical values of 5 and 0.79. However, since the working definition adopted in the report states that the averages of the iid variables should be normalized to resemble a standard normal distribution, a normalized sample mean is depicted below.

Normalization was achieved by subtracting the theoretical mean from the sample mean and dividing the result by the standard error of the mean, which the appendix code estimates from each sample's own standard deviation. According to the CLT, the normalized distribution should approach a standard normal with a mean of 0 and a standard deviation of 1.
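A minimal sketch of this normalization follows. It deliberately differs from the appendix code for Figure 3, which divides by `sd(sample)`: here the theoretical standard error \(\frac{\sigma}{\sqrt{n}}\) is used instead, which is an assumption made to match the textbook form of the CLT. The seed and the name `z` are also illustrative.

```r
# Normalize simulated sample means with the theoretical mean and standard error.
set.seed(3)                       # assumed seed for this sketch
lambda <- 0.2; n <- 40; n_sim <- 1000
mu    <- 1 / lambda
sigma <- 1 / lambda

sample_means <- replicate(n_sim, mean(rexp(n, rate = lambda)))

# z = (sample mean - mu) / (sigma / sqrt(n)); roughly N(0, 1) if the CLT applies.
z <- (sample_means - mu) / (sigma / sqrt(n))
print(c(mean = mean(z), sd = sd(z)))

# Share of normalized means inside the central 95% region of a standard normal.
coverage <- mean(abs(z) < qnorm(0.975))
print(coverage)
```

If the approximation is good, the mean of `z` should be near 0, its standard deviation near 1, and `coverage` near 0.95.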

With a mean of -0.1633315 and a standard deviation of 1.0456585, the normalized sample means show that the standard normal distribution is a good approximation for the distribution of normalized exponential sample means.

Sample Variance of Exponential Distributions

The analysis of the sample variance follows a very similar path, the only major difference being that the variance of each sample is taken instead of the mean. The distribution of the sample variances is then depicted against a theoretical value of 25 \(\left({\sigma}^2 = \left(\frac{1}{\lambda}\right)^2 = \frac{1}{0.2^2}\right)\) and overlaid with a normal distribution with a mean of 25 and a standard deviation equal to that of the simulated sample variances.

Like the sample means, the sample variances are centered around their theoretical value, with a mean of 24.9327736. However, unlike the distribution of the sample mean, this distribution is not symmetric and has many more outliers on its right side. This is because a variance cannot be negative but is unbounded above. The small sample size of 40 also limits how closely the simulated distribution can approach a “good” bell curve.
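The asymmetry described above can be quantified with a simple skewness coefficient. The sketch below is an illustration under stated assumptions: the helper names `skewness` and `sim_vars`, the seed, and the comparison sample size of 400 are all hypothetical and not part of the original analysis.

```r
# Compare the skewness of simulated sample variances for two sample sizes.
set.seed(4)                       # assumed seed for this sketch
lambda <- 0.2; n_sim <- 1000

# Moment-based skewness coefficient: about 0 for a symmetric distribution,
# positive when the right tail is longer.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Simulate n_sim sample variances, each from a sample of size n.
sim_vars <- function(n) replicate(n_sim, var(rexp(n, rate = lambda)))

vars_40  <- sim_vars(40)          # sample size used in the report
vars_400 <- sim_vars(400)         # hypothetical larger sample size

print(c(n40 = skewness(vars_40), n400 = skewness(vars_400)))
```

A clearly positive value for `n40` would be consistent with the long right tail noted above, and one would expect a smaller value for the larger sample size.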

Appendix

The R code for the figures is presented below in the order they appear in the report.

Figure. 1

X <- rexp(1000, 0.2)
library(ggplot2)
ggplot(data = data.frame(X), aes(x = X))+
        geom_histogram(aes(y = ..density..), color = "white", fill = "green", binwidth = .5)+
        geom_vline(aes(xintercept= 5,color= "Theoretical_Mean"),size= 2,show.legend = F)+
        stat_function(fun = dnorm, aes(color = "Normal_Distribution"), 
                      args = list(mean = 5, sd = sd(X)), size = 2)+
        scale_colour_manual(name=NULL,
                values=c(Theoretical_Mean="red", Normal_Distribution ="blue"))+
        labs(title = "Distribution of Exponential Function"~
                     (lambda ~ "= 0.2, n = 1000"))+
        xlab(label = "X")+
        theme(plot.title = element_text(size=10), axis.title=element_text(size=8),
              legend.position = c(.885, .935))

Figure. 2

# MED collects 1000 sample means.
# Each sample consists of 40 exponential draws with rate lambda = 0.2.
MED <- NULL
for (i in 1 : 1000){
        MED <- c(MED, mean(rexp(40, 0.2)))
}
library(ggplot2); ggplot(data = data.frame(MED), aes(x = MED))+
        geom_histogram(aes(y = ..density..), color = "white", fill = "green", binwidth = .1)+
        geom_vline(aes(xintercept= 5,color= "Theoretical_Mean"),size= 2,show.legend = F)+
        stat_function(fun = dnorm, aes(color = "Normal_Distribution"), 
                      args = list(mean = 5, sd = sd(MED)), size = 2)+
        scale_colour_manual(name=NULL,
                values=c(Theoretical_Mean="red", Normal_Distribution ="blue"))+
        labs(title = "Sample Mean of 1000 Exponentially Distributed Samples"~
                     (lambda ~ "= 0.2, n = 40"))+
        xlab(label = "Sample Mean")+
        theme(plot.title = element_text(size=10), axis.title=element_text(size=8),
              legend.position = c(.885, .935))

Figure. 3

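# NMED collects 1000 normalized sample means.
NMED <- NULL
for (i in 1 : 1000){
        sample <- rexp(n = 40, rate = 0.2)
        # Center at the theoretical mean 5, scale by the estimated standard error sd(sample)/sqrt(40).
        NMED[i] <- (sqrt(40)*(mean(sample)-5))/sd(sample)
}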
ggplot(data = data.frame(NMED), aes(x = NMED))+
        geom_histogram(aes(y = ..density..), color = "white", fill = "green", binwidth = .1)+
        geom_vline(aes(xintercept= 0,color= "Theoretical_Mean"),size= 2, show.legend = F)+
        stat_function(fun = dnorm, aes(color = "Standard_Normal_Distribution"), 
                      args = list(mean = 0, sd = 1), size = 2)+
        scale_colour_manual(name=NULL,
                values=c(Theoretical_Mean="red", Standard_Normal_Distribution ="blue"))+
        labs(title = "Normalized Sample Mean of 1000 Exponentially Distributed Samples"
             ~(lambda ~ "= 0.2, n = 40"))+
        xlab(label = "Normalized Sample Mean")+
        theme(plot.title = element_text(size=10), axis.title=element_text(size=8),
              legend.position = c(.175, .935))

Figure. 4

# VED collects 1000 sample variances, one per sample of 40 draws with rate 0.2.
VED <- NULL
for (i in 1 : 1000){
        sample <- rexp(n = 40, rate = 0.2)
        VED[i] <- var(sample)
}
library(ggplot2); ggplot(data = data.frame(VED), aes(x = VED))+
        geom_histogram(aes(y = ..density..), color = "white", fill = "green", binwidth = 1)+
        geom_vline(aes(xintercept= 25, color= "Theoretical_Variance"),size= 2,show.legend= F)+
        stat_function(fun = dnorm, aes(color = "Normal_Distribution"), 
                      args = list(mean = 25, sd = sd(VED)), size = 2)+
        scale_colour_manual(name=NULL,
                values=c(Theoretical_Variance="red", Normal_Distribution ="blue"))+
        labs(title = "Sample Variance of 1000 Exponentially Distributed Samples"
             ~(lambda ~ "= 0.2, n = 40"))+
        theme(plot.title = element_text(size=10), axis.title=element_text(size=8),
              legend.position = c(.885, .935))