Overview

In this project the exponential distribution (ed) and its relationship with the Central Limit Theorem (CLT) will be investigated. The ed will be simulated in R through the rexp(n, \(\lambda\)) function, where n is the number of observations to draw and \(\lambda\) is the rate parameter (0.2 in this work). The ed describes the time between events in a Poisson process, i.e. a process in which events take place continuously and independently at a constant average rate \(\lambda\).

The mean and the standard deviation of the ed are both \(1/\lambda\). One of the most prominent features of this distribution is its memorylessness: if an event has not occurred after, say, 20 seconds, the conditional probability that it takes at least 10 more seconds to occur is equal to the unconditional probability that it takes more than 10 seconds from the initial time. Writing T for the waiting time, \(P(T > 30 \mid T > 20) = P(T > 10)\).
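The memorylessness property can be checked numerically with the exponential survival function. The following is only a minimal sketch: it uses pexp() from base R with lower.tail = FALSE, and the names p_conditional and p_unconditional are illustrative and not part of the original analysis.

```r
lambda <- 0.2  # rate parameter used throughout this report

# P(T > 30 | T > 20) = P(T > 30) / P(T > 20)
p_conditional <- pexp(30, rate = lambda, lower.tail = FALSE) /
  pexp(20, rate = lambda, lower.tail = FALSE)

# P(T > 10), the unconditional probability of waiting more than 10 seconds
p_unconditional <- pexp(10, rate = lambda, lower.tail = FALSE)

# Both equal exp(-lambda * 10) = exp(-2), up to floating point error
print(c(conditional = p_conditional, unconditional = p_unconditional))
print(all.equal(p_conditional, p_unconditional))
```

Both probabilities reduce to \(e^{-10\lambda} = e^{-2} \approx 0.135\), which is why the 20 seconds already elapsed do not matter.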

Let us plot 40 values drawn from the distribution, together with their histogram.

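lambda <- 0.2                      # rate parameter
exponential <- rexp(40, lambda)    # 40 draws from the exponential distribution

par(mfrow = c(1, 2))               # two panels side by side
plot(exponential, type = "s", xlab = "(a)", main = "Exponential plot")
hist(exponential, xlab = "(b)")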

Fig. 1. The exponential distribution: (a) Forty values drawn from the exponential distribution with \(\lambda\)=0.2 - (b) A histogram of these 40 values.

The CLT states that, regardless of the underlying distribution, the arithmetic mean of a sufficiently large number of independent, identically distributed random variables with finite mean \(\mu\) and standard deviation \(\sigma\) will be approximately normally distributed. Let us calculate the distribution of a thousand averages of forty exponentials.
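For reference, the CLT statement used above can be written a little more formally. This is a sketch in notation introduced here only for illustration (\(X_1, \dots, X_n\) for the individual draws, \(\bar X_n\) for their mean, \(Z_n\) for the standardised mean):

\(Z_n = \frac{\bar X_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty\)

so that, for large n, \(\bar X_n\) is approximately \(N(\mu, \frac{\sigma}{\sqrt{n}})\), with the second argument read as a standard deviation. In this work \(\mu = \sigma = 1/\lambda = 5\) and n = 40, which gives a standard deviation of the mean of \(5/\sqrt{40} \approx 0.79\).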

A thousand means of forty exponentials

Let us draw a sample of size n=40 from the exponential distribution a thousand times, compute the mean of each sample, and plot the resulting means.

mns <- numeric(1000)                      # preallocate instead of growing the vector
for (i in 1:1000) {
        mns[i] <- mean(rexp(40, lambda))  # mean of one sample of 40 exponentials
}

par(mfrow=c(1,3))
plot(mns, type="l", xlab="(a)")
hist(mns, main="", xlab="(b)")
boxplot(mns, xlab="(c)")

par(mfrow=c(1,1))

Fig. 2. The distribution of sample means: (a) A thousand means of samples (size 40) of the exponential distribution - (b) A histogram of the means - (c) A boxplot of the means.

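media <- round(mean(mns), 2)              # mean of the thousand sample means
varianza <- round(var(mns), 2)            # sample variance of the thousand sample means
var_theoretical <- 1 / (40 * lambda^2)    # sigma^2 / n; named so that base var() is not masked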

How does the sample mean compare to the theoretical mean of the distribution of averages?

The mean of the thousand means is 5.01, which, as expected, is very close to the theoretical mean of the distribution: 1/\(\lambda\) = 1/0.2 = 5.
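One way to judge how close 5.01 is to 5 is to compare the difference with the Monte Carlo standard error of the mean of the thousand means. The sketch below assumes the thousand means are independent; the names n, n_sim, se_mean and z are illustrative, and 5.01 is the value reported above rather than a freshly simulated one.

```r
lambda <- 0.2   # rate parameter
n <- 40         # size of each sample
n_sim <- 1000   # number of simulated means

# Standard deviation of one mean is (1/lambda)/sqrt(n); averaging n_sim
# independent means divides it by a further sqrt(n_sim)
se_mean <- (1 / lambda) / sqrt(n * n_sim)

# Distance of the reported 5.01 from the theoretical mean, in standard errors
z <- (5.01 - 1 / lambda) / se_mean

print(c(se_mean = se_mean, z = z))
```

Here the standard error is \(5/\sqrt{40000} = 0.025\), so the observed difference of 0.01 is about 0.4 standard errors, well within what random variation would produce.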

How does the sample variance compare to the theoretical variance of the distribution of averages?

The theoretical variance of the distribution of averages is the variance of the original population divided by the size of each sample (i.e. 40):

\(var(\bar X) = \frac {\sigma^2}{n} = \frac {1}{40\lambda^2} = 0.625\)

The sample variance of the thousand simulated means, by contrast, is 0.63, in close agreement with the theoretical value.
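The \(\frac{\sigma^2}{n}\) formula follows from the independence of the draws. As a short derivation, with \(X_1, \dots, X_n\) denoting the n = 40 draws in one sample (notation introduced here for illustration):

\(var(\bar X) = var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}\)

With \(\sigma^2 = 1/\lambda^2 = 1/0.04 = 25\) and n = 40 this gives \(25/40 = 0.625\).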

How does the distribution compare to the normal?

Let us plot the histogram of the distribution of means along with the density of the normal \(N(\mu, \frac {\sigma} {\sqrt{n}})\), where \(\mu = \sigma = \frac {1}{\lambda}\) and the second argument is the standard deviation. We can also check that the distribution is nearly normal with a normal Q-Q plot (qqnorm), since the points fall close to a straight line.

library(ggplot2)
mu    <- 1/lambda
sigma <- 1/lambda
mns_df<-data.frame(x=mns)

ggplot(mns_df, aes(x = x)) +
  geom_histogram(alpha = .10, binwidth = 0.1, colour = "black", aes(y = ..density..)) +
  # stat_function() takes the extra arguments for fun through `args`
  stat_function(geom = "line", fun = dnorm, args = list(mean = mu, sd = sigma / sqrt(40)), size = 2, colour = "blue")

qqnorm(mns)
qqline(mns)
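The visual checks can be complemented with a simple numeric one: if the normal approximation is adequate, roughly 95% of the simulated means should fall within 1.96 theoretical standard deviations of \(\mu\). The following is a hedged sketch rather than part of the original analysis. It regenerates its own means so that it is self-contained, and the seed and the names mns_check, se, z and coverage are assumptions made only for illustration.

```r
set.seed(1)     # arbitrary seed, chosen here only for reproducibility
lambda <- 0.2   # rate parameter
n <- 40         # size of each sample

# A thousand means of n exponentials, as in the simulation above
mns_check <- replicate(1000, mean(rexp(n, lambda)))

mu <- 1 / lambda               # theoretical mean
se <- (1 / lambda) / sqrt(n)   # theoretical standard deviation of the mean

# Standardise the means and count how many fall inside the central 95% normal band
z <- (mns_check - mu) / se
coverage <- mean(abs(z) <= qnorm(0.975))
print(coverage)
```

A value of coverage near 0.95 is consistent with the approximate normality suggested by the histogram and the Q-Q plot.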