1 Background: Exponential Distribution

According to Wikipedia, the exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. Its probability density function is given by:

\[ f(x; \lambda) = \lambda e^{-\lambda x} \] for \(x \ge 0\), and 0 otherwise. Here \(\lambda > 0\) is the parameter of the distribution, often called the rate parameter. Both the mean and the standard deviation of this distribution are \(\frac{1}{\lambda}\).
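As a quick numerical check of these formulas, the R sketch below compares the closed-form density with R's built-in `dexp()` and uses `integrate()` over \([0, \infty)\) to recover the mean and standard deviation. The rate 0.2 matches the value used in the rest of this report; the variable names are illustrative only.

```r
# Numerical check of the exponential density, mean, and standard deviation.
lambda <- 0.2  # rate used throughout this report

# The closed-form density lambda * exp(-lambda * x) should match R's dexp()
x <- c(0, 1, 5, 10)
density_formula <- lambda * exp(-lambda * x)
print(all.equal(density_formula, dexp(x, rate = lambda)))  # TRUE

# Integrate x f(x) and x^2 f(x) over [0, Inf) to get E[X] and E[X^2]
mean_num <- integrate(function(t) t * dexp(t, rate = lambda), 0, Inf)$value
ex2_num  <- integrate(function(t) t^2 * dexp(t, rate = lambda), 0, Inf)$value
sd_num   <- sqrt(ex2_num - mean_num^2)

print(c(mean = mean_num, sd = sd_num, theory = 1 / lambda))  # all approximately 5
```

Since \(E[X^2] = 2/\lambda^2 = 50\), the variance is \(50 - 25 = 25\) and the standard deviation is 5, equal to the mean.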

Here is what the exponential distribution looks like when simulated in R with the rexp function.

2 Motivation

The purpose of this report is to investigate the exponential distribution in R and relate it to the Central Limit Theorem. Specifically, we show that when we repeatedly draw samples from an exponential population, the sample means are approximately normally distributed.

For example, if we draw 1000 values from an exponential distribution with rate 0.2, their histogram looks like this:

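library(ggplot2)  # plotting package used throughout this report
ggplot(data.frame(x = rexp(1000, rate = 0.2)), aes(x)) + geom_histogram(bins = 30) + ggtitle("Exponential Distribution with n=1000, rate=0.2")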

However, if we instead draw 40 values from the distribution, compute their mean, and repeat this 1000 times, the distribution of the means looks completely different:

# Draw 40 values, take their mean, and repeat 1000 times
means <- replicate(1000, mean(rexp(40, rate = 0.2)))
ggplot(data.frame(means), aes(means)) + geom_histogram(bins = 30) + ggtitle("Distribution of means of 40 samples of Exponential Distribution")

This histogram looks much more like a normal distribution, as the Central Limit Theorem predicts: the distribution of averages of iid variables with finite variance, properly normalized, approaches that of a standard normal as the sample size increases.

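The rest of the report is organized as follows. Section 3 describes a simulation experiment with increasing sample sizes; Section 4 compares the sample mean with the theoretical mean; Section 5 compares the sample variance with the theoretical variance; and Section 6 examines whether the distribution of the means is approximately normal.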

3 Simulation Experiment

We conduct a simulation experiment in which we draw n numbers from the exponential distribution, take their average, and repeat this process nbsim times.

We run this experiment with nbsim = 1000 and n = 1, 5, 10, 20, 40. When n = 1, each "mean" is a single draw, so we recover the exponential distribution itself. When n = 40, we obtain the distribution of 1000 means of 40 exponential numbers each.

nbsim <- 1000; n <- c(1, 5, 10, 20, 40)
# For each n, fill an nbsim x n matrix with exponential draws and average each row
v <- unlist(lapply(n, function(k) rowMeans(matrix(rexp(k * nbsim, rate = 0.2), nbsim))))
dat <- data.frame(x = v, sample_size = factor(rep(n, each = nbsim)))


ggplot(dat, aes(x = x, colour = sample_size)) + geom_density() + xlim(0,20)

The plot shows that as we average more and more draws from the exponential distribution, the distribution of the means becomes less skewed and more bell-shaped, and it also narrows around the population mean of 5. With n = 40, the distribution of means is already close to normal.
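To put numbers on how the distribution of means changes with n, the sketch below recomputes, for each sample size, the variance and skewness of nbsim = 1000 means and places them beside the theoretical values for the mean of n iid exponentials: variance \((1/\text{rate})^2/n\) and skewness \(2/\sqrt{n}\) (a normal distribution has skewness 0). It uses its own arbitrary seed, so its numbers will not match the plot above; the helper `skew()` and the object `res` are hypothetical names introduced only for this example.

```r
set.seed(1)  # arbitrary seed; results will not match the figures in this report
rate <- 0.2; nbsim <- 1000; n <- c(1, 5, 10, 20, 40)

# Moment-based sample skewness (hypothetical helper, not from a package)
skew <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# One row per n: variance and skewness of nbsim means of n draws each
res <- t(sapply(n, function(k) {
  m <- rowMeans(matrix(rexp(k * nbsim, rate = rate), nbsim))
  c(sim_var = var(m), sim_skew = skew(m))
}))

print(data.frame(n = n,
                 sim_var = round(res[, "sim_var"], 3),
                 theory_var = (1 / rate)^2 / n,
                 sim_skew = round(res[, "sim_skew"], 2),
                 theory_skew = round(2 / sqrt(n), 2)))
```

Both columns should shrink as n grows: the variance falls like 1/n and the skewness like \(1/\sqrt{n}\), which is the "less exponential, more normal" trend seen in the density plot.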

4 Sample Mean versus Theoretical Mean

Let us compare the sample mean with the theoretical mean of the distribution. In section 2, we stored the 1000 means of 40 numbers drawn from an exponential distribution in the vector means. Let us examine this distribution of means.

summary(means)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.474   4.424   5.014   5.001   5.524   7.984

The mean of these 1000 sample means (5.001) is therefore very close to the theoretical mean of our exponential distribution, 1/rate = 1/0.2 = 5.
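Whether 5.001 counts as "close" can be judged against the Monte Carlo error of the simulation. A minimal sketch, assuming a fresh set of 1000 means generated with an arbitrary seed (so the numbers will differ from the summary above): the standard error of the mean of 1000 means is roughly \(\sqrt{0.625/1000} = 0.025\), and the z-score counts how many such standard errors the simulated mean lies from 5.

```r
set.seed(2)  # arbitrary seed, chosen only for reproducibility
means <- replicate(1000, mean(rexp(40, rate = 0.2)))

mc_se <- sd(means) / sqrt(length(means))  # Monte Carlo standard error of mean(means)
z <- (mean(means) - 5) / mc_se            # distance from the theoretical mean in SE units

print(c(mean = mean(means), mc_se = mc_se, z = z))
```

As a rough rule of thumb, a |z| of about 2 or less is what sampling noise alone typically produces.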

5 Sample Variance versus Theoretical Variance

The population variance of the exponential distribution is \(1/\lambda^2 = 1/0.2^2 = 25\); its standard deviation is \(1/\lambda = 5\).

According to theory, the variance of the mean of \(n = 40\) independent draws is

\[ Var(\bar X) = \frac{\sigma^2}{n} = \frac{5^2}{40} = 0.625 \]
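The \(\sigma^2/n\) formula follows from the independence of the draws, which lets the variance of the sum split into a sum of variances; with \(\sigma^2 = 1/\lambda^2\) for the exponential:

\[ Var(\bar X) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \qquad \sigma^2 = \frac{1}{0.2^2} = 25 \]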

Let us see the actual variance of the distribution of our means:

var(means)
## [1] 0.6271094

This is pretty close to the theoretical value of 0.625.

6 Examining the Sample Distribution

In this section, we will show that the distribution of means is approximately normal.

According to the Central Limit Theorem,

\[ \frac{\mbox{Estimate} - \mbox{Mean of estimate}}{\mbox{Std. Err. of estimate}} = \frac{\bar X_n - \mu}{\sigma / \sqrt{n}} \] has a distribution like that of a standard normal for large \(n\).

To test this, we standardize each of the 1000 averages of 40 exponential draws: subtract the population mean of 5 and divide by the standard error, \(\sqrt{\sigma^2/n} = \sqrt{25/40} = \sqrt{0.625} \approx 0.79\). Let us see how the resulting distribution compares with the standard normal distribution.

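# Standardize the means: subtract mu = 5, divide by the standard error sqrt(0.625)
dd <- (means - 5) / sqrt(0.625)
ggplot(data.frame(dd), aes(dd)) + geom_density() + stat_function(fun = dnorm, colour = "red")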

Our simulation is consistent with the Central Limit Theorem: the distribution of the normalized means (population mean subtracted, then divided by the standard error) is very close to the standard normal density drawn in red.
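Visual agreement between two density curves can be supplemented with a few numerical checks. The sketch below (using its own arbitrary seed, so the numbers will not match the plot above) computes the fraction of standardized means inside \(\pm 1.96\), which is about 0.95 for a standard normal, tabulates a few quantiles next to `qnorm()`, and draws a normal Q-Q plot with base R's `qqnorm()` and `qqline()`. Some residual right skew is expected at n = 40, since the mean of 40 exponentials has skewness \(2/\sqrt{40} \approx 0.32\).

```r
set.seed(3)  # arbitrary seed; numbers will differ from the plot above
means <- replicate(1000, mean(rexp(40, rate = 0.2)))
dd <- (means - 5) / sqrt(0.625)  # standardized means, as in the text

# Share of standardized means inside the central 95% interval of N(0, 1)
coverage <- mean(abs(dd) < qnorm(0.975))
print(coverage)  # close to 0.95 if the normal approximation is good

# A few simulated quantiles next to the matching standard normal quantiles
p <- c(0.025, 0.25, 0.5, 0.75, 0.975)
print(rbind(simulated = quantile(dd, p), normal = qnorm(p)), digits = 3)

# Normal Q-Q plot: points close to the red line indicate approximate normality
qqnorm(dd)
qqline(dd, col = "red")
```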