Simulation Exercise: The Exponential Distribution

16 April 2018

This project uses R to investigate the exponential distribution in relation to the central limit theorem. Using 1000 simulations, we examine the properties of the distribution of the mean of 40 exponentials.

Simulations and the Central Limit Theorem

The exponential distribution describes the time between events, where the events occur continuously and independently at a constant average rate.¹ In R, the exponential distribution can be simulated with rexp(n, lambda) where lambda is the rate parameter and n is the number of observations desired. The theoretical mean of the exponential distribution is 1/lambda, the standard deviation is also 1/lambda, and the variance is therefore (1/lambda)². In our simulations we used lambda = 0.2.
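As a quick check of these formulas, the following sketch computes the theoretical moments for lambda = 0.2 and compares them with values obtained by numerically integrating the density returned by dexp. The variable names (theoMean, numMean, and so on) are illustrative assumptions and are not part of the original analysis.

```r
# Illustrative check of the theoretical moments (names are assumptions,
# not part of the original analysis).
lambda <- 0.2

theoMean <- 1 / lambda      # theoretical mean
theoSD   <- 1 / lambda      # theoretical standard deviation
theoVar  <- theoSD^2        # theoretical variance

# Numerical check: integrate x * f(x) and (x - mean)^2 * f(x) over [0, Inf),
# where f is the exponential density dexp(x, rate = lambda).
numMean <- integrate(function(x) x * dexp(x, rate = lambda),
                     lower = 0, upper = Inf)$value
numVar  <- integrate(function(x) (x - theoMean)^2 * dexp(x, rate = lambda),
                     lower = 0, upper = Inf)$value

print(c(theoMean = theoMean, numMean = numMean))
print(c(theoVar = theoVar, numVar = numVar))
```

The numerical values should match 1/lambda and (1/lambda)² up to the integration tolerance.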

The central limit theorem states that the means of independent random samples of size n drawn from any distribution with finite mean m and finite variance s² are approximately normally distributed, with mean m and variance s²/n, provided n is sufficiently large. This holds regardless of whether the sampled distribution is itself normal.²
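The s²/n scaling of the variance can be illustrated with a small simulation. The sketch below is not part of the original analysis: the seed, the sample sizes, and the number of simulations are arbitrary assumptions chosen only for illustration.

```r
# Illustrative sketch: variance of the sample mean for several sample sizes.
# The seed, sample sizes, and number of simulations are arbitrary assumptions.
set.seed(1)
lambda <- 0.2
nSims  <- 5000
sizes  <- c(10, 40, 160)

# For each sample size n, simulate nSims sample means and take their variance.
simVar <- vapply(sizes, function(n) {
  var(replicate(nSims, mean(rexp(n, rate = lambda))))
}, numeric(1))

# Predicted variance of the sample mean: s^2 / n, with s = 1 / lambda.
predVar <- (1 / lambda)^2 / sizes

print(data.frame(n = sizes, simulated = simVar, predicted = predVar))
```

Each fourfold increase in n should reduce the variance of the sample means by roughly a factor of four.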

1000 random exponentials

The following R code generates 1000 random values from the exponential distribution using the rexp function with lambda = 0.2. We then compute the mean, standard deviation, and variance of the sample and plot a histogram with the sample mean marked by a vertical blue line (Figure 1).

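# Draw 1000 values from the exponential distribution with rate lambda = 0.2
nExp <- 1000
expData <- rexp(nExp, rate = 0.2)
# Sample mean, standard deviation, and variance
expMean <- mean(expData)
expSD <- sd(expData)
expVar <- var(expData)
# Histogram with the sample mean marked by a vertical blue line
hist(expData, main = NULL, xlab = paste0(nExp, " Random Exponentials"))
abline(v = expMean, lwd = 5, col = "blue")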

Figure 1. Distribution of 1000 random values from the exponential distribution, lambda = 0.2.

The mean of this sample, 4.73, is indicated by the blue vertical line in Figure 1. The theoretical mean, 1/lambda, is 5. The variance of the sample is 22.28, and the theoretical variance is (1/lambda)², or 25. Thus there is general agreement between the theoretical and calculated values of the mean and variance for the simulated sample from the exponential distribution.

The histogram in Figure 1 shows that the sample of 1000 exponentials is highly skewed. We can also use the Kolmogorov-Smirnov (KS) test, available in R as ks.test,³ to test the simulated sample for normality explicitly. The following R code runs a KS test comparing the simulated sample from the exponential distribution with a normal distribution having the same mean and standard deviation:

expKS <- ks.test(expData, "pnorm", mean = expMean, sd = expSD)

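As used here, the KS test compares a single sample with a fully specified reference distribution. The null hypothesis is that the sample was drawn from the reference distribution. Because the reference distribution is normal, the test amounts to a test of whether the sample is normally distributed.⁴ Note that the mean and standard deviation of the reference distribution are estimated from the sample itself, so the reported p-value is only approximate.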

The p-value for the KS test for the sample of 1000 exponentials is reported as 0, meaning that it is too small to display at the precision used. We therefore reject the null hypothesis that the sample was drawn from a normal distribution. This is expected, because the exponential distribution is strongly right-skewed.
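A minimal sketch of how the test statistic and p-value can be read from the object returned by ks.test, and how a very small p-value can be reported with format.pval rather than as 0. The sample is simulated afresh here, and the seed and the reporting threshold of 0.001 are assumptions, so the numbers will differ from those quoted above.

```r
# Illustrative sketch; the seed and the reporting threshold are assumptions.
set.seed(1)
expData <- rexp(1000, rate = 0.2)

expKS <- ks.test(expData, "pnorm", mean = mean(expData), sd = sd(expData))

expKS$statistic   # D: largest gap between the empirical and reference CDFs
expKS$p.value     # raw p-value; may display as 0 when rounded

# Report very small p-values as "below a threshold" rather than as 0.
pText <- format.pval(expKS$p.value, eps = 0.001)
print(pText)
```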

Distribution of the means of 40 exponentials

We next simulate drawing 1000 samples of 40 values from the exponential distribution, computing the mean for each sample. The following R code computes the simulated data, plots a histogram, and indicates the mean with a vertical blue line in Figure 2 (this is a “mean of means”).

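# Simulate 1000 means, each computed from a sample of 40 exponentials
nMns <- 40
mnsData <- replicate(1000, mean(rexp(nMns, rate = 0.2)))
# Mean, standard deviation, and variance of the 1000 means
mnsMean <- mean(mnsData)
mnsSD <- sd(mnsData)
mnsVar <- var(mnsData)
# Histogram with the mean of means marked by a vertical blue line
hist(mnsData, main = NULL, xlab = "Mean of 40 random exponentials")
abline(v = mnsMean, lwd = 5, col = "blue")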

Figure 2. Distribution of 1000 means of 40 random exponentials, lambda = 0.2.

The mean of the 1000 simulated sample means is 5.01, indicated by the blue vertical line in Figure 2. This agrees closely with the theoretical mean of an exponential distribution with lambda = 0.2, namely 1/lambda, or 5. From the central limit theorem, the theoretical variance of the simulated means is s²/n, or (1/lambda)²/40 = 25/40 = 0.625, which compares with the simulated variance of 0.67. The simulated mean and variance thus both agree closely with the predictions of the central limit theorem.
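The theoretical values quoted here follow from simple arithmetic, sketched below. The variable names are illustrative assumptions and do not appear in the original code.

```r
# Illustrative arithmetic for the theoretical values (names are assumptions).
lambda <- 0.2
n      <- 40

theoMean    <- 1 / lambda            # mean of the sample means: 5
theoVarMean <- (1 / lambda)^2 / n    # variance of the sample means: 25 / 40
theoSDMean  <- sqrt(theoVarMean)     # standard deviation of the sample means

print(c(mean = theoMean, variance = theoVarMean, sd = theoSDMean))
```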

Figure 2 shows that the distribution of 1000 means looks much more Gaussian than the exponential distribution from which the underlying values were sampled. The KS test for the distribution of 1000 means is computed as follows:

mnsKS <- ks.test(mnsData, "pnorm", mean = mnsMean, sd = mnsSD)

The p-value for the KS test of normality for the sample of 1000 means is 0.802. We therefore do not reject the null hypothesis that the means were drawn from a normal distribution. Failing to reject does not prove normality, but the distribution of the 1000 means is consistent with a normal distribution.
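As a visual complement to the KS test, the sketch below overlays the normal density predicted by the central limit theorem, with mean 1/lambda and standard deviation (1/lambda)/sqrt(40), on a density-scaled histogram of simulated means. This is an illustration rather than part of the original analysis. The seed is an assumption, and the means are simulated afresh, so the plot will not match Figure 2 exactly.

```r
# Illustrative sketch; the seed is an arbitrary assumption.
set.seed(1)
lambda  <- 0.2
nMns    <- 40
mnsData <- replicate(1000, mean(rexp(nMns, rate = lambda)))

# Histogram on the density scale so it is comparable to a density curve.
h <- hist(mnsData, freq = FALSE, main = NULL,
          xlab = "Mean of 40 random exponentials")

# Normal density with the mean and standard deviation predicted by the CLT.
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(nMns)),
      add = TRUE, lwd = 2, col = "blue")
```

If the normal approximation is reasonable, the blue curve should track the tops of the histogram bars, with a slight right skew still visible in the simulated means.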

These results are consistent with the expectations of the central limit theorem.

¹ See Exponential distribution.
² See Explaining the Central Limit Theorem.
³ See Kolmogorov-Smirnov Tests.
⁴ See Kolmogorov-Smirnov Goodness-of-Fit Test.


Richard A. Lent
