Analyzing the Exponential Distribution in R

Overview

This analysis investigates the exponential distribution by simulation in R. The sample means and sample variances from the simulations are compared with their theoretical counterparts, and the distribution of the sample mean is checked against the Central Limit Theorem.

Simulating the Exponential Distribution

To simulate the exponential distribution, I will use R’s “rexp(num, rate)” function, which returns num iid draws from an exponential distribution with the given rate. The analysis below needs 40,000 draws (1,000 simulations of 40 draws each). I will first plot a histogram of these 40,000 draws, which gives a graphical picture of the simulated exponential distribution with rate = .2.

set.seed(999) #setting seed for purposes of reproducibility
simulations <- 1000; n <- 40; rate <- .2

exponentialDraws <- rexp(simulations * n, rate)
hist(exponentialDraws, xlim = c(0, 1/rate * 6), breaks = 40, col = "red", 
     xlab = "Draw values", main = "")

As the graph above shows, a histogram of 40,000 random draws from the exponential distribution with rate = .2 closely approximates the true density, which decays exponentially.
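One way to make the comparison with the true distribution more concrete is to draw the theoretical density over the histogram. The following is only a sketch: the variable name draws is illustrative, and the histogram is switched to the density scale (freq = FALSE) so that it is comparable with dexp().

```r
# Sketch: overlay the theoretical exponential density on the simulated draws.
set.seed(999)                      # same seed as above, for reproducibility
rate <- 0.2
draws <- rexp(40 * 1000, rate)     # 40,000 iid exponential draws

# freq = FALSE plots densities instead of counts, matching the scale of dexp()
hist(draws, freq = FALSE, breaks = 40, xlim = c(0, 6 / rate),
     col = "red", xlab = "Draw values", main = "")

# Theoretical density f(x) = rate * exp(-rate * x)
curve(dexp(x, rate = rate), from = 0, to = 6 / rate, add = TRUE, lwd = 2)
```

If the simulation behaves as expected, the bars should follow the black curve closely across the plotted range.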

Sample Mean Versus Theoretical Mean

Let X be a continuous random variable representing a single random draw from the exponential distribution with rate = .2. The theoretical mean, or population mean (\(\mu\)), of the exponential distribution is 1 / rate, which in our case is 1 / .2 = 5. Because the sample mean (\(\overline{X}_{40}\)) is an unbiased estimator of \(\mu\), its distribution is centered at the population mean, 5. Since we will only observe 1,000 simulated sample means, and not infinitely many, we shouldn’t expect their average to be exactly 5, only close to it. Further, according to the Central Limit Theorem, the distribution of \(\overline{X}_{40}\) should be approximately normal with mean \(\mu\) and standard deviation \(\frac{\sigma}{\sqrt{n}}\), where \(\sigma\) = 1 / rate = 5 is the population standard deviation: \(\overline{X}_{40}\) \(\sim\) N(5, 0.791), with the second parameter given as a standard deviation.
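The arithmetic behind these theoretical values can be written out in a few lines of R. This is a small sketch with illustrative variable names (mu, sigma, se); it relies on the fact that the standard deviation of an exponential distribution equals its mean, 1 / rate.

```r
# Sketch: theoretical mean and standard error of the mean for n = 40, rate = 0.2
rate <- 0.2
n <- 40

mu <- 1 / rate            # population mean of the exponential distribution
sigma <- 1 / rate         # population standard deviation (equal to the mean)
se <- sigma / sqrt(n)     # standard deviation of the sample mean, about 0.791

round(c(mean = mu, se = se), 3)
```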

In order to confirm this, I will take 1,000 averages of 40 random draws from the exponential distribution, using the 40,000 random draws generated above. A density plot of these 1,000 averages should show an approximately normal distribution centered at 5.

matrix <- matrix(exponentialDraws, nrow = simulations) #each row is one of our simulations
means <- apply(matrix, 1, mean) #takes mean of each row
dens <- density(means)

plot(dens, xlab = "Sample means (n = 40)", main = "")
polygon(dens, col="red", border="black")

As the density plot above shows, the distribution of the sample mean \(\overline{X}_{40}\) is approximately normal and centered near 5: the average of the 1,000 sample means is 5.03. According to the Central Limit Theorem, the distribution of \(\overline{X}_{40}\) should have an approximate standard deviation of \(\frac{\sigma}{\sqrt{n}}\) = \(\frac{5}{\sqrt{40}}\) = 0.79. The observed standard deviation of the 1,000 sample means is 0.77.
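The center and spread quoted above can be computed directly from the simulated means. The sketch below regenerates the draws so that it runs on its own; the names draws and sample_means are illustrative, and the exact figures it prints depend on the seed and on R's random number generator, so they are not listed here.

```r
# Sketch: compare the simulated sample means with their theoretical values.
set.seed(999)
simulations <- 1000; n <- 40; rate <- 0.2

# One simulation per row, 40 draws per simulation
draws <- matrix(rexp(simulations * n, rate), nrow = simulations)
sample_means <- rowMeans(draws)    # same result as apply(draws, 1, mean)

# Center: average of the sample means versus the population mean
c(observed = mean(sample_means), theoretical = 1 / rate)

# Spread: sd of the sample means versus sigma / sqrt(n)
c(observed = sd(sample_means), theoretical = (1 / rate) / sqrt(n))
```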

I will also calculate a 99% confidence interval for \(\mu\), using one randomly selected simulation out of the 1,000 as the sample. The mean of that simulation is the point estimate, and its sample standard deviation serves as an estimate for \(\sigma\).

index <- sample(1:1000, 1) #row index of a randomly selected simulation
round(mean(matrix[index,]) + c(-1,1) * qnorm(.995) * sd(matrix[index,]) * sqrt(1/n), 2)

## [1] 2.43 6.69
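A single interval says little about how reliable the procedure is, so a natural follow-up is to build the same interval for every simulation and count how often it contains the true mean of 5. This is a sketch rather than part of the original analysis; the names xbar, s, half_width, and covered are illustrative. Because the interval uses a normal quantile with an estimated standard deviation on skewed data, the observed coverage may fall somewhat short of the nominal 99%.

```r
# Sketch: empirical coverage of the 99% interval across all 1,000 simulations.
set.seed(999)
simulations <- 1000; n <- 40; rate <- 0.2
draws <- matrix(rexp(simulations * n, rate), nrow = simulations)

xbar <- rowMeans(draws)            # sample mean of each simulation
s <- apply(draws, 1, sd)           # sample standard deviation of each simulation
half_width <- qnorm(0.995) * s / sqrt(n)

# TRUE where the interval [xbar - half_width, xbar + half_width] contains 1 / rate
covered <- (xbar - half_width <= 1 / rate) & (1 / rate <= xbar + half_width)
mean(covered)                      # proportion of intervals containing the true mean
```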

Sample Variance Versus Theoretical Variance

The theoretical variance, or the population variance \(\sigma^{2}\), of the exponential distribution is \((1/rate)^{2}\) = \((1/.2)^{2}\) = 25. The average sample variance should be close to this value. Rather than taking the mean of each set of 40 draws, as we did when comparing the sample mean with the theoretical mean, we will take the variance of each set of 40 draws.

variances <- apply(matrix, 1, var)
hist(variances, breaks = 30, 
     xlim = c(mean(variances) - 2 * sd(variances), mean(variances) + 2 * sd(variances)), 
     col = "red", xlab = "Sample variances (n = 40)", main = "")

The histogram above shows the distribution of the 1,000 sample variances, which is centered around 25. The average sample variance is 25.54.
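The average sample variance can be set next to the theoretical value with a short computation. As before, this is a self-contained sketch with illustrative names (draws, sample_vars), and the exact numbers it prints depend on the seed and the random number generator.

```r
# Sketch: compare the average sample variance with the theoretical variance.
set.seed(999)
simulations <- 1000; n <- 40; rate <- 0.2
draws <- matrix(rexp(simulations * n, rate), nrow = simulations)

sample_vars <- apply(draws, 1, var)   # unbiased sample variance of each row

c(average = mean(sample_vars), theoretical = (1 / rate)^2)
```

Because var() divides by n - 1, each sample variance is an unbiased estimate of \(\sigma^{2}\), which is why their average is expected to land near 25.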

Distribution

The density plot in the “Sample Mean Versus Theoretical Mean” section showed that the distribution of \(\overline{X}_{40}\) is approximately normal. I will now make this clearer. The Central Limit Theorem says that the distribution of \(\frac{\overline{X}_{40} - \mu}{\sigma / \sqrt{n}}\) should be approximately standard normal, provided our n (40) is large enough. I will apply this transformation (standardize the sample means) and then plot the result against an approximation of the standard normal.