Statistical Analysis of the Exponential Distribution

By MAS, Feb 2019

Overview

This report examines three statistical properties of the exponential distribution: how the sample mean compares with the theoretical mean, how the sample variance compares with the theoretical variance, and how the distribution of sample means differs from the distribution of random exponentials, as predicted by the central limit theorem.

The exponential distribution follows the density \(f(x)=\lambda e^{-\lambda x}\) for \(x \ge 0\), with population mean \(\mu=1/\lambda\) and population standard deviation \(\sigma=1/\lambda\) (in this report, \(\lambda = 0.2\)). For random samples of size \(n\) taken from this population, the sample mean, \(\bar{X}\), is a random variable with theoretical mean \(E[\bar{X}]=\mu\) and theoretical variance \(Var(\bar{X})=\sigma^2/n\); by the central limit theorem, its distribution is approximately normal when \(n\) is large.
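As a quick numerical check of the population moments quoted above, the sketch below integrates against the exponential density using base R's integrate() and dexp(). The names pop_mean and pop_var are illustrative and are not part of the report's code.

```r
lambda <- 0.2  ## Rate parameter used throughout this report
n <- 40        ## Sample size used in the simulations

## Population mean: integral of x * f(x) over [0, Inf)
pop_mean <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value

## Population variance: integral of (x - mean)^2 * f(x) over [0, Inf)
pop_var <- integrate(function(x) (x - pop_mean)^2 * dexp(x, rate = lambda),
                     0, Inf)$value

pop_mean       ## Should be close to 1/lambda = 5
sqrt(pop_var)  ## Should be close to 1/lambda = 5
pop_var / n    ## Should be close to sigma^2/n = 0.625
```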

Simulations

To explore the relationships above, 1000 simulations (each with sample size \(n=40\)) are performed using the following R code. First, we assign all of the variables. Then, in each simulation, 40 random exponentials are generated and their mean is calculated and stored in the allmeans vector. We monitor the convergence of the mean and variance using mean_monitor and var_monitor, respectively. The process is repeated 1000 times.

n <- 40 ## Sample size 
lambda <- 0.2 ## Lambda rate parameter
mu <- 1.0/lambda ## Population mean
sigma <- 1.0/lambda ## Population standard deviation
theoretical_mean <- mu ## Theoretical sample mean
theoretical_var <- (sigma^2)/n ## Theoretical sample variance
allmeans <- NULL ## Empty vector to store means
mean_monitor <- NULL ## Empty vector to monitor mean convergence
var_monitor <- NULL ## Empty vector to monitor variance convergence
set.seed(1) ## Set seed to make data reproducible
## Generate samples
for(i in 1:1000){
        allmeans <- c(allmeans, mean(rexp(n, rate=lambda)))
        mean_monitor <- c(mean_monitor, mean(allmeans))
        var_monitor <- c(var_monitor, var(allmeans))
}
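Growing vectors with c() inside a loop copies them on every iteration. The sketch below is one possible vectorized alternative, intended to mirror the loop above; it is not the code used to produce the reported results. The names n_sims, draws, and the _vec suffixes are illustrative.

```r
n <- 40         ## Sample size
lambda <- 0.2   ## Rate parameter
n_sims <- 1000  ## Number of simulations (illustrative name)

set.seed(1)
## One row per simulation, one column per exponential draw
draws <- matrix(rexp(n_sims * n, rate = lambda), nrow = n_sims, byrow = TRUE)
allmeans_vec <- rowMeans(draws)

## Running mean of the first k sample means, for k = 1..n_sims
mean_monitor_vec <- cumsum(allmeans_vec) / seq_along(allmeans_vec)

## Running variance of the first k sample means (NA for k = 1,
## because the variance of a single value is undefined)
var_monitor_vec <- sapply(seq_along(allmeans_vec),
                          function(k) var(allmeans_vec[1:k]))
```

The first entry of the variance monitor is NA in both versions; plot() skips it silently.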

Sample Mean vs. Theoretical Mean

If we plot a histogram (R code in Appendix) of the 1000 sample means generated above, we see that it is centered at approximately 5, the theoretical mean (red line). We can also plot the running mean as a function of the number of simulations to show its convergence.

The sample mean agrees with the theoretical mean to within about 0.2% (R code in Appendix):

## [1] "Sample Mean = 4.99002520077716  ;  Theoretical Mean = 5"
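To put the size of this difference in context, the sketch below expresses it in units of the standard error of an average of 1000 independent sample means. It uses only the values reported above; the names se_of_average and z_score are illustrative.

```r
sample_mean <- 4.99002520077716  ## Value reported above
theoretical_mean <- 5
theoretical_var <- 0.625         ## sigma^2 / n with sigma = 5, n = 40
n_sims <- 1000                   ## Number of simulations

## Standard error of the average of n_sims independent sample means
se_of_average <- sqrt(theoretical_var / n_sims)

## Deviation from the theoretical mean, in standard-error units
z_score <- (sample_mean - theoretical_mean) / se_of_average

se_of_average  ## 0.025
z_score        ## About -0.4
```

A deviation of well under one standard error is what we would expect from random sampling alone.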

Sample Variance vs. Theoretical Variance

The sample variance also converges to the theoretical variance as the number of simulations increases (R code in Appendix).

The sample variance (the variability in the mean of 40 exponentials) is within about 2.2% of the theoretical variance (R code in Appendix):

## [1] "Sample Variance = 0.611116466559575  ;  Theoretical Variance = 0.625"
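The sketch below quantifies this gap using the reported values. The standard error formula assumes the sample means are normally distributed, which holds only approximately here, so the result should be read as a rough guide. The names rel_error, se_var, and z_var are illustrative.

```r
sample_var <- 0.611116466559575  ## Value reported above
theoretical_var <- 0.625
n_sims <- 1000                   ## Number of simulations

## Relative difference between sample and theoretical variance
rel_error <- (sample_var - theoretical_var) / theoretical_var

## Rough standard error of a sample variance, ASSUMING normal data
se_var <- theoretical_var * sqrt(2 / (n_sims - 1))

## Deviation from the theoretical variance, in standard-error units
z_var <- (sample_var - theoretical_var) / se_var

rel_error  ## About -0.022
z_var      ## About -0.5
```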

The Distribution of Sample Means and Distribution of Exponentials

Let’s compare the distribution of 1000 means from 40 random exponentials we generated above to the distribution of 1000 random exponentials (R code in Appendix).

We can see that the distribution of sample means is bell-shaped, consistent with a normal distribution, whereas the distribution of exponentials is not: it has an exponential decay shape.

To show that the distribution of sample means is approximately normal, we can do two things. The first is to convert our histogram counts to probability densities and then overlay a normal probability density given by \(f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\), into which we plug our sample mean and sample variance. Alternatively, we can use a quantile-quantile plot, which plots the theoretical quantiles of a normal distribution against the observed quantiles of the sample means; the points should fall on a straight line if the distribution is normal (R code in Appendix). Both methods indicate that the distribution is approximately normal.
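Both checks above are visual. As a numerical complement, the sketch below computes the sample skewness of the simulated means. The mean of \(n\) independent exponentials follows a Gamma distribution with shape \(n\), whose skewness is \(2/\sqrt{n}\), so a small right skew is expected for \(n=40\). The skewness() helper is defined here for illustration and is not a base R function.

```r
n <- 40        ## Sample size
lambda <- 0.2  ## Rate parameter

set.seed(1)
means <- replicate(1000, mean(rexp(n, rate = lambda)))

## Sample skewness: third central moment over the cubed standard deviation
## (illustrative helper, not part of base R)
skewness <- function(x) {
        centered <- x - mean(x)
        mean(centered^3) / mean(centered^2)^1.5
}

skewness(means)  ## Observed skewness of the simulated means
2 / sqrt(n)      ## Skewness of the exact Gamma distribution of the mean
```

A normal distribution has zero skewness, so a value near \(2/\sqrt{40}\approx 0.32\) shows how far the sample means still are from exact normality at this sample size.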

We can also show that the distribution of sample means behaves as predicted by the central limit theorem by comparing the distribution of sample means, standardized according to \(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\), to a standard normal distribution (R code in Appendix).

## [1] "Sample Mean =  -0.01  ;  Sample Standard Deviation =  0.99"

The mean is approximately 0, the standard deviation is approximately 1, and the distribution is well approximated by a standard normal density. Therefore, the distribution of sample means behaves as predicted by the central limit theorem.
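One further check, sketched below under the same simulation settings, compares the fraction of standardized means falling within 1 and 1.96 standard deviations of zero to the corresponding standard normal probabilities. The names coverage, reference, one_sd, and z_196 are illustrative.

```r
n <- 40              ## Sample size
lambda <- 0.2        ## Rate parameter
mu <- 1 / lambda     ## Population mean
sigma <- 1 / lambda  ## Population standard deviation

set.seed(1)
means <- replicate(1000, mean(rexp(n, rate = lambda)))

## Standardize the sample means
z <- (means - mu) / (sigma / sqrt(n))

## Observed fraction of standardized means within each bound
coverage <- c(one_sd = mean(abs(z) < 1), z_196 = mean(abs(z) < 1.96))

## Standard normal probabilities for the same bounds
reference <- c(one_sd = pnorm(1) - pnorm(-1),
               z_196 = pnorm(1.96) - pnorm(-1.96))

rbind(coverage, reference)
```

If the central limit theorem approximation is good, the two rows should be close, with the reference values near 0.68 and 0.95.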

Appendix

Sample Mean Plots

par(mfrow=c(1,2))
hist(allmeans, col="green", main="Histogram of Sample Means", 
     xlab="Sample Means", ylab="Count")
abline(v=theoretical_mean, col="red", lwd=5.0)
legend("topright",legend=c("Theoretical Mean"), col=c("red"), lty=1)

plot(1:1000, mean_monitor, pch=19, main="Sample Mean Convergence", 
     xlab="Simulations", ylab="Mean")
abline(h=theoretical_mean, col="red", lwd=5.0)
legend("topright",legend=c("Theoretical Mean"), col=c("red"), lty=1)

Sample Mean and Theoretical Mean

sample_mean <- mean(allmeans)
paste("Sample Mean =",toString(sample_mean), " ; ", 
      "Theoretical Mean =",toString(theoretical_mean))

Sample Variance Convergence Plot

plot(1:1000, var_monitor, pch=19, main="Sample Variance Convergence", 
     xlab="Simulations", ylab="Variance")
abline(h=theoretical_var, col="red", lwd=5.0)
legend("topright",legend=c("Theoretical Variance"), col=c("red"), lty=1, cex=0.75)

Sample Variance and Theoretical Variance

sample_var <- var(allmeans)
paste("Sample Variance =",toString(sample_var), " ; ",
      "Theoretical Variance =",toString(theoretical_var))

Distribution of Sample Means and Distribution of Exponentials

## Generate 1000 random exponentials
set.seed(3)
exp_dist <- rexp(1000, rate=lambda)

## Plot 
par(mfrow=c(1,2)) ## Set panels
hist(allmeans, col="green", main="Histogram of Sample Means", 
     xlab="Sample Means", ylab="Count")
hist(exp_dist, col="blue", main="Histogram of Exponentials", 
     xlab="Exponentials", ylab="Count")

Testing if the Distribution of Sample Means is Normal

par(mfrow=c(1,2)) ## Set panels

## Method 1: Fit normal probability density function to histogram
## Pass the prob=TRUE argument to use probabilities 
hist(allmeans, prob=TRUE, col="green", main="Histogram of Sample Means", 
     xlab="Sample Means", ylab="Probability")
curve(dnorm(x, mean=sample_mean, sd=sqrt(sample_var)), min(allmeans), 
      max(allmeans), add=TRUE, col="blue", lwd=3)
legend("topright",legend=c("Normal Fit"), col=c("blue"), lty=1)

## Method 2: Quantile-Quantile plot
qq_vals <- qqnorm(allmeans, pch=19)
qqline(allmeans, lwd=3, lty="dashed", col="red")
legend("topleft",legend=c("QQ Fit"), col=c("red"), lty="dashed")

Testing the Central Limit Theorem

## Normalize sample means
z_val <- (allmeans-theoretical_mean)/sqrt(theoretical_var)

par(mfrow=c(1,2)) ## Set Panels

## Histogram and Fit to Standard Normal PDF
hist(z_val, prob=TRUE, col="green", main="Histogram of Normalized Sample Means", 
     xlab="Sample Means", ylab="Probability")
curve(dnorm(x, mean=0, sd=1), min(z_val), max(z_val), add=TRUE, col="blue", lwd=3)
legend("topright",legend=c("Normal Fit"), col=c("blue"), lty=1)

## Quantile-Quantile Plot and Fit
qq_vals <- qqnorm(z_val, pch=19)
qqline(z_val, lwd=3, lty="dashed", col="red")
legend("topleft",legend=c("QQ Fit"), col=c("red"), lty="dashed")

## Print Mean and Standard Deviation
paste("Sample Mean = ", round(mean(z_val),2), " ; ", 
      "Sample Standard Deviation = ", round(sd(z_val),2))