Statistical Inference, Part One

Overview

This report investigates the exponential distribution in R for a given set of parameters and compares the results with what the Central Limit Theorem predicts. For this illustration, we simulate 1000 averages of 40 exponentials each, with lambda = 0.2.

Our objectives are as follows:

  1. Show the sample mean and compare it to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show that the distribution of the sample means is approximately normal.

Exponential Distribution Defined

The exponential distribution is a statistical model generally used to describe the time or distance between events. Its mean is 1/lambda, and its standard deviation is also 1/lambda.

Lambda is the rate (per unit of time or space) at which events occur.

For our purposes, the theoretical mean is the same as the population mean: 1/lambda = 1/0.2 = 5.
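For reference, the quantities quoted above can be written out explicitly. The symbols X (a single exponential draw) and f (its density) are notation introduced here for illustration; the values follow from lambda = 0.2.

```latex
\begin{align*}
f(x;\lambda) &= \lambda e^{-\lambda x}, \qquad x \ge 0 \\
\mathbb{E}[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{sd}(X) &= \frac{1}{\lambda} = 5 \\
\operatorname{Var}(X) &= \frac{1}{\lambda^{2}} = 25
\end{align*}
```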

Establish the variables:

set.seed(500)
sims <- 1000
n <- 40
lambda <- 0.2

Run simulation:

runSim <- replicate(sims, rexp(n, lambda))

Calculate means of the exponential simulation:

meanExsim <- apply(runSim, 2, mean)
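As a quick check on the two steps above, the following sketch confirms the shape of the simulated matrix and shows that colMeans() gives the same averages as apply(). The name meanAlt is introduced here for illustration only; the other names mirror the variables defined earlier.

```r
# Recreate the simulation so the sketch is self-contained
set.seed(500)
sims <- 1000
n <- 40
lambda <- 0.2
runSim <- replicate(sims, rexp(n, lambda))

# Each column holds one simulation of n = 40 draws
dim(runSim)

# Column means via apply(), as in the report
meanExsim <- apply(runSim, 2, mean)

# Same result via colMeans() (meanAlt is an illustrative name)
meanAlt <- colMeans(runSim)
all.equal(meanExsim, meanAlt)
```

Because replicate() stacks each simulation as a column, the averages must be taken over columns (margin 2), which is what both calls do.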

Generate plot of exponential means:

hist(meanExsim, col = "aquamarine", border = "aquamarine4", xlab = "Mean", main = "Means: 1000 Simulations of 40 Exponentials")

1. Compare Sample Mean to Theoretical Mean

As demonstrated below, our sample mean 5.010562 is extremely close to the theoretical mean, 5.

The theoretical mean is, by definition, the population mean, 1/lambda:

theoMean <- 1/lambda
theoMean
## [1] 5

Calculate sample mean of our simulations:

samMean <- mean(meanExsim)
samMean
## [1] 5.010562
hist(meanExsim, col = "aquamarine", border = "aquamarine4", xlab = "Mean", main = "Means: Sample vs. Theoretical")
abline(v = samMean, col = "mediumblue", lwd = 6, lty = 2)
abline(v = theoMean, col = "lightcoral", lwd = 4)
legend("topright", legend=c("Theoretical", "Sample"),
       col=c("lightcoral", "mediumblue"), lty=1:2, cex=0.8,
       box.lty=0)

2. Compare Simulation SD and Variance to Theoretical SD and Variance

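The calculations below show that the variance of the simulated means (0.6201) is again very close to the theoretical variance of the sample mean, (1/lambda)^2/n = 0.625. Note that this is the variance of an average of 40 exponentials, not of a single exponential draw, whose variance is (1/lambda)^2 = 25.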

# Calculate standard deviation of the simulation means
sdSim <- sd(meanExsim)
sdSim
## [1] 0.7874779
# Calculate the theoretical standard deviation of the sample mean (standard error)
sdTheo <- (1/lambda)/sqrt(n)
sdTheo
## [1] 0.7905694
# Calculate variance of simulation
varSim <- sdSim^2
varSim
## [1] 0.6201215
# Calculate theoretical variance
varTheo <- ((1/lambda)*(1/sqrt(n)))^2
varTheo
## [1] 0.625
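The theoretical values used above come from the variance of an average of n independent draws. A short derivation, using the notation X_1, ..., X_n for the draws and \bar{X} for their average (symbols introduced here for illustration):

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{1}{n\lambda^{2}} \\
  &= \frac{25}{40} = 0.625 \\
\operatorname{sd}(\bar{X})
  &= \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\end{align*}
```

The second equality relies on the draws being independent, which holds for values generated by rexp().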

3. Does Simulation Follow Normal Distribution?

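According to the Central Limit Theorem, averages of sufficiently large samples (here, n = 40) should be approximately normally distributed, even though the underlying exponential distribution is strongly skewed. The large number of simulations (1000) does not create the normality; it only lets the histogram show the shape of that distribution clearly. As demonstrated below, our simulation means are indeed approximately normally distributed.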

xfit <- seq(min(meanExsim), max(meanExsim), length=100)
yfit <- dnorm(xfit, mean=1/lambda, sd=(1/lambda/sqrt(n)))
hist(meanExsim, breaks=n, prob=TRUE, col="aquamarine", border = "aquamarine4", xlim = c(3, 8), xlab = "Mean", main="Mean Density", ylab="Density")
lines(xfit, yfit, col="lightcoral", lty=5, lwd=3)
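To complement the histogram with a number, the sketch below compares the skewness of the simulated means with that of the same number of individual exponential draws. The skewness() helper and the names rawDraws, skewMeans and skewRaw are hypothetical and not part of the original analysis. The theoretical values assume the standard results that an exponential has skewness 2 and an average of n exponentials has skewness 2/sqrt(n).

```r
# Hypothetical helper (not from the original analysis):
# moment-based sample skewness
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

set.seed(500)
sims <- 1000
n <- 40
lambda <- 0.2

# 1000 averages of 40 exponentials each
meanExsim <- colMeans(replicate(sims, rexp(n, lambda)))

# 1000 individual exponential draws, for contrast
rawDraws <- rexp(sims, lambda)

skewMeans <- skewness(meanExsim)
skewRaw <- skewness(rawDraws)

# Sample values next to their theoretical counterparts
c(means = skewMeans, raw = skewRaw,
  theoryMeans = 2 / sqrt(n), theoryRaw = 2)
```

A skewness much closer to zero for the means than for the raw draws is what the Central Limit Theorem leads us to expect; some right skew remains at n = 40.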

QQ Plot Comparison

Finally, let us take another view of our distribution: a Q-Q plot, which compares the quantiles of our sample means with the quantiles of a normal distribution. We see that our sample closely follows the normal reference line.

qqnorm(meanExsim, col = "aquamarine", main = "1000 Sims of 40 Exponentials vs. Normal Distribution")
qqline(meanExsim, col = "lightcoral")
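The visual check can also be paired with a rough numerical one: the share of simulated means that fall within 1, 2 and 3 theoretical standard errors of the theoretical mean, set beside the shares a normal distribution would give. This is a minimal sketch, and the names mu, se, k, observed and expected are introduced here for illustration.

```r
set.seed(500)
sims <- 1000
n <- 40
lambda <- 0.2
meanExsim <- colMeans(replicate(sims, rexp(n, lambda)))

mu <- 1 / lambda                # theoretical mean
se <- (1 / lambda) / sqrt(n)    # theoretical sd of the sample mean

k <- 1:3
# Share of simulated means within k standard errors of mu
observed <- sapply(k, function(z) mean(abs(meanExsim - mu) <= z * se))
# Share a normal distribution places within k standard deviations
expected <- pnorm(k) - pnorm(-k)

data.frame(k = k, observed = observed, expected = expected)
```

Observed shares close to the expected ones are consistent with the pattern in the Q-Q plot, although this is an informal comparison rather than a formal test of normality.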

Summary

This investigation demonstrates that the distribution of 1000 simulated averages of 40 exponentials (lambda = 0.2) closely conforms to the normal distribution, with a sample mean and variance close to their theoretical values of 5 and 0.625.