Statistical Inference Project Part 1

Overview

This will be a simulation exercise comparing the exponential distribution to the Central Limit Theorem.

Instructions

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should:

Show the sample mean and compare it to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show that the distribution is approximately normal.

In point 3, focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.

Simulation Setup

First always set your seed so the simulation is reproducible. Here I’ll use 33, my favorite seed! Store all the known variables we’re going to work with in the sim.
The exponential distribution can be simulated in R with rexp(n, lambda).
Do a thousand simulations.

set.seed(33)

lambda <- 0.2
n <- 40

expo_sims <- replicate(1000, rexp(n, lambda))

means_expo_sims <- apply(expo_sims, 2, mean)

1. Show the sample mean and compare it to the theoretical mean of the distribution.

We’re told the theoretical mean of the distribution is 1/lambda. The sample mean is just the mean of the 1,000 simulations we just did.

theo_mean <- 1/lambda
theo_mean

## [1] 5

sample_mean <- mean(means_expo_sims)
sample_mean

## [1] 4.964431

As you can see the theoretical mean and the sample mean are very close.

hist(means_expo_sims, xlab = "Simulated Mean Buckets", main = "Distribution of Simulated Means", col = "grey")
abline(v=theo_mean, col = 2, lwd = 2)
abline(v=sample_mean, col=4, lwd = 2)
legend("topright", legend = c("theo_mean", "sample_mean"), pch = "|", col = c(2,4))

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

We’re told the standard deviation of the theoretical distribution is 1/lambda. To calculate the variance we square the standard deviation and divide that by the n. For our sample we can just square the calculated standard deviation.

theo_var <- ((1/lambda)^2)/n
theo_var

## [1] 0.625

sample_var <- (sd(means_expo_sims))^2
sample_var

## [1] 0.6456619

The sample variation is fairly close to the theoretical variation. I prefer to compare standard deviations instead of variances since they aren’t squared and we’ll need them for part 3 anyway so here’s how we get those:

theo_sd <- 1/(lambda * sqrt(n))
theo_sd

## [1] 0.7905694

sample_sd <- sd(means_expo_sims)
sample_sd

## [1] 0.8035309

Again, fairly close.

3. Show that the distribution is approximately normal.

To do this I’ll just use the same histogram I used in part one but break it up into more buckets. I’ll also add the prob = TRUE argument and a line showing the density.

hist(means_expo_sims, xlab = "Simulated Mean Buckets", main = "Distribution of Simulated Means", col = "grey", prob = TRUE, breaks = 100)
lines(density(means_expo_sims))

That looks pretty normal to me. To be more confident we can look at the confidence intervals!

theo_conf <- theo_mean + c(-1,1) * 1.96 * sqrt(theo_var)/sqrt(n)
theo_conf

## [1] 4.755 5.245

sample_conf <- mean(means_expo_sims) + c(-1,1)*1.96*sd(means_expo_sims)/sqrt(n)
sample_conf

## [1] 4.715414 5.213447

Those are close as well.

Finally, since visual tests aren’t always reliable we can do something fancy and use the Shapiro-Wilk method.

shapiro.test(means_expo_sims)

## 
##  Shapiro-Wilk normality test
## 
## data:  means_expo_sims
## W = 0.99052, p-value = 4.752e-06

Since that results in a really low p-value we can confidently say the data is normally distributed.