Part 1: Simulation of the the exponential distribution

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda \( \lambda \) is the rate parameter. The mean of exponential distribution is \( 1/\lambda \) and the standard deviation is also \( 1/\lambda \). We set \( \lambda=0.2 \) for all of the simulation and we investigate the distribution of averages of 40 expontential(0.2)s. We do a thousand simulated averages of 40 exponentials.

set.seed(3)
lambda <- 0.2
simulations <- 1000
sample <- 40
s <- matrix(rexp(simulations*sample, rate=lambda), simulations, sample)
means <- rowMeans(s)

1. Show where the distribution is centered at and compare it to the theoretical center of the distribution

# Histogram of averages
hist(means, breaks=50, prob=TRUE,
     main="Distribution of averages of samples 
     using exponential distribution with lambda=0.2",
     xlab="")

# Density of the averages of samples
lines(density(means))

# Theoretical center of distribution
abline(v=1/lambda, col="red")

# Theoretical density of the averages of samples
xfit <- seq(min(means), max(means), length=100)
yfit <- dnorm(xfit, mean=1/lambda, sd=(1/lambda/sqrt(sample)))
lines(xfit, yfit, pch=22, col="red", lty=2)

# Legend
legend('topright', c("simulation", "theoretical"), lty=c(1,2), col=c("black", "red"))

plot of chunk unnamed-chunk-2

The distribution is centered at: 4.9866 which is very close to the theoretical center of the distribution: \( \lambda^{-1} \) = 5.

2. Show how variable it is and compare it to the theoretical variance of the distribution.

The simulated variance is: 0.6258, which is very close to the theoretical one: \( \sigma^2 / n = 1/(\lambda^2 n) = 1/(0.04 \times 40) \) = 0.625.

3. Show that the distribution is approximately normal.

Due to the central limit theorem, the averages of samples follow normal distribution. The figure above also shows the density computed using the histogram and the normal density plotted with theoretical mean and variance values. Also, the q-q plot below suggests the normality.

qqnorm(means); qqline(means)

plot of chunk unnamed-chunk-3

4. Evaluate the coverage of the confidence interval for \( 1/\lambda = \bar{X} \pm 1.96 \frac{S}{\sqrt{n}} \)

lambda_vals <- seq(4, 6, by=0.01)
coverage <- sapply(lambda_vals, function(lamb) {
    mu_hats <- rowMeans(matrix(rexp(sample*simulations, rate=0.2),
                               simulations, sample))
    ll <- mu_hats - qnorm(0.975) * sqrt(1/lambda**2/sample)
    ul <- mu_hats + qnorm(0.975) * sqrt(1/lambda**2/sample)
    mean(ll < lamb & ul > lamb)
})

library(ggplot2)
qplot(lambda_vals, coverage) + geom_hline(yintercept=0.95)

plot of chunk unnamed-chunk-4

The 95% confidence intervals for the rate parameter (\( \lambda \)) to be estimated (\( \hat{\lambda} \)) are \( \hat{\lambda}_{low} = \hat{\lambda}(1 - \frac{1.96}{\sqrt{n}}) \) agnd \( \hat{\lambda}_{upp} = \hat{\lambda}(1 + \frac{1.96}{\sqrt{n}}) \). As can be seen from the plot above, for selection of \( \hat{\lambda} \) around 5, the average of the sample mean falls within the confidence interval at least 95% of the time. Note that the true rate, \( \lambda \) is 5.