Simulation Exercises

Statistical Inference Course

Alvaro Espinel - Coursera StatInference -012

Overview

The exponential distribution can be simulated using R withthe expression rexp(n, lambda) where lambda \(\lambda\) is the rate parameter. The mean of exponential distribution is \(1/\lambda\) and the standard deviation is also \(1/\lambda\). Assuming \(\lambda=0.2\), in this simulation, it is presented the distribution of averages of 40 numbers sampled from exponential distribution with \(\lambda=0.2\).

Performing a thousand simulated averages of 40 samples with exponential distribution.

set.seed(3)
lambda <- 0.2
num_sim <- 1000
sample_size <- 40
sim <- matrix(rexp(num_sim*sample_size, rate=lambda), num_sim, sample_size)
row_means <- rowMeans(sim)

The distribution of sample means is:

# plot the histogram of averages
hist(row_means, breaks=50, prob=TRUE,
     main="Distribution of averages of samples,
     withdrawn from exponential distribution assuming lambda=0.2",
     xlab="")
# density of the averages of samples
lines(density(row_means))
# theoretical center of distribution
abline(v=1/lambda, col="green", lwd =2)
# theoretical density of the averages of samples
xfit <- seq(min(row_means), max(row_means), length=100)
yfit <- dnorm(xfit, mean=1/lambda, sd=(1/lambda/sqrt(sample_size)))
lines(xfit, yfit, pch=22, col="green", lty=4)
# add legend
legend('topright', c("simulation", "theoretical"), lty=c(1,2), col=c("black", "green"))

The distribution of sample means is centered at 4.9866197. The theoretical center of the distribution is \(\lambda^{-1}\) = 5. The variance of sample means is 0.6257575 where the theoretical variance of the distribution is \(\sigma^2 / n = 1/(\lambda^2 n) = 1/(0.04 \times 40)\) = 0.625.

According to the central limit theorem, the averages of samples follow a normal distribution. The previous igure shows the density computed using the histogram and the normal density plotted with theoretical mean and variance values. Also, the q-q plot below suggests a normal distribution of the samples.

qqnorm(row_means); qqline(row_means)

Now evaluate the confidence interval for \(1/\lambda = \bar{X} \pm 1.96 \frac{S}{\sqrt{n}}\)

lambda_values <- seq(4, 6, by=0.01)
coverage <- sapply(lambda_values, function(lamb) {
    mu_hats <- rowMeans(matrix(rexp(sample_size*num_sim, rate=0.2),
                               num_sim, sample_size))
    ll <- mu_hats - qnorm(0.975) * sqrt(1/lambda**2/sample_size)
    ul <- mu_hats + qnorm(0.975) * sqrt(1/lambda**2/sample_size)
    mean(ll < lamb & ul > lamb)
})

library(ggplot2)
qplot(lambda_values, coverage) + geom_hline(yintercept=0.95)

The 95% confidence interval for the parameter (\(\lambda\)) to be estimated (\(\hat{\lambda}\)) is \(\hat{\lambda}_{low} = \hat{\lambda}(1 - \frac{1.96}{\sqrt{n}})\) and \(\hat{\lambda}_{upp} = \hat{\lambda}(1 + \frac{1.96}{\sqrt{n}})\). As can be seen from the previous plot, for selection of \(\hat{\lambda}\) around 5, the average of the sample mean falls within the confidence interval at least 95% of the time. The true rate, \(\lambda\) is 5.