The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials. Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponential(0.2)s.
## Declare constants
lambda <- 0.2
mut <- 1/lambda
n <- 40
sigmat <- 1/lambda/sqrt(n)
nosim <- 1000
## Compute matrix of nosim rows of n exponential(lambda) samples
expmatrix <- matrix(rexp(n * nosim, lambda), nosim)
## Compute data frame with nosim rows of sample means
meandf <- data.frame(
meanofn <- apply(expmatrix, 1, mean)
)
library(ggplot2)
## Define colours
orange <- "#E69F00"
blue <- "#56B4E9"
## Construct base plot
g <- ggplot(meandf, aes(x = meanofn)) + xlab("Mean") + ylab("Density")
## Add sample data distribution
g <- g + geom_histogram(alpha=0.20, binwidth=0.1, fill=orange, colour=orange, aes(y=..density..))
## Add title and render
g + ggtitle("Fig. 1. Sample data")
We display vertical lines for the mean of the sample data (orange) and the theoretical mean \(\mu_t = \mu = 5\) (blue, dashed).
## Add mean of sample data
g <- g + geom_vline(data=meandf, aes(xintercept=mean(meanofn)), colour=orange, size=1)
## Add theoretical mean
g <- g + geom_vline(aes(xintercept=mut), colour=blue, linetype="dashed", size=1)
## Add title and render
g + ggtitle("Fig. 2. Added samples and theoretical means")
We display vertical lines for the \(\pm\) sd of the sample data (orange) as well as those for the theoretical sd \(\sigma_t = \frac{\sigma}{\sqrt{n}} = 0.79\) (blue, dotted).
## Add sd of sample data
g <- g + geom_vline(data=meandf, aes(xintercept=mean(meanofn)-sd(meanofn)), colour=orange, size=1)
g <- g + geom_vline(data=meandf, aes(xintercept=mean(meanofn)+sd(meanofn)), colour=orange, size=1)
## Add theoretical sd
g <- g + geom_vline(aes(xintercept=mut-sigmat), colour=blue, linetype="dotted", size=1)
g <- g + geom_vline(aes(xintercept=mut+sigmat), colour=blue, linetype="dotted", size=1)
## Add title and render
g + ggtitle("Fig. 3. Added sample data and theoretical sds")
We overlay the normal distribution parametrized by \(\mu_t\) and \(\sigma_t^2\) (blue).
## Add normal distribution
g <- g + stat_function(fun=dnorm, arg=list(mean=mut, sd=sigmat), colour=blue)
## Add title and render
g + ggtitle("Fig. 4. Added normal distribution")
We reuse the previously computed matrix with \(1000\) rows of \(n\) exponential(\(\lambda\)) samples. For each row, we determine whether \(\frac{1}{\lambda}\) is within the confidence interval computed for that row \(\bar X \pm 1.96 \frac{S}{\sqrt{n}}\).
The results are collected in a vector of length \(1000\) with values \(0\) or \(1\). The sum of this vector divided by its length is de coverage ratio for the confidence interval for \(\frac{1}{\lambda}\).
coverage <- apply(expmatrix, 1, function(x) {
ll <- mean(x) - qnorm(.975) * sd(x) / sqrt(n)
ul <- mean(x) + qnorm(.975) * sd(x) / sqrt(n)
mean(ll < 1/lambda & ul > 1/lambda)
})
round(sum(coverage)/length(coverage),2)
## [1] 0.92