The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponential(0.2)s.

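nosim <- 1000    # number of simulated averages
lambda <- 0.2    # rate parameter of the exponential distribution
n <- 40          # number of exponentials in each average
# install ggplot2 if it is missing, then make sure it is loaded
if(!require(ggplot2)){install.packages("ggplot2"); library(ggplot2)}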
## Loading required package: ggplot2

1. Show where the distribution is centered and compare it to the theoretical center of the distribution.

set.seed(13)
sim <- matrix(rexp(nosim*n, lambda), nosim)   # one row per simulation, 40 draws per row
sampmean <- apply(sim, 1, mean)               # average of each row
mean(sampmean)
## [1] 4.973
1/lambda
## [1] 5

The simulated mean of the averages (4.973) is close to the theoretical mean, 1/lambda = 5.
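
One way to make "close" concrete is to compare the gap with the Monte Carlo standard error of the simulated mean. The following is a minimal sketch under the same settings and seed as above; the names mc_se and gap are introduced here for illustration and are not part of the original analysis.

```r
# Same setup as in the report
nosim <- 1000
lambda <- 0.2
n <- 40
set.seed(13)
sim <- matrix(rexp(nosim * n, lambda), nosim)
sampmean <- apply(sim, 1, mean)

# Monte Carlo standard error of mean(sampmean) (illustrative name)
mc_se <- sd(sampmean) / sqrt(nosim)
# Distance between the simulated and theoretical centers (illustrative name)
gap <- abs(mean(sampmean) - 1/lambda)

mc_se
gap / mc_se   # the gap expressed in standard errors
```

A gap of no more than two or three standard errors is what simulation noise alone would typically produce.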

2. Show how variable it is and compare it to the theoretical variance of the distribution.

Variance:

sd(sampmean)^2
## [1] 0.685
(1/lambda/sqrt(n))^2
## [1] 0.625

Standard deviation:

sd(sampmean)
## [1] 0.8276
(1/lambda)/sqrt(n)
## [1] 0.7906

The simulated variance (0.685) is close to the theoretical variance, (1/lambda)^2/n = 0.625, and the simulated standard deviation (0.8276) is close to the theoretical standard deviation, (1/lambda)/sqrt(n) = 0.7906.
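
The theoretical values follow from the independence of the 40 draws. The short derivation below is added for reference, writing $X_1, \dots, X_n$ for the exponentials and $\bar{X}$ for their average (this notation is assumed here, not taken from the code).

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{1}{n}\cdot\frac{1}{\lambda^2}
  = \frac{(1/0.2)^2}{40} = \frac{25}{40} = 0.625,
\qquad
\operatorname{sd}(\bar{X}) = \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906 .
```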

3. Show that the distribution is approximately normal.

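# standardize a sample mean: subtract the theoretical mean 1/lambda
# and divide by the theoretical standard error (1/lambda)/sqrt(n)
cfunct <- function(x, n) {
        sqrt(n)*(mean(x)-1/lambda)/(1/lambda)
}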

dat <- data.frame(x=apply(sim, 1, cfunct, n))
p <- ggplot(dat, aes(x=x, fill="red")) + geom_histogram(alpha=0.5, binwidth=0.3,color="black", aes(y=..density..))+theme(legend.position = "none")
p + stat_function(fun = dnorm, size = 2)

Figure: density histogram of the standardized sample means with the standard normal density overlaid.

From the graph, the standardized sample means, sqrt(n)*(mean(x) - 1/lambda)/(1/lambda), follow an approximately standard normal distribution: the histogram is close to the overlaid standard normal density.
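
As a complementary check that does not depend on the choice of bin width, a normal Q-Q plot of the standardized means can be drawn with base R. This is a sketch under the same simulation settings and seed; the variable z is an illustrative name.

```r
nosim <- 1000
lambda <- 0.2
n <- 40
set.seed(13)
sim <- matrix(rexp(nosim * n, lambda), nosim)

# Standardized sample means: sqrt(n) * (xbar - mu) / sigma, with mu = sigma = 1/lambda
z <- sqrt(n) * (apply(sim, 1, mean) - 1/lambda) / (1/lambda)

qqnorm(z)                    # sample quantiles against standard normal quantiles
abline(0, 1, col = "red")    # reference line y = x for an exact N(0, 1)
```

Points lying near the line indicate approximate normality. Because the exponential distribution is right-skewed, some departure in the upper tail would not be surprising at n = 40.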

4. Evaluate the coverage of the confidence interval for 1/lambda: $\bar{X} \pm 1.96 \, S/\sqrt{n}$.

mean(sampmean)+c(-1,1)*1.96*sd(sampmean)
## [1] 3.350 6.595

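The interval above, 3.350 to 6.595, is the mean of the simulated averages plus or minus 1.96 standard deviations of those averages, so it is the range in which roughly 95% of the averages of 40 exponentials fall. Here sd(sampmean) plays the role of S/sqrt(n): it estimates the standard error of one average directly from the simulation. The interval contains the theoretical mean, 1/lambda = 5.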
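
Coverage in the sense of item 4 can also be estimated directly: build the interval from each simulated sample of 40 and record how often it contains 1/lambda. The sketch below assumes the same settings and seed; the names xbar, s, lower, upper, and coverage are illustrative and do not appear in the original analysis.

```r
nosim <- 1000
lambda <- 0.2
n <- 40
set.seed(13)
sim <- matrix(rexp(nosim * n, lambda), nosim)

xbar <- apply(sim, 1, mean)   # sample mean of each row
s    <- apply(sim, 1, sd)     # sample standard deviation of each row

# Interval xbar +/- 1.96 * s / sqrt(n) for each simulated sample
lower <- xbar - 1.96 * s / sqrt(n)
upper <- xbar + 1.96 * s / sqrt(n)

# Proportion of intervals that contain the true mean 1/lambda
coverage <- mean(lower < 1/lambda & upper > 1/lambda)
coverage
```

A value near 0.95 would indicate that the interval performs as advertised. With skewed data and n = 40, the estimate may fall somewhat below the nominal level.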