This project is for the Coursera Statistical Inference class (Data Science specialization). It consists of two parts:
1. Simulation exercises
2. Basic inferential data analysis

Synopsis

The exponential distribution can be simulated in R with rexp(n, lambda), where n is the number of observations and lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda.
In these simulation exercises, we investigate the distribution of averages of 40 exponentials over a thousand simulations, assuming lambda = 0.2.
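As a quick sanity check of the two properties stated above, the following minimal sketch draws a large exponential sample and compares its mean and standard deviation with 1/lambda. The seed and the sample size of 100000 are arbitrary assumptions made for this illustration and are not part of the original analysis.

```r
# Illustrative check only: the seed and sample size are assumptions.
set.seed(1)
lambda <- 0.2
draws <- rexp(100000, lambda)

# For the exponential distribution, 1/lambda is both the mean and the sd
theoretical <- 1 / lambda
sample_mean <- mean(draws)
sample_sd <- sd(draws)

print(c(theoretical = theoretical, mean = sample_mean, sd = sample_sd))
```

Both sample statistics should land close to 5, up to sampling noise.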

Results

Let us create a thousand simulated averages of 40 exponentials, i.e. a thousand values of mean(rexp(40, 0.2)).

expdist <- rep(NA,1000)
for (i in 1:1000){
    expdist[i] <- mean(rexp(40,0.2))
}
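The same simulation can also be written without an explicit loop. The sketch below is one possible vectorized form, not the code used for the reported results; the names nsim, n, lambda, draws and expdist_vec, as well as the seed, are assumptions introduced here for illustration.

```r
# Vectorized sketch: the seed and all variable names are illustrative.
set.seed(1)
nsim <- 1000   # number of simulated averages
n <- 40        # exponentials per average
lambda <- 0.2  # rate parameter

# One row per simulation, one column per exponential draw
draws <- matrix(rexp(nsim * n, lambda), nrow = nsim, ncol = n)

# Average each row to obtain the 1000 simulated means
expdist_vec <- rowMeans(draws)

print(length(expdist_vec))
```

The individual averages will not match those from the loop draw for draw, but they follow the same distribution.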

Let’s calculate where the distribution of the averages is centered. The theoretical center is 1/lambda = 1/0.2 = 5.

calcmean <- mean(expdist)

From above, we can see that the calculated mean is 5.0008 and the theoretical mean is 5; the difference is negligible.

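Let’s now see how variable the simulated distribution is compared to the theoretical one. The variance of an average of 40 exponentials is the variance of a single exponential divided by 40, so the theoretical variance is ((1/0.2)^2)/40 = 0.625.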

calcvar <- var(expdist)

From above, we can see that the calculated variance is 0.6132 and the theoretical variance is 0.625; therefore both distributions have similar variability.
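The theoretical figures quoted above follow directly from lambda and the sample size. A minimal sketch of that arithmetic, with variable names (theo_mean, theo_var, theo_se) chosen here purely for illustration:

```r
# Theoretical quantities for the average of n exponentials with rate lambda
lambda <- 0.2
n <- 40

theo_mean <- 1 / lambda          # center of the sampling distribution
theo_var <- (1 / lambda)^2 / n   # variance of the average: sigma^2 / n
theo_se <- sqrt(theo_var)        # standard error of the average

print(c(mean = theo_mean, variance = theo_var, std_error = theo_se))
```

These values can be compared with mean(expdist) and var(expdist) from the simulation.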

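Let’s now investigate whether the distribution resembles a normal distribution. We use the scale() function to standardize the averages (subtract their mean and divide by their standard deviation), then plot a histogram of the result against the standard normal density.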

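# Standardize the simulated averages to mean 0 and standard deviation 1
expscale <- scale(expdist)
# Histogram on the density scale, with a kernel density estimate overlaid
hist(expscale, probability=TRUE, main="", ylim=c(0, 0.5))
lines(density(expscale))
# Compare with the standard normal distribution
curve(dnorm(x, 0, 1), -3, 3, col="red", add=TRUE)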

As the histogram and the overlaid density curves show, the distribution of the averages is approximately normal.
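The visual impression can be complemented with a numeric comparison of quantiles. The sketch below is an illustrative addition rather than part of the original analysis: it regenerates a set of averages under an arbitrary seed (the names expdist_demo, z, probs and comparison are assumptions) and sets the quantiles of the standardized averages beside those of the standard normal.

```r
# Illustrative sketch: the seed and all variable names are assumptions.
set.seed(1)
expdist_demo <- replicate(1000, mean(rexp(40, 0.2)))

# Standardize, then drop the matrix attributes that scale() adds
z <- as.numeric(scale(expdist_demo))

# Compare selected quantiles with those of the standard normal
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
comparison <- rbind(simulated = quantile(z, probs),
                    normal = qnorm(probs))

print(round(comparison, 2))
```

Small differences in the tails are to be expected, since an average of 40 exponentials is still slightly right-skewed. A Q-Q plot via qqnorm(z) followed by qqline(z) shows the same comparison graphically.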

Let’s now evaluate the coverage of the confidence interval for 1/lambda: $\bar{X} \pm 1.96 \frac{S}{\sqrt{n}}$

# Calculate upper and lower limits using standard deviation as 1/lambda
lowercl <- expdist - qnorm(0.975) * (1/0.2)/sqrt(40)
uppercl <- expdist + qnorm(0.975) * (1/0.2)/sqrt(40)
expci <- mean(lowercl < (1/0.2) & uppercl > (1/0.2))

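The coverage of the confidence interval, i.e. the proportion of simulated intervals that contain the true mean 1/lambda = 5, is thus expci = 95.9%.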
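The interval formula is written in terms of the sample standard deviation S, whereas the code above plugs in the known value 1/lambda. As a hedged illustration of the S-based variant, the sketch below keeps each simulated sample so that its own standard deviation can be used. The seed and the names samples, xbar, s, half_width, covered and coverage_s are assumptions introduced here and do not appear in the original analysis.

```r
# Illustrative sketch: the seed and all variable names are assumptions.
set.seed(1)
lambda <- 0.2
n <- 40
nsim <- 1000

# Keep the raw draws: one row per simulated sample of 40 exponentials
samples <- matrix(rexp(nsim * n, lambda), nrow = nsim)

xbar <- rowMeans(samples)      # sample mean of each row
s <- apply(samples, 1, sd)     # sample standard deviation of each row

# Normal-approximation interval using each sample's own S
half_width <- qnorm(0.975) * s / sqrt(n)
covered <- (xbar - half_width < 1 / lambda) & (xbar + half_width > 1 / lambda)

# Proportion of intervals that contain the true mean 1/lambda
coverage_s <- mean(covered)
print(coverage_s)
```

Because S is itself estimated from a skewed sample, the coverage from this variant need not match the figure obtained with the known standard deviation.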