Statistical Inference Course Project

Part 1: Simulation Exercise

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.

Questions

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should:

Show the sample mean and compare it to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show that the distribution is approximately normal. Focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.

First, set the seed of the random number generator, so the outcome is reproducible and define the parameters of the exponential distribution:

set.seed(123)
lambda <- 0.2 # rate parameter
n <- 40 # nr of exponentials
nosim <- 1000 # nr of simulations

Now run the simulation 1000 times using the exponential distribution (rexp(n, lambda)).

simulations <- apply(matrix(rexp(nosim*n, rate = lambda), nosim), 1,  mean)

Q1 - Sample Mean versus Theoretical Mean of the distribution.

The theoretical mean of the exponential distribution is:

meanExp <- 1/lambda
meanExp

## [1] 5

The sample mean of the simulations is now calculated and the difference between the sample mean and the theoretical mean:

meanSim <- mean(simulations)
meanSim

## [1] 5.011911

diff  <- meanExp - meanSim
round(diff, 3)

## [1] -0.012

hist(simulations, main = "Histogram of Sample Means", col = "lightblue", breaks = 100, xlab="Means")
abline(v = meanSim, col = "red")
abline(v = meanExp, col = "blue")
legend("topright", c("Sample Mean", "Theoretical Mean"), fill=c("red", "blue"))

Q2 - Sample Variance versus Theoretical Variance.

First show the theoretical variance of the exponential distribution. The standard deviation is (1/lambda)/sqrt(n).

sdExp <- (1/lambda)/sqrt(n)
varExp <- sdExp^2 
varExp # variance of exponential distribution

## [1] 0.625

The sample variance is:

sdSim <- sd(simulations)
varSim <- sdSim^2
varSim # sample variance of simulations

## [1] 0.6088292

vardiff <- varExp - varSim # difference
vardiff

## [1] 0.01617077

The difference in the sample variance and theoretical variance is very small, the conclusion is that 1000 simulations is a good estimator for the theoretical variance of the distribution.

Q3 - Distribution

Below is the normal curve added to the distribution of the sample means. This plot demonstrates that the distribution of the sample means is approximately normal.

h <- hist(simulations, main = "Histogram of Sample Means", col = "lightblue", breaks = 100, xlab="Means")

xfit <- seq(min(simulations), max(simulations), length=100)
yfit <- dnorm(xfit, mean=meanSim, sd = sdSim)
yfit <- yfit * diff(h$mids[1:2]*length(simulations))
lines(xfit, yfit, col="black", lwd=2)
legend("topright", "Normal Curve", fill="black")

Statistical Inference Course Project

Tim Quivooij

2023-11-06

Overview

Part 1: Simulation Exercise

Questions

Q1 - Sample Mean versus Theoretical Mean of the distribution.

Q2 - Sample Variance versus Theoretical Variance.

Q3 - Distribution