Statistical Inference Course Project

Project Introduction

The project consists of two parts:

1.A simulation exercise.

2.Basic inferential data analysis.

Part 1

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should

1.Show the sample mean and compare it to the theoretical mean of the distribution.

2.Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

3.Show that the distribution is approximately normal.

In point 3, focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.

Simulations

set.seed(1234)

lambda <- 0.2
n <- 40
sim <- 1000

simExp <- replicate(sim, rexp(n, lambda))

meanExp <- apply(simExp, 2, mean)

Question 1: Sample Mean Versus Theoretical Mean

sampleMean  <- mean(meanExp)
sampleMean

## [1] 4.974239

theoMean <- 1/lambda
theoMean

## [1] 5

hist(meanExp, main = "Simulated Exponential Sample Means", col = "aliceblue", breaks = 100)
abline(v = sampleMean, col = "red")
abline(v = theoMean, col = "blue")

Question 2: Sample Variance Versus Theoretical Variance

samplesd  <- sd(meanExp)
samplevar <- samplesd^2
samplevar

## [1] 0.5706551

theosd <- (1/lambda)/sqrt(n)
theovar <- theosd^2
theovar

## [1] 0.625

Question 3: Distribution

Finally, I’ll investigate whether the exponential distribution is approximately normal. Because of the Central Limit Theorem, the means of the sample simulations should follow a normal distribution.

hist(meanExp, main = "Normal Distribution", col = "aliceblue", breaks = 100)

xfit <- seq(min(meanExp), max(meanExp), length = 100)
yfit <- dnorm(xfit, mean = 1/lambda, sd = (1/lambda)/sqrt(n))
lines(xfit, yfit*60, lty = 5)