Statistical Inference Project Part 1 - Demonstration of CLT with Exponential Distribution

Introduction

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials, using 1000 simulations.

Generate data using exponential distribution

Set the knitr chunk options so that the R code is shown.

library(knitr)
opts_chunk$set(echo=TRUE, cache=TRUE)

Setup (lambda = 0.2, n = 40, 1000 simulations)

n<-40
nosim<-1000
lambda<-0.2
# Generate a nosim x n matrix of exponential draws; each row is one simulated sample of n values.
set.seed(123)
simdata <- matrix(rexp(nosim * n, rate=lambda), nosim)
str(simdata)

##  num [1:1000, 1:40] 4.217 2.883 6.645 0.158 0.281 ...

Show the sample mean and compare it to the theoretical mean of the distribution

sample_mean<-mean(rowMeans(simdata))
theoretical_mean<-1/lambda

The mean of the 1000 simulated averages is 5.0119113 and the theoretical mean is 5, so the simulated averages are centered very close to the population mean 1/lambda.
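To put "close" on a scale, the gap between the two means can be expressed in standard-error units. The following is a minimal, self-contained sketch that regenerates the simulated averages with the same seed and settings; the names se_of_mean, abs_diff and z are introduced here for illustration and are not part of the original analysis.

```r
# Regenerate the simulated averages with the same settings as above.
set.seed(123)
lambda <- 0.2; n <- 40; nosim <- 1000
means <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))

theoretical_mean <- 1 / lambda

# Standard error of the mean of the nosim averages (illustrative name).
se_of_mean <- sd(means) / sqrt(nosim)

# Absolute gap, and the same gap measured in standard errors.
abs_diff <- abs(mean(means) - theoretical_mean)
z <- (mean(means) - theoretical_mean) / se_of_mean

print(c(abs_diff = abs_diff, se = se_of_mean, z = z))
```

A value of z within roughly plus or minus 2 would be consistent with the simulated averages being centered on the theoretical mean.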

Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

means<-apply(simdata,1,mean)
sample_var<-var(means)
theoretical_var<-((1/lambda)^2)/n

The theoretical variance of the average of 40 exponentials is 0.625 and the variance of the simulated averages is 0.6088292, so the variability of the simulated averages is close to the theoretical value \(\sigma^2\)/n.
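The theoretical value follows from the variance of an average of n IID draws, \(\sigma^2\)/n, with \(\sigma\) = 1/lambda = 5, giving 25/40 = 0.625. A small self-contained sketch of that arithmetic, again regenerating the data with the same seed, might look like this; sigma, theoretical_sd and ratio are illustrative names not used in the original code.

```r
# Regenerate the simulated averages with the same settings as above.
set.seed(123)
lambda <- 0.2; n <- 40; nosim <- 1000
means <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))

sigma <- 1 / lambda                  # population sd of the exponential: 5
theoretical_var <- sigma^2 / n       # variance of an average of n draws: 25 / 40
theoretical_sd  <- sigma / sqrt(n)   # the same quantity on the sd scale

# Ratio of simulated to theoretical variance; values near 1 indicate agreement.
ratio <- var(means) / theoretical_var

print(c(sample_var = var(means), theoretical_var = theoretical_var, ratio = ratio))
```

Reporting the ratio rather than the raw difference makes the comparison independent of the units of the data.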

Show that the distribution of averages of IID variables is approximately normal.

hist(means, breaks=50, prob=TRUE,
     main="Distribution of averages of samples from exponential distribution",
     xlab="Mean")
# density of the averages of samples
lines(density(means))
# center of normal distribution
abline(v=theoretical_mean, col="red")
# theoretical density of the averages of samples
x <- seq(min(means), max(means), length=100)
y <- dnorm(x, mean=theoretical_mean, sd=sqrt(theoretical_var))
lines(x, y, col="red")
# add legend
legend('topright', c("simulation", "theoretical"), lty=c(1,1),col=c("black", "red"))

The plot shows that \(\bar X_{n}\) is approximately N(\(\mu\), \(\sigma^2\)/n), where \(\mu\) = 5 and \(\sigma^2\)/n = 0.625. The distribution of the averages is centered near the theoretical mean, and its estimated density closely follows the normal curve.
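A normal Q-Q plot is one complementary way to check the same claim. The sketch below is an optional addition rather than part of the original analysis; it regenerates the averages with the same seed and uses the base R functions qqnorm and qqline. The names qq and qq_cor are illustrative.

```r
# Regenerate the simulated averages with the same settings as above.
set.seed(123)
lambda <- 0.2; n <- 40; nosim <- 1000
means <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))

# Normal Q-Q plot: points close to the reference line suggest approximate normality.
qq <- qqnorm(means, main = "Normal Q-Q plot of averages of 40 exponentials")
qqline(means, col = "red")

# Correlation between theoretical and sample quantiles as a rough numeric summary.
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)
```

Because the exponential distribution is right-skewed, some departure from the line in the upper tail would not be surprising at n = 40; the approximation is expected to improve as n grows.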