Statistical Inference Course Project

Overview

This project uses simulation to examine the distribution of averages of exponential random variables and compares it with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with the “rexp” function; for a rate of lambda, both its mean and its standard deviation are 1/lambda. The distribution of averages of 40 exponentials is investigated over 1,000 simulations.

Simulations

This section carries out the simulations. A seed is set so that the results can be reproduced, and the other parameters are set here as well: lambda is 0.2, the number of exponentials per sample (num) is 40, and the number of simulations (sim) is 1,000. The R function “rexp” is called inside the “replicate” function, which produces a matrix with 40 rows and 1,000 columns. Each column holds one simulated sample of 40 exponentials drawn with a lambda of 0.2, and the mean of each column is then computed.

A histogram of the means of the simulations is included.

## Set the seed for reproducibility
set.seed(112358)

## Set the Lambda rate
lambda <- 0.2

## Set the number of exponentials
num <- 40

## Set the number of simulations
sim <- 1000

## Run simulations using the exponential distribution
experiment <- replicate(sim, rexp(num, lambda))

## Calculate the means of the simulated exponential distributions
experimentMeans <- apply(experiment, 2, mean)

## Display a histogram of the means of the simulated data
hist(experimentMeans, xlim=c(2,8), breaks=40, xlab="Simulation Means", main="Means of Simulated Exponential Distribution")

## Add a red vertical line at the mean of the sample means
abline(v=mean(experimentMeans), lwd=2, col="red")

## Add a blue vertical line at the theoretical mean (1/lambda)
abline(v=1/lambda, lwd=2, col="blue")

Sample Mean versus Theoretical Mean

The histogram above shows the means of the simulations. A red vertical line marks the mean of the simulated means, and a blue vertical line marks the theoretical mean (1/lambda). Because the two values are so close, the lines nearly overlap and are hard to tell apart.

For clarity, the sample and theoretical means are computed and displayed below.

## Display experimental mean
print(paste("Experimental mean: ", mean(experimentMeans)))

## [1] "Experimental mean:  5.02460196977246"

## Display theoretical mean
print(paste("Theoretical mean: ", (1/lambda)))

## [1] "Theoretical mean:  5"
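
One way to put "close" in context is to express the gap between the two means in standard errors. The following is a minimal sketch, not part of the original analysis: it re-creates the simulation so that it runs on its own, and the names gap and se are illustrative. It assumes the 1,000 sample means are independent, so that their average has a theoretical standard error of (1/lambda)/sqrt(num*sim).

```r
## Re-create the simulation so this block runs on its own
set.seed(112358)
lambda <- 0.2
num <- 40
sim <- 1000
experimentMeans <- colMeans(replicate(sim, rexp(num, lambda)))

## Gap between the mean of the simulated means and the theoretical mean
gap <- mean(experimentMeans) - 1/lambda

## Theoretical standard error of the average of sim means of num draws each
se <- (1/lambda) / sqrt(num * sim)

## Express the gap in standard-error units
print(paste("Gap:", round(gap, 4)))
print(paste("Gap in standard errors:", round(gap / se, 2)))
```

A gap of no more than a couple of standard errors is what would be expected from sampling variation alone.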

Sample Variance versus Theoretical Variance

The theoretical variance of the mean of n exponentials is ((1/lambda)/sqrt(n))^2, which is the same as (1/lambda)^2/n. The calculations below show that the sample variance (about 0.592) is close to the theoretical variance (0.625), roughly 5% lower.

## Display experimental variance
print(paste("Experimental variance: ", sd(experimentMeans)^2))

## [1] "Experimental variance:  0.591617057278862"

## Display theoretical variance
print(paste("Theoretical variance: ", ((1/lambda)/sqrt(num))^2))

## [1] "Theoretical variance:  0.625"
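
The theoretical value can be traced step by step. The sketch below is illustrative rather than part of the original analysis: it starts from the variance of a single exponential draw, divides by the sample size, and then compares the result with the experimental variance reported above. The names singleVar, theoreticalVar, experimentalVar and relDiff are assumptions introduced for this example.

```r
lambda <- 0.2
num <- 40

## Variance of a single exponential draw: (1/lambda)^2
singleVar <- (1/lambda)^2

## Variance of the mean of num independent draws: singleVar / num
theoreticalVar <- singleVar / num

## This matches the expression used in the report
stopifnot(isTRUE(all.equal(theoreticalVar, ((1/lambda)/sqrt(num))^2)))

## Experimental variance, copied from the output reported above
experimentalVar <- 0.591617057278862

## Relative difference between experiment and theory, in percent
relDiff <- (experimentalVar - theoreticalVar) / theoreticalVar
print(paste("Relative difference (%):", round(100 * relDiff, 1)))
```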

Distribution

The Central Limit Theorem essentially says that the distribution of the sum (and therefore the average) of a large number of independent, identically distributed variables will be approximately normal, regardless of the underlying distribution. The histogram below shows the experimental distribution of the simulated means, plotted on a density scale. A solid red line shows the kernel density estimate of the simulated means. The blue dashed line is the theoretical normal curve, with mean 1/lambda and standard deviation (1/lambda)/sqrt(n). The simulated distribution appears to be consistent with the Central Limit Theorem.

## Display a histogram with the distribution curve
hist(experimentMeans, xlim=c(2,8), breaks=40, xlab="Simulation means", main="Means of Simulated Exponential Distribution", prob=TRUE)

## Add a red experimental distribution curve (kernel density estimate)
lines(density(experimentMeans), lwd=3, col="red")

## Add a blue dashed theoretical distribution curve
x <- seq(min(experimentMeans), max(experimentMeans), length.out=2*num)
y <- dnorm(x, mean=1/lambda, sd=(1/lambda)/sqrt(num))
lines(x, y, col="blue", lwd=3, lty=2)
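
A normal Q-Q plot is another common way to judge how close the simulated means are to a normal distribution. The sketch below is an additional check that is not part of the original analysis. It re-creates the simulation so that it runs on its own, standardizes the means with the theoretical mean and standard deviation, and plots them against normal quantiles. The name z is illustrative.

```r
## Re-create the simulation so this block runs on its own
set.seed(112358)
lambda <- 0.2
num <- 40
sim <- 1000
experimentMeans <- colMeans(replicate(sim, rexp(num, lambda)))

## Standardize using the theoretical mean and standard deviation
z <- (experimentMeans - 1/lambda) / ((1/lambda)/sqrt(num))

## Plot the standardized means against standard normal quantiles
qqnorm(z, main="Normal Q-Q Plot of Standardized Simulation Means")

## Reference line y = x: points near it indicate agreement with the normal
abline(a=0, b=1, col="blue", lwd=2, lty=2)
```

Points lying near the dashed line indicate agreement with the normal distribution. Because the average of 40 exponentials follows a scaled gamma distribution, some right skew may still be visible in the tails.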

Statistical Inference Course Project - Part 1

Steve Wenck

August 20, 2017
