Statistical Inference Project Part 1

Overview

In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem.
Using simulation and accompanying explanatory text, we illustrate the properties of the distribution of the mean of 40 exponentials.

library(ggplot2)  # load packages to support the analysis
# knitr settings:
knitr::opts_chunk$set(echo=TRUE, fig.path='part1/', fig.width=10, fig.height=6, cache=TRUE)

set.seed(10)

Perform Simulation

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations, the number of exponentials to 40, and the number of simulations to 1000.

lambda <- 0.2                               # setting lambda
n <- 40                                     # setting number of exponentials
sim <- 1000                                 # setting number of simulations
sim_run <- replicate(sim, rexp(n,lambda))   # run simulations: 40 x 1000 matrix, one simulation per column
means_exp <- apply(sim_run, 2, mean)        # calc the mean of each column, i.e. of each simulation of 40 exponentials
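
As a quick sanity check on the statement that both the mean and the standard deviation of the exponential distribution equal 1/lambda, one could draw a single large sample and compare. This is only a sketch: the names rate, draws and check, the sample size of 100000 and the seed are illustrative assumptions and not part of the original analysis.

```r
# Sanity check (illustrative): the mean and sd of Exp(rate) should both be near 1/rate.
set.seed(123)                    # arbitrary seed, used for this sketch only
rate <- 0.2                      # same value as lambda in the analysis
draws <- rexp(100000, rate)      # one large sample so the estimates are stable

check <- c(mean = mean(draws), sd = sd(draws), expected = 1 / rate)
print(round(check, 3))
```

Both estimates should land close to the expected value of 5, with small differences due to sampling noise.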

1. Show the Sample Mean and compare it to the theoretical mean of the distribution

The mean of the exponential distribution is 1/lambda. With lambda set to 0.2 for all simulations, the theoretical mean is 1/0.2 = 5.

sample_mean <- mean(means_exp)      
theoretical_mean <- 1/lambda
cat("Sample Mean:  ", sample_mean,"     ", "Theoretical Mean:  ", theoretical_mean)

## Sample Mean:   5.04506       Theoretical Mean:   5

The sample mean (5.045) comes very close to the theoretical mean (5).

1a. We will now create a histogram to support our analysis.

# create a histogram of the exponential means to support further analysis, with a vertical line at the theoretical mean
hist(means_exp, xlab="Mean of 40 Exponentials", ylab="Frequency", xlim=c(2,9), main="Distribution of the Averages for 40 Exponentials", col="red")
abline(v=theoretical_mean, lwd=5, col="blue")

This histogram shows the sample data and the theoretical mean line (blue).

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution

sample_var <- var(means_exp)              # calculate the sample variance
theoretical_var <- (1/lambda)^2/n         # calculate the theoretical variance
cat("Sample Variance:  ", sample_var,"     ", "Theoretical Variance:  ",theoretical_var)

## Sample Variance:   0.6372544       Theoretical Variance:   0.625

The sample variance (0.637) and the theoretical variance (0.625) are closely aligned.
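
The theoretical variance used above, (1/lambda)^2/n, is the variance of a single exponential divided by the number of values averaged. A minimal sketch of how this scaling could be checked for several sample sizes follows. The sample sizes 10, 40 and 160, the seed, and the names rate, sizes, n_sims and comparison are assumptions chosen for illustration only.

```r
# Illustrative check that the variance of the sample mean scales as (1/rate)^2 / n.
set.seed(1)                      # arbitrary seed, used for this sketch only
rate <- 0.2                      # same value as lambda in the analysis
sizes <- c(10, 40, 160)          # hypothetical sample sizes to compare
n_sims <- 1000                   # number of simulated means per sample size

# For each sample size k, simulate n_sims means of k exponentials
# (one simulation per matrix column) and take the variance of those means.
var_of_means <- sapply(sizes, function(k) {
  var(colMeans(matrix(rexp(k * n_sims, rate), nrow = k)))
})

comparison <- data.frame(n = sizes,
                         simulated = var_of_means,
                         theoretical = (1 / rate)^2 / sizes)
print(comparison)
```

If the scaling holds, the simulated column should track the theoretical column, shrinking by roughly a factor of four each time n is quadrupled.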

3. Show that the distribution is approximately normal

# create a histogram with overlaid normal curves; prob=TRUE plots densities so the curves share the histogram's scale
# the density (shading) and breaks settings were adjusted to get a more precise view of the data
hist(means_exp, density=20, breaks=20, prob=TRUE, xlim=c(2,9), xlab="Mean of 40 Exponentials", ylab="Density", main="Distribution of the Averages for 40 Exponentials", col="green")
curve(dnorm(x, mean=sample_mean, sd=sqrt(sample_var)), col="black", lwd=3, lty="dotted", add=TRUE)   # normal curve from the sample mean and variance
curve(dnorm(x, mean=theoretical_mean, sd=sqrt(theoretical_var)), col="blue", lwd=3, add=TRUE)        # normal curve from the theoretical mean and variance

The histogram follows both normal curves (dotted black: sample mean and variance; solid blue: theoretical mean and variance), so the distribution of the means appears to be approximately normal.

Let’s take a look at the Confidence Intervals.

# generate the Confidence Intervals for the sample and theoretical (round to three decimal places)
sample_ci <- round(mean(means_exp) + c(-1,1)*1.96*sd(means_exp)/sqrt(n),3)
theoretical_ci <- theoretical_mean + c(-1,1)*1.96*sqrt(theoretical_var)/sqrt(n)
cat("Sample CI:", sample_ci,"     ","Theoretical CI:", theoretical_ci)

## Sample CI: 4.798 5.292       Theoretical CI: 4.755 5.245

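The two intervals are very closely aligned. Note that sd(means_exp) and sqrt(theoretical_var) already describe the spread of the mean of 40 exponentials, so the additional division by sqrt(n) makes both intervals narrower than the usual mean +/- 1.96 standard errors.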
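
As a rough check of how a 95% interval behaves for the mean of 40 exponentials, the sketch below builds the interval theoretical mean +/- 1.96 standard errors, taking the standard error to be (1/lambda)/sqrt(n), and counts the share of simulated means that fall inside it. It runs its own simulation so that it is self-contained. The seed and the names rate, n_obs, n_sims, sim_means, interval and coverage are illustrative assumptions, not part of the original analysis.

```r
# Illustrative coverage check for a 95% interval around the theoretical mean.
set.seed(2)                      # arbitrary seed, used for this sketch only
rate <- 0.2                      # same value as lambda in the analysis
n_obs <- 40                      # exponentials per simulation
n_sims <- 1000                   # number of simulations

# One simulation per column; colMeans gives the mean of each simulation.
sim_means <- colMeans(matrix(rexp(n_obs * n_sims, rate), nrow = n_obs))

se_theory <- (1 / rate) / sqrt(n_obs)                 # standard error of the mean of 40 exponentials
interval <- 1 / rate + c(-1, 1) * qnorm(0.975) * se_theory

# Share of simulated means that fall inside the interval.
coverage <- mean(sim_means >= interval[1] & sim_means <= interval[2])
cat("Interval:", round(interval, 3), "  Coverage:", coverage, "\n")
```

If the normal approximation is reasonable, the coverage should come out near 0.95.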

Lastly, we will look at a Q-Q plot to compare the sample and theoretical quantiles.

qqnorm(means_exp)
qqline(means_exp, col="red", lwd=4)

The distribution appears approximately normal, with some departure from the reference line in the tails.
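
The comparison a Q-Q plot makes visually can also be tabulated. The sketch below sets a few quantiles of simulated means next to the quantiles of a normal distribution with the same mean and standard deviation. It runs its own simulation so that it is self-contained. The seed, the chosen probabilities and the names rate, n_obs, n_sims, sim_means and quantile_table are illustrative assumptions.

```r
# Illustrative numeric companion to a normal Q-Q plot.
set.seed(3)                      # arbitrary seed, used for this sketch only
rate <- 0.2                      # same value as lambda in the analysis
n_obs <- 40                      # exponentials per simulation
n_sims <- 1000                   # number of simulations

sim_means <- colMeans(matrix(rexp(n_obs * n_sims, rate), nrow = n_obs))

probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
quantile_table <- data.frame(
  prob = probs,
  simulated = as.numeric(quantile(sim_means, probs)),                 # empirical quantiles
  normal = qnorm(probs, mean = mean(sim_means), sd = sd(sim_means))   # matching normal quantiles
)
quantile_table$difference <- quantile_table$simulated - quantile_table$normal
print(round(quantile_table, 3))
```

Larger differences in the first and last rows than in the middle rows would correspond to the tail departures seen in a Q-Q plot.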

Statistical Inference Project Part 1 - A Simulation Exercise

Terry Jones

October 31, 2018