Overview

The goal of this project (part 1 of the course project for the Statistical Inference course on Coursera) is to investigate the exponential distribution in R and compare it with the Central Limit Theorem.

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. Its mean is 1/lambda and its standard deviation is also 1/lambda. We set lambda = 0.2 for all simulations, so the theoretical mean and standard deviation are both 5.
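
A quick numerical check of these two facts is sketched below. The sample size of 100,000, the seed and the variable names are arbitrary illustration choices, not part of the original analysis.

```r
# Check that Exp(rate = lambda) has mean and standard deviation 1/lambda.
# Sample size and seed are arbitrary illustration choices.
set.seed(1)
lambda <- 0.2
draws <- rexp(1e5, rate = lambda)

# Both empirical values should be close to 1/lambda = 5
c(mean = mean(draws), sd = sd(draws), theory = 1 / lambda)
```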

We investigate the distribution of averages of 40 exponentials, using 1,000 simulated averages. We plot a histogram of these averages, compare the sample mean with the theoretical mean, and compare the sample variance with the theoretical variance. Finally, we show that the distribution is approximately normal and compute an approximate 95% interval for 1/lambda.

Simulations

Sample Mean versus Theoretical Mean

## [1] "Sample Mean: 4.97      Theoretical Mean: 5"

The sample mean (4.97) is very close to the theoretical mean 1/lambda = 5.
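
Because every row of the simulation matrix holds the same number of draws (40), the grand mean computed in the appendix with mean(simulation) equals the mean of the 1,000 row averages. With a per-draw standard deviation of 1/lambda = 5, the theoretical standard error of the mean of all 40,000 draws is 5 / 200 = 0.025, so the gap between the rounded value 4.97 and 5 is roughly 1.2 standard errors. The sketch below illustrates both points; it reuses the appendix settings, and the variable names are illustrative.

```r
lambda <- 0.2
size <- 40
n <- 1000

set.seed(1234)
simulation <- matrix(rexp(n * size, rate = lambda), n, size)

# With equal row lengths, the grand mean equals the mean of the row means
grand_mean <- mean(simulation)
mean_of_means <- mean(rowMeans(simulation))

# Theoretical standard error of the mean of all n * size draws
se_grand <- (1 / lambda) / sqrt(n * size)

# Distance of the reported (rounded) sample mean from 1/lambda, in standard errors
z <- (4.97 - 1 / lambda) / se_grand
c(se = se_grand, z = z)
```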

Sample Variance versus Theoretical Variance

## [1] "Sample Variance: 0.59      Theoretical Variance: 0.62"

The variance of the 1,000 simulated averages (0.59) is close to the theoretical variance of the mean, (1/lambda)^2 / 40 = 0.625, which is printed as 0.62 after rounding.
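
The theoretical value follows from Var(mean) = sigma^2 / n with sigma = 1/lambda: (1/0.2)^2 / 40 = 25 / 40 = 0.625. An estimate based on 1,000 replications is itself subject to sampling error; the sketch below uses 10,000 replications (an arbitrary choice, as is the seed) to show the estimate settling near 0.625.

```r
lambda <- 0.2
size <- 40
reps <- 10000   # arbitrary; more replications than the original 1000

# Theoretical variance of the mean of `size` draws: (1/lambda)^2 / size
theoretical_var <- (1 / lambda)^2 / size   # 25 / 40 = 0.625

set.seed(42)
means <- rowMeans(matrix(rexp(reps * size, rate = lambda), reps, size))
simulated_var <- var(means)

c(simulated = simulated_var, theoretical = theoretical_var)
```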

Distribution

The histogram of the 1,000 simulated averages closely follows a normal density with mean 1/lambda = 5 and variance (1/lambda)^2 / 40, as the Central Limit Theorem predicts.
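
For exponential data the exact distribution of the average is also known: the sum of 40 independent Exp(lambda) draws is Gamma(shape = 40, rate = lambda), so their average is Gamma(shape = 40, rate = 40 * lambda), with mean 5 and variance 0.625. The sketch below, an addition for illustration rather than part of the original analysis, compares the central 95% range of this exact distribution with that of the normal approximation, and computes its skewness 2/sqrt(40), about 0.32, a mild right skew that the normal approximation does not capture.

```r
lambda <- 0.2
size <- 40

# Exact distribution of the mean of `size` Exp(lambda) draws
shape <- size
rate <- size * lambda

exact_mean <- shape / rate        # 5
exact_var <- shape / rate^2       # 0.625
exact_skew <- 2 / sqrt(shape)     # about 0.316

# Central 95% ranges: exact Gamma versus the CLT normal approximation
probs <- c(0.025, 0.975)
exact_q <- qgamma(probs, shape = shape, rate = rate)
normal_q <- qnorm(probs, mean = 1 / lambda, sd = sqrt((1 / lambda)^2 / size))

rbind(exact = exact_q, normal = normal_q)
```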

We use qqnorm and qqline to compare the distribution of averages of 40 exponentials with a normal distribution.

The Q-Q plot suggests that the distribution of averages of 40 exponentials is very close to normal, as expected from the Central Limit Theorem.
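
A numerical companion to the Q-Q plot is the correlation between the sorted sample means and the normal quantiles that qqnorm plots them against, qnorm(ppoints(n)); values close to 1 indicate points lying near a straight line. This probability-plot correlation is an illustrative addition, not part of the original analysis; the simulation is regenerated with the appendix settings so the block runs on its own.

```r
lambda <- 0.2
size <- 40
n <- 1000

set.seed(1234)
sample_means <- rowMeans(matrix(rexp(n * size, rate = lambda), n, size))

# Theoretical normal quantiles, as used on the x-axis of qqnorm()
normal_quantiles <- qnorm(ppoints(n))

# Correlation of the Q-Q plot points; 1 would mean a perfectly straight line
ppcc <- cor(sort(sample_means), normal_quantiles)
ppcc
```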

We compute an approximate 95% interval for 1/lambda as the overall mean of the simulated averages plus or minus 1.96 times their standard deviation:

## [1] 3.462406 6.486071
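
Coverage in the usual sense asks how often an interval built from a single sample of 40 contains the true value 1/lambda = 5. A sketch of that check is below. It assumes the large-sample interval mean +/- 1.96 * s / sqrt(40) for each sample (a t-based interval is a common alternative); the 10,000 replications and the seed are arbitrary choices.

```r
lambda <- 0.2
size <- 40
reps <- 10000   # number of simulated samples (arbitrary)

set.seed(2017)
samples <- matrix(rexp(reps * size, rate = lambda), reps, size)

# Per-sample large-sample 95% interval: mean +/- 1.96 * sd / sqrt(size)
xbar <- rowMeans(samples)
s <- apply(samples, 1, sd)
lower <- xbar - 1.96 * s / sqrt(size)
upper <- xbar + 1.96 * s / sqrt(size)

# Proportion of intervals that contain the true mean 1/lambda
coverage <- mean(lower <= 1 / lambda & 1 / lambda <= upper)
coverage
```

An estimate slightly below 0.95 would be consistent with the skewness of the exponential and with using 1.96 together with an estimated standard deviation.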

Appendix

We show all the R code used to do the analysis.

###############################################################################
# Author: Sergio Contador
# Date: March 2017
# Title: Statistical Inference Course Project from Coursera, part 1
###############################################################################


# Load Libraries Required
library(knitr)
library(ggplot2)


# Select a seed for reproducibility
set.seed(1234)


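# Simulate the exponential distribution: n samples of size 40, one sample per row
lambda <- 0.2    # rate parameter
size <- 40       # number of exponentials averaged per sample
n <- 1000        # number of simulated samples
simulation <- matrix(rexp(n * size, rate = lambda), n, size)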


# Plot results in Histogram
hist(
        rowMeans(simulation),
        breaks = 20,
        xlab = "Mean of 40 Exponentials",
        ylab = "Frequency",
        main = "Histogram of the Mean of 40 Exponentials",
        col = "snow2"
)


# 1. Show the sample mean and compare it to the theoretical mean 
# of the distribution
simulationMean <- mean(simulation)
theoreticalMean <- 1 / lambda

paste("Sample Mean:", round(simulationMean, digits = 2), 
      "    ", "Theoretical Mean:", theoreticalMean)

# Plot results in Histogram
hist(
        rowMeans(simulation),
        breaks = 20,
        xlab = "Mean of 40 Exponentials",
        ylab = "Frequency",
        main = "Histogram of the Mean of 40 Exponentials",
        col = "snow2"
)
abline(v = simulationMean, lwd = 4, col = "green")
abline(v = theoreticalMean, lwd = 4, col = "orange")
legend(5.8, 100, c(paste("Sample Mean =", round(simulationMean, digits = 2)),
                   paste("Theoretical Mean =", round(theoreticalMean, digits = 2))),
       lty = c(1, 1), lwd = c(2, 2), col = c("green", "orange"))


# 2. Show how variable the sample is (via variance) and compare it to the 
# theoretical variance of the distribution.
simulationVariance <- var(rowMeans(simulation))
# Theoretical variance of the mean of `size` draws: sigma^2 / size, with sigma = 1 / lambda
theoreticalVariance <- ((1 / lambda) ^ 2) / size

paste("Sample Variance:", round(simulationVariance, digits = 2), 
      "    ", "Theoretical Variance:", round(theoreticalVariance, digits = 2))


# 3. Show that the distribution is approximately normal.
hist(
        rowMeans(simulation),
        breaks = 20,
        prob = TRUE,
        xlab = "Mean of 40 Exponentials",
        ylab = "Density",
        main = "Histogram of the Mean of 40 Exponentials",
        col = "snow2"
)

curve(dnorm(x, mean = simulationMean, sd = sqrt(simulationVariance)), 
      col = "green", lwd = 2, lty = "dotted", add = TRUE, yaxt = "n")

curve(dnorm(x, mean = theoreticalMean, sd = sqrt(theoreticalVariance)), 
      col = "orange", lwd = 2, add = TRUE, yaxt = "n")

legend(6, 0.5, c("Sample", "Theoretical"), lty = c(3, 1), lwd = c(2,2), col = c("green", "orange"))


# qqnorm draws a normal Q-Q plot of the 1000 sample means; qqline adds a
# reference line through the first and third quartiles (its default probs).
qqnorm(rowMeans(simulation), pch = 19, col = "snow4")
qqline(rowMeans(simulation), col = "red")


# Approximate 95% interval: grand mean +/- 1.96 standard deviations
# of the 1000 simulated sample means
mean(rowMeans(simulation)) + c(-1, 1) * 1.96 * sd(rowMeans(simulation))