1. Overview

In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem. We examine the sample mean and sample variance of averages of exponential draws and compare them against their theoretical values. Finally, we show that the distribution of these averages is approximately normal.

2. Compare the sample mean to the theoretical mean of the distribution.

The exponential distribution is simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations. We investigate the distribution of averages of 40 exponentials and run 1000 simulations.
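As a quick sanity check of the stated mean and standard deviation, the following sketch draws one large exponential sample and compares its empirical moments with 1/lambda. The seed and the draw count of 100000 are arbitrary choices made for illustration; they are not part of the original analysis.

```r
# Illustrative check only; the seed and the draw count are assumptions
set.seed(1)
lambda <- 0.2
draws <- rexp(100000, lambda)   # rexp(n, rate) draws n exponential variates

mean(draws)   # should be close to 1/lambda = 5
sd(draws)     # should also be close to 1/lambda = 5
```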

The R code used to run the calculations and draw the plots can be found in the Appendix.

Set the parameters according to course project instructions

n <- 40

lambda <- 0.2

sim <- 1000

Run the simulation 1000 times and store the mean of each run, using the following R code (simMeans is initialized to NULL beforehand): for (i in 1 : sim) {simMeans = c(simMeans, mean(rexp(n,lambda)))}
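The same simulation can also be written without growing a vector inside a loop. The sketch below is an alternative formulation, not the code used in the report; it relies on base R's replicate() and reuses the seed from the Appendix, so the draws are generated in the same order as in the loop.

```r
# Alternative sketch using replicate(); parameters and seed match the Appendix
set.seed(25)
n <- 40
lambda <- 0.2
sim <- 1000

# replicate() evaluates the expression sim times and returns a numeric vector,
# so no empty vector has to be extended on every iteration
simMeans <- replicate(sim, mean(rexp(n, lambda)))

length(simMeans)   # 1000
```

For 1000 runs the difference in speed is negligible; the main benefit is that the result vector does not have to be initialized separately.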

We calculate the mean of the 1000 simulated sample means.

sampleMean <- mean(simMeans)

## sample Mean:  4.998511

We calculate the theoretical mean of the exponential distribution.

theoMean <- 1/lambda

## theoretical mean:  5

The sample mean, 4.998511, is very close to the theoretical mean of the exponential distribution, 5.

Plot 1 is a graphical representation of the distribution of the simulated means.

The red vertical line marks the average, 4.998511, of the 1000 simulated means of 40 exponentials.

3. Compare the sample variance to the theoretical variance of the distribution.

We calculate the variance of the 1000 simulated sample means.

sampleVar <- var(simMeans)

## sample variance:  0.6113794

We calculate the theoretical variance of the mean of 40 exponentials, which is the variance of a single exponential, (1/lambda)^2, divided by n.

theoVar <- (1/lambda)^2/n

## theoretical variance:  0.625

The variance of the sample means, 0.6113794, is very close to the theoretical variance, 0.625.
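The theoretical value follows from the independence of the 40 draws in each simulation. Written out, with $X_1, \dots, X_n$ denoting the individual draws, $\bar{X}$ their mean, and $\sigma = 1/\lambda$ the standard deviation of one draw (notation introduced here for illustration):

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
```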

4. Is the distribution approximately normal?

Plot 2 provides graphical evidence that the distribution of the sample means is approximately normal.

The graph shows that the sampling distribution of the means of 40 exponentials closely follows a normal distribution, in accordance with the Central Limit Theorem. Increasing the number of simulations (currently 1000) would smooth the histogram, while increasing the sample size (currently 40) would bring the distribution of the means even closer to a normal distribution.

The plot shows that the histogram of simulated means is well approximated by a normal density curve.
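A normal Q-Q plot is another common way to judge normality visually. The sketch below is an addition for illustration and was not part of the original analysis; it uses base R's qqnorm() and qqline() and regenerates the simulated means with the Appendix seed so that it runs on its own.

```r
# Illustrative sketch; regenerates the simulated means so the block is self-contained
set.seed(25)                      # same seed as the Appendix
n <- 40
lambda <- 0.2
sim <- 1000
simMeans <- replicate(sim, mean(rexp(n, lambda)))

# Normal Q-Q plot: points close to the reference line suggest approximate normality
qq <- qqnorm(simMeans, main = "Normal Q-Q plot of simulated means")
qqline(simMeans, col = "red")

# Correlation between theoretical and sample quantiles; values near 1 support normality
cor(qq$x, qq$y)
```

Some curvature in the upper tail would not be surprising, since the mean of 40 exponentials is still slightly right-skewed.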

Calculate 95% intervals from the sample values and from the theoretical values.

## sample confidence interval:  4.756 5.241

## theoretical confidence interval:  4.755 5.245

The sample-based and theoretical intervals nearly coincide, which shows that the mean and variance of the simulated means are very close to the values predicted by the Central Limit Theorem.
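A complementary numeric check, not part of the original analysis, is to standardize each simulated mean with the theoretical mean and the theoretical standard deviation of the mean, (1/lambda)/sqrt(n), and count how many fall within 1.96 standard deviations. Under a normal approximation roughly 95% should. The names theoSE, z and coverage are introduced here for illustration.

```r
# Illustrative sketch; regenerates the simulated means so the block is self-contained
set.seed(25)                      # same seed as the Appendix
n <- 40
lambda <- 0.2
sim <- 1000
simMeans <- replicate(sim, mean(rexp(n, lambda)))

theoMean <- 1 / lambda
theoSE <- (1 / lambda) / sqrt(n)  # theoretical standard deviation of the mean of n draws

# Standardize the simulated means and measure the share inside +/- 1.96
z <- (simMeans - theoMean) / theoSE
coverage <- mean(abs(z) <= 1.96)
coverage   # expected to be near 0.95 if the normal approximation holds
```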

5. Conclusion

The Central Limit Theorem (CLT) tells us that, for sufficiently large samples, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the underlying distribution (provided its variance is finite). In this study we confirmed that the means of samples of 40 draws from an exponential distribution are approximately normally distributed.

APPENDIX - R CODE not shown in body of report

Load required packages, run simulations, calculate sample means and theoretical means

#load required packages
library(ggplot2)
#set seed for reproducibility
set.seed(25)
#set parameters according to course project instructions
n <- 40
lambda <- 0.2
sim <- 1000
simMeans = NULL
#run simulations 1000 times and calculate their mean
for (i in 1 : sim) {simMeans = c(simMeans, mean(rexp(n,lambda)))}
head(simMeans)

## [1] 4.483931 5.007794 5.731865 4.465404 4.098390 5.855495

simMeansDf <- as.data.frame(simMeans)
#calculate the mean of the simulated exponential distributions
sampleMean <- mean(simMeans)
cat("sample Mean: ", sampleMean)

## sample Mean:  4.998511

#calculate the theoretical mean of exponential distribution
theoMean <- 1/lambda
cat("theoretical mean: ", theoMean)

## theoretical mean:  5

Draw plot 1 - distribution of simulated means, calculate sample variance and theoretical variance

g <- ggplot(simMeansDf, aes(x=simMeans))
g <- g + geom_histogram(binwidth = .2, color="black", fill="gray") +
  geom_vline(xintercept = sampleMean, color="red", size=1, linetype=1) +
  labs(x="Simulated Mean", y="Frequency", title="Plot 1 - Distribution of simulated means")
g

sampleVar <- var(simMeans)
cat("sample variance: ", sampleVar)

## sample variance:  0.6113794

theoVar <- (1/lambda)^2/n 
cat("theoretical variance: ", theoVar)

## theoretical variance:  0.625

Draw plot 2 - simulated exponential distribution vs normal distribution

g <- ggplot(simMeansDf, aes(x=simMeans))
g <- g + geom_histogram(binwidth = .2, color="black", fill="gray" , aes(y=..density..))+
  stat_function(fun=dnorm, args=list(mean=theoMean, sd=sd(simMeans)), 
                color="red", size =1) +
  labs(x="Simulated Mean", y= "density", 
       title="Plot 2 - Simulated Exponential Distribution vs Normal Distribution ")
g

Calculate confidence intervals

sampleConInterval <- round(mean(simMeans) + c(-1,1)*1.96*sd(simMeans)/sqrt(n), 3)
cat("sample confidence interval: ", sampleConInterval)

## sample confidence interval:  4.756 5.241

theoConInterval <- theoMean + c(-1,1)*1.96*sqrt(theoVar)/sqrt(n)
cat("theoretical confidence interval: ", theoConInterval)

## theoretical confidence interval:  4.755 5.245

Statistical Inference Course Project 1 - Investigate Exponential distribution in R and compare it with Central Limit Theory

JP Van Steerteghem

10/8/2017