Josh Katz 02.09.16

Overview

In part 1 of this project, the exponential distribution is used to demonstrate the Central Limit Theorem (CLT). The following goals will be accomplished: 1) show the mean of the simulated sample means and compare it to the theoretical mean of the distribution; 2) show how variable the sample means are (via their variance) and compare this to the theoretical variance of the sample mean; 3) show that the distribution of the sample means is approximately normal.
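For reference, the targets of these comparisons follow from standard properties of the exponential distribution with rate λ. Its mean is 1/λ and its variance is (1/λ)², so the mean of n independent draws has expectation 1/λ and variance (1/λ)²/n. Plugging in λ = 0.2 and n = 40 (the values used in the simulation) gives the approximation the CLT predicts:

```latex
\begin{aligned}
X_1,\dots,X_n &\overset{\text{iid}}{\sim} \operatorname{Exp}(\lambda),
  \qquad \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \\
E[\bar{X}_n] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}_n) &= \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625 \\
\bar{X}_n &\overset{\text{approx.}}{\sim} N(5,\ 0.625)
\end{aligned}
```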

Simulation

A simulation drawing 40 exponentials (rate lambda = 0.2) is run 1000 times, and the draws are stored in a matrix so that the mean, variance, and normality of the sample means can be examined in the following sections.

lambda <- 0.2   ##rate parameter of the exponential distribution
n <- 40         ##number of exponentials in each simulated sample

set.seed(189)   ##makes simulation reproducible

##sim_exp: a 1000 x 40 matrix, one row per simulation, each row holding 40 random exponentials
sim_exp <- matrix(rexp(n*1000, lambda), nrow=1000, ncol=n); dim(sim_exp)
## [1] 1000   40
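Because only the row means are used later, the same kind of simulation can also be written so that it keeps just the 1000 sample means. The sketch below uses base R's `replicate()`; the name `sim_means` is illustrative and not part of the original analysis. Its individual means will not match `rowMeans(sim_exp)` one-for-one, because `matrix()` fills column by column while each `replicate()` call uses 40 consecutive draws, but since all draws are independent the statistical behavior is the same.

```r
## Hedged sketch: obtain the 1000 sample means directly, without storing
## the individual draws. `sim_means` is an illustrative name.
lambda <- 0.2  # rate, as in the simulation above
n      <- 40   # exponentials per simulated sample
set.seed(189)

## Each replicate draws n exponentials and keeps only their mean
sim_means <- replicate(1000, mean(rexp(n, lambda)))

length(sim_means)  # one mean per simulation
```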

Comparison of simulation mean to theoretical mean

hist(rowMeans(sim_exp),col="green",main="Distribution of Sample Means")

mean(rowMeans(sim_exp))   ##mean of the 1000 sample means, each from a run of 40
## [1] 5.006347

1/lambda                  ##theoretical mean, 1/lambda
## [1] 5

The mean of the sampling distribution, 5.006347, is very close to the theoretical mean of 5. The histogram agrees: its center of mass sits at about 5.
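One way to judge "very close" is to compare the gap with the Monte Carlo standard error of an average of 1000 independent sample means, each with variance (1/lambda)²/n = 0.625. The sketch below is a back-of-the-envelope check using the reported value 5.006347; the names `mc_se` and `z` are illustrative.

```r
## Hedged sketch: is the simulated mean within ordinary Monte Carlo noise?
lambda <- 0.2   # rate, as in the simulation
n      <- 40    # exponentials per simulated sample
nsim   <- 1000  # number of simulated samples (rows of sim_exp)

theo_mean     <- 1 / lambda          # theoretical mean, 5
theo_var_mean <- (1 / lambda)^2 / n  # variance of one sample mean, 0.625

## Standard error of the average of nsim independent sample means
mc_se <- sqrt(theo_var_mean / nsim)  # sqrt(0.000625) = 0.025

sim_mean <- 5.006347                 # simulated mean reported in the text
z <- (sim_mean - theo_mean) / mc_se  # gap measured in standard errors
print(c(mc_se = mc_se, z = z))
```

The gap of 0.006347 is about a quarter of one standard error (0.025), so it is well within what random simulation noise alone would produce.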

Comparison of simulation variance to theoretical variance

var(rowMeans(sim_exp))    ##variance of the 1000 sample means, each from a run of 40
## [1] 0.6069175

(1/lambda)^2/n            ##theoretical variance of the sample mean, (1/lambda)^2/n
## [1] 0.625

The variance of the sampling distribution, 0.6069, is close to the theoretical value of 0.625 (about 3% lower).
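A sample variance computed from 1000 values has its own sampling error, so a 3% shortfall is not surprising. The sketch below forms a rough 95% chi-square interval for the true variance of the sample mean. It assumes the 1000 sample means are normal, which the CLT only makes approximately true here, so it is a sanity check rather than a formal test; `ci` and `theo_v` are illustrative names.

```r
## Hedged sketch: rough 95% interval for the variance of the sample mean,
## ASSUMING the 1000 simulated means are approximately normal.
nsim   <- 1000               # number of simulated sample means
s2     <- 0.6069175          # simulated variance reported in the text
theo_v <- (1 / 0.2)^2 / 40   # theoretical variance of the sample mean, 0.625

## Classical interval: (nsim - 1) * s2 / chi-square quantiles on nsim - 1 df
ci <- (nsim - 1) * s2 / qchisq(c(0.975, 0.025), df = nsim - 1)
print(ci)
cat("theoretical value inside interval:", ci[1] < theo_v && theo_v < ci[2], "\n")
```

The interval comes out at roughly 0.56 to 0.66, which contains 0.625.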

Normality of simulated means

Demonstrate normality with plot

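set.seed(199)
par(mfrow=c(1,2))   ##two histograms side by side
##left: 1000 individual exponential draws; right: the 1000 sample means from sim_exp
hist(rexp(1000, lambda), col="red", main="Distribution of 1000\nRandom Exponentials")
hist(rowMeans(sim_exp), col="green", main="Distribution of Sample Means")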

The distribution of the 1000 sample means (green histogram), each the average of 40 exponentials, is much more bell-shaped than the distribution of 1000 individual exponential draws (red histogram), which is strongly right-skewed. Because the underlying exponential distribution is clearly non-normal, this contrast demonstrates the Central Limit Theorem: the mean of a large number of independent, identically distributed observations with finite variance is approximately normally distributed, even when the observations themselves come from a non-normal distribution.
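Two further visual checks are common for this kind of claim: overlaying the normal density the CLT predicts, N(1/lambda, (1/lambda)²/n), on a density-scaled histogram, and drawing a normal Q-Q plot. The sketch below regenerates `sim_exp` with the same seed and settings as the Simulation section so it runs on its own; it uses only base R graphics (`hist`, `curve`, `qqnorm`, `qqline`), and `means` is an illustrative name.

```r
## Hedged sketch: extra visual checks of approximate normality.
## Regenerates sim_exp with the same seed and settings as the Simulation section.
lambda <- 0.2
n      <- 40
set.seed(189)
sim_exp <- matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n)
means   <- rowMeans(sim_exp)

par(mfrow = c(1, 2))

## Left: histogram on the density scale plus the CLT normal curve,
## mean 1/lambda and standard deviation (1/lambda)/sqrt(n), about 0.79
hist(means, freq = FALSE, col = "green",
     main = "Sample Means vs. Normal Curve", xlab = "Sample mean")
curve(dnorm(x, mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
      add = TRUE, lwd = 2)

## Right: normal Q-Q plot; points close to the line suggest approximate normality
qqnorm(means, main = "Normal Q-Q Plot of Sample Means")
qqline(means)
```

Mild curvature in the upper tail of the Q-Q plot would be consistent with the small right skew that averages of 40 exponentials still retain.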

 summary(colMeans(sim_exp))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.548   4.872   5.042   5.006   5.112   5.357

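The numerical summary is consistent with approximate normality, as the mean (5.006) and median (5.042) are close. Note that colMeans(sim_exp) summarizes the 40 column means, each averaging 1000 draws, rather than the 1000 sample means plotted in the green histogram; summary(rowMeans(sim_exp)) summarizes those directly.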
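To summarize the quantity actually shown in the green histogram, the sketch below applies `summary()` to the row means, and also compares sample skewness before and after averaging. Theory says an exponential has skewness 2 and a mean of n independent draws has skewness 2/sqrt(n), about 0.32 for n = 40, so averaging should remove most of the skew. The `skewness()` helper is a simple moment-based estimator defined here for illustration, not a function from base R or a package; `sim_exp` is regenerated with the original seed so the block runs on its own.

```r
## Hedged sketch: summarize the 1000 row means and compare skewness
## of individual draws versus sample means.
lambda <- 0.2
n      <- 40
set.seed(189)  # same seed and settings as the Simulation section
sim_exp <- matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n)
means   <- rowMeans(sim_exp)

summary(means)

## Hypothetical helper: moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

skew_raw   <- skewness(as.vector(sim_exp))  # individual draws; theory: 2
skew_means <- skewness(means)               # means of 40 draws; theory: 2/sqrt(40)
print(c(raw = skew_raw, means = skew_means))
```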

Operating system and R version used to run the code above:

R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1