Introduction

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
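
As a quick check of the stated mean and standard deviation, the sketch below draws one large exponential sample and compares its sample mean and standard deviation with 1/lambda. The sample size of 100000 and the seed are arbitrary choices for illustration and are not part of the assignment.

```r
# Illustrative check; the sample size and seed are arbitrary assumptions
set.seed(1)
lambda <- 0.2
x <- rexp(100000, rate = lambda)

# Both values should be near 1/lambda = 5
sample_mean <- mean(x)
sample_sd <- sd(x)
print(c(mean = sample_mean, sd = sample_sd, theoretical = 1 / lambda))
```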

Our results illustrate, via simulation and associated explanatory text, the properties of the distribution of the mean of 40 exponentials. We shall:

1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
3. Show that the distribution is approximately normal.

Simulation

Initialization of global parameters

    library(ggplot2)
    set.seed(40000)
    lambda = 0.2
    n = 40 
    nLarge = 1000
    nsims = 1000
    mns <- NULL
    mnsLarge <- NULL

We simulate averages of exponential draws, for n = 40 and, for comparison, n = 1000. We then compute summary statistics from the simulated averages and compare them against the theoretical values.

Do the simulations

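    # 1000 averages of n = 40 exponentials
    for (i in 1 : nsims) mns <- c(mns, mean(rexp(n, lambda)))

    # 1000 averages of nLarge = 1000 exponentials, used later for the normal comparison
    for (i in 1 : nsims) mnsLarge <- c(mnsLarge, mean(rexp(nLarge, lambda)))

    # Histogram of the n = 40 averages
    hist(mns, col = "red", main = "Distribution of Means for exponential")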
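
Growing a vector with `c()` inside a loop copies it on every iteration. An alternative that some readers may prefer is `replicate()`, sketched below. The names `mns_alt` and `mns_loop` are hypothetical, and the parameter values are repeated so the block runs on its own. The last line compares the two approaches when both start from the same seed.

```r
lambda <- 0.2
n <- 40
nsims <- 1000

# replicate() evaluates the expression nsims times and collects the results in a numeric vector
set.seed(40000)
mns_alt <- replicate(nsims, mean(rexp(n, lambda)))

# Loop version from the same seed, for comparison
set.seed(40000)
mns_loop <- NULL
for (i in 1:nsims) mns_loop <- c(mns_loop, mean(rexp(n, lambda)))

print(all.equal(mns_loop, mns_alt))
```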

Means comparison

Mean calculated from the simulated averages

## [1] 5.037539

Theoretical mean

## [1] 5

The theoretical and simulated means are close (5 and 5.04).
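
The chunk that produced these two numbers is not shown. A minimal sketch of the comparison, assuming `mns` holds the 1000 simulated averages built in the simulation step, might look like the following. The names `sim_mean`, `theo_mean`, and `se` are hypothetical, and the standard-error check is an addition for illustration that is not in the original analysis.

```r
# Re-create the simulated averages so the sketch runs on its own
set.seed(40000)
lambda <- 0.2
n <- 40
nsims <- 1000
mns <- replicate(nsims, mean(rexp(n, lambda)))

sim_mean  <- mean(mns)    # mean of the simulated averages
theo_mean <- 1 / lambda   # theoretical mean, 5

# Standard error of sim_mean; a gap of a few standard errors or less
# is consistent with sampling noise
se <- sd(mns) / sqrt(nsims)
print(c(simulated = sim_mean, theoretical = theo_mean,
        gap_in_se = (sim_mean - theo_mean) / se))
```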

Variance comparison

Variance calculated from the simulated averages

## [1] 0.6524401

Theoretical variance

## [1] 0.625

The theoretical variance of the sample mean and the simulated variance are close (0.625 and 0.652).
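
The theoretical value 0.625 is the variance of the sample mean, not of a single exponential draw. For independent draws with rate $\lambda = 0.2$ and $n = 40$, the arithmetic is:

$$
\operatorname{Var}(\bar{X}_n) = \frac{\operatorname{Var}(X)}{n} = \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625,
\qquad
\operatorname{sd}(\bar{X}_n) = \frac{1/\lambda}{\sqrt{n}} \approx 0.791
$$

In R, the simulated counterpart would presumably be var(mns), and the theoretical value (1/lambda)^2/n. The chunk that produced the numbers above is not shown, so this is an assumption.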

Comparison to Normal Distribution

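We compare the simulated averages against the normal distribution, first with n = 40 and then with n = 1000. The red curve is the theoretical normal density. In the n = 1000 plot, the black curve is a normal density with the sample mean and sample standard deviation.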

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
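
The code for the n = 40 plot is not shown; only its `stat_bin()` message survives. A sketch of what it might look like, mirroring the n = 1000 code in this section, is given here. The names `plotdata40` and `plot40`, the explicit `bins = 30`, and the use of `after_stat(density)` (which assumes ggplot2 3.3.0 or later) are assumptions for illustration, not the original code.

```r
library(ggplot2)

# Re-create the n = 40 averages so the sketch runs on its own
set.seed(40000)
lambda <- 0.2
n <- 40
nsims <- 1000
mns <- replicate(nsims, mean(rexp(n, lambda)))

plotdata40 <- data.frame(mns = mns)   # hypothetical name
plot40 <- ggplot(plotdata40, aes(x = mns)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 colour = "black", fill = "yellow") +
  labs(title = "Normal Distribution Comparison for n=40", y = "Density") +
  # Red: theoretical normal density, mean 1/lambda and sd (1/lambda)/sqrt(n)
  stat_function(fun = dnorm,
                args = list(mean = 1 / lambda, sd = (1 / lambda) / sqrt(n)),
                color = "red")
print(plot40)
```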

    # Histogram of the n = 1000 averages on the density scale
    plotdata <- data.frame(mnsLarge)
    plot1 <- ggplot(plotdata, aes(x = mnsLarge))
    plot1 <- plot1 + geom_histogram(aes(y = ..density..), colour = "black", fill = "yellow")
    plot1 <- plot1 + labs(title = "Normal Distribution Comparison for n=1000", y = "Density")
    # Red: theoretical normal density, mean 1/lambda and sd (1/lambda)/sqrt(nLarge)
    plot1 <- plot1 + stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = sqrt((1/lambda)^2/nLarge)), color = "red", size = 1.0)
    # Black: normal density using the sample mean and sample standard deviation
    plot1 <- plot1 + stat_function(fun = dnorm, args = list(mean = mean(mnsLarge), sd = sd(mnsLarge)), color = "black", size = 1.0)
    print(plot1)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The normal approximation is better for larger n (n = 1000 in this case).
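
One way to put a number on this claim is skewness, which is 0 for a normal distribution. The mean of n independent exponentials follows a gamma distribution with skewness 2/sqrt(n), so the asymmetry shrinks as n grows. The helpers `theo_skew` and `sample_skewness` below are hypothetical names introduced for illustration; this check is not part of the original analysis.

```r
# Theoretical skewness of the mean of n exponentials: 2 / sqrt(n)
theo_skew <- function(n) 2 / sqrt(n)

# Simple moment-based sample skewness (hypothetical helper)
sample_skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# Re-create both sets of averages so the sketch runs on its own
set.seed(40000)
lambda <- 0.2
nsims <- 1000
mns      <- replicate(nsims, mean(rexp(40, lambda)))
mnsLarge <- replicate(nsims, mean(rexp(1000, lambda)))

print(c(theoretical_n40 = theo_skew(40), theoretical_n1000 = theo_skew(1000)))
print(c(sample_n40 = sample_skewness(mns), sample_n1000 = sample_skewness(mnsLarge)))
```

With only 1000 simulated averages the sample skewness is itself noisy, so the theoretical values are the more reliable guide.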