by Mughundhan Chandrasekar - 14/May/2016

Project assignment:

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponential(0.2)s. You should 1. Show where the distribution is centered at and compare it to the theoretical center of the distribution. 2. Show how variable it is and compare it to the theoretical variance of the distribution. 3. Show that the distribution is approximately normal. 4. Evaluate the coverage of the confidence interval for 1/lambda: X¯±1.96Sn???.

Simulate the necessary data

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
nosim <- 1000;
        n <- 40;
        lambda <- 0.2;
        ## create a matrix with 1000 simulations each with 40 rexp generated 
        ## values
        set.seed(234)
        simdata <- matrix(rexp(nosim * n, rate=lambda), nosim);
        ## for each of the 1000 simulations calculate the average of across the 
        ## 40 values
        simdata_means <- apply(simdata, 1, mean);
        hist(simdata_means, col="yellow");

1. Show where the distribution is centered at and compare it to the theoretical center of the distribution

theoretical_mean <- 1/lambda;
        print (paste("Theoretical center of the distribution = ", 
                     theoretical_mean));
## [1] "Theoretical center of the distribution =  5"
print (paste("Actual center of the distribution based on the simulations 
                     = ", round(mean(simdata_means), 3)));
## [1] "Actual center of the distribution based on the simulations \n                     =  5.002"

2. Show how variable it is and compare it to the theoretical variance of the distribution

theoretical_var <- (1/lambda)^2/n;
        theoretical_sd <- 1/lambda/sqrt(n);
        print (paste("Theoretical variance = ", theoretical_var));
## [1] "Theoretical variance =  0.625"
print (paste("Actual variance based on the simulations = ", 
                     round(var(simdata_means), 3)));
## [1] "Actual variance based on the simulations =  0.663"
print (paste("Theoretical standard deviation = ", 
                     round(theoretical_sd, 3)));
## [1] "Theoretical standard deviation =  0.791"
print (paste("Actual standard deviation based on the simulations = ", 
                     round(sd(simdata_means), 3)));
## [1] "Actual standard deviation based on the simulations =  0.814"

3. Show that the distribution is approximately normal

plotdata <- data.frame(simdata_means);
        m <- ggplot(plotdata, aes(x = simdata_means)); 
        m <- m + geom_histogram(aes(y=..density..), colour="black", 
                                fill = "yellow")
        m + geom_density(colour="green", size=1);
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

qqnorm(simdata_means); qqline(simdata_means)

Due to the central limit theorem, the averages of samples follow normal distribution. The figure above also shows the density computed using the histogram and the normal density plotted with theoretical mean and variance values. Also, the q-q plot suggests the distribution of averages of 40 exponentials is very close to a normal distribution.