Overview

This report describes a brief experiment to evaluate the relationship between a exponential distribution and the Central Limit Theorem. In this brief experiment, the data was generated by working with 1000 averages of exponential distributions (n = 40) and lambda = 0.2. These values are part of the project’s instructions.

Simulations

The code below performs the simulation of the 1000 averages of the 40 exponentials with a fixed seed. A table is produced as output, which contains the Simulated and Theoretical means. The package pander was used to produce the table.

library(pander)
Sim <- 1000
Vals.sim <- 40
lambda <- 0.2
set.seed(640)
Data.Sim <- t(replicate(Sim, rexp(Vals.sim, lambda)))
d.frame <- data.frame(Mean=c(mean(rowMeans(Data.Sim)), 1/lambda),
                 Variance=c(mean(apply(Data.Sim, 1, var)), 1/lambda^2))
rownames(d.frame) <- c("Simulated", "Theoretical")
pander(d.frame, round=2)
  Mean Variance
Simulated 5.03 24.74
Theoretical 5 25

Plot to compare the two distributions (Empirical vs Normal) produced as result of the simulation performed above.

library(ggplot2)
Vals.Means <- rowMeans(Data.Sim)
Z.mean <- (Vals.Means - mean(Vals.Means)) / sd(Vals.Means)
qplot(Z.mean, geom = "blank") +
    geom_line(aes(y = ..density.., colour = 'Empirical'), stat = 'density') +  
    stat_function(fun = dnorm, aes(colour = 'Normal')) +                       
    geom_histogram(aes(y = ..density..), alpha = 0.4, binwidth=.35) +
    geom_vline(xintercept=0, colour="blue", linetype="longdash") +
    scale_colour_manual(name = 'Density', values = c('brown', 'black')) +
    ylab("Density") + xlab("Z") + ggtitle("Mean Values of the Distribution") +
    theme_bw() + theme(legend.position = c(0.80, 0.80))

The code below 1) first verifies that the mean is within the confidence interval, 2) then estimates each simulations’ coverage, and 3) finally performs a simulation and calculates its mean.

set.seed(640)
lambda <- 0.2
## 1) 
withinconfint <- function(lambda) {
                events <- rexp(1000, lambda)
                stderr <- sd(events)/sqrt(1000)
                l1 <- mean(events) - 1.96 * stderr
                l2 <- mean(events) + 1.96 * stderr
                (l1 < 1/lambda & l2 > 1/lambda)
        }
## 2) 
Sim.coverage <- function(lambda) {
    Vals.coverage <- replicate(100, withinconfint(lambda))
    mean(Vals.coverage)
}
## 3) 
repsim <- replicate(100, Sim.coverage(lambda))
paste("Mean of the Simulation is", mean(repsim))
## [1] "Mean of the Simulation is 0.9496"

94.84% of the time the confidence interval contains the value, which supports the expectation (close to the 95% expected).