by Mughundhan Chandrasekar - 14/May/2016
The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials.
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponential(0.2)s. You should 1. Show where the distribution is centered at and compare it to the theoretical center of the distribution. 2. Show how variable it is and compare it to the theoretical variance of the distribution. 3. Show that the distribution is approximately normal. 4. Evaluate the coverage of the confidence interval for 1/lambda: X¯±1.96Sn???.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
nosim <- 1000;
n <- 40;
lambda <- 0.2;
## create a matrix with 1000 simulations each with 40 rexp generated
## values
set.seed(234)
simdata <- matrix(rexp(nosim * n, rate=lambda), nosim);
## for each of the 1000 simulations calculate the average of across the
## 40 values
simdata_means <- apply(simdata, 1, mean);
hist(simdata_means, col="yellow");
theoretical_mean <- 1/lambda;
print (paste("Theoretical center of the distribution = ",
theoretical_mean));
## [1] "Theoretical center of the distribution = 5"
print (paste("Actual center of the distribution based on the simulations
= ", round(mean(simdata_means), 3)));
## [1] "Actual center of the distribution based on the simulations \n = 5.002"
theoretical_var <- (1/lambda)^2/n;
theoretical_sd <- 1/lambda/sqrt(n);
print (paste("Theoretical variance = ", theoretical_var));
## [1] "Theoretical variance = 0.625"
print (paste("Actual variance based on the simulations = ",
round(var(simdata_means), 3)));
## [1] "Actual variance based on the simulations = 0.663"
print (paste("Theoretical standard deviation = ",
round(theoretical_sd, 3)));
## [1] "Theoretical standard deviation = 0.791"
print (paste("Actual standard deviation based on the simulations = ",
round(sd(simdata_means), 3)));
## [1] "Actual standard deviation based on the simulations = 0.814"
plotdata <- data.frame(simdata_means);
m <- ggplot(plotdata, aes(x = simdata_means));
m <- m + geom_histogram(aes(y=..density..), colour="black",
fill = "yellow")
m + geom_density(colour="green", size=1);
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qqnorm(simdata_means); qqline(simdata_means)
Due to the central limit theorem, the averages of samples follow normal distribution. The figure above also shows the density computed using the histogram and the normal density plotted with theoretical mean and variance values. Also, the q-q plot suggests the distribution of averages of 40 exponentials is very close to a normal distribution.