Examination of Theoretical and Empirical Means and Variances of Exponential Distributions

Aaron Chandler

Assignment Purpose

This assignment compares the theoretical and empirical means and variances of the exponential distribution with lambda = 0.2. The comparison requires understanding and applying the Central Limit Theorem (CLT). Applying the CLT allows a comparison between the mean of the sample averages and the expected mean, and between the variance of the sample averages and the expected variance.
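As a brief worked reference, the theoretical quantities used in the comparison can be written out explicitly. The symbols X (a single exponential draw) and \bar{X}_{40} (the average of 40 independent draws, the sample size used in the simulations in this report) are notation introduced here for illustration only.

```latex
\begin{aligned}
E[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, &
\operatorname{Var}(X) &= \frac{1}{\lambda^2} = 25, \\
E[\bar{X}_{40}] &= 5, &
\operatorname{Var}(\bar{X}_{40}) &= \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}_{40}) &= \frac{5}{\sqrt{40}} \approx 0.7906, &
\bar{X}_{40} &\;\overset{\text{approx.}}{\sim}\; N(5,\ 0.625).
\end{aligned}
```

The last statement is the CLT approximation: the sample averages are only approximately normal, since each average is built from a finite number of skewed exponential draws.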

Analytical Questions

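# Theoretical mean of an exponential distribution with rate 0.2
mu <- 1/0.2
# Means of 1000 simulated samples, each of 40 exponential draws
mns <- replicate(1000, mean(rexp(40, 0.2)))
# Variance and mean of the simulated sample means
vars <- var(mns)
x_bar <- mean(mns)
# Interval expected to contain 95% of the sample means (lower, upper)
theor_ci <- mu + c(-1, 1) * qnorm(0.975) * (5/sqrt(40))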
print(mu)
## [1] 5
print(x_bar)
## [1] 4.983
t.test(mns, alternative = "two.sided", mu = 5, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  mns 
## t = -0.691, df = 999, p-value = 0.4897
## alternative hypothesis: true mean is not equal to 5 
## 95 percent confidence interval:
##  4.933 5.032 
## sample estimates:
## mean of x 
##     4.983

From the results above, we fail to reject the null hypothesis that mu = 5: x_bar is not statistically different from mu, and mu lies within the 95% confidence interval (4.933, 5.032). The simulation is therefore consistent with the mean of the sample averages being an unbiased estimator of the population mean.
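To make the test result easier to interpret, the one-sample t statistic, p-value, and confidence interval can be recomputed by hand and checked against `t.test`. This is a minimal sketch rather than the original analysis: the seed and the names `lambda`, `n`, `nsim`, `mu0`, and `sample_means` are assumptions introduced for illustration, so the numbers it produces will differ from the output shown above.

```r
# Assumption: a fixed seed for reproducibility; the original run was unseeded.
set.seed(1)

lambda <- 0.2      # rate of the exponential distribution
n      <- 40       # draws per simulated sample
nsim   <- 1000     # number of simulated samples
mu0    <- 1/lambda # hypothesized mean under the null

# One sample mean per simulated sample
sample_means <- replicate(nsim, mean(rexp(n, lambda)))

# One-sample t statistic: (mean - mu0) / (sd / sqrt(number of observations))
se      <- sd(sample_means) / sqrt(nsim)
t_stat  <- (mean(sample_means) - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df = nsim - 1)

# Two-sided 95% confidence interval for the mean of the sample means
ci <- mean(sample_means) + c(-1, 1) * qt(0.975, df = nsim - 1) * se

# Reference result from the built-in test
fit <- t.test(sample_means, mu = mu0)
```

Comparing `t_stat`, `p_value`, and `ci` with `fit$statistic`, `fit$p.value`, and `fit$conf.int` shows that the test uses the standard error of the 1000 sample means, not the standard deviation of a single sample of 40 draws.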

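The figure below compares the empirical distribution of the sample means with two normal probability density functions: one using the theoretical point estimates (mean 5, standard deviation 5/sqrt(40)) and one using the empirical point estimates (the mean and standard deviation of the sample means). The histogram is drawn on the density scale so that it is directly comparable with the curves.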

hist(mns, main = "Comparison between Distribution \nof Sample Means and Normal Curves", 
    xlab = "Sample Means", ylab = "Density", prob = TRUE, ylim = c(0, 0.6))
# Red: normal curve from the theoretical mean and standard deviation
curve(dnorm(x, mu, 5/sqrt(40)), col = "red", add = TRUE)
# Blue: normal curve from the empirical mean and standard deviation
curve(dnorm(x, mean(mns), sd(mns)), add = TRUE, col = "blue")
abline(v = 5, col = "red")
legend("topright", c("Sample Means", "Normal", "Empirical", "E[x]"), col = c("black", 
    "red", "blue", "red"), lty = 1)

[Figure: histogram of the sample means with the theoretical (red) and empirical (blue) normal curves]

print(sqrt(var(mns)))
## [1] 0.7937
5/sqrt(40)
## [1] 0.7906
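# Candidate values for the population mean (the true mean is 1/0.2 = 5)
lvals <- seq(0, 10, by = 0.5)
# Half-width of the 95% interval around each sample mean, using the known sd
half_width <- qnorm(0.975) * sqrt((1/0.2^2)/40)
# Proportion of the 1000 intervals that contain each candidate value
coverage <- sapply(lvals, function(l) {
    ll <- mns - half_width
    ul <- mns + half_width
    mean(ll < l & ul > l)
})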

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 2.15.3
qplot(lvals, coverage) + geom_hline(yintercept = 0.95) + ylab("Coverage Proportion") + 
    xlab("Candidate Mean Values") + ggtitle("Coverage of 95% Confidence Intervals around the Sample Means") + 
    geom_vline(xintercept = c(3.45, 6.55), color = "red")

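[Figure: coverage of the 95% confidence intervals plotted against candidate values, with a horizontal line at 0.95 and red vertical lines at 3.45 and 6.55]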

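# Theoretical variance of an exponential distribution with rate 0.2
sigma2 <- 1/0.2^2
# Variances of 1000 simulated samples, each of 40 exponential draws
vars <- replicate(1000, var(rexp(40, 0.2)))
s_bar <- mean(vars)
# Standard deviation of the sample variance for n = 40; for an exponential
# distribution the fourth central moment is 9 * sigma2^2
sd_s2 <- sqrt((9 - 37/39) * sigma2^2/40)
# Normal-approximation interval for roughly 95% of sample variances (lower, upper)
theor_ci <- sigma2 + c(-1, 1) * qnorm(0.975) * sd_s2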
print(s_bar)
## [1] 25.08
t.test(vars, alternative = "two.sided", mu = 25, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  vars 
## t = 0.212, df = 999, p-value = 0.8322
## alternative hypothesis: true mean is not equal to 25 
## 95 percent confidence interval:
##  24.37 25.78 
## sample estimates:
## mean of x 
##     25.08
hist(vars, main = "Comparison between Distribution \nof Sample Variances and Normal Distribution", 
    xlab = "Sample Variances", ylab = "Density", prob = TRUE, ylim = c(0, 0.06))
# Red: theoretical sd of the sample variance for n = 40 (fourth central moment = 9 * 25^2)
curve(dnorm(x, 25, sqrt((9 - 37/39) * 25^2/40)), col = "red", add = TRUE)
# Blue: empirical curve uses the sd of the simulated variances directly
curve(dnorm(x, mean(vars), sd(vars)), add = TRUE, col = "blue")
abline(v = 25, col = "red")
legend("topright", c("Sample Variances", "Normal", "Empirical", "E[x]"), col = c("black", 
    "red", "blue", "red"), lty = 1)

[Figure: histogram of the sample variances with the theoretical (red) and empirical (blue) normal curves]