Coursera Statistical Inference

Overview:

This report examines the distribution of averages of 40 exponential random variables, estimated from a thousand simulations, and compares it with what the Central Limit Theorem predicts. I illustrate the properties of the distribution of the mean of 40 exponentials by:

Showing the sample mean and comparing it to the theoretical mean of the distribution.
Showing how variable the sample is (via variance) and comparing it to the theoretical variance of the distribution.
Showing that the distribution is approximately normal.
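
For reference, the quantities that the three comparisons rely on can be written out as follows. This is a short worked statement of the standard result, using the rate lambda = 0.2 and sample size n = 40 from the simulation; the notation (X_i for a single exponential draw, \bar{X} for the average of 40) is introduced here for illustration only.

```latex
\begin{align*}
X_1,\dots,X_n &\overset{\text{iid}}{\sim} \operatorname{Exp}(\lambda),
  \qquad \lambda = 0.2,\; n = 40 \\
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
\operatorname{E}[\bar{X}] &= \frac{1}{\lambda} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625 \\
\bar{X} &\;\overset{\text{approx.}}{\sim}\; N\!\left(5,\; 0.625\right)
  \quad \text{(Central Limit Theorem)}
\end{align*}
```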

Simulation:

set.seed(459)

# create 1000 randomly generated means for the exponential distribution with lambda as rate.
# The exponential distribution is simulated in R with rexp(n, lambda) where lambda is the rate parameter and equal to 0.2. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda.
lambda=.2
mymean=1/lambda
mysd=1/lambda
n=40
data = NULL
for (i in 1 : 1000) data = c(data, mean(rexp(n, lambda)))
summary(data) #mean is just below mymean of 1/lambda=5

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.792   4.427   4.923   4.981   5.477   7.717
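
The same kind of simulation can also be written without growing a vector inside a loop. The block below is a minimal sketch, not the code used for this report: it draws all 40 * 1000 exponentials in one call and averages each row. The names nsim, draws and sim_means are introduced here for illustration.

```r
# Sketch: simulate 1000 averages of 40 exponentials in one vectorized step.
set.seed(459)
lambda <- 0.2   # rate parameter, as in the report
n <- 40         # exponentials per average
nsim <- 1000    # number of simulated averages (name assumed for this sketch)

# Each row of the matrix holds one sample of n exponentials.
draws <- matrix(rexp(nsim * n, rate = lambda), nrow = nsim, ncol = n, byrow = TRUE)

# One average per row gives the 1000 simulated means.
sim_means <- rowMeans(draws)

length(sim_means)
summary(sim_means)
```

Allocating the whole matrix up front avoids the repeated copying that c(data, ...) causes on every iteration, which matters more as the number of simulations grows.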

Sample Mean versus Theoretical Mean:

The simulated mean (4.981) is close to the theoretical mean of 5.

#compare Theoretical Mean to Simulated Mean
theoretical_mean <- 1/lambda
print (paste("Theoretical Mean Distribution = ", theoretical_mean))

## [1] "Theoretical Mean Distribution =  5"

print (paste("Simulated Mean Distribution = ", mean(data)))

## [1] "Simulated Mean Distribution =  4.98131322933715"
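
One way to put "close" on a scale is to compare the gap with the Monte Carlo standard error of the average of 1000 simulated means, which is sqrt(0.625 / 1000) = 0.025. The sketch below does that arithmetic using the simulated mean printed above; the names mc_se and z are introduced here for illustration.

```r
# Sketch: how large is the gap between simulated and theoretical mean,
# measured in Monte Carlo standard errors?
lambda <- 0.2
n <- 40
nsim <- 1000                          # number of simulated averages
theoretical_mean <- 1 / lambda
simulated_mean <- 4.98131322933715    # value copied from the output above

# Standard error of the average of nsim means, each with variance (1/lambda)^2 / n.
mc_se <- sqrt((1 / lambda)^2 / n / nsim)

# Gap expressed in standard-error units.
z <- (simulated_mean - theoretical_mean) / mc_se
print(c(mc_se = mc_se, z = z))
```

The gap of about 0.019 is less than one standard error, which is the size of deviation expected from simulation noise alone.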

Sample Variance versus Theoretical Variance:

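The simulated variance (0.628) is close to the theoretical variance of 0.625. The Welch t-test below reports t = 445.48 (p-value = 5.039e-06), but it compares two artificial two-point samples, each built by adding 0.00001 to a single variance value. Its result reflects that construction rather than the simulation, so it should not be read as evidence for or against equality of the two variances.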

#compare Theoretical Variance to Simulated Variance
theoretical_variance <- (1/lambda)^2/n
print (paste("Theoretical Variance = ", theoretical_variance))

## [1] "Theoretical Variance =  0.625"

print (paste("Simulated Variance = ", var(data)))

## [1] "Simulated Variance =  0.628150010288495"

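# Welch t-test on two artificial two-point samples (each variance and the same value + 1e-05);
# the tiny within-sample spread inflates t, so the small p-value is not meaningful evidence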
t.test(x=c(var(data), var(data)+.00001), y=c(theoretical_variance,theoretical_variance+.00001))

## 
##  Welch Two Sample t-test
## 
## data:  c(var(data), var(data) + 1e-05) and c(theoretical_variance, theoretical_variance + 1e-05)
## t = 445.48, df = 2, p-value = 5.039e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.003119586 0.003180435
## sample estimates:
## mean of x mean of y 
##  0.628155  0.625005
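
Another way to judge the agreement between the two variances is an approximate confidence interval for the variance of the simulated means. The sketch below uses the standard chi-square interval for a variance, which assumes the 1000 means are roughly normal; since averages of 40 exponentials are still slightly right-skewed, the interval is only approximate. It uses the simulated variance printed above, and the names dof, alpha and ci are introduced here for illustration.

```r
# Sketch: approximate 95% chi-square confidence interval for the variance
# of the simulated means (assumes the means are roughly normal).
lambda <- 0.2
n <- 40
nsim <- 1000
theoretical_variance <- (1 / lambda)^2 / n
simulated_variance <- 0.628150010288495   # value copied from the output above

dof <- nsim - 1     # degrees of freedom of the sample variance
alpha <- 0.05

ci <- c(lower = dof * simulated_variance / qchisq(1 - alpha / 2, dof),
        upper = dof * simulated_variance / qchisq(alpha / 2, dof))
print(ci)

# Does the theoretical variance fall inside the interval?
theoretical_variance > ci[["lower"]] && theoretical_variance < ci[["upper"]]
```

The theoretical value of 0.625 falls inside this interval, which is consistent with the simulated variance differing from it only by simulation noise.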

Distribution:

The histogram shows that the distribution of simulated means closely follows the normal curve (overlaid in blue); the red vertical line marks the theoretical mean of 5. The normal Q-Q plot also shows evidence of normality.

hist(data, density=25, breaks=20, prob=TRUE, xlab="Means", main="Distribution of Simulated Means with Normal Curve Overlay")
curve(dnorm(x, mean=mean(data), sd=sd(data)), col="blue", lwd=2, add=TRUE, yaxt="n")
abline(v=5, col="red", lwd=4)

qqnorm(data); qqline(data) #shows normality
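
The plots can be complemented with a simple numeric check. The sketch below re-runs a simulation of the same kind, standardizes the means with the theoretical mean and standard deviation, and reports the share falling within one and two standard deviations. For a normal distribution these shares are about 0.683 and 0.954. The names sim_means, z and coverage are introduced here for illustration, and this is a rough check rather than a formal test of normality.

```r
# Sketch: compare the share of simulated means within 1 and 2 theoretical
# standard deviations against the normal benchmarks (about 0.683 and 0.954).
set.seed(459)
lambda <- 0.2
n <- 40

# 1000 averages of n exponentials each.
sim_means <- replicate(1000, mean(rexp(n, rate = lambda)))

# Standardize with the theoretical mean 1/lambda and sd (1/lambda)/sqrt(n).
z <- (sim_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

coverage <- c(one_sd = mean(abs(z) <= 1), two_sd = mean(abs(z) <= 2))
print(round(coverage, 3))
```

Shares close to the normal benchmarks support the approximation. Small departures are expected, because the average of 40 exponentials keeps a slight right skew.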