Analytical Questions

How does the sample mean of exponentials compare to the expected mean of exponentials, both with lambda=.2?
Is the distribution of the sample means of exponentials approximately normal?

The procedure for answering this question includes:

Define the expected mean of an exponential distribution with lambda=.2.
Generate 40 exponentials from the exponential distribution with the same lambda as above.
Take the mean of the 40 exponentials and store the value in a vector.
Standardize the 1000 means of the 40 exponentials using the calculation: (x_bar-mu)/standard error of x_bar
Calculate the mean of the 1000 means and compare the value to theoretical mean using a z-test and the expected 95% confidence interval.
Plot the histogram of the distribution of the 40 point sample averages, the normal distribution given by the empirical mean and standard error of the means, and the normal distribution given by the theoretical mean and standard error of the mean.
Plot the confidence interval coverage and calculate the KS statistic of the distribution of the means.

The r code below carries out the above procedure and the results and analysis are below the code.

mu<-1/.2
mns=NULL
for(i in 1:1000) mns=c(mns,mean(rexp(40,.2)))
vars<-var(mns)
x_bar<-mean(mns)
theor_ci<-mu+c(1,-1)*qnorm(.975)*(5/sqrt(40))

The theoretical mean-also known as the center-of the distribution described above is:

print(mu)

## [1] 5

The empirical mean of the distribution of sample means (based on a simulation of 1000 means of 40 expoentials) is:

print(x_bar)

## [1] 4.988352

This is the center of the empirical distribution.

A t-test between x_bar and mu yields the following results:

t.test(mns,alternative="two.sided",m=5,conf.level=.95)

## 
##  One Sample t-test
## 
## data:  mns
## t = -0.4566, df = 999, p-value = 0.6481
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
##  4.938290 5.038414
## sample estimates:
## mean of x 
##  4.988352

From the results above we fail to reject the null hypothesis that mu=5, that x_bar is not statistically different from mu, and x_bar is within the 95% confidence interval. Thus, we can conclude that mean of the sample averages of the exponential distribution is a consistent estimator of the population mean.

hist(mns,main="Comparison between Distribution \nof Sample Means and Normal Curves",xlab="Sample Means",ylab="Density",prob=T,ylim=c(0,.6))
curve(dnorm(x,mu,5/sqrt(40)),col="red",add=T)
curve(dnorm(x,mean(mns),sqrt(var(mns))),add=T,col="blue")
abline(v=5,col="red")
legend("topright",c("Sample Means","Normal","Empirical","E[x]"),col=c("black","red","blue","red"),lty=1)

A review of the figure above shows how closely the distributions based on theoretical and estimated parameters are to eachother.

The standard error of the sample means is:

print(sqrt(var(mns)))

## [1] 0.8067409

compared to theoretical standard error of:

5/sqrt(40)

## [1] 0.7905694

lvals<-seq(0,10,by=.5)
coverage<-sapply(lvals,function(l){
  lhats<-mns
  ll<-lhats-qnorm(.975)*sqrt((1/.2^2)/40)
  ul<-lhats+qnorm(.975)*sqrt((1/.2^2)/40)
  mean(ll <l & ul > l )
})

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.1.3

qplot(lvals,coverage)+geom_hline(yintercept=.95)+ylab("Percent Coverage")+xlab("Lambda Values")+ggtitle("Coverage of 95% Confidence Interval of Lambda_Hat")+geom_vline(xintercept=c(3.45,6.55),color="red")

The plot above provides more evidence that the distribution of sample means follows a normal distribution. However, the level of coverage expected is not met. From visual examination, the confidence interval coverage of lambda_hat seems to only be 68%

How does the variance of the means of exponentials compare to the expected variance of exponentials?

The procedure for answering this question inovles the following:

Review the figure above for insight into variances from the distributions.
Define the expected variance of the given exponential distribution.
Calculate the variance of the distribution of the sample means.
Generate a distribution of sample variances from sample exponentials and repeat that 1000 times.
Compare the distribution of variances to theoretical distribution of variances using the same method as for the means.

Moving on to the theoretical and empirical mean sample variances, the expected variance is 1/lambda^2, or 1/.2^2=25. To get the empiricial sample variances, we employ a similar method as the one use to estimate the population mean above, but calculating the variance instead of the mean. Doing this only required substituting the population variance in for the population mean as the parameter to estimate.

The r code below carries out the procedure for estimating variance and the results and analysis are below the code.

sigma<-1/.2^2
vars=NULL
for(i in 1:1000) vars=c(vars,var(rexp(40,.2)))
s_bar<-mean(vars)
theor_ci<-sigma+c(1,-1)*qnorm(.975)*sqrt(25)/sqrt(40)

The empirical variance is:

print(s_bar)

## [1] 25.07329

A t-test between s_bar and sigma yields the following results:

t.test(vars,alternative="two.sided",m=25,conf.level=.95)

## 
##  One Sample t-test
## 
## data:  vars
## t = 0.2258, df = 999, p-value = 0.8214
## alternative hypothesis: true mean is not equal to 25
## 95 percent confidence interval:
##  24.43625 25.71033
## sample estimates:
## mean of x 
##  25.07329

The figure below compares the distribution of sample variances to normal distribution with mu=sigma and population standard error of 5/sqrt(40). The figure belows shows the standardized distribution of the sample variances.

hist(vars,main="Comparison between Distribution \nof Sample Variances and Normal Distribution",xlab="Sample Variances",ylab="Density",prob=T,ylim=c(0,.6))
curve(dnorm(x,25,5/sqrt(40)),col="red",add=T)
curve(dnorm(x,mean(vars),sd(vars)/sqrt(40)),add=T,col="blue")
abline(v=25,col="red")
legend("topright",c("Sample Means","Normal","Empirical","E[x]"),col=c("black","red","blue","red"),lty=1)

The figure shows the poor fit of the distribution of the sample variances with the normal curve of mu=sigma and standard error of the population, although the distribution is centered around 25, close to the population variance. However, the standard error of the sample variances is much higher compared to the expected standard error, which explains the flatter curve. A simulation of a higher order could solve this.

Examination of Theoretical and Empirical Means and Variances of Exponential Distributions

Assignment Purpose

Analytical Questions