This assignment compares the theoretical and empirical means and variances of the exponential distribution with lambda given as .2. The comparison requires an understanding and application of the Central Limit Theorom. Applying the CLT allows the comparison between the mean of sample averages and the expected mean, as well as the variance of the sample averages and the expected variance.
The r code below carries out the above procedure and the results and analysis are below the code.
mu<-1/.2
mns=NULL
for(i in 1:1000) mns=c(mns,mean(rexp(40,.2)))
vars<-var(mns)
x_bar<-mean(mns)
theor_ci<-mu+c(1,-1)*qnorm(.975)*(5/sqrt(40))
The theoretical mean-also known as the center-of the distribution described above is:
print(mu)
## [1] 5
The empirical mean of the distribution of sample means (based on a simulation of 1000 means of 40 expoentials) is:
print(x_bar)
## [1] 4.988352
t.test(mns,alternative="two.sided",m=5,conf.level=.95)
##
## One Sample t-test
##
## data: mns
## t = -0.4566, df = 999, p-value = 0.6481
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
## 4.938290 5.038414
## sample estimates:
## mean of x
## 4.988352
From the results above we fail to reject the null hypothesis that mu=5, that x_bar is not statistically different from mu, and x_bar is within the 95% confidence interval. Thus, we can conclude that mean of the sample averages of the exponential distribution is a consistent estimator of the population mean.
The figure below shows a comparison between the empirical distribution of sample means and normal probability density functions given by the theoretical and empirical point estimates. The distribution of sample means is standardized.hist(mns,main="Comparison between Distribution \nof Sample Means and Normal Curves",xlab="Sample Means",ylab="Density",prob=T,ylim=c(0,.6))
curve(dnorm(x,mu,5/sqrt(40)),col="red",add=T)
curve(dnorm(x,mean(mns),sqrt(var(mns))),add=T,col="blue")
abline(v=5,col="red")
legend("topright",c("Sample Means","Normal","Empirical","E[x]"),col=c("black","red","blue","red"),lty=1)
A review of the figure above shows how closely the distributions based on theoretical and estimated parameters are to eachother.
The standard error of the sample means is:
print(sqrt(var(mns)))
## [1] 0.8067409
compared to theoretical standard error of:
5/sqrt(40)
## [1] 0.7905694
The standard error was used here in place of the variance because we are concerned with the variability in averages of means. And remember, the variance of sample mean is equal to sigma^2/n. That is, var(mns)=sigma^2/n. Therefore, the standard error of the mean is the square root of var(mns). The standard errors of the sample mean and population mean are quite comparable.
lvals<-seq(0,10,by=.5)
coverage<-sapply(lvals,function(l){
lhats<-mns
ll<-lhats-qnorm(.975)*sqrt((1/.2^2)/40)
ul<-lhats+qnorm(.975)*sqrt((1/.2^2)/40)
mean(ll <l & ul > l )
})
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
qplot(lvals,coverage)+geom_hline(yintercept=.95)+ylab("Percent Coverage")+xlab("Lambda Values")+ggtitle("Coverage of 95% Confidence Interval of Lambda_Hat")+geom_vline(xintercept=c(3.45,6.55),color="red")
The r code below carries out the procedure for estimating variance and the results and analysis are below the code.
sigma<-1/.2^2
vars=NULL
for(i in 1:1000) vars=c(vars,var(rexp(40,.2)))
s_bar<-mean(vars)
theor_ci<-sigma+c(1,-1)*qnorm(.975)*sqrt(25)/sqrt(40)
The empirical variance is:
print(s_bar)
## [1] 25.07329
This represents the center of the distribution of the average of the sample variances, and is the estimate of the population variance.
t.test(vars,alternative="two.sided",m=25,conf.level=.95)
##
## One Sample t-test
##
## data: vars
## t = 0.2258, df = 999, p-value = 0.8214
## alternative hypothesis: true mean is not equal to 25
## 95 percent confidence interval:
## 24.43625 25.71033
## sample estimates:
## mean of x
## 25.07329
The results above lead to the conclusion of failing to reject the null hypothesis and conclude that s_bar is not statistically different from sigma. This conclusion supports that the variance estimate using means of sample variances is a consistent estimator of the population variance, sigma=25.
hist(vars,main="Comparison between Distribution \nof Sample Variances and Normal Distribution",xlab="Sample Variances",ylab="Density",prob=T,ylim=c(0,.6))
curve(dnorm(x,25,5/sqrt(40)),col="red",add=T)
curve(dnorm(x,mean(vars),sd(vars)/sqrt(40)),add=T,col="blue")
abline(v=25,col="red")
legend("topright",c("Sample Means","Normal","Empirical","E[x]"),col=c("black","red","blue","red"),lty=1)