Central Limit Theorem, Explanatory Project

In this experiment:
-We will investigate the exponential distribution in R and compare it with the predictions of the Central Limit Theorem.
-We will investigate the distribution of averages of 40 exponentials across one thousand simulations drawn from a parent exponential population whose mean and standard deviation are both 1/lambda, with lambda=0.2.

Creating sample mean vector

-To create each sample, we draw 40 random values using rexp(n,lambda). Repeating this 1000 times gives 1000 samples.

library(ggplot2)
n <- 40; lambda <- 0.2          #sample size and rate parameter
mu <- 1/lambda; s <- 1/lambda   #population mean and standard deviation
samplemean <- NULL; simVector <- NULL; meanvector <- NULL
set.seed(123)
for (i in 1:1000)
{
    draws <- rexp(n, lambda)            #one sample of 40 exponentials
    mns <- mean(draws)                  #mean of this sample
    temp <- (mns - mu)/(s/sqrt(n))      #standardized sample mean
    samplemean <- c(samplemean, temp)   #clt distribution vector
    simVector <- c(simVector, draws)    #40,000 estimates vector
    meanvector <- c(meanvector, mns)    #vector of sample means
}
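
The same simulation can also be written without an explicit loop. The sketch below is one possible vectorized version, not the code that produced the results in this report; the names `sims`, `sample_means` and `z` are introduced here for illustration only.

```r
# Vectorized sketch of the simulation (illustrative names, not from the report)
n <- 40; lambda <- 0.2
set.seed(123)
# Each column of sims holds one sample of n exponential draws
sims <- replicate(1000, rexp(n, lambda))
sample_means <- colMeans(sims)                            # 1000 sample means
z <- (sample_means - 1/lambda) / ((1/lambda) / sqrt(n))   # standardized means
dim(sims)
length(sample_means)
```

Building the full matrix first avoids growing vectors with c() inside a loop, which becomes slow as the number of simulations increases.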

1. Calculating the means

1/lambda                 #mean of parent population

## [1] 5

mean(simVector)          #mean of the 40,000 estimates

## [1] 5.011911

mean(meanvector)         #mean of sample means

## [1] 5.011911

We observe that the mean of the sample means and the mean of the 40,000 estimates are identical, as they must be when every sample has the same size, and that both are close to the parent population mean of 5.
This agrees with the sample mean being an unbiased estimator of the population mean.
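
The exact agreement between the two simulated means is arithmetic rather than chance. A small sketch with made-up numbers (the matrix `x` is hypothetical and not part of the simulation) shows that the mean of equal-sized sample means equals the mean of all values pooled together:

```r
# Hypothetical data: 3 samples (columns) of 4 values each
x <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 8,
              0, 1, 1, 2), nrow = 4)
grand_mean <- mean(x)                 # mean of all 12 values pooled
mean_of_means <- mean(colMeans(x))    # mean of the 3 column (sample) means
all.equal(grand_mean, mean_of_means)
```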
2. Calculating Variances

n<-40
1/lambda^2              #variance of parent population

## [1] 25

var(simVector)          #variance of the 40,000 estimates

## [1] 24.8226

var(meanvector)         #variance of sample means

## [1] 0.6004928

We observe that the variance of the 40,000 estimates is almost equal to the parent population variance,
whereas the variance of the sample means is much smaller than either.

Seeing it on graph

g1<-ggplot(as.data.frame(meanvector))
g1+geom_histogram(aes(x=meanvector,y=..density..),fill="steelblue",colour="black",binwidth=.1)+
    geom_density(aes(x=meanvector,y=..density..),size=1)+
    scale_x_continuous(breaks=mean(meanvector)+(-2:2))+
    labs(x="Sample Means",y="Density",title="Distribution of Sample Means")+
    geom_vline(xintercept=mean(meanvector),size=1.2)

-Plotting the sample means, we see that their mean, i.e. the estimator, nearly coincides with the parent population mean (5). The distribution is centered at 5.011911.
-We also see that the distribution is approximately normal.

In this step we also check a property of the variance of the sample mean. For independent draws, the variance of the sample mean equals the population variance divided by the sample size. This holds for any sample size and does not rely on the Central Limit Theorem.

popvariance<-1/lambda^2
var(meanvector)             #empirical variance of the sample means

## [1] 0.6004928

popvariance/n               #theoretical variance of the sample mean

## [1] 0.625

We observe that the two variances are almost equal (0.6004928 versus 0.625), so the simulation agrees with the theoretical value.
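
To see how the variance of the sample mean shrinks with sample size, the comparison can be repeated for several values of n. The sketch below is an illustration only: the sample sizes in `sizes` are arbitrary choices, and the names `pop_var` and `emp_var` are not part of the original analysis.

```r
# Sketch: empirical variance of sample means vs. sigma^2 / n for several n
lambda <- 0.2
pop_var <- 1/lambda^2                 # population variance, 25
set.seed(123)
sizes <- c(10, 40, 160)               # sample sizes chosen for illustration
emp_var <- sapply(sizes, function(n) {
    means <- replicate(1000, mean(rexp(n, lambda)))
    var(means)                        # variance of the 1000 sample means
})
data.frame(n = sizes, empirical = emp_var, theoretical = pop_var / sizes)
```

Each fourfold increase in n should cut the variance of the sample mean to roughly a quarter, up to simulation noise.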

Applying Central Limit Theorem

We observe that the distribution is approximately standard normal. Here we use the samplemean vector, which has 1000 elements, each equal to a sample mean minus the population mean, 5, divided by the standard error, 5/sqrt(40).
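
In symbols, with population mean \mu = 1/\lambda = 5, population standard deviation \sigma = 1/\lambda = 5 and sample size n = 40, each element of samplemean can be written as follows (the notation Z and \bar{X} is added here for clarity):

```latex
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
  = \frac{\bar{X} - 5}{5 / \sqrt{40}},
\qquad Z \sim N(0, 1) \text{ approximately, for large } n .
```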

g2<-ggplot(as.data.frame(samplemean))
g2+geom_histogram(aes(x=samplemean,y=..density..),fill="steelblue",colour="black",binwidth=.1)+
    geom_density(aes(x=samplemean,y=..density..),size=1)+
    scale_x_continuous(breaks=c(min(samplemean),-2,-1,0,1,2,max(samplemean)))+
    geom_vline(show.legend=TRUE,xintercept=mean(samplemean),size=1.2)+   #show_guide was renamed show.legend in ggplot2 2.0.0
    labs(x="CLT component",y="Density",title="Distribution Using CLT")

The distribution obtained using CLT is centered at 0.01506671, i.e. the mean is approximately 0. The variance is 0.9607885 (a standard deviation of about 0.98), which is approximately 1.

var(samplemean)         #variance of the distribution obtained by clt

## [1] 0.9607885

mean(samplemean)        #mean of the distribution obtained by clt

## [1] 0.01506671
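
A mean near 0 and a variance near 1 do not by themselves establish normality. One further check, sketched below under the same settings, is the share of standardized means that fall within 1.96 of zero, which would be about 0.95 for a standard normal. The names `means`, `z` and `coverage` are introduced here for illustration and the simulation is rerun rather than reusing the vectors above.

```r
# Sketch: share of standardized sample means falling within +/- 1.96
n <- 40; lambda <- 0.2
set.seed(123)
means <- replicate(1000, mean(rexp(n, lambda)))
z <- (means - 1/lambda) / ((1/lambda) / sqrt(n))
coverage <- mean(abs(z) <= 1.96)      # a standard normal gives about 0.95
coverage
```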

Prepared by: Neehar Mukne

Date: 2015-08-28