Overview

This project reports the following:
1. Calculating the sample mean and comparing it to the theoretical mean of the distribution.
2. Finding out how variable the sample is and comparing it to the theoretical variance of the distribution.
3. Showing that the distribution is approximately normal.

Simulations

For the experiment, we work with exponential random variables generated in R. We start by setting the seed to make the analysis reproducible. We use bootstrapping to create 1000 simulations, each drawing 40 values at random, with replacement, from a single sample of 40 exponential variables. The resampled data is meant to simulate the population and is stored in the variable resamples.

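library(ggplot2)        # needed for the plots in the later sections
set.seed(7)             # fix the seed so the results are reproducible
B <- 1000               # number of bootstrap resamples
lambda <- 0.2           # rate parameter of the exponential distribution
x <- rexp(40, lambda)   # the single original sample of 40 values
n <- length(x)
# B x n matrix: each row is one resample of size n drawn from x with replacement
resamples <- matrix(sample(x, n * B, replace = TRUE), B, n)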
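
As a quick sanity check on the resampling step, the sketch below rebuilds the matrix with the same seed and parameters and confirms its shape and contents. It is only an illustration: the two checks at the end are not part of the original analysis.

```r
# Rebuild the bootstrap matrix exactly as in the report.
set.seed(7)
B <- 1000
lambda <- 0.2
x <- rexp(40, lambda)
n <- length(x)
resamples <- matrix(sample(x, n * B, replace = TRUE), B, n)

# Illustrative checks (not in the original report):
dim(resamples)          # 1000 rows (resamples) by 40 columns (draws)
all(resamples %in% x)   # TRUE: bootstrapping only reuses values already in x
```

Because every resampled value comes from x, the resamples can only reflect the population as well as the original 40 observations do.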

Sample Mean vs Theoretical Mean

We compute the mean of each of the 1000 resamples, then average those means and call the result the sample mean, which we compare with the theoretical mean of the population. The theoretical mean is 1/lambda, where lambda is the rate parameter of the exponential distribution. With lambda = 0.2, the theoretical mean equals 5. From our simulated data, the sample mean turns out to be 4.97. This is consistent with the sample mean being an unbiased estimator of the population mean.

means <- rowMeans(resamples)    # mean of each resample (one per row)
MeanEstimate <- mean(means)     # average of the 1000 resample means
TheoreticalMean <- 1 / lambda
# Histogram of the resample means with a dashed line at their average
g <- ggplot(data.frame(means = means), aes(x = means))
g <- g + geom_histogram(colour = "lightblue", fill = "blue", binwidth = 0.3)
g <- g + labs(title = "Sample Means Distribution", subtitle = paste("Dashed line represents actual sample mean =", round(MeanEstimate, 2), "while theoretical mean =", TheoreticalMean))
g <- g + geom_vline(xintercept = MeanEstimate, lwd = 1.5, col = "lightblue", lty = 2)
g
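
One point worth keeping in mind is that bootstrap means are centred on the mean of the original sample x, so how close the estimate lands to 1/lambda depends on how representative those 40 values are. The following sketch, which repeats the setup above, puts the three quantities side by side. The vector names are chosen here for illustration only.

```r
# Same seed and parameters as in the report.
set.seed(7)
B <- 1000
lambda <- 0.2
x <- rexp(40, lambda)
n <- length(x)
resamples <- matrix(sample(x, n * B, replace = TRUE), B, n)
means <- rowMeans(resamples)

# Illustrative comparison (labels are assumptions, not from the report):
comparison <- c(bootstrap   = mean(means),  # average of the resample means
                original    = mean(x),      # mean of the single original sample
                theoretical = 1 / lambda)   # population mean of Exp(lambda)
print(round(comparison, 3))
```

The bootstrap and original values should agree closely. Any remaining gap to the theoretical value reflects sampling error in x itself.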

Sample Variance vs Theoretical Variance (Distribution)

We measure the spread of the 1000 resample means by computing their variance, call it the sample variance, and compare it with the theoretical variance of the sampling distribution of the mean. The theoretical variance is 1/(lambda^2 * n). With lambda = 0.2 and n = 40, the theoretical variance equals 0.625. From our simulated data, the sample variance turns out to be 0.428, so the theoretical sampling distribution is more spread out than the simulated one. Together with the histogram of the means above, the plot suggests that the distribution of the means is approximately Gaussian, or normal.
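
The theoretical value can be traced in a few steps. As a short worked derivation, assuming the 40 draws are independent and exponentially distributed with rate lambda (so each has variance 1/lambda^2):

```latex
\operatorname{Var}(\bar{X})
  = \frac{\operatorname{Var}(X)}{n}
  = \frac{1/\lambda^{2}}{n}
  = \frac{1}{\lambda^{2} n}
  = \frac{1}{0.2^{2} \times 40}
  = \frac{1}{1.6}
  = 0.625,
\qquad
\operatorname{SD}(\bar{X}) = \sqrt{0.625} \approx 0.79 .
```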

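SDEstimate <- sd(means)                 # spread of the resample means
VarEstimate <- SDEstimate^2             # the sample variance quoted in the text
TheoreticalVar <- 1 / (lambda^2 * n)    # variance of the mean of n exponential draws
TheoreticalSD <- sqrt(TheoreticalVar)
# Normal curves with the estimated (blue) and theoretical (red) parameters
d <- ggplot(data.frame(means = means), aes(x = means))
d <- d + stat_function(fun = dnorm, args = list(mean = MeanEstimate, sd = SDEstimate), size = 1.2, col = "blue")
d <- d + stat_function(fun = dnorm, args = list(mean = TheoreticalMean, sd = TheoreticalSD), size = 1.2, col = "red")
d <- d + labs(x = "Means", y = "Density", title = "Spread of Sampling Distribution", subtitle = paste0("Blue curve: sample variance = ", round(VarEstimate, 3), ", red curve: theoretical variance = ", TheoreticalVar))
d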
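
The plot above overlays two normal curves, so a more direct check of normality is to compare the resample means themselves against normal quantiles. A minimal sketch using base R's qqnorm and qqline follows. This Q-Q plot is an addition for illustration and is not part of the original analysis.

```r
# Same seed and parameters as in the report.
set.seed(7)
B <- 1000
lambda <- 0.2
x <- rexp(40, lambda)
n <- length(x)
resamples <- matrix(sample(x, n * B, replace = TRUE), B, n)
means <- rowMeans(resamples)

# Points lying close to the reference line indicate approximate normality.
qq <- qqnorm(means, main = "Normal Q-Q Plot of Resample Means")
qqline(means, col = "red")
```

Points that bend away from the line in the tails would indicate skewness left over from the exponential distribution.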