Title: Statistical Inference Project

Author: Farhad.M
Date: Sep. 23, 2015

Overview

This project is to investigate the exponential distribution in R and compare it with the Central Limit Theorem via simulation. The investigation includes simulation of collections of exponential distributions. The sample and theoritical mean and variance are compared to those of theoretical, and it is finally shown that the distribution is approximately normal.

Simulations

The exponential distribution is simulated in R with rexp(n, lambda) where lambda is the rate parameter, where both the mean and standard deviation of the distribution are 1/lambda. Lambda is set to 0.2 for all of the simulations in which the distribution of averages of 40 exponentials with 1000 simulations are investigated. The R code for initiating simulation is as follows.

set.seed(12321)
lambda<-0.2
n_dist<-40
n_sim<-10000
means = NULL
vars= NULL

for (i in 1 : n_sim) means = c(means, mean(rexp(n_dist, lambda)))
for (i in 1 : n_sim) vars = c(vars, var(rexp(n_dist, lambda)))

Sample Mean versus Theoretical Mean

To answer the first question the mean of 40 random exponential distribution for 1000 collections is plotted as follows (The code can be found in Appendix). Also, the sample and theoretical means are reported for the comparison.

##      Theoretical Sample
## Mean           5   4.99

Conclusion: Referring to the theoretical and sample mean values, the two values very close.

Sample Variance versus Theoretical Variance

To answer the second question, the mean of variances of 40 random exponential distribution for 1000 simulation runs is plotted as follows (The code can be found in Appendix).

##          Theoretical Sample
## Variance          25  24.88

Conclusion: It is observable in the graph that the sample variance is very close to the theoritical variance.

Distribution

To answer the third question for investigating whether the distibution looks approximately like normal distribution, two different simulations are run and corresponding graphs are plotted. The first graph shows the distrbution of 1000 random expnoentilas, and the second graph shows the everage of 1000 simulations of 40 exponentials as follws.

Conclution: As it can be seen in the last two diagrams, both graphs have similar sample-means around 5. However, the average of 40 exponentials for a large number of simulations (corresponding to the right-side plot) looks like normal distribution while the collection of random exponentials (left-side graph) looks different from a normal distribution.

This simulation confirms the Central Limit Theorem (CLT), as the distribution of averages of a distribution behaves like a normal distibution for the sufficiently large smaple sizes.

Appendix

R Source Code

Simulation setup:

set.seed(12321)   
lambda<-0.2   
n_dist<-40   
n_sim<-10000  
means = NULL   
vars= NULL   

for (i in 1 : n_sim) means = c(means, mean(rexp(n_dist, lambda)))   
for (i in 1 : n_sim) vars = c(vars, var(rexp(n_dist, lambda)))

Sample mean versus theoretical mean

sample_mean <- round(mean(means), 2)
theoretical_mean <- 1/lambda
hist(means, main="Distribution of 1000 averages of 40 exponential distributions",  xlab= "Mean", ylab="Density", breaks= 50, col="cyan4")
abline(v= sample_mean, col="red",lwd= 5)

matrix(data=c(theoretical_mean, sample_mean), nrow=1, ncol=2, byrow=TRUE, dimnames=list(c("Mean"), c("Theoretical", "Sample")))

Sample variance versus theoretical vriance

sample_var <- round(mean(vars), 2)
theoretical_var <- (1/lambda)^2 

hist(vars, main="Distribution of 1000 variances of 40 exponential distributions", xlab= "Mean of Variances", ylab="Density", breaks= 50, col="blue4")
abline(v= sample_var, col="red",lwd= 5)

matrix(data=c(theoretical_var, sample_var), nrow=1, ncol=2, byrow=TRUE, dimnames=list(c("Variance"), c("Theoretical", "Sample")))

Comparing average of 40 exponentials with a large collection of exponentials

set.seed(12321)
re<- rexp(n_sim, lambda)
m<- mean(re)
par(mfrow=c(1,2))
 
hist(re, main="Distribution of 1000 exponentials",  xlab= "Random Exponentials", ylab="Density", breaks= 1000, col="Green3")
abline(v= m, col="red",lwd= 5)

hg<- hist(means,breaks= 50,col="grey",main="", xlab= "Distribution of 1000 collections of avergaes of 40 exponentials", ylab="Density")
x_fit<-seq(min(means),max(means),length= 50)
y_fit<-dnorm(x_fit,mean=mean(means), sd=sd(means))
y_fit<-y_fit*diff(hg$mids[1:2])*length(means)
abline(v=mean(means),col="red",lwd=2)
lines(x_fit, y_fit,col="blue",lwd=4)