This course project report looks at a series of exponential distribution iterations, and compares the result set to a normal distribution and the theoretical mean and variance.
This report is for a Coursera Class project - Statistical Inference (Part 1). Per the project requirements, the below solution maintains the following:
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should:
1. Show the sample mean and compare it to the theoretical mean of the distribution.
2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
3. Show that the distribution is approximately normal.
Simulating a thousand sets of 40 exponentials using lambda and calculate the mean for each set. Storing the data in a data frame to use it in furhter analysis.
#setting working directory
setwd("C:/Data/devtools/Git/StatsInfer_PeerAssessment1")
set.seed(1000)
lambda <- 0.2
nex <- 40
sims <- 1000
#data frame with sumulated data
dfrm <- NULL
dfrm <- data.frame(mean=numeric(sims))
# Getting the mean of simulated data
for (i in 1:sims) {
explmda <- rexp(nex, lambda)
dfrm[i,1] <- mean(explmda)
}
Calculating the mean for Theoretical and actula distributions. Using R function to calculate the Theoretical Mean
#Theoritical mean
thmn <-1/lambda
thmn
## [1] 5
#Sample mean
smpmn <-mean(dfrm$mean)
smpmn
## [1] 4.986963
Sample Mean of the distribution 4.9869634 and Theoretical Mean 5 are very close (i.e, 5). We could observe the same (blue line at 5) in below histograms:
# ploting the sample & theoretical means
par(mfrow=c(1,2))
hist(dfrm$mean,probability=T,main='Theoretical Mean (5)',ylim=c(0,0.55),xlab='Sample Means')
abline(v=thmn,col='blue',lwd=5)
hist(dfrm$mean,probability=T,main=paste('Sample Mean (4.98)'),ylim=c(0,0.55),xlab='Sample Means')
abline(v=smpmn,col='blue',lwd=5)
Calculating the Variance for Theoretical and actula distributions. Using R function to calculate the Theoretical Variance
thvar <-((1/lambda)^2)/nex #theoretical variance
thvar
## [1] 0.625
smvar <- var(dfrm$mean) #sample variance
smvar
## [1] 0.654343
Sample Variance of the distribution 0.65 and Theoretical Variance 0.62 are very close
Plotted below theoritical mean and sample mean curves on a histogram. Both are close and normally distributed. So the the histogram for the mean of 1000 simulated 40 random exponential values are symmetric around the mean with a bell shape
par(mfrow=c(1,1))
hist(scale(dfrm$mean),probability=T,main='',ylim=c(0,0.5),xlab='')
curve(dnorm(x,0,1),-3,3, col='green',add=T)
lines(density(scale(dfrm$mean)),col='blue')
legend(2,0.4,c('Theoritical','Sample'),cex=0.8,col=c('green','blue'),lty=1)
When compare the Theoritical Variance and Sample Variance (Actual Variance), they are very close. So the data similary distributed. This is wat expected from this analysis.