Josh Katz 02.09.16
In part 1 of this project, the exponential distribution will be used to demonstrate the Central Limit Theorem. The following goals will be accomplished: 1) show the sample mean and compare it to the theoretical mean of the distribution; 2) show how variable the sample means are (via variance) and compare this to the theoretical variance; 3) show that the distribution of sample means is approximately normal.
A simulation of 40 exponentials will be run 1000 times, and the results will be stored in a matrix so that the mean, variance, and normality can be compared in the following sections.
lambda <- 0.2   # rate parameter of the exponential distribution
n <- 40         # number of exponentials in each simulation
set.seed(189)   # makes the simulation reproducible
# 1000 x 40 matrix: each of the 1000 rows holds one run of 40 random exponentials
sim_exp <- matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n); dim(sim_exp)
## [1] 1000 40
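The same kind of simulation can also be written one run at a time. The sketch below is an alternative construction, not the author's code: it uses base R's replicate() to draw 40 exponentials and average them, 1000 times. The name n_sims is a hypothetical label introduced here for the number of simulations.

```r
# Alternative sketch (not the original code): one simulation of n exponentials
# per call, repeated n_sims times with replicate().
lambda <- 0.2
n <- 40
n_sims <- 1000   # hypothetical name for the number of simulations
set.seed(189)
sim_means <- replicate(n_sims, mean(rexp(n, lambda)))  # vector of 1000 sample means
length(sim_means)
mean(sim_means)  # should be close to the theoretical mean 1/lambda = 5
```

The matrix approach in the original draws all 40,000 values in one vectorized call and then averages with rowMeans(), while replicate() is often easier to read. Because matrix() fills column by column by default, the individual means from this sketch will not match rowMeans(sim_exp) even with the same seed, although both follow the same sampling distribution.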
hist(rowMeans(sim_exp),col="green",main="Distribution of Sample Means")
mean(rowMeans(sim_exp))   # mean of the 1000 means, each derived from a run of 40
## [1] 5.006347
1/lambda                  # theoretical mean, 1/lambda
## [1] 5
The mean of the sampling distribution, 5.006347, is very close to the theoretical mean of 5.0. The histogram agrees: the 1000 sample means appear to be centered at about 5.
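A short derivation, added here for illustration, of why the theoretical value is 1/lambda: by linearity of expectation, the expected value of the mean of n exponential draws equals the expected value of a single draw.

```latex
\mathbb{E}[X_i] = \frac{1}{\lambda}, \qquad
\mathbb{E}[\bar{X}_n]
  = \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
  = \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}[X_i]
  = \frac{1}{\lambda} = \frac{1}{0.2} = 5
```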
var(rowMeans(sim_exp))    # variance of the 1000 means, each derived from a run of 40
## [1] 0.6069175
(1/lambda)^2/n            # theoretical variance, (1/lambda)^2/n
## [1] 0.625
The variance of the sampling distribution, 0.6069, is very close to the theoretical variance of 0.625.
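The theoretical variance follows from the variance of a single exponential, 1/lambda^2, together with the assumption that the 40 draws in each run are independent, so their variances add:

```latex
\operatorname{Var}(X_i) = \frac{1}{\lambda^2}, \qquad
\operatorname{Var}(\bar{X}_n)
  = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{(1/\lambda)^2}{n}
  = \frac{25}{40} = 0.625
```

The observed 0.6069 differs from 0.625 by about 0.018. As a rough check, assuming the 1000 means are approximately normal, the sample variance itself has a standard error of about 0.625 × √(2/999) ≈ 0.028, so a gap of this size is expected from simulation noise alone.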
Demonstrate normality with plots
set.seed(199)
par(mfrow = c(1, 2))   # two plots side by side
hist(rexp(1000, lambda), col = "red", main = "Distribution of 1000\nRandom Exponentials")
hist(rowMeans(sim_exp), col = "green", main = "Distribution of Sample Means")
The green histogram of 1000 sample means is much more bell-shaped, or normal, than the red histogram of 1000 individual exponential draws. Because both are derived from a non-normal distribution, the exponential, these histograms demonstrate the Central Limit Theorem: the distribution of the mean of a sufficiently large number of independent, identically distributed draws is approximately normal, even when the underlying distribution is not.
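Beyond eyeballing the histograms, one common check is to overlay the normal curve the CLT predicts, N(1/lambda, (1/lambda)^2/n), and to draw a normal Q-Q plot. The sketch below regenerates sim_exp with the same seed so it runs on its own; the plot titles and the names sample_means, clt_mean, and clt_sd are illustrative choices, not part of the original analysis.

```r
# Sketch: compare the simulated sample means with the normal distribution
# predicted by the CLT, N(1/lambda, (1/lambda)^2 / n).
lambda <- 0.2
n <- 40
set.seed(189)
sim_exp <- matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n)
sample_means <- rowMeans(sim_exp)

clt_mean <- 1 / lambda               # theoretical mean, 5
clt_sd   <- (1 / lambda) / sqrt(n)   # theoretical standard deviation, about 0.79

par(mfrow = c(1, 2))
# Density-scaled histogram with the CLT normal curve drawn on top
hist(sample_means, freq = FALSE, col = "green",
     main = "Sample Means vs. Normal Curve", xlab = "Sample mean")
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE, lwd = 2)
# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(sample_means)
qqline(sample_means)
```

With only 40 draws per mean, a slight right skew inherited from the exponential may still show up in the upper tail of the Q-Q plot.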
summary(colMeans(sim_exp))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.548 4.872 5.042 5.006 5.112 5.357
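Note that this summary uses colMeans, which averages down each of the 40 columns (1000 draws per column), so it describes 40 column means rather than the 1000 row means plotted in the green histogram; summary(rowMeans(sim_exp)) would summarize the plotted means directly. Even so, the mean and median are very close, which is consistent with a roughly symmetric, normal-looking distribution of averages.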
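The summary above is computed with colMeans rather than the rowMeans used for the histograms. A sketch contrasting the two, assuming the same simulation setup, shows that both share the same overall mean but differ in spread in the way the CLT predicts: the standard deviation of an average of k draws is (1/lambda)/sqrt(k). The names row_avgs and col_avgs are illustrative.

```r
# Sketch: the two ways of averaging the 1000 x 40 simulation matrix.
lambda <- 0.2
n <- 40
set.seed(189)
sim_exp <- matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n)

row_avgs <- rowMeans(sim_exp)  # 1000 means, each over n = 40 draws (green histogram)
col_avgs <- colMeans(sim_exp)  # 40 means, each over 1000 draws

summary(row_avgs)
summary(col_avgs)

# Observed vs. theoretical spread: sd = (1/lambda) / sqrt(number of draws averaged)
c(observed = sd(row_avgs), theory = (1 / lambda) / sqrt(n))
c(observed = sd(col_avgs), theory = (1 / lambda) / sqrt(1000))
```

Both vectors average the same 40,000 draws, so their overall means coincide exactly; only the number of draws behind each individual mean changes.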
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1