1- Loading the required libraries
library(ggplot2)
2- Defining the simulation variables
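lambda <- 0.2        # rate parameter of the exponential distribution
pMean <- 1/lambda    # theoretical mean of Exp(lambda)
pSD <- 1/lambda      # theoretical standard deviation of Exp(lambda)
n <- 40              # number of exponentials averaged in each simulation
nosim <- 1000        # number of simulations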
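For an exponential distribution with rate lambda, the mean and the standard deviation are both 1/lambda, which is why pMean and pSD take the same value. The following is a minimal sketch that checks this numerically on one large draw; the sample size 1e6 and the seed are arbitrary choices for illustration, not part of the original analysis.

```r
lambda <- 0.2
set.seed(42)                       # arbitrary seed, only for reproducibility
big <- rexp(1e6, rate = lambda)    # one large sample from Exp(lambda)
# Both values should be close to the theoretical 1/lambda = 5
c(empirical_mean = mean(big), empirical_sd = sd(big), theoretical = 1 / lambda)
```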
3- Generating the data
data <- replicate(nosim, rexp(n, lambda))
The result is a 40 x 1000 matrix with one simulation per column, so we take the column means to get the sample mean of each run.
4- Calculating sample means for the 1000 runs
dataMeans <- colMeans(data)
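The sketch below confirms the matrix shape and shows an equivalent way to build it. It assumes a fixed seed, which the original analysis did not set. Because rexp() draws values one after another from the same random number stream, generating all n * nosim values at once and filling a matrix column by column yields the same values as replicate().

```r
lambda <- 0.2
n <- 40
nosim <- 1000
set.seed(2024)                              # hypothetical seed; not used in the original
data <- replicate(nosim, rexp(n, lambda))
dim(data)                                   # 40 rows (observations) x 1000 columns (runs)
dataMeans <- colMeans(data)                 # one sample mean per run
# Equivalent construction: one long draw, filled column-major into a 40-row matrix
set.seed(2024)
data2 <- matrix(rexp(n * nosim, lambda), nrow = n)
```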
1- Computing the mean of the sample means
sMean <- mean(dataMeans)
The mean of the 1000 sample means is 4.9781977, close to the theoretical population mean 1/lambda = 5.
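One way to judge whether 4.9781977 is "close" to 5 is to compare the gap with the Monte Carlo standard error of the grand mean, sd(dataMeans)/sqrt(nosim). The sketch below plugs in the figures reported in this section: the mean above and the variance of the sample means (0.6244459) reported in the variance step. The standard error comes out at about 0.025, so the gap of about 0.022 is less than one standard error.

```r
sMean <- 4.9781977    # mean of the 1000 sample means, as reported above
sVar  <- 0.6244459    # variance of the sample means, as reported in the variance step
nosim <- 1000
mcSE <- sqrt(sVar) / sqrt(nosim)   # Monte Carlo standard error of sMean
gap  <- 5 - sMean                  # distance from the theoretical mean 1/lambda
gap / mcSE                         # gap expressed in standard errors
```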
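2- Computing a 95% z confidence interval for the population mean
zConf <- sMean + c(-1, 1) * qnorm(0.975) * (pSD / sqrt(n))
We are 95% confident that the population mean lies within this z confidence interval: 3.42871, 6.52769. Note that qnorm(0.975), not qnorm(0.95), is the critical value for a two-sided 95% interval; qnorm(0.95) gives the narrower 90% interval 3.6778267, 6.2785686.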
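"95% confident" means that, over repeated samples of size 40, about 95% of the intervals built this way contain the true mean. The following sketch checks this by building one interval per simulated run, each centered on that run's sample mean and using the known sigma = 1/lambda. The seed is arbitrary, and the resulting coverage varies a little from run to run.

```r
lambda <- 0.2
n <- 40
nosim <- 1000
sigma <- 1 / lambda
set.seed(123)                                   # arbitrary seed for reproducibility
dataMeans <- colMeans(replicate(nosim, rexp(n, lambda)))
z <- qnorm(0.975)                               # two-sided 95% critical value
lower <- dataMeans - z * sigma / sqrt(n)
upper <- dataMeans + z * sigma / sqrt(n)
coverage <- mean(lower <= 1 / lambda & 1 / lambda <= upper)
coverage                                        # share of intervals containing the true mean 5
```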
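1- Calculating the variance of the sample means
sVar <- var(dataMeans)   # observed variance of the 1000 sample means
pVar <- pSD^2            # theoretical variance of Exp(lambda): 1/lambda^2 = 25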
The variance of the sample means is 0.6244459. The theoretical variance of the exponential distribution is 1/lambda^2 = 25, and dividing it by the sample size n = 40 gives 25/40 = 0.625, which closely matches the observed value. This agrees with the central limit theorem: the standard deviation of the sample means is approximately the standard error of the mean, sigma/sqrt(n) = 5/sqrt(40), or about 0.79. We show below that the distribution of the sample means is approximately normal.
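The sigma^2/n relationship holds for any sample size, not just n = 40. This sketch repeats the simulation for a few sample sizes and compares the observed variance of the sample means with 25/n. The sizes 10, 40, and 160 and the seed are arbitrary choices for illustration.

```r
lambda <- 0.2
nosim <- 1000
set.seed(7)                      # arbitrary seed
sizes <- c(10, 40, 160)          # hypothetical sample sizes chosen for illustration
# Observed variance of the nosim sample means for each sample size
observed <- sapply(sizes, function(m) var(colMeans(replicate(nosim, rexp(m, lambda)))))
theoretical <- (1 / lambda)^2 / sizes   # sigma^2 / n
data.frame(n = sizes, observed = observed, theoretical = theoretical)
```

Quadrupling n should cut the variance of the sample means by roughly a factor of four.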
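# Data frame of the 1000 sample means; 'size' labels the sample size for the fill legend
dat <- data.frame(x = dataMeans, size = factor(rep(n, nosim)))
# Histogram on the density scale so it can be overlaid with a density curve
g <- ggplot(dat, aes(x = x, fill = size)) +
  geom_histogram(binwidth = 0.3, colour = "black", aes(y = after_stat(density)))
# Kernel density estimate of the sample means (linewidth needs ggplot2 >= 3.4.0)
g <- g + geom_density(linewidth = 2, colour = "black", alpha = 0.1)
# Vertical line at the mean of the sample means
g <- g + geom_vline(xintercept = sMean, colour = "black", linewidth = 2)
g <- g + xlab("Sample Mean")
print(g)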
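Besides the histogram, a quick numerical check of approximate normality compares the share of sample means within one and two standard errors of 5 with the normal values of about 68% and 95%. A normal Q-Q plot offers a visual check. This is a sketch with an arbitrary seed. Because the exponential is skewed, the agreement for n = 40 is close but not exact.

```r
lambda <- 0.2
n <- 40
nosim <- 1000
set.seed(99)                                   # arbitrary seed
dataMeans <- colMeans(replicate(nosim, rexp(n, lambda)))
se <- (1 / lambda) / sqrt(n)                   # standard error of the mean, sigma / sqrt(n)
within1 <- mean(abs(dataMeans - 1 / lambda) < 1 * se)   # normal: about 0.683
within2 <- mean(abs(dataMeans - 1 / lambda) < 2 * se)   # normal: about 0.954
c(within1 = within1, within2 = within2)
# Normal Q-Q plot: points close to the reference line indicate approximate normality
qqnorm(dataMeans)
qqline(dataMeans)
```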