This project is part of Statistical Inference courses by John Hopkins University. This project will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. We will investigate the distribution of averages of 40 exponentials.
Load package
library(ggplot2)
We then initialize the simulation controlling variables.
nosim <- 1000
n <- 40
lambda <- 0.2
## Create a matrix with 1000 simulations (nosim = 1000) with rexp (random generation) is 40
Set the seed of the Random Number Generator so that the analysis is reproducible
set.seed(15)
Create a matrix with thousands rows corresponding to 1000 simulations and forty column corressponding to 40 random generations
simulationMatrix <- matrix(rexp(nosim * n, rate = lambda), nosim)
Create a vector of thousand rows containing the mean of each row of the simulationMatrix
simulationMean <- rowMeans(simulationMatrix)
Then we will create a data frame that contain the whole data
simulationData <- data.frame(cbind(simulationMatrix, simulationMean))
We will plot the simulation data to visualize
ggplot(data = simulationData, aes(simulationData$simulationMean)) +
geom_histogram(breaks = seq(2,9, by = 0.2), col = "blue", aes(fill = ..count..)) +
labs(title = "Histogram of Mean Distribution", x = "Distribution Mean", y = "Frequency") +
geom_vline(aes(xintercept = mean(simulationData$simulationMean), color = "red", linetype = "dashed", size =1))
Calculate actual mean of the simulated mean sample data
actualMean <- mean(simulationMean)
actualMean
## [1] 4.980535
And the theoretical mean is calculated by
theoreticalMean <- (1/lambda)
theoreticalMean
## [1] 5
We can see that the sample mean is very close to the theoretical mean
Calculate the actual variance of the simulated data
actualVar <- var(simulationMean)
actualVar
## [1] 0.6429192
And the theoretical variance is calcualted by
theoreticalVar <- ((1/lambda)^2)/n
theoreticalVar
## [1] 0.625
Again, the actual variance is very close to the theoretical variance
To prove that the simulated mean sample data approximately follows the Normal distribution, we perform the following three steps:
ggplot(simulationData, aes(x = simulationMean)) +
geom_histogram(aes(y = ..density..), colour = "black", fill = "blue") +
geom_density(colour = "red", size = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
actualConfInterval <- actualMean +c(-1,1)*1.96*sqrt(actualVar)/sqrt(n)
print(paste("Actual coverage of the confidence interval for +/-1.96sd = ", round(actualConfInterval, 3)))
## [1] "Actual coverage of the confidence interval for +/-1.96sd = 4.732"
## [2] "Actual coverage of the confidence interval for +/-1.96sd = 5.229"
theoreticalConfInterval <- theoreticalMean + c(-1, 1)*1.96*sqrt(theoreticalVar)/sqrt(n)
print(paste("Theoretical coverage of the confidence interval for +/-1.96sd =", round(theoreticalConfInterval, 3)))
## [1] "Theoretical coverage of the confidence interval for +/-1.96sd = 4.755"
## [2] "Theoretical coverage of the confidence interval for +/-1.96sd = 5.245"
qqnorm(simulationMean)
qqline(simulationMean)
We can see that the actual quantiles are close to the theoretical quantiles. Thus, the distribution is approximately