We run 1000 simulations, each generating 40 random values from an exponential distribution with the rexp() function. We then build a vector of the 1000 sample means and compute its mean and standard deviation. Finally, we check whether the sample means follow an approximately Gaussian distribution, as an investigation of the Central Limit Theorem (“CLT”).
We create a 1000 x 40 matrix that holds the simulation: 1000 rows, each containing 40 randomly generated values from an exponential distribution with lambda = 0.2. We then create a 1000 x 1 vector by taking the mean of each row of the simulation matrix.
set.seed(5927); ExpSim <- matrix(rexp(40*1000,.2), 1000, 40)
SMeans <- apply(ExpSim, 1, mean)
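As a side note, the row means can also be computed with base R's rowMeans(), which is vectorized and usually faster than apply(). The sketch below re-creates the matrix with the same seed and checks that both approaches agree. The name SMeansAlt is introduced here only for illustration and is not part of the original analysis.

```r
# Re-create the simulation matrix with the same seed used in the text
set.seed(5927)
ExpSim <- matrix(rexp(40 * 1000, rate = 0.2), nrow = 1000, ncol = 40)

# Row means via apply(), as in the text
SMeans <- apply(ExpSim, 1, mean)

# Row means via rowMeans(); SMeansAlt is a hypothetical name for this sketch
SMeansAlt <- rowMeans(ExpSim)

# Check that the two vectors agree up to floating-point tolerance
isTRUE(all.equal(SMeans, SMeansAlt))
```

Either form should give the same vector of 1000 sample means, so the rest of the analysis is unaffected by the choice.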
We determine the theoretical mean and the sample mean and compare them.
lambda = .2; ActMean <- 1/lambda; SampleMean <- mean(SMeans)
data.frame(ActMean, SampleMean)
## ActMean SampleMean
## 1 5 5.031298
As expected, the mean of the 1000 sample means (5.031) is close to the theoretical mean of 5.
SampleSD <- sd(SMeans); ActSD <- 1/lambda/sqrt(40)
SampleVar <- SampleSD^2; ActVar <- ActSD^2
data.frame(ActSD, SampleSD,ActVar, SampleVar)
## ActSD SampleSD ActVar SampleVar
## 1 0.7905694 0.7913945 0.625 0.6263052
The standard deviation of the sample means (0.7914) is very close to the theoretical standard deviation (0.7906). Similarly, the sample variance (0.6263) is close to the theoretical variance (0.625).
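For reference, the theoretical values used in the code follow from the standard properties of the exponential distribution. With \(X_1, \dots, X_n\) independent draws from an exponential distribution with rate \(\lambda = 0.2\) and \(n = 40\):

\[
E[X_i] = \frac{1}{\lambda}, \qquad \mathrm{Var}(X_i) = \frac{1}{\lambda^2}
\]

\[
E[\bar{X}] = \frac{1}{\lambda} = 5, \qquad
\mathrm{Var}(\bar{X}) = \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625, \qquad
\mathrm{SD}(\bar{X}) = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\]

These match the ActMean, ActVar, and ActSD values printed above.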
Sample.CI <- SampleMean + c(-1,1)*qnorm(.975)*SampleSD
Act.CI <- ActMean + c(-1,1)*qnorm(.975)*ActSD
data.frame(Act.CI, Sample.CI)
## Act.CI Sample.CI
## 1 3.450512 3.480193
## 2 6.549488 6.582402
The 95% confidence interval computed from the sample statistics (3.480 to 6.582) closely approximates the theoretical interval (3.451 to 6.549).
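One additional check, not part of the original analysis, is to count what fraction of the simulated sample means falls inside the theoretical interval. If the sample means are approximately Gaussian, this fraction should be near 0.95. The sketch below repeats the simulation with the same seed. Coverage is a name introduced here for illustration.

```r
# Repeat the simulation with the same seed and parameters as in the text
set.seed(5927)
lambda <- 0.2
n <- 40
ExpSim <- matrix(rexp(n * 1000, rate = lambda), nrow = 1000, ncol = n)
SMeans <- apply(ExpSim, 1, mean)

# Theoretical mean, standard deviation of the sample mean, and 95% interval
ActMean <- 1 / lambda
ActSD <- 1 / lambda / sqrt(n)
Act.CI <- ActMean + c(-1, 1) * qnorm(0.975) * ActSD

# Fraction of simulated means inside the theoretical interval
# (Coverage is a hypothetical name used only in this sketch)
Coverage <- mean(SMeans >= Act.CI[1] & SMeans <= Act.CI[2])
Coverage
```

Because the means of 40 exponential draws are still slightly right-skewed, the observed fraction may differ a little from 0.95.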
library(ggplot2)
g <- ggplot(data.frame(SMeans), aes(x = SMeans)) + geom_histogram(binwidth=.3,
fill="#0066CC", colour = "#003399", aes(y = ..density..)) +
labs(title = "Figure 1: Sample Means with Overlay Plot of Gaussian Distribution",
x = "Sample Means", y = "Density")
g <- g + geom_vline(xintercept = ActMean, colour = "red") +
geom_vline(xintercept = SampleMean, colour = "orange") +
geom_vline(xintercept = c(Sample.CI[1],Sample.CI[2]), colour = "green") +
stat_function(fun = dnorm, args = c(mean = ActMean, sd = ActSD), size = 1) +
geom_density(colour = "purple", size=1)
g
The histogram was created from the 1000x1 vector of sample means. The overlaid Gaussian curve is generated with mean \(= \frac{1}{\lambda}=5\) and standard deviation \(= \frac{1}{\lambda\sqrt{40}}=0.7905694\). The purple line represents the density curve estimated from the sample. The red line represents the theoretical mean of the distribution, and the orange line is the sample mean. The sample means are roughly symmetric around the theoretical mean, and the most frequently sampled mean is close to the theoretical mean. The green lines mark the bounds of the sample confidence interval.
qqnorm(SMeans, main = "Figure 2: Normal Q-Q Plot"); qqline(SMeans)
This graph plots the sample quantiles against the theoretical quantiles of a normal distribution, which gives an indication of the normality of the sample. The reference line drawn by qqline() passes through the first and third quartiles, so the more points that fall on this line, the closer the sample is to a Gaussian distribution. As we can see, many points lie close to or on the line.
Given the above analysis, the distribution of the sample means behaves as predicted by the CLT. The sample mean is close to the theoretical mean, and the same is true for the standard deviation and variance. The histogram of the sample means appears approximately normal when compared with the theoretical curve. The Q-Q plot is consistent with an approximately Gaussian distribution.
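The visual impression from the Q-Q plot can be complemented with a small numeric comparison. The sketch below is an illustration, not part of the original analysis. It compares a few quantiles of the simulated means with the corresponding quantiles of a normal distribution with mean \(1/\lambda\) and standard deviation \(1/(\lambda\sqrt{40})\). Note that qqnorm() itself plots against standard normal quantiles, so this is a rescaled analogue of the plot. The names probs, SampleQ, and NormalQ are introduced here for illustration.

```r
# Repeat the simulation with the same seed and parameters as in the text
set.seed(5927)
lambda <- 0.2
n <- 40
ExpSim <- matrix(rexp(n * 1000, rate = lambda), nrow = 1000, ncol = n)
SMeans <- apply(ExpSim, 1, mean)

# Probabilities at which to compare quantiles (an arbitrary illustrative choice)
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)

# Empirical quantiles of the sample means
SampleQ <- quantile(SMeans, probs = probs)

# Quantiles of the normal distribution implied by the CLT
NormalQ <- qnorm(probs, mean = 1 / lambda, sd = 1 / lambda / sqrt(n))

# Side-by-side comparison with the differences
data.frame(probs, SampleQ, NormalQ, Diff = SampleQ - NormalQ)
```

Small differences across the range would support the reading of the Q-Q plot, while larger differences in the tails would point to the residual skewness of the exponential distribution.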