The purpose of this analysis is to illustrate the validity of the Central Limit Theorem (CLT). To do this, random samples are drawn from an exponential distribution and the statistical properties of their means are compared with what the CLT predicts.
library(ggplot2)
library(knitr)
set.seed(1)
lambda <- 0.2 #rate
exp <- 40 #no. of exponentials
sim <- 1000 #no. of simulations
Show the sample mean and compare it to the theoretical mean of the distribution.
theoretical <- 1/lambda #theoretical mean
theoretical
## [1] 5
We see that the theoretical mean is 5. According to the theory, the average of the means from our simulations should be close to this value. I will show this by taking the following 3 steps:
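# Step 1: run the simulations; each row holds one sample of 40 exponentials
sim_exp <- t(replicate(sim, rexp(exp, lambda)))
# Step 2: compute the mean of each sample (one mean per row)
means <- as.data.frame(rowMeans(sim_exp))
# Step 3: average the 1000 sample means
experimental <- mean(means$`rowMeans(sim_exp)`)
experimental #sample mean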
## [1] 4.990025
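One way to judge how close the sample mean is to the theoretical mean is to express the gap in units of its standard error. The following is a minimal sketch, assuming the same rate and sample sizes as above; it regenerates its own draws, and the names `n_exp`, `n_sim`, `sample_means`, `se`, and `gap` are introduced here for illustration only.

```r
set.seed(1)
lambda <- 0.2   # rate, as in the analysis above
n_exp  <- 40    # exponentials per sample (illustrative name)
n_sim  <- 1000  # number of simulated samples (illustrative name)

# One row per simulation, one column per exponential draw
draws <- t(replicate(n_sim, rexp(n_exp, lambda)))
sample_means <- rowMeans(draws)

theoretical_mean <- 1 / lambda

# A single sample mean has standard deviation (1/lambda)/sqrt(n_exp);
# averaging n_sim of them divides that by sqrt(n_sim)
se <- (1 / lambda) / sqrt(n_exp) / sqrt(n_sim)

gap <- mean(sample_means) - theoretical_mean
c(gap = gap, standard_error = se, gap_in_se = gap / se)
```

Under these assumptions the standard error is 5 / sqrt(40 * 1000) = 0.025, so the gap between 5 and the reported 4.990025 (about 0.01) is roughly 0.4 standard errors.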
Now let’s take a graphical look at our results.
qplot(rowMeans(sim_exp), data = means, geom = "histogram",
      main = "Experimental Means",
      xlab = "",
      fill = I("blue"),
      col = I("red"),
      alpha = I(.2))
Figure 1 - Histogram of 1000 means
Theoretical mean = 5
Sample mean = 4.9900252
The histogram is centered near 5: the tallest bins sit around the theoretical mean, and the sample mean of 4.99 is very close to it.
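To make this comparison easier to read off a plot, reference lines for both means can be drawn on the histogram. This is a minimal base R sketch, assuming the same rate and sample sizes as above; it regenerates its own draws, and the names `sample_means` and `h` as well as the choice of roughly 30 bins are illustrative.

```r
set.seed(1)
lambda <- 0.2

# 1000 means, each computed from 40 exponential draws
sample_means <- replicate(1000, mean(rexp(40, lambda)))

# Histogram of the simulated means; breaks = 30 is only a suggestion to hist()
h <- hist(sample_means, breaks = 30,
          main = "Experimental Means", xlab = "Sample mean",
          col = "lightblue", border = "white")

# Solid red line: theoretical mean; dashed blue line: mean of the simulated means
abline(v = 1 / lambda, col = "red", lwd = 2)
abline(v = mean(sample_means), col = "blue", lwd = 2, lty = 2)
legend("topright", legend = c("Theoretical mean", "Sample mean"),
       col = c("red", "blue"), lwd = 2, lty = c(1, 2))
```

If the simulation behaves as the CLT predicts, the two lines should be nearly indistinguishable at this scale.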
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
variance_sample <- var(means$`rowMeans(sim_exp)`) #variance of the 1000 sample means
variance_theo <- (1/lambda)^2/exp #sigma^2/n, with sigma = 1/lambda
Sample Variance = 0.6111165
Theoretical Variance = 0.625
The sample variance (0.611) is within about 2.2% of the theoretical variance (0.625), so the two agree closely.
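The theoretical value comes from the fact that the variance of a mean of n independent draws is sigma^2/n, with sigma = 1/lambda for the exponential distribution. The sketch below checks that scaling for a few sample sizes. It is only an illustration: the sizes 10 and 160 and the name `var_table` are assumptions added here, and because the random draws differ from those in the analysis, the simulated value for n = 40 will not match the figure reported above exactly.

```r
set.seed(1)
lambda <- 0.2
n_sim  <- 1000
sizes  <- c(10, 40, 160)  # illustrative sample sizes; 40 matches the analysis

var_table <- data.frame(
  n = sizes,
  # Variance of the sample mean predicted by theory: sigma^2 / n
  theoretical = (1 / lambda)^2 / sizes,
  # Variance of n_sim simulated sample means for each size
  simulated = sapply(sizes, function(n) {
    var(replicate(n_sim, mean(rexp(n, lambda))))
  })
)
var_table
```

Quadrupling the sample size should cut the variance of the sample mean to roughly a quarter, which is what the theoretical column (2.5, 0.625, 0.15625) shows.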
Show that the distribution is approximately normal.
qqnorm(means$`rowMeans(sim_exp)`, col="light blue", pch=16)
qqline(means$`rowMeans(sim_exp)`, col="red")
Figure 2 - Normal Q-Q plot of the 1000 sample means
In the graph above, the points fall close to the diagonal reference line, so the distribution of the sample means can be considered approximately normal.
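A numeric check can complement the visual impression. The sketch below standardizes the simulated means with the theoretical mean and standard deviation, then compares a few empirical quantiles with those of the standard normal distribution. It is a rough illustration rather than a formal test; it regenerates its own draws, and the names `z`, `probs`, and `coverage` are introduced here.

```r
set.seed(1)
lambda <- 0.2
n_exp  <- 40    # exponentials per sample (illustrative name)
n_sim  <- 1000  # number of simulated samples (illustrative name)

sample_means <- replicate(n_sim, mean(rexp(n_exp, lambda)))

# Standardize with the theoretical mean and the theoretical sd of a sample mean
z <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n_exp))

# Share of standardized means inside the central 95% interval of N(0, 1)
coverage <- mean(abs(z) <= qnorm(0.975))

# Empirical quantiles next to the matching standard normal quantiles
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
rbind(empirical = quantile(z, probs), normal = qnorm(probs))
coverage
```

If the normal approximation holds, the two rows should be similar and `coverage` should be near 0.95. Some discrepancy in the tails is expected, since the mean of 40 exponentials is still slightly right-skewed.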