Overview

The purpose of this analysis is to illustrate the validity of the Central Limit Theorem (CLT). To do this, random samples will be drawn from an exponential distribution and the statistical properties of their means will be compared with what the CLT predicts.

library(ggplot2) 
library(knitr)
set.seed(1)
lambda <- 0.2 #rate
exp <- 40  #no. of exponentials
sim <- 1000 #no. of simulations

Question 1

Show the sample mean and compare it to the theoretical mean of the distribution.

theoretical <- 1/lambda #theoretical mean
theoretical
## [1] 5
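
For reference, the value above follows from the standard properties of the exponential distribution. The symbols $X$, $\mu$ and $\sigma$ are notation introduced here for illustration and do not appear in the original code:

```latex
X \sim \mathrm{Exp}(\lambda), \qquad
\mu = E[X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad
\sigma = \sqrt{\operatorname{Var}(X)} = \frac{1}{\lambda} = 5
```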

We see that the theoretical mean is 5. According to the theory, the mean of the simulated sample means should be close to this value. I will show this by taking the following 3 steps:

1. Create a 1000x40 matrix: 1000 samples (rows) of 40 exponential values each.
2. Take the mean of each row.
3. Take the overall mean of those 1000 sample means.

sim_exp <- t(replicate(sim, rexp(exp, lambda))) #1000x40 matrix, one sample per row
means <- as.data.frame(rowMeans(sim_exp)) #mean of each row
experimental <- mean(means$`rowMeans(sim_exp)`) #overall mean of the 1000 sample means
experimental #sample mean
## [1] 4.990025

Now let’s take a graphical look at our results.

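#qplot() is deprecated in recent ggplot2 releases, so ggplot() is used instead
ggplot(means, aes(x = `rowMeans(sim_exp)`)) +
  geom_histogram(bins = 30, fill = "blue", colour = "red", alpha = 0.2) +
  geom_vline(xintercept = theoretical, linetype = "dashed") + #theoretical mean
  labs(title = "Experimental Means", x = "")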

Figure 1 - Histogram of the 1000 sample means

Solution 1

Theoretical Mean = 5
Sample Mean = 4.9900252
The sample mean is within about 0.01 of the theoretical mean. In the graph above, the tallest bins are centered around 5, so the sample means cluster around the theoretical mean.
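
The closeness of the two means can also be expressed as a rough interval. The following is a minimal sketch, not part of the original analysis: it regenerates the sample means directly as a vector and computes a normal-approximation 95% interval for their overall mean. The names `n`, `n_sim`, `sample_means`, `grand_mean`, `se_grand` and `ci` are assumptions introduced for illustration.

```r
# Sketch: approximate 95% interval for the mean of the simulated sample means.
set.seed(1)
lambda <- 0.2   # rate, as in the report
n      <- 40    # exponentials per sample (`exp` in the report)
n_sim  <- 1000  # number of simulated samples (`sim` in the report)

# One mean per simulated sample of size n
sample_means <- replicate(n_sim, mean(rexp(n, lambda)))

grand_mean <- mean(sample_means)              # overall mean of the sample means
se_grand   <- sd(sample_means) / sqrt(n_sim)  # standard error of that overall mean
ci <- grand_mean + c(-1, 1) * qnorm(0.975) * se_grand

print(grand_mean)
print(ci)          # compare against the theoretical mean
print(1 / lambda)  # theoretical mean
```

If the theoretical mean falls inside the printed interval, the simulation is consistent with the theory at roughly the 95% level.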

Question 2

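Show how variable the sample means are (via variance) and compare this to the theoretical variance of the distribution of the sample mean.

variance_sample <- var(means$`rowMeans(sim_exp)`) #variance of the 1000 sample means
variance_theo <- (1/lambda)^2/exp #theoretical variance of the sample mean: sigma^2/n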

Solution 2

Sample Variance = 0.6111165
Theoretical Variance = 0.625
The sample variance differs from the theoretical variance by about 0.014 (roughly 2%), so the two agree closely.
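
The theoretical value used above can be derived in a few steps, assuming the 40 draws $X_1, \dots, X_n$ in each sample are independent and identically distributed. The symbols $\bar{X}$, $\sigma^2$ and $n$ are notation introduced here; in the code, $n$ corresponds to `exp`:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{25}{40}
  = 0.625
```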

Question 3

Show that the distribution of the sample means is approximately normal.

qqnorm(means$`rowMeans(sim_exp)`, col="light blue", pch=16)
qqline(means$`rowMeans(sim_exp)`, col ="red") #pass the numeric vector, not the data frame

Figure 2 - Normal Q-Q plot of the 1000 sample means

Solution 3

In the graph above, the points lie close to the red reference line, which is the pattern a normal sample would produce. The distribution of the sample means can therefore be considered approximately normal.
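
A numerical check can complement the plot. The sketch below is an illustration under stated assumptions, not the author's method: it standardizes the sample means using the theoretical mean and standard error, then checks that the result has mean near 0, standard deviation near 1, and about 95% of values within ±1.96, as a standard normal would. The names `n`, `n_sim`, `sample_means`, `z` and `coverage` are hypothetical.

```r
# Sketch: compare standardized sample means with the standard normal.
set.seed(1)
lambda <- 0.2   # rate, as in the report
n      <- 40    # exponentials per sample (`exp` in the report)
n_sim  <- 1000  # number of simulated samples (`sim` in the report)

sample_means <- replicate(n_sim, mean(rexp(n, lambda)))

mu    <- 1 / lambda   # theoretical mean of one exponential draw
sigma <- 1 / lambda   # theoretical standard deviation of one draw

# Standardize: under the CLT, z should be approximately N(0, 1)
z <- (sample_means - mu) / (sigma / sqrt(n))

coverage <- mean(abs(z) <= qnorm(0.975))  # share of z within +/- 1.96

print(mean(z))   # expected to be near 0
print(sd(z))     # expected to be near 1
print(coverage)  # expected to be near 0.95
```

Because the exponential distribution is skewed, some residual skewness in the sample means is expected at a sample size of 40, so these values should be close to, but not exactly, the normal benchmarks.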