The Central Limit Theorem (CLT) states that the distribution of averages of iid (independent and identically distributed) variables, properly normalized, approaches that of a standard normal as the sample size increases. Also, the expected value of the sample means, i.e. E(Sample Means), equals Mu (the true population mean), and the variance of the sample means, i.e. Var(Sample Means), equals (Sigma ^ 2 / n), where {Sigma} is the standard deviation of the population and {n} is the sample size. We shall demonstrate each of these points.
The exponential distribution is simulated in R with rexp(n, lambda), where {lambda} is the rate parameter and {n} is the sample size.
The mean of the exponential distribution is (1/lambda).
Its standard deviation is also (1/lambda).
For our exercise, we use lambda = 0.2, n = 40, and number of simulations = 1000.
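As a quick sanity check of the two facts above, the sketch below draws a large exponential sample and compares its empirical mean and standard deviation with 1/lambda. The seed (1) and the sample size (one million draws) are arbitrary choices for illustration, not part of the original analysis.

```r
# Empirical check that Exp(rate = lambda) has mean and sd both equal to 1/lambda
set.seed(1)                          # arbitrary seed, for reproducibility only
lambda <- 0.2
draws <- rexp(1e6, rate = lambda)    # one million exponential draws
c(mean = mean(draws), sd = sd(draws), theoretical = 1 / lambda)
```

Both empirical values should land very close to 5; with a million draws the standard error of the mean is only about 5 / 1000 = 0.005.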
Please refer to Appendix, Section A, to see how we simulate 1000 iterations. The variable actualSampleMeans contains 1000 observations, each of which is the mean of a sample of 40 exponentials.
We shall now compare the mean of actualSampleMeans with the theoretical population mean, i.e. Mu.
meanOfActualSampleMeans <- mean(actualSampleMeans)
round(meanOfActualSampleMeans)
## [1] 5
Mu <- 1/lambda
Mu
## [1] 5
As you can see, both equal 5 when rounded to the nearest integer.
The Law of Large Numbers states that as the sample size grows, the sample mean gets closer and closer to the mean of the whole population. Let’s see this in action.
Please refer to Appendix, Section B, to see the code.
Figure 1: Cumulative Mean of Sample Means Versus Theoretical Mean
As you can see, the cumulative mean of the sample means starts off at around 4.8 and converges to 5.0 as more simulations are included.
Also refer to Figure 2 in Appendix, Section B. Notice that the actual sample means are centered around the population mean of 5.
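The same convergence can be seen on raw exponential draws rather than on sample means. This minimal sketch, with an arbitrary seed and 100,000 draws (both assumptions for illustration), tracks the running mean and its distance from 1/lambda at a few checkpoints; the error tends to shrink as n grows, although not strictly monotonically.

```r
# Running mean of raw exponential draws (illustrative seed and size)
set.seed(2024)
lambda <- 0.2
x <- rexp(1e5, rate = lambda)
runningMean <- cumsum(x) / seq_along(x)   # mean of the first k draws, for every k
checkpoints <- c(10, 100, 1000, 10000, 100000)
data.frame(n = checkpoints,
           runningMean = runningMean[checkpoints],
           absError = abs(runningMean[checkpoints] - 1 / lambda))
```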
Next, we show that the variance of the 1000 simulated sample means is approximately equal to the theoretical variance, i.e. (Sigma ^ 2 / n), where {Sigma} is the standard deviation of the population.
Let’s compare the actual sample variance with the theoretical one. Please refer to Appendix, Section C.
As you can see, the two are close (0.595 versus 0.625) and agree at 0.6 when rounded to 1 decimal place.
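Whether a gap of about 0.03 is reasonable can be judged against the sampling variability of a variance estimate. The sketch below uses the normal-theory approximation SE(s^2) ≈ sigma^2 * sqrt(2 / (N - 1)) with N = 1000 simulations. This approximation is an assumption (it treats the sample means as exactly normal), so the result is only a rough guide.

```r
# Rough check: how many standard errors separate observed and theoretical variance?
nosim <- 1000
theoreticalVariance <- 5^2 / 40      # Sigma^2 / n = 25 / 40 = 0.625
observedVariance <- 0.5954369        # value printed in Appendix, Section C
# Normal-theory approximation to the standard error of a sample variance (assumption)
approxSE <- theoreticalVariance * sqrt(2 / (nosim - 1))
zGap <- (observedVariance - theoreticalVariance) / approxSE
round(c(approxSE = approxSE, zGap = zGap), 3)
```

The gap comes out at roughly one approximate standard error, which is consistent with the two values agreeing to one decimal place.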
Now, we shall compare the distribution of the sample means with the distribution of normal variables.
Let’s create 1000 normal variables with mean = 1/lambda and sd = 1/lambda.
Please refer to Appendix, Section D.
Now let’s see how the variation of the sample means compares with the variation of the normals. Please refer to Figure 3 in Appendix, Section D.
As you can see, the distribution of the sample means is much narrower and more sharply peaked than the distribution of the normals.
As stated above, the variance of the normal distribution is Sigma ^ 2, i.e. 25, whereas the variance of the sample means distribution is (Sigma ^ 2 / n), i.e. 0.625.
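The ratio of the two variances should therefore be close to n = 40. The sketch below regenerates both samples with the seed used in Appendix, Section A (12345), using replicate() in place of the explicit loop; apart from that substitution it mirrors Sections A and D, and the exact numbers depend on R's random number generator.

```r
set.seed(12345)                                                # same seed as Section A
nosim <- 1000; n <- 40; lambda <- 0.2
sampleMeans <- replicate(nosim, mean(rexp(n, lambda)))         # as in Section A
normals <- rnorm(nosim, mean = 1 / lambda, sd = 1 / lambda)    # as in Section D
c(varNormals = var(normals),                 # should be near Sigma^2 = 25
  varSampleMeans = var(sampleMeans),         # should be near Sigma^2 / n = 0.625
  ratio = var(normals) / var(sampleMeans))   # should be near n = 40
```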
Here, we shall compare the distribution of a population of random exponentials with the distribution of their normalized sample means.
Please refer to Appendix, Section E, to see how this population is generated.
Let’s see what our population data looks like. Refer to Figure 4 in Appendix, Section E.
We observe that,
Now, let’s compare it with the distribution of its normalized sample means.
In order to normalize the sample means, we use the formula
Normalized Sample Mean = (Estimate - Mean of Estimate) / (Standard Error of Estimate)
i.e. Normalized Sample Mean = (Point Estimate - Mu) / (Sigma / sqrt(n))
# Standardize each sample mean (Mu is defined above, Sigma in Appendix, Section C)
normlizedActualSampleMeans <- (actualSampleMeans - Mu) /
  (Sigma/sqrt(n))
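To make the formula concrete, take a single hypothetical sample mean of 5.5 (chosen only for illustration). With Mu = Sigma = 1/lambda = 5 and n = 40, the standard error is 5 / sqrt(40) ≈ 0.79, so the normalized value is 0.5 / 0.79 ≈ 0.63.

```r
lambda <- 0.2; n <- 40
Mu <- 1 / lambda                  # population mean, 5
Sigma <- 1 / lambda               # population standard deviation, 5
standardError <- Sigma / sqrt(n)  # 5 / sqrt(40), roughly 0.79
exampleMean <- 5.5                # hypothetical sample mean, for illustration only
z <- (exampleMean - Mu) / standardError
round(z, 3)                       # 0.632
```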
Now, let’s see what their distribution looks like. Please refer to Figure 5 in Appendix, Section F, which also includes the normal probability plot.
Let’s get the mean and standard deviation of this distribution.
round(mean(normlizedActualSampleMeans))
## [1] 0
round(sd(normlizedActualSampleMeans))
## [1] 1
We can state that this distribution looks approximately normal as,
Also refer to Figure 2 in Appendix, Section B, to see the histogram of the actual sample means without normalization.
Section A : Simulating 1000 iterations
set.seed(12345)
nosim <- 1000 # Number of Simulations
n <- 40 # Sample Size
lambda <- 0.2
Now we iterate through a for loop 1000 times, once per simulation. Each time, we generate a sample of 40 observations and take its mean, so the variable actualSampleMeans contains 1000 sample means.
actualSampleMeans <- numeric(nosim)  # preallocate one slot per simulation
for (i in seq_len(nosim)) actualSampleMeans[i] <- mean(rexp(n, lambda))
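The loop can also be written with base R's replicate(), which evaluates the same expression nosim times and consumes the random numbers in the same order. The sketch below checks that, with the same seed, both versions produce the same vector; treat it as an optional alternative rather than the code used to produce the results in this report.

```r
set.seed(12345)
nosim <- 1000; n <- 40; lambda <- 0.2
# Explicit loop, as in Section A
loopMeans <- numeric(nosim)
for (i in seq_len(nosim)) loopMeans[i] <- mean(rexp(n, lambda))
# Same computation with replicate(), restarting from the same seed
set.seed(12345)
replicateMeans <- replicate(nosim, mean(rexp(n, lambda)))
all.equal(loopMeans, replicateMeans)   # TRUE
```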
Section B : Comparing Sample Mean versus Theoretical Mean
Code for Figure 1:
# Cumulative mean of the first k sample means, for k = 1, ..., nosim
means <- cumsum(actualSampleMeans) / seq_along(actualSampleMeans)
library(ggplot2)
# `linewidth` replaces the deprecated `size` for lines in ggplot2 >= 3.4.0
g1 <- ggplot(data.frame(x = seq_along(actualSampleMeans), y = means), aes(x = x, y = y)) +
  geom_hline(yintercept = Mu, linewidth = 2, color = "blue") +  # theoretical mean
  geom_line(linewidth = 1) +
  labs(x = "Number of observations", y = "Cumulative Mean",
       title = "Sample Mean Versus Theoretical Mean")
print(g1)
Figure 2: Histogram of Actual Sample Means without Normalization
Notice that the Actual Sample Means are centered around Population Mean of 5.
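The code for Figure 2 is not shown in the original. A minimal base-R sketch of such a histogram, assuming the simulation from Section A (regenerated here with replicate() so the block runs on its own), could look like the following; the original figure may well have been drawn with ggplot2, and the bin count is an arbitrary choice.

```r
set.seed(12345)
nosim <- 1000; n <- 40; lambda <- 0.2
actualSampleMeans <- replicate(nosim, mean(rexp(n, lambda)))  # as in Section A
Mu <- 1 / lambda
# hist() draws the plot and invisibly returns the bins and counts
h <- hist(actualSampleMeans, breaks = 30, col = "lightgray",
          main = "Histogram of Actual Sample Means",
          xlab = "Mean of 40 exponentials")
abline(v = Mu, col = "blue", lwd = 2)   # population mean, 5
```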
Section C : Sample Variance versus Theoretical Variance
actualSampleVariance <- sd(actualSampleMeans) ^ 2
actualSampleVariance
## [1] 0.5954369
Sigma <- 1/lambda
theoriticalSampleVariance <- Sigma ^ 2 / n
theoriticalSampleVariance
## [1] 0.625
Section D : Creating 1000 Normal Variables and plotting them against Sample Means
normals <- rnorm(nosim, mean=1/lambda, sd=1/lambda)
length(normals)
## [1] 1000
summary(normals)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10.110 1.745 5.138 5.131 8.482 20.290
sd(normals)
## [1] 4.946574
Figure 3: Histogram of Actual Sample Means versus Normal Distribution
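The plotting code for Figure 3 is likewise not included. One way to draw it in base R, sketched under the assumption that both histograms share bins 0.5 wide and are shown on a density scale so their shapes are comparable, is:

```r
set.seed(12345)
nosim <- 1000; n <- 40; lambda <- 0.2
actualSampleMeans <- replicate(nosim, mean(rexp(n, lambda)))   # as in Section A
normals <- rnorm(nosim, mean = 1 / lambda, sd = 1 / lambda)    # as above
# Common breaks spanning both samples, so the two histograms share bins
breaks <- seq(floor(min(normals, actualSampleMeans)),
              ceiling(max(normals, actualSampleMeans)), by = 0.5)
hN <- hist(normals, breaks = breaks, plot = FALSE)
hS <- hist(actualSampleMeans, breaks = breaks, plot = FALSE)
yMax <- max(hN$density, hS$density)   # leave room for the taller histogram
plot(hN, freq = FALSE, col = rgb(1, 0, 0, 0.4), ylim = c(0, yMax),
     main = "Sample Means vs Normal(5, 5)", xlab = "Value")
plot(hS, freq = FALSE, col = rgb(0, 0, 1, 0.4), add = TRUE)
legend("topright", legend = c("Normals", "Sample means"),
       fill = c(rgb(1, 0, 0, 0.4), rgb(0, 0, 1, 0.4)))
```

Computing both histograms first with plot = FALSE lets the y-axis be sized for the much taller sample-means histogram, which would otherwise be clipped.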
Section E : Distribution of Population of Random Exponentials
We have 1000 simulations, each with a sample size of 40, so our population will have 1000 * 40 = 40,000 observations. These are fresh draws, generated separately from the simulations in Section A.
population <- rexp(nosim * n,lambda)
summary(population)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.426 3.437 5.003 6.892 64.000
This is what the population data looks like:
Figure 4: Histogram of Exponential Population Data
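A base-R sketch of Figure 4 follows; the seed, the 100 bins and the reference lines are assumptions for illustration. Because the exponential distribution is right-skewed, its mean (1/lambda = 5) lies to the right of its median (ln(2)/lambda ≈ 3.47), which matches the summary above.

```r
set.seed(12345)                        # arbitrary seed
nosim <- 1000; n <- 40; lambda <- 0.2
population <- rexp(nosim * n, lambda)  # 40,000 exponential draws, as in Section E
h <- hist(population, breaks = 100, col = "lightgray",
          main = "Histogram of Exponential Population Data", xlab = "Value")
abline(v = mean(population), col = "blue", lwd = 2)             # sample mean
abline(v = median(population), col = "red", lwd = 2, lty = 2)   # sample median
```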
Section F : Distribution of Normalized Sample Means
Figure 5: Histogram of Normalized Actual Sample Means Versus Normal Probability Plot
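Finally, a base-R sketch of Figure 5, assuming the same simulation and normalization as above: the left panel overlays the standard normal density on the histogram of normalized sample means, and the right panel is the normal probability (Q-Q) plot drawn with qqnorm() and qqline(). The panel layout and bin count are illustrative choices.

```r
set.seed(12345)
nosim <- 1000; n <- 40; lambda <- 0.2
actualSampleMeans <- replicate(nosim, mean(rexp(n, lambda)))   # as in Section A
Mu <- 1 / lambda; Sigma <- 1 / lambda
normlizedActualSampleMeans <- (actualSampleMeans - Mu) / (Sigma / sqrt(n))

oldPar <- par(mfrow = c(1, 2))   # two panels side by side
hist(normlizedActualSampleMeans, breaks = 30, freq = FALSE, col = "lightgray",
     main = "Normalized Sample Means", xlab = "z")
curve(dnorm(x), add = TRUE, col = "blue", lwd = 2)   # standard normal density
qqnorm(normlizedActualSampleMeans, main = "Normal Probability Plot")
qqline(normlizedActualSampleMeans, col = "red", lwd = 2)
par(oldPar)                      # restore the previous layout
```

Points lying close to the red line in the right panel indicate approximate normality; the slight right skew inherited from the exponential shows up, if at all, in the upper tail.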