Statistical Inference Project : Part 1. Exploring Exponential Distribution in R and Comparing it with Central Limit Theorem (By : Narendra Shukla)

Overview :

The Central Limit Theorem states that the distribution of averages of iid (independent and identically distributed) variables, properly normalized, approaches that of a Standard Normal as the sample size increases. In addition, the Expected Value of the Sample Mean equals the true Population Mean, ie. E(Sample Mean) = Mu, and the Variance of the Sample Mean is Var(Sample Mean) = (Sigma ^ 2 / n), where {Sigma} = Standard Deviation of Population and {n} = Sample Size. We shall demonstrate each of these points.

Simulations :

The exponential distribution is simulated in R with rexp(n, lambda) where {lambda} is the rate parameter and {n} is sample size.

The mean of exponential distribution is (1/lambda).

The standard deviation is (1/lambda).
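As a quick sanity check on these two facts, the sketch below draws one large exponential sample and compares its mean and standard deviation with 1/lambda. The seed, the sample size of 100,000 and the variable names are illustrative assumptions, not part of the original analysis.

```r
# Illustrative check only: seed and sample size are arbitrary assumptions
set.seed(2024)
lambdaCheck <- 0.2
draws <- rexp(100000, rate = lambdaCheck)

# Both the sample mean and the sample sd should be close to 1 / lambdaCheck
checkSummary <- c(mean = mean(draws),
                  sd = sd(draws),
                  theoretical = 1 / lambdaCheck)
print(round(checkSummary, 2))
```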

For our exercise, we use lambda = 0.2, n = 40, and number of simulations = 1000.

Please refer to Appendix, Section A:, to see how we simulate 1000 iterations. The variable actualSampleMeans contains 1000 observations, each of which is the mean of a sample of 40 exponentials.
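The same kind of simulation can also be sketched without an explicit loop, by drawing all the exponentials at once and taking row means. This is an alternative shown for illustration, not the approach used in the Appendix; the names ending in Alt are hypothetical.

```r
# Same parameters as the report; the matrix approach is an alternative sketch
set.seed(12345)
nosimAlt  <- 1000   # number of simulations
nAlt      <- 40     # sample size per simulation
lambdaAlt <- 0.2    # rate parameter

# One row per simulation. byrow = TRUE keeps the draws in the same order as a
# loop that calls rexp(nAlt, lambdaAlt) once per iteration.
sampleMatrix <- matrix(rexp(nosimAlt * nAlt, lambdaAlt),
                       nrow = nosimAlt, byrow = TRUE)
sampleMeansAlt <- rowMeans(sampleMatrix)
length(sampleMeansAlt)
```

Because the random draws are consumed in the same order, this should reproduce a per-iteration loop run under the same seed.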

Sample Mean versus Theoretical Mean :

We shall now compare the MEAN of actualSampleMeans with the Theoretical Population Mean ie. Mu.

meanOfActualSampleMeans <- mean(actualSampleMeans)
round(meanOfActualSampleMeans)
## [1] 5
Mu <- 1/lambda                    
Mu
## [1] 5

As you can see, both are equal to 5 once the simulated value is rounded to the nearest integer.

The Law of Large Numbers states that as the sample size grows, the sample mean gets closer and closer to the mean of the whole population. Let’s see this in action now.

Please refer to Appendix, Section B:, to see the code.

Figure 1:

As you can see, the cumulative mean of the Sample Means starts off somewhere around 4.8 and converges towards 5.0 as more simulations are included.

Also refer to Figure 2 in Appendix, Section B:. Notice that the Actual Sample Means are centered around Population Mean of 5.

Sample Variance versus Theoretical Variance :

The point we are trying to demonstrate is that the variance of the Sample Means, across 1000 simulations, is approximately equal to the theoretical variance ie. (Sigma ^ 2 / n), where {Sigma} is the Standard Deviation of the Population.

Let’s compare the actual Sample Variance with the theoretical one. Please refer to Appendix, Section C:

As you can see, both are approximately equal: 0.595 versus 0.625, ie. 0.6 when rounded to 1 decimal place.

Now, we shall compare the distribution of Sample Means with the distribution of Normals.

Let’s create 1000 normal variables with mu=1/lambda and sd=1/lambda.

Please refer to Appendix, Section D:

Now let’s see how the variation of the Sample Means compares with the variation of the Normals. Please refer to Figure 3 in Appendix, Section D:

As you can see, the distribution of Sample Means is narrower and taller than the distribution of Normals.

As stated above, the variance of the Normal distribution is Sigma ^ 2 ie. 25, whereas the variance of the Sample Means distribution is (Sigma ^ 2 / n) ie. 0.625.
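The arithmetic behind these two variances can be written out as a short sketch. The names ending in Ex are hypothetical and used only to avoid clashing with the report's own variables.

```r
lambdaEx <- 0.2
nEx      <- 40
sigmaEx  <- 1 / lambdaEx                   # population standard deviation: 5

populationVariance <- sigmaEx^2            # 5^2 = 25
varianceOfMeans    <- sigmaEx^2 / nEx      # 25 / 40 = 0.625
standardError      <- sigmaEx / sqrt(nEx)  # square root of 0.625, about 0.79

c(populationVariance = populationVariance,
  varianceOfMeans = varianceOfMeans,
  standardError = standardError)
```

Dividing by n = 40 is what makes the Sample Means forty times less variable than individual draws.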

Distribution :

Here, we shall compare the Distribution of Population of Random Exponentials versus Distribution of their Normalized Sample Means.

Please refer to Appendix, Section E:, to see how this population is generated.

Let’s see what our Population Data looks like. Refer to Figure 4 in Appendix, Section E:

We observe that,

  1. This distribution is far from Standard Normal
  2. It’s heavily skewed to the right
  3. It’s not symmetrical at all about the mean (which is 5)
  4. This is a typical exponential distribution
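The right skew noted above can also be checked numerically. The sketch below is an illustration under assumed settings (its own seed and a fresh sample of 40,000 draws, so it is not the same population object as in the Appendix); it compares the mean with the median and computes a moment-based sample skewness.

```r
# Illustrative sample; not the population object generated in the Appendix
set.seed(12345)
popSketch <- rexp(40000, rate = 0.2)

# Moment-based sample skewness: third central moment over sd cubed
centered <- popSketch - mean(popSketch)
skewnessSketch <- mean(centered^3) / mean(centered^2)^1.5

c(mean = mean(popSketch),
  median = median(popSketch),
  skewness = skewnessSketch)
```

For a right-skewed distribution the median sits below the mean and the skewness is positive; a symmetric distribution would give a skewness near 0.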

Now, let’s compare it with the distribution of its Normalized Sample Means.

In order to Normalize Sample Means, we shall use the formula,

Normalized Sample Mean = (Estimate - Mean Of Estimate)/(Standard Error of Estimate)

ie. Normalized Sample Mean = (Point Estimate - Mu) / (Sigma/sqrt(n))

normlizedActualSampleMeans <- (actualSampleMeans - Mu) /
                                   (Sigma/sqrt(n))

Now, let’s see what their distribution looks like. Please refer to Figure 5 in Appendix, Section F:. Also see the Normal Probability Plot.

Let’s get the mean and standard deviation of this distribution.

round(mean(normlizedActualSampleMeans))
## [1] 0
round(sd(normlizedActualSampleMeans))
## [1] 1

We can state that this distribution looks approximately normal, as

  1. The distribution is bell shaped with a single peak
  2. Its mean is 0 and its standard deviation is 1, after rounding to the nearest integer
  3. It’s roughly symmetrical about the mean, with no outliers
  4. Its Normal Probability Plot is approximately a straight line
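Rounding to whole numbers is a coarse check, so a slightly finer one is sketched below. It rebuilds the normalized means on its own (same seed and parameters as the Appendix, but with hypothetical names ending in Chk) and reports the unrounded mean, the unrounded standard deviation, and the share of values within 1.96 of zero, which would be about 0.95 for a Standard Normal.

```r
set.seed(12345)
lambdaChk <- 0.2
nChk      <- 40
nosimChk  <- 1000
muChk     <- 1 / lambdaChk
sigmaChk  <- 1 / lambdaChk

# Sample means, then normalization by the standard error sigma / sqrt(n)
meansChk <- replicate(nosimChk, mean(rexp(nChk, lambdaChk)))
zChk <- (meansChk - muChk) / (sigmaChk / sqrt(nChk))

# Share of normalized means within 1.96 standard errors of zero
coverage <- mean(abs(zChk) < 1.96)

c(mean = mean(zChk), sd = sd(zChk), coverage = coverage)
```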

Also refer to Figure 2 in Appendix, Section B:, to see Histogram of Actual Sample Means without Normalization.

Appendix :

Section A : Simulating 1000 iterations

set.seed(12345)
nosim <- 1000      # Number of Simulations
n <- 40            # Sample Size
lambda <- 0.2

Now we iterate through a for loop, 1000 times, for 1000 simulations. Each time, we generate a sample of 40 observations and take its mean. So the actualSampleMeans variable contains 1000 observations of Sample Means.

actualSampleMeans <- numeric(nosim)   # preallocate one slot per simulation
for (i in seq_len(nosim)) actualSampleMeans[i] <- mean(rexp(n, lambda))

Section B : Comparing Sample Mean versus Theoretical Mean

Code for Figure 1:

means <- cumsum(actualSampleMeans)/(1:length(actualSampleMeans))
library(ggplot2)
g1 <- ggplot(data.frame(x = 1:length(actualSampleMeans), y = means), aes(x = x, y = y)) +
       geom_hline(yintercept = Mu, size=2, color="blue") + 
       geom_line(size = 1) +
       labs(x = "Number of observations", y = "Cumulative Mean", 
                title="Sample Mean Versus Theoretical Mean")
print(g1)

Figure 2: Histogram of Actual Sample Means without Normalization

Notice that the Actual Sample Means are centered around Population Mean of 5.

Section C : Sample Variance versus Theoretical Variance

actualSampleVariance <- var(actualSampleMeans)   # same value as sd(actualSampleMeans) ^ 2
actualSampleVariance
## [1] 0.5954369
Sigma <- 1/lambda
theoriticalSampleVariance <- Sigma ^ 2 / n
theoriticalSampleVariance
## [1] 0.625

Section D : Creating 1000 Normal Variables and plotting them against Sample Means

normals <- rnorm(nosim, mean=1/lambda, sd=1/lambda)
length(normals)
## [1] 1000
summary(normals)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -10.110   1.745   5.138   5.131   8.482  20.290
sd(normals)
## [1] 4.946574

Figure 3: Histogram of Actual Sample Means versus Normal Distribution
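The plotting code for Figure 3 is not shown in this report. One possible way to draw a comparable figure is sketched below with base R graphics; it is an assumption about how such a plot could be built, not the code that produced the original figure, and the names ending in Fig, the bin width and the colours are illustrative.

```r
# Rebuild both samples so the sketch is self-contained
set.seed(12345)
meansFig   <- replicate(1000, mean(rexp(40, 0.2)))
normalsFig <- rnorm(1000, mean = 5, sd = 5)

# Shared breaks so the two histograms are directly comparable
allValues <- c(meansFig, normalsFig)
breaksFig <- seq(floor(min(allValues)), ceiling(max(allValues)), by = 0.5)

hist(normalsFig, breaks = breaksFig, freq = FALSE,
     col = rgb(0, 0, 1, 0.4), ylim = c(0, 0.7),
     main = "Sample Means versus Normals", xlab = "Value")
hist(meansFig, breaks = breaksFig, freq = FALSE,
     col = rgb(1, 0, 0, 0.4), add = TRUE)
legend("topright", legend = c("Normals", "Sample Means"),
       fill = c(rgb(0, 0, 1, 0.4), rgb(1, 0, 0, 0.4)))
```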

Section E : Distribution of Population of Random Exponentials

We have 1000 simulations, each with a sample size of 40. So our population will have 40,000 observations.

population <- rexp(nosim * n,lambda)
summary(population)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.426   3.437   5.003   6.892  64.000

This is what the population data looks like:

Figure 4: Histogram of Exponential Population Data

Section F : Distribution of Normalized Sample Means

Figure 5: Histogram of Normalized Actual Sample Means Versus Normal Probability Plot
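The code for Figure 5 is likewise not included here. A minimal base R sketch of a histogram placed next to a Normal Probability Plot follows; it is one assumed way to produce such a figure rather than the original code, and zFig is a hypothetical name for the normalized sample means rebuilt inside the sketch.

```r
# Rebuild the normalized sample means so the sketch is self-contained
set.seed(12345)
zFig <- (replicate(1000, mean(rexp(40, 0.2))) - 5) / (5 / sqrt(40))

# Two panels: histogram with a Standard Normal curve, then the Q-Q plot
par(mfrow = c(1, 2))
hist(zFig, breaks = 30, freq = FALSE,
     main = "Normalized Sample Means", xlab = "z")
curve(dnorm(x), add = TRUE, lwd = 2)
qqnorm(zFig)
qqline(zFig)
par(mfrow = c(1, 1))   # restore the single-panel layout
```

Points that follow the reference line in the second panel are what "a straight line" refers to in the main text.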