Statistical Inference Project : Part 1. Exploring Exponential Distribution in R and Comparing it with Central Limit Theorem (By : Narendra Shukla)

Overview :

The Central Limit Theorem states that the distribution of averages of iid (independent and identically distributed) variables, properly normalized, approaches that of a Standard Normal as the sample size increases. In addition, the Expected Value of the Sample Mean equals the True Population Mean, ie. E(Sample Mean) = Mu, and the Variance of the Sample Mean is Var(Sample Mean) = (Sigma ^ 2 / n), where {Sigma} = Standard Deviation of Population and {n} = Sample Size. We shall demonstrate each of these points.

Simulations :

The exponential distribution is simulated in R with rexp(n, lambda) where {lambda} is the rate parameter and {n} is sample size.

The mean of exponential distribution is (1/lambda).

The standard deviation is (1/lambda).

For our exercise, we use lambda = 0.2, n = 40 and number of simulations = 1000.
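As a quick check on the arithmetic, the theoretical quantities implied by these settings can be computed directly. This is a minimal sketch; varOfSampleMean and seOfSampleMean are names introduced here for illustration and are not part of the original analysis.

```r
lambda <- 0.2
n <- 40

Mu <- 1 / lambda                     # Population mean: 5
Sigma <- 1 / lambda                  # Population standard deviation: 5

varOfSampleMean <- Sigma ^ 2 / n     # Variance of the mean of 40 draws: 0.625
seOfSampleMean <- Sigma / sqrt(n)    # Standard error of that mean: about 0.79

c(Mu = Mu, Sigma = Sigma, varOfSampleMean = varOfSampleMean,
  seOfSampleMean = seOfSampleMean)
```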

Please refer to Appendix, Section A, to see how we simulate 1000 iterations. The variable actualSampleMeans contains 1000 observations, each of which is the mean of one sample of 40 exponentials.

Sample Mean versus Theoretical Mean :

We shall now compare the mean of actualSampleMeans with the Theoretical Population Mean ie. Mu.

meanOfActualSampleMeans <- mean(actualSampleMeans)
round(meanOfActualSampleMeans)

## [1] 5

Mu <- 1/lambda                    
Mu

## [1] 5

As you can see, both are equal to 5 once the mean of the Sample Means is rounded to the nearest integer.
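To gauge how close the two values should be expected to be before rounding, the standard error of the mean of the 1000 Sample Means can be worked out from the same formula. The sketch below is illustrative only; seOfGrandMean and expectedRange are hypothetical names, and the two-standard-error band is a rough rule of thumb rather than an exact interval.

```r
lambda <- 0.2
n <- 40
nosim <- 1000

Mu <- 1 / lambda
Sigma <- 1 / lambda

# Each Sample Mean has variance Sigma^2 / n, so the mean of nosim
# independent Sample Means has variance Sigma^2 / (n * nosim)
seOfGrandMean <- sqrt(Sigma ^ 2 / n / nosim)
seOfGrandMean                      # 0.025

# Rough band of two standard errors around Mu
expectedRange <- c(lower = Mu - 2 * seOfGrandMean,
                   upper = Mu + 2 * seOfGrandMean)
expectedRange                      # 4.95 to 5.05
```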

The Law of Large Numbers states that as the sample size grows, the sample mean gets closer and closer to the average of the whole population. Let’s see this in action now, using the running (cumulative) mean of the Sample Means.

Please refer to Appendix, Section B, to see the code.

Figure 1:

As you can see, the cumulative mean of the Sample Means starts off somewhere around 4.8 and converges towards 5.0 later on.

Also refer to Figure 2 in Appendix, Section B:. Notice that the Actual Sample Means are centered around Population Mean of 5.

Sample Variance versus Theoretical Variance :

The point we are trying to demonstrate is that the variance of the Sample Means, over 1000 simulations, is close to the theoretical variance ie. (Sigma ^ 2 / n), where {Sigma} is the Standard Deviation of the Population.

Let’s compare the actual Sample Variance with the theoretical one. Please refer to Appendix, Section C:

As you can see, the two agree up to 1 decimal place: 0.5954369 and 0.625 both round to 0.6.
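The size of the gap between the two variances can be made explicit. The sketch below reuses the simulated value reported in Appendix, Section C, rather than re-running the simulation; absoluteGap and relativeGap are names introduced here for illustration.

```r
actualSampleVariance <- 0.5954369            # Value reported in Appendix, Section C
theoreticalVariance <- (1 / 0.2) ^ 2 / 40    # Sigma^2 / n = 0.625

absoluteGap <- theoreticalVariance - actualSampleVariance
relativeGap <- absoluteGap / theoreticalVariance

absoluteGap                                  # 0.0295631
round(relativeGap, 3)                        # 0.047, ie. about 4.7 percent
round(c(actualSampleVariance, theoreticalVariance), 1)   # both 0.6
```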

Now, we shall compare the distribution of Sample Means, with distribution of Normals.

Let’s create 1000 normal variables with mu=1/lambda and sd=1/lambda.

Please refer to Appendix, Section D:

Now let’s see how the spread of the Sample Means compares with the spread of the Normals. Please refer to Figure 3 in Appendix, Section D:

As you can see, the distribution of Sample Means is narrower and more sharply peaked than the distribution of Normals.

As stated above, the variance of the Normal distribution is Sigma ^ 2 ie. 25, and the variance of the Sample Means distribution is (Sigma ^ 2 / n) ie. 0.625.

Distribution :

Here, we shall compare the Distribution of Population of Random Exponentials versus Distribution of their Normalized Sample Means.

Please refer to Appendix, Section E:, to see how this population is generated.

Let’s see what our Population Data looks like. Refer to Figure 4 in Appendix, Section E:

We observe that,

This distribution is far from Standard Normal
It’s heavily skewed to the right
It’s not symmetrical at all about the mean (which is 5)
This is a typical exponential distribution

Now, let’s compare it with the distribution of its Normalized Sample Means.

In order to Normalize Sample Means, we shall use the formula,

Normalized Sample Mean = (Estimate - Mean Of Estimate)/(Standard Error of Estimate)

ie. Normalized Sample Mean = (Point Estimate - Mu) / (Sigma/sqrt(n))

normlizedActualSampleMeans <- (actualSampleMeans - Mu) /
                                   (Sigma/sqrt(n))
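A single worked value may make the formula concrete. The sketch below normalizes one hypothetical Sample Mean of 5.8; that value is made up for illustration and is not taken from the simulation, and exampleSampleMean and exampleZ are names introduced here.

```r
lambda <- 0.2
n <- 40
Mu <- 1 / lambda
Sigma <- 1 / lambda

exampleSampleMean <- 5.8                 # Hypothetical value, for illustration only
standardError <- Sigma / sqrt(n)         # About 0.79

# How many standard errors the example lies above Mu
exampleZ <- (exampleSampleMean - Mu) / standardError
round(exampleZ, 2)                       # 1.01
```

A Sample Mean of 5.8 therefore sits roughly one standard error above the Population Mean.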

Now, let’s see what their distribution looks like. Please refer to Figure 5 in Appendix, Section F:. Also see the Normal Probability Plot.

Let’s get the mean and standard deviation of this distribution.

round(mean(normlizedActualSampleMeans))

## [1] 0

round(sd(normlizedActualSampleMeans))

## [1] 1

We can state that this distribution looks approximately normal as,

The distribution is bell shaped with single peak
Its mean is 0 and its standard deviation is 1, after rounding to the nearest integer
It’s symmetrical about the mean, with no outliers
Its Normal Probability Plot is close to a straight line

Also refer to Figure 2 in Appendix, Section B:, to see Histogram of Actual Sample Means without Normalization.

Appendix :

Section A : Simulating 1000 iterations

set.seed(12345)
nosim <- 1000      # Number of Simulations
n <- 40            # Sample Size
lambda <- 0.2

Now we iterate through a for loop nosim (1000) times, once per simulation. Each time, we generate a sample of 40 observations and take its mean. So the actualSampleMeans variable contains 1000 Sample Means.

actualSampleMeans <- numeric(nosim)    # Preallocate one slot per simulation
for (i in 1 : nosim) actualSampleMeans[i] <- mean(rexp(n, lambda))
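The same simulation can also be written without an explicit loop. The following is an alternative sketch using replicate() from base R, not the code used for the results in this report; sampleMeansAlt is a hypothetical name. Because the draws are made in the same order, it should give the same values as the loop for the same seed, though that is not verified here.

```r
set.seed(12345)
nosim <- 1000      # Number of Simulations
n <- 40            # Sample Size
lambda <- 0.2

# Evaluate mean(rexp(n, lambda)) nosim times and collect the results
sampleMeansAlt <- replicate(nosim, mean(rexp(n, lambda)))

length(sampleMeansAlt)
mean(sampleMeansAlt)
```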

Section B : Comparing Sample Mean versus Theoretical Mean

Code for Figure 1:

means <- cumsum(actualSampleMeans)/(1:length(actualSampleMeans))
library(ggplot2)
g1 <- ggplot(data.frame(x = 1:length(actualSampleMeans), y = means), aes(x = x, y = y)) +
       geom_hline(yintercept = Mu, size=2, color="blue") + 
       geom_line(size = 1) +
       labs(x = "Number of observations", y = "Cumulative Mean", 
                title="Sample Mean Versus Theoretical Mean")
print(g1)

Figure 2: Histogram of Actual Sample Means without Normalization

Notice that the Actual Sample Means are centered around Population Mean of 5.

Section C : Sample Variance versus Theoretical Variance

actualSampleVariance <- sd(actualSampleMeans) ^ 2
actualSampleVariance

## [1] 0.5954369

Sigma <- 1/lambda
theoriticalSampleVariance <- Sigma ^ 2 / n
theoriticalSampleVariance

## [1] 0.625

Section D : Creating 1000 Normal Variables and plotting them against Sample Means

normals <- rnorm(nosim, mean=1/lambda, sd=1/lambda)
length(normals)

## [1] 1000

summary(normals)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -10.110   1.745   5.138   5.131   8.482  20.290

sd(normals)

## [1] 4.946574

Figure 3: Histogram of Actual Sample Means versus Normal Distribution

Section E : Distribution of Population of Random Exponentials

We have 1000 simulations, each with a sample size of 40. So our population will have 40,000 observations.

population <- rexp(nosim * n,lambda)
summary(population)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.426   3.437   5.003   6.892  64.000
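The quartiles in this summary can be checked against the theoretical quantiles of an exponential distribution with rate 0.2, using qexp() from base R. This is a sketch for comparison only; reportedQuartiles simply copies the values printed above, and both variable names are introduced here.

```r
lambda <- 0.2

# Theoretical 1st quartile, median and 3rd quartile
theoreticalQuartiles <- qexp(c(0.25, 0.5, 0.75), rate = lambda)
round(theoreticalQuartiles, 3)              # 1.438 3.466 6.931

# Values copied from the summary(population) output above
reportedQuartiles <- c(1.426, 3.437, 6.892)
round(reportedQuartiles - theoreticalQuartiles, 3)
```

The theoretical median (about 3.47) lies well below the mean of 5, which is consistent with a distribution skewed to the right.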

This is what the population data looks like:

Figure 4: Histogram of Exponential Population Data

Section F : Distribution of Normalized Sample Means

Figure 5: Histogram of Normalized Actual Sample Means Versus Normal Probability Plot
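The code for Figure 5 is not shown in this appendix. A minimal sketch of how such a figure could be drawn with base R graphics follows. It is an assumption, not the original plotting code: the matrix-based simulation, the names simulated, sampleMeans and z, and all plot settings are illustrative, and the simulated values need not match those used elsewhere in this report.

```r
set.seed(12345)
nosim <- 1000      # Number of Simulations
n <- 40            # Sample Size
lambda <- 0.2
Mu <- 1 / lambda
Sigma <- 1 / lambda

# One row per simulation, one column per observation
simulated <- matrix(rexp(nosim * n, lambda), nrow = nosim, ncol = n)
sampleMeans <- rowMeans(simulated)

# Normalize with the same formula as in the Distribution section
z <- (sampleMeans - Mu) / (Sigma / sqrt(n))

op <- par(mfrow = c(1, 2))                   # Two panels side by side
hist(z, breaks = 30, freq = FALSE,
     main = "Normalized Sample Means", xlab = "z")
curve(dnorm(x), add = TRUE, col = "blue", lwd = 2)   # Standard Normal density
qqnorm(z, main = "Normal Probability Plot")
qqline(z, col = "blue")
par(op)                                      # Restore previous settings
```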