Overview

In this assignment, we investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution is simulated in R with rexp(n, \(\lambda\)), where \(\lambda\) is the rate parameter. The mean of the exponential distribution is 1/\(\lambda\) and the standard deviation is also 1/\(\lambda\).

For all simulations, unless otherwise stated, the following parameters are set:

lambda = \(\lambda\) = 0.2
noOfSim = number of simulations = 1000
n = number of exponential random variables per simulation = 40
Random seed = 2016
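The parameters above can be set up in R roughly as follows. This is a minimal sketch: the variable names lambda, noOfSim and n are the ones used by the code later in this report, but the point at which the seed is set is an assumption. The plotting code in later sections additionally needs ggplot2 to be loaded.

```r
# Simulation parameters used throughout the report
lambda  <- 0.2    # rate parameter of the exponential distribution
noOfSim <- 1000   # number of simulations
n       <- 40     # number of exponential random variables per simulation

# Fix the random seed so that results are reproducible
# (assumption: the seed is set once, before any random numbers are drawn)
set.seed(2016)
```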

Exponential Distribution

Let us generate 1000 * 40 random numbers from an exponential distribution with \(\lambda\) = 0.2 and take a look at the distribution and its properties.

expSample <- rexp(noOfSim*n,lambda)

dfExpSample <- data.frame(sample = expSample)

ggplot(data=dfExpSample, aes(x=sample)) +
    geom_histogram(stat="bin", binwidth = 0.2, col = 'blue', fill='purple') +
    ylab('Frequency') +
    xlab('') +
    labs(title = 'Histogram\n')

Properties of the exponential distribution generated are as follows:

Mean : 4.98
Standard Deviation : 4.99
Skewness : 1.97
Excess Kurtosis : 5.73

Notice that the mean and standard deviation are both close to 1/\(\lambda\) = 1/0.2 = 5. The distribution is not normal: its skewness and excess kurtosis are far from 0, the values expected for a normal distribution. The QQ plot below also shows that the distribution is not normal.
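The summary statistics of such a sample can be computed along the following lines. This is a sketch under stated assumptions: the report lists the moments package, which presumably produced its figures, whereas the helpers skew and excessKurt below are hypothetical base R stand-ins introduced only for illustration. For reference, an exponential distribution has a theoretical skewness of 2 and a theoretical excess kurtosis of 6, whatever the value of \(\lambda\).

```r
# Hypothetical helpers (not from the original report): moment-based
# skewness and excess kurtosis using base R only.
skew <- function(x) {
    d <- x - mean(x)
    mean(d^3) / mean(d^2)^1.5
}
excessKurt <- function(x) {
    d <- x - mean(x)
    mean(d^4) / mean(d^2)^2 - 3   # subtract 3 so a normal distribution scores 0
}

set.seed(2016)
x <- rexp(1000 * 40, rate = 0.2)

# Summary of the simulated exponential sample, rounded to 2 decimals
round(c(mean = mean(x), sd = sd(x),
        skewness = skew(x), excessKurtosis = excessKurt(x)), 2)
```

The exact numbers depend on the state of the random number generator, so they need not match the values reported above.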

Studying the distribution of the mean of n exponentially generated random variables

Simulations

Let us instead simulate 1000 samples of 40 exponential random variables each and calculate the mean of each sample. The expected value, variance and standard error of the sample mean \(\bar{X}\) are as follows:

E[\(\bar{X}\)] = 1/\(\lambda\) = 1/0.2 = 5
Var[\(\bar{X}\)] = 1/\(\lambda\)^2 * 1/n = 1/0.2^2 * 1/40 = 0.625
SE[\(\bar{X}\)] = \(\sqrt{Var[\bar{X}]}\) = 1/\(\lambda\) * 1/\(\sqrt{n}\) = 5/\(\sqrt{40}\) = 0.79057
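The variance of the sample mean follows from the independence of the draws. A short derivation, writing \(X_1, \dots, X_n\) for the iid exponential variables and \(\bar{X}\) for their mean (notation introduced here for illustration):

```latex
\mathrm{Var}[\bar{X}]
  = \mathrm{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
  = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[X_i]
  = \frac{1}{n^2}\cdot\frac{n}{\lambda^2}
  = \frac{1}{n\lambda^2}
  = \frac{1}{40 \times 0.2^2} = 0.625,
\qquad
\mathrm{SE}[\bar{X}] = \sqrt{\mathrm{Var}[\bar{X}]}
  = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79057
```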

simSample <- matrix(rexp(n*noOfSim,lambda),noOfSim,n)

expMean <- 1/lambda
stdError <- 1/lambda/sqrt(n)
sampleMean <- apply(simSample, 1, mean)

dfSampleMean <- data.frame(sample = sampleMean)

ggplot(data=dfSampleMean, aes(x=sample)) +
    geom_histogram(stat="bin", binwidth = 0.2, col = 'blue', fill='purple') +
    ylab('Frequency') +
    xlab('') +
    labs(title = 'Histogram\n') +
    geom_vline(xintercept = mean(sampleMean), color = 'red', size = 1.5)

Sample Mean vs Theoretical Mean

mean(sampleMean)

## [1] 5.009748

The mean of the sample means is 5.00975, as indicated by the red vertical line on the histogram. This value is close to the theoretical mean of 1/\(\lambda\) = 1/0.2 = 5.

Sample Variance vs Theoretical Variance

var(sampleMean)

## [1] 0.6194703

The variance of the sample means is 0.61947. This value is close to the theoretical variance of 1/\(\lambda\)^2 * 1/n = 1/0.2^2 * 1/40 = 0.625.
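The two comparisons can be put side by side in a small table. The following is a self-contained sketch rather than the report's own code: the data frame comparison and its column names are introduced here for illustration, and because the seed is set immediately before this single simulation, the figures it prints will probably differ slightly from those quoted above.

```r
lambda  <- 0.2
noOfSim <- 1000
n       <- 40

set.seed(2016)
# One row per simulation, one column per exponential draw
simSample  <- matrix(rexp(n * noOfSim, lambda), noOfSim, n)
sampleMean <- rowMeans(simSample)

# Simulated versus theoretical mean and variance of the sample mean
comparison <- data.frame(
    quantity    = c("mean", "variance"),
    simulated   = c(mean(sampleMean), var(sampleMean)),
    theoretical = c(1 / lambda, 1 / (lambda^2 * n))
)
# Relative difference, in percent of the theoretical value
comparison$relDiffPct <- 100 * (comparison$simulated - comparison$theoretical) /
    comparison$theoretical
print(comparison)
```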

Distribution

Let us study the distribution of the sample means to see whether it follows the Central Limit Theorem, which states that the distribution of averages of iid variables (properly normalised) approaches that of a standard normal as the sample size grows.
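In symbols, for iid variables with mean \(\mu\) and standard deviation \(\sigma\), the theorem can be written as follows. The symbols \(Z_n\), \(\mu\) and \(\sigma\) are notation introduced here; for the exponential distribution, \(\mu = \sigma = 1/\lambda\).

```latex
Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}
  \;\xrightarrow{\,d\,}\; N(0, 1)
  \quad \text{as } n \to \infty
```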

To standardise the sample means, we subtract the expected mean from each sample mean and divide by the standard error.

stdSampleMean <- (sampleMean - expMean)/stdError
dfStdSampleMean <- data.frame(sample = stdSampleMean)

We plot the standardised sample means as a density plot. Notice that the sample density curve (red) is very close to the standard normal density curve (yellow).

ggplot(data=dfStdSampleMean, aes(x=sample)) +
    geom_histogram(aes(y = ..density..), 
                   stat="bin", binwidth = 0.2, col = 'blue', fill='purple') +
    geom_density(col='red', size = 1.5) +
    stat_function(fun=dnorm, colour = "yellow", size = 1.5) +
    ylab('Density') +
    xlab('') +
    labs(title = 'Density Plot of the Standardised Sample Mean\n')

Properties of the standardised sample means are as follows:

Mean : 0.01
Standard Deviation : 1
Skewness : 0.28
Excess Kurtosis : 0.23

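Notice that the mean and standard deviation are close to those of a standard normal distribution, 0 and 1 respectively. The distribution is approximately normal because its skewness and excess kurtosis are both close to 0. The small positive skewness that remains is expected: the mean of n exponential variables has a theoretical skewness of 2/\(\sqrt{n}\), about 0.32 for n = 40. The QQ plot below also shows that the distribution is close to normal.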
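A normal QQ comparison of the standardised sample means can be sketched in base R as below. This is an illustration rather than the code behind the report's plot: the simulation is regenerated so that the block is self-contained, and the correlation summary qqCor is an addition made here, not a statistic from the report.

```r
lambda  <- 0.2
noOfSim <- 1000
n       <- 40

set.seed(2016)
sampleMean    <- rowMeans(matrix(rexp(n * noOfSim, lambda), noOfSim, n))
stdSampleMean <- (sampleMean - 1 / lambda) / (1 / lambda / sqrt(n))

# Theoretical normal quantiles (x) against sample quantiles (y),
# computed without drawing the plot
qq <- qqnorm(stdSampleMean, plot.it = FALSE)

# A correlation close to 1 means the points lie near a straight line,
# which is what a normal sample would produce
qqCor <- cor(qq$x, qq$y)
qqCor
```

Dropping plot.it = FALSE and following the call with qqline(stdSampleMean) draws the plot itself, with a reference line through the quartiles.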

Conclusion

We have shown that the standardised sample means of random variables generated from the exponential distribution have a distribution close to that of a standard normal when n is large.

Libraries required for this assignment project: ggplot2, moments

Statistical Inference - Course Project 1

Chan Chee-Foong

May 14, 2016