In this assignment, we investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution is simulated in R with rexp(n, \(\lambda\)), where \(\lambda\) is the rate parameter. The mean of the exponential distribution is 1/\(\lambda\), and the standard deviation is also 1/\(\lambda\).
For all simulations, unless otherwise stated, the following parameters are set:
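The parameter chunk itself does not appear in this text, so the following is only a minimal sketch of what it might look like. The values are inferred from the rest of the report (\(\lambda\) = 0.2, samples of 40, 1000 simulations), the variable names are those used by the later code, and the seed is a hypothetical addition, so results will not match the printed output exactly.

```r
# Assumed parameter values, inferred from the surrounding text
lambda  <- 0.2    # rate parameter of the exponential distribution
n       <- 40     # number of exponential random variables per sample
noOfSim <- 1000   # number of simulated samples

# Hypothetical seed, added only so that reruns give the same numbers
set.seed(1234)
```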
Let us generate 1000 * 40 = 40,000 random numbers from an exponential distribution with \(\lambda\) = 0.2 and take a look at the distribution and its properties.
expSample <- rexp(noOfSim*n,lambda)
dfExpSample <- data.frame(sample = expSample)
ggplot(data=dfExpSample, aes(x=sample)) +
geom_histogram(stat="bin", binwidth = 0.2, col = 'blue', fill='purple') +
ylab('Frequency') +
xlab('') +
labs(title = 'Histogram\n')
Properties of the exponential distribution generated are as follows:
Notice that the mean and standard deviation are both close to 1/\(\lambda\) = 1/0.2 = 5. The distribution is not normal because its skewness and excess kurtosis are not close to 0. The QQ plot below also shows that the distribution is not normal.
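The summary statistics and QQ plot discussed above could be produced roughly as follows. This is a sketch rather than the original chunk: it assumes the moments package listed at the end of the report, re-declares the parameters so that it runs on its own, and uses a hypothetical seed. For reference, the theoretical skewness and excess kurtosis of an exponential distribution are 2 and 6.

```r
library(moments)  # provides skewness() and kurtosis()

# Assumed parameters, matching the values quoted in the text
lambda  <- 0.2
n       <- 40
noOfSim <- 1000
set.seed(1234)    # hypothetical seed for reproducibility

expSample <- rexp(noOfSim * n, lambda)

# kurtosis() returns Pearson's kurtosis (3 for a normal distribution),
# so subtract 3 to obtain the excess kurtosis
expProps <- c(
  mean           = mean(expSample),
  sd             = sd(expSample),
  skewness       = skewness(expSample),
  excessKurtosis = kurtosis(expSample) - 3
)
print(round(expProps, 3))

# QQ plot against the normal distribution; strong curvature away from
# the reference line indicates non-normality
qqnorm(expSample)
qqline(expSample, col = "red")
```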
Let us instead simulate 1000 samples of 40 exponential random variables each and calculate the mean of each sample. The expected value of the sample mean \(\bar{X}\), its variance, and its standard error are as follows:
E[\(\bar{X}\)] = 1/\(\lambda\) = 1/0.2 = 5
Var[\(\bar{X}\)] = 1/\(\lambda\)^2 * 1/n = 1/0.2^2 * 1/40 = 0.625
SE[\(\bar{X}\)] = \(\sqrt{Var[\bar{X}]}\) = (1/\(\lambda\))/\(\sqrt{n}\) = 5/\(\sqrt{40}\) = 0.79057
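For completeness, the variance figure can be derived in one line. Writing \(\bar{X}\) for the mean of n independent draws \(X_1, \dots, X_n\), each with variance 1/\(\lambda\)^2 (notation introduced here for illustration):

\[
\operatorname{Var}[\bar{X}]
= \operatorname{Var}\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}[X_i]
= \frac{1}{n\lambda^2}
= \frac{1}{40 \times 0.2^2}
= 0.625
\]

\[
\operatorname{SE}[\bar{X}] = \sqrt{\operatorname{Var}[\bar{X}]} = \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79057
\]

The second equality relies on the independence of the draws, which lets the variance of the sum split into a sum of variances.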
simSample <- matrix(rexp(n*noOfSim,lambda),noOfSim,n)
expMean <- 1/lambda
stdError <- 1/lambda/sqrt(n)
sampleMean <- apply(simSample, 1, mean)
dfSampleMean <- data.frame(sample = sampleMean)
ggplot(data=dfSampleMean, aes(x=sample)) +
geom_histogram(stat="bin", binwidth = 0.2, col = 'blue', fill='purple') +
ylab('Frequency') +
xlab('') +
labs(title = 'Histogram\n') +
geom_vline(xintercept = mean(sampleMean), color = 'red', size = 1.5)
mean(sampleMean)
## [1] 5.009748
The mean of the 1000 sample means is 5.00975, as indicated by the red vertical line on the histogram. This value is close to the theoretical mean of 1/\(\lambda\) = 1/0.2 = 5.
var(sampleMean)
## [1] 0.6194703
The variance of the sample means is 0.61947. This value is close to the theoretical variance of 1/\(\lambda\)^2 * 1/n = 1/0.2^2 * 1/40 = 0.625.
Let us study the distribution of the sample means to see whether it behaves as the Central Limit Theorem predicts. The theorem states that the distribution of averages of iid variables (properly normalised) approaches that of a standard normal as the sample size grows.
To standardise the sample means, we subtract the expected mean from each sample mean and divide by the standard error.
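Written out, with \(\bar{X}\) denoting a sample mean of n = 40 draws (notation introduced here for illustration), the standardisation is:

\[
Z = \frac{\bar{X} - 1/\lambda}{(1/\lambda)/\sqrt{n}} = \frac{\bar{X} - 5}{5/\sqrt{40}} \approx \frac{\bar{X} - 5}{0.79057}
\]

Under the Central Limit Theorem, Z is approximately distributed as a standard normal, N(0, 1), when n is large.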
stdSampleMean <- (sampleMean - expMean)/stdError
dfStdSampleMean <- data.frame(sample = stdSampleMean)
We plot the standardised sample means as a density plot. Notice that the sample density curve (red) is very close to the standard normal density curve (yellow).
ggplot(data=dfStdSampleMean, aes(x=sample)) +
geom_histogram(aes(y = ..density..),
stat="bin", binwidth = 0.2, col = 'blue', fill='purple') +
geom_density(col='red', size = 1.5) +
stat_function(fun=dnorm, colour = "yellow", size = 1.5) +
ylab('Density') +
xlab('') +
labs(title = 'Density Plot of the Standardised Sample Mean\n')
Properties of the standardised sample means are as follows:
Notice that the mean and standard deviation are close to 0 and 1 respectively, the values for a standard normal distribution. The distribution is approximately normal because its skewness and excess kurtosis are close to 0. The QQ plot below also shows that the distribution is close to normal.
We have shown that the standardised sample means of random variables generated from the exponential distribution have a distribution close to that of a standard normal when n is large (here n = 40).
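The properties and QQ plot of the standardised sample means could be computed along these lines. Again this is a sketch under stated assumptions: the moments package, re-declared parameters, and a hypothetical seed. With n = 40 a small residual skewness is expected, since the skewness of a mean of n iid exponentials is 2/\(\sqrt{n}\) (about 0.32 here) and the excess kurtosis is 6/n (0.15 here).

```r
library(moments)  # provides skewness() and kurtosis()

# Assumed parameters, matching the values quoted in the text
lambda  <- 0.2
n       <- 40
noOfSim <- 1000
set.seed(1234)    # hypothetical seed for reproducibility

# One row per simulated sample of n exponentials
simSample  <- matrix(rexp(n * noOfSim, lambda), noOfSim, n)
sampleMean <- rowMeans(simSample)

# Standardise: subtract the expected mean, divide by the standard error
expMean       <- 1 / lambda
stdError      <- (1 / lambda) / sqrt(n)
stdSampleMean <- (sampleMean - expMean) / stdError

stdProps <- c(
  mean           = mean(stdSampleMean),
  sd             = sd(stdSampleMean),
  skewness       = skewness(stdSampleMean),
  excessKurtosis = kurtosis(stdSampleMean) - 3  # Pearson's kurtosis minus 3
)
print(round(stdProps, 3))

# Points lying close to the reference line indicate approximate normality
qqnorm(stdSampleMean)
qqline(stdSampleMean, col = "red")
```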
Libraries required for this assignment: ggplot2, moments