Martin Livingstone 2016-12-09
The Central Limit Theorem was tested against an exponential distribution by running 1000 simulations, each with a sample size of 40. The resulting sampling distribution of the mean was found to be approximately normal, with a mean that matches the population mean and a variance that matches the theoretical result.
The exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. The exponential distribution has a mean of \(1/\lambda\), a variance of \(1/\lambda^2\) and a standard deviation of \(1/\lambda\).
An example exponential distribution is given below for \(\lambda = 0.2\). Clearly this is a non-normal distribution.
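The density can be drawn with a minimal sketch along the following lines, using base R graphics. The plotting range of 0 to 30 and the variable names `x` and `dens` are illustrative assumptions, not taken from the original analysis.

```r
# Density of the exponential distribution with rate lambda = 0.2
lambda <- 0.2
x <- seq(0, 30, by = 0.1)        # plotting range is an illustrative choice
dens <- dexp(x, rate = lambda)   # f(x) = lambda * exp(-lambda * x)

plot(x, dens, type = "l",
     xlab = "x", ylab = "Density",
     main = "Exponential distribution, lambda = 0.2")
# Dashed vertical line at the population mean 1 / lambda = 5
abline(v = 1 / lambda, lty = 2)
```

The curve starts at its maximum of \(\lambda\) at \(x = 0\) and decays monotonically with a long right tail, which is why it is so clearly non-normal.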
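The Central Limit Theorem (CLT) states that if you repeatedly select random samples of a fixed, sufficiently large sample size (a common rule of thumb is \(n>30\)), the sample averages (\(\bar{X}\)’s) will gradually build into the shape of a normal distribution, called an \(\bar{X}\) distribution, such that its mean equals the population mean, \(\mu_{\bar{X}} = \mu\), and its standard deviation equals the population standard deviation divided by the square root of the sample size, \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\).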
We will test the Central Limit Theorem against the exponential distribution (\(\lambda = 0.2\)) by taking 1000 samples of sample size 40.
set.seed(12345); lambda <- 0.2; nosim <- 1000; n <- 40
# Create a matrix of 1000 simulations (samples), each with a sample size of 40
mx <- matrix(rexp(nosim * n, rate = lambda), nosim)
# Calculate the mean of each sample - this gives us a numeric vector of length 1000
mns <- apply(mx, 1, mean)
A histogram of the sample means shows that they are approximately normal:
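One way to draw such a histogram is sketched below, with the normal density implied by the CLT overlaid for comparison. The number of breaks, the titles and the overlay are assumptions made for illustration; the simulation lines are repeated so that the block runs on its own.

```r
# Re-create the sample means so this block is self-contained
set.seed(12345); lambda <- 0.2; nosim <- 1000; n <- 40
mx <- matrix(rexp(nosim * n, rate = lambda), nosim)
mns <- apply(mx, 1, mean)

# Density-scaled histogram of the 1000 sample means
hist(mns, breaks = 30, freq = FALSE,
     xlab = "Sample mean",
     main = "Sampling distribution of the mean (n = 40)")
# Overlay the normal density with mean 1/lambda and sd 1/(lambda * sqrt(n))
curve(dnorm(x, mean = 1 / lambda, sd = 1 / (lambda * sqrt(n))),
      add = TRUE, lwd = 2)
# Dashed vertical line at the population mean
abline(v = 1 / lambda, lty = 2)
```

Setting `freq = FALSE` puts the histogram on a density scale, so the bars and the overlaid curve are directly comparable.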
According to the CLT, the mean of the sampling distribution should be the same as the population mean.
# Mean of sampling distribution
mean(mns)
## [1] 4.971972
The mean of the \(\bar{X}\) distribution is 4.972, a close match to the theoretical population mean of \(1 / \lambda = 1/0.2 = 5\).
According to the CLT, the variance of the sampling distribution equals the variance of the population distribution divided by the sample size.
# Variance of sampling distribution
var(mns)
## [1] 0.6157926
The variance of the \(\bar{X}\) distribution is \(\sigma^2_{\bar{X}}\) = 0.616, which is a close match to the theoretical value \(\frac{\sigma^2}{n} = \frac{1}{\lambda^2 n}\) = 0.625.
Similarly, the corresponding standard deviation of the \(\bar{X}\) distribution \(\sigma_{\bar{X}}\) is 0.785, a close match to the theoretical value \(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\lambda \sqrt n}\) = 0.791.
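The theoretical values quoted above follow directly from \(\lambda\) and \(n\). The sketch below recomputes them and places them next to the simulated values; the names `theo_var`, `theo_sd` and `comparison` are hypothetical and introduced only for this illustration.

```r
# Re-create the sample means so this block is self-contained
set.seed(12345); lambda <- 0.2; nosim <- 1000; n <- 40
mns <- apply(matrix(rexp(nosim * n, rate = lambda), nosim), 1, mean)

# Theoretical variance and standard deviation of the sample mean
theo_var <- 1 / (lambda^2 * n)      # sigma^2 / n
theo_sd  <- 1 / (lambda * sqrt(n))  # sigma / sqrt(n)

# Simulated and theoretical values side by side
comparison <- data.frame(
  statistic   = c("mean", "variance", "sd"),
  simulated   = c(mean(mns), var(mns), sd(mns)),
  theoretical = c(1 / lambda, theo_var, theo_sd)
)
print(comparison)
```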
We have already seen in the histogram above that the sampling distribution looks approximately normal.
Another ‘back of the envelope’ test is to compare the quantiles of the sampling distribution to those of a normal distribution with the same mean and standard deviation:
# Select the 5th, 25th, 50th, 75th and 95th percentiles to use for comparison
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
# Sampling distribution
quantile(mns, probs)
## 5% 25% 50% 75% 95%
## 3.740499 4.423866 4.937798 5.492124 6.277302
# Normal distribution
qnorm(probs, mean = mean(mns), sd = sd(mns))
## [1] 3.681215 4.442683 4.971972 5.501261 6.262729
We can see that the quantiles are a very close match, which supports the conclusion that the sampling distribution is approximately normal.
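A normal Q-Q plot extends the same comparison from five quantiles to every sample mean. It was not part of the original analysis and is offered only as a sketch using base R's `qqnorm` and `qqline`; the simulation lines are repeated so that the block runs on its own.

```r
# Re-create the sample means so this block is self-contained
set.seed(12345); lambda <- 0.2; nosim <- 1000; n <- 40
mns <- apply(matrix(rexp(nosim * n, rate = lambda), nosim), 1, mean)

# Plot sample quantiles against theoretical normal quantiles
qqnorm(mns, main = "Normal Q-Q plot of the sample means")
# Reference line through the first and third quartiles
qqline(mns)
```

Points lying close to the reference line indicate approximate normality. Any skew carried over from the exponential distribution would appear as curvature away from the line in the tails.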