Martin Livingstone 2016-12-09

Synopsis

The Central Limit Theorem was tested against an exponential distribution by running 1000 simulations, each of sample size 40. The sampling distribution of the mean was found to be approximately normal, with a mean that matches the population mean and a variance that matches the theoretical result.

1 Exponential Distribution

The exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate. The exponential distribution has a mean of \(1/\lambda\), a variance of \(1/\lambda^2\) and a standard deviation of \(1/\lambda\).

An example exponential distribution is given below for \(\lambda = 0.2\). Clearly this is a non-normal distribution: it is strongly right-skewed and takes only non-negative values.
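The shape is easiest to judge from a plot. The following minimal sketch draws the exponential density for \(\lambda = 0.2\) using base R. The plotting range (0 to 30) and the name `dens` are choices made for this illustration and are not taken from the original analysis.

```r
# Rate parameter used throughout this report
lambda <- 0.2

# Density function of the exponential distribution with this rate
dens <- function(x) dexp(x, rate = lambda)

# Plot the density over an illustrative range (0 to 30 is an assumption)
curve(dens, from = 0, to = 30, xlab = "x", ylab = "Density",
      main = "Exponential density, lambda = 0.2")

# Mark the population mean 1 / lambda with a dashed vertical line
abline(v = 1 / lambda, lty = 2)
```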

2 Central Limit Theorem

The Central Limit Theorem (CLT) states that if you repeatedly draw random samples of a fixed, sufficiently large sample size (a common rule of thumb is \(n>30\)) from a population with finite variance, the sample averages (\(\bar{X}\)’s) will be approximately normally distributed. This distribution of sample averages is called an \(\bar{X}\) distribution, and it is such that:

The mean of the \(\bar{X}\) distribution is the same as \(\mu\), the mean of the population, and
The standard deviation of the \(\bar{X}\) distribution, \(\sigma_{\bar{X}}\), otherwise known as the Standard Error of the Mean, is equal to the population standard deviation (\(\sigma\)) divided by the square root of the sample size (\(n\)) \[\sigma_{\bar{X}} = \sigma / \sqrt{n}\]
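Both properties follow from a short calculation, assuming the observations \(X_1, \dots, X_n\) in a sample are independent with common mean \(\mu\) and variance \(\sigma^2\) (the independence assumption is implicit in drawing random samples):

\[
\begin{aligned}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^2}{n}, \\
\sigma_{\bar{X}} &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
\]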

3 Simulations

We will test the Central Limit Theorem against the exponential distribution (\(\lambda = 0.2\)) by taking 1000 samples of sample size 40.

set.seed(12345); lambda <- 0.2; nosim <- 1000; n <- 40

# Create a matrix of 1000 simulations (samples), each with a sample size of 40
mx <- matrix(rexp(nosim * n, rate = lambda), nosim)

# Calculate the mean of each sample - this gives us a numeric vector of length 1000
mns <- apply(mx, 1, mean)

A histogram of the sample means shows that they are approximately normal:
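The histogram can be reproduced along the following lines. This is a sketch rather than the original plotting code. The simulation is repeated so that the block runs on its own, and the bin count and the overlaid normal curve (using the theoretical mean and standard error) are choices made for this illustration.

```r
set.seed(12345); lambda <- 0.2; nosim <- 1000; n <- 40

# Repeat the simulation so that this block is self-contained
mx <- matrix(rexp(nosim * n, rate = lambda), nosim)
mns <- rowMeans(mx)

# Histogram on the density scale so that a density curve can be overlaid
hist(mns, breaks = 30, freq = FALSE, xlab = "Sample mean",
     main = "Sampling distribution of the mean")

# Normal density with mean 1 / lambda and sd 1 / (lambda * sqrt(n))
curve(dnorm(x, mean = 1 / lambda, sd = 1 / (lambda * sqrt(n))),
      add = TRUE, lwd = 2)
```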

3.1 Sampling Distribution Mean v Theoretical Mean

According to the CLT, the mean of the sampling distribution should be the same as the population mean.

# Mean of sampling distribution
mean(mns)

## [1] 4.971972

The mean of the \(\bar{X}\) distribution is 4.972, a close match to the theoretical population mean of \(1 / \lambda = 1/0.2 = 5\).
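One way to quantify "a close match" is to express the gap in units of the standard error of the simulated mean. The sketch below plugs in the value reported above. The names `obs_mean`, `se_of_mean` and `z` are introduced here for illustration only.

```r
lambda <- 0.2; nosim <- 1000; n <- 40

# Value reported above for mean(mns)
obs_mean <- 4.971972

# Standard error of the average of nosim sample means:
# the sd of one sample mean is 1 / (lambda * sqrt(n)); divide by sqrt(nosim)
se_of_mean <- 1 / (lambda * sqrt(n)) / sqrt(nosim)

# Distance from the theoretical mean 1 / lambda, in standard-error units
z <- (obs_mean - 1 / lambda) / se_of_mean
z
```

Under these assumptions the standard error is 0.025, so the observed mean lies roughly 1.1 standard errors below 5, which is well within ordinary sampling variation.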

3.2 Sampling Distribution Variance v Theoretical Variance

According to the CLT, the variance of the sampling distribution equals the variance of the population distribution divided by the sample size.

# Variance of sampling distribution
var(mns)

## [1] 0.6157926

The variance of the \(\bar{X}\) distribution is \(\sigma^2_{\bar{X}}\) = 0.616, which is a close match to the theoretical value \(\frac{\sigma^2}{n} = \frac{1}{\lambda^2 n}\) = 0.625.

Similarly, the corresponding standard deviation of the \(\bar{X}\) distribution \(\sigma_{\bar{X}}\) is 0.785, a close match to the theoretical value \(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\lambda \sqrt n}\) = 0.791.
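The theoretical values quoted above can be checked directly. This is a minimal sketch, and the names `theo_var` and `theo_sd` are introduced here for illustration.

```r
lambda <- 0.2; n <- 40

# Theoretical variance of the sample mean: sigma^2 / n with sigma = 1 / lambda
theo_var <- (1 / lambda)^2 / n

# Theoretical standard deviation (standard error) of the sample mean
theo_sd <- (1 / lambda) / sqrt(n)

# Rounded to three decimals: variance 0.625, sd 0.791
round(c(variance = theo_var, sd = theo_sd), 3)
```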

3.3 Normality of Sampling Distribution

We have already seen in the histogram above that the sampling distribution looks approximately normal.

Another ‘back of the envelope’ test is to compare the quantiles of the sampling distribution with those of a normal distribution that has the same mean and standard deviation.

# Select the 5th, 25th, 50th, 75th and 95th percentiles to use for comparison
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
# Sampling distribution
quantile(mns, probs)

##       5%      25%      50%      75%      95% 
## 3.740499 4.423866 4.937798 5.492124 6.277302

# Normal distribution
qnorm(probs, mean = mean(mns), sd = sd(mns))

## [1] 3.681215 4.442683 4.971972 5.501261 6.262729

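We can see that the quantiles are a close match, which is consistent with the sampling distribution being approximately normal. The sample median (4.938) sits slightly below the sample mean (4.972), which suggests a small residual right skew inherited from the exponential population.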
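A normal Q-Q plot extends the same comparison to every quantile at once. The following is a sketch and not part of the original analysis. It repeats the simulation so that the block runs on its own, and uses the base R functions `qqnorm` and `qqline`.

```r
set.seed(12345); lambda <- 0.2; nosim <- 1000; n <- 40

# Repeat the simulation so that this block is self-contained
mns <- rowMeans(matrix(rexp(nosim * n, rate = lambda), nosim))

# Normal Q-Q plot: points lying close to the reference line suggest normality
qq <- qqnorm(mns, main = "Normal Q-Q plot of sample means")

# Reference line through the first and third quartiles
qqline(mns)
```

Points that bend away from the line in the tails would indicate departures from normality, such as residual skew.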