Overview:

This report presents the results of a simulation designed to demonstrate the Central Limit Theorem for an exponentially distributed population. According to the Central Limit Theorem, the distribution of the sample means drawn from a population with finite variance is approximately normal when the sample size is large enough, irrespective of the distribution of the population. The probability density function of an exponential distribution is $f(x;\lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$ and $f(x;\lambda) = 0$ for $x < 0$.
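As a quick sanity check of the density formula, the sketch below codes it by hand and compares it with R's built-in dexp. The helper name exp_pdf and the test points are illustrative assumptions, not part of the original analysis; the rate of 0.2 is the value used in the simulation.

```r
# Hand-coded exponential density: lambda * exp(-lambda * x) for x >= 0, else 0.
# exp_pdf is a hypothetical helper used only for this check.
exp_pdf <- function(x, lambda) {
  ifelse(x >= 0, lambda * exp(-lambda * x), 0)
}

lambda <- 0.2                # rate used in the simulation
x <- c(-1, 0, 1, 5, 10)      # arbitrary test points, including a negative one

# TRUE if the hand-coded density agrees with the built-in one
all.equal(exp_pdf(x, lambda), dexp(x, rate = lambda))

# A valid density integrates to 1 over its support
integrate(exp_pdf, lower = 0, upper = Inf, lambda = lambda)$value
```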

Simulation:

Mean of sampling distribution

The value of λ is assumed to be 0.2, so that the mean and the standard deviation of the population are both 1/λ = 5. The code below simulates 1000 samples of 40 exponentials each. In the resulting histogram, the sample means are approximately normally distributed around 5, the population mean, as the Central Limit Theorem predicts. The theoretical mean of the sampling distribution is the same as the population mean, 1/λ, i.e. 5.

# The code below simulates 1000 random samples of 40 exponentials and uses ggplot2 to plot the histogram with the theoretical normal curve overlaid
expmn <- replicate(1000, mean(rexp(40, 0.2))) # means of 1000 samples of 40 exponential values
expdt <- data.frame(expmn)
# create the histogram and overlay the theoretical normal distribution
library(ggplot2)
# dnorm takes a standard deviation, so use sqrt(0.625) rather than the variance 0.625
g1 <- ggplot(expdt, aes(x = expmn)) + geom_histogram(binwidth = .3, colour = "black", fill = "white", aes(y = after_stat(density))) + stat_function(fun = dnorm, color = "green", args = list(mean = 5, sd = sqrt(0.625)))
g1 <- g1 + geom_density(alpha = .2, fill = "blue")
g1 <- g1 + labs(x = "Means", y = "Density")
g1 <- g1 + geom_vline(xintercept = 5, color = "red")
g1 <- g1 + annotate("text", x = 5, size = 3, y = -0.006, label = "Mean ≈ 5, the population mean")
g1 <- g1 + ggtitle("Distribution of sample means")
g1

The mean of the simulated sampling distribution is 5.0245. The green curve shows the theoretical normal distribution that the Central Limit Theorem predicts for means of 40 exponentials, and the histogram of simulated means follows it closely.
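A rough numerical check of the same point, independent of the plot, might look like the sketch below. The seed and the variable names (sim_means, n_sims, se) are assumptions added for repeatability; because the original run was unseeded, the result will differ slightly from the 5.0245 reported above.

```r
set.seed(42)  # arbitrary seed, assumed here only so the result is repeatable

lambda <- 0.2   # rate used in the report
n <- 40         # exponentials per sample
n_sims <- 1000  # number of samples

# Mean of each of the 1000 samples of 40 exponentials
sim_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

# Simulated centre versus the theoretical mean 1/lambda
mean(sim_means)
1 / lambda

# Share of sample means within 1.96 standard errors of the population mean;
# a value near 0.95 suggests the normal approximation is reasonable
se <- (1 / lambda) / sqrt(n)
coverage <- mean(abs(sim_means - 1 / lambda) <= 1.96 * se)
coverage
```

The coverage figure is only an informal check; it does not replace the visual comparison against the green curve.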

The chart below also shows the cumulative mean of the sample means converging to the population mean of 5 as the number of samples increases.

# computes the cumulative mean of the sample means to demonstrate that it converges to the theoretical (population) mean
library(ggplot2)
means <- cumsum(expmn)/1:1000
g2 <- ggplot(data.frame(x = 1:1000, y = means), aes(x = x, y = y)) + geom_hline(yintercept = 5) + geom_line(size = 1.5)
g2 <- g2 + labs(x = "Number of samples", y = "Cumulative mean")
g2

Variance of sampling distribution

The variance of the sampling distribution is 0.6499.

According to the Central Limit Theorem, the variance of the sampling distribution is $\sigma^2/n$, which in this case is $(1/\lambda)^2/n$.

For a λ of 0.2, and n of 40, the value is 0.625, which is close to the value obtained in the simulation above.
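Written out step by step, and assuming the 40 draws in each sample are independent, the calculation is:

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{\lambda^2} = \frac{1}{0.2^2} = 25, \\
\operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n} = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}) &= \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791.
\end{aligned}
```

The standard deviation of the sample means, about 0.79, is the value to use wherever a standard deviation rather than a variance is required, for example as the sd argument of dnorm.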

# Creates a vector with the cumulative variance of the first j sample means, for j = 2 through 1000 (a variance needs at least two values)
expvar <- sapply(2:1000, function(j) var(expmn[1:j]))
g3 <- ggplot(data.frame(x = 2:1000, y = expvar), aes(x = x, y = y))
g3 <- g3 + geom_hline(yintercept = 0.625) + geom_line(size = 1.5)
g3 <- g3 + annotate("text", x = 300, size = 3, y = 0.650, label = "Theoretical Variance of the sample means = 0.625")
g3 <- g3 + labs(x = "Number of samples", y = "Cumulative variance")
g3