Overview

This post walks you through one of the most interesting and powerful theorems in statistics, the Central Limit Theorem (CLT). The simulations below show that averages of samples drawn from a clearly non-normal distribution are themselves approximately normally distributed, which is why the normal distribution is often called the father of all distributions. So, let's explore this fact with some simulations in R!

Simulations

Below, I plot a set of 1000 values drawn from an exponential distribution with lambda = 0.2. This might not interest you much, and you may say, "I knew this, it looks like an exponential distribution!" But hold on for a while and see the next plot.

library(ggplot2)

# One thousand values drawn from an exponential distribution with lambda = 0.2

exponentials = rexp(1000, 0.2)
exponentials = as.data.frame(exponentials)

# Histogram of the values drawn above, on a density scale

ggplot(exponentials, aes(x = exponentials)) +
  geom_histogram(fill = "blue", alpha = .20, binwidth = .5, colour = "black", aes(y = after_stat(density))) +
  xlab("Values from the exponential distribution") +
  ylab("Density of Values")

Now look at the plot below. Each value in it is the average of 40 draws from the same exponential distribution, and there are 1000 such averages. Does it look familiar? Yes, you guessed it right: it closely resembles a normal distribution. In the next sections, we compare its mean and variance with their theoretical values.

s = 1000  # Number of simulations
n = 40    # Sample size (draws per simulation)
lambda = 0.2

# Average of n exponential draws, repeated s times
Averages = NULL
for (i in 1 : s) Averages = c(Averages, mean(rexp(n, lambda)))
Averages = as.data.frame(Averages)

ggplot(Averages, aes(x = Averages)) +
  geom_histogram(fill = "blue", alpha = .20, binwidth = 0.3, colour = "black", aes(y = after_stat(density))) +
  geom_density(colour = "blue", linewidth = 1) +
  xlab("Averages of Exponential Samples") +
  ylab("Density of Averages")
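One way to put a number on the "looks normal" impression is to measure how lopsided the averages are. The sum of n exponential draws follows a gamma distribution, so the skewness of their mean is 2/sqrt(n), which shrinks toward the normal distribution's value of 0 as n grows. The sketch below checks this by simulation. The seed, the sample sizes in `sizes`, the number of simulations `sims`, and the helper `skewness()` are all illustrative choices made for this example, not part of the original analysis.

```r
# Sketch: skewness of the sample mean for growing sample sizes.
# Assumptions: the seed, `sizes` and `sims` are illustrative values, and
# `skewness()` is a small helper defined here, not a base R function.
set.seed(2024)
lambda <- 0.2
sims <- 5000
sizes <- c(1, 10, 40, 400)

# Moment-based sample skewness; close to 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# For each sample size, simulate `sims` averages and measure their skewness
skew_table <- data.frame(
  n = sizes,
  simulated = sapply(sizes, function(n) skewness(replicate(sims, mean(rexp(n, lambda))))),
  theoretical = 2 / sqrt(sizes)
)
print(skew_table)
```

With n = 1 the "averages" are just raw exponential values and are heavily skewed; by n = 40 most of that asymmetry should be gone, which is what the histogram above suggests.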

Sample Mean Vs Theoretical Mean

Now, let's compare the sample mean and the theoretical mean for this exponential distribution. The theoretical mean is 1/lambda = 5, and the mean of the simulated averages turns out to be very close to it. So close that at times you might not even be able to distinguish between the two vertical lines in the plot below.

lambda = 0.2

Averages = NULL
for (i in 1 : s) Averages = c(Averages, mean(rexp(n, lambda)))

# freq=FALSE puts the histogram on a density scale, matching the y-axis label
hist(Averages, freq=FALSE, breaks=50,
     main="Sample Mean Vs Theoretical Mean",
     xlab="Averages of Exponential Samples",
     ylab="Density of Averages",
     col='light blue')
abline(v=1/lambda, col='red', lwd=3)         # Theoretical mean
abline(v=mean(Averages), col='blue', lwd=3)  # Sample mean

# Theoretical mean

1/lambda
## [1] 5
# Sample mean

mean(Averages)
## [1] 5.01935

Sample Variance Vs Theoretical Variance

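Now, let's compare the sample variance and the theoretical variance of the averages. The theoretical variance of a mean of n draws is the variance of the exponential distribution, (1/lambda)^2, divided by n. Once again, the two values are reasonably close to each other.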

# Theoretical variance of the sample mean

((1/lambda)^2)/n
## [1] 0.625
# Sample variance

var(Averages)
## [1] 0.5351164
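
For reference, the theoretical values used above follow from two standard facts: an exponential distribution with rate lambda has mean 1/lambda and variance (1/lambda)^2, and the mean of n independent draws keeps the same mean while its variance is divided by n. Written out with lambda = 0.2 and n = 40 (the symbols \bar{X}, \mu and \sigma^2 are notation introduced here for illustration, they do not appear in the code):

```latex
\[
\begin{aligned}
E[\bar{X}] &= \mu = \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n} = \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625 \\
\operatorname{sd}(\bar{X}) &= \frac{1/\lambda}{\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.79
\end{aligned}
\]
```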

Normal Distribution and Simulation Results

To see both distributions together, the plot below overlays the density of the simulated averages and the normal density predicted by the CLT.

hist(Averages, freq=FALSE, breaks=50,
     main="Normal Distribution and Simulation Results",
     xlab="Averages of Exponential Samples",
     ylab="Density of Averages",
     col='light blue')

# Density of the simulated sample means - blue dashed line
lines(density(Averages), col="blue", lty=2, lwd=2)

# Theoretical mean - red vertical line
abline(v=1/lambda, col='red', lwd=2)

# Normal density predicted by the CLT: mean 1/lambda, sd (1/lambda)/sqrt(n)
xfit <- seq(min(Averages), max(Averages), length.out=100)
yfit <- dnorm(xfit, mean=1/lambda, sd=(1/lambda/sqrt(n)))
lines(xfit, yfit, col="red", lty=2, lwd=2)

# Legend (colours and line types match the two dashed curves)
legend('topright', c("Simulation", "Theoretical"),
       col=c("blue", "red"), lty=c(2,2), lwd=2)

We see here that the red and blue dashed lines follow more or less the same course, showing how closely the simulated averages match the normal distribution. Below are some food-for-thought topics on the Central Limit Theorem.

  1. The probability distribution of the total distance covered in a random walk (biased or unbiased), that is, the sum of its steps, tends toward a normal distribution as the number of steps grows.

  2. When a large number of coins are flipped, the total number of heads (or, equivalently, the total number of tails) is approximately normally distributed.
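
The coin-flip example can be tried out in a few lines of R. This is only a sketch under assumed settings: the seed, the number of coins per experiment (`flips`) and the number of experiments (`trials`) are illustrative values chosen here. It compares the simulated number of heads with the mean and standard deviation of the normal approximation, and counts how often the result lands within one standard deviation of the mean.

```r
# Sketch: number of heads in many fair coin flips vs. the normal approximation.
# Assumptions: the seed, `flips` and `trials` are illustrative values.
set.seed(1)
flips <- 1000    # coins flipped per experiment
trials <- 5000   # number of experiments
heads <- rbinom(trials, size = flips, prob = 0.5)

# Normal approximation for the total: mean n*p, sd sqrt(n*p*(1-p))
clt_mean <- flips * 0.5
clt_sd <- sqrt(flips * 0.5 * 0.5)

print(c(simulated_mean = mean(heads), clt_mean = clt_mean))
print(c(simulated_sd = sd(heads), clt_sd = clt_sd))

# Share of experiments within one standard deviation of the mean
within_one_sd <- mean(abs(heads - clt_mean) <= clt_sd)
print(within_one_sd)
```

For a normal distribution, roughly 68% of the values fall within one standard deviation of the mean, so `within_one_sd` should come out in that neighbourhood. A histogram of `heads` drawn with `hist()` would show the familiar bell shape.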

There are also many interesting applications of this theorem in fields such as medicine and digital marketing, and in general wherever you need hypothesis tests to draw a conclusion.