Oleksandr Fialko
11/01/2017
The central limit theorem (CLT) states that, for a sufficiently large sample size drawn from a population with finite variance, the distribution of the sample mean is approximately normal, regardless of the shape of the population distribution. The sample means are centered on the population mean, and their variance is approximately equal to the population variance divided by the sample size.
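In symbols, the statement can be sketched as follows. The notation is an assumption introduced here, not taken from the original: \( X_1, \dots, X_n \) are independent draws from the population, \( \mu \) is the population mean, and \( \sigma^2 \) is the (finite) population variance.

\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty,
\]

so that for large \( n \) the sample mean is approximately distributed as \( \mathcal{N}(\mu, \sigma^2/n) \).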
The cartoon is taken from here
Let's flip a biased coin a thousand times, with probability \( 0.2 \) of heads:
sample_size <- 1000
data <- rbinom(n = sample_size, size = 1, prob = 0.2)
The theoretical mean is \( 0.2 \) and the theoretical variance is \( 0.2(1-0.2) = 0.16 \); the sample estimates are close to both:
c(mean(data),var(data))
[1] 0.2050000 0.1631381
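For reference, the values \( 0.2 \) and \( 0.16 \) follow from the standard moments of a single coin flip. A short derivation, writing \( p \) for the probability of heads (a symbol introduced here for illustration) and using the fact that \( X \in \{0, 1\} \) implies \( X^2 = X \):

\[
\mathrm{E}[X] = p, \qquad
\mathrm{Var}(X) = \mathrm{E}[X^2] - \mathrm{E}[X]^2 = p - p^2 = p(1-p).
\]

With \( p = 0.2 \) this gives \( \mathrm{E}[X] = 0.2 \) and \( \mathrm{Var}(X) = 0.2 \times 0.8 = 0.16 \).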
Now let's repeat our experiment many times
num_obs <- 1000
flips <- rbinom(sample_size * num_obs, size = 1, prob = 0.2)
and store the results in a matrix with one experiment per row:
data <- matrix(flips, nrow = num_obs)
Calculate the mean of each experiment (each row of the matrix):
means <- rowMeans(data)
The means should follow a Gaussian distribution with mean \( 0.2 \) and variance \( 0.16/1000 = 1.6 \times 10^{-4} \), which is indeed the case, as the histogram with the overlaid Gaussian density shows.
I have also created a Shiny application that demonstrates the CLT using other distributions.
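sig <- 1.6e-4                        # theoretical variance of the means, 0.16/1000
x <- seq(0.15, 0.25, 0.001)
y <- exp(-(x - 0.2)^2 / sig / 2)     # unnormalized Gaussian density centered at 0.2
y <- y / sum(y) / 0.001              # normalize numerically so the density integrates to 1
hist(means, freq = FALSE)
lines(x, y, col = 'red', lwd = 2)    # overlay the theoretical density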
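Besides the visual comparison, the claim can also be checked numerically. The following is a minimal, self-contained sketch rather than part of the original analysis: it re-runs the simulation with a fixed seed (the seed value and the names `p`, `sim_means`, `theory_var`, and `within_2sd` are assumptions chosen for illustration) and compares the empirical mean and variance of the sample means with the values the CLT suggests.

```r
# Re-simulate the experiment with a fixed seed so the check is reproducible.
# The seed and all variable names here are illustrative assumptions.
set.seed(42)
p <- 0.2              # probability of heads
sample_size <- 1000   # flips per experiment
num_obs <- 1000       # number of repeated experiments

flips <- rbinom(sample_size * num_obs, size = 1, prob = p)
sim_means <- rowMeans(matrix(flips, nrow = num_obs))

theory_mean <- p
theory_var <- p * (1 - p) / sample_size   # 0.16 / 1000

# Compare empirical and theoretical moments of the sample means
print(c(empirical = mean(sim_means), theoretical = theory_mean))
print(c(empirical = var(sim_means), theoretical = theory_var))

# Fraction of sample means within two standard deviations of p;
# for a Gaussian this fraction is roughly 0.95
within_2sd <- mean(abs(sim_means - p) < 2 * sqrt(theory_var))
print(within_2sd)
```

If the Gaussian approximation holds, the empirical mean and variance should land close to \( 0.2 \) and \( 1.6 \times 10^{-4} \), and `within_2sd` should be near 0.95. Exact figures depend on the seed and the random number generator, so none are quoted here.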