The CLT and the normal distribution

The Central Limit Theorem (CLT) states that, under certain (fairly loose) conditions, the mean of a sample of random observations will be asymptotically normally distributed regardless of the distribution governing the individual observations. Asymptotically in the sense that as the number of observations increases, the distribution of the sample mean becomes increasingly normal. Below we'll examine this theorem by exploring random samples drawn from the exponential distribution, establish the degree to which the resulting sample means approximate a normal distribution by examining their empirical moments, and briefly look at the exceptions to those "fairly loose" conditions.
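Stated slightly more formally (this is the classical version of the theorem, for independent, identically distributed observations \(X_1,\dots,X_n\) with mean \(\mu\) and finite variance \(\sigma^2\)):

\[ \sqrt{n}\,\left(\bar{X}_n - \mu\right) \;\longrightarrow\; N(0,\sigma^2) \quad \text{in distribution as } n \to \infty, \]

so for large \(n\) the sample mean \(\bar{X}_n\) behaves approximately like a \(N(\mu,\sigma^2/n)\) random variable.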

Generating samples of increasing size is a good starting point: if the CLT does indeed hold (and we hope it does, there's a proof of it), then we should see the distributions of the sample mean becoming more normal as we go.

Just to note at the outset, the exponential distribution is certainly not normal:

# histogram of 1000 standard normal draws, with the exponential density overlaid
hist(rnorm(1000),freq=FALSE,col="grey")
curve(dexp,add=TRUE)

– there's stuff to the left of zero in the histogram above (the normal random variables), while our theoretical exponential curve lives only on the positive side.

So, starting at n = 2 and increasing up to n = 1000 samples from the exponential distribution, we collect 100 sets of such random samples for each n and generate a series of the four moments (mean, variance, skew, and kurtosis; i.e. the moments of the sample mean) as a function of sample size. We can see below the sample mean tighten up around the population mean. How do we know the population mean? Well, we know we're pulling from the theoretical exponential distribution, and the mean of the exponential distribution is exactly \(\lambda^{-1}\) (because math), so with rate \(\lambda = 0.2\) that's 5. Let's plot the theoretical distributions given the empirical sample moments:

library("PearsonDS")


# for each sample size n, draw 100 samples of size n from Exp(rate = 0.2),
# record the mean of each, and store the empirical moments of those 100 means
resultsMEAN<-NULL
for (n in 2:1000) {
    means<-NULL
    for(i in 1:100) means <- c(means,mean(rexp(n,0.2)))
    resultsMEAN<-rbind(resultsMEAN,empMoments(means))
}

# colour ramp: light blue for small n, dark blue for large n
# (one colour per sample size, so 999 of them rather than 50)
colfunc <- colorRampPalette(c("lightblue", "darkblue"))
cols<-colfunc(999)

# Pearson densities matched to the empirical moments of the sample mean at each n
plot(function(x) dpearson(x,moments=resultsMEAN[1,]),-10,10,ylim=c(0,0.8),col=cols[1],
     main="convergence to population mean",ylab="density")
abline(v=1/0.2)  # population mean of Exp(0.2)
for (m in 2:999) {
   plot(function(x) dpearson(x,moments=resultsMEAN[m,]),-50,50,add=TRUE,col=cols[m])
}

The effect of the increasing sample size (increasing as the color darkens) is visible in the variance of the sample mean through the "squeezing" of the distributions about the population mean (the black vertical line). It looks like we're unbiased (the mean of our sample means is centered on the population mean), and we home in on the population value with increasing sample size. These are all good things that let us draw inferences using stats. Hooray.

We’ll be extra happy if this is true for the second moment (variance) as well:
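The code that builds the variance results isn't shown above; below is a minimal sketch of what it presumably looks like, mirroring the mean experiment. The name resultsVAR is chosen only to match the plots further down, and the plotting ranges are guesses:

# sketch (not from the original post): for each sample size n, draw 100 samples
# of size n from Exp(rate = 0.2), record the sample variance of each, and store
# the empirical moments of those 100 variances
resultsVAR<-NULL
for (n in 2:1000) {
    vars<-NULL
    for(i in 1:100) vars <- c(vars,var(rexp(n,0.2)))
    resultsVAR<-rbind(resultsVAR,empMoments(vars))
}

# Pearson densities matched to the moments of the sample variance, reusing the
# colour ramp from above (light = small n, dark = large n)
plot(function(x) dpearson(x,moments=resultsVAR[1,]),-10,150,ylim=c(0,0.2),col=cols[1],
     main="convergence to population variance",ylab="density")
abline(v=1/0.2^2)  # population variance of Exp(0.2) is 1/lambda^2 = 25
for (m in 2:999) {
   plot(function(x) dpearson(x,moments=resultsVAR[m,]),-10,150,add=TRUE,col=cols[m])
}

The vertical line sits at the population variance of the exponential, \(\lambda^{-2} = 1/0.2^2 = 25\).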

And hey, it is. Life is good. If you look closely, though, you'll see it isn't quite as good as in the earlier case: even by n = 1000 the distributions are still a little squat (the variance of the sample variance at the end is still close to 5).

Naturally, you may be wondering whether these approximations are actually becoming normal. That's fair. The distributions look like they're becoming fairly normal, bell-like curves, but how do we know?

A casual check is to see whether the histograms appear to look more normal as the sample size increases (here just looking at the sample means).

par(mfrow=c(1,4))
# histograms of 100 sample means at a few increasing sample sizes
for (n in c(10, 25, 50, 100)) {
    means<-NULL
    for(i in 1:100) means <- c(means,mean(rexp(n,0.2)))
    hist(means,main=paste0("n=",n))
}

Perhaps these look increasingly normal. It’s frankly difficult to tell.
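Before turning to numbers, one quick visual supplement (not part of the moment analysis that follows, just a sanity check) is a normal Q-Q plot of the sample means: the closer the points hug the reference line, the more plausibly normal the means are. The sample sizes here are just picked for illustration:

par(mfrow=c(1,2))
# Q-Q plots of 100 sample means at a small and a larger sample size
means10 <- replicate(100, mean(rexp(10, 0.2)))
qqnorm(means10, main = "n=10"); qqline(means10)
means100 <- replicate(100, mean(rexp(100, 0.2)))
qqnorm(means100, main = "n=100"); qqline(means100)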

We can look at some hard numbers instead. The normal distribution has a skew (the standardized third moment) of 0 and a kurtosis (the standardized fourth moment) of 3, so we can check whether our empirical moments approach those values. The moments of both the sample mean and the sample variance converge towards them nicely:

library(zoo)
par(mfrow=c(1,2))
# 30-point moving averages of the skew and kurtosis of the sample mean
# (columns 3 and 4 of resultsMEAN) across the first 100 sample sizes
plot(rollmean(resultsMEAN[1:100,3],30),type="l",col="darkblue",ylab="skew",xlab="Sample Size",
     main = "30MA Of sample mean Skew")
plot(rollmean(resultsMEAN[1:100,4],30),type="l",col="darkblue",ylab="kurtosis",xlab="Sample Size",
     main = "30MA Of sample mean kurtosis")

Plotting skew against kurtosis gives a nice picture of the convergence of the sample distributions through skew/kurtosis space.

colfunc <- colorRampPalette(c("lightblue", "darkblue"))
cols<-colfunc(1000)
par(mfrow=c(1,1))
plot(rollmean(resultsMEAN[1:999,3],30),rollmean(resultsMEAN[1:999,4],30),col=cols,xlab="Skew",
     ylab="kurtosis",main="Convergence of Sample mean to normal distribution (0,3)",xlim=c(0,1))
points(0,3,type="p",pch="X",col="black")

The sample variance is slower to converge (anyone know why? Presumably because it's effectively an average of squared deviations, and squaring draws from an already skewed distribution makes the summands much more heavy-tailed):

par(mfrow=c(1,2))
plot(rollmean(resultsVAR[1:999,3],30),type="l",col="darkblue",ylab="skew",xlab="Sample Size",
     main = "30MA Of sample variance skew")
plot(rollmean(resultsVAR[1:999,4],30),type="l",col="darkblue",ylab="kurtosis",xlab="Sample Size",
     main = "30MA Of sample variance kurtosis")

colfunc <- colorRampPalette(c("lightblue", "darkblue"))
cols<-colfunc(1000)
par(mfrow=c(1,1))
plot(rollmean(resultsVAR[1:999,3],30),rollmean(resultsVAR[1:999,4],30),col=cols,xlab="Skew",
     ylab="kurtosis",main="Convergence of Sample variance to normal distribution (0,3)",xlim=c(0,3))
points(0,3,type="p",pch="X",col="black")

Both converge on the normal distribution's spot at (skew, kurtosis) = (0, 3) eventually.
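As one more sanity check (a supplement to the moment plots, not something done above), a Shapiro-Wilk test on a fresh batch of 100 sample means at n = 1000 should fail to reject normality most of the time:

# Shapiro-Wilk normality test on 100 sample means of samples of size 1000
shapiro.test(replicate(100, mean(rexp(1000, 0.2))))

A large p-value here just means the test can't tell these means apart from a normal sample, which is consistent with the convergence seen above.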

Other stable distributions

We said earlier that the sample mean converges, by the CLT, to the normal distribution under certain conditions. As it turns out, the normal is just one (very special) member of a family of attractor distributions, the Lévy alpha-stable distributions. The normal's spot in skew/kurtosis space is unique, and with minor alterations to our experiment we can map out the broader family of Lévy-stable attractors in the (skew, kurtosis) plane.

A Lévy alpha-stable distribution, with parameters \(\alpha\in(0,2]\), \(\beta\in[-1,1]\), \(\gamma>0\), and \(\delta\in\mathbb{R}\), is defined by its characteristic function:

\[ \varphi(u;\alpha,\beta,\gamma,\delta) = \begin{cases} \exp\left\{iu\delta-\gamma^\alpha|u|^\alpha\left[1+i\beta\,\mathrm{sgn}(u)\tan\!\left(\tfrac{\pi\alpha}{2}\right)\left(|\gamma u|^{1-\alpha}-1\right)\right]\right\} & \text{for } \alpha\neq 1 \\ \exp\left\{iu\delta-\gamma|u|\left[1+i\beta\,\mathrm{sgn}(u)\tfrac{2}{\pi}\ln\left(\gamma|u|\right)\right]\right\} & \text{for } \alpha=1 \end{cases} \]

(see http://mathworld.wolfram.com/StableDistribution.html)

in which our familiar distributions correspond to specific values of \(\alpha\) and \(\beta\) (the normal is \(\alpha=2\); the Cauchy is \(\alpha=1\), \(\beta=0\)). Leaving \(\gamma=1\) and \(\delta=0\), we can explore distributions across interesting levels of \(\alpha\) and \(\beta\), taking the intuition from our earlier experiment that by n = 1000 samples our distribution had converged.

library(stabledist)

# 100 sample means (n = 1000 draws each) from stable distributions across a grid of
# alpha in (0,2] and beta in [-1,1]; store the empirical moments of each set of means
resultsMEAN<-NULL
for (beta in 10:30) {
    for (alpha in 1:20){
        means<-NULL
        for(i in 1:100) means <- c(means,mean(rstable(1000,alpha/10,(beta/10)-2)))
        resultsMEAN<-rbind(resultsMEAN,empMoments(means))
    }
}

colfunc <- colorRampPalette(c("lightblue", "darkblue"))
cols<-colfunc(nrow(resultsMEAN))
plot(resultsMEAN[,3],resultsMEAN[,4],col=cols,xlab="Skew",
     ylab="kurtosis",main="Stable distributions Skew and Kurtosis")

We see the stable distributions extending out into leptokurtic space with increasingly large skews of either sign: basically the distributions just grow a really fat tail stretching out into negative or positive territory. Nifty. Conceptually, I think we're eventually constrained on kurtosis by the length of the simulation: the sample kurtosis of n points can't exceed roughly n, so with only 100 simulated means per parameter setting the empirical kurtosis tops out around 100.
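As a concrete illustration of those "exceptions to the fairly loose conditions" mentioned at the outset: the exponential has a finite variance, so its sample mean obeys the CLT, but the Cauchy (the stable distribution with \(\alpha=1\), \(\beta=0\)) doesn't even have a mean, and the average of n Cauchy draws is itself Cauchy, so it never settles down. A quick sketch (not part of the analysis above, just an illustration):

par(mfrow=c(1,2))
# running mean of exponential draws: settles on 1/0.2 = 5 as n grows
plot(cumsum(rexp(10000, 0.2)) / (1:10000), type = "l", col = "darkblue",
     xlab = "n", ylab = "running mean", main = "Exponential: converges")
abline(h = 5, col = "grey")
# running mean of Cauchy draws: keeps jumping around and never converges
plot(cumsum(rcauchy(10000)) / (1:10000), type = "l", col = "darkblue",
     xlab = "n", ylab = "running mean", main = "Cauchy: does not converge")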