Overview

The central limit theorem establish that:

“in many situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a bell curve) even if the original variables themselves are not normally distributed.”

For this analysis we will use the exponential distribution in order to analyse this theorem.

Simulations

For this analysis we will take 40 samples and calculate the mean, this process will be repeated 1000 times fater this we will have a set of sampling means that will do as the basis of the simulation.

We will use lambda = 2 for the exponential distribution, we use rexp to get the random points of the exponential distribution.

# We set the data that we will need 
nSamples<-40
nMeans<-1000
lambda<-0.2

samplings<-rexp(nSamples*nMeans,lambda)

arrayOfMeans<-tapply(samplings,rep(1:nMeans, rep(nSamples, nMeans)), FUN=mean)

Sample Mean versus Theoretical Mean

We will get the theoretical and sample mean, I will use the means to demonstrate te Central Limit Theorem.

theoreticalMean<-1/lambda

sampleMean<-mean(arrayOfMeans)

So, the means are:

Theoretical mean = 5
Sample mean = 4.9881671

Sample Variance versus Theoretical Variance

I will use the sample variance and mean sample variance in order to compare the theoretical and the sample variance.

In the case of the exponential distribution the variance is 1/lambda^2 and the mean sample variance is variance/n where n = number of samples to calculate the mean n this case 40.

theoreticalVariance<-1/(lambda^2)

sampleVariance<-var(samplings)

theoreticalMeanVariance<-(theoreticalVariance)/40

meanVariance<-var(arrayOfMeans)

Theoretical variance = 25
Sample variance = 24.809946

Theoretical mean variance = 0.625
Sample mean variance = 0.6343142

Distribution

To demonstrate that the distribution is approximately normal I will compare the quantiles of the distribution and thos of a normal distribution with mean=5 and variance= 0.625.

quantiles<-quantile(arrayOfMeans,c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9))
aux<-qnorm(c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9),theoreticalMean,sqrt(theoreticalMeanVariance))
total<-rbind(quantiles,aux)
xtableMaxInterval<-xtable(total)
print(xtableMaxInterval,type="html")
10% 20% 30% 40% 50% 60% 70% 80% 90%
quantiles 4.00 4.37 4.58 4.75 4.94 5.11 5.36 5.63 6.03
aux 3.99 4.33 4.59 4.80 5.00 5.20 5.41 5.67 6.01

Now I will graph the histogram of the distribution and a normal distribution with same variance and mean.

df<-as.data.frame(arrayOfMeans)
ggplot(df, aes(arrayOfMeans))+
  geom_histogram(aes(y=..density..), fill="aliceBlue", col="azure4", bins = 100)+
  geom_density(colour="darkcyan", size=1)+
  stat_function(fun = dnorm, colour = "blue", args = list(mean = theoreticalMean, sd = theoreticalMeanVariance))