The central limit theorem establish that:
“in many situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a bell curve) even if the original variables themselves are not normally distributed.”
For this analysis we will use the exponential distribution in order to analyse this theorem.
For this analysis we will take 40 samples and calculate the mean, this process will be repeated 1000 times fater this we will have a set of sampling means that will do as the basis of the simulation.
We will use lambda = 2 for the exponential distribution, we use rexp to get the random points of the exponential distribution.
# We set the data that we will need
nSamples<-40
nMeans<-1000
lambda<-0.2
samplings<-rexp(nSamples*nMeans,lambda)
arrayOfMeans<-tapply(samplings,rep(1:nMeans, rep(nSamples, nMeans)), FUN=mean)
We will get the theoretical and sample mean, I will use the means to demonstrate te Central Limit Theorem.
theoreticalMean<-1/lambda
sampleMean<-mean(arrayOfMeans)
So, the means are:
Theoretical mean = 5
Sample mean = 4.9881671
I will use the sample variance and mean sample variance in order to compare the theoretical and the sample variance.
In the case of the exponential distribution the variance is 1/lambda^2 and the mean sample variance is variance/n where n = number of samples to calculate the mean n this case 40.
theoreticalVariance<-1/(lambda^2)
sampleVariance<-var(samplings)
theoreticalMeanVariance<-(theoreticalVariance)/40
meanVariance<-var(arrayOfMeans)
Theoretical variance = 25
Sample variance = 24.809946
Theoretical mean variance = 0.625
Sample mean variance = 0.6343142
To demonstrate that the distribution is approximately normal I will compare the quantiles of the distribution and thos of a normal distribution with mean=5 and variance= 0.625.
quantiles<-quantile(arrayOfMeans,c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9))
aux<-qnorm(c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9),theoreticalMean,sqrt(theoreticalMeanVariance))
total<-rbind(quantiles,aux)
xtableMaxInterval<-xtable(total)
print(xtableMaxInterval,type="html")
| 10% | 20% | 30% | 40% | 50% | 60% | 70% | 80% | 90% | |
|---|---|---|---|---|---|---|---|---|---|
| quantiles | 4.00 | 4.37 | 4.58 | 4.75 | 4.94 | 5.11 | 5.36 | 5.63 | 6.03 |
| aux | 3.99 | 4.33 | 4.59 | 4.80 | 5.00 | 5.20 | 5.41 | 5.67 | 6.01 |
Now I will graph the histogram of the distribution and a normal distribution with same variance and mean.
df<-as.data.frame(arrayOfMeans)
ggplot(df, aes(arrayOfMeans))+
geom_histogram(aes(y=..density..), fill="aliceBlue", col="azure4", bins = 100)+
geom_density(colour="darkcyan", size=1)+
stat_function(fun = dnorm, colour = "blue", args = list(mean = theoreticalMean, sd = theoreticalMeanVariance))