In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions. https://en.wikipedia.org/wiki/Central_limit_theorem
To examine the central limit theorem applied to the exponential distribution, let’s plot a histogram of N = 1000 sample means, each computed from M = 40 draws from the exponential distribution with rate lambda = 2.
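set.seed(1)
N <- 1000          # number of sample means to simulate
M <- 40            # number of exponential draws averaged per mean
lambda <- 2        # rate parameter of the exponential distribution
mns <- numeric(N)  # preallocate storage for the N sample means
for (i in 1:N) mns[i] <- mean(rexp(M, rate = lambda))
p1 <- hist(mns, breaks = N / 50, col = rgb(0, 0, 1, 1/4),
           xlab = 'Value', main = 'Histogram of sample means from the exponential distribution')
abline(v = 1/lambda, col = 'blue', lwd = 4)  # theoretical mean 1/lambda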
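The loop above can also be written in vectorized form. The sketch below is an alternative, not the author's code: it draws all N * M values at once and arranges them in a matrix with `byrow = TRUE`, so each row holds M consecutive draws and `rowMeans` gives one sample mean per row. The names `draws` and `mns_vec` are illustrative.

```r
# Vectorized alternative to the for loop (illustrative sketch).
set.seed(1)
N <- 1000      # number of sample means
M <- 40        # draws per sample mean
lambda <- 2    # exponential rate

# N x M matrix; with byrow = TRUE, row i holds M consecutive draws
draws <- matrix(rexp(N * M, rate = lambda), nrow = N, ncol = M, byrow = TRUE)
mns_vec <- rowMeans(draws)  # one sample mean per row

length(mns_vec)
summary(mns_vec)
```

`replicate(N, mean(rexp(M, rate = lambda)))` is another common idiom for the same simulation.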
The plot above is a histogram of the 1000 simulated sample means, each the average of M = 40 draws from the exponential distribution with lambda = 2. The blue vertical line marks x = 0.5, the theoretical mean 1/lambda of this distribution. The histogram is visibly centered near this theoretical value.
sample_mean <- mean(mns)
sample_mean
## [1] 0.4990025
theor_mean = 1/lambda
print(paste('Theoretical mean is', theor_mean))
## [1] "Theoretical mean is 0.5"
The central limit theorem (CLT) implies that the distribution of the sample mean is approximately normal and centered at the mean of the original distribution. The theoretical mean of the exponential distribution is 1/lambda = 0.5, which is very close to the simulated mean of 0.4990 computed above.
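Formally, the CLT describes the standardized sample mean z = (mean - mu) / (sigma / sqrt(M)), which should be approximately standard normal. For the exponential distribution, mu = sigma = 1/lambda. The minimal sketch below regenerates the same kind of simulation so that it runs on its own (the names `mu`, `sigma` and `z` are illustrative) and checks that the standardized means have mean near 0 and standard deviation near 1.

```r
set.seed(1)
N <- 1000
M <- 40
lambda <- 2
mns <- replicate(N, mean(rexp(M, rate = lambda)))

mu <- 1 / lambda     # mean of Exp(lambda)
sigma <- 1 / lambda  # standard deviation of Exp(lambda)

# Standardize each sample mean: subtract mu, divide by the standard error
z <- (mns - mu) / (sigma / sqrt(M))
c(mean = mean(z), sd = sd(z))
```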
The variance of the sample mean is 1/(lambda^2 * M): the variance of the exponential distribution is 1/lambda^2, and averaging M independent draws divides the variance by M. Here lambda > 0 is the only parameter of the exponential distribution, often called the rate parameter, and M is the number of draws averaged in each sample mean.
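The arithmetic behind the theoretical value can be spelled out directly. In this sketch, `pop_var` and `mean_var` are illustrative names, and the Monte Carlo sample size of 1e5 is an arbitrary choice used only to confirm that a single exponential draw has variance close to 1/lambda^2.

```r
lambda <- 2
M <- 40

pop_var  <- 1 / lambda^2   # variance of a single Exp(lambda) draw: 1/4
mean_var <- pop_var / M    # variance of the mean of M independent draws: 1/160

# Monte Carlo check of the single-draw variance (illustrative sample size)
set.seed(1)
mc_pop_var <- var(rexp(1e5, rate = lambda))

c(pop_var = pop_var, mean_var = mean_var, mc_pop_var = mc_pop_var)
```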
variance = var(mns)
print(paste('Sample variance =', variance))
## [1] "Sample variance = 0.00611116466559575"
print(paste('Theoretical variance is', 1/(lambda^2 * M)))
## [1] "Theoretical variance is 0.00625"
As we can see, the sample variance is quite close to its theoretical value.
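The formula also predicts how the variance shrinks as M grows: the ratio of the simulated variance to 1/(lambda^2 * M) should stay near 1 for every M. The sketch below checks this for a few sample sizes. The values of `Ms` and the larger `N = 2000` are assumptions chosen for illustration, not taken from the original analysis.

```r
set.seed(1)
N <- 2000
lambda <- 2
Ms <- c(10, 40, 160)  # illustrative sample sizes

# For each M, simulate N sample means and compare their variance
# with the theoretical value 1 / (lambda^2 * M)
res <- t(sapply(Ms, function(M) {
  mns <- replicate(N, mean(rexp(M, rate = lambda)))
  c(M = M, sample_var = var(mns), theor_var = 1 / (lambda^2 * M))
}))
res <- cbind(res, ratio = res[, "sample_var"] / res[, "theor_var"])
print(res)
```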
t.test(mns, mu = theor_mean)
##
## One Sample t-test
##
## data: mns
## t = -0.4035, df = 999, p-value = 0.6867
## alternative hypothesis: true mean is not equal to 0.5
## 95 percent confidence interval:
## 0.4941515 0.5038536
## sample estimates:
## mean of x
## 0.4990025
So, the theoretical mean 0.5 lies inside the 95% confidence interval for the mean (0.4942 to 0.5039), and the p-value of 0.6867 is far above the usual 0.05 threshold. We therefore cannot reject the null hypothesis that the true mean of the sample means equals the theoretical mean 1/lambda = 0.5 of the exponential distribution.
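To make the test less of a black box, the one-sample t statistic, p-value and confidence interval can be computed by hand: t = (mean - mu0) / (sd / sqrt(N)) with N - 1 degrees of freedom. The sketch below regenerates the simulation so it runs on its own (the names `mu0`, `se`, `t_stat` and `ci` are illustrative) and compares the manual values with `t.test`.

```r
set.seed(1)
N <- 1000
M <- 40
lambda <- 2
mns <- replicate(N, mean(rexp(M, rate = lambda)))
mu0 <- 1 / lambda  # hypothesized mean

# One-sample t statistic: distance of the sample mean from mu0
# in units of the estimated standard error
se <- sd(mns) / sqrt(N)
t_stat <- (mean(mns) - mu0) / se
df <- N - 1

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df)
ci <- mean(mns) + c(-1, 1) * qt(0.975, df) * se

tt <- t.test(mns, mu = mu0)  # built-in test for comparison
c(t = t_stat, p = p_value)
ci
```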
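x <- p1$mids
xs <- seq(min(x), max(x), length.out = 200)  # fine grid for a smooth normal curve
ys <- dnorm(xs, mean = theor_mean, sd = sqrt(1/(lambda^2 * M)))  # theoretical normal density
plot(x, p1$density, ylim = c(0, max(p1$density, ys)), col = 'red', xlab = 'Sample mean', ylab = 'Density',
     main = 'Sample mean distribution density and the normal one')
lines(xs, ys, col = 'blue')
legend('topright', c("Experimental distribution", "Normal distribution"),
       col = c("red", "blue"), pch = c(1, NA), lty = c(NA, 1), cex = 0.8)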
The figure above compares the density of the simulated sample means (red) with the normal density that has the theoretical mean 1/lambda and standard deviation sqrt(1/(lambda^2 * M)) (blue). The normal distribution approximates the experimental one quite well.
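A normal Q-Q plot is another common visual check, and a few quantiles can also be compared numerically. The sketch below regenerates the simulation so it runs on its own. The chosen probabilities and the names `probs` and `cmp` are illustrative. Because the mean of M exponential draws keeps some right skew for finite M, small departures in the tails are expected.

```r
set.seed(1)
N <- 1000
M <- 40
lambda <- 2
mns <- replicate(N, mean(rexp(M, rate = lambda)))

theor_mean <- 1 / lambda
theor_sd <- sqrt(1 / (lambda^2 * M))

# Normal Q-Q plot: points close to the line indicate approximate normality
qqnorm(mns, main = "Normal Q-Q plot of the sample means")
qqline(mns, col = "blue", lwd = 2)

# Compare a few empirical quantiles with those of the theoretical normal
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
cmp <- cbind(empirical = quantile(mns, probs),
             normal = qnorm(probs, mean = theor_mean, sd = theor_sd))
print(round(cmp, 4))
```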