The project goal: investigate the exponential distribution in R and compare it with the Central Limit Theorem.

In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions. https://en.wikipedia.org/wiki/Central_limit_theorem

To examine the central limit theorem applied to the exponential distribution, let’s plot a histogram of N = 1000 sample means, each computed from M = 40 values drawn from the exponential distribution.

For the mean of M independent samples drawn from the exponential distribution (the experimental distribution), the expected value is 1/lambda and the variance is 1/(lambda^2 * M), because a single exponential variable has mean 1/lambda and variance 1/lambda^2. The central limit theorem (CLT) adds that this mean is approximately normally distributed when M is large. Here lambda > 0 is the only parameter of the exponential distribution, often called the rate parameter, and M is the number of samples averaged.
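The mean and variance quoted above follow from linearity of expectation and from independence of the draws. A short derivation, writing X_1, ..., X_M for the individual draws and \bar{X}_M for their mean (notation introduced here for illustration only):

```latex
\begin{aligned}
\mathbb{E}[X_i] &= \frac{1}{\lambda}, \qquad \operatorname{Var}(X_i) = \frac{1}{\lambda^2}, \\
\bar{X}_M &= \frac{1}{M}\sum_{i=1}^{M} X_i, \\
\mathbb{E}[\bar{X}_M] &= \frac{1}{M}\sum_{i=1}^{M}\mathbb{E}[X_i] = \frac{1}{\lambda}, \\
\operatorname{Var}(\bar{X}_M) &= \frac{1}{M^2}\sum_{i=1}^{M}\operatorname{Var}(X_i) = \frac{1}{\lambda^2 M}.
\end{aligned}
```

With lambda = 2 and M = 40, the values used in the simulation, this gives a mean of 0.5 and a variance of 1/160 = 0.00625.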

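set.seed(1)                  # make the simulation reproducible
N <- 1000                    # number of simulated sample means
M <- 40                      # number of exponential draws per mean
lambda <- 2                  # rate parameter
mns <- numeric(N)            # preallocate the vector of sample means
for (i in seq_len(N)) mns[i] <- mean(rexp(M, rate = lambda))
p1 <- hist(mns, breaks = N / 50, col = rgb(0, 0, 1, 1/4),
           xlab = 'Value', main = 'Histogram of sample means from the exponential distribution')
abline(v = 1/lambda, col = 'blue', lwd = 4)  # theoretical mean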
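The simulation loop can also be written without explicit indexing. The following is a minimal sketch, assuming the same seed and parameters as above; the names `mns_loop` and `mns_rep` are introduced here only for the comparison. It uses base R's `replicate()`, which evaluates the expression N times in order, so the draws should come out in the same sequence as in the loop.

```r
# Sketch: loop-based and replicate()-based simulation of the sample means.
N <- 1000
M <- 40
lambda <- 2

# Loop version, as in the report
set.seed(1)
mns_loop <- numeric(N)
for (i in seq_len(N)) mns_loop[i] <- mean(rexp(M, rate = lambda))

# replicate() version: the same expression evaluated N times
set.seed(1)
mns_rep <- replicate(N, mean(rexp(M, rate = lambda)))

# Both vectors are expected to agree because the seed and call order match
stopifnot(isTRUE(all.equal(mns_loop, mns_rep)))
```

Either form can feed the histogram; `replicate()` simply removes the need to preallocate and index the result vector.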

The mean of the distribution.

The plot above is a histogram of 1000 sample means, each averaged over 40 draws from the exponential distribution with lambda = 2. The blue vertical line marks X = 0.5, the theoretical mean 1/lambda of that distribution. The histogram is centered close to this value.

Now let’s calculate the sample mean and compare it with the theoretical value:

sample_mean <- mean(mns)  # avoid reusing the name of the mean() function
sample_mean

## [1] 0.4990025

theor_mean = 1/lambda
print(paste('Theoretical mean is', theor_mean))

## [1] "Theoretical mean is 0.5"

The expected value of the sample mean equals the mean of the original distribution, and by the law of large numbers the average of many such means settles near it. The theoretical mean of the exponential distribution is 1/lambda = 0.5, which is quite close to the calculated sample mean of 0.4990025.

The variance of the distribution.

The variance of the mean of M independent samples is 1/(lambda^2 * M), because the variance of the exponential distribution is 1/lambda^2 and averaging M independent samples divides the variance by M. Here lambda > 0 is the only parameter of the exponential distribution, often called the rate parameter, and M is the number of samples averaged.

Let’s calculate its value and compare it with the theoretical value:

variance = var(mns)
print(paste('Sample variance =', variance))

## [1] "Sample variance = 0.00611116466559575"

print(paste('Theoretical variance is', 1/(lambda^2 * M)))

## [1] "Theoretical variance is 0.00625"

As we can see, the sample variance is quite close to its theoretical value.
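The 1/M scaling of the variance can be checked for more than one sample size. The sketch below is illustrative only: the sample sizes in `sizes` and the table name `var_table` are hypothetical choices that do not appear in the original analysis, and the seed is assumed to match the main simulation.

```r
# Sketch: variance of the sample mean for several sample sizes M.
set.seed(1)                      # assumed seed, same as the main simulation
N <- 1000                        # number of simulated means per sample size
lambda <- 2                      # rate parameter
sizes <- c(10, 40, 160)          # hypothetical sample sizes, for illustration

var_table <- data.frame(
  M = sizes,
  # empirical variance of N simulated means for each sample size
  sample_var = sapply(sizes, function(M) {
    var(replicate(N, mean(rexp(M, rate = lambda))))
  }),
  # theoretical variance of the mean: 1 / (lambda^2 * M)
  theor_var = 1 / (lambda^2 * sizes)
)
var_table$ratio <- var_table$sample_var / var_table$theor_var
print(var_table)
```

If the theory holds, the `ratio` column should stay near 1 for every M, while both variance columns shrink roughly fourfold each time M is quadrupled.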

t-test for the mean and its confidence interval.

t.test(mns, mu = theor_mean)

## 
##  One Sample t-test
## 
## data:  mns
## t = -0.4035, df = 999, p-value = 0.6867
## alternative hypothesis: true mean is not equal to 0.5
## 95 percent confidence interval:
##  0.4941515 0.5038536
## sample estimates:
## mean of x 
## 0.4990025

So the theoretical mean lies inside the 95 percent confidence interval of the sample mean. Also, the p-value of the test (0.6867) is far above the usual 0.05 threshold, so we cannot reject the hypothesis that the mean of the simulated sample means equals the theoretical mean of the exponential distribution, 0.5.
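To make clear what the test computes, the statistic, p-value and confidence interval can be rebuilt by hand from the standard one-sample t formulas. This is a sketch under the assumption that the sample means are regenerated with the same seed and parameters; the names `se`, `t_stat`, `p_value`, `ci` and `tt` are introduced here for illustration.

```r
# Sketch: one-sample t-test quantities computed by hand.
set.seed(1)
N <- 1000
M <- 40
lambda <- 2
mns <- replicate(N, mean(rexp(M, rate = lambda)))
theor_mean <- 1 / lambda

se <- sd(mns) / sqrt(N)                          # standard error of the mean
t_stat <- (mean(mns) - theor_mean) / se          # t statistic
p_value <- 2 * pt(-abs(t_stat), df = N - 1)      # two-sided p-value
ci <- mean(mns) + c(-1, 1) * qt(0.975, df = N - 1) * se  # 95% interval

# Cross-check against the built-in test
tt <- t.test(mns, mu = theor_mean)
stopifnot(isTRUE(all.equal(t_stat, unname(tt$statistic))))
stopifnot(isTRUE(all.equal(p_value, tt$p.value)))
stopifnot(isTRUE(all.equal(ci, as.numeric(tt$conf.int))))
```

The interval is narrow because the standard error divides by the square root of N = 1000, not because a single mean of 40 draws is that precise.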

Comparison of the density of the sample means of the exponential distribution with the normal density.

x <- p1$mids  # midpoints of the histogram bins
plot(x, p1$density, main = 'Sample mean distribution density and the normal one', xlab = '', ylab = '', col = 'red')
lines(x, dnorm(x, mean = theor_mean, sd = sqrt(1/(lambda^2 * M))), col = 'blue')
legend(0.575, 5, c("Experimental distribution", "Normal distribution"), col = c("red", "blue"), pch = c(1, NA), lty = c(NA, 1), cex = 0.8)

The figure above compares the density of the experimental sample means of the exponential distribution (red) with the normal density that has the theoretical mean and standard deviation (blue). The normal distribution approximates the experimental one quite well.
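A normal Q-Q plot is another common way to judge this kind of agreement. It is not part of the original analysis; the sketch below is a supplementary check that assumes the same seed and parameters, with `z` introduced here as the name for the standardized sample means.

```r
# Sketch: normal Q-Q plot of the standardized sample means.
set.seed(1)
N <- 1000
M <- 40
lambda <- 2
mns <- replicate(N, mean(rexp(M, rate = lambda)))

# Standardize with the theoretical mean and standard deviation of the mean
z <- (mns - 1 / lambda) / sqrt(1 / (lambda^2 * M))

qqnorm(z, main = 'Normal Q-Q plot of standardized sample means')
abline(0, 1, col = 'blue')   # reference line: perfect agreement with N(0, 1)
```

Points lying close to the blue line indicate agreement with the standard normal distribution. Because the exponential distribution is right-skewed, some departure in the upper tail would not be surprising at M = 40.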

Coursera Statistical Inference Course Project - part 1. A simulation exercise.

Andrei Keino