The main aim of this project is to understand the Central Limit Theorem using the Exponential Distribution as case test.

Started by using a known seed (a pseudo-random number) to be able to reproduce the distribution in study.

set.seed(1000)

Then plotted a single simulation of 40 random exponential distributions to understand the behavior of the function x~Exp(1/0.2)

As it is a continuous variable, a histogram and box-plot were the chosen plots.

Exp_function = rexp(40,0.2)
par(mfrow=c(1,2))
par(oma=c(0,0,2,0))
hist(Exp_function, main = "Histogram", col = "grey", xlab="Exponential Functions")
boxplot(Exp_function, main = "Boxplot", col = "grey", xlab = "Exponential Functions")
title(main = "Simulation of 40 exponentials", outer= TRUE)

The simulation of the distribution of 1000 averages of 40 exponential distributions was then calculated and plotted:

exp_1000 = NULL
for (i in 1:1000) exp_1000 = c(exp_1000, mean(rexp(40,0.2)))

par(mfrow=c(1,3))
par(oma=c(0,0,2,0))
hist(exp_1000, prob = TRUE, col = "grey", main = "Histogram", xlab ="Exponential Functions")
lines(density(exp_1000), col ="red", lwd=2)
boxplot(exp_1000, col = "grey", main = "Boxplot", xlab = "Exponential Functions")
qqnorm(exp_1000)
qqline(exp_1000)
title(main = "One thousand simulations of 40 exponential distributions", outer= TRUE)

The fitting to the normal curve is almost perfect (Histogram) with a good agreement between the theoretical and sampling quantiles (q-q plot). So, the averages of samples of 40 distributions follow a normal distribution.

Then, a comparison between the Sample and Theoretical Means was presented in the form of a table (points 2 and 3):

theoretical_mean=1/0.2
Theoretical_values <-c(round(theoretical_mean,3),round((theoretical_mean**2)/40,3))
Simulation_values<-c(round(mean(exp_1000),3),round(var(exp_1000),3))
rbind(Theoretical_values,Simulation_values)
##                     [,1]  [,2]
## Theoretical_values 5.000 0.625
## Simulation_values  4.987 0.654

Clearly, the average of the experimental mean and variance approximates the theoretical values.

The Interval of confidence at 95% for the mean of population (u=5) is:

mean(exp_1000)+c(-1,1)*qnorm(0.975)*sd(exp_1000)/sqrt(1000)
## [1] 4.937347 5.037602

We could go higher and simulate 100000 distributions of 40 exponential and check that the distribution will better fit to the normal distribution.

exp_100000 = NULL
for (i in 1:100000) exp_100000 = c(exp_100000, mean(rexp(40,0.2)))

par(mfrow=c(1,3))
par(oma=c(0,0,2,0))
hist(exp_100000, prob = TRUE, col = "grey", main = "Histogram", xlab ="Exponential Functions")
lines(density(exp_100000), col ="red", lwd=2)
boxplot(exp_100000, col = "grey", main = "Boxplot", xlab = "Exponential Functions")
qqnorm(exp_100000)
qqline(exp_100000)
title(main = "One hundred thousand simulations of 40 exponential distributions", outer= TRUE)