In this project we will investigate how the exponential distribution works in R (using function such as rexp(n, lambda) or pexp(q, lambda)) and we will compare it to the Central Limit Theorem (or CTL in short). We will be doing this through simulation, that is studing the distribution of averages of 40 exponentials repeated thousand times.
We draw an Exponential distribution of n = 1000 and lambda = 3 (left panel) and we apply the Central Limit Theorem which states that the distribution of the sum (or average) of a large number of independent, identically distributed variables will be approximately normal (more information could be oibatained here). So if we apply that to our previous data set we should get a normal distribution (Right panel).
In this prject we will be simulating the distribution of the mean from 40 exponentials. then we will address the following question:
As stated in the Exercise instructions lambda will be set at 0.2 and therefore the theorical mean should be:
lambda <- 0.2
tmean <- 1/lambda
print(tmean)
## [1] 5
Now we will simulate our distribution of the 40 exponentials:
set.seed(2016)
n <- 40
sim <- NULL
for (i in 1:1000) sim <- c(sim, mean(rexp(n, rate = lambda)))
And we will calculate the sample mean:
smean <- mean(sim)
print(smean)
## [1] 4.979186
So the mean extracted from our simulation sample is not exactly our theorical mean but they are pretty close:
difference <- tmean - smean
print(difference)
## [1] 0.0208141
We will calculate the theorical variance and the variance obtained after our simulation:
# the theorical Variance is:
tsd <- (1/lambda)/sqrt(n)
tvar <- tsd^2
print(tvar)
## [1] 0.625
# our sample variance is:
svar <- var(sim)
print(svar)
## [1] 0.6384844
So there is only a difference of
## [1] 0.01348445
As we pointed out in the backgroud section a large collection of averages of 40 exponentials results in a Normal distribution. Here we will show it with and increased collection of averages:
Informally called the bell curve; we have computed here 100 averages of 40 random samples, and we see here that they are approximately well distributed and beautyful bell shaped. As an additional information, note that the shape of the dark-blue bell curve (our sample distribution) is exactly the same as the theorically normal distribution (orange dot line). Increasing the number of averages (right panel, increased to 10000), results in an almost complete overlapping of the mean and variance values of the theorical and the sample distributions.
Here I am presenting the code I have employed to generate the different plots you can see through the project:
ex <- rexp(1000, rate = 3)
mns <- NULL
for (i in 1:1000) mns <- c(mns, mean(rexp(1000, rate = 3)))
par(mfrow = c(1, 2), mar = c(2,2,1,1))
hist(ex, xlab = "", main = "Exponential Distribution", prob = TRUE)
curve(dexp(x, rate = 3), col = 2, lty = 2, lwd = 2, add = TRUE)
hist(mns, prob = TRUE, xlab = "", main = "CTL applied")
curve(dnorm(x, mean = mean(mns), sd = sd (mns)), add = TRUE, col = 2, lty = 2, lwd = 2)
par(mfrow = c(1, 1))
hist(sim, prob = TRUE, xlab = "lambda = 0.2", main = "1000 simulations", col = "lightblue")
set.seed(2016)
n <- 40
mns <- NULL
for (i in 1:1000) mns <- c(mns, mean(rexp(n, rate = lambda)))
par(mfrow = c(1, 2))
hist(mns, prob = TRUE, xlab = "1000 averages", main = "Distribution of Averages", col = "lightgreen")
curve(dnorm(x, mean = mean(mns), sd = sd (mns)), add = TRUE, col = "darkblue",lwd = 2)
curve(dnorm(x, mean = 5, sd = tsd), add = TRUE, col = "orange", lwd = 2, lty = 2)
abline(v = mean(mns), col = "red", lwd = 2)
legend("topright", c("theorical", "sample"), fill = c("orange", "darkblue"))
mns2 <- NULL
for (i in 1:10000) mns2 <- c(mns2, mean(rexp(n, rate = lambda)))
hist(mns2, prob = TRUE, xlab = "10000 averages", main = "Distribution of Averages", col = "lightgreen")
curve(dnorm(x, mean = mean(mns2), sd = sd (mns2)), add = TRUE, col = "darkblue",lwd = 2)
curve(dnorm(x, mean = 5, sd = tsd), add = TRUE, col = "orange", lwd = 2, lty = 2)
abline(v = mean(mns2), col = "red", lwd = 2)
legend("topright", c("theorical", "sample"), fill = c("orange", "darkblue"))