This project investigates the exponential distribution in R and compare it with the Central Limit Theorem. Properties of the distribution of the mean of n exponentials are illustrated via simulation.
We will simulate exponential distribution with rexp(n, lambda), where 1/lambda equals the mean and also the standard deviation.
First, we set the required parameter, lambda.
#lambda is set to 0.2
lambda <- 0.2
Let’s fix the sample size and number of simulations as well.
#Sample size is set to 40
n <- 40
#Doing 1000 simulations
nsims <- 1000
To see the distribution of the sample mean, let’s do n exponentials, take the mean, and repeat it for nsims times.
mns = NULL
# Simulate data
for (i in 1 : nsims) mns = c(mns, mean(rexp(n, lambda)))
We know that (theoretically) the sample means are centered at the population mean
1/lambda
## [1] 5
, with standard deviation of
1/lambda * 1/sqrt(n)
## [1] 0.7905694
Let’s verify that it’s approximately centered at 1/lambda = 5 and has variance of (close to) 1/lambda * 1/sqrt(n) = 0.79
mean(mns)
## [1] 5.018998
sd(mns)
## [1] 0.8096596
Next, we convert the data to standard normal and overlay a Gaussian plot over the histogram.
#Convert to standard normal
mns <- sqrt(n) * (mns - 1/lambda) * lambda
#Plotting
library(ggplot2)
g <- ggplot(data.frame(X = mns), aes(x = X))
g <- g + geom_histogram(alpha = .20, binwidth=.3, colour = "black", aes(y = ..density..))
g <- g + stat_function(fun = dnorm, size = 2)
g
As evident from the chart, the distribution of sample means is approximately Gaussian, as predicted by the Central Limit Theorem.
We can begin by simulating 1000 exponentials.
pexp <- rexp(1000, lambda)
hist(pexp, main = paste("Histogram of", 1000, "exponentials"))
The mean of the above sample is
mean(pexp)
## [1] 4.961201
The sample mean tries to estimate the population mean. With 1000 observations, it should be quite close to the theoretical mean which is given by
1/lambda
## [1] 5
According to the Law of Large Numbers, the estimated mean will more likely be closer to the theoretical mean as the sample size increases. Let’s test this with 10000 simulations.
library(ggplot2)
ntest <- 10000
means <- cumsum(rexp(ntest, lambda))/(1:ntest)
g <- ggplot(data.frame(x = 1:ntest, y = means), aes(x = x, y = y))
g <- g + geom_hline(yintercept = 0) + geom_line(size = 2)
g <- g + labs(x = "Sample size", y = "Cumulative mean")
g
The sample mean converges to the theoretical mean with increasing sample size as predicted by the LLN.
Since the exponential distribution has theoretical standard deviation that is equal to the theoretical mean, the variance is
1/lambda * 1/lambda
## [1] 25
Recall our first simulation above stored in pexp. This has variance of
var(pexp)
## [1] 25.77046
Again, with 1000 observations, we know from the Law of Large Numbers that it should be quite close to the theoretical variance. A plot of the cumulative variance as sample size increases will illustrate this clearly.
ntest <- 10000
sim <- rexp(ntest, lambda)
vars <- NULL
for(i in 1:ntest) vars <- c(vars,var(sim[1:i]))
g <- ggplot(data.frame(x = 2:ntest, y = vars[2:ntest]), aes(x = x, y = y))
g <- g + geom_hline(yintercept = 0) + geom_line(size = 2)
g <- g + labs(x = "Sample size", y = "Cumulative variance")
g
Again, we see that the sample variance converges to the theoretical variance of 25 as sample size increases. This agrees with the Law of Large Numbers.