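In this project I investigate the distribution of the mean of exponentials in R and compare it with the Central Limit Theorem in three respects: how the sample mean compares with the theoretical mean, how the variance of the sample mean compares with the theoretical variance, and whether the distribution of the sample mean is approximately normal.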
In probability theory and statistics, the exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant average rate.
The probability density function is \[ f(x; \lambda) = \left\{\begin{array}{ll} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{array} \right. \]
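As a quick check of the formula, the sketch below writes the density out directly and compares it with R's built-in `dexp`, which uses the same rate parameterization. The helper name `f_exp` and the evaluation points are illustrative choices, not part of the original analysis.

```r
# Hypothetical helper: the exponential density written out from the formula above
f_exp <- function(x, lambda) {
  ifelse(x >= 0, lambda * exp(-lambda * x), 0)
}

x <- c(-1, 0, 2.5, 5, 10)
lambda <- 0.2

# dexp(x, rate) returns 0 for negative x, matching the piecewise definition
all.equal(f_exp(x, lambda), dexp(x, rate = lambda))
```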
The mean of an exponentially distributed random variable \(X\) with the rate parameter \(\lambda\) is given by \[ E[X] = \frac{1}{\lambda} \]
The variance of \(X\) is given by \[ VAR[X] = \frac{1}{\lambda^2} \]
so the standard deviation is also \(\frac{1}{\lambda}\).
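The mean and variance formulas can be checked numerically for \(\lambda\) = 0.2. This is a minimal sketch using base R's `integrate` over \([0, \infty)\). It relies on the identity \(VAR[X] = E[X^2] - E[X]^2\), and the results are approximate to within `integrate`'s default tolerance.

```r
lambda <- 0.2

# E[X]: integrate x * f(x) over [0, Inf)
m <- integrate(function(x) x * dexp(x, rate = lambda), lower = 0, upper = Inf)$value

# E[X^2], then VAR[X] = E[X^2] - E[X]^2
m2 <- integrate(function(x) x^2 * dexp(x, rate = lambda), lower = 0, upper = Inf)$value
v <- m2 - m^2

# Expected to be close to 1/lambda = 5, 1/lambda^2 = 25 and 1/lambda = 5
c(mean = m, variance = v, sd = sqrt(v))
```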
Set \(\lambda\) = 0.2 for all of the simulations.
The following plot shows 1000 simulated means, each the average of 40 exponentials with \(\lambda\) = 0.2.
set.seed(643)
nosim <- 1000
n <- 40
lambda <- 0.2
# Draw n exponentials nosim times and keep the mean of each draw
mns <- replicate(nosim, mean(rexp(n, lambda)))
hist(mns, main = "1000 Simulations of the mean of 40 exponentials", density = 10, xlab = "Sample mean (lambda = 0.2)", breaks = 20)
The mean of the exponential distribution with \(\lambda\) = 0.2 is \[ E[X] = \frac{1}{\lambda} = \frac{1}{0.2} = 5 \]
The sample mean is
mean(mns)
## [1] 4.991655
In the following plot, the blue line is the theoretical mean and the red dashed line is the sample mean from the simulation. The two lines almost overlap.
library(ggplot2)
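# Histogram of the simulated means (counts per 0.2-wide bin)
g <- ggplot(data.frame(mns = mns), aes(x = mns)) + geom_histogram(binwidth = 0.2, colour = "darkgreen", fill = "white")
g <- g + scale_x_continuous(breaks = 2:8)
# Theoretical mean (blue, solid) and simulated mean (red, dashed);
# linewidth replaces the deprecated size for lines (ggplot2 >= 3.4.0)
g <- g + geom_vline(xintercept = 5, colour = "blue", linewidth = 1)
g <- g + geom_vline(xintercept = mean(mns), colour = "red", linewidth = 1, linetype = "longdash")
g <- g + labs(x = "Sample mean", y = "Count", title = "1000 Simulations of the mean of 40 exponentials")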
g
The theoretical variance of a single exponential \(X\) is \[ VAR[X] = \frac{1}{\lambda^2} = \frac{1}{0.2^2} = 25 \]
The standard error of the mean for a sample size of \(n = 40\) is \[ SE = \sqrt{\frac{VAR[X]}{n}} = \sqrt{\frac{25}{40}} \approx 0.791 \]
so the theoretical variance of the sample mean is \[ SE^2 = \frac{25}{40} = 0.625 \]
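The same arithmetic in R is shown below as a small sketch. The last line applies the formula to the larger sample sizes used later in the report, which shows how the variance of the mean shrinks in proportion to \(1/n\).

```r
lambda <- 0.2
n <- 40

var_x <- 1 / lambda^2          # variance of a single exponential, about 25
se <- sqrt(var_x / n)          # standard error of the mean of n exponentials
c(se = se, theoretical_var = se^2)

# Theoretical variance of the mean for n = 40, 80 and 120
var_x / c(40, 80, 120)
```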
The sample variance of the mean is
var(mns)
## [1] 0.6237204
The sample variance of the mean is very close to the theoretical variance.
The following plot compares a normal density with the theoretical mean 5 and standard error 0.791 (blue line) against the density of the simulated means (red line). The approximation is not exact, because the mean of 40 exponentials is still slightly right-skewed. The next section checks whether a larger sample size improves it.
library(ggplot2)
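# Histogram on the density scale so it can be overlaid with density curves
g <- ggplot(data.frame(mns = mns), aes(x = mns)) + geom_histogram(aes(y = after_stat(density)), binwidth = 0.2, colour = "darkgreen", fill = "white")
g <- g + scale_x_continuous(breaks = 2:8)
# Normal density with mean 5 and sd = sqrt(25 / 40) (blue); kernel density of the simulated means (red)
g <- g + stat_function(fun = dnorm, colour = "blue", args = list(mean = 5, sd = sqrt(25 / 40))) + geom_density(colour = "red")
g <- g + labs(x = "Sample mean", y = "Density", title = "1000 Simulations of the mean of 40 exponentials")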
g
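The remaining gap at \(n = 40\) can be made concrete without simulation. The sum of \(n\) independent exponentials with rate \(\lambda\) is Gamma-distributed, so their mean follows a Gamma distribution with shape \(n\) and rate \(n\lambda\). Its skewness \(2/\sqrt{n}\) only vanishes as \(n\) grows. The sketch below compares the exact density of the standardized mean with the standard normal density on a grid. The helper name `max_gap` and the grid from -4 to 4 are illustrative assumptions.

```r
lambda <- 0.2

# Hypothetical helper: largest gap between the exact density of the
# standardized mean and the standard normal density over a grid of z values
max_gap <- function(n, z = seq(-4, 4, by = 0.01)) {
  mu <- 1 / lambda                 # theoretical mean of the sample mean
  se <- 1 / (lambda * sqrt(n))     # theoretical standard error of the sample mean
  # Sample mean ~ Gamma(shape = n, rate = n * lambda); change of variables to z
  exact <- dgamma(mu + z * se, shape = n, rate = n * lambda) * se
  max(abs(exact - dnorm(z)))
}

sizes <- c(40, 80, 120)
data.frame(n = sizes,
           skewness = 2 / sqrt(sizes),
           max_gap = sapply(sizes, max_gap))
```

The gap shrinks roughly in proportion to the skewness, which is consistent with the improvement seen as the sample size grows.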
The following plot compares the standardized sample means \((\bar{X} - 5)/(5/\sqrt{n})\) for sample sizes of 40, 80 and 120. As the sample size increases, their distribution gets closer to the standard normal density (black curve).
set.seed(643)
nosim <- 1000
lambda <- 0.2
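# Standardize a sample mean with the theoretical mean 1/lambda and standard error 1/(lambda * sqrt(n))
cfunc <- function(x, n) (mean(x) - 1 / lambda) * lambda * sqrt(n)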
dat <- data.frame(
x = c(apply(matrix(rexp(nosim * 40, lambda), nosim), 1, cfunc, 40),
apply(matrix(rexp(nosim * 80, lambda), nosim), 1, cfunc, 80),
apply(matrix(rexp(nosim * 120, lambda), nosim), 1, cfunc, 120)
),
size = factor(rep(c(40, 80, 120), rep(nosim, 3))))
g <- ggplot(dat, aes(x = x, fill = size)) + geom_histogram(binwidth = 0.3, colour = "black", aes(y = after_stat(density))) + scale_fill_brewer(palette = "Spectral")
# Overlay the standard normal density in every panel for reference
g <- g + stat_function(fun = dnorm, linewidth = 2)
g + facet_grid(. ~ size)
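Because the standardization divides by the theoretical standard error \(1/(\lambda\sqrt{n})\), the standardized means for each sample size should have a mean near 0 and a standard deviation near 1. The sketch below checks this numerically. It is self-contained, so it repeats the simulation, and `standardize` is a hypothetical stand-in for `cfunc`.

```r
set.seed(643)
nosim <- 1000
lambda <- 0.2

# Hypothetical stand-in for cfunc: (sample mean - 1/lambda) / (1 / (lambda * sqrt(n)))
standardize <- function(x, n) (mean(x) - 1 / lambda) * lambda * sqrt(n)

# One row per sample size: simulated mean and sd of the standardized means
summ <- t(sapply(c(40, 80, 120), function(n) {
  z <- apply(matrix(rexp(nosim * n, lambda), nosim), 1, standardize, n)
  c(n = n, mean = mean(z), sd = sd(z))
}))
summ
```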
In this report I investigated the distribution of averages of 40 exponentials. The results show that the sample mean closely matches the theoretical mean, and the sample variance of the mean is close to the theoretical variance. The distribution of the sample mean is approximately normal, and the approximation improves as the sample size increases.