This report investigates the exponential distribution in R and relates it to the Central Limit Theorem (CLT). Using simulation, we show that the distribution of averages of exponentials is approximately normal, and that the standardized averages are approximately standard normal.
The exponential distribution can be simulated in R with `rexp(n, lambda)`, where \(\lambda\) is the rate parameter. The mean of the exponential distribution is \(1 / \lambda\), and its standard deviation is also \(1 / \lambda\). We set \(\lambda = 0.2\) for all of the simulations.
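As a quick check of these two facts, a large sample drawn with `rexp()` should have a mean and a standard deviation that are both close to \(1 / \lambda = 5\). The sample size of 100,000 and the seed below are arbitrary choices for this illustration and are not part of the original analysis.

```r
# Sanity check: for Exp(lambda), mean and sd both equal 1 / lambda.
lambda <- 0.2
set.seed(1)                       # arbitrary seed chosen for this illustration
x <- rexp(100000, rate = lambda)  # large sample from Exp(0.2)
c(mean = mean(x), sd = sd(x), theory = 1 / lambda)
```

Both estimates should land within a few hundredths of 5; with fewer draws the agreement becomes correspondingly looser.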
We simulate 1000 samples of n = 40 exponentials each and compute the mean of each sample.
```r
# sample size of each simulated sample
n <- 40
# exponential rate parameter
lambda <- 0.2
set.seed(4567)
# means of 1000 samples, each of n exponentials
# (replicate() draws in the same order as the original for loop)
mns <- replicate(1000, mean(rexp(n, lambda)))
```
Table 1 compares the sample mean of the 1000 simulated means with the theoretical mean \(\mu = 1 / \lambda = 5\).
```r
library(knitr)
# theoretical mean of the exponential distribution
mu <- 1/lambda
# sample mean of the 1000 simulated means
X <- mean(mns)
table1 <- data.frame(X, mu)
names(table1) <- c("Sample Mean", "Theoretical Mean")
kable(table1, caption = "Table 1 Sample Mean VS Theoretical Mean")
```
| Sample Mean | Theoretical Mean |
|---|---|
| 5.02484 | 5 |
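To judge whether the gap in Table 1 is small, one can compare it with the Monte Carlo standard error. Each simulated mean has variance \(\sigma^2 / n\), and averaging 1000 independent ones divides that variance by 1000 again. The sketch below is a hedged illustration: `nsim` and `mc_se` are names introduced here, and 5.02484 is copied from Table 1.

```r
# Monte Carlo standard error of the average of the simulated means
lambda <- 0.2
n <- 40
nsim <- 1000
sigma <- 1 / lambda
mc_se <- sqrt(sigma^2 / n / nsim)   # sqrt(0.625 / 1000) = 0.025
mc_se
# Table 1 difference expressed in standard errors (about 1)
(5.02484 - 1 / lambda) / mc_se
```

A difference of about one standard error is what ordinary simulation noise would produce.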
Table 2 compares the sample variance of the 1000 simulated means with the theoretical variance \(\sigma^2 / n\) of the mean of n = 40 exponentials.
```r
# sample standard deviation of the 1000 simulated means
S <- sd(mns)
# theoretical standard deviation of the exponential distribution
sigma <- 1/lambda
# sample variance vs theoretical variance sigma^2 / n of the mean
table2 <- data.frame(S^2, sigma^2/n)
names(table2) <- c("Sample Variance", "Theoretical Variance")
kable(table2, caption = "Table 2 Sample variance VS Theoretical Variance")
```
| Sample Variance | Theoretical Variance |
|---|---|
| 0.6373425 | 0.625 |
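The theoretical value in Table 2 follows from the independence of the \(X_i\): the variance of a sum of independent variables is the sum of their variances, and the factor \(1/n\) comes out squared.

\[
\operatorname{Var}(\bar X_n)
= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{\sigma^2}{n}
= \frac{(1/0.2)^2}{40}
= \frac{25}{40}
= 0.625
\]

The simulated value 0.6373425 is within about 2% of this.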
Let us compare the histogram of 1000 random exponentials (Figure 1) with the histogram of 1000 means of 40 random exponentials (Figure 2).
```r
library(ggplot2)
library(gridExtra)

# Figure 1: 1000 individual draws from Exp(lambda)
plot1 <- data.frame(Xi = rexp(1000, lambda))
g1 <- ggplot(plot1, aes(x = Xi)) +
  geom_histogram(binwidth = 0.2, fill = "cyan", colour = "black", alpha = 0.2) +
  labs(title = "Figure 1. Histogram of 1000 random exponentials") +
  theme(plot.title = element_text(size = 10))

# Figure 2: the 1000 simulated means of 40 exponentials
plot2 <- data.frame(Xi = mns)
g2 <- ggplot(plot2, aes(x = Xi)) +
  geom_histogram(binwidth = 0.2, fill = "cyan", colour = "black", alpha = 0.2) +
  labs(title = "Figure 2. Histogram of 1000 means of 40 random exponentials") +
  theme(plot.title = element_text(size = 10))

# show the two histograms side by side
grid.arrange(g1, g2, ncol = 2)
```
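Figure 2 looks much closer to a normal (bell-shaped) distribution than Figure 1. Figure 1 shows individual draws, so it reproduces the strongly right-skewed shape of the exponential density. Figure 2 shows averages of 40 draws; by the CLT, these averages are approximately normally distributed, centered near \(\mu = 5\) and with a much smaller spread.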
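Skewness puts a number on the visual difference. The exponential distribution has skewness 2 for any rate, and the mean of n iid draws has skewness \(2 / \sqrt{n}\), about 0.32 for n = 40. The sketch below uses `sample_skewness`, a simple moment-based helper defined here for illustration (not a function from any package), and regenerates both data sets so it runs on its own.

```r
# Hypothetical helper: moment-based sample skewness
sample_skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

lambda <- 0.2
n <- 40
set.seed(4567)
means <- replicate(1000, mean(rexp(n, lambda)))  # like Figure 2
raw   <- rexp(1000, lambda)                      # like Figure 1
c(raw = sample_skewness(raw),
  means = sample_skewness(means),
  theory_means = 2 / sqrt(n))
```

The raw draws should show skewness near 2, while the means should be much closer to the symmetric value 0.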
Next, Figure 3 overlays two density curves on the density-scaled histogram of the simulated means. The red curve is the theoretical normal density \(N(\mu, \sigma^2 / n)\), and the blue curve is a kernel density estimate of the sample.
```r
plot3 <- plot2
g3 <- ggplot(plot3, aes(x = Xi)) +
  # histogram on the density scale so it is comparable with the curves
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.2,
                 fill = "cyan", colour = "black", alpha = 0.2) +
  # kernel density estimate of the simulated means
  geom_density(aes(color = "Sample_Curve")) +
  # theoretical N(mu, sigma^2 / n) density
  stat_function(aes(color = "Theoretical_Curve"), fun = dnorm,
                args = list(mean = mu, sd = sigma / sqrt(n))) +
  scale_color_manual(name = "Curve",
                     values = c(Sample_Curve = "blue", Theoretical_Curve = "red")) +
  labs(title = "Figure 3. Sample curve VS Theoretical curve")
g3
```
Figure 3 shows that the sample density curve closely follows the theoretical normal curve \(N(\mu, \sigma^2 / n)\).
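The agreement in Figure 3 can also be checked numerically by comparing a few empirical quantiles of the simulated means with the matching quantiles of \(N(\mu, \sigma^2 / n)\). The chosen probabilities are an illustrative assumption, and the means are regenerated with the same seed so the snippet is self-contained.

```r
# Empirical quantiles of the simulated means vs quantiles of N(mu, sigma^2 / n)
lambda <- 0.2
n <- 40
mu <- 1 / lambda
sigma <- 1 / lambda
set.seed(4567)
mns <- replicate(1000, mean(rexp(n, lambda)))
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
round(rbind(empirical   = quantile(mns, probs),
            theoretical = qnorm(probs, mean = mu, sd = sigma / sqrt(n))), 3)
```

Small remaining differences, especially slightly higher tail quantiles, are expected because averages of 40 exponentials still retain a little right skew.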
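The Central Limit Theorem (CLT) states that the distribution of averages of iid (independent and identically distributed) variables with finite variance, once standardized as \((\bar X_n - \mu) / (\sigma / \sqrt{n})\), approaches a standard normal distribution as the sample size increases. Equivalently, the sample mean is approximately distributed as \(N(\mu, \sigma^2 / n)\).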
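The standardized form of the CLT can be checked directly: after computing \(Z = (\bar X_n - \mu) / (\sigma / \sqrt{n})\) for each simulated mean, Z should have mean near 0, standard deviation near 1, and about 95% of its values inside \(\pm 1.96\). This is a minimal sketch that regenerates the simulated means with the report's seed; `z` is a name introduced here.

```r
# Standardize each simulated mean; by the CLT, z is approximately N(0, 1)
lambda <- 0.2
n <- 40
mu <- 1 / lambda
sigma <- 1 / lambda
set.seed(4567)
mns <- replicate(1000, mean(rexp(n, lambda)))
z <- (mns - mu) / (sigma / sqrt(n))
c(mean_z = mean(z),
  sd_z = sd(z),
  share_within_1.96 = mean(abs(z) < qnorm(0.975)))  # about 0.95 if z ~ N(0, 1)
```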
We have shown in Table 1 that the sample mean is approximately equal to the theoretical mean \(\mu\).
We have shown in Table 2 that the sample variance is approximately equal to the theoretical variance \(\sigma^2 / n\).
We have shown in Figures 1 and 2 that, unlike the exponential distribution itself, the distribution of averages of exponentials has the bell shape of a normal distribution.
We have shown in Figure 3 that the sample density curve is approximately the theoretical normal curve \(N(\mu, \sigma^2 / n)\).