We investigate the exponential distribution and compare it with what the Central Limit Theorem (CLT) predicts. We look at the difference between the theoretical (expected) values and the simulated values, and at how the distribution of sample means becomes approximately normal.
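set.seed(26)   # make the simulation reproducible
n <- 40        # exponentials per simulation
no.sim <- 1000 # number of simulations
# each row is one simulation of n draws with rate 0.2
sim.mat <- matrix(rexp(no.sim*n, .2), no.sim)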
hist(sim.mat, col="navajowhite1",
border="navajowhite1",
main="Hist. of Sample", xlab="Value")
First, we build a simulation matrix of random exponentials. It has 1,000 rows and 40 columns, so each row is one simulation made up of 40 exponential draws with rate 0.2. We also plot a histogram of all the sampled values. Its right-skewed shape is what we expect from an exponential distribution.
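As a rough visual check of the claim that the draws look exponential, the histogram can be put on a density scale and the theoretical density drawn over it. This is only a sketch: it re-creates the simulation so that it runs on its own, and the name rate and the choice of breaks=50 are assumptions made for illustration, not part of the original analysis.

```r
# Re-create the simulation so this block runs on its own
set.seed(26)
n <- 40
no.sim <- 1000
rate <- .2  # illustrative name for the rate used above
sim.mat <- matrix(rexp(no.sim*n, rate), no.sim)

# Histogram on the density scale (freq=FALSE) so it is comparable to dexp()
hist(sim.mat, freq=FALSE, breaks=50, col="navajowhite1",
border="navajowhite1",
main="Sample vs. Exponential Density", xlab="Value")

# Overlay the theoretical exponential density with the same rate
curve(dexp(x, rate=rate), add=TRUE, col="snow4", lwd=2)
```

If the bars follow the curve closely, the sample is consistent with an exponential distribution with rate 0.2.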
mns <- apply(sim.mat, 1, mean)
hist(mns, main="Hist. of Sample Means",
xlab="Sample Mean", col="navajowhite1",
border="navajowhite1") ; abline(v=mean(mns),col="snow4", lwd=4)
We created the variable mns, which contains the 1,000 sample means, each computed from 40 exponentials. The figure above is a histogram of mns. It looks approximately normal. The vertical line marks the mean of the sample means, xbar.
xbar <- mean(mns)
xbar ; (theo.mean <- 1/.2)
## [1] 4.955705
## [1] 5
The value of xbar is 4.9557055, while the theoretical mean is 1/rate = 1/.2 = 5. Since the sample mean is an unbiased estimator of the population mean, the two values are close, as expected.
vars <- apply(sim.mat, 1, var)
expected.samvar <- mean(vars)
expected.samvar ; (theo.var <- (1/.2)^2)
## [1] 24.83952
## [1] 25
The mean of the 1,000 sample variances is 24.839525, while the theoretical variance is (1/rate)^2 = (1/.2)^2 = 25. Like the sample mean, the sample variance is an unbiased estimator of the population variance, so its average over the simulations is close to the theoretical value too.
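The variance above is that of the individual draws. A related check, sketched here under the assumption that the standard result Var(sample mean) = sigma^2/n applies, is to compare the variance of the 1,000 sample means with (1/rate)^2/n = 25/40 = 0.625. The block re-creates the simulation so that it runs on its own; the names rate and theo.var.mean are introduced only for illustration.

```r
# Re-create the simulation so this block runs on its own
set.seed(26)
n <- 40
no.sim <- 1000
rate <- .2  # illustrative name for the rate used above
sim.mat <- matrix(rexp(no.sim*n, rate), no.sim)
mns <- apply(sim.mat, 1, mean)

# Theoretical variance of a mean of n iid draws: sigma^2 / n
theo.var.mean <- (1/rate)^2 / n

# Compare the simulated variance of the sample means with the theory
var(mns) ; theo.var.mean
```

The two numbers should be close, which shows that averaging 40 draws shrinks the variance by a factor of 40.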
par(mfrow=c(1,2))
hist(sim.mat, col="navajowhite1",
border="navajowhite1",
main="Hist. of Sample", xlab="Value")
hist(mns, main="Hist. of Sample Means",
xlab="Sample Mean", col="navajowhite1",
border="navajowhite1")
abline(v=mean(mns), col="snow4", lwd=4)
We generated the data with the rexp() function, drawing 1,000 samples of 40 exponentials each. Because these variables are independent and identically distributed (iid), their sum, and therefore their mean, tends toward a normal distribution as the sample size grows. In other words, the CLT applies to the sample means of this dataset.
The figure on the left-hand side is a histogram of sim.mat. It shows the individual iid draws, not sums or means of them, so it follows the original exponential distribution.
On the right-hand side is a histogram of the sample means. Its shape is close to that of a normal distribution, as the CLT predicts.
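One way to make the comparison with the normal distribution more concrete is to overlay the density that the CLT suggests, N(1/rate, (1/rate)^2/n), on the histogram of sample means. The following is a minimal sketch, not part of the original analysis: it re-creates the simulation so that it runs on its own, and the names rate, clt.mean and clt.sd are assumptions made for illustration.

```r
# Re-create the simulation so this block runs on its own
set.seed(26)
n <- 40
no.sim <- 1000
rate <- .2  # illustrative name for the rate used above
sim.mat <- matrix(rexp(no.sim*n, rate), no.sim)
mns <- apply(sim.mat, 1, mean)

# Normal approximation suggested by the CLT for a mean of n draws
clt.mean <- 1/rate
clt.sd <- (1/rate)/sqrt(n)

# Histogram on the density scale so it is comparable to dnorm()
hist(mns, freq=FALSE, col="navajowhite1", border="navajowhite1",
main="Sample Means vs. Normal Density", xlab="Sample Mean")
curve(dnorm(x, mean=clt.mean, sd=clt.sd), add=TRUE, col="snow4", lwd=2)
```

With n = 40 the fit is approximate rather than exact, so a slight right skew in the histogram relative to the curve would not be surprising.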