In this simulation, I demonstrate the behavior of the exponential distribution compared to the distribution of averages of 40 exponentials. I compare the means and the variability (via standard deviation), then show histograms comparing the two distributions.
Define the number of simulations (nosim), the size (n) of each simulation, and lambda, then calculate the theoretical mean and standard deviation (sd) of the exponential distribution for that lambda; both equal 1/lambda. Run a single simulation of size 1000, and run 1000 simulations of size 40, taking the mean of each. This gives two distributions of size 1000 to compare.
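nosim <- 1000      # number of simulations
n <- 40            # exponentials averaged per simulation
lambda <- 0.2      # rate parameter
mean <- 1/lambda   # theoretical mean; calls such as mean(x) still find the base function
sd <- 1/lambda     # theoretical standard deviation of a single exponential
expDist <- rexp(nosim, lambda)   # 1000 individual exponentials
avgDist <- numeric(nosim)        # preallocate instead of growing the vector
for (i in seq_len(nosim)) {
  avgDist[i] <- mean(rexp(n, lambda))   # average of n exponentials
}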
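The same set of averages can be produced without an explicit loop. The following is a minimal sketch, not the code used for the results reported here; the seed value and the names `draws` and `avgDistAlt` are illustrative assumptions.

```r
nosim <- 1000
n <- 40
lambda <- 0.2

set.seed(1)  # arbitrary seed, assumed here only to make the sketch reproducible

# Draw all nosim * n exponentials at once; each row is one simulation of size n
draws <- matrix(rexp(nosim * n, lambda), nrow = nosim, ncol = n)

# One average per simulation
avgDistAlt <- rowMeans(draws)

length(avgDistAlt)
```

Drawing everything in one call and using `rowMeans` avoids growing a vector inside a loop, which matters more as nosim gets larger.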
Sample mean
mean(avgDist)
## [1] 5.012434
Theoretical mean
mean
## [1] 5
Comparison
data <- c(round(mean(avgDist),3),mean)
bp <- barplot(data,
names.arg=c("Sample Mean","Theoretical Mean"),
main="Sample/Theoretical Means")
text(x=bp,y=data+0.5,labels=as.character(data),xpd=TRUE)
The two numbers are very similar because of the Law of Large Numbers: the average of a large number of independent draws converges to the population (theoretical) mean. Here the mean of the 1000 averages pools 40,000 exponentials, so it lands very close to 1/lambda = 5.
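The convergence of the sample mean toward 1/lambda can be seen directly by tracking a running mean as more draws are added. This is a rough sketch; the seed, the number of draws (10000), and the name `runningMean` are assumptions chosen for illustration.

```r
lambda <- 0.2

set.seed(2)  # arbitrary seed, assumed for reproducibility of the sketch
x <- rexp(10000, lambda)

# Mean of the first k draws, for every k from 1 to 10000
runningMean <- cumsum(x) / seq_along(x)

plot(runningMean, type = "l",
     xlab = "Number of draws", ylab = "Running mean",
     main = "Running Mean of Exponentials")
abline(h = 1/lambda, lty = 2)  # theoretical mean for reference
```

The line should wander for small k and then settle near the dashed reference line as k grows.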
Sample standard deviation (of the averages)
sd(avgDist)
## [1] 0.7956717
Theoretical standard deviation (of a single exponential)
sd
## [1] 5
Comparison
data <- c(round(sd(avgDist),3),sd)
bp <- barplot(data,
names.arg=c("Sample SD","Theoretical SD"),
main="Sample/Theoretical Standard Deviation")
text(x=bp,y=data+0.5,labels=as.character(data),xpd=TRUE)
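The two values above differ so much because they measure different things: sd(avgDist) is the standard deviation of averages of 40 exponentials, whereas the theoretical value 1/lambda = 5 is the standard deviation of individual exponential values. Averaging reduces variability: the standard deviation of a mean of n independent draws is (1/lambda)/sqrt(n) = 5/sqrt(40), or about 0.791, which is very close to the observed 0.796. The distribution of means is therefore much less variable (and approximately normal) than the distribution of individual exponentials (which is not normal).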
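To make the comparison like for like, the theoretical standard deviation and variance of a mean of n independent exponentials can be computed directly. This is a small sketch; the names `sigma`, `seMean`, and `varMean` are illustrative and not part of the original code.

```r
n <- 40
lambda <- 0.2

sigma <- 1/lambda           # sd of a single exponential
seMean <- sigma / sqrt(n)   # sd of the mean of n draws, about 0.7906
varMean <- sigma^2 / n      # variance of the mean of n draws, 0.625

c(seMean = seMean, varMean = varMean)
```

`seMean` is the value to set against sd(avgDist), and `varMean` the value to set against var(avgDist).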
Distribution of a large collection of random exponentials
hist(expDist,main="Random Exponentials",xlab="Exponential Values")
Distribution of averages of 40 exponentials
hist(avgDist,main = "Averages of Random Exponentials",xlab="Means of 40 Exponential Values")
The first histogram above shows the exponential distribution, recognizable by its heavily right-skewed shape. It was built from 1000 random draws from the exponential distribution.
The second histogram shows a roughly normal distribution, recognizable by its bell-like shape. It was built by taking the means of 1000 samples of 40 exponential random variables each. It is approximately normal because, by the Central Limit Theorem, the distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the distribution of the underlying data (provided that distribution has finite variance).
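One way to check the bell shape visually is to draw the histogram of averages on the density scale and overlay the normal curve with mean 1/lambda and standard deviation (1/lambda)/sqrt(n). This is a sketch under assumptions: the seed and the name `avgs` are illustrative, and the averages are regenerated here so the block runs on its own.

```r
nosim <- 1000
n <- 40
lambda <- 0.2

set.seed(3)  # arbitrary seed, assumed for reproducibility of the sketch
avgs <- replicate(nosim, mean(rexp(n, lambda)))

# freq = FALSE puts the histogram on the density scale so the curve is comparable
h <- hist(avgs, freq = FALSE,
          main = "Averages of Random Exponentials",
          xlab = "Means of 40 Exponential Values")

# Normal density with the theoretical mean and sd of the sample mean
curve(dnorm(x, mean = 1/lambda, sd = (1/lambda)/sqrt(n)),
      add = TRUE, lwd = 2)
```

With n = 40 the histogram may still show a slight right skew relative to the curve, since the underlying exponential distribution is itself skewed.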