The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda with lambda 0.2 .Distribution of averages of 40 expontials will be investigated and illustrated.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
Distributions of 1000 averages of exponential distribution
#set seed for reproducibility
set.seed(1000)
#set constants
lambda <- 0.2 # set lambda for exponential distribution
num_of_simulation <- 1000 # set number of simulation
num_of_avgs <- 40 # number of exponentials
# means with 1000 averages of exponential distribution of 40 exponentials with lambda 0.2
means = NULL
for (i in 1:num_of_simulation) means = c(means, mean(rexp(num_of_avgs, lambda)))
summary(means)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.936 4.416 4.903 4.987 5.533 8.036
data <- as.data.frame(means)
ggplot(data, aes(x = means)) +
geom_histogram(binwidth = 0.1, aes(y = ..density..))
The mean of exponential distribution is \(\mu = \frac{1}{\lambda}\). For each of the 1000 samples we will calculate the mean. Theoretically, this the same as drawing a single sample of size 1000 from the corresponding sampling distribution with \(N(\frac{1}{0.2}, \frac{\frac{1}{0.2}}{\sqrt{40}})\). According to the CLT we would expect that each single mean of those 1000 means is already approximately \(\frac{1}{\lambda} = \frac{1}{0.2} = 5\). Since we now caluculate the mean of 1000 sampled means we expect the output to be very close to 5.
mu <- 1 / lambda # mean of exponential distribution
mu
## [1] 5
meansofmeans <- mean(means) # mean of 1000 averages of 40 exponentials
meansofmeans
## [1] 4.986963
\(\bar{x}\) in our case is 4.99 which is very close to the mean of the theoretical distribution namely \(\mu = \frac{1}{0.2} = 5\).
ggplot(data, aes(x = means)) +
geom_histogram(binwidth = 0.1, aes(y = ..density..), alpha = 0.5) +
geom_vline(xintercept = mu, size = 1, color = "red") +
geom_vline(xintercept = meansofmeans, size = 1, aes(color = "blue"))
The standard deviation of the exponential distribution is \(\sigma = \frac{1}{\lambda}\). According to the CLT we would expect that the variance of the sample of the 1000 means is approximately \(\frac{\frac{1}{0.2^{2}}}{40} = 0.625\).
sd <- 1/lambda/sqrt(40)
sd
## [1] 0.7905694
sdfromMeans <- sd(means)
sdfromMeans
## [1] 0.8089147
\(s^{2}\) in our case is r round(var(exp_sample_means), 2) which is close to the variance of the theoretical distribution we mentioned above.
ggplot(data, aes(x = means)) +
geom_histogram(binwidth = 0.1, aes(y = ..density..), alpha = 0.5) +
stat_function(fun = dnorm, arg = list(mean = mu , sd = sd), colour = "red", size = 1) +
geom_density(colour = "blue", size = 1)
As you can see from above graphs, the calculated distribution of means of random sampled exponantial distributions, overlaps quite nice with the normal distribution with the expected values based on the given lamba.
If you want to see complete code for this project, please visit GitHub.