In this project we investigate the exponential distribution in R and compare the distribution of its sample means with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials. Note that we need to run a thousand simulations.
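We will try to answer the following questions: how does the sample mean compare with the theoretical mean, how does the sample variance compare with the theoretical variance, and is the distribution of the sample means approximately normal?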
library(ggplot2)
# lambda is 0.2
lambda = 0.2
# we will be using 40 exponentials
sample_size = 40
# we will be running 1000 simulations
num_sim = 1:1000
# set a seed to reproduce the data
set.seed(42)
# gather the means
means <- data.frame(x = sapply(num_sim, function(x) {mean(rexp(sample_size, lambda))}))
# let's take a look at the first few means
head(means)
## x
## 1 4.915756
## 2 6.941835
## 3 4.775331
## 4 5.310784
## 5 7.002644
## 6 5.320620
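The same simulation can also be sketched without an explicit loop over simulations. The snippet below is an illustrative alternative, not part of the original analysis: the names n_sim, draws and means_alt are assumptions introduced here. It draws all values at once, arranges them so that each row holds 40 consecutive draws, and takes row means; under the same seed it is expected to reproduce the sapply() result.

```r
# Illustrative vectorised version of the simulation (names are hypothetical)
lambda <- 0.2        # rate parameter, as in the report
sample_size <- 40    # exponentials per average
n_sim <- 1000        # number of simulated averages

set.seed(42)
# byrow = TRUE puts draws 1..40 in row 1, draws 41..80 in row 2, and so on
draws <- matrix(rexp(n_sim * sample_size, lambda), nrow = n_sim, byrow = TRUE)

# one mean per row, stored in the same shape as the `means` data frame
means_alt <- data.frame(x = rowMeans(draws))
head(means_alt)
```

Filling the matrix by row is the design choice that matters here: the default column-wise fill would group different draws into each sample and give different (though equally valid) averages.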
The theoretical mean (mu) of an exponential distribution with rate lambda is:
sim_mu = 1 / lambda
print(sim_mu)
## [1] 5
The mean of our 1000 simulated averages, each computed from 40 random draws from the exponential distribution, is:
sim_mean <- mean(means$x)
print(sim_mean)
## [1] 4.986508
The theoretical standard deviation of the distribution of sample means, (1/lambda)/sqrt(sample_size), is:
simexpsd <- (1/lambda)/sqrt(sample_size)
print(simexpsd)
## [1] 0.7905694
The theoretical variance of the distribution of sample means is:
simexpvar <- simexpsd^2
print(simexpvar)
## [1] 0.625
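The two theoretical values above follow from the standard result for the mean of independent, identically distributed draws. As a short worked derivation (the symbols X_i, n and the bar notation are introduced here for illustration), with n = 40 and lambda = 0.2:

```latex
\begin{aligned}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad X_i \sim \operatorname{Exp}(\lambda)\ \text{independent},\\
\operatorname{E}[\bar{X}] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5,\\
\operatorname{Var}(\bar{X}) &= \frac{\operatorname{Var}(X_i)}{n} = \frac{1}{n\lambda^{2}} = \frac{1}{40 \cdot 0.04} = 0.625,\\
\operatorname{sd}(\bar{X}) &= \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906.
\end{aligned}
```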
The standard deviation of the simulated sample means is:
simsd <- sd(means$x)
print(simsd)
## [1] 0.7965177
The variance of the simulated sample means is:
simvar <- var(means$x)
print(simvar)
## [1] 0.6344405
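To see the theoretical and simulated quantities side by side, they can be collected in one small table. This is a minimal sketch that repeats the simulation so it runs on its own; the names means_x, comparison and relative_error are assumptions made for this example.

```r
# Side-by-side comparison of theoretical and simulated values (illustrative)
lambda <- 0.2
sample_size <- 40

set.seed(42)
means_x <- sapply(1:1000, function(i) mean(rexp(sample_size, lambda)))

comparison <- data.frame(
  statistic   = c("mean", "standard deviation", "variance"),
  theoretical = c(1 / lambda,
                  (1 / lambda) / sqrt(sample_size),
                  (1 / lambda)^2 / sample_size),
  simulated   = c(mean(means_x), sd(means_x), var(means_x))
)

# relative gap between simulation and theory, as a fraction of the theoretical value
comparison$relative_error <-
  abs(comparison$simulated - comparison$theoretical) / comparison$theoretical

print(comparison)
```

A small relative_error in every row indicates that 1000 simulations are enough for the simulated values to sit close to the theoretical ones.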
The distribution of the sample means is shown below.
In the graph, the estimated density of the sample means (blue) lies close to the normal density (red).
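# histogram of the sample means on the density scale
m <- ggplot(data = means, aes(x = x))
m + geom_histogram(binwidth = 0.1, aes(y = after_stat(density))) +
  # red: normal density with the theoretical mean and standard deviation
  stat_function(fun = dnorm, args = list(mean = sim_mu, sd = simexpsd), colour = "red", linewidth = 2) +
  # blue: kernel density estimate of the sample means
  geom_density(colour = "blue", linewidth = 2) +
  labs(x = "Means", y = "Density")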
By the central limit theorem, the averages of samples approximately follow a normal distribution. The figure above shows the kernel density estimate of the sample means (blue) drawn over the histogram, together with the normal density (red) plotted with the theoretical mean and variance values.
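The visual impression of normality can be checked a little further. The sketch below is an additional check that is not part of the original analysis: it standardises the simulated means with the theoretical mean and standard error, reports the share falling within 1.96 standard errors (about 0.95 under a normal distribution), and draws a normal Q-Q plot with base R. The names means_x, z and coverage are assumptions made for this example.

```r
# Illustrative normality check for the simulated means
lambda <- 0.2
sample_size <- 40

set.seed(42)
means_x <- sapply(1:1000, function(i) mean(rexp(sample_size, lambda)))

mu <- 1 / lambda                          # theoretical mean
se <- (1 / lambda) / sqrt(sample_size)    # theoretical standard error

# standardised means: approximately N(0, 1) if the CLT approximation holds
z <- (means_x - mu) / se

# share of standardised means within +/- 1.96
coverage <- mean(abs(z) < 1.96)
print(coverage)

# points close to the reference line indicate approximate normality
qqnorm(z)
qqline(z, col = "red")
```

Some departure in the upper tail of the Q-Q plot would not be surprising, because the exponential distribution is right-skewed and averages of 40 draws retain a little of that skew.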