In this project we will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. We will investigate the distribution of averages of 40 exponentials. Note that we will need to do a thousand simulations.
We set the seed for reproducibility, then define the lambda, number of exponentials, and the number of simulations to run.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.4.1
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Warning: package 'ggplot2' was built under R version 3.4.1
## Warning: package 'tidyr' was built under R version 3.4.1
## Warning: package 'readr' was built under R version 3.4.1
## Warning: package 'purrr' was built under R version 3.4.1
## Warning: package 'dplyr' was built under R version 3.4.1
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
set.seed(20)
lambda <- 0.2
n <- 40
sims <- 1000
We run the simulations/replications and then calculate the mean.
sim_exe <- replicate(sims, rexp(n, lambda))
mean_sim_exe <- apply(sim_exe, 2, mean)
hist(mean_sim_exe, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", breaks = 20)
abline(v = mean(mean_sim_exe), lwd = "5", col = "blue")
We can calculate the simulated mean and then the theoretical mean to compare the two. We see that both the theoretical and simulated means are very close at almost 5.
sim_mean <- mean(mean_sim_exe)
sim_mean
## [1] 4.963828
theoretical_mean <- 1/lambda
theoretical_mean
## [1] 5
Again we can see that our simulated and theoretical variances are very close, at almost 0.6 for each.
sim_variance <- (sd(mean_sim_exe))^2
sim_variance
## [1] 0.5919712
theoretical_variance <- (1/lambda)^2 / n
theoretical_variance
## [1] 0.625
Next we are looking at the distribution, which falls into a relatively normal distribution. We can see the min and max at about 3 and 8.
min(mean_sim_exe)
## [1] 2.911542
max(mean_sim_exe)
## [1] 8.392128
hist(mean_sim_exe, prob = TRUE, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", xlim = c(floor(min(mean_sim_exe)), ceiling(max(mean_sim_exe))))
hist(mean_sim_exe, prob = TRUE, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", breaks = 40)
lines(density(mean_sim_exe), col = "blue")
abline(v = mean(mean_sim_exe), lwd = "5", col = "blue")
qqnorm(mean_sim_exe)
qqline(mean_sim_exe, col = 4)
The bulk of the middle of the graph shows how our numbers align and follow a normal distribution. This is approximately normal with some variation out at the extremes of the values.