Overview

In this project we will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. We will investigate the distribution of averages of 40 exponentials. Note that we will need to do a thousand simulations.

Part 1 – simulation Exercise

  1. Show the sample mean, and compare to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show the distribution is approximately normal.

We set the seed for reproducibility, then define the lambda, number of exponentials, and the number of simulations to run.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.4.1
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Warning: package 'ggplot2' was built under R version 3.4.1
## Warning: package 'tidyr' was built under R version 3.4.1
## Warning: package 'readr' was built under R version 3.4.1
## Warning: package 'purrr' was built under R version 3.4.1
## Warning: package 'dplyr' was built under R version 3.4.1
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
set.seed(20)

lambda <- 0.2
n <- 40
sims <- 1000

We run the simulations/replications and then calculate the mean.

sim_exe <- replicate(sims, rexp(n, lambda))
mean_sim_exe <- apply(sim_exe, 2, mean)
hist(mean_sim_exe, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", breaks = 20)
abline(v = mean(mean_sim_exe), lwd = "5", col = "blue")

Sample mean vs Theoretical Mean

We can calculate the simulated mean and then the theoretical mean to compare the two. We see that both the theoretical and simulated means are very close at almost 5.

sim_mean <- mean(mean_sim_exe)
sim_mean
## [1] 4.963828
theoretical_mean <- 1/lambda
theoretical_mean
## [1] 5

Sample Variance vs Theoretical Variance

Again we can see that our simulated and theoretical variances are very close, at almost 0.6 for each.

sim_variance <- (sd(mean_sim_exe))^2
sim_variance
## [1] 0.5919712
theoretical_variance <- (1/lambda)^2 / n
theoretical_variance
## [1] 0.625

Distribution

Next we are looking at the distribution, which falls into a relatively normal distribution. We can see the min and max at about 3 and 8.

min(mean_sim_exe)
## [1] 2.911542
max(mean_sim_exe)
## [1] 8.392128
hist(mean_sim_exe, prob = TRUE, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", xlim = c(floor(min(mean_sim_exe)), ceiling(max(mean_sim_exe))))

hist(mean_sim_exe, prob = TRUE, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", breaks = 40)
lines(density(mean_sim_exe), col = "blue")
abline(v = mean(mean_sim_exe), lwd = "5", col = "blue")

qqnorm(mean_sim_exe)
qqline(mean_sim_exe, col = 4)

The bulk of the middle of the graph shows how our numbers align and follow a normal distribution. This is approximately normal with some variation out at the extremes of the values.