Overview

In this project we investigate the exponential distribution in R and compare the behavior of its sample means with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials, using one thousand simulations.

We will address the following points:

  1. Show the sample mean and compare it to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show that the distribution is approximately normal.

Analysis

Load required libraries

library(ggplot2)

Simulation

# rate parameter of the exponential distribution
lambda <- 0.2

# each simulated sample contains 40 exponentials
sample_size <- 40

# index vector for the 1000 simulations
num_sim <- 1:1000

# set a seed to reproduce the data
set.seed(42)

# gather the means
means <- data.frame(x = sapply(num_sim, function(x) {mean(rexp(sample_size, lambda))}))

# let's take a look at the first few means
head(means)
##          x
## 1 4.915756
## 2 6.941835
## 3 4.775331
## 4 5.310784
## 5 7.002644
## 6 5.320620
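
The simulation can also be written without looping over an index vector. The following is only a sketch of that alternative, not the code used for the results in this report; the names n_sim, sim_matrix and means_alt are introduced here for illustration.

```r
lambda <- 0.2
sample_size <- 40
n_sim <- 1000  # hypothetical name: number of simulations as a single count

set.seed(42)

# draw all 40 * 1000 exponentials at once;
# with byrow = TRUE, row i holds the i-th block of 40 consecutive draws
sim_matrix <- matrix(rexp(n_sim * sample_size, rate = lambda),
                     nrow = n_sim, ncol = sample_size, byrow = TRUE)

# one mean per row, i.e. one average of 40 exponentials per simulation
means_alt <- data.frame(x = rowMeans(sim_matrix))

head(means_alt)
```

Drawing everything in one call and using rowMeans avoids the anonymous function, which can be easier to read when the number of simulations changes.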

Sample Mean vs. Theoretical Mean

The theoretical mean (mu) of an exponential distribution with rate lambda is 1/lambda:

sim_mu = 1 / lambda
print(sim_mu)
## [1] 5

The sample mean of our 1000 simulated averages, each taken over 40 exponentials, is:

sim_mean <- mean(means$x)
print(sim_mean)
## [1] 4.986508
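
One way to judge whether 4.986508 is "close" to 5 is to express the gap in standard-error units. The sketch below assumes the 1000 means are independent, so that their average has standard error (1/lambda)/sqrt(40 * 1000); the names n_sim, se_grand_mean and z are hypothetical and not part of the analysis above.

```r
lambda <- 0.2
sample_size <- 40
n_sim <- 1000          # hypothetical name for the number of simulations

sim_mu <- 1 / lambda   # theoretical mean
sim_mean <- 4.986508   # sample mean copied from the output above

# standard error of the average of n_sim means of sample_size exponentials
se_grand_mean <- (1 / lambda) / sqrt(sample_size * n_sim)

# difference between sample and theoretical mean, in standard-error units
z <- (sim_mean - sim_mu) / se_grand_mean

print(se_grand_mean)
print(z)
```

Here the gap is roughly half a standard error, which is well within what random variation alone would produce.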

Sample Variance vs. Theoretical Variance

Theoretical standard deviation of the mean of 40 exponentials, (1/lambda)/sqrt(n):

simexpsd <- (1/lambda)/sqrt(sample_size)
print(simexpsd)
## [1] 0.7905694

Theoretical variance of the mean of 40 exponentials:

simexpvar <- simexpsd^2
print(simexpvar)
## [1] 0.625
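
These theoretical values follow from the variance of an average of independent draws. As a short derivation, write $X_1, \dots, X_n$ for $n = 40$ independent exponentials with rate $\lambda = 0.2$ and $\bar{X}$ for their mean (this notation is introduced here and is not used elsewhere in the report):

```latex
\begin{align*}
\operatorname{Var}(X_i) &= \frac{1}{\lambda^2} = 25, \\
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\cdot n\cdot\frac{1}{\lambda^2}
   = \frac{1}{n\lambda^2}
   = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}) &= \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906 .
\end{align*}
```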

Standard deviation of the 1000 simulated means:

simsd <- sd(means$x)
print(simsd)
## [1] 0.7965177

Variance of the 1000 simulated means:

simvar <- var(means$x)
print(simvar)
## [1] 0.6344405

The distribution of sample means is as follows.

In the graph below, the estimated density of the sample means (blue) closely follows the normal density (red).

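m <- ggplot(data = means, aes(x = x))
m +
  # histogram of the simulated means on the density scale
  geom_histogram(binwidth = 0.1, aes(y = ..density..)) +
  # normal density with the theoretical mean and the sample standard deviation
  stat_function(fun = dnorm, args = list(mean = sim_mu, sd = simsd),
                colour = "red", size = 2) +
  # kernel density estimate of the simulated means
  geom_density(colour = "blue", size = 2) +
  labs(x = "Means", y = "Density")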

By the Central Limit Theorem, the averages of samples approximately follow a normal distribution. The figure above shows the histogram of the simulated means, their kernel density estimate (blue), and the normal density (red) plotted with the theoretical mean and the sample standard deviation of the means.
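
A normal quantile-quantile plot is another common way to check approximate normality. The following is a minimal sketch using base R graphics; it repeats the simulation so that it runs on its own and is not part of the original analysis.

```r
lambda <- 0.2
sample_size <- 40

set.seed(42)
means <- data.frame(
  x = sapply(1:1000, function(i) mean(rexp(sample_size, lambda)))
)

# quantiles of the simulated means against quantiles of a standard normal
qqnorm(means$x, main = "Normal Q-Q plot of the simulated means")

# reference line through the first and third quartiles
qqline(means$x, col = "red")
```

If the means are close to normal, the points fall near the red line. Some departure in the upper tail would not be surprising, because an average of 40 exponentials is still mildly right-skewed.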