Exponential Distribution & Central Limit Theorem

Overview

In this project we investigate the exponential distribution in R and compare the behavior of its sample means with what the Central Limit Theorem predicts. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and its standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations and investigate the distribution of averages of 40 exponentials, using a thousand simulations.

We illustrate, via simulation and associated explanatory text, the properties of the distribution of the mean of 40 exponentials. We will:

Show the sample mean and compare it to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show that the distribution is approximately normal.

Simulation

lambda <- 0.2 # set lambda value as mentioned above
n <- 40 # exponentials
nsims <- 1000 # number of simulations

set.seed(1234567) # set seed for reproducibility

# each simulation draws n exponentials and records their mean
simulated_means <- replicate(nsims, mean(rexp(n, lambda)))
hist(simulated_means, main = "Distribution of the Mean of 40 Exponentials")

Center of the Distribution versus the Theoretical Center

The theoretical mean (mu) of the distribution is 1/lambda:

1 / lambda

## [1] 5

The sample mean of the 1000 simulated averages of 40 random exponentials is:

mean(simulated_means)

## [1] 5.038774
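To judge whether the gap between the simulated and theoretical means is plausible sampling noise, one option is to express it in units of the Monte Carlo standard error. The sketch below is self-contained and repeats the simulation setup from above; the names theoretical_mean, mc_se and z are illustrative and not part of the original analysis.

```r
# Self-contained sketch: how large is the gap between the simulated
# mean and 1/lambda, relative to the Monte Carlo standard error?
lambda <- 0.2
n <- 40
nsims <- 1000

set.seed(1234567)
simulated_means <- replicate(nsims, mean(rexp(n, lambda)))

theoretical_mean <- 1 / lambda

# standard error of the average of the nsims simulated means
mc_se <- sd(simulated_means) / sqrt(nsims)

# gap between simulated and theoretical mean, in standard-error units
z <- (mean(simulated_means) - theoretical_mean) / mc_se

cat("difference:            ", mean(simulated_means) - theoretical_mean, "\n")
cat("Monte Carlo SE:        ", mc_se, "\n")
cat("difference in SE units:", z, "\n")
```

A gap of no more than about two standard errors is the kind of deviation expected from random simulation noise alone.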

Show how variable it is and compare it to the theoretical variance of the distribution

The theoretical standard deviation and variance of the mean of 40 exponentials:

(1 / lambda) / sqrt(n) # SD

## [1] 0.7905694

((1 / lambda) / sqrt(n)) ^ 2 # variance

## [1] 0.625
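The theoretical values above follow from standard properties of the mean of independent, identically distributed draws. As a short worked derivation, write X_1, ..., X_n for the n = 40 exponentials, with mean mu = 1/lambda and standard deviation sigma = 1/lambda. The symbols \bar{X}, X_i, \mu and \sigma are notation introduced here for illustration.

```latex
\begin{align*}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu = \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^{2}}{n} = \frac{1}{\lambda^{2} n} = \frac{25}{40} = 0.625 \\
\operatorname{SD}(\bar{X}) &= \frac{\sigma}{\sqrt{n}} = \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\end{align*}
```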

The standard deviation and variance of the simulated means:

sd(simulated_means) # SD

## [1] 0.7880409

var(simulated_means) # variance

## [1] 0.6210084

Some Exploratory data analysis

The histogram above looks like a normal distribution, and we have already checked its mean and standard deviation. Let's check how close it is to a normal distribution by overlaying normal density curves on the histogram.

Red curve: normal density with the theoretical mean and standard deviation

Blue curve: normal density with the mean and standard deviation of the 1000 simulated means

library(ggplot2)
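g <- ggplot(data.frame(simulated_means), aes(x = simulated_means))
# histogram on the density scale so it is comparable with the normal curves
g <- g + geom_histogram(aes(y=..density..))
g <- g + labs(title="Distribution of Means of Simulation", y="Density", x="Mean")
# red: normal density with the theoretical mean and standard deviation
g <- g + stat_function(fun=dnorm, args=list(mean=1/lambda, sd=(1/lambda)/sqrt(n)), color = "red", size=1)
# blue: normal density with the mean and standard deviation of the simulated means
g <- g + stat_function(fun=dnorm, args=list(mean=mean(simulated_means), sd=sd(simulated_means)), color = "blue", size=1)
print(g)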

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
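A normal Q-Q plot is another common way to check approximate normality, complementing the density overlay. The sketch below is self-contained and uses base R's qqnorm() and qqline(). It is offered as one possible check and is not part of the original analysis.

```r
# Self-contained sketch: normal Q-Q plot of the simulated means
lambda <- 0.2
n <- 40
nsims <- 1000

set.seed(1234567)
simulated_means <- replicate(nsims, mean(rexp(n, lambda)))

# points close to the reference line indicate approximate normality;
# curvature in the tails reflects the skewness left over from the
# exponential distribution
qqnorm(simulated_means, main = "Normal Q-Q Plot of Simulated Means")
qqline(simulated_means, col = "red")
```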

With more simulations, the mean and standard deviation of the simulated means settle closer to their theoretical values. Here we simulate the average of 40 exponentials 100,000 times. The plot shows the histogram closely following the normal curves, with the red and blue lines almost coinciding.

set.seed(1234567) # set seed for reproducibility
# replicate() avoids growing the vector one element at a time over 100,000 iterations
simulated_means <- replicate(100000, mean(rexp(n, lambda)))
hist(simulated_means, main = "Distribution of the Mean of 40 Exponentials")

g <- ggplot(data.frame(simulated_means), aes(x = simulated_means))
g <- g + geom_histogram(aes(y=..density..))
g <- g + labs(title="Distribution of Means of Simulation", y="Density", x="Mean")
g <- g + stat_function(fun=dnorm, args=list(mean=1/lambda, sd=(1/lambda)/sqrt(n)), color = "red", size=1)
g <- g + stat_function(fun=dnorm, args=list(mean=mean(simulated_means), sd=sd(simulated_means)), color = "blue", size=1)
print(g)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
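Running more simulations sharpens the estimate of the sampling distribution, but the quality of the normal approximation itself depends on how many exponentials are averaged. As a rough illustration, the sketch below compares the skewness of the simulated means for a few sample sizes. The sizes 5, 40 and 200, the count of 10,000 simulations and the helper skewness() are assumptions chosen for this example. The theoretical value 2/sqrt(n) is the skewness of a mean of n independent exponentials; a normal distribution has skewness 0.

```r
# Self-contained sketch: skewness of the mean of n exponentials for several n
lambda <- 0.2
nsims <- 10000   # illustrative number of simulations per sample size
set.seed(1234567)

# sample skewness: third central moment divided by the cubed standard deviation
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

sample_sizes <- c(5, 40, 200)   # illustrative sample sizes
results <- sapply(sample_sizes, function(n) {
  means <- replicate(nsims, mean(rexp(n, lambda)))
  c(n = n,
    simulated_skewness = skewness(means),
    theoretical_skewness = 2 / sqrt(n))
})

# one row per sample size
print(t(results))
```

The skewness shrinks toward 0 as n grows, which is the Central Limit Theorem at work. For n = 40 a small positive skew remains, so the normal fit is close but not exact.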

Conclusion

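We found that running more simulations gives a better approximation: the mean and variance of the simulated means move closer to their theoretical values. The distribution of the mean of 40 exponentials is approximately normal, centered at 1/lambda = 5 with standard deviation (1/lambda)/sqrt(n), about 0.79. It is not a standard normal distribution, which would have mean 0 and standard deviation 1.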