Overview

In this project I investigate the exponential distribution in R and compare it with the Central Limit Theorem. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. I set lambda = 0.2 for all of the simulations. I examine the distribution of averages of 40 exponentials, using a thousand simulations.

  1. Sample Mean versus Theoretical Mean
n <- 1000               # count of simulations
mns = NULL
for (i in 1:n) mns = c(mns, mean(rexp(n = 40, rate = 0.2)))

means = cumsum(mns) / (1:n)     # cumulative mean of the sample means (each sample is 40 exponentials)

library(ggplot2)
g <- ggplot(data.frame(x = 1:n, y = means), aes(x = x, y = y))
g <- g + geom_hline(yintercept = 1 / 0.2) + geom_line(size = 2)   # reference line at the theoretical mean 1/lambda
g <- g + labs(x = "number of simulations", y = "cumulative mean")

g

mean_samples <- mean(mns)
print(mean_samples)
## [1] 4.988696

The theoretical mean of the exponential distribution is 1/lambda.

lambda <- 0.2
mean_theo <- 1/lambda

print(mean_theo)
## [1] 5

So the simulated mean and the theoretical mean are almost the same.
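As a rough check of how close the two values are, the difference can be compared with the standard error of the simulated mean. The sketch below is only illustrative: it reuses the value printed above, and the names `se` and `z` are assumptions introduced here, not part of the original analysis.

```r
lambda <- 0.2
mean_theo <- 1 / lambda          # theoretical mean of the exponential distribution
mean_samples <- 4.988696         # simulated mean copied from the output above

# The simulated mean averages 1000 * 40 exponentials in total, so its
# standard error is (1/lambda) / sqrt(40 * 1000).
se <- (1 / lambda) / sqrt(40 * 1000)

# Difference between simulated and theoretical mean, in standard-error units
z <- (mean_samples - mean_theo) / se
print(c(difference = mean_samples - mean_theo, std_error = se, z = z))
```

A difference smaller than one standard error, as here, is consistent with ordinary simulation noise.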

  2. Sample Variance versus Theoretical Variance

The variance is the square of the standard deviation. I compared standard deviations instead of variances.
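For readers who prefer to see the variances themselves, the same comparison could be sketched as follows. This is a minimal sketch rather than the analysis used in this report: the names `sample_vars`, `var_samples`, and `var_theo` and the fixed seed are assumptions made for illustration.

```r
set.seed(1)                      # assumed seed, only to make the sketch reproducible
lambda <- 0.2
n_sims <- 1000                   # number of simulations
sample_size <- 40                # exponentials per simulation

# Variance of each simulated sample of 40 exponentials
sample_vars <- replicate(n_sims, var(rexp(n = sample_size, rate = lambda)))

var_samples <- mean(sample_vars) # average of the 1000 sample variances
var_theo <- 1 / lambda^2         # theoretical variance, the square of 1/lambda

print(c(simulated = var_samples, theoretical = var_theo))
```

Because the theoretical variance is just the square of the theoretical standard deviation, both comparisons address the same question.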

n <- 1000               # count of simulations
mns = NULL
for (i in 1:n) mns = c(mns, sd(rexp(n = 40, rate = 0.2)))

sds = cumsum(mns) / (1:n)   # cumulative mean of the sample standard deviations

g <- ggplot(data.frame(x = 1:n, y = sds), aes(x = x, y = y))
g <- g + geom_hline(yintercept = 1 / 0.2) + geom_line(size = 2)   # reference line at the theoretical standard deviation 1/lambda
g <- g + labs(x = "number of simulations", y = "cumulative mean of sample standard deviations")

g

sd_samples <- mean(mns)
print(sd_samples)
## [1] 4.851743

The theoretical standard deviation of the exponential distribution is 1/lambda.

lambda <- 0.2
sd_theo <- 1/lambda

print(sd_theo)
## [1] 5

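So the simulated and theoretical standard deviations are close. The simulated value is slightly lower, which is expected because the sample standard deviation tends to underestimate the population standard deviation in small samples.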

  3. Distribution

Histograms of 1000 random exponential variables (left) and of the sample averages (right) are shown below:

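# mns was reused for standard deviations in the previous section, so recompute the sample means here
sample_means <- replicate(1000, mean(rexp(n = 40, rate = 0.2)))   # 1000 averages of 40 exponentials
par(mfrow = c(1, 2))
hist(rexp(n = 1000, rate = 0.2), breaks = 100)
hist(sample_means, breaks = 100)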

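The left histogram is strongly right-skewed, with most values concentrated near zero. The right histogram is approximately bell-shaped, like a normal distribution, as the Central Limit Theorem predicts.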
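One way to make the comparison with the normal distribution more concrete is to overlay the density that the Central Limit Theorem suggests, with mean 1/lambda and standard deviation (1/lambda)/sqrt(40). The following is a hedged sketch, not part of the original analysis: the seed, the number of breaks, and the names `sample_means`, `clt_mean`, and `clt_sd` are assumptions chosen for illustration.

```r
set.seed(42)                     # assumed seed, only to make the sketch reproducible
lambda <- 0.2
sample_size <- 40

# 1000 averages of 40 exponentials each
sample_means <- replicate(1000, mean(rexp(n = sample_size, rate = lambda)))

clt_mean <- 1 / lambda                        # mean suggested by the CLT
clt_sd <- (1 / lambda) / sqrt(sample_size)    # standard deviation suggested by the CLT

# Density-scaled histogram so the normal curve is on the same scale
hist(sample_means, breaks = 30, freq = FALSE,
     main = "Sample means with normal density",
     xlab = "mean of 40 exponentials")
curve(dnorm(x, mean = clt_mean, sd = clt_sd), add = TRUE, lwd = 2)

print(c(sd_of_means = sd(sample_means), clt_sd = clt_sd))
```

If the histogram bars follow the curve closely, that supports the visual impression that the sample averages are approximately normal.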