Assignment Description

Investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials. You should:

  1. Show the sample mean and compare it to the theoretical mean of the distribution.
  2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
  3. Show that the distribution is approximately normal.

Simulation of exponential distribution and calculation of summary statistics

The following code performs the simulations to collect necessary data.

Exponential sampling parameters.

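# Load ggplot2 (optional: the plots below use base graphics only)
library(ggplot2)

n <- 40            # number of exponentials per sample
lambda <- 0.2      # rate parameter
sim_Num <- 1000    # number of simulations
set.seed(10000)    # make the simulation reproducible

# Draw sim_Num * n exponentials, arrange them as one sample of n per row,
# then average each row to get 1000 means of 40 exponentials.
simulated_data <- matrix(rexp(n = sim_Num * n, rate = lambda), sim_Num, n)
row_mean <- rowMeans(simulated_data)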

1. Comparison between the sample mean and the theoretical mean of the distribution.

Sample Mean

sample_Mean <- mean(row_mean)

Theoretical Mean

Theory_mean <- 1/lambda
result1 <- data.frame("Mean" = c(sample_Mean, Theory_mean),
                      row.names = c("Sample mean", "Theoretical mean"))

result1
##                     Mean
## Sample mean      5.00599
## Theoretical mean 5.00000

The simulated sample mean of 5.00599 is close to the theoretical value of 5.

# prob = TRUE plots densities, so the y-axis is labelled "Density"
hist(row_mean, breaks = 30, prob = TRUE, col = "lightblue",
     main = "Distribution of Means of 40 Exponentials",
     xlab = "Mean of 40 Simulated Exponentials", ylab = "Density")
abline(v = Theory_mean, col = "blue", lwd = 3)   # theoretical mean
abline(v = sample_Mean, col = "red", lwd = 2)    # simulated sample mean
legend("topright", c("Theoretical Mean", "Sample Mean"),
       bty = "n", lty = c(1, 1), lwd = c(3, 2),
       col = c("blue", "red"))

The blue vertical line marks the theoretical mean, and the red vertical line marks the sample mean. The center of the distribution of averages of 40 exponentials is very close to the theoretical center of the distribution.
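One way to put a number on "very close" is to express the gap between the two means in units of the Monte Carlo standard error. The following is a minimal sketch, assuming the same parameters and seed as the simulation above; the names mc_se and z_score are introduced here for illustration and are not part of the original analysis.

```r
# Re-create the simulated means with the same parameters and seed as above
n <- 40
lambda <- 0.2
sim_Num <- 1000
set.seed(10000)
row_mean <- rowMeans(matrix(rexp(n = sim_Num * n, rate = lambda), sim_Num, n))

# Hypothetical helper names (not in the original analysis):
# mc_se   = Monte Carlo standard error of the average of the 1000 means
# z_score = gap between simulated and theoretical mean, in units of mc_se
mc_se <- sd(row_mean) / sqrt(sim_Num)
z_score <- (mean(row_mean) - 1/lambda) / mc_se

print(c(mc_se = mc_se, z_score = z_score))
```

A z_score within roughly 2 in absolute value would be consistent with the gap being ordinary simulation noise.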

2. Sample variance vs. theoretical variance.

The Sample Variance

The sample variance is the variance of the 1000 entries in the vector of simulated means. (Multiplying it by the sample size, 40, would give an estimate of the variance of the underlying exponential population.)

sample_var <- var(row_mean)

The Theoretical Variance

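The theoretical variance of the mean of n = 40 exponentials is $\sigma^2/n = (1/\lambda)^2/n$, where $\sigma = 1/\lambda$ is the standard deviation of the exponential distribution.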

theory_var <- (1/lambda)^2/n
result2 <- data.frame("Variance" = c(sample_var, theory_var),
                      row.names = c("Sample variance", "Theoretical variance"))

result2
##                       Variance
## Sample variance      0.6296518
## Theoretical variance 0.6250000

The sample variance of the simulated means is 0.6296518, which is close to the theoretical variance of 0.625.
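The theoretical value of 0.625 follows in one step from the independence of the 40 draws. As a short worked derivation, write $\bar{X}$ for the mean of $n$ independent exponentials $X_1, \dots, X_n$ with rate $\lambda$ and standard deviation $\sigma = 1/\lambda$ (the symbols $\bar{X}$, $X_i$ and $\sigma$ are notation introduced here, not taken from the assignment text):

```latex
\begin{aligned}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{\sigma^2}{n}
   = \frac{(1/\lambda)^2}{n} \\
  &= \frac{(1/0.2)^2}{40} = \frac{25}{40} = 0.625
\end{aligned}
```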

3. Show that the distribution is approximately normal.

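According to the central limit theorem (CLT), the distribution of averages of independent, identically distributed samples approaches a normal distribution as the sample size grows, even when the underlying distribution (here the exponential) is not normal.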
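To make the effect of averaging concrete, a minimal sketch can compare the skewness of raw exponential draws with the skewness of means of 40 draws. The helper skewness() and the names raw_draws, means_of_40, skew_raw and skew_means are hypothetical and introduced only for this illustration. Because the raw draws are taken first, the means generated here are not the same numbers as row_mean in the report. In theory, an exponential has skewness 2 and a mean of 40 exponentials has skewness 2/sqrt(40), about 0.32, while a normal distribution has skewness 0.

```r
set.seed(10000)
n <- 40
lambda <- 0.2
sim_Num <- 1000

# Hypothetical helper: moment-based sample skewness
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# 1000 raw exponentials versus 1000 means of 40 exponentials
raw_draws <- rexp(sim_Num, rate = lambda)
means_of_40 <- rowMeans(matrix(rexp(sim_Num * n, rate = lambda), sim_Num, n))

skew_raw <- skewness(raw_draws)
skew_means <- skewness(means_of_40)
print(c(raw = skew_raw, means = skew_means))
```

A much smaller skewness for the means than for the raw draws would be in line with what the CLT predicts.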

The following plot overlays the density estimated from the simulated means (red) and the normal density with the theoretical mean and variance (blue) on the histogram. The two curves nearly coincide, which indicates that the distribution is approximately normal.

hist(row_mean,
     breaks = 30,
     prob = TRUE, col = "lightblue",
     main = "Density of Simulated Sample Means",
     xlab = "Mean of 40 Exponentials", ylab = "Density")
lines(density(row_mean), col = "red", lwd = 2)   # empirical density
abline(v = 1/lambda, col = "green", lwd = 2)     # theoretical mean
# Normal density with the theoretical mean and standard error
xfit <- seq(min(row_mean), max(row_mean), length.out = 100)
yfit <- dnorm(xfit, mean = 1/lambda, sd = 1/lambda/sqrt(n))
lines(xfit, yfit, col = "blue", lwd = 2)
legend("topright", c("Theoretical Values", "Simulated Values", "Theoretical Mean"),
       bty = "n", lwd = 2, col = c("blue", "red", "green"))

# Q-Q plot of the simulated means against normal quantiles
qqnorm(row_mean, main = "Normal Q-Q Plot", col = "red")
qqline(row_mean, col = "blue", lwd = 2)

The histogram above shows that the density curve of the simulated means is very similar to the normal density curve.

The Q-Q plot above also suggests normality: the sample quantiles closely match the theoretical normal quantiles.

Together, these plots indicate that the sampling distribution of the mean is approximately normal.
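As a numeric complement to the plots, one could check how many simulated means fall within one and within 1.96 theoretical standard errors of the theoretical mean. For an exactly normal distribution, these shares would be about 68% and 95%. This is a minimal sketch assuming the same parameters and seed as the simulation above; the names mu, se, within_1 and within_2 are introduced here for illustration.

```r
# Re-create the simulated means with the same parameters and seed as above
n <- 40
lambda <- 0.2
sim_Num <- 1000
set.seed(10000)
row_mean <- rowMeans(matrix(rexp(n = sim_Num * n, rate = lambda), sim_Num, n))

mu <- 1/lambda                 # theoretical mean
se <- (1/lambda) / sqrt(n)     # theoretical standard error of the mean

# Share of simulated means within 1 and within 1.96 standard errors of mu
within_1 <- mean(abs(row_mean - mu) <= se)
within_2 <- mean(abs(row_mean - mu) <= 1.96 * se)
print(c(within_1 = within_1, within_2 = within_2))
```

Shares near 0.68 and 0.95 would support the visual impression of approximate normality, although with only 40 exponentials per mean some residual right skew is to be expected.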