Statistical Inference: Analyzing the Exponential Distribution

Executive Summary

This document examines the distribution of sample means drawn from an exponential distribution. We begin by simulating S samples from an exponential distribution, each of size n. We then compute the mean of each simulated sample, build the empirical distribution of these sample means, and estimate its mean and standard deviation. These estimates are compared with the values predicted by the Central Limit Theorem (CLT). We conclude that the CLT applies: the distribution of sample means is approximately normal, with mean equal to the population mean and variance equal to the population variance divided by the sample size (equivalently, standard deviation equal to the population standard deviation divided by the square root of the sample size).

Setting parameters for exponential distributions

Setting the rate parameter (lambda) of the exponential distribution, the sample size (n) and the number of samples to simulate (S):

lambda <- 0.2
n <- 40
S <- 1000

Sample simulations

Simulating samples of exponential distributions and calculating the sample means:

# One mean per simulated sample of n exponential draws
means <- replicate(S, mean(rexp(n, lambda)))

Plotting Sample Means Distribution

Plotting the distribution of the sample means:

hist(means, col = "red", xlab="Simulated Means", 
     main="Distribution of Simulated Means of 40 Exponentials", breaks= 20)

Calculating the mean and standard deviation of the sample means:

Avgmeans = mean(means)
Sdmeans = sd(means)

Comparing with the Central Limit Theorem (CLT)

Comparing with the theoretical mean and standard deviation: according to the CLT, the distribution of the sample mean of n iid observations tends toward a normal distribution whose mean is the population mean and whose variance is the population variance divided by the sample size n. Its standard deviation is therefore the population standard deviation divided by the square root of n. For an exponential distribution with rate lambda, the population mean and the population standard deviation are both 1/lambda, so the theoretical mean and standard deviation of the sample means are calculated as follows:

TheoMean = 1/lambda
TheoSd =  1/(lambda*sqrt(n))
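
For reference, these values can be worked out by hand. The derivation below is a short sketch that writes \bar{X}_n for the mean of n iid exponential draws (a notation introduced here, not used elsewhere in the report) and plugs in lambda = 0.2 and n = 40:

```latex
\begin{aligned}
E[\bar{X}_n] &= \mu = \frac{1}{\lambda} = \frac{1}{0.2} = 5,\\
\operatorname{Var}(\bar{X}_n) &= \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625,\\
\operatorname{sd}(\bar{X}_n) &= \frac{\sigma}{\sqrt{n}} = \frac{1}{\lambda \sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.791.
\end{aligned}
```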

The comparison is shown in the table below:

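# Each row: label, theoretical value, simulated value, absolute relative difference in %
row1 <- c('Mean', TheoMean, round(Avgmeans, 3), abs(round((Avgmeans - TheoMean) * 100 / TheoMean, 3)))
row2 <- c('Standard Deviation', round(TheoSd, 3), round(Sdmeans, 3), abs(round((Sdmeans - TheoSd) * 100 / TheoSd, 3)))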
m <- rbind(row1,row2)
rownames(m) <- c('','')
colnames(m) <- c('','Theoretical Output', 'Simulated Output','Difference (in %)')
library(gridExtra)
grid.table(m)

We can note that the simulated values are close to the theoretical values for both the mean and the standard deviation of the sample means.
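
Since the CLT is usually stated in terms of the variance, the same comparison can also be made on the variance scale. The following is a minimal, self-contained sketch: it repeats the parameter values used in this report, and the seed and the names `sim_var`, `theo_var` and `rel_diff_pct` are assumptions introduced only for this illustration.

```r
# Parameters assumed to match those set earlier in the report
lambda <- 0.2
n <- 40
S <- 1000

set.seed(1)  # arbitrary seed, only so that the sketch is reproducible
means <- replicate(S, mean(rexp(n, lambda)))

sim_var  <- var(means)           # variance of the simulated sample means
theo_var <- (1 / lambda)^2 / n   # sigma^2 / n = 25 / 40 = 0.625

# Absolute relative difference, in percent
rel_diff_pct <- abs(sim_var - theo_var) * 100 / theo_var

round(c(theoretical = theo_var, simulated = sim_var, diff_pct = rel_diff_pct), 3)
```

The simulated variance changes with the seed, so the percentage difference should be read as indicative rather than as a fixed result.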

Below, we plot the density of the sample means together with a normal density whose mean and standard deviation are the theoretical values calculated above according to the CLT.

library(ggplot2)
g <- ggplot(data.frame(means = means), aes(x = means))
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0);
# bins is set explicitly (30 is the ggplot2 default) to avoid the stat_bin() message
g <- g + geom_histogram(aes(y = after_stat(density)), fill = "red", bins = 30)
g <- g + xlab("Simulated Means") + ggtitle("Distribution of Sample Means vs Theoretical Normal Distribution")
g <- g + stat_function(fun = dnorm, args = list(mean = TheoMean, sd = TheoSd), colour = "black", lwd = 2)
g

We can note that the density of the sample means is close to a normal distribution whose mean is the population mean and whose variance is the population variance divided by the sample size (that is, whose standard deviation is the population standard deviation divided by the square root of the sample size).
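
A normal Q-Q plot is one possible complementary check of this visual comparison. The sketch below is self-contained and only illustrative: it repeats the report's parameter values, and the seed and the name `z` are assumptions. The sample means are standardised with the CLT mean and standard deviation, so points close to the line y = x suggest agreement with the standard normal distribution.

```r
# Parameters assumed to match those set earlier in the report
lambda <- 0.2
n <- 40
S <- 1000

set.seed(1)  # arbitrary seed, only so that the sketch is reproducible
means <- replicate(S, mean(rexp(n, lambda)))

# Standardise with the theoretical mean (1/lambda) and
# standard deviation (1/(lambda * sqrt(n))) of the sample mean
z <- (means - 1 / lambda) / (1 / (lambda * sqrt(n)))

# Q-Q plot against the standard normal, with the reference line y = x
qqnorm(z, main = "Normal Q-Q Plot of Standardised Sample Means")
abline(0, 1, col = "red")
```

Some curvature in the upper tail would not be surprising, since the mean of 40 exponential draws is still slightly right-skewed.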