Distribution of Exponential Means

by Davin Kaing

Overview

As part of the Statistical Inference course in the Coursera Data Science Specialization, this project uses R to simulate the distribution of means of exponential samples and to compare it with what the Central Limit Theorem predicts.

Simulations

The following code generates the simulated exponential means. The rate parameter lambda is set to 0.2, so the theoretical mean is 1/lambda = 5. Each sample contains 40 exponential observations, and a for-loop draws 1000 such samples and stores the average of each one.

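lambda = 0.2             # rate parameter of the exponential distribution
mu = 1/lambda            # theoretical mean (renamed so it does not mask the mean() function)
n = 40                   # observations per sample
ExMeans = numeric(1000)  # preallocate storage for the 1000 averages
for (i in 1 : 1000) ExMeans[i] = mean(rexp(n, lambda))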
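
The loop above can also be written without explicit iteration. The following is a minimal sketch, not part of the original report: it draws all 1000 x 40 observations at once, arranges them in a matrix with one sample per row, and averages each row. The seed and the names n_sims, draws, and ExMeansVec are assumptions introduced for illustration.

```r
# Sketch: vectorized alternative to the for-loop (illustrative names).
set.seed(1)        # assumption: arbitrary seed, only for reproducibility
lambda <- 0.2      # rate parameter, as in the report
n <- 40            # observations per sample
n_sims <- 1000     # hypothetical name for the number of averages

# One row per simulated sample, one column per observation
draws <- matrix(rexp(n_sims * n, rate = lambda), nrow = n_sims, ncol = n)

# One mean per row, giving 1000 simulated exponential means
ExMeansVec <- rowMeans(draws)
length(ExMeansVec)
```

Both approaches produce 1000 means of 40 exponential observations each; the matrix version only avoids the explicit loop.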

Figure 1 shows a histogram of the simulated exponential means produced by the code above.

hist(ExMeans, prob = TRUE, breaks =100, main = "Figure 1: Simulated Exponential Means", xlab = "Means")

Sample Mean versus Theoretical Mean

The theoretical mean is defined as 1/lambda, and the sample mean is calculated by taking the mean of the simulated means.

Theo_Mean <- 1/lambda
Sam_Mean <- mean(ExMeans)
Percent_Change <- abs((Theo_Mean - Sam_Mean)/Theo_Mean)*100

The percent difference between the theoretical mean and the sample mean is 0.0341702%. This suggests that the sample mean is very close to the theoretical mean. The following plot (Figure 2) shows the two means.

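hist(ExMeans, prob = TRUE, breaks = 100, main = "Figure 2: Theoretical Mean vs Sample Mean", xlab = "Means")
abline(v = 1/lambda, col = "red")    # theoretical mean
abline(v = Sam_Mean, col = "green")  # sample mean of the simulated averages
# Use line colors in the legend so that it matches the vertical lines drawn above
legend("topright", legend = c("Theoretical Mean", "Sample Mean"), col = c("red", "green"), lty = 1, cex = 0.7)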
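
To put the size of the gap between the two means in context, it can be compared with the Monte Carlo standard error of the average of 1000 sample means, which is (1/lambda)/sqrt(40 * 1000) = 0.025, or 0.5% of the theoretical mean. The following is a minimal sketch under assumed settings: the seed and the names n_sims, sim_means, mc_se, and gap are illustrative and do not appear in the original analysis, so the numbers it produces will differ from those reported above.

```r
# Sketch: express the gap between sample and theoretical mean in standard errors.
set.seed(1)                      # assumption: arbitrary seed
lambda <- 0.2
n <- 40
n_sims <- 1000                   # hypothetical name for the number of averages
sim_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

theo_mean <- 1 / lambda          # theoretical mean, 5
# Standard error of the mean of n_sims averages, each based on n observations
mc_se <- (1 / lambda) / sqrt(n * n_sims)
gap <- mean(sim_means) - theo_mean

c(gap = gap, mc_se = mc_se, gap_in_se = gap / mc_se)
```

A gap within a couple of standard errors is what random simulation noise alone would be expected to produce.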

Sample Variance versus Theoretical Variance

The theoretical variance of the sample mean can be calculated by squaring the standard deviation of the exponential distribution (1/lambda) and dividing by the number of observations per sample (n = 40).

Theo_Var <- ((1/lambda)^2)/n
Sim_Var <- var(ExMeans)
Percent_Diff <- (abs(Theo_Var-Sim_Var)/Theo_Var)*100
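
The expression used for Theo_Var follows from the variance of a mean of n independent, identically distributed observations. As a worked equation, with the symbols \bar{X} (the sample mean) and \sigma (the standard deviation of a single exponential observation) introduced here as notation that is not in the original:

```latex
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
```

The corresponding theoretical standard deviation of the sample mean is \sqrt{0.625} \approx 0.79.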

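The percent difference between the sample variance and the theoretical variance is 2.0730344%. The following plot (Figure 3) shows the running estimate of the variance of the simulated means, computed about the theoretical mean of 5, as the number of averages grows from 1 to 1000.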

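# Running variance about the theoretical mean; named RunVar so it does not mask var()
RunVar <- cumsum((ExMeans - 1/lambda)^2)/seq_along(ExMeans)
plot(RunVar, type = "l", main = "Figure 3: Simulated Variance", xlab = "Number of Samples", ylab = "Variance")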

Distribution

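As Figure 4 shows, the simulated exponential means appear to be approximately normally distributed. The histogram is overlaid with a normal density curve that has the same mean and standard deviation as the simulated means.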

x <- seq(min(ExMeans), max(ExMeans), length.out = 1000)  # grid over the range of the simulated means
hist(ExMeans, prob = TRUE, breaks = 100, main = "Figure 4: Simulated Exponential Means Distribution", xlab = "Means")
# Overlay the normal density evaluated on the grid x
lines(x, dnorm(x, mean(ExMeans), sd(ExMeans)), col = "darkblue", lwd = 2)
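
The visual comparison can be complemented with a simple numerical check. The following is a minimal sketch, not part of the original analysis: if the means are approximately normal with mean 1/lambda and standard deviation (1/lambda)/sqrt(n), then roughly 95% of them should fall within 1.96 standard deviations of the theoretical mean. The seed and the names n_sims, sim_means, se, and coverage are assumptions made for illustration.

```r
# Sketch: share of simulated means inside the central 95% normal interval.
set.seed(1)                      # assumption: arbitrary seed
lambda <- 0.2
n <- 40
n_sims <- 1000                   # hypothetical name for the number of averages
sim_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

mu <- 1 / lambda                 # theoretical mean of the sample mean
se <- (1 / lambda) / sqrt(n)     # theoretical standard deviation of the sample mean

# Proportion of simulated means within qnorm(0.975) (about 1.96) standard deviations of mu
coverage <- mean(abs(sim_means - mu) <= qnorm(0.975) * se)
coverage
```

A proportion near 0.95 is consistent with the normal approximation. Because the exponential distribution is skewed, small departures from 0.95 are expected at n = 40.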