by Davin Kaing
As part of the Statistical Inference course in the Coursera Data Science Specialization, this project uses R to simulate the distribution of means of exponential samples and compares it with what the Central Limit Theorem predicts.
The following code generates the simulated exponential means that are plotted below. The rate lambda is set to 0.2, so the theoretical mean is 1/lambda = 5, and 1000 averages of 40 exponential observations each are simulated with a for-loop.
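lambda <- 0.2      # rate parameter of the exponential distribution
mu <- 1/lambda     # theoretical mean (named mu so it does not mask base::mean)
n <- 40            # number of observations per sample
ExMeans <- NULL
for (i in 1:1000) ExMeans <- c(ExMeans, mean(rexp(n, lambda)))  # append the mean of each simulated sample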
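The same simulation can also be written without a loop. The following is a minimal sketch, not the code used for the figures in this report: the seed value and the names `n_sim`, `samples`, and `ExMeansVec` are assumptions introduced here for illustration.

```r
set.seed(2024)       # arbitrary seed, assumed here only so the run is reproducible
lambda <- 0.2        # rate parameter, as in the report
n <- 40              # observations per sample
n_sim <- 1000        # number of simulated averages

# Draw all observations at once: one row per simulated sample of size n
samples <- matrix(rexp(n_sim * n, rate = lambda), nrow = n_sim, ncol = n)

# Row means give the 1000 simulated exponential means
ExMeansVec <- rowMeans(samples)
length(ExMeansVec)
```

Drawing everything in one call avoids growing a vector inside the loop, which matters little for 1000 iterations but scales better for larger simulations.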
Figure 1 displays a histogram plot of the simulated exponential means produced by the code above.
hist(ExMeans, prob = TRUE, breaks =100, main = "Figure 1: Simulated Exponential Means", xlab = "Means")
The theoretical mean is defined as 1/lambda, and the sample mean is calculated by taking the mean of the simulated means.
Theo_Mean <- 1/lambda
Sam_Mean <- mean(ExMeans)
Percent_Change <- abs((Theo_Mean - Sam_Mean)/Theo_Mean)*100
The percent difference between the theoretical mean and the sample mean is 0.0341702%, so the two are very close. The following plot (Figure 2) shows the two means.
hist(ExMeans, prob = TRUE, breaks =100, main = "Figure 2: Theoretical Mean vs Sample Mean", xlab = "Means")
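abline(v = Theo_Mean, col = "red")    # theoretical mean; colour named to match the legend
abline(v = Sam_Mean, col = "green")   # sample mean of the simulated means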
legend("topright", legend = c("Theoretical Mean", "Sample Mean"), fill = c("red", "green"), cex = 0.7)
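One way to judge whether the gap between the two means is small is to compare it with the Monte Carlo standard error of the simulated average. The sketch below is an illustration under assumptions rather than part of the original analysis: it reruns the simulation with an arbitrary seed, and the names `sim_means`, `mc_se`, and `z` are hypothetical.

```r
set.seed(1)          # arbitrary seed, assumed here only for reproducibility
lambda <- 0.2
n <- 40
n_sim <- 1000

# Rerun the simulation so this block is self-contained
sim_means <- replicate(n_sim, mean(rexp(n, lambda)))
theo_mean <- 1/lambda

# Standard error of the average of n_sim sample means:
# sd of one sample mean is (1/lambda)/sqrt(n); averaging n_sim of them divides by sqrt(n_sim)
mc_se <- (1/lambda) / sqrt(n) / sqrt(n_sim)

# Gap between simulated and theoretical mean, in standard-error units
z <- (mean(sim_means) - theo_mean) / mc_se
z
```

With these settings `mc_se` is 0.025, so a gap of a few hundredths between the sample mean and 5 is what simulation noise alone would produce.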
The theoretical variance of the sample mean is calculated by squaring the standard deviation of the exponential distribution (1/lambda) and dividing by the number of observations per sample, n.
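This formula follows from the independence of the observations in each sample. A short derivation, where the symbols X_i (a single exponential observation), X-bar (the mean of n observations), and sigma = 1/lambda are notation introduced here rather than in the original:

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{(1/0.2)^2}{40}
  = \frac{25}{40}
  = 0.625
```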
Theo_Var <- ((1/lambda)^2)/n
Sim_Var <- var(ExMeans)
Percent_Diff <- (abs(Theo_Var-Sim_Var)/Theo_Var)*100
The percent difference between the sample variance and the theoretical variance is 2.0730344%. The following plot (Figure 3) shows the running variance estimate, taken about the theoretical mean of 5, as the number of simulated averages grows to 1000.
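run_var <- cumsum((ExMeans - Theo_Mean)^2)/seq_along(ExMeans)  # running variance about the theoretical mean; renamed so it does not mask base::var
plot(run_var, type = "l", main = "Figure 3: Simulated Variance", xlab = "Number of Samples", ylab = "Variance")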
As Figure 4 shows, the exponential means appear to be approximately normally distributed, as the Central Limit Theorem predicts.
x <- seq(min(ExMeans), max(ExMeans), length.out = 1000)  # grid over the range of the simulated means
hist(ExMeans, prob = TRUE, breaks = 100, main = "Figure 4: Simulated Exponential Means Distribution", xlab = "Means")
lines(x, dnorm(x, mean(ExMeans), sd(ExMeans)), col = "darkblue", lwd = 2)  # normal density with the sample mean and sd
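A normal Q-Q plot is another common way to check normality visually, alongside the overlaid density. The following is a sketch under assumptions, not part of the original analysis: it reruns the simulation with an arbitrary seed, standardizes the means with the theoretical mean and standard error, and uses the hypothetical names `sim_means` and `z`.

```r
set.seed(1)          # arbitrary seed, assumed here only for reproducibility
lambda <- 0.2
n <- 40

# Rerun the simulation so this block is self-contained
sim_means <- replicate(1000, mean(rexp(n, lambda)))

# Standardize with the theoretical mean 1/lambda and standard error (1/lambda)/sqrt(n)
z <- (sim_means - 1/lambda) / ((1/lambda) / sqrt(n))

# Points close to the reference line suggest approximate normality
qqnorm(z, main = "Normal Q-Q Plot of Standardized Means")
qqline(z, col = "red")
```

Because the exponential distribution is right-skewed and n = 40 is moderate, some departure from the line in the upper tail would not be surprising.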