Statistical Inference Course project

Joseph Bloomquist

05-20-2024

Overview

This is the Johns Hopkins Statistical Inference course project. It consists of two parts: a simulation exercise and a basic inferential data analysis. It is presented in PDF format with no more than 3 pages.

In this report we will:

Show where the distribution is centered and compare it to the theoretical center of the distribution
Show how variable it is and compare it to the theoretical variance of the distribution
Perform an exploratory data analysis of at least a single plot or table highlighting basic features of the data
Perform some relevant confidence intervals and/or tests and interpret them within context.
Investigate the exponential distribution in R and compare it with the Central Limit Theorem.
Investigate the distribution of averages of 40 exponentials using a thousand simulations
Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials.

Simulations

For these simulations, we set a few variables up front and fix a seed for reproducibility.

set.seed(052024)
lambda = 0.2
exponentials = 40
simulations = 1000
totalMeans = NULL  # start empty so that no spurious 0 is stored alongside the simulated means
sMean = 0
tMean = 0
sVar = 0
tVar = 0

First we run the simulation: each iteration draws a sample of exponentials (40) observations from an exponential distribution with rate lambda and stores the mean of that sample.

for (i in 1 : simulations) totalMeans = c(totalMeans, mean(rexp(exponentials, lambda)))
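The same simulation can also be written without growing a vector inside a loop. The following is a minimal sketch of that alternative, not the code that produced the output in this report; the name simMeans is hypothetical and used only for illustration.

```r
# Illustrative sketch: simMeans is a hypothetical name, not part of the original code
set.seed(52024)
lambda <- 0.2
exponentials <- 40
simulations <- 1000

# replicate() evaluates the expression `simulations` times and
# returns the results as a numeric vector, one mean per simulation
simMeans <- replicate(simulations, mean(rexp(exponentials, lambda)))

length(simMeans)  # number of simulated means
```

Because the vector is never seeded with a starting value, it holds exactly one entry per simulation.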

Sample Mean vs. Theoretical Mean

Now that we have simulated data, we can calculate the sample mean and the theoretical mean and compare the two.

First we will grab our sample mean:

sMean <- mean(totalMeans)
sMean

## [1] 5.004291

Now we calculate our theoretical mean:

tMean <- 1/lambda
tMean

## [1] 5

Comparison Plot

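# histogram of the simulated means with both centers marked
hist(totalMeans, main = "Sample Mean vs. Theoretical Mean", col = "lightblue", breaks = 50)
abline(v = sMean, col = "green", lwd = 2)
abline(v = tMean, col = "red", lwd = 2)
# build the legend labels from the computed values instead of hard-coding them
legend("topleft", pch = 15, col = c("green", "red"), legend = c(paste("Sample Mean -", round(sMean, 6)), paste("Theoretical Mean -", tMean)))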

As we can see, the two means are visually indistinguishable. The sample mean of 5.004291 differs from the theoretical mean of 5 by only 0.004291.
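One way to put a number on "close" is a confidence interval for the center of the simulated means. The sketch below is an illustration under assumptions: it re-runs the simulation under the hypothetical name simMeans and uses t.test() for a 95% interval. It is not part of the original analysis, and its values may differ from the output printed above.

```r
# Illustrative sketch: simMeans, ci and covers are hypothetical names
set.seed(52024)
lambda <- 0.2
simMeans <- replicate(1000, mean(rexp(40, lambda)))

# 95% t-based confidence interval for the mean of the simulated averages
ci <- t.test(simMeans)$conf.int
ci

# does the interval contain the theoretical mean 1/lambda?
covers <- ci[1] <= 1 / lambda && 1 / lambda <= ci[2]
covers
```

An interval built this way is expected to contain the theoretical mean in roughly 95% of repeated runs, so an occasional miss does not by itself indicate an error.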

Sample Variance vs. Theoretical Variance

To calculate the sample variance, we use the built-in var() function. The formula 1/lambda^2 gives the variance of a single exponential, not of an average, so it does not match the simulated values. The mean of 40 independent exponentials has variance (1/lambda^2)/exponentials, which is the same as (lambda * sqrt(exponentials))^-2.
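The step from the variance of one exponential to the variance of an average can be worked out directly. The derivation below assumes the 40 draws in each sample are independent with a common rate; the symbols n (sample size) and X_i (individual draws) are introduced here for illustration.

```latex
\operatorname{Var}(\bar{X})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{1}{n^{2}}\cdot\frac{n}{\lambda^{2}}
  = \frac{1}{n\lambda^{2}}
  = \left(\lambda\sqrt{n}\right)^{-2}

% With \lambda = 0.2 and n = 40:
\operatorname{Var}(\bar{X}) = \frac{1}{40 \times 0.04} = \frac{1}{1.6} = 0.625
```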

sVar <- var(totalMeans)
sVar

## [1] 0.6658285

tVar <- (lambda * sqrt(exponentials))^-2
tVar

## [1] 0.625
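To see that the sample size really does enter the variance of the mean, the comparison can be repeated for several sample sizes. This is a rough sketch under assumptions: the sizes in sampleSizes and all other names are hypothetical choices made for illustration, not values from the assignment.

```r
# Illustrative sketch: sampleSizes, simulatedVar and theoreticalVar are hypothetical names
set.seed(52024)
lambda <- 0.2
sampleSizes <- c(10, 40, 160)

# for each sample size n, simulate 1000 means of n exponentials and take their variance
simulatedVar <- sapply(sampleSizes, function(n) {
  var(replicate(1000, mean(rexp(n, lambda))))
})

# theoretical variance of the mean of n exponentials
theoreticalVar <- (lambda * sqrt(sampleSizes))^-2

data.frame(n = sampleSizes, simulated = simulatedVar, theoretical = theoreticalVar)
```

The theoretical column shrinks by a factor of four each time the sample size is quadrupled, and the simulated column should track it closely.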

Comparison Plot

# a histogram of a single number is not informative, so compare the two variances as bars
vars <- c(Sample = sVar, Theoretical = tVar)
bp <- barplot(vars, main = "Sample Variance vs. Theoretical Variance", col = c("green", "red"), ylim = c(0, 1))
# print each value above its bar and the difference under the title
text(bp, vars, labels = round(vars, 4), pos = 3)
mtext(paste("Difference of:", round(sVar - tVar, 2)), side = 3)

Distribution

How are the sample means distributed?

hist(totalMeans, main="Mean distribution", col="green", breaks=50, prob=TRUE)
lines(density(totalMeans), lwd=3, col="red")
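To judge normality more directly, the histogram can be overlaid with the normal density that the Central Limit Theorem suggests, and the quantiles can be checked with a Q-Q plot. The following is a minimal sketch, assuming the simulation is re-run under the hypothetical names simMeans and n; it is not the plot produced in the original report.

```r
# Illustrative sketch: simMeans and n are hypothetical names
set.seed(52024)
lambda <- 0.2
n <- 40
simMeans <- replicate(1000, mean(rexp(n, lambda)))

# density-scaled histogram with the normal curve suggested by the CLT:
# mean 1/lambda and standard deviation 1/(lambda * sqrt(n))
hist(simMeans, breaks = 50, prob = TRUE, col = "lightblue",
     main = "Simulated means with normal curve")
curve(dnorm(x, mean = 1 / lambda, sd = 1 / (lambda * sqrt(n))),
      add = TRUE, lwd = 2, col = "red")

# Q-Q plot: points close to the line indicate approximate normality
qqnorm(simMeans)
qqline(simMeans, col = "red")
```

Some departure from the line in the right tail would not be surprising, since averages of 40 skewed values still carry a little of that skew.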

Conclusion

Even though every observation was drawn from the same heavily skewed exponential distribution, the means of samples of 40 observations are approximately normally distributed, as the Central Limit Theorem predicts.