Title: Statistical Inference Course Project_Part 1 | Author: Anna Huynh | Date: 11/24/2020
This project investigates the exponential distribution in R and compares it with the Central Limit Theorem (CLT). The background consists of two parts:
The Central Limit Theorem (CLT) is one of the most important theorems in all of statistics. It states that the distribution of averages of iid (independent and identically distributed) variables, properly normalized, becomes that of a standard normal as the sample size increases. In practice, the CLT tells us that averages of a sufficiently large sample are approximately normally distributed, centered at the population mean, with standard deviation equal to the standard error of the mean.
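In symbols, the "properly normalized" average can be written as follows. The notation is introduced here only for illustration: \bar{X}_n is the average of n iid draws, \mu is the population mean, and \sigma is the population standard deviation.

```latex
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \;\longrightarrow\; N(0, 1)
\quad \text{in distribution as } n \to \infty
```

For the exponential distribution studied here, \mu = \sigma = 1/\lambda, so the denominator is (1/\lambda)/\sqrt{n}.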
The exponential distribution is the probability distribution of the time between events in a Poisson point process. (Wikipedia)
The exponential distribution can be simulated in R with rexp(n, lambda), where n is the number of observations to draw and lambda is the rate parameter. The mean (mu) of the exponential distribution is 1/lambda and the standard deviation (s) is also 1/lambda. lambda = 0.2 is used for all of the simulations. We investigate the distribution of averages of 40 exponentials, based on 1000 simulated averages.
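As a quick sanity check of the claim that both the mean and the standard deviation equal 1/lambda, here is a minimal sketch that draws one large exponential sample and compares the two estimates with 1/lambda. The seed, the sample size of 100000, and the names `draws`, `exp_mean` and `exp_sd` are assumptions made for illustration and are not part of the original analysis.

```r
# Sketch: the mean and sd of exponential draws should both be near 1/lambda.
set.seed(1)                      # arbitrary seed, for reproducibility only
lambda <- 0.2                    # rate parameter used throughout the report
draws <- rexp(100000, lambda)    # one large sample (size chosen for illustration)

exp_mean <- mean(draws)          # estimate of the mean
exp_sd <- sd(draws)              # estimate of the standard deviation

print(c(mean = exp_mean, sd = exp_sd, theoretical = 1 / lambda))
```

Both estimates should land close to 5, the value of 1/lambda for lambda = 0.2.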
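# Setting values
lambda <- 0.2        # rate parameter of the exponential distribution
random_uni <- 1000   # number of simulated averages
sampleSize <- 40     # number of exponentials in each average
# Preallocate the result vector instead of growing it inside the loop
means <- numeric(random_uni)
for (i in seq_len(random_uni)) {
  means[i] <- mean(rexp(sampleSize, lambda))
}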
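# freq = FALSE puts the histogram on the density scale so the density curve is visible
hist(means, breaks = 40, col = "green", freq = FALSE)
rug(means)
lines(density(means))
abline(v = 1/lambda, col = "magenta", lwd = 4) # The magenta line shows the theoretical mean 1/lambda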
print(mean(means))
## [1] 5.035921
Figure 01: Sample Mean versus Theoretical Mean
Observation: The sample mean (5.035921) is close to the theoretical mean 1/lambda = 5; the difference is about 0.036.
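One way to judge "close" is to express the gap in units of the standard error. The sketch below reruns the simulation under an arbitrary seed and measures how many standard errors separate the mean of the simulated averages from 1/lambda. The names `nSim`, `sim_means`, `se_grand` and `z_gap` are hypothetical, and the numbers it prints will differ from the unseeded run reported above.

```r
# Sketch: how far is the mean of the simulated averages from 1/lambda,
# measured in standard errors?
set.seed(2020)                   # arbitrary seed, for reproducibility only
lambda <- 0.2
sampleSize <- 40
nSim <- 1000                     # hypothetical name: number of simulated averages

sim_means <- replicate(nSim, mean(rexp(sampleSize, lambda)))
theo_mean <- 1 / lambda

# The mean of nSim averages of sampleSize draws is an average of
# sampleSize * nSim exponentials, so its standard error is
# sigma / sqrt(sampleSize * nSim) with sigma = 1/lambda.
se_grand <- (1 / lambda) / sqrt(sampleSize * nSim)
z_gap <- (mean(sim_means) - theo_mean) / se_grand

print(c(sample_mean = mean(sim_means), theoretical = theo_mean, z = z_gap))
```

A gap of only a few standard errors is what sampling variation alone would produce.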
var(means)
## [1] 0.602873
Observation: The variance of the sample means (0.602873) is close to the theoretical variance of the mean of 40 exponentials, (1/lambda)^2 / n = 25/40 = 0.625.
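The theoretical variance of an average of n iid draws is the population variance divided by n. The sketch below computes that value for this setup and compares it with the variance of freshly simulated averages. The seed and the names `theo_var`, `theo_sd` and `sim_means` are assumptions made for illustration.

```r
# Sketch: theoretical versus simulated variance of the average of 40 exponentials.
lambda <- 0.2
sampleSize <- 40

theo_var <- (1 / lambda)^2 / sampleSize   # sigma^2 / n = 25 / 40
theo_sd <- sqrt(theo_var)                 # standard error of the mean

set.seed(40)                              # arbitrary seed, for reproducibility only
sim_means <- replicate(1000, mean(rexp(sampleSize, lambda)))

print(c(theoretical_var = theo_var, simulated_var = var(sim_means)))
print(c(theoretical_sd = theo_sd, simulated_sd = sd(sim_means)))
```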
# install.packages("ggplot2") # run once if ggplot2 is not yet installed
library(ggplot2)
# Normal Q-Q plot of the standardized sample means.
# Uses lambda and sampleSize from the setup code.
myplot <- function(means){
  se <- (1/lambda) / sqrt(sampleSize)          # theoretical standard error of the mean
  d <- data.frame(z = (means - 1/lambda) / se) # standardized sample means
  g <- ggplot(d, aes(sample = z))
  g <- g + geom_abline(size = 2, col = "lightblue") # Normal distribution (reference line y = x)
  g <- g + stat_qq(color = "black", size = 4, alpha = 1/2) # Distribution of sample means
  g <- g + geom_vline(xintercept = qnorm(0.975))
  g <- g + geom_hline(yintercept = unname(quantile(d$z, 0.975)))
  g <- g + labs(x = "Theoretical Quantiles", y = "Sample Quantiles")
  g <- g + ggtitle("Sample means in Normal Distribution")
  g
}
myplot(means)
Figure 02: Quantiles of the distribution of a large collection of averages of 40 exponentials, compared with the quantiles of the normal distribution. The closer the points lie to the line, the better the fit. The plot suggests a high degree of normality.
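To contrast a large collection of raw exponentials with a large collection of averages of 40 exponentials, sample skewness offers a simple numeric summary: a normal distribution has skewness 0, an exponential distribution has skewness 2, and an average of n iid exponentials has skewness 2/sqrt(n). The sketch below is illustrative only. The seed, the helper `skewness()` and the names `raw_exp` and `avg_exp` are hypothetical and not part of the original analysis.

```r
# Sketch: raw exponentials are strongly skewed, averages of 40 much less so.
set.seed(123)                    # arbitrary seed, for reproducibility only
lambda <- 0.2
sampleSize <- 40

raw_exp <- rexp(1000, lambda)                                  # 1000 raw exponentials
avg_exp <- replicate(1000, mean(rexp(sampleSize, lambda)))     # 1000 averages of 40

# Hypothetical helper: moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

print(c(raw = skewness(raw_exp),
        averages = skewness(avg_exp),
        theoretical_averages = 2 / sqrt(sampleSize)))
```

The much smaller skewness of the averages is consistent with the near-normal shape suggested by the Q-Q plot.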