Title: Statistical Inference Course Project_Part 1 | Author: Anna Huynh | Date: 11/24/2020


Overview

This project investigates the exponential distribution in R and compares it with the Central Limit Theorem (CLT). It consists of two parts:

Part 1: Simulation Exercise

Definition

Simulations

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter and n is the number of observations to draw. The mean (mu) of the exponential distribution is 1/lambda and the standard deviation (s) is also 1/lambda. lambda = 0.2 is used for all of the simulations. Each simulation averages a sample of 40 exponentials, and 1000 such simulations give the distribution of averages of 40 exponentials.
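As a quick check of the quantities defined above, the following sketch computes the theoretical mean and standard deviation and compares them with 1000 raw exponential draws. The seed and the names `n`, `sigma`, `se_mean`, and `draws` are illustrative assumptions and do not appear in the original analysis.

```r
# Sketch only: seed and variable names are illustrative assumptions.
set.seed(1)            # arbitrary seed, for reproducibility
lambda <- 0.2          # rate parameter used throughout the report
n <- 40                # number of exponentials per average

mu <- 1 / lambda               # theoretical mean of the exponential
sigma <- 1 / lambda            # theoretical standard deviation
se_mean <- sigma / sqrt(n)     # theoretical sd of an average of n draws

draws <- rexp(1000, rate = lambda)   # 1000 raw exponential draws
c(theoretical_mean = mu, empirical_mean = mean(draws))
c(theoretical_sd = sigma, empirical_sd = sd(draws))
se_mean
```

The value `se_mean` is the spread the CLT predicts for the averages of 40 exponentials, which is much smaller than the spread of the individual draws.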

1. Sample Mean versus Theoretical Mean

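# Setting values
lambda <- 0.2        # rate parameter
random_uni <- 1000   # number of simulations
sampleSize <- 40     # number of exponentials per average
means <- numeric(random_uni)
for (i in 1:random_uni) {
    means[i] <- mean(rexp(sampleSize, lambda))
}
hist(means, breaks=40, col = "green", freq = FALSE) # density scale, so the density curve is visible
rug(means)
lines(density(means))
abline(v=1/lambda, col="magenta", lwd=4) # The magenta line shows the theoretical mean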

print(mean(means))
## [1] 5.035921

Figure 01: Sample Mean versus Theoretical Mean

Observation: The sample mean (5.036) is very close to the theoretical mean, 1/lambda = 5.

2. Sample Variance versus Theoretical Variance

var(means)
## [1] 0.602873

Observation: The variance of the sample means (0.603) is close to the theoretical variance of an average of 40 exponentials, (1/lambda)^2 / 40 = 25/40 = 0.625.
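The comparison in this section can be reproduced with a short sketch. It assumes the same lambda and sample size as above. The seed and the names `nSims`, `sim_means`, `theoretical_var`, and `sample_var` are hypothetical, and `replicate` is used here in place of the explicit loop.

```r
# Sketch only: seed and helper names are illustrative assumptions.
set.seed(2020)         # arbitrary seed, for reproducibility
lambda <- 0.2          # rate parameter
sampleSize <- 40       # exponentials per average
nSims <- 1000          # number of simulated averages

# One average of 40 exponentials per simulation
sim_means <- replicate(nSims, mean(rexp(sampleSize, lambda)))

# Variance of an average of n i.i.d. draws is sigma^2 / n
theoretical_var <- (1 / lambda)^2 / sampleSize   # 25 / 40 = 0.625
sample_var <- var(sim_means)

c(theoretical = theoretical_var, sample = sample_var)
```

The exact sample variance depends on the seed, so it will differ slightly from the 0.602873 reported above.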

3. Distribution: Via figures and text, explain how one can tell the distribution is approximately normal.

library(ggplot2)
# Normal Q-Q plot of the simulated sample means (`means` from section 1)
myplot <- function(means){
    d <- data.frame(m = means)
    g <- ggplot(d, aes(sample = m))
    g <- g + stat_qq_line(color = "lightblue", linewidth = 2) # Normal reference line
    g <- g + stat_qq(color="black",size=4,alpha=1/2) # Quantiles of the sample means
    g <- g + labs(x="Theoretical Quantiles",y="Sample Quantiles")
    g <- g + ggtitle("Sample means in Normal Distribution")
    g
}
myplot(means)

Figure 02: The distribution of a large collection of random exponentials compared with the distribution of a large collection of averages of 40 exponentials. The closer the points lie to the line, the better the fit. The plot suggests that the averages are approximately normal.
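A numerical counterpart to the quantile comparison might look like the sketch below. It standardizes simulated averages of 40 exponentials with the theoretical mean and standard error, then compares a few of their quantiles with standard normal quantiles. The seed and the names `sim_means`, `z`, `probs`, `comparison`, and `coverage` are assumptions made for illustration only.

```r
# Sketch only: seed and variable names are illustrative assumptions.
set.seed(42)           # arbitrary seed, for reproducibility
lambda <- 0.2          # rate parameter
sampleSize <- 40       # exponentials per average

sim_means <- replicate(1000, mean(rexp(sampleSize, lambda)))

# Standardize with the theoretical mean and standard error of the average
mu <- 1 / lambda
se <- (1 / lambda) / sqrt(sampleSize)
z <- (sim_means - mu) / se

# Compare selected sample quantiles with standard normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
comparison <- cbind(sample = quantile(z, probs), normal = qnorm(probs))
comparison

# Share of standardized averages inside the central 95% normal interval
coverage <- mean(abs(z) < qnorm(0.975))
coverage
```

If the averages are approximately normal, the two columns of `comparison` should be close and `coverage` should be near 0.95. Small differences in the tails are expected because the exponential distribution is right-skewed.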