Peer Graded Assignment: Statistical Inference Course Project

The project has two parts.

Part 1: Simulation Exercise

Overview:

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda, and its standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.

Simulations:

Question 1: Show the sample mean and compare it to the theoretical mean of the distribution.

# Set and initialize the variables as defined in the problem.
Lambda <- 0.2
n <- 40
NumSim <- 1000
set.seed(12345)

# Use replicate() to generate NumSim sample means, each computed from n exponential draws.
TestSample <- replicate(NumSim, mean(rexp(n, Lambda)), simplify = TRUE)

# theoretical mean
1/Lambda
## [1] 5
# exploratory data analysis highlighting basic features of the data using the summary function
summary(TestSample)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.703   4.451   4.918   4.972   5.491   8.325
# The sample mean of 4.972 is close to the theoretical mean of 5. As the number of simulations grows, the average of the simulated means converges toward the theoretical value.
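
The convergence noted above can be checked with a running average of the simulated means. This is a minimal sketch and not part of the original analysis. The name RunningMean is illustrative, and the sample is re-created with the same seed only so that the block runs on its own.

```r
# Re-create the simulated means so the block is self-contained.
Lambda <- 0.2
n <- 40
NumSim <- 1000
set.seed(12345)
TestSample <- replicate(NumSim, mean(rexp(n, Lambda)))

# Running average of the first k simulated means (RunningMean is an illustrative name).
RunningMean <- cumsum(TestSample) / seq_along(TestSample)

# Plot the running average against the theoretical mean 1/Lambda.
plot(RunningMean, type = "l",
     xlab = "Number of simulations", ylab = "Running mean of sample means")
abline(h = 1/Lambda, col = "blue")
```

If the Central Limit Theorem setup holds as described, the black line should settle near the blue horizontal line at 5 as the number of simulations increases.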

Question 2: Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

# theoretical variance of the mean of n draws: sigma^2 / n, with sigma = 1/Lambda
(1/Lambda)^2 / n
## [1] 0.625
# variance of the simulated sample means
var(TestSample)
## [1] 0.5954369

The variance of the simulated sample means, 0.5954, is close to the theoretical variance of 0.625.
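
The theoretical value comes from the variance of a mean of n independent draws: sigma^2 / n = (1/lambda)^2 / n = 25/40 = 0.625. The sketch below puts the theoretical and simulated values side by side. It is a self-contained illustration rather than part of the original analysis, and the names TheoVar and SimVar are assumptions introduced here.

```r
# Re-create the simulated means so the block is self-contained.
Lambda <- 0.2
n <- 40
NumSim <- 1000
set.seed(12345)
TestSample <- replicate(NumSim, mean(rexp(n, Lambda)))

# TheoVar and SimVar are illustrative names, not from the original report.
TheoVar <- (1/Lambda)^2 / n   # sigma^2 / n
SimVar <- var(TestSample)     # variance of the simulated means

# Compare variances, then the corresponding standard deviations
# (the standard error of the mean).
c(theoretical = TheoVar, simulated = SimVar)
c(theoretical = sqrt(TheoVar), simulated = sd(TestSample))
```

Comparing standard deviations as well as variances keeps the comparison in the same units as the means themselves.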

Plotting the distribution of the sample means shows that it is approximately normal.

# histogram of the sample means on the density scale
hist(TestSample, breaks = n, freq = FALSE, col = "green", xlab = "Means")
# overlay the normal density with mean 1/Lambda and sd (1/Lambda)/sqrt(n)
x <- seq(min(TestSample), max(TestSample), length.out = 100)
lines(x, dnorm(x, mean = 1/Lambda, sd = (1/Lambda)/sqrt(n)), col = "blue")

A quantile-quantile plot from qqnorm compares the observed quantiles of the sample means with the theoretical quantiles of a normal distribution.

qqnorm(TestSample)
qqline(TestSample, col = "blue")

The points lie close to the blue reference line, so the distribution of averages of 40 exponentials is very close to a normal distribution.