The project has two parts:
1. A simulation exercise.
2. Basic inferential data analysis.
In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
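As a quick sanity check on the stated mean and standard deviation, the minimal sketch below draws one large exponential sample and compares its empirical mean and standard deviation with 1/lambda. The sample size of 100,000 and the names `check_lambda` and `big_sample` are illustrative assumptions, not part of the assignment.

```r
# Illustrative check; the sample size and variable names are assumptions.
set.seed(12345)
check_lambda <- 0.2
big_sample <- rexp(100000, rate = check_lambda)

# Both the empirical mean and sd should be close to 1 / check_lambda.
c(mean = mean(big_sample), sd = sd(big_sample), theory = 1 / check_lambda)
```

Because the mean and the standard deviation of an exponential distribution coincide, both empirical values are compared against the same theoretical number.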
Question 1: Show the sample mean and compare it to the theoretical mean of the distribution.
# Set and initialize the variables as defined in the problem.
Lambda <- 0.2
n <- 40
NumSim <- 1000
set.seed(12345)
# use replicate() to simulate NumSim averages of n exponentials each
TestSample <- replicate(NumSim, mean(rexp(n, Lambda)), simplify = TRUE)
# theoretical mean
1/Lambda
## [1] 5
# exploratory data analysis highlighting basic features of the data using the summary function
summary(TestSample)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   2.703   4.451   4.918   4.972   5.491   8.325
# The sample mean of 4.972 is close to the theoretical mean of 5. The more simulations are run, the closer the mean of the simulated averages gets to the theoretical mean.
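The convergence claim can be illustrated with a running average of the simulated means. This is a minimal sketch under the same settings (lambda = 0.2, 40 exponentials, 1000 simulations); it regenerates the sample so that it runs on its own, and the names `sim_means` and `running_mean` are illustrative rather than taken from the analysis above.

```r
# Illustrative sketch; regenerates the sample so the block is self-contained.
set.seed(12345)
sim_means <- replicate(1000, mean(rexp(40, rate = 0.2)))

# Running average of the simulated means after 1, 2, ..., 1000 simulations.
running_mean <- cumsum(sim_means) / seq_along(sim_means)

plot(running_mean, type = "l",
     xlab = "Number of simulations",
     ylab = "Running mean of sample means")
abline(h = 1 / 0.2, col = "red")  # theoretical mean
```

The curve should fluctuate widely for the first few simulations and then settle near the red line.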
Question 2: Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
# theoretical variance of the mean of n exponentials: sigma^2 / n = (1/Lambda)^2 / n
(1/Lambda)^2 / n
## [1] 0.625
# variance of the simulated sample means
var(TestSample)
## [1] 0.5954369
The variance of TestSample, 0.5954, is close to the theoretical variance of 0.625.
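The same comparison can also be made on the standard-deviation scale, which is in the same units as the means themselves. The sketch below is one way to put the simulated and theoretical values side by side; it regenerates the sample so that it is self-contained, and the names `sim_means`, `theory_var`, and `theory_sd` are illustrative assumptions.

```r
# Illustrative sketch; regenerates the sample so the block is self-contained.
set.seed(12345)
sim_means <- replicate(1000, mean(rexp(40, rate = 0.2)))

# Variance of the mean of n draws is sigma^2 / n, with sigma = 1 / lambda.
theory_var <- (1 / 0.2)^2 / 40
theory_sd  <- sqrt(theory_var)

data.frame(
  quantity    = c("variance", "standard deviation"),
  simulated   = c(var(sim_means), sd(sim_means)),
  theoretical = c(theory_var, theory_sd)
)
```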
Plot the distribution of the sample means to show that it is approximately normal.
hist(TestSample, breaks = n, probability = TRUE, col = "green", xlab = "Means")
# overlay the theoretical normal density with mean 1/Lambda and sd (1/Lambda)/sqrt(n)
x <- seq(min(TestSample), max(TestSample), length.out = 100)
lines(x, dnorm(x, mean = 1/Lambda, sd = (1/Lambda)/sqrt(n)), col = "blue")
Use qqnorm to draw a quantile-quantile plot of TestSample, comparing the observed quantiles with those of a theoretical normal distribution.
qqnorm(TestSample)
qqline(TestSample, col = "blue")
One can see that the distribution of averages of 40 exponentials is very close to a normal distribution (blue line).
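A simple numerical complement to the plots is to count how many simulated means fall within 1.96 standard errors of the theoretical mean; under normality this share is about 0.95. The sketch below is an illustrative check, not part of the original analysis. It regenerates the sample so that it runs on its own, and the names `sim_means`, `mu`, `se`, and `coverage` are assumptions.

```r
# Illustrative sketch; regenerates the sample so the block is self-contained.
set.seed(12345)
sim_means <- replicate(1000, mean(rexp(40, rate = 0.2)))

mu <- 1 / 0.2               # theoretical mean
se <- (1 / 0.2) / sqrt(40)  # theoretical standard error of the mean

# Share of simulated means inside mu +/- 1.96 * se.
coverage <- mean(abs(sim_means - mu) <= qnorm(0.975) * se)
coverage
```

A share close to 0.95 is consistent with the approximate normality seen in the histogram and the Q-Q plot, although it does not by itself prove it.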