This report is a course project for the Statistical Inference course in the Data Science Specialization offered by Johns Hopkins University on Coursera.
The project consists of two parts:
In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
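As a quick sanity check on the stated mean and standard deviation, the following minimal sketch draws one large exponential sample and compares its empirical mean and standard deviation with 1/lambda. The names `check_lambda` and `check_draws`, the seed, and the sample size of 100,000 are illustrative assumptions, not part of the assignment.

```r
# Sanity check (illustrative): empirical mean and sd of rexp() vs 1/lambda.
# check_lambda, check_draws, the seed and the sample size are assumptions.
set.seed(1)
check_lambda <- 0.2
check_draws <- rexp(100000, rate = check_lambda)

# Both empirical values should land near 1/lambda = 5
c(theoretical    = 1 / check_lambda,
  empirical_mean = mean(check_draws),
  empirical_sd   = sd(check_draws))
```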
Simulation Run
# Constants
lambda <- 0.2   # rate parameter
n <- 40         # number of exponentials per average
sim <- 1000     # number of simulations
# Set seed to create reproducibility
set.seed(12345)
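# Simulation: each row holds one simulated sample of n exponentials
Exp_Sim <- matrix(rexp(sim*n, lambda), nrow = sim, ncol = n)
# Average each row to get one sample mean per simulation
Exp_Mean <- rowMeans(Exp_Sim)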
Exponential Distribution Mean (Theoretical Mean)
Theo_Mean <- 1/lambda
print(Theo_Mean)
## [1] 5
Mean of the Simulated Exponential Random Variable Means (Sample Mean)
Sample_Mean <- mean(Exp_Mean)
print(Sample_Mean)
## [1] 4.971972
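One way to judge whether the gap between the two means is larger than simulation noise would suggest is a t-based 95% confidence interval for the mean of the simulated averages. The sketch below rebuilds the simulation with the same constants and seed so that it runs on its own. The names `sim_means` and `ci` are illustrative, and using `t.test` here is an assumption about how one might check this, not part of the original analysis.

```r
# Rebuild the simulation (same constants and seed as above)
lambda <- 0.2; n <- 40; sim <- 1000
set.seed(12345)
sim_means <- rowMeans(matrix(rexp(sim * n, lambda), nrow = sim, ncol = n))

# 95% t interval for the mean of the simulated averages
ci <- t.test(sim_means, mu = 1 / lambda)$conf.int
print(ci)

# Does the interval cover the theoretical mean 1/lambda?
ci[1] <= 1 / lambda && 1 / lambda <= ci[2]
```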
Visualization of both distributions and both means.
# Histograms
par(mfrow = c(2,1))
# Distribution of the Simulated Sample of Exponential Random Variables
hist(Exp_Sim, breaks = 40, xlim=c(0,30),
main = "Distribution of the Simulated Sample of Exponential Random Variables",
xlab = "Exponential Variables",
ylab = "Frequency",
col = "khaki1")
abline(v = 1/lambda, lty = 1, lwd = 4, col = "gray13")
legend("topright", lty = 1, lwd = 4, col = "gray13", legend = "Theoretical Mean")
# Distribution of the Simulated Exponential Random Variable Means
hist(Exp_Mean, breaks = 40, xlim=c(3,9),
main = "Distribution of the Simulated Exponential Random Variable Means",
xlab = "Means",
ylab = "Frequency",
col = "khaki1")
abline(v = mean(Exp_Mean), lty = 1, lwd = 4, col = "gray13")
legend("topright", lty = 1, lwd = 4, col = "gray13", legend = "Sample Mean")
Variance of the Distribution of Averages (Theoretical Variance)
Theo_Var <- ((1/lambda)/sqrt(n))^2
print(Theo_Var)
## [1] 0.625
Variance of the Distribution of Averages (Sample Variance)
Sample_Var <- var(Exp_Mean)
print(Sample_Var)
## [1] 0.6157926
The sample variance (0.616) is very close to the theoretical variance (0.625).
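The theoretical value computed above follows from the independence of the n draws. A short derivation, writing $\sigma = 1/\lambda$ for the standard deviation of a single exponential and $\bar{X}_n$ for the average of n of them (notation introduced here for illustration):

```latex
\operatorname{Var}(\bar{X}_n)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{25}{40}
  = 0.625
```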
The graph below shows the histogram of the simulated exponential means with a normal density curve overlaid. The curve uses the theoretical mean of the exponential distribution (1/lambda) and the theoretical standard deviation divided by the square root of the sample size (1/lambda/sqrt(n)), as given by the Central Limit Theorem. The curve fits the histogram relatively well.
# Histogram
hist(Exp_Mean, breaks = 40,
main = "Distribution of the Simulated Exponential Random Variable Means",
probability = TRUE, xlab = "Means", ylab = "Density", col = "khaki1")
# Normal Distribution Line
# Create 40 evenly spaced x values between the min and max of the simulated means, for the X Axis.
xfit <- seq(min(Exp_Mean), max(Exp_Mean), length.out = 40)
# Evaluate the normal density with mean 1/lambda and sd 1/lambda/sqrt(n) at those x values, for the Y Axis
yfit <- dnorm(xfit, mean = 1/lambda, sd = 1/lambda/sqrt(n))
# Plot the Normal Distribution Line
lines(xfit,yfit,lty = 1, lwd = 2, col = "black")
legend(6.5, 0.50, lty = 1, lwd = 2, col = "black", legend = "Normal Distribution", cex = 0.75)
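A normal Q-Q plot is a common complementary check to the overlaid curve. The sketch below is an addition, not part of the original analysis: it rebuilds the simulated means so that it runs on its own, standardizes them with the Central Limit Theorem's mean and standard error, and plots them against standard normal quantiles. The names `sim_means`, `z`, and `qq` are illustrative.

```r
# Rebuild the simulated means (same constants and seed as above)
lambda <- 0.2; n <- 40; sim <- 1000
set.seed(12345)
sim_means <- rowMeans(matrix(rexp(sim * n, lambda), nrow = sim, ncol = n))

# Standardize with the CLT mean (1/lambda) and standard error (1/lambda/sqrt(n))
z <- (sim_means - 1 / lambda) / (1 / lambda / sqrt(n))

# Points close to the reference line y = x indicate approximate normality
qq <- qqnorm(z, main = "Normal Q-Q Plot of Standardized Means")
abline(0, 1, lwd = 2, col = "gray13")

# Correlation between theoretical and sample quantiles (closer to 1 is better)
cor(qq$x, qq$y)
```

Some curvature in the upper tail would not be surprising, since an average of 40 exponentials is still slightly right-skewed.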
Using the mean and standard deviation given by the Central Limit Theorem, we can also draw the normal curve on the frequency scale by rescaling its density to expected counts, to see how a normal distribution would look for a sample of this size.
# Histogram
h <- hist(Exp_Mean, breaks = 40,
main = "Distribution of the Simulated Exponential Random Variable Means",
xlab = "Means", ylab = "Frequency", col = "khaki1")
# Normal Distribution Line
# Create 40 evenly spaced x values between the min and max of the simulated means, for the X Axis.
xfit2 <- seq(min(Exp_Mean), max(Exp_Mean), length.out = 40)
# Evaluate the normal density with mean 1/lambda and sd 1/lambda/sqrt(n) at those x values, for the Y Axis
yfit2 <- dnorm(xfit2, mean = 1/lambda, sd = 1/lambda/sqrt(n))
# Rescale densities to expected counts: density * bin width * number of observations
yfit2 <- yfit2*diff(h$breaks[1:2])*length(Exp_Mean)
# Plot the Normal Distribution Line
lines(xfit2,yfit2,lty = 1, lwd = 2, col = "black")
legend(6.5, 45, lty = 1, lwd = 2, col = "black", legend = "Normal Distribution", cex = 0.75)
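The rescaling step relies on the identity count = density * bin width * number of observations, which holds for every bin of a histogram. A minimal sketch checking this on the object returned by `hist`, assuming equal-width bins as in the plot above (`sim_means`, `h2`, `bin_width`, and `expected` are illustrative names):

```r
# Rebuild the simulated means (same constants and seed as above)
lambda <- 0.2; n <- 40; sim <- 1000
set.seed(12345)
sim_means <- rowMeans(matrix(rexp(sim * n, lambda), nrow = sim, ncol = n))

# Compute the histogram without drawing it
h2 <- hist(sim_means, breaks = 40, plot = FALSE)
bin_width <- diff(h2$breaks)

# Densities converted back to counts match the stored counts
all.equal(h2$density * bin_width * length(sim_means), as.numeric(h2$counts))

# The same factor turns a normal density into expected counts per bin
expected <- dnorm(h2$mids, mean = 1 / lambda, sd = 1 / lambda / sqrt(n)) *
  bin_width * length(sim_means)
head(round(expected, 1))
```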