In this project I shall investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. I shall set lambda = 0.2 for all of the simulations. I shall investigate the distribution of averages of 40 random exponentials, doing so with a thousand simulations.
This investigation shall then:
Show the sample mean and compare it to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show that the distribution is approximately normal.
The following R code will run 1,000 simulations whereby 40 random exponential values, and the mean of each set of 40 random exponential values, will be generated in each simulation.
# Deliberately not setting a seed as the results should be similar for any 1,000 simulations of 40 exponential values drawn randomly.
# Set simulation parameters
sims = 1000 # number of simulations
n = 40 # number of exponentials
lambda = 0.2 # defined in assignment instructions
means = NULL
# Run simulations
for (i in 1 : sims) means = c(means, mean(rexp(n, lambda)))
summary(means)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.745 4.519 5.027 5.041 5.516 7.664
The theoretical mean is calculated in R as 1/lambda.
The sample mean is calculated in R as mean(means).
# Calculate theoretical mean
theo_mean <- 1/lambda
# Calculate sample mean
sample_mean <- mean(means)
The theoretical mean is 5.
The sample mean is 5.040891.
The difference between the theoretical mean and the sample mean is -0.040891.
# Create histogram showing distribution of simulation means
hist(means, col = "light blue", main = "Random exponential values (1,000 simulations)",
sub = "Sample mean in red", xlab = "Mean values")
# Show sample mean as vertical red line
abline(v = sample_mean, col = "red", lwd = 2)
The theoretical variance is calculated in R as the square of the theoretical mean divided by the number of random exponential values in each simulation.
The sample variance is calculated in R as var(means).
# Calculate theoretical variance
theo_var <- (1/lambda)^2 / n
# Calculate sample mean
sample_var <- var(means)
The theoretical variance is 0.625.
The sample variance is 0.5902553.
The difference between the theoretical variance and the sample variance is 0.0347447.
The theoretical distribution is plotted as a normal distribution curve (red line) with mean = theoretical mean and standard deviation = square root of theoretical variance.
The sample distribution is plotted using the histogram generated earlier, but with frequencies converted to probabilities. This histogram is also overlaid with a distribution curve (blue line) of the 1,000 sample means calculated earlier.
# Create histogram showing probability distribution of simulation means
hist(means, prob = T, col = "light blue", main = "Random exponential values (1,000 simulations)",
xlab = "Mean values")
# Create sample distribution curve (blue line)
lines(density(means), col = "blue", lwd = 2)
# Create theoretical distribution curve (red line)
x <- seq(min(means), max(means), length = sims)
y <- dnorm(x, theo_mean, sqrt(theo_var))
lines(x, y, col = "red", lwd = 2)
The histogram and sample distribution curve (blue line) both approximate the theoretical normal distribution curve (red line) relatively closely.
Another method with which the theoretical and sample distributions can be compared is the Normal Q-Q plot. This plots the mean random exponential variables from the 1,000 simulations as data points against a normal QQ-line, which represents the theoretical normal distribution.
# Plot sample distribution as blue data points.
qqnorm(means, col = "blue")
# Plot theoretical distribution as a red line.
qqline(means, col = "red", lwd = 3)
The Normal Q-Q Plot shows that the sample distribution approximates the theoretical normal distribution relatively closely.
# print relevant software and versions
sessionInfo()
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.5.3 magrittr_1.5 tools_3.5.3 htmltools_0.3.6
## [5] yaml_2.2.0 Rcpp_1.0.0 stringi_1.4.3 rmarkdown_1.12
## [9] knitr_1.22 stringr_1.4.0 xfun_0.5 digest_0.6.18
## [13] evaluate_0.13