by Sandy Sng
15 May 2018
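In this project we investigate the exponential distribution in R and compare it with the Central Limit Theorem. The simulated samples are used to illustrate and explain the properties of the distribution of the mean of 40 exponentials in three ways: by comparing the sample mean with the theoretical mean, by comparing the sample variance with the theoretical variance, and by showing that the distribution is approximately normal.
The exponential distribution can be simulated in R with rexp(n, lambda), where n is the number of observations and lambda is the rate parameter. The mean and the standard deviation of the exponential distribution are both 1/lambda.
We will set and investigate lambda = 0.2, samples of 40 exponentials, and 1000 simulations.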
We start by simulating a thousand sets of 40 exponentials using lambda 0.2 and calculate the mean for each set.
lambda <- 0.2 # Set lambda
n <- 40 # Set 40 exponentials
nosim <- 1000 # Set no. of simulations to 1000
set.seed(123) # Set seed for consistent simulations
# Create a 1000 x 40 matrix of exponentials from rexp(n*nosim, lambda)
simulatedData <- matrix(rexp(n*nosim, lambda), nrow = nosim, ncol = n)
# Calculate the average/mean of each row (40 exponentials) -- i.e. sample mean
simMeans <- apply(simulatedData, 1, mean)
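The simulation step above can be checked with a minimal sketch. It rebuilds the matrix with the same seed so that it runs on its own, confirms the 1000 x 40 layout, and compares apply(..., 1, mean) with base R's rowMeans(). The names meansApply and meansRow are introduced here for illustration only.

```r
lambda <- 0.2
n <- 40
nosim <- 1000
set.seed(123)
simulatedData <- matrix(rexp(n * nosim, lambda), nrow = nosim, ncol = n)

# One row per simulation, one column per exponential draw
dim(simulatedData)

# rowMeans() computes the same per-row averages as apply(..., 1, mean)
meansApply <- apply(simulatedData, 1, mean)  # hypothetical name
meansRow <- rowMeans(simulatedData)          # hypothetical name
all.equal(meansApply, meansRow)
```

rowMeans() is usually the faster of the two on large matrices, although for a 1000 x 40 matrix either approach is adequate.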
Show the sample mean and compare it to the theoretical mean of the distribution.
tm <- 1/lambda # Calculate theoretical mean
tm
## [1] 5
sm <- mean(simMeans) # Calculate average sample mean
sm
## [1] 5.011911
The average sample mean (solid green line), 5.01, is very close to the theoretical mean (dashed purple line), 5.
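One way to put a number on "very close" is to compare the gap with the standard error of the simulated average. The sketch below is an illustration rather than part of the original analysis: it repeats the simulation with the same seed and builds an approximate 95% interval around the average of the sample means. The names se and ci are introduced here.

```r
lambda <- 0.2
n <- 40
nosim <- 1000
set.seed(123)
simMeans <- rowMeans(matrix(rexp(n * nosim, lambda), nrow = nosim))

tm <- 1 / lambda      # theoretical mean
sm <- mean(simMeans)  # average of the 1000 sample means

# Standard error of the average of the 1000 sample means
se <- sd(simMeans) / sqrt(nosim)

# Approximate 95% interval around the simulated average (normal approximation)
ci <- sm + c(-1, 1) * qnorm(0.975) * se
ci

# TRUE when the theoretical mean falls inside the interval
tm >= ci[1] && tm <= ci[2]
```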
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
The theoretical variance of the mean of n exponentials is the variance of a single exponential, (1/lambda)^2, divided by the number of observations n:
tv <- ((1/lambda)^2)/n # Calculate theoretical variance of the sample mean
tv
## [1] 0.625
sv <- var(simMeans) # Calculate variance of the simulated sample means
sv
## [1] 0.6088292
The variance of the simulated sample means, 0.609, is very close to the theoretical variance, 0.625.
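The division by n in the theoretical variance can be illustrated by repeating the simulation for several sample sizes. This is a sketch under assumptions: the sizes 10, 40 and 160 and the name varianceTable are chosen here for illustration and do not appear in the original analysis.

```r
lambda <- 0.2
nosim <- 1000
set.seed(123)
sizes <- c(10, 40, 160)  # hypothetical sample sizes, for illustration only

# For each sample size, simulate 1000 means and compare their variance
# with the theoretical value (1/lambda)^2 / n
varianceTable <- t(sapply(sizes, function(n) {
  means <- rowMeans(matrix(rexp(n * nosim, lambda), nrow = nosim))
  c(n = n, simulated = var(means), theoretical = (1 / lambda)^2 / n)
}))
varianceTable
```

Each fourfold increase in n should cut the variance of the mean to roughly a quarter, which is what the theoretical column shows exactly and the simulated column shows approximately.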
Show that the distribution is approximately normal.
We observe that the histogram of the 1000 simulated means of 40 random exponentials is roughly symmetric around the mean and bell-shaped. This approximately normal distribution of the mean of 40 exponentials is consistent with the Central Limit Theorem.
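The visual impression can be backed by two simple numerical checks, sketched below as an illustration rather than as the original method. The first is the share of sample means that fall within 1.96 theoretical standard deviations of the theoretical mean, which would be about 0.95 under a normal distribution. The second is the sample skewness, which is 0 for an exactly normal distribution. Because the mean of 40 exponentials has a theoretical skewness of 2/sqrt(40), roughly 0.32, a small positive value is expected. The names mu, sigma, coverage and skewness are introduced here.

```r
lambda <- 0.2
n <- 40
nosim <- 1000
set.seed(123)
simMeans <- rowMeans(matrix(rexp(n * nosim, lambda), nrow = nosim))

mu <- 1 / lambda                 # theoretical mean of the sample mean
sigma <- (1 / lambda) / sqrt(n)  # theoretical standard deviation of the sample mean

# Share of sample means within 1.96 theoretical standard deviations of mu
coverage <- mean(abs(simMeans - mu) <= qnorm(0.975) * sigma)

# Sample skewness: average cubed standardized deviation
z <- (simMeans - mean(simMeans)) / sd(simMeans)
skewness <- mean(z^3)

c(coverage = coverage, skewness = skewness)
```

For a graphical version of the same check, qqnorm(simMeans) followed by qqline(simMeans) draws a normal Q-Q plot, where points close to the line indicate approximate normality.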
Code for Figure “Histogram of 1000 Means of 40 Sample Exponentials”, under Sample Mean versus Theoretical Mean
# Mean distribution of 1000 simulations
hist(simMeans,
main = "Histogram of 1000 Means of 40 Sample Exponentials",
xlab = "Sample Means from 1000 Simulations",
ylab = "Frequency")
# Highlight the 2 means we are comparing
abline(v = sm, col = "green", lwd = 3, lty = 1)
abline(v = tm, col = "purple", lwd = 3, lty = 2)
# Add a legend
legend('topright',
c("Average Sample Mean","Theoretical Mean"),
col = c("green", "purple"), lwd = 3, lty = c(1,2), bty = "n")
Code for Figure “Distribution of Sample Means vs Theoretical Density of Exponentials”, under Distribution
# Mean distribution of 1000 simulations
hist(simMeans, prob = TRUE,
main = "Distribution of Sample Means
vs Theoretical Density of Exponentials",
xlab = "Sample Means from 1000 Simulations",
ylab = "Density")
# Theoretical normal density of the sample mean, as implied by the Central Limit Theorem
xfit <- seq(min(simMeans), max(simMeans), length.out = 100)
yfit <- dnorm(xfit, mean = 1/lambda, sd = (1/lambda/sqrt(n)))
lines(xfit, yfit, col = "orange", lwd = 3, lty = 1)
# Theoretical Mean - Orange line
abline(v = tm, col = 'orange', lwd = 1)
# Density of the simulated sample means - Blue line
lines(density(simMeans), col = "blue",lwd = 3, lty = 2)
# Legend
legend('topright', c("Simulation", "Theoretical"),
col=c("blue", "orange"), lwd = 3, lty=c(2,1))