Overview: This part uses simulation in R to illustrate the central limit theorem. It draws 1000 samples of 40 exponential random variables each (lambda = 0.2) and examines the distribution of the sample means and sample variances.
Question 1: Show the sample mean and compare it to the theoretical mean of the distribution
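# Rate of the exponential distribution; the theoretical mean is 1/lambda
lambda <- 0.2
# Each row holds one simulated sample of 40 exponentials (1000 samples in total)
sim_Data <- matrix(rexp(1000*40, lambda), nrow = 1000, ncol = 40)
# Mean of each sample of 40
distMean <- apply(sim_Data, 1, mean)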
hist(distMean, breaks = 50,
main = "Distribution of 1000 averages of 40 random exponentials",
xlab = "Value of means",
ylab = "Frequency of means",
col = "grey")
abline(v = 1/lambda, lty = 1, lwd = 5, col = "blue")
legend("topright", lty = 1, lwd = 5, col = "blue", legend = "theoretical mean")
The simulated sample means are approximately normally distributed, and their center is very close to the theoretical mean of 1/lambda = 5 (the blue line).
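As a numeric complement to the histogram, the two means can also be compared directly. The following is a minimal sketch that repeats the simulation so it runs on its own; the seed value and the names n_sims, n_obs, sims, sample_means, simulated_mean, and theoretical_mean are assumptions introduced here for illustration and are not part of the original analysis.

```r
# Illustrative sketch: the seed and the variable names are assumptions
set.seed(2024)
lambda <- 0.2
n_sims <- 1000
n_obs <- 40

# One row per simulated sample of 40 exponentials
sims <- matrix(rexp(n_sims * n_obs, rate = lambda), nrow = n_sims, ncol = n_obs)
sample_means <- rowMeans(sims)

theoretical_mean <- 1 / lambda
simulated_mean <- mean(sample_means)

# Report the two values side by side
cat("Theoretical mean:", theoretical_mean, "\n")
cat("Mean of simulated averages:", round(simulated_mean, 3), "\n")
```

With a different seed the simulated value will differ slightly, but it should stay close to 5.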
Question 2: Show how variable the sample is (using variance) and compare to the theoretical variance of the distribution
distVar <- apply(sim_Data, 1, var)
hist(distVar, breaks = 50, main = "Distribution of 1000 sample variances, each from 40 random exponentials", xlab = "Value of variances", ylab = "Frequency of variances", col = "light blue")
abline(v = (1/lambda)^2, lty = 1, lwd = 5, col = "blue")
legend("topright", lty = 1, lwd = 5, col = "blue", legend = "theoretical variance")
The simulated sample variances are centered near the theoretical variance of (1/lambda)^2 = 25 (the blue line), although their distribution is right-skewed rather than normal.
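Two variance comparisons are possible here, and a short sketch may help keep them apart. The average of the per-sample variances estimates the population variance (1/lambda)^2 = 25, while the variance of the 1000 sample means should be close to (1/lambda)^2 / 40 = 0.625, which is the quantity the central limit theorem describes. The seed and the names sims, within_var, pop_var, means_var, and means_var_theory are assumptions made for this illustration.

```r
# Illustrative sketch: the seed and the variable names are assumptions
set.seed(2024)
lambda <- 0.2
n_sims <- 1000
n_obs <- 40
sims <- matrix(rexp(n_sims * n_obs, rate = lambda), nrow = n_sims, ncol = n_obs)

# Variance within each sample of 40 exponentials
within_var <- apply(sims, 1, var)
pop_var <- (1 / lambda)^2

# Variance of the sample means, compared with sigma^2 / n
means_var <- var(rowMeans(sims))
means_var_theory <- pop_var / n_obs

print(c(theoretical = pop_var, simulated = mean(within_var)))
print(c(theoretical = means_var_theory, simulated = means_var))
```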
Question 3: Show that the distribution is approximately normal.
par(mfrow = c(3, 1))
hist(sim_Data, breaks = 50, main = "Distribution of exponentials with lambda = 0.2", xlab = "Exponentials", col = "yellow")
hist(distMean, breaks = 50, main = "The distribution of 1000 averages of 40 random exponentials", xlab = "Value of means", ylab = "Frequency of means", col = "pink")
simNorm <- rnorm(1000, mean = mean(distMean), sd = sd(distMean))
hist(simNorm, breaks = 50, main = "A normal distribution with the mean and sd of the simulated averages", xlab = "Normal variables", col = "light green")
The first histogram shows the distribution of the exponentials with lambda = 0.2. The second shows the distribution of 1000 averages of 40 random exponentials. The third shows a normal distribution whose mean and standard deviation equal those of the averages in the second histogram. Comparing the first and second histograms, we can see that the strongly right-skewed exponential distribution gives way to an approximately normal one once the mean of each group of 40 is taken, as the central limit theorem predicts. Comparing the second and third histograms, we see that the distribution of the means is close to a normal distribution with the same mean and standard deviation.
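A normal Q-Q plot is another common way to check approximate normality and can be easier to read than side-by-side histograms. The sketch below standardizes the sample means with the theoretical mean 1/lambda and standard error (1/lambda)/sqrt(40), then plots them against normal quantiles. The seed, the names sample_means, z, qq, and qq_cor, and the use of the quantile correlation as a rough summary are assumptions added here, not part of the original analysis.

```r
# Illustrative sketch: the seed and the variable names are assumptions
set.seed(2024)
lambda <- 0.2
n_obs <- 40
sample_means <- rowMeans(matrix(rexp(1000 * n_obs, rate = lambda), nrow = 1000))

# Standardize using the theoretical mean and standard error of the mean
z <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n_obs))

# Points lying close to the reference line suggest approximate normality
qq <- qqnorm(z, main = "Normal Q-Q plot of standardized averages")
qqline(z, col = "blue", lwd = 2)

# Correlation between theoretical and observed quantiles as a rough summary
qq_cor <- cor(qq$x, qq$y)
cat("Q-Q correlation:", round(qq_cor, 4), "\n")
```

Some curvature in the upper tail is to be expected, since averages of 40 exponentials still carry a little right skew.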