Hossam Saad
25/6/2020

# Part 1: Simulation Exercise

Overview: This part runs simulations and data analyses to illustrate the central limit theorem. R is the main tool used throughout.
knitr::opts_chunk$set(echo = TRUE)
lambda <- 0.2
SimulatData <- matrix(rexp(1000*40, lambda), nrow = 1000, ncol = 40)
MeanOfDistribut <- apply(SimulatData, 1, mean)
hist(MeanOfDistribut, breaks = 50, main = "The distribution of 1000 averages of 40 random exponentials", xlab = "Value of means", ylab = "Frequency of means", col = "red")
abline(v = 1/lambda, lty = 1, lwd = 5, col = "blue")
legend("topright", lty = 1, lwd = 5, col = "blue", legend = "theoretical mean")
The simulated sample means are approximately normally distributed, with a center very close to the theoretical mean of 1/lambda = 5.
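The closeness of the center to the theoretical mean can also be checked numerically rather than only by eye. The following is a minimal sketch, not part of the original analysis: the seed value and the names `sim_data`, `sample_means`, `theoretical_mean`, and `empirical_mean` are assumptions introduced here for illustration.

```r
# Assumption: an arbitrary seed, added only so the numbers are reproducible.
set.seed(2020)

lambda <- 0.2   # rate of the exponential distribution
n <- 40         # observations per simulated sample
sims <- 1000    # number of simulated samples

# One simulated sample per row, as in the analysis above.
sim_data <- matrix(rexp(sims * n, rate = lambda), nrow = sims, ncol = n)
sample_means <- rowMeans(sim_data)

theoretical_mean <- 1 / lambda          # mean of an exponential(lambda)
empirical_mean <- mean(sample_means)    # center of the simulated means

cat("Theoretical mean:     ", theoretical_mean, "\n")
cat("Mean of sample means: ", round(empirical_mean, 3), "\n")
cat("Absolute difference:  ", round(abs(empirical_mean - theoretical_mean), 3), "\n")
```

`rowMeans(sim_data)` gives the same result as `apply(sim_data, 1, mean)` and is used here only because it is shorter.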
VarianceOfDistribut <- apply(SimulatData, 1, var)
hist(VarianceOfDistribut, breaks = 50, main = "The distribution of 1000 variances of 40 random exponentials", xlab = "Value of variances", ylab = "Frequency of variances", col = "pink")
abline(v = (1/lambda)^2, lty = 1, lwd = 5, col = "black")
legend("topright", lty = 1, lwd = 5, col = "black", legend = "theoretical variance")
The simulated sample variances are centered near the theoretical variance of (1/lambda)^2 = 25. Their distribution is roughly bell-shaped but tends to be right-skewed, so it is only approximately normal.
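Two different variances are easy to confuse here: the variance of the underlying exponential population, (1/lambda)^2, and the variance of the sample mean, which the central limit theorem gives as (1/lambda)^2 / n. The sketch below compares both with their simulated counterparts. It is an illustration under stated assumptions: the seed and the names `sample_vars`, `theoretical_var`, and `theoretical_var_of_mean` are not from the original analysis.

```r
# Assumption: an arbitrary seed, added only so the numbers are reproducible.
set.seed(2020)

lambda <- 0.2
n <- 40
sims <- 1000

sim_data <- matrix(rexp(sims * n, rate = lambda), nrow = sims, ncol = n)

# Variance within each simulated sample of 40 exponentials.
sample_vars <- apply(sim_data, 1, var)
# Mean of each simulated sample of 40 exponentials.
sample_means <- rowMeans(sim_data)

theoretical_var <- (1 / lambda)^2                 # population variance: 25
theoretical_var_of_mean <- theoretical_var / n    # variance of the mean: 0.625

cat("Population variance (theory):     ", theoretical_var, "\n")
cat("Average of the sample variances:  ", round(mean(sample_vars), 3), "\n")
cat("Variance of the mean (theory):    ", theoretical_var_of_mean, "\n")
cat("Variance of the simulated means:  ", round(var(sample_means), 3), "\n")
```

The first pair corresponds to the histogram of sample variances above; the second pair describes the spread of the histogram of sample means.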
par(mfrow = c(3, 1))
hist(SimulatData, breaks = 50, main = "Distribution of exponentials with lambda equal to 0.2", xlab = "Exponentials", col = "pink")
hist(MeanOfDistribut, breaks = 50, main = "The distribution of 1000 averages of 40 random exponentials", xlab = "Value of means", ylab = "Frequency of means", col = "green")
NormalSimulat <- rnorm(1000, mean = mean(MeanOfDistribut), sd = sd(MeanOfDistribut))
hist(NormalSimulat, breaks = 50, main = "A normal distribution with the mean and sd of the sample means", xlab = "Normal variables", col = "brown")
The first histogram shows the distribution of the exponentials with lambda equal to 0.2. The second histogram shows the distribution of 1000 averages of 40 random exponentials. The third histogram is a simulated normal distribution with the same mean and standard deviation as the second histogram.

Comparing the first and second histograms, we can see that the distribution becomes approximately normal once the mean of each group of 40 is taken, as the central limit theorem predicts. Comparing the second and third histograms, we can see that the distribution of the means is similar to a normal distribution with the same mean and standard deviation.
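A histogram comparison depends on the choice of bins, so a normal Q-Q plot of the standardized sample means is a useful complementary check. The sketch below is one possible way to do this and is not part of the original analysis: the seed and the names `z` and `within_95` are assumptions. The means are standardized with the theoretical mean 1/lambda and the theoretical standard error (1/lambda)/sqrt(n).

```r
# Assumption: an arbitrary seed, added only so the output is reproducible.
set.seed(2020)

lambda <- 0.2
n <- 40
sims <- 1000

sim_data <- matrix(rexp(sims * n, rate = lambda), nrow = sims, ncol = n)
sample_means <- rowMeans(sim_data)

# Standardize: subtract the theoretical mean, divide by the theoretical
# standard error of the mean of n exponentials.
z <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Under a standard normal, about 95% of values fall within +/- 1.96.
within_95 <- mean(abs(z) < qnorm(0.975))
cat("Mean of z:", round(mean(z), 3), "\n")
cat("SD of z:  ", round(sd(z), 3), "\n")
cat("Share of z within the central 95% normal range:", round(within_95, 3), "\n")

# Points close to the reference line indicate approximate normality.
qqnorm(z, main = "Normal Q-Q plot of standardized sample means")
qqline(z, col = "blue", lwd = 2)
```

If the points bend away from the line in the upper tail, that reflects the remaining right skew inherited from the exponential distribution at n = 40.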