This is the course project for the Statistical Inference course on Coursera. The project simulates the exponential distribution in R and compares the distribution of its sample means with what the Central Limit Theorem predicts.
The project consists of two parts.
1. A simulation exercise.
2. Basic inferential data analysis.
1.0 Simulation and Distribution of the Sample Means
set.seed(5)
lambda <- 0.2 #the given parameter
n <- 40 #number of exponentials
simul <- 1000 #number of simulations
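# pre-allocate one row per simulation, then fill in each sample mean
# (data.frame(ncol = 2, nrow = simul) would create columns named "ncol" and "nrow", not an empty table)
simuldata <- data.frame(simulation = seq_len(simul), mean = numeric(simul))
for (i in seq_len(simul)) {
  simuldata$mean[i] <- mean(rexp(n, lambda))
}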
# histogram of the simulated sample means
hist(simuldata$mean,
col = "lightblue", breaks = 20, xlab = "Means",
main = "Distribution of Sample means")
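The same simulation can also be written without an explicit loop. The following is a minimal sketch, not part of the original analysis: the names `draws` and `sample_means` are introduced here for illustration only. It draws all values in one call and averages each row. Because `rexp` generates its values sequentially, using the same seed is expected to reproduce the loop's means up to floating-point rounding.

```r
# Sketch: vectorized version of the simulation (illustrative names).
set.seed(5)
lambda <- 0.2   # rate parameter, as above
n <- 40         # exponentials per sample
simul <- 1000   # number of simulated samples

# Draw n * simul values at once; byrow = TRUE puts each consecutive
# block of n draws into one row, so each row is one sample of size n.
draws <- matrix(rexp(n * simul, rate = lambda),
                nrow = simul, ncol = n, byrow = TRUE)

# One mean per simulated sample.
sample_means <- rowMeans(draws)
```

For 1000 samples of size 40 the speed difference is negligible, so the choice between the loop and the matrix form is mostly a matter of readability.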
1.1 Sample Mean versus Theoretical Mean
theo_mean <- 1/lambda
paste(" The Theoretical mean is " ,round(theo_mean,2))
## [1] " The Theoretical mean is 5"
simul_mean <- mean(simuldata$mean)
paste(" The Sample mean is ",round(simul_mean,2))
## [1] " The Sample mean is 5.04"
# Mark the simulated mean (blue) and the theoretical mean (red) with vertical lines.
hist(simuldata$mean,
col = "lightblue", breaks = 20, xlab = "Means",
main = "Distribution of Sample means")
rug(simuldata$mean)
abline(v=theo_mean, lwd=4, col="red")
abline(v=simul_mean, lwd=4, col="blue")
text(6.5, 80, paste("Sample mean = ", round(simul_mean,2)), col="blue")
text(6.5, 90, paste("Theoretical mean = ", round(theo_mean, 2)), col="red")
1.2 Sample Variance versus Theoretical Variance
# The exponential distribution has standard deviation 1/lambda, so the mean of n draws has standard deviation (1/lambda)/sqrt(n).
paste("Theoretical standard deviation: ", round( (1/lambda)/sqrt(n) ,2),
", Sample standard deviation", round(sd(simuldata$mean) ,2) )
## [1] "Theoretical standard deviation: 0.79 , Sample standard deviation 0.78"
paste("Theoretical variance: ", round((1/lambda)^2/n,2),
", Sample variance", round(var(simuldata$mean) ,2) )
## [1] "Theoretical variance: 0.62 , Sample variance 0.6"
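The theoretical values above follow from the variance of a mean of independent draws, Var(mean) = sigma^2 / n, with sigma = 1/lambda for the exponential distribution. A short sketch of that arithmetic (the variable names are illustrative and do not appear in the original code):

```r
# Worked arithmetic for the theoretical values (illustrative names).
lambda <- 0.2
n <- 40

pop_sd  <- 1 / lambda     # sd of a single exponential draw: 5
pop_var <- pop_sd^2       # variance of a single draw: 25

var_of_mean <- pop_var / n        # 25 / 40 = 0.625
sd_of_mean  <- sqrt(var_of_mean)  # about 0.79

print(c(variance = var_of_mean, sd = sd_of_mean))
```

The exact variance is 0.625. R's `round(0.625, 2)` returns 0.62 rather than 0.63, which is why the report prints 0.62.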
1.3 Is the distribution of means normal?
hist(simuldata$mean,
probability = TRUE, col = "lightblue", breaks = 20, xlab = "Means",
main = "Distribution of Sample means")
lines(density(simuldata$mean), lwd =3, col = "blue")
# Overlay the normal density predicted by the CLT: mean 1/lambda, sd (1/lambda)/sqrt(n)
x <- seq(min(simuldata$mean), max(simuldata$mean), length.out = 1000)
y <- dnorm(x, mean = 1/lambda, sd = (1 / lambda)/sqrt(n))
lines(x, y, col="black", lwd=2, lty = 2)
By the Central Limit Theorem, the means of sufficiently large samples are approximately normally distributed. As the plot shows, the density of the simulated means of exponential samples (solid blue) closely follows the normal density with mean 1/lambda and standard deviation (1/lambda)/sqrt(n) (dashed black).
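A normal Q-Q plot is another common way to judge approximate normality. The sketch below is an optional addition, not part of the original analysis. It regenerates the sample means under the same assumed settings (seed 5, lambda = 0.2, n = 40, 1000 simulations), standardizes them with the theoretical mean and standard deviation, and plots them against normal quantiles. The names `sample_means` and `z` are illustrative.

```r
# Sketch: Q-Q plot of standardized sample means (illustrative names).
set.seed(5)
lambda <- 0.2
n <- 40
simul <- 1000

# One mean of n exponential draws per simulation.
sample_means <- replicate(simul, mean(rexp(n, rate = lambda)))

# Standardize with the theoretical mean and sd of the sample mean.
z <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Points close to the reference line indicate approximate normality.
qqnorm(z, main = "Normal Q-Q plot of standardized sample means")
qqline(z, col = "red", lwd = 2)
```

Because the mean of 40 exponentials follows a (scaled) gamma distribution, which is slightly right-skewed, some departure from the line in the tails would not be surprising even though the overall fit is close.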