First, I make sure that everyone will be able to see the R code, set echo=“TRUE” for the whole document.
knitr:: opts_chunk$set(echo=TRUE, results = "asis", cache = TRUE)
This project is part 1 of course project for my Statistical Inference course that I took on Coursera. This is course number 6 out of 10 courses that I am taking for the Data Science Certificate from Johns Hopkins University through Cousera.
In this project, I will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. I will investigate the distribution of averages of 40 exponentials. Note that I will do 1000 simulations. Set up some constants:
lambda = 0.2
n = 40
sim = 1000
set.seed(12345)
Creat a matrix of 1000 rows and 40 columns of exponential distribution, then calculate the mean of each row:
matrix = matrix(rexp(sim * n,lambda), ncol = n, nrow = sim)
mean_row <- rowMeans(matrix)
Make a histogram of the mean of exponential distribution:
hist(mean_row, xlab = "Mean of 40 exppnentials", ylab="Frequency", col = "green", main = "Distribution of the mean of 40 exponentials")
Compare sample mean vs theoretical mean:
sample_mean <- round(mean(mean_row),3)
theoretical_mean = round(1/lambda,3)
Sample mean is 4.972 and theoretical mean is 5. They are close enough.
Comapre samle variance and theortical variance:
sample_variance <- round(var(mean_row),3)
theoretical_var = round(((1/lambda)^2)/n,3)
Sample variance is 0.616 and theoretical variance is 0.625. Again, they are close enough.
Now, I want to investigate to see if the distribution is approximately normal Now I will show histogram of the mean of exponential distribution and normal distribution with mean = 1/lambda, sd= sqrt((1/lambda)^2/40).
hist(mean_row, breaks = 40,prob=TRUE, col = "green", main = "Histogram of mean of 40 exponentials", xlab = "Mean of each row")
lines(density(mean_row),col = "black", lwd = 2)
curve(dnorm(x,mean=5,sd=sqrt(theoretical_var)), col="red", add=TRUE, yaxt="n", lwd = 3)
legend(6, 0.4, legend=c("Normal Distribution", "Mean of exponential distribution"),col=c("red", "black"),lty = 1:1, cex=0.8)
We can see that the histogram approaches a normal distribution.