This project is for the Statistical Inference course on Coursera. It simulates the exponential distribution in R. The simulation compares the sample mean with the theoretical mean and the sample variance with the theoretical variance, and shows that the distribution of the sample mean is approximately normal.
Randomly generate the sample data: 1000 simulated samples, each of 40 exponential draws with rate lambda = 0.2.
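set.seed(1)                       # make the simulation reproducible
n <- 1000; sample_size <- 40      # number of simulated samples; draws per sample
lambda <- 0.2                     # rate of the exponential distribution
mns <- NULL                       # mean of each simulated sample
vars <- NULL                      # variance of each simulated sample
for (i in 1:n) {
  exp_temp <- rexp(sample_size, rate = lambda)
  mns <- c(mns, mean(exp_temp))
  vars <- c(vars, var(exp_temp))
}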
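The loop above grows its result vectors one element at a time. As a minimal sketch of an alternative, assuming the same seed and parameters, all draws can be generated at once and summarised row by row. The names `draws`, `mns_vec` and `vars_vec` are introduced here for illustration and are not part of the original analysis. With the same seed the draws are expected to come out in the same order as in the loop, but that has not been verified here.

```r
# Sketch only: same settings as the loop-based simulation.
set.seed(1)
n <- 1000          # number of simulated samples
sample_size <- 40  # observations per sample
lambda <- 0.2      # rate of the exponential distribution

# One row per simulated sample; byrow = TRUE fills each row with
# consecutive draws, so row i holds the i-th block of 40 values.
draws <- matrix(rexp(n * sample_size, rate = lambda),
                nrow = n, byrow = TRUE)

mns_vec  <- rowMeans(draws)        # mean of each simulated sample
vars_vec <- apply(draws, 1, var)   # variance of each simulated sample

# Averages across all simulated samples
c(mean_of_means = mean(mns_vec), mean_of_variances = mean(vars_vec))
```

Preallocating the whole matrix avoids repeated copying of `mns` and `vars`, which matters more as `n` grows.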
library(ggplot2)
means <- cumsum(mns) / (1:n)
g <- ggplot(data.frame(x = 1:n, y=means), aes(x=x, y=y) )
g <- g + geom_hline(yintercept = 5) + geom_line(size = 2)
g <- g + labs(x = "Number of obs", y = "Cumulative mean")
print(g); means[n]
## [1] 4.990025
The average of the 1000 sample means is 4.990025, close to the theoretical mean 1/lambda = 5.
variances <- cumsum(vars) / (1:n)
p <- ggplot(data.frame(x = 1:n, y=variances), aes(x=x, y=y) )
p <- p + geom_hline(yintercept = 25) + geom_line(size=2)
p <- p + labs(x = "Number of obs", y= "Cumulative variance")
print(p); variances[n]
## [1] 25.06459
The average of the 1000 sample variances is 25.06459, close to the theoretical variance 1/lambda^2 = 25.
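The theoretical values used as reference lines in the two plots follow from the standard moments of an exponential distribution with rate lambda = 0.2. The symbols below (X_i for a single draw, \bar{X} for the sample mean, S^2 for the sample variance) are notation introduced here, not taken from the original text. Both estimators are unbiased, which is why their cumulative averages settle near 5 and 25.

```latex
\begin{aligned}
E[X_i] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, &
\operatorname{Var}(X_i) &= \frac{1}{\lambda^{2}} = \frac{1}{0.04} = 25, \\
E[\bar{X}] &= E[X_i] = 5, &
E[S^{2}] &= \operatorname{Var}(X_i) = 25.
\end{aligned}
```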
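# Normal density predicted by the CLT: mean 1/lambda, sd (1/lambda)/sqrt(sample_size)
x <- seq(min(mns), max(mns), length.out = 1000)
y <- dnorm(x, mean = 1/lambda, sd = 1/lambda/sqrt(sample_size))
# x and y have 1000 points, matching length(mns) = n, so all three fit in one data frame
plotdata <- data.frame(mns, x, y)
q <- ggplot(plotdata, aes(x = mns))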
q <- q + geom_histogram( aes(y=..density..), fill = "lightblue", binwidth=0.2, colour = "black")
q <- q + geom_density(colour="black", size=2)
q <- q + geom_line(aes(x=x, y=y), colour="red", size=1, linetype=2)
q <- q + geom_vline(xintercept=5, colour="red", size=1, linetype=2)
print(q); qqnorm(mns); qqline(mns)
The black solid curve is the estimated density of the 1000 sample means. The red dashed curve is the normal density with mean 5 and variance 25/40 = 0.625 (standard deviation 5/sqrt(40), about 0.79), and the red dashed vertical line marks the theoretical mean 5. From the plot, the distribution of the sample mean of 40 exponential draws is approximately normal. The result is consistent with the Central Limit Theorem.
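The visual comparison can be complemented with a rough numerical check. The sketch below assumes the same seed and parameters as the simulation and compares the mean and variance of the simulated sample means with the values the Central Limit Theorem predicts, 1/lambda = 5 and 1/(lambda^2 * sample_size) = 0.625. It also reports the share of sample means within 1.96 theoretical standard deviations of 5, which would be about 95% under an exact normal distribution. The names `sim_means`, `theo_mean`, `theo_var` and `share_within` are introduced here for illustration.

```r
# Sketch only: re-run the simulation and compare with the CLT prediction.
set.seed(1)
n <- 1000          # number of simulated samples
sample_size <- 40  # observations per sample
lambda <- 0.2      # rate of the exponential distribution

# Mean of each simulated sample of 40 exponential draws
sim_means <- replicate(n, mean(rexp(sample_size, rate = lambda)))

theo_mean <- 1 / lambda                      # theoretical mean of the sample mean
theo_var  <- 1 / (lambda^2 * sample_size)    # theoretical variance of the sample mean

# Simulated versus theoretical moments of the sample mean
round(c(sim_mean = mean(sim_means), theo_mean = theo_mean,
        sim_var  = var(sim_means),  theo_var  = theo_var), 3)

# Share of sample means within 1.96 theoretical standard deviations of the mean
share_within <- mean(abs(sim_means - theo_mean) <= 1.96 * sqrt(theo_var))
share_within
```

Because the exponential distribution is right-skewed, the agreement is approximate at sample size 40 and should tighten as `sample_size` increases.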