Overview

This project is for the course Statistical Inference on Coursera. It simulates the exponential distribution in R. The simulation compares the sample mean with the theoretical mean and the sample variance with the theoretical variance, and shows that the distribution of the sample mean is approximately normal.

Simulation

Randomly generate the sample data: 1000 simulated samples of 40 exponential draws each, with rate lambda = 0.2.

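set.seed(1)
n <- 1000; sample_size <- 40   # number of simulations; observations per simulation
lambda <- 0.2                  # rate of the exponential distribution
mns <- numeric(n)              # sample mean of each simulation
vars <- numeric(n)             # sample variance of each simulation
for (i in 1:n) {
    exp_temp <- rexp(sample_size, rate = lambda)
    mns[i] <- mean(exp_temp)
    vars[i] <- var(exp_temp)
}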
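
The loop above can also be sketched in vectorized form, which avoids growing or indexing vectors inside a loop. This is only an illustration: the names draws, mns_vec and vars_vec are hypothetical and do not appear in the original analysis. Whether it reproduces the loop's values draw for draw under the same seed is assumed here rather than checked.

```r
# Sketch: draw all samples at once instead of looping.
set.seed(1)
n <- 1000           # number of simulated samples
sample_size <- 40   # observations per sample
lambda <- 0.2       # rate of the exponential distribution

# One row per simulated sample; byrow = TRUE puts 40 consecutive draws in each row.
draws <- matrix(rexp(n * sample_size, rate = lambda), nrow = n, byrow = TRUE)

mns_vec  <- rowMeans(draws)        # sample mean of each row
vars_vec <- apply(draws, 1, var)   # sample variance of each row
```

The row-wise layout keeps each simulated sample together, so mns_vec and vars_vec play the same role as mns and vars in the sections that follow.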

Sample Mean versus Theoretical Mean

library(ggplot2)
means <- cumsum(mns) / (1:n)
g <- ggplot(data.frame(x = 1:n, y=means), aes(x=x, y=y) )
g <- g + geom_hline(yintercept = 5) + geom_line(size = 2)
g <- g + labs(x = "Number of simulations", y = "Cumulative mean")
print(g); means[n]

## [1] 4.990025

The average of the 1000 simulated sample means is 4.990025, close to the theoretical mean 1/lambda = 5.

Sample Variance versus Theoretical Variance

variances <- cumsum(vars) / (1:n)
p <- ggplot(data.frame(x = 1:n, y=variances), aes(x=x, y=y) )
p <- p + geom_hline(yintercept = 25) + geom_line(size=2)
p <- p + labs(x = "Number of simulations", y= "Cumulative variance")
print(p); variances[n]

## [1] 25.06459

The average of the 1000 sample variances is 25.06459, close to the theoretical variance 1/lambda^2 = 25.
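
For reference, the theoretical values used as horizontal lines in the two plots follow from standard properties of the exponential distribution. In the sketch below, m denotes the per-sample size (sample_size = 40 in the code); this symbol is introduced here only for illustration, since n in the code is the number of simulations.

```latex
% X_1, ..., X_m i.i.d. Exponential(\lambda), with \lambda = 0.2 and m = 40
\begin{align*}
E[X_i] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, &
\operatorname{Var}(X_i) &= \frac{1}{\lambda^2} = 25, \\
E[\bar{X}] &= \frac{1}{\lambda} = 5, &
\operatorname{Var}(\bar{X}) &= \frac{1}{\lambda^2 m} = \frac{25}{40} = 0.625, \\
E[S^2] &= \operatorname{Var}(X_i) = 25.
\end{align*}
```

The last line holds because the sample variance S^2 (as computed by var() in R) is an unbiased estimator of the population variance.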

Show the distribution is approximately normal

x <- seq(min(mns), max(mns), length.out = n)  # n points, so x and y fit in one data frame with mns
y <- dnorm(x, mean = 1/lambda, sd = 1/lambda/sqrt(sample_size))
plotdata <- data.frame(mns, x, y)
q <- ggplot(plotdata, aes(x=mns) )
q <- q + geom_histogram( aes(y=..density..), fill = "lightblue", binwidth=0.2, colour = "black")
q <- q + geom_density(colour="black", size=2)
q <- q + geom_line(aes(x=x, y=y), colour="red", size=1, linetype=2)
q <- q + geom_vline(xintercept=5, colour="red", size=1, linetype=2)
print(q); qqnorm(mns); qqline(mns)

The black solid curve is the density of the simulated sample means; the red dashed curve is the normal density with expectation 5 and variance 25/40 = 0.625 (standard deviation 5/sqrt(40), about 0.79). From the plot, the distribution of the sample mean of 40 exponential draws is approximately normal. The result is consistent with the Central Limit Theorem.
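
The visual comparison can be complemented by a rough numeric check. The sketch below regenerates the sample means and compares their variance with the theoretical value 25/40, then computes the share of sample means falling within 1.96 standard errors of 5, which would be about 95% under exact normality. The names theo_se, var_of_means and coverage are hypothetical, and this check is an addition for illustration, not part of the original analysis.

```r
# Sketch: numeric check of the normal approximation for the sample mean.
set.seed(1)
n <- 1000; sample_size <- 40; lambda <- 0.2

# Regenerate the simulated sample means (one mean per simulated sample).
mns <- replicate(n, mean(rexp(sample_size, rate = lambda)))

theo_mean <- 1 / lambda                        # theoretical mean, 5
theo_se   <- (1 / lambda) / sqrt(sample_size)  # standard error of the sample mean

var_of_means      <- var(mns)    # simulated variance of the sample means
theo_var_of_means <- theo_se^2   # theoretical value, 25 / 40 = 0.625

# Share of sample means within the central 95% band of the normal approximation.
coverage <- mean(abs(mns - theo_mean) < qnorm(0.975) * theo_se)

print(c(simulated = var_of_means, theoretical = theo_var_of_means))
print(coverage)
```

A coverage noticeably different from 0.95 would suggest the normal approximation is poor for this sample size; some deviation is expected because the exponential distribution is skewed.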