This is the project for the statistical inference class. In it, I will use simulation to explore inference and do some simple inferential data analysis. The project consists of two parts:

  1. Simulation exercises.

  2. Basic inferential data ayalysis.

Simulation Exercise

The exponential distribution can be simulatted in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. In this project, I set lambda = 0.2 for all of the simulations. In this simulaiton, I will investigate the distribution of 40 exponential distributions. The total number of simulation is 1000.

First, I have done 1000 simultion, in each simulation, I have 40 exponential distributions with lambda =0.2. I plotted the cumulative mean of samples with the number of the simulation. We can see that with more simulations, the cumulative mean approach the theoretical value, which is 1/lambda = 5.

nsim <- 1000
a = matrix(rexp(40*nsim,0.2),nsim)
ind.means = apply(a, 1, mean)
means = cumsum(ind.means)/(1:nsim)
library(ggplot2)
g <- ggplot(data.frame(x = 1 : nsim, y = means), aes(x = x, y = y)) 
g <- g + geom_hline(yintercept = 1/0.2) + geom_line(size = 2) 
g <- g + labs(x = "Number of simulation", y = "Cumulative mean")
g

plot of chunk unnamed-chunk-1 Plot the distribution of the mean of 40 exponential(0.2)s

dat <- data.frame(ind.means)
g <- ggplot(dat, aes(x = ind.means)) + geom_histogram(alpha = .20, binwidth=.3, colour = "black", aes(y = ..density..)) 
g <- g + geom_vline(xintercept=5,color="red",size=1.5)
g

plot of chunk unnamed-chunk-2

Basic Inferential data analysis

Now, let use the CLT to show that the distribution is approximately normal.

cfunc <- function(x, n) sqrt(n)*(x - 5)/5
dat <- data.frame(apply(matrix(ind.means), 1, cfunc, 40))
colnames(dat) <- c("x")
g <- ggplot(dat, aes(x=x))+geom_histogram(alpha = .20, binwidth=.3, colour = "black", aes(y = ..density..))+stat_function(fun=dnorm,size=1.5,color="red")
g

plot of chunk unnamed-chunk-3

Next, I evaluate the coverage of the confidence interval for 1/lambda:

lambdavals <- seq(0.005, 0.8, by=0.01)
nsim <- 1000
coverage <- sapply(lambdavals, function(lambda){
        lhats <- rexp(nsim, lambda)
        ll <- lhats - qnorm(0.975)*sd(lhats)
        ul <- lhats + qnorm(0.975)*sd(lhats)
        mean(ll<1/lambda & ul > 1/lambda)
})
ggplot(data.frame(lambdavals, coverage), aes(x = lambdavals, y = coverage)) + geom_line(size = 2) + geom_hline(yintercept = 0.95)+ylim(0, 1.0)

plot of chunk unnamed-chunk-4