Overview

In this project, I will investigate the exponential distribution in R and compare it with the Central Limit Theorem, which including the mean of exponential distribution and the standard deviation. And I will investigate the distribution of averages of 40 exponentials.

Simulations

The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. Set lambda = 0.2 for all of the simulations. n is the size of the sample.

Sample Mean versus Theoretical Mean

We first create a function named meanplot to plot the umulative mean with respect to the number of observations. then run the code meanplot(n,lambda). The red dash line in the picture is the theoretical mean.

lambda = 0.2
m =10000
n=40

Meanplot<- function(m,n,lambda){
        mns = NULL
        for (i in 1 : m) mns = c(mns, mean(rexp(n,lambda)))
        hist(mns, xlab = "means of n obs", probability = T)
        abline(v=5,lwd=2,col="red",lty=2)
        abline(v= mean(mns),lwd=2,col="blue")
}

Meanplot(m,n,lambda)

The red dashed line is the theoretical center, the blue line is the distribution center. it is pretty close.

lambda = 0.2;n=1000

meanplot<- function(n,lambda){
          x=1:n; y= rexp(n, lambda) 
          ymean=NULL
          for (i in 1 : n) ymean[i] = mean(y[1:i])
          plot(x,ymean,type = "l", ylab = "Cumulative mean", xlab = "Number of obs")
          abline(h=5,lwd=2,col="red",lty=2)
}

meanplot(n,lambda)

Or we could try this using ggplot2 package.

library(ggplot2)
lambda = 0.2;n=1000
# Put initialization code in this file.
meanPlot <- function(n){
  means<- cumsum(rexp(n,lambda))/(1:n)
  g <- ggplot(data.frame(x = 1 : n, y = means), aes(x = x, y = y)) 
  g <- g + geom_hline(size=1.5 ,yintercept = 5,alpha=0.6,
    linetype="longdash") + geom_line(size = 1)
if(n<100){
    g <- g + geom_point(colour="red",size=3,alpha=0.8)
    }    
  g <- g + labs(x = "Number of obs", y = "Cumulative mean")
  g <- g + scale_x_continuous(breaks=seq(0,n+1,ceiling(n/10)))
  print(g)
  invisible()
}
meanPlot(n)

From the picture, we can see that the variance of sample is asymptotic to the theoretical mean 5 as sample size n getting bigger.

Sample Vairance versus Theoretical Variance

We first create a function named varianceplot to plot the sample variance with respect to the sample size n. then run the code varianceplot(n,lambda).

lambda = 0.2;n=1000
varianceplot<- function(n,lambda){
          x=1:n; y= rexp(n, lambda) 
          yvar=NULL
          for (i in 1 : n) yvar[i] = sqrt(var(y[1:i]))
          plot(x,yvar,type = "l", ylab = "standard deviation of the sample", xlab = "sample size")
          abline(h=5,lwd=2,col="red",lty=2)
}

varianceplot(n,lambda)

From the picture, we can see that the variance of sample is asymptotic to the theoretical variance as sample size n getting bigger.

lambda = 0.2
m =10000
n=40
Varplot<- function(m,n,lambda){
        mns = NULL
        for (i in 1 : m) mns = c(mns, sd(rexp(n,lambda)))
        hist(mns, xlab = "standard deviation of obs", probability = T)
        abline(v=5,lwd=2,col="red",lty=2)
}

Varplot(m,n,lambda)

From the picture, the variance of 40 observations is center at the theoretical variance of the red dashed line.

Distribution

We first create a function named meanhist to plot the sample mean with of m simulations. then run the code mean(m,n,lambda). The red dash line in the picture is the theoretical standard deviation.

  lambda = 0.2
  m =10000
  n=40

meanhist<- function(m,n,lambda){
        mns = NULL
        for (i in 1 : m) mns = c(mns, mean(rexp(n,lambda)))
        mns = (mns-1/lambda)*lambda*sqrt(n)
        hist(mns, xlab = "means of n obs", probability = T)
}

meanhist(m,n,lambda)

x<- seq(-4,4,length = n) 
y<- dnorm(x)
lines(x,y,type = "l", col ="red")
abline(v=0,lwd=4, col = "blue")

# we could create a interactive object to see the distribution of the sample mean.
# library(manipulate)
# manipulate(meanhist(m,n,lambda), n= slider(1,1000, initial = 40, step = 10))

From the picture, the red line is the standard normal distribution, we can see that the mean of sample is close to a normal distribution.