Statistical Inference - Project Part I

Properties of the distribution of the mean of 40 exponential(0.2)s

The density of the exponential distribution is given by f(x) = lambda * e^{-lambda * x} for x >= 0.

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. We set lambda = 0.2 for all of the simulations.
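
As a quick sanity check of the two facts above, the sketch below draws a large sample with rexp and compares its mean and standard deviation with 1/lambda. The sample size of 100,000, the seed, and the variable names are illustrative assumptions and are separate from the simulation used in the rest of this report.

```r
# Assumption: a fresh, large draw used only to illustrate mean = sd = 1/lambda.
set.seed(1)
lambda <- 0.2
draws <- rexp(100000, lambda)

sample_mean <- mean(draws)  # expected to be near 1/lambda = 5
sample_sd   <- sd(draws)    # expected to be near 1/lambda = 5

c(mean = sample_mean, sd = sample_sd, theory = 1 / lambda)
```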

set.seed(0)
rate <- 0.2
samples <- 40
nosims <- 1000

x <- replicate(nosims, mean(rexp(samples, rate)))  # 1000 simulated means, each of 40 exponential draws

1. Show where the distribution is centered at and compare it to the theoretical center of the distribution.

The theoretical mean of the exponential distribution is calculated as 1/lambda.

lambda = 0.2, hence mean = 1/0.2 = 5

Because the sample mean is an unbiased estimator of the population mean, the distribution of averages of 40 exponentials is centered at the same value, so its theoretical mean is also 5.

Theoretical mean of this exponential distribution is 5

The mean of the simulated distribution, which turns out to be very close to the theoretical value of 5, is:

mean(x)

## [1] 4.989678
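
To judge whether the gap between the simulated and theoretical means is small, it can be compared with the Monte Carlo standard error of mean(x). The sketch below repeats the simulation with the same settings; the names theoretical_mean, abs_diff, and mc_se are introduced here for illustration only.

```r
# Re-create the simulation so this block runs on its own.
set.seed(0)
rate <- 0.2
samples <- 40
nosims <- 1000
x <- replicate(nosims, mean(rexp(samples, rate)))

theoretical_mean <- 1 / rate
abs_diff <- abs(mean(x) - theoretical_mean)

# Standard error of mean(x): sd of one sample mean divided by sqrt(nosims).
mc_se <- (1 / rate) / sqrt(samples) / sqrt(nosims)

c(difference = abs_diff, mc_standard_error = mc_se)
```

A difference of about one standard error or less is what random simulation noise alone would typically produce.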

2. Show how variable it is and compare it to the theoretical variance of the distribution.

The theoretical variance of the exponential distribution is (1/lambda)²

(1/.2)² = 25

The variance of the mean of n = 40 independent exponentials is the variance of a single exponential divided by n: 25/40 = 0.625. The corresponding theoretical standard deviation is sqrt(0.625) = 0.7906.

Theoretical variance of the distribution of averages is 0.625

Theoretical standard deviation of the distribution of averages is 0.7906
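
The variance of the averages follows from the independence of the 40 draws. Written out, with sigma^2 = (1/lambda)^2 denoting the variance of a single exponential and n = 40 (the notation here is added for illustration):

```latex
\begin{aligned}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{\sigma^2}{n} \\
  &= \frac{(1/\lambda)^2}{n} = \frac{25}{40} = 0.625, \\
\operatorname{sd}(\bar{X}) &= \sqrt{0.625} \approx 0.7906 .
\end{aligned}
```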

The variance and standard deviation of the simulated distribution are:

var(x)

## [1] 0.6181582

sd(x)

## [1] 0.7862304
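
The simulated and theoretical spread can also be placed side by side. The following is a minimal sketch that repeats the simulation with the same settings; the comparison data frame and the var_ratio name are assumptions made for this example.

```r
# Re-create the simulation so this block runs on its own.
set.seed(0)
rate <- 0.2
samples <- 40
nosims <- 1000
x <- replicate(nosims, mean(rexp(samples, rate)))

theoretical_var <- (1 / rate)^2 / samples  # 25 / 40 = 0.625
theoretical_sd  <- sqrt(theoretical_var)

comparison <- data.frame(
  quantity    = c("variance", "standard deviation"),
  simulated   = c(var(x), sd(x)),
  theoretical = c(theoretical_var, theoretical_sd)
)

# A ratio near 1 means the simulation agrees with the theory.
var_ratio <- var(x) / theoretical_var
comparison
```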

3. Show that the distribution is approximately normal.

First, let's simulate 1000 random exponentials and plot their distribution.

library(ggplot2)
y <- rexp(1000,rate)
xy <- data.frame(x,y)

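# Histogram of the raw exponentials with the exponential(0.2) density overlaid.
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0).
plot(ggplot(xy, aes(x = y)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1, color = "black", fill = "white") +
  stat_function(fun = dexp, args = list(rate = rate)) +
  labs(title = "Distribution of 1000 random exponentials"))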

[Figure: Distribution of 1000 random exponentials]

Now let's simulate the distribution of 1000 averages of 40 samples of exponentials.

m <- mean(x)
s <- sd(x)
ci <- m + 1.96 * c(-1, 1) * s  # bounds covering about 95% of the simulated means
# Histogram of the averages with the fitted normal density overlaid.
# annotate() draws each label once instead of once per data row.
plot(ggplot(xy, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1, color = "black", fill = "white") +
  stat_function(fun = dnorm, args = list(mean = m, sd = s)) +
  geom_vline(xintercept = m, color = "red", lwd = 1.2) +
  geom_vline(xintercept = ci, color = "blue") +
  annotate("text", x = m + 0.2, y = -0.01, label = round(m, 2), color = "red", size = 4) +
  annotate("text", x = ci + 0.2 * c(-1, 1), y = -0.01, label = round(ci, 2), color = "blue", size = 4) +
  labs(title = "Distribution of 1000 averages of 40 samples of exponentials"))

[Figure: Distribution of 1000 averages of 40 samples of exponentials]

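This plot shows the mean (red line) and the interval of mean ± 1.96 standard deviations (blue lines) of the distribution of 1000 averages of 40 exponentials. Under the normal approximation, this interval contains about 95% of the averages.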
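
The visual impression can be backed by two simple numbers. The sketch below, which repeats the simulation with the same settings, computes the share of averages within 1.96 standard deviations of the mean and the sample skewness. The names z, coverage, and skew are illustrative, and these two statistics are one possible check rather than a formal test of normality.

```r
# Re-create the simulation so this block runs on its own.
set.seed(0)
rate <- 0.2
samples <- 40
nosims <- 1000
x <- replicate(nosims, mean(rexp(samples, rate)))

# Standardize the averages.
z <- (x - mean(x)) / sd(x)

# Share of averages within 1.96 sd of the mean; close to 0.95 for a normal shape.
coverage <- mean(abs(z) <= 1.96)

# Sample skewness; 0 for a normal distribution, 2 for a single exponential.
skew <- mean(z^3)

c(coverage = coverage, skewness = skew)
```

A small positive skewness is expected here, since the mean of 40 exponentials still carries some of the right skew of the underlying distribution. Calling qqnorm(z) followed by qqline(z) gives the equivalent visual check.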

Conclusion

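The plots above show that the distribution of a large collection of means of 40 exponentials is approximately normal, unlike the strongly right-skewed distribution of the individual exponentials. The mean of this distribution (4.99) estimates the mean of the original exponential distribution (5), and its variance (0.618) is close to the theoretical value of 0.625.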