Statistical Inferences

Synopsis - This report simulates the exponential distribution and compares the results against the Central Limit Theorem (CLT). The simulation draws 40 exponential variables 1000 times and calculates the mean of each set of 40 variables. The report then analyzes the distribution of the 1000 means thus obtained.

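# Preallocate a numeric vector for the 1000 simulated means
ExpMean <- numeric(1000)
# Each entry is the mean of 40 draws from an exponential distribution with rate 0.2
for (i in 1:1000) { ExpMean[i] <- mean(rexp(40, 0.2)) }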

The code above generates a vector ExpMean of length 1000. Each element is the mean of 40 random exponential variables with lambda = 0.2.
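As an alternative to the loop, the same simulation can be written without explicit iteration. The following is a minimal sketch, not the report's original code: the names `lambda`, `n`, `n_sims`, `draws` and `exp_means` are introduced here for illustration, and the seed value is an arbitrary assumption added only so that reruns give the same numbers.

```r
# Sketch: vectorized version of the simulation (names and seed are assumptions)
set.seed(2015)        # arbitrary seed, for reproducibility only

lambda <- 0.2         # rate of the exponential distribution
n      <- 40          # observations per sample
n_sims <- 1000        # number of simulated samples

# Draw all n * n_sims values at once; each row of the matrix is one sample of size n
draws <- matrix(rexp(n * n_sims, rate = lambda), nrow = n_sims, ncol = n)

# The mean of each row is the mean of one sample of 40 exponentials
exp_means <- rowMeans(draws)

length(exp_means)     # one mean per simulated sample
```

Drawing everything in one call to `rexp` and using `rowMeans` avoids growing or indexing a vector inside a loop, which tends to be the more idiomatic style in R.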

The histogram below shows what this distribution of means looks like:

library(ggplot2)

# One row per simulated mean, labelled so more data can be appended later
dist <- data.frame(Exp = ExpMean, label = rep("Mean Exp", times = 1000))
ggplot(dist, aes(x = Exp)) +
    geom_histogram(fill = "red", colour = "black", alpha = 0.5, binwidth = 0.5) +
    # Dashed green line: mean of the simulated sample means
    geom_vline(xintercept = mean(ExpMean), lwd = 0.5, linetype = "dashed", colour = "darkgreen") +
    # Solid orange line: theoretical mean 1/lambda = 5
    geom_vline(xintercept = 5, lwd = 0.5, linetype = "solid", colour = "orange")

1. Show the sample mean and compare it to the theoretical mean of the distribution.

The mean of the simulated sample means, shown by the dashed green line, is 4.96.

The theoretical mean, \(1/\lambda\) = 1/0.2, shown by the solid orange line, is 5.

Inference - the sample mean and the theoretical mean are in close agreement.
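The comparison above can be made numerically as well as visually. The sketch below reruns the simulation under an arbitrary seed (an assumption, so the values will not match the 4.96 reported above exactly) and also reports the Monte Carlo standard error of the mean of means, which indicates how far from 5 the simulated value can reasonably be expected to fall. All variable names are illustrative.

```r
# Sketch: compare the simulated mean with the theoretical mean (seed is an assumption)
set.seed(2015)

lambda <- 0.2
n      <- 40
n_sims <- 1000

# 1000 means, each of 40 exponential draws
exp_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

theoretical_mean <- 1 / lambda                     # mean of an exponential is 1/lambda
sample_mean      <- mean(exp_means)                # centre of the simulated means
mc_se            <- sd(exp_means) / sqrt(n_sims)   # Monte Carlo standard error

c(sample = sample_mean, theoretical = theoretical_mean, mc_se = mc_se)
```

A difference between the two means of only a few multiples of `mc_se` is consistent with simulation noise rather than a real discrepancy.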

2. Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.

The theoretical variance (\(\sigma^2\)) of the exponential distribution is \((1/\lambda)^2\), i.e. 25.

Assuming each sample consists of iid variables, the variance of the sample mean should be \(\sigma^2/n\) = 25/40 = 0.625.
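For reference, the \(\sigma^2/n\) result follows from the independence of the draws. Writing \(X_1, \dots, X_n\) for the n = 40 iid exponential variables in one sample and \(\bar{X}\) for their mean (notation introduced here for illustration):

\[
\operatorname{Var}(\bar{X})
= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{n\,\sigma^2}{n^2}
= \frac{\sigma^2}{n}
= \frac{(1/\lambda)^2}{n}
= \frac{25}{40}
= 0.625
\]

The second equality is the step that relies on independence: the variance of a sum equals the sum of the variances only when the covariance terms vanish.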

The variance of the distribution of sample means, given by round(var(ExpMean),2), is 0.61.

Inference - The variance of the simulated sample means (0.61) is in line with the theoretical variance of the sample mean, \(\sigma^2/n\) = 25/40 = 0.625.
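The variance comparison can be reproduced with a few lines of R. This is a sketch under an arbitrary seed (an assumption), so the simulated value will be close to, but not exactly, the 0.61 reported above. The variable names are illustrative.

```r
# Sketch: compare the variance of the simulated means with sigma^2 / n
set.seed(2015)   # arbitrary seed, for reproducibility only

lambda <- 0.2
n      <- 40
n_sims <- 1000

exp_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

sigma2          <- (1 / lambda)^2   # variance of a single exponential draw
theoretical_var <- sigma2 / n       # variance of the mean of n iid draws
sample_var      <- var(exp_means)   # variance observed across the simulated means

round(c(theoretical = theoretical_var, sample = sample_var), 3)
```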

3. Show that the distribution is approximately normal.

To evaluate this, let us compare the distribution of the 1000 sample means with the distribution of 1000 individual random exponential variables, plotting the two densities together to see how they differ. If the CLT applies, the means should look bell-shaped while the raw draws remain heavily right-skewed. To do this, create another data frame with 1000 random exponential variables with lambda 0.2 and label it 'Random Exp'.

# 1000 raw exponential draws (individual values, not means) for comparison
RandDist <- data.frame(Exp = rexp(1000, 0.2), label = "Random Exp")
## Row bind two data frames to merge the data
dist <- rbind(dist, RandDist)
ggplot(dist, aes(x = Exp, fill = label)) +
    geom_density(alpha = 0.2) +
    # Red: mean of the sample means; green: mean of the raw draws
    geom_vline(xintercept = c(mean(ExpMean), mean(RandDist$Exp)), colour = c("red", "green"), linetype = "dashed", lwd = 1)

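This is a clear application of the CLT, according to which the distribution of the mean of iid samples is approximately normal for a sufficiently large sample size, even though the underlying exponential distribution is strongly skewed.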
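A density plot shows normality only by eye. As a complementary check, the sketch below standardizes the simulated means and compares them with the standard normal in two ways: the share of values within ±1.96 (about 0.95 for a normal distribution) and the correlation between sample and theoretical quantiles from `qqnorm`. This is not part of the original analysis; the seed and variable names are assumptions.

```r
# Sketch: numeric normality checks for the simulated means (seed is an assumption)
set.seed(2015)

lambda <- 0.2
n      <- 40
n_sims <- 1000

exp_means <- replicate(n_sims, mean(rexp(n, rate = lambda)))

mu <- 1 / lambda              # theoretical mean of one draw
se <- (1 / lambda) / sqrt(n)  # theoretical standard deviation of the sample mean

# Standardize: under the CLT, z should be roughly N(0, 1)
z <- (exp_means - mu) / se

# Share of standardized means inside +/- 1.96
coverage <- mean(abs(z) < 1.96)

# Normal Q-Q points without drawing; a correlation near 1 means the points
# lie close to a straight line
qq     <- qqnorm(z, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

c(coverage = coverage, qq_correlation = qq_cor)
```

Calling `qqnorm(z)` followed by `qqline(z)` draws the corresponding Q-Q plot. With n = 40 some right skew from the exponential distribution is still expected in the tails, so the agreement is approximate rather than exact.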

Statistical Inferences - Part I

Bharat Naruka

May 23, 2015
