Statistical Inference Course Project

Introduction

This report applies some of the techniques taught in the Inferential Statistics class. First, the exponential distribution is explored through simulation, and the results are analyzed with respect to the Central Limit Theorem.

In the second part of this report, the ToothGrowth data will be presented. Some statistical methods will be applied to compare the response as a function of the variables.

Part 1 - Exponential distribution

The CLT (Central Limit Theorem) states that the distribution of averages of iid variables (properly normalized) becomes that of a standard normal as the sample size increases.
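In symbols, the statement above can be written as follows. The notation is introduced here for illustration: \(\bar{X}_n\) is the average of \(n\) iid variables with mean \(\mu\) and standard deviation \(\sigma\).

\[
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \;\longrightarrow\; N(0, 1) \quad \text{in distribution as } n \to \infty
\]

Equivalently, for large \(n\) the average \(\bar{X}_n\) is approximately normal with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).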

From Wikipedia: “The exponential distribution is the probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur continuously and independently at a constant rate.”

For this distribution, the mean and the standard deviation are both equal to \(1/\lambda\), where \(\lambda\) is the rate.
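For reference, these quantities follow from the exponential density. Writing \(X\) for an exponential variable with rate \(\lambda\) (the symbol \(X\) is notation added here):

\[
f(x; \lambda) = \lambda e^{-\lambda x} \;\; (x \ge 0), \qquad
E[X] = \frac{1}{\lambda}, \qquad
\operatorname{Var}(X) = \frac{1}{\lambda^2}, \qquad
\operatorname{sd}(X) = \frac{1}{\lambda}
\]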

Some simulations with the exponential distribution will be carried out, and the results will be compared with what the CLT states.

First, let’s run 1000 simulations of samples of size 40 from an exponential distribution with \(\lambda\) equal to 0.2.

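set.seed(10000)  # fix the seed so the results are reproducible
# Simulate 1000 samples of 40 exponentials (rate 0.2) and store each sample mean
mns <- NULL
for (i in 1:1000)
  mns <- c(mns, mean(rexp(40, 0.2)))
hist(mns, main = "Histogram of means of 40 exponentials", xlab = "Mean of 40 exponentials")
abline(v = mean(mns), lwd = 4, col = "blue")  # blue line marks the mean of the sample means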

The mean of the means over all simulations is 5.01, and the theoretical mean of an exponential distribution is 1/\(\lambda\) = 1/0.2 = 5. The distribution of sample means is centered at the theoretical mean, in agreement with the CLT.

Let’s evaluate the variance of samples of 40 exponentials.

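# Simulate 1000 samples of 40 exponentials (rate 0.2) and store each sample variance
vrc <- NULL
for (i in 1:1000)
  vrc <- c(vrc, var(rexp(40, 0.2)))
hist(vrc, main = "Histogram of variances of 40 exponentials", xlab = "Variance of 40 exponentials")
abline(v = mean(vrc), lwd = 4, col = "blue")  # blue line marks the mean of the sample variances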

The mean of the variances over all simulations is 24.73, and the theoretical variance of an exponential distribution is 1/\(\lambda^2\) = 1/0.04 = 25. The two values are close, which is expected because the sample variance is an unbiased estimator of the population variance.
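A related check looks at the variance of the sample means themselves, which the CLT puts at \(\sigma^2/n\) = 25/40 = 0.625. The block below is a minimal sketch of that comparison, not part of the original analysis: it draws a fresh set of means, and the names `n`, `lambda`, `sim_means`, `theoretical_var` and `empirical_var` are illustrative.

```r
# Sketch: compare the variance of simulated sample means with sigma^2 / n.
# All variable names here are illustrative, not taken from the report's code.
set.seed(10000)
n <- 40        # sample size
lambda <- 0.2  # rate of the exponential distribution

# 1000 sample means, each from n exponentials
sim_means <- replicate(1000, mean(rexp(n, lambda)))

theoretical_var <- (1 / lambda)^2 / n  # sigma^2 / n = 25 / 40
empirical_var <- var(sim_means)        # variance observed across the simulated means

print(c(theoretical = theoretical_var, empirical = empirical_var))
```

If the two printed values are close, the spread of the sample means matches what the CLT predicts for samples of size 40.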

Next, it will be shown that the distribution of means is approximately normal. For this, let’s take a summary of the means of the simulated samples of 40 exponentials.

summary(mns)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.031   4.467   4.947   5.006   5.519   8.383

Let’s compare the quartiles in the summary (first quartile, median, and third quartile) with the corresponding quantiles of a normal distribution with \(mean = 5\) and \(sd = 5/\sqrt{40}\).

qnorm(c(0.25, 0.5, 0.75), mean = 5, sd = 5/sqrt(40))

## [1] 4.466769 5.000000 5.533231

The quartiles of the simulated means (4.467, 4.947, 5.519) are close to those of the normal distribution (4.467, 5.000, 5.533).
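The normal quartiles can also be worked out by hand, which makes the role of \(5/\sqrt{40}\) explicit. The value 0.6745 used below is the 0.75 quantile of the standard normal distribution, rounded to four decimals:

\[
sd = \frac{5}{\sqrt{40}} \approx 0.7906, \qquad
5 \pm 0.6745 \times 0.7906 \approx 5 \pm 0.5332 \;\Rightarrow\; 4.4668 \ \text{and} \ 5.5332
\]

Up to rounding, these agree with the `qnorm` output above.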

This section closes with a plot comparing a normal density with mean = 5 and sd = 5/\(\sqrt{40}\) against the distribution of the sample means.

library(ggplot2)
dat <- data.frame(x = mns)
# Density-scaled histogram of the simulated sample means
g <- ggplot(dat, aes(x = x)) + geom_histogram(binwidth = .3, colour = "black", aes(y = ..density..))
# Overlay the normal density with mean 5 and sd 5/sqrt(40)
g <- g + stat_function(fun = dnorm, size = 1, args = list(mean = 5, sd = 5/sqrt(40)))
g <- g + xlim(c(2, 8))
g

The black line is the normal density. The distribution of the means of samples of 40 exponentials is clearly close to this normal distribution.
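A normal Q-Q plot is another common way to judge normality by eye. The following is a minimal sketch, not part of the original analysis: it regenerates the means under the same seed, and the names `sim_means` and `z` are illustrative.

```r
# Sketch: normal Q-Q plot of standardized sample means.
# sim_means and z are illustrative names, not taken from the report's code.
set.seed(10000)
sim_means <- replicate(1000, mean(rexp(40, 0.2)))

# Standardize with the theoretical mean (5) and standard deviation (5 / sqrt(40))
z <- (sim_means - 5) / (5 / sqrt(40))

qqnorm(z, main = "Normal Q-Q plot of standardized means")
abline(0, 1, col = "blue", lwd = 2)  # reference line y = x
```

Points lying close to the blue line indicate approximate normality. Some departure in the upper tail would not be surprising, because the exponential distribution is right-skewed and a sample size of 40 does not remove that skew entirely.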

Statistical Inference Course Project - Part 1

Andre Morato

April 8, 2017
