Inferential Statistics - Course Project 6

Introduction

In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to run 1,000 simulations.
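
The quantities above can be written out directly. This is a minimal sketch, assuming the sample size of 40 stated in the text; the names n, mu, sigma, and se are illustrative and do not come from the assignment.

```r
lambda <- 0.2          # rate parameter given in the text
n <- 40                # number of exponentials per average (from the text)

mu <- 1 / lambda       # theoretical mean of one exponential
sigma <- 1 / lambda    # theoretical standard deviation of one exponential

# For the average of n independent draws, the mean is unchanged and
# the standard deviation shrinks by a factor of sqrt(n)
se <- sigma / sqrt(n)  # standard deviation of the sample mean
se^2                   # variance of the sample mean, sigma^2 / n
```

These are the theoretical values that the simulated means are compared against in the sections that follow.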

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponentials.

Simulations

Include English explanations of the simulations you ran, with the accompanying R code.

Set lambda to 0.2.

lambda <- 0.2

I create the data object dt1 with replicate(), which repeats the simulation 1,000 times without an explicit for loop. When possible, I prefer this approach to writing a loop. I call set.seed(123) first so that the results are reproducible.

set.seed(123)
dt1 <- replicate(1000, mean(rexp(40, lambda)))
class(dt1)

## [1] "numeric"
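
For comparison, a fully vectorized alternative draws all 40,000 values in a single call and averages each row. This is only a sketch of an alternative, not the approach used in the rest of this report; the names n_sim, n, sims, and dt_alt are illustrative.

```r
lambda <- 0.2
n_sim <- 1000   # number of simulations
n <- 40         # exponentials per simulation

set.seed(123)
# One row per simulation, one column per draw
sims <- matrix(rexp(n_sim * n, lambda), nrow = n_sim, ncol = n, byrow = TRUE)
dt_alt <- rowMeans(sims)   # average of each simulated sample
length(dt_alt)
```

Keeping the full matrix also makes it possible to inspect individual samples later, at the cost of holding all draws in memory.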

Sample Mean versus Theoretical Mean

Include figures with titles. In the figures, highlight the means you are comparing. Include text that explains the figures and what is shown on them, and provides appropriate numbers.

The sample mean of the 1,000 simulated averages (5.011911) is close to the theoretical mean 1/lambda = 5. The two differ by about 0.012, or roughly 0.24%.

s_mean <- mean(dt1)
th_mean <- 1/lambda
s_mean

## [1] 5.011911

th_mean

## [1] 5

Now I graphically compare the distribution of dt1 with the two means.

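# Histogram of the 1,000 simulated means
hist(dt1, main = "Simulation means frequencies", col = "lightgray")
# Mark the sample mean (green) and the theoretical mean (red)
abline(v = s_mean, col = "green")
abline(v = th_mean, col = "red")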

As anticipated from the calculated means, the difference between the sample mean and the theoretical mean is negligible: the two vertical lines nearly coincide near the center of the histogram.
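
One way to judge how small the gap is would be to compare it with the standard error of the simulated mean. The sketch below regenerates dt1 as above; the names se_sim and ci are illustrative, and the 95% interval uses a normal approximation, which is an assumption on my part rather than something the assignment requires.

```r
lambda <- 0.2
set.seed(123)
dt1 <- replicate(1000, mean(rexp(40, lambda)))

s_mean <- mean(dt1)
th_mean <- 1 / lambda

# Standard error of the mean of the 1,000 simulated averages
se_sim <- sd(dt1) / sqrt(length(dt1))
# Approximate 95% interval around the sample mean
ci <- s_mean + c(-1, 1) * qnorm(0.975) * se_sim
ci
# TRUE when the theoretical mean lies inside the interval
th_mean >= ci[1] && th_mean <= ci[2]
```

If the theoretical mean falls inside the interval, the observed gap is consistent with ordinary simulation noise.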

Sample Variance versus Theoretical Variance

Include figures (output from R) with titles. Highlight the variances you are comparing. Include text that explains your understanding of the differences of the variances.

Comparison between the sample variance of the simulated means and the theoretical variance, (1/lambda)^2 / 40.

s_var <- var(dt1)
th_var <- ((1/lambda)^2)/40
s_var

## [1] 0.6004928

th_var

## [1] 0.625
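
The gap between the two variances can also be expressed relative to the theoretical value. A minimal sketch, regenerating dt1 as above; rel_diff is an illustrative name.

```r
lambda <- 0.2
set.seed(123)
dt1 <- replicate(1000, mean(rexp(40, lambda)))

s_var <- var(dt1)
th_var <- (1 / lambda)^2 / 40

# Relative difference of the sample variance from the theoretical variance
rel_diff <- (s_var - th_var) / th_var
rel_diff
# The same comparison on the standard deviation scale
c(sample = sd(dt1), theoretical = sqrt(th_var))
```

With the values printed above (0.6004928 and 0.625), the sample variance is about 4% below the theoretical one.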

Distribution

Via figures and text, explain how one can tell the distribution is approximately normal.

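As shown above, the sample variance (0.6004928) and the theoretical variance (0.625) are close, so the spread of the simulated means matches what the Central Limit Theorem predicts. The theorem states that the distribution of sample means approaches a normal distribution as the sample size grows, so with 40 observations per sample the distribution of averages is approximately normal. The histograms below are roughly symmetric and bell-shaped, and they follow the overlaid curves closely.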

# Add normal distribution line to hist

par(mfrow = c(1, 2))
h2 <- hist(dt1, col = "lightgray", main = "Normal distribution curve")
xfit <- seq(min(dt1), max(dt1), length.out = 40)
yfit <- dnorm(xfit, mean = mean(dt1), sd = sd(dt1))
# Scale the density to counts: bin width times number of observations
yfit <- yfit * diff(h2$mids[1:2]) * length(dt1)
lines(xfit, yfit, col = "red", lwd = 2)

h1 <- hist(dt1, probability = TRUE, col = "lightgray", main = "Probability density curve")
lines(density(dt1), col = "red", lwd = 2)
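
A normal Q-Q plot is another common check, added here as a sketch rather than as part of the original analysis. Points lying close to the reference line suggest approximate normality; the name qq is illustrative.

```r
lambda <- 0.2
set.seed(123)
dt1 <- replicate(1000, mean(rexp(40, lambda)))

# Sample quantiles of the simulated means against normal quantiles
qq <- qqnorm(dt1, main = "Normal Q-Q plot of simulation means")
qqline(dt1, col = "red", lwd = 2)

# Correlation between theoretical and sample quantiles; values near 1
# indicate the points fall close to a straight line
cor(qq$x, qq$y)
```

Because the exponential distribution is right-skewed, some departure from the line in the upper tail would not be surprising even with 40 observations per average.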