Part 1: Background Introduction

The following is the introduction to Part 1 of the assignment:

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter. The mean of the exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. In this simulation, you will investigate the distribution of averages of 40 exponential(0.2)s. Note that you will need to do a thousand or so simulated averages of 40 exponentials.

Illustrate via simulation and associated explanatory text the properties of the distribution of the mean of 40 exponential(0.2)s. You should 1. Show where the distribution is centered at and compare it to the theoretical center of the distribution. 2. Show how variable it is and compare it to the theoretical variance of the distribution. 3. Show that the distribution is approximately normal.

Note that for point 3, focus on the difference between the distribution of a large collection of random exponentials and the distribution of a large collection of averages of 40 exponentials.

As a motivating example, compare the distribution of 1000 random uniforms

hist(runif(1000))

# and the distribution of 1000 averages of 40 random uniforms

mns = NULL
for (i in 1 : 1000) mns = c(mns, mean(runif(40)))
hist(mns)

Exercise

How are these distributions related to each other?
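
One way to explore this question numerically is sketched below. It assumes nothing beyond base R; the seed and the names raw_unif and avg_unif are illustrative choices, not part of the assignment. The idea is that the averages should be centered at the same value as the raw draws, while their spread should be smaller by a factor of roughly sqrt(40).

```r
# Sketch: compare raw uniforms with averages of 40 uniforms.
# The seed and object names are illustrative assumptions.
set.seed(1)
raw_unif <- runif(1000)                        # 1000 single uniform draws
avg_unif <- replicate(1000, mean(runif(40)))   # 1000 averages of 40 draws

# Both collections are centered near 0.5
c(raw = mean(raw_unif), averages = mean(avg_unif))

# A uniform(0, 1) has sd sqrt(1/12); the averages should have
# an sd close to sqrt(1/12) / sqrt(40)
c(raw = sd(raw_unif), averages = sd(avg_unif), theory = sqrt(1/12) / sqrt(40))
```

The same comparison applies to the exponential case studied below, with 1/lambda in place of the uniform mean and standard deviation.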

# Load some usual libraries
library(ggplot2)
library(dplyr)
library(knitr)


set.seed(111)
lambda <- 0.2
simID <- seq_len(1000)
expon <- 40

# For each of the 1000 simulations, compute the mean of
# 40 exponentials with rate lambda = 0.2
createDF <- data.frame(x = sapply(simID, function(i) mean(rexp(expon, lambda))))

# Inspect the first 10 simulated means
head(createDF$x, 10)
##  [1] 5.720126 3.986680 5.400150 4.747093 3.129122 5.851082 4.746867
##  [8] 3.607837 5.331758 4.150056
# What is the distribution's center?
mean(createDF$x)
## [1] 5.02562
# What is the expected (theoretical) center of the distribution?
1/lambda
## [1] 5
# The standard deviation
sd(createDF$x)
## [1] 0.7790891
# The variance
var(createDF$x)
## [1] 0.6069798

How do these compare to the expected standard deviation and variance?

# Theoretical standard deviation of the mean
1/lambda/sqrt(expon)
## [1] 0.7905694
# Theoretical variance of the mean
((1/lambda)/sqrt(expon))^2
## [1] 0.625
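
The theoretical values above follow from the standard result for the mean of n independent draws with common mean and variance. As a short worked derivation (with n = 40 and lambda = 0.2, and using the usual symbols, which the original text does not introduce):

```latex
% Mean and variance of the average of n i.i.d. exponential(\lambda) draws
\begin{align*}
E[\bar{X}] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5 \\
\operatorname{Var}(\bar{X}) &= \frac{\sigma^2}{n} = \frac{1}{\lambda^2 n} = \frac{25}{40} = 0.625 \\
\operatorname{SD}(\bar{X}) &= \frac{1}{\lambda \sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7906
\end{align*}
```

The simulated values (mean 5.02562, variance 0.6069798, standard deviation 0.7790891) are close to these theoretical values.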
# Histogram of the simulated means with the theoretical normal density overlaid
ggplot(data = createDF, aes(x)) +
  geom_histogram(aes(y = ..density..), binwidth = 0.2) +
  stat_function(fun = dnorm, args = list(mean = 1/lambda, sd = (1/lambda)/sqrt(expon)))

The histogram follows the overlaid normal density closely, so the distribution of the averages is approximately normal.
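
Point 3 of the assignment asks for a comparison between a large collection of random exponentials and a large collection of averages of 40 exponentials. A minimal sketch of that comparison follows, using base graphics. The names raw_exp and avg_exp are illustrative and are not part of the analysis above, and the Q-Q plot is an additional check that the original analysis does not include.

```r
# Sketch only: raw_exp and avg_exp are illustrative names.
set.seed(111)
lambda <- 0.2
raw_exp <- rexp(1000, lambda)                        # 1000 single exponentials
avg_exp <- replicate(1000, mean(rexp(40, lambda)))   # 1000 averages of 40

# Side-by-side histograms: the raw draws are strongly right-skewed,
# while the averages are roughly bell-shaped around 1/lambda
op <- par(mfrow = c(1, 2))
hist(raw_exp, breaks = 30, main = "1000 exponentials", xlab = "x")
hist(avg_exp, breaks = 30, main = "1000 averages of 40", xlab = "mean")
par(op)

# Normal Q-Q plot of the averages: points lying close to the line
# suggest approximate normality
qqnorm(avg_exp)
qqline(avg_exp)
```

With only 40 draws per average, some right skew can remain in the averages, so the points may drift slightly from the line in the tails.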