Statistical Inference Course Project

Part 1 – simulation Exercise

Show the sample mean, and compare to the theoretical mean of the distribution.
Show how variable the sample is (via variance) and compare it to the theoretical variance of the distribution.
Show the distribution is approximately normal.

We set the seed for reproducibility, then define the lambda, number of exponentials, and the number of simulations to run.

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 3.4.1

## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr

## Warning: package 'ggplot2' was built under R version 3.4.1

## Warning: package 'tidyr' was built under R version 3.4.1

## Warning: package 'readr' was built under R version 3.4.1

## Warning: package 'purrr' was built under R version 3.4.1

## Warning: package 'dplyr' was built under R version 3.4.1

## Conflicts with tidy packages ----------------------------------------------

## filter(): dplyr, stats
## lag():    dplyr, stats

set.seed(20)

lambda <- 0.2
n <- 40
sims <- 1000

We run the simulations/replications and then calculate the mean.

sim_exe <- replicate(sims, rexp(n, lambda))
mean_sim_exe <- apply(sim_exe, 2, mean)

hist(mean_sim_exe, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", breaks = 20)
abline(v = mean(mean_sim_exe), lwd = "5", col = "blue")

Sample mean vs Theoretical Mean

We can calculate the simulated mean and then the theoretical mean to compare the two. We see that both the theoretical and simulated means are very close at almost 5.

sim_mean <- mean(mean_sim_exe)
sim_mean

## [1] 4.963828

theoretical_mean <- 1/lambda
theoretical_mean

## [1] 5

Sample Variance vs Theoretical Variance

Again we can see that our simulated and theoretical variances are very close, at almost 0.6 for each.

sim_variance <- (sd(mean_sim_exe))^2
sim_variance

## [1] 0.5919712

theoretical_variance <- (1/lambda)^2 / n
theoretical_variance

## [1] 0.625

Distribution

Next we are looking at the distribution, which falls into a relatively normal distribution. We can see the min and max at about 3 and 8.

min(mean_sim_exe)

## [1] 2.911542

max(mean_sim_exe)

## [1] 8.392128

hist(mean_sim_exe, prob = TRUE, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", xlim = c(floor(min(mean_sim_exe)), ceiling(max(mean_sim_exe))))

hist(mean_sim_exe, prob = TRUE, xlab = "Average Exponentials", main = "Distribution of the Average Exponentials", breaks = 40)
lines(density(mean_sim_exe), col = "blue")
abline(v = mean(mean_sim_exe), lwd = "5", col = "blue")

qqnorm(mean_sim_exe)
qqline(mean_sim_exe, col = 4)

The bulk of the middle of the graph shows how our numbers align and follow a normal distribution. This is approximately normal with some variation out at the extremes of the values.

Statistical Inference Course Project

Tom

September 17, 2017

Overview

Part 1 – simulation Exercise

Sample mean vs Theoretical Mean

Sample Variance vs Theoretical Variance

Distribution