Synopsis This is a project for the Coursera Statistical Inference Class. The project consists of two parts: 1. Simulation Exercise to explore inference 2. Basic inferential analysis using the ToothGrowth data in the R datasets package
Part 1 - Simulation Exercise
Overview
Investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be simulated in R with rexp(n, lambda) where lambda is the rate parameter. The mean of exponential distribution is 1/lambda and the standard deviation is also 1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the distribution of averages of 40 exponentials. Note that you will need to do a thousand simulations.
Prepare the Environment
Load Libraries and set Global Options.
#To suppress loading messages set *message = FALSE*.
#Set global options *echo = TRUE* so others will be able to read the code and set *results = hold* to hold & push output to end of chunk.
library(knitr)
opts_chunk$set(echo = TRUE, results = 'hold')
library(data.table)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.1
Set variables as defined in the problem.
n <- 40 # number of exponentials (sample size)
lambda <- 0.2 # lambda for rexp (limiting factor) (rate)
nosim <- 1000 # number of simulations
quantile <- 1.96 # 95th % quantile to be used in Confidence Interval
set.seed(234) # set the seed value for reproducibility
Create a matrix with 1000 simulations each with 40 samples drawn from the exponential distribution.
# Use rexp() and matrix() to generate 40 samples creating a matrix with 1000 rows and 40 columns.
simData <- matrix(rexp(n * nosim, rate = lambda), nosim)
Calculate the averages across the 40 values for each of the 1000 simulations.
simMeans <- rowMeans(simData) # Matrix Mean
Mean Comparison
Show the sample mean and compare it to the theoretical mean of the distribution.
Mean of Sample Means
Calculate the actual mean of sample data; the average sample mean of 1000 simulations of 40 randomly sampled exponential distributions.
sampleMean <- mean(simMeans) # Mean of sample means
sampleMean
## [1] 5.001573
Theoretical Mean
Calculate the theoretical mean; the expected mean of the exponential distribution of rate = 1/lambda.
theoMean <- 1 / lambda # Theoretical Mean
theoMean
## [1] 5
Variance Comparison
Show how variable the sample is and compare it to the theoretical variance of the distribution.
Sample Variance
Calculate the Actual Variance.
sampleVar <- var(simMeans)
sampleVar
## [1] 0.6631504
Theoretical Variance
Calculate the theoretical variance (expected variance).
theoVar <- (1 / lambda)^2 / (n)
theoVar
## [1] 0.625
Sample Standard of Deviation
Calculate the sample means standard of deviation.
sampleSD <- sd(simMeans)
sampleSD
## [1] 0.8143405
Theoretical Standard of Deviation
Calculate the theoretical standard of deviation.
theoSD <- 1/(lambda * sqrt(n))
theoSD
## [1] 0.7905694
RESULTS
Show that the distribution is approximately normal.
plotdata <- data.frame(simMeans)
m <- ggplot(plotdata, aes(x =simMeans))
m <- m + geom_histogram(aes(y=..density..), color="black",
fill = "lightblue")
m <- m + labs(title = "Distribution of averages of 40 Samples", x = "Mean of 40 Samples", y = "Density")
m <- m + geom_vline(aes(xintercept = sampleMean, color = "green"))
m <- m + geom_vline(aes(xintercept = theoMean, color = "violet"))
m <- m + stat_function(fun = dnorm, args = list(mean = sampleMean, sd = sampleSD), color = "blue", size = 1.0)
m <- m + stat_function(fun = dnorm, args = list(mean = theoMean, sd = theoSD), color = "red", size = 1.0)
m
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Confidence Interval Comparison
Check the confidence interval levels to see how they compare.
Sample CI
Calculate the sample confidence interval; sampleCI = mean of x plus or minus the .975th normal quantile times the standard error of the mean standard deviation of x divided by the square root of n (the length of the vector x).
sampleConfInterval <- round (mean(simMeans) + c(-1,1)*1.96*sd(simMeans)/sqrt(n),3)
sampleConfInterval
## [1] 4.749 5.254
Theoretical CI
Calculate the theoretical confidence interval; theoCI = theoMean of x plus or minus the .975th normal quantile times the standard error of the mean standard deviation of x divided by the square root of n (the length of the vector x).