Part 1: Simulation Exercise

Overview

This project consists of two parts. In the first part, I use averages of samples drawn from the exponential distribution to illustrate core properties of the Central Limit Theorem (CLT) and show that the distribution of the sample means is close to what the CLT predicts. In the second part, I use the ToothGrowth dataset to test the null hypotheses that supplement and dose do not affect tooth length.

Simulations

I first created a histogram of 1,000 random draws from an exponential distribution with a rate (lambda) of 0.2.

set.seed(1234)
lambda <- 0.2
hist(rexp(1000, lambda), xlab = "x", main = "Histogram of a random exponential distribution")

As expected, the distribution is far from normal, with a mode against the y-axis and an extreme right skew. However, if we plot 1,000 averages, each the average of 40 random draws from the same exponential distribution, we get a very different distribution.

set.seed(2345)
n <- 40
# Each row holds one sample of n draws; 1,000 rows give 1,000 samples
expmatrix <- matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n)
meanexp <- rowMeans(expmatrix)
hist(meanexp, xlab = "x", main = "Histogram of the means of 40 random exponential distributions")

The distribution of means is very close to a normal distribution even though the underlying exponential distribution is far from normal.

Sample Mean vs Theoretical Mean

The theoretical mean of an exponential distribution with a lambda of 0.2 is 1/lambda = 5. The mean of the 1,000 simulated averages is 4.9630234, which is very close to the theoretical mean, as shown in the figure below.
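
The comparison above can be reproduced with a short sketch. It assumes the same seed, lambda, and sample size as the simulation code; the names theory_mean and sim_mean are introduced here for illustration only and are not part of the original analysis.

```r
# Re-create the simulated means (same seed and parameters as the simulation above)
set.seed(2345)
lambda <- 0.2
n <- 40
meanexp <- rowMeans(matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n))

# The mean of an exponential distribution with rate lambda is 1/lambda
theory_mean <- 1 / lambda   # illustrative name

# Mean of the 1,000 simulated sample averages
sim_mean <- mean(meanexp)   # illustrative name

# Show both values and their difference side by side
c(theoretical = theory_mean,
  simulated   = sim_mean,
  difference  = sim_mean - theory_mean)
```

The difference should be small relative to the standard deviation of the sample mean, which is what the vertical lines in the figure illustrate.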

hist(meanexp, xlab = "Mean of random exponential distributions", main = "Means of 40 random exponential distributions")
abline(v = 1/lambda, col = "black", lwd = 2)
abline(v = mean(meanexp), col = "blue", lty = 2, lwd = 1.5)

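# Theoretical and simulated standard deviation of the sample mean
sdtheory <- (1/lambda) / sqrt(n)
sdsim <- sd(meanexp)
# Theoretical mean +/- 1 standard deviation (black, dashed)
abline(v = 1/lambda - sdtheory, col = "black", lty = 2, lwd = 1.5)
abline(v = 1/lambda + sdtheory, col = "black", lty = 2, lwd = 1.5)
# Simulated mean +/- 1 standard deviation (blue, dotted)
abline(v = mean(meanexp) - sdsim, col = "blue", lty = 3, lwd = 1.5)
abline(v = mean(meanexp) + sdsim, col = "blue", lty = 3, lwd = 1.5)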
Histogram of the means of 40 random exponential draws with the theoretical mean +/- 1 standard deviation (black) and the simulated mean +/- 1 standard deviation (blue). The simulated standard deviation is noticeably smaller than the theoretical standard deviation.

Sample Variance vs Theoretical Variance

The theoretical variance of the sample mean is (1/lambda)^2 / n = 25/40 = 0.625. The simulated variance is 0.548283, which is 0.076717 smaller than the theoretical variance. This gap is reflected in the histogram above, where the simulated standard deviation is noticeably smaller than the theoretical standard deviation.
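
As a minimal sketch of where these two numbers come from, the following recomputes both variances. It assumes the same seed and parameters as the simulation above; var_theory and var_sim are illustrative names not used elsewhere in this report.

```r
# Re-create the simulated means (same seed and parameters as the simulation above)
set.seed(2345)
lambda <- 0.2
n <- 40
meanexp <- rowMeans(matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n))

# An exponential distribution has standard deviation 1/lambda, so the
# variance of the mean of n independent draws is (1/lambda)^2 / n
var_theory <- (1 / lambda)^2 / n   # 25 / 40 = 0.625

# Variance of the 1,000 simulated sample averages
var_sim <- var(meanexp)

# Show both values and the gap between them
c(theoretical = var_theory,
  simulated   = var_sim,
  difference  = var_theory - var_sim)
```

Taking square roots of the two variances gives the standard deviations drawn as vertical lines in the earlier histogram.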

Distribution

The distribution of means appears to be nearly normal, with a mean and variance close to the values predicted by the CLT. Further evidence can be found by overlaying two normal density curves on the histogram: one using the simulated mean and standard deviation, and one using the theoretical mean and standard deviation.

hist(meanexp, xlab = "Mean of random exponential distributions", main = "Means of 40 random exponential distributions")

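# Draw the histogram and keep its bin information
exphist <- hist(meanexp, xlab = "Mean of random exponential distributions", main = "Means of 40 random exponential distributions")
# Scale factor from density to counts: number of observations times bin width
weight <- length(meanexp) * diff(exphist$breaks)[1]
x <- seq(min(meanexp), max(meanexp), length.out = 100)
# Normal curve using the simulated mean and standard deviation (blue)
bell <- dnorm(x = x, mean = mean(meanexp), sd = sd(meanexp))
lines(x, bell * weight, col = "blue", lwd = 2)
# Normal curve using the theoretical mean and standard deviation (red)
normal <- dnorm(x = x, mean = 1/lambda, sd = (1/lambda)/sqrt(n))
lines(x, normal * weight, col = "red", lwd = 2)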
Histogram of the simulated means with a normal density curve based on the simulated mean and standard deviation (blue) and the normal density curve predicted by the CLT (red), for an exponential distribution with lambda = 0.2.

The normal curve based on the simulated means closely matches the normal curve predicted by the CLT for lambda = 0.2. It is slightly narrower, consistent with the smaller simulated variance reported above.
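
One further check, not part of the original analysis, is a normal Q-Q plot of the standardized means. The sketch below assumes the same seed and parameters as the simulation above; z is an illustrative name. It standardizes each mean using the CLT-predicted mean and standard deviation, so points lying near the line y = x indicate agreement with the standard normal distribution.

```r
# Re-create the simulated means (same seed and parameters as the simulation above)
set.seed(2345)
lambda <- 0.2
n <- 40
meanexp <- rowMeans(matrix(rexp(n * 1000, lambda), nrow = 1000, ncol = n))

# Standardize with the CLT-predicted mean (1/lambda) and
# standard deviation ((1/lambda) / sqrt(n)) of the sample mean
z <- (meanexp - 1 / lambda) / ((1 / lambda) / sqrt(n))

# Compare the quantiles of z with standard normal quantiles
qqnorm(z, main = "Normal Q-Q plot of standardized means")
abline(0, 1, col = "red", lwd = 2)   # reference line y = x
```

Any right skew remaining from the exponential distribution would appear as points bending above the line in the upper tail.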