Overview

Part One of this report investigates the exponential distribution in R and compares the distribution of simulated sample means with what the Central Limit Theorem predicts. Part Two analyzes the ToothGrowth data, comparing tooth growth by supplement type (supp) and dose.

Simulations

The exponential distribution can be simulated in R with rexp(n, lambda), where lambda is the rate parameter; both the mean and the standard deviation of the distribution are 1/lambda. We set lambda = 0.2 for all simulations. The code below runs 1000 simulations, each drawing 40 exponentials.

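set.seed(1)  # make the simulation reproducible

lambda <- 0.2   # rate parameter
n <- 40         # exponentials per simulation
numsim <- 1000  # number of simulations

# numsim x n matrix: each row holds one simulation of n exponentials
data <- matrix(rexp(n*numsim, lambda), numsim)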
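As a quick sanity check (an addition, not part of the original analysis), the sketch below confirms two things the simulation relies on: that a large draw from rexp() with rate lambda has mean and standard deviation close to 1/lambda, and that matrix(..., numsim) lays the draws out so each of the numsim rows holds one simulation of n values. The sample size of 100000 and the names bigSample and sims are illustrative choices.

```r
set.seed(1)
lambda <- 0.2
n <- 40
numsim <- 1000

# One large sample: both its mean and its standard deviation
# should be close to 1/lambda = 5
bigSample <- rexp(100000, lambda)
mean(bigSample)
sd(bigSample)

# matrix() fills column by column; with nrow = numsim the result has
# numsim rows and n columns, so each row is one simulation of n draws
sims <- matrix(rexp(n * numsim, lambda), numsim)
dim(sims)  # 1000 rows, 40 columns

# rowMeans() gives the same per-simulation means as apply(sims, 1, mean)
all.equal(rowMeans(sims), apply(sims, 1, mean))
```

Because the draws are independent and identically distributed, the column-wise fill order does not matter: any row is a valid sample of 40 exponentials.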

Next, we calculate the theoretical and actual (simulated) mean, standard deviation, and variance of the 1000 sample means.

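theoMean <- 1/lambda               # theoretical mean of the sample means
rowMean <- apply(data, 1, mean)    # mean of each simulation (row)
actlMean <- mean(rowMean)          # actual (simulated) mean
theoSTDEV <- (1/lambda) / sqrt(n)  # theoretical standard deviation of the mean
actSTDEV <- sd(rowMean)            # actual standard deviation
theoVariance <- theoSTDEV^2        # theoretical variance of the mean
actVariance <- var(rowMean)        # actual variance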
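The theoretical values used above follow from standard properties of the exponential distribution and of the mean of n independent draws. With lambda = 0.2 and n = 40 they work out as:

```latex
\begin{aligned}
E[X] &= \frac{1}{\lambda} = \frac{1}{0.2} = 5, \qquad \operatorname{Var}(X) = \frac{1}{\lambda^2} = 25,\\
E[\bar{X}] &= E[X] = 5,\\
\operatorname{Var}(\bar{X}) &= \frac{\operatorname{Var}(X)}{n} = \frac{1}{n\lambda^2} = \frac{25}{40} = 0.625,\\
\operatorname{SD}(\bar{X}) &= \frac{1}{\lambda\sqrt{n}} = \frac{5}{\sqrt{40}} \approx 0.7905694.
\end{aligned}
```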

Analysis

The graph shows a density histogram of the 1000 simulated means. The yellow dashed line marks the theoretical mean and the black line the actual mean. The red curve is the normal density with the theoretical mean and standard deviation, and the blue curve is the normal density with the actual mean and standard deviation. The table compares the theoretical and actual values of the mean, standard deviation, and variance.

library(ggplot2)

dfRowMeans <- data.frame(rowMean) # convert to data.frame for ggplot
mp <- ggplot(dfRowMeans, aes(x = rowMean))
# histogram of the simulated means, scaled to density so the normal curves can overlay it
mp <- mp + geom_histogram(binwidth = lambda, fill = "orange", color = "black", aes(y = after_stat(density)))
mp <- mp + labs(title = "Distribution of Means of 40 Exponentials", x = "Mean of 40 Selections", y = "Density")
# actual mean (solid black) and normal density from the actual mean and SD (blue)
mp <- mp + geom_vline(xintercept = actlMean, linewidth = 1.0, color = "black")
mp <- mp + stat_function(fun = dnorm, args = list(mean = actlMean, sd = actSTDEV), color = "blue", linewidth = 1.0)
# theoretical mean (dashed yellow) and normal density from the theoretical mean and SD (red)
mp <- mp + geom_vline(xintercept = theoMean, linewidth = 1.0, color = "yellow", linetype = "longdash")
mp <- mp + stat_function(fun = dnorm, args = list(mean = theoMean, sd = theoSTDEV), color = "red", linewidth = 1.0)
mp <- mp + theme_bw() + theme(plot.title = element_text(hjust = 0.5))
mp

Variable              Theoretical Value    Actual Value
Mean                  5                    4.9900252
Standard Deviation    0.7905694            0.7859435
Variance              0.625                0.6177072
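The report does not show how the comparison table was produced. One way to assemble it from the quantities computed earlier is sketched below; the data-frame name comparison and its column names are illustrative, and the simulation is repeated so the block runs on its own.

```r
set.seed(1)
lambda <- 0.2
n <- 40
numsim <- 1000
data <- matrix(rexp(n * numsim, lambda), numsim)
rowMean <- apply(data, 1, mean)

# Theoretical values for the mean of n exponentials
theoMean <- 1 / lambda
theoSTDEV <- (1 / lambda) / sqrt(n)
theoVariance <- theoSTDEV^2

# Collect theoretical and simulated statistics side by side
comparison <- data.frame(
  Variable = c("Mean", "Standard Deviation", "Variance"),
  Theoretical = c(theoMean, theoSTDEV, theoVariance),
  Actual = c(mean(rowMean), sd(rowMean), var(rowMean))
)
print(comparison)
```

In an R Markdown document, passing comparison to knitr::kable() would render it as a formatted table, assuming the knitr package is installed.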

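As the graph shows, the distribution of the 1000 simulated means is approximately normal, as the Central Limit Theorem predicts: it is centered close to the theoretical mean of 5, and its spread closely matches the theoretical standard deviation of about 0.79.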
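One further, informal check of approximate normality (an addition, not part of the original analysis): if the sample means are roughly normal with the theoretical mean and standard deviation, about 95% of them should fall within qnorm(0.975), roughly 1.96, theoretical standard deviations of 5. The sketch below repeats the simulation so it runs on its own; the names lower, upper, and coverage are illustrative.

```r
set.seed(1)
lambda <- 0.2
n <- 40
numsim <- 1000
rowMean <- rowMeans(matrix(rexp(n * numsim, lambda), numsim))

theoMean <- 1 / lambda
theoSTDEV <- (1 / lambda) / sqrt(n)

# Central 95% interval of a normal with the theoretical mean and SD
lower <- theoMean - qnorm(0.975) * theoSTDEV
upper <- theoMean + qnorm(0.975) * theoSTDEV

# Fraction of simulated means that land inside that interval
coverage <- mean(rowMean >= lower & rowMean <= upper)
coverage
```

A value close to 0.95 is consistent with the normal approximation. Small departures are expected: the exponential is right-skewed (skewness 2), and for means of 40 draws the skewness shrinks to 2/sqrt(40), about 0.32, rather than vanishing. A normal Q-Q plot via qqnorm() and qqline() would be a graphical alternative.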