knitr::opts_chunk$set(echo = TRUE)
Illustrate via simulation and associated explanatory text, the properties of the distribution of the mean of 40 exponentials
I’m familiar with R, but Knitr is quite impressive. I particularly like the way a lot of the formatting follows familiar formatting conventions, such as Atlassian’s Confluence
setwd("/home/johnm/Desktop")
lambda <- 0.2
nrow <- 1000
ncol <- 40
simulatedData <- matrix(rexp(1000*40, lambda), nrow, ncol)
Means <- apply(simulatedData, 1, mean)
hist(Means, breaks = 50, main = "1,000 averages of 40 random exponentials", xlab = "Means", ylab = "Frequency")
abline(v = 1/lambda, lty = 1, lwd = 5)
legend("topright", lty = 1, lwd = 5, legend = "Theoretical Mean")
SampleMean <- NULL
for(index in 1:nrow)
{
SampleMean <- c(SampleMean, mean(rexp(ncol, lambda)))
}
mean(SampleMean)
## [1] 5.016767
The figure above, and the calculation both show that this is close to the theoretical value for the mean of 5
distVar <- apply(simulatedData, 1, var)
hist(distVar, breaks = 50, main = "This distribution of 1,000 variance of 40 random exponentials", xlab = "Value of variances", ylab = "Frequency of variance", col = "lightblue")
abline(v = (1/lambda)^2, lty = 1, lwd = 2, col = "red")
legend("topright", lty = 1, lwd = 5, col = "blue", legend = "Theoretical variance")
The sample variances have a distribution that approaches a normal distributoin, and is nearly centered on the theoretical variance
The plot below shows the distribution of exponentials with lambda equal to 0.2
par(mfrow = c(3, 1))
hist(simulatedData, breaks = 50, main = "Distribution of exponentials with lambda equals to 0.2", xlab = "Exponentials", col = "orange")
The plot below shows the distribution of 1,000 averages of 40 random exponentials
hist(Means, breaks = 50, main = "The distribution of 1,000 averages of 40 random exponentials", xlab = "Value of means", ylab = "Frequency of means", col = "pink")
The plot below shows the normal distribution, with the mean and standard deviation equal to that in the second plot above.
Comparing the second figure with the third figure, it shows that the distribution of the means tends toward the normal distribution with the same mean and standard deviation.
simNorm <- rnorm(1000, mean = mean(Means), sd = sd(Means))
hist(simNorm, breaks = 50, main = "A normal distribution with theoretical mean and sd of the exponentials", xlab = "Normal variables", col = "lightgreen")
library(stats)
data("ToothGrowth")
library(ggplot2)
suppBox <- ggplot(data = ToothGrowth, aes(x = supp, y = len))
suppBox +geom_boxplot(aes(fill = supp)) + labs(title = "Boxplots of Tooth Length By Supplement", x = "Supplement", y = "Tooth Lenth")
suppDoseBox <- ggplot(data = ToothGrowth, aes(x = supp, y = len))
suppDoseBox + geom_boxplot(aes(fill = supp)) +
facet_wrap(~ dose) +
labs(title = "Tooth Length by Supplement and Dose",
x = "Supplement", y = "Tooth Length")
The box plots shows a strong, positive correlation between the dose and tooth growth; as the dose increases, the tooth growth also increases.
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
sd(ToothGrowth$len)
## [1] 7.649315
summary(ToothGrowth[ToothGrowth$supp == "OJ", ])
## len supp dose
## Min. : 8.20 OJ:30 Min. :0.500
## 1st Qu.:15.53 VC: 0 1st Qu.:0.500
## Median :22.70 Median :1.000
## Mean :20.66 Mean :1.167
## 3rd Qu.:25.73 3rd Qu.:2.000
## Max. :30.90 Max. :2.000
sd(ToothGrowth[ToothGrowth$supp == "OJ", ]$len)
## [1] 6.605561
summary(ToothGrowth[ToothGrowth$supp == "VC",])
## len supp dose
## Min. : 4.20 OJ: 0 Min. :0.500
## 1st Qu.:11.20 VC:30 1st Qu.:0.500
## Median :16.50 Median :1.000
## Mean :16.96 Mean :1.167
## 3rd Qu.:23.10 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
sd(ToothGrowth[ToothGrowth$supp == "VC",]$len)
## [1] 8.266029
t.test(x = ToothGrowth$len, data = ToothGrowth, paired = FALSE, conf.level = 0.95)$conf
## [1] 16.83731 20.78936
## attr(,"conf.level")
## [1] 0.95
The lower edge of the 95% confidence interval is 16.84 and the upper edge is 20.79 (truncated at three decimal places, and rounded to two decimal places)
Looking at the data for the means of ‘supp’ and ‘dosage’ shows the following:
summary(ToothGrowth[ToothGrowth$supp == "OJ", ]$len)[4]
## Mean
## 20.66333
summary(ToothGrowth[ToothGrowth$supp == "VC", ]$len)[4]
## Mean
## 16.96333
summary(ToothGrowth[ToothGrowth$dose == 0.5, ]$len)[4]
## Mean
## 10.605
summary(ToothGrowth[ToothGrowth$dose == 1.0, ]$len)[4]
## Mean
## 19.735
summary(ToothGrowth[ToothGrowth$dose == 2.0, ]$len)[4]
## Mean
## 26.1
On this basis, the null hypothesis can be rejected. As mentioned earlier, there is a strong correlation between the dosage and the growth
The null hypothesis can safely be rejected
However, the effect appears to relate to the dose only. The null hypthesis that there is no difference between OJ and VC cannot be rejected