Tyler Byers
Statistical Inference, Coursera
Class Project, Problem #2
Aug 2014
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
library(ggplot2)
# Tooth length against dose, coloured by supplement type
ggplot(ToothGrowth, aes(x = dose, y = len)) +
    geom_point(aes(color = supp))
# Tooth length by supplement type, all doses pooled
ggplot(ToothGrowth, aes(x = supp, y = len)) +
    geom_boxplot(aes(fill = supp))
# Tooth length by dose level, supplements pooled
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
    geom_boxplot(aes(fill = factor(dose)))
# Tooth length by supplement type, one panel per dose level
ggplot(ToothGrowth, aes(x = supp, y = len)) +
    geom_boxplot(aes(fill = supp)) + facet_wrap(~ dose)
Some simple exploratory data analysis (EDA) suggests that dose affects tooth length: the higher the dose, the longer the teeth. Supplement type may also affect tooth length, with OJ giving longer teeth than VC on average. The plots alone cannot show whether that difference is statistically significant. Finally, supplement type appears to matter at the lower doses (0.5 and 1.0 mg), where OJ has the larger effect. At the highest dose (2.0 mg) the difference between supplements is minimal, if present at all.
summary(ToothGrowth)
## len supp dose
## Min. : 4.2 OJ:30 Min. :0.50
## 1st Qu.:13.1 VC:30 1st Qu.:0.50
## Median :19.2 Median :1.00
## Mean :18.8 Mean :1.17
## 3rd Qu.:25.3 3rd Qu.:2.00
## Max. :33.9 Max. :2.00
# Summarize tooth length within each supplement/dose combination: first the number of observations (length), then the summary statistics (min, quartiles, median, mean, max).
by(ToothGrowth$len, INDICES = list(ToothGrowth$supp, ToothGrowth$dose), length)
## : OJ
## : 0.5
## [1] 10
## --------------------------------------------------------
## : VC
## : 0.5
## [1] 10
## --------------------------------------------------------
## : OJ
## : 1
## [1] 10
## --------------------------------------------------------
## : VC
## : 1
## [1] 10
## --------------------------------------------------------
## : OJ
## : 2
## [1] 10
## --------------------------------------------------------
## : VC
## : 2
## [1] 10
by(ToothGrowth$len, INDICES = list(ToothGrowth$supp, ToothGrowth$dose), summary)
## : OJ
## : 0.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.2 9.7 12.2 13.2 16.2 21.5
## --------------------------------------------------------
## : VC
## : 0.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 5.95 7.15 7.98 10.90 11.50
## --------------------------------------------------------
## : OJ
## : 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.5 20.3 23.5 22.7 25.6 27.3
## --------------------------------------------------------
## : VC
## : 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.6 15.3 16.5 16.8 17.3 22.5
## --------------------------------------------------------
## : OJ
## : 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.4 24.6 26.0 26.1 27.1 30.9
## --------------------------------------------------------
## : VC
## : 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.5 23.4 26.0 26.1 28.8 33.9
There are 10 samples for each dose/supplement combination (3 dose levels × 2 supplement types = 6 groups), giving 60 samples in total.
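The `by()` calls above print one block per group. Base R's `table()` and `tapply()` offer a more compact view, showing the same counts and cell means as 2 × 3 grids (supplement by dose). This is a sketch added for illustration and is not part of the original analysis. The object names `counts` and `cell_means` are hypothetical.

```r
data(ToothGrowth)
# Cross-tabulate the design: rows are supplement types, columns are dose levels
counts <- with(ToothGrowth, table(supp, dose))
counts
# Mean tooth length for each supplement/dose cell, as a 2 x 3 matrix
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
round(cell_means, 2)
```

Every cell of `counts` should be 10. `cell_means` should match the `Mean` column of the `by()` summaries.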
First, test for a difference by supplement type alone, pooling all dose levels.
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.171 7.571
## sample estimates:
## mean in group OJ mean in group VC
## 20.66 16.96
The 95% confidence interval for mean(OJ) - mean(VC), [-0.171, 7.571], contains 0, and the p-value (0.061) is above 0.05. At the 5% level we therefore cannot reject the null hypothesis that mean tooth length is the same for the two supplement types.
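The sketch below makes the Welch test concrete by recomputing its statistic, degrees of freedom, p-value and interval by hand. It uses only the two groups' sample means, variances and sizes, without pooling the variances. It should agree with `t.test()` up to floating-point error. The variable names are illustrative and do not come from the original analysis.

```r
data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean (variances are not pooled)
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic for mean(OJ) - mean(VC)
t_stat <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci_95 <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se_diff

round(c(t = t_stat, df = df_welch, p = p_value), 4)
round(ci_95, 3)
```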
For the next tests we ignore the supplement type and ask whether tooth length differs between dose levels. We create three separate data frames, one for each pair of doses: 0.5 vs 1.0, 0.5 vs 2.0, and 1.0 vs 2.0.
Tooth.dose12 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
Tooth.dose13 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
Tooth.dose23 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
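Instead of building one data frame per pair by hand, you can generate every pair of dose levels with `combn()` and run the same Welch test in a loop. This is a sketch of an alternative layout, not the original author's code. The names `dose_pairs` and `dose_tests` are hypothetical.

```r
data(ToothGrowth)
# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Welch t-test of tooth length by dose for each pair, supplements pooled
dose_tests <- lapply(dose_pairs, function(pair) {
  pair_data <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  t.test(len ~ dose, data = pair_data)
})
names(dose_tests) <- sapply(dose_pairs, paste, collapse = " vs ")

# One row per comparison: t statistic, degrees of freedom, p-value
t(sapply(dose_tests, function(tt) {
  c(t = unname(tt$statistic), df = unname(tt$parameter), p = tt$p.value)
}))
```

The loop scales to any number of dose levels without adding new data frames by hand.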
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = Tooth.dose12)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.477, df = 37.99, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.984 -6.276
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.61 19.73
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = Tooth.dose13)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.8, df = 36.88, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.16 -12.83
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.61 26.10
t.test(len ~ dose, paired = FALSE, var.equal = FALSE, data = Tooth.dose23)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.901, df = 37.1, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996 -3.734
## sample estimates:
## mean in group 1 mean in group 2
## 19.73 26.10
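Base R also has a built-in for these comparisons. `pairwise.t.test()` with `pool.sd = FALSE` runs an unpooled (Welch) t-test for every pair of dose levels. Because three comparisons are made, adjusting the p-values is a reasonable extra check. Holm's method is used here as an added choice and is not part of the original analysis.

```r
data(ToothGrowth)
# Welch t-tests for every pair of dose levels, with Holm-adjusted p-values
pw_holm <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                           pool.sd = FALSE, p.adjust.method = "holm")
pw_holm$p.value
# Same comparisons without adjustment, to compare with the three t-tests above
pw_raw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                          pool.sd = FALSE, p.adjust.method = "none")
pw_raw$p.value
```

The unadjusted matrix should reproduce the three Welch p-values above. The Holm-adjusted values can only be equal or larger.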
Finally, we test whether, within each dose level, tooth growth differs significantly between the two supplement types. For example, at a dose of 0.5 mg, does tooth length differ between VC and OJ?
Tooth.dose05 <- subset(ToothGrowth, dose == 0.5)
Tooth.dose10 <- subset(ToothGrowth, dose == 1.0)
Tooth.dose20 <- subset(ToothGrowth, dose == 2.0)
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = Tooth.dose05)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.17, df = 14.97, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719 8.781
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = Tooth.dose10)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.033, df = 15.36, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802 9.058
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, data = Tooth.dose20)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.798 3.638
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
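The three per-dose tests can also be run in one pass and collected into a single table. `split()` creates one data frame per dose level, and `lapply()` runs the same Welch test on each. This is a sketch of an alternative layout, not the original code. The names `by_dose`, `supp_tests` and `supp_summary` are hypothetical.

```r
data(ToothGrowth)
# One data frame per dose level, named "0.5", "1" and "2"
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Welch t-test of OJ vs VC within each dose level
supp_tests <- lapply(by_dose, function(d) t.test(len ~ supp, data = d))

# One row per dose: mean(OJ) - mean(VC), 95% CI bounds, and p-value
supp_summary <- t(sapply(supp_tests, function(tt) {
  c(diff = unname(tt$estimate[1] - tt$estimate[2]),
    lower = tt$conf.int[1],
    upper = tt$conf.int[2],
    p.value = tt$p.value)
}))
round(supp_summary, 4)
```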
(Unequal variances were assumed, i.e. var.equal = FALSE, for all the t-tests.)
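Every t-test in this analysis uses the Welch form (`var.equal = FALSE`), so it can help to see how much the spread differs between groups. A quick, informal check is the sample standard deviation of tooth length in each supplement/dose cell. This is an addition and is not part of the original analysis. The name `group_sd` is hypothetical.

```r
data(ToothGrowth)
# Sample standard deviation of tooth length in each supplement/dose group
group_sd <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)
group_sd
# Ratio of the largest to the smallest group standard deviation
max(group_sd$len) / min(group_sd$len)
```

A noticeably large ratio between the largest and smallest SD is one reason to prefer the unpooled (Welch) test over the equal-variance version.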