Tooth Growth

Tyler Byers

Statistical Inference, Coursera

Class Project, Problem #2

Aug 2014

Load Data

data(ToothGrowth)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Perform Basic EDA

library(ggplot2)
ggplot(aes(x=dose, y = len), data = ToothGrowth) + 
    geom_point(aes(color = supp)) 

ggplot(aes(x = supp, y = len), data = ToothGrowth) + 
    geom_boxplot(aes(fill = supp))

ggplot(aes(x = factor(dose), y = len), data = ToothGrowth) + 
    geom_boxplot(aes(fill = factor(dose)))

ggplot(aes(x = supp, y = len), data = ToothGrowth) +
    geom_boxplot(aes(fill = supp)) + facet_wrap(~ dose)

Based on this simple EDA, dosage appears to affect tooth length: the higher the dose, the longer the teeth. Supplement type may also matter, with OJ lengths tending to be higher than VC, but the plots alone cannot show whether that difference is statistically significant. Finally, supplement type appears to matter mainly at the lower dosages (0.5 and 1.0), where OJ shows the larger effect; at the highest dosage (2.0) the difference appears minimal, if present at all.
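The visual impressions above can be put into numbers with a table of cell means. This is only a sketch using base R's tapply(); the names cell_means and gap are chosen here for illustration and are not part of the original analysis.

```r
data(ToothGrowth)

# Mean tooth length for each supplement (rows) and dose (columns).
# `cell_means` and `gap` are illustrative names, not from the original analysis.
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
print(round(cell_means, 2))

# OJ-minus-VC difference in mean length at each dose level.
gap <- cell_means["OJ", ] - cell_means["VC", ]
print(round(gap, 2))
```

The gap should be clearly positive at the two lower doses and close to zero at dose 2.0, which is the pattern the faceted boxplot suggests.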

Basic data summary

summary(ToothGrowth)
##       len       supp         dose     
##  Min.   : 4.2   OJ:30   Min.   :0.50  
##  1st Qu.:13.1   VC:30   1st Qu.:0.50  
##  Median :19.2           Median :1.00  
##  Mean   :18.8           Mean   :1.17  
##  3rd Qu.:25.3           3rd Qu.:2.00  
##  Max.   :33.9           Max.   :2.00
# Summarize after separating by supplement and dose. The tables show the sample count and the five-number summary (plus mean) for every supp/dose combination.
by(ToothGrowth$len, INDICES = list(ToothGrowth$supp, ToothGrowth$dose), length)
## : OJ
## : 0.5
## [1] 10
## -------------------------------------------------------- 
## : VC
## : 0.5
## [1] 10
## -------------------------------------------------------- 
## : OJ
## : 1
## [1] 10
## -------------------------------------------------------- 
## : VC
## : 1
## [1] 10
## -------------------------------------------------------- 
## : OJ
## : 2
## [1] 10
## -------------------------------------------------------- 
## : VC
## : 2
## [1] 10
by(ToothGrowth$len, INDICES = list(ToothGrowth$supp, ToothGrowth$dose), summary)
## : OJ
## : 0.5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     8.2     9.7    12.2    13.2    16.2    21.5 
## -------------------------------------------------------- 
## : VC
## : 0.5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20    5.95    7.15    7.98   10.90   11.50 
## -------------------------------------------------------- 
## : OJ
## : 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    14.5    20.3    23.5    22.7    25.6    27.3 
## -------------------------------------------------------- 
## : VC
## : 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    13.6    15.3    16.5    16.8    17.3    22.5 
## -------------------------------------------------------- 
## : OJ
## : 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    22.4    24.6    26.0    26.1    27.1    30.9 
## -------------------------------------------------------- 
## : VC
## : 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    18.5    23.4    26.0    26.1    28.8    33.9

There are 10 observations for each dose/supplement combination (3 dose levels × 2 supplement types = 6 groups), for a total of 60 observations.
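The same counts can be checked more compactly with a contingency table. A minimal sketch, assuming only base R; the name design is illustrative.

```r
data(ToothGrowth)

# Cross-tabulate supplement type against dose to confirm the design is balanced.
# `design` is an illustrative name, not part of the original analysis.
design <- with(ToothGrowth, table(supp, dose))
print(design)

# TRUE when every supp/dose cell holds the same number of observations.
is_balanced <- length(unique(as.vector(design))) == 1
print(is_balanced)
```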

Confidence Intervals and Hypothesis Testing

Test by Supplement

Test by supplement type only, ignoring dosage.

t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.171  7.571
## sample estimates:
## mean in group OJ mean in group VC 
##            20.66            16.96

The 95% confidence interval for mean(OJ) - mean(VC) is [-0.171, 7.571]. Because this interval contains zero (p-value = 0.061), we cannot reject, at the 5% level, the null hypothesis that mean tooth length is the same for the two supplement types.
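Rather than reading the interval off the printed output, it can be pulled from the object that t.test() returns. The sketch below assumes the same Welch test as above; supp_test and contains_zero are illustrative names.

```r
data(ToothGrowth)

# Same Welch two-sample test as above, stored so its pieces can be inspected.
# `supp_test` and `contains_zero` are illustrative names.
supp_test <- t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)

ci <- supp_test$conf.int          # lower and upper 95% bounds for mean(OJ) - mean(VC)
contains_zero <- ci[1] < 0 && ci[2] > 0

print(round(ci, 3))
print(supp_test$p.value)
print(contains_zero)              # TRUE means we fail to reject the null at the 5% level
```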

Test by Dosage

For these tests, we ignore the type of supplement and check whether tooth length differs between dosage levels. We create three separate data frames to compare 0.5 vs 1.0, 0.5 vs 2.0, and 1.0 vs 2.0.

Tooth.dose12 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
Tooth.dose13 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
Tooth.dose23 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, var.equal = FALSE, data = Tooth.dose12)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.477, df = 37.99, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.984  -6.276
## sample estimates:
## mean in group 0.5   mean in group 1 
##             10.61             19.73
t.test(len ~ dose, var.equal = FALSE, data = Tooth.dose13)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.8, df = 36.88, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.16 -12.83
## sample estimates:
## mean in group 0.5   mean in group 2 
##             10.61             26.10
t.test(len ~ dose, var.equal = FALSE, data = Tooth.dose23)
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.901, df = 37.1, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996 -3.734
## sample estimates:
## mean in group 1 mean in group 2 
##           19.73           26.10
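Because three pairwise comparisons are run on the same data, one possible refinement is to adjust the p-values for multiple testing. The sketch below is an addition that the original analysis does not perform: it uses pairwise.t.test() from the stats package with a Bonferroni correction and unpooled standard deviations, so each comparison stays a Welch test. The name dose_pairs is illustrative.

```r
data(ToothGrowth)

# All pairwise dose comparisons in one call, with Bonferroni-adjusted p-values.
# pool.sd = FALSE keeps each comparison a separate Welch-type t-test.
# `dose_pairs` is an illustrative name; this step is not in the original analysis.
dose_pairs <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                              p.adjust.method = "bonferroni",
                              pool.sd = FALSE)

# Lower-triangular matrix of adjusted p-values (rows and columns are dose levels).
print(dose_pairs$p.value)
```

If the adjusted p-values remain below 0.05, the conclusion that each increase in dose is associated with longer teeth does not depend on the choice to leave the individual tests unadjusted.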

Test by Supplement Across Dosage Levels

Finally, we test whether, at a given dosage level, there is a significant difference in tooth growth between the two supplement types (for example, at the 0.5 mg dose level, is there a significant difference in tooth growth between the VC and OJ supplement types?).

Tooth.dose05 <- subset(ToothGrowth, dose == 0.5)
Tooth.dose10 <- subset(ToothGrowth, dose == 1.0)
Tooth.dose20 <- subset(ToothGrowth, dose == 2.0)
t.test(len ~ supp, var.equal = FALSE, data = Tooth.dose05)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.17, df = 14.97, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719 8.781
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
t.test(len ~ supp, var.equal = FALSE, data = Tooth.dose10)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.033, df = 15.36, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802 9.058
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
t.test(len ~ supp, var.equal = FALSE, data = Tooth.dose20)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.798  3.638
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
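The three per-dose tests can also be collected into a single table, which makes the intervals easier to compare side by side. This is a sketch rather than the method used above: it splits the data by dose and runs the same Welch test on each piece. The names by_dose and welch_by_supp are illustrative.

```r
data(ToothGrowth)

# Run the Welch test of len ~ supp on one dose subset and keep the key numbers.
# `welch_by_supp` and `by_dose` are illustrative names.
welch_by_supp <- function(d) {
  r <- t.test(len ~ supp, var.equal = FALSE, data = d)
  c(lower = r$conf.int[1], upper = r$conf.int[2], p.value = r$p.value)
}

# One row per dose level: 95% CI bounds for mean(OJ) - mean(VC) and the p-value.
by_dose <- t(sapply(split(ToothGrowth, ToothGrowth$dose), welch_by_supp))
print(round(by_dose, 4))
```

The rows should reproduce the intervals printed above: the intervals at doses 0.5 and 1 lie entirely above zero, while the interval at dose 2 straddles zero.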

Conclusions and Assumptions

Conclusions

Assumptions