Overview

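This document analyzes the ToothGrowth data in the R datasets package. The data record the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three doses of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC). After a brief summary and exploratory analysis of the data, tooth length is compared by supplement type (supp) and dose.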

Load data into R & Summary

library(datasets)
data(ToothGrowth)

Here is the data:

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
tail(ToothGrowth)
##     len supp dose
## 55 24.8   OJ    2
## 56 30.9   OJ    2
## 57 26.4   OJ    2
## 58 27.3   OJ    2
## 59 29.4   OJ    2
## 60 23.0   OJ    2
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
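The summary above pools all six supp/dose groups. A per-group table of mean length makes the comparison in the plots below concrete. This is a small sketch using base R's `tapply()`; the name `group_mean` is illustrative and was not part of the original analysis.

```r
library(datasets)

# Mean tooth length for each supp (rows) x dose (columns) group;
# tapply() turns dose into a factor, so the columns are "0.5", "1", "2".
group_mean <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
round(group_mean, 2)
```

OJ's mean exceeds VC's at doses 0.5 and 1. At dose 2 the two means are within 0.1 of each other (26.06 for OJ, 26.14 for VC).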

A jitter plot of the data (points are spread horizontally to reduce overplotting):

library(ggplot2)
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + 
        geom_jitter(width = 0.3, height = 0) + 
        facet_grid(. ~ supp) + 
        labs(title = "ToothGrowth by OJ and VC", x = "Dose", y = "Length")

A boxplot of the data:

ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + 
        geom_boxplot() + 
        facet_grid(. ~ supp) + 
        labs(title = "ToothGrowth by OJ and VC", x = "Dose", y = "Length")

OJ appears to produce longer teeth than VC at doses 0.5 and 1. At dose 2 the two supplements look similar, although VC lengths are more widely spread.
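The spread seen in the plots can be checked numerically with the standard deviation of each group. The sketch below uses base R `tapply()`; `group_sd` is an illustrative name.

```r
library(datasets)

# Standard deviation of tooth length for each supp (rows) x dose (columns) group.
group_sd <- with(ToothGrowth, tapply(len, list(supp, dose), sd))
round(group_sd, 2)
```

At dose 2, VC's standard deviation (about 4.80) is clearly larger than OJ's (about 2.66).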

Confidence Intervals of Tooth Growth by Supp and Dose

Test of the whole dataset by supp:

t.test(ToothGrowth$len ~ ToothGrowth$supp)$conf.int
## [1] -0.1710156  7.5710156
## attr(,"conf.level")
## [1] 0.95

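This is a 95% Welch confidence interval for the difference in mean length, OJ minus VC. Welch is `t.test()`'s default, which does not assume equal variances. Because the interval contains 0, we cannot conclude at the 5% level that the two supplements differ when all doses are pooled.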
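To show where the interval comes from, the sketch below computes the Welch interval by hand. It takes the mean difference, adds and subtracts a t quantile times the unpooled standard error, and uses the Welch-Satterthwaite degrees of freedom. It then compares the result with `t.test()`. The helper `welch_ci()` is hypothetical and written only for this illustration.

```r
library(datasets)

# Hypothetical helper: Welch two-sample CI for mean(x) - mean(y).
welch_ci <- function(x, y, conf.level = 0.95) {
  nx <- length(x); ny <- length(y)
  vx <- var(x) / nx; vy <- var(y) / ny      # squared standard errors
  se <- sqrt(vx + vy)                       # unpooled standard error
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
  tcrit <- qt(1 - (1 - conf.level) / 2, df)
  (mean(x) - mean(y)) + c(-1, 1) * tcrit * se
}

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
manual  <- welch_ci(oj, vc)
# as.numeric() drops the conf.level attribute so the two vectors compare cleanly
builtin <- as.numeric(t.test(len ~ supp, data = ToothGrowth)$conf.int)
all.equal(manual, builtin)
```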

Tests by individual dose:

dose.5 <- subset(ToothGrowth, dose == 0.5)
dose1 <- subset(ToothGrowth, dose == 1.0)
dose2 <- subset(ToothGrowth, dose == 2.0)
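The three per-dose subsets and their tests can also be produced in one pass with `split()` and `sapply()`. This sketch is an alternative layout, not the original code, and `ci_by_dose` is an illustrative name.

```r
library(datasets)

# Split by dose and run the same OJ-vs-VC Welch test within each level.
# Each column holds the 95% bounds for mean(OJ) - mean(VC) at one dose.
by_dose <- split(ToothGrowth, ToothGrowth$dose)
ci_by_dose <- sapply(by_dose, function(d) t.test(len ~ supp, data = d)$conf.int)
rownames(ci_by_dose) <- c("lower", "upper")
round(ci_by_dose, 3)
```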

Dose 0.5:

t.test(dose.5$len ~ dose.5$supp)$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95

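This interval lies entirely above 0, so at the 5% level we reject the hypothesis that the two population means are equal. At dose 0.5, OJ is estimated to produce teeth roughly 1.7 to 8.8 units longer than VC on average.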

Dose 1.0:

t.test(dose1$len ~ dose1$supp)$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95

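This interval also lies entirely above 0, so at the 5% level we reject equality of the means. At dose 1.0, OJ is estimated to produce teeth roughly 2.8 to 9.1 units longer than VC on average.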

Dose 2.0:

t.test(dose2$len ~ dose2$supp)$conf.int
## [1] -3.79807  3.63807
## attr(,"conf.level")
## [1] 0.95

This interval contains 0 and is nearly centered on it, so there is no evidence of a difference between OJ and VC at dose 2.0.
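The same conclusions can be read from p-values. For a two-sided Welch test, the 95% interval excludes 0 exactly when the matching p-value is below 0.05. The sketch below collects the per-dose p-values; `p_by_dose` is an illustrative name.

```r
library(datasets)

# Two-sided Welch p-values for OJ vs VC at each dose level.
p_by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose),
                    function(d) t.test(len ~ supp, data = d)$p.value)
round(p_by_dose, 4)
p_by_dose < 0.05   # significant at doses 0.5 and 1, not at dose 2
```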

Conclusion

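At doses 0.5 and 1.0, the 95% confidence intervals for the difference in mean tooth length (OJ minus VC) lie entirely above 0, so OJ produced longer teeth than VC at these doses. At dose 2.0 the interval contains 0, so there is no statistically significant difference between the two supplements. OJ therefore appears to be the more effective delivery method at the lower doses, while at the highest dose the two perform similarly. These conclusions rest on the Welch t-test's assumptions: the guinea pigs are independent, and tooth length within each group is approximately normal.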