Introduction to the ToothGrowth Dataset

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

A data frame with 60 observations on 3 variables.

Column  Name  Type     Description
[,1]    len   numeric  Tooth length
[,2]    supp  factor   Supplement type (VC or OJ)
[,3]    dose  numeric  Dose in milligrams/day

C. I. Bliss (1952). The Statistics of Bioassay. Academic Press.
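A quick way to confirm the balanced design described above (10 guinea pigs in each of the 6 supplement × dose groups) is to cross-tabulate the two factors. This is a minimal base-R sketch using the copy of ToothGrowth in R's datasets package; the object name `design` is illustrative only.

```r
# ToothGrowth ships with base R (package "datasets"), so no extra packages are needed
data("ToothGrowth")

# Count the animals in each supplement x dose cell
design <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
print(design)
```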

library(ggplot2)
library(dplyr)
require(graphics)
data("ToothGrowth")
raw = ToothGrowth

Basic Summary

str(raw)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(raw)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
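The overall summary pools all six groups. A per-group view of sample size, mean and standard deviation is a useful complement, and the group standard deviations bear on whether an equal-variance assumption would be reasonable for the t-tests later on. A sketch with base R's `aggregate()`; `group_stats` and `by_group` are hypothetical names.

```r
data("ToothGrowth")

# Hypothetical helper: sample size, mean and standard deviation of a numeric vector
group_stats <- function(x) c(n = length(x), mean = mean(x), sd = sd(x))

# One row per supp x dose group; aggregate() stores the helper's output as a matrix column
by_group <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = group_stats)
# Flatten the matrix column into ordinary columns len.n, len.mean, len.sd
by_group <- do.call(data.frame, by_group)
print(by_group)
```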

Data Trend

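# Mean tooth length for each supplement/dose combination
summ = raw %>%
        group_by(supp, dose) %>%
        summarise(len = mean(len))
# One line per supplement type, showing how mean length changes with dose
g1 = ggplot(data = summ, aes(x = dose, y = len, col = supp)) + geom_line()
g1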

Comparison of orange juice (OJ) and ascorbic acid (VC) at each fixed dose. A few outliers are visible in the boxplots.

# Boxplots of tooth length by dose, one panel per supplement type
g2 = ggplot(data = ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
        geom_boxplot() + facet_grid(. ~ supp)
g2
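To list which lengths fall outside the whiskers, one can apply the usual 1.5 × IQR rule within each supplement/dose group. This sketch uses `boxplot.stats()` from grDevices, which computes hinges with `fivenum()`; ggplot2's `geom_boxplot()` uses `quantile()` instead, so borderline points may be classified slightly differently from what the plot shows. `flag_outliers`, `groups` and `outliers` are hypothetical names.

```r
data("ToothGrowth")

# Hypothetical helper: values of x lying beyond the boxplot whiskers (1.5 x IQR rule)
flag_outliers <- function(x) boxplot.stats(x, coef = 1.5)$out

# Split lengths into the six supp x dose groups and flag outliers within each
groups <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))
outliers <- lapply(groups, flag_outliers)

# Keep only the groups that actually contain outliers
outliers <- Filter(length, outliers)
print(outliers)
```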

Hypothesis Testing

For each dose level (0.5, 1.0 and 2.0 mg/day) we test separately:

H0: µ(OJ) - µ(VC) = 0, i.e. mean odontoblast length does not differ between OJ and VC at that dose
H1: µ(OJ) - µ(VC) ≠ 0

Each test is a two-sided Welch two-sample t-test (var.equal = FALSE), which does not assume equal variances in the two groups.

# Welch two-sample t-test (var.equal = FALSE) of len by supp, run separately for each dose.
# supp has only the levels OJ and VC, and OJ is the first level, so each
# estimate and confidence interval refer to mean(OJ) - mean(VC).
t0.5 = t.test(len ~ supp, data = subset(ToothGrowth, dose == 0.5), var.equal = FALSE)
t1.0 = t.test(len ~ supp, data = subset(ToothGrowth, dose == 1.0), var.equal = FALSE)
t2.0 = t.test(len ~ supp, data = subset(ToothGrowth, dose == 2.0), var.equal = FALSE)
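With `var.equal = FALSE`, `t.test()` performs Welch's two-sample t-test: t = (mean(x) - mean(y)) / sqrt(s_x²/n_x + s_y²/n_y), with degrees of freedom from the Welch-Satterthwaite approximation. Reproducing the statistic by hand makes the test less of a black box. A sketch for the dose = 0.5 group; `welch_t`, `manual` and `builtin` are hypothetical names.

```r
data("ToothGrowth")

# Hypothetical helper: Welch's t statistic, degrees of freedom and two-sided p-value for mean(x) - mean(y)
welch_t <- function(x, y) {
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p.value = p)
}

d05 <- subset(ToothGrowth, dose == 0.5)
manual <- welch_t(d05$len[d05$supp == "OJ"], d05$len[d05$supp == "VC"])
builtin <- t.test(len ~ supp, data = d05, var.equal = FALSE)

print(manual)
print(c(builtin$statistic, builtin$parameter, p.value = builtin$p.value))
```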

result = data.frame(
        "p-value" = c(t0.5$p.value, t1.0$p.value, t2.0$p.value),
        "Conf.Low" = c(t0.5$conf.int[1],t1.0$conf.int[1], t2.0$conf.int[1]),
        "Conf.High" = c(t0.5$conf.int[2],t1.0$conf.int[2], t2.0$conf.int[2]),
        row.names = c("Dose=0.5","Dose=1.0","Dose=2.0"))
result
##              p.value  Conf.Low Conf.High
## Dose=0.5 0.006358607  1.719057  8.780943
## Dose=1.0 0.001038376  2.802148  9.057852
## Dose=2.0 0.963851589 -3.798070  3.638070
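The three near-identical `t.test()` calls and the hand-assembled `result` table can also be produced in one pass by looping over the dose levels. This is a sketch of an alternative layout, not the original analysis; it should reproduce the p-values and confidence limits printed above. `rows` and `result2` are hypothetical names.

```r
data("ToothGrowth")

doses <- sort(unique(ToothGrowth$dose))

# Run one Welch t-test per dose and keep the p-value and 95% confidence limits
rows <- lapply(doses, function(d) {
  tt <- t.test(len ~ supp, data = subset(ToothGrowth, dose == d), var.equal = FALSE)
  data.frame(p.value = tt$p.value,
             Conf.Low = tt$conf.int[1],
             Conf.High = tt$conf.int[2])
})

# Stack the per-dose rows and label them like the original table
result2 <- do.call(rbind, rows)
rownames(result2) <- sprintf("Dose=%.1f", doses)
print(result2)
```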

Conclusion

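As the “result” data frame shows, at the two lower doses (0.5 and 1.0 mg/day) the 95% confidence interval for µ(OJ) - µ(VC) lies entirely above 0 and both p-values are below 0.01, so H0 is rejected at the 5% level: at these doses, orange juice yields a longer mean odontoblast length than ascorbic acid. At dose = 2.0 mg/day the interval (-3.80, 3.64) contains 0 and p ≈ 0.96, so the difference between OJ and VC is not statistically significant.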
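For a two-sided test at level 0.05, "the 95% confidence interval excludes 0" and "p < 0.05" lead to the same decision, because both are derived from the same t distribution. The sketch below recomputes both criteria for each dose and checks that they agree; `alpha` and `decisions` are hypothetical names.

```r
data("ToothGrowth")

alpha <- 0.05
# One row per dose; logical decisions are stored as 1 (reject) / 0 (do not reject)
decisions <- t(sapply(c(0.5, 1.0, 2.0), function(d) {
  tt <- t.test(len ~ supp, data = subset(ToothGrowth, dose == d),
               var.equal = FALSE, conf.level = 1 - alpha)
  ci_excludes_zero <- tt$conf.int[1] > 0 || tt$conf.int[2] < 0
  c(dose = d, reject_by_p = tt$p.value < alpha, reject_by_ci = ci_excludes_zero)
}))
print(decisions)
```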

Some other conclusions can be drawn from the trend plot:
1. For both VC and OJ, a larger dose gives a longer mean tooth length.
2. For both VC and OJ, the relationship between length and dose is not linear: mean length rises steeply from 0.5 to 1 mg/day and more slowly from 1 to 2 mg/day.
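The slowing growth can be quantified as the gain in mean length per additional mg/day between consecutive dose levels. Because the doses (0.5, 1, 2) are unevenly spaced, each gain is divided by the corresponding dose difference. A base-R sketch; `means` and `gain_per_mg` are hypothetical names.

```r
data("ToothGrowth")

# Mean length for every supp x dose cell: rows = supp, columns = dose
means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
doses <- as.numeric(colnames(means))

# Increase in mean length between consecutive doses, divided by the dose step
gain_per_mg <- sweep(t(apply(means, 1, diff)), 2, diff(doses), "/")
colnames(gain_per_mg) <- c("0.5->1", "1->2")
print(round(gain_per_mg, 2))
```

A larger value in the first column than in the second, for both supplements, corresponds to the flattening curve in the plot.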