The Base R tooth growth dataset

Using techniques pioneered by the Economist’s dedicated data graphics team (http://www.economist.com/blogs/graphicdetail/2014/11/daily-chart-16), it is clear to see that animal growth can have huge impacts on human society. As such, the study of Guinea Pig tooth growth is naturally of the utmost importance. With this in mind, we will examine the ToothGrowth dataset in R.

From the R help page for the dataset:

Description

The response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).

Usage

ToothGrowth

Format

A data frame with 60 observations on 3 variables.

[,1]  len   numeric  Tooth length
[,2]  supp  factor   Supplement type (VC or OJ).
[,3]  dose  numeric  Dose in milligrams.

Looking at the dataset in more detail:

data(ToothGrowth)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
hist(ToothGrowth$len,main="Histogram of Tooth Growth lengths")
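The help page describes a balanced design: 10 animals in each of the six supplement-by-dose cells. A quick cross-tabulation is one way to confirm that before running any tests. This is a minimal sketch; `cell_counts` is just an illustrative name.

```r
# ToothGrowth ships with base R's datasets package
data(ToothGrowth)

# Count observations in each supp x dose cell (rows: OJ, VC; columns: 0.5, 1, 2 mg)
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# A balanced design has the same count (10) in every cell
all(cell_counts == 10)
```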

Of course, with only three variables, we can get a pretty good picture of the entire dataset from a simple 2D scatter plot that uses color for the third variable (2D + 1 = 3D…):

plot(ToothGrowth$dose, ToothGrowth$len, col = ToothGrowth$supp,
     ylab = 'tooth length... we\'re assuming AU',
     xlab = 'dose in mg')

legend("topleft",                  # places the legend in the top-left corner
       legend = c('OJ', 'VC'),     # labels must match the supp levels (OJ, VC)
       pch = c(1, 1),              # open-circle point symbols, as drawn by plot()
       col = c('black', 'red'))    # col = supp maps OJ (level 1) to black, VC (level 2) to red
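As a numeric companion to the scatter plot, the mean tooth length in each supplement-by-dose cell can be tabulated with base R's `tapply()`. A sketch; `cell_means` is an illustrative name. The values should match the group means printed by the t-tests in this section.

```r
data(ToothGrowth)

# Mean len for every supp x dose combination;
# rows are supplement types, columns are doses in mg
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
print(round(cell_means, 2))
```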

We can try a number of different t-tests here:

  1. \(H_0\): the effects of OJ and VC on tooth growth are identical, \(\mu_{OJ}=\mu_{VC}\):

t.test(ToothGrowth$len~ToothGrowth$supp)
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len by ToothGrowth$supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

At the 5% level we actually can’t reject the null (p = 0.061), although it is close.
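`t.test()` returns an object of class `htest`, so the numbers quoted above can be pulled out directly instead of read off the printout. A minimal sketch using the formula interface with `data =`, which tests len by supp just like the output above; `pooled` is an illustrative name.

```r
data(ToothGrowth)

# Welch two-sample t-test (t.test's default, var.equal = FALSE) of len by supp
pooled <- t.test(len ~ supp, data = ToothGrowth)

pooled$p.value    # two-sided p-value
pooled$conf.int   # 95% confidence interval for mean(OJ) - mean(VC)
pooled$estimate   # the two group means

# Would we reject at the 5% level?
pooled$p.value < 0.05
```

Because the interval straddles 0, the failure to reject at 5% and the confidence interval tell the same story.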

Pooling across doses could mislead us here: we can easily imagine a large effect at one dose averaging out against no effect at another, producing a middling overall difference. So we test at each dose separately:

1a) At 0.5 mg, \(H_0\): \(\mu_{OJ 0.5}=\mu_{VC 0.5}\):

dose1<-ToothGrowth[which(ToothGrowth$dose==0.5),]
t.test(dose1$len~dose1$supp)
## 
##  Welch Two Sample t-test
## 
## data:  dose1$len by dose1$supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

We reject the null: at 0.5 mg, OJ produces a higher mean len.

1b) At 1 mg, \(H_0\): \(\mu_{OJ 1}=\mu_{VC 1}\):

dose2<-ToothGrowth[which(ToothGrowth$dose==1),]
t.test(dose2$len~dose2$supp)
## 
##  Welch Two Sample t-test
## 
## data:  dose2$len by dose2$supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

Again, the null is rejected, and OJ again has the higher mean len.

1c) At 2 mg, \(H_0\): \(\mu_{OJ 2}=\mu_{VC 2}\):

dose3<-ToothGrowth[which(ToothGrowth$dose==2),]
t.test(dose3$len~dose3$supp)
## 
##  Welch Two Sample t-test
## 
## data:  dose3$len by dose3$supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

At 2 mg we can’t reject the null: the two means (26.06 vs. 26.14) are nearly identical.
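The three per-dose tests above can also be run in one pass by splitting the data on dose. This is a sketch, not the author's code; `by_dose` and `p_by_dose` are illustrative names. Passing each subset through `data = d` keeps len and supp from the same subset together, so one subset's len cannot be paired with another subset's supp.

```r
data(ToothGrowth)

# One data frame per dose level, named "0.5", "1", "2"
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Same OJ-vs-VC Welch t-test within each dose; keep only the p-value
p_by_dose <- sapply(by_dose, function(d) t.test(len ~ supp, data = d)$p.value)

round(p_by_dose, 4)
p_by_dose < 0.05
```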

2. Holding the supplement type fixed, larger doses of supplement lead to longer teeth:

a) \(H_0\): \(\mu_{OJ 1.0}=\mu_{OJ 0.5}\)

OJ.5to1<-ToothGrowth[which(ToothGrowth$supp=='OJ'),]
OJ.5to1<-OJ.5to1[which(OJ.5to1$dose!=2),]
t.test(OJ.5to1$len~OJ.5to1$dose)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.5to1$len by OJ.5to1$dose
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.415634  -5.524366
## sample estimates:
## mean in group 0.5   mean in group 1 
##             13.23             22.70

b) \(H_0\): \(\mu_{OJ 2.0}=\mu_{OJ 1.0}\)

OJ1to2<-ToothGrowth[which(ToothGrowth$supp=='OJ'),]
OJ1to2<-OJ1to2[which(OJ1to2$dose!=0.5),]
t.test(OJ1to2$len~OJ1to2$dose)
## 
##  Welch Two Sample t-test
## 
## data:  OJ1to2$len by OJ1to2$dose
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.5314425 -0.1885575
## sample estimates:
## mean in group 1 mean in group 2 
##           22.70           26.06

c) \(H_0\): \(\mu_{VC 1.0}=\mu_{VC 0.5}\)

VC.5to1<-ToothGrowth[which(ToothGrowth$supp=='VC'),]
VC.5to1<-VC.5to1[which(VC.5to1$dose!=2),]
t.test(VC.5to1$len~VC.5to1$dose)
## 
##  Welch Two Sample t-test
## 
## data:  VC.5to1$len by VC.5to1$dose
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.265712  -6.314288
## sample estimates:
## mean in group 0.5   mean in group 1 
##              7.98             16.77

d) \(H_0\): \(\mu_{VC 2.0}=\mu_{VC 1.0}\)

VC1to2<-ToothGrowth[which(ToothGrowth$supp=='VC'),]
VC1to2<-VC1to2[which(VC1to2$dose!=0.5),]
t.test(VC1to2$len~VC1to2$dose)
## 
##  Welch Two Sample t-test
## 
## data:  VC1to2$len by VC1to2$dose
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.054267  -5.685733
## sample estimates:
## mean in group 1 mean in group 2 
##           16.77           26.14

In every case we reject the null at the 5% level (only narrowly for OJ from 1 to 2 mg, p = 0.039) in favor of len increasing with dosage.
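The four dose-step comparisons can be collected with a small helper instead of four hand-built subsets. A sketch under the same Welch t-test used above; `compare_doses` and `p_steps` are hypothetical names introduced only for this example.

```r
data(ToothGrowth)

# Hypothetical helper: Welch t-test of len between two doses within one supplement type
compare_doses <- function(supp, low, high) {
  d <- ToothGrowth[ToothGrowth$supp == supp & ToothGrowth$dose %in% c(low, high), ]
  t.test(len ~ dose, data = d)$p.value   # dose (numeric) is used as a 2-level grouping
}

p_steps <- c(
  OJ_0.5_to_1 = compare_doses("OJ", 0.5, 1),
  OJ_1_to_2   = compare_doses("OJ", 1, 2),
  VC_0.5_to_1 = compare_doses("VC", 0.5, 1),
  VC_1_to_2   = compare_doses("VC", 1, 2)
)
signif(p_steps, 4)
```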

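Of course, a succinct way to express all of this is a boxplot. Note that base R's boxplot() draws the median (the dark line) and the hinges (the edges of the box, roughly the first and third quartiles), rather than the mean. It shows no confidence interval unless notch = TRUE is set; the notches then extend about \(\pm 1.58\,\mathrm{IQR}/\sqrt{n}\) around each median, an approximate 95% interval for comparing medians. Reading non-overlapping notches as a rejection, the 0.5 mg comparison of OJ to VC comes out differently from the t-test:

\(H_0: \tilde{X}_{0.5VC}=\tilde{X}_{0.5OJ}\)

is not rejected, even though

\(H_0: \mu_{0.5VC}=\mu_{0.5OJ}\)

is.

In any case, a visual summary of the medians, with notches for their approximate confidence intervals, is below:

boxplot(len~dose*supp, data=ToothGrowth, notch=TRUE)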
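The hinge and notch limits can also be read off numerically with `boxplot.stats()`, which `boxplot()` uses to compute each box. A sketch; `groups` and `notch_table` are illustrative names. Per the R help for `boxplot`, non-overlapping notches are "strong evidence" that two medians differ, which is a rule of thumb rather than a formal test.

```r
data(ToothGrowth)

# len split into the six supp x dose groups, named like "OJ.0.5", "VC.0.5", ...
groups <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

# For each group: hinges and median from $stats, notch limits from $conf
# ($conf is median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n))
notch_table <- t(sapply(groups, function(x) {
  s <- boxplot.stats(x)
  c(lower_hinge = s$stats[2], median = s$stats[3], upper_hinge = s$stats[4],
    notch_lo = s$conf[1], notch_hi = s$conf[2])
}))
round(notch_table, 2)
```

Comparing the `notch_lo`/`notch_hi` columns of two rows gives the same overlap check as eyeballing the plot.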

Conclusions:

It is clear that, for both OJ and ascorbic acid (VC), higher doses are associated with longer teeth. Assuming that we can reasonably hold ceteris paribus in these guinea pigs, we may even be able to assert causation. Interestingly, there is some evidence that while the two delivery methods differ clearly at low doses (0.5 and 1 mg, with OJ supporting higher tooth growth), the difference disappears at 2 mg.

The turkey analysis…

As an extension, we might consider how Guinea Pig teeth would grow given arbitrarily large amounts of vitamin C. Assuming the relationship between dose and Guinea Pig tooth growth observed over this window is linear, and additionally that it holds for dosages as high as 0.1 gram of ascorbic acid, it’s possible that we may be able to generate guinea pig teeth longer than 1000… does it bother anyone else that there is no documentation about the units on the len variable for guinea pig teeth? Let’s assume AU. Guinea pigs with teeth as long as 1000 AU are possible with 0.1 gram of ascorbic acid. World-destroying consequences would likely result… well, solar-system-destroying, probably.

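# Straight-line fit to the VC (ascorbic acid) data. Note: in an R formula, dose^2
# does not square dose (^ means crossing, and dose crossed with itself is just
# dose); a squared term would be I(dose^2). Here we want the linear fit anyway.
vc <- ToothGrowth[ToothGrowth$supp == 'VC', ]
fit <- lm(len ~ dose, data = vc)
dat <- data.frame(dose = seq(0, 100, by = 0.1))   # 0 to 100 mg, i.e. up to 0.1 g
pred <- predict(fit, newdata = dat)
plot(len ~ dose, data = ToothGrowth, xlim = c(0, 100), ylim = c(0, 2000),
     main = "A completely plausible\nprojection of Guinea Pig Teeth Growth.",
     xlab = 'dose in mg',
     ylab = 'tooth length (AU)')
lines(dat$dose, pred)   # draw the extrapolated straight line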
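To put a number on the 1000 AU claim, the linear fit to the VC data can be evaluated at 100 mg (0.1 g) directly. A sketch; `vc` and `fit` are illustrative names, and the fit is an ordinary least-squares line of len on dose, the same straight line the projection extrapolates.

```r
data(ToothGrowth)

# Least-squares line of len on dose, VC (ascorbic acid) animals only
vc  <- ToothGrowth[ToothGrowth$supp == "VC", ]
fit <- lm(len ~ dose, data = vc)

coef(fit)                                        # intercept and slope (len per mg)
predict(fit, newdata = data.frame(dose = 100))   # projected len at 100 mg = 0.1 g
```

The slope is estimated only from doses between 0.5 and 2 mg, so the 100 mg value is exactly the kind of extrapolation the paragraph above is poking fun at.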