The ToothGrowth Data Set

The ToothGrowth data set in R records tooth growth in 60 guinea pigs that received vitamin C at one of three dose levels (0.5, 1, or 2 mg/day), delivered either as orange juice (“OJ”) or as ascorbic acid (“VC”).

You can graph these data quickly using the example from the data set’s R help page. It draws a coplot of length against dose, with one panel per delivery method:

coplot(len ~ dose | supp, data=ToothGrowth, 
       panel=panel.smooth, xlab="len vs. dose, given type of supp")

I’ll create some other graphs below that break this information out so that individual comparisons can be highlighted. To get started, I’ll separate the data set by delivery method. This lets me make histograms without a lot of subsetting syntax inside the plotting code:

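# rows 1-30 are the VC animals and rows 31-60 the OJ animals;
# columns are reordered to (dose, len) so plot() puts dose on the x-axis
vctooth <- ToothGrowth[1:30, c(3,1)]
ojtooth <- ToothGrowth[31:60, c(3,1)]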
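The row ranges above depend on the data set's built-in ordering. As a hedged alternative, here is a minimal sketch that first checks that ordering and then selects rows by their supp label instead. The names vc_by_name and oj_by_name are my own illustrative choices, not part of the original code.

```r
# Confirm the row-order assumption behind 1:30 and 31:60.
stopifnot(all(ToothGrowth$supp[1:30] == "VC"),
          all(ToothGrowth$supp[31:60] == "OJ"))

# Select rows by supplement label and columns by name (illustrative names).
vc_by_name <- ToothGrowth[ToothGrowth$supp == "VC", c("dose", "len")]
oj_by_name <- ToothGrowth[ToothGrowth$supp == "OJ", c("dose", "len")]

# Each subset should hold 30 animals.
nrow(vc_by_name)
nrow(oj_by_name)
```

Selecting by label keeps working if the rows are ever shuffled or filtered, which positional indexing does not.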

We may want to look at the summary data to get an idea of what we’re comparing:

# spread of length data, all delivery methods
summary(ToothGrowth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   13.08   19.25   18.81   25.28   33.90
# length data for the VC subset
summary(vctooth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   11.20   16.50   16.96   23.10   33.90
# length data for the OJ subset
summary(ojtooth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.20   15.52   22.70   20.66   25.72   30.90
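If you'd rather not build the subsets first, base R's tapply can produce the per-method summaries in one call. This is only a sketch of that shortcut, and len_by_supp is an illustrative name of mine.

```r
# Apply summary() to len within each level of supp (OJ and VC).
len_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, summary)
len_by_supp
```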

Then we can use the plyr package to create a table of the mean and standard deviation of length for each combination of delivery method and dose:

library(plyr)
toothTable <- ddply(ToothGrowth, .(supp, dose), summarize, 
      mean_len=round(mean(len), 2), 
      sd_len=round(sd(len), 2))
toothTable
##   supp dose mean_len sd_len
## 1   OJ  0.5    13.23   4.46
## 2   OJ  1.0    22.70   3.91
## 3   OJ  2.0    26.06   2.66
## 4   VC  0.5     7.98   2.75
## 5   VC  1.0    16.77   2.52
## 6   VC  2.0    26.14   4.80
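The same table can be put together without plyr. The sketch below uses base R's aggregate and assumes only the built-in data set. The names grp_mean, grp_sd, and base_table are illustrative, not from the original.

```r
# Mean and standard deviation of len for every supp/dose combination.
grp_mean <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
grp_sd   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Both results list the groups in the same order, so the columns line up.
base_table <- data.frame(grp_mean[, c("supp", "dose")],
                         mean_len = round(grp_mean$len, 2),
                         sd_len   = round(grp_sd$len, 2))

# Sort by supp, then dose, to match the layout of the plyr table.
base_table <- base_table[order(base_table$supp, base_table$dose), ]
base_table
```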

Here are histograms of length for the two delivery methods:

hist(vctooth$len,
     breaks="Sturges",
     border="yellow",
     font=2,
     lwd=5,
     cex=1.5,
     ylim=c(0,10),
     main = 'Length of Tooth Growth \n Delivery Method: Ascorbic Acid', 
     xlab="Length of Growth",
     ylab="Frequency", 
     col="red",
     col.main="darkred",
     col.lab="darkred",
     col.axis="darkred")

hist(ojtooth$len,
     breaks="Sturges",
     border="red",
     font=2,
     lwd=5,
     cex=1.5,
     ylim=c(0,10),
     main = 'Length of Tooth Growth \n Delivery Method: Orange Juice', 
     xlab="Length of Growth",
     ylab="Frequency", 
     col="orange",
     col.main="darkorange",
     col.lab="darkorange",
     col.axis="darkorange")
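Because breaks="Sturges" picks bins separately for each call, the two histograms may not share the same bin edges. As a rough sketch, you could draw them side by side with common breaks so the shapes compare directly. The 5-unit bin width and the names shared_breaks, vc_hist, and oj_hist are my assumptions.

```r
# Length values for each delivery method.
vc_len <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Common bin edges covering the full range of len (assumed width of 5).
shared_breaks <- seq(0, 35, by = 5)

# Compute the counts first so both panels can share one y-axis limit.
vc_hist <- hist(vc_len, breaks = shared_breaks, plot = FALSE)
oj_hist <- hist(oj_len, breaks = shared_breaks, plot = FALSE)
y_max <- max(vc_hist$counts, oj_hist$counts)

# Draw the two histograms next to each other, then restore the layout.
old_par <- par(mfrow = c(1, 2))
plot(vc_hist, ylim = c(0, y_max), col = "red",
     main = "Ascorbic Acid", xlab = "Length of Growth")
plot(oj_hist, ylim = c(0, y_max), col = "orange",
     main = "Orange Juice", xlab = "Length of Growth")
par(old_par)
```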

And here are the breakdowns of length by dose for each delivery method:

plot(vctooth,
     main = 'Length of Growth by Dosage \n Delivery Method: Ascorbic Acid', 
     xlab="Dosage Level",
     ylab="Length",
     lwd=4,
     pch=16,
     font=2,
     col="red",
     col.main="darkred",
     col.lab="darkred",
     col.axis="darkred")

plot(ojtooth, 
     main = 'Length of Growth by Dosage \n Delivery Method: Orange Juice', 
     xlab="Dosage Level",
     ylab="Length",
     lwd=5,
     pch=16,
     font=2,
     col="orange",
     col.main="darkorange",
     col.lab="darkorange",
     col.axis="darkorange")
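To see both delivery methods on one set of axes, one option is interaction.plot from the stats package, which draws the mean length at each dose with one line per method. This is a sketch, not part of the original analysis, and cell_means is an illustrative name.

```r
# Mean length for each dose (rows) and delivery method (columns).
cell_means <- with(ToothGrowth,
                   tapply(len, list(dose = dose, supp = supp), mean))
cell_means

# One line per delivery method; colors follow the factor levels OJ, VC.
with(ToothGrowth,
     interaction.plot(x.factor = dose, trace.factor = supp, response = len,
                      fun = mean, type = "b", pch = 16,
                      col = c("orange", "red"),
                      xlab = "Dosage Level", ylab = "Mean Length",
                      trace.label = "supp"))
```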

And here are box plots for length by dosage level regardless of delivery method:

plot(len ~ as.factor(dose),
     col=c("bisque1","lightgreen","tan"),
     xlab="Dosage Level",
     border="steelblue3",
     lwd=0.8,
     ylab="Length",
     col.lab="steelblue",
     col.axis="steelblue",
     col.main="steelblue",
     main="Length of Growth by Dosage \n Combined Delivery Methods",
     data=ToothGrowth)

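This is a great data set for learning basic analysis with categorical predictors. You can run Student’s t-test on the two delivery subsets, a multiple-comparison procedure on the dosage subsets, or a two-way analysis of variance on the delivery-by-dose groups summarized in the plyr table. A chi-square procedure works on counts rather than means, so it would apply only after binning length into categories.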
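As a starting point, here is a minimal sketch of the first two of those analyses using functions from base R's stats package. The object names (supp_welch, supp_student, tg, dose_fit, dose_tukey) are illustrative. Note that t.test runs Welch's unequal-variance version by default, so the classic Student's test needs var.equal = TRUE.

```r
# Compare mean length between the two delivery methods.
supp_welch   <- t.test(len ~ supp, data = ToothGrowth)                    # Welch (default)
supp_student <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)  # Student's
supp_student

# Treat dose as a factor, fit a one-way ANOVA, then compare all dose pairs.
tg <- ToothGrowth
tg$dose_f <- factor(tg$dose)
dose_fit   <- aov(len ~ dose_f, data = tg)
dose_tukey <- TukeyHSD(dose_fit)
dose_tukey
```

Tukey's HSD is only one of several multiple-comparison procedures. I chose it here because it ships with R and works directly on an aov fit.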