The ToothGrowth data set in R records tooth growth in 60 guinea pigs that received vitamin C at one of three dose levels (0.5, 1, or 2 mg/day), delivered either as orange juice (“OJ”) or as ascorbic acid (“VC”).
You can graph this data quickly using the method suggested in the data set’s R help page. That graph is a coplot, as seen here:
coplot(len ~ dose | supp, data=ToothGrowth,
panel=panel.smooth, xlab="len vs. dose, given type of supp")
I’ll create some other graphs below that break this information out so individual comparisons stand out. To get started, I’ll separate the data set by delivery method, which lets me make histograms without putting a lot of subsetting syntax inside the plotting code:
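# select rows by delivery method rather than by row position; keep dose and len
vctooth <- ToothGrowth[ToothGrowth$supp == "VC", c("dose", "len")]
ojtooth <- ToothGrowth[ToothGrowth$supp == "OJ", c("dose", "len")]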
We may want to look at the summary data to get an idea of what we’re comparing:
# spread of length data, all delivery methods
summary(ToothGrowth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4.20   13.08   19.25   18.81   25.28   33.90
# length data for the VC subset
summary(vctooth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4.20   11.20   16.50   16.96   23.10   33.90
# length data for the OJ subset
summary(ojtooth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    8.20   15.52   22.70   20.66   25.72   30.90
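The per-method summaries can also be produced in one step straight from ToothGrowth, without building the subsets first. This is a minimal sketch using base R's split() and lapply(); the name by_supp is just a label I picked for illustration:

```r
# Split len into one vector per delivery method, then summarize each piece.
# The result is a named list with one summary for "OJ" and one for "VC".
by_supp <- lapply(split(ToothGrowth$len, ToothGrowth$supp), summary)
by_supp$VC
by_supp$OJ
```

This should give the same figures as the separate summary() calls on the two subsets.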
Then we can use the plyr package to build a table of the mean and standard deviation of length for each combination of delivery method and dose:
library(plyr)
toothTable <- ddply(ToothGrowth, .(supp, dose), summarize,
mean_len=round(mean(len), 2),
sd_len=round(sd(len), 2))
toothTable
##   supp dose mean_len sd_len
## 1   OJ  0.5    13.23   4.46
## 2   OJ  1.0    22.70   3.91
## 3   OJ  2.0    26.06   2.66
## 4   VC  0.5     7.98   2.75
## 5   VC  1.0    16.77   2.52
## 6   VC  2.0    26.14   4.80
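If you would rather not load plyr, roughly the same table can be built with base R. The sketch below uses aggregate(); the object names (means, sds, tooth_stats) are my own, and the final sort is there only so the rows come out in the same order as the table above:

```r
# Group means and standard deviations of len by supp and dose (base R only)
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
sds   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Both calls return the groups in the same order, so the columns line up
tooth_stats <- data.frame(means[, c("supp", "dose")],
                          mean_len = round(means$len, 2),
                          sd_len   = round(sds$len, 2))

# Sort by delivery method, then dose, to match the ddply() row order
tooth_stats <- tooth_stats[order(tooth_stats$supp, tooth_stats$dose), ]
tooth_stats
```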
Here are histograms of length for the two delivery methods:
hist(vctooth$len,
breaks="Sturges",
border="yellow",
font=2,
lwd=5,
cex=1.5,
ylim=c(0,10),
main = 'Length of Tooth Growth \n Delivery Method: Ascorbic Acid',
xlab="Length of Growth",
ylab="Frequency",
col="red",
col.main="darkred",
col.lab="darkred",
col.axis="darkred")
hist(ojtooth$len,
breaks="Sturges",
border="red",
font=2,
lwd=5,
cex=1.5,
ylim=c(0,10),
main = 'Length of Tooth Growth \n Delivery Method: Orange Juice',
xlab="Length of Growth",
ylab="Frequency",
col="orange",
col.main="darkorange",
col.lab="darkorange",
col.axis="darkorange")
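Because breaks="Sturges" chooses bins separately for each call, the two histograms may not share the same bins, which makes them harder to compare by eye. One option, sketched here with break points I picked myself (0 to 35 in steps of 5), is to draw both side by side on a common set of bins:

```r
# Pull the two length vectors straight from ToothGrowth
vc_len <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Assumed bin edges: wide enough to cover every length in the data set
shared_breaks <- seq(0, 35, by = 5)

old_par <- par(mfrow = c(1, 2))   # two panels in one row
hist(vc_len, breaks = shared_breaks, ylim = c(0, 10), col = "red",
     main = "Ascorbic Acid", xlab = "Length of Growth")
hist(oj_len, breaks = shared_breaks, ylim = c(0, 10), col = "orange",
     main = "Orange Juice", xlab = "Length of Growth")
par(old_par)                      # restore the previous plot layout
```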
And here are the breakdowns of length by dose for each delivery method:
plot(vctooth,
main = 'Length of Growth by Dosage \n Delivery Method: Ascorbic Acid',
xlab="Dosage Level",
ylab="Length",
lwd=4,
pch=16,
font=2,
col="red",
col.main="darkred",
col.lab="darkred",
col.axis="darkred")
plot(ojtooth,
main = 'Length of Growth by Dosage \n Delivery Method: Orange Juice',
xlab="Dosage Level",
ylab="Length",
lwd=5,
pch=16,
font=2,
col="orange",
col.main="darkorange",
col.lab="darkorange",
col.axis="darkorange")
And here are box plots for length by dosage level regardless of delivery method:
plot(len ~ as.factor(dose),
col=c("bisque1","lightgreen","tan"),
xlab="Dosage Level",
border="steelblue3",
lwd=0.8,
ylab="Length",
col.lab="steelblue",
col.axis="steelblue",
col.main="steelblue",
main="Length of Growth by Dosage \n Combined Delivery Methods",
data=ToothGrowth)
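The box plots above pool both delivery methods. To see dose and delivery method together, one possibility is a box plot with one box per combination. This is a sketch rather than part of the original analysis; the colors and titles are my own choices:

```r
# One box per supp/dose combination; boxes come out in the order
# OJ.0.5, VC.0.5, OJ.1, VC.1, OJ.2, VC.2, so the two colors alternate
bp <- boxplot(len ~ supp + dose, data = ToothGrowth,
              col = c("orange", "red"),   # OJ boxes orange, VC boxes red
              xlab = "Delivery Method and Dosage Level",
              ylab = "Length",
              main = "Length of Growth by Dosage and Delivery Method")
```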
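This is a great data set for learning basic group comparisons. You can run Student’s t-test on the two delivery subsets, a multiple-comparison procedure across the dose levels, or a chi-square test. Because a chi-square test works on counts rather than means, it can’t be applied directly to the table created with plyr; you would first bin length into categories and then tabulate those categories against delivery method or dose.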
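Here is a minimal sketch of those three analyses in base R. The equal-variance setting for the t-test, the choice of Tukey's HSD as the multiple-comparison procedure, and the median split used to turn length into counts for the chi-square test are all assumptions made for illustration:

```r
# Student's t-test on length by delivery method.
# var.equal = TRUE gives the classic pooled-variance test;
# without it, t.test() runs Welch's test instead.
supp_test <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
supp_test

# Multiple comparison across dose levels: one-way ANOVA, then Tukey's HSD.
# dose is stored as a number, so convert it to a factor first.
tooth <- transform(ToothGrowth, dose = factor(dose))
dose_pairs <- TukeyHSD(aov(len ~ dose, data = tooth))
dose_pairs

# Chi-square test: bin len at its median (an arbitrary cut point),
# then test the resulting counts against delivery method.
len_group <- cut(ToothGrowth$len,
                 breaks = c(-Inf, median(ToothGrowth$len), Inf),
                 labels = c("short", "long"))
len_counts <- table(ToothGrowth$supp, len_group)
len_chisq <- chisq.test(len_counts)
len_chisq
```

A different cut point or a finer binning would change the chi-square result, so treat that part as a demonstration of the mechanics rather than a finding.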