16 April 2018
Using R’s ToothGrowth dataset,1 we examine tooth growth in guinea pigs in relation to vitamin C dose level (variable `dose`) and method of delivery (variable `supp`). The response variable, `len`, is the length of odontoblasts (cells responsible for tooth growth) measured in 60 guinea pigs. (The R help for this dataset does not document the units of measurement of the `len` variable, and we did not have access to the original paper.2 For this exercise we therefore assume that odontoblast lengths are in microns.3) Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 milligrams/day) by one of two delivery methods: orange juice (coded as `OJ`) or ascorbic acid (a form of vitamin C, coded as `VC`).
The following R code loads the data and necessary packages.
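library(knitr)       # kable() for formatted tables
library(dplyr)       # group_by() and summarise() for Table 2
data("ToothGrowth")  # load the dataset from the datasets package
attach(ToothGrowth)  # make len, supp, and dose available by name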
Table 1 (all tables and figures are in the Appendix) shows the number of animals across dosage levels and delivery methods. The experiment has a balanced design, with an equal number of observations (n = 10) for all possible combinations of factor levels. An advantage of balanced designs is that significance tests are less sensitive to departures from homogeneity of variances, a common assumption of such tests.4
The group variances are listed in Table 2, which presents a basic summary of the data. The following R code computes Bartlett’s test of homogeneity of variances for the dependent variable `len`.5
bart <- bartlett.test(len ~ interaction(supp, dose), data = ToothGrowth)
The p-value of Bartlett’s test is 0.226, which exceeds the conventional 0.05 threshold, so there is no evidence that variances differ among the six combinations of dosage and delivery method.
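The object returned by `bartlett.test` is a standard `htest` list, so the reported p-value can be read directly from it. The following is a minimal sketch in base R; the name `bart_summary` and the rounding to three digits are our own choices for display.

```r
# Bartlett's test across the six supp-by-dose groups
bart <- bartlett.test(len ~ interaction(supp, dose), data = ToothGrowth)

# Collect the test statistic, degrees of freedom, and p-value
bart_summary <- c(
  statistic = unname(bart$statistic),
  df        = unname(bart$parameter),
  p_value   = bart$p.value
)
print(round(bart_summary, 3))
```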
We used box plots to simultaneously conduct exploratory and confirmatory analysis6 of the data. Box plots are a graphical method to display the central tendency (via the median), the spread (via the interquartile range), and the outliers of a batch of data. An extension of the box plot is the notched box plot, in which a notch around the median indicates an approximate 95% confidence interval. When several box plots are compared, non-overlap of adjacent notches indicates that the medians differ significantly at approximately the 5% level.7
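As a rough illustration of how a notch is derived, the sketch below reproduces the notch limits for a single group. It assumes the formula documented for R's `boxplot.stats`, namely median ± 1.58 × (hinge spread) / √n; the choice of the OJ, 0.5 mg/day group and the names `notch_r` and `notch_manual` are arbitrary.

```r
# Odontoblast lengths for one group (orange juice, 0.5 mg/day)
x <- with(ToothGrowth, len[supp == "OJ" & dose == 0.5])

# Notch limits as computed by R for notched box plots
notch_r <- boxplot.stats(x)$conf

# Same limits by hand: median +/- 1.58 * hinge spread / sqrt(n)
hinges <- fivenum(x)[c(2, 4)]  # lower and upper hinges
notch_manual <- median(x) + c(-1, 1) * 1.58 * diff(hinges) / sqrt(length(x))

print(rbind(notch_r, notch_manual))
```

Note that the hinges from `fivenum` can differ slightly from the quartiles returned by `quantile`, which is why the sketch uses `fivenum` rather than `IQR`.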
Figure 1 shows the median odontoblast length (tooth growth) as a heavy horizontal line at the center of each box plot. The spread of each box around its median shows that the data, at each combination of dose and delivery method, are roughly symmetrical and not heavily skewed. The notches, being approximate 95% confidence intervals around the medians, serve as an approximate statistical test of the effects of dose and delivery method on median tooth growth. Box plots for orange juice delivery (OJ) are colored orange, while those for ascorbic acid delivery (VC) are yellow.
Figure 1 also shows the effect of dosage level on tooth growth within delivery methods: for both methods, tooth growth increases clearly with increasing dose of vitamin C. The confidence intervals around the medians also show that tooth growth was significantly greater for the orange juice delivery method at the 0.5 and 1 mg/day dosage levels, while there was no detectable difference in tooth growth between delivery methods at the highest dosage level, 2 mg/day.
As stated by Krzywinski and Altman, “Because they are based on statistics that do not require us to assume anything about the shape of the distribution, box plots robustly provide more information about samples than conventional error bars.”8 While our sample size within each dosage-delivery method combination is not particularly large (n = 10, Table 1), a sample size of 10 is adequate for comparisons using box plots.9 In addition, the homogeneity of variances among factor levels lends further confidence to our comparisons of medians.10 Finally, although box plots, as mentioned above, do not depend on the shape of the distribution, the close agreement of medians and means in each of the six groups (Table 2) is consistent with roughly symmetric, approximately Gaussian distributions.11
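The agreement between means and medians can be checked directly. The following sketch, in base R, computes the difference for each group; the object name `centers` and the column name `diff` are our own, and a small difference suggests symmetry rather than demonstrating normality.

```r
# Mean and median of len for each supp-by-dose group
centers <- aggregate(len ~ supp + dose, data = ToothGrowth,
                     FUN = function(v) c(mean = mean(v), median = median(v)))
centers <- do.call(data.frame, centers)  # flatten the matrix column

# Difference between mean and median within each group
centers$diff <- centers$len.mean - centers$len.median
print(centers, digits = 3)
```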
The following R code produces Table 1, showing the number of animals across dosage levels and delivery methods.
kable(table(supp, dose), caption = "Table 1. Sample size by dosage level (milligrams/day)
and delivery method (OJ = orange juice, VC = ascorbic acid).")
| | 0.5 | 1 | 2 |
|---|---|---|---|
| OJ | 10 | 10 | 10 |
| VC | 10 | 10 | 10 |
The following R code produces Table 2, a basic summary of the data.
grouped <- group_by(ToothGrowth, supp, dose)
kable(summarise(grouped, mean = mean(len), median = median(len), variance = var(len)),
caption = "Table 2. Summary statistics for Guinea pig tooth length (variable `len`, in microns)
by vitamin C delivery method and dosage level.", digits = 2,
col.names = c("Delivery method", "Dose", "Mean", "Median", "Variance"), align = "c")
Delivery method | Dose | Mean | Median | Variance |
---|---|---|---|---|
OJ | 0.5 | 13.23 | 12.25 | 19.89 |
OJ | 1.0 | 22.70 | 23.45 | 15.30 |
OJ | 2.0 | 26.06 | 25.95 | 7.05 |
VC | 0.5 | 7.98 | 7.15 | 7.54 |
VC | 1.0 | 16.77 | 16.50 | 6.33 |
VC | 2.0 | 26.14 | 25.95 | 23.02 |
The following R code produces Figure 1, a series of six notched box plots grouped by dosage level and delivery method.
boxplot(len ~ dose:supp, data = ToothGrowth, notch = TRUE,
boxwex = 0.5, col = c("orange", "yellow"),
main = "",
xlab = "Vitamin C dose in milligrams by delivery method (dose:delivery method)",
ylab = "Odontoblast length (microns)", sep = ":", lex.order = TRUE,
ylim = c(0, 35), yaxs = "i")
Figure 1. Effect of vitamin C dosage and delivery method on median tooth growth (odontoblast length) in guinea pigs. Combinations of dose (0.5, 1, and 2 mg/day) and delivery method (orange juice, OJ; ascorbic acid, VC) are shown on the horizontal axis. Individual box plots are colored orange for OJ and yellow for VC. Each box plot represents 10 observations (see Table 1).
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ToothGrowth.html
Crampton, E. W. 1947. The growth of the odontoblast of the incisor teeth as a criterion of vitamin C intake of the guinea pig. The Journal of Nutrition 33(5):491–504.
Tukey, J. W. 1980. We need both exploratory and confirmatory. The American Statistician 34:23–25.
McGill, R., J. W. Tukey, and W. A. Larsen. 1978. Variations of box plots. The American Statistician 32:12–16; Krzywinski, M., and N. Altman. 2014. Visualizing samples with box plots. Nature Methods 11:119.
Krzywinski and Altman 2014, page 120.
Krzywinski and Altman 2014, Figure 2 on page 119.
Chambers, J. M. 1983. Graphical methods for data analysis. Boston: Duxbury Press. Page 62.