Overview

This report analyzes the ToothGrowth data in the R datasets package. It explores whether the data show differences in tooth growth depending on the supplement administered and its dosage. The analysis finds a significant difference in growth between supplements at some, but not all, dosages. The appendix contains supporting R code and information.

Exploratory Data Analysis

The ToothGrowth data consists of 60 observations, 30 for each of two supplements (“OJ” meaning orange juice, and “VC” meaning vitamin C), at three dosage levels (0.5, 1, and 2) with 10 observations per supplement and dosage combination. There are no NA values in the data. See Appendix for details.

The mean and standard deviation for each supplement and dosage combination are shown below.

## Source: local data frame [6 x 5]
## Groups: dose [?]
## 
##    dose   supp     n  mean     sd
##   (dbl) (fctr) (int) (dbl)  (dbl)
## 1   0.5     OJ    10 13.23 4.4597
## 2   0.5     VC    10  7.98 2.7466
## 3   1.0     OJ    10 22.70 3.9110
## 4   1.0     VC    10 16.77 2.5153
## 5   2.0     OJ    10 26.06 2.6551
## 6   2.0     VC    10 26.14 4.7977
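The report does not show the code behind this table. The output header suggests dplyr was used, but as a minimal sketch the same statistics can be computed in base R. The name `group_stats` is an assumption introduced here for illustration.

```r
# Per-group sample size, mean, and standard deviation of tooth length.
# Base R sketch; the original report appears to have used dplyr instead.
data(ToothGrowth)

group_stats <- aggregate(
  len ~ dose + supp,
  data = ToothGrowth,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics in one matrix column; flatten it
group_stats <- do.call(data.frame, group_stats)
names(group_stats) <- c("dose", "supp", "n", "mean", "sd")

# Order rows by dose, then supplement, to match the table above
group_stats <- group_stats[order(group_stats$dose, group_stats$supp), ]
print(group_stats, digits = 5, row.names = FALSE)
```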

Hypothesis Testing

The research question is whether orange juice or vitamin C is the better supplement for increasing tooth growth, and at which dosage levels. A two-sample t-test was run at each dosage level to see whether the mean observed tooth measurements differ significantly between the supplements. For each dosage level, the null hypothesis is that the difference in means between the supplement types is zero, and the alternative hypothesis is that the means are different. The tests use a 0.05 significance level, corresponding to the 95% confidence intervals reported. The t-test results are below.

## 
##  Two Sample t-test for dosage=0.5
## 
## data:  len by supp
## t = 3.17, df = 18, p-value = 0.0053
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.7703 8.7297
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
## 
##  Two Sample t-test for dosage=1
## 
## data:  len by supp
## t = 4.03, df = 18, p-value = 0.00078
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.8407 9.0193
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
## 
##  Two Sample t-test for dosage=2
## 
## data:  len by supp
## t = -0.0461, df = 18, p-value = 0.96
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.723  3.563
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

The results support rejecting the null hypothesis for dose levels 0.5 and 1, given t values of 3.16973 and 4.03277 and p values of 0.0053 and 0.00078, respectively. Both 95% confidence intervals also lie entirely above zero. Only for dose level 2 does the test fail to support rejecting the null hypothesis, with a near-zero t value of -0.04614 and a large p value of 0.96371. The 95% confidence interval for that dose, which is nearly centered on zero, is further evidence.
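The code for these tests is not shown in the report. The integer df of 18 and the "Two Sample t-test" title are consistent with a pooled-variance test, so the sketch below assumes `var.equal = TRUE`. The names `tests` and `summary_table` are introduced here for illustration only.

```r
# One two-sample t-test per dose level, comparing OJ against VC.
# Assumption: pooled variance (var.equal = TRUE), which gives df = 18.
data(ToothGrowth)

doses <- sort(unique(ToothGrowth$dose))
tests <- lapply(doses, function(d) {
  t.test(len ~ supp, data = subset(ToothGrowth, dose == d), var.equal = TRUE)
})
names(tests) <- doses

# Collect the key figures from each test into one table
summary_table <- data.frame(
  dose    = doses,
  t       = sapply(tests, function(x) unname(x$statistic)),
  df      = sapply(tests, function(x) unname(x$parameter)),
  p_value = sapply(tests, function(x) x$p.value),
  ci_low  = sapply(tests, function(x) x$conf.int[1]),
  ci_high = sapply(tests, function(x) x$conf.int[2])
)
print(summary_table, digits = 4, row.names = FALSE)
```

Dropping `var.equal = TRUE` would run Welch's test instead, which does not assume equal variances and reports fractional degrees of freedom.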

A permutation test was conducted to further test the strength of the evidence for rejecting the null hypothesis in the strongest case: dosage level 1. For this test, 10,000 random permutations of the supplement assignments were applied to the data, and the difference in means was calculated for each. A plot of the results is below.

The observed mean difference of 5.93 is very unlikely to be the result of random chance (p value = 0.0006), since nearly all of the permutation results fall well short of the observed mean difference.
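The permutation code is not included in the report, so the following is a sketch of one common way to run such a test. The function name `perm_diff_test`, the seed, and the choice to report both one-sided and two-sided p values are assumptions, since the report does not say which kind of p value it used.

```r
# Permutation test for the OJ - VC difference in mean length at one dose.
# Hypothetical helper; the seed is arbitrary and only fixes the random draw.
data(ToothGrowth)

perm_diff_test <- function(dose_level, n_perm = 10000) {
  d <- subset(ToothGrowth, dose == dose_level)
  mean_diff <- function(labels) {
    mean(d$len[labels == "OJ"]) - mean(d$len[labels == "VC"])
  }
  observed <- mean_diff(d$supp)
  # Shuffle the supplement labels and recompute the difference each time
  perm_diffs <- replicate(n_perm, mean_diff(sample(d$supp)))
  list(
    observed    = observed,
    perm_diffs  = perm_diffs,
    p_one_sided = mean(perm_diffs >= observed),
    p_two_sided = mean(abs(perm_diffs) >= abs(observed))
  )
}

set.seed(42)
res <- perm_diff_test(1)
res$observed
res$p_one_sided
res$p_two_sided
```

Calling `hist(res$perm_diffs)` followed by `abline(v = res$observed)` would draw the permutation distribution with the observed difference marked. Exact p values vary slightly with the seed.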

Conclusion

This analysis has shown that orange juice is a better supplement than vitamin C for tooth growth at dosage levels 0.5 and 1. At dose level 2, the results of the two supplements are indistinguishable (see Appendix for further evidence).

A further look at the data (see chart in Appendix) shows a potential area of future research. Mean tooth growth under vitamin C rises by a similar amount at each step in dose level (7.98, 16.77, 26.14), which suggests a steady upward trend, and dosages above 2 might show continuing increases in tooth growth. Under orange juice, however, the gains shrink from one dose level to the next (13.23, 22.70, 26.06), which suggests its effect may taper off at or beyond dose level 2. It could be that level 2 is a cross-over point after which increasing vitamin C doses continue to show increasing effects on tooth growth while increasing orange juice doses show less or none.
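To make the trend comparison concrete, the sketch below computes the change in mean tooth length between consecutive dose levels for each supplement. The names `means` and `gains` are introduced here for illustration. Each step (0.5 to 1, and 1 to 2) doubles the dose, so equal gains per step would indicate a trend that is linear in dose level rather than in the raw dose.

```r
# Gain in mean tooth length between consecutive dose levels, per supplement.
data(ToothGrowth)

# 3 x 2 matrix of group means: rows are doses, columns are supplements
means <- tapply(
  ToothGrowth$len,
  list(dose = ToothGrowth$dose, supp = ToothGrowth$supp),
  mean
)

# Difference between successive rows within each column
gains <- apply(means, 2, diff)
rownames(gains) <- c("0.5 -> 1", "1 -> 2")

print(round(means, 2))
print(round(gains, 2))
```

With only three dose levels, these differences can suggest a pattern but cannot establish the shape of either curve.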

Appendix

Summary of Raw Data

The following R code provides summary information for the ToothGrowth data.

# *** Structure of the ToothGrowth data ***
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# *** Summary of the data ***
summary(ToothGrowth)
##       len       supp         dose     
##  Min.   : 4.2   OJ:30   Min.   :0.50  
##  1st Qu.:13.1   VC:30   1st Qu.:0.50  
##  Median :19.2           Median :1.00  
##  Mean   :18.8           Mean   :1.17  
##  3rd Qu.:25.3           3rd Qu.:2.00  
##  Max.   :33.9           Max.   :2.00
# *** First and last several observations ***
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
tail(ToothGrowth)
##     len supp dose
## 55 24.8   OJ    2
## 56 30.9   OJ    2
## 57 26.4   OJ    2
## 58 27.3   OJ    2
## 59 29.4   OJ    2
## 60 23.0   OJ    2
# *** Look for NA entries in the data ***
colSums(is.na(ToothGrowth))
##  len supp dose 
##    0    0    0

The following is a chart of the raw data.
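The chart and its code are not reproduced in this text. As a rough sketch, a comparable base R plot of the raw observations with group means overlaid could be drawn as follows. The function name `plot_raw_data`, the colors, and the layout are assumptions and may differ from the original chart.

```r
# Scatter plot of tooth length by dose, colored by supplement,
# with a line through the group means for each supplement.
data(ToothGrowth)

plot_raw_data <- function(df = ToothGrowth) {
  cols <- c(OJ = "darkorange", VC = "steelblue")
  # Jitter the dose slightly so overlapping points stay visible
  plot(
    jitter(df$dose, amount = 0.03), df$len,
    col = cols[as.character(df$supp)], pch = 19,
    xlab = "Dose", ylab = "Tooth length (len)",
    main = "ToothGrowth: length by dose and supplement"
  )
  means <- tapply(df$len, list(df$dose, df$supp), mean)
  doses <- as.numeric(rownames(means))
  for (s in colnames(means)) {
    lines(doses, means[, s], col = cols[s], lwd = 2)
  }
  legend("bottomright", legend = names(cols), col = cols, pch = 19, lwd = 2)
  invisible(means)  # return the plotted group means without printing them
}
```

Calling `plot_raw_data()` draws the chart on the active graphics device and invisibly returns the matrix of group means.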

Permutation Testing of Dose Level 2

The same permutation test that was run for dosage level 1 was repeated for the worst-case scenario: dosage level 2. The code and results are below.

Not rejecting the null hypothesis in this case is the right call, since the distribution of the mean differences for the permutations is centered very close to the observed difference of -0.08 (p value = 0.5117). In other words, regardless of how the observations are labeled with regard to supplement used, the results are statistically the same.