1 Introduction

The objective of this report is to present an exploratory and statistical analysis of the ToothGrowth data in the R datasets package.

For this, we have:

  1. Loaded the ToothGrowth data and provided a basic summary of the data,

  2. Performed some exploratory data analysis,

  3. Performed statistical analyses based on t-tests and t confidence intervals to compare tooth growth by supplement and dose, and

  4. Stated our conclusions and the assumptions needed for the conclusions.

The rest of the report is organized accordingly.

2 Loading and looking at the data

We load the data in R and look at its summary and at a few of the observations:

data(ToothGrowth)
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
head(ToothGrowth, 3);tail(ToothGrowth,3)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
##     len supp dose
## 58 27.3   OJ    2
## 59 29.4   OJ    2
## 60 23.0   OJ    2

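Together with the R help (?ToothGrowth), this tells us that the data set has 60 observations of three variables: len (tooth length), supp (supplement type, OJ for orange juice or VC for ascorbic acid), and dose (0.5, 1, or 2 mg/day).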
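
The structure of the data can also be checked directly. The following is a minimal sketch using base R only; the object name design is introduced here for illustration.

```r
data(ToothGrowth)

# Column types: len and dose are numeric, supp is a factor with levels OJ and VC
str(ToothGrowth)

# Cross-tabulate supplement type against dose to count observations per group
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

Each of the six supplement-by-dose groups should contain 10 observations, which is why the later t-tests all compare two samples of size 10.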

3 Exploratory Data Analysis

We choose a violin plot to explore the data graphically. The shape of each ‘violin’ represents the distribution of the data (tooth length) at each dosage.

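library(ggplot2)
# Facet strip labels for the two supplement types
supp_labels <- c(OJ = "Supplement type = Orange Juice",
                 VC = "Supplement type = Vitamin C")
ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = supp)) +
  # labeller() with a named vector replaces the deprecated (variable, value) labeller function
  facet_grid(. ~ supp, labeller = labeller(supp = supp_labels)) +
  # trim = FALSE lets the density tails extend beyond the observed range
  geom_violin(size = 1, trim = FALSE) +
  labs(x = "Dose in milligrams", y = "Tooth length",
       title = "Effect of Vitamin C on Tooth Growth in Guinea Pigs") +
  # The facets already identify the supplement, so the fill legend is redundant
  theme(legend.position = "none")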

From the plot, it appears that:

  1. Tooth length increases with the dosage of Vitamin C.

  2. At the lower dosages, delivery through orange juice seems to give longer teeth than direct administration.

Let us see what the data looks like when grouped by dosage and method of delivery.

library(dplyr)
summarise(group_by(ToothGrowth, supp, dose), mean(len), sd(len))
## Source: local data frame [6 x 4]
## Groups: supp
## 
##   supp dose mean(len)  sd(len)
## 1   OJ  0.5     13.23 4.459709
## 2   OJ  1.0     22.70 3.910953
## 3   OJ  2.0     26.06 2.655058
## 4   VC  0.5      7.98 2.746634
## 5   VC  1.0     16.77 2.515309
## 6   VC  2.0     26.14 4.797731

This shows that mean tooth length increases with dosage for both OJ and VC. At the same dose, mean tooth length is higher for OJ, except at 2 mg, where the two means are comparable.
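
The gap between the two supplement types at each dose can be made explicit by subtracting the group means. This is a small sketch in base R; the names cell_means and oj_minus_vc are introduced here for illustration.

```r
data(ToothGrowth)

# Mean tooth length for every supplement-by-dose combination
# (rows: supplement type, columns: dose)
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))

# Difference in mean length, OJ minus VC, at each dose
oj_minus_vc <- cell_means["OJ", ] - cell_means["VC", ]
print(round(oj_minus_vc, 2))
```

From the means tabulated above, the differences are 5.25 at 0.5 mg, 5.93 at 1 mg, and -0.08 at 2 mg.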

4 Statistical Analysis (Hypothesis testing)

We now use t-tests and the associated confidence intervals to test the hypotheses suggested by the exploratory analysis in the previous section:

  1. Increase in Vitamin C dosage causes faster tooth growth.

  2. Administering Vitamin C through orange juice causes faster tooth growth than direct Vitamin C administration.

4.1 Testing the effect of dosage

Our null hypothesis is: there is no effect of the dosage of Vitamin C on tooth growth. To test this independently of the delivery method, we test it separately for each supplement type, once for direct administration and once for orange juice.

4.1.1 For direct administration of Vitamin C

We subset the data for the three dosages for supplement “VC”. We then perform the t-test with the argument paired = FALSE, as we cannot reasonably assume that the same guinea pigs were given the different dosages and had their tooth growth measured each time.

g_05_vc <- ToothGrowth$len[ToothGrowth$dose==0.5 & ToothGrowth$supp=="VC"]
g_10_vc <- ToothGrowth$len[ToothGrowth$dose==1 & ToothGrowth$supp=="VC"]
g_20_vc <- ToothGrowth$len[ToothGrowth$dose==2 & ToothGrowth$supp=="VC"]
t.test(g_10_vc,g_05_vc,paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  g_10_vc and g_05_vc
## t = 7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.314288 11.265712
## sample estimates:
## mean of x mean of y 
##     16.77      7.98
t.test(g_20_vc,g_10_vc,paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  g_20_vc and g_10_vc
## t = 5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   5.685733 13.054267
## sample estimates:
## mean of x mean of y 
##     26.14     16.77

In both tests the p-value is very small (less than 0.1%), the t value is large, and the 95% confidence interval does not include zero. Hence we can reject the null hypothesis. Since the ‘mean of x’ (the higher dosage) is greater than the ‘mean of y’ in each test, the data indicate that a higher dosage gives greater tooth growth.
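
The quantities that t.test reports can be reproduced by hand, which makes explicit what the Welch test computes. The sketch below is illustrative only: the function welch_by_hand and its argument names are introduced here and are not part of the original analysis.

```r
# Welch two-sample t statistic, degrees of freedom, two-sided p-value and
# confidence interval, computed from two numeric samples x and y
welch_by_hand <- function(x, y, conf_level = 0.95) {
  vx <- var(x) / length(x)            # squared standard error of mean(x)
  vy <- var(y) / length(y)            # squared standard error of mean(y)
  se <- sqrt(vx + vy)                 # standard error of the difference in means
  t_stat <- (mean(x) - mean(y)) / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), df)
  half_width <- qt(1 - (1 - conf_level) / 2, df) * se
  list(t = t_stat, df = df, p_value = p_value,
       conf_int = (mean(x) - mean(y)) + c(-1, 1) * half_width)
}

data(ToothGrowth)
vc <- subset(ToothGrowth, supp == "VC")
by_hand <- welch_by_hand(vc$len[vc$dose == 1], vc$len[vc$dose == 0.5])
str(by_hand)
```

For the 1 mg versus 0.5 mg comparison, the result should agree with the first t.test output above (t = 7.4634, df = 17.862).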

4.1.2 For administration through Orange Juice

g_05_oj <- ToothGrowth$len[ToothGrowth$dose==0.5 & ToothGrowth$supp=="OJ"]
g_10_oj <- ToothGrowth$len[ToothGrowth$dose==1 & ToothGrowth$supp=="OJ"]
g_20_oj <- ToothGrowth$len[ToothGrowth$dose==2 & ToothGrowth$supp=="OJ"]
t.test(g_10_oj,g_05_oj,paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  g_10_oj and g_05_oj
## t = 5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   5.524366 13.415634
## sample estimates:
## mean of x mean of y 
##     22.70     13.23
t.test(g_20_oj,g_10_oj,paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  g_20_oj and g_10_oj
## t = 2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1885575 6.5314425
## sample estimates:
## mean of x mean of y 
##     26.06     22.70

Here the conclusion is similar: we reject the null hypothesis in favour of the hypothesis that, with orange juice, tooth growth is higher at a higher Vitamin C dosage. One point to note is that for the higher pair of dosages (2 mg versus 1 mg), the p-value is considerably larger than in the other tests, at about 4% (though it is still below the 5% level required to reject the null hypothesis).

4.2 Testing the effect of supplement type

Our null hypothesis is: there is no difference in tooth growth between administering Vitamin C directly and giving it through orange juice. To test this independently of the effect of dosage, we test it three times, once for each dosage.

4.2.1 For dosages of 0.5 mg and 1 mg.

t.test(g_05_oj,g_05_vc,paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  g_05_oj and g_05_vc
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean of x mean of y 
##     13.23      7.98
t.test(g_10_oj,g_10_vc,paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  g_10_oj and g_10_vc
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean of x mean of y 
##     22.70     16.77

For these cases the t values are about 3.2 and 4.0 respectively, and the p-values are small, both below 1%. So we can reject the null hypothesis, and since the mean for OJ is higher than that for VC, we conclude that administering Vitamin C through orange juice has a more beneficial effect on tooth growth than direct administration at these dosages.

4.2.2 For dosage of 2 mg.

t.test(g_20_oj,g_20_vc,paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  g_20_oj and g_20_vc
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

Here the t value is very close to zero and the p-value is more than 96%. The 95% confidence interval also includes zero. We therefore cannot reject the null hypothesis: we find no significant difference in tooth growth between the two modes of administration at this dosage.
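
The three supplement comparisons in this section can also be run in a single pass. This is a sketch of one possible approach, using the formula interface of t.test on the data split by dose; the names by_dose, supp_tests and supp_summary are introduced here for illustration.

```r
data(ToothGrowth)

# Split the data by dose and run a Welch t-test of len by supp in each part.
# supp has levels OJ and VC, so the estimated difference is OJ minus VC.
by_dose <- split(ToothGrowth, ToothGrowth$dose)
supp_tests <- lapply(by_dose, function(d) t.test(len ~ supp, data = d))

# Collect the p-value and the 95% confidence interval for each dose
supp_summary <- t(sapply(supp_tests, function(tt) {
  c(p_value = tt$p.value, ci_lower = tt$conf.int[1], ci_upper = tt$conf.int[2])
}))
print(signif(supp_summary, 4))
```

Each row of supp_summary corresponds to one dose and should match the individual t.test outputs shown in this section.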

5 Conclusions

  1. For both supplement types, there is a significant difference in tooth length between dosages. Tooth length increases as the dosage increases.

  2. For dosages of 0.5 mg and 1 mg, administering Vitamin C through orange juice gives significantly longer teeth than direct administration.

  3. For a dosage of 2 mg, there is no significant difference in tooth length between the supplement types.

These conclusions can be seen at one glance through this graphic produced by the example code in ?ToothGrowth:

coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,
       xlab = "ToothGrowth data: length vs dose, given type of supplement")

5.0.1 Assumptions

In the analyses above, the following assumptions were used:

  1. No guinea pig was used for more than one measurement. That is, each animal was given exactly one dosage of Vitamin C through exactly one supplement type. Because of this, we use paired = FALSE in the t-tests.

  2. The variances of the populations being compared are not assumed to be equal (all the t-tests use var.equal = FALSE, the default for t.test, which gives the Welch test).
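
To see how much the second assumption matters, one comparison can be rerun with and without the equal-variance assumption. This is a minimal sketch for the 2 mg versus 1 mg comparison under direct administration; the object names welch and pooled are introduced here for illustration.

```r
data(ToothGrowth)
vc <- subset(ToothGrowth, supp == "VC")
x <- vc$len[vc$dose == 2]
y <- vc$len[vc$dose == 1]

welch  <- t.test(x, y, var.equal = FALSE)  # Welch test, as used in this report
pooled <- t.test(x, y, var.equal = TRUE)   # assumes a common variance

# With equal group sizes the two t statistics coincide; only the degrees of
# freedom, and hence the p-value and interval width, differ.
print(c(t_welch = unname(welch$statistic), t_pooled = unname(pooled$statistic)))
print(c(df_welch = unname(welch$parameter), df_pooled = unname(pooled$parameter)))
```

Because every group here has 10 observations, the choice affects only the degrees of freedom (about 13.6 for Welch against 18 for the pooled test), so the conclusions above do not hinge on it.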