Introduction

The ToothGrowth data set in the datasets package in R contains data on how Vitamin C affects tooth growth in guinea pigs. There are three columns in the data set: tooth length, the delivery method of the Vitamin C (either orange juice or ascorbic acid), and the dose of Vitamin C in milligrams. Let’s load the data and see what it looks like.

library(datasets)
library(ggplot2)
data(ToothGrowth)
ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) + 
        geom_point(alpha = .7, size = 5) +
        ylab("Tooth length") + xlab("Vitamin C dose (mg)") +
        ggtitle("Vitamin C and Tooth Length in Guinea Pigs") + 
        scale_colour_discrete(name="Delivery Method",
                         breaks=c("OJ", "VC"),
                         labels=c("orange juice", "ascorbic acid")) +
        theme(legend.justification=c(1,0), legend.position=c(1,0))

From this graph, it looks like higher doses of Vitamin C may result in longer teeth in guinea pigs, but let’s use more rigorous analysis to examine this further.

Dosage Differences

First I take the original data frame and make three vectors, one per dose. Each vector contains the 20 tooth length measurements taken at that dose.

tlensmall <- ToothGrowth[ToothGrowth$dose == 0.5, 'len']
tlenmed <- ToothGrowth[ToothGrowth$dose == 1.0, 'len']
tlenlarge <- ToothGrowth[ToothGrowth$dose == 2.0, 'len']
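
As a quick sanity check on the group sizes stated above, the counts per dose can be tabulated directly. This is a minimal sketch; the names `dose_counts` and `len_by_dose` are my own and do not appear in the original analysis, and `split()` is shown only as an alternative way to build the same three vectors in one step.

```r
library(datasets)
data(ToothGrowth)

# Count the observations at each dose level.
dose_counts <- table(ToothGrowth$dose)
print(dose_counts)

# Alternative to three separate subsetting calls: a named list with one
# vector of tooth lengths per dose.
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)
print(sapply(len_by_dose, length))
```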

Now I can do a two-sample t-test on any pair of these three vectors. Let’s start by comparing the medium dose to the small dose.

t.test(tlenmed, tlensmall, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  tlenmed and tlensmall
## t = 6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.276219 11.983781
## sample estimates:
## mean of x mean of y 
##    19.735    10.605

The t-test output reports the means of the two samples being compared and tells us how strong the evidence is for a difference between them. Notice the 95% confidence interval on the difference between the means: it stretches from 6.28 to 11.98 and does not include zero, so the difference between the two sample means is significant at the 95% confidence level. The guinea pigs’ teeth were longer, on average, at the medium dose than at the small dose. Also notice the small p-value (1.268e-07); the mean tooth length is significantly different at the two doses.
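
The same reading of the output can be done programmatically, since `t.test()` returns a list whose `conf.int` and `p.value` components hold the interval and the p-value. The sketch below repeats the subsetting so that it runs on its own; the variable names `res`, `ci`, `excludes_zero`, and `p_below_alpha` are illustrative assumptions, not part of the original analysis.

```r
library(datasets)
data(ToothGrowth)

tlensmall <- ToothGrowth[ToothGrowth$dose == 0.5, "len"]
tlenmed   <- ToothGrowth[ToothGrowth$dose == 1.0, "len"]

# Same Welch two-sample t-test as above, stored instead of printed.
res <- t.test(tlenmed, tlensmall, paired = FALSE, var.equal = FALSE)

# Lower and upper bounds of the 95% confidence interval on the difference.
ci <- res$conf.int

# The difference is significant at the 95% level when the interval
# lies entirely on one side of zero.
excludes_zero <- ci[1] > 0 || ci[2] < 0

# Equivalent check using the p-value and a 0.05 threshold.
p_below_alpha <- res$p.value < 0.05

print(ci)
print(c(excludes_zero = excludes_zero, p_below_alpha = p_below_alpha))
```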

We see similar results when comparing the high dose of Vitamin C to the medium dose.

t.test(tlenlarge, tlenmed, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  tlenlarge and tlenmed
## t = 4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  3.733519 8.996481
## sample estimates:
## mean of x mean of y 
##    26.100    19.735

The difference between the high dose and the medium dose (about 6.4) is not as large as the difference between the medium and the small dose (about 9.1), but it is still significant at the 95% confidence level: the confidence interval, from 3.73 to 9.00, does not include zero. The t-test comparing the largest dose to the smallest dose also shows a significant difference, in fact a more significant one than either of the other two two-sample t-tests. I omit that output here for brevity.
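
For readers who want to reproduce the omitted comparison, it follows the same pattern as the two tests above. This is a sketch with the subsetting repeated so that it runs on its own; `res_large_small` and `mean_diff` are names I have introduced here, and the printed output is left for the reader to inspect rather than reproduced.

```r
library(datasets)
data(ToothGrowth)

tlensmall <- ToothGrowth[ToothGrowth$dose == 0.5, "len"]
tlenlarge <- ToothGrowth[ToothGrowth$dose == 2.0, "len"]

# Welch two-sample t-test of the largest dose against the smallest dose.
res_large_small <- t.test(tlenlarge, tlensmall,
                          paired = FALSE, var.equal = FALSE)
print(res_large_small)

# Difference between the two sample means (large minus small).
mean_diff <- unname(res_large_small$estimate[1] - res_large_small$estimate[2])
print(mean_diff)
```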

Delivery Method Differences

Now let’s examine how the delivery method affected the guinea pigs’ tooth length. I take the original data frame and make two vectors, one per delivery method (orange juice and ascorbic acid), each containing the tooth length measurements for that method. Each of these vectors contains 30 measurements.

tlenOJ <- ToothGrowth[ToothGrowth$supp == 'OJ', 'len']
tlenVC <- ToothGrowth[ToothGrowth$supp == 'VC', 'len']

Now I can do a two-sample t-test comparing these two methods of Vitamin C delivery.

t.test(tlenOJ, tlenVC, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  tlenOJ and tlenVC
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

The mean tooth lengths for these two delivery methods are not significantly different at the 95% confidence level. Notice that the 95% confidence interval includes zero, so a true difference of zero between the mean of the orange juice group and the mean of the ascorbic acid group cannot be ruled out. Also notice that the p-value (0.06063) is larger than 0.05, meaning that a difference this large could plausibly arise from random chance alone at the conventional 5% threshold.

Conclusions

This data set indicates that higher doses of Vitamin C result in longer teeth in guinea pigs, but that the method of delivery of that Vitamin C does not have a statistically significant effect at the 95% confidence level when all doses are pooled. Because I used Welch’s t-test on unpaired samples, this analysis did not depend on the variances being equal across the different doses or delivery methods. However, it does depend on the assumption that the observations are independent and that each group is drawn from an approximately Gaussian distribution; specifically, with samples this small, a t-test does not work well if the underlying distribution that the data are drawn from is strongly skewed.
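
One way to probe the normality assumption mentioned above is a Shapiro-Wilk test on each dose group, using `shapiro.test()` from the stats package. This check was not part of the original analysis; it is a rough sketch, the names `len_by_dose` and `shapiro_p` are my own, and with only 20 observations per group the test has limited power, so a large p-value is weak reassurance rather than proof of normality.

```r
library(datasets)
data(ToothGrowth)

# One vector of tooth lengths per dose.
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

# Shapiro-Wilk p-value for each dose group. A small p-value would
# suggest a departure from normality in that group.
shapiro_p <- sapply(len_by_dose, function(x) shapiro.test(x)$p.value)
print(shapiro_p)
```

A histogram or normal Q-Q plot of each group (for example with `hist()` or `qqnorm()`) would complement this by showing any skew visually.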