Overview

We analyze the ToothGrowth data set included in the R datasets package. ToothGrowth records the effect of vitamin C on tooth growth in guinea pigs. P-values from Student's t-test (due to W. S. Gosset) show that the two vitamin C delivery methods yield distinct results at the two lower dosage levels, but at the highest dose studied the data do not allow us to say which method, if either, is more effective.

Exploratory Analysis

According to the R documentation, the ToothGrowth data set consists of 60 observations, one per guinea pig, of the length of odontoblasts (cells responsible for tooth growth). Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (coded as “OJ”) or ascorbic acid (a form of vitamin C and coded as “VC”). The data frame has three columns: the length values (len), the supplement type (supp), and the dose of the supplement (dose). Here we load the data and make an exploratory plot that shows how the raw values scatter and overlap for the VC and OJ groups:

library(datasets)
data(ToothGrowth)
plot(ToothGrowth$dose, ToothGrowth$len, col=ToothGrowth$supp, xlab='dose (mg/day)', 
     ylab='length')
legend('bottomright', c("OJ", "VC"), pch=1, col=c('black', 'red'), bty='o')

We calculate the mean values for the two populations to see if we can identify any trends:

# Subset by supplement type, then compute the mean length at each dose
tgVC <- ToothGrowth[ToothGrowth$supp == "VC", ]
mean_len_VC <- aggregate(tgVC$len, by = list(dose = tgVC$dose), FUN = mean)
head(mean_len_VC)
tgOJ <- ToothGrowth[ToothGrowth$supp == "OJ", ]
mean_len_OJ <- aggregate(tgOJ$len, by = list(dose = tgOJ$dose), FUN = mean)
head(mean_len_OJ)

We rename the “x” column that results from the aggregate() call to “len,” to reflect the column name in the original ToothGrowth data frame, and plot the aggregated values so that we can get a cleaner picture:

names(mean_len_VC)[names(mean_len_VC) == 'x'] <- 'len'
names(mean_len_OJ)[names(mean_len_OJ) == 'x'] <- 'len'
plot(mean_len_VC$dose, mean_len_VC$len, col=2, xlab='dose (mg/day)', ylab='length')
points(mean_len_OJ$dose, mean_len_OJ$len, col=1)
legend('bottomright', c("OJ", "VC"), pch=1, col=c('black', 'red'), bty='o')
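
As a possible shortcut, the per-group means can also be obtained without subsetting first. The sketch below uses the formula interface of aggregate() and, alternatively, tapply(); the object names mean_len and mean_tab are illustrative and do not appear in the analysis above.

```r
library(datasets)

# Mean length for every supplement/dose combination in a single call.
# With the formula interface the result column is already named "len",
# so no renaming of an "x" column is needed.
mean_len <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(mean_len)

# The same means arranged as a supplement-by-dose table.
mean_tab <- with(ToothGrowth,
                 tapply(len, list(supp = supp, dose = dose), mean))
print(round(mean_tab, 2))
```

The table layout makes it easy to compare the OJ and VC rows column by column, one dose at a time.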

The averages alone indicate that tooth growth is greater for the OJ supplement than for VC at the two lower dose levels, but the average tooth growth is nearly identical at the 2 mg/day dosage. Because the raw data show considerable scatter, we need a formal test to determine whether these differences in means are statistically significant.

T Test Analysis

To determine whether the differences in means are statistically significant, we use Student's t-test (due to W. S. Gosset), because our sample sizes, at 10 guinea pigs per group, are too small to justify a Z test.
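
To illustrate why the sample size matters, the sketch below compares two-sided 5% critical values of the t distribution with the corresponding normal (Z) value. It assumes two groups of 10 animals, for which the degrees of freedom of an unequal-variance t-test lie between 9 and 18; the names n, df_bounds, t_crit, and z_crit are illustrative.

```r
# Two groups of n = 10 observations each.
n <- 10

# For the unequal-variance t-test the degrees of freedom fall between
# n - 1 (very unequal variances) and 2n - 2 (equal variances).
df_bounds <- c(lower = n - 1, upper = 2 * n - 2)

# Two-sided 5% critical values: t distribution versus standard normal.
t_crit <- qt(0.975, df = df_bounds)
z_crit <- qnorm(0.975)
print(round(c(t_crit, z = z_crit), 3))
```

The t critical values exceed the normal one, so a Z test applied to samples this small would overstate the significance of a given difference.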

To choose between the equal-variance and unequal-variance forms of the t-test, we first look at the variances of the two groups at each dose, first for the VC group and then for the OJ group:

var_len_VC<-aggregate(tgVC$len, by=list(dose = tgVC$dose), FUN=var )
head(var_len_VC)
var_len_OJ<-aggregate(tgOJ$len, by=list(dose = tgOJ$dose), FUN=var )
head(var_len_OJ)
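
For a side-by-side view, the same variances can be arranged in one table. This is a minimal sketch; var_tab and var_ratio are illustrative names, and the ratio of the larger to the smaller variance is only an informal summary, not a formal test of equal variances.

```r
library(datasets)

# Variance of length for every supplement/dose combination.
var_tab <- with(ToothGrowth,
                tapply(len, list(supp = supp, dose = dose), var))
print(round(var_tab, 2))

# Ratio of the larger to the smaller group variance at each dose.
var_ratio <- apply(var_tab, 2, function(v) max(v) / min(v))
print(round(var_ratio, 2))
```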

We see that the variances differ between the two groups: the VC data are much more spread out at the 2 mg/day dose, while the OJ data have larger variances at the two lower doses. We therefore run an unpaired t-test with unequal variances (Welch's t-test) at each dose level, starting with dose = 0.5 mg/day:

tgVC05 <- tgVC[which(tgVC$dose == 0.5),]
tgOJ05 <- tgOJ[which(tgOJ$dose == 0.5),]
t.test(tgVC05$len, tgOJ05$len, paired = FALSE, var.equal = FALSE)$p.value
## [1] 0.006358607

We get a p-value of 0.006359. Because t.test() is two-sided by default, this is the probability, under the null hypothesis that the difference in means is zero, of observing a difference at least as extreme as the one in our data, in either direction. At a dosage of 0.5 mg/day, this small p-value is strong evidence that mean tooth growth differs between the two delivery methods, and the group means show that growth is greater for vitamin C delivered via orange juice (OJ) than via ascorbic acid (VC).
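
To make explicit what t.test() computes when var.equal = FALSE, the sketch below reproduces the p-value at 0.5 mg/day by hand, using the unequal-variance (Welch) t statistic and the Welch-Satterthwaite approximation for the degrees of freedom. The variable names (vc, oj, se2_vc, se2_oj, t_stat, df, p_manual, p_builtin) are illustrative.

```r
library(datasets)

# Lengths for the two supplement groups at the 0.5 mg/day dose.
vc <- subset(ToothGrowth, supp == "VC" & dose == 0.5)$len
oj <- subset(ToothGrowth, supp == "OJ" & dose == 0.5)$len

# Squared standard error of each group mean.
se2_vc <- var(vc) / length(vc)
se2_oj <- var(oj) / length(oj)

# Welch t statistic: difference in means over its standard error.
t_stat <- (mean(vc) - mean(oj)) / sqrt(se2_vc + se2_oj)

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (se2_vc + se2_oj)^2 /
  (se2_vc^2 / (length(vc) - 1) + se2_oj^2 / (length(oj) - 1))

# Two-sided p-value, compared with the value reported by t.test().
p_manual <- 2 * pt(-abs(t_stat), df)
p_builtin <- t.test(vc, oj, paired = FALSE, var.equal = FALSE)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```

The sign of t_stat carries the direction of the difference: it is negative here because the VC mean is smaller than the OJ mean.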

Likewise, we obtain an even smaller p-value for a dosage of 1 mg/day, again providing strong evidence that orange juice leads to greater tooth growth at this dose:

tgVC1 <- tgVC[which(tgVC$dose == 1),]
tgOJ1 <- tgOJ[which(tgOJ$dose == 1),]
t.test(tgVC1$len, tgOJ1$len, paired = FALSE, var.equal = FALSE)$p.value
## [1] 0.001038376

However, for the dosage at 2 mg/day, we obtain a p-value of 0.9638516, so the data provide essentially no evidence against the null hypothesis that the means of the two groups are equal:

tgVC2 <- tgVC[which(tgVC$dose == 2),]
tgOJ2 <- tgOJ[which(tgOJ$dose == 2),]
t.test(tgVC2$len, tgOJ2$len, paired = FALSE, var.equal = FALSE)$p.value
## [1] 0.9638516

In other words, with such a high p-value, we fail to reject the null hypothesis that the means of the two groups at the 2 mg/day dose are equal. This does not prove that the two delivery methods are equally effective at that dose; it only means that, based on the data we have, we cannot say that orange juice is any more effective than ascorbic acid at 2 mg/day.
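
The three tests can also be collected into a single summary. The sketch below loops over the dose levels and records, for each one, the difference in means (OJ minus VC), the 95% confidence interval for that difference, and the p-value from the same unequal-variance t-test used above. The results data frame and its column names are illustrative.

```r
library(datasets)

# One unequal-variance t-test per dose level, gathered into a data frame.
results <- do.call(rbind, lapply(sort(unique(ToothGrowth$dose)), function(d) {
  oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == d]
  vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == d]
  tt <- t.test(oj, vc, paired = FALSE, var.equal = FALSE)
  data.frame(dose = d,
             diff_OJ_minus_VC = mean(oj) - mean(vc),
             ci_lower = tt$conf.int[1],
             ci_upper = tt$conf.int[2],
             p_value = tt$p.value)
}))
print(results, digits = 4)
```

A confidence interval that excludes zero corresponds to a p-value below 0.05, while an interval that contains zero, as at 2 mg/day, is consistent with no difference between the delivery methods.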