Analysis of Tooth Growth Data

Overview

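We analyze the ToothGrowth data in the R datasets package, which records tooth length in guinea pigs given vitamin C as orange juice (“OJ”) or ascorbic acid (“VC”). We compare the tooth length for different combinations of supplement type and dose amount (0.5, 1.0 or 2.0 mg) using Welch’s t test, the unequal-variance form of Student’s t test.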

Exploratory Data Analysis

We load the ToothGrowth data and perform some basic exploratory data analysis.

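library(dplyr)    # summarise(), group_by()
library(lattice)  # histogram()
# The stats package (t.test(), var()) is attached by default in R.

# attach() puts len, supp and dose on the search path; the compare() calls
# further down rely on this when they refer to these columns by bare name.
attach(ToothGrowth)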

Plot histograms of the tooth length for each type of supplement (“OJ” or “VC”):

histogram(~len|factor(supp),data=ToothGrowth,
          main="Figure 1: Length by Supplement",xlab="Length")

Plot histograms of the tooth length for each dose amount (0.5, 1.0 or 2.0 mg):

histogram(~len|factor(dose),data=ToothGrowth,
          main="Figure 2: Length by Dose",xlab="Length")

Plot histograms of the tooth length for each combination of supplement type and dose amount:

histogram(~len|factor(dose)+factor(supp),data=ToothGrowth,
          main="Figure 3: Length by Supplement and Dose",xlab="Length")

The distributions shown in this section differ visibly between groups. We quantify these differences in the next section, chiefly through the sample mean and variance of each group.

Summary of the ToothGrowth Data

In this section, we summarise the contents of the ToothGrowth data set.

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

First, we list the number of observations (n_sample), sample mean, sample variance, sample standard deviation, and standard error of the mean (std_mean_sample, the sample standard deviation divided by the square root of n_sample) for all tooth lengths in the ToothGrowth data, pooled over supplement types and dose amounts:

alldata <- summarise(ToothGrowth,
                     n_sample = length(len),
                     mean_sample = mean(len),
                     var_sample = var(len),
                     std_sample = sd(len),
                     std_mean_sample = std_sample/sqrt(n_sample))
print(alldata)

##   n_sample mean_sample var_sample std_sample std_mean_sample
## 1       60    18.81333   58.51202   7.649315       0.9875223

We list the same statistics grouped by supplement type, corresponding to Figure 1:

grouped_data_supp <- summarise(group_by(ToothGrowth, supp),
                               n_sample = length(len),
                               mean_sample = mean(len),
                               var_sample = var(len),
                               std_sample = sd(len),
                               std_mean_sample = std_sample/sqrt(n_sample))
print(grouped_data_supp)

## Source: local data frame [2 x 6]
## 
##   supp n_sample mean_sample var_sample std_sample std_mean_sample
## 1   OJ       30    20.66333   43.63344   6.605561        1.206005
## 2   VC       30    16.96333   68.32723   8.266029        1.509163

We list the same statistics grouped by dose amount, corresponding to Figure 2:

grouped_data_dose <- summarise(group_by(ToothGrowth, dose),
                               n_sample = length(len),
                               mean_sample = mean(len),
                               var_sample = var(len),
                               std_sample = sd(len),
                               std_mean_sample = std_sample/sqrt(n_sample))
print(grouped_data_dose)

## Source: local data frame [3 x 6]
## 
##   dose n_sample mean_sample var_sample std_sample std_mean_sample
## 1  0.5       20      10.605   20.24787   4.499763       1.0061776
## 2  1.0       20      19.735   19.49608   4.415436       0.9873216
## 3  2.0       20      26.100   14.24421   3.774150       0.8439257

We list the same statistics grouped by each combination of supplement type and dose amount, corresponding to Figure 3:

grouped_data_suppdose <- summarise(group_by(ToothGrowth, supp, dose),
                                   n_sample = length(len),
                                   mean_sample = mean(len),
                                   var_sample = var(len),
                                   std_sample = sd(len),
                                   std_mean_sample = std_sample/sqrt(n_sample))
print(grouped_data_suppdose)

## Source: local data frame [6 x 7]
## Groups: supp
## 
##   supp dose n_sample mean_sample var_sample std_sample std_mean_sample
## 1   OJ  0.5       10       13.23  19.889000   4.459709       1.4102837
## 2   OJ  1.0       10       22.70  15.295556   3.910953       1.2367520
## 3   OJ  2.0       10       26.06   7.049333   2.655058       0.8396031
## 4   VC  0.5       10        7.98   7.544000   2.746634       0.8685620
## 5   VC  1.0       10       16.77   6.326778   2.515309       0.7954104
## 6   VC  2.0       10       26.14  23.018222   4.797731       1.5171757
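The summaries in this section all compute the same five statistics. As a minimal sketch, the same numbers can be cross-checked in base R without dplyr. The helper name describe_len and the variables overall and by_group are illustrative and not part of the original analysis.

```r
# Hypothetical helper: the five summary statistics used in this section.
describe_len <- function(x) {
  n <- length(x)
  c(n_sample = n,
    mean_sample = mean(x),
    var_sample = var(x),
    std_sample = sd(x),
    std_mean_sample = sd(x) / sqrt(n))  # standard error of the mean
}

# Pooled over all supplement types and dose amounts
overall <- describe_len(ToothGrowth$len)
print(round(overall, 4))

# One row per supplement/dose combination; the statistics are returned
# as a matrix column named len
by_group <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = describe_len)
print(by_group)
```

Because aggregate() stores the statistics as a matrix column, a single value is read with, for example, by_group$len[1, "mean_sample"].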

Comparison of Tooth Lengths by Supplement and Dose

We now use confidence intervals and Welch’s t test (the unequal-variance form of Student’s t test) to compare the tooth length for different combinations of supplement type and dose amount, using the t.test() function in R. Each group contains between 10 and 30 observations, too few to rely on the central limit theorem alone, so we assume that the underlying data are approximately normally distributed. When comparing two groups of different supplement types or dose amounts, we treat the two groups as independent, so we use an unpaired rather than a paired t test, and we do not assume that the variances of the two groups are equal.
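With unequal variances, t.test() uses the Welch form of the test. The sketch below computes the statistic, the approximate degrees of freedom and the 95% confidence interval by hand for OJ versus VC, pooled over all doses, and compares them with t.test(). The variable names (x1, x2, se, t_stat, df_welch, ci, reference) are illustrative.

```r
# Tooth lengths for the two supplement types, pooled over all doses
x1 <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
x2 <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
v1 <- var(x1) / length(x1)
v2 <- var(x2) / length(x2)

# Welch's t statistic: difference in means over its standard error
se <- sqrt(v1 + v2)
t_stat <- (mean(x1) - mean(x2)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))

# Two-sided 95% confidence interval for the difference in means
ci <- (mean(x1) - mean(x2)) + c(-1, 1) * qt(0.975, df_welch) * se

# Compare with the built-in test (var.equal = FALSE is the default)
reference <- t.test(x1, x2)
print(c(t = t_stat, df = df_welch, CI_lower = ci[1], CI_upper = ci[2]))
print(all.equal(t_stat, unname(reference$statistic)))
```

The degrees of freedom are generally not an integer, which is why the df column in the results table contains fractional values.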

ttestdf <- data.frame()

# Run a two-sample t test on the tooth lengths selected by two logical conditions
compare <- function(cond1, cond2){
        x1 <- ToothGrowth[which(cond1), "len"]
        x2 <- ToothGrowth[which(cond2), "len"]
        t.test(x1, x2) # conf.level = 0.95, paired = FALSE, var.equal = FALSE by default
}

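# Append the key results of one t test as a new row of ttestdf
addtodataframe <- function(ttestdf, ttestresults){
        # h0 is "TRUE" when the 95% confidence interval contains 0,
        # i.e. when we fail to reject the null hypothesis of equal means
        htest <- ifelse(ttestresults$conf.int[1] < 0 & ttestresults$conf.int[2] > 0, "TRUE", "FALSE")

        # cbind() on mixed numeric and character values yields character columns
        rbind(ttestdf, with(ttestresults, cbind(
                mean1 = round(estimate[1],2), mean2 = round(estimate[2],2),
                t = round(statistic, 2), df = round(parameter,2),
                p_value = signif(p.value,2),
                CI_lower = round(conf.int[1],2), CI_upper = round(conf.int[2],2),
                h0 = htest)))
}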

ttestresults <- compare(dose == 1.0, dose == 0.5)
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(dose == 1.0 & supp == "OJ", dose == 0.5 & supp == "OJ")
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(dose == 1.0 & supp == "VC", dose == 0.5 & supp == "VC")
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(dose == 2.0, dose == 1.0)
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(dose == 2.0 & supp == "OJ", dose == 1.0 & supp == "OJ")
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(dose == 2.0 & supp == "VC", dose == 1.0 & supp == "VC")
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(supp == "OJ", supp == "VC")
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(supp == "OJ" & dose == 0.5, supp == "VC" & dose == 0.5)
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(supp == "OJ" & dose == 1.0, supp == "VC" & dose == 1.0)
ttestdf <- addtodataframe(ttestdf, ttestresults)
ttestresults <- compare(supp == "OJ" & dose == 2.0, supp == "VC" & dose == 2.0)
ttestdf <- addtodataframe(ttestdf, ttestresults)



row.names(ttestdf) <- seq_len(nrow(ttestdf))

# Label each row: pop1 and pop2 are the two populations being compared,
# and group is the subset of the data within which they are compared
group <- c("all", "OJ", "VC", "all", "OJ", "VC", "all", "0.5", "1.0", "2.0")
ttestdf <- cbind(group, ttestdf)
pop2 <- c("0.5", "0.5", "0.5", "1.0", "1.0", "1.0", "VC", "VC", "VC", "VC")
ttestdf <- cbind(pop2, ttestdf)
pop1 <- c("1.0", "1.0", "1.0", "2.0", "2.0", "2.0", "OJ", "OJ", "OJ", "OJ")
ttestdf <- cbind(pop1, ttestdf)

print(ttestdf)

##    pop1 pop2 group mean1 mean2     t    df p_value CI_lower CI_upper    h0
## 1   1.0  0.5   all 19.73 10.61  6.48 37.99 1.3e-07     6.28    11.98 FALSE
## 2   1.0  0.5    OJ  22.7 13.23  5.05  17.7 8.8e-05     5.52    13.42 FALSE
## 3   1.0  0.5    VC 16.77  7.98  7.46 17.86 6.8e-07     6.31    11.27 FALSE
## 4   2.0  1.0   all  26.1 19.73   4.9  37.1 1.9e-05     3.73        9 FALSE
## 5   2.0  1.0    OJ 26.06  22.7  2.25 15.84   0.039     0.19     6.53 FALSE
## 6   2.0  1.0    VC 26.14 16.77  5.47  13.6 9.2e-05     5.69    13.05 FALSE
## 7    OJ   VC   all 20.66 16.96  1.92 55.31   0.061    -0.17     7.57  TRUE
## 8    OJ   VC   0.5 13.23  7.98  3.17 14.97  0.0064     1.72     8.78 FALSE
## 9    OJ   VC   1.0  22.7 16.77  4.03 15.36   0.001      2.8     9.06 FALSE
## 10   OJ   VC   2.0 26.06 26.14 -0.05 14.04    0.96     -3.8     3.64  TRUE
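In the table, h0 is TRUE when the 95% confidence interval contains 0. For a two-sided t test this is equivalent to a p-value above 0.05. The sketch below checks this for the OJ versus VC comparison at each dose, using the formula interface of t.test() so that it does not depend on attach(). The names doses, check, ci_has_zero and fail_to_reject are illustrative.

```r
# Compare OJ with VC separately at each dose amount
doses <- c(0.5, 1.0, 2.0)

check <- do.call(rbind, lapply(doses, function(d) {
  # len ~ supp splits the tooth lengths by supplement type (OJ first, then VC)
  res <- t.test(len ~ supp, data = subset(ToothGrowth, dose == d))
  data.frame(dose = d,
             p_value = signif(res$p.value, 2),
             # same criterion as the h0 column
             ci_has_zero = res$conf.int[1] < 0 & res$conf.int[2] > 0,
             # equivalent criterion based on the p-value
             fail_to_reject = res$p.value > 0.05)
}))

print(check)
```

The two logical columns agree in every row, so h0 can be read as "we fail to reject the null hypothesis of equal means at the 5% significance level".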

Conclusions

Each group contains between 10 and 30 observations, so we assumed that the underlying data are approximately normally distributed in order to apply the t test. When comparing two groups of different supplement types or dose amounts, we treated the two groups as independent, so we used an unpaired rather than a paired t test, and we did not assume that the variances of the two groups are equal.

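Comparing the tooth length for the two supplement types, OJ and VC, pooled over all dose amounts, we find no statistically significant difference at the 95% confidence level. The 95% confidence interval for the difference in means, [-0.17, 7.57], contains the value 0 (p = 0.061), so we fail to reject the null hypothesis that the mean tooth lengths are equal. This does not show that the means are equal, only that the data do not rule it out.

If we compare the tooth length for the OJ and VC supplements at a dose of 2.0 mg, we again find no significant difference. The 95% confidence interval for the difference in means, [-3.8, 3.64], contains the value 0 (p = 0.96), so we fail to reject the null hypothesis.

At doses of 0.5 mg and 1.0 mg, however, the confidence intervals [1.72, 8.78] and [2.8, 9.06] exclude 0, so OJ is associated with significantly longer teeth than VC at those doses. Every comparison between adjacent dose amounts, whether pooled or within a single supplement type, also has a confidence interval that lies entirely above 0, so tooth length increases significantly with dose.

Therefore, we find no evidence at the 95% confidence level that the effects of the OJ and VC supplements differ at a dose of 2.0 mg or when pooled over all doses, but OJ gives longer teeth than VC at the two lower doses.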