Overview

In 1952 C. I. Bliss wrote a paper “The Statistics of Biosassay” in which he analyzed the length of odonotoblasts (part of tooth formation) in a population of 60 guinea pigs by giving three different doses of vitamin C by two different vectors to subgroups of the 60 over 42 days. Utimately the goal was to determine the best method of delivering vitamin C to soldiers who did not have much access to natural sources of the vitamin.

Overall Data Summary

After loading the dataset we find that it is 60 rows (guinea pigs) by 3 columns.

The 3 columns include the length of the odontoblasts, the supplement (vector) by which the vitamin C was administered, and the dosage of vitamin C that the guinea pig received.

In the data it is the case that the dosage is considered a numerical value rather than a discrete factor (level). Hence it is useful to change the value in the “dose” column to be a factor rather than a number.

Now we can perform a brief summary of the data.

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

We see two vectors (OJ=orange juice and VC=synthetic vitamin C) and three dosage levels measured in milligrams per day (.5, 1, 2). The odontoblast lengths are measured in microns (millionths of a meter) and range from 4.2 to 33.9. The table shows how many guinea pigs received each combination of vector and dosage. [Note that running the R command str(tg1) gives no extra useful data.]

There are 10 guinea pigs for each combination (\(2 \cdotp 3 \cdotp 10 = 60\)). Here’s a basic histogram of the length data with the normal cureve in red.

That’s an argument in favor of a t-test (which is robust to departures from normality).

Perhaps we could learn something by grouping the data by Supp and Dose.

Here are box plots for the length grouped by supp and dose.

OK. that looks like length will increase as dosage increases and be higher for orange juice delivery than for synthetic vitamin C delivery.

The plots aren’t entirely clear on the relationship between the doses or between the supplement vectors. An interaction plot would be useful here.

Data Preparation - Subsets

The data must be subsetted for analysis. We find means after grouping, first by supp, then by does, then by combinations of supp and dose. The code is in the appendix.

Because the subsets of data are taken from the same population we assume equal variance for all subsets. None are paired because we are dealing with 60 independent guinea pigs. We will use a confidence interval of 95% and an alpha of 5%. The null hypothesis is that the means of the groups are equal. For any p value \(> .05\) we cannot reject the null hypothesis that the means are equivalent. For smaller p values we reject the null hypothesis in favor of the alternative hypothesis that the means of the groups are different.

Analysis

Let’s compare the means of the various groups. In all cases the mean difference is taken to be the second list minus the first list, so a negative difference means the first list is lower than the second.

Confidence Intervals and P values
Confidence Interval
Lower Bound Upper Bound P value
Overall Dose & Supp
OJ vs VC -0.1670064 7.5670064 0.06039337
.5 vs 1 -11.9837477 -6.2762523 0.00000013
.5 vs 2 -18.1535190 -12.8364810 0.00000000
1 vs 2 -8.9943868 -3.7356132 0.00001811
Same Supp, Different Dose
OJ: .5 vs 1 -13.4108143 -5.5291857 0.00008358
OJ: .5 vs 2 -16.2782226 -9.3817774 0.00000034
OJ: 1 vs 2 -6.5005017 -0.2194983 0.03736280
VC: .5 vs 1 -11.2643455 -6.3156545 0.00000065
VC: .5 vs 2 -21.8328433 -14.4871567 0.00000000
VC: 1 vs 2 -12.9689598 -5.7710402 0.00003398
Same Dose, Different Supp
.5: OJ vs VC 1.7702616 8.7297383 0.00530366
1: OJ vs VC 2.8406920 9.0193080 0.00078073
2: OJ vs VC -3.7229985 3.5629986 0.96370978

We see that in most cases the interval is negative and the p value is low. This means that we reject the null hypothesis of no diffference in means in favor of the alternative hypothesis of a negative relationship (the mean of the second is larger than the mean of the first). There are two exceptions -

  1. Orange Juice vs. Synthetic Vitamin C (over all dosages) wherein the relationship is only statistically significant at a confidence level of 6.04% or higher.
  2. Dosage 2: Orange Juics vs. Vitamin C wherein the vector did not make a statistically significant difference between the means of the lengths almost at all (p value near 1).

Conclusion

While the data are not normally distributed the t test is still valid and applicable.

Appendix

Bring in the data.

tg <- ToothGrowth

Set Dose to a factor

tg1 <- tg
tg1$dose <- sapply(tg1$dose, as.factor)

How much data do we have?

dim(tg)
## [1] 60  3
names(tg)
## [1] "len"  "supp" "dose"

Code for subsetting.

library(dplyr)

By supp.

tgOJ <- tg1 %>%
        filter(supp == "OJ") %>%
        select(len)
tgVC <- tg1 %>%
        filter(supp == "VC") %>%
        select(len)

By dose.

tg.5 <- tg1 %>%
        filter(dose == "0.5") %>%
        select(len)
tg01 <- tg1 %>%
        filter(dose == "1") %>%
        select(len)
tg02 <- tg1 %>%
        filter(dose == "2") %>%
        select(len)

By supp and dose.

tgOJ.5 <- tg1 %>%
        filter(supp == "OJ", dose == "0.5") %>%
        select(len)
tgOJ01 <- tg1 %>%
        filter(supp == "OJ", dose == "1") %>%
        select(len)
tgOJ02 <- tg1 %>%
        filter(supp == "OJ", dose == "2") %>%
        select(len)
tgVC.5 <- tg1 %>%
        filter(supp == "VC", dose == "0.5") %>%
        select(len)
tgVC01 <- tg1 %>%
        filter(supp == "VC", dose == "1") %>%
        select(len)
tgVC02 <- tg1 %>%
        filter(supp == "VC", dose == "2") %>%
        select(len)

t.test code.

OJvVC <- t.test(tgOJ, tgVC, var.equal = TRUE)
halfv01 <- t.test(tg.5, tg01, var.equal = TRUE)
halfv02 <- t.test(tg.5, tg02, var.equal = TRUE)
O1v02 <- t.test(tg01, tg02, var.equal = TRUE)
OJ.5vOJ01 <- t.test(tgOJ.5, tgOJ01, var.equal = TRUE)
OJ.5vOJ02 <- t.test(tgOJ.5, tgOJ02, var.equal = TRUE)
OJ01vOJ02 <- t.test(tgOJ01, tgOJ02, var.equal = TRUE)
VC.5vVC01 <- t.test(tgVC.5, tgVC01, var.equal = TRUE)
VC.5vVC02 <- t.test(tgVC.5, tgVC02, var.equal = TRUE)
VC01vVC02 <- t.test(tgVC01, tgVC02, var.equal = TRUE)
OJ.5vVC.5 <- t.test(tgOJ.5, tgVC.5, var.equal = TRUE)
OJ01vVC01 <- t.test(tgOJ01, tgVC01, var.equal = TRUE)
OJ02vVC02 <- t.test(tgOJ02, tgVC02, var.equal = TRUE)

Libraries for the table of results

library(knitr)
library(kableExtra)

Generating the table of results.

x <-
        t(matrix(
        data = c(
        OJvVC$conf.int,
        OJvVC$p.value,
        halfv01$conf.int,
        halfv01$p.value,
        halfv02$conf.int,
        halfv02$p.value,
        O1v02$conf.int,
        O1v02$p.value,
        OJ.5vOJ01$conf.int,
        OJ.5vOJ01$p.value,
        OJ.5vOJ02$conf.int,
        OJ.5vOJ02$p.value,
        OJ01vOJ02$conf.int,
        OJ01vOJ02$p.value,
        VC.5vVC01$conf.int,
        VC.5vVC01$p.value,
        VC.5vVC02$conf.int,
        VC.5vVC02$p.value,
        VC01vVC02$conf.int,
        VC01vVC02$p.value,
        OJ.5vVC.5$conf.int,
        OJ.5vVC.5$p.value,
        OJ01vVC01$conf.int,
        OJ01vVC01$p.value,
        OJ02vVC02$conf.int,
        OJ02vVC02$p.value
        ),
        nrow = 3
        ))
        rownames(x) <- c(
        "OJ vs VC",
        ".5 vs 1",
        ".5 vs 2",
        "1 vs 2",
        "OJ: .5 vs 1",
        "OJ: .5 vs 2",
        "OJ: 1 vs 2",
        "VC: .5 vs 1",
        "VC: .5 vs 2",
        "VC: 1 vs 2",
        ".5: OJ vs VC",
        "1: OJ vs VC",
        "2: OJ vs VC"
        )
        
        kable(
               x,
                digits = 4,
                row.names = TRUE,
                col.names = c("Lower Bound", "Upper Bound", "P value"),
                align = "l",
                caption = "Confidence Intervals and P values",
                label = NULL,
                escape = TRUE
                ) %>%
        kable_styling("striped") %>%
        add_header_above(c(
                " " = 1,
                "Confidence Interval" = 2,
                " " = 1
        ))