In 1952 C. I. Bliss wrote a paper “The Statistics of Biosassay” in which he analyzed the length of odonotoblasts (part of tooth formation) in a population of 60 guinea pigs by giving three different doses of vitamin C by two different vectors to subgroups of the 60 over 42 days. Utimately the goal was to determine the best method of delivering vitamin C to soldiers who did not have much access to natural sources of the vitamin.
After loading the dataset we find that it is 60 rows (guinea pigs) by 3 columns.
The 3 columns include the length of the odontoblasts, the supplement (vector) by which the vitamin C was administered, and the dosage of vitamin C that the guinea pig received.
In the data it is the case that the dosage is considered a numerical value rather than a discrete factor (level). Hence it is useful to change the value in the “dose” column to be a factor rather than a number.
Now we can perform a brief summary of the data.
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
We see two vectors (OJ=orange juice and VC=synthetic vitamin C) and three dosage levels measured in milligrams per day (.5, 1, 2). The odontoblast lengths are measured in microns (millionths of a meter) and range from 4.2 to 33.9. The table shows how many guinea pigs received each combination of vector and dosage. [Note that running the R command str(tg1) gives no extra useful data.]
There are 10 guinea pigs for each combination (\(2 \cdotp 3 \cdotp 10 = 60\)). Here’s a basic histogram of the length data with the normal cureve in red.
That’s an argument in favor of a t-test (which is robust to departures from normality).
Perhaps we could learn something by grouping the data by Supp and Dose.
Here are box plots for the length grouped by supp and dose.
OK. that looks like length will increase as dosage increases and be higher for orange juice delivery than for synthetic vitamin C delivery.
The plots aren’t entirely clear on the relationship between the doses or between the supplement vectors. An interaction plot would be useful here.
The data must be subsetted for analysis. We find means after grouping, first by supp, then by does, then by combinations of supp and dose. The code is in the appendix.
Because the subsets of data are taken from the same population we assume equal variance for all subsets. None are paired because we are dealing with 60 independent guinea pigs. We will use a confidence interval of 95% and an alpha of 5%. The null hypothesis is that the means of the groups are equal. For any p value \(> .05\) we cannot reject the null hypothesis that the means are equivalent. For smaller p values we reject the null hypothesis in favor of the alternative hypothesis that the means of the groups are different.
Let’s compare the means of the various groups. In all cases the mean difference is taken to be the second list minus the first list, so a negative difference means the first list is lower than the second.
| Lower Bound | Upper Bound | P value | |
|---|---|---|---|
| Overall Dose & Supp | |||
| OJ vs VC | -0.1670064 | 7.5670064 | 0.06039337 |
| .5 vs 1 | -11.9837477 | -6.2762523 | 0.00000013 |
| .5 vs 2 | -18.1535190 | -12.8364810 | 0.00000000 |
| 1 vs 2 | -8.9943868 | -3.7356132 | 0.00001811 |
| Same Supp, Different Dose | |||
| OJ: .5 vs 1 | -13.4108143 | -5.5291857 | 0.00008358 |
| OJ: .5 vs 2 | -16.2782226 | -9.3817774 | 0.00000034 |
| OJ: 1 vs 2 | -6.5005017 | -0.2194983 | 0.03736280 |
| VC: .5 vs 1 | -11.2643455 | -6.3156545 | 0.00000065 |
| VC: .5 vs 2 | -21.8328433 | -14.4871567 | 0.00000000 |
| VC: 1 vs 2 | -12.9689598 | -5.7710402 | 0.00003398 |
| Same Dose, Different Supp | |||
| .5: OJ vs VC | 1.7702616 | 8.7297383 | 0.00530366 |
| 1: OJ vs VC | 2.8406920 | 9.0193080 | 0.00078073 |
| 2: OJ vs VC | -3.7229985 | 3.5629986 | 0.96370978 |
We see that in most cases the interval is negative and the p value is low. This means that we reject the null hypothesis of no diffference in means in favor of the alternative hypothesis of a negative relationship (the mean of the second is larger than the mean of the first). There are two exceptions -
While the data are not normally distributed the t test is still valid and applicable.
Bring in the data.
tg <- ToothGrowth
Set Dose to a factor
tg1 <- tg
tg1$dose <- sapply(tg1$dose, as.factor)
How much data do we have?
dim(tg)
## [1] 60 3
names(tg)
## [1] "len" "supp" "dose"
Code for subsetting.
library(dplyr)
By supp.
tgOJ <- tg1 %>%
filter(supp == "OJ") %>%
select(len)
tgVC <- tg1 %>%
filter(supp == "VC") %>%
select(len)
By dose.
tg.5 <- tg1 %>%
filter(dose == "0.5") %>%
select(len)
tg01 <- tg1 %>%
filter(dose == "1") %>%
select(len)
tg02 <- tg1 %>%
filter(dose == "2") %>%
select(len)
By supp and dose.
tgOJ.5 <- tg1 %>%
filter(supp == "OJ", dose == "0.5") %>%
select(len)
tgOJ01 <- tg1 %>%
filter(supp == "OJ", dose == "1") %>%
select(len)
tgOJ02 <- tg1 %>%
filter(supp == "OJ", dose == "2") %>%
select(len)
tgVC.5 <- tg1 %>%
filter(supp == "VC", dose == "0.5") %>%
select(len)
tgVC01 <- tg1 %>%
filter(supp == "VC", dose == "1") %>%
select(len)
tgVC02 <- tg1 %>%
filter(supp == "VC", dose == "2") %>%
select(len)
t.test code.
OJvVC <- t.test(tgOJ, tgVC, var.equal = TRUE)
halfv01 <- t.test(tg.5, tg01, var.equal = TRUE)
halfv02 <- t.test(tg.5, tg02, var.equal = TRUE)
O1v02 <- t.test(tg01, tg02, var.equal = TRUE)
OJ.5vOJ01 <- t.test(tgOJ.5, tgOJ01, var.equal = TRUE)
OJ.5vOJ02 <- t.test(tgOJ.5, tgOJ02, var.equal = TRUE)
OJ01vOJ02 <- t.test(tgOJ01, tgOJ02, var.equal = TRUE)
VC.5vVC01 <- t.test(tgVC.5, tgVC01, var.equal = TRUE)
VC.5vVC02 <- t.test(tgVC.5, tgVC02, var.equal = TRUE)
VC01vVC02 <- t.test(tgVC01, tgVC02, var.equal = TRUE)
OJ.5vVC.5 <- t.test(tgOJ.5, tgVC.5, var.equal = TRUE)
OJ01vVC01 <- t.test(tgOJ01, tgVC01, var.equal = TRUE)
OJ02vVC02 <- t.test(tgOJ02, tgVC02, var.equal = TRUE)
Libraries for the table of results
library(knitr)
library(kableExtra)
Generating the table of results.
x <-
t(matrix(
data = c(
OJvVC$conf.int,
OJvVC$p.value,
halfv01$conf.int,
halfv01$p.value,
halfv02$conf.int,
halfv02$p.value,
O1v02$conf.int,
O1v02$p.value,
OJ.5vOJ01$conf.int,
OJ.5vOJ01$p.value,
OJ.5vOJ02$conf.int,
OJ.5vOJ02$p.value,
OJ01vOJ02$conf.int,
OJ01vOJ02$p.value,
VC.5vVC01$conf.int,
VC.5vVC01$p.value,
VC.5vVC02$conf.int,
VC.5vVC02$p.value,
VC01vVC02$conf.int,
VC01vVC02$p.value,
OJ.5vVC.5$conf.int,
OJ.5vVC.5$p.value,
OJ01vVC01$conf.int,
OJ01vVC01$p.value,
OJ02vVC02$conf.int,
OJ02vVC02$p.value
),
nrow = 3
))
rownames(x) <- c(
"OJ vs VC",
".5 vs 1",
".5 vs 2",
"1 vs 2",
"OJ: .5 vs 1",
"OJ: .5 vs 2",
"OJ: 1 vs 2",
"VC: .5 vs 1",
"VC: .5 vs 2",
"VC: 1 vs 2",
".5: OJ vs VC",
"1: OJ vs VC",
"2: OJ vs VC"
)
kable(
x,
digits = 4,
row.names = TRUE,
col.names = c("Lower Bound", "Upper Bound", "P value"),
align = "l",
caption = "Confidence Intervals and P values",
label = NULL,
escape = TRUE
) %>%
kable_styling("striped") %>%
add_header_above(c(
" " = 1,
"Confidence Interval" = 2,
" " = 1
))