We’re going to analyze the ToothGrowth data in the R datasets package using confidence intervals and hypothesis tests to compare tooth growth by supp and dose.
library(datasets)
tg <- ToothGrowth
The ?ToothGrowth command gives us the details on this dataset. The purpose of the experiment was to test the effect of vitamin C on tooth growth in guinea pigs.
“The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).”
The format is a data frame with 3 variables:
[,1]  len   numeric  Tooth length
[,2]  supp  factor   Supplement type (VC or OJ).
[,3]  dose  numeric  Dose in milligrams/day
We can also look at some summary tables on the dataframe to find out a bit more:
str(tg)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(tg)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
The structure (str) summary shows that dose is stored as a numeric. Since the experiment used only three dose levels, we want to treat it as a factor.
tg$dose <- as.factor(tg$dose)
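A quick check, sketched here as a suggestion rather than as part of the original analysis, can confirm the conversion and show how the 60 animals split across the supplement and dose groups. The name cell_counts is only illustrative.

```r
library(datasets)

# Rebuild the working copy so this block runs on its own
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# The three dose values should now be factor levels
levels(tg$dose)

# Cross-tabulate supplement by dose to see the group sizes
cell_counts <- table(tg$supp, tg$dose)
cell_counts
```

Equal group sizes are worth knowing before comparing groups, because they determine how many observations each later t-test works with.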
Before testing anything, we’ll plot the data to form some hypotheses.
First, we’ll ask: does dose matter?
plot(len ~ dose, data = tg)
It looks like it could: tooth length appears to increase with dose.
Second, we’ll ask: does the type of supplement matter?
plot(len ~ supp, data = tg)
It also looks like it could: OJ appears to give somewhat longer teeth than VC.
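To put numbers behind the two plots, one option is to tabulate the mean length for every combination of supplement and dose. This is a minimal sketch that assumes the same tg data frame as above; group_means is an illustrative name, not something from the original analysis.

```r
library(datasets)

# Rebuild the working copy so this block runs on its own
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Mean tooth length for each supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_means
```

Reading the table row by row shows whether the dose effect and the supplement effect seen in the plots hold within each group, not just overall.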
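So, which hypotheses need testing? First, that the supplement type affects tooth length. Second, that a higher dose results in more growth for each supplement.
We’ll start with a Welch two-sample t-test of length by supplement type:
# Independent groups, unequal variances (paired defaults to FALSE)
t.test(len ~ supp, var.equal = FALSE, data = tg)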
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
It’s close, but the p-value of 0.06 is above 0.05 and the 95% confidence interval (-0.17 to 7.57) includes zero, so at the 5% significance level we can’t conclude that OJ results in more growth than VC.
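To see where the t statistic, degrees of freedom, and interval in that output come from, the Welch test can be reproduced by hand. This is a sketch for illustration only; the variable names (oj, vc, se2_oj, se2_vc, t_stat, df, p_value, ci) are assumptions, not part of the original analysis.

```r
library(datasets)
tg <- ToothGrowth

oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error contributed by each group: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * sqrt(se2_oj + se2_vc)

c(t = t_stat, df = df, p.value = p_value)
ci
```

The values should agree with the t.test output above, which also makes it clear that the confidence interval and the p-value are two views of the same calculation.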
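Next, we test whether dose matters within the OJ group by comparing the lowest dose (0.5 mg/day) with the highest (2 mg/day):
# Keep only OJ animals at the 0.5 and 2 mg/day doses
tg_oj <- subset(tg, supp == "OJ" & dose != 1)
t.test(len ~ dose, var.equal = FALSE, data = tg_oj)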
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.335241 -9.324759
## sample estimates:
## mean in group 0.5 mean in group 2
## 13.23 26.06
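The p-value is far below 0.05 and the 95% confidence interval (-16.34 to -9.32) excludes zero, so we reject the null hypothesis: with OJ, the 2 mg/day dose is associated with more growth than the 0.5 mg/day dose.
We repeat the same comparison within the VC group:
# Keep only VC animals at the 0.5 and 2 mg/day doses
tg_vc <- subset(tg, supp == "VC" & dose != 1)
t.test(len ~ dose, var.equal = FALSE, data = tg_vc)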
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -10.388, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.90151 -14.41849
## sample estimates:
## mean in group 0.5 mean in group 2
## 7.98 26.14
This test also has a very small p-value, and its 95% confidence interval (-21.90 to -14.42) excludes zero, so the same conclusion holds for VC.
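Since the two dose tests differ only in the supplement group, they can be run through one small helper and collected side by side. The sketch below is a convenience, not the original method; dose_test and dose_results are hypothetical names introduced for illustration.

```r
library(datasets)

# Rebuild the working copy so this block runs on its own
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Hypothetical helper: Welch test of 0.5 vs 2 mg/day within one supplement
dose_test <- function(supp_level) {
  d <- subset(tg, supp == supp_level & dose != 1)
  res <- t.test(len ~ dose, data = d, var.equal = FALSE)
  data.frame(supp = supp_level,
             ci_low = res$conf.int[1],
             ci_high = res$conf.int[2],
             p_value = res$p.value)
}

# One row per supplement, with the interval and p-value for each test
dose_results <- do.call(rbind, lapply(c("OJ", "VC"), dose_test))
dose_results
```

Pulling conf.int and p.value out of the test objects avoids copying numbers from the printed output and keeps both comparisons in one table.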
I’ve concluded:
At the 5% significance level, we can’t conclude that OJ promotes more growth than VC when all doses are pooled.
There is strong evidence that a 2 mg/day dose promotes more growth than a 0.5 mg/day dose, for both VC and OJ.