Each animal in a group of 60 guinea pigs received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC). The response is the length of the odontoblasts (cells responsible for tooth growth) measured in each animal.
The goal of this analysis is to explore this data by following the steps below:
The data for this analysis come with the R datasets package, where they are also documented (run ?ToothGrowth in R to open the documentation).
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
As we can see in the summary above, the data frame has 60 observations on 3 variables. According to the documentation, the variables are:
| Variable | Type | Description |
|---|---|---|
| len | numeric | Tooth length. |
| supp | factor | Supplement type (VC or OJ). |
| dose | numeric | Dose in milligrams/day. |
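As a quick sanity check on the structure described above, the sketch below counts how many animals fall into each supplement-by-dose cell. It reads the unmodified copy of the data through datasets::ToothGrowth; the names tg and cell_counts are illustrative only and not part of the original analysis.

```r
# Cross-tabulate supplement type against dose using the original dataset.
tg <- datasets::ToothGrowth
cell_counts <- table(supplement = tg$supp, dose = tg$dose)
print(cell_counts)

# TRUE if every one of the 2 x 3 = 6 cells holds the same number of animals,
# i.e. the design is balanced.
all(cell_counts == cell_counts[1, 1])
```

A balanced design matters later on, because every pairwise comparison then involves two groups of the same size.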
Let’s have a look at the first 6 rows of the data frame:
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
To make the data easier to understand, let’s rename the variable “len” to “length” and the variable “supp” to “supplement”, and replace the codes VC and OJ with “ascorbic acid” and “orange juice” respectively.
library(dplyr)  # provides rename() and recode()
ToothGrowth <- rename(ToothGrowth, length = len, supplement = supp)
ToothGrowth$supplement <- recode(ToothGrowth$supplement, OJ = "orange juice",
                                 VC = "ascorbic acid")
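For readers without dplyr installed, the same renaming and relabelling can be sketched in base R. This is only an equivalent illustration under the assumption that the columns still carry their original names; it works on a separate copy called tg (a hypothetical name) rather than on the data frame used in the rest of the analysis.

```r
# Work on a copy so the original dataset stays untouched.
tg <- datasets::ToothGrowth

# Rename the columns by matching on their current names.
names(tg)[names(tg) == "len"]  <- "length"
names(tg)[names(tg) == "supp"] <- "supplement"

# Relabel the factor levels through a lookup, so each code stays tied to
# its label regardless of the order in which the levels are stored.
new_labels <- c(OJ = "orange juice", VC = "ascorbic acid")
levels(tg$supplement) <- unname(new_labels[levels(tg$supplement)])

str(tg)
```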
As a first step in exploring the data, let’s plot the points in a graph showing how tooth length varies with the dose of vitamin C and the delivery method.
library(ggplot2)
ggplot(data = ToothGrowth, aes(x = dose, y = length, col = supplement)) +
  geom_point() +
  scale_color_brewer(palette = "Set1") +
  ggtitle("FIGURE 1: Tooth length X dose of vitamin C") +
  geom_smooth(method = lm, se = TRUE, na.rm = TRUE, alpha = 0.3)
This first figure suggests a relationship between tooth length and the dose of vitamin C. It also suggests that the delivery method has an impact on tooth length. For smaller doses of vitamin C, administration through orange juice seems to have a bigger impact on tooth length. However, as the dose increases, the impact of ascorbic acid approaches that of orange juice.
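The two regression lines in FIGURE 1 can also be summarised numerically. The sketch below fits one straight line per delivery method and extracts the slopes; it is an illustration on the unmodified dataset (original column names len, supp, dose), and the names fits and slopes are hypothetical.

```r
tg <- datasets::ToothGrowth

# Fit one straight line per delivery method, mirroring what
# geom_smooth(method = lm) does when colour splits the data into groups.
fits <- lapply(split(tg, tg$supp), function(d) lm(len ~ dose, data = d))

# Slope = estimated change in tooth length per additional mg/day.
slopes <- sapply(fits, function(f) coef(f)[["dose"]])
print(round(slopes, 2))
```

A steeper slope for ascorbic acid (VC) than for orange juice (OJ) is the numerical counterpart of the two lines converging at the highest dose.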
In FIGURE 2, we analyse box plots of tooth length per dose of vitamin C.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
ggplot(data = ToothGrowth, aes(x = dose, y = length, fill = supplement)) +
  geom_boxplot(alpha = 0.5) +  # constant transparency belongs outside aes()
  scale_fill_brewer(palette = "Set1") +
  stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
               linetype = "dotted", position = position_dodge(0.75), size = 1,
               width = .76, color = "black") +
  ggtitle("FIGURE 2: Tooth length X dose of vitamin C")
The figure shows that the median and mean (dotted line) tooth length for the group that received 0.5 mg/day through orange juice are much bigger than for the group that received 0.5 mg/day through ascorbic acid. This difference increases in the groups that received 1 mg/day. However, it almost disappears in the groups that received 2 mg/day. In these latter groups, the medians seem to be equal and the mean tooth length of the ascorbic acid group seems to be slightly bigger. The variance of the ascorbic acid group is also bigger at this dose.
Let’s summarise and compare the data to see how close the means are for each dose of vitamin C.
The table below confirms what FIGURE 2 suggests: the mean tooth length of the ascorbic acid group is much lower than that of the orange juice group, except when the dose of vitamin C is 2 mg/day. In this latter case the ascorbic acid group mean is 0.08 greater than the orange juice group mean.
library(reshape2)  # dcast() is assumed here to come from reshape2
dcast(ToothGrowth, dose ~ supplement, value.var = "length", fun.aggregate = mean)
## dose orange juice ascorbic acid
## 1 0.5 13.23 7.98
## 2 1 22.70 16.77
## 3 2 26.06 26.14
The median comparison is even more interesting as both groups have exactly the same median when the dose is equal to 2 mg/day.
dcast(ToothGrowth, dose ~ supplement, value.var = "length", fun.aggregate = median)
## dose orange juice ascorbic acid
## 1 0.5 12.25 7.15
## 2 1 23.45 16.50
## 3 2 25.95 25.95
Another interesting pattern is that the variability in tooth length in the groups that received orange juice seems to decrease as the dose increases. In the groups that received ascorbic acid, the variability does not differ much between the 0.5 and 1 mg/day groups. The 2 mg/day ascorbic acid group, however, has the greatest variability of all.
dcast(ToothGrowth, dose ~ supplement, value.var = "length", fun.aggregate = sd)
## dose orange juice ascorbic acid
## 1 0.5 4.459709 2.746634
## 2 1 3.910953 2.515309
## 3 2 2.655058 4.797731
The data above suggests that there is an interaction between dose and supplement.
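One way to make the suspected interaction concrete is to look at how the gap between the two delivery methods changes with dose. The following sketch recomputes the group means from the unmodified dataset and subtracts them; cell_means and oj_minus_vc are illustrative names.

```r
tg <- datasets::ToothGrowth

# Mean tooth length for every dose (rows) and supplement (columns).
cell_means <- with(tg, tapply(len, list(dose = dose, supp = supp), mean))

# Orange juice minus ascorbic acid at each dose. Without an interaction
# these differences would be roughly constant across doses.
oj_minus_vc <- cell_means[, "OJ"] - cell_means[, "VC"]
print(round(oj_minus_vc, 2))
```

From the table of means shown earlier, the differences are 5.25, 5.93, and -0.08 for 0.5, 1, and 2 mg/day: the advantage of orange juice is present at the two lower doses and vanishes at the highest one.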
When there is an interaction between factors, as there seems to be in this dataset, it is often hard to tell a simple, straightforward story about the data. There is no other choice but to make many pairwise comparisons. However, performing multiple comparisons increases the probability of incorrectly rejecting a true null hypothesis. To limit that risk, we will use the Benjamini-Hochberg correction, which controls the false discovery rate, to adjust our P-values.
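To see why uncorrected multiple testing is a concern, consider the chance of at least one false rejection across the nine comparisons carried out in this analysis. The arithmetic below is a simplified sketch: it assumes every null hypothesis is true and that the tests are independent, which is an idealisation here because the same groups are reused across comparisons.

```r
alpha   <- 0.05  # per-test significance level
n_tests <- 9     # number of pairwise comparisons in this analysis

# Probability of at least one false rejection under the idealised
# assumption of independent tests with all null hypotheses true.
p_any_false_rejection <- 1 - (1 - alpha)^n_tests
print(round(p_any_false_rejection, 3))  # 0.37
```

Under these assumptions the chance of at least one spurious finding is about 37%, far above the nominal 5% of a single test.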
We will compare the two delivery methods among the guinea pigs that received 0.5 mg/day of vitamin C.
t.test.05.oj.aa <- t.test(ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement ==
"orange juice",]$length,
ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement ==
"ascorbic acid",]$length)
t.test.05.oj.aa
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == "orange juice", ]$length and "ascorbic acid", ]$length
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean of x mean of y
## 13.23 7.98
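Since the same comparison pattern recurs for every pair of groups, it can be wrapped in a small function. The helper below, compare_cells, is hypothetical and not part of the original analysis; it is written against the unmodified dataset, so it uses the original column names and the codes OJ and VC.

```r
tg <- datasets::ToothGrowth

# Hypothetical helper: Welch t-test between two (dose, supplement) cells.
compare_cells <- function(data, dose_x, supp_x, dose_y, supp_y) {
  x <- data$len[data$dose == dose_x & data$supp == supp_x]
  y <- data$len[data$dose == dose_y & data$supp == supp_y]
  t.test(x, y)  # var.equal = FALSE by default, i.e. Welch's test
}

# Reproduce the comparison above: 0.5 mg/day, orange juice vs ascorbic acid.
res <- compare_cells(tg, 0.5, "OJ", 0.5, "VC")
res$p.value
```

Passing the data frame explicitly keeps the helper free of global state, and it has the side effect of producing a much shorter "data:" line in the printed test output.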
We will compare the two delivery methods among the guinea pigs that received 1 mg/day of vitamin C.
t.test.10.oj.aa <- t.test(ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement ==
"orange juice",]$length,
ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement ==
"ascorbic acid",]$length)
t.test.10.oj.aa
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement == "orange juice", ]$length and "ascorbic acid", ]$length
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean of x mean of y
## 22.70 16.77
We will compare the two delivery methods among the guinea pigs that received 2 mg/day of vitamin C.
t.test.20.oj.aa <- t.test(ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement ==
"orange juice",]$length,
ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement ==
"ascorbic acid",]$length)
t.test.20.oj.aa
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement == "orange juice", ]$length and "ascorbic acid", ]$length
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean of x mean of y
## 26.06 26.14
We will compare the 0.5 and 1 mg/day doses among the guinea pigs that received vitamin C through orange juice.
t.test.oj.05.10 <- t.test(ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement ==
"orange juice",]$length,
ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement ==
"orange juice",]$length)
t.test.oj.05.10
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement == "orange juice", ]$length and "orange juice", ]$length
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.415634 -5.524366
## sample estimates:
## mean of x mean of y
## 13.23 22.70
We will compare the 0.5 and 2 mg/day doses among the guinea pigs that received vitamin C through orange juice.
t.test.oj.05.20 <- t.test(ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement ==
"orange juice",]$length,
ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement ==
"orange juice",]$length)
t.test.oj.05.20
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement == "orange juice", ]$length and "orange juice", ]$length
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.335241 -9.324759
## sample estimates:
## mean of x mean of y
## 13.23 26.06
We will compare the 1 and 2 mg/day doses among the guinea pigs that received vitamin C through orange juice.
t.test.oj.10.20 <- t.test(ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement ==
"orange juice",]$length,
ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement ==
"orange juice",]$length)
t.test.oj.10.20
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement == "orange juice", ]$length and "orange juice", ]$length
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.5314425 -0.1885575
## sample estimates:
## mean of x mean of y
## 22.70 26.06
We will compare the 0.5 and 1 mg/day doses among the guinea pigs that received vitamin C through ascorbic acid.
t.test.aa.05.10 <- t.test(ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement ==
"ascorbic acid",]$length,
ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement ==
"ascorbic acid",]$length)
t.test.aa.05.10
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement == "ascorbic acid", ]$length and "ascorbic acid", ]$length
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.265712 -6.314288
## sample estimates:
## mean of x mean of y
## 7.98 16.77
We will compare the 0.5 and 2 mg/day doses among the guinea pigs that received vitamin C through ascorbic acid.
t.test.aa.05.20 <- t.test(ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement ==
"ascorbic acid",]$length,
ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement ==
"ascorbic acid",]$length)
t.test.aa.05.20
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 0.5 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement == "ascorbic acid", ]$length and "ascorbic acid", ]$length
## t = -10.388, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.90151 -14.41849
## sample estimates:
## mean of x mean of y
## 7.98 26.14
We will compare the 1 and 2 mg/day doses among the guinea pigs that received vitamin C through ascorbic acid.
t.test.aa.10.20 <- t.test(ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement ==
"ascorbic acid",]$length,
ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement ==
"ascorbic acid",]$length)
t.test.aa.10.20
##
## Welch Two Sample t-test
##
## data: ToothGrowth[ToothGrowth$dose == 1 & ToothGrowth$supplement == and ToothGrowth[ToothGrowth$dose == 2 & ToothGrowth$supplement == "ascorbic acid", ]$length and "ascorbic acid", ]$length
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.054267 -5.685733
## sample estimates:
## mean of x mean of y
## 16.77 26.14
As noted above, performing multiple comparisons increases the probability of incorrectly rejecting a true null hypothesis. To limit that risk, we use the Benjamini-Hochberg correction to adjust our P-values.
pvalues <- c(t.test.05.oj.aa$p.value, t.test.10.oj.aa$p.value, t.test.20.oj.aa$p.value,
t.test.oj.05.10$p.value, t.test.oj.05.20$p.value, t.test.oj.10.20$p.value,
t.test.aa.05.10$p.value, t.test.aa.05.20$p.value, t.test.aa.10.20$p.value)
padjust <- p.adjust(pvalues, method = "BH")
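To make the adjustment less of a black box, the sketch below applies the Benjamini-Hochberg step by hand to the nine p-values as printed in the test outputs above (so they are rounded), and lists which comparisons stay at or above 0.05 afterwards. The vector names are illustrative labels, not objects from the original analysis.

```r
# p-values copied from the nine Welch test outputs, in the same order
# as the pvalues vector (rounded as printed).
p <- c(d05_oj_vs_aa = 0.006359,  d10_oj_vs_aa = 0.001038,  d20_oj_vs_aa = 0.9639,
       oj_05_vs_10  = 8.785e-05, oj_05_vs_20  = 1.324e-06, oj_10_vs_20  = 0.0392,
       aa_05_vs_10  = 6.811e-07, aa_05_vs_20  = 4.682e-08, aa_10_vs_20  = 9.156e-05)

m   <- length(p)
ord <- order(p, decreasing = TRUE)

# Scale the i-th smallest p-value by m / i (largest first), then enforce
# monotonicity with a running minimum and cap the result at 1.
scaled    <- p[ord] * m / seq(m, 1)
bh_sorted <- pmin(cummin(scaled), 1)

# Put the adjusted values back into the original order.
bh_manual <- setNames(bh_sorted[order(ord)], names(p))

print(signif(bh_manual, 3))
names(bh_manual)[bh_manual >= 0.05]
```

With these inputs only the 2 mg/day comparison between delivery methods remains non-significant at the 0.05 level, and the hand-computed values agree with p.adjust(p, method = "BH").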
Considering the adjusted P-values, we have the following results:
ggplot(data = ToothGrowth[ToothGrowth$dose == 0.5,], aes(x = dose, y = length,
fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 3: dose-0.5-orange-juice vs dose-0.5-ascorbic-acid")
ggplot(data = ToothGrowth[ToothGrowth$dose == 1,], aes(x = dose, y = length,
fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 4: dose-1.0-orange-juice vs dose-1.0-ascorbic-acid")
ggplot(data = ToothGrowth[ToothGrowth$dose == 2,], aes(x = dose, y = length,
fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 5: dose-2.0-orange-juice vs dose-2.0-ascorbic-acid")
ggplot(data = ToothGrowth[(ToothGrowth$dose == 0.5 | ToothGrowth$dose == 1.0) &
ToothGrowth$supplement == "orange juice",],
aes(x = dose, y = length, fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 6: orange-juice-dose-0.5 vs orange-juice-dose-1.0")
ggplot(data = ToothGrowth[(ToothGrowth$dose == 0.5 | ToothGrowth$dose == 2.0) &
ToothGrowth$supplement == "orange juice",],
aes(x = dose, y = length, fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 7: orange-juice-dose-0.5 vs orange-juice-dose-2.0")
ggplot(data = ToothGrowth[(ToothGrowth$dose == 1.0 | ToothGrowth$dose == 2.0) &
ToothGrowth$supplement == "orange juice",],
aes(x = dose, y = length, fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 8: orange-juice-dose-1.0 vs orange-juice-dose-2.0")
ggplot(data = ToothGrowth[(ToothGrowth$dose == 0.5 | ToothGrowth$dose == 1.0) &
ToothGrowth$supplement == "ascorbic acid",],
aes(x = dose, y = length, fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 9: ascorbic-acid-dose-0.5 vs ascorbic-acid-dose-1.0")
ggplot(data = ToothGrowth[(ToothGrowth$dose == 0.5 | ToothGrowth$dose == 2.0) &
ToothGrowth$supplement == "ascorbic acid",],
aes(x = dose, y = length, fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 10: ascorbic-acid-dose-0.5 vs ascorbic-acid-dose-2.0")
ggplot(data = ToothGrowth[(ToothGrowth$dose == 1 | ToothGrowth$dose == 2) &
ToothGrowth$supplement == "ascorbic acid",],
aes(x = dose, y = length, fill = supplement)) +
geom_boxplot(alpha = 0.5) +
scale_fill_brewer(palette = "Set1") +
stat_summary(fun.y = mean, geom = "errorbar", aes(ymax = ..y.., ymin = ..y.. ),
linetype = "dotted", position = position_dodge(0.75), size = 1,
width = .76, color = "black") +
ggtitle("FIGURE 11: ascorbic-acid-dose-1.0 vs ascorbic-acid-dose-2.0")
Assuming that the sample means are normally distributed and that the observations were sampled independently from the corresponding populations, we performed 9 different t-tests at the 5% significance level (95% confidence). Based on these tests and the Benjamini-Hochberg adjusted p-values, we can conclude that:
Guinea pig tooth length was significantly affected by delivery method at the 0.5 and 1.0 mg/day vitamin C dosages. Guinea pigs that received vitamin C through orange juice at these dosages tended to have longer odontoblasts than those that received ascorbic acid. At 2 mg/day, no significant difference between the delivery methods was detected (unadjusted p-value = 0.9639).
Guinea pig tooth length was also significantly affected by the dosage of vitamin C; a higher dosage corresponded with a larger tooth length, regardless of which delivery method was used.
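The normality assumption stated above can be probed, roughly, with a Shapiro-Wilk test on each of the six groups. This is only a sketch of one possible check and was not part of the original analysis; it tests the raw observations rather than the sample means, and with ten animals per group it has limited power. The name shapiro_p is illustrative.

```r
tg <- datasets::ToothGrowth

# Shapiro-Wilk p-value for each supplement-by-dose group.
# Small p-values would flag departures from normality.
shapiro_p <- tapply(tg$len, interaction(tg$supp, tg$dose),
                    function(x) shapiro.test(x)$p.value)
print(round(shapiro_p, 3))
```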