This report analyzes the ToothGrowth data from the R datasets package, then uses hypothesis tests to compare tooth growth by supp and dose.
First, we show a summary of the data.
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
Next, we compute the mean of “len” grouped by “supp” and “dose”.
# Mean len for each supp/dose combination
ToothGrowth_agg <- aggregate(len ~ supp + dose, data = ToothGrowth, mean)
# Treat dose as a factor so it is plotted as discrete groups
ToothGrowth_agg$dose <- as.factor(ToothGrowth_agg$dose)
# Sort rows by supp, then dose, and reset the row names
ToothGrowth_agg <- ToothGrowth_agg[order(ToothGrowth_agg$supp, ToothGrowth_agg$dose), ]
rownames(ToothGrowth_agg) <- NULL
library(knitr)
kable(ToothGrowth_agg, caption = "Means of len grouped by supp and dose")
| supp | dose | len |
|---|---|---|
| OJ | 0.5 | 13.23 |
| OJ | 1 | 22.70 |
| OJ | 2 | 26.06 |
| VC | 0.5 | 7.98 |
| VC | 1 | 16.77 |
| VC | 2 | 26.14 |
Next, we plot two graphs: the first compares tooth growth between supplements (supp), and the second compares tooth growth across doses (dose).
# Figure 1: mean len by supp, one line per dose
library(ggplot2)
g1 <- ggplot(ToothGrowth_agg, aes(x = supp, y = len, group = dose))
# linewidth replaces size for lines in ggplot2 >= 3.4.0
g1 <- g1 + geom_line(linewidth = 1, aes(colour = dose)) + geom_point(size = 4, pch = 21, fill = "salmon", alpha = .5)
g1 <- g1 + labs(title = "Figure 1. Tooth growth by supp")
g1 <- g1 + theme(plot.title = element_text(size = 11))
# Figure 2: mean len by dose, one line per supp
g2 <- ggplot(ToothGrowth_agg, aes(x = dose, y = len, group = supp))
g2 <- g2 + geom_line(linewidth = 1, aes(colour = supp)) + geom_point(size = 4, pch = 21, fill = "salmon", alpha = .5)
g2 <- g2 + labs(title = "Figure 2. Tooth growth by dose")
g2 <- g2 + theme(plot.title = element_text(size = 11))
library(gridExtra)
grid.arrange(g1, g2, ncol = 2)
Figure 1 compares tooth growth between supplements. At doses “0.5” and “1”, supp “OJ” gives somewhat higher mean len than supp “VC” (13.23 vs 7.98 and 22.70 vs 16.77), but at dose “2” the two supplements are almost identical (26.06 vs 26.14). It is therefore not clear from the plot whether tooth growth differs between supplements.
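The differences described above can be read directly off the group means. A minimal sketch, assuming only base R and the built-in ToothGrowth data; `means` and `diff_oj_vc` are illustrative names, not part of the original analysis:

```r
# Mean len for every dose (rows) x supp (columns) combination
means <- with(ToothGrowth, tapply(len, list(dose, supp), mean))

# OJ minus VC at each dose: a positive value means OJ gave longer teeth
diff_oj_vc <- means[, "OJ"] - means[, "VC"]
print(round(diff_oj_vc, 2))
```

With the means in the table above, this gives differences of about 5.25, 5.93 and -0.08 for doses 0.5, 1 and 2, so the OJ advantage disappears at the highest dose.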
Figure 2 compares tooth growth across doses. For both supplements, len clearly increases as dose increases from “0.5” to “1” to “2”.
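The dose effect can be checked the same way, by differencing consecutive dose means within each supplement. A sketch under the same assumptions (base R only; `increments` is an illustrative name):

```r
# Mean len for every dose (rows) x supp (columns) combination
means <- with(ToothGrowth, tapply(len, list(dose, supp), mean))

# Change in mean len from one dose level to the next, per supplement
increments <- apply(means, 2, diff)
print(round(increments, 2))
```

Every entry is positive (OJ: 9.47, then 3.36; VC: 8.79, then 9.37), which matches the rising lines in Figure 2; note the smaller OJ gain between doses 1 and 2.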
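Hypothesis test 1 (motivated by Figure 1): does tooth growth differ between supplements?
\(H_0\): \(\mu_{OJ} - \mu_{VC} = 0\) (mean tooth growth does not differ between supplements)
\(H_a\): \(\mu_{OJ} - \mu_{VC} \neq 0\) (mean tooth growth differs between supplements)
Assumptions: significance level \(\alpha = 0.05\), independent groups with unequal variances (Welch's t-test), and approximately normally distributed len within each group.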
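These assumptions can be eyeballed before running the test. A minimal sketch using base R's `var()` and `shapiro.test()`; `group_var` and `normality_p` are illustrative names. With 30 observations per group a Shapiro-Wilk test is only a rough guide, and the t-test is generally considered fairly robust to moderate non-normality at this sample size.

```r
# len values for each supplement
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Sample variances; Welch's t-test (var.equal = FALSE) does not need them to match
group_var <- c(OJ = var(oj), VC = var(vc))
print(group_var)

# Shapiro-Wilk p-values; small values suggest a departure from normality
normality_p <- sapply(list(OJ = oj, VC = vc), function(x) shapiro.test(x)$p.value)
print(normality_p)
```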
oj <- ToothGrowth[ToothGrowth$supp == "OJ", 1]
vc <- ToothGrowth[ToothGrowth$supp == "VC", 1]
testing1 <- t.test(oj, vc, paired = FALSE, var.equal = FALSE, alternative = "two.sided", conf.level = 0.95)
cat("testing 1 p-value =", testing1$p.value, "and 95% confidence interval =", testing1$conf.int)
## testing 1 p-value = 0.06063451 and 95% confidence interval = -0.1710156 7.571016
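The Welch t-test gives \(p \approx 0.061 > 0.05\), and the 95% confidence interval for \(\mu_{OJ} - \mu_{VC}\), (-0.17, 7.57), contains 0.
Thus, we fail to reject \(H_0\): at the 0.05 level, the data do not show a significant difference in tooth growth between supplements.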
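To make the test statistic concrete, the sketch below recomputes Welch's t-test for test 1 from its textbook formulas, \(t = (\bar{x}_{OJ} - \bar{x}_{VC}) / \sqrt{s_{OJ}^2/n_{OJ} + s_{VC}^2/n_{VC}}\) with Welch-Satterthwaite degrees of freedom, and compares the result with `t.test()`. Base R only; all variable names are illustrative.

```r
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se <- sqrt(se2_oj + se2_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / se
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mu_OJ - mu_VC
p_two_sided <- 2 * pt(-abs(t_stat), df_welch)
ci_95 <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se

# Compare with R's built-in Welch test
builtin <- t.test(oj, vc, var.equal = FALSE)
print(c(manual_p = p_two_sided, builtin_p = builtin$p.value))
```

Because the interval and the p-value come from the same t distribution, the interval excludes 0 exactly when the two-sided p-value is below 0.05, so the two checks in the text always agree.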
Hypothesis test 2 (motivated by Figure 2): does dose “1” produce higher tooth growth than dose “0.5”?
\(H_0\): \(\mu_{1} - \mu_{0.5} = 0\) (doses “1” and “0.5” produce the same mean tooth growth)
\(H_a\): \(\mu_{1} - \mu_{0.5} > 0\) (dose “1” produces higher mean tooth growth than dose “0.5”)
Assumptions: significance level \(\alpha = 0.05\), independent groups with unequal variances (Welch's t-test), and approximately normally distributed len within each group.
dose0.5 <- ToothGrowth[ToothGrowth$dose == 0.5, 1]
dose1 <- ToothGrowth[ToothGrowth$dose == 1, 1]
testing2 <- t.test(dose1, dose0.5, paired = FALSE, var.equal = FALSE, alternative = "greater", conf.level = 0.95)
cat("testing 2 p-value =", testing2$p.value, "and 95% confidence interval =", testing2$conf.int)
## testing 2 p-value = 6.341504e-08 and 95% confidence interval = 6.753323 Inf
The Welch t-test gives \(p \approx 6.3 \times 10^{-8} < 0.05\), and the one-sided 95% confidence interval for \(\mu_{1} - \mu_{0.5}\), (6.75, \(\infty\)), excludes 0.
Thus, we reject \(H_0\) in favor of \(H_a\): dose “1” produces higher tooth growth than dose “0.5”.
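Tests 2 and 3 use `alternative = "greater"`. The sketch below (base R only; variable names are illustrative) shows how the one-sided result relates to the two-sided one for test 2: when the observed t statistic is positive, the one-sided p-value is exactly half the two-sided p-value, and the one-sided interval has only a finite lower bound.

```r
# len values at doses 0.5 and 1 (dose is numeric in ToothGrowth)
dose0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
dose1 <- ToothGrowth$len[ToothGrowth$dose == 1]

two_sided <- t.test(dose1, dose0.5, alternative = "two.sided")
greater <- t.test(dose1, dose0.5, alternative = "greater")

# With t > 0, the "greater" p-value is half of the two-sided p-value
print(c(t = unname(greater$statistic),
        p_two_sided = two_sided$p.value,
        p_greater = greater$p.value))

# One-sided 95% interval: finite lower bound, upper bound Inf
print(greater$conf.int)
```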
Hypothesis test 3 (motivated by Figure 2): does dose “2” produce higher tooth growth than dose “1”?
\(H_0\): \(\mu_{2} - \mu_{1} = 0\) (doses “2” and “1” produce the same mean tooth growth)
\(H_a\): \(\mu_{2} - \mu_{1} > 0\) (dose “2” produces higher mean tooth growth than dose “1”)
Assumptions: significance level \(\alpha = 0.05\), independent groups with unequal variances (Welch's t-test), and approximately normally distributed len within each group.
dose2 <- ToothGrowth[ToothGrowth$dose == 2, 1]
testing3 <- t.test(dose2, dose1, paired = FALSE, var.equal = FALSE, alternative = "greater", conf.level = 0.95)
cat("testing 3 p-value =", testing3$p.value, "and 95% confidence interval =", testing3$conf.int)
## testing 3 p-value = 9.532148e-06 and 95% confidence interval = 4.17387 Inf
The Welch t-test gives \(p \approx 9.5 \times 10^{-6} < 0.05\), and the one-sided 95% confidence interval for \(\mu_{2} - \mu_{1}\), (4.17, \(\infty\)), excludes 0.
Thus, we reject \(H_0\) in favor of \(H_a\): dose “2” produces higher tooth growth than dose “1”.
From hypothesis test 1, we find no significant difference in tooth growth between supplements at the 0.05 significance level.
From hypothesis tests 2 and 3, we conclude that tooth growth increases with dose (“0.5” < “1” < “2”) at the 0.05 significance level.
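The three tests can also be run together and summarised in one table. A minimal sketch in base R; `tests` and `results` are illustrative names, and the Bonferroni columns are an optional extra (not part of the analysis above) for readers who want to account for running three tests.

```r
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
dose0.5 <- ToothGrowth$len[ToothGrowth$dose == 0.5]
dose1 <- ToothGrowth$len[ToothGrowth$dose == 1]
dose2 <- ToothGrowth$len[ToothGrowth$dose == 2]

# Welch t-tests (t.test defaults: var.equal = FALSE, conf.level = 0.95)
tests <- list(
  "OJ vs VC (two-sided)" = t.test(oj, vc, alternative = "two.sided"),
  "dose 1 > dose 0.5" = t.test(dose1, dose0.5, alternative = "greater"),
  "dose 2 > dose 1" = t.test(dose2, dose1, alternative = "greater")
)

alpha <- 0.05
results <- data.frame(
  test = names(tests),
  p_value = unname(sapply(tests, function(tt) tt$p.value))
)
results$reject <- results$p_value < alpha

# Optional: Bonferroni adjustment for running three tests
results$p_bonferroni <- p.adjust(results$p_value, method = "bonferroni")
results$reject_bonferroni <- results$p_bonferroni < alpha
print(results)
```

With the p-values reported above, the Bonferroni-adjusted values are about 0.18, 1.9e-07 and 2.9e-05, so the conclusions do not change.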