Overview

This report analyzes the ToothGrowth data from the R datasets package. It then uses confidence intervals and hypothesis tests to compare tooth growth (len) by supplement type (supp) and dose.

Exploratory data analysis

First, we show a summary of the data.

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Next, we compute the mean of “len” grouped by “supp” and “dose”.

ToothGrowth_agg <- aggregate(len ~ supp + dose, data=ToothGrowth, mean)
ToothGrowth_agg$dose <- as.factor(ToothGrowth_agg$dose)
ToothGrowth_agg <- ToothGrowth_agg[order(ToothGrowth_agg$supp,ToothGrowth_agg$dose), ]
rownames(ToothGrowth_agg) <- NULL
library(knitr)
kable(ToothGrowth_agg, caption = "Means of len group by supp and dose")
Means of len group by supp and dose

supp  dose  len
OJ    0.5   13.23
OJ    1     22.70
OJ    2     26.06
VC    0.5    7.98
VC    1     16.77
VC    2     26.14

Next, we plot two graphs. The first shows how mean toothgrowth differs between supp, and the second shows how it differs between dose.

# Figure 1: mean len by supp, one line per dose
library(ggplot2)
g1 <- ggplot(ToothGrowth_agg, aes(x = supp, y = len, group = dose))
g1 <- g1 + geom_line(size = 1, aes(colour = dose)) + geom_point(size =4, pch = 21, fill = "salmon", alpha = .5)
g1 <- g1 + labs(title = "Figure 1. different toothgrowth between supp")
g1 <- g1 + theme(plot.title = element_text(size=11))
# Figure 2: mean len by dose, one line per supp
g2 <- ggplot(ToothGrowth_agg, aes(x = dose, y = len, group = supp))
g2 <- g2 + geom_line(size = 1, aes(colour = supp)) + geom_point(size =4, pch = 21, fill = "salmon", alpha = .5)
g2 <- g2 + labs(title = "Figure 2. different toothgrowth between dose")
g2 <- g2 + theme(plot.title = element_text(size=11))
library(gridExtra)
grid.arrange(g1, g2, ncol = 2)

Summary of the data

Figure 1 compares toothgrowth between supp. At dose “0.5” and “1”, supp “OJ” has a higher mean len than supp “VC” (13.23 vs. 7.98 and 22.70 vs. 16.77), but at dose “2” there is almost no difference between supp (26.06 vs. 26.14). It is therefore not clear whether toothgrowth differs between supp overall.

Figure 2 compares toothgrowth between dose. We can see clearly that, for both supp, mean len increases as dose increases.

Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose

Hypothesis testing 1

From Figure 1: does toothgrowth differ between supp?

Let \(\mu\) be the difference in mean len between supp “OJ” and supp “VC”.

\(H_0\): \(\mu = 0\) (toothgrowth does not differ between supp)

\(H_a\): \(\mu \neq 0\) (toothgrowth differs between supp)

Assumptions: the significance level is 0.05, the two groups have unequal variances, and len is normally distributed within each group.
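The assumptions above are stated rather than checked. As a minimal sketch of how they could be inspected, the snippet below runs a Shapiro-Wilk test and computes the sample variance of len within each supp group. The choice of `shapiro.test` and the variable names `normality_p` and `group_var` are illustrative assumptions and are not part of the original analysis.

```r
# Sketch only: these checks are not part of the original analysis.
# Shapiro-Wilk p-value for len within each supp group (normality check).
normality_p <- tapply(ToothGrowth$len, ToothGrowth$supp,
                      function(x) shapiro.test(x)$p.value)

# Sample variance of len within each supp group (unequal-variance check).
group_var <- tapply(ToothGrowth$len, ToothGrowth$supp, var)

print(normality_p)
print(group_var)
```

A small Shapiro-Wilk p-value would cast doubt on normality for that group. With 30 observations per supp, the t-test is usually fairly robust to mild departures.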

oj <- ToothGrowth[ToothGrowth$supp == "OJ", 1]
vc <- ToothGrowth[ToothGrowth$supp == "VC", 1]
testing1 <- t.test(oj, vc, paired = FALSE, var.equal = FALSE, alternative = "two.sided", conf.level = 0.95)
cat("testing 1 p-value =", testing1$p.value, "and 95% confidence interval =", testing1$conf.int)
## testing 1 p-value = 0.06063451 and 95% confidence interval = -0.1710156 7.571016

The t-test gives a p-value > 0.05, and 0 lies inside the 95% confidence interval.

Thus, we fail to reject the null hypothesis \(H_0\). The data do not provide sufficient evidence that toothgrowth differs between supp.
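As a cross-check of Hypothesis testing 1, the Welch statistic that `t.test` uses when `var.equal = FALSE` can be reproduced by hand. The sketch below is illustrative only: the helper name `welch_by_hand` is hypothetical, and it assumes a two-sided test at the 95% level.

```r
# Welch two-sample t-test computed by hand (hypothetical helper, sketch only).
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)            # squared standard error of mean(x)
  vy <- var(y) / length(y)            # squared standard error of mean(y)
  se <- sqrt(vx + vy)                 # standard error of the difference
  t_stat <- (mean(x) - mean(y)) / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_two_sided <- 2 * pt(-abs(t_stat), df)
  ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se
  list(statistic = t_stat, df = df, p.value = p_two_sided, conf.int = ci)
}

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

by_hand   <- welch_by_hand(oj, vc)
reference <- t.test(oj, vc, var.equal = FALSE)

# Both comparisons should agree up to numerical tolerance.
print(all.equal(by_hand$p.value, reference$p.value))
print(all.equal(by_hand$conf.int, as.numeric(reference$conf.int)))
```

Replacing `qt(0.975, df)` with `qt(0.95, df)` and keeping only the lower bound would give the one-sided interval produced by `alternative = "greater"`.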

Hypothesis testing 2

From Figure 2: does dose “1” have higher toothgrowth than dose “0.5”?

Let \(\mu\) be the mean len at dose “1” minus the mean len at dose “0.5”.

\(H_0\): \(\mu = 0\) (toothgrowth does not differ between dose “1” and dose “0.5”)

\(H_a\): \(\mu > 0\) (dose “1” has higher toothgrowth than dose “0.5”)

Assumptions: the significance level is 0.05, the two groups have unequal variances, and len is normally distributed within each group.

# len values (column 1) for each dose; dose is numeric, so compare against numbers
dose0.5 <- ToothGrowth[ToothGrowth$dose == 0.5, 1]
dose1 <- ToothGrowth[ToothGrowth$dose == 1, 1]
testing2 <- t.test(dose1, dose0.5, paired = FALSE, var.equal = FALSE, alternative = "greater", conf.level = 0.95)
cat("testing 2 p-value =", testing2$p.value, "and 95% confidence interval =", testing2$conf.int)
## testing 2 p-value = 6.341504e-08 and 95% confidence interval = 6.753323 Inf

The t-test gives a p-value < 0.05, and 0 lies outside the one-sided 95% confidence interval.

Thus, we reject the null hypothesis \(H_0\) in favor of \(H_a\): dose “1” has higher toothgrowth than dose “0.5”.

Hypothesis testing 3

From Figure 2: does dose “2” have higher toothgrowth than dose “1”?

Let \(\mu\) be the mean len at dose “2” minus the mean len at dose “1”.

\(H_0\): \(\mu = 0\) (toothgrowth does not differ between dose “2” and dose “1”)

\(H_a\): \(\mu > 0\) (dose “2” has higher toothgrowth than dose “1”)

Assumptions: the significance level is 0.05, the two groups have unequal variances, and len is normally distributed within each group.

dose2 <- ToothGrowth[ToothGrowth$dose == 2, 1]
testing3 <- t.test(dose2, dose1, paired = FALSE, var.equal = FALSE, alternative = "greater", conf.level = 0.95)
cat("testing 3 p-value =", testing3$p.value, "and 95% confidence interval =", testing3$conf.int)
## testing 3 p-value = 9.532148e-06 and 95% confidence interval = 4.17387 Inf

The t-test gives a p-value < 0.05, and 0 lies outside the one-sided 95% confidence interval.

Thus, we reject the null hypothesis \(H_0\) in favor of \(H_a\): dose “2” has higher toothgrowth than dose “1”.

Conclusion

From Hypothesis testing 1, we find no statistically significant difference in toothgrowth between supp (significance level 0.05).

From Hypothesis testing 2 and 3, we conclude that toothgrowth increases as dose increases (“0.5” < “1” < “2”) (significance level 0.05).
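Because three tests are each run at the 0.05 level, one might also check whether the conclusions survive a multiple-comparison adjustment. The sketch below applies a Bonferroni correction to the three p-values reported above. The adjustment is an added illustration, not part of the original analysis, and the names `p_raw`, `p_bonf`, and `decision` are hypothetical.

```r
# p-values copied from the three t.test outputs reported above
p_raw <- c(testing1 = 0.06063451,
           testing2 = 6.341504e-08,
           testing3 = 9.532148e-06)

# Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# TRUE where H0 would still be rejected at the 0.05 level after adjustment
decision <- p_bonf < 0.05
print(p_bonf)
print(decision)
```

Under this adjustment the decisions are unchanged: testing 1 still fails to reject \(H_0\), while testing 2 and testing 3 still reject it.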