In this report we will load and analyze the ToothGrowth data in the R datasets package. We will provide a basic summary of the data, use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose, and conclude our analysis.
We first load the data as a data frame and inspect its structure.
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
Viewing the full dataset shows that dose takes only three values (0.5, 1 and 2), so it is better represented as a factor than as a numeric variable. We therefore convert it and review the basic summary of the dataset.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
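The layout of the experiment can also be checked with a cross-tabulation of supplement against dose. The following is a minimal sketch in base R; `tg` and `cell_counts` are illustrative names that do not appear in the original analysis, and a local copy of the data is used so that the `ToothGrowth` object modified above is left untouched.

```r
# Local copy of the dataset (hypothetical name 'tg'), independent of the
# ToothGrowth object whose dose column was converted to a factor above
tg <- datasets::ToothGrowth

# Count observations for every supplement/dose combination
cell_counts <- table(tg$supp, tg$dose)
print(cell_counts)
```

Equal counts in every cell would indicate a balanced design, which makes the per-dose comparisons later in the report easier to interpret.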
We also draw boxplots of tooth length versus dose, faceted by supplement, and of tooth length versus supplement, faceted by dose.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
ggplot(ToothGrowth, aes(dose, len)) + geom_point() + facet_grid(. ~ supp) + geom_boxplot(alpha = 0.2) + stat_boxplot(geom ='errorbar')
ggplot(ToothGrowth, aes(supp, len)) + geom_point() + facet_grid(. ~ dose) + geom_boxplot(alpha = 0.2) + stat_boxplot(geom ='errorbar')
For both supplements, tooth length increases with dose. At doses of 0.5 and 1, OJ appears to be more effective than VC, but at a dose of 2 the two supplements are comparable.
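The visual impression from the boxplots can be backed by the group means. The sketch below is one way to compute them with base R's `aggregate()`; the names `tg` and `group_means` are illustrative and are not part of the original analysis.

```r
# Local copy of the dataset (hypothetical name 'tg')
tg <- datasets::ToothGrowth

# Mean tooth length for each supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
print(group_means)
```

Reading the resulting table row by row gives the mean length for OJ and VC at each dose, which can be compared directly with the boxplot medians.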
Before the analysis, we subset the dataset by supplement and by dose.
# Subset by exact match on the supplement and dose levels
VC <- subset(ToothGrowth, supp == "VC")
OJ <- subset(ToothGrowth, supp == "OJ")
DOSE05 <- subset(ToothGrowth, dose == "0.5")
DOSE1 <- subset(ToothGrowth, dose == "1")
DOSE2 <- subset(ToothGrowth, dose == "2")
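As an alternative to creating one object per dose, the per-dose subsets can be built in a single step with `split()`. This is only a sketch of that option; `tg` and `by_dose` are illustrative names that the original analysis does not use.

```r
# Local copy of the dataset (hypothetical name 'tg')
tg <- datasets::ToothGrowth

# One data frame per dose level, collected in a named list
by_dose <- split(tg, tg$dose)

names(by_dose)         # list element names, one per dose level
sapply(by_dose, nrow)  # number of rows in each subset
```

A list of this kind is convenient when the same test has to be repeated for every dose, because it can be passed to `lapply()` or `sapply()`.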
We will use t-tests for our hypothesis testing. A two-sample t-test compares exactly two groups: supplement has two levels, but dose has three, so a single t-test cannot compare the doses within a supplement. We will therefore only test tooth growth versus supplement, separately at each dose.
bydose <- data.frame(dose = c(0.5, 1, 2), pvalue = c(
  # Welch two-sample t-tests (unequal variances); unpaired is the default
  t.test(len ~ supp, var.equal = FALSE, data = DOSE05)$p.value,
  t.test(len ~ supp, var.equal = FALSE, data = DOSE1)$p.value,
  t.test(len ~ supp, var.equal = FALSE, data = DOSE2)$p.value))
print(bydose)
## dose pvalue
## 1 0.5 0.006358607
## 2 1.0 0.001038376
## 3 2.0 0.963851589
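The same tests also provide 95% confidence intervals for the difference in mean tooth length between the supplements (OJ minus VC, following the order of the factor levels). The sketch below extracts them for each dose; `tg` and `ci_by_dose` are illustrative names, not part of the original analysis.

```r
# Local copy of the dataset (hypothetical name 'tg')
tg <- datasets::ToothGrowth

# For each dose, run a Welch two-sample t-test of len by supp and keep
# the bounds of the 95% confidence interval for the mean difference
ci_by_dose <- t(sapply(split(tg, tg$dose), function(d) {
  res <- t.test(len ~ supp, data = d)  # var.equal = FALSE is the default
  c(lower = res$conf.int[1], upper = res$conf.int[2])
}))
print(ci_by_dose)
```

An interval that excludes zero corresponds to a p-value below 0.05 for that dose, and an interval that contains zero corresponds to a p-value above 0.05.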
At doses 0.5 and 1.0 the p-values are below 0.05, so we reject the null hypothesis that OJ and VC give the same mean tooth length at the 5% level. At dose 2.0 the p-value is 0.96, well above 0.05, so we fail to reject the null hypothesis: there is no statistically significant difference between OJ and VC at that dose. Note that a smaller p-value indicates stronger evidence against the null hypothesis, not a larger difference between the supplements. These results agree with the boxplots above. We have therefore compared tooth growth by supplement and dose both qualitatively and quantitatively.