This project analyzes the ToothGrowth dataset. According to its help page, ToothGrowth is a data frame with 60 observations on 3 variables: “len” is tooth length, the response; “supp” is the supplement type or delivery method (VC or OJ); and “dose” is the dose in milligrams/day, which takes 3 levels (0.5, 1, and 2 mg/day). The project compares tooth growth by supplement type and by dose.
First, load the packages we need and the dataset, then take a glimpse at the data.
library(ggplot2)
library(dplyr)
library(datasets)
data("ToothGrowth")
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
Dose has only 3 levels, but it is stored as a numeric variable. We convert it to a factor.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
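As a quick sanity check on the conversion, the sketch below works on a copy of the data (the names `tg` and `cell_counts` are illustrative, not part of the original analysis) and confirms that `dose` is now a factor with three levels. It also tabulates how many observations fall into each supplement-by-dose cell, which is useful to know before comparing groups.

```r
library(datasets)

# Work on a copy so the original data frame is left untouched
# (tg is an illustrative name, not used elsewhere in this report)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Inspect the structure and the factor levels after conversion
str(tg)
levels(tg$dose)

# Count observations in each supplement type x dose combination
cell_counts <- table(tg$supp, tg$dose)
cell_counts
```

Each of the six cells should contain the same number of observations, so the group comparisons that follow are balanced.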
Show a scatter plot of tooth length against dose, with supplement type indicated by color. There appears to be a trend: a higher dose goes with greater tooth length.
g <- ggplot(ToothGrowth, aes(dose,len))
g+geom_point(aes(color=supp)) +ggtitle("Tooth Length vs Dose Amount") + xlab("Dose Amount") + ylab("Tooth Length")
Show a boxplot of tooth length against dose, faceted by supplement type. Tooth length appears to increase with dose for both delivery methods.
g <- ggplot(ToothGrowth, aes(dose,len))
g + geom_boxplot(aes(fill=dose)) + facet_grid(.~supp) + xlab("Dose Amount") + ylab("Tooth Length") + ggtitle("Tooth Length vs Dose Amount by Supplement Type")
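Having seen this trend in the plots, we run two-sample t-tests between adjacent dose levels within each supplement type to check it. By default, t.test performs Welch's test, which does not assume equal variances. We use a significance level of \(\alpha=0.05\) to bound the type I error rate.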
t.test(len~dose,data=filter(ToothGrowth, supp=="OJ"& (dose=="0.5" | dose=="1")))$p.value
## [1] 8.784919e-05
t.test(len~dose,data=filter(ToothGrowth, supp=="OJ"& (dose=="1" | dose=="2")))$p.value
## [1] 0.03919514
t.test(len~dose,data=filter(ToothGrowth, supp=="VC"& (dose=="0.5" | dose=="1")))$p.value
## [1] 6.811018e-07
t.test(len~dose,data=filter(ToothGrowth, supp=="VC"& (dose=="1" | dose=="2")))$p.value
## [1] 9.155603e-05
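All four p-values are below \(\alpha\), so the difference in mean tooth length between adjacent dose levels is significant for both delivery methods. Together with the boxplots, this indicates that a higher dose is associated with greater tooth length, regardless of which delivery method is used.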
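Because the t-tests above are two-sided, the p-values alone do not give the direction of the effect. One way to check the direction, sketched here in base R with illustrative names (`tg`, `group_means`), is to tabulate the mean tooth length for each supplement type and dose level.

```r
library(datasets)

# Copy of the data with dose as a factor (tg is an illustrative name)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Mean tooth length per supplement type (rows) and dose level (columns)
group_means <- with(tg, tapply(len, list(supp = supp, dose = dose), mean))
round(group_means, 2)

# For each supplement type, check whether the mean rises from one dose to the next
apply(group_means, 1, function(row) all(diff(row) > 0))
```

If the last call returns TRUE for both OJ and VC, the group means increase with dose within each delivery method, which matches the reading of the boxplots.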
Similarly, show a boxplot of Tooth Length vs Supplement Type by Dose Amount.
g <- ggplot(ToothGrowth, aes(supp,len))
g + geom_boxplot(aes(fill=supp)) + facet_grid(.~dose) + xlab("Supplement Type") + ylab("Tooth Length") + ggtitle("Tooth Length vs Supplement Type by Dose Amount")
It seems that supplement type does not influence tooth length when the dose is 2 mg/day. To check this, and to assess the remaining dose levels, we run a t-test at each dose.
t.test(len~supp,data=filter(ToothGrowth, dose=="0.5"))$p.value
## [1] 0.006358607
t.test(len~supp,data=filter(ToothGrowth, dose=="1"))$p.value
## [1] 0.001038376
t.test(len~supp,data=filter(ToothGrowth, dose=="2"))$p.value
## [1] 0.9638516
According to the p-values, the difference between the two supplement types is not significant when the dose is 2 mg/day. At the lower doses (0.5 and 1 mg/day), however, the difference is significant, and the boxplot shows that orange juice (OJ) is associated with greater tooth length than VC.
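The three supplement comparisons can also be run in one pass, which makes it easy to report confidence intervals next to the p-values. The following is a minimal base-R sketch, not the code used to produce the results above; the names `tg`, `by_dose`, and `supp_tests` are illustrative.

```r
library(datasets)

# Copy of the data with dose as a factor (tg is an illustrative name)
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# One data frame per dose level
by_dose <- split(tg, tg$dose)

# Welch two-sample t-test of len by supp within each dose level;
# the confidence interval is for mean(OJ) - mean(VC)
supp_tests <- t(sapply(by_dose, function(d) {
  tt <- t.test(len ~ supp, data = d)
  c(p_value  = tt$p.value,
    ci_lower = tt$conf.int[1],
    ci_upper = tt$conf.int[2])
}))

round(supp_tests, 4)
```

Each row corresponds to one dose level. An interval that excludes zero agrees with a p-value below \(\alpha\), and its sign shows which supplement type has the larger mean.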