Overview

This project analyzes the ToothGrowth dataset. According to the help page, ToothGrowth is a data frame with 60 observations on 3 variables: “len” is tooth length, the response; “supp” is the supplement type, i.e. the delivery method (VC or OJ); and “dose” is the dose in milligrams/day, which takes 3 levels (0.5, 1, and 2 mg/day). The analysis compares tooth growth by supplement type and dose.

Load data and exploratory data analysis

First load the packages we need and the data set, then take a glimpse at the data.

library(ggplot2)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.3
library(datasets)
data("ToothGrowth")
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Dose has 3 levels but is stored as a numeric variable, so we convert it to a factor.

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
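As a quick check on the conversion, the sketch below repeats it on a copy of the data and inspects the result. The names `tg` and `cell_counts` are introduced here for illustration only, and the code assumes the built-in dataset from the datasets package is unmodified.

```r
# Work on a copy so the report's ToothGrowth object is left untouched
tg <- datasets::ToothGrowth

# Same conversion as in the report: numeric dose -> factor
tg$dose <- as.factor(tg$dose)

# Confirm the type and the three levels
str(tg$dose)
levels(tg$dose)

# Cross-tabulate supplement type against dose to see the group sizes
cell_counts <- table(tg$supp, tg$dose)
cell_counts
```

The cross-tabulation is useful later on, because every t-test in the report compares two of these supplement-by-dose groups.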

Show a scatter plot of tooth length vs dose amount, with supplement type shown by color. There appears to be a trend: a higher dose amount is associated with greater tooth length.

g <- ggplot(ToothGrowth, aes(dose, len))
g + geom_point(aes(color = supp)) +
  ggtitle("Tooth Length vs Dose Amount") +
  xlab("Dose Amount") +
  ylab("Tooth Length")
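The trend in the scatter plot can also be read off a table of group means. This is a minimal sketch using base R's `aggregate()` on a copy of the data; `tg` and `group_means` are illustrative names, not objects from the original analysis.

```r
# Work on a copy so the report's ToothGrowth object is left untouched
tg <- datasets::ToothGrowth

# Mean tooth length for each combination of supplement type and dose
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_means
```

Each row is one supplement-by-dose group, so comparing rows with the same `supp` shows how the mean changes with dose.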

Inferential Analysis

Show a boxplot of tooth length vs dose amount by supplement type. Tooth length appears to be higher when the dose amount is higher, for both delivery methods.

g <- ggplot(ToothGrowth, aes(dose, len))
g + geom_boxplot(aes(fill = dose)) +
  facet_grid(. ~ supp) +
  xlab("Dose Amount") +
  ylab("Tooth Length") +
  ggtitle("Tooth Length vs Dose Amount by Supplement Type")

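Since the plots suggest a trend, we perform a two-sample t-test on each pair of adjacent dose levels, within each supplement type, to test it. Note that t.test uses the Welch (unequal-variance) test by default. We use a significance level of \(\alpha=0.05\), which bounds the type I error rate of each individual test.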

t.test(len~dose,data=filter(ToothGrowth, supp=="OJ"& (dose=="0.5" | dose=="1")))$p.value
## Warning: package 'bindrcpp' was built under R version 3.4.3
## [1] 8.784919e-05
t.test(len~dose,data=filter(ToothGrowth, supp=="OJ"& (dose=="1" | dose=="2")))$p.value
## [1] 0.03919514
t.test(len~dose,data=filter(ToothGrowth, supp=="VC"& (dose=="0.5" | dose=="1")))$p.value
## [1] 6.811018e-07
t.test(len~dose,data=filter(ToothGrowth, supp=="VC"& (dose=="1" | dose=="2")))$p.value
## [1] 9.155603e-05

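All four p-values are less than \(\alpha\), so the mean tooth length differs significantly between adjacent dose levels. Together with the boxplots, this indicates that a higher dose amount is associated with greater tooth length, whichever delivery method is used.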
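A p-value alone does not give the direction of a difference, so it can help to look at the rest of the t.test result. The sketch below reruns the first comparison (OJ, 0.5 vs 1 mg/day) on a copy of the data and extracts the group means and the confidence interval. `tg`, `oj_low`, and `res` are illustrative names, and base `subset()` is used here instead of dplyr's `filter()` only to keep the example free of package dependencies.

```r
# Work on a copy so the report's ToothGrowth object is left untouched
tg <- datasets::ToothGrowth

# Keep only the orange juice groups at 0.5 and 1 mg/day
oj_low <- subset(tg, supp == "OJ" & dose %in% c(0.5, 1))

# Welch two-sample t-test (the t.test default, var.equal = FALSE)
res <- t.test(len ~ dose, data = oj_low)

res$estimate  # mean tooth length in each dose group
res$conf.int  # 95% confidence interval for mean(0.5) - mean(1)
res$p.value   # same quantity the report extracts with $p.value
```

A confidence interval that lies entirely below zero means the 0.5 mg/day group has the lower mean, which is the direction the boxplots suggest.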

Similarly, show a boxplot of Tooth Length vs Supplement Type by Dose Amount.

g <- ggplot(ToothGrowth, aes(supp, len))
g + geom_boxplot(aes(fill = supp)) +
  facet_grid(. ~ dose) +
  xlab("Supplement Type") +
  ylab("Tooth Length") +
  ggtitle("Tooth Length vs Supplement Type by Dose Amount")

It seems that supplement type does not influence tooth length when the dose amount is 2 mg/day. To check this, and to assess the remaining dose levels, we run a t-test of tooth length by supplement type at each dose level.

t.test(len~supp,data=filter(ToothGrowth, dose=="0.5"))$p.value
## [1] 0.006358607
t.test(len~supp,data=filter(ToothGrowth, dose=="1"))$p.value
## [1] 0.001038376
t.test(len~supp,data=filter(ToothGrowth, dose=="2"))$p.value
## [1] 0.9638516

According to the p-values, the difference between the two supplement types is not significant when the dose amount is 2 mg/day. However, when the dose amount is relatively low (0.5 or 1 mg/day), the difference is significant, and the boxplots show that orange juice (OJ) results in greater tooth length than VC.
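The three supplement comparisons can be collected into one table that shows the group means next to each p-value, which makes the direction of the difference explicit. This is a sketch rather than the original analysis code: `tg` and `supp_tests` are illustrative names, and the tests are rerun on a copy of the data with the default Welch t-test.

```r
# Work on a copy so the report's ToothGrowth object is left untouched
tg <- datasets::ToothGrowth

# One Welch t-test of len by supp per dose level, gathered into a data frame
supp_tests <- do.call(rbind, lapply(c(0.5, 1, 2), function(d) {
  res <- t.test(len ~ supp, data = subset(tg, dose == d))
  data.frame(
    dose     = d,
    mean_OJ  = unname(res$estimate[1]),  # supp levels are ordered OJ, VC
    mean_VC  = unname(res$estimate[2]),
    ci_lower = res$conf.int[1],          # 95% CI for mean_OJ - mean_VC
    ci_upper = res$conf.int[2],
    p_value  = res$p.value
  )
}))
supp_tests
```

Reading across a row, a positive confidence interval indicates a higher mean for OJ at that dose, while an interval that spans zero matches a non-significant p-value.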

Conclusions

  1. A higher dose amount is associated with greater tooth length, whichever delivery method is used.
  2. The difference between the two supplement types is not significant when the dose amount is 2 mg/day.
  3. When the dose amount is relatively low (0.5 or 1 mg/day), orange juice (OJ) results in greater tooth length than VC.