This document analyzes the ToothGrowth data in the R datasets package. Following a brief summary and exploratory analysis of the data, tooth length (len) is compared between supplement types (supp), both overall and at each dose.
library(datasets)
data(ToothGrowth)
Here is the data:
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
tail(ToothGrowth)
##     len supp dose
## 55 24.8   OJ    2
## 56 30.9   OJ    2
## 57 26.4   OJ    2
## 58 27.3   OJ    2
## 59 29.4   OJ    2
## 60 23.0   OJ    2
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
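The summary reports 30 observations per supplement but does not show how they are spread across doses. As a minimal sketch, a cross-tabulation with base R's `table()` can confirm whether the design is balanced; the name `cell_counts` is introduced here only for illustration.

```r
library(datasets)
data(ToothGrowth)

# Count the observations in each supplement-by-dose cell
cell_counts <- with(ToothGrowth, table(supp, dose))
print(cell_counts)
```

Equal counts in every cell mean the per-dose comparisons later in the analysis are each based on the same sample size.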
library(ggplot2)
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_jitter(width = 0.3) +
  facet_grid(. ~ supp) +
  labs(title = "ToothGrowth by OJ and VC", x = "Dose", y = "Length")
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot() +
  facet_grid(. ~ supp) +
  labs(title = "ToothGrowth by OJ and VC", x = "Dose", y = "Length")
The plots suggest that OJ tends to produce longer lengths than VC at doses 0.5 and 1, while at dose 2 the VC lengths are more variable.
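The visual impression from the plots can be checked numerically. The following sketch, which uses only base R, tabulates the mean and standard deviation of len for each combination of supp and dose; `group_stats` is a name chosen here for illustration and is not part of the original analysis.

```r
library(datasets)
data(ToothGrowth)

# Mean and standard deviation of tooth length per supplement and dose
group_stats <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

# aggregate() stores both statistics in one matrix column;
# flatten it into ordinary columns len.mean and len.sd
group_stats <- do.call(data.frame, group_stats)
print(group_stats)
```

Comparing the len.mean column across supplements at a given dose shows the size of the difference, and the len.sd column shows how spread out each group is.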
Test of the whole data set by supp:
t.test(len ~ supp, data = ToothGrowth)$conf.int
## [1] -0.1710156 7.5710156
## attr(,"conf.level")
## [1] 0.95
Since this confidence interval contains 0, we cannot rule out, at the 95% confidence level, that the population mean lengths for the two supps are equal.
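A 95% confidence interval that contains 0 corresponds to a two-sided p-value above 0.05 for the same test. The sketch below keeps the full `t.test()` result so that the group means, the interval, and the p-value can be inspected together; the names `overall` and `contains_zero` are illustrative only.

```r
library(datasets)
data(ToothGrowth)

# Welch two-sample t-test (t.test's default, var.equal = FALSE)
overall <- t.test(len ~ supp, data = ToothGrowth)

overall$estimate  # sample means for OJ and VC
overall$conf.int  # 95% interval for mean(OJ) - mean(VC)
overall$p.value   # two-sided p-value

# The interval contains 0 exactly when the p-value exceeds 0.05
contains_zero <- overall$conf.int[1] < 0 && overall$conf.int[2] > 0
contains_zero == (overall$p.value > 0.05)
```

Because supp is a factor with levels OJ and VC in that order, the interval describes the OJ mean minus the VC mean, so positive values favor OJ.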
dose.5 <- subset(ToothGrowth, dose == 0.5)
dose1 <- subset(ToothGrowth, dose == 1.0)
dose2 <- subset(ToothGrowth, dose == 2.0)
t.test(len ~ supp, data = dose.5)$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
This confidence interval lies entirely above 0, so at the 95% confidence level the population mean length is greater for OJ than for VC at dose 0.5.
t.test(len ~ supp, data = dose1)$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
This confidence interval also lies entirely above 0, so at the 95% confidence level the population mean length is greater for OJ than for VC at dose 1.
t.test(len ~ supp, data = dose2)$conf.int
## [1] -3.79807 3.63807
## attr(,"conf.level")
## [1] 0.95
This confidence interval contains 0, so at dose 2 the two population means could be equal.
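The three per-dose tests repeat the same steps, so they can also be run in one pass. This is a sketch of an alternative layout, not the method used above: it splits the data by dose with `split()` and collects the interval bounds and p-value for each dose in a small matrix. The names `by_dose`, `dose_tests`, and `dose_summary` are hypothetical.

```r
library(datasets)
data(ToothGrowth)

# Split the data by dose and run the same Welch t-test in each subset
by_dose <- split(ToothGrowth, ToothGrowth$dose)
dose_tests <- lapply(by_dose, function(d) t.test(len ~ supp, data = d))

# One row per dose: 95% interval bounds and two-sided p-value
dose_summary <- t(sapply(dose_tests, function(tt) {
  c(lower = tt$conf.int[1], upper = tt$conf.int[2], p.value = tt$p.value)
}))
print(dose_summary)
```

Rows whose lower and upper bounds share the same sign are the doses at which the supplements differ at the 95% confidence level.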
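This analysis indicates, at the 95% confidence level, that OJ produces longer tooth length than VC at doses 0.5 and 1.0, whereas neither supp shows a statistically significant advantage at dose 2. OJ therefore appears to be the more effective option at the lower doses. These conclusions assume that the groups being compared are independent samples from approximately normal populations.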