In this second portion of the project, we analyze the ToothGrowth data in the R datasets package. (Output is kept as compact as possible to stay within the three-page limit.)
library(ggplot2)
library(datasets)
library(data.table)
library(gridExtra)
# load the ToothGrowth dataset
tg <- data.table(ToothGrowth)
# perform exploratory data analysis: first and last five rows
head(tg)[1:5]; tail(tg)[1:5]
## len supp dose
## 1: 4.2 VC 0.5
## 2: 11.5 VC 0.5
## 3: 7.3 VC 0.5
## 4: 5.8 VC 0.5
## 5: 6.4 VC 0.5
## len supp dose
## 1: 24.8 OJ 2
## 2: 30.9 OJ 2
## 3: 26.4 OJ 2
## 4: 27.3 OJ 2
## 5: 29.4 OJ 2
There appear to be three variables: tooth length (len), supplement type (supp), and dose.
unique(tg$len) # identify unique values for growth length
## [1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 5.2 7.0 16.5 15.2 17.3 22.5 13.6
## [15] 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5 17.6
## [29] 9.7 8.2 9.4 19.7 20.0 25.2 25.8 21.2 27.3 22.4 24.5 24.8 30.9 29.4
## [43] 23.0
unique(tg$supp) # identify unique values for supplements
## [1] VC OJ
## Levels: OJ VC
unique(tg$dose) # identify unique values for dose
## [1] 0.5 1.0 2.0
summary(tg)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
p<- ggplot(data=tg, aes(x=supp,y=len))+ geom_boxplot(aes(color=factor(supp)))
p + facet_grid(.~dose) + labs(x="Supplement", y="Tooth Length") + ggtitle("Tooth Growth by Supplement Type and Dosage")
Based on the box plots, larger doses appear to be associated with greater tooth growth. At the lower dosage levels (0.5 and 1), growth appears to be lower with VC than with OJ. At dosage level 2, however, VC shows wider variation in growth than OJ.
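To put numbers behind the visual impression from the box plots, the cell means and standard deviations can be tabulated. This is a minimal sketch using base R only; the names cell_mean and cell_sd are illustrative and do not appear in the original analysis.

```r
library(datasets)

# Mean and standard deviation of tooth length for each supplement/dose cell.
# Rows are supplements (OJ, VC); columns are doses (0.5, 1, 2).
cell_mean <- with(ToothGrowth, tapply(len, list(supp = supp, dose = dose), mean))
cell_sd   <- with(ToothGrowth, tapply(len, list(supp = supp, dose = dose), sd))

print(round(cell_mean, 2))
print(round(cell_sd, 2))
```

Reading across a row shows how the mean changes with dose for one supplement; reading down a column compares the two supplements at a fixed dose.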
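We use a significance level of 0.05. Each test is a two-sided Welch two-sample t-test (the default for t.test), comparing mean tooth length between the two supplements. We conduct an overall test first, then one test for each dosage level.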
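The report relies on the defaults of t.test. As a sketch of what those defaults amount to, the overall comparison can be written with the arguments spelled out; alpha, explicit, default, and reject_null are illustrative names, not part of the original code.

```r
library(datasets)

alpha <- 0.05  # significance level used throughout the report

# Overall comparison with the t.test() defaults written out explicitly:
# two-sided alternative, unequal variances (Welch), 95% confidence level.
explicit <- t.test(len ~ supp, data = ToothGrowth,
                   alternative = "two.sided",
                   var.equal = FALSE,
                   conf.level = 1 - alpha)

# The same call relying on the defaults, as in the report.
default <- t.test(len ~ supp, data = ToothGrowth)

# Decision rule: reject the null hypothesis only when p < alpha.
reject_null <- explicit$p.value < alpha
print(explicit$p.value)
print(reject_null)
```

Both calls should give the same result; writing the arguments out mainly makes the assumptions of the test visible to the reader.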
# hypothesis 1: tooth growth is the same for both supplements
h_1 <- t.test(len ~ supp, data=tg)
h_1$conf.int
## [1] -0.1710156 7.5710156
## attr(,"conf.level")
## [1] 0.95
h_1$p.value
## [1] 0.06063451
The p-value is 0.061, which is greater than the 0.05 significance level, and the 95% confidence interval for the difference in means (-0.171 to 7.571) contains zero. Therefore, we fail to reject the null hypothesis that tooth growth is the same for both supplements.
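For readers who want to see where the p-value and confidence interval come from, the Welch test can be reproduced by hand. This is a sketch under the assumption that t.test was run with its defaults (two-sided, unequal variances); all variable names are illustrative.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic for the difference in means (OJ minus VC).
diff_means <- mean(oj) - mean(vc)
se_diff <- sqrt(se2_oj + se2_vc)
t_stat <- diff_means / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df)
ci <- diff_means + c(-1, 1) * qt(0.975, df) * se_diff

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

The difference is taken as OJ minus VC because OJ is the first factor level, which matches the sign convention of the interval reported by t.test.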
# hypothesis 2: tooth growth is the same for both supplements at dosage level 0.5
h_2 <- t.test(len ~ supp, data = subset(tg, dose == 0.5))
h_2$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
h_2$p.value
## [1] 0.006358607
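The p-value is 0.006, which is less than the 0.05 significance level, and the 95% confidence interval for the difference in means (OJ minus VC), 1.719 to 8.781, lies entirely above zero. Therefore, we reject the null hypothesis that tooth growth is the same for both supplements at the 0.5 dosage level. As shown in the box plot above, growth with VC is lower than with OJ at this dosage level.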
# hypothesis 3: tooth growth is the same for both supplements at dosage level 1
h_3 <- t.test(len ~ supp, data = subset(tg, dose == 1))
h_3$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
h_3$p.value
## [1] 0.001038376
The p-value is 0.001, which is less than the 0.05 significance level, and the 95% confidence interval for the difference in means (OJ minus VC), 2.802 to 9.058, lies entirely above zero. Therefore, we reject the null hypothesis that tooth growth is the same for both supplements at dosage level 1. As shown in the box plot above, growth with VC is lower than with OJ at this dosage level.
# hypothesis 4: tooth growth is the same for both supplements at dosage level 2
h_4 <- t.test(len ~ supp, data = subset(tg, dose == 2))
h_4$conf.int
## [1] -3.79807 3.63807
## attr(,"conf.level")
## [1] 0.95
h_4$p.value
## [1] 0.9638516
The p-value is 0.964, which is greater than the 0.05 significance level, and the 95% confidence interval for the difference in means (-3.798 to 3.638) contains zero. Therefore, we fail to reject the null hypothesis that tooth growth is the same for both supplements at dosage level 2.
Based on the exploratory plot, tooth growth appears to increase with dosage. OJ is associated with significantly greater growth than VC at the 0.5 and 1 dosage levels. At dosage level 2, however, we find no significant difference in growth between VC and OJ.
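The three per-dose tests can also be run in one pass and collected into a single table, which makes the comparison across dosage levels easier to read. This is a sketch in base R; by_dose and results are illustrative names, and the 0.05 threshold is the significance level stated earlier.

```r
library(datasets)

# Run the OJ-vs-VC Welch t-test separately within each dose level.
by_dose <- lapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  t.test(len ~ supp, data = d)
})

# Collect the p-value and 95% confidence interval for each dose.
results <- data.frame(
  dose     = as.numeric(names(by_dose)),
  p_value  = sapply(by_dose, function(h) h$p.value),
  ci_lower = sapply(by_dose, function(h) h$conf.int[1]),
  ci_upper = sapply(by_dose, function(h) h$conf.int[2])
)
results$reject_null <- results$p_value < 0.05
print(results, row.names = FALSE)
```

Each row corresponds to one of hypotheses 2 through 4, so the table should agree with the individual tests reported earlier.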