The second part of the project illustrates basic inferential data analysis using the ToothGrowth data in the R datasets package. The analysis consists of basic exploratory data analysis, hypothesis tests on the data, and the conclusions together with the assumptions needed to support them.
data <- datasets::ToothGrowth
str(data)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(data)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
summary(data)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
table(data$supp, data$dose)
##
## 0.5 1 2
## OJ 10 10 10
## VC 10 10 10
library(ggplot2)
g1 <- ggplot(data, aes(x = supp, y = len, fill = supp))
g1 <- g1 + geom_boxplot()
g1
The plot above suggests that tooth length differs between the two supplements, OJ and VC: lengths under OJ appear longer than those under VC. A hypothesis test will be conducted to examine this difference.
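# Treat dose as a factor for both the x-axis and the fill so each box gets a discrete color
g2 <- ggplot(data, aes(x = as.factor(dose), y = len, fill = as.factor(dose)))
g2 <- g2 + geom_boxplot() + xlab("dose") + labs(fill = "dose")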
g2
The plot above suggests that tooth length differs across the dose levels 0.5, 1, and 2. Tooth length appears longest under dose 2, followed by dose 1, and shortest under dose 0.5. Three hypothesis tests, one for each pair of dose levels, will be conducted to examine these differences.
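The ordering seen in the boxplot can also be checked numerically before any testing. The following is a minimal sketch, not part of the original analysis; the name `dose_means` is introduced here only for illustration.

```r
# Reload the data so the snippet runs on its own
data <- datasets::ToothGrowth

# Mean tooth length within each dose level (names are "0.5", "1", "2")
dose_means <- sapply(split(data$len, data$dose), mean)
dose_means

# TRUE if the mean length increases with every step up in dose
all(diff(dose_means) > 0)
```

The group means returned here should agree with the sample estimates printed by the t-tests on the dose subsets.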
hypoTest1 <- t.test(len ~ supp, data = data)
hypoTest1
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
Since the p-value, 0.0606345, is greater than the significance level \(\alpha = 0.05\), we fail to reject the null hypothesis that the mean tooth length is the same for OJ and VC.
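To make the test above less of a black box, the Welch statistic, its degrees of freedom, and the two-sided p-value can be recomputed by hand. This is only a sketch under the same setup as `t.test(len ~ supp, data = data)`; the names `oj`, `vc`, `se2_oj`, `se2_vc`, `t_stat`, `welch_df`, and `p_value` are illustrative and do not appear in the original analysis.

```r
# Reload the data so the snippet runs on its own
data <- datasets::ToothGrowth

oj <- data$len[data$supp == "OJ"]
vc <- data$len[data$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = welch_df)

c(t = t_stat, df = welch_df, p = p_value)
```

The three values should reproduce the `t`, `df`, and `p-value` printed by `t.test` for the supplement comparison.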
dose05_2 <- subset(data, data$dose %in% c(0.5, 2))
hypoTest2 <- t.test(len ~ dose, data = dose05_2)
hypoTest2
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
Since the p-value, \(4.397525 \times 10^{-14}\), is smaller than the significance level \(\alpha = 0.05\), we reject the null hypothesis that the mean tooth length is the same for doses 0.5 and 2.
dose1_2 <- subset(data, data$dose %in% c(1, 2))
hypoTest3 <- t.test(len ~ dose, data = dose1_2)
hypoTest3
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
Since the p-value, \(1.9064295 \times 10^{-5}\), is smaller than the significance level \(\alpha = 0.05\), we reject the null hypothesis that the mean tooth length is the same for doses 1 and 2.
dose05_1 <- subset(data, data$dose %in% c(0.5, 1))
hypoTest4 <- t.test(len ~ dose, data = dose05_1)
hypoTest4
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
Since the p-value, \(1.2683007 \times 10^{-7}\), is smaller than the significance level \(\alpha = 0.05\), we reject the null hypothesis that the mean tooth length is the same for doses 0.5 and 1.
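Because three pairwise dose comparisons are made on the same data, one might also want to check that the conclusions survive a multiple-comparison correction. The sketch below is an addition to the original analysis; it assumes a Bonferroni adjustment is an acceptable choice, and the names `dose_pairs`, `raw_p`, and `adj_p` are hypothetical.

```r
# Reload the data so the snippet runs on its own
data <- datasets::ToothGrowth

# The three pairs of dose levels compared above
dose_pairs <- list(c(0.5, 2), c(1, 2), c(0.5, 1))

# Welch t-test p-value for each pair
raw_p <- sapply(dose_pairs, function(d) {
  pair_data <- data[data$dose %in% d, ]
  t.test(len ~ dose, data = pair_data)$p.value
})
names(raw_p) <- sapply(dose_pairs, paste, collapse = " vs ")

# Bonferroni adjustment: each p-value is multiplied by the number of tests (capped at 1)
adj_p <- p.adjust(raw_p, method = "bonferroni")

# TRUE for each comparison that remains significant at the 5% level
adj_p < 0.05
```

With raw p-values as small as those reported above, multiplying by three leaves all of them well below 0.05.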
When comparing the supplements OJ and VC, the p-value (0.0606) is greater than the \(\alpha\) level of 5%, and the 95% confidence interval for the difference in means (-0.17 to 7.57) includes zero, so we fail to reject the null hypothesis. With all doses pooled, the data do not provide sufficient evidence that the supplement type affects tooth length. This is not the same as showing that the supplement has no effect.
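When comparing the doses 0.5, 1, and 2, all three pairwise p-values are smaller than the \(\alpha\) level of 5%, so in each case we reject the null hypothesis of equal means in favor of the alternative. In every comparison the higher dose has the longer mean tooth length (10.605 at dose 0.5, 19.735 at dose 1, and 26.100 at dose 2), which indicates that tooth length increases with dose. These conclusions rely on the assumptions of the Welch two-sample t-test used throughout: the observations are independent, the two groups being compared are not paired, tooth length is approximately normally distributed within each group, and the group variances are not assumed to be equal.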
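The t-tests used in this analysis work best when tooth length is roughly normal within each group. One hedged way to look at this, which is not part of the original analysis, is a Shapiro-Wilk test per dose group; the name `normality_p` is introduced only for this sketch.

```r
# Reload the data so the snippet runs on its own
data <- datasets::ToothGrowth

# Shapiro-Wilk p-value for tooth length within each dose level (20 observations each)
normality_p <- sapply(split(data$len, data$dose), function(x) {
  shapiro.test(x)$p.value
})
round(normality_p, 3)
```

A small p-value would suggest a departure from normality in that group. With only 20 observations per group the test has limited power, so it is best read alongside the boxplots rather than as a formal gate.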