For this second part of the project we’re going to explore the ToothGrowth dataset in R and use hypothesis tests to compare tooth growth in guinea pigs by supplement and dose.
Some useful information regarding the ToothGrowth data can be obtained directly from the R help (?ToothGrowth). 60 guinea pigs received one of three dose levels (0.5, 1.0, and 2.0 mg/day) of vitamin C, by one of two delivery methods, orange juice (OJ) or ascorbic acid (coded as VC).
library(ggplot2)
data("ToothGrowth")
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
# Box plots of tooth length for each dose, split by supplement type
plot1 <- ggplot(ToothGrowth, aes(x = as.factor(dose), y = len, fill = supp)) +
  geom_boxplot() +
  xlab("Dose (mg/day)") +
  ylab("Tooth length") +
  guides(fill = guide_legend(title = "Supplement type")) +
  labs(title = "Tooth Growth in Guinea Pigs")
print(plot1)
We can see that tooth length tends to increase with dose for both the VC and OJ supplements.
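To back up the visual impression with numbers, the group means can be tabulated with base R. This is only a minimal sketch; the names `group_means` and `mean_len` are illustrative and not part of the original analysis.

```r
data("ToothGrowth")

# Mean tooth length for every supplement/dose combination (6 groups of 10 animals)
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
names(group_means)[3] <- "mean_len"  # illustrative column name

print(group_means)
```

If the means rise with dose within each supplement, the table agrees with what the box plots suggest.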
Let’s now use t-tests to compare tooth growth by supplement and by dose. For this evaluation we assume that the sample is representative of the population of guinea pigs, and that the variances of the groups being compared are not equal.
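The formula for the t statistic is as follows:
\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/N_1 + S_2^2/N_2}} \]
where \(\bar{X}_i\), \(S_i^2\) and \(N_i\) are the sample mean, sample variance and sample size of group \(i\).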
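For reference, the non-integer degrees of freedom that R reports for this kind of test come from the Welch-Satterthwaite approximation. Written with the same symbols (the name \(\nu\) for the degrees of freedom is a label chosen here, not one used elsewhere in this report):
\[ \nu \approx \frac{\left(S_1^2/N_1 + S_2^2/N_2\right)^2}{\dfrac{\left(S_1^2/N_1\right)^2}{N_1 - 1} + \dfrac{\left(S_2^2/N_2\right)^2}{N_2 - 1}} \]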
In R, we use t.test(), which runs this unequal-variance (Welch) test by default. First, we evaluate tooth length by supplement:
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
Here the p-value of 0.06063 is greater than the 0.05 threshold, and the 95% confidence interval contains zero. This means that we cannot reject the null hypothesis that the supplement type has no effect on tooth growth.
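As a sanity check, the statistic reported by t.test can be reproduced by hand from the group means, variances and sizes. The sketch below assumes the Welch-Satterthwaite approximation for the degrees of freedom, which is what t.test applies when var.equal = FALSE; all variable names are illustrative.

```r
data("ToothGrowth")

# Tooth lengths for each supplement group
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean: S^2 / N
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

print(c(t = t_stat, df = df, p.value = p_value))
```

The printed values should agree with the t, df and p-value lines of the t.test output above.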
Now, let’s apply the t-test to each of the three pairs of doses.
dose_05_10 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
t.test(len ~ dose, data = dose_05_10)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
dose_05_20 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
t.test(len ~ dose, data = dose_05_20)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
dose_10_20 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, data = dose_10_20)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
In all three cases the p-value is very small and the confidence interval does not contain zero; therefore, we can reject the null hypothesis and conclude that the dose affects tooth length.
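Because three tests are run on the same data, one optional extra check is to adjust the p-values for multiple comparisons. The sketch below repeats the three pairwise Welch tests in a loop and applies a Bonferroni correction with p.adjust; the correction is an addition for illustration and is not part of the analysis above, and the variable names are hypothetical.

```r
data("ToothGrowth")

# All unordered pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Welch t-test p-value for each pair of doses
p_raw <- sapply(dose_pairs, function(pair) {
  t.test(len ~ dose, data = subset(ToothGrowth, dose %in% pair))$p.value
})
names(p_raw) <- sapply(dose_pairs, paste, collapse = " vs ")

# Bonferroni adjustment: multiply each p-value by the number of tests (capped at 1)
p_adj <- p.adjust(p_raw, method = "bonferroni")

print(cbind(raw = p_raw, bonferroni = p_adj))
```

With p-values as small as those reported above, the adjusted values remain far below 0.05, so the conclusion about dose does not change.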
Based on the results presented above, we can conclude that: