We’re going to analyze the ToothGrowth data in the R datasets package.
According to the dataset's help page, it contains:
- The length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
data("ToothGrowth")
Let’s have a look at the ToothGrowth data:
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
We have a data frame with 60 observations on 3 variables:
1. len: tooth length (numeric)
- Mean: 18.81
- Standard deviation: 7.65
2. supp: supplement type (factor with two levels)
- VC (ascorbic acid) with 30 observations
- OJ (orange juice) with 30 observations
3. dose: dose of Vitamin C (0.5, 1 and 2 milligrams), with 20 observations of each dose
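The figures in the list above can be reproduced directly from the data frame. The following is a minimal sketch; the variable names `len_mean` and `len_sd` are only illustrative and are not part of the original analysis.

```r
data("ToothGrowth")

# Mean and standard deviation of tooth length
len_mean <- mean(ToothGrowth$len)
len_sd <- sd(ToothGrowth$len)
round(c(mean = len_mean, sd = len_sd), 2)

# Number of observations per supplement and per dose
table(ToothGrowth$supp)
table(ToothGrowth$dose)
```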
table(ToothGrowth$dose, ToothGrowth$supp)
##
##       OJ VC
##   0.5 10 10
##   1   10 10
##   2   10 10
There are 6 groups of 10 observations each, one for every combination of dose and supplement type.
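Since the six groups are balanced, a quick way to compare them is a table of mean tooth length per dose and supplement. This is a sketch using base R's `tapply`; the name `group_means` is an assumption made for illustration.

```r
data("ToothGrowth")

# Mean tooth length for each dose (rows) and supplement (columns)
group_means <- with(ToothGrowth, tapply(len, list(dose = dose, supp = supp), mean))
round(group_means, 2)
```

The layout mirrors the count table above, with doses in rows and supplements in columns.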
library(ggplot2)
ggplot(ToothGrowth, aes(x=dose, y=len)) +
  ggtitle("Length vs Dose per supplement") +
  xlab("Dose") + ylab("Length") +
  geom_boxplot(aes(fill=factor(dose))) +
  geom_jitter() +
  facet_grid(.~supp)
Based on the previous graph, it seems that:
1. Orange juice (OJ) is more effective than ascorbic acid (VC) at the 0.5 and 1 mg doses
2. At the 2 mg dose, tooth length does not appear to depend on the supplement
1. Hypothesis:
- Ho (null hypothesis): for the 0.5 mg dose, mean(OJ) = mean(VC)
- Ha (alternative hypothesis): for the 0.5 mg dose, mean(OJ) != mean(VC)
Null model:
t.test(len ~ supp, paired=FALSE, data=ToothGrowth[ToothGrowth$dose==0.5,])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
Ho is rejected at the 5% level: the p-value (0.006) is well below 0.05 and the 95% confidence interval (1.72 to 8.78) does not include 0. At the 0.5 mg dose, the type of supplement clearly affects tooth length, with OJ giving the larger mean.
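To see where the numbers in the Welch test output come from, the statistic, the degrees of freedom and the p-value can be computed by hand. This is a sketch under the usual Welch-Satterthwaite formulas; all variable names (`low`, `oj`, `vc`, `t_stat`, `welch_df`, and so on) are illustrative and not part of the original analysis.

```r
data("ToothGrowth")

# Observations at the 0.5 mg dose, split by supplement
low <- ToothGrowth[ToothGrowth$dose == 0.5, ]
oj <- low$len[low$supp == "OJ"]
vc <- low$len[low$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error
mean_diff <- mean(oj) - mean(vc)
se_diff <- sqrt(se2_oj + se2_vc)
t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation for the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci <- mean_diff + c(-1, 1) * qt(0.975, welch_df) * se_diff

c(t = t_stat, df = welch_df, p = p_value)
ci
```

These values should agree with the `t.test` output shown above for the 0.5 mg dose.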
2. Hypothesis:
- Ho (null hypothesis): for the 1 mg dose, mean(OJ) = mean(VC)
- Ha (alternative hypothesis): for the 1 mg dose, mean(OJ) != mean(VC)
Null model:
t.test(len ~ supp, paired=FALSE, data=ToothGrowth[ToothGrowth$dose==1,])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
Ho is rejected at the 5% level: the p-value (0.001) is well below 0.05 and the 95% confidence interval (2.80 to 9.06) does not include 0. At the 1 mg dose, the type of supplement clearly affects tooth length, with OJ giving the larger mean.
3. Hypothesis:
- Ho (null hypothesis): for the 2 mg dose, mean(OJ) = mean(VC)
- Ha (alternative hypothesis): for the 2 mg dose, mean(OJ) != mean(VC)
Null model:
t.test(len ~ supp, paired=FALSE, data=ToothGrowth[ToothGrowth$dose==2,])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
Ho cannot be rejected: the 95% confidence interval includes 0 and the p-value (0.96) is very high. Failing to reject does not prove the null hypothesis, but the data give no evidence that tooth length at the 2 mg dose depends on the type of supplement.
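The three tests can also be run in one pass and collected in a single table, which makes the contrast between doses easier to see. This is a sketch only; the names `doses` and `results` and the column names are illustrative, and the calls repeat the same Welch test used above.

```r
data("ToothGrowth")

doses <- sort(unique(ToothGrowth$dose))

# Run the same Welch two-sample t-test for each dose and keep the key numbers
results <- do.call(rbind, lapply(doses, function(d) {
  tt <- t.test(len ~ supp, paired = FALSE,
               data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(dose = d,
             t = unname(tt$statistic),
             df = unname(tt$parameter),
             p_value = tt$p.value,
             ci_low = tt$conf.int[1],
             ci_high = tt$conf.int[2])
}))

results
```

Each row corresponds to one of the hypotheses above: the confidence interval excludes 0 for the 0.5 and 1 mg doses and includes it for the 2 mg dose.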