We will look at the dataset ToothGrowth which shows the response in the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid).
First a quick look at the data:
glimpse(ToothGrowth)
## Observations: 60
## Variables:
## $ len (dbl) 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16....
## $ supp (fctr) VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, ...
## $ dose (dbl) 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1....
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
levels(ToothGrowth$supp)
## [1] "OJ" "VC"
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
We can see that this dataset contains 60 observations on 3 variables :
The dose can only take 3 values, so we can convert it to a factor:
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
Now the dataset is ready for further analysis.
An interesting view would be to display the relation between the tooth length and the dose for each supplement:
g <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp))
g <- g + geom_boxplot(aes(x = dose))
g <- g + labs(title="Effect of supplement on tooth growth",
x="Dose (milligrams)",
y="Tooth length")
g <- g + scale_fill_discrete(name="Supplement",
labels=c('Orange Juice', 'Vitamin C'))
g <- g + facet_grid(. ~ supp)
g
We can draw the following observations from this plot:
Let’s now look at the sample used to collect this data, and see if some assumptions we could draw by looking at the data are correct.
Let’s first try to determin if the delivery method has an impact on the results. Let \(H_{0}\) be the null hypothesis that it has no impact:
\[ H_{0} : \mu _{oj} = \mu _{vc} \]
To compare the impact of each supplement, we can perform an independant group T test.
oj <- ToothGrowth[ToothGrowth$supp == 'OJ',]$len
vc <- ToothGrowth[ToothGrowth$supp == 'VC',]$len
Let’s look at the variance of the 2 groups:
var(oj)
## [1] 43.63344
var(vc)
## [1] 68.32723
The difference is not negligible, so we’ll assume the 2 groups have unequal variances for the T test.
t.test(oj, vc, paired=FALSE, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: oj and vc
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
The 95% confidence interval [-0.1710156 7.5710156] contains 0, which means that we can not really tell with this sample if the delivery method is relevant to the impact.
The P-value is 0.06063 which is too high at a significant level of 5%, we can’t exclude our null hypothesis \(H_{0}\), so it’s possible, given this sample, that the delivery method has no effect on the teeth growth.
Conclusion: the delivery method has no proven impact on the teeth growth.
Now we can test if the dose level has an impact on the results. We will test the impact of 1mg versus 0.5mg, and 2mg versus 1mg.
Our null hypothesis \(H_{0}\) is that increasing the dose level has no impact:
\[ H_{0} : \mu_{0.5mg} = \mu_{1mg} \]
Let’s do another independant group T test. The two groups we are looking at are no the ones who got 0.5mg versus those who got 1mg.
mg05 <- ToothGrowth[ToothGrowth$dose == 0.5,]$len
mg10 <- ToothGrowth[ToothGrowth$dose == 1.0,]$len
Let’s calculate the variances:
var(mg05)
## [1] 20.24787
var(mg10)
## [1] 19.49608
The variance are relatively similar, we can assume equality of variance.
t.test(mg05, mg10, paired=FALSE, var.equal=TRUE)
##
## Two Sample t-test
##
## data: mg05 and mg10
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983748 -6.276252
## sample estimates:
## mean of x mean of y
## 10.605 19.735
The P-value is almost 0, this means that we can reject the null hypothesis that the dose level increase between 0.5mg and 1mg has no impact. Similarely, the 95% confidence doesn’t include 0, so there is more than 95% chance that the dose level impacts the results.
Conclusion: increasing the dose level from 0.5mg to 1mg has an impact on the teeth growth with 95% confidence.
Our null hypothesis \(H_{0}\) is that increasing the dose level from 1mg to 2mg has no impact:
\[ H_{0} : \mu_{1mg} = \mu_{2mg} \]
Let’s do another independant group T test. The two groups we are looking at are no the ones who got 2mg versus those who got 1mg (we already extracted thenm in the previous analysis).
mg20 <- ToothGrowth[ToothGrowth$dose == 2.0,]$len
Let’s calculate the variances:
var(mg10)
## [1] 19.49608
var(mg20)
## [1] 14.24421
The variance are not similar, we will assume unequality of variance.
t.test(mg10, mg20, paired=FALSE, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: mg10 and mg20
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
## 19.735 26.100
The P-value is almost 0, this means that we can reject the null hypothesis that the dose level increase between 0.5mg and 1mg has no impact. Similarely, the 95% confidence doesn’t include 0, so there is more than 95% chance that the dose level impacts the results.
Conclusion: increasing the dosis from 1mg to 2mg has an impact on the teeth growth with 95% confidence.
To summarize the result of the 3 T-tests performed on the given sample: