Now in the second portion of the project, we’re going to analyze the
ToothGrowth data in the R datasets
package.
library(datasets)
data("ToothGrowth")
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
The data is not paired, so all the tests will be performed for unpaired data. There is no information about the variance in the two groups, hence the variance will not be assumed equal.
The null hypothsesis that we test for is that there is no effect of the delivery method on the tooth grow, i.e. \(H_0 : \mu_1 - \mu_2 = 0\). Both groups have 30 samples with the difference in means \(3.7\)
The \(p\)-value, corresponding to the probabilty of observing such an extreme value (3.7) under the null hypothesis assumption, i.e. under the assumption that means of two groups are equal is \(0.061\).
t <- t.test(len ~ supp, paired = FALSE, var.equal = FALSE, conf.level = 0.95, data = ToothGrowth)
t$p.value
## [1] 0.06063451
However, for a \(95%\) confidence level, the corresponding confidence interval issued by the \(t\) test does contain \(0\).
t$conf.int
## [1] -0.1710156 7.5710156
## attr(,"conf.level")
## [1] 0.95
Hence, for a confidence level of \(95\%\) we fail to reject the null hypothesis of no effect of the delivery method on the tooth grow. However, for a \(90\%\) confidence level, the corresponding confidence interval does not contain \(0\) (see next cell), hence at this confidence level the hypothesis would be rejected.
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, conf.level = 0.90, data = ToothGrowth)$conf.int
## [1] 0.4682687 6.9317313
## attr(,"conf.level")
## [1] 0.9
We consider the groups separated by the dose and check if there is statistical difference in the tooth growth effect between the two delivery methods, at equal dosages.
The t-test is performed for the two groups, corresponding to a low dose of 0.5. In this case the \(95%\) confidence interval does not contain zero, hence for a confidence level of \(95\%\) we reject the null hypothesis of no effect of the delivery method on the tooth grow for dose equal to 0.5.
df<- split(ToothGrowth, ToothGrowth$dose)
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, conf.level = 0.95, data = df$`0.5`)$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
The t-test is performed for the two groups, corresponding to a medium dose of 1. In this case the \(95%\) confidence interval does not contain zero, hence for a confidence level of \(95\%\) we reject the null hypothesis of no effect of the delivery method on the tooth grow for dose equal to 1
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, conf.level = 0.95, data = df$`1`)$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
The t-test is performed for the two groups, corresponding to a high dose of 2. In this case the \(95%\) confidence interval does contain zero, hence for a confidence level of \(95\%\) we fail to reject the null hypothesis of no effect of the delivery method on the tooth grow for dose equal to 2
t.test(len ~ supp, paired = FALSE, var.equal = FALSE, conf.level = 0.95, data = df$`2`)$conf.int
## [1] -3.79807 3.63807
## attr(,"conf.level")
## [1] 0.95
Considering the full sample, i.e. not accounting for the strata induced by the dosage, we fail to reject the null hypothesis of no effect of the delivery method on the tooth grow. However, more granularity, i.e. accounting for dosages can explain better the data and even in this case, while we fail to reject the null hypothesis at a 95% confidence level, this hypothesis would be rejected at 90% confidence level.
An effect of the delivery method can be observed for both the low and medium doses (0.5 and 1). In those cases, the null hypothesis of no effect of the delivery method on the tooth grow is rejected.
For high dose, we fail to reject the the null hyothesis.
This indicates that for low levels of vitamine C, the delivery method has effects in tooth growth.
It was assumed that the differences between groups is not equal. It was also assumed that the data comes from a popoulation with an underlying bell shaped distribution (i.e. not skewed).