We’ll start by loading the data and inspecting its structure.
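library(dplyr)    # glimpse(), group_by(), summarize(), %>%
library(ggplot2)  # ggplot() and the layers used for the plots
data("ToothGrowth")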
glimpse(ToothGrowth)
## Observations: 60
## Variables: 3
## $ len <dbl> 4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2, 11.2, 5.2, 7.0, 16....
## $ supp <fctr> VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, VC, ...
## $ dose <dbl> 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1....
table(ToothGrowth$dose)
##
## 0.5   1   2
##  20  20  20
levels(ToothGrowth$supp)
## [1] "OJ" "VC"
Dose takes only three distinct values (0.5, 1 and 2), so it will serve us better as a factor.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
Let’s look at how the length of odontoblasts is distributed at each dose level.
doselen <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_violin(aes(color = dose), trim = FALSE) +
  ggtitle("Length of odontoblasts vs. Dose Level") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Vitamin C Dose level") +
  ylab("Length of odontoblasts")
doselen
Looking at the distributions, one can see that the length of odontoblasts tends to increase with the dose. Now let’s bring the delivery method (supp) into the analysis.
# doselen already carries the centered title theme and the axis labels,
# so only the facets and the title text need to change here
doselensupp <- doselen +
  facet_wrap(~supp) +
  ggtitle("Length of odontoblasts vs. Dose Level with Delivery methods")
doselensupp
Here one can see, for example, that at dose level 2 the length of odontoblasts varies much less with OJ (orange juice), while with VC the values have almost the same mean but are more spread out. Dose levels 0.5 and 1 show different patterns as well.
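To put numbers on what the violin plots suggest, here is a minimal base-R sketch that tabulates the mean and standard deviation of len for each combination of supp and dose. The names tg, group_mean and group_sd are introduced only for this illustration, and the code works on a copy of the built-in dataset so that the ToothGrowth object in the session is left untouched.

```r
# Work on a copy of the built-in dataset (tg is an illustrative name)
tg <- datasets::ToothGrowth

# Rows are delivery methods, columns are dose levels
group_mean <- with(tg, tapply(len, list(supp = supp, dose = dose), mean))
group_sd   <- with(tg, tapply(len, list(supp = supp, dose = dose), sd))

print(round(group_mean, 2))
print(round(group_sd, 2))
```

A smaller standard deviation in a cell corresponds to a narrower violin in the matching facet.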
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
Let’s check which delivery method has the higher mean and maximum length of odontoblasts, and which has the higher variance.
ToothGrowth %>% group_by(supp) %>% summarize(mean_length = mean(len))
## # A tibble: 2 × 2
## supp mean_length
## <fctr> <dbl>
## 1 OJ 20.66333
## 2 VC 16.96333
ToothGrowth %>% group_by(supp) %>% summarize(max_length = max(len))
## # A tibble: 2 × 2
## supp max_length
## <fctr> <dbl>
## 1 OJ 30.9
## 2 VC 33.9
ToothGrowth %>% group_by(supp) %>% summarize(var_length = var(len))
## # A tibble: 2 × 2
## supp var_length
## <fctr> <dbl>
## 1 OJ 43.63344
## 2 VC 68.32723
So OJ has the higher mean length of odontoblasts (20.66 vs. 16.96), but the single largest value (33.9) belongs to VC. OJ also has a much lower variance (43.63 vs. 68.33).
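The three per-group summaries can also be computed in a single pass. The following is a sketch that assumes the dplyr package is installed; supp_stats is a name made up for this example, and the code reads the built-in dataset directly rather than the session's ToothGrowth object.

```r
library(dplyr)

# One row per delivery method, one column per statistic
supp_stats <- datasets::ToothGrowth %>%
  group_by(supp) %>%
  summarize(
    mean_length = mean(len),
    max_length  = max(len),
    var_length  = var(len)
  )

supp_stats
```

Keeping the statistics side by side makes it easier to compare the two delivery methods at a glance.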
Now we’ll run a series of hypothesis tests across the values of supp and dose and determine in each case whether we can reject the null hypothesis that the two group means are equal. We’ll start with the dose levels, comparing them pairwise.
t.test(len ~ dose, var.equal = FALSE, data = subset(ToothGrowth, dose %in% c(0.5, 1)))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
t.test(len ~ dose, var.equal = FALSE, data = subset(ToothGrowth, dose %in% c(1, 2)))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
t.test(len ~ dose, var.equal = FALSE, data = subset(ToothGrowth, dose %in% c(0.5, 2)))
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
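To see where these numbers come from, the Welch statistic for the first comparison (dose 0.5 vs. 1) can be recomputed by hand. This is only a sketch in base R: the variable names are illustrative, and the degrees of freedom use the Welch-Satterthwaite approximation, which is what produces the non-integer df in the output above.

```r
# Work on a copy of the built-in dataset (tg is an illustrative name)
tg <- datasets::ToothGrowth
x <- tg$len[tg$dose == 0.5]
y <- tg$len[tg$dose == 1]

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

c(t = t_stat, df = df_welch, p = p_value)
```

The result should agree with the t, df and p-value printed by the first t.test() call.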
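In all three t-tests the p-value is far below 0.05 and the 95% confidence interval excludes 0, so we can reject the null hypothesis of equal means for every pair of dose levels. Now we turn our attention to the supp values, comparing OJ and VC within each dose level.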
t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 0.5))
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 1))
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 2))
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
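For dose levels 0.5 and 1 we reject the null hypothesis (p = 0.006 and p = 0.001), with OJ giving the greater mean length. For dose level 2 we can’t reject the null hypothesis, since the confidence interval contains 0 (p = 0.96).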
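The "does the interval contain 0" check can also be done programmatically instead of by reading the printed output. The sketch below runs the same Welch test of len by supp at each dose level and collects the p-value and confidence bounds from the returned test object. The names supp_tests and ci_contains_0 are invented for this example.

```r
# Work on a copy of the built-in dataset (tg is an illustrative name)
tg <- datasets::ToothGrowth

# One Welch t-test per dose level; keep the p-value and the 95% CI bounds
supp_tests <- sapply(c(0.5, 1, 2), function(d) {
  tt <- t.test(len ~ supp, var.equal = FALSE, data = subset(tg, dose == d))
  c(dose    = d,
    p_value = tt$p.value,
    ci_low  = tt$conf.int[1],
    ci_high = tt$conf.int[2])
})

# sapply() returns one column per dose; transpose to one row per dose
supp_tests <- as.data.frame(t(supp_tests))

# TRUE where the interval straddles 0, i.e. the null cannot be rejected
supp_tests$ci_contains_0 <- supp_tests$ci_low < 0 & supp_tests$ci_high > 0

supp_tests
```

Only the row for dose level 2 should have an interval that straddles 0, matching the printed test results.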
Looking at the conducted t-tests, we can conclude that tooth growth is primarily affected by the dose level of Vitamin C; the delivery method makes a detectable difference only at the lower doses (0.5 and 1). For these conclusions we assume that the observations are independent of each other and that there are no other “latent” factors that actually influence tooth growth while masquerading as dose level.