The ToothGrowth data set from the datasets package contains the length response of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Sys.setlocale("LC_TIME", "English")
library(datasets)
library(ggplot2)
library(dplyr)

We first inspect the data. It has 60 rows and 3 variables:

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
table(ToothGrowth$supp, ToothGrowth$dose)
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10

Each supp/dose combination was tested on 10 guinea pigs.

Compare Tooth Growth by SUPP and DOSE

Let us now consider the effect of the dose and the delivery method on the length variable. Figure 1 shows six box plots of the len variable, comparing the two delivery methods at each dose level. Just from looking at the box plots, we can draw some preliminary conclusions.

ggplot(data=ToothGrowth, mapping = aes(y=len, colour = supp)) + 
  geom_boxplot() + 
  facet_grid(.~dose) +
  theme_bw() + 
  labs(title = "Figure 1: Boxplot of the length variable for \neach delivery method/dose level pair")

We store the summary statistics shown in the box plots, together with the mean, the sample standard deviation, and the sample size, in the data frame summ. For each dose-supp pair, we also calculate the 95% t-confidence interval for the respective mean.

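# Summary statistics per dose/supp pair, plus a 95% t-interval for the mean.
# Note: the .by argument requires dplyr >= 1.1.0.
summ <- ToothGrowth %>% 
  summarize(min = min(len),
            median = median(len),
            mean = mean(len),
            max = max(len),
            std = sd(len),
            n = length(len),
            .by = c(dose, supp)) %>% 
  mutate(stderr = std/sqrt(n),         # standard error of the mean
         t_score = qt(.975, n-1),      # two-sided 95% critical value
         ci_lower = mean - t_score*stderr,
         ci_upper = mean + t_score*stderr)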
summ[order(summ$dose),]
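
As a sanity check on the interval formula used above (mean plus or minus the t critical value times the standard error), the following minimal sketch recomputes the interval for a single group and compares it with the one-sample interval returned by t.test. The choice of the OJ, dose = 2 group and the variable names are assumptions made only for illustration.

```r
library(datasets)

# One dose/supp cell, picked only for illustration: OJ at dose 2
x <- with(ToothGrowth, len[supp == "OJ" & dose == 2])

n      <- length(x)
se     <- sd(x) / sqrt(n)          # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value

# Interval computed by hand, as in the summ data frame
manual_ci <- mean(x) + c(-1, 1) * t_crit * se

# Interval reported by the built-in one-sample t-test
builtin_ci <- as.numeric(t.test(x)$conf.int)

all.equal(manual_ci, builtin_ci)
```

If the two intervals agree, the ci_lower and ci_upper columns in summ can be read as ordinary one-sample t-intervals for each group.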

We see that the two confidence intervals for dose = 2 overlap, and the sample means are close to each other. It is worth performing a hypothesis test to check whether, at such a high dose, the data show any difference in mean growth length between the delivery methods.

\(H_0\): In case of dose = 2 the two means are equal: \[E(len | supp=VC, dose=2) = E(len | supp=OJ, dose=2)\].

\(H_a\): In case of dose = 2 the mean for VC is greater than the mean for OJ: \[E(len | supp=VC, dose=2) > E(len | supp=OJ, dose=2)\].

We perform a one-sided t-test without assuming equal variances (Welch's t-test).

with(ToothGrowth[ToothGrowth$dose == 2.0,], 
     t.test(len[supp=="VC"], len[supp=="OJ"], alternative = "greater"))
## 
##  Welch Two Sample t-test
## 
## data:  len[supp == "VC"] and len[supp == "OJ"]
## t = 0.046136, df = 14.04, p-value = 0.4819
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -2.9735     Inf
## sample estimates:
## mean of x mean of y 
##     26.14     26.06

Because the confidence interval contains 0 and the p-value is 0.48 > 0.05, we fail to reject \(H_0\). That is, at dose = 2 the data provide no evidence that ascorbic acid leads to a greater mean growth length than orange juice. Note that failing to reject \(H_0\) does not prove that the two means are equal.
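
To make explicit what the Welch test computes, here is a minimal sketch that rebuilds the test statistic, the Welch-Satterthwaite degrees of freedom, and the one-sided p-value by hand for the dose = 2 comparison. The variable names are illustrative assumptions; the formulas are the standard ones for Welch's test.

```r
library(datasets)

d2 <- ToothGrowth[ToothGrowth$dose == 2, ]
vc <- d2$len[d2$supp == "VC"]
oj <- d2$len[d2$supp == "OJ"]

# Estimated variance of each sample mean: s^2 / n
v_vc <- var(vc) / length(vc)
v_oj <- var(oj) / length(oj)

# Welch t statistic: difference in means over its estimated standard error
t_stat <- (mean(vc) - mean(oj)) / sqrt(v_vc + v_oj)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_vc + v_oj)^2 /
  (v_vc^2 / (length(vc) - 1) + v_oj^2 / (length(oj) - 1))

# One-sided p-value for H_a: mean(VC) > mean(OJ)
p_value <- pt(t_stat, df_welch, lower.tail = FALSE)

round(c(t = t_stat, df = df_welch, p = p_value), 4)
```

The three values should match the t, df, and p-value lines in the t.test output above, up to rounding.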

We also test whether, for the orange juice delivery method, increasing the dose from 1 to 2 mg/day increases the mean growth length.

\(H_0\): In case of supp = OJ the means when dose = 1 and dose = 2 are equal: \[E(len | supp=OJ, dose=2) = E(len | supp=OJ, dose=1)\].

\(H_a\): In case of supp = OJ the mean when dose = 2 is greater than the mean when dose = 1: \[E(len | supp=OJ, dose=2) > E(len | supp=OJ, dose=1)\].

Again, we perform a one-sided t-test without assuming equal variances.

oj_data <- subset(ToothGrowth, supp == "OJ" & dose %in% c(1, 2))
oj_data$dose <- factor(oj_data$dose, levels = c(2, 1))

t.test(len ~ dose, data = oj_data, alternative = "greater")
## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = 2.2478, df = 15.842, p-value = 0.0196
## alternative hypothesis: true difference in means between group 2 and group 1 is greater than 0
## 95 percent confidence interval:
##  0.7486236       Inf
## sample estimates:
## mean in group 2 mean in group 1 
##           26.06           22.70

Because the confidence interval does not contain 0 and the p-value (0.0196) is less than 0.05, we reject \(H_0\). This means that, for orange juice, the mean growth length at dose 2 is significantly greater than at dose 1.

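So we can conclude that, for orange juice, increasing the dose from 1 to 2 mg/day leads to significantly greater tooth growth length. The box plots and summary statistics suggest the same pattern for ascorbic acid, but we did not test it formally. Comparing the two delivery methods, at a dose as high as 2 mg/day we find no statistically significant difference in growth length. However, for lower dosages, orange juice seems more effective than ascorbic acid, judging from the box plots and confidence intervals.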
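
The comparison of delivery methods at the lower doses was not tested formally in this report. One way to check it, sketched here under the assumption that a two-sided Welch test per dose level is appropriate, is to run t.test within each dose and adjust the p-values for multiple testing. The Holm adjustment and the variable names are choices made for this sketch, not part of the original analysis.

```r
library(datasets)

# Two-sided Welch test of OJ vs VC within each dose level
doses <- sort(unique(ToothGrowth$dose))
p_raw <- sapply(doses, function(d) {
  t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])$p.value
})
names(p_raw) <- doses

# Holm adjustment, since three tests are run on the same data set
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(dose = doses, p_raw = p_raw, p_holm = p_holm, row.names = NULL)
```

Adjusted p-values below 0.05 at dose 0.5 or 1 would support the impression from the box plots that the delivery methods differ at those doses.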