This document provides an inferential analysis of the effect of vitamin C on tooth growth in guinea pigs. The analysis compares tooth length across the two delivery methods and across the three dose levels of the vitamin.

The code used for the analysis and for plotting the data is shown in the Appendix.

Explored Data Description

The ToothGrowth data set used for the inferential analysis is contained in the datasets package and can be loaded with the data("ToothGrowth") command. It records the length of odontoblasts (cells responsible for tooth growth), stored in the len variable, for 60 guinea pigs. Each animal received one of three dose levels of vitamin C in mg/day (the dose variable):

unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0

The vitamin was given by one of two delivery methods (the supp variable):

unique(ToothGrowth$supp)
## [1] VC OJ
## Levels: OJ VC

where the OJ level stands for orange juice and the VC level stands for ascorbic acid.
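To see how the 60 animals are spread over the two factors, a quick cross-tabulation can help. This is only a minimal sketch: it works on a fresh copy named tg (a name introduced here for illustration) so that it does not depend on any later changes to ToothGrowth.

```r
# Fresh copy of the data set, independent of later modifications
tg <- datasets::ToothGrowth

# Variable types and the first few values of each column
str(tg)

# Number of animals in each delivery method x dose combination
cell_counts <- table(supp = tg$supp, dose = tg$dose)
print(cell_counts)
```

The table should show a balanced design, i.e. the same number of animals in every delivery method and dose combination.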

Inferential Analysis

Let’s look at the distribution of tooth length values (see Appendix 1. Tooth Length Distribution Plot).

Tooth length has a mean of 18.81 and a standard deviation of 7.65. Since the distribution is approximately normal and the number of observations is rather small (only 60 in total), we’ll use Student’s t-test to perform the hypothesis tests.
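The summary figures above can be reproduced directly, and the visual impression of normality can be complemented with a formal check. The sketch below uses the Shapiro-Wilk test, which is an addition for illustration and not part of the original analysis; tg, len_summary and sw are names introduced here.

```r
# Fresh copy of the data set, independent of later modifications
tg <- datasets::ToothGrowth

# Mean and standard deviation of tooth length, rounded to two decimals
len_summary <- round(c(mean = mean(tg$len), sd = sd(tg$len)), 2)
print(len_summary)

# Shapiro-Wilk normality test: a small p-value would speak against normality
sw <- shapiro.test(tg$len)
print(sw)
```

A p-value above the chosen significance level would not prove normality; it would only mean that the test found no evidence against it.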

Delivery Method Hypothesis Test

We’ll try to determine whether there is a dependency between tooth length and delivery method. First, let’s look at how tooth length is distributed within each delivery method (see Appendix 2. Tooth Length by Delivery Method Plot).

Assume that the means of the two delivery methods are equal; this is our null hypothesis \(H_0 : \mu_{OJ} = \mu_{VC}\) (i.e. there is no dependency between tooth length and delivery method). We’re going to test it against the alternative hypothesis that the means are not equal, \(H_a : \mu_{OJ} \neq \mu_{VC}\) (i.e. there is a dependency between tooth length and delivery method). To do that we’ll perform a two-sided unpaired t-test with type I error rate \(\alpha = 0.05\) and unequal variances:

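## Since our 'Ha : mu_OJ != mu_VC' is two-sided (i.e. not '<' or '>'), there is
## no need to choose which mean is subtracted from the other: the argument order
## only flips the sign of t and of the confidence interval, not the p-value.
## The t.test defaults are two-sided, unpaired, unequal variances (Welch).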
t.test(
    x = ToothGrowth$len[ToothGrowth$supp == "OJ"],
    y = ToothGrowth$len[ToothGrowth$supp == "VC"]
)
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$supp == "OJ"] and ToothGrowth$len[ToothGrowth$supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

So we get a \(t\) statistic of 1.915 with a p-value of 0.061, which is greater than our \(\alpha\) (0.05). Consistently, the 95% confidence interval for the difference in means (-0.171 to 7.571) contains 0. Hence we fail to reject \(H_0\): the data do not provide sufficient evidence at the 5% level of a dependency between tooth length and delivery method.
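To make the numbers reported by t.test less of a black box, the Welch statistic, its degrees of freedom (Welch-Satterthwaite approximation), the two-sided p-value and the confidence interval can be recomputed by hand. This is a minimal sketch; all variable names (tg, oj, vc, se2_oj, se2_vc, t_stat, welch_df, p_value, ci) are introduced here for illustration.

```r
# Fresh copy of the data set, independent of later modifications
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error of each group mean: s^2 / n
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(oj) - mean(vc)) / se_diff

# Welch-Satterthwaite approximation of the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
    (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df = welch_df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df = welch_df) * se_diff

print(c(t = t_stat, df = welch_df, p.value = p_value))
print(ci)
```

The printed values should agree with the t.test output above up to rounding.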

Dose Level of Vitamin C Hypothesis Tests

Now we’re going to determine whether there is a dependency between tooth length and dose level of vitamin C. Let’s look at how tooth length is distributed within each dose level (see Appendix 3. Tooth Length by Vitamin C Dose Level Plot).

We’re now going to test all possible pairwise combinations of the three dose levels, of which there are

\[{{n} \choose {k}} = {{3} \choose {2}} = \frac{3!}{2! \times (3 - 2)!} = 3\]

namely:

  1. \(H_0 : \mu_{1.0} = \mu_{0.5}\) vs. \(H_a : \mu_{1.0} > \mu_{0.5}\)

  2. \(H_0 : \mu_{2.0} = \mu_{0.5}\) vs. \(H_a : \mu_{2.0} > \mu_{0.5}\)

  3. \(H_0 : \mu_{2.0} = \mu_{1.0}\) vs. \(H_a : \mu_{2.0} > \mu_{1.0}\)

For each combination we’re going to perform a one-sided unpaired t-test with type I error rate \(\alpha = 0.05\) and unequal variances (see Appendix 4. Dose Level of Vitamin C Hypothesis Tests). As a result, we reject \(H_0\) in all three tests: the first gives a \(t\) statistic of 6.477 with a p-value of \(6.34\times 10^{-8}\), the second 11.799 with \(2.20\times 10^{-14}\), and the third 4.900 with \(9.53\times 10^{-6}\). By rejecting all three null hypotheses we conclude that there is a dependency between tooth length and dose level of vitamin C.
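Because three tests are run on the same data, it may be worth checking that the conclusions survive a correction for multiple comparisons. The sketch below enumerates the dose pairs with combn, runs the same one-sided Welch tests, and applies a Bonferroni adjustment. The adjustment is an addition for illustration, not part of the original analysis, and tg, dose_pairs, p_values and p_adjusted are names introduced here.

```r
# Fresh copy of the data set, so dose is still numeric here
tg <- datasets::ToothGrowth

# All pairs of dose levels; each column is (lower dose, higher dose)
dose_pairs <- combn(sort(unique(tg$dose)), 2)

# One-sided Welch t-test per pair: Ha is mean(higher dose) > mean(lower dose)
p_values <- apply(dose_pairs, 2, function(pair) {
    t.test(
        x = tg$len[tg$dose == pair[2]],
        y = tg$len[tg$dose == pair[1]],
        alternative = "greater"
    )$p.value
})
names(p_values) <- apply(dose_pairs, 2, function(pair) {
    paste(pair[2], "vs", pair[1])
})

# Bonferroni adjustment: each p-value is multiplied by the number of tests
p_adjusted <- p.adjust(p_values, method = "bonferroni")
print(data.frame(p_value = p_values, p_bonferroni = p_adjusted))
```

If every adjusted p-value is still below 0.05, rejecting all three null hypotheses does not depend on ignoring the multiple comparisons.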

Conclusion

By applying t-tests to approximately normally distributed data, we found no statistically significant dependency between tooth length and delivery method at the 5% level, but we did find a dependency between tooth length and dose level of vitamin C. Since we rejected all three null hypotheses related to dose level in favor of the one-sided alternatives, a bigger vitamin C dose is associated with greater tooth length (but only within the dose levels investigated, namely 0.5, 1.0 and 2.0 mg/day).

Appendix

1. Tooth Length Distribution Plot

library(ggplot2)
library(dplyr)

ggplot() +
    geom_histogram(
        mapping = aes(x = ToothGrowth$len, y = after_stat(density)),
        color = "gray50",
        bins = unique(ToothGrowth$len) %>% length()
    ) +
    stat_function(
        mapping = aes(x = ToothGrowth$len),
        linewidth = 2,
        fun = dnorm,
        args = list(mean = mean(ToothGrowth$len), sd = sd(ToothGrowth$len))
    ) +
    geom_vline(
        mapping = aes(
            xintercept = c(
                mean(ToothGrowth$len),
                mean(ToothGrowth$len) + sd(ToothGrowth$len)
            ),
            color = c("mean", "std. deviation")
        )
    ) +
    scale_color_manual(name = "", values = c("blue", "red")) +
    geom_vline(
        color = "red",
        xintercept = mean(ToothGrowth$len) - sd(ToothGrowth$len)
    ) +
    labs(
        title = "Tooth Length Values Distribution",
        x = "tooth length"
    )

2. Tooth Length by Delivery Method Plot

ggplot(
    data = ToothGrowth,
    mapping = aes(x = supp, y = len, color = supp, fill = supp)
) +
    geom_boxplot(alpha = 0.2) +
    geom_jitter(width = 0.05) +
    scale_x_discrete(labels = c("orange juice (OJ)", "ascorbic acid (VC)")) +
    labs(
        title = "Tooth Length by Delivery Method",
        x = "delivery method",
        y = "tooth length"
    ) +
    theme(legend.position = "none")

3. Tooth Length by Vitamin C Dose Level Plot

library(magrittr)

ToothGrowth$dose %<>% as.factor()

ggplot(
    data = ToothGrowth,
    mapping = aes(x = dose, y = len, color = dose, fill = dose)
) +
    geom_boxplot(alpha = 0.2) +
    geom_jitter(width = 0.05) +
    labs(
        title = "Tooth Length by Vitamin C Dose Level",
        y = "tooth length"
    ) +
    theme(legend.position = "none")

4. Dose Level of Vitamin C Hypothesis Tests

t.test(
    x = ToothGrowth$len[ToothGrowth$dose == "1"],
    y = ToothGrowth$len[ToothGrowth$dose == "0.5"],
    alternative = "greater"
)
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == "1"] and ToothGrowth$len[ToothGrowth$dose == "0.5"]
## t = 6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  6.753323      Inf
## sample estimates:
## mean of x mean of y 
##    19.735    10.605
t.test(
    x = ToothGrowth$len[ToothGrowth$dose == "2"],
    y = ToothGrowth$len[ToothGrowth$dose == "0.5"],
    alternative = "greater"
)
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == "2"] and ToothGrowth$len[ToothGrowth$dose == "0.5"]
## t = 11.799, df = 36.883, p-value = 2.199e-14
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  13.27926      Inf
## sample estimates:
## mean of x mean of y 
##    26.100    10.605
t.test(
    x = ToothGrowth$len[ToothGrowth$dose == "2"],
    y = ToothGrowth$len[ToothGrowth$dose == "1"],
    alternative = "greater"
)
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == "2"] and ToothGrowth$len[ToothGrowth$dose == "1"]
## t = 4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  4.17387     Inf
## sample estimates:
## mean of x mean of y 
##    26.100    19.735