ToothGrowth Data Analysis

Overview

This report presents an analysis of the ToothGrowth data (from R), including an exploratory data analysis and hypothesis testing. The analysis aims to answer the question of what effect different vitamin C supplement types (orange juice and ascorbic acid) and different dose levels (0.5, 1, and 2 mg) have on tooth growth in guinea pigs.

Data Summary and Exploratory Analysis

The dataframe is loaded from R. It can be seen from the structure of the data set (shown below) that there are 60 observations, and the three variables are ‘len’, a numeric value for the length measurements, ‘supp’, a 2-level factor variable for the supplement type (OJ or VC), and ‘dose’, a numeric for the 3 dose levels (0.5, 1, and 2 mg). In the context of this analysis, it makes sense to convert ‘dose’ to a 3-level factor variable, for ease of comparisons.

library(datasets)
data("ToothGrowth")
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
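As a numeric complement to the plots, the group means can be tabulated directly. The following is a minimal sketch using base R's aggregate(); the names `tg` and `group_means` are illustrative and are not part of the original analysis.

```r
# Work on a local copy so the ToothGrowth object used elsewhere is untouched.
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# Mean tooth length for each supplement/dose combination (10 animals per cell).
group_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
print(group_means)
```

The six means should agree with the "sample estimates" reported by the per-dose t-tests in the appendix.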

Some plots are made to visualize the data, grouped by supplement type and dose level. All code for creating the plots is given in the appendix.

The figure below shows a scatter plot of the data, where an increasing trend in length with dose level is clear. However, it is less clear whether or not supplement type has any significant effect.

The trends and relative effects are seen more clearly in the following box plots. The plot on the left shows the data grouped only by the two supplement types, regardless of dose. The boxes overlap considerably, so it is not yet clear whether supplement type has a significant effect on length. The plot on the right shows the data grouped by supplement type for each distinct dose level. Here again we see the trend of increasing length with dose level, but it also seems that there may be a significant difference between supplement types at a given dose level.

Hypothesis Test

In order to draw conclusions about what effect supplement type and dose size have on tooth growth, several different hypotheses were tested using two-sample t-tests to determine statistical significance. Results are summarized here, with full code and output for each t-test given in the appendix.

Hypothesis: Supplement type has no effect on tooth growth. (H0: \(\mu_{OJ} = \mu_{VC}\), HA: \(\mu_{OJ} \neq \mu_{VC}\))
- p-value: \(0.06\)
- 95% confidence interval: [\(-0.16701, 7.56701\)]

The p-value > 0.05, indicating we fail to reject the null hypothesis, and therefore supplement type may have no effect on tooth growth.
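To make the reported numbers less opaque, the pooled-variance t-test for this first hypothesis can be reproduced by hand. This is only a sketch under the equal-variance assumption used in the appendix; the variable names (`tg`, `oj`, `vc`, `sp2`, and so on) are illustrative.

```r
# Local copy of the data; does not modify ToothGrowth itself.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance and standard error of the difference in means.
df  <- n1 + n2 - 2
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / df
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))

# Test statistic, two-sided p-value, and 95% confidence interval (OJ minus VC).
t_stat  <- (mean(oj) - mean(vc)) / se
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p = p_value))
print(ci)
```

These values should agree with the output of t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE) shown in the appendix.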

Hypothesis: Supplement type has no effect on tooth growth, at a dose level = 0.5 mg. (H0: \(\mu_{OJ, 0.5} = \mu_{VC, 0.5}\), HA: \(\mu_{OJ, 0.5} \neq \mu_{VC, 0.5}\))
- p-value: \(0.005\)
- 95% confidence interval: [\(1.77026, 8.72974\)]

The p-value < 0.05, indicating we can reject the null hypothesis, and therefore supplement type may have an effect on tooth growth at a dose level of 0.5 mg.

Hypothesis: Supplement type has no effect on tooth growth, at a dose level = 1 mg. (H0: \(\mu_{OJ, 1} = \mu_{VC, 1}\), HA: \(\mu_{OJ, 1} \neq \mu_{VC, 1}\))
- p-value: \(0.001\)
- 95% confidence interval: [\(2.84069, 9.01931\)]

The p-value < 0.05, indicating we can reject the null hypothesis, and therefore supplement type may have an effect on tooth growth at a dose level of 1 mg.

Hypothesis: Supplement type has no effect on tooth growth, at a dose level = 2 mg. (H0: \(\mu_{OJ, 2} = \mu_{VC, 2}\), HA: \(\mu_{OJ, 2} \neq \mu_{VC, 2}\))
- p-value: \(0.964\)
- 95% confidence interval: [\(-3.723, 3.563\)]

The p-value > 0.05, indicating we fail to reject the null hypothesis, and therefore supplement type may not have an effect on tooth growth at a dose level of 2 mg.
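The three per-dose comparisons above follow the same pattern, so they can be collected in one table. A possible sketch is given here, assuming the same pooled-variance test as in the appendix; `tg`, `doses`, and `results` are illustrative names.

```r
# Local copy with dose left numeric, so subsetting by value is straightforward.
tg <- datasets::ToothGrowth
doses <- sort(unique(tg$dose))

# Run the OJ-versus-VC t-test within each dose level and keep the key results.
results <- do.call(rbind, lapply(doses, function(d) {
  fit <- t.test(len ~ supp, data = tg[tg$dose == d, ], var.equal = TRUE)
  data.frame(
    dose     = d,
    p_value  = fit$p.value,
    ci_lower = fit$conf.int[1],
    ci_upper = fit$conf.int[2]
  )
}))

print(results, digits = 4)
```

Each row corresponds to one of the three hypotheses above, which makes it easy to see that the interval excludes zero at 0.5 mg and 1 mg but not at 2 mg.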

Hypothesis: Dose level has no effect on tooth growth. Two adjacent comparisons are tested: 0.5 mg versus 1 mg, then 1 mg versus 2 mg. (H0: \(\mu_{0.5} = \mu_{1}\), HA: \(\mu_{0.5} \neq \mu_{1}\); H0: \(\mu_{1} = \mu_{2}\), HA: \(\mu_{1} \neq \mu_{2}\))
- p-value, 0.5 to 1 mg: \(1.268\times 10^{-7}\); p-value, 1 to 2 mg: \(1.906\times 10^{-5}\)
- 95% confidence interval, 0.5 to 1 mg: [\(-11.98378, -6.27622\)]
- 95% confidence interval, 1 to 2 mg: [\(-8.99648, -3.73352\)]

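For both comparisons, the p-value is far below 0.05 and the confidence interval (lower dose minus higher dose) lies entirely below zero. We can therefore reject the null hypothesis: dose level does have an effect on tooth growth, with the higher dose giving greater length.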

Conclusions and Assumptions

Based on all the t-tests performed above, the following conclusions can be drawn:

- Overall, when dose is not considered, the type of supplement used to deliver the vitamin C does not show a statistically significant effect on tooth growth.
- However, at dose levels of 0.5 mg and 1 mg, supplement type does have a significant effect on tooth growth (orange juice better than ascorbic acid). At 2 mg, no such difference is detected.
- Different dose levels, regardless of supplement type, have a significant effect on tooth growth, in that larger doses are associated with greater length.

The following assumptions were made in order to perform this data analysis:

- It is assumed that the experiment was designed such that individual guinea pigs were randomly assigned to supplement types and dosage levels.
- In order to use a t-test, especially for smaller sample sizes such as these, the samples must be roughly symmetric and have similar shape. This is the case, as can be seen from the box plots of the samples. Even the samples that have some skew are not severely skewed.
- Equal variances were assumed for the supplement-type comparisons (var.equal = TRUE). The dose-level comparisons used R's default Welch t-test, which does not assume equal variances.
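One way to gauge how much the equal-variance assumption matters is to run the same comparison with and without it. The sketch below does this for the overall supplement-type test; it is an illustration rather than part of the original analysis, and `tg`, `pooled`, `welch`, and `comparison` are illustrative names.

```r
# Local copy of the data; does not modify ToothGrowth itself.
tg <- datasets::ToothGrowth

# Same comparison under the pooled-variance and Welch versions of the t-test.
pooled <- t.test(len ~ supp, data = tg, var.equal = TRUE)
welch  <- t.test(len ~ supp, data = tg, var.equal = FALSE)

comparison <- data.frame(
  method  = c("pooled", "Welch"),
  t       = unname(c(pooled$statistic, welch$statistic)),
  df      = unname(c(pooled$parameter, welch$parameter)),
  p_value = c(pooled$p.value, welch$p.value)
)
print(comparison, digits = 4)
```

Because the two supplement groups have the same size, both versions give the same t statistic and differ only in their degrees of freedom, so the conclusion for this test does not hinge on the assumption.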

Appendix

Code for exploratory data analysis

Scatter plot code:

library(ggplot2)
# Length against dose, with points colored by supplement type
g <- ggplot(ToothGrowth, aes(dose, len, color = supp))
g + geom_point(size = 2) + theme_bw() + labs(x = "Dose", y = "Length", color = "Supplement") +
    scale_color_brewer(palette = "Dark2")

Box plots code:

library(gridExtra)
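# Left panel: length by supplement type, ignoring dose
# (the legend is driven by the fill aesthetic, so it is titled via fill =)
g1 <- ggplot(ToothGrowth, aes(supp, len, fill = supp)) +
    geom_boxplot() + theme_bw() + labs(x = "Supplement", y = "Length", fill = "Supplement") +
    scale_fill_brewer(palette = "Dark2")

# Right panel: length by supplement type within each dose level
g2 <- ggplot(ToothGrowth, aes(dose, len, fill = supp)) +
    geom_boxplot() + theme_bw() + labs(x = "Dose", y = "Length", fill = "Supplement") +
    scale_fill_brewer(palette = "Dark2")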

grid.arrange(g1, g2, ncol = 2)

T-test code and results

Hypothesis 1:

t.test(len~supp, data = ToothGrowth, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Hypothesis 2:

t.test(len~supp, data = ToothGrowth[ToothGrowth$dose == "0.5" , ], var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 18, p-value = 0.005304
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.770262 8.729738
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

Hypothesis 3:

t.test(len~supp, data = ToothGrowth[ToothGrowth$dose == "1" , ], var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 18, p-value = 0.0007807
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.840692 9.019308
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

Hypothesis 4:

t.test(len~supp, data = ToothGrowth[ToothGrowth$dose == "2" , ], var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 18, p-value = 0.9637
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.722999  3.562999
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

Hypothesis 5:

t.test(ToothGrowth$len[ToothGrowth$dose == "0.5"], ToothGrowth$len[ToothGrowth$dose == "1"])

## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == "0.5"] and ToothGrowth$len[ToothGrowth$dose == "1"]
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

t.test(ToothGrowth$len[ToothGrowth$dose == "1"], ToothGrowth$len[ToothGrowth$dose == "2"])

## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == "1"] and ToothGrowth$len[ToothGrowth$dose == "2"]
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y 
##    19.735    26.100