This report analyzes the ToothGrowth data set (from R), covering exploratory data analysis and hypothesis testing. The analysis aims to determine what effect different vitamin C supplement types (orange juice and ascorbic acid) and different dose levels (0.5, 1, and 2 mg) have on tooth growth in guinea pigs.
The data frame is loaded from R's datasets package. The structure of the data set (shown below) shows 60 observations of three variables: ‘len’, a numeric value for the length measurements; ‘supp’, a 2-level factor for the supplement type (OJ or VC); and ‘dose’, a numeric value for the 3 dose levels (0.5, 1, and 2 mg). For this analysis, it makes sense to convert ‘dose’ to a 3-level factor, for ease of comparison.
library(datasets)
data("ToothGrowth")
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
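As a quick check on how the observations are distributed, the sketch below tabulates the number of observations and the mean length in each supplement-by-dose group. The names `cell_counts` and `cell_means` are illustrative and not part of the original analysis.

```r
library(datasets)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Number of observations in each supplement-by-dose group
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# Mean tooth length in each supplement-by-dose group
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

A balanced table of counts means that each later comparison uses groups of equal size.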
Some plots are made to visualize the data, grouped by supplement type and dose level. All code for creating the plots is given in the appendix.
The figure below shows a scatter plot of the data, where an increasing trend in length with dose level is clear. However, it is less clear whether or not supplement type has any significant effect.
The trends and relative effects are seen more clearly in the following box plots. The plot on the left shows the data grouped only by the two supplement types, regardless of dose. The boxes overlap considerably, so it is not yet clear whether the supplement type has a significant effect on length. The plot on the right shows the data grouped by supplement type, for each distinct dose level. Here again we see the trend of increasing length with dose level, but it also seems that there may be a significant difference between supplement types at a given dose level.
In order to draw conclusions about what effect supplement type and dose level have on tooth growth, several hypotheses were tested using t-tests to determine statistical significance. Results are summarized here, with full code and output for each t-test given in the appendix.
Hypothesis: Supplement type has no effect on tooth growth. (H0: \(\mu_{OJ} = \mu_{VC}\), HA: \(\mu_{OJ} \neq \mu_{VC}\))
The p-value (0.060) is greater than 0.05, so we fail to reject the null hypothesis: across all dose levels combined, the data do not show a significant effect of supplement type on tooth growth.
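To make the mechanics of this test concrete, the following sketch reproduces the pooled two-sample t statistic by hand and compares it with the result of `t.test`. It assumes the equal-variance form of the test used in the appendix; the variable names (`oj`, `vc`, `sp2`, `t_stat`, `dof`, `p_value`, `ref`) are illustrative only.

```r
library(datasets)
data("ToothGrowth")

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance under the equal-variance assumption
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)

# t statistic, degrees of freedom, and two-sided p-value
t_stat <- (mean(oj) - mean(vc)) / sqrt(sp2 * (1 / n1 + 1 / n2))
dof <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), dof)

# Compare with the built-in pooled test
ref <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
print(c(manual_t = t_stat, builtin_t = unname(ref$statistic)))
print(c(manual_p = p_value, builtin_p = ref$p.value))
```

The manual and built-in values should agree, which confirms that the reported p-value is the two-sided tail probability of a t distribution with n1 + n2 - 2 degrees of freedom.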
Hypothesis: Supplement type has no effect on tooth growth, at a dose level = 0.5 mg. (H0: \(\mu_{OJ, 0.5} = \mu_{VC, 0.5}\), HA: \(\mu_{OJ, 0.5} \neq \mu_{VC, 0.5}\))
The p-value (0.0053) is less than 0.05, so we reject the null hypothesis: supplement type appears to affect tooth growth at a dose level of 0.5 mg, with OJ giving the larger mean length (13.23 vs. 7.98).
Hypothesis: Supplement type has no effect on tooth growth, at a dose level = 1 mg. (H0: \(\mu_{OJ, 1} = \mu_{VC, 1}\), HA: \(\mu_{OJ, 1} \neq \mu_{VC, 1}\))
The p-value (0.00078) is less than 0.05, so we reject the null hypothesis: supplement type appears to affect tooth growth at a dose level of 1 mg, with OJ giving the larger mean length (22.70 vs. 16.77).
Hypothesis: Supplement type has no effect on tooth growth, at a dose level = 2 mg. (H0: \(\mu_{OJ, 2} = \mu_{VC, 2}\), HA: \(\mu_{OJ, 2} \neq \mu_{VC, 2}\))
The p-value (0.96) is greater than 0.05, so we fail to reject the null hypothesis: at a dose level of 2 mg the data show no significant difference between supplement types (mean lengths 26.06 and 26.14).
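The three dose-specific comparisons can also be run in a single pass. The sketch below is one possible way to do this, using the same pooled t-test as the appendix; `p_by_dose` is an illustrative name, not part of the original code.

```r
library(datasets)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# p-value of the OJ vs. VC comparison within each dose level
p_by_dose <- sapply(levels(ToothGrowth$dose), function(d) {
  subset_d <- ToothGrowth[ToothGrowth$dose == d, ]
  t.test(len ~ supp, data = subset_d, var.equal = TRUE)$p.value
})
print(signif(p_by_dose, 4))
```

The result is a named vector with one p-value per dose level, which makes the comparison against the 0.05 threshold easy to read at a glance.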
Hypothesis: Dose level has no effect on tooth growth. First test 0.5 to 1 mg change, then 1 mg to 2 mg change. (H0: \(\mu_{dose1} = \mu_{dose2}\), HA: \(\mu_{dose1} \neq \mu_{dose2}\))
In both cases the p-value is far below 0.05 (1.3e-07 for 0.5 mg to 1 mg, and 1.9e-05 for 1 mg to 2 mg), so we reject the null hypothesis: dose level does have an effect on tooth growth.
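Beyond the p-values, the size of the dose effect can be read from the difference in means and its confidence interval. The sketch below collects these for each pair of adjacent dose levels, using R's default Welch test as in the appendix. The data frame `adjacent` and its column names are illustrative assumptions.

```r
library(datasets)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

doses <- levels(ToothGrowth$dose)

# One row per pair of adjacent dose levels (0.5 vs. 1, then 1 vs. 2)
adjacent <- do.call(rbind, lapply(seq_len(length(doses) - 1), function(i) {
  lower  <- ToothGrowth$len[ToothGrowth$dose == doses[i]]
  higher <- ToothGrowth$len[ToothGrowth$dose == doses[i + 1]]
  res <- t.test(higher, lower)  # Welch test: var.equal defaults to FALSE
  data.frame(
    from      = doses[i],
    to        = doses[i + 1],
    mean_diff = mean(higher) - mean(lower),
    ci_low    = res$conf.int[1],
    ci_high   = res$conf.int[2],
    p_value   = res$p.value
  )
}))
print(adjacent)
```

Here the difference is taken as the higher dose minus the lower dose, so a confidence interval lying entirely above zero corresponds to longer teeth at the higher dose.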
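Based on all the t-tests performed above, the following conclusions can be drawn:
Across all dose levels combined, supplement type has no statistically significant effect on tooth growth.
At dose levels of 0.5 mg and 1 mg, orange juice is associated with significantly greater tooth growth than ascorbic acid.
At a dose level of 2 mg, there is no significant difference in tooth growth between the two supplement types.
Increasing the dose level, from 0.5 mg to 1 mg and from 1 mg to 2 mg, is associated with significantly greater tooth growth.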
The following assumptions were made in order to perform this data analysis:
It is assumed that the experiment was designed such that individual guinea pigs were randomly assigned to supplement types and dosage levels.
In order to use a t-test, especially with small samples such as these, the samples must be roughly symmetric and have similar shapes. The box plots of the samples show that this is the case: even the samples that have some skew are not severely skewed.
Equal variance across groups was assumed for the supplement type comparisons (var.equal = TRUE). The dose level comparisons used R's default Welch t-test, which does not assume equal variance.
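One rough way to probe the variance assumption numerically, offered as a sketch rather than as part of the original analysis, is to compare group standard deviations and to rerun the first hypothesis test with and without the equal-variance assumption. The names `group_sd`, `p_pooled`, and `p_welch` are illustrative.

```r
library(datasets)
data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

# Standard deviation of length in each supplement-by-dose group
group_sd <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)
print(group_sd)

# Hypothesis 1 with and without the equal-variance assumption
p_pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)$p.value
p_welch  <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)$p.value
print(c(pooled = p_pooled, welch = p_welch))
```

If the pooled and Welch p-values lead to the same decision, the conclusion is not sensitive to the equal-variance assumption.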
Scatter plot code:
library(dplyr)
library(ggplot2)
# Length against dose level, colored by supplement type
g <- ggplot(ToothGrowth, aes(dose, len, color = supp))
g + geom_point(size = 2) + theme_bw() +
  labs(x = "Dose", y = "Length", color = "Supplement") +
  scale_color_brewer(palette = "Dark2")
Box plots code:
library(gridExtra)
# Left panel: length by supplement type, regardless of dose
g1 <- ggplot(ToothGrowth, aes(supp, len, fill = supp)) +
  geom_boxplot() + theme_bw() +
  labs(x = "Supplement", y = "Length", fill = "Supplement") +
  scale_fill_brewer(palette = "Dark2")
# Right panel: length by supplement type within each dose level
g2 <- ggplot(ToothGrowth, aes(dose, len, fill = supp)) +
  geom_boxplot() + theme_bw() +
  labs(x = "Dose", y = "Length", fill = "Supplement") +
  scale_fill_brewer(palette = "Dark2")
grid.arrange(g1, g2, ncol = 2)
Hypothesis 1:
t.test(len~supp, data = ToothGrowth, var.equal = TRUE)
##
## Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1670064 7.5670064
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
Hypothesis 2:
t.test(len~supp, data = ToothGrowth[ToothGrowth$dose == "0.5" , ], var.equal = TRUE)
##
## Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 18, p-value = 0.005304
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.770262 8.729738
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
Hypothesis 3:
t.test(len~supp, data = ToothGrowth[ToothGrowth$dose == "1" , ], var.equal = TRUE)
##
## Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 18, p-value = 0.0007807
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.840692 9.019308
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
Hypothesis 4:
t.test(len~supp, data = ToothGrowth[ToothGrowth$dose == "2" , ], var.equal = TRUE)
##
## Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 18, p-value = 0.9637
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.722999 3.562999
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
Hypothesis 5:
t.test(ToothGrowth$len[ToothGrowth$dose == "0.5"], ToothGrowth$len[ToothGrowth$dose == "1"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$len[ToothGrowth$dose == "0.5"] and ToothGrowth$len[ToothGrowth$dose == "1"]
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean of x mean of y
## 10.605 19.735
t.test(ToothGrowth$len[ToothGrowth$dose == "1"], ToothGrowth$len[ToothGrowth$dose == "2"])
##
## Welch Two Sample t-test
##
## data: ToothGrowth$len[ToothGrowth$dose == "1"] and ToothGrowth$len[ToothGrowth$dose == "2"]
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
## 19.735 26.100