This second part of the Statistical Inference course project deals with the ToothGrowth dataset, and will show some simple hypothesis testing to compare tooth growth by supplement and dose.
library(datasets)
library(ggplot2)
The basic data structure is shown below. We have a small data frame of 60 observations and 3 variables.
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
Looking at the summary of the data, we see that there are two supplement types, OJ and VC, with 30 observations each. The dosage appears to take only three distinct values: 0.5, 1.0 and 2.0.
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
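To confirm that the observations are spread evenly across supplement and dose, a cross-tabulation can be used. This is a small sketch rather than part of the original analysis; the name `counts` is introduced here for illustration only.

```r
library(datasets)

# Cross-tabulate supplement by dose to check that the design is balanced.
# `counts` is an illustrative name, not something used elsewhere in this report.
counts <- with(ToothGrowth, table(supp, dose))
counts
```

Each supplement/dose cell should contain the same number of observations if the design is balanced.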
Since we want to understand how tooth length differs across these factors, we can start with a boxplot of length by supplement.
ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_boxplot() +
  labs(x = "Supplement", y = "Length", title = "Tooth Growth by Supplement")
We see that the OJ group has a slightly higher tooth length than the VC group overall, although VC appears to have a larger range.
Looking at another boxplot, accounting for both the dosage factors and supplements, paints a clearer picture of this data:
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot() +
  facet_grid(. ~ supp) +
  labs(x = "Dosage", y = "Length", title = "Tooth Growth by Dosage and Supplement")
Tooth length appears to increase with dosage for both OJ and VC, though again OJ seems to produce somewhat longer teeth, at least at the lower doses.
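The pattern in the boxplots can also be summarized numerically. The following sketch computes the mean length for each supplement/dose combination with `aggregate()`; the name `group_means` is an illustrative choice.

```r
library(datasets)

# Mean tooth length for each supplement/dose combination.
# `group_means` is an illustrative name introduced for this sketch.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means
```

The result has one row per combination, which makes it easy to compare the two supplements at a given dose.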
We will begin by testing the null hypothesis that mean tooth length does not differ between the two supplements.
t.test(len ~ supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
This t-test on the overall data gives a 95% confidence interval that includes 0 and a p-value of 0.06, so we fail to reject the null hypothesis at the 5% significance level.
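Rather than reading these values off the printed output, they can be pulled from the object that `t.test()` returns. This is a minimal sketch; the names `overall`, `ci` and `spans_zero` are introduced here for illustration.

```r
library(datasets)

# Store the test result so its components can be inspected directly.
overall <- t.test(len ~ supp, data = ToothGrowth)

overall$p.value        # two-sided p-value of the Welch test
ci <- overall$conf.int # 95% confidence interval for the difference in means
ci

# TRUE when the interval contains 0, i.e. we fail to reject at the 5% level
spans_zero <- ci[1] < 0 && ci[2] > 0
spans_zero
```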
We can go one step further and repeat this comparison within each dosage level, testing the null hypothesis that the two supplements give the same mean tooth length at that dose.
t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose==0.5,])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose==1.0,])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose==2.0,])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
At dosage levels of 0.5 and 1.0, the t-tests show a significant difference in mean tooth length between the two supplements (p < 0.01, with confidence intervals entirely above 0), so for these subsets we reject the null hypothesis. However, at a dosage of 2.0 the confidence interval includes 0 and the p-value is 0.96, so we fail to reject the null hypothesis.
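The three per-dose tests can also be run in a single pass and collected into one table, which makes the comparison across doses easier to read. This is a sketch under the same Welch test settings as above; `doses` and `by_dose` are illustrative names.

```r
library(datasets)

# Run the supplement comparison once per dose level and collect
# the confidence interval and p-value from each test.
doses <- sort(unique(ToothGrowth$dose))

by_dose <- t(sapply(doses, function(d) {
  res <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  c(dose    = d,
    lower   = res$conf.int[1],
    upper   = res$conf.int[2],
    p.value = res$p.value)
}))

by_dose
```

Each row corresponds to one dose level; a row whose interval excludes 0 indicates a significant difference between the supplements at that dose.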
We must assume that the observations are independent and that the subjects were assigned to the supplement and dose groups at random, so that the groups are comparable. The boxplots suggest that higher dosage levels lead to longer teeth for both supplements, although this was not formally tested above. Based on these assumptions and the above analysis, it appears that OJ is the better supplement for longer teeth at the lower dosage levels (0.5 and 1.0), but at a dosage of 2.0 there is no evidence that OJ is better than VC.
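The dose effect itself is only read off the plots in this report. One way it could be checked, which is not part of the original analysis, is to compare the lowest and highest doses with the same Welch t-test. The names `low_high` and `dose_test` are introduced for this sketch.

```r
library(datasets)

# Keep only the lowest and highest dose so the grouping variable has two levels.
low_high <- subset(ToothGrowth, dose %in% c(0.5, 2.0))

# Welch two-sample t-test of tooth length by dose, pooling both supplements.
dose_test <- t.test(len ~ dose, data = low_high)
dose_test
```

A confidence interval that excludes 0 here would support the claim that tooth length differs between the two doses.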