Summary
In this report, I analyze the ToothGrowth data in the R datasets package. I perform exploratory data analysis, provide a basic summary of the data, and use statistical techniques to compare tooth growth across dose levels and delivery methods.
Exploratory Data Analysis
The ToothGrowth data set records tooth growth in guinea pigs that received one of three dose levels of Vitamin C through one of two delivery methods: orange juice or ascorbic acid. In this section, I first load the data set and then look at its variables.
library(datasets)
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
Based on the output of the above code, we can see that the data set has 60 observations of 3 variables: len (numeric), supp (factor), and dose (numeric). Next, we list the distinct values of the two explanatory variables, supp and dose.
unique(ToothGrowth$supp)
## [1] VC OJ
## Levels: OJ VC
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
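Since supp has two levels and dose has three, it can be useful to check how the 60 observations are spread over the six combinations. The following is a minimal sketch; the names tg and cell_counts are my own and are not part of the original analysis. It works on a local copy so that the ToothGrowth object used in the rest of the report is left untouched.

```r
# Local copy of the data set (hypothetical name `tg`), so the global
# ToothGrowth object is not modified.
tg <- datasets::ToothGrowth

# Cross-tabulate delivery method against dose level.
cell_counts <- table(supp = tg$supp, dose = tg$dose)
print(cell_counts)
```

If every cell holds the same count, the design is balanced, which makes the per-dose comparisons later in the report easier to interpret.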
Next, we convert the variable dose from numeric to factor. The following code takes care of this conversion and presents the new transformed data set.
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
str(ToothGrowth)
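The sketch below illustrates what this conversion does, again on a local copy with the hypothetical names tg and dose_numeric. It shows that the factor levels are stored as text labels, that a comparison such as dose == 0.5 still works because R compares the number against those labels, and that the original numeric values should be recovered from the labels rather than from the internal integer codes.

```r
# Local copy (hypothetical name `tg`); the report's ToothGrowth is untouched.
tg <- datasets::ToothGrowth
tg$dose <- as.factor(tg$dose)

# The levels are character labels, sorted by the original numeric values.
print(levels(tg$dose))

# Comparing a factor with a number compares against the level labels,
# so this still counts the observations at the lowest dose.
n_low_dose <- sum(tg$dose == 0.5)
print(n_low_dose)

# as.numeric() on a factor returns the integer codes (1, 2, 3), not the doses.
# To get the doses back, convert through the labels.
dose_numeric <- as.numeric(as.character(tg$dose))
print(unique(dose_numeric))
```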
Next, we use box plots of len vs. dose (Appendix 1) and len vs. supp (Appendix 2) to see the effects of dosage and supplement type on tooth length.
library("ggplot2")
ggplot(aes(x = dose, y = len), data = ToothGrowth) + geom_boxplot()
From the graph in Appendix 1, we can see that tooth length increases markedly with dosage.
library("ggplot2")
ggplot(aes(x = supp, y = len), data = ToothGrowth) + geom_boxplot()
From the graph in Appendix 2, we can see that supplement type appears to affect tooth length, but the effect is not as clear as that of dosage (orange juice appears to have a slightly larger impact on tooth length). This raises the possibility of an interaction effect between supplement type and dosage. Therefore, we need to look at the box plots of len vs. supp at each dose level (Appendix 3).
library("ggplot2")
ggplot(aes(x = supp, y = len), data = ToothGrowth) + geom_boxplot() + facet_wrap(~ dose)
From the graph in Appendix 3, it is evident that the effect of the delivery method depends on the dosage level. At dosage levels of 0.5 and 1.0, orange juice is associated with greater tooth length than ascorbic acid, while at the dosage level of 2.0 we see little difference between the two delivery methods.
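The pattern seen in the box plots can also be expressed numerically. As a rough sketch (the names tg, group_means, and mean_diff are my own), the code below computes the mean tooth length for each combination of delivery method and dose, and then the difference in means between orange juice and ascorbic acid at each dose.

```r
# Local copy (hypothetical name `tg`); the report's ToothGrowth is untouched.
tg <- datasets::ToothGrowth

# Mean tooth length per delivery method (rows) and dose level (columns).
group_means <- with(tg, tapply(len, list(supp = supp, dose = dose), mean))
print(round(group_means, 2))

# Difference in means, orange juice minus ascorbic acid, at each dose.
mean_diff <- group_means["OJ", ] - group_means["VC", ]
print(round(mean_diff, 2))
```

A difference that shrinks toward zero as the dose increases is what an interaction between delivery method and dose would look like in the group means. This is only a descriptive check and does not replace a formal test.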
Basic Summary of Data
In this section, I present a basic summary of the data in the data set.
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
The above output provides the summary statistics for all of the variables in the data set. Next, I provide the summary of the data for each combination of dose level and delivery method.
by(ToothGrowth$len, INDICES = list(ToothGrowth$supp, ToothGrowth$dose), summary)
## : OJ
## : 0.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.20 9.70 12.25 13.23 16.18 21.50
## --------------------------------------------------------
## : VC
## : 0.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 5.95 7.15 7.98 10.90 11.50
## --------------------------------------------------------
## : OJ
## : 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.50 20.30 23.45 22.70 25.65 27.30
## --------------------------------------------------------
## : VC
## : 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.60 15.27 16.50 16.77 17.30 22.50
## --------------------------------------------------------
## : OJ
## : 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.40 24.58 25.95 26.06 27.08 30.90
## --------------------------------------------------------
## : VC
## : 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.50 23.38 25.95 26.14 28.80 33.90
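The per-group summaries above can also be collected into a single table, which is easier to scan. The sketch below is one possible way to do this with aggregate; the names tg and cell_stats are my own, and the choice of statistics (count, mean, standard deviation) is an assumption about what is most useful for the tests that follow.

```r
# Local copy (hypothetical name `tg`); the report's ToothGrowth is untouched.
tg <- datasets::ToothGrowth

# Count, mean, and standard deviation of len for each supp/dose combination.
cell_stats <- aggregate(
  len ~ supp + dose,
  data = tg,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics as one matrix column;
# flatten it into ordinary columns len.n, len.mean, and len.sd.
cell_stats <- do.call(data.frame, cell_stats)
print(cell_stats)
```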
Confidence Intervals and Hypothesis Testing
In this section, I use two-sample t-tests to evaluate the effect of the delivery method on tooth growth at each level of Vitamin C dosage.
# Tooth lengths for each delivery method at dose 0.5
OJ_1 <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "OJ"]
VC_1 <- ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "VC"]
# Tooth lengths for each delivery method at dose 1.0
OJ_2 <- ToothGrowth$len[ToothGrowth$dose == 1.0 & ToothGrowth$supp == "OJ"]
VC_2 <- ToothGrowth$len[ToothGrowth$dose == 1.0 & ToothGrowth$supp == "VC"]
# Tooth lengths for each delivery method at dose 2.0
OJ_3 <- ToothGrowth$len[ToothGrowth$dose == 2.0 & ToothGrowth$supp == "OJ"]
VC_3 <- ToothGrowth$len[ToothGrowth$dose == 2.0 & ToothGrowth$supp == "VC"]
# Two-sample t-tests of OJ vs. VC at each dose
# (t.test uses var.equal = FALSE by default, i.e. Welch's test)
test_1 <- t.test(OJ_1, VC_1)
test_2 <- t.test(OJ_2, VC_2)
test_3 <- t.test(OJ_3, VC_3)
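The test objects above are stored but not printed. As a sketch of how the p-values and confidence intervals can be pulled out of them, the code below repeats the three tests on a local copy and collects the relevant fields into one table. The names tg, doses, and results are my own; p.value and conf.int are standard components of the object returned by t.test.

```r
# Local copy (hypothetical name `tg`); the report's ToothGrowth is untouched.
tg <- datasets::ToothGrowth
doses <- c(0.5, 1, 2)

# Run one Welch two-sample t-test per dose level and keep the key numbers.
results <- do.call(rbind, lapply(doses, function(d) {
  oj <- tg$len[tg$dose == d & tg$supp == "OJ"]
  vc <- tg$len[tg$dose == d & tg$supp == "VC"]
  tt <- t.test(oj, vc)  # var.equal = FALSE by default
  data.frame(
    dose     = d,
    p_value  = tt$p.value,
    ci_lower = tt$conf.int[1],  # 95% confidence interval for mean(OJ) - mean(VC)
    ci_upper = tt$conf.int[2]
  )
}))
print(results)
```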
Based on these results, I can draw the following conclusions:
At a dosage of 0.5, the p-value for the difference between orange juice and ascorbic acid (VC) is 0.0063586. Since this p-value is significant at the 5% level (and the confidence interval of (1.7190573, 8.7809427) does not include zero), we can reject the null hypothesis and conclude that at 0.5 we see a significant difference between the methods.
At a dosage of 1.0, the p-value for the difference between orange juice and ascorbic acid (VC) is 0.0010384. Since this p-value is significant at the 5% level (and the confidence interval of (2.8021482, 9.0578518) does not include zero), we can reject the null hypothesis and conclude that at 1.0 we see a significant difference between the methods.
At a dosage of 2.0, the p-value for the difference between orange juice and ascorbic acid (VC) is 0.9638516. Since this p-value is not significant at the 5% level (and the confidence interval of (-3.7980705, 3.6380705) includes zero), we cannot reject the null hypothesis; at 2.0 we do not see a significant difference between the methods.
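To make clear what the t.test calls compute, the sketch below reproduces the test at dosage 0.5 by hand. It assumes the default settings of t.test (two-sided, unequal variances, i.e. Welch's test with the Welch-Satterthwaite degrees of freedom); all variable names are my own.

```r
# Local copy (hypothetical name `tg`); the report's ToothGrowth is untouched.
tg <- datasets::ToothGrowth
oj <- tg$len[tg$dose == 0.5 & tg$supp == "OJ"]
vc <- tg$len[tg$dose == 0.5 & tg$supp == "VC"]

# Squared standard error of each group mean.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error.
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare with the built-in test.
reference <- t.test(oj, vc)
print(c(manual = p_manual, t.test = reference$p.value))
```

The manual p-value should agree with the one reported by t.test, which confirms that the variances of the two groups are estimated separately rather than pooled.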
Assumptions and Conclusion
The following assumptions are critical in making inferences from this analysis: the test subjects in the sample are representative of the entire population; the sample is a random sample; the assignment of subjects to the different dose levels and supplement types is random; and the variances of the groups being compared are not assumed to be equal.
With these assumptions in mind and considering the results in the previous sections, we can say that, irrespective of the delivery method, higher dosage is associated with greater tooth length. As for the delivery method, orange juice has a significantly greater effect on tooth length than ascorbic acid at dosage levels of 0.5 and 1.0, but at the dosage level of 2.0 there is no statistically significant difference between the delivery methods.
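The statement about dosage rests on the box plots rather than on a formal test. As a possible check, and not part of the original analysis, the sketch below compares adjacent dose levels with the same kind of two-sample t-test used above, pooling both delivery methods. The names tg, len_by_dose, low_vs_mid, and mid_vs_high are my own.

```r
# Local copy (hypothetical name `tg`); the report's ToothGrowth is untouched.
tg <- datasets::ToothGrowth

# Split tooth lengths by dose level; the list elements are named "0.5", "1", "2".
len_by_dose <- split(tg$len, tg$dose)

# Welch two-sample t-tests between adjacent dose levels (higher minus lower).
low_vs_mid  <- t.test(len_by_dose[["1"]], len_by_dose[["0.5"]])
mid_vs_high <- t.test(len_by_dose[["2"]], len_by_dose[["1"]])

print(low_vs_mid$conf.int)
print(mid_vs_high$conf.int)
```

Confidence intervals that lie entirely above zero would support the claim that tooth length increases with each step in dosage. Because this adds two more tests to the three already run, the p-values should be read with the usual caution about multiple comparisons.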
Appendix 1 - Box Plots: Length vs. Dose
library("ggplot2")
ggplot(aes(x = dose, y = len), data = ToothGrowth) + geom_boxplot()
Appendix 2 - Box Plots: Length vs. Supplement
library("ggplot2")
ggplot(aes(x = supp, y = len), data = ToothGrowth) + geom_boxplot()
Appendix 3 - Box Plots: Length vs. Dose and Supplement
library("ggplot2")
ggplot(aes(x = supp, y = len), data = ToothGrowth) + geom_boxplot() + facet_wrap(~ dose)