In this report I analyse the ToothGrowth data from the R datasets package. The data come from a study of the effect of vitamin C on tooth growth in guinea pigs.
According to the study's description, the response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
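The description implies a balanced 3 × 2 design with 10 animals for every combination of dose and supplement. A minimal sketch to confirm this with base R's table() (an extra check, not part of the original analysis):

```r
# Cross-tabulate supplement type against dose level.
# ToothGrowth ships with the datasets package, which R attaches by default.
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)

# In a balanced design every supp/dose cell holds the same number of animals.
all(design == 10)
```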
Regarding the format, it is a data frame with 60 observations on 3 variables:
[1] len: numeric, tooth length
[2] supp: factor, supplement type (VC or OJ)
[3] dose: numeric, dose in milligrams/day
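The column types listed above can also be checked directly. This sketch uses base R's str() and class(); it is an optional complement to the summary() call used below.

```r
# str() prints each column's type together with its first few values.
str(ToothGrowth)

# class() of each column: expected to match the variable list above.
col_classes <- sapply(ToothGrowth, class)
print(col_classes)
```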
Source of the study: Bliss, C. I. (1952). The Statistics of Bioassay. Academic Press.
Additional references for the information presented above:
McNeil, D. R. (1977). Interactive Data Analysis. New York: Wiley.
Crampton, E. W. (1947). The growth of the odontoblast of the incisor teeth as a criterion of vitamin C intake of the guinea pig. The Journal of Nutrition 33(5): 491-504.
Code to load the required library and dataset, followed by a quick summary that confirms the format described in the overview above.
library(ggplot2)
data("ToothGrowth")
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
Code for some basic exploratory data analysis. I focus on just a few points: the first few rows, the unique values of each column, and a summary of tooth length for each combination of supplement and dose.
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
unique(ToothGrowth$len)
## [1] 4.2 11.5 7.3 5.8 6.4 10.0 11.2 5.2 7.0 16.5 15.2 17.3 22.5 13.6
## [15] 14.5 18.8 15.5 23.6 18.5 33.9 25.5 26.4 32.5 26.7 21.5 23.3 29.5 17.6
## [29] 9.7 8.2 9.4 19.7 20.0 25.2 25.8 21.2 27.3 22.4 24.5 24.8 30.9 29.4
## [43] 23.0
unique(ToothGrowth$supp)
## [1] VC OJ
## Levels: OJ VC
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
by(ToothGrowth$len, INDICES = list(ToothGrowth$supp, ToothGrowth$dose), summary)
## : OJ
## : 0.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.20 9.70 12.25 13.23 16.18 21.50
## --------------------------------------------------------
## : VC
## : 0.5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.20 5.95 7.15 7.98 10.90 11.50
## --------------------------------------------------------
## : OJ
## : 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.50 20.30 23.45 22.70 25.65 27.30
## --------------------------------------------------------
## : VC
## : 1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.60 15.27 16.50 16.77 17.30 22.50
## --------------------------------------------------------
## : OJ
## : 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.40 24.58 25.95 26.06 27.08 30.90
## --------------------------------------------------------
## : VC
## : 2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.50 23.38 25.95 26.14 28.80 33.90
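The by() output above is long. A more compact view of the same groups is a single table of means and standard deviations; this is a sketch using base R's aggregate(), not a step from the original analysis.

```r
# Mean and standard deviation of tooth length for each supp/dose cell.
# With a formula, aggregate() returns one row per supp/dose combination.
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
cell_sds   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Both calls group identically, so their rows line up and can be combined.
cell_stats <- data.frame(cell_means[, c("supp", "dose")],
                         mean = cell_means$len,
                         sd   = cell_sds$len)
print(cell_stats)
```

The means in this table match the Mean column of each by() block above (for example 13.23 for OJ at 0.5 mg/day).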
My null hypothesis (H0) is that dosage has no effect on tooth growth. My alternative hypothesis (HA) is that dosage has a positive effect on tooth growth.
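Written formally for any two dose levels $d_1 < d_2$, with $\mu_d$ denoting the mean tooth length at dose $d$ (notation introduced here for illustration), the hypotheses are:

$$
H_0:\ \mu_{d_2} - \mu_{d_1} = 0 \qquad\qquad H_A:\ \mu_{d_2} - \mu_{d_1} > 0
$$

This $H_A$ is one-sided, whereas R's t.test() uses a two-sided alternative ($\mu_{d_2} \neq \mu_{d_1}$) by default, which is the more conservative of the two.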
Below is the code for further exploratory analysis. Each chunk is preceded by a header stating what it analyses.
Analysis of mean tooth length (len) by dosage (dose) for each supplement (supp):
# Mean tooth length for every supp/dose combination, used to draw the lines
average <- aggregate(len ~ ., data = ToothGrowth, mean)
ggplot(data = ToothGrowth, aes(x = dose, y = len)) +
# Constant size and transparency go outside aes(); inside aes() they would be mapped as data
geom_point(aes(colour = supp), size = 2, alpha = 0.6) +
geom_line(data = average, aes(group = supp, colour = supp)) +
labs(title = "Tooth Length by Dosage and Supplement")
It seems that the higher the dosage, the longer the teeth. At 2 mg/day the two supplements give similar tooth lengths, but at the lower doses OJ appears to have a larger effect on tooth growth than VC.
To check whether these patterns are statistically significant, I will perform some hypothesis tests.
Analysis of the relationship between Tooth Length (len) and Supplement (supp):
ggplot(aes(x = supp, y = len), data = ToothGrowth) +
geom_boxplot(aes(fill = supp)) +
labs(title = "Relationship between Tooth Length and Supplement")
Supporting the first inference, the plot above suggests that orange juice (OJ) has a larger effect on tooth growth than ascorbic acid (VC).
Analysis of the relationship between Tooth Length (len) and Dosage (dose):
ggplot(aes(x = factor(dose), y = len), data = ToothGrowth) +
geom_boxplot(aes(fill = factor(dose))) +
labs(title = "Relationship between Tooth Length and Dosage")
The plot shows that the higher the dosage, the greater the tooth growth. However, because it pools both supplements, it cannot show whether the supplement effect seen in the first inference holds at every dose.
To check this, the following analysis examines whether, within each dosage, the supplements have different effects on tooth growth.
Analysis of the impact of Dosage (dose) and Supplement (supp) on Tooth Length (len):
ggplot(aes(x = supp, y = len), data = ToothGrowth) +
geom_boxplot(aes(fill = supp)) +
facet_wrap(~dose) +
labs(title = "Impact of Dosage and Supplement on Tooth Length")
The plot above suggests that tooth growth depends on both dosage and supplement: OJ gives longer teeth than VC at 0.5 and 1 mg/day, while at 2 mg/day the two supplements look similar.
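The size of the OJ advantage at each dose can be read off the group means. A short sketch using base R's tapply() (an illustration, not part of the original analysis):

```r
# Mean tooth length as a 2 x 3 matrix: rows OJ and VC, columns doses 0.5, 1, 2.
means <- tapply(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose), mean)

# Difference in mean tooth length (OJ minus VC) at each dose.
oj_advantage <- means["OJ", ] - means["VC", ]
print(round(oj_advantage, 2))
```

The differences are roughly 5.25 and 5.93 at 0.5 and 1 mg/day and close to zero (-0.08) at 2 mg/day, which matches the pattern in the faceted boxplots.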
I also wish to compare tooth growth by supplement. To do so, I compute a 95% confidence interval for the difference in mean tooth length using a two-sample t-test.
Analysis using a t-test to compare Tooth Length (len) by Supplement (supp):
t.test(len~supp, data = ToothGrowth)
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
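To show where the interval above comes from, the sketch below recomputes it by hand from the Welch formulas (unpooled standard error and Welch-Satterthwaite degrees of freedom) and compares it with the output of t.test().

```r
# Tooth lengths for each supplement.
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Difference in means and its standard error, with variances not pooled.
diff_means <- mean(oj) - mean(vc)
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# 95% interval: estimate plus or minus the t quantile times the standard error.
ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se
print(ci)
print(welch_df)

# Should agree with the interval reported by t.test().
all.equal(ci, as.numeric(t.test(len ~ supp, data = ToothGrowth)$conf.int))
```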
Since the p-value (0.061) is higher than 0.05 and the 95% confidence interval includes 0, I cannot reject the null hypothesis that the two supplements give the same mean tooth length. Next, I compare tooth growth by dosage, testing each pair of dose values.
T-test using dose amounts 0.5 and 1.0:
test_set_one <- subset(ToothGrowth, ToothGrowth$dose %in% c(1.0, .5))
t.test(len~dose, data = test_set_one)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983781 -6.276219
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
T-test using dose amounts 0.5 and 2.0:
test_set_two <- subset(ToothGrowth, ToothGrowth$dose %in% c(2.0, .5))
t.test(len~dose, data = test_set_two)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15617 -12.83383
## sample estimates:
## mean in group 0.5 mean in group 2
## 10.605 26.100
T-test using dose amounts 1.0 and 2.0:
test_set_three <- subset(ToothGrowth, ToothGrowth$dose %in% c(2.0, 1.0))
t.test(len~dose, data = test_set_three)
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
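The three pairwise tests above differ only in the pair of doses. As a sketch of an equivalent, more compact approach (a refactoring, not the original code), they can be run in one loop with combn() and collected into a single table:

```r
# All pairs of dose levels; each column of `pairs` is (lower dose, higher dose).
doses <- sort(unique(ToothGrowth$dose))
pairs <- combn(doses, 2)

# Run a Welch t-test for each pair and keep the interval and p-value.
results <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(i) {
  pair <- pairs[, i]
  test <- t.test(len ~ dose, data = subset(ToothGrowth, dose %in% pair))
  data.frame(low = pair[1], high = pair[2],
             ci_lower = test$conf.int[1], ci_upper = test$conf.int[2],
             p_value = test$p.value)
}))
print(results)
```

Each row reproduces one of the three tests above.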
In each of the three tests, the 95% confidence interval lies entirely below 0 (the difference is computed as the mean at the lower dose minus the mean at the higher dose) and the p-value is very small. I can therefore infer that average tooth length increases with dosage, and the null hypothesis can be rejected.
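Because HA states a positive effect, a one-sided test matches it more directly than the default two-sided one. A sketch for doses 0.5 and 1.0, using t.test()'s alternative argument:

```r
# Keep only doses 0.5 and 1.0. t.test() takes the difference as
# mean(0.5) - mean(1.0), so alternative = "less" encodes
# "the higher dose gives longer teeth".
low_vs_mid <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
one_sided <- t.test(len ~ dose, data = low_vs_mid, alternative = "less")
print(one_sided$p.value)

# The one-sided p-value is half of the two-sided one reported earlier.
two_sided <- t.test(len ~ dose, data = low_vs_mid)
all.equal(one_sided$p.value, two_sided$p.value / 2)
```

Since the observed difference points in the hypothesised direction, the one-sided test only strengthens the conclusion drawn from the two-sided tests.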
These conclusions rest on the following assumptions: the samples are independent, the population variances are not assumed to be equal (hence the Welch t-test), the guinea pigs were randomly selected from a population of similar animals, measurement error is accounted for by the significant digits reported, and double-blind research methods were used.
If these assumptions hold, I can infer that tooth length differs significantly between dosage levels, with both supplement delivery methods pooled. A higher dosage level consistently led to longer teeth, so the null hypothesis is rejected.
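The variance assumption can be inspected rather than only stated. This sketch computes the sample standard deviation of tooth length at each dose (supplements pooled, as in the dose t-tests). It is purely descriptive: Welch's test does not require equal variances either way.

```r
# Sample standard deviation of tooth length at each dose level.
sds <- tapply(ToothGrowth$len, ToothGrowth$dose, sd)
print(round(sds, 2))

# Ratio of the largest to the smallest sample variance.
var_ratio <- max(sds)^2 / min(sds)^2
print(round(var_ratio, 2))
```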
Initially it appeared that the supplement delivery method had no significant impact on tooth length (p = 0.061), but when controlling for dose level, the faceted boxplots showed a clear difference at 0.5 and 1.0 mg/day, but not at 2.0 mg/day.
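The per-dose supplement comparison can also be tested formally with one Welch t-test per dose. A minimal sketch, reusing the same t.test() call as the overall supplement test:

```r
# For each dose, compare OJ with VC and record the difference, CI and p-value.
supp_by_dose <- do.call(rbind, lapply(c(0.5, 1, 2), function(d) {
  test <- t.test(len ~ supp, data = subset(ToothGrowth, dose == d))
  data.frame(dose = d,
             oj_minus_vc = unname(test$estimate[1] - test$estimate[2]),
             ci_lower = test$conf.int[1], ci_upper = test$conf.int[2],
             p_value = test$p.value)
}))
print(supp_by_dose)
```

The p-values fall below 0.05 at 0.5 and 1 mg/day and well above it at 2 mg/day, in line with the boxplots. With three tests, a multiple-comparison correction (for example via p.adjust()) is worth considering before drawing firm conclusions.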
Based on this evidence, orange juice (OJ) appears to be the more effective delivery method at the lower doses of vitamin C, but at the highest dose tested (2 mg/day) it shows no advantage over ascorbic acid (VC).