Analyzing the ToothGrowth data in the R datasets package
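Description of data: The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of Vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid (VC). Each of the six supplement-and-dose groups therefore contains 10 different animals.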
library(datasets)
data(ToothGrowth)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
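The description says each supplement-and-dose combination uses 10 animals. A quick cross-tabulation is one way to confirm that the design is balanced before running the tests. The sketch below uses base R's `table()` on a copy named `tg`, a name chosen here only for illustration.

```r
library(datasets)

# Work on a copy so the original ToothGrowth object is untouched
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Count observations in every supplement x dose cell
cell_counts <- table(tg$supp, tg$dose)
print(cell_counts)
```

Every cell should show 10, which matches the balanced design described above.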
boxplot(len~supp+dose, data=ToothGrowth, main="Tooth Growth", xlab="Supplement and Dose", ylab="Tooth length", col = c("light blue", "light green"))
library(ggplot2)
ggplot(data=ToothGrowth, aes(x=dose, y=len, fill=supp)) +
    geom_bar(stat="identity") +
    facet_grid(. ~ supp) +
    xlab("Dosage in Milligrams") +
    ylab("Total Tooth length") +
    guides(fill=guide_legend(title="Supplement Type"))
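With `stat="identity"`, the 10 observations in each group are stacked on top of each other, so each bar's height is the sum of the lengths in that group rather than a mean. The sketch below, which relies only on base R's `aggregate()`, computes those totals so the bar heights can be read as numbers. Dividing each total by 10 gives the group means that appear in the summaries further down.

```r
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Sum of len within each supplement x dose group: these are the stacked bar heights
totals <- aggregate(len ~ supp + dose, data = tg, FUN = sum)
print(totals)
```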
The basic exploratory data analysis (the two graphs above) suggests that higher doses of Vitamin C produce more tooth growth. Orange juice may also produce more growth than ascorbic acid at the 0.5 and 1 mg dose levels. The t-tests that follow check these hypotheses.
library(plyr)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
sd(ToothGrowth$len)
## [1] 7.649315
ddply(ToothGrowth, c("dose", "supp"), summarise, LengthMean=mean(len))
## dose supp LengthMean
## 1 0.5 OJ 13.23
## 2 0.5 VC 7.98
## 3 1 OJ 22.70
## 4 1 VC 16.77
## 5 2 OJ 26.06
## 6 2 VC 26.14
ddply(ToothGrowth, c("dose", "supp"), summarise, LengthStanDev=sd(len))
## dose supp LengthStanDev
## 1 0.5 OJ 4.459709
## 2 0.5 VC 2.746634
## 3 1 OJ 3.910953
## 4 1 VC 2.515309
## 5 2 OJ 2.655058
## 6 2 VC 4.797731
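The two `ddply()` calls above each compute a single statistic. As a sketch of one alternative that needs no add-on package, base R's `aggregate()` can return the mean, standard deviation, and group size in a single table when the summary function returns a named vector. The `do.call(data.frame, ...)` step flattens the resulting matrix column into ordinary columns named `len.mean`, `len.sd`, and `len.n`; the object name `group_stats` is illustrative.

```r
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# FUN returns a named vector, so aggregate() stores the result as a matrix column
group_stats <- aggregate(len ~ dose + supp, data = tg,
                         FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))

# Flatten the matrix column into len.mean, len.sd and len.n
group_stats <- do.call(data.frame, group_stats)
print(group_stats)
```

Staying with plyr, `summarise` also accepts several `name = value` pairs in one call, for example `LengthMean=mean(len), LengthStanDev=sd(len)`.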
t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose == 0.5, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC
## 13.23 7.98
The 95% confidence interval (1.72 to 8.78) does not include zero and the p-value (0.006359) is below 0.05, so the difference between the two supplement groups is statistically significant. At the 0.5 mg dose, tooth growth is higher with OJ than with VC.
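The Welch test divides the difference in group means by the combined standard error sqrt(var_x/n_x + var_y/n_y) and uses the Welch-Satterthwaite approximation for the degrees of freedom, which is why the df reported above (14.969) is not a whole number. The sketch below recomputes the 0.5 mg result by hand and checks it against `t.test()`; the variable names are illustrative.

```r
tg <- ToothGrowth

x <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]  # OJ group, 0.5 mg
y <- tg$len[tg$supp == "VC" & tg$dose == 0.5]  # VC group, 0.5 mg

# Squared standard errors of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite degrees of freedom
df_w <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for mean(x) - mean(y)
p_value <- 2 * pt(-abs(t_stat), df_w)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_w) * sqrt(se2_x + se2_y)

# Reference result from the built-in function
ref <- t.test(x, y, var.equal = FALSE)
print(c(t = t_stat, df = df_w, p = p_value, lower = ci[1], upper = ci[2]))
```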
t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose == 1, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC
## 22.70 16.77
The 95% confidence interval (2.80 to 9.06) does not include zero and the p-value (0.001038) is below 0.05, so the difference between the two supplement groups is statistically significant. At the 1 mg dose, tooth growth is higher with OJ than with VC.
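The claim that OJ gives higher growth is directional, while the tests above are two-sided and the direction is read from the sign of the interval. One option, sketched here, is to state the direction in the test itself with `alternative = "greater"`. With the formula interface the difference is taken as the first factor level (OJ) minus the second (VC). When the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one.

```r
tg <- ToothGrowth

# One-sided Welch test at 1 mg: H1 is mean(OJ) - mean(VC) > 0
one_sided <- t.test(len ~ supp, data = tg[tg$dose == 1, ],
                    var.equal = FALSE, alternative = "greater")

# Two-sided version, as run above
two_sided <- t.test(len ~ supp, data = tg[tg$dose == 1, ],
                    var.equal = FALSE)

print(c(one_sided = one_sided$p.value, two_sided = two_sided$p.value))
```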
t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose == 2, ])
##
## Welch Two Sample t-test
##
## data: len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.79807 3.63807
## sample estimates:
## mean in group OJ mean in group VC
## 26.06 26.14
The 95% confidence interval (-3.80 to 3.64) includes zero and the p-value (0.9639) is far above 0.05, so there is no statistically significant difference between the two supplement groups. At the 2 mg dose, the data give no evidence that tooth growth differs between OJ and VC.
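The three OJ-versus-VC tests above differ only in the dose used for the subset. A sketch of running them in one loop with `sapply()` over the dose levels, collecting the t statistic, degrees of freedom, p-value, and confidence limits into a single matrix (the name `results` is illustrative):

```r
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Run the OJ vs VC Welch test at each dose and keep the key numbers
results <- t(sapply(levels(tg$dose), function(d) {
  tt <- t.test(len ~ supp, data = tg[tg$dose == d, ], var.equal = FALSE)
  c(t = unname(tt$statistic),
    df = unname(tt$parameter),
    p = tt$p.value,
    lower = tt$conf.int[1],
    upper = tt$conf.int[2])
}))
print(round(results, 4))
```

Each row should reproduce the corresponding t-test output above.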
OJ.low <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 0.5, ]
OJ.middle <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 1.0, ]
OJ.high <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2.0, ]
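The three OJ subsets above, and the matching VC subsets further down, could also be built in one step. The sketch below uses base R's `split()`, which groups `len` by every supplement and dose combination and names each element `<supp>.<dose>`; `by_group` is an illustrative name.

```r
tg <- ToothGrowth

# One list element per supplement x dose combination, e.g. "OJ.0.5", "VC.2"
by_group <- split(tg$len, list(tg$supp, tg$dose))
print(names(by_group))

# Same comparison as t.test(OJ.low$len, OJ.middle$len, ...) below
tt <- t.test(by_group[["OJ.0.5"]], by_group[["OJ.1"]], var.equal = FALSE)
print(tt)
```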
t.test(OJ.low$len, OJ.middle$len, paired=FALSE, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: OJ.low$len and OJ.middle$len
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.415634 -5.524366
## sample estimates:
## mean of x mean of y
## 13.23 22.70
The 95% confidence interval (-13.42 to -5.52) does not include zero and the p-value (8.785e-05) is below 0.05, so the difference between the two dose groups is statistically significant. With OJ, tooth growth is higher at the 1 mg dose than at the 0.5 mg dose.
t.test(OJ.middle$len, OJ.high$len, paired=FALSE, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: OJ.middle$len and OJ.high$len
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.5314425 -0.1885575
## sample estimates:
## mean of x mean of y
## 22.70 26.06
The 95% confidence interval (-6.53 to -0.19) does not include zero and the p-value (0.0392) is below 0.05, so the difference between the two dose groups is statistically significant. With OJ, tooth growth is higher at the 2 mg dose than at the 1 mg dose.
t.test(OJ.low$len, OJ.high$len, paired=FALSE, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: OJ.low$len and OJ.high$len
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.335241 -9.324759
## sample estimates:
## mean of x mean of y
## 13.23 26.06
The 95% confidence interval (-16.34 to -9.32) does not include zero and the p-value (1.324e-06) is below 0.05, so the difference between the two dose groups is statistically significant. With OJ, tooth growth is higher at the 2 mg dose than at the 0.5 mg dose.
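Base R's `pairwise.t.test()` can run all three OJ dose comparisons at once. With `pool.sd = FALSE` it calls `t.test()` for each pair of dose levels, which defaults to the Welch test, and `p.adjust.method = "none"` leaves the p-values unadjusted, so the lower triangle of the result should match the three OJ p-values above. Passing the VC rows instead would reproduce the VC comparisons that follow.

```r
tg <- ToothGrowth
oj <- tg[tg$supp == "OJ", ]

# Welch tests for every pair of dose levels, no multiple-comparison adjustment
pw <- pairwise.t.test(oj$len, oj$dose, pool.sd = FALSE,
                      p.adjust.method = "none")
print(pw$p.value)
```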
VC.low <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 0.5, ]
VC.middle <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 1.0, ]
VC.high <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2.0, ]
t.test(VC.low$len, VC.middle$len, paired=FALSE, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: VC.low$len and VC.middle$len
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.265712 -6.314288
## sample estimates:
## mean of x mean of y
## 7.98 16.77
The 95% confidence interval (-11.27 to -6.31) does not include zero and the p-value (6.811e-07) is below 0.05, so the difference between the two dose groups is statistically significant. With VC, tooth growth is higher at the 1 mg dose than at the 0.5 mg dose.
t.test(VC.middle$len, VC.high$len, paired=FALSE, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: VC.middle$len and VC.high$len
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.054267 -5.685733
## sample estimates:
## mean of x mean of y
## 16.77 26.14
The 95% confidence interval (-13.05 to -5.69) does not include zero and the p-value (9.156e-05) is below 0.05, so the difference between the two dose groups is statistically significant. With VC, tooth growth is higher at the 2 mg dose than at the 1 mg dose.
t.test(VC.low$len, VC.high$len, paired=FALSE, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: VC.low$len and VC.high$len
## t = -10.3878, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.90151 -14.41849
## sample estimates:
## mean of x mean of y
## 7.98 26.14
The 95% confidence interval (-21.90 to -14.42) does not include zero and the p-value (4.682e-08) is below 0.05, so the difference between the two dose groups is statistically significant. With VC, tooth growth is higher at the 2 mg dose than at the 0.5 mg dose.
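Each of the nine t-tests above applies a 0.05 threshold on its own, so the chance of at least one false positive across the whole set is higher than 5%. One common safeguard, sketched below, is to adjust the nine p-values with the Holm method via base R's `p.adjust()`. The helper `welch_p()` is hypothetical and only saves repetition. Under this adjustment the OJ 1 mg vs 2 mg comparison (raw p = 0.0392) should rise above 0.05, the 2 mg OJ vs VC comparison stays non-significant, and the other seven comparisons remain below 0.05.

```r
tg <- ToothGrowth

# Hypothetical helper: two-sided Welch p-value for two supp/dose groups
welch_p <- function(s1, d1, s2, d2) {
  x <- tg$len[tg$supp == s1 & tg$dose == d1]
  y <- tg$len[tg$supp == s2 & tg$dose == d2]
  t.test(x, y, var.equal = FALSE)$p.value
}

# The nine comparisons run above, in the same order
raw_p <- c(
  "OJ vs VC, 0.5 mg" = welch_p("OJ", 0.5, "VC", 0.5),
  "OJ vs VC, 1 mg"   = welch_p("OJ", 1,   "VC", 1),
  "OJ vs VC, 2 mg"   = welch_p("OJ", 2,   "VC", 2),
  "OJ 0.5 vs 1 mg"   = welch_p("OJ", 0.5, "OJ", 1),
  "OJ 1 vs 2 mg"     = welch_p("OJ", 1,   "OJ", 2),
  "OJ 0.5 vs 2 mg"   = welch_p("OJ", 0.5, "OJ", 2),
  "VC 0.5 vs 1 mg"   = welch_p("VC", 0.5, "VC", 1),
  "VC 1 vs 2 mg"     = welch_p("VC", 1,   "VC", 2),
  "VC 0.5 vs 2 mg"   = welch_p("VC", 0.5, "VC", 2)
)

# Holm step-down adjustment controls the family-wise error rate
holm_p <- p.adjust(raw_p, method = "holm")
print(cbind(raw = signif(raw_p, 4), holm = signif(holm_p, 4)))
```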
With 95% Confidence -