Overview:

Analyzing the ToothGrowth data in the R datasets package

Description of data - The response is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (VC).
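A quick cross-tabulation is one way to confirm the design described above. If each animal received exactly one supplement and dose combination, every cell should hold 10 observations. This sketch works on a local copy (`tg`, an illustrative name) taken straight from the datasets package, so it does not disturb the `ToothGrowth` object used below.

```r
# Fresh local copy of the data; dose is still numeric here
tg <- datasets::ToothGrowth

# Count observations in each supplement x dose cell of the design
design <- table(tg$supp, tg$dose)
print(design)
```

Every one of the six cells should show 10, for 60 animals in total.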

1. Load the ToothGrowth data and perform some basic exploratory data analyses

library(datasets)
data(ToothGrowth)
# Treat dose as a categorical variable (3 levels) for plotting and grouping
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
boxplot(len~supp+dose, data=ToothGrowth, main="Tooth Growth", xlab="Supplement and Dose", ylab="Tooth length", col = c("light blue", "light green"))

library(ggplot2)
ggplot(data=ToothGrowth, aes(x=dose, y=len, fill=supp)) +
    geom_bar(stat="identity") +  # stacks the 10 lengths, so bar height = group total
    facet_grid(. ~ supp) +
    xlab("Dosage in Milligrams") +
    ylab("Total Tooth length") +
    guides(fill=guide_legend(title="Supplement Type")) 

The two exploratory graphs above suggest that tooth length increases with the dose of vitamin C, and that orange juice may produce more growth than ascorbic acid at the 0.5 and 1 mg doses. Part 3 tests these hypotheses.
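In the bar chart, `geom_bar(stat = "identity")` stacks the ten individual lengths in each group, so each bar's height is a group total rather than a mean. Because every group has 10 animals, the total is just 10 times the group mean, so the bars still compare the groups fairly. A short base-R check of that relationship, again on an illustrative local copy `tg`:

```r
tg <- datasets::ToothGrowth

# Height of each bar in the faceted plot: sum of len within a supp x dose group
totals <- aggregate(len ~ supp + dose, data = tg, FUN = sum)
# Group means over the same grouping (aggregate returns the same row order)
means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)

# With n = 10 animals per group, each total is 10 times the group mean
totals$ratio_to_mean <- totals$len / means$len
print(totals)
```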

2. Provide a basic summary of the data.

library(plyr)
summary(ToothGrowth)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90
sd(ToothGrowth$len)
## [1] 7.649315
ddply(ToothGrowth, c("dose", "supp"), summarise, LengthMean=mean(len))
##   dose supp LengthMean
## 1  0.5   OJ      13.23
## 2  0.5   VC       7.98
## 3    1   OJ      22.70
## 4    1   VC      16.77
## 5    2   OJ      26.06
## 6    2   VC      26.14
ddply(ToothGrowth, c("dose", "supp"), summarise, LengthStanDev=sd(len))
##   dose supp LengthStanDev
## 1  0.5   OJ      4.459709
## 2  0.5   VC      2.746634
## 3    1   OJ      3.910953
## 4    1   VC      2.515309
## 5    2   OJ      2.655058
## 6    2   VC      4.797731
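The two `ddply()` calls can also be collapsed into one base-R `aggregate()` call that returns the mean and standard deviation side by side, with no plyr dependency. This is a sketch; `group_stats` and `tg` are illustrative names.

```r
tg <- datasets::ToothGrowth

# Mean and standard deviation of len for each dose x supp group in one call
group_stats <- aggregate(len ~ dose + supp, data = tg,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))

# aggregate() stores the two statistics as a matrix column named "len";
# do.call(data.frame, ...) flattens it into len.mean and len.sd columns
group_stats <- do.call(data.frame, group_stats)
print(group_stats)
```

The values should match the two ddply tables above, e.g. a mean of 13.23 and a standard deviation of about 4.46 for OJ at 0.5 mg.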

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

Test the differences between Supplements in terms of Tooth Growth

Tooth Growth difference by Supplement at the 0.5 mg Dose:

t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose == 0.5, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

The 95% confidence interval (1.72 to 8.78) does not include zero and the p-value (0.0064) is below 0.05, so the difference between the supplements is statistically significant. Tooth growth is higher for OJ than for VC at the 0.5 mg dose.
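To make the Welch test less of a black box, the sketch below recomputes the 0.5 mg comparison by hand: the unpooled standard error, the Welch-Satterthwaite degrees of freedom, the t statistic, the two-sided p-value, and the 95% interval as estimate ± t quantile × standard error. It should reproduce the t = 3.1697, df = 14.969 and interval 1.72 to 8.78 reported by `t.test()` above. Variable names such as `tg`, `welch_df` and `mean_diff` are illustrative.

```r
tg <- datasets::ToothGrowth

# Tooth lengths for each supplement at the 0.5 mg dose
low <- tg[tg$dose == 0.5, ]
oj <- low$len[low$supp == "OJ"]
vc <- low$len[low$supp == "VC"]

# Variance of each group mean, without pooling the two variances
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- se^4 / (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

mean_diff <- mean(oj) - mean(vc)                       # estimate: mean(OJ) - mean(VC)
t_stat <- mean_diff / se                               # test statistic
p_value <- 2 * pt(-abs(t_stat), welch_df)              # two-sided p-value
ci <- mean_diff + c(-1, 1) * qt(0.975, welch_df) * se  # 95% confidence interval

print(c(t = t_stat, df = welch_df, p = p_value, lower = ci[1], upper = ci[2]))

# R's built-in Welch test on the same data, for comparison
tt <- t.test(len ~ supp, data = low, var.equal = FALSE)
```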

Tooth Growth difference by Supplement at the 1 mg Dose:

t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose == 1, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

The 95% confidence interval (2.80 to 9.06) does not include zero and the p-value (0.0010) is below 0.05, so the difference between the supplements is statistically significant. Tooth growth is higher for OJ than for VC at the 1 mg dose.

Tooth Growth difference by Supplement at the 2 mg Dose:

t.test(len ~ supp, paired=FALSE, var.equal=FALSE, data=ToothGrowth[ToothGrowth$dose == 2, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

The 95% confidence interval (-3.80 to 3.64) includes zero and the p-value (0.96) is well above 0.05, so there is no statistically significant difference between the supplements. At the 2 mg dose, the data give no evidence that tooth growth differs between OJ and VC.
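The three supplement comparisons can also be run in a single loop over the dose levels. Tabulating the results makes the link used in the interpretations explicit: for a two-sided t-test, the 95% confidence interval excludes zero exactly when the p-value is below 0.05. A sketch; `supp_tests` and `tg` are illustrative names.

```r
tg <- datasets::ToothGrowth

# Welch test of OJ vs VC at each dose; one row of results per dose
supp_tests <- do.call(rbind, lapply(c(0.5, 1, 2), function(d) {
  tt <- t.test(len ~ supp, data = tg[tg$dose == d, ], var.equal = FALSE)
  data.frame(dose = d,
             estimate = unname(tt$estimate[1] - tt$estimate[2]),  # OJ - VC
             lower = tt$conf.int[1],
             upper = tt$conf.int[2],
             p_value = tt$p.value)
}))

# The interval excludes zero exactly when the two-sided p-value is below 0.05
supp_tests$ci_excludes_zero <- supp_tests$lower > 0 | supp_tests$upper < 0
supp_tests$significant <- supp_tests$p_value < 0.05
print(supp_tests)
```

The two logical columns should agree row by row: significant at 0.5 and 1 mg, not significant at 2 mg.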

Test the differences between Dosages for the Supplements

OJ.low <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 0.5, ]
OJ.middle <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 1.0, ]
OJ.high <- ToothGrowth[ToothGrowth$supp == 'OJ' & ToothGrowth$dose == 2.0, ]
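The OJ.low, OJ.middle and OJ.high subsets above, and the matching VC subsets that follow, can also be built in one step with base R's `split()`. It returns a named list of the six `len` vectors keyed as "supplement.dose". A sketch; `groups` and `tg` are illustrative names.

```r
tg <- datasets::ToothGrowth

# Split tooth length by every supp x dose combination; names are "supp.dose"
groups <- split(tg$len, list(tg$supp, tg$dose))
print(names(groups))

# Each element is one group's len vector, e.g. OJ at 0.5 mg
print(mean(groups[["OJ.0.5"]]))
```

With this list, `t.test(groups[["OJ.0.5"]], groups[["OJ.1"]])` runs the same Welch comparison as `t.test(OJ.low$len, OJ.middle$len)`.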

Tooth Growth difference by OJ at the 0.5 and 1 mg Dosage Levels:

t.test(OJ.low$len, OJ.middle$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.low$len and OJ.middle$len
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.415634  -5.524366
## sample estimates:
## mean of x mean of y 
##     13.23     22.70

The 95% confidence interval for the difference in means (0.5 mg minus 1 mg: -13.42 to -5.52) does not include zero and the p-value (8.8e-05) is below 0.05, so the difference is statistically significant. Tooth growth with OJ is higher at the 1 mg dose than at the 0.5 mg dose.

Tooth Growth difference by OJ at the 1 and 2 mg Dosage Levels:

t.test(OJ.middle$len, OJ.high$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.middle$len and OJ.high$len
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.5314425 -0.1885575
## sample estimates:
## mean of x mean of y 
##     22.70     26.06

The 95% confidence interval for the difference in means (1 mg minus 2 mg: -6.53 to -0.19) does not include zero and the p-value (0.039) is below 0.05, so the difference is statistically significant. Tooth growth with OJ is higher at the 2 mg dose than at the 1 mg dose.
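This OJ comparison is the closest call of the dose tests. Since the Part 1 hypothesis is directional (a higher dose gives longer teeth), a one-sided test through the `alternative` argument of `t.test()` is an option, but only if that direction was fixed before looking at the data. With `alternative = "less"` the alternative hypothesis is mean(x) < mean(y); here the t statistic is negative, so the one-sided p-value is half the two-sided one. A sketch; `oj_middle`, `oj_high` and `tg` are illustrative names.

```r
tg <- datasets::ToothGrowth

oj_middle <- tg$len[tg$supp == "OJ" & tg$dose == 1]
oj_high   <- tg$len[tg$supp == "OJ" & tg$dose == 2]

# Two-sided Welch test, as in the analysis above
two_sided <- t.test(oj_middle, oj_high, var.equal = FALSE)
# One-sided Welch test: H1 is mean(1 mg) < mean(2 mg)
one_sided <- t.test(oj_middle, oj_high, var.equal = FALSE, alternative = "less")

print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
```

The one-sided confidence interval is also one-sided: its lower bound is -Inf and only the upper bound is informative.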

Tooth Growth difference by OJ at the 0.5 and 2 mg Dosage Levels:

t.test(OJ.low$len, OJ.high$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  OJ.low$len and OJ.high$len
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.335241  -9.324759
## sample estimates:
## mean of x mean of y 
##     13.23     26.06

The 95% confidence interval for the difference in means (0.5 mg minus 2 mg: -16.34 to -9.32) does not include zero and the p-value (1.3e-06) is below 0.05, so the difference is statistically significant. Tooth growth with OJ is higher at the 2 mg dose than at the 0.5 mg dose.

VC.low <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 0.5, ]
VC.middle <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 1.0, ]
VC.high <- ToothGrowth[ToothGrowth$supp == 'VC' & ToothGrowth$dose == 2.0, ]

Tooth Growth difference by VC at the 0.5 and 1 mg Dosage Levels:

t.test(VC.low$len, VC.middle$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  VC.low$len and VC.middle$len
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.265712  -6.314288
## sample estimates:
## mean of x mean of y 
##      7.98     16.77

The 95% confidence interval for the difference in means (0.5 mg minus 1 mg: -11.27 to -6.31) does not include zero and the p-value (6.8e-07) is below 0.05, so the difference is statistically significant. Tooth growth with VC is higher at the 1 mg dose than at the 0.5 mg dose.

Tooth Growth difference by VC at the 1 and 2 mg Dosage Levels:

t.test(VC.middle$len, VC.high$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  VC.middle$len and VC.high$len
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.054267  -5.685733
## sample estimates:
## mean of x mean of y 
##     16.77     26.14

The 95% confidence interval for the difference in means (1 mg minus 2 mg: -13.05 to -5.69) does not include zero and the p-value (9.2e-05) is below 0.05, so the difference is statistically significant. Tooth growth with VC is higher at the 2 mg dose than at the 1 mg dose.

Tooth Growth difference by VC at the 0.5 and 2 mg Dosage Levels:

t.test(VC.low$len, VC.high$len, paired=FALSE, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  VC.low$len and VC.high$len
## t = -10.3878, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -21.90151 -14.41849
## sample estimates:
## mean of x mean of y 
##      7.98     26.14

The 95% confidence interval for the difference in means (0.5 mg minus 2 mg: -21.90 to -14.42) does not include zero and the p-value (4.7e-08) is below 0.05, so the difference is statistically significant. Tooth growth with VC is higher at the 2 mg dose than at the 0.5 mg dose.
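The six dose comparisons above (three dose pairs for each supplement) can be reproduced in one loop using `combn()` to enumerate the dose pairs. The lower dose is always passed first, so an interval lying entirely below zero means the higher dose produced longer teeth. A sketch; `dose_pairs`, `dose_tests` and `tg` are illustrative names.

```r
tg <- datasets::ToothGrowth

# All pairs of dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(c(0.5, 1, 2), 2, simplify = FALSE)

# Welch test for every supplement and dose pair (lower dose as x)
dose_tests <- do.call(rbind, lapply(c("OJ", "VC"), function(s) {
  do.call(rbind, lapply(dose_pairs, function(p) {
    x <- tg$len[tg$supp == s & tg$dose == p[1]]
    y <- tg$len[tg$supp == s & tg$dose == p[2]]
    tt <- t.test(x, y, var.equal = FALSE)
    data.frame(supp = s, low_dose = p[1], high_dose = p[2],
               lower = tt$conf.int[1], upper = tt$conf.int[2],
               p_value = tt$p.value)
  }))
}))
print(dose_tests)
```

All six upper bounds should be negative and all six p-values below 0.05, matching the individual tests.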

4. State your conclusions and the assumptions needed for your conclusions.

Conclusions

At the 95% confidence level (α = 0.05):

  • Orange juice produces more tooth growth than ascorbic acid at the 0.5 and 1 mg doses.
  • At the 2 mg dose, there is no significant difference in tooth growth between orange juice and ascorbic acid.
  • For both supplements, tooth growth increases with dose: each higher dose produced significantly longer teeth than each lower dose.

Assumptions

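  • The group variances are not assumed to be equal, so Welch's t-test (var.equal=FALSE) is used.
  • The samples are independent (not paired): each guinea pig contributes to only one supplement and dose group.
  • Tooth lengths within each group are approximately normally distributed, which matters because each group has only 10 observations.
  • The sample is random and representative of the whole population.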