In this second part of the project, we analyze the ToothGrowth data in the R datasets package.

Load the ToothGrowth data and perform some basic exploratory data analyses

library(datasets)
library(ggplot2)
library(graphics)
library(lattice)

Exploring the contents of the dataset

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ToothGrowth[1:10,]
##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5
table(ToothGrowth$dose, ToothGrowth$supp)
##      
##       OJ VC
##   0.5 10 10
##   1   10 10
##   2   10 10

1. Summary of the dataset ToothGrowth:

  • 60 observations
  • len: length of odontoblasts (the cells responsible for tooth growth), measured in 60 guinea pigs
  • OJ: orange juice as the delivery method
  • VC: ascorbic acid as the delivery method
  • dose: three dose levels of vitamin C (0.5, 1 and 2 mg/day)
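The variables summarised above can also be condensed into one table of per-group means and standard deviations. This is a minimal sketch; the object and column names (`group_means`, `group_sds`, `group_summary`, `mean_len`, `sd_len`) are illustrative and not part of the original analysis.

```r
library(datasets)

# Mean and standard deviation of tooth length for each supplement/dose cell.
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_sds   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Both aggregates return rows in the same order, so they can be combined directly.
# Column names below are illustrative.
group_summary <- data.frame(
  supp     = group_means$supp,
  dose     = group_means$dose,
  mean_len = group_means$len,
  sd_len   = group_sds$len
)
print(group_summary)
```

The means in this table should match the "sample estimates" reported by the per-dose t-tests later in the report.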

2. Exploring dataset by plotting A:

ggplot(data=ToothGrowth, aes(x=as.factor(dose), y=len, fill=supp)) +
    geom_bar(stat="identity") +
    facet_grid(. ~ supp) +
    xlab("Dosage in milligrams") +
    ylab("Tooth length") +
    guides(fill=guide_legend(title="Supplement Type"))
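Note that with stat="identity" the bars stack the ten individual observations in each group, so each bar's height is the sum of the lengths rather than a typical length. As a hedged alternative, the sketch below draws boxplots with the same faceting, which shows the spread within each group. The object name `p` and the axis labels are illustrative choices.

```r
library(datasets)
library(ggplot2)

# Boxplots of tooth length by dose, one panel per supplement type.
# Unlike stacked bars, each box shows the median and spread of the ten animals.
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = supp)) +
  geom_boxplot() +
  facet_grid(. ~ supp) +
  labs(x = "Dose", y = "Tooth length", fill = "Supplement type")
print(p)
```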

3. Exploring dataset by plotting B:

xyplot(len~dose|supp, ToothGrowth,
       main="Scatterplots by Supplement Type and Dosage",
       ylab="Length", xlab="Dose")

Notes:
  • Plotting tooth length against both dosage and supplement shows a clear upward trend: the larger the dose, the longer the teeth.
  • Although orange juice appears to give longer teeth than ascorbic acid at the lower doses, the plots alone are not enough to conclude that the delivery method has a strong effect on tooth growth.

4. Using confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose:

t.test(len ~ supp, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

5. Repeating the same test separately at each dose level:

t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == .5, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 1, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
t.test(len ~ supp, ToothGrowth[ToothGrowth$dose == 2, ])
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
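The three per-dose calls above can be gathered into a single table, which makes the intervals and p-values easier to compare side by side. This is a sketch only; the names `doses`, `per_dose` and the column names are assumptions introduced here for illustration.

```r
library(datasets)

doses <- sort(unique(ToothGrowth$dose))

# Run the Welch t-test of len by supp within each dose level and
# keep the group means, the 95% confidence interval and the p-value.
per_dose <- do.call(rbind, lapply(doses, function(d) {
  res <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  data.frame(
    dose     = d,
    mean_OJ  = unname(res$estimate[1]),
    mean_VC  = unname(res$estimate[2]),
    ci_lower = res$conf.int[1],
    ci_upper = res$conf.int[2],
    p_value  = res$p.value
  )
}))
print(per_dose)
```

Each row should reproduce the figures printed by the corresponding t.test call above.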

Conclusion:

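The group means rise with dose at 0.5, 1 and 2 for both supplement types, so a larger dose is associated with longer teeth. Pooled across doses, however, the p-value of 0.06 and a 95% confidence interval that contains zero mean we cannot reject the null hypothesis that supplement type has no effect on tooth length. Within dose levels the picture differs: orange juice gives significantly longer teeth at doses of 0.5 (p = 0.006) and 1 (p = 0.001), while at a dose of 2 the two supplements are indistinguishable (p = 0.96).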
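The tests in this report compare supplement types; the dose trend itself rests on the group means. As a hedged sketch, the same Welch test can compare the lowest and highest dose directly. The names `low_high` and `dose_test` are illustrative and not part of the original analysis.

```r
library(datasets)

# Keep only the lowest and highest dose so the grouping variable has two levels.
low_high <- ToothGrowth[ToothGrowth$dose %in% c(0.5, 2), ]

# Welch two-sample t-test of tooth length between dose 0.5 and dose 2,
# pooling both supplement types.
dose_test <- t.test(len ~ dose, data = low_high)
print(dose_test)
```

If the interval for the difference (dose 0.5 minus dose 2) lies entirely below zero, that supports the claim that the higher dose is associated with longer teeth.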