Overview

This report analyzes the ToothGrowth dataset in R, using exploratory data analysis and confidence intervals to compare tooth growth across supplement types and dosage levels.

Procedure

  1. Load the ToothGrowth data and perform some basic exploratory data analyses.
  2. Provide a basic summary of the data.
  3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
  4. State your conclusions and the assumptions needed for your conclusions.

Load the ToothGrowth data and perform some basic exploratory data analyses

library(datasets)
df <- ToothGrowth
boxplot(len~dose+supp,data=df,main="Tooth Length by Dosage and Supplement")
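Before reading the boxplot, it can help to confirm the shape of the data. The following is a minimal sketch of that check; the name cell_counts is illustrative and not part of the original analysis.

```r
library(datasets)

df <- ToothGrowth

# Column types and dimensions: len (numeric), supp (factor), dose (numeric)
str(df)

# Number of observations in each supplement/dose cell
cell_counts <- table(df$supp, df$dose)
print(cell_counts)
```

Every supplement/dose cell holds the same number of observations, so each box in the plot summarizes an equally sized group.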

Provide a basic summary of the data.

At first glance, higher doses of both supplements appear to be associated with greater tooth length. For VC, the three dosage levels are clearly separated from one another, while the OJ groups are less distinct, as shown by the heavily overlapping “whisker” regions in the boxplot above.

Means and standard deviations for each supplement/dosage combination are shown below.

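library(dplyr)
# Mean and standard deviation of tooth length for each supplement/dose cell.
# Named groupStats rather than summary to avoid masking base::summary().
groupStats <- df %>% group_by(supp,dose) %>% summarize(mean = mean(len),sd = sd(len))
as.data.frame(groupStats)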
##   supp dose  mean       sd
## 1   OJ  0.5 13.23 4.459709
## 2   OJ  1.0 22.70 3.910953
## 3   OJ  2.0 26.06 2.655058
## 4   VC  0.5  7.98 2.746634
## 5   VC  1.0 16.77 2.515309
## 6   VC  2.0 26.14 4.797731

Confidence Intervals & Conclusions

Treatment significance for highest doses

The largest tooth lengths appeared in the 2.0 dosage of both supplements. I will compare these two supplements using the highest dosage for each, to see if one is significantly more effective than the other.

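# Tooth lengths (column 1, len) at the 2.0 dose for each supplement
topOJ <- df[df$dose==2.0 & df$supp=="OJ",1]
topVC <- df[df$dose==2.0 & df$supp=="VC",1]
# Welch two-sample t-test (unequal variances is the t.test default);
# the 95% interval is for mean(topOJ) - mean(topVC)
t.test(topOJ,topVC,paired=FALSE)$conf.int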
## [1] -3.79807  3.63807
## attr(,"conf.level")
## [1] 0.95
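The interval reported by t.test can also be reproduced by hand, which makes the unequal-variance assumption explicit. The following is a minimal sketch using the Welch-Satterthwaite degrees of freedom; the names vOJ, vVC, dfWelch, and manualCI are illustrative and not part of the original analysis.

```r
library(datasets)

df <- ToothGrowth
topOJ <- df$len[df$dose == 2.0 & df$supp == "OJ"]
topVC <- df$len[df$dose == 2.0 & df$supp == "VC"]

# Variance of each group's sample mean
vOJ <- var(topOJ) / length(topOJ)
vVC <- var(topVC) / length(topVC)

# Standard error of the difference and Welch-Satterthwaite degrees of freedom
se <- sqrt(vOJ + vVC)
dfWelch <- (vOJ + vVC)^2 /
  (vOJ^2 / (length(topOJ) - 1) + vVC^2 / (length(topVC) - 1))

# 95% interval: difference in means +/- t quantile * standard error
manualCI <- mean(topOJ) - mean(topVC) + c(-1, 1) * qt(0.975, dfWelch) * se
manualCI
```

Because the two variances are kept separate rather than pooled, the degrees of freedom come out smaller than the pooled value of 18, which slightly widens the interval.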

Assumptions

The data is IID (independent and identically distributed)

The variances across all supplement/dosage combinations are not equal

The different subjects are not paired

The same teeth (or sets of teeth) were measured in each test

Conclusion

Since the confidence interval contains 0, we cannot conclude that tooth growth at the highest dosage differs significantly between the two treatments.

Dosage significance within each Treatment

Here I will test whether, within each treatment, tooth growth differs between that treatment’s two highest doses (1.0 and 2.0). As before, I will not assume equal variances between the two dosage levels, which is the default behavior of t.test.

Confidence interval for OJ…

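# OJ tooth lengths at the two highest doses
highDose <- df[df$dose==2.0 & df$supp=="OJ",1]
midDose <- df[df$dose==1.0 & df$supp=="OJ",1]
# 95% Welch interval for mean(highDose) - mean(midDose)
t.test(highDose,midDose,paired=FALSE)$conf.int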
## [1] 0.1885575 6.5314425
## attr(,"conf.level")
## [1] 0.95

Confidence interval for VC…

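# VC tooth lengths at the two highest doses
highDose <- df[df$dose==2.0 & df$supp=="VC",1]
midDose <- df[df$dose==1.0 & df$supp=="VC",1]
# 95% Welch interval for mean(highDose) - mean(midDose)
t.test(highDose,midDose,paired=FALSE)$conf.int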
## [1]  5.685733 13.054267
## attr(,"conf.level")
## [1] 0.95
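The two comparisons above repeat the same steps, so they could be wrapped in a small helper. The following is a sketch only; doseCI and intervals are hypothetical names that do not appear in the original analysis.

```r
library(datasets)

df <- ToothGrowth

# Hypothetical helper: 95% Welch interval for
# mean(len at dose 2.0) - mean(len at dose 1.0) within one supplement
doseCI <- function(data, supplement) {
  high <- data$len[data$dose == 2.0 & data$supp == supplement]
  mid  <- data$len[data$dose == 1.0 & data$supp == supplement]
  as.numeric(t.test(high, mid, paired = FALSE)$conf.int)
}

# One column per supplement, one row per interval bound
intervals <- sapply(c("OJ", "VC"), function(s) doseCI(df, s))
rownames(intervals) <- c("lower", "upper")
intervals
```

Putting both intervals side by side makes it easier to see how far each one sits from zero.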

Assumptions

The data is IID (independent and identically distributed)

The variances across all supplement/dosage combinations are not equal

The different subjects are not paired

The same teeth (or sets of teeth) were measured in each test

Conclusions

Both differences are significant at the 95% level, since neither interval contains zero. This means the higher dose did result in greater tooth length for both supplements. However, the VC interval lies much farther from zero than the OJ interval, whose lower bound is only about 0.19, so while the conclusion is the same for both supplements, the evidence is much stronger for VC.
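The same comparison can be phrased as a hypothesis test: a 95% interval that excludes zero corresponds to a two-sided p-value below 0.05. The following is a minimal sketch that pulls the p-values from the same Welch tests; dosePValue and pValues are hypothetical names introduced here for illustration.

```r
library(datasets)

df <- ToothGrowth

# Hypothetical helper: Welch t-test p-value for dose 2.0 versus dose 1.0
# within one supplement
dosePValue <- function(supplement) {
  high <- df$len[df$dose == 2.0 & df$supp == supplement]
  mid  <- df$len[df$dose == 1.0 & df$supp == supplement]
  t.test(high, mid, paired = FALSE)$p.value
}

pValues <- sapply(c("OJ", "VC"), dosePValue)
pValues
```

A smaller p-value for VC than for OJ would be consistent with the VC interval sitting farther from zero.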