Overview

The aim of this project is to analyze the ToothGrowth data in the R datasets package, which records tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded OJ) or ascorbic acid (coded VC).

1. Load the ToothGrowth data and perform some basic exploratory data analyses

library(datasets)
data(ToothGrowth)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
library(ggplot2)
# Work on a copy so relabelling the factor leaves ToothGrowth unchanged
t <- ToothGrowth
levels(t$supp) <- c("Orange Juice", "Ascorbic Acid")
ggplot(t, aes(x=factor(dose), y=len)) + 
  facet_grid(.~supp) + 
  geom_boxplot(aes(fill = supp), show.legend = FALSE) + 
  labs(title="Guinea pig tooth length by dosage for each type of supplement", x="Dose (mg/day)", y="Tooth Length")

2. Provide a basic summary of the data.

The box plots suggest that tooth length increases with dose for both supplements. They also suggest that orange juice is more effective than ascorbic acid at 0.5 and 1.0 mg/day, and that the two supplements perform about equally at 2.0 mg/day. These are visual impressions only; the tests in the next section check them formally.
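The pattern read off the box plots can also be checked numerically. The following is a minimal sketch, assuming base R's aggregate and merge are acceptable here; the object names (group_means, group_sds, group_summary) are illustrative and not part of the original analysis.

```r
library(datasets)

# Mean and standard deviation of tooth length for each supplement/dose cell
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_sds   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Combine both summaries into one table with columns len_mean and len_sd
group_summary <- merge(group_means, group_sds,
                       by = c("supp", "dose"),
                       suffixes = c("_mean", "_sd"))
print(group_summary)
```

Each of the six rows corresponds to one box in the plot, so the table gives the cell means behind the visual comparison.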

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

Hypothesis 1:

Null hypothesis: orange juice and ascorbic acid deliver the same mean tooth growth across the whole data set (all doses pooled).

# Welch two-sample t-test (t.test default): supplements compared, all doses pooled
hypoth1 <- t.test(len ~ supp, data = t)
hypoth1$conf.int
## [1] -0.1710156  7.5710156
## attr(,"conf.level")
## [1] 0.95
hypoth1$p.value
## [1] 0.06063451

The 95% confidence interval includes 0 and the p-value (0.061) is greater than the threshold of 0.05. Thus, the null hypothesis cannot be rejected at the 5% significance level.
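To make the link between the interval and the p-value concrete, the quantities reported by t.test can be reproduced by hand. This is a sketch only, assuming the defaults used in this report (unpaired samples, unequal variances, 95% confidence level); the variable names are illustrative.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Variance of each sample mean
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

diff_means <- mean(oj) - mean(vc)   # OJ minus VC, as in t.test(len ~ supp)
std_err    <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided 95% confidence interval and p-value
ci      <- diff_means + c(-1, 1) * qt(0.975, df) * std_err
p_value <- 2 * pt(-abs(diff_means / std_err), df)

ci
p_value
```

The interval contains 0 exactly when the two-sided p-value exceeds 0.05, which is why the two criteria quoted in the text always agree.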

Hypothesis 2:

For the dosage of 0.5 mg/day, the two supplements deliver the same tooth growth.

# Same test restricted to the 0.5 mg/day group
hypoth2 <- t.test(len ~ supp, data = subset(t, dose == 0.5))
hypoth2$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
hypoth2$p.value
## [1] 0.006358607

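The confidence interval does not include 0 and the p-value is below the 0.05 threshold. Thus, the null hypothesis can be rejected.

The test is two-sided, so the alternative hypothesis it supports is that the two means differ at 0.5 mg/day. Because the interval for the orange juice minus ascorbic acid difference lies entirely above 0, the data indicate that orange juice delivers more tooth growth than ascorbic acid at this dose.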
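The claim that orange juice delivers more growth is directional, whereas t.test runs a two-sided test by default. As a hedged sketch, the one-sided version can be requested with alternative = "greater". This assumes the original factor order (OJ first, VC second), so that "greater" refers to the OJ minus VC difference being above 0. The 0.5 mg/day subset is used for illustration, and the object names are illustrative.

```r
library(datasets)

low_dose <- subset(ToothGrowth, dose == 0.5)

# Two-sided test, as used in the report
two_sided <- t.test(len ~ supp, data = low_dose)

# One-sided test of H1: mean(OJ) - mean(VC) > 0
one_sided <- t.test(len ~ supp, data = low_dose, alternative = "greater")

one_sided$p.value
one_sided$conf.int
```

When the observed difference is in the hypothesized direction, the one-sided p-value is half the two-sided one, so the conclusion at the 5% level does not change here.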

Hypothesis 3:

For the dosage of 1 mg/day, the two supplements deliver the same tooth growth.

# Same test restricted to the 1 mg/day group
hypoth3 <- t.test(len ~ supp, data = subset(t, dose == 1))
hypoth3$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
hypoth3$p.value
## [1] 0.001038376

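The confidence interval does not include 0 and the p-value is smaller than the 0.05 threshold. Thus, the null hypothesis can be rejected.

As before, the two-sided test supports the alternative hypothesis that the means differ at 1 mg/day. Because the interval for the orange juice minus ascorbic acid difference lies entirely above 0, the data indicate that orange juice delivers more tooth growth than ascorbic acid at this dose.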

Hypothesis 4:

For the dosage of 2 mg/day, the two supplements deliver the same tooth growth.

# Same test restricted to the 2 mg/day group
hypoth4 <- t.test(len ~ supp, data = subset(t, dose == 2))
hypoth4$conf.int
## [1] -3.79807  3.63807
## attr(,"conf.level")
## [1] 0.95
hypoth4$p.value
## [1] 0.9638516

The confidence interval includes 0 and the p-value is larger than the threshold. Thus, the null hypothesis cannot be rejected.
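The four tests above compare the supplements only. The dose effect seen in the box plots could be examined with the same technique. The following is a sketch rather than part of the original analysis: it assumes pairwise Welch tests on the pooled supplements, and compare_doses is a hypothetical helper name introduced for illustration.

```r
library(datasets)

# Hypothetical helper: Welch test of the higher dose minus the lower dose,
# pooling both supplements
compare_doses <- function(low, high) {
  res <- t.test(ToothGrowth$len[ToothGrowth$dose == high],
                ToothGrowth$len[ToothGrowth$dose == low])
  data.frame(low = low, high = high,
             ci_lower = res$conf.int[1],
             ci_upper = res$conf.int[2],
             p_value  = res$p.value)
}

# One row per pair of dose levels
dose_results <- rbind(compare_doses(0.5, 1),
                      compare_doses(1, 2),
                      compare_doses(0.5, 2))
print(dose_results)
```

An interval that lies entirely above 0 in a given row would indicate more growth at the higher dose of that pair.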

4. State your conclusions and the assumptions needed for your conclusions.

Assumptions:

Given that the assumptions are true, the following can be concluded: