Tooth Growth

Tyler Byers

Statistical Inference, Coursera

Class Project, Problem #2

Aug 2014

Load Data

data(ToothGrowth)
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Perform Basic EDA

library(ggplot2)
ggplot(aes(x=dose, y = len), data = ToothGrowth) + 
    geom_point(aes(color = supp))

[Figure: scatter plot of tooth length (len) against dose, points colored by supplement type (supp)]

ggplot(aes(x = supp, y = len), data = ToothGrowth) + 
    geom_boxplot(aes(fill = supp))

[Figure: box plots of len by supp]

ggplot(aes(x = factor(dose), y = len), data = ToothGrowth) + 
    geom_boxplot(aes(fill = factor(dose)))

[Figure: box plots of len by dose]

ggplot(aes(x = supp, y = len), data = ToothGrowth) +
    geom_boxplot(aes(fill = supp)) + facet_wrap(~ dose)

[Figure: box plots of len by supp, faceted by dose]

Based on this simple EDA, dosage appears to affect tooth length: the higher the dose, the longer the teeth. Supplement type may also matter, with OJ lengths higher than VC overall, but it is difficult to tell from the plots whether the difference is statistically significant. Finally, supplement type appears to matter at the lower dosages (0.5 and 1.0 mg), where OJ has the larger effect, while at the highest dosage (2.0 mg) the difference appears minimal, if there is one at all.
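The pattern seen in the plots can also be checked numerically. The following is a minimal sketch that assumes only the built-in ToothGrowth data; the names cell_means and oj_minus_vc are introduced here for illustration and are not part of the original analysis.

```r
data(ToothGrowth)

# Mean tooth length for every dose/supplement combination (3 x 2 matrix).
cell_means <- with(ToothGrowth,
                   tapply(len, list(dose = dose, supp = supp), mean))
print(round(cell_means, 2))

# OJ minus VC difference in mean length at each dose level.
oj_minus_vc <- cell_means[, "OJ"] - cell_means[, "VC"]
print(round(oj_minus_vc, 2))
```

A gap that is large at the two lower doses and close to zero at the highest dose would match the visual impression from the faceted box plot.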

Basic data summary

summary(ToothGrowth)

##       len       supp         dose     
##  Min.   : 4.2   OJ:30   Min.   :0.50  
##  1st Qu.:13.1   VC:30   1st Qu.:0.50  
##  Median :19.2           Median :1.00  
##  Mean   :18.8           Mean   :1.17  
##  3rd Qu.:25.3           3rd Qu.:2.00  
##  Max.   :33.9           Max.   :2.00

# summarize after separating by supplement and dose. Table shows length and summary for all supp/dose combinations.
by(ToothGrowth$len, INDICES = list(ToothGrowth$supp, ToothGrowth$dose), length)

## : OJ
## : 0.5
## [1] 10
## -------------------------------------------------------- 
## : VC
## : 0.5
## [1] 10
## -------------------------------------------------------- 
## : OJ
## : 1
## [1] 10
## -------------------------------------------------------- 
## : VC
## : 1
## [1] 10
## -------------------------------------------------------- 
## : OJ
## : 2
## [1] 10
## -------------------------------------------------------- 
## : VC
## : 2
## [1] 10

by(ToothGrowth$len, INDICES = list(ToothGrowth$supp, ToothGrowth$dose), summary)

## : OJ
## : 0.5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     8.2     9.7    12.2    13.2    16.2    21.5 
## -------------------------------------------------------- 
## : VC
## : 0.5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20    5.95    7.15    7.98   10.90   11.50 
## -------------------------------------------------------- 
## : OJ
## : 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    14.5    20.3    23.5    22.7    25.6    27.3 
## -------------------------------------------------------- 
## : VC
## : 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    13.6    15.3    16.5    16.8    17.3    22.5 
## -------------------------------------------------------- 
## : OJ
## : 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    22.4    24.6    26.0    26.1    27.1    30.9 
## -------------------------------------------------------- 
## : VC
## : 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    18.5    23.4    26.0    26.1    28.8    33.9

There are 10 observations for each dose/supplement combination (3 dose levels × 2 supplement types = 6 groups), for a total of 60 observations.

Confidence Intervals and Hypothesis Testing

Test by Supplement

Test by supplement factor only – do not consider dosage.

# Welch two-sample t-test; unpaired is the default, so `paired` is omitted
t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.171  7.571
## sample estimates:
## mean in group OJ mean in group VC 
##            20.66            16.96

The 95% confidence interval for mean(OJ) - mean(VC) is [-0.171, 7.571], which contains zero (p = 0.061), so we fail to reject the null hypothesis of no difference in mean tooth length between the two supplement types.
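The decision above can be read directly off the object that t.test returns. This is a sketch only: the 0.05 significance level is an assumption matching the 95% interval, and the names supp_test, contains_zero and reject_null are illustrative.

```r
data(ToothGrowth)

# Welch two-sample t-test of length by supplement (unpaired is the default).
supp_test <- t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)

# 95% confidence interval for mean(OJ) - mean(VC).
ci <- as.numeric(supp_test$conf.int)
contains_zero <- ci[1] <= 0 && ci[2] >= 0

# Assumed significance level of 0.05.
reject_null <- supp_test$p.value < 0.05

print(round(ci, 3))
print(contains_zero)
print(reject_null)
```

For a two-sided test, checking whether the 95% interval contains zero and checking whether p < 0.05 are equivalent, so the two flags should always disagree with each other.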

Test by Dosage

For these tests, we ignore the type of supplement and check whether tooth length differs between dosage levels. We create three separate data frames to compare 0.5 vs 1.0, 0.5 vs 2.0, and 1.0 vs 2.0.

Tooth.dose12 <- subset(ToothGrowth, dose %in% c(0.5, 1.0))
Tooth.dose13 <- subset(ToothGrowth, dose %in% c(0.5, 2.0))
Tooth.dose23 <- subset(ToothGrowth, dose %in% c(1.0, 2.0))
t.test(len ~ dose, var.equal = FALSE, data = Tooth.dose12)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.477, df = 37.99, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.984  -6.276
## sample estimates:
## mean in group 0.5   mean in group 1 
##             10.61             19.73

t.test(len ~ dose, var.equal = FALSE, data = Tooth.dose13)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.8, df = 36.88, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.16 -12.83
## sample estimates:
## mean in group 0.5   mean in group 2 
##             10.61             26.10

t.test(len ~ dose, var.equal = FALSE, data = Tooth.dose23)

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.901, df = 37.1, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996 -3.734
## sample estimates:
## mean in group 1 mean in group 2 
##           19.73           26.10

The 95% confidence interval for mean(dose 0.5) - mean(dose 1.0) is [-11.984, -6.276], which excludes zero, so we reject the null hypothesis: tooth length differs significantly between dosages of 0.5 mg and 1.0 mg (1.0 mg produces longer teeth in guinea pigs).
The 95% confidence interval for mean(dose 0.5) - mean(dose 2.0) is [-18.16, -12.83], which excludes zero, so we reject the null hypothesis: tooth length differs significantly between dosages of 0.5 mg and 2.0 mg (2.0 mg produces longer teeth in guinea pigs).
The 95% confidence interval for mean(dose 1.0) - mean(dose 2.0) is [-8.996, -3.734], which excludes zero, so we reject the null hypothesis: tooth length differs significantly between dosages of 1.0 mg and 2.0 mg (2.0 mg produces longer teeth in guinea pigs).
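The three pairwise dose comparisons can also be run in a single loop rather than through three hand-made subsets. This is a sketch under the same settings as the tests above (Welch test, unadjusted p-values); dose_pairs and dose_results are illustrative names.

```r
data(ToothGrowth)

# Each column is one pair of dose levels: (0.5, 1), (0.5, 2), (1, 2).
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2)

dose_results <- do.call(rbind, lapply(seq_len(ncol(dose_pairs)), function(i) {
    pair <- dose_pairs[, i]
    # Keep only the rows belonging to the two doses being compared.
    two_doses <- ToothGrowth[ToothGrowth$dose %in% pair, ]
    tt <- t.test(len ~ dose, var.equal = FALSE, data = two_doses)
    data.frame(low_dose  = pair[1],
               high_dose = pair[2],
               ci_lower  = tt$conf.int[1],   # CI for mean(low) - mean(high)
               ci_upper  = tt$conf.int[2],
               p_value   = tt$p.value)
}))
print(dose_results)
```

Collecting the intervals in one data frame makes it easy to confirm that every upper bound is below zero, which is the condition used to reject the null hypothesis in each comparison.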

Test by Supplement Across Dosage Levels

Finally, we test whether, at a given dosage level, there is a significant difference in tooth growth between the two supplement types (e.g. at 0.5 mg, does tooth growth differ significantly between VC and OJ?).

Tooth.dose05 <- subset(ToothGrowth, dose == 0.5)
Tooth.dose10 <- subset(ToothGrowth, dose == 1.0)
Tooth.dose20 <- subset(ToothGrowth, dose == 2.0)
t.test(len ~ supp, var.equal = FALSE, data = Tooth.dose05)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.17, df = 14.97, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719 8.781
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

t.test(len ~ supp, var.equal = FALSE, data = Tooth.dose10)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.033, df = 15.36, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802 9.058
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

t.test(len ~ supp, var.equal = FALSE, data = Tooth.dose20)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.798  3.638
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

The 95% confidence interval for mean(OJ) - mean(VC) at 0.5 mg is [1.72, 8.78], which excludes zero, so we reject the null hypothesis: tooth length differs significantly between the two supplement types at this dose level.
The 95% confidence interval for mean(OJ) - mean(VC) at 1.0 mg is [2.80, 9.06], which excludes zero, so we reject the null hypothesis: tooth length differs significantly between the two supplement types at this dose level.
The 95% confidence interval for mean(OJ) - mean(VC) at 2.0 mg is [-3.80, 3.64], which contains zero, so we fail to reject the null hypothesis of no difference in mean tooth length between the two supplement types at this dose level.
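The same per-dose comparison can be expressed by splitting the data on dose and applying one test to each piece. This is a sketch, not the original code; by_dose and supp_by_dose are illustrative names.

```r
data(ToothGrowth)

# One data frame per dose level, named "0.5", "1" and "2".
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Welch test of length by supplement within each dose level.
supp_by_dose <- t(sapply(by_dose, function(d) {
    tt <- t.test(len ~ supp, var.equal = FALSE, data = d)
    c(ci_lower = tt$conf.int[1],   # CI for mean(OJ) - mean(VC)
      ci_upper = tt$conf.int[2],
      p_value  = tt$p.value)
}))
print(round(supp_by_dose, 3))
```

Each row of the result corresponds to one dose level, so the rows whose interval excludes zero are the dose levels at which the supplement types differ significantly.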

Conclusions and Assumptions

Conclusions

When dose levels are ignored, there is no significant difference in tooth length between the supplement types at the 5% level.
When supplement types are ignored, there is a significant difference in tooth length between every pair of dose levels, with higher doses resulting in longer teeth.
When dose levels and supplement types are considered together, OJ yields longer teeth than VC at 0.5 mg and 1.0 mg, but at 2.0 mg there is no significant difference in tooth length between the two supplement types.

Assumptions

We do not assume that the groups being compared have equal variances (var.equal = FALSE was used for all the t tests).
We assume that the groups are independent. They should be, because collecting these samples would require at least 60 guinea pigs, and an animal could not be 're-used' for a different treatment.
We assume that standard experimental practice was followed: the guinea pigs were assigned to treatments at random, the animals were broadly similar as a population, the researchers took accurate measurements, and the researchers who took the measurements were unaware of the dose and supplement type each guinea pig had received.
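One way to see how much the variance assumption matters is to compare the group variances and to rerun a test with pooled variances. This check is an addition, not part of the original analysis, and the names cell_vars, welch and pooled are illustrative.

```r
data(ToothGrowth)

# Sample variance of tooth length within each dose/supplement cell.
cell_vars <- with(ToothGrowth,
                  tapply(len, list(dose = dose, supp = supp), var))
print(round(cell_vars, 2))

# Supplement comparison with and without the equal-variance assumption.
welch  <- t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)
pooled <- t.test(len ~ supp, var.equal = TRUE,  data = ToothGrowth)

print(rbind(welch  = as.numeric(welch$conf.int),
            pooled = as.numeric(pooled$conf.int)))
print(c(welch_df = unname(welch$parameter),
        pooled_df = unname(pooled$parameter)))
```

If the two intervals are close, the conclusions do not hinge on whether equal variances are assumed; the Welch version is the more conservative choice when the variances might differ.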