Overview

From the project instructions:

Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.

  1. Load the ToothGrowth data and perform some basic exploratory data analyses
  2. Provide a basic summary of the data.
  3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
  4. State your conclusions and the assumptions needed for your conclusions.

Data Loading & Basic Summaries

The first step in this analysis is to load the ToothGrowth data and explore it. The dataset ships with R in the datasets package, so no download is necessary.

# Showing system info for basic reproducibility
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.1  magrittr_1.5    tools_3.6.1     htmltools_0.4.0
##  [5] yaml_2.2.0      Rcpp_1.0.3      stringi_1.4.4   rmarkdown_2.1  
##  [9] knitr_1.27      stringr_1.4.0   xfun_0.12       digest_0.6.23  
## [13] rlang_0.4.4     evaluate_0.14
# Loading the datasets library and exploring the structure of the ToothGrowth dataset
library(datasets)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# The data frame is only 60 observations so let's view the entire thing
ToothGrowth
##     len supp dose
## 1   4.2   VC  0.5
## 2  11.5   VC  0.5
## 3   7.3   VC  0.5
## 4   5.8   VC  0.5
## 5   6.4   VC  0.5
## 6  10.0   VC  0.5
## 7  11.2   VC  0.5
## 8  11.2   VC  0.5
## 9   5.2   VC  0.5
## 10  7.0   VC  0.5
## 11 16.5   VC  1.0
## 12 16.5   VC  1.0
## 13 15.2   VC  1.0
## 14 17.3   VC  1.0
## 15 22.5   VC  1.0
## 16 17.3   VC  1.0
## 17 13.6   VC  1.0
## 18 14.5   VC  1.0
## 19 18.8   VC  1.0
## 20 15.5   VC  1.0
## 21 23.6   VC  2.0
## 22 18.5   VC  2.0
## 23 33.9   VC  2.0
## 24 25.5   VC  2.0
## 25 26.4   VC  2.0
## 26 32.5   VC  2.0
## 27 26.7   VC  2.0
## 28 21.5   VC  2.0
## 29 23.3   VC  2.0
## 30 29.5   VC  2.0
## 31 15.2   OJ  0.5
## 32 21.5   OJ  0.5
## 33 17.6   OJ  0.5
## 34  9.7   OJ  0.5
## 35 14.5   OJ  0.5
## 36 10.0   OJ  0.5
## 37  8.2   OJ  0.5
## 38  9.4   OJ  0.5
## 39 16.5   OJ  0.5
## 40  9.7   OJ  0.5
## 41 19.7   OJ  1.0
## 42 23.3   OJ  1.0
## 43 23.6   OJ  1.0
## 44 26.4   OJ  1.0
## 45 20.0   OJ  1.0
## 46 25.2   OJ  1.0
## 47 25.8   OJ  1.0
## 48 21.2   OJ  1.0
## 49 14.5   OJ  1.0
## 50 27.3   OJ  1.0
## 51 25.5   OJ  2.0
## 52 26.4   OJ  2.0
## 53 22.4   OJ  2.0
## 54 24.5   OJ  2.0
## 55 24.8   OJ  2.0
## 56 30.9   OJ  2.0
## 57 26.4   OJ  2.0
## 58 27.3   OJ  2.0
## 59 29.4   OJ  2.0
## 60 23.0   OJ  2.0
# Lastly, let's look at a summary
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

We can see that the data set contains 60 observations of 3 variables. The help page for ToothGrowth (?ToothGrowth) gives this description:

“The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).”
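The description implies a balanced design, with the 60 animals spread evenly over the delivery-method-by-dose combinations. A minimal sketch to check this from the data follows; the name `tg` is illustrative (not part of the original analysis) and holds a fresh copy of the dataset so the check does not depend on any later changes to ToothGrowth.

```r
# Work on a fresh copy (tg is an illustrative name, not from the original analysis)
tg <- datasets::ToothGrowth

# Cross-tabulate delivery method against dose to check that the design is balanced
cell_counts <- table(supp = tg$supp, dose = tg$dose)
print(cell_counts)

# Count missing values in each column
missing_per_column <- colSums(is.na(tg))
print(missing_per_column)
```

Equal counts in every cell mean that a comparison of delivery methods is not skewed by one method having more high-dose animals than the other.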

Comparing Tooth Growth by Delivery Method & Dose

We can begin to examine the effects of the delivery method (orange juice vs. ascorbic acid) and dose level on tooth growth by using some box plots.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.2
p_deliv <- ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp))
p_deliv + geom_boxplot() + ggtitle("Tooth Length by Delivery Method") + xlab("Delivery Method") + ylab("Tooth Length")

We can see that the median tooth length is greater for the orange juice method. This suggests, but does not by itself establish, that delivery via orange juice leads to greater tooth growth: the two distributions overlap substantially, so a formal test is needed.
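The medians read off the box plot can also be computed directly. The sketch below is an illustration rather than part of the original analysis; `tg`, `medians_by_supp` and `iqr_by_supp` are names introduced here.

```r
# Fresh copy of the data (tg is an illustrative name)
tg <- datasets::ToothGrowth

# Median tooth length for each delivery method (the centre line of each box)
medians_by_supp <- tapply(tg$len, tg$supp, median)
print(medians_by_supp)

# Interquartile range for each delivery method (the height of each box)
iqr_by_supp <- tapply(tg$len, tg$supp, IQR)
print(iqr_by_supp)
```

Comparing the gap between the medians with the interquartile ranges gives a rough sense of how much the two groups overlap.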

We can do a similar exercise for the three levels of dose.

## We know that the dose variable is numeric from the str() output above.
## Therefore, we first convert it to a factor so the box plot treats it as discrete
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

p_dose <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose))
p_dose + geom_boxplot() + ggtitle("Tooth Length by Dose") + xlab("Dose (mg/day of vitamin C)") + ylab("Tooth Length")

Here we see strong evidence for a positive dose-response relationship between vitamin C intake and tooth length. The greater the dose, the greater the tooth length.
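To put numbers on the pattern in the plot, the mean tooth length at each dose and the change between adjacent doses can be computed as below. This is a sketch for illustration; `tg`, `mean_by_dose` and `step_changes` are names introduced here, and the copy of the data keeps dose as a numeric column.

```r
# Fresh copy of the data (tg is an illustrative name); dose is numeric in this copy
tg <- datasets::ToothGrowth

# Mean tooth length at each dose level
mean_by_dose <- tapply(tg$len, tg$dose, mean)
print(mean_by_dose)

# Change in mean length between adjacent dose levels: 0.5 -> 1 and 1 -> 2
step_changes <- diff(mean_by_dose)
print(step_changes)
```

Both changes are positive, which agrees with the plot. The second step covers twice the dose increment of the first but yields a smaller increase, so the plot alone does not establish that the relationship is linear.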

There’s one more box plot we can show: the effect of dose on tooth length across the two delivery methods.

p_both <- ggplot(ToothGrowth, aes(x = dose, y = len, fill = dose))
p_both + geom_boxplot() + facet_wrap(~ supp) + ggtitle("Tooth Length by Dose & Delivery Method") + xlab("Dose (mg/day of vitamin C)") + ylab("Tooth Length")

We can see that the effect of dose remains strong across the two delivery methods. The delivery method seems to matter far less than the level of dose in determining tooth length.
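The faceted plot can be summarized as a small table of cell means. The sketch below is illustrative only; `tg`, `cell_means` and `oj_minus_vc` are names introduced here.

```r
# Fresh copy of the data (tg is an illustrative name)
tg <- datasets::ToothGrowth

# Mean tooth length in each delivery-method-by-dose cell (rows: supp, columns: dose)
cell_means <- tapply(tg$len, list(supp = tg$supp, dose = tg$dose), mean)
print(cell_means)

# Difference in mean length between orange juice and ascorbic acid at each dose
oj_minus_vc <- cell_means["OJ", ] - cell_means["VC", ]
print(oj_minus_vc)
```

Reading across a row shows the dose effect within one delivery method; the difference vector shows how the gap between the two methods changes with dose.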

Hypothesis Testing

There are several hypotheses that we can test. We can test at each level of dose, each delivery method, and the cross of the two.

Hypothesis Testing - Dose Only

First, let’s look at dose independent of delivery method.

Null Hypothesis 1 (NH1). H0: Dose has no effect on tooth length between dose levels 0.5 and 1.0

dose1 <- subset(ToothGrowth, dose == "0.5")$len
dose2 <- subset(ToothGrowth, dose == "1")$len

t.test(dose1, dose2, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dose1 and dose2
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean of x mean of y 
##    10.605    19.735

We see a very small p-value and a 95% confidence interval which doesn’t contain 0. Therefore, at a significance level of 0.05, we can reject H0. The effect is statistically significant.
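For reference, the statistic, degrees of freedom, p-value and confidence interval that t.test() reports for an unpaired, unequal-variance comparison can be reproduced by hand with the Welch formulas. The sketch below does this for the 0.5 vs. 1 comparison; all variable names are illustrative and not part of the original analysis.

```r
# Fresh copy of the data (tg is an illustrative name); dose is numeric in this copy
tg <- datasets::ToothGrowth
x <- tg$len[tg$dose == 0.5]
y <- tg$len[tg$dose == 1]

# Each group's contribution to the squared standard error: s^2 / n
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * sqrt(vx + vy)

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The non-integer degrees of freedom in the t.test() output come from this approximation, which is used because the two group variances are not assumed to be equal.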

Null Hypothesis 2 (NH2). H0: Dose has no effect on tooth length between dose levels 1 and 2

dose2 <- subset(ToothGrowth, dose == "1")$len
dose3 <- subset(ToothGrowth, dose == "2")$len

t.test(dose2, dose3, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dose2 and dose3
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y 
##    19.735    26.100

Similar to the testing of NH1, we see a very small p-value and a 95% confidence interval which doesn’t contain 0. We reject NH2 at the significance level of 0.05.

Hypothesis Testing - Delivery Method Only

Now we can look at the effect of the delivery method on tooth length, independent of dose level.

Null Hypothesis 3 (NH3). H0: Delivery method has no effect on tooth length

dm1 <- subset(ToothGrowth, supp == "VC")$len
dm2 <- subset(ToothGrowth, supp == "OJ")$len

t.test(dm1, dm2, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dm1 and dm2
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.5710156  0.1710156
## sample estimates:
## mean of x mean of y 
##  16.96333  20.66333

Here we see a p-value > 0.05 and a 95% confidence interval containing 0. Therefore, we cannot reject NH3 at the significance level of 0.05. We conclude that the effect of delivery method, pooled across dose levels, is not statistically significant.
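The same comparison can be written with the formula interface of t.test(), which avoids building the two vectors by hand. This is an alternative sketch, not the form used in the original analysis; `tg` and `res_supp` are names introduced here.

```r
# Fresh copy of the data (tg is an illustrative name)
tg <- datasets::ToothGrowth

# Formula interface: compare len between the two levels of supp (Welch test)
res_supp <- t.test(len ~ supp, data = tg, var.equal = FALSE)
print(res_supp)

# Extract the pieces discussed in the text
print(res_supp$p.value)
print(res_supp$conf.int)
```

With the formula interface the groups are ordered by factor level (OJ first, then VC), so the signs of the t statistic and the confidence interval are flipped relative to the VC-minus-OJ output above, while the p-value is the same.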

Hypothesis Testing - Dose x Delivery Method

The last tests we can run concern the effect of delivery method at each dose level. We saw that delivery method is not statistically significant in the aggregate, but is it at a given dose level?

Null Hypothesis 4 (NH4). H0: Delivery method has no effect on tooth length at a dose level of 0.5

dm3 <- subset(ToothGrowth, supp == "VC" & dose == "0.5")$len
dm4 <- subset(ToothGrowth, supp == "OJ" & dose == "0.5")$len

t.test(dm3, dm4, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dm3 and dm4
## t = -3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.780943 -1.719057
## sample estimates:
## mean of x mean of y 
##      7.98     13.23

We see a p-value < 0.05, so we reject NH4. The delivery method appears to have a statistically significant effect at a fixed dose level of 0.5, with a higher mean tooth length for orange juice (13.23) than for ascorbic acid (7.98).

Null Hypothesis 5 (NH5). H0: Delivery method has no effect on tooth length at a dose level of 1

dm5 <- subset(ToothGrowth, supp == "VC" & dose == "1")$len
dm6 <- subset(ToothGrowth, supp == "OJ" & dose == "1")$len

t.test(dm5, dm6, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dm5 and dm6
## t = -4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.057852 -2.802148
## sample estimates:
## mean of x mean of y 
##     16.77     22.70

We see a p-value < 0.05, so we reject NH5. The delivery method appears to have a statistically significant effect at a fixed dose level of 1, with a higher mean tooth length for orange juice (22.70) than for ascorbic acid (16.77).

Null Hypothesis 6 (NH6). H0: Delivery method has no effect on tooth length at a dose level of 2

dm7 <- subset(ToothGrowth, supp == "VC" & dose == "2")$len
dm8 <- subset(ToothGrowth, supp == "OJ" & dose == "2")$len

t.test(dm7, dm8, paired = FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  dm7 and dm8
## t = 0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.63807  3.79807
## sample estimates:
## mean of x mean of y 
##     26.14     26.06

Here, the effect disappears. We see a p-value > 0.05 and a 95% confidence interval containing 0, so we fail to reject NH6.
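The three per-dose comparisons follow the same pattern, so they can be run in a loop. The sketch below also applies a Holm adjustment for running several tests on the same question; that adjustment is an optional extra and is not part of the original analysis. The names `tg`, `dose_levels`, `p_by_dose` and `p_holm` are introduced here for illustration.

```r
# Fresh copy of the data (tg is an illustrative name); dose is numeric in this copy
tg <- datasets::ToothGrowth

# Run the VC-versus-OJ Welch test once per dose level and keep the p-values
dose_levels <- sort(unique(tg$dose))
p_by_dose <- sapply(dose_levels, function(d) {
  vc <- tg$len[tg$supp == "VC" & tg$dose == d]
  oj <- tg$len[tg$supp == "OJ" & tg$dose == d]
  t.test(vc, oj, paired = FALSE, var.equal = FALSE)$p.value
})
names(p_by_dose) <- paste0("dose_", dose_levels)
print(p_by_dose)

# Holm adjustment across the three tests
# (an optional step that is not part of the original analysis)
p_holm <- p.adjust(p_by_dose, method = "holm")
print(p_holm)
```

Under this adjustment the p-values at dose levels 0.5 and 1 stay below 0.05, so the per-dose findings above do not hinge on leaving the tests unadjusted.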

Conclusions

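  1. Mean tooth length increases with dose level across the tested range (0.5, 1, and 2 mg/day), and both step increases are statistically significant. Whether the relationship is linear was not tested.
  2. Pooled across dose levels, the delivery method has no statistically significant effect on tooth length at the 0.05 level (p = 0.061).
  3. At dose levels of 0.5 and 1 mg/day, orange juice is associated with significantly greater tooth length than ascorbic acid; at 2 mg/day there is no significant difference between the two delivery methods.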

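Assumptions: the samples are independent (unpaired), and tooth length within each group is approximately normally distributed. Group variances are not assumed to be equal, since Welch’s t-test was used throughout.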
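The normality assumption can be examined with a Shapiro-Wilk test in each delivery-method-by-dose cell. This diagnostic is not part of the original analysis, and the names `tg`, `cells` and `shapiro_p` are introduced here for illustration. With only 10 observations per cell the test has limited power, so a large p-value is weak support for normality rather than proof of it.

```r
# Fresh copy of the data (tg is an illustrative name)
tg <- datasets::ToothGrowth

# Split tooth length into the six delivery-method-by-dose cells
cells <- split(tg$len, list(tg$supp, tg$dose))

# Shapiro-Wilk normality test within each cell
# (an optional diagnostic that is not part of the original analysis)
shapiro_p <- sapply(cells, function(v) shapiro.test(v)$p.value)
print(round(shapiro_p, 3))
```

A normal Q-Q plot per cell, for example with qqnorm() and qqline(), is a common visual complement to this test.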