Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.

  1. Load the ToothGrowth data and perform some basic exploratory data analyses.
  2. Provide a basic summary of the data.
  3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
  4. State your conclusions and the assumptions needed for your conclusions.

About ToothGrowth Dataset

Reference page: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ToothGrowth.html

Description: The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Format: A data frame with 60 observations on 3 variables.

  1. len (numeric): tooth length
  2. supp (factor): supplement type (VC or OJ)
  3. dose (numeric): dose in milligrams/day

Source: C. I. Bliss (1952) The Statistics of Bioassay. Academic Press.

Load the ToothGrowth data

library(datasets)
library(ggplot2)
data(ToothGrowth)
dim(ToothGrowth)
## [1] 60  3
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
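
As a quick exploratory check, the cross-tabulation below counts the animals in each supplement/dose cell. This is a minimal sketch; `tg` and `cell_counts` are illustrative names introduced here, not part of the original analysis.

```r
# Work on a local copy so the ToothGrowth object used elsewhere is left untouched.
# (`tg` and `cell_counts` are illustrative names.)
tg <- datasets::ToothGrowth

# Count the observations in each supplement-by-dose cell.
cell_counts <- with(tg, table(supp, dose))
print(cell_counts)
```

With 60 animals, two supplements, and three doses, a balanced design puts 10 animals in each cell, so every supplement-by-dose comparison rests on the same number of observations.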

Basic summary of the data

ToothGrowth$dose <- as.factor(ToothGrowth$dose)
summary(ToothGrowth)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90
mean(ToothGrowth$len)
## [1] 18.81333
sd(ToothGrowth$len)
## [1] 7.649315
var(ToothGrowth$len)
## [1] 58.51202
g <- ggplot(data = ToothGrowth, aes(x = dose, y = len, fill = dose))
g <- g + facet_grid(. ~ supp)
g <- g + geom_boxplot()
g <- g + labs(x = "Dosage (mg/day)", y = "Tooth length", title = "Tooth Growth by Dosage for Each Supplement")
print(g)
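
The boxplots can be backed by a numeric summary per group. The sketch below, which assumes nothing beyond base R, tabulates the mean and standard deviation of tooth length for each supplement/dose combination; `tg`, `group_mean`, `group_sd`, and `group_summary` are illustrative names.

```r
# Local copy of the data (illustrative name).
tg <- datasets::ToothGrowth

# Mean and standard deviation of tooth length in each supplement-by-dose group.
group_mean <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
group_sd   <- aggregate(len ~ supp + dose, data = tg, FUN = sd)

# Both aggregate() calls return the groups in the same order,
# so the two results can be combined column by column.
group_summary <- data.frame(
  supp     = group_mean$supp,
  dose     = group_mean$dose,
  mean_len = group_mean$len,
  sd_len   = group_sd$len
)
print(group_summary)
```

Reading the means alongside the standard deviations gives a first impression of which group differences are large relative to the spread within groups.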

Confidence intervals / hypothesis tests to compare tooth growth by supp and dose

Assumptions:

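  1. The guinea pigs are assumed to be comparable in size, age, and health, and to have been assigned to the six supplement/dose groups at random.

  2. All animals are assumed to have been kept under the same comfortable, low-stress conditions, so that differences in tooth length can be attributed to the treatment.

  3. Tooth lengths within each group are assumed to be roughly normally distributed. The groups are small (10 animals each), so t-based intervals and tests are used rather than normal approximations.

  4. The groups are independent (different animals, so the tests are unpaired), and their variances are not assumed to be equal; R's Welch t-test estimates each group's variance separately.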

Supplement groups

Comparing the two supplement groups, independent of dose.

t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
t.test(len ~ supp, var.equal = TRUE, data = ToothGrowth)
## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
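
To make explicit what the Welch test reported above computes, the following sketch reproduces its t statistic, degrees of freedom, confidence interval, and p-value by hand. It uses the standard Welch-Satterthwaite formulas; all variable names are illustrative.

```r
# Local copy of the data and the two supplement groups (illustrative names).
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Each group's contribution to the squared standard error: s^2 / n.
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se_diff <- sqrt(se2_oj + se2_vc)

# Welch t statistic for the difference in means (OJ minus VC).
mean_diff <- mean(oj) - mean(vc)
t_stat <- mean_diff / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval.
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci)
```

The values should agree with the Welch output above (t = 1.9153, df = 55.309, p-value = 0.06063). The equal-variance test differs only in pooling the two variances and using n1 + n2 - 2 = 58 degrees of freedom.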

Dosage groups

Comparing the two supplement groups within each dose group.

t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 0.5))
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 1))
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
t.test(len ~ supp, var.equal = FALSE, data = subset(ToothGrowth, dose == 2))
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
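
The three per-dose tests above can be collected into a single table, which makes them easier to compare. This is a minimal sketch with illustrative names (`tg`, `per_dose`). The Bonferroni column is an optional addition, not part of the original analysis, included because three tests are being read together.

```r
# Local copy of the data (illustrative name).
tg <- datasets::ToothGrowth

# Run one Welch t-test (OJ vs VC) per dose level and keep the key numbers.
per_dose <- do.call(rbind, lapply(split(tg, tg$dose), function(d) {
  tt <- t.test(len ~ supp, data = d, var.equal = FALSE)
  data.frame(
    dose             = d$dose[1],
    diff_oj_minus_vc = unname(tt$estimate[1] - tt$estimate[2]),
    ci_lower         = tt$conf.int[1],
    ci_upper         = tt$conf.int[2],
    p_value          = tt$p.value
  )
}))

# Optional: Bonferroni-adjusted p-values for the three simultaneous tests.
per_dose$p_bonferroni <- p.adjust(per_dose$p_value, method = "bonferroni")

print(per_dose, row.names = FALSE)
```

With three tests, the Bonferroni adjustment multiplies each p-value by 3 (capped at 1), so the 0.5 and 1 mg/day results remain below 0.05 after adjustment.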

Conclusions

  1. Ignoring dose, we cannot reject the null hypothesis of no difference between the supplements at the 5% level: the p-value is 0.06 and the 95% confidence interval (-0.17, 7.57) contains zero.

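  2. At the 0.5 and 1 mg/day doses, the p-values are 0.006 and 0.001 respectively, and both 95% confidence intervals lie entirely above zero, so OJ is associated with greater tooth length than VC. At 2 mg/day there is no evidence of a difference between supplements (p-value 0.96, interval contains zero). So, for the lower dosages (0.5 and 1 mg/day), OJ is the more effective delivery method.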

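  3. The boxplots suggest that tooth length increases with dose for both supplements. The tests above compare supplements only, so the claim that higher dosages have a significant effect still needs t-tests between dose levels to support it.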
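
The dose effect can be checked with the same Welch t-test, this time comparing dose levels pairwise with both supplements pooled. The sketch below is one way to do this and was not part of the original analysis; `compare_doses` is a hypothetical helper, and `tg`, `dose_tests`, and `dose_summary` are illustrative names.

```r
# Local copy of the data with dose kept numeric (illustrative name).
tg <- datasets::ToothGrowth

# Hypothetical helper: Welch t-test of tooth length between two dose levels,
# pooling both supplements. The estimate is (lower dose mean) - (higher dose mean).
compare_doses <- function(low, high, data = tg) {
  two_doses <- data[data$dose %in% c(low, high), ]
  t.test(len ~ dose, data = two_doses, var.equal = FALSE)
}

dose_tests <- list(
  "0.5 vs 1" = compare_doses(0.5, 1),
  "1 vs 2"   = compare_doses(1, 2),
  "0.5 vs 2" = compare_doses(0.5, 2)
)

# Collect the confidence intervals and p-values in one table.
dose_summary <- data.frame(
  comparison = names(dose_tests),
  ci_lower   = sapply(dose_tests, function(tt) tt$conf.int[1]),
  ci_upper   = sapply(dose_tests, function(tt) tt$conf.int[2]),
  p_value    = sapply(dose_tests, function(tt) tt$p.value)
)
print(dose_summary, row.names = FALSE)
```

A confidence interval lying entirely below zero indicates that the higher dose in that pair is associated with longer teeth. As with the per-dose tests, three comparisons are made at once, so a multiple-testing adjustment such as `p.adjust(dose_summary$p_value, method = "bonferroni")` may be worth applying.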