Overview

This is the second part of the Statistical Inference course project, part of the Johns Hopkins Data Science Specialization on Coursera.

We will apply some statistical analysis to the ToothGrowth data in the R datasets package, in order to:

  1. Provide a basic summary of the data.
  2. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
  3. State your conclusions and the assumptions needed for your conclusions.

Loading and Checking the dataset

Let’s load the dataset and look at its structure and content.

# loading the dataset
data(ToothGrowth)
tg <- ToothGrowth #shortcut

# inspecting the structure
str(tg)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(tg)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
summary(tg)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
table(tg$supp, tg$dose)
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10

As the data show (and as the description of the dataset in the R help confirms), the response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice, coded OJ, or ascorbic acid, coded VC), for 60 observations in total.
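The overall summary above does not break tooth length down by group. As a minimal sketch using only base R, the mean and standard deviation of each supplement/dose cell can be tabulated as follows (the names `cell_means`, `cell_sds` and `cell_summary` are illustrative and not part of the original analysis):

```r
data(ToothGrowth)

# Mean and standard deviation of tooth length for each supp/dose combination
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
cell_sds   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Combine both statistics into one table, one row per combination
cell_summary <- cbind(cell_means, sd = cell_sds$len)
names(cell_summary)[3] <- "mean"
cell_summary
```

The six means in this table are the same group means that the t-tests report further down.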

Let’s check visually whether there are apparent differences between dosages or supplement types.

library(ggplot2)
g <- ggplot(tg, aes(x=dose, y=len, color=supp))
g <- g + facet_wrap(~supp)
g <- g + geom_point(size=4) 
g

These charts show that tooth length clearly increases with dosage, but it is harder to tell whether the supplement type has an influence.
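Because the supplement effect is hard to read from the faceted scatter plots, a side-by-side view may help. The following is only a sketch, using base R graphics rather than ggplot2 to avoid extra dependencies; the colors and labels are arbitrary choices and `bp` is an illustrative name:

```r
data(ToothGrowth)

# One box per supplement/dose combination, with OJ and VC adjacent at each dose
bp <- boxplot(len ~ supp + dose, data = ToothGrowth,
              col = c("orange", "steelblue"),
              xlab = "Supplement and dose (mg)", ylab = "Tooth length",
              main = "Tooth length by supplement and dose")
```

Placing the OJ and VC boxes next to each other at every dose makes it easier to compare the two delivery methods directly.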

Statistical Analysis

Dosage Influence

First, we’ll check whether the dosage has a significant influence on tooth growth for both supplement types. To do so, we compare the tooth lengths at the 0.5 mg and 2.0 mg doses of vitamin C.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Orange Juice
oj05 <- tg %>% filter(supp=="OJ", dose==0.5) %>% pull(len)
oj20 <- tg %>% filter(supp=="OJ", dose==2.0) %>% pull(len)
t.test(x = oj20, y = oj05, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  oj20 and oj05
## t = 7.817, df = 18, p-value = 3.402e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   9.381777 16.278223
## sample estimates:
## mean of x mean of y 
##     26.06     13.23
# Ascorbic Acid
vc05 <- tg %>% filter(supp=="VC", dose==0.5) %>% pull(len)
vc20 <- tg %>% filter(supp=="VC", dose==2.0) %>% pull(len)
t.test(x = vc20, y = vc05, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  vc20 and vc05
## t = 10.388, df = 18, p-value = 4.957e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  14.48716 21.83284
## sample estimates:
## mean of x mean of y 
##     26.14      7.98

Conclusion

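According to the t-test results, the dosage has a statistically significant effect on tooth length for each supplement type. Increasing the dose from 0.5 mg to 2.0 mg raises the mean length from 13.23 to 26.06 with orange juice and from 7.98 to 26.14 with ascorbic acid, and in both cases the 95% confidence interval for the difference excludes zero (p < 0.001).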
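The tests above pool the variances of the two groups (var.equal = TRUE). As a rough sensitivity check, and not part of the original analysis, the same comparison can be sketched with Welch's t-test, which does not assume equal variances. The helper name `dose_effect_welch` is hypothetical:

```r
data(ToothGrowth)

# Welch t-test (unequal variances) of the 0.5 mg vs 2 mg doses within one supplement
dose_effect_welch <- function(supplement) {
  keep <- ToothGrowth$supp == supplement & ToothGrowth$dose %in% c(0.5, 2)
  t.test(len ~ dose, data = ToothGrowth[keep, ], var.equal = FALSE)
}

welch_oj <- dose_effect_welch("OJ")
welch_vc <- dose_effect_welch("VC")

# The formula interface compares 0.5 mg minus 2 mg, so the intervals are negative
welch_oj$conf.int
welch_vc$conf.int
c(OJ = welch_oj$p.value, VC = welch_vc$p.value)
```

If the Welch intervals also exclude zero, the conclusion about dosage does not hinge on the equal-variance assumption.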

Supplement Type Influence

Now we want to check whether the supplement type makes a statistically significant difference to the results. We’ll do the analysis separately for each dose.

## function to compare the supplement types at a given dosage
testSuppType <- function(dosage) {
  oj <- tg %>% filter(dose==dosage, supp=="OJ") %>% pull(len)
  vc <- tg %>% filter(dose==dosage, supp=="VC") %>% pull(len)
  test <- t.test(x=oj, y=vc, var.equal=TRUE)
  # means of OJ and VC, their difference, the 95% confidence interval and the p-value
  round(c(test$estimate[1], test$estimate[2],
          test$estimate[1]-test$estimate[2],
          test$conf.int[1], test$conf.int[2], test$p.value), 5)
}

## applying t.tests
comp <- data.frame(t(sapply(c(0.5,1.0,2.0),FUN=testSuppType)))

## formatting
colnames(comp) <- c("mean.OJ","mean.VC","OJ-VC","low.conf.int","upp.conf.int","p.value")
rownames(comp) <- c("0.5mg","1.0mg","2.0mg")

## comparisons
comp
##       mean.OJ mean.VC OJ-VC low.conf.int upp.conf.int p.value
## 0.5mg   13.23    7.98  5.25      1.77026      8.72974 0.00530
## 1.0mg   22.70   16.77  5.93      2.84069      9.01931 0.00078
## 2.0mg   26.06   26.14 -0.08     -3.72300      3.56300 0.96371

Conclusions

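According to the data, there is a statistically significant difference between the supplement types Orange Juice and Ascorbic Acid at the 0.5 mg and 1.0 mg doses, in favor of Orange Juice: the 95% confidence intervals for OJ-VC lie entirely above zero. This is not true at the 2.0 mg dose, where there is no statistically significant difference (p = 0.96). These conclusions assume that the groups of guinea pigs are independent, that tooth length is approximately normally distributed within each group, and that the groups have equal variances (var.equal = TRUE in the t-tests).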
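The per-dose comparison can also be reproduced without dplyr. The following is a minimal sketch using only base R and the formula interface of t.test; the names `compare_supp` and `supp_by_dose` are illustrative, and the extra `excludes.zero` column simply flags the doses at which the confidence interval for OJ minus VC does not contain zero:

```r
data(ToothGrowth)

# Pooled-variance t-test of OJ vs VC at a single dose
compare_supp <- function(d) {
  sub <- ToothGrowth[ToothGrowth$dose == d, ]
  tt <- t.test(len ~ supp, data = sub, var.equal = TRUE)
  data.frame(
    dose = d,
    diff = unname(tt$estimate[1] - tt$estimate[2]),  # mean OJ minus mean VC
    lower = tt$conf.int[1],
    upper = tt$conf.int[2],
    p.value = tt$p.value,
    excludes.zero = tt$conf.int[1] > 0 | tt$conf.int[2] < 0
  )
}

# One row per dose level
supp_by_dose <- do.call(rbind, lapply(c(0.5, 1, 2), compare_supp))
print(supp_by_dose, digits = 4)
```

Only the rows for the 0.5 mg and 1 mg doses should be flagged, in line with the table above.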