Overview

In this document, we analyze the ToothGrowth data from the R datasets package. We begin with a basic summary of the data, then use confidence intervals and hypothesis tests to compare tooth growth by supplement type (supp) and dose.

Load data and Summarize data

Load the data

library(datasets)
data(ToothGrowth)

Summarize in three ways:

1. Overall tooth length and dose

2. Tooth length by supplement type (OJ or VC), irrespective of dose level

3. Tooth length for each combination of supplement type and dose level

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

tapply(ToothGrowth$len, ToothGrowth$supp, FUN=summary)

## $OJ
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.20   15.52   22.70   20.66   25.72   30.90 
## 
## $VC
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   11.20   16.50   16.96   23.10   33.90

library(plyr)

ddply(ToothGrowth, .(supp, dose), summarize,
      mean = round(mean(len), 2),
      sd   = round(sd(len), 2),
      max  = round(max(len), 2),
      min  = round(min(len), 2))

##   supp dose  mean   sd  max  min
## 1   OJ  0.5 13.23 4.46 21.5  8.2
## 2   OJ  1.0 22.70 3.91 27.3 14.5
## 3   OJ  2.0 26.06 2.66 30.9 22.4
## 4   VC  0.5  7.98 2.75 11.5  4.2
## 5   VC  1.0 16.77 2.52 22.5 13.6
## 6   VC  2.0 26.14 4.80 33.9 18.5
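For readers who prefer not to depend on plyr, a similar per-group table can be sketched with base R's aggregate(). This is only an illustrative alternative, not the method used above; the names tg and stats_by_group are introduced here for the example, and the rows come out in a different order than in the ddply output.

```r
# Fresh copy of the data so the example does not touch the workspace object.
tg <- datasets::ToothGrowth

# Mean and standard deviation of len for each supp/dose combination.
stats_by_group <- aggregate(
  len ~ supp + dose,
  data = tg,
  FUN = function(x) c(mean = round(mean(x), 2), sd = round(sd(x), 2))
)

# aggregate() stores the two statistics as one matrix column;
# do.call(data.frame, ...) flattens it into len.mean and len.sd.
stats_by_group <- do.call(data.frame, stats_by_group)
print(stats_by_group)
```

The flattened columns are named len.mean and len.sd, so a single group can be looked up with ordinary logical indexing on supp and dose.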

Visualize the third summary above with boxplots of tooth length for each combination of supplement type and dose level:

library(ggplot2)
# Treat dose and supp as categorical so each level gets its own box
ToothGrowth$dose <- factor(ToothGrowth$dose)
ToothGrowth$supp <- factor(ToothGrowth$supp)
# Inside aes(), refer to columns by bare name; the data argument supplies them
ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) +
  geom_boxplot() +
  facet_grid(. ~ supp) +
  scale_x_discrete("Dosage") +
  scale_y_continuous("Tooth Length") +
  ggtitle("Boxplot of Tooth Length by Dose Amount for Two Supplements")

Study impact of supp and dose on tooth growth

Before running t-tests to compare the two supplement groups at each dose level, we need to check the assumption that the two groups have equal variances. We apply Fisher's F-test at the 0.05 significance level, with the null hypothesis that the variances of the two groups are the same. We further assume that each sample group is drawn from a normal distribution.

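# Split into six groups, ordered OJ 0.5, OJ 1, OJ 2, VC 0.5, VC 1, VC 2
ToothGrouped <- dlply(ToothGrowth, .(supp, dose))
# Compare OJ against VC at each dose level (groups i and i + 3)
rbind(
  var.test(ToothGrouped[[1]]$len, ToothGrouped[[4]]$len)$p.value,
  var.test(ToothGrouped[[2]]$len, ToothGrouped[[5]]$len)$p.value,
  var.test(ToothGrouped[[3]]$len, ToothGrouped[[6]]$len)$p.value
)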

##            [,1]
## [1,] 0.16489022
## [2,] 0.20462137
## [3,] 0.09274336

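All three p-values are greater than 0.05, so we cannot reject the null hypothesis and can treat the variances as homogeneous (var.equal = TRUE in t.test). We run the t-test both as an independent two-sample test and as a paired test. Note that the ToothGrowth documentation describes 60 separate guinea pigs, so the observations are not naturally paired; the independent two-sample results are the primary ones, and the paired results are shown for comparison only.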
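To make the variance check concrete, the sketch below reproduces the var.test p-value for the 0.5 dose level by hand. It assumes the usual two-sided F-test construction (the ratio of sample variances referred to an F distribution, with the p-value taken as twice the smaller tail); the variable names are introduced here for illustration only.

```r
# Fresh copy of the data; dose is numeric in the packaged dataset.
tg <- datasets::ToothGrowth

oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]

# F statistic: ratio of the two sample variances.
f_stat <- var(oj) / var(vc)
df1 <- length(oj) - 1
df2 <- length(vc) - 1

# Two-sided p-value: twice the smaller of the two tail probabilities.
p_lower <- pf(f_stat, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

# Compare against the built-in test.
p_builtin <- var.test(oj, vc)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```

Both values should agree with the first row of the var.test output above. The F-test is known to be sensitive to departures from normality, which is why the normality assumption stated earlier matters here.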

output <- list()
for (i in 1:3){
  # Independent two-sample t-test, OJ (group i) vs VC (group i + 3) at the same dose
  x <- t.test(ToothGrouped[[i]]$len, ToothGrouped[[i+3]]$len, var.equal = TRUE, paired = FALSE)
  rowName <- paste(as.character(ToothGrouped[[i]]$supp[1]), as.character(ToothGrouped[[i+3]]$supp[1]), as.character(ToothGrouped[[i]]$dose[1]))
  output[[i]] <- cbind(x$p.value, x$conf.int[1], x$conf.int[2])
  rownames(output[[i]]) <- rowName
  colnames(output[[i]]) <- c("p-value", "conf-low", "conf-high")
}

print(output)

## [[1]]
##               p-value conf-low conf-high
## OJ VC 0.5 0.005303661 1.770262  8.729738
## 
## [[2]]
##              p-value conf-low conf-high
## OJ VC 1 0.0007807262 2.840692  9.019308
## 
## [[3]]
##           p-value  conf-low conf-high
## OJ VC 2 0.9637098 -3.722999  3.562999
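As a check on what var.equal = TRUE does, the following sketch recomputes the pooled-variance t-test for the 0.5 dose level from its textbook formulas and compares it with t.test. The variable names (sp2, se, ci_manual, and so on) are illustrative and not part of the original analysis.

```r
# Fresh copy of the data; dose is numeric in the packaged dataset.
tg <- datasets::ToothGrowth

oj <- tg$len[tg$supp == "OJ" & tg$dose == 0.5]
vc <- tg$len[tg$supp == "VC" & tg$dose == 0.5]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance: weighted average of the two sample variances.
df  <- n1 + n2 - 2
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / df

# Standard error of the difference in means, then the t statistic.
se        <- sqrt(sp2 * (1 / n1 + 1 / n2))
mean_diff <- mean(oj) - mean(vc)
t_stat    <- mean_diff / se

# Two-sided p-value and 95% confidence interval for the difference.
p_manual  <- 2 * pt(-abs(t_stat), df)
ci_manual <- mean_diff + c(-1, 1) * qt(0.975, df) * se

builtin <- t.test(oj, vc, var.equal = TRUE, paired = FALSE)
print(rbind(manual  = c(p_manual, ci_manual),
            builtin = c(builtin$p.value, builtin$conf.int)))
```

The two rows should match each other and the "OJ VC 0.5" row printed above.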

for (i in 1:3){
  # Paired t-test on the same groups (var.equal is ignored when paired = TRUE)
  x <- t.test(ToothGrouped[[i]]$len, ToothGrouped[[i+3]]$len, var.equal = TRUE, paired = TRUE)
  rowName <- paste(as.character(ToothGrouped[[i]]$supp[1]), as.character(ToothGrouped[[i+3]]$supp[1]), as.character(ToothGrouped[[i]]$dose[1]))
  output[[i]] <- cbind(x$p.value, x$conf.int[1], x$conf.int[2])
  rownames(output[[i]]) <- rowName
  colnames(output[[i]]) <- c("p-value", "conf-low", "conf-high")
}

print(output)

## [[1]]
##              p-value conf-low conf-high
## OJ VC 0.5 0.01547205 1.263458  9.236542
## 
## [[2]]
##             p-value conf-low conf-high
## OJ VC 1 0.008229248 1.951911  9.908089
## 
## [[3]]
##           p-value  conf-low conf-high
## OJ VC 2 0.9669567 -4.328976  4.168976

The p-values for dose levels 0.5 and 1.0 are smaller than 0.05 in both tests, and the corresponding confidence intervals exclude zero, so we reject the null hypothesis that mean tooth length is the same under supplements OJ and VC. At dose level 2 the p-value is close to 1, so we fail to reject the null hypothesis. We therefore conclude that OJ and VC have different effects at dose levels 0.5 and 1, while the data show no evidence of a difference at dose level 2. This is consistent with the boxplot above.
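Because three tests are run at the 0.05 level, one optional robustness check is to adjust the p-values for multiple comparisons. The sketch below applies Holm's method via p.adjust to the independent two-sample p-values reported above. The adjustment is not part of the original analysis, and the vector names are introduced here for illustration.

```r
# p-values copied from the independent two-sample t-test output above,
# named by dose level.
p_raw <- c("0.5" = 0.005303661, "1" = 0.0007807262, "2" = 0.9637098)

# Holm's step-down adjustment controls the family-wise error rate.
p_holm <- p.adjust(p_raw, method = "holm")

print(p_holm)
print(p_holm < 0.05)
```

With these inputs the adjusted p-values for dose levels 0.5 and 1 remain below 0.05 and the one for dose level 2 does not, so the conclusions are unchanged.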

ToothGrowth data

Ning Shen

Monday, December 21, 2015
