Simple inferential data analysis on ToothGrowth data

author: angelayuan
date: Friday, March 20, 2015

Overview

In this project we analyze the ToothGrowth data in the R datasets package. We first perform some basic exploratory data analysis and then compute confidence intervals and run hypothesis tests to compare tooth growth by supplement type and by dose.

Exploratory data analyses

First, we load the ToothGrowth data in the R datasets package, check the first few rows, and summarize the data.

library(datasets)
data(ToothGrowth)
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
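
Before comparing groups, it can help to confirm how the observations are spread across the design. The following is a minimal sketch that assumes only base R and the bundled dataset; the name `cell_counts` is introduced here purely for illustration.

```r
library(datasets)
data(ToothGrowth)

# Cross-tabulate supplement type against dose to see the group sizes
cell_counts <- with(ToothGrowth, table(supp, dose))
cell_counts

# Inspect column types: len and dose are numeric, supp is a factor
str(ToothGrowth)
```

Each of the six supplement-by-dose cells holds 10 observations, so the design is balanced.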

Next, we perform some basic exploratory analysis using the ggplot2 plotting system. We plot the distribution of tooth length for each level of supp, and we make a scatter plot of tooth length against dose, colored by supp.

library(ggplot2)
qplot(len, data=ToothGrowth, fill=supp, xlab = "Tooth Length")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

g <- ggplot(ToothGrowth, aes(dose, len))
g + geom_point(aes(color = supp), size = 4, alpha = 1/2) + labs(title = "Tooth Growth") + labs(x ="Dose", y="Tooth Length") 

From these results we can see that (1) tooth length has a wider distribution with the VC supplement than with OJ; (2) at doses 0.5 and 1.0, teeth are generally longer with OJ than with VC; and (3) the larger the dose, the longer the tooth.
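
These visual impressions can also be checked numerically. The sketch below uses only base R; the names `spread_by_supp` and `mean_by_cell` are illustrative and not part of the original analysis. It computes the standard deviation of length per supplement and the mean length for each supplement-by-dose combination.

```r
library(datasets)
data(ToothGrowth)

# Standard deviation of tooth length for each supplement type
spread_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp, sd)
spread_by_supp

# Mean tooth length for every supplement-by-dose combination
mean_by_cell <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
mean_by_cell
```

A larger standard deviation for VC corresponds to observation (1), and comparing rows of `mean_by_cell` within a dose corresponds to observations (2) and (3).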

Statistical inference

We investigate two questions: (1) Do OJ and VC have different effects on tooth growth? (2) Do different dose levels have different effects on tooth growth?

Question 1

For the first question, we test H0: the mean length with OJ equals the mean length with VC, against H1: the mean length with OJ differs from the mean length with VC (two-sided). We compute a 95% confidence interval and run an independent two-sample t test, both assuming equal variances.
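
The interval computed by hand in the code that follows is the usual equal-variance (pooled) interval for a difference in means. Written out, with $s_p$ denoting the pooled standard deviation and the subscripts mirroring the variable names `mnOJ`, `sdOJ`, `nOJ`, `mnVC`, `sdVC`, and `nVC`:

```latex
\[
  \left(\bar{x}_{OJ} - \bar{x}_{VC}\right)
  \;\pm\; t_{0.975,\; n_{OJ} + n_{VC} - 2}\; s_p \sqrt{\frac{1}{n_{OJ}} + \frac{1}{n_{VC}}},
  \qquad
  s_p = \sqrt{\frac{(n_{OJ} - 1)\, s_{OJ}^2 + (n_{VC} - 1)\, s_{VC}^2}{n_{OJ} + n_{VC} - 2}}
\]
```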

mnOJ <- mean(ToothGrowth$len[ToothGrowth$supp == "OJ"])
sdOJ <- sd(ToothGrowth$len[ToothGrowth$supp == "OJ"])
nOJ <- length(ToothGrowth$len[ToothGrowth$supp == "OJ"])
        
mnVC <- mean(ToothGrowth$len[ToothGrowth$supp == "VC"])
sdVC <- sd(ToothGrowth$len[ToothGrowth$supp == "VC"])
nVC <- length(ToothGrowth$len[ToothGrowth$supp == "VC"])

mnOJ - mnVC + c(-1,1)*qt(0.975, nOJ+nVC-2)*sqrt(((nOJ-1)*sdOJ^2+(nVC-1)*sdVC^2)/(nOJ+nVC-2))*sqrt(1/nOJ+1/nVC)
## [1] -0.1670064  7.5670064
t.test(ToothGrowth$len[ToothGrowth$supp == "OJ"], ToothGrowth$len[ToothGrowth$supp == "VC"], paired = FALSE, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$supp == "OJ"] and ToothGrowth$len[ToothGrowth$supp == "VC"]
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

The 95% confidence interval contains 0 (equivalently, p = 0.06039 > 0.05), so we fail to reject H0 at the 5% level. We cannot conclude that OJ and VC have different effects on tooth growth.
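
The same test can be written more compactly with the formula interface of `t.test`. This is a sketch, not the original author's code; the names `fit_formula` and `fit_vectors` are illustrative. Because the levels of `supp` are ordered OJ, VC, the estimated difference is again mean(OJ) minus mean(VC).

```r
library(datasets)
data(ToothGrowth)

# Formula interface: len is split by the two-level factor supp
fit_formula <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Vector interface, as in the call above, for comparison
fit_vectors <- t.test(ToothGrowth$len[ToothGrowth$supp == "OJ"],
                      ToothGrowth$len[ToothGrowth$supp == "VC"],
                      paired = FALSE, var.equal = TRUE)

fit_formula$conf.int
fit_formula$p.value

# Check that both interfaces give the same interval
all.equal(as.numeric(fit_formula$conf.int), as.numeric(fit_vectors$conf.int))
```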

Question 2

For the second question, we test H0: the mean length at one dose level equals the mean length at another dose level, against H1: the two mean lengths differ (two-sided). We again use 95% confidence intervals from independent two-sample t tests with equal variances.

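Considering that there are three dose levels (0.5, 1.0, 2.0), we perform the t tests in the following order: (1) dose 0.5 vs. dose 1.0; (2) dose 1.0 vs. dose 2.0; and (3) dose 0.5 vs. dose 2.0. In each test the lower dose is subtracted from the higher one, so a positive interval means longer teeth at the higher dose.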

mn05 <- mean(ToothGrowth$len[ToothGrowth$dose == 0.5])
sd05 <- sd(ToothGrowth$len[ToothGrowth$dose == 0.5])
n05 <- length(ToothGrowth$len[ToothGrowth$dose == 0.5])
        
mn1 <- mean(ToothGrowth$len[ToothGrowth$dose == 1.0])
sd1 <- sd(ToothGrowth$len[ToothGrowth$dose == 1.0])
n1 <- length(ToothGrowth$len[ToothGrowth$dose == 1.0])

mn2 <- mean(ToothGrowth$len[ToothGrowth$dose == 2.0])
sd2 <- sd(ToothGrowth$len[ToothGrowth$dose == 2.0])
n2 <- length(ToothGrowth$len[ToothGrowth$dose == 2.0])

t.test(ToothGrowth$len[ToothGrowth$dose == 1.0], ToothGrowth$len[ToothGrowth$dose == 0.5], paired = FALSE, var.equal = TRUE)$conf.int
## [1]  6.276252 11.983748
## attr(,"conf.level")
## [1] 0.95
t.test(ToothGrowth$len[ToothGrowth$dose == 2.0], ToothGrowth$len[ToothGrowth$dose == 1.0], paired = FALSE, var.equal = TRUE)$conf.int
## [1] 3.735613 8.994387
## attr(,"conf.level")
## [1] 0.95
t.test(ToothGrowth$len[ToothGrowth$dose == 2.0], ToothGrowth$len[ToothGrowth$dose == 0.5], paired = FALSE, var.equal = TRUE)$conf.int
## [1] 12.83648 18.15352
## attr(,"conf.level")
## [1] 0.95

The 95% confidence intervals for all three comparisons lie entirely above zero, so we reject H0 for each pair of doses. We conclude that, within the doses tested, a larger dose is associated with longer teeth.
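
The three pairwise comparisons above repeat the same call with different doses, so they can be condensed into a small helper. This is a sketch under the same equal-variance assumption; `dose_ci`, `dose_pairs`, and `dose_cis` are hypothetical names introduced here, not part of the original analysis.

```r
library(datasets)
data(ToothGrowth)

# Hypothetical helper: 95% pooled-variance CI for
# mean(len at dose `high`) - mean(len at dose `low`)
dose_ci <- function(high, low, data = ToothGrowth) {
  fit <- t.test(data$len[data$dose == high],
                data$len[data$dose == low],
                paired = FALSE, var.equal = TRUE)
  as.numeric(fit$conf.int)
}

# One row per comparison: 1.0 vs 0.5, 2.0 vs 1.0, 2.0 vs 0.5
dose_pairs <- rbind(c(1.0, 0.5), c(2.0, 1.0), c(2.0, 0.5))

# apply() returns one column per comparison, so transpose to one row each
dose_cis <- t(apply(dose_pairs, 1, function(p) dose_ci(p[1], p[2])))
dimnames(dose_cis) <- list(c("1.0 - 0.5", "2.0 - 1.0", "2.0 - 0.5"),
                           c("lower", "upper"))
dose_cis
```

Collecting the intervals in one matrix makes it easy to verify at a glance that every lower bound is positive.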