Statistical Inference on ToothGrowth Dataset

author: liuyubobobo
date: Sunday, March 22, 2015

Overview

In this report, we analyze the ToothGrowth data in the R datasets package. We first load the data, perform a basic exploratory analysis, and summarize the data. We then use confidence intervals and hypothesis tests to compare tooth growth by supp and dose.

Load the ToothGrowth data

First of all, we need to load the data.

library(datasets)
data(ToothGrowth)

Basic Exploratory Data Analysis

Next, we take a first look at the dataset.

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
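
The summary above treats dose as a continuous number, which hides the fact that it takes only a few distinct values. As a minimal sketch (the variable names dose_levels, cell_counts, cell_means and cell_sds are illustrative and not part of the original report), the design can be tabulated per supp and dose combination like this:

```r
library(datasets)
data(ToothGrowth)

# Distinct dose levels: dose is stored as numeric but takes only a few values
dose_levels <- sort(unique(ToothGrowth$dose))
print(dose_levels)

# Number of observations in each supp x dose cell
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# Mean and standard deviation of len within each supp x dose cell
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
cell_sds   <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)
print(cell_means)
print(cell_sds)
```

A balanced table of counts is useful to know before running two-sample tests, because it tells us how many observations each comparison rests on.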

We can also plot the data.

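# Draw the two plots side by side
par(mfrow = c(1, 2))
# supp is a factor, so this draws one box plot per supplement
plot(len ~ supp, data = ToothGrowth)
# dose is numeric, so this draws a scatter plot of len against dose
plot(len ~ dose, data = ToothGrowth)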

The plots suggest that: 1) overall, tooth length (len) tends to be longer with OJ than with VC, and len is more variable under VC than under OJ; 2) dose clearly affects len: the higher the dose, the longer the tooth.
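
The two plots look at supp and dose separately. One possible way to view both at once, sketched here with base graphics (the data frame tg and the object bp are illustrative names, and this plot is not part of the original analysis), is to treat dose as a factor and draw one box per supp and dose combination:

```r
library(datasets)
data(ToothGrowth)

# Treat dose as a categorical factor so that each level gets its own box
tg <- transform(ToothGrowth, dose = factor(dose))

# One box per supp/dose combination; boxplot() returns the plotted statistics
bp <- boxplot(len ~ supp + dose, data = tg,
              xlab = "supp.dose", ylab = "len",
              main = "Tooth length by supplement and dose")

# Names of the groups that were drawn
print(bp$names)
```

Such a plot makes it easier to see whether the difference between OJ and VC looks similar at every dose level, which the side-by-side plots cannot show.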

Use a hypothesis test to compare tooth growth by supp

We use t.test to test the following hypotheses. H0: the mean tooth length with supp OJ equals the mean tooth length with supp VC. H1: the mean tooth length with supp OJ differs from the mean tooth length with supp VC.

t.test(ToothGrowth$len[ToothGrowth$supp == "OJ"], ToothGrowth$len[ToothGrowth$supp == "VC"], paired = FALSE, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$supp == "OJ"] and ToothGrowth$len[ToothGrowth$supp == "VC"]
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

The p-value (0.06039) is larger than 0.05, and the 95 percent confidence interval for the difference in means contains 0, so we fail to reject H0. Based on this test, we cannot conclude that tooth growth differs between supp OJ and VC.
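
The test above assumes equal variances (var.equal = TRUE), while the exploratory plot hinted that len varies more under VC. As a hedged robustness check, not part of the original report, the sketch below compares the pooled test with Welch's test, which drops that assumption (the names oj, vc, pooled and welch are illustrative):

```r
library(datasets)
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Sample variances of the two groups, to judge the equal-variance assumption
print(c(var_OJ = var(oj), var_VC = var(vc)))

# Pooled (equal-variance) two-sample t-test, as used in the report
pooled <- t.test(oj, vc, paired = FALSE, var.equal = TRUE)

# Welch two-sample t-test, which does not assume equal variances
welch <- t.test(oj, vc, paired = FALSE, var.equal = FALSE)

# Compare the two p-values side by side
print(c(pooled_p = pooled$p.value, welch_p = welch$p.value))
```

Because both groups have the same number of observations, the two tests share the same t statistic and differ only in their degrees of freedom, so the Welch p-value cannot be smaller than the pooled one and the decision at the 0.05 level is unchanged.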

Use hypothesis tests to compare tooth growth by dose

We use t.test to test the following hypotheses. H0: the mean tooth length at one dose level equals the mean tooth length at another dose level. H1: the mean tooth length at one dose level differs from the mean tooth length at another dose level.

Because there are three dose levels (0.5, 1.0, and 2.0), we run t.test on each of the 3 pairs of levels.

  1. compare dose 0.5 and 1.0
t.test(ToothGrowth$len[ToothGrowth$dose == 0.5], ToothGrowth$len[ToothGrowth$dose == 1.0], paired = FALSE, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 0.5] and ToothGrowth$len[ToothGrowth$dose == 1]
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983748  -6.276252
## sample estimates:
## mean of x mean of y 
##    10.605    19.735
  2. compare dose 1.0 and 2.0
t.test(ToothGrowth$len[ToothGrowth$dose == 1.0], ToothGrowth$len[ToothGrowth$dose == 2.0], paired = FALSE, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 1] and ToothGrowth$len[ToothGrowth$dose == 2]
## t = -4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.994387 -3.735613
## sample estimates:
## mean of x mean of y 
##    19.735    26.100
  3. compare dose 0.5 and 2.0
t.test(ToothGrowth$len[ToothGrowth$dose == 0.5], ToothGrowth$len[ToothGrowth$dose == 2.0], paired = FALSE, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$dose == 0.5] and ToothGrowth$len[ToothGrowth$dose == 2]
## t = -11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15352 -12.83648
## sample estimates:
## mean of x mean of y 
##    10.605    26.100

All three p-values are smaller than 0.05, so we reject H0 in each test, and none of the three 95 percent confidence intervals contains 0. We therefore conclude that the dose level does affect tooth length: over the doses tested, the higher the dose, the longer the tooth.
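
Running three tests on the same data raises the chance of at least one false positive. As a minimal sketch, assuming a Bonferroni adjustment is an acceptable correction here (the original report does not apply one, and the names dose_pairs, raw_p and adj_p are illustrative), the three pairwise tests can be repeated in a loop and their p-values adjusted:

```r
library(datasets)
data(ToothGrowth)

# All unordered pairs of the three dose levels
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

# Run the same equal-variance two-sample t-test for each pair of levels
raw_p <- sapply(dose_pairs, function(pair) {
  x <- ToothGrowth$len[ToothGrowth$dose == pair[1]]
  y <- ToothGrowth$len[ToothGrowth$dose == pair[2]]
  t.test(x, y, paired = FALSE, var.equal = TRUE)$p.value
})
names(raw_p) <- sapply(dose_pairs, paste, collapse = " vs ")

# Bonferroni adjustment for the three comparisons
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(data.frame(raw_p, adj_p))
```

The raw p-values reported above are small enough that multiplying each by 3 still leaves them far below 0.05, so the conclusion about dose does not depend on whether the correction is applied.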