author: liuyubobobo
date: Sunday, March 22, 2015
In this report, we analyze the ToothGrowth data from the R datasets package. We first load the data, perform some basic exploratory analysis, and give a basic summary of the data. Then we use confidence intervals and hypothesis tests to compare tooth growth by supp and dose.
First of all, we need to load the data.
library(datasets)
data(ToothGrowth)
Next, we take a first look at the dataset.
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
We can plot len against supp and against dose:
#library(ggplot2)
par(mfrow=c(1,2))
plot( len ~ supp , data = ToothGrowth )
plot( len ~ dose , data = ToothGrowth )
From the plots, we can observe that: 1) broadly speaking, tooth length (len) tends to be greater under supp OJ than under supp VC, and len is more variable under VC than under OJ; 2) dose is clearly associated with len: the higher the dose, the greater the len.
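The visual impressions above can be checked numerically. The following is a minimal sketch, not part of the original analysis, that computes the group means and standard deviations with `tapply`; the variable names (`supp_mean`, `supp_sd`, `dose_mean`) are chosen here for illustration only.

```r
library(datasets)
data(ToothGrowth)

# Mean and standard deviation of len for each supplement type
supp_mean <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
supp_sd   <- tapply(ToothGrowth$len, ToothGrowth$supp, sd)
print(rbind(mean = supp_mean, sd = supp_sd))

# Mean of len for each dose level (0.5, 1, 2)
dose_mean <- tapply(ToothGrowth$len, ToothGrowth$dose, mean)
print(dose_mean)
```

A larger mean for OJ, a larger standard deviation for VC, and dose means that increase with dose would agree with what the plots suggest.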
We use t.test to test the following hypotheses. H0: the mean tooth length with supp OJ equals the mean tooth length with supp VC. H1: the mean tooth length with supp OJ differs from the mean tooth length with supp VC. We treat the two groups as independent and assume equal variances (paired = FALSE, var.equal = TRUE).
t.test(ToothGrowth$len[ToothGrowth$supp == "OJ"], ToothGrowth$len[ToothGrowth$supp == "VC"], paired = FALSE, var.equal = TRUE)
##
## Two Sample t-test
##
## data: ToothGrowth$len[ToothGrowth$supp == "OJ"] and ToothGrowth$len[ToothGrowth$supp == "VC"]
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1670064 7.5670064
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
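To make clear where these numbers come from, the pooled-variance two-sample t statistic can be reproduced by hand. This is an illustrative sketch under the same equal-variance assumption as the call above; the variable names (`oj`, `vc`, `sp2`, `se`, `t_stat`, `p_value`, `ci`) are introduced here and do not appear in the original report.

```r
library(datasets)
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)

# Standard error of the difference in means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t statistic, degrees of freedom, and two-sided p-value
t_stat  <- (mean(oj) - mean(vc)) / se
df      <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)

# 95 percent confidence interval for the difference in means
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

print(c(t = t_stat, df = df, p.value = p_value))
print(ci)
```

The printed values should agree with the t.test output above up to rounding.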
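According to the result, the p-value (0.06039) is larger than 0.05, so we fail to reject H0 at the 5% significance level. Consistent with this, the 95 percent confidence interval for the difference in means (-0.167 to 7.567) contains 0. As a result, we cannot conclude that tooth growth differs between supp OJ and supp VC.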
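Because the plots suggest that the two supp groups may not have equal variances, one possible sensitivity check is to repeat the test without the equal-variance assumption (Welch's t-test, which is what t.test does when var.equal = FALSE). The sketch below is an addition for illustration, not part of the original analysis; it uses the formula interface of t.test, and the object names `pooled` and `welch` are made up here.

```r
library(datasets)
data(ToothGrowth)

# Same comparison as above, written with the formula interface
pooled <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Welch's t-test: does not assume equal variances in the two groups
welch <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)

print(c(pooled = pooled$p.value, welch = welch$p.value))
```

If both p-values fall on the same side of 0.05, the conclusion does not depend on the equal-variance assumption.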
We again use t.test, this time to test the following hypotheses. H0: the mean tooth length at one level of dose equals the mean tooth length at another level of dose. H1: the mean tooth length at one level of dose differs from the mean tooth length at another level of dose.
Because we have three levels of dose (0.5, 1.0, 2.0), we run the t-test on each of the 3 pairs of levels.
t.test(ToothGrowth$len[ToothGrowth$dose == 0.5], ToothGrowth$len[ToothGrowth$dose == 1.0], paired = FALSE, var.equal = TRUE)
##
## Two Sample t-test
##
## data: ToothGrowth$len[ToothGrowth$dose == 0.5] and ToothGrowth$len[ToothGrowth$dose == 1]
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.983748 -6.276252
## sample estimates:
## mean of x mean of y
## 10.605 19.735
t.test(ToothGrowth$len[ToothGrowth$dose == 1.0], ToothGrowth$len[ToothGrowth$dose == 2.0], paired = FALSE, var.equal = TRUE)
##
## Two Sample t-test
##
## data: ToothGrowth$len[ToothGrowth$dose == 1] and ToothGrowth$len[ToothGrowth$dose == 2]
## t = -4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -8.994387 -3.735613
## sample estimates:
## mean of x mean of y
## 19.735 26.100
t.test(ToothGrowth$len[ToothGrowth$dose == 0.5], ToothGrowth$len[ToothGrowth$dose == 2.0], paired = FALSE, var.equal = TRUE)
##
## Two Sample t-test
##
## data: ToothGrowth$len[ToothGrowth$dose == 0.5] and ToothGrowth$len[ToothGrowth$dose == 2]
## t = -11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -18.15352 -12.83648
## sample estimates:
## mean of x mean of y
## 10.605 26.100
In all three tests, the p-value is far smaller than 0.05 and the 95 percent confidence interval does not contain 0, so we reject H0 for every pair of dose levels. Therefore, we conclude that the level of dose does affect tooth length: within the range tested, the higher the dose, the greater the tooth growth.
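Since three pairwise tests are run on the same data, one may want to check that the conclusion survives a multiple-comparison correction. The following sketch is an addition, not something the original report does: it collects the three p-values and applies a Bonferroni adjustment with `p.adjust`. The helper `dose_p` and the vectors `p_raw` and `p_adj` are hypothetical names used only for this example.

```r
library(datasets)
data(ToothGrowth)

# Hypothetical helper: p-value of the pooled-variance t-test
# comparing len between two dose levels a and b
dose_p <- function(a, b) {
  t.test(ToothGrowth$len[ToothGrowth$dose == a],
         ToothGrowth$len[ToothGrowth$dose == b],
         paired = FALSE, var.equal = TRUE)$p.value
}

# Raw p-values for the three pairs of dose levels
p_raw <- c("0.5 vs 1" = dose_p(0.5, 1.0),
           "1 vs 2"   = dose_p(1.0, 2.0),
           "0.5 vs 2" = dose_p(0.5, 2.0))

# Bonferroni adjustment: each p-value is multiplied by the number of tests
p_adj <- p.adjust(p_raw, method = "bonferroni")
print(p_adj)
```

If every adjusted p-value is still below 0.05, the conclusion about dose holds after the correction as well.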