Investigating the influence of supplement form and dose on tooth growth

Mar. 2015 by Alfred Lu

Exploratory analysis of the ToothGrowth data

First the ToothGrowth data is loaded, and an exploratory plot is made to get a first look at the data set. The data set consists of measurements of odontoblast length (the len column) from a balanced factorial experiment. The subjects were divided into 6 groups of 10, and Vitamin C was administered in one of two forms (orange juice, OJ, or aqueous solution, VC). Each form was given at one of three controlled daily doses (0.5, 1.0, 2.0 mg/day). The boxplot shows this structure, with one box per combination of dose and supplement form.

library(ggplot2)
data(ToothGrowth)

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
ggplot(data=ToothGrowth) + 
    geom_boxplot(aes(factor(dose), len, fill = supp)) + 
    theme_bw()
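
The balanced design described above can be checked directly from the data. The following is a minimal sketch using only base R; the names cell_counts and cell_means are illustrative and not part of the original analysis.

```r
data(ToothGrowth)

# Cross-tabulate supplement form against dose. In a balanced design
# every cell holds the same number of subjects.
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# Mean tooth length for each combination of supplement form and dose,
# i.e. the numeric counterpart of the boxplot.
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```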

Investigate the influence of supplement form

To pursue this question, we first split the data into two groups, OJ and VC.

setOJ <- ToothGrowth[ToothGrowth$supp=='OJ',]
setVC <- ToothGrowth[ToothGrowth$supp=='VC',]

Typically, in this case we run a hypothesis test on the means of the two sets, to check whether Vitamin C administered as orange juice has a different influence on tooth growth than the aqueous solution. Since the population standard deviation is unknown and must be estimated from the samples, the t statistic is used. Moreover, since each group contains measurements from independent samples taken from different subjects, the measurements cannot be treated as paired. The sample standard deviation is 6.605561 in the OJ group versus 8.2660287 in the VC group, so equal variances are not assumed.
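
The per-group standard deviations quoted above can be reproduced with a short base R sketch. The names group_sd and group_n are illustrative; the grouping is done with tapply rather than the setOJ/setVC subsets so that the snippet runs on its own.

```r
data(ToothGrowth)

# Sample standard deviation of tooth length within each supplement form.
group_sd <- tapply(ToothGrowth$len, ToothGrowth$supp, sd)
print(group_sd)

# Group sizes, to confirm both groups hold 30 independent subjects.
group_n <- tapply(ToothGrowth$len, ToothGrowth$supp, length)
print(group_n)
```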

t.test(x = setOJ$len, y = setVC$len, paired = FALSE, var.equal = FALSE)

    Welch Two Sample t-test

data:  setOJ$len and setVC$len
t = 1.9153, df = 55.309, p-value = 0.06063
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1710156  7.5710156
sample estimates:
mean of x mean of y 
 20.66333  16.96333 

The p-value (0.06063) is above the 5% significance level, and the 95% confidence interval contains 0, so we cannot reject the null hypothesis that the two supplement forms (orange juice and aqueous solution) have the same mean effect on tooth length.
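
To make the reported t, df, and p-value less of a black box, the Welch statistic can be recomputed by hand and compared with t.test. This is a sketch under the same assumptions as the test above (independent groups, unequal variances); the variable names are illustrative.

```r
data(ToothGrowth)
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over its estimated standard error.
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation of the degrees of freedom.
df <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

# Compare against the built-in test.
reference <- t.test(x, y, paired = FALSE, var.equal = FALSE)
print(c(t = t_stat, df = df, p = p_value))
print(all.equal(p_value, reference$p.value))
```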

Investigate the influence of dose

The same approach is used in this section. We first subset the data into three groups by dose.

set05 <- ToothGrowth[ToothGrowth$dose == 0.5,]
set10 <- ToothGrowth[ToothGrowth$dose == 1.0,]
set20 <- ToothGrowth[ToothGrowth$dose == 2.0,]

A t-test is then performed on each pair of adjacent dose levels (1.0 vs. 0.5 and 2.0 vs. 1.0), to check whether a higher dose leads to a different tooth length.

t.test(x = set10$len, y = set05$len, paired = FALSE, var.equal = FALSE)

    Welch Two Sample t-test

data:  set10$len and set05$len
t = 6.4766, df = 37.986, p-value = 1.268e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  6.276219 11.983781
sample estimates:
mean of x mean of y 
   19.735    10.605 
t.test(x = set20$len, y = set10$len, paired = FALSE, var.equal = FALSE)

    Welch Two Sample t-test

data:  set20$len and set10$len
t = 4.9005, df = 37.101, p-value = 1.906e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 3.733519 8.996481
sample estimates:
mean of x mean of y 
   26.100    19.735 
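
Rather than reading the p-values off the printed output, they can be pulled from the objects that t.test returns. The sketch below repeats the two comparisons and checks them against a 5% level; alpha, dose_tests, and p_values are illustrative names, not part of the original analysis.

```r
data(ToothGrowth)
set05 <- ToothGrowth[ToothGrowth$dose == 0.5, ]
set10 <- ToothGrowth[ToothGrowth$dose == 1.0, ]
set20 <- ToothGrowth[ToothGrowth$dose == 2.0, ]

alpha <- 0.05  # significance level used throughout this report

# Store each Welch test so its components can be inspected later.
dose_tests <- list(
  "1.0 vs 0.5" = t.test(set10$len, set05$len, paired = FALSE, var.equal = FALSE),
  "2.0 vs 1.0" = t.test(set20$len, set10$len, paired = FALSE, var.equal = FALSE)
)

# Extract the p-value of each comparison and flag those below alpha.
p_values <- sapply(dose_tests, function(test) test$p.value)
print(p_values)
print(p_values < alpha)
```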

Both p-values are far below the 5% significance level, so we reject the null hypothesis in both comparisons: mean tooth length differs between adjacent dose levels. The remaining question is whether a higher dose leads to longer teeth, which calls for a one-sided test.

t.test(x = set10$len, y = set05$len, alternative = "greater", paired = FALSE, var.equal = FALSE)

    Welch Two Sample t-test

data:  set10$len and set05$len
t = 6.4766, df = 37.986, p-value = 6.342e-08
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 6.753323      Inf
sample estimates:
mean of x mean of y 
   19.735    10.605 
t.test(x = set20$len, y = set10$len, alternative = "greater", paired = FALSE, var.equal = FALSE)

    Welch Two Sample t-test

data:  set20$len and set10$len
t = 4.9005, df = 37.101, p-value = 9.532e-06
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 4.17387     Inf
sample estimates:
mean of x mean of y 
   26.100    19.735 

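In both one-sided tests the p-value is far below 5%, so we reject the null hypothesis in favor of the alternative: within the tested range, a higher dose is associated with greater mean tooth length.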
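
The one-sided p-values are exactly half of the two-sided ones reported earlier (for example 6.342e-08 against 1.268e-07), which is expected whenever the observed t statistic points in the direction of the alternative. A minimal sketch of that relationship for the 1.0 vs. 0.5 comparison, with illustrative variable names:

```r
data(ToothGrowth)
set05 <- ToothGrowth[ToothGrowth$dose == 0.5, ]
set10 <- ToothGrowth[ToothGrowth$dose == 1.0, ]

# Same Welch test, once two-sided and once with a "greater" alternative.
two_sided <- t.test(set10$len, set05$len, paired = FALSE, var.equal = FALSE)
one_sided <- t.test(set10$len, set05$len, alternative = "greater",
                    paired = FALSE, var.equal = FALSE)

# The statistic is identical; only the tail area used for the p-value changes.
print(unname(two_sided$statistic) > 0)
print(all.equal(one_sided$p.value, two_sided$p.value / 2))
```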