Analysis of the Effect of Vitamin C on Tooth Growth in Guinea Pigs

As the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.

Load the ToothGrowth data and perform some basic exploratory data analyses

Running help(ToothGrowth) gives a description of the data: the response is the length of odontoblasts (teeth) in each of 10 guinea pigs at each of three dose levels of Vitamin C (0.5, 1, and 2 mg) with each of two delivery methods (orange juice or ascorbic acid). Lucky pigs!

suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
data(ToothGrowth)
str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

ggplot(ToothGrowth, aes(x=dose, y=len, colour=supp)) +
    geom_point(alpha=.5, size=5) +
    scale_size_area() + 
    scale_colour_brewer(palette="Set1") +
    stat_smooth(method=lm) +
    theme(legend.position=c(1,0), legend.justification=c(1,0))+ 
    ggtitle("Tooth length in relation to dose by supp") +
    facet_grid(.~supp)

Provide a basic summary of the data.

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

ToothGrowth %>%
  group_by(dose, supp) %>%
  summarise(count=n(), mean=mean(len), sum=sum(len), S=sd(len), SE=S/sqrt(n()))

## Source: local data frame [6 x 7]
## Groups: dose
## 
##   dose supp count  mean   sum        S        SE
## 1  0.5   OJ    10 13.23 132.3 4.459709 1.4102837
## 2  0.5   VC    10  7.98  79.8 2.746634 0.8685620
## 3  1.0   OJ    10 22.70 227.0 3.910953 1.2367520
## 4  1.0   VC    10 16.77 167.7 2.515309 0.7954104
## 5  2.0   OJ    10 26.06 260.6 2.655058 0.8396031
## 6  2.0   VC    10 26.14 261.4 4.797731 1.5171757
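The standard errors in the table above can be turned into a 95% confidence interval for each dose/supp cell. The following is a minimal base R sketch of that arithmetic; the helper name `mean_ci` is hypothetical and not part of the original analysis, and the t-based interval assumes roughly normal tooth lengths within each cell.

```r
# Sketch: t-based 95% confidence interval for the mean of each dose/supp cell
data(ToothGrowth)

# Hypothetical helper: mean and confidence limits for one numeric vector
mean_ci <- function(x, level = 0.95) {
  n    <- length(x)
  se   <- sd(x) / sqrt(n)                          # standard error of the mean
  half <- qt(1 - (1 - level) / 2, df = n - 1) * se # half-width of the interval
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# One vector of lengths per dose/supp combination (names look like "0.5.OJ")
cells    <- split(ToothGrowth$len, list(ToothGrowth$dose, ToothGrowth$supp))
group_ci <- t(sapply(cells, mean_ci))
round(group_ci, 2)
```

With only 10 animals per cell the intervals are fairly wide, so overlapping intervals between two cells should be read as weak evidence rather than as a formal test.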

ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=supp)) +
    geom_boxplot() +
    theme(legend.position=c(1,0), legend.justification=c(1,0))+ 
    ggtitle("Tooth length in relation to dose by supp")

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

From the plots above, OJ appears to promote tooth growth more than VC. Let's test this with the null hypothesis H0 that the OJ and VC means are equal (u1 = u2) against the two-sided alternative Ha that u1 != u2, working through the test step by step:

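# Split tooth lengths by delivery method
oj <- ToothGrowth %>% filter(supp=='OJ') %>% select(len)
vc <- ToothGrowth %>% filter(supp=='VC') %>% select(len)
x1 <- vc$len; x2 <- oj$len
n1 <-length(x1); n2 <-length(x2)
u1 <- mean(x1); u2 <- mean(x2)
var1 <- var(x1); var2 <- var(x2)
# Standard error of the difference in means, using the pooled variance
se <- sqrt((var1*(n1-1) + var2*(n2-1))/(n1+n2-2)*(1/n1+1/n2))
alpha <- .05
t <- (u2 - u1) / se
# Interval around the VC mean; H0 is not rejected if the OJ mean falls inside it
ci <- u1 + c(-1, 1) *se*qt(1-alpha/2, n1+n2-2)
# Two-sided p-value; abs() keeps it valid when t is negative
p.value <- pt(abs(t), n1+n2-2, lower.tail=FALSE) * 2
t; p.value; ci; u1; u2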

## [1] 1.915268

## [1] 0.06039337

## [1] 13.09633 20.83034

## [1] 16.96333

## [1] 20.66333

The OJ mean, 20.66333, falls narrowly inside the 95% interval [13.09633, 20.83034] built around the VC mean, and the p-value (0.0604) is above alpha = 0.05. So we fail to reject the null hypothesis of no difference between delivery methods. The steps above can be done in R in a single call:

t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE, alternative = "two.sided")

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Zero lies inside the 95% confidence interval for the difference in means, [-0.167, 7.567], so we cannot rule out H0 (no difference).
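As a cross-check, the manual steps can be wrapped in a small function and compared against t.test. This is only a sketch: the helper `pooled_t` and the variable names are hypothetical, and unlike the manual steps it centers the confidence interval on the difference in means, which is the convention t.test follows.

```r
# Sketch: pooled-variance two-sample t-test by hand, checked against t.test()
data(ToothGrowth)

# Hypothetical helper: tests mean(y) - mean(x) assuming equal variances
pooled_t <- function(x, y, alpha = 0.05) {
  n1  <- length(x); n2 <- length(y)
  df  <- n1 + n2 - 2
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df  # pooled variance
  se  <- sqrt(sp2 * (1 / n1 + 1 / n2))                 # SE of the difference
  d   <- mean(y) - mean(x)
  t_stat <- d / se
  list(t       = t_stat,
       df      = df,
       p.value = 2 * pt(abs(t_stat), df, lower.tail = FALSE),
       ci      = d + c(-1, 1) * qt(1 - alpha / 2, df) * se)
}

vc_len <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

manual  <- pooled_t(vc_len, oj_len)                    # OJ minus VC
builtin <- t.test(oj_len, vc_len, var.equal = TRUE)

# Both comparisons should report TRUE if the manual formulas are right
all.equal(manual$t, unname(builtin$statistic))
all.equal(manual$ci, as.numeric(builtin$conf.int))
```

Centering the interval on the difference makes the decision rule the same as the one used with the t.test output: H0 is rejected only when zero falls outside the interval.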

Let's compare the dose effect:

t.test(len ~ dose, data = ToothGrowth, subset = dose %in% c(0.5, 1.0), var.equal = TRUE, alternative = "two.sided")

## 
##  Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983748  -6.276252
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

t.test(len ~ dose, data = ToothGrowth, subset = dose %in% c(1.0, 2.0), var.equal = TRUE, alternative = "two.sided")

## 
##  Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.994387 -3.735613
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

Both null hypotheses can be rejected: the t statistics are large in magnitude (-6.48 and -4.90) and both p-values are far below 0.05, which indicates that tooth length increases with dose level.
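Because more than one dose comparison is made, it may be worth adjusting the p-values for multiple testing. The sketch below uses pairwise.t.test from the stats package with a Bonferroni correction; it is an optional addition, not part of the original analysis, and with pool.sd = FALSE it runs a separate Welch test for each pair rather than the equal-variance tests used above.

```r
# Sketch: all pairwise dose comparisons with Bonferroni-adjusted p-values
data(ToothGrowth)

dose_pairs <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                              p.adjust.method = "bonferroni",
                              pool.sd = FALSE)

# Lower-triangular matrix of adjusted p-values (rows and columns are doses)
dose_pairs$p.value
```

This also covers the 0.5 versus 2 comparison, which the two tests above leave out.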