Synopsis

This report tries to answer the following question using hypothesis tests and confidence intervals: do delivery method and/or dose affect tooth growth in guinea pigs?

The analysis will be based on the R ‘ToothGrowth’ dataset.

Introduction

The R help is a bit confusing about the number of guinea pigs and whether there are any groups. A quick search on the Internet yields the following result:

“The Crampton paper makes it clear that these data are 60 distinct guinea pigs, as odontoblasts measurements were taken under microscope for each guinea pig after the guinea pigs were sacrificed and had their teeth removed.

Perhaps the ToothGrowth description could be modified to read ‘The response is the length of odontoblasts (teeth) in each of 60 guinea pigs, 10 for each combination of dose level of Vitamin C (0.5, 1, and 2 mg) and delivery method (orange juice or ascorbic acid)’.”

Taken from bugs.r-project.org

Data summary

First, some basic information about the dataset:

# First load dplyr (used for as_tibble, filter, group_by, summarise and %>%) and the data
library(dplyr)
data(ToothGrowth)
data <- as_tibble(ToothGrowth)

# What are the dimensions
dim(ToothGrowth)
## [1] 60  3
# What are the variables
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Exploratory data analysis

Data overview

We can break the data down by dose and delivery method (see the R help):

par(oma=c(0,0,3,0))
coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,
       ylab = c("Tooth growth"),
       xlab = c("dose","")) 

title(main="ToothGrowth data: length vs dose, given type of supplement", outer=TRUE)

It seems that the orange juice delivery method is more effective, but is the difference significant? There also seems to be a positive relationship between dose and tooth length, but is it significant? In addition, the variance seems to differ between groups.

Variance between groups

group_by(data, supp, dose) %>% summarise(variance=var(len))
## Source: local data frame [6 x 3]
## Groups: supp [?]
## 
##     supp  dose  variance
##   (fctr) (dbl)     (dbl)
## 1     OJ   0.5 19.889000
## 2     OJ   1.0 15.295556
## 3     OJ   2.0  7.049333
## 4     VC   0.5  7.544000
## 5     VC   1.0  6.326778
## 6     VC   2.0 23.018222

Key assumptions

Now that we have visualized the data, we can make the following assumptions:

  1. 6 independent groups of guinea pigs, one for each combination of dose level and delivery method (3x2)
  2. Group size is 10 guinea pigs
  3. The underlying population variance for the groups is not constant

Furthermore, we assume that the underlying population for each question is roughly Gaussian: symmetric around its mean and not skewed. For example, if we could give orange juice to every guinea pig, the resulting tooth lengths would follow a symmetric, bell-shaped distribution. Finally, because the groups are small, t-tests are used, as they are more conservative than tests based on the normal distribution.
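The Gaussian assumption can be probed, at least roughly, from the data. The following is a minimal sketch, assuming a Shapiro-Wilk test per group is an acceptable check; the choice of test is an assumption made here for illustration and is not part of the original analysis.

```r
# Shapiro-Wilk normality test for each supplement/dose group (base R only).
# The choice of test is an assumption made for illustration.
data(ToothGrowth)

# One vector of tooth lengths per supp/dose combination (6 groups)
groups <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

normality <- data.frame(
  group   = names(groups),
  n       = sapply(groups, length),
  p_value = sapply(groups, function(x) shapiro.test(x)$p.value),
  row.names = NULL
)
print(normality)
```

With only 10 observations per group, such a test has little power, so a large p-value is weak support for the assumption rather than proof of it.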

Analysis

Statistical tests

In all statistical tests, we compare the means mu1 and mu2 of two groups with a one-sided test at an alpha level of 0.05, using the following hypotheses:

H0: mu1 = mu2
Ha: mu1 > mu2
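
The tests in this report call t.test with var.equal=FALSE, which in R is Welch's unequal-variance t-test. For reference, the statistic and its approximate degrees of freedom can be written as follows, where the sample means, variances and sizes of the two compared groups are denoted by symbols introduced here for illustration:

```latex
\[
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1}
      + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
\]
% One-sided confidence interval at level 1 - \alpha for \mu_1 - \mu_2
\[
\left[\, (\bar{x}_1 - \bar{x}_2)
   - t_{1-\alpha,\,\nu}\,\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}},
   \;\; +\infty \right)
\]
```

Because the alternative is one-sided ("greater"), only the lower bound of each interval is finite, which is why every interval reported below ends in Inf.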

Question 1: Is there a relationship between delivery method and tooth growth?

Marginalized results

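First, let's consider tooth growth over all doses.

# Tooth lengths for each delivery method, pooled over all doses
lenOJ <- dplyr::filter(data, supp == "OJ")$len
lenVC <- dplyr::filter(data, supp == "VC")$len
# One-sided Welch t-test: is the OJ mean greater than the VC mean?
res <- t.test(lenOJ, lenVC, paired = FALSE, var.equal = FALSE, alternative = "greater")
df <- data.frame(pvalues = res$p.value,
                 conf = paste("[", round(res$conf.int[1], 3), ",",
                              round(res$conf.int[2], 3), "]"))
row.names(df) <- c(""); print(df)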
##     pvalues            conf
##  0.03031725 [ 0.468 , Inf ]

The p-value is lower than the alpha level of 0.05, so H0 (both means are equal) is rejected in favor of Ha (the orange juice mean is greater than the ascorbic acid mean). Consistent with this, the confidence interval does not contain 0.
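
To make the origin of these numbers explicit, the same p-value and lower confidence bound can be recomputed by hand. This is a minimal sketch in base R, assuming the Welch test at a 95% confidence level; the variable names are chosen here for illustration.

```r
data(ToothGrowth)
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Standard error of the difference in means, without pooling the variances
se <- sqrt(var(x) / length(x) + var(y) / length(y))
t_stat <- (mean(x) - mean(y)) / se

# Welch-Satterthwaite approximation of the degrees of freedom
nu <- se^4 / ((var(x) / length(x))^2 / (length(x) - 1) +
              (var(y) / length(y))^2 / (length(y) - 1))

# One-sided p-value and lower 95% confidence bound for mean(x) - mean(y)
p_value <- pt(t_stat, df = nu, lower.tail = FALSE)
lower   <- (mean(x) - mean(y)) - qt(0.95, df = nu) * se

# Compare with the built-in implementation
ref <- t.test(x, y, var.equal = FALSE, alternative = "greater")
print(c(manual = p_value, t.test = ref$p.value))
print(c(manual = lower, t.test = ref$conf.int[1]))
```

Both pairs of values should agree, which confirms that the reported interval is the one-sided interval for the difference in means (orange juice minus ascorbic acid).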

Per dose results

Now let's check whether the previous result holds for each dose.

# Repeat the one-sided Welch t-test (OJ > VC) separately for each dose
dosages <- c(2.0, 1.0, 0.5); pValues <- c(); confInt <- c()
for (d in dosages) {
    lenOJ2 <- dplyr::filter(data, supp == "OJ", dose == d)$len
    lenVC2 <- dplyr::filter(data, supp == "VC", dose == d)$len
    res <- t.test(lenOJ2, lenVC2, paired = FALSE, var.equal = FALSE, alternative = "greater")
    strp <- paste("[", round(res$conf.int[1], 3), ",", round(res$conf.int[2], 3), "]")
    pValues <- c(pValues, res$p.value); confInt <- c(confInt, strp)
}
# One row per dose: p-value and one-sided confidence interval
df <- data.frame(pvalues = pValues, conf = confInt)
row.names(df) <- dosages
print(df)
##          pvalues             conf
## 2   0.5180742056 [ -3.133 , Inf ]
## 1   0.0005191879  [ 3.356 , Inf ]
## 0.5 0.0031793034  [ 2.346 , Inf ]

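For doses 0.5 and 1.0 the tendency is confirmed: the p-values are well below 0.05 and the confidence intervals do not contain 0. For dose 2.0, however, the p-value is high (0.518) and the interval contains 0. Further investigation shows that for this dosage the sample mean tooth growth is in fact slightly greater with ascorbic acid than with orange juice, so there is no evidence that orange juice is more effective at this dose.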
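
The comparison of group means at dose 2.0 is not shown in the code above. A minimal sketch of one way to check it in base R, assuming a simple table of sample means per group is sufficient:

```r
data(ToothGrowth)

# Mean tooth length for each delivery method (rows) and dose (columns)
means <- with(ToothGrowth, tapply(len, list(supp = supp, dose = dose), mean))
print(round(means, 2))

# Difference OJ - VC at each dose; a negative value means that
# ascorbic acid (VC) has the larger sample mean at that dose
print(round(means["OJ", ] - means["VC", ], 2))
```

The sign of the last row shows at which doses orange juice leads and where the order is reversed.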

Question 2: Is there a relationship between dose level and tooth growth?

In the following tests we will consider only the 0.5 and 2.0 dosages.

Marginalized results

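First, let's consider tooth growth over all delivery methods.

# Tooth lengths at the highest and lowest dose, pooled over delivery methods
len2 <- dplyr::filter(data, dose == 2.0)$len
len5 <- dplyr::filter(data, dose == 0.5)$len
# One-sided Welch t-test: is the mean at dose 2.0 greater than at dose 0.5?
res <- t.test(len2, len5, paired = FALSE, var.equal = FALSE, alternative = "greater")
df <- data.frame(pvalues = res$p.value,
                 conf = paste("[", round(res$conf.int[1], 3), ",",
                              round(res$conf.int[2], 3), "]"))
row.names(df) <- c(""); print(df)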
##       pvalues             conf
##  2.198762e-14 [ 13.279 , Inf ]

The p-value is very small, so H0 (no difference) is rejected: mean tooth growth is significantly greater at dose 2.0 than at dose 0.5. Tooth growth increases with the level of Vitamin C intake.

Per delivery method results

Let's now check whether this relationship still holds for each delivery method.

# Repeat the dose comparison (2.0 > 0.5) separately for each delivery method
suppLevels <- c("OJ", "VC"); pValues <- c(); confInt <- c()
for (s in suppLevels) {
    len2 <- dplyr::filter(data, supp == s, dose == 2.0)$len
    len5 <- dplyr::filter(data, supp == s, dose == 0.5)$len
    res <- t.test(len2, len5, paired = FALSE, var.equal = FALSE, alternative = "greater")
    strp <- paste("[", round(res$conf.int[1], 3), ",", round(res$conf.int[2], 3), "]")
    pValues <- c(pValues, res$p.value); confInt <- c(confInt, strp)
}
# One row per delivery method: p-value and one-sided confidence interval
df <- data.frame(pvalues = pValues, conf = confInt)
row.names(df) <- c("Orange juice", "Ascorbic acid")
print(df)
##                    pvalues             conf
## Orange juice  6.618919e-07  [ 9.948 , Inf ]
## Ascorbic acid 2.340789e-08 [ 15.086 , Inf ]

In both cases the p-values are very small, so the difference is significant. In addition, the confidence intervals do not contain 0.
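
The four tests in this report repeat the same pattern, which could be factored into a small function. The sketch below uses base R only; welch_greater is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```r
data(ToothGrowth)

# Hypothetical helper (not part of the original analysis): one-sided Welch
# test of mean(x) > mean(y), returned as a one-row data frame.
welch_greater <- function(x, y, conf.level = 0.95) {
  res <- t.test(x, y, paired = FALSE, var.equal = FALSE,
                alternative = "greater", conf.level = conf.level)
  data.frame(pvalue = res$p.value,
             lower  = round(res$conf.int[1], 3),
             upper  = res$conf.int[2])
}

# Dose 2.0 versus dose 0.5 within each delivery method
by_supp <- lapply(c(OJ = "OJ", VC = "VC"), function(s) {
  d <- ToothGrowth[ToothGrowth$supp == s, ]
  welch_greater(d$len[d$dose == 2.0], d$len[d$dose == 0.5])
})
result <- do.call(rbind, by_supp)
print(result)
```

Keeping the bounds as numbers rather than pasted strings makes the results easier to reuse, for example to filter on significance or to plot the intervals.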

Conclusions

Question 1: Is there a relationship between delivery method and tooth growth?

It seems that the orange juice delivery method is more effective, but only up to a certain dose.

This was already suggested by the exploratory data analysis and is confirmed by the statistical tests.

Here are some remarks:

  1. Across all doses, the p-value was lower than 0.05
  2. For doses 0.5 and 1.0, the p-value was lower than 0.05
  3. For dose 2.0, the difference is inverted (ascorbic acid has the slightly greater mean) and the p-value is high (0.518)

Question 2: Is there a relationship between dose level and tooth growth?

There is definitely a relationship between dose and tooth growth: growth is greater at dose 2.0 than at dose 0.5, both overall and for each delivery method. All tests were significant.