Overview

This study is an exercise in determining the effect of vitamin C on tooth growth in guinea pigs. More concretely, we look at the effects of two factors: the delivery method and the dose.
This will be done in the following steps:

  1. a quick Exploratory Data Analysis (EDA)
  2. hypothesis tests to confirm or reject the findings from the EDA

Basic Exploratory Data Analysis

First, we load the ‘ToothGrowth’ dataset, which ships with R in the default datasets package:

data("ToothGrowth")

Understanding the dataset context and values

A quick call to ?ToothGrowth gives us basic information about the dataset:
The study was conducted on 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1 and 2 mg/day) by one of two delivery methods: orange juice (coded ‘OJ’) or ascorbic acid (coded ‘VC’). The response is the length of odontoblasts, the cells responsible for tooth growth.
Source: C. I. Bliss (1952) The Statistics of Bioassay. Academic Press.

Quick check

As with any new dataset, we start with str() and summary(), then cross-tabulate the two factors with xtabs():

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
xtabs(~ supp+dose, data=ToothGrowth)
##     dose
## supp 0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10

No surprise here: the data match the description, with 30 measurements per delivery method, three dose levels, and a wide range of odontoblast lengths (from 4.2 to 33.9).
The cross-table confirms that the design is balanced, with 10 guinea pigs for each combination of delivery method and dose; an unbalanced design would complicate the comparisons that follow.

Plotting data

The next step is to plot the data. A boxplot was chosen because it makes differences between groups (here, delivery method and dose) easy to spot.

# One box per combination of delivery method and dose
boxplot(len ~ supp * dose, data = ToothGrowth,
        col = c("orange", "blue"), main = "Impact of Vitamin C on Odontoblasts",
        xlab = "delivery method & dose (mg/day)", ylab = "odontoblast length")
legend("bottomright", cex = 0.8, inset = .02,
       c("Orange Juice (OJ)", "Ascorbic Acid (VC)"), fill = c("orange", "blue"))

From the plot we can draft two hypotheses. The dose appears to have a clear impact on tooth growth. The effect of the delivery method is less clear: orange juice seems more effective at the lower doses (0.5 and 1 mg/day), but both methods give similar results at 2 mg/day.
These are only visual impressions and have to be confirmed with formal tests.
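The boxplot impressions can be cross-checked numerically with a table of group means and standard deviations. This is a minimal sketch using base R's tapply(); the object names (cell_means, cell_sds) are illustrative choices, not part of the original analysis.

```r
data("ToothGrowth")

# 2 x 3 matrix of mean length: rows = supp (OJ, VC), columns = dose (0.5, 1, 2)
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
# Same layout for the standard deviations
cell_sds   <- with(ToothGrowth, tapply(len, list(supp, dose), sd))

round(cell_means, 2)
round(cell_sds, 2)

# Difference in mean length (OJ minus VC) at each dose level
round(cell_means["OJ", ] - cell_means["VC", ], 2)
```

Because the design is balanced, averaging each row of cell_means gives back the overall mean for that delivery method.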

Comparing growth

We now use formal methods to confirm or reject the hypotheses drawn from the EDA.
Hypothesis tests will tell us whether the delivery method (‘supp’) and/or the dose (‘dose’) affect tooth growth (or, to be exact, odontoblast length).

Hypothesis test on administration method

Let’s define our null hypothesis H0: “the delivery method has no impact on growth” and our alternative hypothesis Ha: “growth is impacted by the delivery method”.
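For reference, the hypotheses and the statistic that t.test() computes by default (Welch's unequal-variance t-test) can be written as follows, where x̄, s² and n denote the sample mean, sample variance and size of each group, and ν is the Welch-Satterthwaite approximation of the degrees of freedom:

```latex
H_0:\ \mu_{OJ} = \mu_{VC} \qquad H_a:\ \mu_{OJ} \neq \mu_{VC}

t = \frac{\bar{x}_{OJ} - \bar{x}_{VC}}{\sqrt{\dfrac{s_{OJ}^2}{n_{OJ}} + \dfrac{s_{VC}^2}{n_{VC}}}}
\qquad
\nu = \frac{\left(\dfrac{s_{OJ}^2}{n_{OJ}} + \dfrac{s_{VC}^2}{n_{VC}}\right)^2}
           {\dfrac{\left(s_{OJ}^2 / n_{OJ}\right)^2}{n_{OJ} - 1} + \dfrac{\left(s_{VC}^2 / n_{VC}\right)^2}{n_{VC} - 1}}

p = 2\,P\left(T_{\nu} \geq |t|\right)
```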
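We can either do the calculation by hand or… simply run the R command t.test() seen in class ;) By default it runs a two-sided Welch two-sample t-test, which does not assume equal variances in the two groups.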
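For the curious, here is a minimal sketch of the "manual" route: it computes the Welch statistic, degrees of freedom, two-sided p-value and 95% confidence interval directly from the group means and variances. The variable names are illustrative; the formulas are the standard Welch ones, so the results should agree with the t.test() output that follows.

```r
data("ToothGrowth")

x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]  # orange juice group
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]  # ascorbic acid group

nx <- length(x)
ny <- length(y)
vx <- var(x) / nx   # squared standard error of mean(x)
vy <- var(y) / ny   # squared standard error of mean(y)

se     <- sqrt(vx + vy)                # standard error of the difference
t_stat <- (mean(x) - mean(y)) / se

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se

c(t = t_stat, df = df, p = p_value)
ci
```

The printed t and df should match the t = 1.9153 and df = 55.309 reported by t.test() below.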

t.test(ToothGrowth$len[ToothGrowth$supp=='OJ'], ToothGrowth$len[ToothGrowth$supp=='VC'])
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$len[ToothGrowth$supp == "OJ"] and ToothGrowth$len[ToothGrowth$supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

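The p-value (0.0606) is greater than 0.05, and the 95% confidence interval for the difference in means (-0.17 to 7.57) contains 0, so at the 5% level we cannot reject the null hypothesis that the delivery method has no impact on growth. Failing to reject H0 does not prove the absence of an effect, especially with a p-value this close to the threshold.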

Hypothesis test on dose administered

As for the delivery method, we define our two hypotheses H0: “the dose has no impact on growth” and Ha: “growth is impacted by the dose”.
The difference from the previous test is that, with three dose levels, we have to compare all three possible pairs (0.5 vs 1, 0.5 vs 2 and 1 vs 2).
This time we only extract the p-value from each t-test:

t.test(ToothGrowth$len[ToothGrowth$dose==.5], ToothGrowth$len[ToothGrowth$dose==1])$p.value
## [1] 1.268301e-07
t.test(ToothGrowth$len[ToothGrowth$dose==.5], ToothGrowth$len[ToothGrowth$dose==2])$p.value
## [1] 4.397525e-14
t.test(ToothGrowth$len[ToothGrowth$dose==1],  ToothGrowth$len[ToothGrowth$dose==2])$p.value
## [1] 1.90643e-05
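The three calls above differ only in the pair of doses being compared. A minimal sketch that loops over every pair with base R's combn() instead of writing each call by hand; the names doses, dose_pairs and p_values are illustrative.

```r
data("ToothGrowth")

doses      <- sort(unique(ToothGrowth$dose))  # 0.5, 1, 2
dose_pairs <- combn(doses, 2)                 # 2 x 3 matrix, one pair per column

# Welch t-test (t.test default) for each pair of doses, keeping only the p-value
p_values <- apply(dose_pairs, 2, function(d) {
  t.test(ToothGrowth$len[ToothGrowth$dose == d[1]],
         ToothGrowth$len[ToothGrowth$dose == d[2]])$p.value
})
names(p_values) <- apply(dose_pairs, 2, paste, collapse = " vs ")
p_values
```

This scales to any number of dose levels without adding lines of code.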

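All three p-values are far below 0.05, so we reject the null hypothesis H0 and accept Ha: growth is indeed impacted by the dose. Since three tests were run, a stricter threshold is prudent; even with a Bonferroni correction (0.05 / 3 ≈ 0.0167), all three p-values remain significant.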
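Running several tests at the 5% level inflates the chance of at least one false positive. A hedged sketch of the multiple-comparison check using base R's pairwise.t.test(): pool.sd = FALSE makes it run a separate Welch t-test for each pair (as in the calls above), and p.adjust.method = "bonferroni" is a choice made here for illustration; other corrections such as "holm" are also available.

```r
data("ToothGrowth")

# Welch t-tests on every pair of doses, with Bonferroni-adjusted p-values
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      pool.sd = FALSE, p.adjust.method = "bonferroni")
pw

# TRUE if every adjusted p-value is still below the 5% level
# (the upper triangle of the matrix is NA, hence na.rm = TRUE)
all(pw$p.value < 0.05, na.rm = TRUE)
```

With Bonferroni, each adjusted p-value is simply the raw p-value multiplied by the number of comparisons (capped at 1).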

Conclusions

We can conclude that, at the 5% level, the delivery method does not have a statistically significant impact on tooth growth (p ≈ 0.06);
on the contrary, the dose of vitamin C administered has a clear impact, with longer odontoblasts at higher doses.

Assumptions

Samples from the dataset are independent and identically distributed (iid)

  • Samples (guinea pigs) are representative of their population and were randomly selected
  • Delivery method and dosage were randomly assigned to subjects