PART 2: Tooth Growth Data Analysis

The Effect Of Vitamin C On Tooth Growth In Guinea Pigs

Overview

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

We’re going to analyze the ToothGrowth data in the R datasets package.

1- Load the ToothGrowth data and perform some basic exploratory data analyses.
2- Provide a basic summary of the data.
3- Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
4- State your conclusions and the assumptions needed for your conclusions.

library(data.table)
library(ggplot2)
knitr::opts_chunk$set(echo = TRUE)  # display the code in every chunk

Basic EDA and summary

1- Load the ToothGrowth data and perform some basic exploratory data analyses.
2- Provide a basic summary of the data.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
data(ToothGrowth)
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

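We have a data frame of 60 observations and 3 variables:
- len: tooth length, numeric
- supp: supplement type, a factor with levels OJ and VC
- dose: numeric, in mg/day

Each of the six supp × dose combinations contains 10 guinea pigs.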

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
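The summary() output above describes each column separately. A per-group view makes the later comparisons easier to read: the mean, standard deviation and number of animals in each supp × dose cell. The following is a minimal sketch using base R's tapply() and table(). It is not part of the original analysis, and the variable names are illustrative:

```r
data(ToothGrowth)

# Cell means and standard deviations of tooth length:
# rows are the supplements (OJ, VC), columns are the doses (0.5, 1, 2 mg/day)
cell_means <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
cell_sds   <- with(ToothGrowth, tapply(len, list(supp, dose), sd))

# Number of guinea pigs in each supp x dose cell
cell_n <- with(ToothGrowth, table(supp, dose))

round(cell_means, 2)
round(cell_sds, 2)
cell_n
```

Reading the means row by row, OJ is ahead of VC by about 5 units at 0.5 and 1 mg/day (13.23 vs 7.98 and 22.70 vs 16.77). At 2 mg/day the two are almost equal (26.06 vs 26.14).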

Tooth growth by supplement, for each dose plot

As part of EDA

ggplot(ToothGrowth) +
        geom_boxplot(mapping = aes(supp, len, fill = supp)) +
        facet_grid(~ dose)

This plot shows that the median tooth length under the OJ supplement is higher than under the VC supplement at doses 0.5 and 1. At dose 2, the two supplements look similar.

Tooth growth by dose, for each supplement plot

As part of EDA

ToothGrowth$dose <- as.factor(ToothGrowth$dose)  # treat dose as categorical (levels "0.5", "1", "2")
ggplot(ToothGrowth) +
        geom_boxplot(mapping = aes(dose, len, fill = dose)) +
        facet_grid(~ supp)

This plot shows that, for both VC and OJ, higher doses of the supplement go with longer teeth: the median tooth length increases with dose.

Length vs dose given type of supplement plot

This is not necessary, but cool!

library(graphics)
coplot(len~dose | supp, data = ToothGrowth, panel = panel.smooth)

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

Since we don’t have any data from before the experiment, there is no control group that received no supplement at all. So we have to formulate our null hypotheses by comparing:
- the effect of the two supplements, OJ and VC
- the effect of the three doses, 0.5, 1 and 2 mg/day

We can also use the graphs from our EDA to guide our null hypotheses. So we formulate two H0 as follows:

Compare tooth growth in two groups by supp

Null Hypothesis (H0): The type of supplement (VC vs OJ) has no effect on tooth growth.

Compare tooth growth in three groups by dose

Null Hypothesis (H0): Increasing the dose level has no effect on tooth growth.

We can test this hypothesis for each supplement separately. In other words:
- Increasing the dose of OJ has no effect on tooth growth.
- Increasing the dose of VC has no effect on tooth growth.
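The hypotheses can be written formally, with μ standing for the unknown population mean tooth length. They are two-sided, which matches the "true difference in means is not equal to 0" alternative that t.test() reports. Because t.test() uses var.equal = FALSE by default, the statistic it reports is Welch's t with the Welch-Satterthwaite degrees of freedom ν. X and Y denote the two samples being compared, with means X̄, Ȳ, sample variances S², and sizes n:

```latex
$$H_0:\ \mu_{OJ} = \mu_{VC} \qquad H_a:\ \mu_{OJ} \neq \mu_{VC}$$

$$H_0:\ \mu_{s,d_1} = \mu_{s,d_2} \qquad H_a:\ \mu_{s,d_1} \neq \mu_{s,d_2},
\qquad s \in \{OJ, VC\},\ d_1 < d_2,\ d_1, d_2 \in \{0.5, 1, 2\}$$

$$t = \frac{\bar{X} - \bar{Y}}{\sqrt{S_X^2/n_X + S_Y^2/n_Y}}, \qquad
\nu = \frac{\left(S_X^2/n_X + S_Y^2/n_Y\right)^2}
           {\dfrac{(S_X^2/n_X)^2}{n_X - 1} + \dfrac{(S_Y^2/n_Y)^2}{n_Y - 1}}$$

$$95\%\ \text{CI for } \mu_X - \mu_Y:\quad
(\bar{X} - \bar{Y}) \pm t_{\nu,\,0.975}\sqrt{S_X^2/n_X + S_Y^2/n_Y}$$
```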

Compare tooth growth by supp

Null Hypothesis (H0): The type of supplement (VC vs OJ) has no effect on tooth growth.

len <- ToothGrowth$len
dose <- ToothGrowth$dose
supp <- ToothGrowth$supp
t.test(len[supp == "OJ"], len[supp == "VC"], paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len[supp == "OJ"] and len[supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

The 95% confidence interval (-0.17, 7.57) contains 0, so at the 5% level we cannot tell whether there is a treatment (supp) effect. This does not mean that there is no treatment effect.

The p-value (0.06063) is larger than 0.05, so the difference is NOT statistically significant, and the data provide only weak evidence against the null hypothesis. As a result, we fail to reject the null hypothesis: there is no significant evidence that the type of supplement has an effect on tooth growth. However, since the p-value is very close to 0.05, we might seek further investigation.

Note also that this test pools all three doses together, which can hide differences that exist at individual dose levels.
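To see where the numbers in the Welch output come from, we can recompute the statistic, degrees of freedom, p-value and confidence interval by hand. This is a minimal sketch in base R with illustrative variable names. It should agree with t.test() up to floating-point error:

```r
data(ToothGrowth)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Variance of each sample mean; their sum is the squared standard error of the difference
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se   <- sqrt(v_oj + v_vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean(oj) - mean(vc)) / se
df_welch <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC)
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci_95   <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se

c(t = t_stat, df = df_welch, p_value = p_value)
ci_95
```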

Just out of curiosity

Just out of curiosity, I’d like to see whether OJ and VC have a different effect on tooth growth at dose level 1. Why? Because from the EDA graphs it looks like there is such an effect. But it could also be that my vision is not very accurate :-)

t.test(len[dose == 1 & supp == "OJ"], len[dose == 1 & supp == "VC"], paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len[dose == 1 & supp == "OJ"] and len[dose == 1 & supp == "VC"]
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean of x mean of y 
##     22.70     16.77

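What the heck? The p-value (0.001038) is less than 0.05, and the 95% confidence interval (2.80, 9.06) does not contain 0. So we reject the null hypothesis: at dose level 1, OJ gives longer teeth than VC (22.70 vs 16.77 on average). That was MY hypothesis; how dare this p-value!!!!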
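We can repeat the same OJ vs VC comparison at every dose level rather than only at dose 1. This is a minimal sketch, not part of the original analysis, that runs one Welch t-test per dose using t.test()'s formula interface. The names supp_by_dose, diff, ci_low, ci_high and p_value are illustrative labels only:

```r
data(ToothGrowth)

# OJ vs VC Welch t-test at each dose level separately
supp_by_dose <- t(sapply(c(0.5, 1, 2), function(d) {
  res <- t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])
  c(dose    = d,
    diff    = unname(res$estimate[1] - res$estimate[2]),  # mean(OJ) - mean(VC)
    ci_low  = res$conf.int[1],
    ci_high = res$conf.int[2],
    p_value = res$p.value)
}))

supp_by_dose
```

The diff column shows an OJ advantage of 5.25 units at 0.5 mg/day and 5.93 at 1 mg/day. That advantage essentially disappears at 2 mg/day (-0.08).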

Compare tooth growth by dose

Null Hypothesis (H0): Increasing the dose level has no effect on tooth growth.

In this section we formulate our hypotheses for each supp separately. That is, we run a hypothesis test to:
- Compare dose levels 0.5 and 1 for OJ
- Compare dose levels 1 and 2 for OJ
- Compare dose levels 0.5 and 2 for OJ
- Compare dose levels 0.5 and 1 for VC
- Compare dose levels 1 and 2 for VC
- Compare dose levels 0.5 and 2 for VC
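The six tests listed above can also be run in one pass by looping over the dose pairs within each supplement. This is a minimal sketch in base R; the names dose_pairs and dose_tests are illustrative, not from the original. It runs the same Welch t-tests as the individual t.test() calls, with the lower dose as the first sample, so every interval estimates (lower dose) - (higher dose):

```r
data(ToothGrowth)

# Every pair of dose levels: columns are (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(c(0.5, 1, 2), 2)

# One Welch t-test per supplement and dose pair (lower dose as the first sample)
dose_tests <- do.call(rbind, lapply(c("OJ", "VC"), function(s) {
  do.call(rbind, lapply(seq_len(ncol(dose_pairs)), function(j) {
    d1 <- dose_pairs[1, j]
    d2 <- dose_pairs[2, j]
    x  <- ToothGrowth$len[ToothGrowth$supp == s & ToothGrowth$dose == d1]
    y  <- ToothGrowth$len[ToothGrowth$supp == s & ToothGrowth$dose == d2]
    res <- t.test(x, y, paired = FALSE)
    data.frame(supp = s, from = d1, to = d2,
               ci_low = res$conf.int[1], ci_high = res$conf.int[2],
               p_value = res$p.value)
  }))
}))

dose_tests
```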

t.test(len[dose == 0.5 & supp == "OJ"], len[dose == 1 & supp == "OJ"], paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len[dose == 0.5 & supp == "OJ"] and len[dose == 1 & supp == "OJ"]
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.415634  -5.524366
## sample estimates:
## mean of x mean of y 
##     13.23     22.70

The 95% confidence interval (-13.42, -5.52) does not contain zero, and the p-value (8.785e-05) is far below 0.05, so we reject the null hypothesis. In other words, increasing the OJ dose from 0.5 to 1 has an effect on tooth growth: the mean length rises from 13.23 to 22.70.

t.test(len[dose == 1 & supp == "OJ"], len[dose == 2 & supp == "OJ"], paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len[dose == 1 & supp == "OJ"] and len[dose == 2 & supp == "OJ"]
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.5314425 -0.1885575
## sample estimates:
## mean of x mean of y 
##     22.70     26.06

The p-value (0.0392) is smaller than 0.05, so we reject H0: as we increase the dose of OJ from 1 to 2, we see an effect on tooth growth. However, the p-value is close to the cutoff and the upper limit of the confidence interval (-0.19) is close to 0, so we might want to investigate further.

The EDA graphs also show that, for OJ, the jump in tooth length from 0.5 to 1 is larger than the jump from 1 to 2 (a mean increase of 9.47 vs 3.36). Since both steps go up, the total change from 0.5 to 2 should be even larger, so we expect it to be significant. But we will do the hypothesis test anyway.

t.test(len[dose == 0.5 & supp == "OJ"], len[dose == 2 & supp == "OJ"], paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len[dose == 0.5 & supp == "OJ"] and len[dose == 2 & supp == "OJ"]
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.335241  -9.324759
## sample estimates:
## mean of x mean of y 
##     13.23     26.06

As expected, we reject the null hypothesis, because the p-value (1.324e-06) is less than 0.05. It is getting boring!

t.test(len[dose == 0.5 & supp == "VC"], len[dose == 1 & supp == "VC"], paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len[dose == 0.5 & supp == "VC"] and len[dose == 1 & supp == "VC"]
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.265712  -6.314288
## sample estimates:
## mean of x mean of y 
##      7.98     16.77

The p-value (6.811e-07) is less than 0.05, hence we reject the null hypothesis. That is, increasing the dose of VC from 0.5 to 1 has an effect on tooth growth.

t.test(len[dose == 1 & supp == "VC"], len[dose == 2 & supp == "VC"], paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len[dose == 1 & supp == "VC"] and len[dose == 2 & supp == "VC"]
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.054267  -5.685733
## sample estimates:
## mean of x mean of y 
##     16.77     26.14

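Another rejection! The p-value (9.156e-05) is far below 0.05, so increasing the dose of VC from 1 to 2 also has an effect on tooth growth.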

t.test(len[dose == 0.5 & supp == "VC"], len[dose == 2 & supp == "VC"], paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len[dose == 0.5 & supp == "VC"] and len[dose == 2 & supp == "VC"]
## t = -10.388, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -21.90151 -14.41849
## sample estimates:
## mean of x mean of y 
##      7.98     26.14

And yet another rejection (p-value = 4.682e-08)!

After all these rejections, we can conclude that increasing the dose has an effect on tooth growth, for both OJ and VC.

Conclusion

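Boring conclusions: assuming that
- this sample is a true representative of the population, and the guinea pigs were randomly assigned to the six supp × dose groups,
- the observations are independent (each animal appears in only one group, so the tests are unpaired), and
- the sample means are approximately normally distributed (with only 10 animals per group, this relies on tooth length itself being roughly normal, not only on the CLT), with variances allowed to differ between groups (Welch's t-test),

from all the t-tests we can say:
1- There is no significant evidence that the type of supplement (OJ vs VC) has an effect on tooth growth when all doses are pooled. However, since the p-value is very close to 0.05, we might seek further investigation.
2- The dose level of the supplement (for both OJ and VC) has an effect on tooth growth.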

Interesting conclusions: let’s take another look at the p-values. Here they are sorted from smallest to largest, each with the comparison it comes from and with the 0.05 cutoff in its place:

- 4.682e-08: dose 0.5 vs 2, VC
- 6.811e-07: dose 0.5 vs 1, VC
- 1.324e-06: dose 0.5 vs 2, OJ
- 8.785e-05: dose 0.5 vs 1, OJ
- 9.156e-05: dose 1 vs 2, VC
- 0.001038: OJ vs VC at dose 1
- 0.0392: dose 1 vs 2, OJ
- 0.05: significance cutoff
- 0.06063: OJ vs VC, all doses pooled
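The sorted p-values above are all compared against the same 0.05 cutoff, with no correction for running eight tests. One common refinement is Holm's adjustment, available in base R as p.adjust(..., method = "holm"). This is a minimal sketch, not part of the original analysis, with the p-values typed in from the t-test outputs above:

```r
# p-values copied from the eight t-tests above, smallest first
p_values <- c(
  "VC 0.5 vs 2"        = 4.682e-08,
  "VC 0.5 vs 1"        = 6.811e-07,
  "OJ 0.5 vs 2"        = 1.324e-06,
  "OJ 0.5 vs 1"        = 8.785e-05,
  "VC 1 vs 2"          = 9.156e-05,
  "OJ vs VC at dose 1" = 0.001038,
  "OJ 1 vs 2"          = 0.0392,
  "OJ vs VC (pooled)"  = 0.06063
)

# Holm's step-down adjustment controls the family-wise error rate across all eight tests
p_holm <- p.adjust(p_values, method = "holm")

# Which comparisons stay significant at the 0.05 level after adjustment?
data.frame(raw = p_values, holm = signif(p_holm, 3), significant = p_holm < 0.05)
```

With these numbers, the Holm-adjusted p-value for the OJ 1 vs 2 comparison is 0.0784, so that comparison would no longer be significant at 0.05. The other six comparisons with raw p-values below 0.05 remain significant.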

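Considering the above results, what would you give your guinea pig? I would go with the 2 mg/day dose of OJ. At 2 mg/day, though, the OJ and VC means are nearly identical (26.06 vs 26.14), so at that dose the choice of supplement matters much less than the dose itself.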