The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).
We’re going to analyze the ToothGrowth data in the R datasets package.
1- Load the ToothGrowth data and perform some basic exploratory data analyses.
2- Provide a basic summary of the data.
3- Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)
4- State your conclusions and the assumptions needed for your conclusions.
library(data.table)
library(ggplot2)
echo = TRUE # display the code
results = 'asis' # display the output without formatting
1- Load the ToothGrowth data and perform some basic exploratory data analyses 2- Provide a basic summary of the data.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
data(ToothGrowth)
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
We have a data frame of 60 observations and 3 variables.
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
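Since the comparisons later on are all between supplement/dose groups, a per-group summary is a useful complement to `summary()` and `head()`. The following is a minimal sketch, not part of the original analysis: the names `tg`, `cell_means`, and `cell_counts` are my own, and the data is read from `datasets::ToothGrowth` so the working copy of the data frame is left alone.

```r
# Fresh copy of the data, independent of any changes made to ToothGrowth
tg <- datasets::ToothGrowth

# Mean tooth length in each supplement/dose group
cell_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)

# Number of animals in each group (same grouping, so rows line up)
cell_counts <- aggregate(len ~ supp + dose, data = tg, FUN = length)
cell_means$n <- cell_counts$len

print(cell_means)
```

The table should have six rows, one per combination of supplement and dose.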
As part of EDA
ggplot(ToothGrowth) +
geom_boxplot(mapping = aes(supp, len, fill = supp)) +
facet_grid(~ dose)
This plot shows that the median tooth growth under the OJ supplement is higher than under the VC supplement at doses 0.5 and 1. At dose 2 the two supplements look about the same.
As part of EDA
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
ggplot(ToothGrowth) +
geom_boxplot(mapping = aes(dose, len, fill = dose)) +
facet_grid(~ supp)
This plot shows that higher doses increase the median tooth growth for both VC and OJ.
This is not necessary, but cool!
library(graphics)
coplot(len~dose | supp, data = ToothGrowth, panel = panel.smooth)
Since we don’t have any data from before the experiment (the experiment being: using the supplements at different doses as opposed to not using any supplement), we have to formulate our null hypotheses by comparing:
- the effect of the two supplements, OJ and VC
- the effect of the three doses, 0.5, 1, and 2
We can also use the graphs from our EDA to formulate our null hypotheses. So we formulate two H0 as follows:
- H0 (supplement): the type of supplement (OJ vs VC) has no effect on tooth growth.
- H0 (dose): increasing the dose has no effect on tooth growth.
We can test the dose hypothesis for each supplement. In other words:
- Increasing the dose of OJ has no effect on tooth growth.
- Increasing the dose of VC has no effect on tooth growth.
len <- ToothGrowth$len
dose <- ToothGrowth$dose
supp <- ToothGrowth$supp
t.test(len[supp == "OJ"], len[supp == "VC"], paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len[supp == "OJ"] and len[supp == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
The confidence interval contains 0, so it is uncertain whether there is a treatment (supp) effect. This doesn’t mean that there is no treatment effect. The p-value (0.06063) is bigger than 0.05, so the difference is NOT statistically significant at the 5% level and the evidence against the null hypothesis is weak. As a result we fail to reject the null hypothesis: there is no significant evidence that the type of supplement has an effect on tooth growth. However, since the p-value is very close to 0.05, we might seek further investigation.
Just out of curiosity, I’d like to see whether there is an effect on tooth growth from dose level 1 of OJ vs dose level 1 of VC. Why? Because from the EDA graphs it looks like there is such an effect. But it could also be that my vision is not very accurate :-)
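To see where the interval and p-value above come from, here is a minimal sketch that rebuilds the Welch statistics by hand. The variable names (`oj`, `vc`, `welch_df`, and so on) are my own, not part of the original analysis, and the data is read from `datasets::ToothGrowth` so nothing defined earlier is overwritten.

```r
# Fresh copy of the data
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Difference in sample means (OJ minus VC)
diff_means <- mean(oj) - mean(vc)

# Each group's contribution to the variance of the difference
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)
se <- sqrt(v_oj + v_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Test statistic, two-sided p-value, and 95% confidence interval
t_stat <- diff_means / se
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se

print(c(t = t_stat, df = welch_df, p = p_value))
print(ci)
```

The values of `t_stat`, `welch_df`, `p_value`, and `ci` should agree with the `t.test` output above, which is a quick way to check that the interval crossing 0 and the p-value above 0.05 are two views of the same result.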
t.test(len[dose == 1 & supp == "OJ"], len[dose == 1 & supp == "VC"], paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == 1 & supp == "OJ"] and len[dose == 1 & supp == "VC"]
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean of x mean of y
## 22.70 16.77
What the heck? The p-value (0.001038) is less than 0.05, which means we reject the null hypothesis: at dose level 1, tooth growth under OJ differs from tooth growth under VC, and the confidence interval (2.80 to 9.06) lies entirely above 0. That was MY hypothesis, how dare this p-value!!!!
We are going to formulate our hypotheses in this section for each supp separately. Meaning, we are going to run the hypothesis test to:
- Compare dose levels 0.5 and 1 for OJ
- Compare dose levels 1 and 2 for OJ
- Compare dose levels 0.5 and 2 for OJ
- Compare dose levels 0.5 and 1 for VC
- Compare dose levels 1 and 2 for VC
- Compare dose levels 0.5 and 2 for VC
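The same OJ vs VC comparison can be repeated at every dose level, not just dose 1. This is only a sketch of how that could be done and goes slightly beyond the original analysis: `doses` and `supp_by_dose` are names I made up, and the data comes from `datasets::ToothGrowth` so the dose column is still numeric here.

```r
# Fresh copy of the data (dose is numeric in this copy)
tg <- datasets::ToothGrowth
doses <- c(0.5, 1, 2)

# Welch t-test of OJ vs VC within each dose level; keep only the p-value
supp_by_dose <- sapply(doses, function(d) {
  oj <- tg$len[tg$dose == d & tg$supp == "OJ"]
  vc <- tg$len[tg$dose == d & tg$supp == "VC"]
  t.test(oj, vc, paired = FALSE)$p.value
})
names(supp_by_dose) <- as.character(doses)

print(signif(supp_by_dose, 4))
```

The entry named `"1"` should reproduce the p-value from the test above. The other two entries show whether the supplement effect seen in the EDA boxplots holds at doses 0.5 and 2.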
t.test(len[dose == 0.5 & supp == "OJ"], len[dose == 1 & supp == "OJ"], paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == 0.5 & supp == "OJ"] and len[dose == 1 & supp == "OJ"]
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.415634 -5.524366
## sample estimates:
## mean of x mean of y
## 13.23 22.70
The confidence interval does not contain zero. The p-value (8.785e-05) is less than 0.05 and is very small (close to zero), which means we reject the null hypothesis. In other words, increasing the dose of OJ from 0.5 to 1 has an effect on tooth growth.
t.test(len[dose == 1 & supp == "OJ"], len[dose == 2 & supp == "OJ"], paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == 1 & supp == "OJ"] and len[dose == 2 & supp == "OJ"]
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.5314425 -0.1885575
## sample estimates:
## mean of x mean of y
## 22.70 26.06
The p-value is 0.0392, which is smaller than 0.05, so we reject H0: as we increase the dose of OJ from 1 to 2 we see an effect on tooth growth. However, since the p-value is close to the cutoff and the upper bound of the confidence interval is close to 0, we might want to investigate further. Looking at the EDA graphs, we can also see that the jump in growth for supp OJ from 0.5 to 1 is larger than the jump from 1 to 2, so we expect the difference between 0.5 and 2 to be significant as well. But we will do the hypothesis test anyway.
t.test(len[dose == 0.5 & supp == "OJ"], len[dose == 2 & supp == "OJ"], paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == 0.5 & supp == "OJ"] and len[dose == 2 & supp == "OJ"]
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -16.335241 -9.324759
## sample estimates:
## mean of x mean of y
## 13.23 26.06
As expected, we reject the null hypothesis, because the p-value (1.324e-06) is less than 0.05. It is getting boring!
t.test(len[dose == 0.5 & supp == "VC"], len[dose == 1 & supp == "VC"], paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == 0.5 & supp == "VC"] and len[dose == 1 & supp == "VC"]
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.265712 -6.314288
## sample estimates:
## mean of x mean of y
## 7.98 16.77
The p-value (6.811e-07) is less than 0.05, hence we reject the null hypothesis. Meaning, increasing the dose of VC from 0.5 to 1 has an effect on tooth growth.
t.test(len[dose == 1 & supp == "VC"], len[dose == 2 & supp == "VC"], paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == 1 & supp == "VC"] and len[dose == 2 & supp == "VC"]
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.054267 -5.685733
## sample estimates:
## mean of x mean of y
## 16.77 26.14
Another rejection!
t.test(len[dose == 0.5 & supp == "VC"], len[dose == 2 & supp == "VC"], paired = FALSE)
##
## Welch Two Sample t-test
##
## data: len[dose == 0.5 & supp == "VC"] and len[dose == 2 & supp == "VC"]
## t = -10.388, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -21.90151 -14.41849
## sample estimates:
## mean of x mean of y
## 7.98 26.14
And yet another rejection!
After all these rejections, we can announce that increasing the dose has an effect on the tooth growth.
Boring conclusions: assuming this sample is representative of the population, the groups are independent (the tests are unpaired), and the sample means are approximately normally distributed (per the CLT), from all the t-tests we can say:
1- There is no significant evidence that the type of supplement (OJ vs VC) has an effect on tooth growth. However, since the p-value is very close to 0.05, we might seek further investigation.
2- The dosage level of the supplement (for both OJ and VC) has an effect on tooth growth.
Interesting conclusions: Let’s take another look at the pvalues:
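VC vs OJ : p-value = 0.06063
dose 1 of VC vs dose 1 of OJ : p-value = 0.001038
0.5 to 1 OJ : p-value = 8.785e-05
1 to 2 OJ : p-value = 0.0392
0.5 to 2 OJ : p-value = 1.324e-06
0.5 to 1 VC : p-value = 6.811e-07
1 to 2 VC : p-value = 9.156e-05
0.5 to 2 VC : p-value = 4.682e-08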
and sort them:
4.682e-08 < 6.811e-07 < 1.324e-06 < 8.785e-05 < 9.156e-05 < 0.001038 < 0.0392 < 0.05 < 0.06063
or maybe sorting them this way would be more usable:
0.5 to 2 VC < 0.5 to 1 VC < 0.5 to 2 OJ < 0.5 to 1 OJ < 1 to 2 VC < dose 1 of VC vs dose 1 of OJ < 1 to 2 OJ < 0.05 (cutoff) < VC vs OJ
What would you give your guinea pig, considering the above results? I would go with dose 2 of OJ.
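Rather than copying the p-values by hand, the ordering can be rebuilt in code. This is a minimal sketch under my own naming: `welch_p` and `len_at` are hypothetical helpers introduced only for this example, and the labels are chosen to match the ones used in the ordering above.

```r
# Fresh copy of the data (dose is numeric in this copy)
tg <- datasets::ToothGrowth

# Hypothetical helper: p-value of an unpaired Welch t-test
welch_p <- function(x, y) t.test(x, y, paired = FALSE)$p.value

# Hypothetical helper: tooth lengths for one supplement at one dose
len_at <- function(s, d) tg$len[tg$supp == s & tg$dose == d]

p_values <- c(
  "VC vs OJ" = welch_p(tg$len[tg$supp == "OJ"], tg$len[tg$supp == "VC"]),
  "dose 1 of VC vs dose 1 of OJ" = welch_p(len_at("OJ", 1), len_at("VC", 1)),
  "0.5 to 1 OJ" = welch_p(len_at("OJ", 0.5), len_at("OJ", 1)),
  "1 to 2 OJ"   = welch_p(len_at("OJ", 1),   len_at("OJ", 2)),
  "0.5 to 2 OJ" = welch_p(len_at("OJ", 0.5), len_at("OJ", 2)),
  "0.5 to 1 VC" = welch_p(len_at("VC", 0.5), len_at("VC", 1)),
  "1 to 2 VC"   = welch_p(len_at("VC", 1),   len_at("VC", 2)),
  "0.5 to 2 VC" = welch_p(len_at("VC", 0.5), len_at("VC", 2))
)

# Smallest p-value first
sorted_p <- sort(p_values)
print(signif(sorted_p, 4))

# Comparisons that fall below the 0.05 cutoff
significant <- names(sorted_p)[sorted_p < 0.05]
print(significant)
```

Keeping the labels attached to the values means the two orderings above (by number and by comparison) come out of a single `sort()` call.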