The aim of this project is to analyse the ToothGrowth data:
-provide a basic summary of the data,
-use hypothesis tests to compare tooth growth by supplement type (supp: orange juice (OJ) or ascorbic acid (VC)) and by dose (0.5, 1 or 2 mg).
First, the ToothGrowth data are loaded and an exploratory boxplot of tooth length by dose and supplement is drawn:
data(ToothGrowth)
# Treat dose as a categorical variable with levels "0.5", "1" and "2"
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
library(ggplot2)
ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_boxplot() +
  facet_wrap(~ supp) +
  xlab("Dose (mg)") +
  ylab("Tooth length") +
  theme_minimal()
Next, the mean, standard deviation and variance of tooth length are calculated for each supplement:
library(plyr)
ddply(ToothGrowth,.(supp),summarise,Mean=mean(len),Sd=sd(len), Var=var(len))
## supp Mean Sd Var
## 1 OJ 20.66 6.606 43.63
## 2 VC 16.96 8.266 68.33
Means, standard deviations and variances for each dose level are calculated:
ddply(ToothGrowth,.(dose),summarise,Mean=mean(len),Sd=sd(len), Var=var(len))
## dose Mean Sd Var
## 1 0.5 10.61 4.500 20.25
## 2 1 19.73 4.415 19.50
## 3 2 26.10 3.774 14.24
Means, standard deviations and variances for each dose level in each supplement are calculated:
ddply(ToothGrowth,.(supp, dose),summarise,Mean=mean(len),Sd=sd(len), Var=var(len))
## supp dose Mean Sd Var
## 1 OJ 0.5 13.23 4.460 19.889
## 2 OJ 1 22.70 3.911 15.296
## 3 OJ 2 26.06 2.655 7.049
## 4 VC 0.5 7.98 2.747 7.544
## 5 VC 1 16.77 2.515 6.327
## 6 VC 2 26.14 4.798 23.018
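The three summaries above can also be produced without the plyr dependency. A minimal sketch in base R using `aggregate()`; the helper name `stats` is hypothetical and not part of the original analysis. It computes the supplement-by-dose table; dropping `supp` or `dose` from the formula gives the other two.

```r
# Per-group mean, standard deviation and variance of tooth length in base R.
# ToothGrowth ships with R's datasets package.
data(ToothGrowth)
stats <- function(x) c(Mean = mean(x), Sd = sd(x), Var = var(x))

# aggregate() returns the three statistics as a single matrix column ...
by_supp_dose <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = stats)
# ... so flatten it into ordinary columns len.Mean, len.Sd and len.Var
by_supp_dose <- do.call(data.frame, by_supp_dose)
print(by_supp_dose)
```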
The boxplot and the group means suggest that mean tooth length in guinea pigs increases with the vitamin C dose. There also appears to be a difference between the delivery methods: at the 0.5 mg and 1 mg doses, mean length is higher with orange juice than with ascorbic acid.
To check whether these differences are statistically significant, 10 t-tests are conducted (the code is in the appendix). For each comparison, the null hypothesis is that mean tooth length does not differ between the two delivery methods or dose levels being compared; the alternative hypothesis is that it does. Because multiple comparisons are made, the p-values are adjusted with the Benjamini & Yekutieli (BY) procedure, which controls the false discovery rate (FDR). BY is used because the t-tests are not independent of each other (several comparisons share the same observations): the Benjamini & Hochberg procedure assumes independent or positively dependent tests, whereas BY remains valid under any form of dependence. Bonferroni is also valid under dependence, but it controls the family-wise error rate rather than the FDR. The test results are as follows:
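The BY adjustment applied by `p.adjust(method = "BY")` can be written out by hand. A minimal sketch, assuming the standard step-up form: with the m p-values sorted in ascending order and c(m) equal to the sum of 1/k for k = 1..m, the adjusted value of the i-th smallest p-value is the minimum over j >= i of min(1, m * c(m) * p_(j) / j). The function name `by_adjust` and the example p-values are hypothetical; the check compares the result with `p.adjust`.

```r
# Benjamini & Yekutieli adjustment written out by hand (hypothetical helper).
# For sorted p-values p_(1) <= ... <= p_(m) and c(m) = sum(1/k, k = 1..m),
# adjusted p_(i) = min over j >= i of min(1, m * c(m) * p_(j) / j).
by_adjust <- function(p) {
  m <- length(p)
  cm <- sum(1 / seq_len(m))          # c(m): correction for arbitrary dependence
  o <- order(p, decreasing = TRUE)   # indices from the largest p-value down
  ranks <- m:1                       # ascending rank j of each sorted p-value
  # running minimum from the largest p-value down enforces "min over j >= i"
  adj <- pmin(1, cummin(m * cm * p[o] / ranks))
  out <- numeric(m)
  out[o] <- adj                      # put values back in the input order
  out
}

# Illustrative (made-up) p-values, not the ones from this analysis
p <- c(0.04, 0.001, 0.3, 0.015, 0.008)
cbind(raw = p, by_hand = by_adjust(p), p.adjust = p.adjust(p, method = "BY"))
```

The factor c(m) is what distinguishes BY from Benjamini & Hochberg: removing it from the formula gives the BH adjustment.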
## Groups Pvalues BY
## 1 OJ vs VC 0.003 0.012
## 2 0.5 vs 1 0.000 0.000
## 3 1 vs 2 0.000 0.002
## 4 0.5 OJ vs 1 OJ 0.002 0.012
## 5 1 OJ vs 2 OJ 0.084 0.273
## 6 0.5 VC vs 1 VC 0.000 0.002
## 7 1 VC vs 2 VC 0.000 0.003
## 8 0.5 OJ vs 0.5 VC 0.015 0.057
## 9 1 OJ vs 1 VC 0.008 0.034
## 10 2 OJ vs 2 VC 0.967 1.000
After the BY correction, only tests 5, 8 and 10 are not significant at the 5% level. Test 8 was significant before the correction (p = 0.015) but not after it (adjusted p = 0.057). The corrected p-values are used to interpret the results.
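As a worked check of why test 8 loses significance, the BY formula can be applied to the rounded values in the table. Test 8 has the 8th smallest of the m = 10 raw p-values, and the two larger ranks (tests 5 and 10) adjust to 0.273 and 1.000, so they do not lower the running minimum:

```latex
c(10) = \sum_{k=1}^{10} \frac{1}{k} \approx 2.929, \qquad
\tilde{p}_{(8)} = \min_{j \ge 8} \min\!\left(1, \frac{m\,c(m)}{j}\, p_{(j)}\right)
\approx \frac{10 \times 2.929}{8} \times 0.015 \approx 0.055 > 0.05
```

The table reports 0.057 rather than 0.055 because the adjustment is computed from the unrounded raw p-value.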
The conclusions from the analysis are as follows:
-Mean tooth length was higher at higher dose levels (26.10 at 2 mg compared to 10.61 at 0.5 mg). The only exception was orange juice between the 1 mg and 2 mg doses: there, mean tooth length was not found to differ significantly (the means were 22.70 and 26.06, respectively).
-Although the overall OJ vs VC comparison was significant, within individual dose levels orange juice gave a greater mean tooth length than ascorbic acid only at the 1 mg dose (22.70 versus 16.77).
The analysis rests on the following assumptions:
-each guinea pig was assigned to one combination of dose and supplement type, and the observations in the two groups of each comparison are treated as dependent samples (paired by their order in the data), so that paired t-tests can be used,
-the sample of 60 guinea pigs was randomly drawn from, and is representative of, the guinea pig population, so conclusions drawn from the sample can be generalized to that population,
-variances are unequal across groups.
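The paired methodology in the first assumption reduces each comparison to a one-sample t-test on the within-pair differences, which is what `t.test(x, y, paired = TRUE)` computes. The unequal-variance assumption corresponds to the Welch form of the two-sample t-test, which `t.test` uses by default for independent samples (`var.equal = FALSE`); for paired tests it plays no role, because only the differences enter the statistic. A minimal sketch of both; the helper names `paired_t_p` and `welch_t_p` are hypothetical, and the checks compare them with `t.test`.

```r
# Paired (dependent-samples) t-test: a one-sample test on d = x - y.
paired_t_p <- function(x, y) {
  d <- x - y
  n <- length(d)
  t_stat <- mean(d) / (sd(d) / sqrt(n))
  2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value
}

# Welch (unequal-variance) t-test for two independent samples.
welch_t_p <- function(x, y) {
  vx <- var(x) / length(x)           # squared standard error of mean(x)
  vy <- var(y) / length(y)           # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  2 * pt(-abs(t_stat), df = df)      # two-sided p-value
}

data(ToothGrowth)
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
c(paired = paired_t_p(oj, vc), welch = welch_t_p(oj, vc))
```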
Appendix: code for the t-tests:
# p-value of a paired t-test between two subsets of tooth length;
# observations are paired by their row order within each subset.
# (Vectors are passed explicitly: the formula method of t.test()
# does not accept paired = TRUE in current versions of R.)
paired_p <- function(x, y) t.test(x, y, paired = TRUE)$p.value
len  <- ToothGrowth$len
dose <- ToothGrowth$dose
supp <- ToothGrowth$supp
pvalues <- c(
  paired_p(len[supp == "OJ"], len[supp == "VC"]),
  paired_p(len[dose == "0.5"], len[dose == "1"]),
  paired_p(len[dose == "1"], len[dose == "2"]),
  paired_p(len[dose == "0.5" & supp == "OJ"], len[dose == "1" & supp == "OJ"]),
  paired_p(len[dose == "1" & supp == "OJ"], len[dose == "2" & supp == "OJ"]),
  paired_p(len[dose == "0.5" & supp == "VC"], len[dose == "1" & supp == "VC"]),
  paired_p(len[dose == "1" & supp == "VC"], len[dose == "2" & supp == "VC"]),
  paired_p(len[dose == "0.5" & supp == "OJ"], len[dose == "0.5" & supp == "VC"]),
  paired_p(len[dose == "1" & supp == "OJ"], len[dose == "1" & supp == "VC"]),
  paired_p(len[dose == "2" & supp == "OJ"], len[dose == "2" & supp == "VC"]))
Groups <- c("OJ vs VC", "0.5 vs 1", "1 vs 2", "0.5 OJ vs 1 OJ", "1 OJ vs 2 OJ",
            "0.5 VC vs 1 VC", "1 VC vs 2 VC", "0.5 OJ vs 0.5 VC",
            "1 OJ vs 1 VC", "2 OJ vs 2 VC")
# Benjamini & Yekutieli adjustment: controls the FDR under arbitrary dependence
BY <- round(p.adjust(pvalues, method = "BY"), 3)
Pvalues <- format(round(pvalues, 3), scientific = FALSE)
Pvalues <- data.frame(Groups, Pvalues, BY)
Pvalues