This report analyzes the ToothGrowth dataset.
The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
Load the necessary packages and the data frame:
# datasets ships with base R, so it only needs to be attached, never installed
library(datasets)
# Install ggplot2 only if it is missing, then attach it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
data("ToothGrowth")
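Before plotting, it can help to confirm that the data match the design described above. The following is a minimal sketch using only base R; the variable name `design` is introduced here for illustration and is not part of the original analysis.

```r
# Assumes the built-in ToothGrowth data frame from the datasets package
data("ToothGrowth")

# Column types: len (numeric), supp (factor with levels OJ and VC), dose (numeric)
str(ToothGrowth)

# Cross-tabulate delivery method against dose to check that the design is balanced
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)
```

Each of the six supplement and dose combinations should contain the same number of animals.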
Let’s look at the distribution of length for each delivery method:
ggplot(ToothGrowth, aes(len, fill = supp)) +
  geom_density(alpha = 0.3, position = "identity") +
  xlab("Length") +
  scale_fill_discrete(name = "Supplement") +
  theme_bw()
And for different doses:
ggplot(ToothGrowth, aes(len, fill = factor(dose))) +
  geom_density(alpha = 0.3, position = "identity") +
  xlab("Length") +
  scale_fill_discrete(name = "Dose") +
  theme_bw()
The distributions look approximately normal, so we can use t-tests to compare the groups.
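Judging normality from density plots is informal. As a supplementary check that the report itself does not run, a Shapiro-Wilk test can be applied to each group. This is only a sketch; the names `p_supp` and `p_dose` are illustrative.

```r
data("ToothGrowth")

# Shapiro-Wilk p-value for each delivery method
# (a small p-value suggests a departure from normality)
p_supp <- tapply(ToothGrowth$len, ToothGrowth$supp,
                 function(x) shapiro.test(x)$p.value)

# Same check for each dose level
p_dose <- tapply(ToothGrowth$len, ToothGrowth$dose,
                 function(x) shapiro.test(x)$p.value)

print(round(p_supp, 3))
print(round(p_dose, 3))
```

With groups of this size the t-test tolerates moderate departures from normality, so these p-values are a sanity check rather than a strict gate.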
ggplot(ToothGrowth, aes(factor(supp), len, fill = supp)) +
  geom_boxplot() +
  ylab("Length") +
  xlab("") +
  scale_fill_discrete(name = "Supplement") +
  theme_bw()
It’s hard to tell from the plot whether the difference is significant. Run a t-test for the difference between orange juice and VC:
# Welch two-sample t-test; unpaired is the default, so `paired` is not needed
t <- t.test(len ~ supp, data = ToothGrowth)$conf.int
95% confidence interval for the difference in means (OJ minus VC): -0.1710156, 7.5710156. The interval contains zero, so we cannot reject the null hypothesis at the 0.05 significance level: with all doses pooled, there is no significant difference between the supplements.
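To make the interval less of a black box, here is a sketch of how it can be reproduced by hand. It assumes `t.test` is used with its defaults (unequal variances, 95% level), which matches the call above; the variable names are illustrative.

```r
data("ToothGrowth")

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)
se     <- sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# 95% interval: mean difference (OJ minus VC) +/- t quantile * standard error
manual_ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se
print(manual_ci)

# Should agree with the interval reported by t.test()
print(t.test(len ~ supp, data = ToothGrowth)$conf.int)
```

The two printed intervals should coincide, which shows that the lower bound falls just below zero because of the width of the standard error rather than a small mean difference.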
Let’s repeat the analysis for the different doses of the supplement.
ggplot(ToothGrowth, aes(factor(dose), len, fill = factor(dose))) +
  geom_boxplot() +
  ylab("Length") +
  xlab("") +
  scale_fill_discrete(name = "Doses") +
  theme_bw()
Here the dependence is clearer: the larger the dose, the greater the length. Run a t-test for the difference between doses 0.5 and 1:
# Welch t-test restricted to the 0.5 and 1 dose groups (unpaired by default)
t <- t.test(len ~ dose,
            data = subset(ToothGrowth, dose %in% c(0.5, 1)))$conf.int
95% confidence interval for the difference in means (dose 0.5 minus dose 1): -11.9837813, -6.2762187.
Between 1 and 2:
# Welch t-test restricted to the 1 and 2 dose groups (unpaired by default)
t <- t.test(len ~ dose,
            data = subset(ToothGrowth, dose %in% c(1, 2)))$conf.int
95% confidence interval for the difference in means (dose 1 minus dose 2): -8.9964805, -3.7335195.
And finally between 0.5 and 2 (the difference is obviously significant, but we should confirm it):
# Welch t-test restricted to the 0.5 and 2 dose groups (unpaired by default)
t <- t.test(len ~ dose,
            data = subset(ToothGrowth, dose %in% c(0.5, 2)))$conf.int
95% confidence interval for the difference in means (dose 0.5 minus dose 2): -18.1561665, -12.8338335.
None of the three intervals contains zero, so in all cases the differences are significant and we can reject the null hypothesis.
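Because three dose comparisons are made on the same data, one may want to confirm that the conclusion survives a multiple-comparison correction. The sketch below is an addition to the report, not part of it: it uses `pairwise.t.test` from the stats package with a Bonferroni adjustment, and `pool.sd = FALSE` so that each pair is compared with a separate-variance test as above.

```r
data("ToothGrowth")

# Welch t-tests for every pair of doses (pool.sd = FALSE keeps separate variances),
# with Bonferroni-adjusted p-values for the three comparisons
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      p.adjust.method = "bonferroni", pool.sd = FALSE)

# Lower-triangular matrix of adjusted p-values; unused cells are NA
print(pw$p.value)
```

If every adjusted p-value stays below 0.05, the dose effect does not depend on ignoring the number of tests.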
ggplot(ToothGrowth, aes(factor(supp), len, fill = factor(dose))) +
  geom_boxplot() +
  ylab("Length") +
  xlab("Supplement") +
  scale_fill_discrete(name = "Doses") +
  theme_bw()
For dose = 0.5:
# Welch t-test of supplement within the 0.5 dose group (unpaired by default)
t <- t.test(len ~ supp, data = subset(ToothGrowth, dose == 0.5))$conf.int
95% confidence interval for the difference in means (OJ minus VC): 1.7190573, 8.7809427.
For dose = 1:
# Welch t-test of supplement within the 1 dose group (unpaired by default)
t <- t.test(len ~ supp, data = subset(ToothGrowth, dose == 1))$conf.int
95% confidence interval for the difference in means (OJ minus VC): 2.8021482, 9.0578518.
For dose = 2:
# Welch t-test of supplement within the 2 dose group (unpaired by default)
t <- t.test(len ~ supp, data = subset(ToothGrowth, dose == 2))$conf.int
95% confidence interval for the difference in means (OJ minus VC): -3.7980705, 3.6380705.
We can conclude that there is no significant difference between the delivery methods at dose = 2, but at the other doses the delivery method matters.
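The three per-dose tests repeat the same call, so they can also be collected into one table. This is a sketch of one way to do it; the name `ci_by_dose` is illustrative.

```r
data("ToothGrowth")

# One Welch t-test of len ~ supp per dose level; keep the 95% interval bounds
ci_by_dose <- do.call(rbind, lapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  ci <- t.test(len ~ supp, data = d)$conf.int
  c(lower = ci[1], upper = ci[2])
}))

# Rows are dose levels, columns are the interval bounds (OJ minus VC)
print(ci_by_dose)
```

A row whose interval spans zero marks a dose at which the two delivery methods cannot be distinguished.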
Dose is the most important factor for odontoblast length. The best result is observed at the 2 mg/day dose, where the delivery method makes no significant difference. At the lower doses (0.5 and 1 mg/day), orange juice gives significantly greater length than ascorbic acid.
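The group means behind these conclusions can be listed directly. This is a small sketch using `aggregate` from base R; the name `cell_means` is illustrative.

```r
data("ToothGrowth")

# Mean length for every combination of delivery method and dose
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

Reading the table by dose shows the means rising with dose for both supplements, and the two supplements converging at the highest dose.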