Statistical Inference Course Project Report 2

Introduction

This report analyzes the ToothGrowth dataset.

Dataset description

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).

Data loading and settings

Load necessary packages and create dataframe:

# datasets ships with base R and cannot be installed from CRAN,
# so it only needs to be attached
library(datasets)

if (!require("ggplot2")){
        install.packages("ggplot2")
        library(ggplot2)
}

## Loading required package: ggplot2

data("ToothGrowth")
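Before plotting, it can help to confirm that the data match the description above. The following is a minimal sketch using only base R; the variable name `cell_counts` is introduced here for illustration and is not part of the original analysis.

```r
# Inspect the structure: len (numeric), supp (factor), dose (numeric)
str(ToothGrowth)

# Count the animals in each supplement x dose combination
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)

# Summary statistics for tooth length
summary(ToothGrowth$len)
```

With 60 animals spread evenly over two supplements and three doses, each cell should hold 10 observations.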

Exploratory analysis

Let’s look at the distribution of the data for the two delivery methods:

ggplot( ToothGrowth, aes(len, fill = supp))+
        geom_density(alpha=0.3, position = "identity")+
        xlab("Length")+
        scale_fill_discrete(name="Supplement")+
        theme_bw()

And for different doses:

ggplot( ToothGrowth, aes(len, fill = factor(dose)))+
        geom_density(alpha=0.3, position = "identity")+
        xlab("Length")+
        scale_fill_discrete(name="Dose")+
        theme_bw()

The distributions look approximately normal, so we can use t-tests to compare the groups.
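The visual impression of normality could be supplemented with a formal check. The sketch below is one possible approach, not part of the original analysis: it applies the Shapiro-Wilk test (`shapiro.test` from the stats package) within each group. The names `p_by_supp` and `p_by_dose` are illustrative.

```r
# Shapiro-Wilk p-value for tooth length within each supplement group
# (a small p-value would suggest a departure from normality)
p_by_supp <- tapply(ToothGrowth$len, ToothGrowth$supp,
                    function(x) shapiro.test(x)$p.value)

# The same check within each dose group
p_by_dose <- tapply(ToothGrowth$len, ToothGrowth$dose,
                    function(x) shapiro.test(x)$p.value)

print(round(p_by_supp, 3))
print(round(p_by_dose, 3))
```

Because the t-test is fairly robust to moderate departures from normality at these group sizes, such a check is a sanity test rather than a strict requirement.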

Difference between delivery methods

ggplot( ToothGrowth, aes(factor(supp),len, fill = supp))+
        geom_boxplot()+
        ylab("Length")+
        xlab("")+
        scale_fill_discrete(name="Supplement")+
        theme_bw()

It’s hard to tell from the plot whether the difference is significant. Run a t-test for the difference between orange juice (OJ) and ascorbic acid (VC):

t <-  t.test(len~supp,data = ToothGrowth, paired = FALSE)$conf.int

The 95% confidence interval for the difference in means (OJ minus VC) is (-0.1710156, 7.5710156). The interval contains zero, so we cannot reject the null hypothesis at the 0.05 significance level: with all doses pooled, there is no significant difference between the supplements.
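To make clear where this interval comes from, it can be reproduced by hand. By default `t.test` does not assume equal variances (Welch's test), so the sketch below uses the Welch-Satterthwaite degrees of freedom. The names `oj`, `vc`, `se`, `dof` and `manual_ci` are illustrative only.

```r
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Standard error of the difference in means (unequal variances)
se <- sqrt(var(oj) / length(oj) + var(vc) / length(vc))

# Welch-Satterthwaite approximation to the degrees of freedom
dof <- se^4 / ((var(oj) / length(oj))^2 / (length(oj) - 1) +
               (var(vc) / length(vc))^2 / (length(vc) - 1))

# 95% confidence interval for mean(OJ) - mean(VC)
manual_ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, dof) * se
print(manual_ci)
```

The result should agree with the `conf.int` component returned by `t.test(len ~ supp, data = ToothGrowth)`.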

Difference between doses

Let’s repeat the analysis for the different doses of the supplement.

ggplot( ToothGrowth, aes(factor(dose),len, fill = factor(dose)))+
        geom_boxplot()+
        ylab("Length")+
        xlab("")+
        scale_fill_discrete(name="Doses")+
        theme_bw()

The dependence is clearer here: a larger dose gives longer odontoblasts. Run a t-test for the difference between doses 0.5 and 1:

 t <- t.test(len~dose,data = ToothGrowth[ToothGrowth$dose==0.5|
                                            ToothGrowth$dose==1,], 
        paired = FALSE)$conf.int

Confidence interval for difference: -11.9837813, -6.2762187.

Between 1 and 2:

 t <- t.test(len~dose,data = ToothGrowth[ToothGrowth$dose==2|
                                            ToothGrowth$dose==1,], 
        paired = FALSE)$conf.int

Confidence interval for difference: -8.9964805, -3.7335195.

And finally between 0.5 and 2 (the difference looks obvious, but we should confirm it):

 t <- t.test(len~dose,data = ToothGrowth[ToothGrowth$dose==2|
                                            ToothGrowth$dose==0.5,], 
        paired = FALSE)$conf.int

Confidence interval for difference: -18.1561665, -12.8338335.

None of the three intervals contains zero, so in all cases the difference is significant and we can reject the null hypothesis.
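The three comparisons above repeat the same steps, so they could also be written as a loop over all pairs of doses. This is a sketch of one way to do that with `combn`; the names `dose_pairs` and `pairwise_ci` are illustrative and not part of the original analysis.

```r
# All pairs of dose levels, one pair per column: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2)

# Welch 95% confidence interval (lower dose minus higher dose) for each pair
pairwise_ci <- apply(dose_pairs, 2, function(pair) {
  subset_data <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  as.numeric(t.test(len ~ dose, data = subset_data)$conf.int)
})
colnames(pairwise_ci) <- apply(dose_pairs, 2, paste, collapse = " vs ")
rownames(pairwise_ci) <- c("lower", "upper")

print(pairwise_ci)
```

Each column holds the interval for one pair, which makes it easy to see at a glance that every upper bound is below zero.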

Differences between supplements for each dose

ggplot( ToothGrowth, aes(factor(supp),len, fill = factor(dose)))+
        geom_boxplot()+
        ylab("Length")+
        xlab("Supplement")+
        scale_fill_discrete(name="Doses")+
        theme_bw()

For dose = 0.5:

 t <- t.test(len~supp,data = ToothGrowth[ToothGrowth$dose==0.5,], 
        paired = FALSE)$conf.int

Confidence interval for difference: 1.7190573, 8.7809427.

For dose = 1:

 t <- t.test(len~supp,data = ToothGrowth[ToothGrowth$dose==1,], 
        paired = FALSE)$conf.int

Confidence interval for difference: 2.8021482, 9.0578518.

For dose = 2:

 t <- t.test(len~supp,data = ToothGrowth[ToothGrowth$dose==2,], 
        paired = FALSE)$conf.int

Confidence interval for difference: -3.7980705, 3.6380705.

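At dose 2 the interval contains zero, so we cannot conclude that the delivery methods differ. At doses 0.5 and 1 the intervals lie entirely above zero, so the delivery method matters there: orange juice gives longer odontoblasts than ascorbic acid.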
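The per-dose comparison can likewise be condensed into a single call. The sketch below splits the data by dose and runs the same Welch t-test on each part; `ci_by_dose` and `excludes_zero` are illustrative names introduced here.

```r
# Welch 95% confidence interval for mean(OJ) - mean(VC) within each dose
ci_by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  as.numeric(t.test(len ~ supp, data = d)$conf.int)
})
rownames(ci_by_dose) <- c("lower", "upper")

# Flag the doses whose interval excludes zero (a significant difference)
excludes_zero <- ci_by_dose["lower", ] > 0 | ci_by_dose["upper", ] < 0

print(ci_by_dose)
print(excludes_zero)
```

The columns are named after the dose levels, so `excludes_zero` shows directly at which doses the supplement matters.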

Conclusion

Dose is the most important factor for odontoblast length: every increase in dose produced a significant increase in length, and the best result is obtained at 2 mg/day. At that dose the delivery method makes no significant difference, whereas at doses of 0.5 and 1 mg/day orange juice is preferred.