library(ggplot2)
library(ggthemes)

Overview

This project analyzes the ToothGrowth dataset. Graphical tools from ggplot2 and Student's t-test (run here in its Welch form, with var.equal = FALSE) are used to compare mean tooth length across groups.

Load the ToothGrowth data and perform some basic exploratory data analyses.

library(datasets)
data("ToothGrowth")

Provide a basic summary of the data.

We used basic summary functions and plots to get a better understanding of the dataset.

The output of str() shows that the dataset contains 60 rows and 3 columns: len (tooth length), supp (supplement type, OJ or VC), and dose.

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
head(ToothGrowth,5)
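
As a complement to str() and head(), the short sketch below tabulates how many observations fall in each supplement/dose cell and the mean tooth length per cell. The object names cell_counts and cell_means are introduced here for illustration only and are not part of the original analysis.

```r
library(datasets)

# Number of observations for each combination of supplement and dose
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
cell_counts

# Mean tooth length for each combination of supplement and dose
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
cell_means
```

A balanced table of counts is a useful check before comparing group means.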

The following plots give a better understanding of the data and of the shape of its distribution.

# Treat dose as a categorical variable so each level gets its own box and fill color
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

ggplot(ToothGrowth, aes(x=dose, y=len)) + 
     geom_boxplot(aes(fill=dose))+
     theme_test() + 
     xlab("Dose (milligrams/day)") + 
     ylab("Tooth Length") + 
     facet_grid(~ supp) + 
     ggtitle("Boxplot of Tooth Length and Dose Amount")+
     theme(legend.position = "bottom")+
     scale_fill_brewer(palette="Set1")

ggplot(aes(x=supp, y=len), data=ToothGrowth) + 
     theme_test()+
     geom_violin(aes(fill=supp)) + 
     xlab("Supplement Delivery (VC or OJ)") + 
     ylab("Tooth Length") + 
     facet_grid(~ dose) + 
     ggtitle("Violin Plot of Tooth Length and Delivery Method") +
     theme(legend.position = "bottom")+
     scale_fill_brewer(palette="Set1")

Use hypothesis tests to compare tooth growth by supp and dose.

Since the p-value is about 0.06 (greater than 0.05) and the 95% confidence interval contains 0, we fail to reject the null hypothesis of equal means. Based on this test, supplement type does not appear to have a significant effect on tooth growth when all doses are pooled.

# Welch two-sample t-test of tooth length by supplement type (unpaired is the default)
supp_test <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
cat("P.value", supp_test$p.value, sep = "\n")
## P.value
## 0.06063451
cat("Confidence intervals with alpha=0.05:  ", supp_test$conf.int[1:2])
## Confidence intervals with alpha=0.05:   -0.1710156 7.571016
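
The test above pools all dose levels. A minimal sketch of how the same supplement comparison could be repeated within each dose level is given below. It is not part of the original analysis, and the names dose_levels, dose_data, and p_by_dose are illustrative assumptions.

```r
library(datasets)

# as.character() keeps this working whether dose is numeric or already a factor
dose_levels <- sort(unique(as.character(ToothGrowth$dose)))

# Welch t-test of len by supp, run separately for each dose level
p_by_dose <- sapply(dose_levels, function(d) {
  dose_data <- ToothGrowth[as.character(ToothGrowth$dose) == d, ]
  t.test(len ~ supp, data = dose_data, var.equal = FALSE)$p.value
})

# Named vector of p-values, one per dose level
p_by_dose
```

Each of these tests uses only the observations at a single dose, so the results should be read with the smaller sample size in mind.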

All three p-values are very close to zero, so the differences in mean tooth length between dose levels are statistically significant. Together with the plots above, this indicates that average tooth length increases with the dose delivered.

# Welch t-tests comparing tooth length between each pair of dose levels
t1 <- t.test(len ~ dose,
             data = subset(ToothGrowth, dose %in% c(0.5, 1.0)),
             var.equal = FALSE)
t2 <- t.test(len ~ dose,
             data = subset(ToothGrowth, dose %in% c(1.0, 2.0)),
             var.equal = FALSE)
t3 <- t.test(len ~ dose,
             data = subset(ToothGrowth, dose %in% c(0.5, 2.0)),
             var.equal = FALSE)

# Collect the p-values in a table; building the data frame directly keeps them numeric
tests <- data.frame(c(t1$p.value, t2$p.value, t3$p.value),
                    c("1.0 vs 0.5", "1.0 vs 2.0", "0.5 vs 2.0"))
names(tests) <- c("P.values", "Dose in milligrams/day")
tests
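
Because three pairwise tests are run on the same data, one optional refinement (not part of the original analysis) is to adjust the p-values for multiple comparisons and to check the direction of the differences. The sketch below assumes Holm's adjustment is acceptable here; dose_f, dose_means, and holm_tests are illustrative names.

```r
library(datasets)

# Treat dose as a grouping factor (works whether or not it was converted earlier)
dose_f <- as.factor(ToothGrowth$dose)

# Mean tooth length per dose level, to show the direction of the differences
dose_means <- tapply(ToothGrowth$len, dose_f, mean)
dose_means

# Pairwise t-tests without a pooled standard deviation, Holm-adjusted p-values
holm_tests <- pairwise.t.test(ToothGrowth$len, dose_f,
                              pool.sd = FALSE, p.adjust.method = "holm")
holm_tests$p.value
```

With pool.sd = FALSE each pair is tested using only its own two groups, which mirrors the separate t-tests above.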

Conclusions.

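We conclude that tooth length increases significantly with dose: all three pairwise comparisons between dose levels give p-values close to zero. In contrast, the difference between Orange Juice (OJ) and Ascorbic Acid (VC) is not significant at the 5% level when all doses are pooled (p-value of about 0.06, with a confidence interval that contains 0).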