Load the ToothGrowth data and perform some basic exploratory data analyses

First, we load the required packages and the data into the workspace.

library(ggplot2)
library(datasets)
data(ToothGrowth)
head(ToothGrowth)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Structure of the ToothGrowth data frame:

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

Provide a basic summary of the data.

# Convert dose to a factor because it only takes the values 0.5, 1 and 2 (mg/day)
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
table(ToothGrowth$dose, ToothGrowth$supp)

##      
##       OJ VC
##   0.5 10 10
##   1   10 10
##   2   10 10

Summary of the three variables of the data frame, plus the standard deviation of len, computed separately because it is the only continuous variable:

summary(ToothGrowth)

##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

sd(ToothGrowth$len)

## [1] 7.649315
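The overall summary hides the differences between groups that the later tests examine. As a minimal sketch, the mean and standard deviation of len for each supplement/dose combination could be computed with base R's aggregate(); the name cell_stats is only illustrative and is not part of the original analysis.

```r
library(datasets)

# Mean and standard deviation of tooth length for each supplement/dose cell.
# aggregate() splits len by the grouping variables on the right of the formula.
cell_stats <- aggregate(
  len ~ supp + dose,
  data = ToothGrowth,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

# aggregate() stores the two statistics in one matrix column;
# do.call(data.frame, ...) flattens it into len.mean and len.sd.
cell_stats <- do.call(data.frame, cell_stats)
print(cell_stats)
```

The resulting table has one row per combination of supp and dose, which makes it easy to read the cell means next to the boxplot.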

Next, a boxplot of tooth length by dose, split into two facets by supplement type (OJ, orange juice, or VC, ascorbic acid).

p <- ggplot(ToothGrowth, aes(x=factor(dose),y=len,fill=factor(dose)))
p + geom_boxplot() +
     facet_grid(. ~ supp) +
     scale_x_discrete("Dose (Milligrams per day)") +
     scale_y_continuous("Tooth length") +
     scale_fill_brewer(palette = "Spectral") +
     ggtitle("Boxplot separating by Supplement Type and amount of Dosage") +
     theme_minimal()

Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.

First, we compare tooth length between doses using Welch two-sample t-tests (t.test), one test for each pair of doses.

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 0.5)) # For doses 1 and 2

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 37.101, p-value = 1.906e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 1)) # For doses 0.5 and 2

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 2)) # For doses 0.5 and 1

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 37.986, p-value = 1.268e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983781  -6.276219
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735
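The three comparisons above can also be run in a single call. The sketch below uses pairwise.t.test() from the stats package with pool.sd = FALSE, so that each pair is compared with its own Welch test as in the separate calls. The Bonferroni adjustment for running three tests is an addition that the original analysis does not use, and the name dose_tests is illustrative.

```r
library(datasets)

# Welch t-tests for every pair of dose levels in one call.
# pool.sd = FALSE makes each comparison use only its own two groups,
# matching the separate t.test() calls; the Bonferroni adjustment is
# an assumption added here, not part of the original analysis.
dose_tests <- pairwise.t.test(
  x = ToothGrowth$len,
  g = factor(ToothGrowth$dose),
  p.adjust.method = "bonferroni",
  pool.sd = FALSE
)

# Lower-triangular matrix of adjusted p-values, one entry per pair of doses.
print(dose_tests$p.value)
```

With only three comparisons the adjustment multiplies each p-value by three (capped at 1), so it is a conservative check on the unadjusted results.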

Second, we compare tooth length between doses again, this time separately for each supplement type. First, for VC:

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 0.5 & ToothGrowth$supp == 'VC')) # For doses 1 and 2

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -5.4698, df = 13.6, p-value = 9.156e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.054267  -5.685733
## sample estimates:
## mean in group 1 mean in group 2 
##           16.77           26.14

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 1 & ToothGrowth$supp == 'VC')) # For doses 0.5 and 2

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -10.3878, df = 14.327, p-value = 4.682e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -21.90151 -14.41849
## sample estimates:
## mean in group 0.5   mean in group 2 
##              7.98             26.14

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 2 & ToothGrowth$supp == 'VC')) # For doses 0.5 and 1

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -7.4634, df = 17.862, p-value = 6.811e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.265712  -6.314288
## sample estimates:
## mean in group 0.5   mean in group 1 
##              7.98             16.77

Then, for OJ:

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 0.5 & ToothGrowth$supp == 'OJ')) # For doses 1 and 2

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -2.2478, df = 15.842, p-value = 0.0392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.5314425 -0.1885575
## sample estimates:
## mean in group 1 mean in group 2 
##           22.70           26.06

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 1 & ToothGrowth$supp == 'OJ')) # For doses 0.5 and 2

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -7.817, df = 14.668, p-value = 1.324e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.335241  -9.324759
## sample estimates:
## mean in group 0.5   mean in group 2 
##             13.23             26.06

t.test(len ~ dose, data = subset(ToothGrowth,ToothGrowth$dose != 2 & ToothGrowth$supp == 'OJ')) # For doses 0.5 and 1

## 
##  Welch Two Sample t-test
## 
## data:  len by dose
## t = -5.0486, df = 17.698, p-value = 8.785e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.415634  -5.524366
## sample estimates:
## mean in group 0.5   mean in group 1 
##             13.23             22.70
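The six per-supplement tests repeat the same call with different subsets. A possible way to collect their confidence intervals and p-values in one table is sketched below; compare_doses, tg, dose_pairs and results are hypothetical names introduced only for this illustration.

```r
library(datasets)

# Work on a copy with a numeric dose, whether or not dose was
# converted to a factor earlier in the session.
tg <- transform(ToothGrowth, dose = as.numeric(as.character(dose)))

# Hypothetical helper: Welch test of len between two dose levels
# within one supplement type; returns one row of results.
compare_doses <- function(low, high, supplement) {
  d <- tg[tg$supp == supplement, ]
  tt <- t.test(d$len[d$dose == low], d$len[d$dose == high])
  data.frame(
    supp = supplement, low = low, high = high,
    ci_lower = tt$conf.int[1], ci_upper = tt$conf.int[2],
    p_value = tt$p.value
  )
}

# All pairs of doses, evaluated for both supplement types.
dose_pairs <- list(c(0.5, 1), c(0.5, 2), c(1, 2))
results <- do.call(rbind, lapply(c("VC", "OJ"), function(s) {
  do.call(rbind, lapply(dose_pairs, function(p) compare_doses(p[1], p[2], s)))
}))
print(results)
```

Each row corresponds to one of the t.test calls above, with the interval for the difference "lower dose minus higher dose".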

State your conclusions and the assumptions needed for your conclusions.

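From the exploratory analysis and the t-tests above, I draw the following conclusions:

Dose affects tooth length: the larger the dose, the longer the teeth. Every pairwise comparison between doses gives a 95% confidence interval that excludes zero, both overall and within each supplement type.
Regarding the supplement, the group means suggest that OJ gives greater tooth growth than VC at the lower doses (13.23 vs. 7.98 at 0.5 mg/day and 22.70 vs. 16.77 at 1 mg/day), while at 2 mg/day the means are almost identical (26.06 vs. 26.14), so no difference is apparent there. The tests above compare doses rather than supplements, so this second conclusion rests on the group means and the boxplot, not on a direct test.

The tests used are unpaired Welch t-tests, so these conclusions assume that the groups are independent samples, that tooth length is approximately normally distributed within each group, and that the group variances need not be equal.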
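The supplement comparison in the conclusions could be checked directly by testing OJ against VC within each dose level. The following is a minimal sketch under the same Welch t-test setup as the rest of the analysis; the name supp_by_dose is illustrative.

```r
library(datasets)

# Welch test of OJ versus VC within each dose level.
# dose is converted via character so this works whether it is
# stored as a number or as a factor.
dose_value <- as.numeric(as.character(ToothGrowth$dose))

supp_by_dose <- sapply(c(0.5, 1, 2), function(d) {
  t.test(len ~ supp, data = ToothGrowth[dose_value == d, ])$p.value
})
names(supp_by_dose) <- c("0.5", "1", "2")

# One p-value per dose for the difference between supplements.
print(supp_by_dose)
```

A small p-value at a given dose would support a difference between the two supplements at that dose; a large one would mean the data do not distinguish them.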

References

Dataset: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/ToothGrowth.html

Statistical Inference Project - Part 2

Jose A. Ruiperez Valiente

November 22, 2015
