The data for this project come from a study of tooth growth in 60 guinea pigs, assumed to be randomly selected and representative of the population (1st assumption), under two experimental conditions:

A- Delivery method (OJ: orange juice; VC: ascorbic acid - factor)

B- Vitamin C dose (Low: 0.5 mg; Medium: 1 mg; High: 2 mg - numeric)

The main aim of this project is to perform exploratory data analysis and hypothesis testing.

For the exploratory analysis, we load the data and inspect its dimensions, structure and summary statistics:

library(datasets)
data(ToothGrowth)
dim(ToothGrowth)
## [1] 60  3
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ToothGrowth$dose<-factor(ToothGrowth$dose, levels=c(0.5,1.0,2.0),labels=c("Low","Medium","High"))
summary(ToothGrowth)
##       len        supp        dose   
##  Min.   : 4.20   OJ:30   Low   :20  
##  1st Qu.:13.07   VC:30   Medium:20  
##  Median :19.25           High  :20  
##  Mean   :18.81                      
##  3rd Qu.:25.27                      
##  Max.   :33.90
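The overall summary above does not show how tooth length varies between the six combinations of delivery method and dose. A minimal sketch of a per-group summary follows; it works on a local copy (`tg`, a name introduced here for illustration) so that the recoded `ToothGrowth` object is left untouched, and it keeps dose numeric.

```r
# Local copy of the built-in data set (dose stays numeric here)
tg <- datasets::ToothGrowth

# Mean and standard deviation of tooth length for each supp x dose group
group_stats <- aggregate(len ~ supp + dose, data = tg,
                         FUN = function(v) c(mean = mean(v), sd = sd(v)))

# aggregate() returns the two statistics as a matrix column; flatten it
group_stats <- do.call(data.frame, group_stats)
group_stats
```

The result has one row per group, with the columns `len.mean` and `len.sd` holding the two statistics.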

Next, we plot tooth length against the two factors (A and B):

boxplot(len ~ supp * dose, data = ToothGrowth, ylab = "Length of tooth", col= c("grey", "red"), main ="Box-plots: effects of supplement type and dosage on Tooth Growth")

A t-test is then used to check whether the differences seen in the plot are statistically significant. The measurements are assumed to be independent and identically distributed within each group (2nd assumption) and to follow a normal distribution (3rd assumption). First, we analyse the effect of the delivery method:

x: mean of tooth length when receiving orange juice
y: mean of tooth length when receiving ascorbic acid

H0: x-y=0
Ha: x-y!=0

# Split the observations by delivery method
Tooth_dose_OJ = subset(ToothGrowth, supp == "OJ")
Tooth_dose_VC = subset(ToothGrowth, supp == "VC")
# Welch test (unequal variances) and pooled-variance test
teste_1<-t.test(Tooth_dose_OJ$len, Tooth_dose_VC$len, paired = FALSE, var.equal=FALSE)
teste_2<-t.test(Tooth_dose_OJ$len, Tooth_dose_VC$len, paired = FALSE, var.equal=TRUE)
# p-value and 95% confidence limits of each test, rounded to 3 decimals
results1 <- data.frame(round(c(teste_1$p.value, teste_1$conf.int),3), round(c(teste_2$p.value, teste_2$conf.int),3), row.names = c("p-value","Lower limit CI","Higher limit CI"))
colnames(results1)<-c("Dif Var","Equal Var")
results1
##                 Dif Var Equal Var
## p-value           0.061     0.060
## Lower limit CI   -0.171    -0.167
## Higher limit CI   7.571     7.567

At a significance level of 0.05, the p-values of 0.061 (unequal variances) and 0.060 (equal variances) do not allow us to reject H0 in either case. The 95% confidence interval for the difference in means under unequal variances, [-0.171, 7.571], also includes 0, which is consistent with this result: pooled over all doses, the data do not show a statistically significant effect of the delivery method.
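To make the unequal-variance result less of a black box, the Welch statistic, its approximate degrees of freedom and the confidence interval can be recomputed by hand and compared with `t.test`. This is a sketch of the standard Welch-Satterthwaite formulas; the names `tg`, `oj`, `vc`, `se`, `t_stat`, `df_welch`, `p_value`, `ci` and `ref` are introduced here for illustration only.

```r
# Local copy of the data, split by delivery method
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Standard error of the difference in means, without pooling the variances
se <- sqrt(var(oj) / length(oj) + var(vc) / length(vc))
t_stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- se^4 / ((var(oj) / length(oj))^2 / (length(oj) - 1) +
                    (var(vc) / length(vc))^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * se

# Compare with the built-in Welch test
ref <- t.test(oj, vc, var.equal = FALSE)
all.equal(c(t_stat, df_welch, p_value),
          unname(c(ref$statistic, ref$parameter, ref$p.value)))
all.equal(ci, as.numeric(ref$conf.int))
```

Both `all.equal` calls should report agreement between the manual computation and the built-in test.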

We now examine the effect of the delivery method within each dose level (low, medium or high). For each dose level:

H0: There is no difference in mean tooth length between the two delivery methods.
Ha: There is a difference in mean tooth length between the two delivery methods.

# Reload the data so that dose is numeric again
data(ToothGrowth)
Tooth_dose_low = subset(ToothGrowth, dose== 0.5)
Tooth_dose_medium = subset(ToothGrowth, dose==1.0)
Tooth_dose_high = subset(ToothGrowth, dose==2.0)
# OJ vs VC within each dose: Welch tests (D1-D3) and pooled-variance tests (D4-D6).
# paired is left at its default (FALSE); recent R versions reject it in the formula method.
Test_D1<-t.test(len ~ supp, data = Tooth_dose_low, var.equal = FALSE)
Test_D2<-t.test(len ~ supp, data = Tooth_dose_medium, var.equal = FALSE)
Test_D3<-t.test(len ~ supp, data = Tooth_dose_high, var.equal = FALSE)
Test_D4<-t.test(len ~ supp, data = Tooth_dose_low, var.equal = TRUE)
Test_D5<-t.test(len ~ supp, data = Tooth_dose_medium, var.equal = TRUE)
Test_D6<-t.test(len ~ supp, data = Tooth_dose_high, var.equal = TRUE)
# p-value and 95% confidence limits of each test, rounded to 3 decimals (one column per test)
results2 <- as.data.frame(sapply(list(Test_D1, Test_D2, Test_D3, Test_D4, Test_D5, Test_D6), function(tt) round(c(tt$p.value, tt$conf.int), 3)), row.names = c("p-value","Lower limit CI","Higher limit CI"))
colnames(results2)<-c("Low Dif. Var","Med Dif. Var", "High Dif. Var", "Low Equal Var", "Med Equal Var", "High Equal Var" )
results2
##                 Low Dif. Var Med Dif. Var High Dif. Var Low Equal Var
## p-value                0.006        0.001         0.964         0.005
## Lower limit CI         1.719        2.802        -3.798         1.770
## Higher limit CI        8.781        9.058         3.638         8.730
##                 Med Equal Var High Equal Var
## p-value                 0.001          0.964
## Lower limit CI          2.841         -3.723
## Higher limit CI         9.019          3.563

We assume that the population variances are different (4th assumption), so we read the unequal-variance columns. For the low and medium doses (0.5 and 1.0 mg) the p-values (0.006 and 0.001) are below the significance level of 0.05, so we reject H0 that both groups have the same mean tooth length. For the high dose (2.0 mg) we cannot reject H0 (p-value of 0.964), and the 95% confidence interval [-3.798, 3.638] includes 0.
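Because three separate tests are run, one per dose level, the chance of at least one false rejection is higher than the nominal 0.05. The sketch below is an addition to the original analysis, not part of it: it recomputes the three unequal-variance p-values on a local copy of the data and applies a Holm correction with `p.adjust`. The names `tg`, `p_raw` and `p_holm` are illustrative.

```r
# Local copy of the built-in data set with numeric dose
tg <- datasets::ToothGrowth

# Welch p-value for OJ vs VC within each dose level (0.5, 1 and 2 mg)
p_raw <- sapply(split(tg, tg$dose), function(d) {
  t.test(len ~ supp, data = d, var.equal = FALSE)$p.value
})

# Holm step-down adjustment for the three comparisons
p_holm <- p.adjust(p_raw, method = "holm")
round(rbind(raw = p_raw, holm = p_holm), 3)
```

Comparing the `holm` row with 0.05 shows whether each per-dose conclusion holds once the number of tests is taken into account.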

Main Conclusions:

Taking into account the four assumptions in this study, we can infer that at low and medium dosages the delivery method has a significant effect on tooth length, with orange juice associated with longer teeth. In contrast, at the high dose of Vitamin C, there is not sufficient statistical evidence to show that the delivery method affects tooth growth.
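These conclusions rest on the normality assumption (3rd assumption), which the analysis takes as given. One rough way to probe it, offered only as a sketch, is a Shapiro-Wilk test within each of the six groups. With only 10 animals per group the test has little power, so a large p-value does not prove normality. The names `tg` and `normality` are illustrative.

```r
# Local copy of the built-in data set
tg <- datasets::ToothGrowth

# Shapiro-Wilk p-value for each supp x dose group (10 observations each)
normality <- aggregate(len ~ supp + dose, data = tg,
                       FUN = function(v) shapiro.test(v)$p.value)
names(normality)[3] <- "shapiro_p"
normality
```

Small values of `shapiro_p` would point to groups where the normality assumption deserves a closer look, for example with a normal Q-Q plot.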