Exploratory data analysis

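The ToothGrowth dataset[¹] contains 60 observations of three variables: tooth length (len), delivery method (supp, either OJ or VC) and vitamin C dose (dose).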

# datasets::ToothGrowth
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

library(ggplot2)
# Boxplots of tooth length by dose, coloured by delivery method
ggplot(ToothGrowth, aes(x = factor(dose), y = len, color = supp)) +
  geom_boxplot() +
  labs(x = "Vitamin C dose (mg)", y = "Tooth length (mm)")

Data summary

From the exploratory analysis, the following observations can be made:

  1. Tooth length tends to increase with vitamin C dose.
  2. Orange juice (OJ) appears to be a more effective delivery method for vitamin C, in terms of tooth growth, than ascorbic acid (VC).
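
Both observations can be checked numerically with group means. The following is a minimal sketch using base R's `aggregate()`; the object names `group_means`, `oj`, `vc` and `diff_by_dose` are introduced here for illustration and are not part of the original analysis.

```r
# Mean tooth length for every combination of delivery method and dose
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)

# Means per delivery method; both vectors are in increasing dose order
oj <- group_means$len[group_means$supp == "OJ"]
vc <- group_means$len[group_means$supp == "VC"]

# Difference in means (OJ minus VC) at each dose level
diff_by_dose <- setNames(oj - vc, sort(unique(ToothGrowth$dose)))
print(diff_by_dose)
```

Means that grow with dose, and a positive OJ minus VC difference at a given dose, would be consistent with the two observations above.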

Data analysis

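In the following hypothesis tests, the two observations above are investigated further. In every test the null hypothesis is that the two group means are equal. The one-sided alternatives are that mean tooth length is greater at a dose of 1 mg than at 0.5 mg, and that, at each dose level, mean tooth length is lower with ascorbic acid (VC) than with orange juice (OJ). All tests are two-sample t-tests that assume equal variances: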

library(knitr)

# Dose effect: is mean length greater at 1 mg than at 0.5 mg?
a <- t.test(x = ToothGrowth$len[ToothGrowth$dose == 1],
            y = ToothGrowth$len[ToothGrowth$dose == 0.5],
            alternative = "greater", var.equal = TRUE)

# Delivery method at each dose: is mean length lower with VC than with OJ?
b <- t.test(x = ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "VC"],
            y = ToothGrowth$len[ToothGrowth$dose == 0.5 & ToothGrowth$supp == "OJ"],
            alternative = "less", mu = 0, var.equal = TRUE)

c <- t.test(x = ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == "VC"],
            y = ToothGrowth$len[ToothGrowth$dose == 1 & ToothGrowth$supp == "OJ"],
            alternative = "less", mu = 0, var.equal = TRUE)

d <- t.test(x = ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == "VC"],
            y = ToothGrowth$len[ToothGrowth$dose == 2 & ToothGrowth$supp == "OJ"],
            alternative = "less", mu = 0, var.equal = TRUE)

test <- c("Dose vs teeth length",
          "Orange Juice vs Ascorbic Acid (dose=0.5mg)",
          "Orange Juice vs Ascorbic Acid (dose 1 mg)",
          "Orange Juice vs Ascorbic Acid (dose = 2 mg)")

# One-sided confidence bounds and p-value of each test
Lower.CI <- c(a$conf.int[1], b$conf.int[1], c$conf.int[1], d$conf.int[1])
Higher.CI <- c(a$conf.int[2], b$conf.int[2], c$conf.int[2], d$conf.int[2])
pvalue <- c(a$p.value, b$p.value, c$p.value, d$p.value)

final <- data.frame(test, Lower.CI, Higher.CI, pvalue)
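
The three delivery-method comparisons repeat the same call with a different dose. As a sketch of a more compact equivalent, the formula interface of `t.test()` can be applied to each dose subset. With `len ~ supp` the difference is taken in factor level order, that is OJ minus VC, so the alternative becomes "greater" rather than "less". The helper `supp_test` and the objects `doses` and `by_dose` are hypothetical names, not part of the original analysis.

```r
# Hypothetical helper: one-sided, equal-variance t-test of OJ vs VC at one dose
supp_test <- function(dose_level) {
  t.test(len ~ supp, data = subset(ToothGrowth, dose == dose_level),
         alternative = "greater", var.equal = TRUE)
}

# Run the test once per dose level and label the results by dose
doses <- sort(unique(ToothGrowth$dose))
by_dose <- lapply(doses, supp_test)
names(by_dose) <- doses

# p-values of the three comparisons, in increasing dose order
sapply(by_dose, function(res) res$p.value)
```

Because only the sign of the difference and the direction of the alternative are flipped together, the p-values should agree with those of the explicit x/y calls.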

Conclusion

The conclusions are drawn from the following table, which lists the one-sided confidence bounds and the p-value of each test:

kable(final)
| test                                        | Lower.CI  | Higher.CI | pvalue    |
|:--------------------------------------------|----------:|----------:|----------:|
| Dose vs teeth length                        |  6.753344 |       Inf | 0.0000001 |
| Orange Juice vs Ascorbic Acid (dose=0.5mg)  |      -Inf | -2.377886 | 0.0026518 |
| Orange Juice vs Ascorbic Acid (dose 1 mg)   |      -Inf | -3.380140 | 0.0003904 |
| Orange Juice vs Ascorbic Acid (dose = 2 mg) |      -Inf |  3.086866 | 0.5181451 |
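
One way to read the table is to compare each p-value with a significance level. The original does not state one, so the sketch below assumes the conventional 5% level (`alpha` is an assumption). It copies the p-values from the table so that it runs on its own; `results` and `reject_null` are illustrative names.

```r
# p-values copied from the table above
results <- data.frame(
  test = c("Dose vs teeth length",
           "Orange Juice vs Ascorbic Acid (dose=0.5mg)",
           "Orange Juice vs Ascorbic Acid (dose 1 mg)",
           "Orange Juice vs Ascorbic Acid (dose = 2 mg)"),
  pvalue = c(0.0000001, 0.0026518, 0.0003904, 0.5181451)
)

alpha <- 0.05  # assumed significance level, not stated in the original

# TRUE where the null hypothesis of equal means is rejected
results$reject_null <- results$pvalue < alpha
print(results)
```

Under this assumed threshold, the null hypothesis is rejected for the dose comparison and for the delivery-method comparisons at 0.5 mg and 1 mg, but not at 2 mg, where the one-sided confidence interval includes zero.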

[¹]: C. I. Bliss (1952) The Statistics of Bioassay. Academic Press.