Synopsis

This project analyzes the ToothGrowth data in the R datasets package. The dataset records the effect of vitamin C, delivered as one of two supplements at three dose levels, on tooth growth in 60 guinea pigs. We test whether growth differs by supplement type and by dose level.

Loading and basic exploratory analyses of data

# Loading the datasets library and ToothGrowth dataset
library(datasets)
data(ToothGrowth)

str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(ToothGrowth$len)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.20   13.07   19.25   18.81   25.27   33.90
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0

We see the dataset contains 60 observations of 3 variables:
1. len: numeric; length of cells responsible for tooth growth; values range from 4.2 to 33.9
2. supp: factor; type of supplement - “OJ” for orange juice and “VC” for ascorbic acid
3. dose: numeric; three dose levels - 0.5, 1, and 2 mg/day
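
With 60 observations spread over two supplements and three dose levels, it is worth checking how many animals fall in each supplement/dose cell before comparing groups. A minimal sketch (the name `counts` is only illustrative):

```r
library(datasets)

# Cross-tabulate supplement type against dose level
counts <- table(supp = ToothGrowth$supp, dose = ToothGrowth$dose)
print(counts)

# TRUE when every supplement/dose cell has the same number of observations
balanced <- length(unique(as.vector(counts))) == 1
print(balanced)
```

If the design is balanced, each of the six cells should hold 60 / 6 = 10 observations.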

Basic summary of data

Looking at the data graphically:

par(mfrow = c(1,3))

# Length of cells based on type of supplement
with(ToothGrowth, {
     boxplot(len~supp, col = c("orange", "lightgreen"))
     title(main = "Length by supplement")
     title(xlab = "Supplement type", ylab = "Length of cells")
})

# Length of cells by dose level
with(ToothGrowth, {
     boxplot(len~dose, col = c("lightyellow", "lightgrey", "lightblue"))
     title(main = "Length by dose")
     title(xlab = "Dose level", ylab = "Length of cells")
})

# Length of cells by supplement type and dose level
with(ToothGrowth, {
     boxplot(len~supp*dose, col = c("orange", "lightgreen"))
     title(main = "Length by supplement and dose")
     title(xlab = "Supplement and dose", ylab = "Length of cells")
})

Overall, OJ appears to produce greater length than VC, and higher doses appear more effective. At dose = 2, however, OJ and VC appear roughly equivalent.
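
The visual impression from the boxplots can be backed by the mean length in each supplement/dose group. The following is a small sketch using `aggregate` from base R; the name `cell_means` is only illustrative:

```r
library(datasets)

# Mean tooth length for every combination of supplement type and dose
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)

# Difference in group means (OJ minus VC) at each dose level
oj_means <- cell_means$len[cell_means$supp == "OJ"]
vc_means <- cell_means$len[cell_means$supp == "VC"]
print(data.frame(dose = sort(unique(cell_means$dose)),
                 oj_minus_vc = oj_means - vc_means))
```

A difference near zero at dose = 2 would match the near-identical boxes in the third plot.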

T-Tests for Hypotheses

Hypothesis 1: OJ results in greater length than VC

Ho: mean difference between OJ and VC = 0
Ha: mean difference between OJ and VC > 0

OJ = ToothGrowth$len[ToothGrowth$supp=="OJ"]
VC = ToothGrowth$len[ToothGrowth$supp=="VC"]

test = t.test(OJ, VC, paired = FALSE, alternative = "greater")
print(test)
## 
##  Welch Two Sample t-test
## 
## data:  OJ and VC
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.4682687       Inf
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333
qt(0.95, test$parameter)
## [1] 1.672874

For a one-sided t-test, the t-statistic (1.9153) exceeds the critical value qt(0.95, df) = 1.6729, so it lies outside the acceptance region. Equivalently, the p-value (0.03032) is below 0.05. Hence, we reject Ho.
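
To make the decision rule concrete, the Welch statistic, its degrees of freedom, the critical value, and the p-value can be computed by hand. This is a sketch for illustration, not a replacement for `t.test`; the lowercase names (`oj`, `vc`, `t_stat`, `df_welch`, and so on) are introduced here and are not part of the analysis above.

```r
library(datasets)

# Split tooth length by supplement type
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Upper-tail critical value and p-value for Ha: difference > 0
t_crit <- qt(0.95, df_welch)
p_value <- pt(t_stat, df_welch, lower.tail = FALSE)

# Reject Ho when the statistic falls beyond the critical value
reject_h0 <- t_stat > t_crit
print(c(t = t_stat, df = df_welch, t_crit = t_crit, p = p_value))
```

The two criteria always agree: `t_stat > t_crit` holds exactly when `p_value < 0.05`, so checking both is a consistency check rather than two independent pieces of evidence.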

Hypothesis 2: Higher doses result in greater length

Ho: mean difference between dose2 and dose1 = 0
Ha: mean difference between dose2 and dose1 > 0

dose2 = ToothGrowth$len[ToothGrowth$dose==2]
dose1 = ToothGrowth$len[ToothGrowth$dose==1]

test = t.test(dose2, dose1, paired = FALSE, alternative = "greater")
print(test)
## 
##  Welch Two Sample t-test
## 
## data:  dose2 and dose1
## t = 4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  4.17387     Inf
## sample estimates:
## mean of x mean of y 
##    26.100    19.735
qt(0.95, test$parameter)
## [1] 1.686976

For a one-sided t-test, the t-statistic (4.9005) exceeds the critical value qt(0.95, df) = 1.6870, so it lies outside the acceptance region. Equivalently, the p-value (9.532e-06) is below 0.05. Hence, we reject Ho.

We repeat the test for the next pair of dose levels, 1 vs 0.5 mg/day:

Ho: mean difference between dose1 and dose.5 = 0
Ha: mean difference between dose1 and dose.5 > 0

dose1 = ToothGrowth$len[ToothGrowth$dose==1]
dose.5 = ToothGrowth$len[ToothGrowth$dose==0.5]

test = t.test(dose1, dose.5, paired = FALSE, alternative = "greater")
print(test)
## 
##  Welch Two Sample t-test
## 
## data:  dose1 and dose.5
## t = 6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  6.753323      Inf
## sample estimates:
## mean of x mean of y 
##    19.735    10.605
qt(0.95, test$parameter)
## [1] 1.68597

For a one-sided t-test, the t-statistic (6.4766) exceeds the critical value qt(0.95, df) = 1.6860, so it lies outside the acceptance region. Equivalently, the p-value (6.342e-08) is below 0.05. Hence, we reject Ho.
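
The one-sided tests above all follow the same pattern: run a Welch test, look up the critical value, compare. That pattern could be wrapped in a small helper, sketched below. The function `one_sided_welch` and its arguments are hypothetical names introduced for illustration; only `t.test`, `qt`, and `split` are base R.

```r
library(datasets)

# Hypothetical helper: one-sided Welch test of Ho: mean(x) - mean(y) = 0
# against Ha: mean(x) - mean(y) > 0, plus the critical-value comparison
one_sided_welch <- function(x, y, label, alpha = 0.05) {
  res <- t.test(x, y, paired = FALSE, alternative = "greater")
  t_stat <- unname(res$statistic)
  t_crit <- unname(qt(1 - alpha, res$parameter))
  data.frame(
    comparison = label,
    t = t_stat,
    df = unname(res$parameter),
    t_crit = t_crit,
    p_value = res$p.value,
    reject_h0 = t_stat > t_crit
  )
}

# Tooth lengths grouped by dose; list names are "0.5", "1" and "2"
len_by_dose <- split(ToothGrowth$len, ToothGrowth$dose)

results <- rbind(
  one_sided_welch(len_by_dose[["2"]], len_by_dose[["1"]], "2 vs 1"),
  one_sided_welch(len_by_dose[["1"]], len_by_dose[["0.5"]], "1 vs 0.5")
)
print(results)
```

Collecting the results in one data frame makes it easier to compare the dose steps side by side.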

Hypothesis 3: OJ at 2mg/day is equivalent to VC at 2mg/day

Ho: mean difference between OJ at 2mg/day and VC at 2mg/day = 0
Ha: mean difference between OJ at 2mg/day and VC at 2mg/day != 0

OJ2 = ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose==2]
VC2 = ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose==2]

test = t.test(OJ2, VC2, paired = FALSE, alternative = "two.sided")
print(test)
## 
##  Welch Two Sample t-test
## 
## data:  OJ2 and VC2
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y 
##     26.06     26.14
qt(0.025, test$parameter); qt(0.975, test$parameter)
## [1] -2.144216
## [1] 2.144216

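For a two-sided t-test, the t-statistic (-0.0461) lies within the acceptance region bounded by qt(0.025, df) = -2.1442 and qt(0.975, df) = 2.1442. Also, the p-value (0.9639) is above 0.05. Hence, we fail to reject Ho: the data show no evidence of a difference between OJ and VC at 2 mg/day.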
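
For the two-sided test, the same decision can be read off the 95 percent confidence interval: Ho is not rejected at the 5% level exactly when the interval for the difference in means contains 0. A minimal sketch, with illustrative lowercase names that are not part of the analysis above:

```r
library(datasets)

# Tooth lengths for each supplement at the 2 mg/day dose
at_dose2 <- ToothGrowth$dose == 2
oj2 <- ToothGrowth$len[ToothGrowth$supp == "OJ" & at_dose2]
vc2 <- ToothGrowth$len[ToothGrowth$supp == "VC" & at_dose2]

res <- t.test(oj2, vc2, paired = FALSE, alternative = "two.sided")

# Decision via the confidence interval: does it cover zero?
ci <- res$conf.int
zero_in_ci <- ci[1] < 0 && ci[2] > 0

# Decision via the p-value
fail_to_reject <- res$p.value > 0.05

print(c(lower = ci[1], upper = ci[2]))
print(c(zero_in_ci = zero_in_ci, fail_to_reject = fail_to_reject))
```

The width of the interval also indicates how large a difference the data could still be hiding, which a p-value alone does not show.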

Conclusion

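Based on visual exploration and t-test results, we conclude that:
1. Mean length is greater with OJ than with VC (mean difference > 0)
2. Mean length is greater at higher doses than at lower doses (mean difference > 0 for 2 vs 1 and for 1 vs 0.5 mg/day)
3. There is no evidence of a difference in mean length between OJ and VC at 2 mg/day (Ho: mean difference = 0 is not rejected)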