In this project we will analyze the ToothGrowth data from the R datasets package.
According to the R documentation, the dataset records “the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs”, each of which received “one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods”: orange juice (coded as ‘OJ’) or ascorbic acid, a form of vitamin C (coded as ‘VC’). In this project, the delivery method is also called the “supplement type”.
In this part, we will take a brief look at the structure and characteristics of the data.
# loading and exploring the data
data("ToothGrowth")
str( ToothGrowth )
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# building a contingency table
table( ToothGrowth[,2:3] )
##     dose
## supp 0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10
# exploring the averages of length by supplement type
aggregate( len ~ supp, data=ToothGrowth, FUN=mean )
##   supp      len
## 1   OJ 20.66333
## 2   VC 16.96333
# exploring the averages of length by dose
aggregate( len ~ dose, data=ToothGrowth, FUN=mean )
##   dose    len
## 1  0.5 10.605
## 2  1.0 19.735
## 3  2.0 26.100
# Making a figure of two box plots to visualize the above findings
op <- par( mfcol=c(1,2) )  # two panels side by side; keep the previous settings in op
boxplot( len ~ supp, data=ToothGrowth, cex.axis=0.75, ylab="tooth length", xlab="supplement",
         main="Tooth Growth and Supplements", col=c("orange","linen") )
legend( "bottomleft", c("Orange Juice","Ascorbic Acid"), fill=c("orange","linen"), cex=0.75 )
boxplot( len ~ dose, data=ToothGrowth, cex.axis=0.75, ylab="tooth length", xlab="dose (mg/day)",
         main="Tooth Growth and Dosage", col=c("green","yellow","red") )
legend( "bottomright", c("0.5 mg/day","1 mg/day","2 mg/day"), fill=c("green","yellow","red"),
        cex=0.85 )
par( op )  # restore the previous plotting layout
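The basic exploratory data analysis shows that the design is balanced (10 guinea pigs for each combination of supplement and dose), that the average tooth length is higher with orange juice (20.66) than with ascorbic acid (16.96), and that the average tooth length increases with the dose (10.605, 19.735, and 26.100 for 0.5, 1, and 2 mg/day, respectively).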
We will use the default 95% confidence level and the Welch Two Sample t-test.
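For reference, this is the standard form of the Welch statistic and its approximate (Welch-Satterthwaite) degrees of freedom, written with the group labels 1 and 0 used for \(\mu_1\) and \(\mu_0\) below; \(\bar x_i\), \(s_i^2\) and \(n_i\) denote the sample mean, sample variance and size of group \(i\). Under the null hypothesis, \(t\) approximately follows a t distribution with \(\nu\) degrees of freedom, which is why R reports non-integer values such as df = 55.309.

\[
t = \frac{\bar x_1 - \bar x_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}}, \qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_0^2/n_0\right)^2}{n_0 - 1}}
\]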
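One-sided hypothesis tests will be conducted because our alternative hypotheses are directional, of two types: “greater” (\(\mu_1 > \mu_0\)) and “less” (\(\mu_1 < \mu_0\)), where \(\mu_1\) and \(\mu_0\) denote the population mean tooth lengths of the two groups being compared.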
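As an illustration of what the one-sided choice means in practice (a check added here, not part of the original analysis), the sketch below runs the supplement comparison both as a two-sided and as a one-sided Welch test. When the observed difference lies in the hypothesized direction, the one-sided p-value is half the two-sided one. The names `p_two` and `p_one` are illustrative only.

```r
data("ToothGrowth")

# Two-sided Welch test (the t.test default) for OJ vs VC
p_two <- t.test( len ~ supp, data=ToothGrowth, var.equal=FALSE )$p.value

# One-sided Welch test: the alternative is mean(OJ) - mean(VC) > 0
p_one <- t.test( len ~ supp, data=ToothGrowth, alternative="greater", var.equal=FALSE )$p.value

# The observed OJ - VC difference is positive, so the one-sided p-value is half the two-sided one
c( two.sided=p_two, greater=p_one )
all.equal( p_one, p_two / 2 )
```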
We assume unequal variances (a different variance per group) because we have no information indicating that the group variances are equal.
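The Welch test does not require equal variances, so no formal variance test is needed; still, a descriptive look at the per-group sample variances shows how much the spread of tooth length differs between groups. A minimal sketch using base R's `aggregate()` and `var()`, in the same style as the exploration above (`var_supp` and `var_dose` are illustrative names):

```r
data("ToothGrowth")

# Sample variance of tooth length within each supplement group
var_supp <- aggregate( len ~ supp, data=ToothGrowth, FUN=var )

# Sample variance of tooth length within each dose group
var_dose <- aggregate( len ~ dose, data=ToothGrowth, FUN=var )

var_supp
var_dose
```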
We also assume that the groups are independent.
The null hypothesis, assumed true unless the data provide sufficient evidence against it, is \(\mu_1 = \mu_0\) (\(\mu_1 - \mu_0 = 0\)), i.e. there is no difference between the population means. We will reject it in favor of the alternative hypothesis if the t-test p-value is smaller than 0.05 (\(\alpha = 0.05\)).
# test for the supplement OJ vs VC
t.test( len ~ supp, data=ToothGrowth, alternative="greater", var.equal=FALSE )
##
## Welch Two Sample t-test
##
## data: len by supp
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.4682687 Inf
## sample estimates:
## mean in group OJ mean in group VC
## 20.66333 16.96333
With this test (p-value 0.03032 \(< \alpha = 0.05\)), we have statistical evidence to reject the null hypothesis, \(\mu_{OJ} = \mu_{VC}\), in favor of the alternative hypothesis, \(\mu_{OJ} > \mu_{VC}\).
This suggests that, disregarding dosage, orange juice (OJ) is a more effective delivery method of vitamin C than ascorbic acid (VC) for tooth growth in guinea pigs.
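To make the Welch test less of a black box, the following sketch recomputes the OJ vs VC statistic, the Welch-Satterthwaite degrees of freedom, and the one-sided (“greater”) p-value by hand with base R, then compares them with `t.test()`. It is an illustrative re-derivation, not the author's code; the variable names are hypothetical.

```r
data("ToothGrowth")

# Split tooth length by supplement type
x_oj <- ToothGrowth$len[ ToothGrowth$supp == "OJ" ]
x_vc <- ToothGrowth$len[ ToothGrowth$supp == "VC" ]

# Squared standard error of each group mean: s^2 / n
se2_oj <- var( x_oj ) / length( x_oj )
se2_vc <- var( x_vc ) / length( x_vc )

# Welch t statistic (OJ minus VC) and Welch-Satterthwaite degrees of freedom
t_stat <- ( mean( x_oj ) - mean( x_vc ) ) / sqrt( se2_oj + se2_vc )
df_w <- ( se2_oj + se2_vc )^2 /
        ( se2_oj^2 / ( length( x_oj ) - 1 ) + se2_vc^2 / ( length( x_vc ) - 1 ) )

# One-sided ("greater") p-value: upper tail of the t distribution
p_val <- pt( t_stat, df=df_w, lower.tail=FALSE )

round( c( t=t_stat, df=df_w, p=p_val ), 5 )

# Reference result from t.test for comparison
ref <- t.test( len ~ supp, data=ToothGrowth, alternative="greater", var.equal=FALSE )
```

`pt()` accepts a non-integer `df`, so the fractional Welch-Satterthwaite value can be used directly.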
# test for the dose levels 2.0 vs 1.0
t.test( len ~ dose, data=ToothGrowth, alternative="less", var.equal=FALSE,
subset=dose %in% c(1.0,2.0) )
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -4.9005, df = 37.101, p-value = 9.532e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -4.17387
## sample estimates:
## mean in group 1 mean in group 2
## 19.735 26.100
# test for the dose levels 1.0 vs 0.5
t.test( len ~ dose, data=ToothGrowth, alternative="less", var.equal=FALSE,
subset=dose %in% c(0.5,1.0) )
##
## Welch Two Sample t-test
##
## data: len by dose
## t = -6.4766, df = 37.986, p-value = 6.342e-08
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -6.753323
## sample estimates:
## mean in group 0.5 mean in group 1
## 10.605 19.735
With these two tests (p-values 9.532e-06 and 6.342e-08, both \(< \alpha = 0.05\)), we have statistical evidence to reject the null hypotheses, \(\mu_{2.0} = \mu_{1.0}\) and \(\mu_{1.0} = \mu_{0.5}\), in favor of the alternative hypotheses, \(\mu_{2.0} > \mu_{1.0}\) and \(\mu_{1.0} > \mu_{0.5}\).
This suggests that, regardless of the delivery method, increasing the dose of vitamin C increases tooth growth in guinea pigs.
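The two dose comparisons above, together with the \(\alpha = 0.05\) decision rule, can be sketched in one place as follows. The list `dose_pairs` and the other names are illustrative assumptions; the Holm adjustment via `p.adjust()` is an optional sensitivity check added here and is not part of the original analysis.

```r
data("ToothGrowth")
alpha <- 0.05

# Pairs of dose levels to compare: (lower dose, higher dose)
dose_pairs <- list( c(0.5, 1.0), c(1.0, 2.0) )

p_vals <- sapply( dose_pairs, function( pr ) {
  # alternative="less": mean at the lower dose minus mean at the higher dose < 0
  t.test( len ~ dose, data=ToothGrowth, alternative="less", var.equal=FALSE,
          subset=dose %in% pr )$p.value
} )
names( p_vals ) <- c( "1.0 vs 0.5", "2.0 vs 1.0" )

p_vals
p_vals < alpha                      # decision rule: reject H0 when p < alpha

# Optional: Holm-adjusted p-values for the two dose tests
p.adjust( p_vals, method="holm" ) < alpha
```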
Disregarding dosage, orange juice (OJ) appears to be a more effective delivery method of vitamin C than ascorbic acid (VC) for tooth growth in guinea pigs.
Regardless of the delivery method, increasing the dose of vitamin C appears to increase tooth growth in guinea pigs.
These conclusions rest on the assumptions that the groups are independent random samples with unequal (group-specific) variances, and on a Type I error rate (the probability of rejecting a true null hypothesis) of \(\alpha = 0.05\) for each test, with no adjustment for the three tests performed.