Now in the second portion of the class, we’re going to analyze the ToothGrowth data in the R datasets package.
library(ggplot2)
library(datasets)
data(ToothGrowth)
dim(ToothGrowth) # dimension of the dataframe
## [1] 60 3
head(ToothGrowth) # first few rows of the data frame
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
The data frame consists of 60 observations on three variables: len, the length of odontoblasts (the cells responsible for tooth growth); supp, the delivery method (orange juice, coded OJ, or ascorbic acid, coded VC); and dose, the Vitamin C dose (0.5, 1, or 2 mg/day). Each of the six supplement/dose combinations was given to 10 guinea pigs.
# Boxplot
boxplot(len ~ supp * dose, data=ToothGrowth, ylab="Tooth Length", main="Comparing Tooth Growth between different \n supplements and different doses", col=c("cyan", "darkblue"))
require(graphics)
coplot(len ~ dose | supp, data = ToothGrowth, panel = panel.smooth,
xlab = "ToothGrowth data: length vs dose, given type of supplement")
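Conclusion
The plots suggest that tooth length increases with dose for both supplements, and that OJ tends to give longer teeth than VC at the lower doses (0.5 and 1.0), while the two supplements look similar at dose 2.0. These are visual impressions only; the confidence intervals further on check them formally.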
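The visual impression from the plots can be checked numerically. The following is a minimal sketch, not part of the original analysis, that uses base R's `aggregate` to compute the mean tooth length for each supplement/dose combination; the name `group_means` is introduced here only for illustration.

```r
library(datasets)
data(ToothGrowth)

# Mean tooth length for every supplement/dose combination (6 groups of 10)
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Because the design is balanced, the average of these six group means equals the overall mean of `len` reported by `summary(ToothGrowth)`.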
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
table(ToothGrowth$supp,ToothGrowth$dose)
##
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10
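Because the sample size is small, I use a t-test to build a 95% confidence interval for the difference in mean tooth length between the two supplement types. The two groups consist of different animals, so they are treated as independent (unpaired) samples; the assumption is that the supplement is the only systematic cause of differences in tooth length between the groups.
Null hypothesis: the mean tooth length is the same for OJ and VC (the difference in means is zero). The alternative is that the means differ.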
From Lecture 8 we know how to compute a confidence interval for the difference in means of two independent groups, using the pooled standard deviation.
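The formula itself is not shown here. Judging from the manual calculation in the code, it is presumably the standard pooled-variance t interval, written below with subscripts that mirror the code's variable names (`nOJ`, `nVC`, `sp`); treat it as a reconstruction under that assumption rather than a quote from the lecture.

```latex
\bar{x}_{OJ} - \bar{x}_{VC} \;\pm\; t_{\,n_{OJ}+n_{VC}-2,\;0.975}\; S_p \sqrt{\frac{1}{n_{OJ}} + \frac{1}{n_{VC}}},
\qquad
S_p = \sqrt{\frac{(n_{OJ}-1)\,S_{OJ}^2 + (n_{VC}-1)\,S_{VC}^2}{n_{OJ}+n_{VC}-2}}
```

Here $S_{OJ}$ and $S_{VC}$ are the sample standard deviations of the two groups, and $t_{\nu,\,0.975}$ is the 97.5th percentile of the t distribution with $\nu$ degrees of freedom.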
Let’s calculate this manually.
lenOJ<-ToothGrowth[ToothGrowth$supp=="OJ",]$len
lenVC<-ToothGrowth[ToothGrowth$supp=="VC",]$len
nOJ <- length(lenOJ)
nVC <- length(lenVC)
# the pooled standard deviation for independent groups is:
sp <- sqrt(((nOJ - 1)*sd(lenOJ)^2 + (nVC-1)*sd(lenVC)^2) / (nOJ + nVC - 2))
# find out mean difference
mean_diff <- mean(lenOJ) - mean(lenVC)
semd <- sp * sqrt(1 / nOJ + 1/nVC)
mean_diff + c(-1, 1) * qt(.975, nVC + nOJ - 2) * semd
## [1] -0.1670064 7.5670064
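As a cross-check on the manual calculation, the same pooled-variance interval can be obtained from `t.test` by setting `var.equal = TRUE`. This is a small sketch; `ci_pooled` is a name introduced here for illustration.

```r
data(ToothGrowth)
lenOJ <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
lenVC <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Student (pooled-variance) interval for mean(OJ) - mean(VC) computed by t.test
ci_pooled <- t.test(lenOJ, lenVC, paired = FALSE, var.equal = TRUE)$conf.int
print(ci_pooled)
```

The two endpoints should agree with the manually computed interval up to rounding.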
Now drop the assumption that the two groups have equal variances, then calculate the confidence interval and test the null hypothesis again.
# The two groups are independent and are not assumed to have equal variances,
# so we set both paired and var.equal to FALSE
t.test(lenOJ,lenVC, paired=FALSE,var.equal=FALSE)$conf.int
## [1] -0.1710156 7.5710156
## attr(,"conf.level")
## [1] 0.95
Conclusion
Both intervals above contain 0, so at the 5% significance level we cannot reject the null hypothesis that the mean tooth length is the same for OJ and VC. Pooled over all doses, the data do not show a significant difference between the two supplements.
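The same conclusion can be read from the p-value of the test. A minimal sketch, assuming the formula interface of `t.test` with `supp` as the grouping factor (its levels are OJ and VC, so the difference is OJ minus VC); the object name `welch` is introduced here for illustration.

```r
data(ToothGrowth)

# Welch two-sample t-test of len by supplement type, all doses pooled
welch <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)

welch$p.value   # two-sided p-value for H0: equal means
welch$conf.int  # 95% interval for mean(OJ) - mean(VC)
```

A two-sided p-value above 0.05 corresponds to a 95% confidence interval that contains zero.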
Let's calculate confidence intervals for each dose separately: 0.5, 1.0, and 2.0. We assume that the groups are independent and have unequal variances.
Null hypothesis: at dose 0.5, the mean tooth length is the same for OJ and VC.
lenOJ<-ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose == .5,]$len
lenVC<-ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose == .5,]$len
t.test(lenOJ,lenVC, paired=FALSE,var.equal=FALSE)$conf.int
## [1] 1.719057 8.780943
## attr(,"conf.level")
## [1] 0.95
Conclusion
The confidence interval lies entirely above zero, so we reject the null hypothesis of equal means at dose 0.5: OJ gives a larger mean tooth length than VC.
Null hypothesis: at dose 1.0, the mean tooth length is the same for OJ and VC.
lenOJ<-ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose == 1.0,]$len
lenVC<-ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose == 1.0,]$len
t.test(lenOJ,lenVC, paired=FALSE,var.equal=FALSE)$conf.int
## [1] 2.802148 9.057852
## attr(,"conf.level")
## [1] 0.95
Conclusion
The confidence interval lies entirely above zero, so we reject the null hypothesis of equal means at dose 1.0: OJ gives a larger mean tooth length than VC.
Null hypothesis: at dose 2.0, the mean tooth length is the same for OJ and VC.
lenOJ<-ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose == 2.0,]$len
lenVC<-ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose == 2.0,]$len
t.test(lenOJ,lenVC, paired=FALSE,var.equal=FALSE)$conf.int
## [1] -3.79807 3.63807
## attr(,"conf.level")
## [1] 0.95
Conclusion
The confidence interval contains zero, so at dose 2.0 we cannot reject the null hypothesis of equal means: there is no evidence that OJ and VC differ at this dose.
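The three per-dose comparisons repeat the same steps, so they can be collected in one table. The following is a sketch under the same assumptions as above (independent groups, unequal variances); the helper `ci_for_dose` and the result `ci_by_dose` are hypothetical names introduced for illustration.

```r
data(ToothGrowth)

# Welch 95% confidence interval for mean(OJ) - mean(VC) at a single dose
ci_for_dose <- function(d) {
  sub <- ToothGrowth[ToothGrowth$dose == d, ]
  ci <- t.test(sub$len[sub$supp == "OJ"], sub$len[sub$supp == "VC"],
               paired = FALSE, var.equal = FALSE)$conf.int
  c(dose = d, lower = ci[1], upper = ci[2])
}

# One row per dose, with columns dose, lower, upper
ci_by_dose <- t(sapply(c(0.5, 1, 2), ci_for_dose))
print(ci_by_dose)
```

Rows whose interval excludes zero correspond to the doses where the null hypothesis of equal means is rejected.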