This project analyzes the ToothGrowth data in the R datasets package to explore, and draw statistical inferences about, the relationship between tooth growth and the supplement type and dose. It takes the following approach:
1. Load the ToothGrowth data and perform some basic exploratory data analyses.
2. Provide a basic summary of the data.
3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
4. State conclusions and assumptions.
According to the R documentation for ToothGrowth (help(ToothGrowth)), the response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods (OJ for orange juice, or VC for ascorbic acid), giving 10 animals per combination. Below is a summary of the data:
data(ToothGrowth) # ToothGrowth dataset
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
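The documentation describes a balanced design with 10 animals in every supplement and dose combination. The following minimal sketch checks this directly with base R's `table()`. The variable name `counts` is illustrative only.

```r
# Load the built-in dataset (package 'datasets', attached by default)
data(ToothGrowth)

# Cross-tabulate observations by supplement (rows) and dose (columns);
# a balanced design should show 10 animals in every cell
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(counts)
```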
In Figure 1, the boxplot suggests that, overall, orange juice is associated with greater tooth growth than ascorbic acid. The conditional plot below the boxplot confirms this pattern. Its smoothed trend of tooth length against dose (red line) shows that tooth length is greater with orange juice at doses of 0.5mg and 1mg of vitamin C. At 2mg, the average tooth lengths for orange juice and ascorbic acid are about the same.
Hence, at this exploratory stage, tooth growth appears to be positively correlated with dose. Orange juice appears more effective than ascorbic acid at the lower doses (0.5mg and 1mg; there are no observations at 1.5mg). At 2mg the two supplements perform similarly, which suggests that 2mg may be near the upper limit of effectiveness for both supplements.
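The visual impressions above can be checked numerically. The following sketch computes the mean tooth length for each supplement and dose combination using base R's `aggregate()` and `tapply()`. The names `means` and `mean_table` are illustrative only.

```r
data(ToothGrowth)

# Mean tooth length for each supplement/dose combination (long format)
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(means)

# The same means as a supplement-by-dose table for side-by-side comparison
mean_table <- with(ToothGrowth, tapply(len, list(supp, dose), mean))
print(mean_table)
```

In `mean_table`, the OJ row should exceed the VC row at 0.5 and 1, while the two rows should be nearly equal at 2.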
This section uses confidence intervals and hypothesis tests (Reference A.1) to examine how tooth growth is affected by each of the factors below. Each comparison is based on a Welch two-sample t-test (unequal variances).
(a) Dosage Alone
(b) Supplement Alone
(c) Supplement at Each Dosage
df<-data.frame(ToothGrowth)
names(df)<-c("Length","Supplement","Dose")
# Add 'Dosage': 0.5mg as Small Dosage (SD), 1mg as Medium Dosage (MD), 2mg as Large Dosage (LD).
df$Dosage<-factor(df$Dose,levels=c(0.5,1,2),labels=c("SD","MD","LD"))
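The following sketch checks the SD/MD/LD recoding of `Dose` by cross-tabulating it against the numeric dose. All 20 observations at 0.5mg should map to SD, those at 1mg to MD, and those at 2mg to LD. It uses the column names from the renaming above. The recode itself uses base R's `factor()` with `labels`, which is an assumption and may differ from how the original code builds `Dosage`.

```r
data(ToothGrowth)
df <- data.frame(ToothGrowth)
names(df) <- c("Length", "Supplement", "Dose")

# Recode numeric dose into labelled levels: SD (0.5mg), MD (1mg), LD (2mg)
df$Dosage <- factor(df$Dose, levels = c(0.5, 1, 2), labels = c("SD", "MD", "LD"))

# Rows are numeric doses, columns are labels; a correct recode puts
# all 20 observations of each dose on the diagonal
recode_check <- table(df$Dose, df$Dosage)
print(recode_check)
```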
# (a) Dosage alone: 0.5mg (SD) vs 1mg (MD), supplements pooled
set1<-subset(df,Dosage=='SD')$Length
set2<-subset(df,Dosage=='MD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -11.983781 -6.276219
Result: When the dose is increased from 0.5mg to 1mg, the 95% confidence interval for the difference in mean tooth length (0.5mg minus 1mg) lies entirely below zero. Hence, we reject the null hypothesis that the two doses give the same mean tooth length. The higher dose is associated with longer teeth.
# (a) Dosage alone: 1mg (MD) vs 2mg (LD), supplements pooled
set1<-subset(df,Dosage=='MD')$Length
set2<-subset(df,Dosage=='LD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -8.996481 -3.733519
Result: When the dose is increased from 1mg to 2mg, the confidence interval again lies entirely below zero. Hence, we reject the null hypothesis that the two doses give the same mean tooth length. This dosage-alone test shows that increasing the dose leads to an increase in tooth length.
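To show where the interval comes from, the following sketch rebuilds the 1mg vs 2mg Welch interval by hand. The interval is the difference in means plus or minus a t quantile times the Welch standard error, with Welch-Satterthwaite degrees of freedom. It then compares the result with `t.test()`. Subsetting with `ToothGrowth$dose == 1` is used here instead of the `Dosage` column. The variable names are illustrative.

```r
data(ToothGrowth)

# Tooth lengths at 1mg (medium) and 2mg (large), supplements pooled
x <- ToothGrowth$len[ToothGrowth$dose == 1]
y <- ToothGrowth$len[ToothGrowth$dose == 2]

# Squared standard errors of each mean (unequal variances allowed)
vx <- var(x) / length(x)
vy <- var(y) / length(y)
se <- sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# 95% interval: difference in means +/- t quantile * standard error
manual_ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * se
print(manual_ci)

# Interval reported by t.test() for the same comparison
builtin_ci <- t.test(x, y, paired = FALSE, var.equal = FALSE)$conf.int[1:2]
print(builtin_ci)
```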
# (b) Supplement alone: VC vs OJ, doses pooled
set1<-subset(df,Supplement=='VC')$Length
set2<-subset(df,Supplement=='OJ')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$p.value
## [1] 0.06063451
t$conf.int[1:2]
## [1] -7.5710156 0.1710156
Result: In this test, the p-value (about 0.061) is greater than 0.05 and the confidence interval contains zero. Hence, we do not reject the null hypothesis that the two supplements give the same mean tooth length. At the 5% level, the data do not provide sufficient evidence that the type of supplement alone affects tooth growth.
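The following sketch reproduces the reported p-value from the test statistic. For a two-sided test, it is twice the upper-tail probability of the t distribution beyond |t|, using the Welch degrees of freedom that `t.test()` returns in `parameter`. The variable names are illustrative.

```r
data(ToothGrowth)

vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
res <- t.test(vc, oj, paired = FALSE, var.equal = FALSE)

# Two-sided p-value: probability, under H0 (equal means), of a t statistic
# at least as extreme as the observed one
t_stat <- unname(res$statistic)
df_welch <- unname(res$parameter)
manual_p <- 2 * pt(-abs(t_stat), df_welch)
print(c(manual = manual_p, t.test = res$p.value))

# Decision at the 5% significance level (FALSE means H0 is not rejected)
manual_p < 0.05
```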
# (c) Supplement at 0.5mg (SD): VC vs OJ
set1<-subset(df,Supplement=='VC' & Dosage=='SD')$Length
set2<-subset(df,Supplement=='OJ' & Dosage=='SD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -8.780943 -1.719057
Result: The confidence interval for the difference (VC minus OJ) lies entirely below zero. Hence, we reject the null hypothesis that the two supplements give the same mean tooth length at the small dosage (0.5mg). Orange juice is associated with longer teeth.
# (c) Supplement at 1mg (MD): VC vs OJ
set1<-subset(df,Supplement=='VC' & Dosage=='MD')$Length
set2<-subset(df,Supplement=='OJ' & Dosage=='MD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -9.057852 -2.802148
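Result: The confidence interval for the difference (VC minus OJ) again lies entirely below zero. Hence, we reject the null hypothesis that the two supplements give the same mean tooth length at the medium dosage (1mg). Orange juice is again associated with longer teeth.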
# (c) Supplement at 2mg (LD): VC vs OJ
set1<-subset(df,Supplement=='VC' & Dosage=='LD')$Length
set2<-subset(df,Supplement=='OJ' & Dosage=='LD')$Length
t<-t.test(set1,set2,paired=FALSE,var.equal=FALSE)
t$conf.int[1:2]
## [1] -3.63807 3.79807
Result: The confidence interval contains zero. Hence, we fail to reject the null hypothesis that the two supplements give the same mean tooth length at the large dosage (2mg).
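The three supplement-by-dose tests above follow one pattern. The following sketch runs them in a single loop and collects the interval bounds, the p-values, and whether each interval contains zero. It uses the same Welch `t.test()` call as above. The result layout and names are illustrative.

```r
data(ToothGrowth)

# Welch t-test of VC vs OJ within each dose level
doses <- c(0.5, 1, 2)
rows <- lapply(doses, function(d) {
  vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == d]
  oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == d]
  tt <- t.test(vc, oj, paired = FALSE, var.equal = FALSE)
  data.frame(dose = d,
             lower = tt$conf.int[1],
             upper = tt$conf.int[2],
             p.value = tt$p.value,
             contains_zero = tt$conf.int[1] <= 0 && tt$conf.int[2] >= 0)
})

# One row per dose level
results <- do.call(rbind, rows)
print(results)
```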
A.1. Confidence Intervals & Hypothesis Testing http://davidmlane.com/hyperstat/B15183.html
There is an extremely close relationship between confidence intervals and hypothesis testing. When a 95% confidence interval is constructed, all values in the interval are considered plausible values for the parameter being estimated. Values outside the interval are rejected as relatively implausible. If the value of the parameter specified by the null hypothesis is contained in the 95% interval then the null hypothesis cannot be rejected at the 0.05 level. If the value specified by the null hypothesis is not in the interval then the null hypothesis can be rejected at the 0.05 level. If a 99% confidence interval is constructed, then values outside the interval are rejected at the 0.01 level.
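The correspondence described above can be checked empirically for the Welch t-test. The following sketch uses a hypothetical helper, `check_duality()`, that returns TRUE when "zero lies inside the interval" agrees with "the p-value is at least 1 minus the confidence level". It tests one comparison where the null is not rejected (supplement) and one where it is (0.5mg vs 1mg), at the 95% and 99% levels, via `t.test()`'s `conf.level` argument.

```r
data(ToothGrowth)

# Hypothetical helper: TRUE when the CI-based decision and the
# p-value-based decision about H0 (equal means) agree
check_duality <- function(x, y, level) {
  res <- t.test(x, y, paired = FALSE, var.equal = FALSE, conf.level = level)
  zero_inside <- res$conf.int[1] <= 0 && 0 <= res$conf.int[2]
  not_rejected <- res$p.value >= 1 - level
  zero_inside == not_rejected
}

vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
low <- ToothGrowth$len[ToothGrowth$dose == 0.5]
mid <- ToothGrowth$len[ToothGrowth$dose == 1]

# Columns: 95% and 99% levels; rows: the two comparisons
sapply(c(0.95, 0.99), function(l) c(supplement = check_duality(vc, oj, l),
                                     dose = check_duality(low, mid, l)))
```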
The R code for the figures plotted in this project:
# Plot Figure 1
# Boxplot of tooth length for each supplement/dose combination
boxplot(len ~ supp * dose, data=ToothGrowth,
col=(c("orange","green")),
main="Figure 1\nBoxplot of Tooth Growth in Guinea Pigs",
xlab="Supplement and Dose",ylab="Tooth Length")
legend('bottomright', c("OJ: Orange juice", "VC: Ascorbic acid"),
fill = c("orange","green"),bty = "n")
# Conditional plot: tooth length vs dose, one panel per supplement;
# panel.smooth adds a lowess smoother (red line) to each panel
coplot(len ~ dose|supp, data=ToothGrowth, panel=panel.smooth, col=par("fg"),
xlab="Dosage", ylab="Tooth Length")