From the course website: “Now in the second portion of the class, we’re going to analyze the ToothGrowth data in the R datasets package.
[1] Load the ToothGrowth data and perform some basic exploratory data analyses. Provide a basic summary of the data.
[2] Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there’s other approaches worth considering.)
[3] State your conclusions and the assumptions needed for your conclusions.
Some criteria that you will be evaluated on:
Did you perform an exploratory data analysis of at least a single plot or table highlighting basic features of the data?
Did the student perform some relevant confidence intervals and/or tests?
Were the results of the tests and/or intervals interpreted in the context of the problem correctly?
Did the student describe the assumptions needed for their conclusions?”
# Load Libraries for use
library(plyr)
library(ggplot2)
library(datasets)
library(grid)
data(ToothGrowth)
A first look at the data reveals 60 observations in three columns: tooth length (len), supplement type (supp), and dose (dose):
dfTooth <- data.frame(ToothGrowth)
dim(dfTooth)
## [1] 60 3
head(dfTooth, 3)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
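As a complement to head(), here is a minimal sketch of a basic summary, assuming only the built-in ToothGrowth data. The name cell_counts is introduced here for illustration and is not part of the original analysis.

```r
library(datasets)

# Quartiles and mean for len and dose, and level counts for supp
summary(ToothGrowth)

# Number of observations for each supplement/dose combination
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
cell_counts
```

The table shows whether the design is balanced, which matters when comparing group means later on.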
Next, let’s look at the two factors: supplement and dose.
# Convert supplement and dose to factors
dfTooth$supp <- factor(dfTooth$supp)
dfTooth$dose <- factor(dfTooth$dose)
p1 <- ggplot(dfTooth, aes(x=supp, y=len)) + geom_boxplot(aes(fill=supp))
p2 <- ggplot(dfTooth, aes(x=len, fill=dose)) + geom_density(alpha = 0.5)
# http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/#put-two-potentially-unrelated-plots-side-by-side-pushviewport
pushViewport(viewport(layout = grid.layout(1, 2)))
print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
Here are two views: the boxplot of length by supplement suggests that there might not be a significant difference between the supplements. The density plots of length by dose suggest that there might be a relationship between dose and length.
One last thing: let’s check the combination of dose and supplement to see their interaction.
# Thanks to: http://stats.stackexchange.com/questions/11406/boxplot-with-respect-to-two-factors-using-ggplot2-in-r
dfTooth$suppdose <- interaction(dfTooth$supp, dfTooth$dose)
ggplot(aes(y=len, x = suppdose), data = dfTooth) + geom_boxplot(aes(fill=suppdose))
It looks like there might be a difference between the supplements at the 1mg dose level.
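The boxplot can be backed by a small table of cell statistics. The following is a sketch under the assumption that only the built-in ToothGrowth data is used; the names cell_means, cell_sds, mean_len and sd_len are chosen here for illustration.

```r
library(datasets)

# Mean and standard deviation of tooth length for each supplement/dose cell
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
cell_sds <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = sd)

# Combine both statistics into one table with descriptive column names
names(cell_means)[3] <- "mean_len"
cell_means$sd_len <- cell_sds$len
cell_means
```

Comparing the OJ and VC rows within each dose gives a numeric counterpart to what the boxplot suggests visually.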
NOTE: Confidence intervals, p-values, etc. will only be reported, saving the Conclusion section to summarize the results.
First, compare the two supplement groups, ignoring dose.
#Conduct t-test, then put results in a neat table for display
# Unpaired is the default for t.test, so the paired argument is omitted from the formula call
t1 <- t.test(len ~ supp, var.equal = FALSE, data = dfTooth)
t1.summary <- data.frame("p-value" = t1$p.value, "CI-Lower" = t1$conf.int[1], "CI-Upper" = t1$conf.int[2],
row.names = "OJ vs. VC: ")
round(t1.summary, 4)
## p.value CI.Lower CI.Upper
## OJ vs. VC: 0.0606 -0.171 7.571
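With var.equal set to FALSE, t.test runs Welch's unequal-variance t-test. To make that concrete, here is a sketch that recomputes the same p-value and interval by hand, assuming the built-in ToothGrowth data. All variable names (oj, vc, se2_oj, se2_vc, t_stat, df_welch, p_value, ci, ref) are introduced for illustration only.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard error of each group mean (the variances are NOT pooled)
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)
df_welch <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC)
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df_welch) * sqrt(se2_oj + se2_vc)

# Reference result from t.test for comparison
ref <- t.test(oj, vc, var.equal = FALSE)
c(manual = p_value, t.test = ref$p.value)
```

The degrees of freedom are generally not a whole number, which is one visible sign that the test is not the pooled-variance version.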
Looking at the different dosage groups requires three comparisons: (1) .5 to 1; (2) .5 to 2; (3) 1 to 2
#First we must subset the groups, then conduct the test, then put in a table
df05 <- subset(dfTooth, dose == 0.5)
df10 <- subset(dfTooth, dose == 1)
df20 <- subset(dfTooth, dose == 2)
t0510 <- t.test(df05$len, df10$len, paired = FALSE, var.equal = FALSE)
t0520 <- t.test(df05$len, df20$len, paired = FALSE, var.equal = FALSE)
t1020 <- t.test(df10$len, df20$len, paired = FALSE, var.equal = FALSE)
t2.summary <- data.frame("p-value" = c(t0510$p.value, t0520$p.value, t1020$p.value),
"CI-Lower" = c(t0510$conf.int[1], t0520$conf.int[1], t1020$conf.int[1]),
"CI-Upper" = c(t0510$conf.int[2], t0520$conf.int[2], t1020$conf.int[2]),
row.names = c(".5mg vs 1mg: ", ".5mg vs 2mg: ", "1mg vs 2mg: "))
round(t2.summary, 6)
## p.value CI.Lower CI.Upper
## .5mg vs 1mg: 0.0e+00 -11.983781 -6.276219
## .5mg vs 2mg: 0.0e+00 -18.156167 -12.833833
## 1mg vs 2mg: 1.9e-05 -8.996481 -3.733519
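Rounding to six decimal places prints the first two p-values as 0.0e+00, which hides their actual size. One alternative, sketched here on the built-in ToothGrowth data, is signif(), which keeps significant digits rather than decimal places. The helper dose_p and the vector p_vals are hypothetical names introduced for this example.

```r
library(datasets)

# Hypothetical helper: Welch t-test p-value comparing len at two dose levels
dose_p <- function(a, b) {
  x <- ToothGrowth$len[ToothGrowth$dose == a]
  y <- ToothGrowth$len[ToothGrowth$dose == b]
  t.test(x, y, var.equal = FALSE)$p.value
}

p_vals <- c(".5mg vs 1mg" = dose_p(0.5, 1),
            ".5mg vs 2mg" = dose_p(0.5, 2),
            "1mg vs 2mg"  = dose_p(1, 2))

# Three significant digits keep very small p-values readable
signif(p_vals, 3)
```

Reporting the values this way shows that they are small but nonzero, which is the more accurate statement.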
Recall from our third graph, when the supplement was compared within each dosage group, it looked like there might have been a difference for the 1mg level. Let’s look within the groups just to check.
# Unpaired is the default for t.test, so the paired argument is omitted from the formula calls
t05 <- t.test(len ~ supp, var.equal = FALSE, data = df05)
t10 <- t.test(len ~ supp, var.equal = FALSE, data = df10)
t20 <- t.test(len ~ supp, var.equal = FALSE, data = df20)
t3.summary <- data.frame("p-value" = c(t05$p.value, t10$p.value, t20$p.value),
"CI-Lower" = c(t05$conf.int[1], t10$conf.int[1], t20$conf.int[1]),
"CI-Upper" = c(t05$conf.int[2], t10$conf.int[2], t20$conf.int[2]),
row.names = c(".5mg OJ vs. VC: ", "1mg OJ vs. VC: ", "2mg OJ vs. VC: "))
round(t3.summary, 6)
## p.value CI.Lower CI.Upper
## .5mg OJ vs. VC: 0.006359 1.719057 8.780943
## 1mg OJ vs. VC: 0.001038 2.802148 9.057852
## 2mg OJ vs. VC: 0.963852 -3.798070 3.638070
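The summary tables in this report all extract the same three values from each test. As a sketch of how that repetition could be factored out, assuming the built-in ToothGrowth data, the hypothetical helper ttest_row below builds one table row per test, and split() supplies the per-dose subsets. The names ttest_row, by_dose, rows and supp_by_dose are illustrative only.

```r
library(datasets)

# Hypothetical helper: one table row (p-value and CI bounds) from a t.test result
ttest_row <- function(tt, label) {
  data.frame(p.value  = tt$p.value,
             CI.Lower = tt$conf.int[1],
             CI.Upper = tt$conf.int[2],
             row.names = label)
}

# Welch t-test of OJ vs. VC within each dose level
by_dose <- split(ToothGrowth, ToothGrowth$dose)
rows <- lapply(names(by_dose), function(d) {
  tt <- t.test(len ~ supp, var.equal = FALSE, data = by_dose[[d]])
  ttest_row(tt, paste0(d, "mg OJ vs. VC"))
})

# Stack the rows into one table, rounded like the tables in this report
supp_by_dose <- do.call(rbind, rows)
round(supp_by_dose, 6)
```

The same helper could be reused for the dose-to-dose comparisons by passing it the results of those tests.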
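To restate the assumptions: the samples are small, so t-tests were used; the groups were treated as independent (unpaired); and the variances were never assumed to be equal, so R ran Welch’s t-test, which uses each group’s own variance rather than a pooled variance. The t-test also assumes that tooth lengths within each group are roughly normally distributed.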
Overall, ignoring dose, there is no statistically significant difference between the supplements at the 5% level: the p-value was .061 and the 95% confidence interval contained zero.
That overall picture changes when looking within the dosage groups. For the .5mg and 1mg groups, p-values of .006 and .001 respectively were obtained, and neither confidence interval contained zero. For 2mg, there was no significant difference between the supplements (p-value of .964, confidence interval containing zero). So, for lower dosages (.5mg, 1mg) the delivery mechanism of choice is OJ.
It was very apparent that higher dosages were associated with significantly greater tooth length. In all three dose comparisons, p-values were extremely small (at most 1.9e-05) and no confidence interval contained zero.