The data table “ToothGrowth” consists of three variables: a dependent variable named ‘len’ and two independent variables named ‘dose’ and ‘supp’.
A summary of the data by dose and supp:
library(plyr)
ddply(ToothGrowth, .(supp, dose), function(x) c(summary(x$len), Std.dev. = round(sd(x$len),2)))
## supp dose Min. 1st Qu. Median Mean 3rd Qu. Max. Std.dev.
## 1 OJ 0.5 8.2 9.70 12.25 13.23 16.18 21.5 4.46
## 2 OJ 1.0 14.5 20.30 23.45 22.70 25.65 27.3 3.91
## 3 OJ 2.0 22.4 24.58 25.95 26.06 27.08 30.9 2.66
## 4 VC 0.5 4.2 5.95 7.15 7.98 10.90 11.5 2.75
## 5 VC 1.0 13.6 15.27 16.50 16.77 17.30 22.5 2.52
## 6 VC 2.0 18.5 23.38 25.95 26.14 28.80 33.9 4.80
Histograms of the data by dose and supp:
library(ggplot2)
ggplot(ToothGrowth, aes(len)) + stat_bin(binwidth=2) + facet_grid(dose~supp)
The difference between doses is so clear that it does not need to be tested, but are the two supp groups significantly different from each other when the dose is held constant? Assuming a constant variance, and that the groups are not paired (we don’t know what the variables mean), the 95% confidence interval for the difference in mean len between the two supp groups at each dose is:
# The formula interface of t.test() treats the two groups as unpaired by default
t_df <- ddply(ToothGrowth, ~dose, function(x) t.test(len ~ supp, var.equal = TRUE, data = x)$conf.int)
colnames(t_df)[2:3] <- c("95% CI lower", "95% CI upper")
t_df
## dose 95% CI lower 95% CI upper
## 1 0.5 1.770262 8.729738
## 2 1.0 2.840692 9.019308
## 3 2.0 -3.722999 3.562999
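To see where these intervals come from, the pooled-variance interval for a single dose can be worked out by hand. The following is a minimal sketch for dose 0.5 only; the names `low`, `oj`, `vc`, `sp2`, `se`, `t_crit` and `ci` are chosen here for illustration and do not appear in the analysis above.

```r
# Subset to the 0.5 dose and split len by supplement type
low <- subset(ToothGrowth, dose == 0.5)
oj <- low$len[low$supp == "OJ"]
vc <- low$len[low$supp == "VC"]

n1 <- length(oj)
n2 <- length(vc)

# Pooled variance, valid under the constant-variance assumption
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)

# Standard error of the difference in means
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided 95% critical value with n1 + n2 - 2 degrees of freedom
t_crit <- qt(0.975, df = n1 + n2 - 2)

# Interval for mean(OJ) - mean(VC)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * t_crit * se
ci
```

The result should agree with the first row of the table above, since t.test with var.equal = TRUE uses the same pooled-variance formula.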
These confidence intervals tell us that, at the 5% significance level, mean len differs between the two supp groups (OJ minus VC) when dose is 0.5 or 1.0, but not when dose is 2.0, as the 95% confidence interval in the latter case includes zero.
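Because the conclusion rests on the constant-variance assumption, one possible sanity check is to repeat the intervals with Welch's t-test, which does not pool the variances. This is a sketch of such a check rather than part of the original analysis; the name `welch` is introduced here for illustration.

```r
# Welch intervals (no equal-variance assumption), one column per dose
welch <- sapply(split(ToothGrowth, ToothGrowth$dose), function(x) {
  t.test(len ~ supp, data = x, var.equal = FALSE)$conf.int
})
rownames(welch) <- c("95% CI lower", "95% CI upper")

# Transpose so that each row is a dose, matching the layout of t_df
t(welch)
```

If the Welch intervals lead to the same pattern (zero excluded at 0.5 and 1.0, included at 2.0), the conclusion does not hinge on the equal-variance assumption.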