library(datasets); data("ToothGrowth")
library(ggplot2)
# Tooth length against dose with a linear fit, one panel per supplement type
g <- ggplot(ToothGrowth, aes(dose, len)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(. ~ supp)
g <- g + labs(x = "Dose in milligrams/day", y = "Tooth length", title = "Supplement type (OJ or VC)")
g
library(dplyr)
by_supp <- group_by(ToothGrowth, supp)
s <- summarize(by_supp, mean(len))
s
## Source: local data frame [2 x 2]
##
##     supp mean(len)
##   (fctr)     (dbl)
## 1     OJ  20.66333
## 2     VC  16.96333
The mean tooth length for orange juice (OJ) is 20.66, which is greater than the mean tooth length for vitamin C (VC), 16.96.
We now compute the 95% confidence interval for the difference in mean tooth length between OJ and VC.
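# Two-vector interface; recent R versions reject `paired` in the formula method
with(ToothGrowth, t.test(len[supp == "OJ"], len[supp == "VC"], paired = TRUE))$conf.int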
## [1] 1.408659 5.991341
## attr(,"conf.level")
## [1] 0.95
# Run the paired test once and reuse its confidence interval
ci <- with(ToothGrowth, t.test(len[supp == "OJ"], len[supp == "VC"], paired = TRUE))$conf.int
lower <- ci[1]
upper <- ci[2]
int <- paste0("(", lower, ", ", upper, ")")
The interval is (1.408659, 5.991341). Because the entire interval lies above zero, the difference is statistically significant at the 95% confidence level: the mean tooth length for OJ is greater than the mean tooth length for VC.
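To make the interval less of a black box, here is a minimal sketch that rebuilds it by hand from the paired differences, using the standard formula mean(d) ± t(0.975, n − 1) · sd(d) / sqrt(n). The names `oj`, `vc`, `d` and `manual_ci` are illustrative, and matching the two groups by row order is an assumption of this sketch.

```r
library(datasets)

# Split tooth length by supplement; rows are matched by position,
# which is the pairing assumed here (an assumption of this sketch).
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

d <- oj - vc                      # paired differences
n <- length(d)                    # number of pairs
se <- sd(d) / sqrt(n)             # standard error of the mean difference
tcrit <- qt(0.975, df = n - 1)    # two-sided 95% critical value

# Confidence interval for the mean paired difference
manual_ci <- mean(d) + c(-1, 1) * tcrit * se
manual_ci
```

If the pairing by row order is accepted, `manual_ci` should agree with the interval returned by `t.test(oj, vc, paired = TRUE)$conf.int`.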
This confidence interval assumes that the paired differences in tooth length between OJ and VC are approximately normally distributed.
We also assume that the OJ and VC measurements are paired, so the two groups are not independent.
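The normality assumption can be probed rather than taken on faith. The sketch below applies a normal Q-Q plot and a Shapiro-Wilk test to the paired differences, which are the quantities a paired t interval actually depends on. The names `d` and `sw` are illustrative, and the pairing by row order is again an assumption of the sketch.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
d <- oj - vc   # differences under the assumed row-order pairing

# Visual check: points close to the reference line suggest approximate normality
qqnorm(d)
qqline(d)

# Formal check: a small p-value would cast doubt on normality
sw <- shapiro.test(d)
sw$p.value
```

Neither check can prove normality; with 30 differences they mainly help to spot gross departures such as strong skew or outliers.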
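Because the conclusion rests on the pairing assumption, it may be worth seeing how sensitive the interval is to it. The following sketch compares the paired interval with a Welch interval, which treats the two supplement groups as independent samples with possibly unequal variances. It is offered only as a sensitivity check; `paired_ci` and `welch_ci` are illustrative names.

```r
library(datasets)

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Paired interval, as used in this analysis
paired_ci <- t.test(oj, vc, paired = TRUE)$conf.int

# Welch interval: treats the two groups as independent samples
welch_ci <- t.test(oj, vc, paired = FALSE, var.equal = FALSE)$conf.int

# Side-by-side comparison of lower and upper bounds
rbind(paired = paired_ci, welch = welch_ci)
```

Both intervals are centred on the same difference in means; only their widths differ, so comparing the two rows shows how much of the result depends on treating the data as paired.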