Now in the second portion of the class, I am going to analyze the ToothGrowth data in the R datasets package using confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose.
data("ToothGrowth")
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
There are two levels in supp.
library(ggplot2)
ggplot(ToothGrowth, aes(x = supp, y = len)) + geom_boxplot()
The mean and standard deviation of len for each level are computed below.
library(dplyr)
g_df <- as_tibble(ToothGrowth) # as_tibble() replaces the deprecated tbl_df()
g_df_sm <- g_df %>% group_by(supp) %>% summarize(mean = mean(len), sd = sd(len))
print(g_df_sm)
The average for OJ is 20.66 with a standard deviation of 6.60, while the average for VC is 16.96 with a standard deviation of 8.26. Consider the 95% confidence interval for the difference of the means. I assume a constant variance across the two groups, and I compute the interval with the subtraction in this order: (OJ - VC).
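Written out, the equal-variance interval described above has the standard pooled form shown here. The symbols (sample means, sample standard deviations, and group sizes with OJ and VC subscripts, and the pooled variance S_p^2) are notation introduced for this sketch; they correspond to m_oj, sd_oj, n_oj, m_vc, sd_vc, n_vc, and spsq in the code.

```latex
S_p^2 = \frac{(n_{OJ} - 1)\, s_{OJ}^2 + (n_{VC} - 1)\, s_{VC}^2}{n_{OJ} + n_{VC} - 2}

(\bar{x}_{OJ} - \bar{x}_{VC}) \;\pm\; t_{0.975,\; n_{OJ} + n_{VC} - 2} \; S_p \sqrt{\frac{1}{n_{OJ}} + \frac{1}{n_{VC}}}
```

With both groups of size 30, the pooled variance reduces to the simple average of the two sample variances.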
n_oj <- 30 # count of OJ
n_vc <- 30 # count of VC
m_oj <- 20.66 # mean of OJ
sd_oj <- 6.60 # standard deviation of OJ
m_vc <- 16.96 # mean of VC
sd_vc <- 8.26 # standard deviation of VC
# find pooled variance
spsq <- ( (n_oj - 1) * sd_oj^2 + (n_vc - 1) * sd_vc^2) / (n_oj + n_vc - 2)
# find confidence intervals
(m_oj - m_vc) + c(-1, 1) * qt(0.975, df=(n_oj + n_vc - 2)) * sqrt(spsq) * sqrt(1/n_oj + 1/n_vc)
## [1] -0.1640165 7.5640165
The interval for (OJ - VC) contains zero, so at the 5% level we cannot conclude that mean tooth length differs between OJ and VC.
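As a cross-check on the hand calculation, the same equal-variance interval can be requested directly from t.test() on the raw data. This is only a sketch: it uses the unrounded group means and standard deviations, so the endpoints may differ slightly in the later decimals from the interval built from the rounded summary values. The name ci_supp is introduced here for illustration.

```r
data("ToothGrowth")

# Two-sample t interval for len by supp, assuming equal variances.
# supp has levels OJ and VC (in that order), so the difference is OJ - VC.
ci_supp <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)$conf.int
print(ci_supp)

# TRUE when the interval contains zero
ci_supp[1] < 0 && ci_supp[2] > 0
```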
Now, there are three levels in dose. Suppose 0.5 is A, 1.0 is B and 2.0 is C.
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) + geom_boxplot()
I will check whether there are differences between A and B, A and C, and B and C. Consider the 95% confidence interval for the difference of the means in each pair. I assume a constant variance.
A <- ToothGrowth[ToothGrowth$dose == 0.5,]$len
B <- ToothGrowth[ToothGrowth$dose == 1.0,]$len
C <- ToothGrowth[ToothGrowth$dose == 2.0,]$len
First, compare A and B. (A - B)
t.test(A, B, paired = FALSE, var.equal = TRUE)$conf.int
## [1] -11.983748 -6.276252
## attr(,"conf.level")
## [1] 0.95
The interval for (A - B) is entirely below zero, so mean tooth length at dose 0.5 is significantly lower than at dose 1.0.
Second, compare A and C. (A - C)
t.test(A, C, paired = FALSE, var.equal = TRUE)$conf.int
## [1] -18.15352 -12.83648
## attr(,"conf.level")
## [1] 0.95
The interval for (A - C) is entirely below zero, so mean tooth length at dose 0.5 is significantly lower than at dose 2.0.
Last, compare B and C. (B - C)
t.test(B, C, paired = FALSE, var.equal = TRUE)$conf.int
## [1] -8.994387 -3.735613
## attr(,"conf.level")
## [1] 0.95
The interval for (B - C) is entirely below zero, so mean tooth length at dose 1.0 is significantly lower than at dose 2.0.
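The three pairwise comparisons can also be collected into one table, which makes it easier to see them side by side. The sketch below loops over every pair of dose levels with combn() and repeats the same equal-variance t.test() call; the names dose_levels, dose_pairs, ci, and dose_ci are hypothetical helpers introduced here.

```r
data("ToothGrowth")

dose_levels <- sort(unique(ToothGrowth$dose)) # 0.5, 1.0, 2.0
dose_pairs <- combn(dose_levels, 2)           # one column per pair of doses

# For each pair, compute the 95% interval for (lower dose - higher dose)
ci <- t(apply(dose_pairs, 2, function(p) {
  x <- ToothGrowth$len[ToothGrowth$dose == p[1]]
  y <- ToothGrowth$len[ToothGrowth$dose == p[2]]
  as.numeric(t.test(x, y, paired = FALSE, var.equal = TRUE)$conf.int)
}))

dose_ci <- data.frame(
  low_dose  = dose_pairs[1, ],
  high_dose = dose_pairs[2, ],
  lower     = ci[, 1],
  upper     = ci[, 2]
)
print(dose_ci)
```

Each row should match the corresponding interval printed above, since the same test is run on the same data.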
So, there is no statistically significant difference in len by supp at the 5% level, but dose does affect len: each increase in dose is associated with greater tooth growth.