The data:
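# Response values for each of the four strains (5 observations each)
strain <- data.frame(A = c(9, 27, 22, 30, 16), B = c(3, 12, 7, 15, 12),
                     C = c(30, 47, 50, 52, 26), D = c(44, 38, 37, 49, 40))
# Long format: one row per observation, strain label in fac, value in resp
strain1 <- data.frame(fac = factor(rep(c("A", "B", "C", "D"), each = 5)),
                      resp = c(strain$A, strain$B, strain$C, strain$D))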
head(strain1)
## fac resp
## 1 A 9
## 2 A 27
## 3 A 22
## 4 A 30
## 5 A 16
## 6 B 3
In this data set, the factor is the fac column, which has four levels (A, B, C, and D); the response is the resp column.
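The long-format strain1 table built by hand above can also be produced with base R's stack(), which returns the values in a column named values and the original column names as a factor named ind. A minimal sketch, renaming those columns to match strain1 (the name strain1_alt is only illustrative):

```r
# Wide data: one column per strain, as in strain above
strain <- data.frame(A = c(9, 27, 22, 30, 16), B = c(3, 12, 7, 15, 12),
                     C = c(30, 47, 50, 52, 26), D = c(44, 38, 37, 49, 40))

# stack() returns `values` (all responses) and `ind` (factor of source column names)
long <- stack(strain)
strain1_alt <- data.frame(fac = long$ind, resp = long$values)

str(strain1_alt)
levels(strain1_alt$fac)
```

Either construction gives a factor with levels A to D and the same 20 responses in the same order, which is the layout aov() expects.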
The cell means model: \[ Y_{ij}=\mu_{j}+e_{ij}\] Here \(Y_{ij}\) is the \(i\)th response in group \(j\) (A, B, C, or D) and \(\mu_{j}\) is the mean of group \(j\). The model assumes that the errors \(e_{ij}\) are independent and normally distributed with mean zero and constant variance.
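The normality and constant-variance assumptions can be probed informally on the fitted model. This is a minimal sketch using base R's shapiro.test() on the pooled residuals and bartlett.test() across the four groups; with only five observations per group both tests have little power, so their p-values are rough guides rather than proof that the assumptions hold. The last line confirms that the residuals of this model are each observation minus its group mean.

```r
strain1 <- data.frame(
  fac  = factor(rep(c("A", "B", "C", "D"), each = 5)),
  resp = c(9, 27, 22, 30, 16,  3, 12, 7, 15, 12,
           30, 47, 50, 52, 26, 44, 38, 37, 49, 40)
)

fit <- aov(resp ~ fac, data = strain1)
res <- residuals(fit)                              # estimates of e_ij

sw <- shapiro.test(res)                            # normality of residuals
bt <- bartlett.test(resp ~ fac, data = strain1)    # equal variances across groups

c(shapiro_p = sw$p.value, bartlett_p = bt$p.value)

# Residual = observation minus its group mean (ave() returns group means)
all.equal(unname(res), strain1$resp - ave(strain1$resp, strain1$fac))
```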
The treatment effect model: \[ Y_{ij} = \mu + \alpha_{j} + e_{ij} \] where \(\mu\) is the overall mean, \(\alpha_{j}\) is the treatment effect of strain \(j\) (A, B, C, or D), and \(e_{ij}\) is the error term, with the same assumptions as in the cell means model.
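The two models are different parameterizations of the same fit. A minimal sketch with lm(): dropping the intercept gives the cell means \(\mu_j\) directly, while sum-to-zero contrasts ("contr.sum") give an intercept equal to \(\mu\) and effects \(\alpha_j\) that add up to zero. The sum-to-zero constraint is an assumption chosen here to make the effects identifiable; other constraints give different but equivalent coefficients.

```r
strain1 <- data.frame(
  fac  = factor(rep(c("A", "B", "C", "D"), each = 5)),
  resp = c(9, 27, 22, 30, 16,  3, 12, 7, 15, 12,
           30, 47, 50, 52, 26, 44, 38, 37, 49, 40)
)

# Cell means model: no intercept, so each coefficient is a group mean mu_j
cell_fit <- lm(resp ~ fac - 1, data = strain1)
mu_j <- coef(cell_fit)

# Treatment effect model with sum-to-zero effects:
# intercept = mu, coefficients = alpha_A, alpha_B, alpha_C
eff_fit <- lm(resp ~ fac, data = strain1, contrasts = list(fac = "contr.sum"))
mu    <- unname(coef(eff_fit)[1])
alpha <- c(coef(eff_fit)[-1], -sum(coef(eff_fit)[-1]))  # alpha_D = -(sum of the others)
names(alpha) <- levels(strain1$fac)

mu_j
mu
alpha

# Both parameterizations produce identical fitted values
all.equal(fitted(cell_fit), fitted(eff_fit))
```

Because the design is balanced, \(\mu\) here is also the grand mean of all 20 observations, and each \(\alpha_j = \mu_j - \mu\).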
The ANOVA
result <- aov(resp ~ fac, data = strain1)
summary(result)
## Df Sum Sq Mean Sq F value Pr(>F)
## fac 3 3683 1227.8 18.55 1.85e-05 ***
## Residuals 16 1059 66.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The null hypothesis for the cell means model is that the true means of all four groups are equal; the null hypothesis for the treatment effect model is that all treatment effects are zero. \[ H_{0}: \mu_{A}=\mu_{B}=\mu_{C}=\mu_{D} \] \[ H_{0}: \alpha_{A}=\alpha_{B}=\alpha_{C}=\alpha_{D}=0 \] The test statistic is F = 18.55 on 3 and 16 degrees of freedom, with a p-value of 1.85e-05. Since the p-value is less than 0.05, we reject the null hypothesis at the 0.05 significance level and conclude that at least one group mean differs from the others.
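To see where the numbers in the ANOVA table come from, here is a minimal sketch that recomputes the sums of squares, the F statistic, and its p-value directly from their definitions for the same data. The variable names are illustrative only; the results should agree with summary(result) up to rounding.

```r
# Responses for strains A, B, C, D (5 observations each), as in strain1
resp  <- c(9, 27, 22, 30, 16,  3, 12, 7, 15, 12,
           30, 47, 50, 52, 26, 44, 38, 37, 49, 40)
group <- factor(rep(c("A", "B", "C", "D"), each = 5))

k <- nlevels(group)                       # number of groups
N <- length(resp)                         # total number of observations
group_means <- tapply(resp, group, mean)  # sample mean of each group
n_j <- tapply(resp, group, length)        # sample size of each group
grand_mean <- mean(resp)                  # overall mean

# Between-group (treatment) and within-group (residual) sums of squares
ss_between <- sum(n_j * (group_means - grand_mean)^2)
ss_within  <- sum((resp - ave(resp, group))^2)  # ave() gives each obs its group mean

# Mean squares, F statistic, and upper-tail p-value from F(k - 1, N - k)
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
F_stat     <- ms_between / ms_within
p_value    <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

c(SS_between = ss_between, SS_within = ss_within, F = F_stat, p = p_value)
```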
TukeyHSD(result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = resp ~ fac, data = strain1)
##
## $fac
## diff lwr upr p adj
## B-A -11.0 -25.719662 3.719662 0.1833539
## C-A 20.2 5.480338 34.919662 0.0059536
## D-A 20.8 6.080338 35.519662 0.0046950
## C-B 31.2 16.480338 45.919662 0.0000879
## D-B 31.8 17.080338 46.519662 0.0000708
## D-C 0.6 -14.119662 15.319662 0.9994077
The post-hoc analysis with Tukey’s method shows that the differences C-A, D-A, C-B, and D-B are significant at the 0.05 family-wise level (adjusted p < 0.05), while B-A and D-C are not.
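Tukey's intervals can be reproduced by hand, which makes clear what the lwr, upr, and p adj columns mean. This sketch assumes the balanced design used here (n = 5 per group), so every pair shares the standard error sqrt(MSE/n); it uses base R's qtukey() and ptukey() for the studentized range distribution. It is an illustration of the calculation, not a replacement for TukeyHSD().

```r
resp  <- c(9, 27, 22, 30, 16,  3, 12, 7, 15, 12,
           30, 47, 50, 52, 26, 44, 38, 37, 49, 40)
group <- factor(rep(c("A", "B", "C", "D"), each = 5))

k <- nlevels(group)
N <- length(resp)
n <- 5                                               # per-group size (balanced design assumed)
mse <- sum((resp - ave(resp, group))^2) / (N - k)    # residual mean square
se  <- sqrt(mse / n)                                 # scale for the studentized range
margin <- qtukey(0.95, nmeans = k, df = N - k) * se  # half-width of each 95% interval

means <- sapply(split(resp, group), mean)
pairs <- combn(levels(group), 2)                     # all 6 pairs, same order as TukeyHSD
diffs <- means[pairs[2, ]] - means[pairs[1, ]]
p_adj <- ptukey(abs(diffs) / se, nmeans = k, df = N - k, lower.tail = FALSE)

tukey_tab <- data.frame(comparison = paste(pairs[2, ], pairs[1, ], sep = "-"),
                        diff  = unname(diffs),
                        lwr   = unname(diffs) - margin,
                        upr   = unname(diffs) + margin,
                        p_adj = unname(p_adj),
                        stringsAsFactors = FALSE)
tukey_tab
```

An interval that excludes zero corresponds to an adjusted p-value below 0.05, which is why the four significant comparisons are exactly those whose intervals lie entirely above zero.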
sapply(strain, mean) # sample mean of each strain
## A B C D
## 20.8 9.8 41.0 41.6
The mean of D is the highest, but it is not significantly higher than the mean of C (adjusted p = 0.999). So no single mean is significantly higher than all the others.
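The reasoning above (find the largest mean, then check whether any other group is statistically indistinguishable from it) can be sketched directly on the TukeyHSD() result. This assumes the level labels contain no hyphens, since TukeyHSD() names each comparison as "level2-level1"; the variable names are illustrative.

```r
strain1 <- data.frame(
  fac  = factor(rep(c("A", "B", "C", "D"), each = 5)),
  resp = c(9, 27, 22, 30, 16,  3, 12, 7, 15, 12,
           30, 47, 50, 52, 26, 44, 38, 37, 49, 40)
)

fit <- aov(resp ~ fac, data = strain1)
tk  <- TukeyHSD(fit)$fac                      # columns: diff, lwr, upr, p adj

means <- sapply(split(strain1$resp, strain1$fac), mean)
top   <- names(which.max(means))              # group with the largest sample mean

# Row names look like "D-A"; split them to find comparisons involving `top`
parts <- strsplit(rownames(tk), "-", fixed = TRUE)
involves_top <- vapply(parts, function(p) top %in% p, logical(1))
not_sig <- involves_top & tk[, "p adj"] >= 0.05

# Groups whose means are not significantly different from the top group's
ties <- setdiff(unlist(parts[not_sig]), top)
tk[involves_top, "p adj"]
ties
```

If ties is empty, a single group stands out as highest; here it contains C, matching the conclusion that D is not significantly higher than C.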