The data:

strain <- data.frame(A = c(9,27,22,30,16), B = c(3,12,7,15,12),
                     C = c(30,47,50,52,26), D = c(44,38,37,49,40))
strain1 <- data.frame(fac = c(rep("A",5),rep("B",5),rep("C",5),rep("D",5)),
                      resp = c(strain$A,strain$B,strain$C,strain$D))
head(strain1)
##   fac resp
## 1   A    9
## 2   A   27
## 3   A   22
## 4   A   30
## 5   A   16
## 6   B    3
  1. In this data set, the factor is the fac variable or column with four levels, A, B, C, and D. The response is the resp variable or column.

  2. The cell means model: \[ Y_{ij}=\mu_{j}+e_{ij}\] Here \(Y_{ij}\) is the \(i\)th response in group \(j\) (A, B, C, or D), and \(\mu_{j}\) is the mean of group \(j\). The model assumes that the error term \(e_{ij}\) follows a normal distribution with a mean of zero and a constant variance.
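
As a minimal sketch of the cell means model, one way to fit it is `lm()` with the intercept removed, so that each coefficient estimates one group mean. The name `cell_fit` is introduced here only for illustration, and the data frame is rebuilt so that the block runs on its own.

```r
# Rebuild the long-format data so this block is self-contained.
strain1 <- data.frame(
  fac  = rep(c("A", "B", "C", "D"), each = 5),
  resp = c(9, 27, 22, 30, 16,   3, 12, 7, 15, 12,
           30, 47, 50, 52, 26,  44, 38, 37, 49, 40)
)

# Dropping the intercept (0 + fac) gives one coefficient per group,
# which corresponds to the cell means parametrization.
cell_fit <- lm(resp ~ 0 + fac, data = strain1)  # cell_fit: illustrative name
mu_hat <- coef(cell_fit)

# The least-squares estimates should coincide with the sample group means.
group_means <- tapply(strain1$resp, strain1$fac, mean)
print(mu_hat)
print(group_means)
```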

  3. The treatment effect model: \[ Y_{ij} = \mu + \alpha_{j} + e_{ij} \] Here \(\mu\) is the overall mean, \(\alpha_{j}\) is the treatment effect of strain \(j\) (A, B, C, or D), and \(e_{ij}\) is the error term.
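
The treatment effects can be estimated by hand as a quick illustration. This sketch assumes the usual sum-to-zero constraint on the effects, which the text does not state; R's default treatment contrasts use a different constraint (the first level as baseline), so `coef(aov(...))` would report different numbers for the same fitted model. The names `mu_hat` and `alpha_hat` are illustrative.

```r
# Rebuild the wide-format data so this block is self-contained.
strain <- data.frame(A = c(9, 27, 22, 30, 16), B = c(3, 12, 7, 15, 12),
                     C = c(30, 47, 50, 52, 26), D = c(44, 38, 37, 49, 40))

group_means <- sapply(strain, mean)

# With equal group sizes, the mean of the group means equals the grand mean.
mu_hat <- mean(group_means)

# Estimated effect of each strain: group mean minus overall mean
# (assumes the sum-to-zero constraint on the effects).
alpha_hat <- group_means - mu_hat

print(mu_hat)
print(alpha_hat)
print(sum(alpha_hat))  # should be zero up to floating-point error
```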

  4. The ANOVA

result <- aov(resp ~ fac, data = strain1)
summary(result)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## fac          3   3683  1227.8   18.55 1.85e-05 ***
## Residuals   16   1059    66.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The null hypothesis in (2) is that the true means of all four groups are equal. The null hypothesis in (3) is that all treatment effects are zero. \[ H_{0}: \mu_{A}=\mu_{B}=\mu_{C}=\mu_{D} \] \[ H_{0}: \alpha_{A}=\alpha_{B}=\alpha_{C}=\alpha_{D}=0 \] The test statistic is F = 18.55 on 3 and 16 degrees of freedom, and the p-value (1.85e-05) is less than 0.05, so we reject the null hypothesis at the 0.05 significance level. We conclude that at least one of the group means differs from the others.
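
To see where the F value in the table comes from, the sums of squares can be recomputed directly. This is only a sketch of the standard one-way decomposition; the variable names (`ss_between`, `ss_within`, and so on) are illustrative and do not appear in the `aov()` output.

```r
# Rebuild the long-format data so this block is self-contained.
strain1 <- data.frame(
  fac  = rep(c("A", "B", "C", "D"), each = 5),
  resp = c(9, 27, 22, 30, 16,   3, 12, 7, 15, 12,
           30, 47, 50, 52, 26,  44, 38, 37, 49, 40)
)

k <- length(unique(strain1$fac))   # number of groups
n_total <- nrow(strain1)           # total number of observations

grand_mean <- mean(strain1$resp)
fitted_means <- ave(strain1$resp, strain1$fac)  # group mean for every row

# Between-group (treatment) and within-group (residual) sums of squares.
ss_between <- sum((fitted_means - grand_mean)^2)
ss_within  <- sum((strain1$resp - fitted_means)^2)

# F statistic: ratio of the two mean squares.
f_stat  <- (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_value <- pf(f_stat, k - 1, n_total - k, lower.tail = FALSE)

print(c(ss_between = ss_between, ss_within = ss_within))
print(c(F = f_stat, p = p_value))
```

The two sums of squares should round to the 3683 and 1059 shown in the ANOVA table.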

  5. Post Hoc Analysis

TukeyHSD(result)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = resp ~ fac, data = strain1)
## 
## $fac
##      diff        lwr       upr     p adj
## B-A -11.0 -25.719662  3.719662 0.1833539
## C-A  20.2   5.480338 34.919662 0.0059536
## D-A  20.8   6.080338 35.519662 0.0046950
## C-B  31.2  16.480338 45.919662 0.0000879
## D-B  31.8  17.080338 46.519662 0.0000708
## D-C   0.6 -14.119662 15.319662 0.9994077

The post-hoc analysis with Tukey’s method shows that the differences C-A, D-A, C-B, and D-B are significant at the 0.05 family-wise level, while B-A and D-C are not.
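
The interval widths in the Tukey output can be checked by hand. The sketch below assumes the balanced design used here (five observations per strain) and computes the common half-width from the studentized range quantile; `margin` and `half_width` are illustrative names.

```r
# Rebuild the data and the fit so this block is self-contained.
strain1 <- data.frame(
  fac  = rep(c("A", "B", "C", "D"), each = 5),
  resp = c(9, 27, 22, 30, 16,   3, 12, 7, 15, 12,
           30, 47, 50, 52, 26,  44, 38, 37, 49, 40)
)
result <- aov(resp ~ fac, data = strain1)

n_per_group <- 5
mse <- deviance(result) / df.residual(result)  # residual mean square

# Tukey half-width for a balanced design: q * sqrt(MSE / n).
q_crit <- qtukey(0.95, nmeans = 4, df = df.residual(result))
margin <- q_crit * sqrt(mse / n_per_group)

# Compare with the half-widths implied by TukeyHSD().
tk <- TukeyHSD(result)$fac
half_width <- (tk[, "upr"] - tk[, "lwr"]) / 2
print(margin)
print(half_width)

# A pair differs significantly when |diff| exceeds the margin.
significant <- abs(tk[, "diff"]) > margin
print(names(which(significant)))
```

Every interval in the table has the same half-width because all groups have the same size.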

sapply(strain, mean) # the means of all four
##    A    B    C    D 
## 20.8  9.8 41.0 41.6

The mean of D is the highest, but it is not significantly higher than the mean of C. So there is no single mean that is significantly higher than all the others.
