Introduction

This is a reference document on two-way ANOVA. It is more generalizable to Factorial Anova. Factorial ANOVA is perhaps the simplest model that has structure. It is often called a factorial treatment structure. If we have all possible combinations of factors, then the model is crossed. The model is below:

\(Y_{ijk} = \mu + \alpha_{i} + \beta_{j} + (\alpha*\beta)_{ij} + e_{ijk}\)

where i and j represent the levels of two factors and there are k = 1,…,n observations. Note the interaction term.

Recall the key assumptions of ANOVA, which can be succicently stated as having errors that are independent and normally distributed with mean 0 and constant variance. (That says quite a lot.) We also impose “sum-to-zero” constraints on our parameters.

Our degrees of freedom are i-1, j-1 and (i-1)(j-1) for \(\alpha\), \(\beta\) and their interaction respectively.

Hypothetical Experiment

3 individuals each are put on 3 diets (A-C) in 2 countries (USA and UK). Thus, there are 3 replications and 18 observations total.

Input =("
Diet    Country  Weight_change
 A       USA      0.120
 A       USA      0.125
 A       USA      0.112
 A       UK       0.052
 A       UK       0.055
 A       UK       0.044
 B       USA      0.096
 B       USA      0.100
 B       USA      0.089
 B       UK       0.025
 B       UK       0.029
 B       UK       0.019
 C       USA      0.149
 C       USA      0.150
 C       USA      0.142
 C       UK       0.077
 C       UK       0.080
 C       UK       0.066
")
Data = read.table(textConnection(Input),header=TRUE)
Data$Country = factor(Data$Country, levels=unique(Data$Country))
rm(Input)

We can get a count of observations data using xtabs. You can see we have a nice, balanced, orthogonal design.

We can look at group means and errors using aggregate.

xtabs( ~ Diet + Country, data = Data)
##     Country
## Diet USA UK
##    A   3  3
##    B   3  3
##    C   3  3
aggregate(Weight_change ~ Diet + Country, Data, mean)
##   Diet Country Weight_change
## 1    A     USA    0.11900000
## 2    B     USA    0.09500000
## 3    C     USA    0.14700000
## 4    A      UK    0.05033333
## 5    B      UK    0.02433333
## 6    C      UK    0.07433333
aggregate(Weight_change ~ Diet + Country, Data, sd)
##   Diet Country Weight_change
## 1    A     USA   0.006557439
## 2    B     USA   0.005567764
## 3    C     USA   0.004358899
## 4    A      UK   0.005686241
## 5    B      UK   0.005033223
## 6    C      UK   0.007371115

Alternative Hypothesis/Power Analysis

Most analysis are just trying to prove there is an effect. If we care about the size of the effect, we should think about our alternative hypothesis and we should probably stop right here and do a power analysis.

library(pwr2)
pwr.2way(a = 2, b = 3, alpha = 0.05, size.A = 9, size.B = 6, f.A = .4, f.B = .55)
## 
##      Balanced two-way analysis of variance power calculation 
## 
##               a = 2
##               b = 3
##             n.A = 9
##             n.B = 6
##       sig.level = 0.05
##         power.A = 0.8218751
##         power.B = 0.8113444
##           power = 0.8113444
## 
## NOTE: power is the minimum power among two factors

In order to have acceptable power, we need to assume that the country effect (f.A) is .4 and the diet effect (f.B) is .55.

We can see below that neither of these are possible. So we have an underpowered study.

aggregate(Weight_change ~ Diet, Data, mean)
##   Diet Weight_change
## 1    A    0.08466667
## 2    B    0.05966667
## 3    C    0.11066667
aggregate(Weight_change ~ Country, Data, mean)
##   Country Weight_change
## 1     USA    0.12033333
## 2      UK    0.04966667

Interaction Effects

The order of analysis starts with asking whether the interaction term is significant. If it is, then we use the full model. You can use the interaction.plot() function from the stats package, use ggplot2, but I like the plot produced by the phia package because it references the model object and is therefore easy to program. The plot also gives the interaction terms viewed both ways- the interaction between diet and country and the interaction between country and diet.

library(phia)
model = lm(Weight_change ~ Country + Diet + Country:Diet, data = Data)
IM = interactionMeans(model)
plot(IM)

Clearly, the lines don’t cross so the interaction is not significant. We see the highest weight change in the USA on diet C. This can be verified with a hypothesis test/significance test.

library(car)
Anova(model, type = "II")
## Anova Table (Type II tests)
## 
## Response: Weight_change
##                Sum Sq Df  F value    Pr(>F)    
## Country      0.022472  1 657.7171 7.523e-12 ***
## Diet         0.007804  2 114.2049 1.547e-08 ***
## Country:Diet 0.000012  2   0.1756    0.8411    
## Residuals    0.000410 12                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Or you can use aov()

summary(aov(Weight_change ~ Country + Diet + Country:Diet, data = Data))
##              Df   Sum Sq  Mean Sq F value   Pr(>F)    
## Country       1 0.022472 0.022472 657.717 7.52e-12 ***
## Diet          2 0.007804 0.003902 114.205 1.55e-08 ***
## Country:Diet  2 0.000012 0.000006   0.176    0.841    
## Residuals    12 0.000410 0.000034                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Thus, we drop the interaction term and re-run the analysis with the main effects model.

Main Effects Model

The main effects model is an additive model.

model = lm(Weight_change ~ Country + Diet, data = Data)
Anova(model)
## Anova Table (Type II tests)
## 
## Response: Weight_change
##             Sum Sq Df F value    Pr(>F)    
## Country   0.022472  1  745.52 1.527e-13 ***
## Diet      0.007804  2  129.45 9.351e-10 ***
## Residuals 0.000422 14                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see both factors are significant. We may want to know their effects for particular levels. We can see this by looking at the model’s coefficients.

coef(model)
## (Intercept)   CountryUK       DietB       DietC 
##  0.12000000 -0.07066667 -0.02500000  0.02600000

Of course, we have reference levels.

Post-hoc testing with lsmeans

The Tukey method however deals with all possible pairwise comparisons, while a the same time correcting for the increase in type 1 error that you would otherwise have when doing multiple contrasts.

library(lsmeans)
lsmeans(model, pairwise ~ Country, adjust = "tukey", details = TRUE)
## $lsmeans
##  Country lsmean      SE df lower.CL upper.CL
##  USA     0.1203 0.00183 14   0.1164   0.1243
##  UK      0.0497 0.00183 14   0.0457   0.0536
## 
## Results are averaged over the levels of: Diet 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate      SE df t.ratio p.value
##  USA - UK   0.0707 0.00259 14 27.304  <.0001 
## 
## Results are averaged over the levels of: Diet
lsmeans(model, pairwise ~ Diet, adjust = "tukey", details = TRUE)
## $lsmeans
##  Diet lsmean      SE df lower.CL upper.CL
##  A    0.0847 0.00224 14   0.0799   0.0895
##  B    0.0597 0.00224 14   0.0549   0.0645
##  C    0.1107 0.00224 14   0.1059   0.1155
## 
## Results are averaged over the levels of: Country 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate      SE df t.ratio p.value
##  A - B       0.025 0.00317 14   7.887 <.0001 
##  A - C      -0.026 0.00317 14  -8.202 <.0001 
##  B - C      -0.051 0.00317 14 -16.089 <.0001 
## 
## Results are averaged over the levels of: Country 
## P value adjustment: tukey method for comparing a family of 3 estimates

We can compare groups using CLD

marginal <- lsmeans(model, pairwise ~ Diet, adjust = "tukey")
CLD(marginal, alpha = 0.05, Letters = letters, adjust = "tukey")
## Warning in CLD.emm_list(marginal, alpha = 0.05, Letters = letters, adjust
## = "tukey"): `CLD()` called with a list of 2 objects. Only the first one was
## used.
##  Diet lsmean      SE df lower.CL upper.CL .group
##  B    0.0597 0.00224 14   0.0536   0.0657  a    
##  A    0.0847 0.00224 14   0.0786   0.0907   b   
##  C    0.1107 0.00224 14   0.1046   0.1167    c  
## 
## Results are averaged over the levels of: Country 
## Confidence level used: 0.95 
## Conf-level adjustment: sidak method for 3 estimates 
## P value adjustment: tukey method for comparing a family of 3 estimates 
## significance level used: alpha = 0.05
obj <- CLD(marginal, alpha = 0.05, Letters = letters, adjust = "tukey")
## Warning in CLD.emm_list(marginal, alpha = 0.05, Letters = letters, adjust
## = "tukey"): `CLD()` called with a list of 2 objects. Only the first one was
## used.
plot(obj)

Alternatively, we can do Fisher’s test. With Tukey’s procedure it is more difficult to find a difference than with Fisher’s protected LSD. Fisher’s LSD method does not offer full control of the experiment wise type I error rate, which Tukey’s does. Note that unlike Tukey’s test it puts diets A and B in the same group.

library(agricolae) #Fisher
print(LSD.test(Data$Weight_change, Data$Diet, 14, 0.000422, p.adj="none", alpha = 0.05, group=TRUE)) #df errors, ms errors
## $statistics
##    MSerror Df  Mean       CV  t.value        LSD
##   0.000422 14 0.085 24.16781 2.144787 0.02543781
## 
## $parameters
##         test p.ajusted    name.t ntr alpha
##   Fisher-LSD      none Data$Diet   3  0.05
## 
## $means
##   Data$Weight_change        std r        LCL        UCL   Min   Max
## A         0.08466667 0.03800877 6 0.06667942 0.10265391 0.044 0.125
## B         0.05966667 0.03899573 6 0.04167942 0.07765391 0.019 0.100
## C         0.11066667 0.04016798 6 0.09267942 0.12865391 0.066 0.150
##       Q25    Q50     Q75
## A 0.05275 0.0835 0.11800
## B 0.02600 0.0590 0.09425
## C 0.07775 0.1110 0.14725
## 
## $comparison
## NULL
## 
## $groups
##   Data$Weight_change groups
## C         0.11066667      a
## A         0.08466667      b
## B         0.05966667      b
## 
## attr(,"class")
## [1] "group"
out <- LSD.test(Data$Weight_change, Data$Diet, 14, 0.000422, p.adj="none", alpha = 0.05, group=TRUE)
plot(out, variation = "SD")

Checking Assumptions

You can check normality visually or by using shapiro-wilk.

plot(model, which = 2)

shapiro.test(model$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model$residuals
## W = 0.88252, p-value = 0.02879

You can check constant variance visually or use Levene’s test.

plot(model, which = 1, add.smooth = FALSE)

library(car)
leveneTest(y = Data$Weight_change, group = factor(Data$Diet))
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  2  0.2135 0.8101
##       15

Note that these results replicate in minitab. #References https://rcompanion.org/handbook/G_09.html https://newonlinecourses.science.psu.edu/stat503/node/26/ https://stat.ethz.ch/~meier/teaching/anova/factorial-treatment-structure.html https://www.researchgate.net/post/What_is_the_difference_between_Tukeys_Post_Hoc_Test_and_Students_t-test https://passel.unl.edu/pages/informationmodule.php?idinformationmodule=1130447277&topicorder=29&maxto=34&minto=1