Starting off

This tutorial covers much of Quinn & Keough (2024) ch. 7 & 10 on crossed and nested ANOVAs.

Remember to set your working directory before you start. You also need to load today’s packages.

library(effectsize)
library(emmeans)
library(ggplot2)
library(ggthemes)
library(GAD)
library(HH)
library(multcomp)

Crossed ANOVA (example with two fixed factors)

In an agricultural experiment to determine the effect of diet supplementation on animal growth, animals were randomly assigned a diet (barley, oats, or wheat), to which was randomly added a nutritional supplement (AgriMore, SuperGain, SuperSupp, or the unsupplemented control). The design was fully crossed with four replicates per treatment. In total, 48 animals were used (3 diets \(\times\) 4 supplements \(\times\) 4 replicates).

Read and check the data

First, read and check the data.

d <- read.csv("Data/animalgrowth.csv")
d$supplement <- factor(d$supplement,
                       levels = c("control", "agrimore", "supergain", "supersupp"),
                       labels = c("Control", "AgriMore", "SuperGain", "SuperSupp"))
d$diet <- factor(d$diet,
                 levels = c("barley", "oats", "wheat"),
                 labels = c("Barley", "Oats", "Wheat"))
replications(gain ~ supplement * diet, data = d)
##      supplement            diet supplement:diet 
##              12              16               4

This experiment is balanced: 4 replicates for each combination of supplement and diet, adding up to a total of 12 replicates for each of the four supplements and 16 replicates for each of the three diets.

The interaction2wt() function (in the HH package) can be used to create an (ugly) interaction plot:

interaction2wt(gain ~ supplement * diet, data = d)

There is no obvious sign of an interaction. You should explore the data further with ggplot2 before analysis.
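
For example, a quick faceted boxplot with the raw data overlaid (a minimal sketch in the same style as the plots later in this tutorial; adjust labels and aesthetics to taste):

ggplot(data = d, aes(x = supplement, y = gain)) +
  facet_wrap(~ diet) +                                  # One panel per diet
  geom_boxplot(outliers = FALSE) +                      # Boxplots without outlier points
  geom_jitter(height = 0, width = 0.1, alpha = 0.3) +   # Overlay the raw data
  labs(x = "Supplement", y = "Mass gain") +
  theme_few()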

Analysis in GAD

Let’s run the ANOVA in GAD:

di <- as.fixed(d$diet)
sup <- as.fixed(d$supplement)
gain.lm <- lm(gain ~ sup * di, data = d)
gad(gain.lm, quasi.f = TRUE)
## $f.versus
##        Numerator Denominator Num.Df Den.Df
## sup          sup   Residuals      3     36
## di            di   Residuals      2     36
## sup:di    sup:di   Residuals      6     36
## 
## $anova
## Analysis of Variance Table: Quasi F-ratios
## 
## All of your terms has an appropriated F-ratio, Quasi F-test is not necessary
## 
## Response: gain
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## sup        3  91.832  30.611  17.807 2.967e-07 ***
## di         2 287.022 143.511  83.483 3.019e-14 ***
## sup:di     6   3.415   0.569   0.331    0.9161    
## Residuals 36  61.886   1.719                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can check that the model structure is correct by getting the table of expected mean squares using the estimates() function.

estimates(gain.lm, quasi.f = TRUE)
## $tm
##        sup di n
## sup      0  3 4
## di       4  0 4
## sup:di   0  0 4
## Res      1  1 1
## 
## $mse
##           Mean square estimates   
## sup       "1*Residuals + 12*sup"  
## di        "1*Residuals + 16*di"   
## sup:di    "1*Residuals + 4*sup:di"
## Residuals "1*Residuals"           
## 
## $f.versus
##        Numerator Denominator
## sup          sup   Residuals
## di            di   Residuals
## sup:di    sup:di   Residuals

Plot the residuals:

opar <- par(mfrow = c(2, 2)) # Create a 2x2 plotting grid
plot(gain.lm, which = 5)     # Plot residuals for each factor level
plot(gain.lm, which = 1)     # Plot residuals vs. fitted values
plot(gain.lm, which = 2)     # Plot a Q-Q plot
plot(residuals(gain.lm) ~ I(1:48), main="Residuals vs Exp. Unit", font.main=1, data=d)
par(opar)                    # Return to original plotting settings (i.e., a 1x1 grid)

It looks pretty good to me.

Now calculate the effect sizes.

omega_squared(gain.lm)
## # Effect Size for ANOVA (Type I)
## 
## Parameter | Omega2 (partial) |       95% CI
## -------------------------------------------
## sup       |             0.51 | [0.29, 1.00]
## di        |             0.77 | [0.66, 1.00]
## sup:di    |             0.00 | [0.00, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].

There is no significant interaction between diet and supplement (\(F_{6,36}=0.331, p=0.916\)). Nutritional supplement has a significant effect on growth (\(F_{3,36}=17.807, p<0.001\)), as does diet (\(F_{2,36}=83.483, p<0.001\)).

If there had been a significant interaction, we could use Tukey’s test on the interaction term. (Results not shown.)

emm <- emmeans(gain.lm, ~di:sup)
pairs(emm, adjust = "tukey")
(tukey_cld <- cld(emm))

Because there is no significant interaction, we can run Tukey’s test on the main effects of diet and supplement separately. Let’s assume that we are primarily interested in which supplement is most effective. We can conduct Tukey’s test.3

emm <- emmeans(gain.lm, ~sup)
pairs(emm, adjust = "tukey")
##  contrast              estimate    SE df t.ratio p.value
##  Control - AgriMore      -2.696 0.535 36  -5.036  0.0001
##  Control - SuperGain      0.685 0.535 36   1.280  0.5815
##  Control - SuperSupp     -1.968 0.535 36  -3.677  0.0041
##  AgriMore - SuperGain     3.381 0.535 36   6.316  <.0001
##  AgriMore - SuperSupp     0.728 0.535 36   1.359  0.5325
##  SuperGain - SuperSupp   -2.653 0.535 36  -4.957  0.0001
## 
## Results are averaged over the levels of: di 
## P value adjustment: tukey method for comparing a family of 4 estimates

Now create the “compact letter display” that shows which groups differ significantly.

(tukey_cld <- cld(emm))
##  sup       emmean    SE df lower.CL upper.CL .group
##  SuperGain   19.7 0.378 36     18.9     20.5  1    
##  Control     20.4 0.378 36     19.6     21.2  1    
##  SuperSupp   22.4 0.378 36     21.6     23.1   2   
##  AgriMore    23.1 0.378 36     22.3     23.9   2   
## 
## Results are averaged over the levels of: di 
## Confidence level used: 0.95 
## P value adjustment: tukey method for comparing a family of 4 estimates 
## significance level used: alpha = 0.05 
## NOTE: If two or more means share the same grouping symbol,
##       then we cannot show them to be different.
##       But we also did not show them to be the same.

SuperSupp and AgriMore are both significantly better than the control and SuperGain. SuperGain does not differ significantly from the control.

One way to plot the results is to plot the tukey_cld object as we did last week. This shows means, 95% confidence intervals (estimated using the pooled standard deviation) and the raw data.

plot(tukey_cld) +
  geom_jitter(aes(x = gain, y = supplement, colour = diet, shape = diet),
              height = 0.1, width = 0, alpha = 0.3, data = d) +
  labs(x = "Mass gain (kg?)", y = "Supplement", colour = "Diet", shape = "Diet") +
  coord_flip() +
  scale_colour_colorblind() +
  theme_few()

To add “a”, “b”, “ab” (etc.) indicators to the plot (corresponding to “1”, “2”, “12” (etc.) in the “compact letter display”), the easiest way is to edit the figure in Adobe Illustrator, Inkscape, etc.
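
If you would rather stay in R, one rough alternative is to rebuild the plot from the tukey_cld table itself and add the group labels with geom_text(). This is a sketch, not a polished figure; it relies on the column names shown in the cld() output above.

cld_df <- as.data.frame(tukey_cld)                           # Columns: sup, emmean, SE, df, lower.CL, upper.CL, .group
cld_df$letters <- chartr("12", "ab", trimws(cld_df$.group))  # Map "1"/"2"/"12" to "a"/"b"/"ab"

ggplot(data = cld_df, aes(x = sup, y = emmean)) +
  geom_pointrange(aes(ymin = lower.CL, ymax = upper.CL)) +      # EMMs with 95% CIs
  geom_text(aes(y = upper.CL, label = letters), vjust = -0.8) + # Letters just above each CI
  labs(x = "Supplement", y = "Mass gain (kg?)") +
  theme_few()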

Alternatively, we can use the emmip() function to create an interaction plot showing 95% CIs.

emmip(gain.lm, sup ~ di, CIs = TRUE) +
  labs(x = "Diet", y = "Mass gain (kg?)", colour = "Supplement") +
  scale_colour_colorblind() +
  theme_few()

Nested ANOVA (example with random effect nested in fixed effect)

Read and check the data

For comparison, we are using a modified version of the SSRI dataset we used last week. The example dataset is SSRI2.csv. All of the values are the same, but we have added a new column called “Type” indicating “brand-name” vs. “generic” drugs. The names of the drugs have also been relabelled to match the way they are often named in a nested ANOVA from experimental data. “Type” is the top level of our hierarchical ANOVA; SSRI is nested in Type.

  • SSRI1 from last week is now: Type = brand-name; SSRI = 1
  • SSRI2 from last week is now: Type = brand-name; SSRI = 2
  • SSRI3 from last week is now: Type = generic; SSRI = 1
  • SSRI4 from last week is now: Type = generic; SSRI = 2

Drug type is clearly a fixed effect. It’s unrealistic, but let’s temporarily assume the actual SSRIs should be treated as a random effect (i.e. perhaps they represent a random sample of all brand-name and generic drugs available).

# Read the data and convert variables as appropriate
d <- read.csv("Data/SSRI2.csv")
d$Type <- factor(d$Type,
                 levels = c("brandname", "generic"),
                 labels = c("Brand-name", "Generic"))
d$SSRI <- factor(d$SSRI)

Check for balance:

# Type/SSRI indicates nesting; it expands to Type + SSRI %in% Type
replications(Ki ~ Type/SSRI, data = d)
##      Type Type:SSRI 
##        30        15

There are 30 replicates for each level of Type (i.e., brand-name vs. generic), comprising 15 replicates of each of the two SSRIs within each Type. (Why is this not a good design for testing this hypothesis?)

Now plot the data. We can use facet_wrap() to split up the data by the top level in our hierarchy.

ggplot(data = d, aes(x = SSRI, y = Ki)) +
  facet_wrap(~ Type) + 
  geom_boxplot(outliers = FALSE) + 
  geom_jitter(height = 0, width = 0.1, alpha = 0.3) +
  theme_few() +
  labs(y = expression(italic(K[i])*" (nM)"))

At first glance, the data look roughly normal and have roughly equal variance in each group. Note that if a nested factor is random, then we should be considering the variation of the SSRI means within each of “brand-name” and “generic”. We will examine the assumptions further with the residual plots.
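
A quick way to look at that is to compute the SSRI-within-Type means directly (a minimal sketch using base R):

# Mean for each SSRI within each Type; the test of Type is judged against
# the spread of these subgroup means, not against the raw replicates
aggregate(Ki ~ SSRI + Type, data = d, mean)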

Analysis in GAD

To use GAD, you first define your variables as random or fixed. You then fit the model using the lm() function and use gad() to summarize the model.

t <- as.fixed(d$Type)
s <- as.random(d$SSRI)
SSRI.lm <- lm(Ki ~ t + s %in% t, data = d)
(SSRI.tab <- gad(SSRI.lm, quasi.f = TRUE))
## $f.versus
##     Numerator Denominator Num.Df Den.Df
## t           t         t:s      1      2
## t:s       t:s   Residuals      2     56
## 
## $anova
## Analysis of Variance Table: Quasi F-ratios
## 
## All of your terms has an appropriated F-ratio, Quasi F-test is not necessary
## 
## Response: Ki
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## t          1   8217  8216.7  1.1381 0.39779  
## t:s        2  14440  7219.9  4.6583 0.01345 *
## Residuals 56  86794  1549.9                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

To get the table of expected mean squares and the variance components, we can use GAD’s comp.var() function.4

comp.var(SSRI.lm, anova.tab = SSRI.tab)
## $mse
##           Mean square estimates        
## t         "1*Residuals + 15*t:s + 30*t"
## t:s       "1*Residuals + 15*t:s"       
## Residuals "1*Residuals"                
## 
## $comp.var
##             Type Estimate Square root Percentage
## t          Fixed   33.225       5.764      8.926
## t:s       Random  378.002      19.442     30.108
## Residuals Random 1549.899      39.369     60.966

We should plot the residuals to assess whether the data meet the assumptions of the test.

opar <- par(mfrow = c(2, 2)) # Create a 2x2 plotting grid
plot(SSRI.lm, which = 5)     # Plot residuals for each factor level
plot(SSRI.lm, which = 1)     # Plot residuals vs. fitted values
plot(SSRI.lm, which = 2)     # Plot a Q-Q plot
plot(residuals(SSRI.lm) ~ I(1:60), main="Residuals vs Exp. Unit", font.main=1, data=d)
par(opar)                    # Return to original plotting settings (i.e., a 1x1 grid)

They look good.

The omega_squared() function does not work correctly on random- or mixed-effects models fitted with aov() or lm(). If we want an effect size for Type, we need to calculate it manually or refit the model using lmer().
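
For example, using the variance-component estimates from comp.var() above, a rough manual calculation treats the fixed-effect component as a share of the total. This is an omega-squared-style ratio, not an exact partial omega squared, so treat it as approximate.

# Variance-component estimates copied from the comp.var() output above
vc <- c(Type = 33.225, Type.SSRI = 378.002, Residuals = 1549.899)
vc["Type"] / sum(vc)   # ~0.017: Type explains very little of the total variance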

There is a significant difference among the SSRIs within Type (\(F_{2,56}=4.658, p=0.013\)), but no significant difference between Types (\(F_{1,2}=1.138, p=0.398\)).

To plot the results of an ANOVA with a nested random effect, you may wish to plot a single data point representing the mean of each subgroup, with a 95% CI calculated from the pooled standard deviation at the subgroup level.

ag <- aggregate(Ki ~ SSRI %in% Type, data = d, mean)  # Mean for each SSRI
ag2 <- aggregate(Ki ~ Type, data = d, mean)           # Mean for each Type
SD.subgrp <- sqrt((7219.9-1549.9)/15) # (subgrp MS - error MS) / num. reps per subgrp
ag2$CI <- qt(1 - 0.05/2, df = 2) * SD.subgrp/sqrt(2)

ggplot(data = ag, aes(x = Type, y = Ki)) +
  geom_jitter(height = 0, width = 0.1, alpha = 0.3) +
  geom_pointrange(data = ag2, aes(y = Ki, ymin = Ki-CI, ymax = Ki+CI)) +        
  labs(x = "Drug Type", y = expression(italic(K[i])*" (nM)")) +
  theme_few()

Our 95% CIs for the effect of brand-name vs. generic drugs are massive! That is because we put too much sampling effort at the wrong level of the experiment (replicates within each SSRI). Instead, we should have invested those resources in sampling more than two brand-name and two generic SSRIs.
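
You can see part of the problem in the t multiplier alone: the CI for Type is based on only 2 subgroup degrees of freedom. Purely as an illustration (the second line assumes a hypothetical design with six SSRIs per Type):

qt(1 - 0.05/2, df = 2)    # ~4.30 with two SSRIs per Type (2 df for Type:SSRI)
qt(1 - 0.05/2, df = 10)   # ~2.23 with six SSRIs per Type (hypothetical design)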

Analysis in aov()

To conduct this ANOVA in aov(), we need to specify an Error() term. For complex models, these can become quite complicated. In this case, Type + Error(Type:SSRI) indicates that SSRI is a random effect nested in the fixed effect Type.

op <- options(contrasts = c("contr.sum", "contr.poly")) # Set sum-to-zero contrasts

SSRI.aov <- aov(Ki ~ Type + Error(Type:SSRI), data = d) # Fit the model
summary(SSRI.aov)                                       # Create the ANOVA table
pf(7220/1550, df1 = 2, df2 = 56, lower.tail = FALSE)    # p-value for random effect
plot(lm(SSRI.aov))                                      # Plot the residuals

emmeans(SSRI.aov, ~Type)                                # Show EMMs
emmip(SSRI.aov, ~Type, CIs = TRUE) +                    # Plot the effect
  labs(x = "Drug Type",
       y = expression(italic(K[i])*" (nM)")) +
  theme_few()

options(op)                                             # Reset contrasts

If you run this analysis, you will find that the results from aov() are identical to those from GAD.

Analysis in lmer()

Later in the course, we will discuss the use of lmer(). For now, I will just show a simple workflow. lmer() is not great for this dataset, because our random effect (SSRI nested in Type) does not have \(\ge5\) levels.

In lmer(), random effects are indicated with parentheses. For instance, (1|Type:SSRI) indicates that we want a “random intercept” for each SSRI (nested in Type).

library(car)                                            # Load packages
library(lme4)
library(predictmeans)

op <- options(contrasts = c("contr.sum", "contr.poly")) # Set sum-to-zero contrasts

SSRI.lmer <- lmer(Ki ~ Type + (1|Type:SSRI), data = d)  # Fit the model
(SSRI.tab <- Anova(SSRI.lmer, test = "F", type = 3))    # Create the ANOVA table
omega_squared(SSRI.tab)                                 # Show effect sizes
S(SSRI.lmer)                                            # Create the coefficient table
confint(SSRI.lmer, oldNames = FALSE)                    # Show confidence intervals
residplot(SSRI.lmer, group = "Type")                    # Check the residuals

emmeans(SSRI.lmer, ~Type)                               # Show EMMs
emmip(SSRI.lmer, ~Type, CIs = TRUE) +                   # Plot the effect
  labs(x = "Drug Type",
       y = expression(italic(K[i])*" (nM)")) +
  theme_few()

options(op)                                             # Reset contrasts

If you run this analysis, you will find that:

  1. The F-ratio and p-value for the fixed effect (Type) are exactly the same as when we used GAD.
  2. There is no p-value provided for the nested random effect (i.e., Type:SSRI) because the use of p-values for random effects in linear mixed models is controversial and the creators of the lmer() function do not calculate p-values for random effects.5 Instead, they recommend examining the confidence interval for each random effect to see whether it overlaps 0. (An approximate alternative using ranova() is sketched after this list.)
  3. The variance components are exactly the same as when we used GAD.
  4. Within each Type (i.e., brand-name and generic), the SSRIs appear more similar to each other when using lmer() than when using GAD. This is because REML shrinks predicted values of random effects towards the overall group mean. This shrinkage produces more accurate predictions on new data, particularly when small sample sizes were used for fitting the model.
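
If you do want an approximate test of the random effect, footnote 5 mentions ranova() from the lmerTest package. A minimal sketch (refitting with lmerTest loaded; remember that these p-values tend to be conservative):

library(lmerTest)                                        # Provides ranova()
SSRI.lmer2 <- lmer(Ki ~ Type + (1|Type:SSRI), data = d)  # Refit with lmerTest loaded
ranova(SSRI.lmer2)                                       # Likelihood-ratio test of dropping (1|Type:SSRI)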

This week’s task

ANOVA 1

The data for this task are stored as toothgrowth.csv. The purpose of this experiment on guinea pigs was to determine the effects of vitamin C supplementation on tooth growth. Researchers were interested in both how the vitamin C was delivered to the guinea pigs, and in the dose of vitamin C. In total, 60 guinea pigs were used:

  • Each animal received vitamin C by one of two delivery methods: orange juice (“OJ”) or ascorbic acid (“AA”).
  • Each delivery method was tested using three dose levels of vitamin C (“low (0.5 mg/day)” = “L”, “medium (1.0 mg/day)” = “M”, and “high (2 mg/day)” = “H”).
  • Each delivery/dose combination was tested on 10 guinea pigs.

Perform appropriate analyses to address your research question.

ANOVA 2

The data for this task are stored as rats.csv. The purpose of this experiment on rats was to test whether any of three specific treatments affects glycogen levels. The experiment was conducted as follows, giving 36 glycogen measurements in total:

  • Two rats were randomly assigned to each of the three treatments.
  • For each rat, three liver samples were taken.
  • Each liver sample was assayed twice for glycogen.

Perform appropriate analyses to address your research question.


  1. aov() requires an Error() term to specify nested structures; this can be difficult to determine for complex ANOVAs.↩︎

  2. When using REML, at least 10 levels for each random effect is strongly preferred, with a practical minimum of ~5.↩︎

  3. If you are only interested in comparing each supplement to the control, then you will have greater power to detect differences if you use Dunnett’s test (contrast(emm, "dunnett", ref = "Control")) instead of Tukey’s test (pairs(emm, adjust = "tukey")).↩︎

  4. In GAD 2.0, there is a bug in this function. It calculates percentages using standard deviations instead of variances (and it includes the contribution of fixed effects). To obtain more accurate estimates of variance components (with confidence intervals), functions in the VCA package can be used.↩︎

  5. If you want to obtain approximate p-values for random effects, the easiest way is using the ranova() function from the lmerTest package; however, these p-values are typically “conservative” (i.e., higher than they theoretically should be). To calculate more accurate p-values, see Fox & Weisberg (2019), pg. 354.↩︎