This tutorial covers much of Quinn & Keough (2024) ch. 7 & 10 on crossed and nested ANOVAs.
Remember to set your working directory before you start. You also need to load today’s packages.
library(effectsize)
library(emmeans)
library(ggplot2)
library(ggthemes)
library(GAD)
library(HH)
library(multcomp)
By default, R uses Type I sums of squares (SS) in ANOVA tables (e.g.,
produced by aov() + summary()). Type I SS is
generally appropriate for one-way ANOVAs or balanced designs but can be
misleading in unbalanced models. Type II and Type III SS are better
suited for unbalanced designs and can be computed using the
Anova() function from the car package.
Type II vs. Type III SS is contentious. Type II SS is commonly
recommended in the R ecosystem, while Type III SS is often preferred by
users of other statistical software (e.g., SAS, Minitab). To match
output from these programs exactly, you must also use sum-to-zero
contrasts (contr.sum) for unordered factors.
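For example, here is a minimal sketch, assuming a data frame dat with a response y and two crossed factors A and B (all hypothetical names, not from today's datasets):
library(car)                                             # provides Anova()
m <- lm(y ~ A * B, data = dat)                           # ordinary fixed-effects fit
Anova(m, type = 2)                                       # Type II SS
op <- options(contrasts = c("contr.sum", "contr.poly"))  # sum-to-zero contrasts needed for Type III
m3 <- lm(y ~ A * B, data = dat)                          # refit under the new contrasts
Anova(m3, type = 3)                                      # Type III SS (matches SAS/Minitab output)
options(op)                                              # restore the default contrasts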
Therefore, in R, the most common ways to conduct complex ANOVAs are:
- aov() + summary() for (nearly) balanced fixed-, random-, and mixed-effects designs [1]
- lm() + Anova() for (potentially unbalanced) fixed-effects designs
- lmer() + Anova(test = "F") for (potentially unbalanced) random- and mixed-effects designs [2]

However, when the design is balanced, lm() +
gad() (in the GAD package) can be used to analyse any
fixed-, random-, or mixed-effects ANOVA using OLS. GAD can be quite
useful for figuring out whether you have specified your model correctly
even if other methods (e.g., aov(), lmer())
are used to conduct the analysis. GAD’s main benefits are that it reports the expected mean squares and the denominator used for each F-ratio, so you can verify that the model is specified correctly, and that it does not require you to work out an Error() term, as you would with aov().
A few more notes about the GAD package:
- When declaring factors with as.fixed() or as.random(), assign the result to a new object name rather than overwriting the original column: s <- as.random(d$SSRI) is acceptable, but SSRI <- as.random(d$SSRI) would cause GAD to fail.
- GAD uses the lm() function to fit models. In reality, the lm() function fits fixed-effects models only; only functions in the GAD package recognize random effects declared using as.random(). Therefore, all functions in R except those in the GAD package treat every term in a GAD model as a fixed effect. Failure to appreciate this can lead to incorrect standard errors, confidence intervals, and p-values if functions from other packages are used in downstream analyses of random- or mixed-effects models fitted with GAD. For such downstream analyses, you may need to refit the model in aov() or lmer(). (A minimal workflow sketch follows below.)
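As an illustration, here is a minimal sketch of the GAD workflow (the data frame dat, the response y, and the factors A and B are hypothetical placeholders):
a <- as.fixed(dat$A)             # declare a fixed factor under a NEW object name
b <- as.random(dat$B)            # declare a random factor under a NEW object name
m <- lm(y ~ a * b, data = dat)   # lm() itself still treats every term as fixed
gad(m)                           # gad() applies the as.fixed()/as.random() declarations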
In an agricultural experiment to determine the effect of diet supplementation on animal growth, animals were randomly assigned a diet (barley, oats, or wheat), to which a nutritional supplement (AgriMore, SuperGain, SuperSupp, or the unsupplemented control) was randomly added. The design was fully crossed with four replicates per treatment. In total, 48 animals were used (3 diets \(\times\) 4 supplements \(\times\) 4 replicates).
First, read and check the data.
d <- read.csv("Data/animalgrowth.csv")
d$supplement <- factor(d$supplement,
levels = c("control", "agrimore", "supergain", "supersupp"),
labels = c("Control", "AgriMore", "SuperGain", "SuperSupp"))
d$diet <- factor(d$diet,
levels = c("barley", "oats", "wheat"),
labels = c("Barley", "Oats", "Wheat"))
replications(gain ~ supplement * diet, data = d)
## supplement diet supplement:diet
## 12 16 4
This experiment is balanced: 4 replicates for each combination of supplement and diet, adding up to a total of 12 replicates for each of the four supplements and 16 replicates for each of the three diets.
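An equivalent way to check the balance is a simple cross-tabulation (a quick sketch):
xtabs(~ supplement + diet, data = d)   # every cell should contain 4 in a balanced design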
The interaction2wt() function (in the HH
package) can be used to create an (ugly) interaction plot:
interaction2wt(gain ~ supplement * diet, data = d)
There is no obvious interaction: the mean profiles are roughly parallel. You should explore the data more with ggplot before analysis; one possibility is sketched below.
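For example (a sketch; the particular geoms and aesthetic choices are just one option):
ggplot(d, aes(x = supplement, y = gain, colour = diet, shape = diet)) +
  geom_jitter(height = 0, width = 0.1, alpha = 0.4) +            # raw data
  stat_summary(aes(group = diet), fun = mean, geom = "line") +   # mean profile for each diet
  stat_summary(fun = mean, geom = "point", size = 3) +           # cell means
  scale_colour_colorblind() +
  theme_few() +
  labs(x = "Supplement", y = "Mass gain (kg?)", colour = "Diet", shape = "Diet")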
Let’s run the ANOVA in GAD:
di <- as.fixed(d$diet)                    # declare diet as a fixed factor
sup <- as.fixed(d$supplement)             # declare supplement as a fixed factor
gain.lm <- lm(gain ~ sup * di, data = d)  # fit the fully crossed model
gad(gain.lm, quasi.f = TRUE)
## $f.versus
## Numerator Denominator Num.Df Den.Df
## sup sup Residuals 3 36
## di di Residuals 2 36
## sup:di sup:di Residuals 6 36
##
## $anova
## Analysis of Variance Table: Quasi F-ratios
##
## All of your terms has an appropriated F-ratio, Quasi F-test is not necessary
##
## Response: gain
## Df Sum Sq Mean Sq F value Pr(>F)
## sup 3 91.832 30.611 17.807 2.967e-07 ***
## di 2 287.022 143.511 83.483 3.019e-14 ***
## sup:di 6 3.415 0.569 0.331 0.9161
## Residuals 36 61.886 1.719
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can check that the model structure is correct by getting the table
of expected mean squares using the estimates()
function.
estimates(gain.lm, quasi.f = TRUE)
## $tm
## sup di n
## sup 0 3 4
## di 4 0 4
## sup:di 0 0 4
## Res 1 1 1
##
## $mse
## Mean square estimates
## sup "1*Residuals + 12*sup"
## di "1*Residuals + 16*di"
## sup:di "1*Residuals + 4*sup:di"
## Residuals "1*Residuals"
##
## $f.versus
## Numerator Denominator
## sup sup Residuals
## di di Residuals
## sup:di sup:di Residuals
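As a quick arithmetic check, each F-ratio in the ANOVA table above is simply the numerator mean square divided by the mean square of its denominator (here, the residuals):
30.611 / 1.719    # F for sup:    ~17.8
143.511 / 1.719   # F for di:     ~83.5
0.569 / 1.719     # F for sup:di: ~0.33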
Plot the residuals:
opar <- par(mfrow = c(2, 2)) # Create a 2x2 plotting grid
plot(gain.lm, which = 5) # Plot residuals for each factor level
plot(gain.lm, which = 1) # Plot residuals vs. fitted values
plot(gain.lm, which = 2) # Plot a Q-Q plot
plot(residuals(gain.lm) ~ I(1:48), main="Residuals vs Exp. Unit", font.main=1, data=d)
par(opar) # Return to original plotting settings (i.e., a 1x1 grid)
It looks pretty good to me.
Now calculate the effect sizes.
omega_squared(gain.lm)
## # Effect Size for ANOVA (Type I)
##
## Parameter | Omega2 (partial) | 95% CI
## -------------------------------------------
## sup | 0.51 | [0.29, 1.00]
## di | 0.77 | [0.66, 1.00]
## sup:di | 0.00 | [0.00, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
There is no significant interaction between diet and supplement. Nutritional supplement has a significant effect on growth (\(F_{3,36}=17.807, p<0.001\)), as does diet (\(F_{2,36}=83.483, p<0.001\)).
If there had been a significant interaction, we could use Tukey’s test on the interaction term. (Results not shown.)
emm <- emmeans(gain.lm, ~di:sup)
pairs(emm, adjust = "tukey")
(tukey_cld <- cld(emm))
But there is no significant interaction, so we can run Tukey’s test on diet and/or supplement individually. Let’s assume that we are primarily interested in which supplement is most effective. We can conduct Tukey’s test [3].
emm <- emmeans(gain.lm, ~sup)
pairs(emm, adjust = "tukey")
## contrast estimate SE df t.ratio p.value
## Control - AgriMore -2.696 0.535 36 -5.036 0.0001
## Control - SuperGain 0.685 0.535 36 1.280 0.5815
## Control - SuperSupp -1.968 0.535 36 -3.677 0.0041
## AgriMore - SuperGain 3.381 0.535 36 6.316 <.0001
## AgriMore - SuperSupp 0.728 0.535 36 1.359 0.5325
## SuperGain - SuperSupp -2.653 0.535 36 -4.957 0.0001
##
## Results are averaged over the levels of: di
## P value adjustment: tukey method for comparing a family of 4 estimates
Now create the “compact letter display” that shows which groups differ significantly.
(tukey_cld <- cld(emm))
## sup emmean SE df lower.CL upper.CL .group
## SuperGain 19.7 0.378 36 18.9 20.5 1
## Control 20.4 0.378 36 19.6 21.2 1
## SuperSupp 22.4 0.378 36 21.6 23.1 2
## AgriMore 23.1 0.378 36 22.3 23.9 2
##
## Results are averaged over the levels of: di
## Confidence level used: 0.95
## P value adjustment: tukey method for comparing a family of 4 estimates
## significance level used: alpha = 0.05
## NOTE: If two or more means share the same grouping symbol,
## then we cannot show them to be different.
## But we also did not show them to be the same.
SuperSupp and AgriMore are both significantly better than the control and SuperGain. SuperGain does not differ significantly from the control.
One way to plot the results is to plot the tukey_cld
object as we did last week. This shows means, 95% confidence intervals
(estimated using the pooled standard deviation) and the raw data.
plot(tukey_cld) +
geom_jitter(aes(x = gain, y = supplement, colour = diet, shape = diet),
height = 0.1, width = 0, alpha = 0.3, data = d) +
labs(x = "Mass gain (kg?)", y = "Supplement", colour = "Diet", shape = "Diet") +
coord_flip() +
scale_colour_colorblind() +
theme_few()
To add “a”, “b”, “ab” (etc.) indicators to the plot (corresponding to “1”, “2”, “12” (etc.) in the “compact letter display”), the easiest way is to edit the figure in Adobe Illustrator, Inkscape, etc.
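Alternatively (a sketch, not part of the original workflow), the letters can be generated in R by passing the Letters argument to cld() and added to the plot with geom_text():
(tukey_letters <- cld(emm, Letters = letters))   # letters instead of numbers
plot(tukey_letters) +
  geom_text(aes(x = upper.CL, y = sup, label = trimws(.group)),
            hjust = -0.3, data = as.data.frame(tukey_letters)) +
  theme_few()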
Otherwise, we may wish to use the emmip() function to
create an interaction plot showing 95% CIs.
emmip(gain.lm, sup ~ di, CIs = TRUE) +
labs(x = "Diet", y = "Mass gain (kg?)", colour = "Supplement") +
scale_colour_colorblind() +
theme_few()
For comparison, we are using a modified version of the SSRI dataset we used last week. The example dataset is SSRI2.csv. All of the values are the same, but we have added a new column called “Type” indicating “brand-name” vs. “generic” drugs. The names of the drugs have also been relabelled to match the way they are often named in a nested ANOVA from experimental data. “Type” is the top level of our hierarchical ANOVA; SSRI is nested in Type.
Drug type is clearly a fixed effect. It’s unrealistic, but let’s temporarily assume the actual SSRIs should be treated as a random effect (i.e. perhaps they represent a random sample of all brand-name and generic drugs available).
# Read the data and convert variables as appropriate
d <- read.csv("Data/SSRI2.csv")
d$Type <- factor(d$Type,
levels = c("brandname", "generic"),
labels = c("Brand-name", "Generic"))
d$SSRI <- factor(d$SSRI)
Check for balance:
# Type/SSRI indicates nesting; it expands to Type + SSRI %in% Type
replications(Ki ~ Type/SSRI, data = d)
## Type Type:SSRI
## 30 15
There are 30 replicates for each level of Type (i.e., brand-name vs. generic), including 15 replicates of each SSRI within each Type. (Why is this not a good design for testing this hypothesis?)
Now plot the data. We can use facet_wrap() to split up
the data by the top level in our hierarchy.
ggplot(data = d, aes(x = SSRI, y = Ki)) +
facet_wrap(~ Type) +
geom_boxplot(outliers = FALSE) +
geom_jitter(height = 0, width = 0.1, alpha = 0.3) +
theme_few() +
labs(y = expression(italic(K[i])*" (nM)"))
At first glance, the data look roughly normal and have roughly equal variance in each group. Note that if a nested factor is random, then we should be considering the variance of the SSRI means within each of “brand-name” and “generic”. We will examine the assumptions further with the residual plots.
To use GAD, you first define your variables as random or fixed. You
then fit the model using the lm() function and use
gad() to summarize the model.
t <- as.fixed(d$Type)                       # Type is a fixed factor
s <- as.random(d$SSRI)                      # SSRI is a random factor
SSRI.lm <- lm(Ki ~ t + s %in% t, data = d)  # s %in% t nests SSRI within Type
(SSRI.tab <- gad(SSRI.lm, quasi.f = TRUE))
## $f.versus
## Numerator Denominator Num.Df Den.Df
## t t t:s 1 2
## t:s t:s Residuals 2 56
##
## $anova
## Analysis of Variance Table: Quasi F-ratios
##
## All of your terms has an appropriated F-ratio, Quasi F-test is not necessary
##
## Response: Ki
## Df Sum Sq Mean Sq F value Pr(>F)
## t 1 8217 8216.7 1.1381 0.39779
## t:s 2 14440 7219.9 4.6583 0.01345 *
## Residuals 56 86794 1549.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To get the table of expected mean squares and the variance
components, we can use GAD’s comp.var() function [4].
comp.var(SSRI.lm, anova.tab = SSRI.tab)
## $mse
## Mean square estimates
## t "1*Residuals + 15*t:s + 30*t"
## t:s "1*Residuals + 15*t:s"
## Residuals "1*Residuals"
##
## $comp.var
## Type Estimate Square root Percentage
## t Fixed 33.225 5.764 8.926
## t:s Random 378.002 19.442 30.108
## Residuals Random 1549.899 39.369 60.966
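As a quick check on the $comp.var output, the variance component for the nested random effect can be recovered by hand from the expected mean squares above:
(7219.9 - 1549.9) / 15        # estimated variance component for t:s (~378)
sqrt((7219.9 - 1549.9) / 15)  # its square root (~19.4), matching comp.var()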
We should plot the residuals to assess whether the data meet the assumptions of the test.
opar <- par(mfrow = c(2, 2)) # Create a 2x2 plotting grid
plot(SSRI.lm, which = 5) # Plot residuals for each factor level
plot(SSRI.lm, which = 1) # Plot residuals vs. fitted values
plot(SSRI.lm, which = 2) # Plot a Q-Q plot
plot(residuals(SSRI.lm) ~ I(1:60), main="Residuals vs Exp. Unit", font.main=1, data=d)
par(opar) # Return to original plotting settings (i.e., a 1x1 grid)
They look good.
The omega_squared() function does not work correctly on
random- or mixed-effects models fitted with aov() or
lm(). If we want an effect size for Type, we need to
calculate it manually or refit the model using lmer().
There is a significant difference among the SSRIs within Type (\(F_{2,56}=4.658, p=0.013\)), but no significant difference between Types (\(F_{1,2}=1.138, p=0.398\)).
To plot the results of an ANOVA with a nested random effect, you may wish to plot a single data point representing the mean of each subgroup, with a 95% CI calculated from the pooled standard deviation at the subgroup level.
ag <- aggregate(Ki ~ SSRI %in% Type, data = d, mean) # Mean for each SSRI
ag2 <- aggregate(Ki ~ Type, data = d, mean) # Mean for each Type
SD.subgrp <- sqrt((7219.9-1549.9)/15) # sqrt((subgrp MS - error MS) / num. reps per subgrp)
ag2$CI <- qt(1 - 0.05/2, df = 2) * SD.subgrp/sqrt(2) # t crit (df = 2) x SE of a Type mean (2 SSRIs per Type)
ggplot(data = ag, aes(x = Type, y = Ki)) +
geom_jitter(height = 0, width = 0.1, alpha = 0.3) +
geom_pointrange(data = ag2, aes(y = Ki, ymin = Ki-CI, ymax = Ki+CI)) +
labs(x = "Drug Type", y = expression(italic(K[i])*" (nM)")) +
theme_few()
Our 95% CIs for the effect of brand-name vs. generic drugs are massive! That is because we did too much sampling at the wrong level of the experiment (replicates within each SSRI). Instead, we should have invested those resources in sampling more than two brand-name and two generic SSRIs.
aov()
To conduct this ANOVA in aov(), we need to specify an
Error() term. For complex models, these can become quite
complicated. In this case, Type + Error(Type:SSRI)
indicates that SSRI is a random effect nested in the fixed effect
Type.
op <- options(contrasts = c("contr.sum", "contr.poly")) # Set sum-to-zero contrasts
SSRI.aov <- aov(Ki ~ Type + Error(Type:SSRI), data = d) # Fit the model
summary(SSRI.aov) # Create the ANOVA table
pf(7220/1550, df1 = 2, df2 = 56, lower.tail = FALSE) # p-value for random effect
plot(lm(SSRI.aov)) # Plot the residuals
emmeans(SSRI.aov, ~Type) # Show EMMs
emmip(SSRI.aov, ~Type, CIs = TRUE) + # Plot the effect
labs(x = "Drug Type",
y = expression(italic(K[i])*" (nM)")) +
theme_few()
options(op) # Reset contrasts
If you run this analysis, you will find that the results from
aov() are identical to those from GAD.
lmer()
Later in the course, we will discuss the use of lmer().
For now, I will just show a simple workflow. lmer() is not
great for this dataset, because our random effect (SSRI nested in Type)
does not have \(\ge5\) levels.
In lmer(), random effects are indicated with
parentheses. For instance, (1|Type:SSRI) indicates that we
want a “random intercept” for each SSRI (nested in Type).
library(car) # Load packages
library(lme4)
library(predictmeans)
op <- options(contrasts = c("contr.sum", "contr.poly")) # Set sum-to-zero contrasts
SSRI.lmer <- lmer(Ki ~ Type + (1|Type:SSRI), data = d) # Fit the model
(SSRI.tab <- Anova(SSRI.lmer, test = "F", type = 3)) # Create the ANOVA table
omega_squared(SSRI.tab) # Show effect sizes
S(SSRI.lmer) # Create the coefficient table
confint(SSRI.lmer, oldNames = FALSE) # Show confidence intervals
residplot(SSRI.lmer, group = "Type") # Check the residuals
emmeans(SSRI.lmer, ~Type) # Show EMMs
emmip(SSRI.lmer, ~Type, CIs = TRUE) + # Plot the effect
labs(x = "Drug Type",
y = expression(italic(K[i])*" (nM)")) +
theme_few()
options(op) # Reset contrasts
If you run this analysis, you will find that:
- The authors of the lmer() function do not calculate p-values for random effects [5]. Instead, they recommend examining the confidence interval for each random effect to see whether it overlaps 0.
- The predicted values for the random effects sit closer to the overall mean when using lmer() than when using GAD. This is because REML shrinks predicted values of random effects towards the overall group mean. This shrinkage produces more accurate predictions on new data, particularly when small sample sizes were used for fitting the model (a sketch of this comparison follows below).
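To see this shrinkage, you can compare the raw SSRI means with the conditional modes (“BLUPs”) extracted from the fitted model. A sketch, assuming the SSRI.lmer object created above:
aggregate(Ki ~ Type + SSRI, data = d, mean)  # raw mean for each SSRI
fixef(SSRI.lmer)                             # fixed-effect estimates (sum-to-zero coding)
ranef(SSRI.lmer)$`Type:SSRI`                 # SSRI deviations, shrunk towards the Type means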
The data for this task are stored as toothgrowth.csv. The purpose of this experiment on guinea pigs was to determine the effects of vitamin C supplementation on tooth growth. Researchers were interested in both how the vitamin C was delivered to the guinea pigs and in the dose of vitamin C. In total, 60 guinea pigs were used:
Perform appropriate analyses to address your research question.
The data for this task are stored as rats.csv. The purpose of this experiment on rats was to test if increasing any of three specific treatments affects glycogen levels. The experiment was conducted as follows. In total, there are 36 glycogen measurements:
Perform appropriate analyses to address your research question.
[1] aov() requires an Error() term to specify nested structures; this can be difficult to determine for complex ANOVAs.
[2] When using REML, at least 10 levels for each random effect are strongly preferred, with a practical minimum of ~5.
[3] If you are only interested in comparing each supplement to the control, then you will have greater power to detect differences if you use Dunnett’s test (contrast(emm, "dunnett", ref = "Control")) instead of Tukey’s test (pairs(emm, adjust = "tukey")).
[4] In GAD 2.0, there is a bug in this function: it calculates percentages using standard deviations instead of variances (and it includes the contribution of fixed effects). To obtain more accurate estimates of the variance components (with confidence intervals), functions in the VCA package can be used.
[5] If you want to obtain approximate p-values for random effects, the easiest way is to use the ranova() function from the lmerTest package; however, these p-values are typically “conservative” (i.e., higher than they theoretically should be). To calculate more accurate p-values, see Fox & Weisberg (2019), p. 354.