Today’s tutorial

Today we will conduct one-way ANOVAs in R. There are many ways to do an ANOVA in R; we will keep it simple by assuming that you have balanced data. In this case, the basic procedure for conducting an ANOVA is:

  1. Read, check, and plot the data
  2. Use aov() to conduct the ANOVA
  3. Use plot() to produce residual plots (for assessing the assumptions of the ANOVA)
  4. Use summary() to produce the ANOVA table
  5. Perform post-hoc testing (e.g., Tukey’s test) as necessary

In future tutorials, we will learn other methods, such as lm() + gad() for balanced data in more complicated experimental designs, and lmer() + Anova() to implement linear mixed-effect models that can be used even for unbalanced data.

Starting off

Remember to set your working directory before you start. (See Week 1’s tutorial if you forget how.) You also need to install and load today’s packages.
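If any of today’s packages are missing from your system, you can install them once with install.packages(); the names below are simply the packages loaded in this tutorial:

install.packages(c("ggplot2", "ggthemes", "emmeans", "effectsize",
                   "multcomp", "multcompView", "ICC"))

Then load them: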

library(ggplot2)
library(ggthemes)
library(emmeans)
library(effectsize)
library(multcomp)
library(multcompView)
library(ICC)

Read and check the data

The example data set is SSRI.csv. This is simulated data representing the effect of 4 different selective serotonin reuptake inhibitors (SSRIs) on receptor binding. (SSRIs are drugs that are commonly used as antidepressants.) For each SSRI, you have obtained 15 estimates of \(K_i\). \(K_i\) is a measure of neuroreceptor binding (in units of nanomolar concentration, nM). Lower values of \(K_i\) indicate a stronger affinity of the drug for the receptor, suggesting the drug may be more effective. You wish to compare the four SSRIs.

# Read the data and convert variables as appropriate
d <- read.csv("Data/SSRI.csv")
d$SSRI <- factor(d$SSRI)

Check for balance. If the data are unbalanced, it is not recommended to use the aov() + summary() method because it implements Type I sums of squares.1 This command returns the sample size of each group:

# The replications() function returns a frequency table with a count for each group
replications(Ki ~ SSRI, data = d)
## SSRI 
##   15

Only one value was printed, meaning that all groups have an equal sample size (i.e., the experiment is balanced).
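An equivalent quick check is a plain frequency table of the grouping factor; with balanced data, every group shows the same count:

table(d$SSRI) # Counts per group; here, each of the four SSRIs has 15 observations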

Now plot the data. A boxplot seems appropriate for exploring the data. We will also show the raw data using geom_jitter().

ggplot(data = d, aes(x = SSRI, y = Ki)) +
  geom_boxplot(outlier.shape = NA) + 
  geom_jitter(height=0, width = 0.1, alpha = 0.3) +
  labs(y = expression(italic(K[i])*" (nM)")) +
  theme_few()

The variances do not appear to be perfectly equal, but real data never are. There are no glaring outliers. SSRI1 looks a little curious (there may be distinct subgroups within SSRI1), but overall it seems like we should be okay to continue with an ANOVA using aov(). We will check the other assumptions later.

Let’s continue exploring the data by getting the mean and standard deviation for each SSRI:

aggregate(Ki ~ SSRI, data = d, mean) # Calculate the mean for each group
##    SSRI       Ki
## 1 SSRI1 279.7673
## 2 SSRI2 241.7253
## 3 SSRI3 273.2180
## 4 SSRI4 295.0840
aggregate(Ki ~ SSRI, data = d, sd)   # Calculate the s.d. for each group
##    SSRI       Ki
## 1 SSRI1 45.49070
## 2 SSRI2 32.59665
## 3 SSRI3 37.68350
## 4 SSRI4 40.59071

Contrasts

“Contrasts” describe how the different levels of a treatment factor are compared. By default, R uses “treatment contrasts”. In treatment contrasts, the first level of each factor is used as the reference level, and all other levels are compared to the reference level. (For example, SSRI1 would be the reference level, and SSRI2, SSRI3, and SSRI4 would be compared to SSRI1.) This is okay for a linear regression, but it is inappropriate for a classical ANOVA. To check which contrasts R is currently using by default, you can type:

options("contrasts")
## $contrasts
##         unordered           ordered 
## "contr.treatment"      "contr.poly"

ANOVAs using aov() in R are best conducted with orthogonal contrasts, which we set below. These contrasts will remain set in R until you override them or revert them (or until you close R).

op <- options(contrasts = c("contr.sum", "contr.poly")) # Set orthogonal "sum-to-zero" contrasts
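If you want to see what “sum-to-zero” means concretely, you can inspect the contrast matrix that contr.sum() builds for a four-level factor; every column sums to zero:

contr.sum(4)          # The sum-to-zero contrast matrix for a 4-level factor
colSums(contr.sum(4)) # Each column sums to zero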

Contrasts for planned comparisons (optional)

Suppose that SSRI1 and SSRI2 are both brand-name drugs, and SSRI3 and SSRI4 are both generic drugs. I may be interested in testing whether the brand-name drugs differ from the generic drugs. This would be an example of a “planned comparison”. Planned comparisons are handled by defining a contrast matrix.

To handle our planned brand-name vs. generic drug comparison, we need to define different contrasts, including a contrast that compares the average of SSRI1 and SSRI2 with the average of SSRI3 and SSRI4.

levels(d$SSRI)         # Show me the order in which the drugs are stored by R
## [1] "SSRI1" "SSRI2" "SSRI3" "SSRI4"
c1 <- cbind(
  c(-1, -1,  1,  1),   # Brand-name (SSRI1, SSRI2) vs. generic (SSRI3, SSRI4) contrast
  c(-1,  1,  0,  0),   # SSRI1 vs. SSRI2 contrast (i.e., within brand-name drugs)
  c( 0,  0, -1,  1)    # SSRI3 vs. SSRI4 contrast (i.e., within generic drugs)
)
(contrasts(d$SSRI) <- c1)
##      [,1] [,2] [,3]
## [1,]   -1   -1    0
## [2,]   -1    1    0
## [3,]    1    0   -1
## [4,]    1    0    1

Here is how you interpret a contrast matrix:2

  • Each column represents one contrast. (The factor has four levels, so there are three orthogonal contrasts.)
  • Levels grouped together on the same side of a comparison get the same sign (plus or minus).
  • Levels on opposite sides of a comparison get opposite signs.
  • Levels not involved in a given contrast get a 0.
  • Each column should sum to zero.

We can quickly check that these contrasts are orthogonal: each pairwise sum of products should equal zero. (I have hidden the output to save space.)

sum(contrasts(d$SSRI)[,1] * contrasts(d$SSRI)[,2]) # = 0
sum(contrasts(d$SSRI)[,1] * contrasts(d$SSRI)[,3]) # = 0
sum(contrasts(d$SSRI)[,2] * contrasts(d$SSRI)[,3]) # = 0
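An equivalent, more compact check is the cross-product of the contrast matrix; the contrasts are mutually orthogonal when all of its off-diagonal entries are zero:

crossprod(contrasts(d$SSRI)) # Off-diagonal zeros indicate orthogonal contrasts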

The null hypothesis of a one-way ANOVA is that the mean response is identical for all levels of the treatment factor. (E.g., \(\bar{K_i}\) is the same for all SSRIs.) If the null hypothesis is true, this implies that:

  • There is no difference in \(\bar{K_i}\) between brand-name and generic drugs on average (Contrast 1),
  • There is no difference in \(\bar{K_i}\) between brand-name drugs (Contrast 2), and
  • There is no difference in \(\bar{K_i}\) between generic drugs (Contrast 3).

A significant result in an ANOVA indicates that the mean response is not identical across all levels of the treatment factor, so at least one of these contrasts is likely non-zero.

Fixed-effects ANOVA

Conduct the ANOVA

Now we will conduct a fixed-effects ANOVA using the aov() function; we will produce the ANOVA table using the summary() command. Note that summary() implements Type I sums of squares, which are often inappropriate. But in this case, it is unimportant because we have only one factor and our data are balanced.

SSRI.aov <- aov(Ki ~ SSRI, data = d) # Do the ANOVA
summary(SSRI.aov)                    # Print the ANOVA table
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## SSRI         3  22657    7552   4.873 0.00442 **
## Residuals   56  86794    1550                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The result looks significant (\(p = 0.004\)). But before we get too excited about the results, we should test whether the data meet the assumptions of an ANOVA.

Plot the residuals

Look for outliers or unequal spread of the residuals, and check for normality of the residuals. An ANOVA assumes that the data are normally distributed within each group, which can be assessed by examining the residuals.

opar <- par(mfrow = c(2, 2)) # Create a 2x2 plotting grid
plot(SSRI.aov, which = 5)    # Plot residuals for each factor level
plot(SSRI.aov, which = 1)    # Plot residuals vs. fitted values
plot(SSRI.aov, which = 2)    # Plot a Q-Q plot
plot(residuals(SSRI.aov) ~ Subject, main="Residuals vs Exp. Unit", font.main=1, data=d)
par(opar)                    # Return to original plotting settings (i.e., a 1x1 grid)

The residuals do not look terrible to me. The first two plots show roughly equal variance of the residuals among treatments and fitted values. The third plot shows that the residuals are roughly normally distributed. The fourth plot shows the residual for each unit, which could be useful for identifying outliers or patterns in the residuals.
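If you would like formal tests to supplement these plots (optional, and not a substitute for examining the residuals yourself), base R provides the Shapiro-Wilk test for normality and Bartlett’s test for homogeneity of variance:

shapiro.test(residuals(SSRI.aov))  # Test normality of the residuals
bartlett.test(Ki ~ SSRI, data = d) # Test homogeneity of variance among SSRIs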

Estimate group means, SEs, CIs, and effect size

Now get the estimated marginal mean for each SSRI, along with its standard error and 95% CI. Standard errors are the same for each group because they are calculated from the pooled standard deviation and the groups have equal sample sizes.

(emm <- emmeans(SSRI.aov, ~SSRI))
##  SSRI  emmean   SE df lower.CL upper.CL
##  SSRI1    280 10.2 56      259      300
##  SSRI2    242 10.2 56      221      262
##  SSRI3    273 10.2 56      253      294
##  SSRI4    295 10.2 56      275      315
## 
## Confidence level used: 0.95
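As a quick check on where that common standard error comes from, you can recompute it from the residual mean square in the ANOVA table (the pooled variance divided by the per-group sample size):

sqrt(1550 / 15) # = 10.2 (rounded), matching the SE reported by emmeans()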

Now calculate the overall effect size. Note that this effect size is not reported as commonly in Biology as in some other fields (e.g., Psychology). \(\omega^2\) indicates the proportion of variance in \(K_i\) that is explained by SSRI.

omega_squared(SSRI.aov)
## For one-way between subjects designs, partial omega squared is
##   equivalent to omega squared. Returning omega squared.
## # Effect Size for ANOVA
## 
## Parameter | Omega2 |       95% CI
## ---------------------------------
## SSRI      |   0.16 | [0.02, 1.00]
## 
## - One-sided CIs: upper bound fixed at [1.00].
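If you are curious, you can reproduce this value by hand from the ANOVA table, since \(\omega^2 = (SS_{effect} - df_{effect} \times MS_{error}) / (SS_{total} + MS_{error})\):

(22657 - 3 * 1550) / (22657 + 86794 + 1550) # = 0.16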

You could report this ANOVA as: “Receptor binding differed significantly among the SSRIs tested (\(F_{3, 56} = 4.873, p = 0.004; \omega^2 = 0.16\ [95\%\ CI = 0.02-1.00]\)).”

Planned comparisons

To view the results of any planned comparisons, we can use the “split” option of summary():

summary(SSRI.aov, split = list(SSRI = list("brand-name vs. generic" = 1,
                                           "brand-name 1 vs. brand-name 2" = 2,
                                           "generic 1 vs. generic 2" = 3))) 
##                                       Df Sum Sq Mean Sq F value  Pr(>F)   
## SSRI                                   3  22657    7552   4.873 0.00442 **
##   SSRI: brand-name vs. generic         1   8217    8217   5.301 0.02505 * 
##   SSRI: brand-name 1 vs. brand-name 2  1  10854   10854   7.003 0.01054 * 
##   SSRI: generic 1 vs. generic 2        1   3586    3586   2.314 0.13387   
## Residuals                             56  86794    1550                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Our brand-name and generic drugs differed significantly (\(F_{1,56}=5.301, p=0.025\)). Here is how you might calculate the averages for each of these two groups:

mean(d[d$SSRI == "SSRI1" | d$SSRI == "SSRI2", "Ki"]) # Mean of brand-name drugs
## [1] 260.7463
mean(d[d$SSRI == "SSRI3" | d$SSRI == "SSRI4", "Ki"]) # Mean of generic drugs
## [1] 284.151
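An equivalent way to estimate the brand-name vs. generic difference itself, along with a standard error and t-test, is to pass the same contrast coefficients to emmeans’ contrast() function. This is a sketch using the emm object created earlier; dividing the coefficients by 2 makes the estimate a simple difference between the two group means:

contrast(emm, method = list("brand-name vs. generic" = c(1, 1, -1, -1) / 2))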

Unplanned comparisons

If we wanted to test every possible pair of SSRI drugs to see which differ, we could use Tukey’s method.3

pairs(emm, adjust = "tukey")
##  contrast      estimate   SE df t.ratio p.value
##  SSRI1 - SSRI2    38.04 14.4 56   2.646  0.0502
##  SSRI1 - SSRI3     6.55 14.4 56   0.456  0.9683
##  SSRI1 - SSRI4   -15.32 14.4 56  -1.065  0.7117
##  SSRI2 - SSRI3   -31.49 14.4 56  -2.191  0.1384
##  SSRI2 - SSRI4   -53.36 14.4 56  -3.712  0.0026
##  SSRI3 - SSRI4   -21.87 14.4 56  -1.521  0.4320
## 
## P value adjustment: tukey method for comparing a family of 4 estimates

Now let’s look at the “compact letter display” that more clearly indicates which groups differ significantly.

(tukey_cld <- cld(emm))
##  SSRI  emmean   SE df lower.CL upper.CL .group
##  SSRI2    242 10.2 56      221      262  1    
##  SSRI3    273 10.2 56      253      294  12   
##  SSRI1    280 10.2 56      259      300  12   
##  SSRI4    295 10.2 56      275      315   2   
## 
## Confidence level used: 0.95 
## P value adjustment: tukey method for comparing a family of 4 estimates 
## significance level used: alpha = 0.05 
## NOTE: If two or more means share the same grouping symbol,
##       then we cannot show them to be different.
##       But we also did not show them to be the same.

The output suggests that the only significant difference between means is SSRI2 vs. SSRI4. Now we may wish to re-plot our data to show means, 95% confidence intervals (estimated using the pooled standard deviation) and the raw data.

plot(tukey_cld) + 
  geom_jitter(aes(y = SSRI, x = Ki), height=0.1, width = 0, alpha = 0.3, data = d) +
  labs(x = expression(italic(K[i])*" (nM)")) +
  coord_flip() +
  theme_few()

To add “a”, “b”, “ab” (etc.) indicators to the plot (corresponding to “1”, “2”, “12” (etc.) in the “compact letter display”), the easiest way is to edit the figure in Adobe Illustrator, Inkscape, etc.
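If you would rather stay in R, one option (a sketch only, assuming the default column names returned by cld()) is to ask cld() for letters instead of numbers and add them to the boxplot with geom_text():

# Request letters ("a", "b", ...) rather than numbers, then place them above each box
letter_cld <- as.data.frame(cld(emm, Letters = letters))
letter_cld$.group <- trimws(letter_cld$.group) # Remove the padding spaces

ggplot(data = d, aes(x = SSRI, y = Ki)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(height = 0, width = 0.1, alpha = 0.3) +
  geom_text(data = letter_cld,
            aes(x = SSRI, y = max(d$Ki) * 1.05, label = .group)) +
  labs(y = expression(italic(K[i])*" (nM)")) +
  theme_few()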

If your data do not meet the assumptions

If your data do not meet the assumptions of an ANOVA, you have a few choices:

  1. Ignore the violated assumptions and keep going (but if the violations are extreme, your p-values will not be trustworthy, especially if you have unequal sample sizes)
  2. Transform the data so they meet the assumptions (e.g., using a Box-Cox transformation)
  3. Change your method of analysis (e.g., use weighted least squares, a generalized linear model, a non-parametric method such as the Kruskal-Wallis test4, or a randomization test); a quick sketch of options 2 and 3 follows this list
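As a rough illustration only (using the same Ki ~ SSRI model as above), a Box-Cox profile for option 2 and a Kruskal-Wallis test for option 3 would look like:

library(MASS)                     # For boxcox(); MASS ships with R
boxcox(Ki ~ SSRI, data = d)       # Profiles a power transformation of Ki (option 2)
kruskal.test(Ki ~ SSRI, data = d) # Non-parametric alternative to the one-way ANOVA (option 3)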

Random-effects ANOVA

Our SSRIs probably do represent specific treatments of interest. But suppose instead that they were a random sample of SSRIs (drawn from a population whose effects are assumed to be normally distributed), and that our goal was to estimate how much of the variance in \(K_i\) among people taking SSRIs is attributable to which SSRI they take. In that case, we would use a random-effects ANOVA instead of a fixed-effects ANOVA.

Conduct the ANOVA

In aov(), random effects are specified using the Error() term.

SSRI.raov <- aov(Ki ~ Error(SSRI), data = d) # Do the ANOVA
summary(SSRI.raov)                           # Summarize the ANOVA model
## 
## Error: SSRI
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  3  22657    7552               
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 56  86794    1550

R assumes that you are not interested in testing the significance of random effects, so it does not display p-values for them. We can test the significance of the random effect manually as \(F_{3,56} = 7552/1550 = 4.872\):

pf(7552/1550, df1 = 3, df2 = 56, lower.tail = FALSE)
## [1] 0.004424933

Note that these are the same F- and p-values as in the fixed-effects ANOVA above (the slight difference in F arises only from rounding the mean squares).

Plot the residuals

Look for outliers or unequal spread of the residuals, and check for normality of the residuals. (The code below reuses SSRI.aov, because the residuals of the one-way random-effects model are identical to those of the fixed-effects model.)

opar <- par(mfrow = c(2, 2)) # Create a 2x2 plotting grid
plot(SSRI.aov, which = 5)    # Plot residuals for each factor level
plot(SSRI.aov, which = 1)    # Plot residuals vs. fitted values
plot(SSRI.aov, which = 2)    # Plot a Q-Q plot
plot(residuals(SSRI.aov) ~ Subject, main="Residuals vs Exp. Unit", font.main=1, data=d)
par(opar)                    # Return to original plotting settings (i.e., a 1x1 grid)

These residuals should look familiar. A one-way random-effects ANOVA actually produces the same results as a fixed-effects ANOVA. The difference is in the intent and the interpretation.

Estimate the variance components

In a random-effects ANOVA, it is not usually of interest to perform contrasts. Instead, we estimate the variance components. The ICCest() function will estimate variance components, plus the intraclass correlation (\(\rho_1\)) and its confidence intervals. The intraclass correlation is an estimate of the proportion of variance in the response variable that is explained by the random effect.

icc.out <- ICCest(x = SSRI, y = Ki, data = d, alpha = 0.05)
icc.out$vara # The among-group variance
## [1] 400.1521
icc.out$varw # The within-group (i.e., residual) variance
## [1] 1549.899
icc.out$ICC  # The intraclass correlation coefficient
## [1] 0.2052008
c(icc.out$LowerCI, icc.out$UpperCI)  # The 95% confidence interval for the ICC
## [1] 0.02915407 0.81754091
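For a balanced one-way design, you can reproduce these estimates by hand from the ANOVA table: the among-group variance component is \((MS_{among} - MS_{within}) / n\), where \(n = 15\) per group, and the ICC is the among-group component divided by the total variance:

ms_among  <- 22657 / 3              # Mean square among SSRIs, from the ANOVA table
ms_within <- 86794 / 56             # Residual mean square
vara <- (ms_among - ms_within) / 15 # Among-group variance component (approx. 400)
vara / (vara + ms_within)           # ICC (approx. 0.205)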

You could report this ANOVA as: “SSRI explains 20.5% (95% CI: 2.9-81.8%) of the variance in receptor binding (\(F_{3, 56} = 4.873, p = 0.004\)).” Note that this result is identical to our fixed-effects ANOVA except for the interpretation and the effect size.

Finishing up

If you are finished using aov() but you want to continue using R, then you can restore your previous contrast settings (R’s defaults), which we stored above as op.

options(op)

Today’s tasks

ANOVA 1

You are trying to determine how much variability in calcium concentration exists among turnip greens. You randomly selected four turnip plants. From each plant, you randomly selected four different leaves, and you estimated the calcium concentration once for each leaf. This (simulated) data is stored as turnip.csv. Do the appropriate statistical analysis to meet your research goals. Be sure to test assumptions and make publication-ready plots.

ANOVA 2

Some flower nectars contain caffeine, which may act as a stimulant and a reward for bees. You are interested in knowing how the caffeine content of nectar affects visitation by pollinators. You have set up feeding stations, at which bees were offered a choice between: (1) a control solution of 20%-sucrose syrup (containing no caffeine), and (2) a 20%-sucrose syrup containing one of four levels of caffeine: 50 ppm, 100 ppm, 150 ppm, or 200 ppm.

The response variable is the difference in syrup consumption between the caffeine-supplemented and control feeders at each station (i.e., a positive value means that more caffeine-supplemented syrup was consumed, and a negative value means that more control syrup was consumed). This data (from Singaravelan et al., 2005 and described in Whitlock & Schluter 2015) is stored as caffeine.csv. Do the appropriate statistical analysis to meet your research goals. Be sure to test assumptions and make publication-ready plots.


  1. Instead, we can implement Type II/III sums of squares using lm() + Anova() for a fixed-effects ANOVA, or lmer() + Anova(test="F") for the equivalent of a random- or mixed-effects ANOVA.↩︎

  2. For more information, see Venables & Ripley 1999.↩︎

  3. Scheffé’s method is a flexible method of multiple comparisons better suited for unequal sample sizes and testing non-pairwise hypotheses (e.g., brand-name vs. generic drugs). If you only want to compare treatments to a control group, use Dunnett’s test.↩︎

  4. The Kruskal-Wallis test is simply kruskal.test() in R. Rather than Tukey’s method, Dunn’s test can be used for unplanned comparisons. This is implemented as the dunnTest() function in the FSA package.↩︎