The Analysis of Variance (ANOVA)

M. Drew LaMar
October 28, 2020

Randomization

Definition: Proper randomization means that any individual experimental subject has the same chance as any other individual of finding itself in each experimental group, as well as of being prepared, set up, or measured in the same way.

  • Improper randomization can lead to the introduction of confounding variables in the experimental protocol itself (experimental artifacts).
  • Randomization breaks the association between possible confounding variables and the explanatory variable.
  • Randomization allows the causal relationship between the explanatory and response variables to be assessed.

  • Randomization does not eliminate variation by confounding variables, only their correlation with treatment.
  • Randomization ensures that variation by confounding variables is similar between treatment groups and occurs by chance alone.

Randomization Example

Question: Does a specific genetic modification to a tomato plant affect its growth rate?

Experimental Design: Place 50 genetically modified plants, and 50 unmodified plants, into individual pots with compost, and then put them all into a growth chamber.

Discuss: Where can improper randomization appear in this example?

Answer: For example:

  • Difference in compost quality.
  • Difference in temperature across the chamber.

Randomization Example

Let's look at temperature as a possible confounding variable:

The above randomization would not remove the temperature difference across the chamber; it would simply remove its correlation with treatment.

What if we would like to reduce the variation from temperature? We can try blocking.

Blocking Example

Our attempt to control for temperature:

Discuss: What’s right and wrong with this particular design?

Blocking Example

This particular blocking design is properly replicated and randomized.

The variation due to temperature in each chamber has been reduced, so that the difference between treatments becomes more apparent.

There was a systematic difference of temperature across the original chamber. We have now adjusted the design to systematically account for this difference.
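A minimal sketch of randomizing within blocks, assuming each block is a region or chamber of roughly uniform temperature and that every treatment appears equally often in every block. The block names, position labels, and the function `randomize_within_blocks` are hypothetical; the slides do not specify the actual layout.

```python
import random

def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign treatments at random to positions, separately within each block.

    blocks: dict mapping a block name to a list of positions in that block.
    treatments: list of treatment labels.
    Returns a dict mapping (block, position) to a treatment label.
    """
    rng = random.Random(seed)
    layout = {}
    for block, positions in blocks.items():
        if len(positions) % len(treatments) != 0:
            raise ValueError("block size must be a multiple of the number of treatments")
        reps = len(positions) // len(treatments)
        # Equal replication of every treatment inside the block ...
        labels = list(treatments) * reps
        # ... placed in random order, so position within the block is not
        # associated with treatment.
        rng.shuffle(labels)
        for position, label in zip(positions, labels):
            layout[(block, position)] = label
    return layout

# Hypothetical layout: two blocks with four pot positions each.
blocks = {"block_warm": [1, 2, 3, 4], "block_cool": [5, 6, 7, 8]}
layout = randomize_within_blocks(blocks, ["modified", "unmodified"], seed=1)
for (block, position), treatment in sorted(layout.items()):
    print(block, position, treatment)
```

Because every block contains both treatments in equal numbers, the temperature difference between blocks cannot be confounded with treatment.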

Match and adjust

What if you can't do experiments? Randomization does not apply here.

Two strategies are used to limit effects of confounding variables on a difference between treatments in a controlled observational study.

Definition: With matching, every individual in the treatment group is paired with a control individual having the same or closely similar values for the suspected confounding variable.

Definition: With adjustment, a statistical method, such as analysis of covariance, is used to correct for differences between treatment and control groups in suspected confounding variables.

Proper randomization

Assigning treatments to subjects (one possibility):

  1. List all \( n \) subjects, one per row, in a spreadsheet.
  2. Use the computer to give each subject a random number.
  3. Assign treatment A or B to those subjects receiving the lowest or highest numbers, respectively.
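The three steps above can be sketched as follows. This is one possible implementation, not the only one: the function name `assign_treatments`, the subject labels, and the choice to split the ranked list at its midpoint (so the groups are of nearly equal size) are assumptions made for illustration.

```python
import random

def assign_treatments(subjects, seed=None):
    """Randomly assign each subject to treatment "A" or "B"."""
    rng = random.Random(seed)
    # Step 2: give each subject a random number.
    keyed = [(rng.random(), subject) for subject in subjects]
    # Step 3: rank subjects by their random number; the lowest half
    # receives treatment A and the highest half receives treatment B.
    keyed.sort(key=lambda pair: pair[0])
    half = len(keyed) // 2
    assignment = {}
    for rank, (_, subject) in enumerate(keyed):
        assignment[subject] = "A" if rank < half else "B"
    return assignment

# Step 1: list all n subjects (hypothetical labels).
subjects = [f"subject_{i:03d}" for i in range(1, 21)]
assignment = assign_treatments(subjects, seed=2020)
for subject in subjects:
    print(subject, assignment[subject])
```

Fixing the seed makes the assignment reproducible, which helps when the randomization has to be documented.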

Randomization in time

Remember, randomization is important in all processes of the experiment, including preparation, setup, and measurement.

Randomize measurement of replicates in time:

  • Watching 50 hours of great tit courtship behaviour on video increases your ability to observe
  • After 10 hours of counting through a microscope, tiredness kicks in
  • Aging equipment

This shows time of measurement could be a confounding factor.

Additional Reading

  • Whitlock & Schluter, Interleaf 2: Pseudoreplication (pp. 115-116)
  • Whitlock & Schluter, Chapter 14: Designing experiments

Analysis of variance (intro)

Definition: The analysis of variance (ANOVA) compares the means of multiple groups simultaneously in a single analysis.

ANOVA generalizes the two-sample \( t \)-test to more than two groups.

In the two-sample \( t \)-test, the test statistic is the ratio of the difference between the sample means to the standard error of that difference:

\[ t = \frac{\bar{Y}_{1}-\bar{Y}_{2}}{\mathrm{SE}_{\bar{Y}_{1}-\bar{Y}_{2}}} \]

Analysis of variance (intro)

By squaring the numerator and denominator of the \( t \)-statistic, we get a ratio of variance components.

\[ \frac{\left(\bar{Y}_{1}-\bar{Y}_{2}\right)^2}{\mathrm{SE}^{2}_{\bar{Y}_{1}-\bar{Y}_{2}}} \]

\[ \frac{"\mathrm{Variance \ between \ groups}"}{"\mathrm{Variance \ within \ groups}"} \]
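One way to see why the squared statistic is a ratio of variances, assuming the pooled-variance form of the two-sample \( t \)-test (an assumption; the slides do not say which form is meant). Here \( s_{p}^{2} \) is the pooled sample variance and \( n_{1}, n_{2} \) are the two sample sizes, symbols introduced only for this sketch:

\[ \frac{\left(\bar{Y}_{1}-\bar{Y}_{2}\right)^2}{\mathrm{SE}^{2}_{\bar{Y}_{1}-\bar{Y}_{2}}} = \frac{\left(\bar{Y}_{1}-\bar{Y}_{2}\right)^2}{s_{p}^{2}\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)} \]

The numerator is twice the sample variance of the two group means, because the sample variance of two numbers \( a \) and \( b \) is \( (a-b)^2/2 \). The denominator is proportional to \( s_{p}^{2} \), which estimates the variance within groups.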

Analysis of variance (for real)

Data: Suppose I have one categorical explanatory variable X with \( k > 2 \) levels, and a response variable Y.

Hypothesis test:

\[ \begin{eqnarray*} H_{0} & : & \mu_{1} = \mu_{2} = \cdots = \mu_{k}\\ H_{A} & : & \mathrm{At \ least \ one} \ \mu_{i} \ \mathrm{is \ different \ from \ the \ others} \end{eqnarray*} \]

Test statistic:

\[ F = \frac{\mathrm{group \ mean \ square}}{\mathrm{error \ mean \ square}} = \frac{\mathrm{MS}_{\mathrm{groups}}}{\mathrm{MS}_{\mathrm{error}}} \]

Definition: The group mean square (\( \mathrm{MS}_{\mathrm{groups}} \)) is proportional to the observed amount of variation among the group sample means [between-group variability].

Definition: The error mean square (\( \mathrm{MS}_{\mathrm{error}} \)) estimates the variance among subjects that belong to the same group [within-group variability].

If \( H_{0} \) is true, then \( \mathrm{MS}_{\mathrm{groups}} \) and \( \mathrm{MS}_{\mathrm{error}} \) estimate the same variance, so \( F \) should be close to 1.

If \( H_{0} \) is false, then \( \mathrm{MS}_{\mathrm{groups}} \) tends to exceed \( \mathrm{MS}_{\mathrm{error}} \), and we expect \( F > 1 \).
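A small simulation can illustrate this behaviour. The sketch below assumes normally distributed responses with equal variance in every group; the group means, sample size, number of simulations, and function names are arbitrary choices for illustration, not values from the slides.

```python
import random
from statistics import mean

def f_statistic(groups):
    """F = MS_groups / MS_error for a list of groups (lists of numbers)."""
    values = [y for g in groups for y in g]
    grand_mean = mean(values)
    k, n_total = len(groups), len(values)
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_groups / (k - 1)) / (ss_error / (n_total - k))

def simulate_mean_f(group_means, n_per_group=10, sd=1.0, n_sims=2000, seed=1):
    """Average F over many simulated data sets with the given true group means."""
    rng = random.Random(seed)
    f_values = []
    for _ in range(n_sims):
        groups = [[rng.gauss(mu, sd) for _ in range(n_per_group)]
                  for mu in group_means]
        f_values.append(f_statistic(groups))
    return mean(f_values)

# H0 true: all three population means are equal.
mean_f_null = simulate_mean_f([0.0, 0.0, 0.0])
# H0 false: the population means differ.
mean_f_alt = simulate_mean_f([0.0, 0.5, 1.0])
print("average F when H0 is true: ", round(mean_f_null, 2))
print("average F when H0 is false:", round(mean_f_alt, 2))
```

Individual values of \( F \) vary from sample to sample even when \( H_{0} \) is true; it is their typical size that stays near 1.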

Analysis of variance (example)

Practice Problem #1

Many humans like the effect of caffeine, but it occurs in plants as a deterrent to herbivory by animals. Caffeine is also found in flower nectar, and nectar is meant as a reward for pollinators, not a deterrent. How does caffeine in nectar affect visitation by pollinators?

Singaravelan et al. (2005) set up feeding stations where bees were offered a choice between a control solution with 20% sucrose and a caffeinated solution with 20% sucrose plus some quantity of caffeine. Over the course of the experiment, four different concentrations of caffeine were provided: 50, 100, 150, and 200 ppm. The response variable was the difference between the amount of nectar consumed from the caffeine feeders and that removed from the control feeders at the same station (grams).

Discuss: Describe the experimental design.

Analysis of variance (reminder)

\[ t = \frac{\bar{Y}_{1}-\bar{Y}_{2}}{\mathrm{SE}_{\bar{Y}_{1}-\bar{Y}_{2}}} \]

Analysis of variance (derivation)

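Data: With \( Y_{ij} \) denoting observation \( j \) in group \( i \), \( \bar{Y}_{i} \) the mean of group \( i \), and \( \bar{Y} \) the grand mean, we have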

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \]

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 & = & \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \end{eqnarray*} } \]

Analysis of variance (derivation)

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} & = & \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i})\right]^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y})^2 + (Y_{ij} - \bar{Y}_{i})^2 + 2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i})\right] \\ & = & \sum_{i}\sum_{j}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \sum_{i}n_{i}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \end{eqnarray*} } \]

Analysis of variance (derivation)

One can show that the cross term vanishes:

\[ \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) = 0, \]

and thus

\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}}. \]
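A short sketch of why the cross term is zero, using only the fact that the deviations from a group mean sum to zero within that group (here \( n_{i} \) is the number of observations in group \( i \)):

\[ \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) = 2\sum_{i}(\bar{Y}_{i} - \bar{Y})\sum_{j}(Y_{ij} - \bar{Y}_{i}) = 2\sum_{i}(\bar{Y}_{i} - \bar{Y})\left(n_{i}\bar{Y}_{i} - n_{i}\bar{Y}_{i}\right) = 0 \]

The factor \( (\bar{Y}_{i} - \bar{Y}) \) does not depend on \( j \), so it can be pulled out of the inner sum, and \( \sum_{j} Y_{ij} = n_{i}\bar{Y}_{i} \) by the definition of the group mean.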

From sum-of-squares to mean squares

Definition: The group mean square is given by

\[ \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{df_{\mathrm{groups}}}, \] with \( df_{\mathrm{groups}} = k-1 \), where \( k \) is the number of groups.

Definition: The error mean square is given by

\[ \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{df_{\mathrm{error}}}, \] with \( df_{\mathrm{error}} = \sum (n_{i}-1) = N-k \), where \( N = \sum n_{i} \) is the total number of observations.
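The sums of squares, degrees of freedom, and mean squares defined above can be computed directly. The sketch below uses a small made-up data set (not data from any study), and the function name `anova_components` is hypothetical.

```python
from statistics import mean

def anova_components(groups):
    """Return sums of squares, degrees of freedom, mean squares, and F
    for a one-way layout given as a list of groups (lists of numbers)."""
    values = [y for g in groups for y in g]
    grand_mean = mean(values)
    k, n_total = len(groups), len(values)
    # SS_groups: squared deviations of group means from the grand mean,
    # weighted by group size n_i.
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SS_error: squared deviations of observations from their own group mean.
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # SS_total: squared deviations of observations from the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    df_groups, df_error = k - 1, n_total - k
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return {
        "SS_groups": ss_groups, "SS_error": ss_error, "SS_total": ss_total,
        "df_groups": df_groups, "df_error": df_error,
        "MS_groups": ms_groups, "MS_error": ms_error,
        "F": ms_groups / ms_error,
    }

# Made-up data: k = 3 groups with n_i = 3 observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = anova_components(groups)
for name, value in result.items():
    print(name, round(value, 4))
```

For this data set the group means are 2, 3, and 6, so \( \mathrm{SS}_{\mathrm{groups}} = 26 \), \( \mathrm{SS}_{\mathrm{error}} = 6 \), and \( \mathrm{SS}_{\mathrm{total}} = 32 \), which confirms the decomposition; with \( df_{\mathrm{groups}} = 2 \) and \( df_{\mathrm{error}} = 6 \) the mean squares are 13 and 1.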

ANOVA Table