The Analysis of Variance (ANOVA)

Alban Guillaumet, Troy University

Objectives

ANOVA
- Definition
- Derivation
- Example
- Assumptions
- Planned & unplanned comparisons

Analysis of variance

Definition: The analysis of variance (ANOVA) compares the means of multiple groups simultaneously in a single analysis.

ANOVA generalizes the two-sample \( t \)-test to more than two groups.

Analysis of variance

Data: Suppose I have one categorical explanatory variable X with \( k > 2 \) levels, and a numerical response variable Y.

Hypothesis test:

\[ \begin{eqnarray*} H_{0} & : & \mu_{1} = \mu_{2} = \cdots = \mu_{k}\\ H_{A} & : & \mathrm{At \ least \ one} \ \mu_{i} \ \mathrm{is \ different \ from \ the \ others} \end{eqnarray*} \]

Test statistic:

\[ F = \frac{\mathrm{group \ mean \ square}}{\mathrm{error \ mean \ square}} = \frac{\mathrm{MS}_{\mathrm{groups}}}{\mathrm{MS}_{\mathrm{error}}} \]

Definition: The group mean square (\( \mathrm{MS}_{\mathrm{groups}} \)) is proportional to the observed amount of variation among the group sample means [between-group variability].

Definition: The error mean square (\( \mathrm{MS}_{\mathrm{error}} \)) estimates the variability among subjects that belong to the same group [within-group variability].

If \( H_{0} \) is true, then \( \mathrm{MS}_{\mathrm{groups}} \) and \( \mathrm{MS}_{\mathrm{error}} \) both estimate the same within-group variance, so we expect \( F \) to be close to 1.

If \( H_{0} \) is false, then \( \mathrm{MS}_{\mathrm{groups}} \) tends to exceed \( \mathrm{MS}_{\mathrm{error}} \), so we expect \( F > 1 \).
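
The behavior of \( F \) under \( H_{0} \) can be illustrated with a small simulation. This is only a sketch: the number of groups, the group size, the seed, and the number of replicates are arbitrary choices made for illustration, not values from any study discussed here.

```r
# Simulate one-way ANOVA data in which H0 holds (all group means equal)
# and record the F-ratio each time. All settings below are hypothetical.
set.seed(1)                      # arbitrary seed, for reproducibility only
k <- 3                           # number of groups
n <- 10                          # observations per group
group <- factor(rep(seq_len(k), each = n))

simulate_f <- function() {
  y <- rnorm(k * n, mean = 0, sd = 1)    # same mean in every group
  anova(lm(y ~ group))[["F value"]][1]   # F-ratio for the group term
}

f_null <- replicate(2000, simulate_f())
mean(f_null)        # average F-ratio under H0: roughly 1
mean(f_null > 1)    # share of null F-ratios that still exceed 1
```

Individual F-ratios vary a lot from sample to sample, which is why the test compares the observed \( F \) with the \( F \) distribution rather than with the value 1 itself.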

Analysis of variance (example)

The knees who say night:

Traveling to a different time zone can cause jet lag, but people adjust as the schedule of light to their eyes in the new time zone gradually resets their internal circadian clock. But can the clock also be reset by exposing the back of the knee to light, as claimed in a controversial paper by Campbell and Murphy (1998)?

Analysis of variance (example)

The knees who say night:

Wright and Czeisler (2002) reexamined the phenomenon in a follow-up study that measured the circadian rhythm, two days after treatment, by the daily cycle of melatonin production in 22 people randomly assigned to one of three light treatments: eyes only, knees only, or neither (control). A negative measurement indicates a delay in melatonin production, which is the predicted effect of light treatment.

Analysis of variance (example)

The knees who say night:

Analysis of variance (derivation)

Separating the sources of variation in the Data:

Analysis of variance (derivation)

Data: With \( i \) indexing the groups and \( j \) indexing the individuals within a group, we have:

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 & = & \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \end{eqnarray*} } \]
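
The partition of the total sum of squares can be checked numerically. The sketch below uses `PlantGrowth`, a data set that ships with base R (30 plant weights in 3 groups), purely as a stand-in; it is not one of the data sets analyzed in these slides, and the variable names are illustrative.

```r
# Stand-in data: PlantGrowth (base R), 3 groups of 10 plants each.
y <- PlantGrowth$weight
g <- PlantGrowth$group

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)     # one mean per group
group_sizes <- tapply(y, g, length)   # n_i for each group

ss_total  <- sum((y - grand_mean)^2)
ss_groups <- sum(group_sizes * (group_means - grand_mean)^2)
ss_error  <- sum((y - ave(y, g))^2)   # ave() returns each observation's own group mean

# The two components should add up to the total (up to rounding error).
all.equal(ss_total, ss_groups + ss_error)
```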

From sum-of-squares to mean squares

Definition: The group mean square is given by

\[ \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{df_{\mathrm{groups}}}, \] with \( df_{\mathrm{groups}} = k-1 \).

Definition: The error mean square is given by

\[ \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{df_{\mathrm{error}}}, \] with \( df_{\mathrm{error}} = \sum (n_{i}-1) = N-k \), where \( N \) is the total number of observations.

Analysis of variance

Test statistic:

\[ F = \frac{\mathrm{group \ mean \ square}}{\mathrm{error \ mean \ square}} = \frac{\mathrm{MS}_{\mathrm{groups}}}{\mathrm{MS}_{\mathrm{error}}} \]

If \( H_{0} \) is true, then \( \mathrm{MS}_{\mathrm{groups}} \) and \( \mathrm{MS}_{\mathrm{error}} \) both estimate the same within-group variance, so we expect \( F \) to be close to 1.

If \( H_{0} \) is false, then \( \mathrm{MS}_{\mathrm{groups}} \) tends to exceed \( \mathrm{MS}_{\mathrm{error}} \), so we expect \( F > 1 \).

Analysis of variance (example)

Practice Problem #1

Many humans like the effect of caffeine, but it occurs in plants as a deterrent to herbivory by animals. Caffeine is also found in flower nectar, and nectar is meant as a reward for pollinators, not a deterrent. How does caffeine in nectar affect visitation by pollinators?

Analysis of variance (example)

Practice Problem #1

Singaravelan et al. (2005) set up feeding stations where bees were offered a choice between a control solution with 20% sucrose or a caffeinated solution with 20% sucrose plus some quantity of caffeine. Over the course of the experiment, four different concentrations of caffeine were provided: 50, 100, 150, and 200 ppm. The response variable was the difference between the amount of nectar consumed from the caffeine feeders and that removed from the control feeders at the same station (grams).

Analysis of variance (example)

str(strungOutBees)

'data.frame':   20 obs. of  2 variables:
 $ ppmCaffeine                     : Factor w/ 4 levels "ppm50","ppm100",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ consumptionDifferenceFromControl: num  -0.4 0.01 0.65 0.24 0.34 -0.39 0.53 0.44 0.19 -0.08 ...

Analysis of variance (example)

Discuss: State the null and alternative hypotheses appropriate for this question.

\[ \begin{eqnarray*} H_{0} & : & \mu_{50} = \mu_{100} = \mu_{150} = \mu_{200} \\ H_{A} & : & \mathrm{At \ least \ one \ of \ the \ means \ is \ different} \end{eqnarray*} \]

Analysis of variance (example)

Short way using R

caffResults <- lm(consumptionDifferenceFromControl ~ ppmCaffeine, data=strungOutBees)
anova(caffResults)

Analysis of Variance Table

Response: consumptionDifferenceFromControl
            Df Sum Sq Mean Sq F value  Pr(>F)  
ppmCaffeine  3 1.1344 0.37814  4.1779 0.02308 *
Residuals   16 1.4482 0.09051                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
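
The mean squares, the F-ratio, and the P-value in the table follow directly from the sums of squares and degrees of freedom. The sketch below recomputes them from the rounded values printed above, so small rounding differences from the table are expected; the variable names are illustrative.

```r
# Values copied from the ANOVA table above (already rounded by R's printout).
ss_groups <- 1.1344; df_groups <- 3
ss_error  <- 1.4482; df_error  <- 16

ms_groups <- ss_groups / df_groups   # group mean square
ms_error  <- ss_error / df_error     # error mean square
f_stat    <- ms_groups / ms_error    # F-ratio

# P-value: upper tail of the F distribution with (3, 16) degrees of freedom.
p_value <- pf(f_stat, df_groups, df_error, lower.tail = FALSE)

round(c(MS_groups = ms_groups, MS_error = ms_error, F = f_stat, P = p_value), 4)
```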

Analysis of variance (example)

Definition: The \( R^{2} \) value is the “fraction of the variation explained by groups” and is given by

\[ R^{2} = \frac{\mathrm{SS}_{\mathrm{groups}}}{\mathrm{SS}_{\mathrm{total}}}. \] Note: \( 0 \leq R^2 \leq 1 \).

beeAnovaSummary <- summary(caffResults)
beeAnovaSummary$r.squared

[1] 0.4392573
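
As a quick check, the same value can be obtained by hand from the sums of squares in the ANOVA table. This sketch uses the rounded table values, so it agrees with `r.squared` only to about four decimal places.

```r
# Sums of squares copied from the ANOVA table (rounded).
ss_groups <- 1.1344
ss_error  <- 1.4482
ss_total  <- ss_groups + ss_error   # SS_total = SS_groups + SS_error

r_squared <- ss_groups / ss_total   # fraction of variation explained by groups
r_squared
```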

ANOVA assumptions and robustness

Assumptions (same as 2-sample \( t \)-test)

- Measurements in each group represent a random sample from the corresponding population.
- The variable is normally distributed in each of the \( k \) populations.
- The variance is the same in all \( k \) populations.

ANOVA assumptions and robustness

Robustness (same as 2-sample \( t \)-test)

Fairly robust to departures from normality.
Somewhat robust to departures from equal variances when:
- Sample sizes are “large” in each group
- Sample sizes are about the same in each group (balanced)
- Standard deviations are within about a 3-fold difference
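
The conditions in this list can be inspected before trusting an ANOVA. The sketch below is one possible way to do so, using the base R `PlantGrowth` data as a stand-in (it is not a data set from these slides); the 3-fold threshold is the rule of thumb quoted above, not a formal test.

```r
# Stand-in data: PlantGrowth (base R), weight by treatment group.
fit <- lm(weight ~ group, data = PlantGrowth)

group_ns  <- table(PlantGrowth$group)                            # sample size per group
group_sds <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)   # SD per group

balanced <- length(unique(as.vector(group_ns))) == 1   # TRUE if all n_i are equal
sd_ratio <- max(group_sds) / min(group_sds)            # largest SD relative to smallest

balanced
sd_ratio < 3   # rule of thumb: SDs within about a 3-fold difference

# A normal quantile plot of the residuals gives a visual check of normality:
# qqnorm(residuals(fit)); qqline(residuals(fit))
```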

Nonparametric alternative to ANOVA

Definition: The Kruskal-Wallis test is a nonparametric method for multiple groups based on ranks.

The Kruskal-Wallis test is similar to the Mann-Whitney \( U \)-test and has the same assumptions:

- Random samples from each population.
- To use as a test of difference among populations in means or medians, the distribution of the variable must have the same shape in every population.
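
In R, the Kruskal-Wallis test is available as `kruskal.test()` in the `stats` package. The sketch below runs it on the base R `PlantGrowth` data, used here only as a stand-in for a response measured in several groups.

```r
# Stand-in data: PlantGrowth (base R), 3 groups.
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

kw$statistic   # Kruskal-Wallis test statistic (computed from ranks)
kw$parameter   # degrees of freedom: number of groups minus 1
kw$p.value     # P-value from the chi-squared approximation
```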

Planned and unplanned comparisons

Suppose the ANOVA rejects \( H_{0} \), so at least one group mean differs from the others. But which means are different from one another?

Planned comparisons

Definition: A planned comparison is a comparison between means planned during the design of the study, identified before the data are examined.

A planned comparison must have a strong a priori justification, such as an expectation from theory or a prior study.

Only one or a small number of planned comparisons is allowed, to minimize inflating the Type I error rate.

Unplanned comparisons

Definition: An unplanned comparison is one of multiple comparisons, such as between all pairs of means, carried out to help determine where differences between means lie.

Unplanned comparisons are a form of data dredging, so we need to control the inflation of the Type I error rate that comes from performing many tests.
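
A rough calculation shows how quickly the Type I error rate grows with the number of tests. The sketch assumes the tests are independent, which pairwise comparisons sharing the same groups are not, so treat the result as an approximation for illustration only.

```r
alpha <- 0.05
k <- 4                  # number of groups, as in the caffeine example
m <- choose(k, 2)       # number of pairwise comparisons among k groups

# Probability of at least one Type I error across m tests,
# ASSUMING the tests are independent (a simplification).
fwer <- 1 - (1 - alpha)^m

m
round(fwer, 3)
```

With four groups there are six pairwise comparisons, and the chance of at least one false positive is already well above the nominal 5%.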

Planned comparison (details)

A planned comparison is very similar to a 2-sample \( t \)-test, except that when computing the standard error we use the pooled sample variance (i.e., the error mean square \( \mathrm{MS}_{\mathrm{error}} \)) based on all \( k \) groups, and the corresponding error \( df=N-k \), rather than the pooled sample variance based only on the two groups being compared, i.e.

\[ \mathrm{SE} = \sqrt{\mathrm{MS}_{\mathrm{error}}\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)} \]

Planned comparison (details)

This step increases precision and power.

For instance, the planned 95% confidence interval for the difference between the means of the “knee” and “control” groups is estimated as:

\[ -0.788< \mu_{2} - \mu_{1} < 0.734 \]

versus

\[ -0.813< \mu_{2} - \mu_{1} < 0.759 \]

for the two-sample confidence interval.

Planned comparison (in R)

In R, we can use the multcomp package to do this.

Caffeine example:

library(multcomp)  # provides glht() and mcp()
caffPlanned <- glht(caffResults, linfct = mcp(ppmCaffeine = c("ppm100 - ppm50 = 0")))
confint(caffPlanned)

ppm100 - ppm50 = 0 is called a contrast. In this case, it simply means “Test whether the means of the ppm50 and ppm100 groups are equal.”

Planned comparison (example)


     Simultaneous Confidence Intervals

Multiple Comparisons of Means: User-defined Contrasts


Fit: lm(formula = consumptionDifferenceFromControl ~ ppmCaffeine, 
    data = strungOutBees)

Quantile = 2.1199
95% family-wise confidence level


Linear Hypotheses:
                    Estimate lwr     upr    
ppm100 - ppm50 == 0 -0.1800  -0.5834  0.2234
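
The interval above can be reproduced by hand from the planned-comparison standard error. This sketch takes \( \mathrm{MS}_{\mathrm{error}} \) and the error degrees of freedom from the ANOVA table and the estimate from the output above; the group size of 5 is an assumption (20 observations spread evenly over 4 caffeine levels), and the variable names are illustrative.

```r
ms_error <- 0.09051   # error mean square from the ANOVA table
df_error <- 16        # N - k = 20 - 4
n1 <- 5               # ASSUMED size of the ppm50 group (balanced design)
n2 <- 5               # ASSUMED size of the ppm100 group
estimate <- -0.18     # difference in sample means, ppm100 - ppm50

se     <- sqrt(ms_error * (1 / n1 + 1 / n2))   # planned-comparison standard error
t_crit <- qt(0.975, df_error)                  # matches the reported quantile, 2.1199

ci <- c(lwr = estimate - t_crit * se, upr = estimate + t_crit * se)
round(ci, 4)
```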

Unplanned comparison (details)

Definition: With the Tukey-Kramer method, the probability of making at least one Type I error throughout the course of testing all pairs of means is no greater than the significance level \( \alpha \).

The Tukey-Kramer method works like a series of two-sample \( t \)-tests, but it uses a larger critical value to limit the Type I error rate.
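
To see what "a larger critical value" means in practice, the sketch below compares the ordinary two-sided \( t \) critical value with the Tukey critical value on the same scale, using the dimensions of the caffeine example (4 groups, 16 error degrees of freedom). The Tukey value comes from the studentized range distribution (`qtukey()` in base R) divided by \( \sqrt{2} \) so that it multiplies the same standard error as the \( t \) value.

```r
k        <- 4      # number of groups in the caffeine example
df_error <- 16     # error degrees of freedom from the ANOVA table
alpha    <- 0.05

# Critical value for a single two-sample-style comparison.
t_crit <- qt(1 - alpha / 2, df_error)

# Tukey critical value, rescaled from the studentized range to the t scale.
tukey_crit <- qtukey(1 - alpha, nmeans = k, df = df_error) / sqrt(2)

c(t = t_crit, tukey = tukey_crit)   # the Tukey value is the larger of the two
```

Because the Tukey value is larger, each pairwise interval is wider than the corresponding single-comparison interval, which is the price paid for controlling the error rate across all pairs.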

Unplanned comparison (in R)

library(multcomp)
tukeyResults <- glht(caffResults,
                     linfct = mcp(ppmCaffeine = "Tukey"))
# summary(tukeyResults)   # output shown on the next slide
# Base R alternative for the same pairwise comparisons:
# x <- aov(consumptionDifferenceFromControl ~ ppmCaffeine, data = strungOutBees)
# TukeyHSD(x)

Unplanned comparison (example)

Unplanned comparison (visualization)

Groups in the figure are assigned the same symbol if their means are not significantly different.

Tukey-Kramer Assumptions

- Same assumptions as ANOVA.
- The \( P \)-value for Tukey-Kramer is exact for balanced designs.
- The \( P \)-value for Tukey-Kramer is conservative for unbalanced designs.
- Conservative means that the real probability of making at least one Type I error is smaller than \( \alpha \), which makes it harder to reject \( H_{0} \).