The Analysis of Variance (ANOVA) - Part 2

M. Drew LaMar
October 30, 2020

Course Announcements

  • Exam #3 opens today at 3:00 pm
    • Due date: November 2, 2:00 pm
    • You have 3 hours to complete the exam once opened
    • Chapters 9-12

Analysis of variance (example)

Analysis of variance (example)

Practice Problem #1

Many humans like the effect of caffeine, but it occurs in plants as a deterrent to herbivory by animals. Caffeine is also found in flower nectar, and nectar is meant as a reward for pollinators, not a deterrent. How does caffeine in nectar affect visitation by pollinators?

Analysis of variance (example)

Practice Problem #1

Singaravelan et al. (2005) set up feeding stations where bees were offered a choice between a control solution with 20% sucrose or a caffeinated solution with 20% sucrose plus some quantity of caffeine. Over the course of the experiment, four different concentrations of caffeine were provided: 50, 100, 150, and 200 ppm. The response variable was the difference between the amount of nectar consumed from the caffeine feeders and that removed from the control feeders at the same station (grams).

Analysis of variance (example)

Analysis of variance (example)

Singaravelan et al. (2005) set up feeding stations where bees were offered a choice between a control solution with 20% sucrose or a caffeinated solution with 20% sucrose plus some quantity of caffeine. Over the course of the experiment, four different concentrations of caffeine were provided: 50, 100, 150, and 200 ppm. The response variable was the difference between the amount of nectar consumed from the caffeine feeders and that removed from the control feeders at the same station (grams).

Discuss: Describe the experimental design.

Analysis of variance (intro)

\[ t = \frac{\bar{Y}_{1}-\bar{Y}_{2}}{\mathrm{SE}_{\bar{Y}_{1}-\bar{Y}_{2}}} \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \scriptsize{\mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 = \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2} \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 & = & \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \end{eqnarray*} } \]

Analysis of variance (derivation)

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} & = & \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i})\right]^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y})^2 + (Y_{ij} - \bar{Y}_{i})^2 + 2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i})\right] \\ & = & \sum_{i}\sum_{j}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \sum_{i}n_{i}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \end{eqnarray*} } \]

Analysis of variance (derivation)

Can show:

\[ \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) = 0, \]

and thus

\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}}. \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \scriptsize{\mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 = \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2} \]

From sum-of-squares to mean squares

Definition: The group mean square is given by

\[ \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{df_{\mathrm{groups}}}, \] with \( df_{\mathrm{groups}} = k-1 \).

Definition: The error mean square is given by

\[ \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{df_{\mathrm{error}}}, \] with \( df_{\mathrm{error}} = \sum (n_{i}-1) = N-k \).

ANOVA Table

Analysis of variance (example)

str(strungOutBees)
'data.frame':   20 obs. of  2 variables:
 $ ppmCaffeine                     : Factor w/ 4 levels "ppm50","ppm100",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ consumptionDifferenceFromControl: num  -0.4 0.01 0.65 0.24 0.34 -0.39 0.53 0.44 0.19 -0.08 ...

Discuss: Is this data tidy or messy?

Definition: Tidy!

Analysis of variance (example)

stripchart(consumptionDifferenceFromControl ~ ppmCaffeine, 
           data = strungOutBees, 
           vertical = TRUE, 
           method = "jitter", 
           xlab="Caffeine (ppm)",
           col="red")

Analysis of variance (example)

plot of chunk unnamed-chunk-3

Analysis of variance (example)

Discuss: State the null and alternative hypotheses appropriate for this question.

\[ \begin{eqnarray*} H_{0} & : & \mu_{50} = \mu_{100} = \mu_{150} = \mu_{200} \\ H_{A} & : & \mathrm{At \ least \ one \ of \ the \ means \ is \ different} \end{eqnarray*} \]

Analysis of variance (example)

Short cut using R

caffResults <- lm(consumptionDifferenceFromControl ~ ppmCaffeine, data=strungOutBees)
anova(caffResults)
Analysis of Variance Table

Response: consumptionDifferenceFromControl
            Df Sum Sq Mean Sq F value  Pr(>F)  
ppmCaffeine  3 1.1344 0.37814  4.1779 0.02308 *
Residuals   16 1.4482 0.09051                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis of variance (example)

Definition: The \( R^{2} \) value in ANOVA is the “fraction of the variation explained by groups” and is given by

\[ R^{2} = \frac{\mathrm{SS}_{\mathrm{groups}}}{\mathrm{SS}_{\mathrm{total}}}. \] Note: \( 0 \leq R^2 \leq 1 \).

beeAnovaSummary <- summary(caffResults)
beeAnovaSummary$r.squared
[1] 0.4392573

Analysis of variance (example)

Long way

Question: Calculate the following summary statistics for each group: \( n_{i} \), \( \bar{Y}_{i} \), and \( s_{i} \).

Analysis of variance (example)

library(dplyr)
beeStats <- strungOutBees %>% 
  group_by(ppmCaffeine) %>% 
  summarise(n = n(), 
            mean = mean(consumptionDifferenceFromControl),
            sd = sd(consumptionDifferenceFromControl))
knitr::kable(beeStats)
ppmCaffeine n mean sd
ppm50 5 0.008 0.2887386
ppm100 5 -0.172 0.1694698
ppm150 5 0.376 0.3093218
ppm200 5 0.378 0.3927722

Analysis of variance (example)

Compute sum-of-squares

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{groups}} & = & \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 \end{eqnarray*} } \]

grandMean <- mean(strungOutBees$consumptionDifferenceFromControl)
(SS_groups <- sum(beeStats$n*(beeStats$mean - grandMean)^2))
[1] 1.134415

Analysis of variance (example)

Compute sum-of-squares

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{error}} & = & \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 = \sum_{i}(n_{i}-1)s_{i}^2 \end{eqnarray*} } \]

(SS_error <- sum((beeStats$n-1)*beeStats$sd^2))
[1] 1.44816

Analysis of variance (example)

Compute degree of freedom

\[ \begin{equation} \mathrm{df}_{\mathrm{groups}} = k-1 \end{equation} \]

where \( k \) is number of groups.

(df_groups <- 4-1)
[1] 3
(df_error <- nrow(strungOutBees)-4)
[1] 16

Analysis of variance (example)

Compute degree of freedom

\[ \begin{equation} \mathrm{df}_{\mathrm{error}} = N-k \end{equation} \]

where \( N \) is number of total observations in data.

(df_error <- nrow(strungOutBees)-4)
[1] 16

Analysis of variance (example)

Compute mean squares

\[ \begin{equation} \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{\mathrm{df}_{\mathrm{groups}}} \end{equation} \]

(MS_groups <- SS_groups/df_groups)
[1] 0.3781383

Analysis of variance (example)

Compute mean squares

\[ \begin{equation} \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{\mathrm{df}_{\mathrm{error}}} \end{equation} \]

(MS_error <- SS_error/df_error)
[1] 0.09051

Analysis of variance (example)

Compute \( F \)-statistic and \( P \)-value

\[ \begin{equation} F = \frac{\mathrm{MS}_{\mathrm{groups}}}{\mathrm{MS}_{\mathrm{error}}} \end{equation} \]

(F_ratio <- MS_groups/MS_error)
[1] 4.177862
(pval <- pf(F_ratio, df_groups, df_error, lower.tail=FALSE))
[1] 0.02307757

Analysis of variance (example)

Using Statistical F Table

Analysis of variance (example)

Using Statistical F Table

Analysis of variance (example)

Create manual table and compare

mytable <- data.frame(Df = c(df_groups, df_error), 
                      SumSq = c(SS_groups, SS_error), 
                      MeanSq = c(MS_groups, MS_error), 
                      Fval = c(F_ratio, NA), 
                      Pval = c(pval, NA))
rownames(mytable) <- c("ppmCaffeine", "Residuals")
knitr::kable(mytable)
Df SumSq MeanSq Fval Pval
ppmCaffeine 3 1.134415 0.3781383 4.177862 0.0230776
Residuals 16 1.448160 0.0905100 NA NA
Analysis of Variance Table

Response: consumptionDifferenceFromControl
            Df Sum Sq Mean Sq F value  Pr(>F)  
ppmCaffeine  3 1.1344 0.37814  4.1779 0.02308 *
Residuals   16 1.4482 0.09051                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA assumptions and robustness

Assumptions (same as 2-sample \( t \)-test)

  • Measurements in each group represent a random sample from corresponding population.
  • Variable is normally distributed in each of the \( k \) populations.
  • Variance is the same in all \( k \) populations.

ANOVA assumptions and robustness

Robustness (same as 2-sample \( t \)-test)

  • Robust to deviations in normality.
  • Somewhat robust to deviations in equal variances when:
    • Sample sizes are “large”, about the same size in each group
    • Sample sizes are about the same (balanced)
    • Standard deviations are within about a 3-fold difference

Nonparametric alternative to ANOVA

Definition: The Kruskal-Wallis test is a nonparametric method for mulutiple groups based on ranks.

The Kruskal-Wallis test is similar to the Mann-Whitney \( U \)-test and has the same assumptions:

  • Group samples are random samples.
  • To use as a test of difference between means or medians, the distributions must have the same shape in every population.

Power of Kruskal-Wallis test is nearly as powerful as ANOVA when sample sizes are large, but has smaller power than ANOVA for small sample sizes.