The Analysis of Variance (ANOVA) - Part 2

M. Drew LaMar
November 16, 2018

Class Announcements

  • Homework #7 mini-extension: Due tonight at 11:59 pm
  • Homework #8: Chapter 13 and 15 (Due Friday, December 7, 11:59 pm) [LAST GRADED HW]
  • Project: Due Friday, December 14, 11:59 pm
    • For those not doing sleep study: Same exact format (i.e. 3 questions, explore data, etc.)

Analysis of variance (intro)

\[ t = \frac{\bar{Y}_{1}-\bar{Y}_{2}}{\mathrm{SE}_{\bar{Y}_{1}-\bar{Y}_{2}}} \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \scriptsize{\mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 = \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2} \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 & = & \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \end{eqnarray*} } \]

Analysis of variance (derivation)

\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} & = & \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i})\right]^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y})^2 + (Y_{ij} - \bar{Y}_{i})^2 + 2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i})\right] \\ & = & \sum_{i}\sum_{j}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \sum_{i}n_{i}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \end{eqnarray*} } \]

Analysis of variance (derivation)

Can show:

\[ \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) = 0, \]

and thus

\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}}. \]

Analysis of variance (derivation)

Data: With \( i \) representing group \( i \), we have

\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]

\[ \scriptsize{\mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 = \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2} \]

From sum-of-squares to mean squares

Definition: The group mean square is given by

\[ \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{df_{\mathrm{groups}}}, \] with \( df_{\mathrm{groups}} = k-1 \).

Definition: The error mean square is given by

\[ \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{df_{\mathrm{error}}}, \] with \( df_{\mathrm{error}} = \sum (n_{i}-1) = N-k \).

Analysis of variance (example)

str(strungOutBees)
'data.frame':   20 obs. of  2 variables:
 $ ppmCaffeine                     : Factor w/ 4 levels "ppm50","ppm100",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ consumptionDifferenceFromControl: num  -0.4 0.01 0.65 0.24 0.34 -0.39 0.53 0.44 0.19 -0.08 ...

Discuss: Is this data tidy or messy?

Definition: Tidy!

Analysis of variance (example)

stripchart(consumptionDifferenceFromControl ~ ppmCaffeine, 
           data = strungOutBees, 
           vertical = TRUE, 
           method = "jitter", 
           xlab="Caffeine (ppm)",
           col="red")

Analysis of variance (example)

plot of chunk unnamed-chunk-3

Analysis of variance (example)

Discuss: State the null and alternative hypotheses appropriate for this question.

\[ \begin{eqnarray*} H_{0} & : & \mu_{50} = \mu_{100} = \mu_{150} = \mu_{200} \\ H_{A} & : & \mathrm{At \ least \ one \ of \ the \ means \ is \ different} \end{eqnarray*} \]

Analysis of variance (example)

Short cut using R

caffResults <- lm(consumptionDifferenceFromControl ~ ppmCaffeine, data=strungOutBees)
anova(caffResults)
Analysis of Variance Table

Response: consumptionDifferenceFromControl
            Df Sum Sq Mean Sq F value  Pr(>F)  
ppmCaffeine  3 1.1344 0.37814  4.1779 0.02308 *
Residuals   16 1.4482 0.09051                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Analysis of variance (example)

Definition: The \( R^{2} \) value in ANOVA is the “fraction of the variation explained by groups” and is given by

\[ R^{2} = \frac{\mathrm{SS}_{\mathrm{groups}}}{\mathrm{SS}_{\mathrm{total}}}. \] Note: \( 0 \leq R^2 \leq 1 \).

beeAnovaSummary <- summary(caffResults)
beeAnovaSummary$r.squared
[1] 0.4392573