One-Way ANOVA

Conducting hypothesis tests in R for the means of a quantitative variable grouped by the levels of a single multinomial categorical variable.

Michael Foley

2019-03-21

Analysis of variance (ANOVA) tests the equality of the mean responses \(\mu_1, \mu_2, \ldots, \mu_m\) among the \(m\) levels (groups) of a categorical explanatory variable. ANOVA determines whether the variability among the sample means is too large to be explained by chance alone. The null hypothesis \(H_0\) is that all means are equal, and the alternative \(H_a\) is that at least one mean differs from the others. The ANOVA F test statistic is the ratio of the between-group variability to the within-group variability. If the between-group variability dominates, the F statistic is large and the associated p-value is small, leading to rejection of \(H_0\).

The test does not indicate which populations cause the rejection of \(H_0\). ANOVA returns reliable results if the following conditions are met:

  1. the measurements for \(X_i, i \in \{1, \ldots, m\}\) are independent,
  2. the measurements in each population are normally distributed (less critical if the sample size is large; when the population distributions are very non-normal, use a nonparametric method such as the Kruskal-Wallis test, whose \(H_0\) is that all \(m\) populations have the same distribution, often summarized as equal medians when the distributions share a common shape), and
  3. the measurements in each population have the same variance \(\sigma_{X_i}^2, i \in \{1, \ldots, m\}\) (less critical if the sample sizes of the populations are similar). A common rule of thumb compares the smallest and largest sample standard deviations: if their ratio falls between 0.5 and 2, equal variances can be assumed. More formal tests include the Bartlett test and the Levene test (see "Homogeneity of variance" in the Cookbook for R).
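A minimal sketch of these condition checks in base R. Because the example data later in this post is read from a local file, the sketch uses R's built-in `PlantGrowth` data (plant weight for groups `ctrl`, `trt1`, `trt2`) as a stand-in. Levene's test is computed by hand here, as a one-way ANOVA on absolute deviations from each group mean; that is an assumption made to stay in base R. The `car` package's `leveneTest()` is a packaged alternative and by default centers on the median (the Brown-Forsythe variant).

```r
pg <- PlantGrowth  # built-in stand-in data: weight by group (ctrl, trt1, trt2)

# Rule of thumb: ratio of largest to smallest group SD within 0.5 to 2.
# max/min is always >= 1, so only the upper limit of 2 can be violated.
group_sd <- tapply(pg$weight, pg$group, sd)
sd_ratio <- max(group_sd) / min(group_sd)
sd_ratio_ok <- sd_ratio <= 2

# Bartlett's test of homogeneity of variances (sensitive to non-normality)
bart <- bartlett.test(weight ~ group, data = pg)

# Levene's test by hand (mean-centered version):
# a one-way ANOVA on absolute deviations from each group's mean
pg$abs_dev <- abs(pg$weight - ave(pg$weight, pg$group))
levene_p <- anova(lm(abs_dev ~ group, data = pg))[["Pr(>F)"]][1]

# Nonparametric alternative to ANOVA for clearly non-normal groups
kw <- kruskal.test(weight ~ group, data = pg)

print(round(group_sd, 3))
cat("SD ratio:", round(sd_ratio, 2), " within rule of thumb:", sd_ratio_ok, "\n")
cat("Bartlett p:", signif(bart$p.value, 3),
    " Levene p:", signif(levene_p, 3),
    " Kruskal-Wallis p:", signif(kw$p.value, 3), "\n")
```

Levene's test is usually preferred over Bartlett's when normality is doubtful, since it is less sensitive to non-normal data.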

For a single factor with \(m\) levels, let \(n_i\) be the number of observations at level \(i\), \(n = \sum_{i=1}^m n_i\) the total sample size, \(\bar{y}_i\) the mean of level \(i\), \(s_i^2\) its sample variance, and \(\bar{y}\) the overall mean. The between-group sum of squares weights each squared difference between a level mean and the overall mean by that level's sample size, \(SS_B = \sum_{i=1}^m n_i (\bar{y}_i - \bar{y})^2\). \(SS_B\) divided by its degrees of freedom is the between-sample variance, \(MS_B = \sigma_B^2 = \frac{SS_B}{m-1}\). The within-group sum of squares adds up the squared differences between each value and its level mean, \(SS_W = \sum_{i=1}^m \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 = \sum_{i=1}^m (n_i - 1) s_i^2\). \(SS_W\) divided by its degrees of freedom is the within-sample variance, \(MS_W = \sigma_W^2 = \frac{SS_W}{n-m}\).

Under \(H_0\), both \(MS_B\) and \(MS_W\) estimate \(\sigma_\epsilon^2\), the variance common to all \(m\) populations, so \(F = \frac{MS_B}{MS_W}\) tends to be near 1; formally, F then has an F distribution with \(m-1\) numerator degrees of freedom and \(n-m\) denominator degrees of freedom. Under the alternative hypothesis, \(MS_B\) estimates \(\sigma_\epsilon^2 + \theta\), where \(\theta = \frac{1}{m-1}\sum_{i=1}^m n_i (\mu_i - \bar{\mu})^2 > 0\) and \(\bar{\mu} = \frac{1}{n}\sum_{i=1}^m n_i \mu_i\), whereas \(MS_W\) still estimates \(\sigma_\epsilon^2\). The larger F is, the stronger the evidence against \(H_0\). Summarize the results in an analysis of variance (ANOVA) table.
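The claim that both mean squares estimate \(\sigma_\epsilon^2\) when \(H_0\) is true, and that F then follows an \(F_{m-1,\,n-m}\) distribution, can be checked by simulation. A minimal sketch, assuming normal data with \(m = 4\) groups of 5 observations (the same layout as the pig example), equal true means of 0 and \(\sigma_\epsilon = 2\); these settings are arbitrary illustrations.

```r
set.seed(1)                      # reproducible illustration
m <- 4; n_i <- 5; n <- m * n_i   # 4 groups of 5, as in the pig example
sigma <- 2                       # common error SD, so sigma^2 = 4
group <- factor(rep(seq_len(m), each = n_i))

# One simulated experiment with H0 true (every group mean is 0)
one_draw <- function() {
  y <- rnorm(n, mean = 0, sd = sigma)
  fitted_mean <- ave(y, group)                # each value's group mean
  ss_b <- sum((fitted_mean - mean(y))^2)      # = sum_i n_i (ybar_i - ybar)^2
  ss_w <- sum((y - fitted_mean)^2)
  c(ms_b = ss_b / (m - 1), ms_w = ss_w / (n - m))
}

sims <- replicate(5000, one_draw())           # 2 x 5000 matrix
f_stat <- sims["ms_b", ] / sims["ms_w", ]
f_crit <- qf(0.95, df1 = m - 1, df2 = n - m)  # 5% critical value of F(3, 16)

mean_ms_b <- mean(sims["ms_b", ])     # both averages should be close to 4
mean_ms_w <- mean(sims["ms_w", ])
reject_rate <- mean(f_stat > f_crit)  # should be close to 0.05
cat("mean MS_B:", round(mean_ms_b, 2), " mean MS_W:", round(mean_ms_w, 2),
    " rejection rate:", round(reject_rate, 3), "\n")
```

Shifting one group's mean away from 0 inside `one_draw()` would inflate `mean_ms_b` by roughly \(\theta\) while leaving `mean_ms_w` unchanged.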

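Source | SS | df | MS | F Test
Between groups | \(SS_B\) | \(m-1\) | \(MS_B = \frac{SS_B}{m-1}\) | \(F = \frac{MS_B}{MS_W}\)
Within groups | \(SS_W\) | \(n-m\) | \(MS_W = \frac{SS_W}{n-m}\) |
Total | \(SS_T = SS_B + SS_W\) | \(n-1\) | |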
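Each entry of the table can be computed directly from the formulas. A minimal sketch on R's built-in `PlantGrowth` data (a stand-in dataset with 3 groups of 10), cross-checked against the table that `anova()` produces for the same model:

```r
y <- PlantGrowth$weight
g <- PlantGrowth$group
m <- nlevels(g); n <- length(y)

n_i    <- tapply(y, g, length)   # group sizes
ybar_i <- tapply(y, g, mean)     # group means
s2_i   <- tapply(y, g, var)      # group sample variances

ss_b <- sum(n_i * (ybar_i - mean(y))^2)
ss_w <- sum((n_i - 1) * s2_i)    # equals the sum of (y_ij - ybar_i)^2
ss_t <- sum((y - mean(y))^2)

ms_b  <- ss_b / (m - 1)
ms_w  <- ss_w / (n - m)
f_val <- ms_b / ms_w
p_val <- pf(f_val, m - 1, n - m, lower.tail = FALSE)  # upper-tail area

anova_tab <- data.frame(
  SS      = c(ss_b, ss_w, ss_t),
  df      = c(m - 1, n - m, n - 1),
  MS      = c(ms_b, ms_w, NA),
  F_value = c(f_val, NA, NA),
  row.names = c("Between", "Within", "Total")
)
print(anova_tab)
cat("p-value:", signif(p_val, 4), "\n")

# Cross-check against R's own ANOVA table
ref <- anova(lm(weight ~ group, data = PlantGrowth))
print(ref)
```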

To determine which group(s) differ from the others, conduct a post-hoc test. Options include the Šidák and Holm adjustments for multiple t tests, Fisher's Least Significant Difference test, Tukey's Honestly Significant Difference test, the Scheffé test, the Newman-Keuls test, Dunnett's multiple comparison test, Duncan's multiple range test, and the Bonferroni procedure.
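Several of these procedures are available in base R. A minimal sketch on the built-in `PlantGrowth` data (a stand-in dataset): `pairwise.t.test()` runs all pairwise t tests with a pooled SD and adjusts the p-values using `p.adjust()` methods such as `"holm"` or `"bonferroni"`, while `TukeyHSD()` works on a fitted `aov` object. Šidák, Scheffé, Dunnett and Duncan procedures are not among these base functions; add-on packages provide them.

```r
# All pairwise t tests using the pooled SD, with two multiplicity adjustments
pw_holm <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "holm")
pw_bonf <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "bonferroni")

# Tukey's HSD needs an aov fit
fit <- aov(weight ~ group, data = PlantGrowth)
tk  <- TukeyHSD(fit)

print(pw_holm$p.value)  # lower-triangular matrix of adjusted p-values
print(pw_bonf$p.value)
print(tk$group)         # diff, lwr, upr, p adj for each pair
```

Holm's step-down procedure never gives larger adjusted p-values than Bonferroni, so it is a reasonable default when only familywise error control is needed.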

Example

In a completely randomized design experiment, 20 young pigs are assigned at random among 4 experimental groups, and each group is fed a different diet. The response variable is the pig’s weight in kg after consuming the diet for 10 months. Are the mean pig weights the same for all 4 diets?

library(tidyr)    # for gather(); superseded by pivot_longer() in tidyr >= 1.0.0
library(ggplot2)  # for the boxplot

# Wide file: one column of weights per diet (Feed.1 ... Feed.4)
pig_weight <- read.delim(file = "Data/pig_weight.txt", header = TRUE, sep = ",")

# Reshape to long format: one row per pig, with columns diet and weight
pig_gath <- gather(pig_weight, diet, weight)
pig_gath$diet <- factor(pig_gath$diet, levels = c("Feed.1", "Feed.2", "Feed.3", "Feed.4"))
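If tidyr is not available, base R's `stack()` performs the same wide-to-long reshape. A minimal sketch on a small hypothetical wide data frame (the numbers are made up for illustration and are not the pig data):

```r
# Hypothetical wide data: one column per diet
wide <- data.frame(Feed.1 = c(10, 11), Feed.2 = c(20, 21))

# stack() returns the columns 'values' and 'ind'; rename them to match
long <- stack(wide)
names(long) <- c("weight", "diet")
long$diet <- factor(long$diet, levels = c("Feed.1", "Feed.2"))
print(long)
```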

ggplot(data = pig_gath, aes(x = diet, y = weight)) +
  geom_boxplot()

The measurements are independent because this is a completely randomized experiment. With large samples (a common rule of thumb is \(n \ge 30\)), the central limit theorem makes the normality condition less critical, but here \(n = 20\) (only 5 per group), so we need to check for normality. The sample sizes are equal (5 for each of the 4 factor levels), so the equal-variance condition is less critical, but we can check it anyway.

First, check the normality condition. A normality check starts from the assumption that the distributions are normal (\(H_0\): the data are normally distributed) and rejects that assumption only if sufficient evidence exists against it. In these normal Q-Q plots, look for substantial deviations from a straight line. These plots look good.

# 2 x 2 grid of normal Q-Q plots, one per diet
layout(rbind(c(1, 2), c(3, 4)))
for (d in levels(pig_gath$diet)) {
  w <- pig_gath$weight[pig_gath$diet == d]
  qqnorm(w, main = paste("Normal Q-Q plot:", d))
  qqline(w)
}

Formal normality tests, such as the Shapiro-Wilk test, provide a quantitative evaluation, but with only 5 observations per group they have too little power to be useful.
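With somewhat larger groups, the Shapiro-Wilk test can be run per group, or once on the pooled ANOVA residuals. A minimal sketch on the built-in `PlantGrowth` data (10 plants per group, a stand-in dataset); `shapiro.test()` requires between 3 and 5000 observations, and with groups this small a non-significant result still says little, so it complements rather than replaces the Q-Q plots.

```r
# Shapiro-Wilk p-value for each group separately
sw_p <- tapply(PlantGrowth$weight, PlantGrowth$group,
               function(w) shapiro.test(w)$p.value)
print(round(sw_p, 3))

# Alternative: test the residuals of the one-way ANOVA fit, pooling all groups
fit <- aov(weight ~ group, data = PlantGrowth)
sw_resid <- shapiro.test(residuals(fit))
print(sw_resid)
```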

Now check for equal variances with Bartlett's test of homogeneity of variances. The p-value (0.93) is far above 0.05, so do not reject \(H_0\) of equal variances.

bartlett.test(weight ~ diet, data = pig_gath)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  weight by diet
## Bartlett's K-squared = 0.46965, df = 3, p-value = 0.9255

Now we are ready for the one-way ANOVA test. The null hypothesis is that all means are equal. The p-value (5.28e-13) is far below 0.0001, so we reject \(H_0\) and conclude that the mean weight differs for at least one diet.

aov_pig <- aov(weight ~ diet, data = pig_gath)
summary(aov_pig)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## diet         3   4703  1567.7   206.7 5.28e-13 ***
## Residuals   16    121     7.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
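The p-value in this table is the upper-tail area of the \(F_{3,16}\) distribution beyond the observed statistic. A quick check using the rounded values printed above (F = 206.7 on 3 and 16 degrees of freedom); because F is rounded, the result matches the printed p-value only approximately.

```r
f_obs <- 206.7          # F value as printed in the table above (rounded)
df_b <- 3; df_w <- 16   # m - 1 and n - m for 4 diets and 20 pigs

p_val  <- pf(f_obs, df1 = df_b, df2 = df_w, lower.tail = FALSE)  # upper tail
f_crit <- qf(0.95, df1 = df_b, df2 = df_w)                       # 5% critical value
cat("p-value:", signif(p_val, 3), " critical F:", round(f_crit, 2), "\n")
```

The observed F is far beyond the 5% critical value, which is why the p-value is so small.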

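Perform a post-hoc test to see which of the groups differ. Here we use Tukey's HSD test. All six pairs differ from each other: every adjusted p-value is below 0.001, and none of the 95% confidence intervals contains zero.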

TukeyHSD(aov_pig)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = weight ~ diet, data = pig_gath)
## 
## $diet
##                 diff        lwr       upr     p adj
## Feed.2-Feed.1   8.56   3.576977 13.543023 0.0008075
## Feed.3-Feed.1  39.66  34.676977 44.643023 0.0000000
## Feed.4-Feed.1  25.70  20.716977 30.683023 0.0000000
## Feed.3-Feed.2  31.10  26.116977 36.083023 0.0000000
## Feed.4-Feed.2  17.14  12.156977 22.123023 0.0000002
## Feed.4-Feed.3 -13.96 -18.943023 -8.976977 0.0000030
plot(TukeyHSD(aov_pig))
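With many groups it can help to extract the significant pairs from the `TukeyHSD()` result programmatically. A minimal sketch on the built-in `PlantGrowth` data (a stand-in dataset; with the pig fit, the component would be `$diet` instead of `$group`):

```r
fit <- aov(weight ~ group, data = PlantGrowth)
tk  <- TukeyHSD(fit, conf.level = 0.95)

comp <- tk$group                                   # columns: diff, lwr, upr, p adj
sig  <- comp[comp[, "p adj"] < 0.05, , drop = FALSE]

# Equivalent criterion: the 95% family-wise interval excludes zero
excludes_zero <- comp[, "lwr"] > 0 | comp[, "upr"] < 0
print(sig)
```

For Tukey's HSD the adjusted p-value and the family-wise interval are built from the same studentized range distribution, so the two criteria pick out the same pairs.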