Analysis of variance (ANOVA) tests the equality of mean responses \(\mu_1, \mu_2, \dots, \mu_m\) among the \(m\) levels (groups) of a categorical explanatory variable. ANOVA determines whether the variability among the sample means is too large to be explained by chance alone. \(H_0\) is that all means are equal, and \(H_a\) is that at least one mean differs from the others. The ANOVA F test statistic is the ratio of the between-group variability to the within-group variability. If the between-group variability dominates, the F statistic is large and the associated p-value is small, leading to rejection of \(H_0\).
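The test does not indicate which populations cause the rejection of \(H_0\). ANOVA returns reliable results if the following conditions are met: the observations are independent, each population is approximately normally distributed, and the populations share a common variance.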
For a sample of \(n\) observations on one factor with \(m\) levels, where level \(i\) contributes \(n_i\) observations, the weighted sum of squared differences between each level mean and the overall mean is \(SS_B = \sum_{i=1}^m n_i (\bar{y}_{i.} - \bar{y})^2\). \(SS_B\) divided by its degrees of freedom is the between-sample variance, \(MS_B = \sigma_B^2 = \frac{SS_B}{m-1}\). The sum of squared differences between each value and its corresponding level mean is \(SS_W = \sum_{i,j} (y_{ij} - \bar{y}_{i.})^2 = \sum_{i=1}^m (n_i - 1) s_i^2\). \(SS_W\) divided by its degrees of freedom is the within-sample variance, \(MS_W = \sigma_W^2 = \frac{SS_W}{n-m}\). Under \(H_0\), the test statistic \(F = \frac{MS_B}{MS_W}\) has an F distribution with \(m-1\) numerator degrees of freedom and \(n-m\) denominator degrees of freedom, and both \(MS_B\) and \(MS_W\) estimate \(\sigma_\epsilon^2\), the variance common to all \(m\) populations. Under the alternative hypothesis, \(MS_B\) estimates \(\sigma_\epsilon^2 + \theta\), where \(\theta > 0\) grows with the spread of the population means, whereas \(MS_W\) still estimates \(\sigma_\epsilon^2\). The larger F is, the stronger the evidence against \(H_0\). Summarize the results in an analysis of variance (ANOVA) table.
SS | df | MS | F Test |
---|---|---|---|
\(SS_B\) | \(m-1\) | \(\frac{SS_B}{m-1}\) | \(\frac{MS_B}{MS_W}\) |
\(SS_W\) | \(n-m\) | \(\frac{SS_W}{n - m}\) | |
\(SS_T\) | \(n-1\) | | |
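The formulas above can be traced by hand. The following is a minimal sketch that builds the ANOVA table from the definitions and compares the resulting F statistic with the one reported by `aov()`. The data vector `y` and grouping factor `g` are hypothetical values made up for illustration; they are not from the pig example.

```r
# Hypothetical data: three groups of four observations (illustration only).
y <- c(10, 12, 11, 13,  14, 15, 13, 16,  20, 19, 21, 22)
g <- factor(rep(c("A", "B", "C"), each = 4))

n      <- length(y)             # total number of observations
m      <- nlevels(g)            # number of groups (factor levels)
n_i    <- tapply(y, g, length)  # group sizes
ybar_i <- tapply(y, g, mean)    # group means
ybar   <- mean(y)               # overall mean

ss_b <- sum(n_i * (ybar_i - ybar)^2)  # between-group sum of squares
ss_w <- sum((y - ave(y, g))^2)        # within-group sum of squares; ave() returns each value's group mean

ms_b <- ss_b / (m - 1)                # between-group mean square
ms_w <- ss_w / (n - m)                # within-group mean square
f_stat  <- ms_b / ms_w
p_value <- pf(f_stat, df1 = m - 1, df2 = n - m, lower.tail = FALSE)

anova_table <- data.frame(
  SS = c(ss_b, ss_w, ss_b + ss_w),
  df = c(m - 1, n - m, n - 1),
  MS = c(ms_b, ms_w, NA),
  F  = c(f_stat, NA, NA),
  row.names = c("Between", "Within", "Total")
)
print(anova_table)

# Cross-check against R's built-in one-way ANOVA.
fit <- summary(aov(y ~ g))[[1]]
all.equal(f_stat, fit[["F value"]][1])
```

The total row is not needed for the test itself, but \(SS_B + SS_W\) equalling the total sum of squares about the overall mean is a convenient arithmetic check.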
To determine which groups differ from the others, conduct a post-hoc test. Options include the Šidák and Holm t tests, Fisher’s Least Significant Difference test, Tukey’s Honestly Significant Difference test, the Scheffé test, the Newman-Keuls test, Dunnett’s Multiple Comparison test, the Duncan Multiple Range test, and the Bonferroni procedure.
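Several of these procedures are pairwise t tests with a multiplicity adjustment, and base R's `pairwise.t.test()` covers the Holm and Bonferroni variants. The sketch below uses hypothetical data (the vectors `y` and `g` are made up for illustration) and relies on the function's default of a pooled standard deviation.

```r
# Hypothetical data: three groups of four observations (illustration only).
y <- c(10, 12, 11, 13,  14, 15, 13, 16,  20, 19, 21, 22)
g <- factor(rep(c("A", "B", "C"), each = 4))

# Pairwise t tests with a pooled SD, adjusted for multiple comparisons.
holm <- pairwise.t.test(y, g, p.adjust.method = "holm")
bonf <- pairwise.t.test(y, g, p.adjust.method = "bonferroni")

holm$p.value  # lower-triangular matrix of Holm-adjusted p-values
bonf$p.value  # same comparisons with the Bonferroni adjustment
```

Holm-adjusted p-values are never larger than the Bonferroni-adjusted ones for the same comparisons, which is why Holm is often preferred when both are applicable.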
In a completely randomized design experiment, 20 young pigs are assigned at random among 4 experimental groups, and each group is fed a different diet. The response variable is the pig’s weight in kg after consuming the diet for 10 months. Are the mean pig weights the same for all 4 diets?
library(tidyr) # for gather()
library(ggplot2)
pig_weight <- read.delim(file = "Data/pig_weight.txt", header = TRUE, sep = ",")
pig_gath <- gather(pig_weight, diet, weight)
pig_gath$diet <- factor(pig_gath$diet, levels = c("Feed.1", "Feed.2", "Feed.3", "Feed.4"))
ggplot(data = pig_gath, aes(x = diet, y = weight)) +
  geom_boxplot()
The measurements are independent because this is a completely randomized experiment. With large samples (a common rule of thumb is \(n \ge 30\)) the normality condition is less of a concern, but here \(n = 20\), so we need to check for normality. The sample sizes are equal (5 in each of the 4 factor levels), so the equality of variances is less critical, but we can check anyway.
First, check the normality condition. Start from the assumption that the distributions are normal, \(H_0: \text{normal}\), and reject that assumption only if sufficient evidence exists. In these normal Q-Q plots, look for substantial deviations from a straight line. These plots look good.
layout(rbind(c(1, 2), c(3, 4)))
qqnorm(pig_gath[pig_gath$diet == "Feed.1",]$weight)
qqline(pig_gath[pig_gath$diet == "Feed.1",]$weight)
qqnorm(pig_gath[pig_gath$diet == "Feed.2",]$weight)
qqline(pig_gath[pig_gath$diet == "Feed.2",]$weight)
qqnorm(pig_gath[pig_gath$diet == "Feed.3",]$weight)
qqline(pig_gath[pig_gath$diet == "Feed.3",]$weight)
qqnorm(pig_gath[pig_gath$diet == "Feed.4",]$weight)
qqline(pig_gath[pig_gath$diet == "Feed.4",]$weight)
There are statistical tests that provide a quantitative evaluation of normality, but the sample sizes are too small for them to be useful.
Now check for equal variances with Bartlett’s test of homogeneity of variances. The p-value (0.93) is far above .05, so do not reject \(H_0\) of equal variances.
bartlett.test(weight ~ diet, data = pig_gath)
##
## Bartlett test of homogeneity of variances
##
## data: weight by diet
## Bartlett's K-squared = 0.46965, df = 3, p-value = 0.9255
Now we are ready for the one-way ANOVA test. The null hypothesis is that all four diet means are equal. The p-value is < .0001, so we reject \(H_0\) and conclude that at least one diet mean differs from the others.
summary(aov_pig <- aov(weight ~ diet, data = pig_gath))
##             Df Sum Sq Mean Sq F value   Pr(>F)
## diet         3   4703  1567.7   206.7 5.28e-13 ***
## Residuals   16    121     7.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
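To connect the table to the F distribution, the p-value can be recovered from the reported F statistic and its degrees of freedom. This is a minimal sketch that plugs in the rounded values printed above, so the result agrees with the printed `Pr(>F)` only approximately; the variable names are illustrative.

```r
# Values copied from the ANOVA output above (rounded as printed).
f_value    <- 206.7
df_between <- 3    # m - 1, with m = 4 diets
df_within  <- 16   # n - m, with n = 20 pigs

# Upper-tail probability of the F distribution under H0.
p_value <- pf(f_value, df1 = df_between, df2 = df_within, lower.tail = FALSE)

# Critical value for a test at the 5% level, for comparison with f_value.
f_crit <- qf(0.95, df1 = df_between, df2 = df_within)

p_value
f_crit
```

The observed F statistic lies far beyond the 5% critical value, which is the same conclusion the p-value gives.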
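Perform a post-hoc test to see which of the groups differ. Here we use Tukey’s test. All six pairwise differences are significant: every adjusted p-value is below .001, and none of the 95% family-wise confidence intervals contains zero.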
TukeyHSD(aov_pig)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ diet, data = pig_gath)
##
## $diet
##                 diff        lwr       upr     p adj
## Feed.2-Feed.1   8.56   3.576977 13.543023 0.0008075
## Feed.3-Feed.1  39.66  34.676977 44.643023 0.0000000
## Feed.4-Feed.1  25.70  20.716977 30.683023 0.0000000
## Feed.3-Feed.2  31.10  26.116977 36.083023 0.0000000
## Feed.4-Feed.2  17.14  12.156977 22.123023 0.0000002
## Feed.4-Feed.3 -13.96 -18.943023 -8.976977 0.0000030
plot(TukeyHSD(aov_pig))
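Every Tukey interval above has the same half-width because the design is balanced. As a rough sketch of where that width comes from, the half-width for equal group sizes is the studentized range quantile divided by \(\sqrt{2}\), times the standard error of a difference in means. The within-group mean square here is rebuilt from the rounded ANOVA output (121 / 16), so the result is only approximate.

```r
# Quantities taken from the ANOVA output above (rounded as printed).
m           <- 4         # number of diets
n_per_group <- 5         # pigs per diet
df_within   <- 16        # residual degrees of freedom
ms_within   <- 121 / 16  # approximate MS_W from the rounded Sum Sq

# Studentized range quantile for a 95% family-wise confidence level.
q_crit <- qtukey(0.95, nmeans = m, df = df_within)

# Half-width of each pairwise interval in a balanced design.
half_width <- q_crit / sqrt(2) * sqrt(ms_within * 2 / n_per_group)
half_width
```

The value should land close to the half-width implied by the output, for example \(13.543 - 8.56 \approx 4.98\) for the Feed.2-Feed.1 comparison.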