Originally developed by one of the founders of modern statistics, Sir Ronald Fisher, Analysis of Variance (ANOVA) is a broad class of statistical models for comparing means across multiple groups. The main idea is that the variability of all of the observations (measured as the variance, hence the name) is partitioned into the variation within groups vs. the variation between groups. In it’s most basic form, we can think of ANOVA as generalizing the two-sample t-test to more than two groups.
As an aside, you may ask, “Why not just run t-tests on each pair of groups?” While this initially seems reasonable, you have to remember that statistical inferences are probabilistic statements. And when we run multiple comparisons using the same data, we increase our chance of making what statisticians call a type-I error, rejecting the null hypothesis without sufficient evidence. Take a statistics class to learn more. We’ll return to the multiple comarison problem below.
The simplest ANOVA model, called One-way ANOVA compares means of a response variable across 3 or more groups based on a single treatment factor. For example, here we consider the weights of six-week old chicks, each of which was given one of six different feed types. As usual, we’ll need to load a few packages.
require(mosaic) # be sure to install and load the usual packages if you have not already
require(lattice)
require(datasets)
data(chickwts)
You can find out more about the data using help()
and once the data are loaded you can click on the data frame to view the actual dataset in the data window.
help(chickwts)
Let’s start with a graphical summary of the data first. Since we have six different feed types, with multiple samples of each, a boxplot will be helpful. We can also examine summary statistics for each feed type using favstats()
from the mosaic
package.
bwplot(weight~feed, data=chickwts, xlab="Feed type", ylab="Chick weight at six weeks (g)")
favstats(weight~feed, data=chickwts)
## .group min Q1 median Q3 max mean sd n missing
## 1 casein 216 277.2 342.0 370.8 404 323.6 64.43 12 0
## 2 horsebean 108 137.0 151.5 176.2 227 160.2 38.63 10 0
## 3 linseed 141 178.0 221.0 257.8 309 218.8 52.24 12 0
## 4 meatmeal 153 249.5 263.0 320.0 380 276.9 64.90 11 0
## 5 soybean 158 206.8 248.0 270.0 329 246.4 54.13 14 0
## 6 sunflower 226 312.8 328.0 340.2 423 328.9 48.84 12 0
With six feed types, there are lots of numbers. Notice that we have between 10 and 14 observations in each treatment, which is close to the minimum for using a boxplot. The plot itself is actually pretty informative, but it might be nice to order the feed types by chick weight, rather than just alphabetically. To do so, we can use the reorder()
function on the feed variable in the plot function. Try it!
bwplot(weight~reorder(feed,weight,median), data=chickwts, xlab="Feed type", ylab="Chick weight at six weeks (g)")
Now we can see that the median chick weight is lowest when the chicks are fed horsebean, and highest when they eat sunflower or casein. However, the boxplots indicate that there is quite a bit of overlap in chick weight among many of the feed types. In ANOVA terms, there is variation both within and between treatment groups. To formally test for an effect of feed type on chick weight (and thus growth, presuming they all started about the same size), we can use the lm()
function, which is familiar from linear regression. You see, both ANOVA and regression are linear models, which is what lm
stands for.
chickmodel=lm(weight~feed, data=chickwts) #this command runs the model
ANOVA results are summarized in a standardized table format, which we can see using the anova()
command.
anova(chickmodel)
## Analysis of Variance Table
##
## Response: weight
## Df Sum Sq Mean Sq F value Pr(>F)
## feed 5 231129 46226 15.4 5.9e-10 ***
## Residuals 65 195556 3009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The first three columns of the table are components of the F statistic, which is the test statistic for ANOVA, whose probability distribution is known. As usual, larger values of the test statistic occur when the observations differ from the null hypothesis, which in this case would be that there is no difference in wiehgt among feed types. The p-value (Pr(>F)
) is the probability of observing an F value at least this large if all of the chick weights were drawn from the same population, regardless of feed type. We can see here that the p-value is very, very small, which tells us that chick weights do indeed appear to vary systematically across feed types.
But which weights differ from which. Looking back at the plot, it appears that horsebean and casein are certainly different, but what about soybean and meatmeal, or sunflower and casein? We can get some information from a summary of the linear model.
summary(chickmodel)
##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.91 -34.41 1.57 38.17 103.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.58 15.83 20.44 < 2e-16 ***
## feedhorsebean -163.38 23.49 -6.96 2.1e-09 ***
## feedlinseed -104.83 22.39 -4.68 1.5e-05 ***
## feedmeatmeal -46.67 22.90 -2.04 0.04557 *
## feedsoybean -77.15 21.58 -3.58 0.00067 ***
## feedsunflower 5.33 22.39 0.24 0.81249
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.9 on 65 degrees of freedom
## Multiple R-squared: 0.542, Adjusted R-squared: 0.506
## F-statistic: 15.4 on 5 and 65 DF, p-value: 5.94e-10
The table of coefficients is the formula for the linear model. If you want to estimate the weight of a chick, based on feed, you start with the intercept term (323.6, simply the mean of the casein fed chicks, which serve as a baseline), then add the coefficient for the corresponding feed type, if the chick was fed something other than casein. The table of coefficients also includes t-tests for each coefficient. For the different feed types, these are essentially two-sample t-tests comparing the mean weight of casein to each of the other treatments.
But this leaves us with two problems. First, as mentioned above, doing multiple tests on the same data increases the probability of making an erroneous inference. Second, we still can only compare each feed to casein, when in fact we might want to make every pairwise comparison.
Several statistical methods have been put forward to solve this common ANOVA problem, but one is the “Honest Significant Difference” test developed by a statistician named John Tukey. In R, we can use the TukeyHSD()
function, which is modified in the mosaic
package to take an lm
model as input. Tukey’s test attempts to adjust for the multiple comparison problem and make more conservative inferences.
TukeyHSD(chickmodel)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = x)
##
## $feed
## diff lwr upr p adj
## horsebean-casein -163.383 -232.347 -94.42 0.0000
## linseed-casein -104.833 -170.587 -39.08 0.0002
## meatmeal-casein -46.674 -113.906 20.56 0.3325
## soybean-casein -77.155 -140.517 -13.79 0.0084
## sunflower-casein 5.333 -60.421 71.09 0.9999
## linseed-horsebean 58.550 -10.414 127.51 0.1413
## meatmeal-horsebean 116.709 46.335 187.08 0.0001
## soybean-horsebean 86.229 19.542 152.92 0.0042
## sunflower-horsebean 168.717 99.753 237.68 0.0000
## meatmeal-linseed 58.159 -9.073 125.39 0.1277
## soybean-linseed 27.679 -35.684 91.04 0.7933
## sunflower-linseed 110.167 44.413 175.92 0.0001
## soybean-meatmeal -30.481 -95.375 34.41 0.7391
## sunflower-meatmeal 52.008 -15.224 119.24 0.2207
## sunflower-soybean 82.488 19.126 145.85 0.0039
Here, with every pairwise comparison made, there are a lot of results to put together. But if we look through all of the confidence intervals and p-values carefully, we can see that horsebean chicks (group A) can be distinguished from all but the linseed chicks (group AB), which in turn cannot be distinguished from the soybean (group B) or meatmeal chicks (group BC). The sunflower and casein chicks (both group C) can be distinguished from soybean, linseed, and horsebean chicks, but not from meatmeal chicks. If we assign the group codes in this way, any pair of feed types that do not share a letter can be reliably distinguished in terms of chick weight, on average.
We can add these onto a boxplot pretty easily if we use the basic boxplot
function, followed by the text
function.
boxplot(weight~reorder(feed,weight,median), data=chickwts, xlab="Feed type", ylab="Chick weight at six weeks (g)", ylim=c(100,500))
text(x=1:6, y=475, c("a","ab","b","bc","c","c"))