Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures (such as the “variation” among and between groups) used to analyze the differences among group means in a sample. ANOVA was developed by statistician and evolutionary biologist Ronald Fisher. The ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means.
## Loading required package: tools
The data arise from an experiment to study the gain in weight of rats fed on four different diets, distinguished by amount of protein (low and high) and by source of protein (beef and cereal).
What data looks like
head(weightgain[19:23,], n=5)
## source type weightgain
## 19 Beef High 117
## 20 Beef High 111
## 21 Cereal Low 107
## 22 Cereal Low 95
## 23 Cereal Low 97
Now some stats
print("mean:")
## [1] "mean:"
tapply(weightgain$weightgain,list(weightgain$source, weightgain$type), mean)
## High Low
## Beef 100.0 79.2
## Cereal 85.9 83.9
print("standard deviation:")
## [1] "standard deviation:"
tapply(weightgain$weightgain, list(weightgain$source, weightgain$type), sd)
## High Low
## Beef 15.13642 13.88684
## Cereal 15.02184 15.70881
There seems to be differences on weight gain by the infuences of the two factors, but how can we know if the observed differences aren’t from random effect?
To answer this question, we can apply ANOVA to the data. The model formula specifies a two-way layout with interaction term between the two factor variables
wg_aov <- aov(weightgain ~ source * type, data = weightgain) #"*" between source and type specify interaction term required. If not required, use "+" symbol
The estimates of the intercept and the main and interaction effects can be extracted from the model fit by
coef(wg_aov)
## (Intercept) sourceCereal typeLow
## 100.0 -14.1 -20.8
## sourceCereal:typeLow
## 18.8
Plot of mean weight gain for each level of the two factors
plot.design(weightgain)
Lastly Summary of the ANOVA model:
summary(wg_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## source 1 221 220.9 0.988 0.3269
## type 1 1300 1299.6 5.812 0.0211 *
## source:type 1 884 883.6 3.952 0.0545 .
## Residuals 36 8049 223.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As you can see, source’s p-value is too large, but type and the interaction term (between source and type) seems to be pretty significant (PR(>F) <0.1), and let’s examine the interaction between the two factor variables.
interaction.plot(weightgain$type, weightgain$source, weightgain$weightgain)
Aha! This plot reveals that the source variable queal to cereal does not have much impact on the weight gain - rather a flat negatively sloped line, whereas steep neg slope is observed when source equals beef (moving from high to low from the ‘type’ factor), thus the low p-value for the interaction term, but not the ‘source’ variable by itself.
Data:
head(foster[46:48,])
## litgen motgen weight
## 46 I J 54.5
## 47 J A 59.0
## 48 J A 57.4
The data are from a foster feeding experiment with rat mothers and litters of four different genotypes. The measurement is the litter weight after a trial feeding period.
We can derive the two analyses of variance tables for the foster feeding
summary(aov(weight ~ litgen * motgen, data = foster))
## Df Sum Sq Mean Sq F value Pr(>F)
## litgen 3 60.2 20.05 0.370 0.77522
## motgen 3 775.1 258.36 4.763 0.00574 **
## litgen:motgen 9 824.1 91.56 1.688 0.12005
## Residuals 45 2440.8 54.24
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now let’s switch the order of the two factors
summary(aov(weight ~ motgen * litgen, data = foster))
## Df Sum Sq Mean Sq F value Pr(>F)
## motgen 3 771.6 257.20 4.742 0.00587 **
## litgen 3 63.6 21.21 0.391 0.76000
## motgen:litgen 9 824.1 91.56 1.688 0.12005
## Residuals 45 2440.8 54.24
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot.design(foster)
There seems to be small differences in the sum of squares for the two main effects, and consequentlym in the associated F-tests and p-values. We can investigate the effect of genotype B on litter weight in more detail by the use of multiple comparison procedures. Such procedures allow a comparison of all pairs of levels of a factor while maintaining the nominal significance level at its specified value and producing adjusted confidence intervals for mean differences. One such procedure is “Tukey honest significant differences”.
unique(foster$motgen)
## [1] A B I J
## Levels: A B I J
foster_aov <- aov(weight ~ litgen * motgen, data = foster)
foster_hsd <- TukeyHSD(foster_aov, "motgen")
foster_hsd
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ litgen * motgen, data = foster)
##
## $motgen
## diff lwr upr p adj
## B-A 3.330369 -3.859729 10.5204672 0.6078581
## I-A -1.895574 -8.841869 5.0507207 0.8853702
## J-A -6.566168 -13.627285 0.4949498 0.0767540
## I-B -5.225943 -12.416041 1.9641552 0.2266493
## J-B -9.896537 -17.197624 -2.5954489 0.0040509
## J-I -4.670593 -11.731711 2.3905240 0.3035490
As you can see from the Tuckey test, out of the 6 unique combinations of the factor “mortgen”, only “J and B”’s Confidence interval does NOT contain 0, implying significance at 95% Confidence level or at 5% alpha.
Now let’s plot the result:
plot(foster_hsd)
Data:
head(water, n=4)
## location town mortality hardness
## 1 South Bath 1247 105
## 2 North Birkenhead 1668 17
## 3 South Birmingham 1466 5
## 4 North Blackburn 1800 14
The mortality and drinking water hardness for 61 cities in England and Wales.
Let’s assess the differences of both hardness and mortality in the North or South. The hypothesis that the two-dimensional mean-vector of water hardness and mortality is the same for cities in the North and the South can be tested by Hotelling-Lawley test in a multivariate analysis of variance framework. The manova can be used to fit such a model and the corresponding summary method performs the test specified by the test argument
summary(manova(cbind(hardness, mortality) ~ location, data = water), test = "Hotelling-Lawley")
## Df Hotelling-Lawley approx F num Df den Df Pr(>F)
## location 1 0.90021 26.106 2 58 8.217e-09 ***
## Residuals 59
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is very small, suggesting strong evidence that the mean vectors of the two variables are not the same in the two regions.
Let’s look at the sample means separately:
tapply(water$hardness, water$location, mean)
## North South
## 30.40000 69.76923
tapply(water$mortality, water$location, mean)
## North South
## 1633.600 1376.808
we see large differences in the two regions both in water hardness and mortality, where low mortality is associated with hard water in the South and high mortality with soft water in the North
Data:
head(skulls, n=4)
## epoch mb bh bl nh
## 1 c4000BC 131 138 89 49
## 2 c4000BC 125 131 92 48
## 3 c4000BC 131 132 99 50
## 4 c4000BC 119 132 96 44
The data describe measurements made on Egyptian skulls from five epochs.
means <- aggregate(skulls[,c("mb", "bh", "bl", "nh")], list(epoch = skulls$epoch), mean)
means
## epoch mb bh bl nh
## 1 c4000BC 131.3667 133.6000 99.16667 50.53333
## 2 c3300BC 132.3667 132.7000 99.06667 50.23333
## 3 c1850BC 134.4667 133.8000 96.03333 50.56667
## 4 c200BC 135.5000 132.3000 94.53333 51.96667
## 5 cAD150 136.1667 130.3333 93.50000 51.36667
There appear to be quite large differences between the epoch means. We can now test for a difference more formally by using MANOVA with the following R code to apply each of the four possible test criteria mentioned earlier
skulls_manova <- manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls)
summary(skulls_manova, test = "Pillai")
## Df Pillai approx F num Df den Df Pr(>F)
## epoch 4 0.35331 3.512 16 580 4.675e-06 ***
## Residuals 145
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(skulls_manova, test = "Wilks")
## Df Wilks approx F num Df den Df Pr(>F)
## epoch 4 0.66359 3.9009 16 434.45 7.01e-07 ***
## Residuals 145
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(skulls_manova, test = "Hotelling-Lawley")
## Df Hotelling-Lawley approx F num Df den Df Pr(>F)
## epoch 4 0.48182 4.231 16 562 8.278e-08 ***
## Residuals 145
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(skulls_manova, test = "Roy")
## Df Roy approx F num Df den Df Pr(>F)
## epoch 4 0.4251 15.41 4 145 1.588e-10 ***
## Residuals 145
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value associated with each four test criteria is very small and there is strong evidence that the skull measurements differ between the five epochs. We might now move on to investigate which epochs differ and on which variables. We can look at the univariate F-tests for each of the four variables by using the code
summary.aov(skulls_manova)
## Response mb :
## Df Sum Sq Mean Sq F value Pr(>F)
## epoch 4 502.83 125.707 5.9546 0.0001826 ***
## Residuals 145 3061.07 21.111
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response bh :
## Df Sum Sq Mean Sq F value Pr(>F)
## epoch 4 229.9 57.477 2.4474 0.04897 *
## Residuals 145 3405.3 23.485
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response bl :
## Df Sum Sq Mean Sq F value Pr(>F)
## epoch 4 803.3 200.823 8.3057 4.636e-06 ***
## Residuals 145 3506.0 24.179
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response nh :
## Df Sum Sq Mean Sq F value Pr(>F)
## epoch 4 61.2 15.300 1.507 0.2032
## Residuals 145 1472.1 10.153
We see that the results for the maximum breadths (mb) and basialiveolar length (bl) are highly significant, with those for the other two variables, in particular for nasal heights (nh), suggesting little evidence of a difference. To look at the pairwise multivariate tests (any of the four test criteria are equivalent in the case of a one-way layout with two levels only) we can use the summary method and manova function as follows:
summary(manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls, subset = epoch %in% c("c4000BC", "c3300BC")))
## Df Pillai approx F num Df den Df Pr(>F)
## epoch 1 0.027674 0.39135 4 55 0.8139
## Residuals 58
summary(manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls, subset = epoch %in% c("c4000BC", "c1850BC")))
## Df Pillai approx F num Df den Df Pr(>F)
## epoch 1 0.18757 3.1744 4 55 0.02035 *
## Residuals 58
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls, subset = epoch %in% c("c4000BC", "c200BC")))
## Df Pillai approx F num Df den Df Pr(>F)
## epoch 1 0.30297 5.9766 4 55 0.0004564 ***
## Residuals 58
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(manova(cbind(mb, bh, bl, nh) ~ epoch, data = skulls, subset = epoch %in% c("c4000BC", "cAD150")))
## Df Pillai approx F num Df den Df Pr(>F)
## epoch 1 0.36182 7.7956 4 55 4.736e-05 ***
## Residuals 58
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To keep the overall significance level for the set of all pairwise multivariate tests under some control (and still maintain a reasonable power), it is recommended to setting the nominal level alpha = 0.15 and carrying out each test at the alpha/m level where m is the number of tests performed. The results of the four pairwise tests suggest that as the epochs become further separated in time the four skull measurements become increasingly distinct.