Take the data set maize from package biotools. They consist of multivariate observations on ears of five maize genotypes (families).
# install.packages('biotools')
library(biotools)
data(maize)
str(maize)'data.frame': 20 obs. of 6 variables:
$ NKPR : num 36 34.2 28.1 30.7 34 ...
$ ED : num 4.33 4.04 3.74 4.32 4.66 ...
$ CD : num 2.32 2.25 2.14 2.29 2.5 ...
$ PH : num 2.09 1.89 2.12 2.08 2.1 ...
$ family: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
$ env : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 2 2 2 2 2 ...
With such a multivariate responses, we might be interested in testing the null hypothesis
\[H_0: \mu_1 = \mu_2 = ... = \mu_5\]
that is, the the mean vectors for levels of family are equal, under the nominal level \(\alpha\) of significance, which can be evaluated by applying a multivariate test through a MANOVA model.
The estimated mean vectors are
aggregate( .~ family, data = maize, FUN = mean) family NKPR ED CD PH env
1 1 36 4.3 2.4 2.1 2.5
2 2 32 4.0 2.3 1.7 2.5
3 3 30 3.6 2.1 2.1 2.5
4 4 31 4.2 2.4 2.0 2.5
5 5 34 4.5 2.4 2.0 2.5
A MANOVA model is fitted with:
M <- lm(cbind(NKPR, ED, CD) ~ family, data = maize)
anova(M, test = "Wilks")Analysis of Variance Table
Df Wilks approx F num Df den Df Pr(>F)
(Intercept) 1 0.0013 3273 3 13.0 < 2e-16 ***
family 4 0.1003 4 12 34.7 0.00066 ***
Residuals 15
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We’ve seen that there is a significant (\(p < 0.05\)) effect of family on the multivariate response. Now one question is where are those differences? To answer that, multiple pairwise tests can be run. The function mvpaircomp() allows on to choose between multivariate statistics such as Wilks’ lambda, Pillai’s trace etc. to perform multiple tests on factor levels a fitted model.
mvpaircomp(M, "family", test = "Wilks", adjust = "bonferroni")
Multivariate Pairwise Comparisons
Wilks approx F num DF den DF Pr(>F)
1 - 2 0.578 3.17 3 13 0.6047
1 - 3 0.327 8.93 3 13 0.0179 *
1 - 4 0.556 3.46 3 13 0.4806
1 - 5 0.676 2.08 3 13 1.0000
2 - 3 0.612 2.75 3 13 0.8531
2 - 4 0.604 2.84 3 13 0.7895
2 - 5 0.463 5.02 3 13 0.1583
3 - 4 0.320 9.22 3 13 0.0156 *
3 - 5 0.243 13.50 3 13 0.0027 **
4 - 5 0.804 1.06 3 13 1.0000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With bonferroni p-value adjustment for multiple comparisons
One requirement of the model fitted earlier is that is residual covariance matrices of the factor levels are homogeneous. Furthermore, in cluster analysis, it might be of interest to check if the covariance matrices of the clusters can be considered equals, especially if one intends to perform a linear discriminant analysis (LDA) using a pooled matrix. In those cases, the Box’s M-test can be applied.
res <- residuals(M)
head(res) NKPR ED CD
1 -0.085 -0.016 -0.052
2 1.970 0.026 -0.070
3 -1.933 0.102 0.014
4 -0.274 0.078 -0.080
5 0.297 0.201 0.075
6 0.395 -0.021 0.066
bm <- boxM(data = res, grouping = maize$family)
bm
Box's M-test for Homogeneity of Covariance Matrices
data: res
Chi-Sq (approx.) = 33, df = 24, p-value = 0.1
# install.packages('mvShapiroTest')
library('mvShapiroTest')
mvShapiro.Test(res)
Generalized Shapiro-Wilk test for Multivariate Normality by
Villasenor-Alva and Gonzalez-Estrada
data: res
MVW = 0.9, p-value = 0.1