Introduction

Take the data set maize from package biotools. They consist of multivariate observations on ears of five maize genotypes (families).

# install.packages('biotools')
library(biotools)
data(maize)
str(maize)
'data.frame':   20 obs. of  6 variables:
 $ NKPR  : num  36 34.2 28.1 30.7 34 ...
 $ ED    : num  4.33 4.04 3.74 4.32 4.66 ...
 $ CD    : num  2.32 2.25 2.14 2.29 2.5 ...
 $ PH    : num  2.09 1.89 2.12 2.08 2.1 ...
 $ family: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
 $ env   : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 2 2 2 2 2 ...

With such a multivariate responses, we might be interested in testing the null hypothesis

\[H_0: \mu_1 = \mu_2 = ... = \mu_5\]

that is, the the mean vectors for levels of family are equal, under the nominal level \(\alpha\) of significance, which can be evaluated by applying a multivariate test through a MANOVA model.

The estimated mean vectors are

aggregate( .~ family, data = maize, FUN = mean)
  family NKPR  ED  CD  PH env
1      1   36 4.3 2.4 2.1 2.5
2      2   32 4.0 2.3 1.7 2.5
3      3   30 3.6 2.1 2.1 2.5
4      4   31 4.2 2.4 2.0 2.5
5      5   34 4.5 2.4 2.0 2.5

MANOVA

A MANOVA model is fitted with:

M <- lm(cbind(NKPR, ED, CD) ~ family, data = maize)
anova(M, test = "Wilks")
Analysis of Variance Table

            Df  Wilks approx F num Df den Df  Pr(>F)    
(Intercept)  1 0.0013     3273      3   13.0 < 2e-16 ***
family       4 0.1003        4     12   34.7 0.00066 ***
Residuals   15                                          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multivariate pairwise comparisons

We’ve seen that there is a significant (\(p < 0.05\)) effect of family on the multivariate response. Now one question is where are those differences? To answer that, multiple pairwise tests can be run. The function mvpaircomp() allows on to choose between multivariate statistics such as Wilks’ lambda, Pillai’s trace etc. to perform multiple tests on factor levels a fitted model.

mvpaircomp(M, "family", test = "Wilks", adjust = "bonferroni")

            Multivariate Pairwise Comparisons

      Wilks approx F num DF den DF Pr(>F)   
1 - 2 0.578     3.17      3     13 0.6047   
1 - 3 0.327     8.93      3     13 0.0179 * 
1 - 4 0.556     3.46      3     13 0.4806   
1 - 5 0.676     2.08      3     13 1.0000   
2 - 3 0.612     2.75      3     13 0.8531   
2 - 4 0.604     2.84      3     13 0.7895   
2 - 5 0.463     5.02      3     13 0.1583   
3 - 4 0.320     9.22      3     13 0.0156 * 
3 - 5 0.243    13.50      3     13 0.0027 **
4 - 5 0.804     1.06      3     13 1.0000   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With bonferroni p-value adjustment for multiple comparisons

Box-M test

One requirement of the model fitted earlier is that is residual covariance matrices of the factor levels are homogeneous. Furthermore, in cluster analysis, it might be of interest to check if the covariance matrices of the clusters can be considered equals, especially if one intends to perform a linear discriminant analysis (LDA) using a pooled matrix. In those cases, the Box’s M-test can be applied.

res <- residuals(M)
head(res)
    NKPR     ED     CD
1 -0.085 -0.016 -0.052
2  1.970  0.026 -0.070
3 -1.933  0.102  0.014
4 -0.274  0.078 -0.080
5  0.297  0.201  0.075
6  0.395 -0.021  0.066
bm <- boxM(data = res, grouping = maize$family)
bm

    Box's M-test for Homogeneity of Covariance Matrices

data:  res
Chi-Sq (approx.) = 33, df = 24, p-value = 0.1

Multivariate normality

# install.packages('mvShapiroTest')
library('mvShapiroTest')
mvShapiro.Test(res)

    Generalized Shapiro-Wilk test for Multivariate Normality by
    Villasenor-Alva and Gonzalez-Estrada

data:  res
MVW = 0.9, p-value = 0.1

Miscellanea


License: CC BY 4.0