MANOVA in R: Implementation

As with most of the things in R, performing a MANOVA statistical test boils down to a single function call. But we’ll need a dataset first. The Iris dataset is well-known among the data science crowd, and it is built into R:

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

The dataset must have more observations (rows) per group in the independent variable than a number of the dependent variables.

Dependent variables

As MANOVA cares for the difference in means for each factor, let’s visualize the boxplot for every dependent variable.

library(ggplot2)
library(gridExtra)
box_sl <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")
box_sw <- ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")
box_pl <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")
box_pw <- ggplot(iris, aes(x = Species, y = Petal.Width, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")

grid.arrange(box_sl, box_sw, box_pl, box_pw, ncol = 2, nrow = 2)

It seems like the setosa species is more separated than the other two, but let’s not jump to conclusions

One-way MANOVA in R

dependent_vars <- cbind(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
independent_var <- iris$Species

manova_model <- manova(dependent_vars ~ independent_var, data = iris)
summary(manova_model)

##                  Df Pillai approx F num Df den Df    Pr(>F)    
## independent_var   2 1.1919   53.466      8    290 < 2.2e-16 ***
## Residuals       147                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By default, MANOVA in R uses Pillai’s Trace test statistic. The P-value is practically zero, which means we can safely reject the null hypothesis in the favor of the alternative one – at least one group mean vector differs from the rest.

We could also measure the effect size. One metric often used with MANOVA is Partial Eta Squared. It measures the effect the independent variable has on the dependent variables. If the value is 0.14 or greater, we can say the effect size is large. Here’s how to calculate it in R:

library(effectsize)

## Registered S3 method overwritten by 'parameters':
##   method                         from      
##   format.parameters_distribution datawizard

eta_squared(manova_model)

## # Effect Size for ANOVA (Type I)
## 
## Parameter       | Eta2 (partial) |       95% CI
## -----------------------------------------------
## independent_var |           0.60 | [0.54, 1.00]
## 
## - One-sided CIs: upper bound fixed at (1).

The value is 0.6, which means the effect size is large. It’s a great way to double-check the summary results of a MANOVA test, but how can we actually know which group mean vector differs from the rest? That’s where a post-hoc test comes into play.

Interpret MANOVA in R With a Post-Hoc Test

The P-Value is practically zero, and the Partial Eta Squared suggests a large effect size – but which group or groups are different from the rest? There’s no way to tell without a post-hoc test. We’ll use Linear Discriminant Analysis (LDA), which finds a linear combination of features that best separates two or more groups.

library(MASS)
iris_lda <- lda(independent_var ~ dependent_vars, CV = F)
iris_lda

## Call:
## lda(independent_var ~ dependent_vars, CV = F)
## 
## Prior probabilities of groups:
##     setosa versicolor  virginica 
##  0.3333333  0.3333333  0.3333333 
## 
## Group means:
##            dependent_vars1 dependent_vars2 dependent_vars3 dependent_vars4
## setosa               5.006           3.428           1.462           0.246
## versicolor           5.936           2.770           4.260           1.326
## virginica            6.588           2.974           5.552           2.026
## 
## Coefficients of linear discriminants:
##                        LD1         LD2
## dependent_vars1  0.8293776  0.02410215
## dependent_vars2  1.5344731  2.16452123
## dependent_vars3 -2.2012117 -0.93192121
## dependent_vars4 -2.8104603  2.83918785
## 
## Proportion of trace:
##    LD1    LD2 
## 0.9912 0.0088

The snippet below uses the predict() function to get the linear discriminants and combines them with our independent variable:

lda_df <- data.frame(species = iris[, "Species"],
                     lda = predict(iris_lda)$x)
lda_df

##        species    lda.LD1      lda.LD2
## 1       setosa  8.0617998  0.300420621
## 2       setosa  7.1286877 -0.786660426
## 3       setosa  7.4898280 -0.265384488
## 4       setosa  6.8132006 -0.670631068
## 5       setosa  8.1323093  0.514462530
## 6       setosa  7.7019467  1.461720967
## 7       setosa  7.2126176  0.355836209
## 8       setosa  7.6052935 -0.011633838
## 9       setosa  6.5605516 -1.015163624
## 10      setosa  7.3430599 -0.947319209
## 11      setosa  8.3973865  0.647363392
## 12      setosa  7.2192969 -0.109646389
## 13      setosa  7.3267960 -1.072989426
## 14      setosa  7.5724707 -0.805464137
## 15      setosa  9.8498430  1.585936985
## 16      setosa  9.1582389  2.737596471
## 17      setosa  8.5824314  1.834489452
## 18      setosa  7.7807538  0.584339407
## 19      setosa  8.0783588  0.968580703
## 20      setosa  8.0209745  1.140503656
## 21      setosa  7.4968023 -0.188377220
## 22      setosa  7.5864812  1.207970318
## 23      setosa  8.6810429  0.877590154
## 24      setosa  6.2514036  0.439696367
## 25      setosa  6.5589334 -0.389222752
## 26      setosa  6.7713832 -0.970634453
## 27      setosa  6.8230803  0.463011612
## 28      setosa  7.9246164  0.209638715
## 29      setosa  7.9912902  0.086378713
## 30      setosa  6.8294645 -0.544960851
## 31      setosa  6.7589549 -0.759002759
## 32      setosa  7.3749525  0.565844592
## 33      setosa  9.1263463  1.224432671
## 34      setosa  9.4676820  1.825226345
## 35      setosa  7.0620139 -0.663400423
## 36      setosa  7.9587624 -0.164961722
## 37      setosa  8.6136720  0.403253602
## 38      setosa  8.3304176  0.228133530
## 39      setosa  6.9341201 -0.705519379
## 40      setosa  7.6882313 -0.009223623
## 41      setosa  7.9179372  0.675121313
## 42      setosa  5.6618807 -1.934355243
## 43      setosa  7.2410147 -0.272615132
## 44      setosa  6.4144356  1.247301306
## 45      setosa  6.8594438  1.051653957
## 46      setosa  6.7647039 -0.505151855
## 47      setosa  8.0818994  0.763392750
## 48      setosa  7.1867690 -0.360986823
## 49      setosa  8.3144488  0.644953177
## 50      setosa  7.6719674 -0.134893840
## 51  versicolor -1.4592755  0.028543764
## 52  versicolor -1.7977057  0.484385502
## 53  versicolor -2.4169489 -0.092784031
## 54  versicolor -2.2624735 -1.587252508
## 55  versicolor -2.5486784 -0.472204898
## 56  versicolor -2.4299673 -0.966132066
## 57  versicolor -2.4484846  0.795961954
## 58  versicolor -0.2226665 -1.584673183
## 59  versicolor -1.7502012 -0.821180130
## 60  versicolor -1.9584224 -0.351563753
## 61  versicolor -1.1937603 -2.634455704
## 62  versicolor -1.8589257  0.319006544
## 63  versicolor -1.1580939 -2.643409913
## 64  versicolor -2.6660572 -0.642504540
## 65  versicolor -0.3783672  0.086638931
## 66  versicolor -1.2011726  0.084437359
## 67  versicolor -2.7681025  0.032199536
## 68  versicolor -0.7768540 -1.659161847
## 69  versicolor -3.4980543 -1.684956162
## 70  versicolor -1.0904279 -1.626583496
## 71  versicolor -3.7158961  1.044514421
## 72  versicolor -0.9976104 -0.490530602
## 73  versicolor -3.8352593 -1.405958061
## 74  versicolor -2.2574125 -1.426794234
## 75  versicolor -1.2557133 -0.546424197
## 76  versicolor -1.4375576 -0.134424979
## 77  versicolor -2.4590614 -0.935277280
## 78  versicolor -3.5184849  0.160588866
## 79  versicolor -2.5897987 -0.174611728
## 80  versicolor  0.3074879 -1.318871459
## 81  versicolor -1.1066918 -1.752253714
## 82  versicolor -0.6055246 -1.942980378
## 83  versicolor -0.8987038 -0.904940034
## 84  versicolor -4.4984664 -0.882749915
## 85  versicolor -2.9339780  0.027379106
## 86  versicolor -2.1036082  1.191567675
## 87  versicolor -2.1425821  0.088779781
## 88  versicolor -2.4794560 -1.940739273
## 89  versicolor -1.3255257 -0.162869550
## 90  versicolor -1.9555789 -1.154348262
## 91  versicolor -2.4015702 -1.594583407
## 92  versicolor -2.2924888 -0.332860296
## 93  versicolor -1.2722722 -1.214584279
## 94  versicolor -0.2931761 -1.798715092
## 95  versicolor -2.0059888 -0.905418042
## 96  versicolor -1.1816631 -0.537570242
## 97  versicolor -1.6161564 -0.470103580
## 98  versicolor -1.4215888 -0.551244626
## 99  versicolor  0.4759738 -0.799905482
## 100 versicolor -1.5494826 -0.593363582
## 101  virginica -7.8394740  2.139733449
## 102  virginica -5.5074800 -0.035813989
## 103  virginica -6.2920085  0.467175777
## 104  virginica -5.6054563 -0.340738058
## 105  virginica -6.8505600  0.829825394
## 106  virginica -7.4181678 -0.173117995
## 107  virginica -4.6779954 -0.499095015
## 108  virginica -6.3169269 -0.968980756
## 109  virginica -6.3277368 -1.383289934
## 110  virginica -6.8528134  2.717589632
## 111  virginica -4.4407251  1.347236918
## 112  virginica -5.4500957 -0.207736942
## 113  virginica -5.6603371  0.832713617
## 114  virginica -5.9582372 -0.094017545
## 115  virginica -6.7592628  1.600232061
## 116  virginica -5.8070433  2.010198817
## 117  virginica -5.0660123 -0.026273384
## 118  virginica -6.6088188  1.751635872
## 119  virginica -9.1714749 -0.748255067
## 120  virginica -4.7645357 -2.155737197
## 121  virginica -6.2728391  1.649481407
## 122  virginica -5.3607119  0.646120732
## 123  virginica -7.5811998 -0.980722934
## 124  virginica -4.3715028 -0.121297458
## 125  virginica -5.7231753  1.293275530
## 126  virginica -5.2791592 -0.042458238
## 127  virginica -4.0808721  0.185936572
## 128  virginica -4.0770364  0.523238483
## 129  virginica -6.5191040  0.296976389
## 130  virginica -4.5837194 -0.856815813
## 131  virginica -6.2282401 -0.712719638
## 132  virginica -5.2204877  1.468195094
## 133  virginica -6.8001500  0.580895175
## 134  virginica -3.8151597 -0.942985932
## 135  virginica -5.1074897 -2.130589999
## 136  virginica -6.7967163  0.863090395
## 137  virginica -6.5244960  2.445035271
## 138  virginica -4.9955028  0.187768525
## 139  virginica -3.9398530  0.614020389
## 140  virginica -5.2038309  1.144768076
## 141  virginica -6.6530868  1.805319760
## 142  virginica -5.1055595  1.992182010
## 143  virginica -5.5074800 -0.035813989
## 144  virginica -6.7960192  1.460686950
## 145  virginica -6.8473594  2.428950671
## 146  virginica -5.6450035  1.677717335
## 147  virginica -5.1795646 -0.363475041
## 148  virginica -4.9677409  0.821140550
## 149  virginica -5.8861454  2.345090513
## 150  virginica -4.6831543  0.332033811

The final step in this post-hoc test is to visualize the above lda_df as a scatter plot. Ideally, we should see one or multiple groups stand out:

ggplot(lda_df) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, color = species), size = 4) +
  theme_classic()

The setosa species is significantly different when compared to virginica and versicolor. These two are more similar, suggesting that it was the setosa group that had the most impact for us to reject the null hypothesis.

To summarize – the group mean vector of the setosa class is significantly different from the other group means, so it’s safe to assume it was a crucial factor for rejecting the null hypothesis.

MANOVA in R: Implementation

Kristel Jean C. Guzman

2022-04-25