MANOVA in R – How To Implement and Interpret One-Way MANOVA

MANOVA in R: Implementation

The Iris dataset is well-known among the data science crowd, and it is built into R:

View(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

The Iris dataset in R.

One critical condition must met – the dataset must have more observations (rows) per group in the independent variable than a number of the dependent variables. For example, the Iris dataset has 3 groups and 4 dependent variables. This means we need more than 4 observations for each of the flower species.

Dependent variables

Let’s visualize the boxplot for every dependent variable. There will be 4 plots in total arranged in a 2×2 grid, each having a dedicated boxplot for a specific flower species:

library(ggplot2)
library(gridExtra)

box_sl <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")
box_sw <- ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")
box_pl <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")
box_pw <- ggplot(iris, aes(x = Species, y = Petal.Width, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")

grid.arrange(box_sl, box_sw, box_pl, box_pw, ncol = 2, nrow = 2)

Boxplots for all dependent variables and all factors of the independent variable.

It seems like the setosa species is more separated than the other two, but let’s not jump to conclusions.

One-way MANOVA in R

We can now perform a one-way MANOVA in R.

dependent_vars <- cbind(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
independent_var <- iris$Species

manova_model <- manova(dependent_vars ~ independent_var, data = iris)
summary(manova_model)
##                  Df Pillai approx F num Df den Df    Pr(>F)    
## independent_var   2 1.1919   53.466      8    290 < 2.2e-16 ***
## Residuals       147                                            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

MANOVA in R test summary.

The P-value is practically zero, which means we can safely reject the null hypothesis in the favor of the alternative one – at least one group mean vector differs from the rest.

We could also measure the effect size.

library(effectsize)
## Warning: package 'effectsize' was built under R version 4.1.3
## Registered S3 method overwritten by 'parameters':
##   method                         from      
##   format.parameters_distribution datawizard
eta_squared(manova_model)
## # Effect Size for ANOVA (Type I)
## 
## Parameter       | Eta2 (partial) |       95% CI
## -----------------------------------------------
## independent_var |           0.60 | [0.54, 1.00]
## 
## - One-sided CIs: upper bound fixed at (1).

Partial Eta Squared value for the MANOVA test.

The value is 0.6, which means the effect size is large.

Interpret MANOVA in R With a Post-Hoc Test

A post-hoc test will come into play to know which group or groups are different from the rest.

Implement Linear Discriminant Analysis in R using the lda() function from the MASS package:

library(MASS)

iris_lda <- lda(independent_var ~ dependent_vars, CV = F)
iris_lda
## Call:
## lda(independent_var ~ dependent_vars, CV = F)
## 
## Prior probabilities of groups:
##     setosa versicolor  virginica 
##  0.3333333  0.3333333  0.3333333 
## 
## Group means:
##            dependent_vars1 dependent_vars2 dependent_vars3 dependent_vars4
## setosa               5.006           3.428           1.462           0.246
## versicolor           5.936           2.770           4.260           1.326
## virginica            6.588           2.974           5.552           2.026
## 
## Coefficients of linear discriminants:
##                        LD1         LD2
## dependent_vars1  0.8293776  0.02410215
## dependent_vars2  1.5344731  2.16452123
## dependent_vars3 -2.2012117 -0.93192121
## dependent_vars4 -2.8104603  2.83918785
## 
## Proportion of trace:
##    LD1    LD2 
## 0.9912 0.0088

Linear Discriminant Analysis results.

Take a look at the coefficients to see how the dependent variables are used to form the LDA decision rule. LD1 is calculated as

LD1 = 0.83 * iris$Sepal.Length + 1.53 * iris$Sepal.Width - 2.20 * iris$Petal.Length - 2.81 * iris$Petal.Width

LD1
##   [1]   5.946   5.015   5.375   4.699   6.016   5.585   5.097   5.490   4.447
##  [10]   5.229   6.281   5.104   5.213   5.458   7.732   7.039   6.465   5.665
##  [19]   5.962   5.904   5.382   5.470   6.564   4.137   4.444   4.658   4.708
##  [28]   5.809   5.876   4.715   4.645   5.260   7.008   7.349   4.948   5.844
##  [37]   6.498   6.214   4.820   5.573   5.802   3.551   5.126   4.299   4.743
##  [46]   4.651   5.965   5.072   6.198   5.557  -3.568  -3.907  -4.525  -4.369
##  [55]  -4.656  -4.538  -4.558  -2.331  -3.858  -4.067  -3.300  -3.968  -3.264
##  [64]  -4.774  -2.488  -3.310  -4.877  -2.885  -5.603  -3.198  -5.825  -3.106
##  [73]  -5.941  -4.365  -3.364  -3.546  -4.566  -5.626  -4.698  -1.801  -3.214
##  [82]  -2.713  -3.007  -6.605  -5.043  -4.214  -4.251  -4.585  -3.435  -4.063
##  [91]  -4.509  -4.401  -3.380  -2.401  -4.114  -3.291  -3.725  -3.530  -1.633
## [100]  -3.658  -9.947  -7.614  -8.398  -7.712  -8.957  -9.523  -6.785  -8.422
## [109]  -8.432  -8.961  -6.549  -7.556  -7.767  -8.064  -8.866  -7.915  -7.173
## [118]  -8.717 -11.274  -6.869  -8.380  -7.468  -9.685  -6.478  -7.831  -7.386
## [127]  -6.188  -6.185  -8.625  -6.690  -8.333  -7.329  -8.906  -5.922  -7.213
## [136]  -8.902  -8.633  -7.103  -6.048  -7.311  -8.760  -7.213  -7.614  -8.903
## [145]  -8.955  -7.752  -7.285  -7.075  -7.995  -6.791

The snippet below uses the predict() function to get the linear discriminants and combines them with our independent variable:

lda_df <- data.frame(
  species = iris[, "Species"],
  lda = predict(iris_lda)$x
)
lda_df
##        species    lda.LD1      lda.LD2
## 1       setosa  8.0617998  0.300420621
## 2       setosa  7.1286877 -0.786660426
## 3       setosa  7.4898280 -0.265384488
## 4       setosa  6.8132006 -0.670631068
## 5       setosa  8.1323093  0.514462530
## 6       setosa  7.7019467  1.461720967
## 7       setosa  7.2126176  0.355836209
## 8       setosa  7.6052935 -0.011633838
## 9       setosa  6.5605516 -1.015163624
## 10      setosa  7.3430599 -0.947319209
## 11      setosa  8.3973865  0.647363392
## 12      setosa  7.2192969 -0.109646389
## 13      setosa  7.3267960 -1.072989426
## 14      setosa  7.5724707 -0.805464137
## 15      setosa  9.8498430  1.585936985
## 16      setosa  9.1582389  2.737596471
## 17      setosa  8.5824314  1.834489452
## 18      setosa  7.7807538  0.584339407
## 19      setosa  8.0783588  0.968580703
## 20      setosa  8.0209745  1.140503656
## 21      setosa  7.4968023 -0.188377220
## 22      setosa  7.5864812  1.207970318
## 23      setosa  8.6810429  0.877590154
## 24      setosa  6.2514036  0.439696367
## 25      setosa  6.5589334 -0.389222752
## 26      setosa  6.7713832 -0.970634453
## 27      setosa  6.8230803  0.463011612
## 28      setosa  7.9246164  0.209638715
## 29      setosa  7.9912902  0.086378713
## 30      setosa  6.8294645 -0.544960851
## 31      setosa  6.7589549 -0.759002759
## 32      setosa  7.3749525  0.565844592
## 33      setosa  9.1263463  1.224432671
## 34      setosa  9.4676820  1.825226345
## 35      setosa  7.0620139 -0.663400423
## 36      setosa  7.9587624 -0.164961722
## 37      setosa  8.6136720  0.403253602
## 38      setosa  8.3304176  0.228133530
## 39      setosa  6.9341201 -0.705519379
## 40      setosa  7.6882313 -0.009223623
## 41      setosa  7.9179372  0.675121313
## 42      setosa  5.6618807 -1.934355243
## 43      setosa  7.2410147 -0.272615132
## 44      setosa  6.4144356  1.247301306
## 45      setosa  6.8594438  1.051653957
## 46      setosa  6.7647039 -0.505151855
## 47      setosa  8.0818994  0.763392750
## 48      setosa  7.1867690 -0.360986823
## 49      setosa  8.3144488  0.644953177
## 50      setosa  7.6719674 -0.134893840
## 51  versicolor -1.4592755  0.028543764
## 52  versicolor -1.7977057  0.484385502
## 53  versicolor -2.4169489 -0.092784031
## 54  versicolor -2.2624735 -1.587252508
## 55  versicolor -2.5486784 -0.472204898
## 56  versicolor -2.4299673 -0.966132066
## 57  versicolor -2.4484846  0.795961954
## 58  versicolor -0.2226665 -1.584673183
## 59  versicolor -1.7502012 -0.821180130
## 60  versicolor -1.9584224 -0.351563753
## 61  versicolor -1.1937603 -2.634455704
## 62  versicolor -1.8589257  0.319006544
## 63  versicolor -1.1580939 -2.643409913
## 64  versicolor -2.6660572 -0.642504540
## 65  versicolor -0.3783672  0.086638931
## 66  versicolor -1.2011726  0.084437359
## 67  versicolor -2.7681025  0.032199536
## 68  versicolor -0.7768540 -1.659161847
## 69  versicolor -3.4980543 -1.684956162
## 70  versicolor -1.0904279 -1.626583496
## 71  versicolor -3.7158961  1.044514421
## 72  versicolor -0.9976104 -0.490530602
## 73  versicolor -3.8352593 -1.405958061
## 74  versicolor -2.2574125 -1.426794234
## 75  versicolor -1.2557133 -0.546424197
## 76  versicolor -1.4375576 -0.134424979
## 77  versicolor -2.4590614 -0.935277280
## 78  versicolor -3.5184849  0.160588866
## 79  versicolor -2.5897987 -0.174611728
## 80  versicolor  0.3074879 -1.318871459
## 81  versicolor -1.1066918 -1.752253714
## 82  versicolor -0.6055246 -1.942980378
## 83  versicolor -0.8987038 -0.904940034
## 84  versicolor -4.4984664 -0.882749915
## 85  versicolor -2.9339780  0.027379106
## 86  versicolor -2.1036082  1.191567675
## 87  versicolor -2.1425821  0.088779781
## 88  versicolor -2.4794560 -1.940739273
## 89  versicolor -1.3255257 -0.162869550
## 90  versicolor -1.9555789 -1.154348262
## 91  versicolor -2.4015702 -1.594583407
## 92  versicolor -2.2924888 -0.332860296
## 93  versicolor -1.2722722 -1.214584279
## 94  versicolor -0.2931761 -1.798715092
## 95  versicolor -2.0059888 -0.905418042
## 96  versicolor -1.1816631 -0.537570242
## 97  versicolor -1.6161564 -0.470103580
## 98  versicolor -1.4215888 -0.551244626
## 99  versicolor  0.4759738 -0.799905482
## 100 versicolor -1.5494826 -0.593363582
## 101  virginica -7.8394740  2.139733449
## 102  virginica -5.5074800 -0.035813989
## 103  virginica -6.2920085  0.467175777
## 104  virginica -5.6054563 -0.340738058
## 105  virginica -6.8505600  0.829825394
## 106  virginica -7.4181678 -0.173117995
## 107  virginica -4.6779954 -0.499095015
## 108  virginica -6.3169269 -0.968980756
## 109  virginica -6.3277368 -1.383289934
## 110  virginica -6.8528134  2.717589632
## 111  virginica -4.4407251  1.347236918
## 112  virginica -5.4500957 -0.207736942
## 113  virginica -5.6603371  0.832713617
## 114  virginica -5.9582372 -0.094017545
## 115  virginica -6.7592628  1.600232061
## 116  virginica -5.8070433  2.010198817
## 117  virginica -5.0660123 -0.026273384
## 118  virginica -6.6088188  1.751635872
## 119  virginica -9.1714749 -0.748255067
## 120  virginica -4.7645357 -2.155737197
## 121  virginica -6.2728391  1.649481407
## 122  virginica -5.3607119  0.646120732
## 123  virginica -7.5811998 -0.980722934
## 124  virginica -4.3715028 -0.121297458
## 125  virginica -5.7231753  1.293275530
## 126  virginica -5.2791592 -0.042458238
## 127  virginica -4.0808721  0.185936572
## 128  virginica -4.0770364  0.523238483
## 129  virginica -6.5191040  0.296976389
## 130  virginica -4.5837194 -0.856815813
## 131  virginica -6.2282401 -0.712719638
## 132  virginica -5.2204877  1.468195094
## 133  virginica -6.8001500  0.580895175
## 134  virginica -3.8151597 -0.942985932
## 135  virginica -5.1074897 -2.130589999
## 136  virginica -6.7967163  0.863090395
## 137  virginica -6.5244960  2.445035271
## 138  virginica -4.9955028  0.187768525
## 139  virginica -3.9398530  0.614020389
## 140  virginica -5.2038309  1.144768076
## 141  virginica -6.6530868  1.805319760
## 142  virginica -5.1055595  1.992182010
## 143  virginica -5.5074800 -0.035813989
## 144  virginica -6.7960192  1.460686950
## 145  virginica -6.8473594  2.428950671
## 146  virginica -5.6450035  1.677717335
## 147  virginica -5.1795646 -0.363475041
## 148  virginica -4.9677409  0.821140550
## 149  virginica -5.8861454  2.345090513
## 150  virginica -4.6831543  0.332033811

LDA dataset.

The final step in this post-hoc test is to visualize the above lda_df as a scatter plot. Ideally, we should see one or multiple groups stand out:

ggplot(lda_df) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, color = species), size = 4) +
  theme_classic()

LDA dataset as a scatter plot.

The setosa species is significantly different when compared to virginica and versicolor. These two are more similar, suggesting that it was the setosa group that had the most impact for us to reject the null hypothesis.

Conclusion: The group mean vector of the setosa class is significantly different from the other group means, so it’s safe to assume it was a crucial factor for rejecting the null hypothesis.