MANOVA in R: Implementation
As with most things in R, performing a MANOVA boils down to a single function call. But first we need a dataset. The Iris dataset is well known among the data science crowd, and it ships with R:
# Preview the first rows of the built-in iris dataset
head(iris)

# Or open the full dataset in a spreadsheet-like viewer (e.g., in RStudio)
View(iris)
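If you'd rather get a quick overview in the console, two standard base R calls (nothing MANOVA-specific, just routine data inspection) show the structure and the group sizes:
str(iris)            # 150 observations, four numeric measurements, one factor
table(iris$Species)  # 50 observations per species, so the design is balanced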
Dependent Variables
Before running the test, let's visualize each of the four dependent variables (sepal length and width, petal length and width) as a box plot grouped by species:
library(ggplot2)
library(gridExtra)
box_sl <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")

box_sw <- ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")

box_pl <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")

box_pw <- ggplot(iris, aes(x = Species, y = Petal.Width, fill = Species)) +
  geom_boxplot() +
  theme(legend.position = "top")

grid.arrange(box_sl, box_sw, box_pl, box_pw, ncol = 2, nrow = 2)
The setosa species appears to be clearly separated from the other two, but let's not jump to conclusions.
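To back the visual impression with numbers, we can compute the per-species mean of each measurement. This is a minimal sketch using base R's aggregate(); the column names are those of the built-in iris dataset:
# Mean of every dependent variable, grouped by species
aggregate(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
          data = iris, FUN = mean)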
One-way MANOVA in R
We can now perform a one-way MANOVA in R. The best practice is to separate the dependent variables from the independent variable before calling the manova() function. Once the test has run, you can print its summary:
# Bind the four numeric measurements into a matrix of dependent variables
dependent_vars <- cbind(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
# The grouping factor (species) is the independent variable
independent_var <- iris$Species

# Fit the one-way MANOVA model and print the test summary
manova_model <- manova(dependent_vars ~ independent_var, data = iris)
summary(manova_model)
## Df Pillai approx F num Df den Df Pr(>F)
## independent_var 2 1.1919 53.466 8 290 < 2.2e-16 ***
## Residuals 147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By default, MANOVA in R uses Pillai's Trace as the test statistic. The p-value is practically zero, which means we can safely reject the null hypothesis in favor of the alternative: at least one group mean vector differs from the rest.
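If you want to cross-check the result with a different multivariate statistic, summary() for manova objects accepts a test argument; the available options are shown below (outputs omitted here - run them to compare):
# Re-run the summary with alternative test statistics
summary(manova_model, test = "Wilks")            # Wilks' Lambda
summary(manova_model, test = "Hotelling-Lawley") # Hotelling-Lawley Trace
summary(manova_model, test = "Roy")              # Roy's Largest Root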
While we're here, we can also measure the effect size. One metric often used with MANOVA is partial eta squared, which quantifies how much of the variance in the dependent variables is explained by the independent variable. By a common rule of thumb, a value of 0.14 or greater indicates a large effect. Here's how to calculate it in R:
library(effectsize)
eta_squared(manova_model)
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------------
## independent_var | 0.60 | [0.54, 1.00]
##
## - One-sided CIs: upper bound fixed at (1).
The value is 0.6, which means the effect size is large. It’s a great way to double-check the summary results of a MANOVA test, but how can we actually know which group mean vector differs from the rest? That’s where a post-hoc test comes into play.
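Before moving on to the post-hoc analysis, note that calling summary.aov() on the same model returns a univariate ANOVA for each dependent variable separately. It doesn't tell you which group differs, but it does show which measurements contribute to the multivariate effect:
# One univariate ANOVA per dependent variable (output omitted)
summary.aov(manova_model)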
Interpret MANOVA in R With a Post-Hoc Test
A common post-hoc follow-up to MANOVA is Linear Discriminant Analysis (LDA), which finds the linear combinations of the dependent variables that best separate the groups. You can run it in R using the lda() function from the MASS package:
library(MASS)

# Because dependent_vars is a matrix, the formula expands to its four columns
iris_lda <- lda(independent_var ~ dependent_vars, CV = F)
iris_lda
## Call:
## lda(independent_var ~ dependent_vars, CV = F)
##
## Prior probabilities of groups:
## setosa versicolor virginica
## 0.3333333 0.3333333 0.3333333
##
## Group means:
## dependent_vars1 dependent_vars2 dependent_vars3 dependent_vars4
## setosa 5.006 3.428 1.462 0.246
## versicolor 5.936 2.770 4.260 1.326
## virginica 6.588 2.974 5.552 2.026
##
## Coefficients of linear discriminants:
## LD1 LD2
## dependent_vars1 0.8293776 0.02410215
## dependent_vars2 1.5344731 2.16452123
## dependent_vars3 -2.2012117 -0.93192121
## dependent_vars4 -2.8104603 2.83918785
##
## Proportion of trace:
## LD1 LD2
## 0.9912 0.0088
The proportion of trace tells us that the first linear discriminant (LD1) captures over 99% of the between-group separation. The snippet below uses the predict() function to extract the linear discriminants for every observation and combines them with our independent variable:
lda_df <- data.frame(
  species = iris[, "Species"],
  lda = predict(iris_lda)$x
)
lda_df
##        species    lda.LD1      lda.LD2
## 1       setosa  8.0617998  0.300420621
## 2       setosa  7.1286877 -0.786660426
## 3       setosa  7.4898280 -0.265384488
## ...
## 51  versicolor -1.4592755  0.028543764
## 52  versicolor -1.7977057  0.484385502
## 53  versicolor -2.4169489 -0.092784031
## ...
## 101  virginica -7.8394740  2.139733449
## 102  virginica -5.5074800 -0.035813989
## 103  virginica -6.2920085  0.467175777
## ...
## (output truncated - 150 rows in total)
The final step of this post-hoc test is to visualize lda_df as a scatter plot. Ideally, we should see one or more groups stand out:
ggplot(lda_df) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, color = species), size = 4) +
  theme_classic()
The setosa species is clearly separated from virginica and versicolor, which are much more similar to each other. In other words, the group mean vector of the setosa class differs substantially from the other two group means, so it's safe to assume it was the main reason for rejecting the null hypothesis.
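As a final sanity check (not part of the workflow above, just a quick confirmation), you can cross-tabulate the classes the LDA model assigns against the true species. Setosa should be classified perfectly, while a few versicolor and virginica observations may be confused with each other:
# Confusion matrix: LDA-predicted species vs. actual species
table(Predicted = predict(iris_lda)$class, Actual = iris$Species)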