Multivariate Analysis of Variance (MANOVA)

MANOVA in R: Implementation

As with most of the things in R, performing a MANOVA statistical test boils down to a single function call. But we’ll need a dataset first. The Iris dataset is well-known among the data science crowd, and it is built into R:

It doesn’t matter if you use the same dataset as us, as long as one critical condition is met – the dataset must have more observations (rows) per group in the independent variable than a number of the dependent variables. For example, the Iris dataset has 3 groups and 4 dependent variables. This means we need more than 4 observations for each of the flower species. We have 50 for each, so we’re good to go.

Dependent variables

As MANOVA cares for the difference in means for each factor, let’s visualize the boxplot for every dependent variable. There will be 4 plots in total arranged in a 2×2 grid, each having a dedicated boxplot for a specific flower species:

It seems like the setosa species is more separated than the other two, but let’s not jump to conclusions.

One-way MANOVA in R

We can now perform a one-way MANOVA in R. The best practice is to separate the dependent from the independent variable before calling the manova() function. Once the test is done, you can print its summary:

                 Df Pillai approx F num Df den Df    Pr(>F)    
independent_var   2 1.1919   53.466      8    290 < 2.2e-16 ***
Residuals       147                                            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By default, MANOVA in R uses Pillai’s Trace test statistic. The P-value is practically zero, which means we can safely reject the null hypothesis in the favor of the alternative one – at least one group mean vector differs from the rest.

While we’re here, we could also measure the effect size. One metric often used with MANOVA is Partial Eta Squared. It measures the effect the independent variable has on the dependent variables. If the value is 0.14 or greater, we can say the effect size is large. Here’s how to calculate it in R:

# Effect Size for ANOVA (Type I)

Parameter       | Eta2 (partial) |       95% CI
-----------------------------------------------
independent_var |           0.60 | [0.54, 1.00]

- One-sided CIs: upper bound fixed at [1.00].

The value is 0.6, which means the effect size is large. It’s a great way to double-check the summary results of a MANOVA test, but how can we actually know which group mean vector differs from the rest? That’s where a post-hoc test comes into play.

Interpret MANOVA in R With a Post-Hoc Test

The P-Value is practically zero, and the Partial Eta Squared suggests a large effect size – but which group or groups are different from the rest? There’s no way to tell without a post-hoc test. We’ll use Linear Discriminant Analysis (LDA), which finds a linear combination of features that best separates two or more groups.

By doing so, we’ll be able to visualize a scatter plot showing the two linear discriminants on the X and Y axes, and color code them to match our independent variable – the flower species.

You can implement Linear Discriminant Analysis in R using the lda() function from the MASS package:

Call:
lda(independent_var ~ dependent_vars, CV = F)

Prior probabilities of groups:
    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 

Group means:
           dependent_vars1 dependent_vars2 dependent_vars3 dependent_vars4
setosa               5.006           3.428           1.462           0.246
versicolor           5.936           2.770           4.260           1.326
virginica            6.588           2.974           5.552           2.026

Coefficients of linear discriminants:
                       LD1         LD2
dependent_vars1  0.8293776  0.02410215
dependent_vars2  1.5344731  2.16452123
dependent_vars3 -2.2012117 -0.93192121
dependent_vars4 -2.8104603  2.83918785

Proportion of trace:
   LD1    LD2 
0.9912 0.0088

Take a look at the coefficients to see how the dependent variables are used to form the LDA decision rule. LD1 is calculated as

LD1 = 0.83 * Sepal.Length + 1.53 * Sepal.Width - 2.20 * Petal Length - 2.81 * Petal.Width,

when rounded to two decimal points.

The snippet below uses the predict() function to get the linear discriminants and combines them with our independent variable:

The final step in this post-hoc test is to visualize the above lda_df as a scatter plot. Ideally, we should see one or multiple groups stand out:

The setosa species is significantly different when compared to virginica and versicolor. These two are more similar, suggesting that it was the setosa group that had the most impact for us to reject the null hypothesis.

To summarize – the group mean vector of the setosa class is significantly different from the other group means, so it’s safe to assume it was a crucial factor for rejecting the null hypothesis.

Conclusion

And there you have it – your go-to guide to MANOVA in R. You now know what MANOVA is, when you should use it, and how to implement and interpret it with the R programming language. For additional practice, we recommend you apply the above code to a dataset of your choice. Just make sure to satisfy all MANOVA prerequisites. You could also remove the setosa class from the Iris dataset and repeat the test.