MANOVA in R: Implementation
As with most things in R, performing a MANOVA test boils down to a single function call. But we will need a dataset first. The Iris dataset is well known among the data science crowd, and it is built into R:
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
It does not matter if you use the same dataset we do, as long as one critical condition is met: the dataset must have more observations (rows) per group of the independent variable than the number of dependent variables. For example, the Iris dataset has 3 groups and 4 dependent variables, so we need more than 4 observations for each flower species. We have 50 per species, so we are good to go.
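A quick way to confirm this is to count the observations per group; the threshold of 4 below simply reflects the four dependent variables in Iris:
# Count observations per species - each group needs more rows
# than the number of dependent variables (4 here)
table(iris$Species)
# TRUE if every group meets the requirement
all(table(iris$Species) > 4)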
Dependent variables
Since MANOVA compares group means across all dependent variables at once, let's visualize a boxplot for each dependent variable. There will be 4 plots in total, arranged in a 2x2 grid, each showing one boxplot per flower species:
library(ggplot2)
library(gridExtra)
box_sl <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot() +
theme(legend.position = "top")
box_sw <- ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
geom_boxplot() +
theme(legend.position = "top")
box_pl <- ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
geom_boxplot() +
theme(legend.position = "top")
box_pw <- ggplot(iris, aes(x = Species, y = Petal.Width, fill = Species)) +
geom_boxplot() +
theme(legend.position = "top")
grid.arrange(box_sl, box_sw, box_pl, box_pw, ncol = 2, nrow = 2)
One-way MANOVA in R
We can now perform a one-way MANOVA in R. The best practice is to separate the dependent variables from the independent variable before calling the manova() function. Once the test is done, you can print its summary:
dependent_vars <- cbind(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length, iris$Petal.Width)
independent_var <- iris$Species
manova_model <- manova(dependent_vars ~ independent_var, data = iris)
summary(manova_model)
## Df Pillai approx F num Df den Df Pr(>F)
## independent_var 2 1.1919 53.466 8 290 < 2.2e-16 ***
## Residuals 147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By default, MANOVA in R uses Pillai's Trace as the test statistic. The p-value is practically zero, which means we can safely reject the null hypothesis in favor of the alternative: at least one group mean vector differs from the rest.
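Pillai's Trace is generally considered the most robust option, but summary() for a manova object also accepts a test argument if you prefer a different statistic, such as Wilks' Lambda:
# Re-run the summary with Wilks' Lambda instead of the default Pillai's Trace
summary(manova_model, test = "Wilks")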
While we're here, we can also measure the effect size. One metric often used with MANOVA is Partial Eta Squared, which quantifies how much of the variance in the dependent variables is explained by the independent variable. By a common rule of thumb, a value of 0.14 or greater indicates a large effect. Here's how to calculate it in R:
library(effectsize)
eta_squared(manova_model)
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 (partial) | 95% CI
## -----------------------------------------------
## independent_var | 0.60 | [0.54, 1.00]
##
## - One-sided CIs: upper bound fixed at (1).
The value is 0.6, which means the effect size is large. It’s a great way to double-check the summary results of a MANOVA test, but how can we actually know which group mean vector differs from the rest? That’s where a post-hoc test comes into play.
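Before jumping into the post-hoc test, a quick complementary check is to run one univariate ANOVA per dependent variable with summary.aov() on the fitted model. This shows which response variables differ across species, although not which groups are responsible:
# One univariate ANOVA per dependent variable - shows which responses
# differ across species, but not which groups drive the difference
summary.aov(manova_model)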
Interpret MANOVA in R With a Post-Hoc Test
The p-value is practically zero, and the Partial Eta Squared suggests a large effect size, but which group or groups are different from the rest? There's no way to tell without a post-hoc test. We will use Linear Discriminant Analysis (LDA), which finds a linear combination of features that best separates two or more groups.
By doing so, we'll be able to visualize a scatter plot with the two linear discriminants on the X and Y axes, color-coded by our independent variable, the flower species.
You can implement Linear Discriminant Analysis in R using the lda() function from the MASS package:
library(MASS)
iris_lda <- lda(independent_var ~ dependent_vars, CV = F)
iris_lda
## Call:
## lda(independent_var ~ dependent_vars, CV = F)
##
## Prior probabilities of groups:
## setosa versicolor virginica
## 0.3333333 0.3333333 0.3333333
##
## Group means:
## dependent_vars1 dependent_vars2 dependent_vars3 dependent_vars4
## setosa 5.006 3.428 1.462 0.246
## versicolor 5.936 2.770 4.260 1.326
## virginica 6.588 2.974 5.552 2.026
##
## Coefficients of linear discriminants:
## LD1 LD2
## dependent_vars1 0.8293776 0.02410215
## dependent_vars2 1.5344731 2.16452123
## dependent_vars3 -2.2012117 -0.93192121
## dependent_vars4 -2.8104603 2.83918785
##
## Proportion of trace:
## LD1 LD2
## 0.9912 0.0088
Take a look at the coefficients to see how the dependent variables are combined to form the LDA decision rule. Rounded to two decimal places, LD1 is calculated as \[LD1 = 0.83 \cdot Sepal.Length + 1.53 \cdot Sepal.Width - 2.20 \cdot Petal.Length - 2.81 \cdot Petal.Width\]
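As a sanity check, you can reproduce the first LD1 score by hand. predict() for an lda model centers each observation at the prior-weighted average of the group means before applying the coefficients, which for the balanced Iris data equals the overall column means. Here is a minimal sketch:
# Reproduce LD1 for the first observation from the coefficients.
# Scores are computed on data centered at the prior-weighted group means
# (for the balanced Iris data, the same as the overall column means).
center <- colSums(iris_lda$prior * iris_lda$means)
sum((dependent_vars[1, ] - center) * iris_lda$scaling[, "LD1"])
# ~8.06, matching the first LD1 value returned by predict() below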
The snippet below uses the predict() function to get the linear discriminants and combines them with our independent variable. We'll print only the first few rows to keep the output short:
lda_df <- data.frame(
species = iris[, "Species"],
lda = predict(iris_lda)$x
)
head(lda_df)
## species lda.LD1 lda.LD2
## 1 setosa 8.0617998 0.300420621
## 2 setosa 7.1286877 -0.786660426
## 3 setosa 7.4898280 -0.265384488
## 4 setosa 6.8132006 -0.670631068
## 5 setosa 8.1323093 0.514462530
## 6 setosa 7.7019467 1.461720967
The final step in this post-hoc test is to visualize lda_df as a scatter plot. Ideally, we should see one or more groups stand out:
ggplot(lda_df) +
geom_point(aes(x = lda.LD1, y = lda.LD2, color = species), size = 4) +
theme_classic()
The setosa species is clearly separated from both virginica and versicolor, while those two overlap more with each other. To summarize, the group mean vector of the setosa class differs markedly from the other two, so it's safe to conclude that setosa was the main driver behind rejecting the null hypothesis.
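If you want one more confirmation, you can compare the LDA class predictions against the actual species. With the Iris data, setosa is typically classified perfectly, while versicolor and virginica overlap slightly:
# Confusion matrix of LDA predictions vs. actual species - setosa
# separates cleanly, versicolor and virginica overlap slightly
table(Predicted = predict(iris_lda)$class, Actual = independent_var)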