MANOVA - Multivariate Analysis of Variance

2024-09-22

Background of MANOVA

Defining MANOVA

What is MANOVA?

MANOVA, or Multivariate Analysis of Variance, is a statistical method used to determine whether there are statistically significant differences between groups when multiple dependent variables are considered simultaneously.

When is MANOVA used?

After defining MANOVA, it is useful to consider real-life applications of the test.
Consider a drug in development. There are three age groups, and for each group two dependent variables are measured: blood glucose concentration and a serum concentration associated with the drug. A MANOVA would determine whether there are significant differences in the combination of these two dependent variables across the three age groups.

Importance of MANOVA in statistical analysis

MANOVA plays a significant role in analyses with multiple dependent variables. It can be applied in almost any field; however, it is best suited to data whose dependent variables jointly follow a multivariate normal distribution.
MANOVA provides a way to test significance when multiple dependent variables are considered at the same time. After a significant MANOVA result, individual ANOVA tests can be run to test each dependent variable separately, or a post-hoc test can be performed to identify which groups differ.

Statistical Background for MANOVA

Required Assumptions for MANOVA

There are five assumptions the data must meet before a MANOVA can be performed:

Multivariate normality of the dependent variables
No multicollinearity among the dependent variables
Homogeneity of covariance matrices across groups
Independent observations
Adequate sample size
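A rough way to screen some of these assumptions in base R is sketched below, using the weight-loss data from the example later in this document. It is an illustrative sketch rather than a formal procedure: the Shapiro-Wilk test on each dependent variable's residuals only checks univariate normality (necessary but not sufficient for multivariate normality), the group covariance matrices are printed for visual comparison rather than tested formally, and independence comes from the study design, so it cannot be checked from the numbers.

```r
# Sketch: quick screening of MANOVA assumptions in base R.
# Data: the weight-loss example used later in this document.
data <- data.frame(
  diet = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  weight_loss = c(4.52, 3.25, 5.03, 6.38, 7.12, 5.75, 3.21, 2.82, 3.29),
  cholesterol_reduction = c(11, 7, 12, 14, 19, 13, 4, 6, 5)
)
responses <- c("weight_loss", "cholesterol_reduction")

# Adequate sample size: number of observations in each group
group_sizes <- table(data$diet)

# Normality (univariate screen only): Shapiro-Wilk test on each
# dependent variable's residuals after removing the group means
fit <- lm(cbind(weight_loss, cholesterol_reduction) ~ diet, data = data)
normality_p <- apply(residuals(fit), 2, function(r) shapiro.test(r)$p.value)

# Multicollinearity: correlation between the two dependent variables
dv_cor <- cor(data$weight_loss, data$cholesterol_reduction)

# Homogeneity of covariance matrices: each group's covariance matrix,
# printed for visual comparison (not a formal test)
group_covs <- lapply(split(data[responses], data$diet), cov)

# Independence is a property of the design and is not checked here.
print(group_sizes)
print(normality_p)
print(dv_cor)
print(group_covs)
```

With only three observations per group, these checks have very little power, so they are best read as rough indications.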

MANOVA and Hypothesis Testing

The Null Hypothesis

H\(_0\) : There is no difference in the combination of dependent variable means across the groups.

The Alternative Hypothesis

H\(_1\) : There is a significant difference in the combination of dependent variable means across the groups; that is, at least one group's means differ from the others.
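In symbols, writing \(\boldsymbol{\mu}_k\) for the vector of population means of the dependent variables in group \(k\) and \(g\) for the number of groups (notation introduced here for illustration), the two hypotheses can be stated as:

\[ H_0 : \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g \]

\[ H_1 : \boldsymbol{\mu}_j \neq \boldsymbol{\mu}_k \text{ for at least one pair of groups } j \neq k \]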

Process for Performing a MANOVA

Formulate the hypotheses for the experiment
Check that the data meet all five assumptions before performing the MANOVA
Organize and prepare the data
Conduct the MANOVA using statistical software
Interpret the MANOVA results

Wilks’ Lambda, Pillai’s Trace, Hotelling’s Trace, and Roy’s Largest Root : the four test statistics; each one tests the same null hypothesis
P values : with a significance level of 0.05, p < 0.05 indicates a statistically significant result

If the result is significant, perform post-hoc tests to find which groups differ
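In R, `summary()` on a `manova` model reports one statistic per call through its `test` argument (Pillai's Trace by default). A minimal sketch, assuming the weight-loss data from the example later in this document, that collects all four statistics and their p values:

```r
# Sketch: report all four MANOVA test statistics for the weight-loss data.
data <- data.frame(
  diet = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  weight_loss = c(4.52, 3.25, 5.03, 6.38, 7.12, 5.75, 3.21, 2.82, 3.29),
  cholesterol_reduction = c(11, 7, 12, 14, 19, 13, 4, 6, 5)
)
model <- manova(cbind(weight_loss, cholesterol_reduction) ~ diet, data = data)

tests <- c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")

# summary.manova() computes one statistic per call; row 1 of $stats is the
# diet effect and column 2 holds the chosen statistic
results <- sapply(tests, function(test_name) {
  s <- summary(model, test = test_name)$stats
  c(statistic = s[1, 2], p_value = s[1, "Pr(>F)"])
})
print(round(results, 4))
```

R's name for Hotelling's Trace is "Hotelling-Lawley". For Roy's Largest Root the F approximation is only a bound, so its p value should be read with caution.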

The MANOVA process

For each dependent variable, compute the mean within each group. Then calculate the overall (grand) mean of each dependent variable
Compute the between-group and within-group sums of squares. With MANOVA these are matrices of sums of squares and cross-products, so they capture both variance and covariance
Calculate the test statistics; there are four main tests used
Determine the degrees of freedom
Convert each statistic to an approximate F statistic using the appropriate transformation
Assess significance by comparing the p value with the chosen significance level
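Step 1 can be sketched in base R with `aggregate()` for the group means and `colMeans()` for the grand means, again assuming the weight-loss data from the example later in this document:

```r
# Sketch of Step 1 for the weight-loss data used later in this document.
data <- data.frame(
  diet = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  weight_loss = c(4.52, 3.25, 5.03, 6.38, 7.12, 5.75, 3.21, 2.82, 3.29),
  cholesterol_reduction = c(11, 7, 12, 14, 19, 13, 4, 6, 5)
)

# Mean of each dependent variable within each group (diet)
group_means <- aggregate(cbind(weight_loss, cholesterol_reduction) ~ diet,
                         data = data, FUN = mean)

# Overall (grand) mean of each dependent variable across all observations
grand_mean <- colMeans(data[, c("weight_loss", "cholesterol_reduction")])

print(group_means)
print(grand_mean)
```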

Additional Information for steps

Step 2

SSB, the between-group sum of squares, shows how much the group means differ from the overall means of the dependent variables
SSW, the within-group sum of squares, shows how much individual data points vary around the mean of their own group.

Step 3

Wilks’ Lambda is the ratio of the within-group variance to the total variance, \(|\mathbf{SSW}| / |\mathbf{SSB} + \mathbf{SSW}|\). A lower value suggests greater differences between groups
Pillai’s Trace sums the proportion of variance in the dependent variables explained by the independent variable over each discriminant dimension, \(\operatorname{tr}[\mathbf{SSB}(\mathbf{SSB} + \mathbf{SSW})^{-1}]\)
Hotelling’s Trace compares the between-group variation directly to the within-group variation, \(\operatorname{tr}(\mathbf{SSW}^{-1}\mathbf{SSB})\)
Roy’s Largest Root uses only the largest eigenvalue of \(\mathbf{SSW}^{-1}\mathbf{SSB}\).
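All four statistics can be computed from the eigenvalues \(\lambda_i\) of \(\mathbf{SSW}^{-1}\mathbf{SSB}\): Wilks’ Lambda is \(\prod_i 1/(1+\lambda_i)\), Pillai’s Trace is \(\sum_i \lambda_i/(1+\lambda_i)\), Hotelling’s Trace is \(\sum_i \lambda_i\), and Roy’s Largest Root is \(\max_i \lambda_i\) (the form R reports; some texts use \(\lambda_{\max}/(1+\lambda_{\max})\) instead). A sketch in base R, assuming the weight-loss data from the example later in this document; the results should agree with `summary(model, test = ...)`:

```r
# Sketch: the four MANOVA statistics from the eigenvalues of SSW^-1 SSB,
# for the weight-loss data used later in this document.
data <- data.frame(
  diet = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  weight_loss = c(4.52, 3.25, 5.03, 6.38, 7.12, 5.75, 3.21, 2.82, 3.29),
  cholesterol_reduction = c(11, 7, 12, 14, 19, 13, 4, 6, 5)
)
Y <- as.matrix(data[, c("weight_loss", "cholesterol_reduction")])

# SSW: cross-products of residuals from the group means
SSW <- crossprod(residuals(lm(Y ~ diet, data = data)))
# Total SSCP: cross-products of deviations from the grand mean
SST <- crossprod(scale(Y, center = TRUE, scale = FALSE))
# SSB: the remainder, since SST = SSB + SSW in a one-way design
SSB <- SST - SSW

# Eigenvalues of SSW^-1 SSB (real and non-negative here; Re() drops any
# zero imaginary parts returned by the general eigen solver)
lambda <- Re(eigen(solve(SSW) %*% SSB, only.values = TRUE)$values)

wilks     <- prod(1 / (1 + lambda))      # = det(SSW) / det(SSB + SSW)
pillai    <- sum(lambda / (1 + lambda))  # = trace(SSB (SSB + SSW)^-1)
hotelling <- sum(lambda)                 # = trace(SSW^-1 SSB)
roy       <- max(lambda)                 # largest eigenvalue (R's convention)

print(c(Wilks = wilks, Pillai = pillai,
        Hotelling_Lawley = hotelling, Roy = roy))
```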

Additional Information (Continued)

Step 4

The between-groups degrees of freedom equal the number of groups minus one, \(g - 1\).

The within-groups degrees of freedom equal the total number of observations minus the number of groups, \(N - g\).
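For the weight-loss example later in this document (9 observations, 3 diets, 2 dependent variables), the degrees of freedom and the conversion of Pillai’s Trace to an approximate F statistic can be sketched as follows. The transformation shown is the standard one for Pillai’s Trace and is the one R's `summary.manova()` uses; the Pillai value is copied, rounded, from the R output later in this document, so the F value can differ from R's in the last digits.

```r
# Sketch: degrees of freedom and the Pillai-to-F transformation for the
# weight-loss example (N = 9 observations, g = 3 diets, p = 2 dependent variables).
N <- 9
g <- 3
p <- 2

df_between <- g - 1   # degrees of freedom between groups
df_within  <- N - g   # degrees of freedom within groups

# Pillai's Trace as printed (rounded) in the R output later in this document
pillai <- 1.1378

# Standard F approximation for Pillai's Trace
s <- min(p, df_between)
m <- (abs(p - df_between) - 1) / 2
n <- (df_within - p - 1) / 2
num_df <- s * (2 * m + s + 1)
den_df <- s * (2 * n + s + 1)
F_approx <- (2 * n + s + 1) / (2 * m + s + 1) * pillai / (s - pillai)

# Upper-tail p value of the F distribution
p_value <- pf(F_approx, num_df, den_df, lower.tail = FALSE)

print(c(df_between = df_between, df_within = df_within, F = F_approx,
        num_df = num_df, den_df = den_df, p_value = p_value))
```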

The SSB and SSW in LaTeX

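\[ \mathbf{SSB} = \sum_{k=1}^{g} n_k (\bar{\mathbf{Y}}_k - \bar{\mathbf{Y}})(\bar{\mathbf{Y}}_k - \bar{\mathbf{Y}})' \]

\[ \mathbf{SSW} = \sum_{k=1}^{g} \sum_{i=1}^{n_k} (\mathbf{Y}_{ki} - \bar{\mathbf{Y}}_k)(\mathbf{Y}_{ki} - \bar{\mathbf{Y}}_k)' \]

where \(\mathbf{Y}_{ki}\) is the column vector of dependent variables for observation \(i\) in group \(k\), \(\bar{\mathbf{Y}}_k\) is the mean vector of group \(k\), \(\bar{\mathbf{Y}}\) is the overall mean vector, \(n_k\) is the size of group \(k\), and \(g\) is the number of groups.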
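A direct, loop-based sketch of the two formulas in base R, assuming the weight-loss data from the example later in this document. It also checks that SSB + SSW equals the total sum-of-squares-and-cross-products matrix and computes Wilks’ Lambda as \(|\mathbf{SSW}| / |\mathbf{SSB} + \mathbf{SSW}|\):

```r
# Sketch: SSB and SSW computed directly from their defining sums,
# for the weight-loss data used later in this document.
data <- data.frame(
  diet = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  weight_loss = c(4.52, 3.25, 5.03, 6.38, 7.12, 5.75, 3.21, 2.82, 3.29),
  cholesterol_reduction = c(11, 7, 12, 14, 19, 13, 4, 6, 5)
)
Y <- as.matrix(data[, c("weight_loss", "cholesterol_reduction")])
grand <- colMeans(Y)                        # overall mean vector

SSB <- matrix(0, ncol(Y), ncol(Y), dimnames = list(colnames(Y), colnames(Y)))
SSW <- SSB
for (rows in split(seq_len(nrow(Y)), data$diet)) {
  Yk <- Y[rows, , drop = FALSE]             # observations in group k
  mk <- colMeans(Yk)                        # group mean vector
  d  <- mk - grand
  SSB <- SSB + nrow(Yk) * tcrossprod(d)     # n_k (Ybar_k - Ybar)(Ybar_k - Ybar)'
  SSW <- SSW + crossprod(sweep(Yk, 2, mk))  # sum_i (Y_ki - Ybar_k)(Y_ki - Ybar_k)'
}

# Sanity check: SSB + SSW should equal the total SSCP matrix
SST <- crossprod(sweep(Y, 2, grand))
print(max(abs(SSB + SSW - SST)))

# Wilks' Lambda from the two matrices
wilks <- det(SSW) / det(SSB + SSW)
print(SSB)
print(SSW)
print(wilks)
```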

Applying MANOVA to a Real Life Example

Defining the Problem

Consider

A person is conducting an experiment on weight loss, testing three diet methods to reduce weight. They record the weight loss and cholesterol reduction for each method. The table on the next slide shows the data collected.

Hypothesis

H\(_0\) : The combination of weight loss and cholesterol reduction is not affected by the type of diet

H\(_1\) : The combination of weight loss and cholesterol reduction is affected by the type of diet

Table of Data

Weight Loss and Cholesterol Reduction by Method
Method	Weight loss	Cholesterol reduction
A	4.52	11
A	3.25	7
A	5.03	12
B	6.38	14
B	7.12	19
B	5.75	13
C	3.21	4
C	2.82	6
C	3.29	5

Graphical View of the Data

Running the MANOVA in R

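# Example data: three diets (A, B, C) and two dependent variables
data <- data.frame(
  diet = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  weight_loss = c(4.52, 3.25, 5.03, 6.38, 7.12, 5.75, 3.21, 2.82, 3.29),
  cholesterol_reduction = c(11, 7, 12, 14, 19, 13, 4, 6, 5)
)

# Fit the MANOVA: cbind() combines both dependent variables into one response
model <- manova(cbind(weight_loss, cholesterol_reduction) ~ diet, data = data)

# summary() uses Pillai's trace by default
summaryData <- summary(model)
print(summaryData)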

##           Df Pillai approx F num Df den Df  Pr(>F)  
## diet       2 1.1378   3.9591      4     12 0.02833 *
## Residuals  6                                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After the Test

Once a MANOVA is performed, the resulting values can be used for hypothesis testing.

Since the p value (0.02833) is less than the 0.05 significance level, the null hypothesis is rejected in favor of the alternative hypothesis.

Thus, the combination of weight loss and cholesterol reduction is affected by the type of diet.
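After a significant MANOVA, the follow-up analyses mentioned earlier can be sketched in R: `summary.aov()` applied to the `manova` model gives one univariate ANOVA per dependent variable, and `TukeyHSD()` on a per-variable `aov()` fit gives pairwise diet comparisons. Tukey's HSD is used here only as one common choice of post-hoc test; `diet` is stored as a factor because `TukeyHSD()` needs factor terms.

```r
# Sketch: follow-up analyses after a significant MANOVA on the
# weight-loss data. diet is a factor because TukeyHSD() needs factor terms.
data <- data.frame(
  diet = factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C")),
  weight_loss = c(4.52, 3.25, 5.03, 6.38, 7.12, 5.75, 3.21, 2.82, 3.29),
  cholesterol_reduction = c(11, 7, 12, 14, 19, 13, 4, 6, 5)
)
model <- manova(cbind(weight_loss, cholesterol_reduction) ~ diet, data = data)

# One univariate ANOVA per dependent variable
univariate <- summary.aov(model)
print(univariate)

# Post-hoc pairwise comparisons between diets (Tukey HSD), per variable
tukey_weight <- TukeyHSD(aov(weight_loss ~ diet, data = data))
tukey_chol   <- TukeyHSD(aov(cholesterol_reduction ~ diet, data = data))
print(tukey_weight)
print(tukey_chol)
```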

Final Remarks

Recap

To recap…

MANOVA is a powerful statistical tool that tests for statistical significance among multiple dependent variables across two or more groups.
To perform a MANOVA, five assumptions are required: the dependent variables must jointly follow a multivariate normal distribution, there must be no multicollinearity among the dependent variables, each observation must be independent, the sample size must be adequate, and the covariance matrices must be homogeneous across groups.