This a tutorial for doing between subject MANOVA

you are welcome to contact me for any advice or recommendation at: vet.m.mohamed@gmail.com

for Complete tutorial and data set see here

first of all lets load libraries which we are going to use in this Analysis and clearing the work space

library(tidyverse)
library(car)
library(multcomp)
library(ez)
library(MOTE)
library(psych)

Then lets import the data set

data<-read.csv("./data sets/9 manova.csv")
head(data)
##   CASENO  FEM MASC ESTEEM CONTROL ATTROLE SEL2 INTEXT NEUROTIC
## 1   1250 High  Low     19      10      33   44      2       18
## 2   2060 High  Low     17       6      32    3      5       NA
## 3   2170 High  Low     26       6      35   11      3        7
## 4    380  Low  Low     27       9      34   13      3       17
## 5    780  Low  Low     22       6      34   27      3        8
## 6   3350 High High     12       8      49   62      3        1

lets take the analysis variables out

data<-data[,c("FEM","MASC","ESTEEM","ATTROLE","NEUROTIC")]

adding individual index

data<-data%>%mutate(idx=1:n())

Now lets test the assumptions and the appropriateness of the data to MANOVA

We will start by accuracy and missing data

summary(data)
##    FEM        MASC         ESTEEM         ATTROLE         NEUROTIC     
##  High:262   High:125   Min.   : 8.00   Min.   :18.00   Min.   : 0.000  
##  Low :107   Low :244   1st Qu.:13.00   1st Qu.:30.00   1st Qu.: 5.000  
##                        Median :16.00   Median :35.00   Median : 9.000  
##                        Mean   :15.75   Mean   :35.01   Mean   : 8.752  
##                        3rd Qu.:18.00   3rd Qu.:40.00   3rd Qu.:12.000  
##                        Max.   :29.00   Max.   :62.00   Max.   :23.000  
##                                                        NA's   :62      
##       idx     
##  Min.   :  1  
##  1st Qu.: 93  
##  Median :185  
##  Mean   :185  
##  3rd Qu.:277  
##  Max.   :369  
## 

In Neurotic we have a large numbers of Na’s

So we have to see the percentage of the missing data

apply(data,2,function(x){
        sum(is.na(x))/length(x)
})
##       FEM      MASC    ESTEEM   ATTROLE  NEUROTIC       idx 
## 0.0000000 0.0000000 0.0000000 0.0000000 0.1680217 0.0000000

We have here 17.8% of data in neurotic are missing so we can’t do imputation

Now lets test for outliers using maharanis test

mah<-mahalanobis(x = data[,-c(1,2,6)],
                 center = colMeans(data[,-c(1,2,6)],na.rm = T),cov = cov(data[,-c(1,2,6)],use = "pairwise.complete.obs"))

determining the cutoff point through chi square distribution

cut<-qchisq(p = .99,df = 2)

summary of the outliers

summary(mah<cut)
##    Mode   FALSE    TRUE    NA's 
## logical       9     298      62

we have 9 outliers and we have to remove them

data<-data[which(mah<cut),]

Now lets move to the second phase :

testing the normality - linearity - Homogeneity - Homoscdasticity

par(mfrow=c(2,2))
qqnorm(data$ESTEEM%>%jitter(factor = 5),main = "QQplot for self esteem")
qqnorm(data$ATTROLE%>%jitter(factor = 5),main = "QQplot for Attitude to roles")
qqnorm(data$NEUROTIC%>%jitter(factor = 5),main = "QQplot for neurotic")

Other way : making fake data for fake regression to test the assumption in multivariate way

set.seed(345)
rfake<-rchisq(n = nrow(data),7)

the df is arbitrary number

fakelm<-data%>%with(lm(rfake~ESTEEM+ATTROLE+NEUROTIC))

Now lets Extract the residual and fitted values

stdresid<-resid(fakelm)%>%scale()
stdfit<-fitted(fakelm)%>%scale()

start with normality we can check it using qqnorm plot

qqPlot(stdresid)

## [1] 145 270

fair enough !! got good data here

Now its time for homogeneity and homoscdasticity

plot(stdfit,stdresid)
abline(h = 0)
abline(v = 0)

good , the data here is some king of homogeneity

but we still need use test like levene to prove it

data%>%with(leveneTest(ESTEEM~FEM*MASC)) #testing for self Esteem
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   3  2.9819 0.03168 *
##       294                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
data%>%with(leveneTest(ATTROLE~FEM*MASC)) #testing for Attitude toword female roles
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.3212 0.8101
##       294
data%>%with(leveneTest(NEUROTIC~FEM*MASC)) # testing for neuroticism
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   3  0.8344 0.4759
##       294

the size of the sample Affect greatly on the significance level so we will choose p.value =.001

all Homogeneity tests is p>.001

so we will assume the homogeneity here

now lets take more steps further toward manova analysis

DV = cbind(data$ESTEEM, data$ATTROLE, data$NEUROTIC)#combining the dependent variables together

man<-data%>%with(lm(DV~FEM*MASC,data = data,
                    contrasts = list(FEM=contr.sum,MASC=contr.sum)))

manout<-Manova(mod = man,"III") #type three SS

summary(manout,multivariate = T)
## 
## Type III MANOVA Tests:
## 
## Sum of squares and products for error:
##           [,1]       [,2]      [,3]
## [1,] 3123.7151   731.0503 1588.7022
## [2,]  731.0503 11065.9775  392.0191
## [3,] 1588.7022   392.0191 6642.7359
## 
## ------------------------------------------
##  
## Term: (Intercept) 
## 
## Sum of squares and products for the hypothesis:
##           [,1]     [,2]    [,3]
## [1,]  48235.43 107255.4 26545.1
## [2,] 107255.41 238491.2 59025.2
## [3,]  26545.10  59025.2 14608.4
## 
## Multivariate Tests: (Intercept)
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1   0.97057 3209.901      3    292 < 2.22e-16 ***
## Wilks             1   0.02943 3209.901      3    292 < 2.22e-16 ***
## Hotelling-Lawley  1  32.97844 3209.901      3    292 < 2.22e-16 ***
## Roy               1  32.97844 3209.901      3    292 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: FEM 
## 
## Sum of squares and products for the hypothesis:
##            [,1]       [,2]       [,3]
## [1,]   21.67707 -115.71907  14.140906
## [2,] -115.71907  617.74514 -75.488648
## [3,]   14.14091  -75.48865   9.224736
## 
## Multivariate Tests: FEM
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1 0.0643788 6.697377      3    292 0.00021917 ***
## Wilks             1 0.9356212 6.697377      3    292 0.00021917 ***
## Hotelling-Lawley  1 0.0688087 6.697377      3    292 0.00021917 ***
## Roy               1 0.0688087 6.697377      3    292 0.00021917 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: MASC 
## 
## Sum of squares and products for the hypothesis:
##          [,1]     [,2]     [,3]
## [1,] 866.1000 876.6653 331.3743
## [2,] 876.6653 887.3596 335.4166
## [3,] 331.3743 335.4166 126.7855
## 
## Multivariate Tests: MASC
##                  Df test stat approx F num Df den Df     Pr(>F)    
## Pillai            1 0.2468829  31.9073      3    292 < 2.22e-16 ***
## Wilks             1 0.7531171  31.9073      3    292 < 2.22e-16 ***
## Hotelling-Lawley  1 0.3278147  31.9073      3    292 < 2.22e-16 ***
## Roy               1 0.3278147  31.9073      3    292 < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## ------------------------------------------
##  
## Term: FEM:MASC 
## 
## Sum of squares and products for the hypothesis:
##          [,1]     [,2]      [,3]
## [1,] 65.95962 60.55372 21.156472
## [2,] 60.55372 55.59089 19.422538
## [3,] 21.15647 19.42254  6.785915
## 
## Multivariate Tests: FEM:MASC
##                  Df test stat approx F num Df den Df   Pr(>F)  
## Pillai            1 0.0237737 2.370323      3    292 0.070704 .
## Wilks             1 0.9762263 2.370323      3    292 0.070704 .
## Hotelling-Lawley  1 0.0243526 2.370323      3    292 0.070704 .
## Roy               1 0.0243526 2.370323      3    292 0.070704 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

here we see the the significant main effect of the IVs

the effect size here = 1- wilk’s lambda

1-0.917827 #effect size of FEM
## [1] 0.082173
1-0.8237555 #effect size of MASC
## [1] 0.1762445
1-0.9933737 #effect size of interaction
## [1] 0.0066263

here we will do post-hoc test –> single anova for each DV

#self esteem
data%>%with(ezANOVA(data = data,dv = ESTEEM,
                    wid = idx,between = c(FEM,MASC),type = 3))
## $ANOVA
##     Effect DFn DFd         F            p p<.05        ges
## 2      FEM   1 294  2.040217 1.542492e-01       0.00689169
## 3     MASC   1 294 81.516199 2.347421e-17     * 0.21707772
## 4 FEM:MASC   1 294  6.208033 1.326833e-02     * 0.02067910
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd      SSn      SSd        F          p p<.05
## 1   3 294 34.09956 1120.676 2.981913 0.03167783     *
#Attitude
data%>%with(ezANOVA(data = data,dv = ATTROLE,
                    wid = idx,between = c(FEM,MASC),type = 3))
## $ANOVA
##     Effect DFn DFd         F            p p<.05         ges
## 2      FEM   1 294 16.412203 6.526295e-05     * 0.052872287
## 3     MASC   1 294 23.575298 1.954297e-06     * 0.074235300
## 4 FEM:MASC   1 294  1.476934 2.252298e-01       0.004998475
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd      SSn      SSd        F         p p<.05
## 1   3 294 12.22599 3730.791 0.321151 0.8100803
#Neurotic

data%>%with(ezANOVA(data = na.omit(data),dv = NEUROTIC,
                    wid = idx,between = c(FEM,MASC),type = 3))
## $ANOVA
##     Effect DFn DFd         F          p p<.05         ges
## 2      FEM   1 294 0.4082764 0.52334311       0.001386770
## 3     MASC   1 294 5.6113824 0.01849047     * 0.018728869
## 4 FEM:MASC   1 294 0.3003369 0.58408712       0.001020512
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd      SSn      SSd         F         p p<.05
## 1   3 294 18.60425 2185.047 0.8344062 0.4758602

thankful the IVs is two level only and there is no interaction effect (except in self esteem) so the p-value in anova is enough and Explain only the mean

but we still need to calculate the effect size of each sig. different means using MOTE library or Cohen_d() function in psych package

using Cohen.d()

data[,-c(2,6)]%>%cohen.d(group = "FEM")
## Call: cohen.d(x = ., group = "FEM")
## Cohen d statistic of difference between two means
##           lower effect upper
## ESTEEM    0.059   0.31  0.57
## ATTROLE  -0.723  -0.47 -0.21
## NEUROTIC -0.129   0.12  0.38
## 
## Multivariate (Mahalanobis) distance between groups
## [1] 0.64
## r equivalent of difference between two means
##   ESTEEM  ATTROLE NEUROTIC 
##    0.140   -0.206    0.056
data[,-c(1,6)]%>%cohen.d(group = "MASC")
## Call: cohen.d(x = ., group = "MASC")
## Cohen d statistic of difference between two means
##          lower effect upper
## ESTEEM   0.798   1.09  1.37
## ATTROLE  0.316   0.57  0.82
## NEUROTIC 0.052   0.30  0.54
## 
## Multivariate (Mahalanobis) distance between groups
## [1] 1.2
## r equivalent of difference between two means
##   ESTEEM  ATTROLE NEUROTIC 
##     0.45     0.26     0.14

using d.ind.t()

data%>%group_by(FEM)%>%summarize(mean=mean(ATTROLE,na.rm = T),
                                      sd=sd(ATTROLE,na.rm = T),num=n())
## # A tibble: 2 x 4
##   FEM    mean    sd   num
##   <fct> <dbl> <dbl> <int>
## 1 High   36.0  6.45   214
## 2 Low    33.1  6.15    84
d.ind.t(m1 = 36.0,m2 = 32.9,sd1 = 6.80,sd2 = 6.41,n1 = 202,n2 = 90)$d
## [1] 0.4638822

we here see that the effect size using Cohen.d() is so close to d.ind.t() for attribute on each group of FEM but Cohen.d() here is more easier

HOPE TO FIND IT USEFULL

THANKS FOR BEING HERE

REGARDS