This a tutorial for doing between subject MANOVA
you are welcome to contact me for any advice or recommendation at: vet.m.mohamed@gmail.com
for Complete tutorial and data set see here
first of all lets load libraries which we are going to use in this Analysis and clearing the work space
library(tidyverse)
library(car)
library(multcomp)
library(ez)
library(MOTE)
library(psych)
Then lets import the data set
data<-read.csv("./data sets/9 manova.csv")
head(data)
## CASENO FEM MASC ESTEEM CONTROL ATTROLE SEL2 INTEXT NEUROTIC
## 1 1250 High Low 19 10 33 44 2 18
## 2 2060 High Low 17 6 32 3 5 NA
## 3 2170 High Low 26 6 35 11 3 7
## 4 380 Low Low 27 9 34 13 3 17
## 5 780 Low Low 22 6 34 27 3 8
## 6 3350 High High 12 8 49 62 3 1
lets take the analysis variables out
data<-data[,c("FEM","MASC","ESTEEM","ATTROLE","NEUROTIC")]
adding individual index
data<-data%>%mutate(idx=1:n())
Now lets test the assumptions and the appropriateness of the data to MANOVA
We will start by accuracy and missing data
summary(data)
## FEM MASC ESTEEM ATTROLE NEUROTIC
## High:262 High:125 Min. : 8.00 Min. :18.00 Min. : 0.000
## Low :107 Low :244 1st Qu.:13.00 1st Qu.:30.00 1st Qu.: 5.000
## Median :16.00 Median :35.00 Median : 9.000
## Mean :15.75 Mean :35.01 Mean : 8.752
## 3rd Qu.:18.00 3rd Qu.:40.00 3rd Qu.:12.000
## Max. :29.00 Max. :62.00 Max. :23.000
## NA's :62
## idx
## Min. : 1
## 1st Qu.: 93
## Median :185
## Mean :185
## 3rd Qu.:277
## Max. :369
##
In Neurotic we have a large numbers of Na’s
So we have to see the percentage of the missing data
apply(data,2,function(x){
sum(is.na(x))/length(x)
})
## FEM MASC ESTEEM ATTROLE NEUROTIC idx
## 0.0000000 0.0000000 0.0000000 0.0000000 0.1680217 0.0000000
We have here 17.8% of data in neurotic are missing so we can’t do imputation
Now lets test for outliers using maharanis test
mah<-mahalanobis(x = data[,-c(1,2,6)],
center = colMeans(data[,-c(1,2,6)],na.rm = T),cov = cov(data[,-c(1,2,6)],use = "pairwise.complete.obs"))
determining the cutoff point through chi square distribution
cut<-qchisq(p = .99,df = 2)
summary of the outliers
summary(mah<cut)
## Mode FALSE TRUE NA's
## logical 9 298 62
we have 9 outliers and we have to remove them
data<-data[which(mah<cut),]
Now lets move to the second phase :
testing the normality - linearity - Homogeneity - Homoscdasticity
par(mfrow=c(2,2))
qqnorm(data$ESTEEM%>%jitter(factor = 5),main = "QQplot for self esteem")
qqnorm(data$ATTROLE%>%jitter(factor = 5),main = "QQplot for Attitude to roles")
qqnorm(data$NEUROTIC%>%jitter(factor = 5),main = "QQplot for neurotic")
Other way : making fake data for fake regression to test the assumption in multivariate way
set.seed(345)
rfake<-rchisq(n = nrow(data),7)
the df is arbitrary number
fakelm<-data%>%with(lm(rfake~ESTEEM+ATTROLE+NEUROTIC))
Now lets Extract the residual and fitted values
stdresid<-resid(fakelm)%>%scale()
stdfit<-fitted(fakelm)%>%scale()
start with normality we can check it using qqnorm plot
qqPlot(stdresid)
## [1] 145 270
fair enough !! got good data here
Now its time for homogeneity and homoscdasticity
plot(stdfit,stdresid)
abline(h = 0)
abline(v = 0)
good , the data here is some king of homogeneity
but we still need use test like levene to prove it
data%>%with(leveneTest(ESTEEM~FEM*MASC)) #testing for self Esteem
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 2.9819 0.03168 *
## 294
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
data%>%with(leveneTest(ATTROLE~FEM*MASC)) #testing for Attitude toword female roles
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.3212 0.8101
## 294
data%>%with(leveneTest(NEUROTIC~FEM*MASC)) # testing for neuroticism
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.8344 0.4759
## 294
the size of the sample Affect greatly on the significance level so we will choose p.value =.001
all Homogeneity tests is p>.001
so we will assume the homogeneity here
now lets take more steps further toward manova analysis
DV = cbind(data$ESTEEM, data$ATTROLE, data$NEUROTIC)#combining the dependent variables together
man<-data%>%with(lm(DV~FEM*MASC,data = data,
contrasts = list(FEM=contr.sum,MASC=contr.sum)))
manout<-Manova(mod = man,"III") #type three SS
summary(manout,multivariate = T)
##
## Type III MANOVA Tests:
##
## Sum of squares and products for error:
## [,1] [,2] [,3]
## [1,] 3123.7151 731.0503 1588.7022
## [2,] 731.0503 11065.9775 392.0191
## [3,] 1588.7022 392.0191 6642.7359
##
## ------------------------------------------
##
## Term: (Intercept)
##
## Sum of squares and products for the hypothesis:
## [,1] [,2] [,3]
## [1,] 48235.43 107255.4 26545.1
## [2,] 107255.41 238491.2 59025.2
## [3,] 26545.10 59025.2 14608.4
##
## Multivariate Tests: (Intercept)
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.97057 3209.901 3 292 < 2.22e-16 ***
## Wilks 1 0.02943 3209.901 3 292 < 2.22e-16 ***
## Hotelling-Lawley 1 32.97844 3209.901 3 292 < 2.22e-16 ***
## Roy 1 32.97844 3209.901 3 292 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: FEM
##
## Sum of squares and products for the hypothesis:
## [,1] [,2] [,3]
## [1,] 21.67707 -115.71907 14.140906
## [2,] -115.71907 617.74514 -75.488648
## [3,] 14.14091 -75.48865 9.224736
##
## Multivariate Tests: FEM
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0643788 6.697377 3 292 0.00021917 ***
## Wilks 1 0.9356212 6.697377 3 292 0.00021917 ***
## Hotelling-Lawley 1 0.0688087 6.697377 3 292 0.00021917 ***
## Roy 1 0.0688087 6.697377 3 292 0.00021917 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: MASC
##
## Sum of squares and products for the hypothesis:
## [,1] [,2] [,3]
## [1,] 866.1000 876.6653 331.3743
## [2,] 876.6653 887.3596 335.4166
## [3,] 331.3743 335.4166 126.7855
##
## Multivariate Tests: MASC
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.2468829 31.9073 3 292 < 2.22e-16 ***
## Wilks 1 0.7531171 31.9073 3 292 < 2.22e-16 ***
## Hotelling-Lawley 1 0.3278147 31.9073 3 292 < 2.22e-16 ***
## Roy 1 0.3278147 31.9073 3 292 < 2.22e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------
##
## Term: FEM:MASC
##
## Sum of squares and products for the hypothesis:
## [,1] [,2] [,3]
## [1,] 65.95962 60.55372 21.156472
## [2,] 60.55372 55.59089 19.422538
## [3,] 21.15647 19.42254 6.785915
##
## Multivariate Tests: FEM:MASC
## Df test stat approx F num Df den Df Pr(>F)
## Pillai 1 0.0237737 2.370323 3 292 0.070704 .
## Wilks 1 0.9762263 2.370323 3 292 0.070704 .
## Hotelling-Lawley 1 0.0243526 2.370323 3 292 0.070704 .
## Roy 1 0.0243526 2.370323 3 292 0.070704 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
here we see the the significant main effect of the IVs
the effect size here = 1- wilk’s lambda
1-0.917827 #effect size of FEM
## [1] 0.082173
1-0.8237555 #effect size of MASC
## [1] 0.1762445
1-0.9933737 #effect size of interaction
## [1] 0.0066263
here we will do post-hoc test –> single anova for each DV
#self esteem
data%>%with(ezANOVA(data = data,dv = ESTEEM,
wid = idx,between = c(FEM,MASC),type = 3))
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 FEM 1 294 2.040217 1.542492e-01 0.00689169
## 3 MASC 1 294 81.516199 2.347421e-17 * 0.21707772
## 4 FEM:MASC 1 294 6.208033 1.326833e-02 * 0.02067910
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 3 294 34.09956 1120.676 2.981913 0.03167783 *
#Attitude
data%>%with(ezANOVA(data = data,dv = ATTROLE,
wid = idx,between = c(FEM,MASC),type = 3))
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 FEM 1 294 16.412203 6.526295e-05 * 0.052872287
## 3 MASC 1 294 23.575298 1.954297e-06 * 0.074235300
## 4 FEM:MASC 1 294 1.476934 2.252298e-01 0.004998475
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 3 294 12.22599 3730.791 0.321151 0.8100803
#Neurotic
data%>%with(ezANOVA(data = na.omit(data),dv = NEUROTIC,
wid = idx,between = c(FEM,MASC),type = 3))
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 FEM 1 294 0.4082764 0.52334311 0.001386770
## 3 MASC 1 294 5.6113824 0.01849047 * 0.018728869
## 4 FEM:MASC 1 294 0.3003369 0.58408712 0.001020512
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 3 294 18.60425 2185.047 0.8344062 0.4758602
thankful the IVs is two level only and there is no interaction effect (except in self esteem) so the p-value in anova is enough and Explain only the mean
but we still need to calculate the effect size of each sig. different means using MOTE library or Cohen_d() function in psych package
using Cohen.d()
data[,-c(2,6)]%>%cohen.d(group = "FEM")
## Call: cohen.d(x = ., group = "FEM")
## Cohen d statistic of difference between two means
## lower effect upper
## ESTEEM 0.059 0.31 0.57
## ATTROLE -0.723 -0.47 -0.21
## NEUROTIC -0.129 0.12 0.38
##
## Multivariate (Mahalanobis) distance between groups
## [1] 0.64
## r equivalent of difference between two means
## ESTEEM ATTROLE NEUROTIC
## 0.140 -0.206 0.056
data[,-c(1,6)]%>%cohen.d(group = "MASC")
## Call: cohen.d(x = ., group = "MASC")
## Cohen d statistic of difference between two means
## lower effect upper
## ESTEEM 0.798 1.09 1.37
## ATTROLE 0.316 0.57 0.82
## NEUROTIC 0.052 0.30 0.54
##
## Multivariate (Mahalanobis) distance between groups
## [1] 1.2
## r equivalent of difference between two means
## ESTEEM ATTROLE NEUROTIC
## 0.45 0.26 0.14
using d.ind.t()
data%>%group_by(FEM)%>%summarize(mean=mean(ATTROLE,na.rm = T),
sd=sd(ATTROLE,na.rm = T),num=n())
## # A tibble: 2 x 4
## FEM mean sd num
## <fct> <dbl> <dbl> <int>
## 1 High 36.0 6.45 214
## 2 Low 33.1 6.15 84
d.ind.t(m1 = 36.0,m2 = 32.9,sd1 = 6.80,sd2 = 6.41,n1 = 202,n2 = 90)$d
## [1] 0.4638822
we here see that the effect size using Cohen.d() is so close to d.ind.t() for attribute on each group of FEM but Cohen.d() here is more easier
HOPE TO FIND IT USEFULL
THANKS FOR BEING HERE
REGARDS