Demo_MANOVA

This is the “iris” dataset in R.
I will use this dataset to demonstrate the step of MANOVA.
Let’s look at the data.

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

dim(iris)

## [1] 150   5

str(iris)

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

There are 3 kinds of Species, four variables.
x=group(Species) y=four variables.

You might need some plots to understand the data.
Because I am really lazy, so I just demo a easy plot.
You should draw more plots to help you explore the data.

library(ggplot2)
library(reshape)
irisshow <- melt(iris,id=c("Species"))
ggplot(irisshow,aes(x=variable,y=value,fill=Species))+
        geom_boxplot()

Now we could do MANOVA

m <- manova(cbind(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width)~Species,
              data=iris)
summary(m,test="Wilks")

##            Df    Wilks approx F num Df den Df    Pr(>F)    
## Species     2 0.023439   199.15      8    288 < 2.2e-16 ***
## Residuals 147                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now we know that there are significant differences between these three kinds of Species.
However, we don’t know what make them different.

We have three methods
First but worst one, do several ANOVA

summary.aov(m)

##  Response Sepal.Length :
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## Species       2 63.212  31.606  119.26 < 2.2e-16 ***
## Residuals   147 38.956   0.265                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response Sepal.Width :
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## Species       2 11.345  5.6725   49.16 < 2.2e-16 ***
## Residuals   147 16.962  0.1154                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response Petal.Length :
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## Species       2 437.10 218.551  1180.2 < 2.2e-16 ***
## Residuals   147  27.22   0.185                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response Petal.Width :
##              Df Sum Sq Mean Sq F value    Pr(>F)    
## Species       2 80.413  40.207  960.01 < 2.2e-16 ***
## Residuals   147  6.157   0.042                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Second, Step down analysis

s1 <- lm(Sepal.Length~Species,data=iris)
s2 <- lm(Sepal.Width~Species+Sepal.Length,data=iris)
s3 <- lm(Petal.Length~Species+Sepal.Length+Sepal.Width,data=iris)
s4 <- lm(Petal.Width~Species+Sepal.Length+Sepal.Width+Petal.Length,data=iris)
summary(s1)

## 
## Call:
## lm(formula = Sepal.Length ~ Species, data = iris)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6880 -0.3285 -0.0060  0.3120  1.3120 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         5.0060     0.0728  68.762  < 2e-16 ***
## Speciesversicolor   0.9300     0.1030   9.033 8.77e-16 ***
## Speciesvirginica    1.5820     0.1030  15.366  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5148 on 147 degrees of freedom
## Multiple R-squared:  0.6187, Adjusted R-squared:  0.6135 
## F-statistic: 119.3 on 2 and 147 DF,  p-value: < 2.2e-16

summary(s2)

## 
## Call:
## lm(formula = Sepal.Width ~ Species + Sepal.Length, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.95096 -0.16522  0.00171  0.18416  0.72918 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.67650    0.23536   7.123 4.46e-11 ***
## Speciesversicolor -0.98339    0.07207 -13.644  < 2e-16 ***
## Speciesvirginica  -1.00751    0.09331 -10.798  < 2e-16 ***
## Sepal.Length       0.34988    0.04630   7.557 4.19e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.289 on 146 degrees of freedom
## Multiple R-squared:  0.5693, Adjusted R-squared:  0.5604 
## F-statistic: 64.32 on 3 and 146 DF,  p-value: < 2.2e-16

summary(s3)

## 
## Call:
## lm(formula = Petal.Length ~ Species + Sepal.Length + Sepal.Width, 
##     data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.75196 -0.18755  0.00432  0.16965  0.79580 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -1.63430    0.26783  -6.102 9.08e-09 ***
## Speciesversicolor  2.17023    0.10657  20.364  < 2e-16 ***
## Speciesvirginica   3.04911    0.12267  24.857  < 2e-16 ***
## Sepal.Length       0.64631    0.05353  12.073  < 2e-16 ***
## Sepal.Width       -0.04058    0.08113  -0.500    0.618    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2833 on 145 degrees of freedom
## Multiple R-squared:  0.9749, Adjusted R-squared:  0.9742 
## F-statistic:  1410 on 4 and 145 DF,  p-value: < 2.2e-16

summary(s4)

## 
## Call:
## lm(formula = Petal.Width ~ Species + Sepal.Length + Sepal.Width + 
##     Petal.Length, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59239 -0.08288 -0.01349  0.08773  0.45239 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.47314    0.17659  -2.679  0.00824 ** 
## Speciesversicolor  0.64811    0.12314   5.263 5.04e-07 ***
## Speciesvirginica   1.04637    0.16548   6.323 3.03e-09 ***
## Sepal.Length      -0.09293    0.04458  -2.084  0.03889 *  
## Sepal.Width        0.24220    0.04776   5.072 1.20e-06 ***
## Petal.Length       0.24220    0.04884   4.959 1.97e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1666 on 144 degrees of freedom
## Multiple R-squared:  0.9538, Adjusted R-squared:  0.9522 
## F-statistic: 594.9 on 5 and 144 DF,  p-value: < 2.2e-16

How to determine the sequence of variables is based on your prior knowledge (or effect size).
I am not good enough to explain details in words.
If you still have any questions after reading the file, you could further ask me, and I will try my best to orally explain for you.

Third,discriminant analysis
This method is good for predicting the outcome of the group(In this data,Speices)
x=group(Species) Y= four variables

library(MASS)
DA1 <- lda(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=iris)
DA1$means

##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa            5.006       3.428        1.462       0.246
## versicolor        5.936       2.770        4.260       1.326
## virginica         6.588       2.974        5.552       2.026

DA1$scaling

##                     LD1         LD2
## Sepal.Length  0.8293776  0.02410215
## Sepal.Width   1.5344731  2.16452123
## Petal.Length -2.2012117 -0.93192121
## Petal.Width  -2.8104603  2.83918785

I found that it is really hard to explain this method in words, sorry about that.
As metioned above, you could search on the NET for explanation or ask me orally.

Thanks for reading the 虎頭蛇尾 demo.