This is the “iris” dataset in R.
I will use this dataset to demonstrate the step of MANOVA.
Let’s look at the data.
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
dim(iris)
## [1] 150 5
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
There are 3 kinds of Species, four variables.
x=group(Species) y=four variables.
You might need some plots to understand the data.
Because I am really lazy, so I just demo a easy plot.
You should draw more plots to help you explore the data.
library(ggplot2)
library(reshape)
irisshow <- melt(iris,id=c("Species"))
ggplot(irisshow,aes(x=variable,y=value,fill=Species))+
geom_boxplot()
Now we could do MANOVA
m <- manova(cbind(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width)~Species,
data=iris)
summary(m,test="Wilks")
## Df Wilks approx F num Df den Df Pr(>F)
## Species 2 0.023439 199.15 8 288 < 2.2e-16 ***
## Residuals 147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now we know that there are significant differences between these three kinds of Species.
However, we don’t know what make them different.
We have three methods
First but worst one, do several ANOVA
summary.aov(m)
## Response Sepal.Length :
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 63.212 31.606 119.26 < 2.2e-16 ***
## Residuals 147 38.956 0.265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response Sepal.Width :
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 11.345 5.6725 49.16 < 2.2e-16 ***
## Residuals 147 16.962 0.1154
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response Petal.Length :
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 437.10 218.551 1180.2 < 2.2e-16 ***
## Residuals 147 27.22 0.185
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Response Petal.Width :
## Df Sum Sq Mean Sq F value Pr(>F)
## Species 2 80.413 40.207 960.01 < 2.2e-16 ***
## Residuals 147 6.157 0.042
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Second, Step down analysis
s1 <- lm(Sepal.Length~Species,data=iris)
s2 <- lm(Sepal.Width~Species+Sepal.Length,data=iris)
s3 <- lm(Petal.Length~Species+Sepal.Length+Sepal.Width,data=iris)
s4 <- lm(Petal.Width~Species+Sepal.Length+Sepal.Width+Petal.Length,data=iris)
summary(s1)
##
## Call:
## lm(formula = Sepal.Length ~ Species, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6880 -0.3285 -0.0060 0.3120 1.3120
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.0060 0.0728 68.762 < 2e-16 ***
## Speciesversicolor 0.9300 0.1030 9.033 8.77e-16 ***
## Speciesvirginica 1.5820 0.1030 15.366 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5148 on 147 degrees of freedom
## Multiple R-squared: 0.6187, Adjusted R-squared: 0.6135
## F-statistic: 119.3 on 2 and 147 DF, p-value: < 2.2e-16
summary(s2)
##
## Call:
## lm(formula = Sepal.Width ~ Species + Sepal.Length, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.95096 -0.16522 0.00171 0.18416 0.72918
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.67650 0.23536 7.123 4.46e-11 ***
## Speciesversicolor -0.98339 0.07207 -13.644 < 2e-16 ***
## Speciesvirginica -1.00751 0.09331 -10.798 < 2e-16 ***
## Sepal.Length 0.34988 0.04630 7.557 4.19e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.289 on 146 degrees of freedom
## Multiple R-squared: 0.5693, Adjusted R-squared: 0.5604
## F-statistic: 64.32 on 3 and 146 DF, p-value: < 2.2e-16
summary(s3)
##
## Call:
## lm(formula = Petal.Length ~ Species + Sepal.Length + Sepal.Width,
## data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.75196 -0.18755 0.00432 0.16965 0.79580
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.63430 0.26783 -6.102 9.08e-09 ***
## Speciesversicolor 2.17023 0.10657 20.364 < 2e-16 ***
## Speciesvirginica 3.04911 0.12267 24.857 < 2e-16 ***
## Sepal.Length 0.64631 0.05353 12.073 < 2e-16 ***
## Sepal.Width -0.04058 0.08113 -0.500 0.618
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2833 on 145 degrees of freedom
## Multiple R-squared: 0.9749, Adjusted R-squared: 0.9742
## F-statistic: 1410 on 4 and 145 DF, p-value: < 2.2e-16
summary(s4)
##
## Call:
## lm(formula = Petal.Width ~ Species + Sepal.Length + Sepal.Width +
## Petal.Length, data = iris)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.59239 -0.08288 -0.01349 0.08773 0.45239
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.47314 0.17659 -2.679 0.00824 **
## Speciesversicolor 0.64811 0.12314 5.263 5.04e-07 ***
## Speciesvirginica 1.04637 0.16548 6.323 3.03e-09 ***
## Sepal.Length -0.09293 0.04458 -2.084 0.03889 *
## Sepal.Width 0.24220 0.04776 5.072 1.20e-06 ***
## Petal.Length 0.24220 0.04884 4.959 1.97e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1666 on 144 degrees of freedom
## Multiple R-squared: 0.9538, Adjusted R-squared: 0.9522
## F-statistic: 594.9 on 5 and 144 DF, p-value: < 2.2e-16
How to determine the sequence of variables is based on your prior knowledge (or effect size).
I am not good enough to explain details in words.
If you still have any questions after reading the file, you could further ask me, and I will try my best to orally explain for you.
Third,discriminant analysis
This method is good for predicting the outcome of the group(In this data,Speices)
x=group(Species) Y= four variables
library(MASS)
DA1 <- lda(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=iris)
DA1$means
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa 5.006 3.428 1.462 0.246
## versicolor 5.936 2.770 4.260 1.326
## virginica 6.588 2.974 5.552 2.026
DA1$scaling
## LD1 LD2
## Sepal.Length 0.8293776 0.02410215
## Sepal.Width 1.5344731 2.16452123
## Petal.Length -2.2012117 -0.93192121
## Petal.Width -2.8104603 2.83918785
I found that it is really hard to explain this method in words, sorry about that.
As metioned above, you could search on the NET for explanation or ask me orally.
Thanks for reading the 虎頭蛇尾 demo.