Anna Shirokanova
3/11/2019
Facts:
applies the principle of PCA to categorical variables
all variables should be of the same measurement type (nominal)
it is a descriptive technique (no inference)
technique to graphically display two-way or multiway contingency tables (cross-tabs)
uses matrices to calculate distances between rows and columns
requires tabulated data as input, non-negative data with no zero row/column percentage sums, raw scales can be any
can involve ‘supplementary variables’ that are not part of calculations of axes but are plotted in the same graph
library(FactoMineR)
library(factoextra)
library(ade4)
data(housetasks)
library(ca)
library(ggplot2)
library(ggrepel)
library(gplots)
library(graphics)
library(gridExtra)chisqresults <- chisq.test(housetasks)
chisqresults##
## Pearson's Chi-squared test
##
## data: housetasks
## X-squared = 1944.5, df = 36, p-value < 2.2e-16
If p<.05, it is a statistically significant deviation from the independence
dt <- as.table(as.matrix(housetasks))
balloonplot(t(dt), main = "Housetasks", xlab = "", ylab = "",
label = F, show.margins = F)balloonplot(t(dt), main = "Housetasks", xlab = "", ylab = "",
label = F)mosaicplot(dt, shade = TRUE, las=2,
main = "housetasks") Residuals are the difference between observed and expected.
Laundry, Main_meal, Dinner and breakfeast (blue color) are mainly done by the wife in our example.
ca <- ca(housetasks)
names(ca)## [1] "sv" "nd" "rownames" "rowmass" "rowdist"
## [6] "rowinertia" "rowcoord" "rowsup" "colnames" "colmass"
## [11] "coldist" "colinertia" "colcoord" "colsup" "N"
## [16] "call"
summary(ca)##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.542889 48.7 48.7 ************
## 2 0.445003 39.9 88.6 **********
## 3 0.127048 11.4 100.0 ***
## -------- -----
## Total: 1.114940 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Lndr | 101 925 120 | -992 740 183 | -495 185 56 |
## 2 | Mn_m | 88 974 81 | -876 742 124 | -490 232 47 |
## 3 | Dnnr | 62 930 34 | -693 777 55 | -308 154 13 |
## 4 | Brkf | 80 905 37 | -509 505 38 | -453 400 37 |
## 5 | Tdyn | 70 975 22 | -394 440 20 | 434 535 30 |
## 6 | Dshs | 65 764 18 | -189 118 4 | 442 646 28 |
## 7 | Shpp | 69 811 13 | -118 64 2 | 403 748 25 |
## 8 | Offc | 55 119 48 | 227 53 5 | -254 66 8 |
## 9 | Drvn | 80 767 91 | 742 432 81 | -653 335 76 |
## 10 | Fnnc | 65 997 27 | 271 161 9 | 618 837 56 |
## 11 | Insr | 80 885 52 | 647 576 61 | 474 309 40 |
## 12 | Rprs | 95 933 281 | 1529 707 407 | -864 226 159 |
## 13 | Hldy | 92 992 176 | 252 30 11 | 1435 962 425 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Wife | 344 954 270 | -838 802 445 | -365 152 103 |
## 2 | Altr | 146 110 106 | -62 5 1 | -292 105 28 |
## 3 | Hsbn | 218 980 342 | 1161 772 542 | -602 208 178 |
## 4 | Jntl | 292 998 282 | 149 21 12 | 1027 977 691 |
ca2 <- CA(housetasks)summary.CA(ca2)##
## Call:
## CA(X = housetasks)
##
## The chi square of independence between the two variables is equal to 1944.456 (p-value = 0 ).
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3
## Variance 0.543 0.445 0.127
## % of var. 48.692 39.913 11.395
## Cumulative % of var. 48.692 88.605 100.000
##
## Rows (the 10 first)
## Iner*1000 Dim.1 ctr cos2 Dim.2 ctr
## Laundry | 134.160 | -0.992 18.287 0.740 | 0.495 5.564
## Main_meal | 90.692 | -0.876 12.389 0.742 | 0.490 4.736
## Dinner | 38.246 | -0.693 5.471 0.777 | 0.308 1.321
## Breakfeast | 41.124 | -0.509 3.825 0.505 | 0.453 3.699
## Tidying | 24.667 | -0.394 1.998 0.440 | -0.434 2.966
## Dishes | 19.587 | -0.189 0.426 0.118 | -0.442 2.844
## Shopping | 14.970 | -0.118 0.176 0.064 | -0.403 2.515
## Official | 53.300 | 0.227 0.521 0.053 | 0.254 0.796
## Driving | 101.509 | 0.742 8.078 0.432 | 0.653 7.647
## Finances | 29.564 | 0.271 0.875 0.161 | -0.618 5.559
## cos2 Dim.3 ctr cos2
## Laundry 0.185 | -0.317 7.968 0.075 |
## Main_meal 0.232 | -0.164 1.859 0.026 |
## Dinner 0.154 | -0.207 2.097 0.070 |
## Breakfeast 0.400 | 0.220 3.069 0.095 |
## Tidying 0.535 | -0.094 0.489 0.025 |
## Dishes 0.646 | 0.267 3.634 0.236 |
## Shopping 0.748 | 0.203 2.223 0.189 |
## Official 0.066 | 0.923 36.940 0.881 |
## Driving 0.335 | 0.544 18.596 0.233 |
## Finances 0.837 | 0.035 0.062 0.003 |
##
## Columns
## Iner*1000 Dim.1 ctr cos2 Dim.2 ctr
## Wife | 301.019 | -0.838 44.462 0.802 | 0.365 10.312
## Alternating | 117.824 | -0.062 0.104 0.005 | 0.292 2.783
## Husband | 381.373 | 1.161 54.234 0.772 | 0.602 17.787
## Jointly | 314.725 | 0.149 1.200 0.021 | -1.027 69.118
## cos2 Dim.3 ctr cos2
## Wife 0.152 | -0.200 10.822 0.046 |
## Alternating 0.105 | 0.849 82.549 0.890 |
## Husband 0.208 | -0.189 6.133 0.020 |
## Jointly 0.977 | -0.046 0.495 0.002 |
ca3 <- CA(housetasks, ncp = 3, graph = T)Two main functions used for CoA are ca and CA.
They extract 3 dimensions as there are 4 columns (and 4-1 < 13-1).
Notes
Results for row variables: The principal coordinates for the first 3 dimensions (k = 1, k = 2 and k = 3).
mass: the mass (or total frequency) of each point.
qlt is the total quality of representation of points by the 3 included dimensions.
In our example, it is the sum of the squared correlations over the three included dimensions.
inr: the inertia of the point (in per mills of the total inertia). The higher, the more influential this row.
cor or cos2 (Squared correlations) and ctr (contributions) of the points. Note that cor and ctr are expressed in per mills.
standard coordinates of rows(for distances in common space): ca$rowcoord
standard coordinates of columns: ca$colcoord
fviz_screeplot(ca2)#scree plotfviz_screeplot(ca, geom = "line")eigenvalues <- factoextra::get_eigenvalue(ca2)
head(round(eigenvalues, 2))## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.54 48.69 48.69
## Dim.2 0.45 39.91 88.60
## Dim.3 0.13 11.40 100.00
Eigenvalues correspond to the amount of information retained by each axis. If the data were independent, the expected value of the eigenvalue for each axis would be 1/(nrow(housetasks)-1) = 1/12 = 8.33% in terms of rows. Likewise, the average axis should account for 1/(ncol(housetasks)-1) = 1/3 = 33.33% in terms of the 4 columns.
Dimensions 1 and 2 explain approximately 48.7% and 39.9% of the total inertia, respectively. Inertia is ‘deviation from independence’ (remember the chi-square?) The larger inertia, the higher the role of this dimension.
This corresponds to a cumulative total of 88.6% of total inertia retained by the 2 dimensions.
The average axis should account for 1/(ncol(housetasks)-1) = 1/3 = 33.33% in terms of the 4 columns.
fviz_screeplot(ca) +
geom_hline(yintercept = 33.33, linetype = 2, color = "red")fviz_screeplot(ca2, geom = "line") +
geom_hline(yintercept = 33.33, linetype = 2, color = "blue")summary(ca)$rows## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 Lndr 101 925 120 -992 740 183 -495 185 56
## 2 Mn_m 88 974 81 -876 742 124 -490 232 47
## 3 Dnnr 62 930 34 -693 777 55 -308 154 13
## 4 Brkf 80 905 37 -509 505 38 -453 400 37
## 5 Tdyn 70 975 22 -394 440 20 434 535 30
## 6 Dshs 65 764 18 -189 118 4 442 646 28
## 7 Shpp 69 811 13 -118 64 2 403 748 25
## 8 Offc 55 119 48 227 53 5 -254 66 8
## 9 Drvn 80 767 91 742 432 81 -653 335 76
## 10 Fnnc 65 997 27 271 161 9 618 837 56
## 11 Insr 80 885 52 647 576 61 474 309 40
## 12 Rprs 95 933 281 1529 707 407 -864 226 159
## 13 Hldy 92 992 176 252 30 11 1435 962 425
eig <- as.data.frame(get_eigenvalue(ca2))
trace <- sum(eig$eigenvalue)
cor.coef <- sqrt(trace)
cor.coef## [1] 1.055907
As a rule of thumb, 0.2 is the threshold above which the correlation can be considered important (Bendixen 1995, 576; Healey 2013, 289-290).
1.0559074 indicates a strong association between row and column variables.
fviz_ca_biplot(ca2, title = "Add your title here, human", repel = T)Additionally:
the contribution of rows and columns to dimensions can be analyzed
highly contributing rows and columns can be identified
plot(ca, mass = TRUE, contrib = "relative",#relative contributions are indicated by colour intensities
map = "rowgreen", arrows = c(FALSE, TRUE))Points with zero contribution are displayed in white. The higher the contribution of a point, the closer the corresponding colour to the one specified by the col option
plot1 <- fviz_ca_biplot(ca, arrows = c(F,T), title = "Output from CA")
plot2 <- fviz_ca_biplot(ca2, arrows = c(F,T), title = "Output from FactoMineR")
grid.arrange(plot1, plot2, ncol = 2)row <- get_ca_row(ca2)
head(row$contrib)## Dim 1 Dim 2 Dim 3
## Laundry 18.2867003 5.563891 7.968424
## Main_meal 12.3888433 4.735523 1.858689
## Dinner 5.4713982 1.321022 2.096926
## Breakfeast 3.8249284 3.698613 3.069399
## Tidying 1.9983518 2.965644 0.488734
## Dishes 0.4261663 2.844117 3.634294
fviz_contrib(ca, choice = "row", axes = 1) + geom_hline(yintercept=7.69, linetype = 2, color="red")fviz_contrib(ca, choice = "row", axes = 2) + geom_hline(yintercept=7.69, linetype = 2, color="red")The red line: if the row contributions were uniform, the expected value would be 1/nrow(housetasks) = 1/13 = 7.69%. The red dashed line on the graph above indicates the expected average contribution.
Average contribution: The total contribution of a row, on explaining the variations retained by Dim.1 and Dim.2, is calculated as follow : (C1 * Eig1) + (C2 * Eig2). C1 and C2 are the contributions of the row to dimensions 1 and 2 The expected average contribution of a row for Dim.1 and Dim.2 is : (7.69 * Eig1) + (7.69 * Eig2) = (7.690.54) + (7.690.44) = 7.53%
Dimension 1 is mainly defined by the opposition of Repair and Driving (positive pole), and Laundry and Main_meal (negative pole).
fviz_ca_col(ca, col.col="contrib")+
scale_color_gradient2(low="white", mid="blue",
high="red", midpoint=24.5)+theme_minimal()fviz_ca_col(ca, alpha.col="contrib") # Possible values for the argument alpha.col are : "cos2", "contrib", "coord", "x", "y"fviz_cos2(ca, choice = "col", axes = 1:2)fviz_ca_row(ca, alpha.row = "contrib")+
theme_minimal()fviz_ca_biplot(ca, select.row = list(contrib = 5),
select.col = list(contrib = 5)) +
theme_minimal()library(FactoMineR)
data(tea)#load the data
summary(tea)## breakfast tea.time evening lunch
## breakfast :144 Not.tea time:131 evening :103 lunch : 44
## Not.breakfast:156 tea time :169 Not.evening:197 Not.lunch:256
##
##
##
##
##
## dinner always home work
## dinner : 21 always :103 home :291 Not.work:213
## Not.dinner:279 Not.always:197 Not.home: 9 work : 87
##
##
##
##
##
## tearoom friends resto pub
## Not.tearoom:242 friends :196 Not.resto:221 Not.pub:237
## tearoom : 58 Not.friends:104 resto : 79 pub : 63
##
##
##
##
##
## Tea How sugar how
## black : 74 alone:195 No.sugar:155 tea bag :170
## Earl Grey:193 lemon: 33 sugar :145 tea bag+unpackaged: 94
## green : 33 milk : 63 unpackaged : 36
## other: 9
##
##
##
## where price age sex
## chain store :192 p_branded : 95 Min. :15.00 F:178
## chain store+tea shop: 78 p_cheap : 7 1st Qu.:23.00 M:122
## tea shop : 30 p_private label: 21 Median :32.00
## p_unknown : 12 Mean :37.05
## p_upscale : 53 3rd Qu.:48.00
## p_variable :112 Max. :90.00
##
## SPC Sport age_Q frequency
## employee :59 Not.sportsman:121 15-24:92 1/day : 95
## middle :40 sportsman :179 25-34:69 1 to 2/week: 44
## non-worker :64 35-44:40 +2/day :127
## other worker:20 45-59:61 3 to 6/week: 34
## senior :35 +60 :38
## student :70
## workman :12
## escape.exoticism spirituality healthy
## escape-exoticism :142 Not.spirituality:206 healthy :210
## Not.escape-exoticism:158 spirituality : 94 Not.healthy: 90
##
##
##
##
##
## diuretic friendliness iron.absorption
## diuretic :174 friendliness :242 iron absorption : 31
## Not.diuretic:126 Not.friendliness: 58 Not.iron absorption:269
##
##
##
##
##
## feminine sophisticated slimming
## feminine :129 Not.sophisticated: 85 No.slimming:255
## Not.feminine:171 sophisticated :215 slimming : 45
##
##
##
##
##
## exciting relaxing effect.on.health
## exciting :116 No.relaxing:113 effect on health : 66
## No.exciting:184 relaxing :187 No.effect on health:234
##
##
##
##
##
newtea <- tea[,c("Tea", "How", "how", "sugar", "where", "always")]#shrink
head(newtea)#look at the shrinked data## Tea How how sugar where always
## 1 black alone tea bag sugar chain store Not.always
## 2 black milk tea bag No.sugar chain store Not.always
## 3 Earl Grey alone tea bag No.sugar chain store Not.always
## 4 Earl Grey alone tea bag sugar chain store Not.always
## 5 Earl Grey alone tea bag No.sugar chain store always
## 6 Earl Grey alone tea bag No.sugar chain store Not.always
cats <- apply(newtea, 2, function(x) nlevels(as.factor(x)))
cats#number of levels per variable## Tea How how sugar where always
## 3 4 3 2 3 2
mca <- FactoMineR::MCA(newtea, graph = T)#get the resultsmca#list of results## **Results of the Multiple Correspondence Analysis (MCA)**
## The analysis was performed on 300 individuals, described by 6 variables
## *The results are available in the following objects:
##
## name description
## 1 "$eig" "eigenvalues"
## 2 "$var" "results for the variables"
## 3 "$var$coord" "coord. of the categories"
## 4 "$var$cos2" "cos2 for the categories"
## 5 "$var$contrib" "contributions of the categories"
## 6 "$var$v.test" "v-test for the categories"
## 7 "$ind" "results for the individuals"
## 8 "$ind$coord" "coord. for the individuals"
## 9 "$ind$cos2" "cos2 for the individuals"
## 10 "$ind$contrib" "contributions of the individuals"
## 11 "$call" "intermediate results"
## 12 "$call$marge.col" "weights of columns"
## 13 "$call$marge.li" "weights of rows"
mca$eig # we need dimensions with eigenvalues > 1/n=variables, i.e. 1/6~0.17, 4 dimensions## eigenvalue percentage of variance cumulative percentage of variance
## dim 1 0.27976178 15.259733 15.25973
## dim 2 0.25774772 14.058967 29.31870
## dim 3 0.22013794 12.007524 41.32622
## dim 4 0.18792961 10.250706 51.57693
## dim 5 0.16876495 9.205361 60.78229
## dim 6 0.16368666 8.928363 69.71065
## dim 7 0.15288834 8.339364 78.05002
## dim 8 0.13838682 7.548372 85.59839
## dim 9 0.11569167 6.310455 91.90885
## dim 10 0.08612637 4.697802 96.60665
## dim 11 0.06221147 3.393353 100.00000
library(ggplot2)
mca_vars_df <- data.frame(mca$var$coord, Variable = rep(names(cats), cats))# column coordinates data frame
mca_obs_df <- data.frame(mca$ind$coord)#row coordinates df
ggplot(data = mca_vars_df,
aes(x = Dim.1, y = Dim.2, label = rownames(mca_vars_df))) +
geom_hline(yintercept = 0, colour = "red") +
geom_vline(xintercept = 0, colour = "red") +
geom_text(aes(colour = Variable)) +
ggtitle("MCA Plot of Tea+How it is Drunk") +
labs(x = "Dimension 1 15%", y = "Dimension 2 14%")mca$eig[1,2]## [1] 15.25973
mca$eig[2,2]## [1] 14.05897
plots similarities between individual cases or variables in the abstract Cartesian space
Metric MDS deals with interval or ratio level data, Nonmetric MDS deals with ordinal data
Basic Values circle.
library(MASS)
library(igraph)
library(vegan)
library(haven)Example: European cities (distance in km)
data(eurodist)
as.matrix(eurodist)[6:10, 1:5] # travel distance b/w cities in km## Athens Barcelona Brussels Calais Cherbourg
## Cologne 2762 1498 206 409 785
## Copenhagen 3276 2218 966 1136 1545
## Geneva 2610 803 677 747 853
## Gibraltar 4485 1172 2256 2224 2047
## Hamburg 2977 2018 597 714 1115
mds <- cmdscale(eurodist)plot(mds, type="n")+
text(mds[,1], mds[,2], labels(eurodist))## integer(0)
plot(mds, type="n")+
text(mds[,1], mds[,2]*(-1), labels(eurodist))## integer(0)
rm(mds, eurodist)Let’s look at students’ preferences in choosing NISes in 2015.
nis<-read_spss("NIS_MDS2015.sav")
nis<-nis[,c(1,2,10,3,4,5,6,7,8,9)]
head(nis)## # A tibble: 6 x 10
## id group gen city civil cult youth body asbd econ
## <dbl> <dbl> <dbl+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 131 2 0 7 2 1 6 4 5 3
## 2 132 2 0 3 6 4 1 7 2 5
## 3 133 2 0 4 6 3 1 2 7 5
## 4 134 2 1 2 7 6 1 4 5 3
## 5 135 2 0 4 6 2 1 3 7 5
## 6 136 2 0 7 6 1 2 3 4 5
names(nis)<-c("id","year","sex", "city","polit","cult","youth","body","bigdata","econ")
head(nis)## # A tibble: 6 x 10
## id year sex city polit cult youth body bigdata econ
## <dbl> <dbl> <dbl+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 131 2 0 7 2 1 6 4 5 3
## 2 132 2 0 3 6 4 1 7 2 5
## 3 133 2 0 4 6 3 1 2 7 5
## 4 134 2 1 2 7 6 1 4 5 3
## 5 135 2 0 4 6 2 1 3 7 5
## 6 136 2 0 7 6 1 2 3 4 5
nis<-nis[,c(1,2,3,4,6,7,8,9,10, 5)]
nis$year<-as.factor(nis$year)
nis$sex<-as.factor(nis$sex)
nis <- na.omit(nis)nmds2 <- metaMDS(nis[,4:10], k=2, trymax=100)## Run 0 stress 0.2147611
## Run 1 stress 0.2149404
## ... Procrustes: rmse 0.008630338 max resid 0.08177757
## Run 2 stress 0.2147618
## ... Procrustes: rmse 0.0003278392 max resid 0.003414574
## ... Similar to previous best
## Run 3 stress 0.2153116
## Run 4 stress 0.2155001
## Run 5 stress 0.2149033
## ... Procrustes: rmse 0.007933108 max resid 0.1061095
## Run 6 stress 0.215689
## Run 7 stress 0.2147622
## ... Procrustes: rmse 0.0003430299 max resid 0.003418326
## ... Similar to previous best
## Run 8 stress 0.2155981
## Run 9 stress 0.2148746
## ... Procrustes: rmse 0.01210575 max resid 0.1053541
## Run 10 stress 0.214798
## ... Procrustes: rmse 0.002534361 max resid 0.03293606
## Run 11 stress 0.2146994
## ... New best solution
## ... Procrustes: rmse 0.01027872 max resid 0.1058776
## Run 12 stress 0.2147611
## ... Procrustes: rmse 0.01027813 max resid 0.1064005
## Run 13 stress 0.2152897
## Run 14 stress 0.2147615
## ... Procrustes: rmse 0.01031016 max resid 0.1064215
## Run 15 stress 0.2288929
## Run 16 stress 0.2164164
## Run 17 stress 0.2149342
## ... Procrustes: rmse 0.005436101 max resid 0.06899523
## Run 18 stress 0.215011
## ... Procrustes: rmse 0.009216866 max resid 0.1047025
## Run 19 stress 0.214602
## ... New best solution
## ... Procrustes: rmse 0.008844197 max resid 0.1065491
## Run 20 stress 0.2313235
## Run 21 stress 0.2147618
## ... Procrustes: rmse 0.005188564 max resid 0.07040212
## Run 22 stress 0.2147615
## ... Procrustes: rmse 0.005185983 max resid 0.07035811
## Run 23 stress 0.2145961
## ... New best solution
## ... Procrustes: rmse 0.001182936 max resid 0.01098416
## Run 24 stress 0.2145946
## ... New best solution
## ... Procrustes: rmse 0.008657589 max resid 0.1062222
## Run 25 stress 0.2149035
## ... Procrustes: rmse 0.006036924 max resid 0.06919665
## Run 26 stress 0.214761
## ... Procrustes: rmse 0.01003512 max resid 0.1063332
## Run 27 stress 0.2146549
## ... Procrustes: rmse 0.002437282 max resid 0.0327501
## Run 28 stress 0.2157982
## Run 29 stress 0.2157119
## Run 30 stress 0.2146034
## ... Procrustes: rmse 0.008434247 max resid 0.1065123
## Run 31 stress 0.214761
## ... Procrustes: rmse 0.01005101 max resid 0.1063539
## Run 32 stress 0.215122
## Run 33 stress 0.215617
## Run 34 stress 0.2193506
## Run 35 stress 0.2163142
## Run 36 stress 0.2153219
## Run 37 stress 0.2310069
## Run 38 stress 0.2312564
## Run 39 stress 0.2155989
## Run 40 stress 0.2162609
## Run 41 stress 0.2148626
## ... Procrustes: rmse 0.005580967 max resid 0.06363713
## Run 42 stress 0.2156393
## Run 43 stress 0.2147814
## ... Procrustes: rmse 0.009931416 max resid 0.1060335
## Run 44 stress 0.2286831
## Run 45 stress 0.2148718
## ... Procrustes: rmse 0.005828139 max resid 0.06375893
## Run 46 stress 0.2145576
## ... New best solution
## ... Procrustes: rmse 0.007315256 max resid 0.1052255
## Run 47 stress 0.2145929
## ... Procrustes: rmse 0.007317937 max resid 0.1049761
## Run 48 stress 0.231385
## Run 49 stress 0.2147968
## ... Procrustes: rmse 0.005704325 max resid 0.06967537
## Run 50 stress 0.2158008
## Run 51 stress 0.2147617
## ... Procrustes: rmse 0.006590453 max resid 0.07073478
## Run 52 stress 0.2376889
## Run 53 stress 0.2149279
## ... Procrustes: rmse 0.009547062 max resid 0.1046612
## Run 54 stress 0.2294389
## Run 55 stress 0.2156894
## Run 56 stress 0.2149861
## ... Procrustes: rmse 0.005684492 max resid 0.06505048
## Run 57 stress 0.214761
## ... Procrustes: rmse 0.006519188 max resid 0.07056082
## Run 58 stress 0.2148756
## ... Procrustes: rmse 0.009472834 max resid 0.1045488
## Run 59 stress 0.2331814
## Run 60 stress 0.2163192
## Run 61 stress 0.2156442
## Run 62 stress 0.2147615
## ... Procrustes: rmse 0.006573808 max resid 0.07070563
## Run 63 stress 0.2147614
## ... Procrustes: rmse 0.006564516 max resid 0.07067816
## Run 64 stress 0.2145893
## ... Procrustes: rmse 0.007412029 max resid 0.1050797
## Run 65 stress 0.2145574
## ... New best solution
## ... Procrustes: rmse 3.479808e-05 max resid 0.000200598
## ... Similar to previous best
## *** Solution reached
stressplot(nmds2)Stress is the function that is minimized; it is good when stress is <.2 :)
Large scatter around the line suggests that original dissimilarities are not well preserved in the reduced number of dimensions.
nmds3 <- metaMDS(nis[,4:10], k = 3, trymax=100)## Run 0 stress 0.1409921
## Run 1 stress 0.1409961
## ... Procrustes: rmse 0.001493876 max resid 0.009450459
## ... Similar to previous best
## Run 2 stress 0.1444306
## Run 3 stress 0.1463538
## Run 4 stress 0.1409928
## ... Procrustes: rmse 0.0004933147 max resid 0.002854043
## ... Similar to previous best
## Run 5 stress 0.1461007
## Run 6 stress 0.1420583
## Run 7 stress 0.1460868
## Run 8 stress 0.1409906
## ... New best solution
## ... Procrustes: rmse 0.0002942407 max resid 0.00152511
## ... Similar to previous best
## Run 9 stress 0.1410043
## ... Procrustes: rmse 0.0007619995 max resid 0.006876658
## ... Similar to previous best
## Run 10 stress 0.1453617
## Run 11 stress 0.1409921
## ... Procrustes: rmse 0.0004504055 max resid 0.002404194
## ... Similar to previous best
## Run 12 stress 0.1444132
## Run 13 stress 0.1409949
## ... Procrustes: rmse 0.0003383447 max resid 0.002978687
## ... Similar to previous best
## Run 14 stress 0.146113
## Run 15 stress 0.1453702
## Run 16 stress 0.1456707
## Run 17 stress 0.1409912
## ... Procrustes: rmse 0.0003255643 max resid 0.002239785
## ... Similar to previous best
## Run 18 stress 0.1409888
## ... New best solution
## ... Procrustes: rmse 0.0007818997 max resid 0.006430355
## ... Similar to previous best
## Run 19 stress 0.145683
## Run 20 stress 0.1409929
## ... Procrustes: rmse 0.0008266151 max resid 0.006453891
## ... Similar to previous best
## *** Solution reached
stressplot(nmds3)plot(nmds3)ordiplot(nmds3,type="n")
orditorp(nmds3,display="sites",col="red",air=0.01)
orditorp(nmds3,display="species",cex=1.25,air=0.01)
orditorp(nmds3,display="species",col=c(rep("brown",4),rep("blue",3)),
air=0.01,cex=1.25)colors=c(rep("red",4),rep("blue",3))
ordiplot(nmds3,type="n")
for(i in unique(nis$year)) {
ordihull(nmds3$point[grep(i,nis$year),],draw="polygon",
groups=nis$year[nis$year==i],col=colors[grep(i,nis$year)],label=F)
}
orditorp(nmds3,display="sites",col="red",air=0.01)
orditorp(nmds3,display="species",col=c(rep("purple",4),
rep("blue",3)),air=0.01,cex=1.25)correspondence analysis: use when
example: cultural consumption, voting, and social class
multidimensional scaling: use when
example: latent dimensions behind the preference for politicians