MDS&CorrAanlysis

Multidimensional Scaling and Correspondence Analysis

#############################################################################
######################                                                      #           
# R example code for multidimensional scaling and correspondence analysis:  #
######################                                                      #
#############################################################################


# The built-in R data set called eurodist contains a "dist" object 
# with road distances between 21 European cities.
# For information, type:

help(eurodist)

## starting httpd help server ... done

# To see the (partial) distance matrix, type:

eurodist

##                 Athens Barcelona Brussels Calais Cherbourg Cologne Copenhagen
## Barcelona         3313                                                       
## Brussels          2963      1318                                             
## Calais            3175      1326      204                                    
## Cherbourg         3339      1294      583    460                             
## Cologne           2762      1498      206    409       785                   
## Copenhagen        3276      2218      966   1136      1545     760           
## Geneva            2610       803      677    747       853    1662       1418
## Gibraltar         4485      1172     2256   2224      2047    2436       3196
## Hamburg           2977      2018      597    714      1115     460        460
## Hook of Holland   3030      1490      172    330       731     269        269
## Lisbon            4532      1305     2084   2052      1827    2290       2971
## Lyons             2753       645      690    739       789     714       1458
## Madrid            3949       636     1558   1550      1347    1764       2498
## Marseilles        2865       521     1011   1059      1101    1035       1778
## Milan             2282      1014      925   1077      1209     911       1537
## Munich            2179      1365      747    977      1160     583       1104
## Paris             3000      1033      285    280       340     465       1176
## Rome               817      1460     1511   1662      1794    1497       2050
## Stockholm         3927      2868     1616   1786      2196    1403        650
## Vienna            1991      1802     1175   1381      1588     937       1455
##                 Geneva Gibraltar Hamburg Hook of Holland Lisbon Lyons Madrid
## Barcelona                                                                   
## Brussels                                                                    
## Calais                                                                      
## Cherbourg                                                                   
## Cologne                                                                     
## Copenhagen                                                                  
## Geneva                                                                      
## Gibraltar         1975                                                      
## Hamburg           1118      2897                                            
## Hook of Holland    895      2428     550                                    
## Lisbon            1936       676    2671            2280                    
## Lyons              158      1817    1159             863   1178             
## Madrid            1439       698    2198            1730    668  1281       
## Marseilles         425      1693    1479            1183   1762   320   1157
## Milan              328      2185    1238            1098   2250   328   1724
## Munich             591      2565     805             851   2507   724   2010
## Paris              513      1971     877             457   1799   471   1273
## Rome               995      2631    1751            1683   2700  1048   2097
## Stockholm         2068      3886     949            1500   3231  2108   3188
## Vienna            1019      2974    1155            1205   2937  1157   2409
##                 Marseilles Milan Munich Paris Rome Stockholm
## Barcelona                                                   
## Brussels                                                    
## Calais                                                      
## Cherbourg                                                   
## Cologne                                                     
## Copenhagen                                                  
## Geneva                                                      
## Gibraltar                                                   
## Hamburg                                                     
## Hook of Holland                                             
## Lisbon                                                      
## Lyons                                                       
## Madrid                                                      
## Marseilles                                                  
## Milan                  618                                  
## Munich                1109   331                            
## Paris                  792   856    821                     
## Rome                  1011   586    946  1476               
## Stockholm             2428  2187   1754  1827 2707          
## Vienna                1363   898    428  1249 1209      2105

###############################################################################
###############################################################################

######################################
# Classical multidimensional scaling #
######################################

# Classical MDS for the European distances using the cmdscale function:
# The default number of dimensions is k = 2:

euro.mds.2 <- cmdscale(eurodist, eig=T)
euro.mds.2

## $points
##                         [,1]        [,2]
## Athens           2290.274680  1798.80293
## Barcelona        -825.382790   546.81148
## Brussels           59.183341  -367.08135
## Calais            -82.845973  -429.91466
## Cherbourg        -352.499435  -290.90843
## Cologne           293.689633  -405.31194
## Copenhagen        681.931545 -1108.64478
## Geneva             -9.423364   240.40600
## Gibraltar       -2048.449113   642.45854
## Hamburg           561.108970  -773.36929
## Hook of Holland   164.921799  -549.36704
## Lisbon          -1935.040811    49.12514
## Lyons            -226.423236   187.08779
## Madrid          -1423.353697   305.87513
## Marseilles       -299.498710   388.80726
## Milan             260.878046   416.67381
## Munich            587.675679    81.18224
## Paris            -156.836257  -211.13911
## Rome              709.413282  1109.36665
## Stockholm         839.445911 -1836.79055
## Vienna            911.230500   205.93020
## 
## $eig
##  [1]  1.953838e+07  1.185656e+07  1.528844e+06  1.118742e+06  7.893472e+05
##  [6]  5.816552e+05  2.623192e+05  1.925976e+05  1.450845e+05  1.079673e+05
## [11]  5.139484e+04 -3.725290e-09 -9.496124e+03 -5.305820e+04 -1.322166e+05
## [16] -2.573360e+05 -3.326719e+05 -5.162523e+05 -9.191491e+05 -1.006504e+06
## [21] -2.251844e+06
## 
## $x
## NULL
## 
## $ac
## [1] 0
## 
## $GOF
## [1] 0.7537543 0.8679134

# The first number in the GOF section is what I called P_k, which is 
# (almost) the P^(1) criterion given near the top of page 96.

# Changing the number of dimensions k:

euro.mds.4 <- cmdscale(eurodist, k=4, eig=T)
euro.mds.4

## $points
##                         [,1]        [,2]       [,3]        [,4]
## Athens           2290.274680  1798.80293   53.79314 -103.826958
## Barcelona        -825.382790   546.81148 -113.85842   84.585831
## Brussels           59.183341  -367.08135  177.55291   38.797514
## Calais            -82.845973  -429.91466  300.19274  106.353695
## Cherbourg        -352.499435  -290.90843  457.35294  111.449150
## Cologne           293.689633  -405.31194  360.09323 -636.202379
## Copenhagen        681.931545 -1108.64478   26.09257  151.693056
## Geneva             -9.423364   240.40600 -344.20659  656.121110
## Gibraltar       -2048.449113   642.45854  167.86631   78.621423
## Hamburg           561.108970  -773.36929   80.91722   48.548472
## Hook of Holland   164.921799  -549.36704  270.82327  116.886334
## Lisbon          -1935.040811    49.12514 -483.02056 -315.241752
## Lyons            -226.423236   187.08779 -358.43234 -257.737009
## Madrid          -1423.353697   305.87513  253.26763    2.478812
## Marseilles       -299.498710   388.80726 -109.17417   12.651217
## Milan             260.878046   416.67381 -171.52428   20.926369
## Munich            587.675679    81.18224  -75.88485   13.080496
## Paris            -156.836257  -211.13911  131.30852   27.089432
## Rome              709.413282  1109.36665 -179.83052 -109.895049
## Stockholm         839.445911 -1836.79055 -541.35188 -108.755016
## Vienna            911.230500   205.93020   98.02313   62.375253
## 
## $eig
##  [1]  1.953838e+07  1.185656e+07  1.528844e+06  1.118742e+06  7.893472e+05
##  [6]  5.816552e+05  2.623192e+05  1.925976e+05  1.450845e+05  1.079673e+05
## [11]  5.139484e+04 -3.725290e-09 -9.496124e+03 -5.305820e+04 -1.322166e+05
## [16] -2.573360e+05 -3.326719e+05 -5.162523e+05 -9.191491e+05 -1.006504e+06
## [21] -2.251844e+06
## 
## $x
## NULL
## 
## $ac
## [1] 0
## 
## $GOF
## [1] 0.8173197 0.9411060

# Using the P_k criterion for a variety of values of k
# to choose the appropriate amount of dimension reduction:

# If we are considering values of k from 1 to, say, 12:

max.k <- 12
P.k <- rep(0,max.k)
SStress <- rep(0,max.k)
for (kk in 1:max.k){
my.mds.kk <- cmdscale(eurodist,k=kk,eig=T)
P.k[kk] <- my.mds.kk$GOF[1]
#SStress[kk] <- ( sum( (eurodist^2 - (dist(my.mds.kk$points))^2)^2 )/sum(eurodist^4) )^0.5
}

## Warning in cmdscale(eurodist, k = kk, eig = T): only 11 of the first 12
## eigenvalues are > 0

cbind(1:max.k,P.k)

##                P.k
##  [1,]  1 0.4690928
##  [2,]  2 0.7537543
##  [3,]  3 0.7904600
##  [4,]  4 0.8173197
##  [5,]  5 0.8362709
##  [6,]  6 0.8502358
##  [7,]  7 0.8565337
##  [8,]  8 0.8611578
##  [9,]  9 0.8646411
## [10,] 10 0.8672332
## [11,] 11 0.8684672
## [12,] 12 0.8684672

# It looks like 2 or 3 dimensions would be reasonable, and 4 gives quite a good fit.

# A 2-D representation of the solution for k=2:

par(pty="s")  # Creates a square plotting region

## The reason for the (-2500, 2500) limits below is because the coordinates in the first two dimensions,
## which can be seen by typing:  euro.mds.2$points
## range from -2048.449113 to 2290.274680 
## Making the axes go from -2500 to 2500 allows all the labels to fit nicely on the plot.
## For other data sets, you'll have to adjust these numbers.
## You can look at the '$points' component of the output and see the smallest and largest values
## and use limits that add a little space beyond this range.

plot(euro.mds.2$points[,1], euro.mds.2$points[,2], type='n',
     xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-2500,2500), ylim=c(-2500,2500) )
text(euro.mds.2$points[,1], euro.mds.2$points[,2], labels=labels(eurodist) )

# Same thing, but using abbreviations:

euro.abb <- abbreviate(labels(eurodist))

plot(euro.mds.2$points[,1], euro.mds.2$points[,2], type='n',
     xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-2500,2500), ylim=c(-2500,2500) )
text(euro.mds.2$points[,1], euro.mds.2$points[,2], labels=euro.abb )

# Maybe rotating across the x-axis would produce a better reflection of reality!

plot(euro.mds.2$points[,1], -euro.mds.2$points[,2], type='n',
     xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-2500,2500), ylim=c(-2500,2500) )
text(euro.mds.2$points[,1], -euro.mds.2$points[,2], labels=euro.abb )

### A 3-D plot of the 3-D solution, including labels of the city names:

library(scatterplot3d)

   s3d <- scatterplot3d(euro.mds.4$points[,1],euro.mds.4$points[,2],euro.mds.4$points[,3],        # x y and z axis
                 color="blue", pch=19,        # filled blue circles
                 type="h",                    # vertical lines to the x-y plane
                 main="3-D Scatterplot of 3-D MDS Solution",
                 xlab="Coordinate 1",
                 ylab="Coordinate 2",
                 zlab="Coordinate 3")
s3d.coords <- s3d$xyz.convert(euro.mds.4$points[,1],euro.mds.4$points[,2],euro.mds.4$points[,3]) # convert 3D coords to 2D projection
text(s3d.coords$x, s3d.coords$y,             # x and y coordinates
    labels=labels(eurodist),               # text to plot
    cex=.7, pos=4)           # shrink text and place to right of points)

### This plot works well.


###############################################################################
###############################################################################


# Cola dissimilarity matrix:
# [Data from Table 5.1 (Subject 1)]


cola.diss <- matrix(c(
0,16,81,56,87,60,84,50,99,16,
16,0,47,32,68,35,94,87,25,92,
81,47,0,71,44,21,98,79,53,90,
56,32,71,0,71,98,57,73,98,83,
87,68,44,71,0,34,99,19,52,79,
60,35,21,98,34,0,99,92,17,44,
84,94,98,57,99,99,0,45,99,24,
50,87,79,73,19,92,45,0,84,18,
99,25,53,98,52,17,99,84,0,98,
16,92,90,83,79,44,24,18,98,0
),ncol=10,nrow=10,byrow=T)


# Using the P_k criterion for a variety of values of k
# to choose the appropriate amount of dimension reduction:

# If we are considering values of k from 1 to, say, 8:

max.k <- 8
P.k <- rep(0,max.k)
SStress <- rep(0,max.k)
for (kk in 1:max.k){
my.mds.kk <- cmdscale(cola.diss,k=kk,eig=T)
P.k[kk] <- my.mds.kk$GOF[1]
#SStress[kk] <- ( sum( (cola.diss^2 - (dist(my.mds.kk$points))^2)^2 )/sum(cola.diss^4) )^0.5
}

## Warning in cmdscale(cola.diss, k = kk, eig = T): only 6 of the first 7
## eigenvalues are > 0

## Warning in cmdscale(cola.diss, k = kk, eig = T): only 6 of the first 8
## eigenvalues are > 0

cbind(1:max.k,P.k)

##              P.k
## [1,] 1 0.3420924
## [2,] 2 0.5150296
## [3,] 3 0.6512628
## [4,] 4 0.7382206
## [5,] 5 0.7811064
## [6,] 6 0.7814888
## [7,] 7 0.7814888
## [8,] 8 0.7814888

# Plotting SStress against k for several values of k:

#plot(1:max.k, SStress, type='b')


# It looks like we need at least 4 dimensions,
# certainly no more than 5.

cola.mds.4 <- cmdscale(cola.diss,k=4,eig=T)
cola.mds.4

## $points
##            [,1]       [,2]       [,3]       [,4]
##  [1,]  21.92429  23.662190 -37.331091  19.351930
##  [2,] -28.98172  39.274204  -8.781346  -4.040602
##  [3,] -36.36491   1.133554  10.186102   3.592112
##  [4,]  19.77363  47.041946  25.571121   3.779614
##  [5,] -22.49050 -24.812275  25.578182  26.493603
##  [6,] -37.11963 -18.724872 -30.423673  -6.507892
##  [7,]  54.59627  -4.059733  16.301788 -34.683726
##  [8,]  34.87852 -24.893987  18.417004  20.061233
##  [9,] -48.89961 -13.275202   9.003597 -24.674885
## [10,]  42.68366 -25.345825 -28.521684  -3.371388
## 
## $eig
##  [1]  1.332804e+04  6.737696e+03  5.307690e+03  3.387906e+03  1.670848e+03
##  [6]  1.489571e+01 -5.456968e-12 -2.135513e+03 -2.894177e+03 -3.483584e+03
## 
## $x
## NULL
## 
## $ac
## [1] 0
## 
## $GOF
## [1] 0.7382206 0.9446336

# A representation of the 2-dimensional solution:


par(pty="s")  # Creates a square plotting region

plot(cola.mds.4$points[,1], cola.mds.4$points[,2], type='n',
     xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-55,55), ylim=c(-55,55) )
text(cola.mds.4$points[,1], cola.mds.4$points[,2])

# Interesting:  Note the positions of colas 1, 8, and 10 on the map
# and examine the pairwise dissimilarities for these three colas...

###############################################################################
###############################################################################


# Reading in the College data set from the 'ISLR' packages:

# install.packages("ISLR")  # if needed

library(ISLR)

## Warning: package 'ISLR' was built under R version 4.4.1

data(College)

LargeColleges <- subset(College, Apps >= 7600)

coll.abbs<- abbreviate(row.names(LargeColleges))

LargeColleges.numeric <- LargeColleges[,-1]
attach(LargeColleges.numeric)



std <- apply(LargeColleges.numeric,2,sd) # finding standard deviations of variables

# dividing each variable by its standard deviation:
LargeColleges.std <- sweep(LargeColleges.numeric,2,std,FUN="/")  

# Calculating Euclidean distances between these 80 scaled observations:

coll.dist <- dist(LargeColleges.std)


# Using the P_k criterion for a variety of values of k
# to choose the appropriate amount of dimension reduction:

# If we are considering values of k from 1 to, say, 6:

max.k <- 6
P.k <- rep(0,max.k)
SStress <- rep(0,max.k)
for (kk in 1:max.k){
my.mds.kk <- cmdscale(coll.dist,k=kk,eig=T)
P.k[kk] <- my.mds.kk$GOF[1]
SStress[kk] <- ( sum( (coll.dist^2 - (dist(my.mds.kk$points))^2)^2 )/sum(coll.dist^4) )^0.5
}
cbind(1:max.k,P.k)

##              P.k
## [1,] 1 0.4149158
## [2,] 2 0.5871127
## [3,] 3 0.6772569
## [4,] 4 0.7493639
## [5,] 5 0.8052328
## [6,] 6 0.8510800

# Plotting SStress against k for several values of k:
# This makes more sense when the MDS is based on Euclidean distances:

plot(1:max.k, SStress, type='b')

# Using 4 or 5 dimensions seems appropriate here.

coll.mds.5 <- cmdscale(coll.dist,k=5,eig=T)
# coll.mds.5
coll.mds.5$points[1:5,]

##                                            [,1]       [,2]        [,3]
## Arizona State University Main campus  3.2044080  0.9462378  1.68927895
## Boston University                     0.1058321  0.6824223 -2.88371613
## Bowling Green State University        2.5495834 -1.1376802 -0.04682925
## Brown University                     -4.8241776  0.6556161  0.69321611
## California Polytechnic-San Luis       1.7544313 -2.3028817  0.34116507
##                                            [,4]       [,5]
## Arizona State University Main campus  1.2855091 -0.8086297
## Boston University                     1.9413748  0.6820729
## Bowling Green State University       -0.8592628 -0.1361180
## Brown University                     -0.6029216 -0.5377545
## California Polytechnic-San Luis      -1.0408904  0.7991447

# A representation of the 2-dimensional solution:

par(pty="s")  # Creates a square plotting region

## The value of 'cex' controls how large the labels are printed on the plot.

plot(coll.mds.5$points[,1], coll.mds.5$points[,2], type='n',
     xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-7,7), ylim=c(-7,7) )
text(coll.mds.5$points[,1], coll.mds.5$points[,2], labels=coll.abbs, cex=0.7)

# Zooming in on the cluster of points in the middle:

plot(coll.mds.5$points[,1], coll.mds.5$points[,2], type='n',
     xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-7,7), ylim=c(-4,4) )
text(coll.mds.5$points[,1], coll.mds.5$points[,2], labels=coll.abbs, cex=0.7)

# Is it better to use the full row names as labels?

# The Zoomed-in plot (doesn't show Rutgers):

plot(coll.mds.5$points[,1], coll.mds.5$points[,2], type='n',
     xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-7,7), ylim=c(-4,4) )
text(coll.mds.5$points[,1], coll.mds.5$points[,2], labels=row.names(LargeColleges), cex=0.5)

## Zooming in even more to see our favorite university:

plot(coll.mds.5$points[,1], coll.mds.5$points[,2], type='n',
     xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(0,4), ylim=c(-2,0) )
text(coll.mds.5$points[,1], coll.mds.5$points[,2], labels=row.names(LargeColleges), cex=0.7)

### A 3-D plot of the 3-D solution, including labels of the college names:

library(scatterplot3d)

   s3d <- scatterplot3d(coll.mds.5$points[,1],coll.mds.5$points[,2],coll.mds.5$points[,3],        # x y and z axis
                 color="blue", pch=19,        # filled blue circles
                 type="h",                    # vertical lines to the x-y plane
                 main="3-D Scatterplot of 3-D MDS Solution",
                 xlab="Coordinate 1",
                 ylab="Coordinate 2",
                 zlab="Coordinate 3")
s3d.coords <- s3d$xyz.convert(coll.mds.5$points[,1],coll.mds.5$points[,2],coll.mds.5$points[,3]) # convert 3D coords to 2D projection
text(s3d.coords$x, s3d.coords$y,             # x and y coordinates
    labels=row.names(LargeColleges),               # text to plot
    cex=.5, pos=4)           # shrink text 50% and place to right of points)

### Honestly there are too many observations for this plot to work well.

###############################################################################
###############################################################################

library(MASS)  # loading the MASS package

# Nonmetric scaling on the cola distances:

cola.iso<-isoMDS(cola.diss, k = 2)

## initial  value 20.673515 
## iter   5 value 15.248310
## iter  10 value 14.248849
## iter  10 value 14.239367
## iter  10 value 14.236568
## final  value 14.236568 
## converged

plot(cola.iso$points, type = "n",xlab="Coordinate 1", ylab="Coordinate 2")
text(cola.iso$points, labels = as.character(1:nrow(cola.diss)))
abline(h=0); abline(v=0)

###############################
# Correspondence analysis     #
###############################

# Several contingency tables from the book:

girls <- matrix(c(
21,21,14,13,8,
8,9,6,8,2,
2,3,4,10,10),byrow=T, ncol=5, nrow=3,
dimnames = list(c('nbf','bfns','bfs'),
                               c('AG1', 'AG2', 'AG3', 'AG4', 'AG5')))

smokemoms <- matrix(c(
50,315,24,4012,
9,40,6,459,
41,147,14,1594,
4,11,1,124), byrow=T, ncol=4, nrow=4,
dimnames = list(c('YNS','YS','ONS','OS'),
                               c('pd','pa','ftd', 'fta')))

hodgkin <- matrix(c(
74,18,12,
68,16,12,
154,54,58,
18,10,44), byrow=T, ncol=3, nrow=4,
dimnames = list(c('LP','NS','MC','LD'),
                               c('positive','partial','none')))

# One that's not in the book --- a snoring / heart disease data set:

snore.heart.data <- matrix(c(24,35,51,1355,603,416), nrow=2, ncol=3, byrow=TRUE, 
dimnames = list(c("Yes", "No"), c("Never", "Occasionally", "~ Every Night")))

as.table(snore.heart.data)

##     Never Occasionally ~ Every Night
## Yes    24           35            51
## No   1355          603           416

#########################################################
## Code to get the matrices of chi-squared distances 
## for both rows and columns 
#########################################################
#################  BEGIN CODE ###########################

# Enter name of data matrix / contingency table:
mymat <- hodgkin
rr <- nrow(mymat); cc <- ncol(mymat)
row.sums <- apply(mymat,1,sum)
col.sums <- apply(mymat,2,sum)
N<-sum(row.sums)

pijrow <- matrix(0,nrow=rr,ncol=cc); pijcol <- matrix(0,nrow=rr,ncol=cc)
distmat.row <- matrix(0,nrow=rr,ncol=rr); distmat.col <- matrix(0,nrow=cc,ncol=cc)

for (i in 1:rr){
pijcol[i,] <- mymat[i,]/row.sums[i]
}
for (j in 1:cc){
pijrow[,j] <- mymat[,j]/col.sums[j]
}

for (i in 1:rr){
for (ii in 1:(i-1)) {
my.hold <- sqrt( sum( (N/col.sums)*(pijcol[i,]-pijcol[ii,])^2 ) )
distmat.row[i,ii] <- my.hold; distmat.row[ii,i] <- my.hold
}
}

for (j in 1:cc){
for (jj in 1:(j-1)) {
my.hold <- sqrt( sum( (N/row.sums)*(pijrow[,j]-pijrow[,jj])^2 )  )
distmat.col[j,jj] <- my.hold; distmat.col[jj,j] <- my.hold
}
}

round(distmat.col,digits=2)

##      [,1] [,2] [,3]
## [1,] 0.00 0.23 0.91
## [2,] 0.23 0.00 0.73
## [3,] 0.91 0.73 0.00

round(distmat.row,digits=2)

##      [,1] [,2] [,3] [,4]
## [1,] 0.00 0.03 0.28 1.19
## [2,] 0.03 0.00 0.27 1.17
## [3,] 0.28 0.27 0.00 0.93
## [4,] 1.19 1.17 0.93 0.00

###################  END CODE ###########################
#########################################################

# A straightforward chi-square test for independence of rows and columns:

chisq.test(hodgkin)

## 
##  Pearson's Chi-squared test
## 
## data:  hodgkin
## X-squared = 75.89, df = 6, p-value = 2.517e-14

# We see that histological classification and response to treatment are clearly associated.

############################################################################

# A quick way to do correspondence analysis using the "corresp" function
# in the MASS library:

library(MASS) # loading the MASS package

corresp(hodgkin,nf=2)  # The two-dimensional solution.

## First canonical correlation(s): 0.37200775 0.05167191 
## 
##  Row scores:
##           [,1]       [,2]
## LP -0.79084450 -0.9084108
## NS -0.73631970 -1.1996887
## MC -0.07824262  1.0040902
## LD  2.41315356 -0.7978217
## 
##  Column scores:
##                [,1]       [,2]
## positive -0.6609412 -0.5258636
## partial  -0.1675926  2.1122757
## none      1.7774573 -0.3323957

biplot(corresp(hodgkin,nf=2)); abline(h=0); abline(v=0)

# It apparently uses a slightly different algorithm than the book's method:
# While the results are close, they are not identical.

# Some conclusions:

# LP and NS have very similar row profiles.
# "No response" occurs especially often with LD.
# "Positive response" is associated with LP and with NS.
# "Partial response" is linked with MC.

# Examining the 1-dimensional solution can be useful in interpretation as well:

corresp(hodgkin,nf=1)

## First canonical correlation(s): 0.3720078 
## 
##  Row scores:
##          LP          NS          MC          LD 
## -0.79084450 -0.73631970 -0.07824262  2.41315356 
## 
##  Column scores:
##   positive    partial       none 
## -0.6609412 -0.1675926  1.7774573

# Note "LD" and "none" both have large positive coordinates.
# Note "LP" (and "NS") and "positive" both have large negative coordinates.
# But "LD" and "positive" have opposite signs.

# To (nearly) replicate the book's examples in section 5.3 and in 5.3.1,
# replace "hodgkin" with "girls" or "smokemoms" in the above code.

# What are the substantive conclusions in the "smoking / motherhood" analysis?

MDS&CorrAanlysis

2024-07-31