Multidimensional Scaling and Correspondence Analysis
#############################################################################
###################### #
# R example code for multidimensional scaling and correspondence analysis: #
###################### #
#############################################################################
# The built-in R data set called eurodist contains a "dist" object
# with road distances between 21 European cities.
# For information, type:
help(eurodist)
## starting httpd help server ... done
# To see the (partial) distance matrix, type:
eurodist
## Athens Barcelona Brussels Calais Cherbourg Cologne Copenhagen
## Barcelona 3313
## Brussels 2963 1318
## Calais 3175 1326 204
## Cherbourg 3339 1294 583 460
## Cologne 2762 1498 206 409 785
## Copenhagen 3276 2218 966 1136 1545 760
## Geneva 2610 803 677 747 853 1662 1418
## Gibraltar 4485 1172 2256 2224 2047 2436 3196
## Hamburg 2977 2018 597 714 1115 460 460
## Hook of Holland 3030 1490 172 330 731 269 269
## Lisbon 4532 1305 2084 2052 1827 2290 2971
## Lyons 2753 645 690 739 789 714 1458
## Madrid 3949 636 1558 1550 1347 1764 2498
## Marseilles 2865 521 1011 1059 1101 1035 1778
## Milan 2282 1014 925 1077 1209 911 1537
## Munich 2179 1365 747 977 1160 583 1104
## Paris 3000 1033 285 280 340 465 1176
## Rome 817 1460 1511 1662 1794 1497 2050
## Stockholm 3927 2868 1616 1786 2196 1403 650
## Vienna 1991 1802 1175 1381 1588 937 1455
## Geneva Gibraltar Hamburg Hook of Holland Lisbon Lyons Madrid
## Barcelona
## Brussels
## Calais
## Cherbourg
## Cologne
## Copenhagen
## Geneva
## Gibraltar 1975
## Hamburg 1118 2897
## Hook of Holland 895 2428 550
## Lisbon 1936 676 2671 2280
## Lyons 158 1817 1159 863 1178
## Madrid 1439 698 2198 1730 668 1281
## Marseilles 425 1693 1479 1183 1762 320 1157
## Milan 328 2185 1238 1098 2250 328 1724
## Munich 591 2565 805 851 2507 724 2010
## Paris 513 1971 877 457 1799 471 1273
## Rome 995 2631 1751 1683 2700 1048 2097
## Stockholm 2068 3886 949 1500 3231 2108 3188
## Vienna 1019 2974 1155 1205 2937 1157 2409
## Marseilles Milan Munich Paris Rome Stockholm
## Barcelona
## Brussels
## Calais
## Cherbourg
## Cologne
## Copenhagen
## Geneva
## Gibraltar
## Hamburg
## Hook of Holland
## Lisbon
## Lyons
## Madrid
## Marseilles
## Milan 618
## Munich 1109 331
## Paris 792 856 821
## Rome 1011 586 946 1476
## Stockholm 2428 2187 1754 1827 2707
## Vienna 1363 898 428 1249 1209 2105
###############################################################################
###############################################################################
######################################
# Classical multidimensional scaling #
######################################
# Classical MDS for the European distances using the cmdscale function:
# The default number of dimensions is k = 2:
euro.mds.2 <- cmdscale(eurodist, eig=T)
euro.mds.2
## $points
## [,1] [,2]
## Athens 2290.274680 1798.80293
## Barcelona -825.382790 546.81148
## Brussels 59.183341 -367.08135
## Calais -82.845973 -429.91466
## Cherbourg -352.499435 -290.90843
## Cologne 293.689633 -405.31194
## Copenhagen 681.931545 -1108.64478
## Geneva -9.423364 240.40600
## Gibraltar -2048.449113 642.45854
## Hamburg 561.108970 -773.36929
## Hook of Holland 164.921799 -549.36704
## Lisbon -1935.040811 49.12514
## Lyons -226.423236 187.08779
## Madrid -1423.353697 305.87513
## Marseilles -299.498710 388.80726
## Milan 260.878046 416.67381
## Munich 587.675679 81.18224
## Paris -156.836257 -211.13911
## Rome 709.413282 1109.36665
## Stockholm 839.445911 -1836.79055
## Vienna 911.230500 205.93020
##
## $eig
## [1] 1.953838e+07 1.185656e+07 1.528844e+06 1.118742e+06 7.893472e+05
## [6] 5.816552e+05 2.623192e+05 1.925976e+05 1.450845e+05 1.079673e+05
## [11] 5.139484e+04 -3.725290e-09 -9.496124e+03 -5.305820e+04 -1.322166e+05
## [16] -2.573360e+05 -3.326719e+05 -5.162523e+05 -9.191491e+05 -1.006504e+06
## [21] -2.251844e+06
##
## $x
## NULL
##
## $ac
## [1] 0
##
## $GOF
## [1] 0.7537543 0.8679134
# The first number in the GOF section is what I called P_k, which is
# (almost) the P^(1) criterion given near the top of page 96.
# Changing the number of dimensions k:
euro.mds.4 <- cmdscale(eurodist, k=4, eig=T)
euro.mds.4
## $points
## [,1] [,2] [,3] [,4]
## Athens 2290.274680 1798.80293 53.79314 -103.826958
## Barcelona -825.382790 546.81148 -113.85842 84.585831
## Brussels 59.183341 -367.08135 177.55291 38.797514
## Calais -82.845973 -429.91466 300.19274 106.353695
## Cherbourg -352.499435 -290.90843 457.35294 111.449150
## Cologne 293.689633 -405.31194 360.09323 -636.202379
## Copenhagen 681.931545 -1108.64478 26.09257 151.693056
## Geneva -9.423364 240.40600 -344.20659 656.121110
## Gibraltar -2048.449113 642.45854 167.86631 78.621423
## Hamburg 561.108970 -773.36929 80.91722 48.548472
## Hook of Holland 164.921799 -549.36704 270.82327 116.886334
## Lisbon -1935.040811 49.12514 -483.02056 -315.241752
## Lyons -226.423236 187.08779 -358.43234 -257.737009
## Madrid -1423.353697 305.87513 253.26763 2.478812
## Marseilles -299.498710 388.80726 -109.17417 12.651217
## Milan 260.878046 416.67381 -171.52428 20.926369
## Munich 587.675679 81.18224 -75.88485 13.080496
## Paris -156.836257 -211.13911 131.30852 27.089432
## Rome 709.413282 1109.36665 -179.83052 -109.895049
## Stockholm 839.445911 -1836.79055 -541.35188 -108.755016
## Vienna 911.230500 205.93020 98.02313 62.375253
##
## $eig
## [1] 1.953838e+07 1.185656e+07 1.528844e+06 1.118742e+06 7.893472e+05
## [6] 5.816552e+05 2.623192e+05 1.925976e+05 1.450845e+05 1.079673e+05
## [11] 5.139484e+04 -3.725290e-09 -9.496124e+03 -5.305820e+04 -1.322166e+05
## [16] -2.573360e+05 -3.326719e+05 -5.162523e+05 -9.191491e+05 -1.006504e+06
## [21] -2.251844e+06
##
## $x
## NULL
##
## $ac
## [1] 0
##
## $GOF
## [1] 0.8173197 0.9411060
# Using the P_k criterion for a variety of values of k
# to choose the appropriate amount of dimension reduction:
# If we are considering values of k from 1 to, say, 12:
max.k <- 12
P.k <- rep(0,max.k)
SStress <- rep(0,max.k)
for (kk in 1:max.k){
my.mds.kk <- cmdscale(eurodist,k=kk,eig=T)
P.k[kk] <- my.mds.kk$GOF[1]
#SStress[kk] <- ( sum( (eurodist^2 - (dist(my.mds.kk$points))^2)^2 )/sum(eurodist^4) )^0.5
}
## Warning in cmdscale(eurodist, k = kk, eig = T): only 11 of the first 12
## eigenvalues are > 0
cbind(1:max.k,P.k)
## P.k
## [1,] 1 0.4690928
## [2,] 2 0.7537543
## [3,] 3 0.7904600
## [4,] 4 0.8173197
## [5,] 5 0.8362709
## [6,] 6 0.8502358
## [7,] 7 0.8565337
## [8,] 8 0.8611578
## [9,] 9 0.8646411
## [10,] 10 0.8672332
## [11,] 11 0.8684672
## [12,] 12 0.8684672
# It looks like 2 or 3 dimensions would be reasonable, and 4 gives quite a good fit.
# A 2-D representation of the solution for k=2:
par(pty="s") # Creates a square plotting region
## The reason for the (-2500, 2500) limits below is because the coordinates in the first two dimensions,
## which can be seen by typing: euro.mds.2$points
## range from -2048.449113 to 2290.274680
## Making the axes go from -2500 to 2500 allows all the labels to fit nicely on the plot.
## For other data sets, you'll have to adjust these numbers.
## You can look at the '$points' component of the output and see the smallest and largest values
## and use limits that add a little space beyond this range.
plot(euro.mds.2$points[,1], euro.mds.2$points[,2], type='n',
xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-2500,2500), ylim=c(-2500,2500) )
text(euro.mds.2$points[,1], euro.mds.2$points[,2], labels=labels(eurodist) )
# Same thing, but using abbreviations:
euro.abb <- abbreviate(labels(eurodist))
plot(euro.mds.2$points[,1], euro.mds.2$points[,2], type='n',
xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-2500,2500), ylim=c(-2500,2500) )
text(euro.mds.2$points[,1], euro.mds.2$points[,2], labels=euro.abb )
# Maybe rotating across the x-axis would produce a better reflection of reality!
plot(euro.mds.2$points[,1], -euro.mds.2$points[,2], type='n',
xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-2500,2500), ylim=c(-2500,2500) )
text(euro.mds.2$points[,1], -euro.mds.2$points[,2], labels=euro.abb )
### A 3-D plot of the 3-D solution, including labels of the city names:
library(scatterplot3d)
s3d <- scatterplot3d(euro.mds.4$points[,1],euro.mds.4$points[,2],euro.mds.4$points[,3], # x y and z axis
color="blue", pch=19, # filled blue circles
type="h", # vertical lines to the x-y plane
main="3-D Scatterplot of 3-D MDS Solution",
xlab="Coordinate 1",
ylab="Coordinate 2",
zlab="Coordinate 3")
s3d.coords <- s3d$xyz.convert(euro.mds.4$points[,1],euro.mds.4$points[,2],euro.mds.4$points[,3]) # convert 3D coords to 2D projection
text(s3d.coords$x, s3d.coords$y, # x and y coordinates
labels=labels(eurodist), # text to plot
cex=.7, pos=4) # shrink text and place to right of points)
### This plot works well.
###############################################################################
###############################################################################
# Cola dissimilarity matrix:
# [Data from Table 5.1 (Subject 1)]
cola.diss <- matrix(c(
0,16,81,56,87,60,84,50,99,16,
16,0,47,32,68,35,94,87,25,92,
81,47,0,71,44,21,98,79,53,90,
56,32,71,0,71,98,57,73,98,83,
87,68,44,71,0,34,99,19,52,79,
60,35,21,98,34,0,99,92,17,44,
84,94,98,57,99,99,0,45,99,24,
50,87,79,73,19,92,45,0,84,18,
99,25,53,98,52,17,99,84,0,98,
16,92,90,83,79,44,24,18,98,0
),ncol=10,nrow=10,byrow=T)
# Using the P_k criterion for a variety of values of k
# to choose the appropriate amount of dimension reduction:
# If we are considering values of k from 1 to, say, 8:
max.k <- 8
P.k <- rep(0,max.k)
SStress <- rep(0,max.k)
for (kk in 1:max.k){
my.mds.kk <- cmdscale(cola.diss,k=kk,eig=T)
P.k[kk] <- my.mds.kk$GOF[1]
#SStress[kk] <- ( sum( (cola.diss^2 - (dist(my.mds.kk$points))^2)^2 )/sum(cola.diss^4) )^0.5
}
## Warning in cmdscale(cola.diss, k = kk, eig = T): only 6 of the first 7
## eigenvalues are > 0
## Warning in cmdscale(cola.diss, k = kk, eig = T): only 6 of the first 8
## eigenvalues are > 0
cbind(1:max.k,P.k)
## P.k
## [1,] 1 0.3420924
## [2,] 2 0.5150296
## [3,] 3 0.6512628
## [4,] 4 0.7382206
## [5,] 5 0.7811064
## [6,] 6 0.7814888
## [7,] 7 0.7814888
## [8,] 8 0.7814888
# Plotting SStress against k for several values of k:
#plot(1:max.k, SStress, type='b')
# It looks like we need at least 4 dimensions,
# certainly no more than 5.
cola.mds.4 <- cmdscale(cola.diss,k=4,eig=T)
cola.mds.4
## $points
## [,1] [,2] [,3] [,4]
## [1,] 21.92429 23.662190 -37.331091 19.351930
## [2,] -28.98172 39.274204 -8.781346 -4.040602
## [3,] -36.36491 1.133554 10.186102 3.592112
## [4,] 19.77363 47.041946 25.571121 3.779614
## [5,] -22.49050 -24.812275 25.578182 26.493603
## [6,] -37.11963 -18.724872 -30.423673 -6.507892
## [7,] 54.59627 -4.059733 16.301788 -34.683726
## [8,] 34.87852 -24.893987 18.417004 20.061233
## [9,] -48.89961 -13.275202 9.003597 -24.674885
## [10,] 42.68366 -25.345825 -28.521684 -3.371388
##
## $eig
## [1] 1.332804e+04 6.737696e+03 5.307690e+03 3.387906e+03 1.670848e+03
## [6] 1.489571e+01 -5.456968e-12 -2.135513e+03 -2.894177e+03 -3.483584e+03
##
## $x
## NULL
##
## $ac
## [1] 0
##
## $GOF
## [1] 0.7382206 0.9446336
# A representation of the 2-dimensional solution:
par(pty="s") # Creates a square plotting region
plot(cola.mds.4$points[,1], cola.mds.4$points[,2], type='n',
xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-55,55), ylim=c(-55,55) )
text(cola.mds.4$points[,1], cola.mds.4$points[,2])
# Interesting: Note the positions of colas 1, 8, and 10 on the map
# and examine the pairwise dissimilarities for these three colas...
###############################################################################
###############################################################################
# Reading in the College data set from the 'ISLR' packages:
# install.packages("ISLR") # if needed
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.4.1
data(College)
LargeColleges <- subset(College, Apps >= 7600)
coll.abbs<- abbreviate(row.names(LargeColleges))
LargeColleges.numeric <- LargeColleges[,-1]
attach(LargeColleges.numeric)
std <- apply(LargeColleges.numeric,2,sd) # finding standard deviations of variables
# dividing each variable by its standard deviation:
LargeColleges.std <- sweep(LargeColleges.numeric,2,std,FUN="/")
# Calculating Euclidean distances between these 80 scaled observations:
coll.dist <- dist(LargeColleges.std)
# Using the P_k criterion for a variety of values of k
# to choose the appropriate amount of dimension reduction:
# If we are considering values of k from 1 to, say, 6:
max.k <- 6
P.k <- rep(0,max.k)
SStress <- rep(0,max.k)
for (kk in 1:max.k){
my.mds.kk <- cmdscale(coll.dist,k=kk,eig=T)
P.k[kk] <- my.mds.kk$GOF[1]
SStress[kk] <- ( sum( (coll.dist^2 - (dist(my.mds.kk$points))^2)^2 )/sum(coll.dist^4) )^0.5
}
cbind(1:max.k,P.k)
## P.k
## [1,] 1 0.4149158
## [2,] 2 0.5871127
## [3,] 3 0.6772569
## [4,] 4 0.7493639
## [5,] 5 0.8052328
## [6,] 6 0.8510800
# Plotting SStress against k for several values of k:
# This makes more sense when the MDS is based on Euclidean distances:
plot(1:max.k, SStress, type='b')
# Using 4 or 5 dimensions seems appropriate here.
coll.mds.5 <- cmdscale(coll.dist,k=5,eig=T)
# coll.mds.5
coll.mds.5$points[1:5,]
## [,1] [,2] [,3]
## Arizona State University Main campus 3.2044080 0.9462378 1.68927895
## Boston University 0.1058321 0.6824223 -2.88371613
## Bowling Green State University 2.5495834 -1.1376802 -0.04682925
## Brown University -4.8241776 0.6556161 0.69321611
## California Polytechnic-San Luis 1.7544313 -2.3028817 0.34116507
## [,4] [,5]
## Arizona State University Main campus 1.2855091 -0.8086297
## Boston University 1.9413748 0.6820729
## Bowling Green State University -0.8592628 -0.1361180
## Brown University -0.6029216 -0.5377545
## California Polytechnic-San Luis -1.0408904 0.7991447
# A representation of the 2-dimensional solution:
par(pty="s") # Creates a square plotting region
## The value of 'cex' controls how large the labels are printed on the plot.
plot(coll.mds.5$points[,1], coll.mds.5$points[,2], type='n',
xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-7,7), ylim=c(-7,7) )
text(coll.mds.5$points[,1], coll.mds.5$points[,2], labels=coll.abbs, cex=0.7)
# Zooming in on the cluster of points in the middle:
plot(coll.mds.5$points[,1], coll.mds.5$points[,2], type='n',
xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-7,7), ylim=c(-4,4) )
text(coll.mds.5$points[,1], coll.mds.5$points[,2], labels=coll.abbs, cex=0.7)
# Is it better to use the full row names as labels?
# The Zoomed-in plot (doesn't show Rutgers):
plot(coll.mds.5$points[,1], coll.mds.5$points[,2], type='n',
xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(-7,7), ylim=c(-4,4) )
text(coll.mds.5$points[,1], coll.mds.5$points[,2], labels=row.names(LargeColleges), cex=0.5)
## Zooming in even more to see our favorite university:
plot(coll.mds.5$points[,1], coll.mds.5$points[,2], type='n',
xlab="Coordinate 1", ylab="Coordinate 2", xlim=c(0,4), ylim=c(-2,0) )
text(coll.mds.5$points[,1], coll.mds.5$points[,2], labels=row.names(LargeColleges), cex=0.7)
### A 3-D plot of the 3-D solution, including labels of the college names:
library(scatterplot3d)
s3d <- scatterplot3d(coll.mds.5$points[,1],coll.mds.5$points[,2],coll.mds.5$points[,3], # x y and z axis
color="blue", pch=19, # filled blue circles
type="h", # vertical lines to the x-y plane
main="3-D Scatterplot of 3-D MDS Solution",
xlab="Coordinate 1",
ylab="Coordinate 2",
zlab="Coordinate 3")
s3d.coords <- s3d$xyz.convert(coll.mds.5$points[,1],coll.mds.5$points[,2],coll.mds.5$points[,3]) # convert 3D coords to 2D projection
text(s3d.coords$x, s3d.coords$y, # x and y coordinates
labels=row.names(LargeColleges), # text to plot
cex=.5, pos=4) # shrink text 50% and place to right of points)
### Honestly there are too many observations for this plot to work well.
###############################################################################
###############################################################################
library(MASS) # loading the MASS package
# Nonmetric scaling on the cola distances:
cola.iso<-isoMDS(cola.diss, k = 2)
## initial value 20.673515
## iter 5 value 15.248310
## iter 10 value 14.248849
## iter 10 value 14.239367
## iter 10 value 14.236568
## final value 14.236568
## converged
plot(cola.iso$points, type = "n",xlab="Coordinate 1", ylab="Coordinate 2")
text(cola.iso$points, labels = as.character(1:nrow(cola.diss)))
abline(h=0); abline(v=0)
###############################
# Correspondence analysis #
###############################
# Several contingency tables from the book:
girls <- matrix(c(
21,21,14,13,8,
8,9,6,8,2,
2,3,4,10,10),byrow=T, ncol=5, nrow=3,
dimnames = list(c('nbf','bfns','bfs'),
c('AG1', 'AG2', 'AG3', 'AG4', 'AG5')))
smokemoms <- matrix(c(
50,315,24,4012,
9,40,6,459,
41,147,14,1594,
4,11,1,124), byrow=T, ncol=4, nrow=4,
dimnames = list(c('YNS','YS','ONS','OS'),
c('pd','pa','ftd', 'fta')))
hodgkin <- matrix(c(
74,18,12,
68,16,12,
154,54,58,
18,10,44), byrow=T, ncol=3, nrow=4,
dimnames = list(c('LP','NS','MC','LD'),
c('positive','partial','none')))
# One that's not in the book --- a snoring / heart disease data set:
snore.heart.data <- matrix(c(24,35,51,1355,603,416), nrow=2, ncol=3, byrow=TRUE,
dimnames = list(c("Yes", "No"), c("Never", "Occasionally", "~ Every Night")))
as.table(snore.heart.data)
## Never Occasionally ~ Every Night
## Yes 24 35 51
## No 1355 603 416
#########################################################
## Code to get the matrices of chi-squared distances
## for both rows and columns
#########################################################
################# BEGIN CODE ###########################
# Enter name of data matrix / contingency table:
mymat <- hodgkin
rr <- nrow(mymat); cc <- ncol(mymat)
row.sums <- apply(mymat,1,sum)
col.sums <- apply(mymat,2,sum)
N<-sum(row.sums)
pijrow <- matrix(0,nrow=rr,ncol=cc); pijcol <- matrix(0,nrow=rr,ncol=cc)
distmat.row <- matrix(0,nrow=rr,ncol=rr); distmat.col <- matrix(0,nrow=cc,ncol=cc)
for (i in 1:rr){
pijcol[i,] <- mymat[i,]/row.sums[i]
}
for (j in 1:cc){
pijrow[,j] <- mymat[,j]/col.sums[j]
}
for (i in 1:rr){
for (ii in 1:(i-1)) {
my.hold <- sqrt( sum( (N/col.sums)*(pijcol[i,]-pijcol[ii,])^2 ) )
distmat.row[i,ii] <- my.hold; distmat.row[ii,i] <- my.hold
}
}
for (j in 1:cc){
for (jj in 1:(j-1)) {
my.hold <- sqrt( sum( (N/row.sums)*(pijrow[,j]-pijrow[,jj])^2 ) )
distmat.col[j,jj] <- my.hold; distmat.col[jj,j] <- my.hold
}
}
round(distmat.col,digits=2)
## [,1] [,2] [,3]
## [1,] 0.00 0.23 0.91
## [2,] 0.23 0.00 0.73
## [3,] 0.91 0.73 0.00
round(distmat.row,digits=2)
## [,1] [,2] [,3] [,4]
## [1,] 0.00 0.03 0.28 1.19
## [2,] 0.03 0.00 0.27 1.17
## [3,] 0.28 0.27 0.00 0.93
## [4,] 1.19 1.17 0.93 0.00
################### END CODE ###########################
#########################################################
# A straightforward chi-square test for independence of rows and columns:
chisq.test(hodgkin)
##
## Pearson's Chi-squared test
##
## data: hodgkin
## X-squared = 75.89, df = 6, p-value = 2.517e-14
# We see that histological classification and response to treatment are clearly associated.
############################################################################
# A quick way to do correspondence analysis using the "corresp" function
# in the MASS library:
library(MASS) # loading the MASS package
corresp(hodgkin,nf=2) # The two-dimensional solution.
## First canonical correlation(s): 0.37200775 0.05167191
##
## Row scores:
## [,1] [,2]
## LP -0.79084450 -0.9084108
## NS -0.73631970 -1.1996887
## MC -0.07824262 1.0040902
## LD 2.41315356 -0.7978217
##
## Column scores:
## [,1] [,2]
## positive -0.6609412 -0.5258636
## partial -0.1675926 2.1122757
## none 1.7774573 -0.3323957
biplot(corresp(hodgkin,nf=2)); abline(h=0); abline(v=0)
# It apparently uses a slightly different algorithm than the book's method:
# While the results are close, they are not identical.
# Some conclusions:
# LP and NS have very similar row profiles.
# "No response" occurs especially often with LD.
# "Positive response" is associated with LP and with NS.
# "Partial response" is linked with MC.
# Examining the 1-dimensional solution can be useful in interpretation as well:
corresp(hodgkin,nf=1)
## First canonical correlation(s): 0.3720078
##
## Row scores:
## LP NS MC LD
## -0.79084450 -0.73631970 -0.07824262 2.41315356
##
## Column scores:
## positive partial none
## -0.6609412 -0.1675926 1.7774573
# Note "LD" and "none" both have large positive coordinates.
# Note "LP" (and "NS") and "positive" both have large negative coordinates.
# But "LD" and "positive" have opposite signs.
# To (nearly) replicate the book's examples in section 5.3 and in 5.3.1,
# replace "hodgkin" with "girls" or "smokemoms" in the above code.
# What are the substantive conclusions in the "smoking / motherhood" analysis?