We analyze Fisher’s iris data set (which has been used to illustrate various classification methods). The data can be obtained directly in R. First, we rescale the variables by dividing with their standard deviations.

data(iris)
variris <- apply(iris[,-5],2,var)
iris.adjusted <- sweep(iris[,-5],2,sqrt(variris),"/")

classical Torgerson-Gower scaling.

Then we perform a classical Torgerson-Gower scaling. Here, we request a 2-dimensinal solution by minimizing the loss function “STRAIN”:

iris.scal <- cmdscale(dist(iris.adjusted),k=2,eig=T)

Let’s plot the results now:

library(MASS)
eqscplot(iris.scal$points,type="n")
text(iris.scal$point,row.names(iris),cex=.8)

eqscplot(iris.scal$points,type="n")
text(iris.scal$point,labels = iris[row.names(iris),5],cex=.8)

Since we use “rescaled Euclidean distance” to represent dissimilarity, by double centering the classical Torgerson-Gower MDS is comparable to PCA by using the correlation matrix (see my class slides). In this case, all the eigenvalues solved from the “doubly centered squared distance matrix” are clearly nonnegative. Let us check out the goodness-of-fit by looking at the proportion of variance explained by the first 2 dimensions (same as how we explain in PCA):

iris.scal$GOF
## [1] 0.9581321 0.9581321

which shows a good fit.

Remark: Note that there are two different ways for computing goodness of fit (see help(cmdscale)) when the eigenvalue is “negative”. This may happen when the dissimilarities do not represent Euclidean distances (e.g. some of them are negative). In this case, the doubly centered squared dissimilarity matrix can not be precisely expressed as XX’. The first output above uses the absolute value of the eigenvalue, while the second output uses the max(eigenvalue, 0) (i.e., use zero when it is negative). However, the two outputs here are the same since all eigenvalues are positive now.

The dimension reduction can be used to examine the “dissimilarity” between variables as well.

Here we use the correlation coefficient as a measure of “proximity” (similarity) between two variables.

Furthermore, the dissimilarity is taken to be the reciprocal of the correlation coefficient.

variable.scal <- cmdscale(1/cor(iris[,-5]),k=2,eig=T)
eqscplot(variable.scal$points,type="n")
text(variable.scal$point,row.names(cor(iris[,-5])),cex=.8)

Isotonic regression

A version of nonmetric scaling due to Kruskal and Shepard (using isotonic regression) is implemented in the isoMDS function in R.

library(MASS)
iris.iso <- isoMDS(dist(iris.adjusted[-102,]))
## initial  value 4.822105 
## iter   5 value 4.183883
## iter   5 value 4.182693
## iter   5 value 4.182693
## final  value 4.182693 
## converged

We are forced to exclude object 102 since the program complains that it is very close to object 143 (almost zero distance). Let’s plot the results.

eqscplot(iris.iso$points,type="n")
text(iris.iso$points,label=row.names(iris[-102,]),cex=.8)

Let’s calculate a measure of fit of the solution, the minimum value of the stress (in percentage), which is the square root of the ratio of the sum of squared differences between the input distances and those of the configuration to the sum of configuration distances squared (i.e. Kruskal’s Stress-1).

iris.iso$stress
## [1] 4.182693

which shows a fairly good fit.

We next take look at the scree plot of the “stress” (in percent): Let us run the following source code (copy and paste in the R prompt, press “enter”) for checking the stress up to dimension k.

================================

scree.plot = function(d, k) {
    stresses=isoMDS(d, k=k)$stress
    for(i in rev(seq(k-1)))  
        stresses=append(stresses,isoMDS(d, k=i)$stress)
    plot(seq(k),rev(stresses), type="b", xaxp=c(1,k, k-1), ylab="Stress", xlab="Number of dimensions")
    }

================================

Check out the scree plot up to 6 dimensions:

scree.plot(dist(iris.adjusted[-102,]), k=6)
## initial  value 0.000000 
## final  value 0.000000 
## converged
## initial  value 0.000000 
## final  value 0.000000 
## converged
## initial  value 0.000000 
## final  value 0.000000 
## converged
## initial  value 0.758598 
## iter   5 value 0.641253
## final  value 0.636005 
## converged
## initial  value 4.822105 
## iter   5 value 4.183883
## iter   5 value 4.182693
## iter   5 value 4.182693
## final  value 4.182693 
## converged
## initial  value 22.498863 
## iter   5 value 18.308776
## iter   5 value 18.304023
## iter   5 value 18.304023
## final  value 18.304023 
## converged

The scree plot shows a clear elbow at dimension = 3. However, it seems that a 2D solution is also adequate. Now we check out the Shepard diagram for a 2D solution:

iris.sh<-Shepard(dist(iris.adjusted[-102,]), iris.iso$points, p=2)
plot(iris.sh, pch=".")
lines(iris.sh$x, iris.sh$yf, type = "S")

The plot shows very little amount of spread around the isotonic step function, which also indicates a good fit of the 2D solution.

nonlinear mapping(sammon function)

Another version of nonmetric scaling is implemented in the sammon function in R, which uses “nonlinear mapping” instead of isotonic regression.

iris.sammon <- sammon(dist(iris.adjusted[-102,]),k=2)
## Initial stress        : 0.00963
## stress after  10 iters: 0.00622, magic = 0.500
## stress after  20 iters: 0.00621, magic = 0.500
eqscplot(iris.sammon$points,type="n")
text(iris.sammon$points,label=row.names(iris[-102,]),cex=.8)

Checking goodness of fit:

iris.sammon$stress
## [1] 0.006206077

which shows almost a perfect fit for a 2D solution.

Let us look at more on the scree plot and Shepard diagram. Run the following source code: ================================

scree.plot = function(d, k) {
    stresses=sammon(d, k=k)$stress
    for(i in rev(seq(k-1)))  
        stresses=append(stresses,sammon(d, k=i)$stress)
    plot(seq(k),rev(stresses), type="b", xaxp=c(1,k, k-1), ylab="Stress", xlab="Number of dimensions")
    }

================================

Check out the scree plot up to 6 dimensions:

scree.plot(dist(iris.adjusted[-102,]), k=6)
## Initial stress        : 0.00000
## stress after  10 iters: 0.00000, magic = 0.092
## Initial stress        : 0.00000
## stress after  10 iters: 0.00000, magic = 0.500
## Initial stress        : 0.00000
## stress after  10 iters: 0.00000, magic = 0.500
## Initial stress        : 0.00035
## stress after  10 iters: 0.00022, magic = 0.500
## stress after  20 iters: 0.00021, magic = 0.500
## Initial stress        : 0.00963
## stress after  10 iters: 0.00622, magic = 0.500
## stress after  20 iters: 0.00621, magic = 0.500
## Initial stress        : 0.12602
## stress after  10 iters: 0.08035, magic = 0.500
## stress after  20 iters: 0.07863, magic = 0.500
## stress after  30 iters: 0.07798, magic = 0.500
## stress after  40 iters: 0.07771, magic = 0.500
## stress after  50 iters: 0.07766, magic = 0.500

The scree plot shows a clear elbow at dimension = 2, which suggests that a 2D solution should be adequate. Now we check out the Shepard diagram for a 2D solution:

iris.sh<-Shepard(dist(iris.adjusted[-102,]), iris.sammon$points, p=2)
plot(iris.sh, pch=".")
lines(iris.sh$x, iris.sh$yf, type = "S")

The plot shows very little amount of spread around the fitted function, which also indicates a good fit of the 2D solution.