Note: This is a working paper which will be expanded/updated frequently. The directory gifi.stat.ucla.edu/full has a pdf copy of this article, the complete Rmd file with all code chunks, the bib file, the LaTeX style file, and the R source code.
over the n×p configurations X (Kruskal (1964a), Kruskal (1964b)). Symmetric matrix W={wij} has known non-negative weights, symmetric matrix Δ={δij} has non-negative dissimilarities, and the symmetric matrix-valued function D(X)={dij(X)} has Euclidean distances between the points, i.e. the rows of X.
In the usual applications of MDS the user chooses dimensionality p to be much smaller than n, because the name of the game is dimensionality reduction. In this paper we give a detailed analysis of full dimensional scaling or FDS, which is MDS with p=n.
Now expand stress. If we assume without loss of generality that ∑∑1≤i<j≤nwijδ2ij=1, and we define
ρ(X):=∑∑1≤i<j≤nwijδijdij(X),η(X):=∑∑1≤i<j≤nwijd2ij. Now σ(X)=1−ρ(X)+12η(X). To develop some convenient matrix notation we define V:=∑∑1≤i<j≤nwijAij,B(X):=∑∑1≤i<j≤n{wijδijdij(X)Aij if dij(X)>0,0 if dij(X)=0, so that ρ(X)=tr X′B(X)X,η(X)=tr X′VX.Note that V has off-diagonal elements equal to −wij and B(X) has off-diagonal elements bij(X)=−wijδijdij(X) if dij(X)>0 and zero otherwise. Diagonal elements are filled to make rows and columns sum to zero. Both V and B(X) are doubly-centered and positive semi-definite. If we assume without loss of generality that the MDS problem is irredicible (De Leeuw (1977)) then rank(V)=n−1 and the only vectors in the null space are multiples of e:=(1,1,⋯,1). If all weights are eual to one then V=nI−ee′.
De Leeuw (1984) proves a key necessary condition for a local minimum in MDS. Also see De Leeuw, Groenen, and Mair (2016) for a more recent proof. We also review some additional results on maxima and minima of stress.
Theorem 1: [De Leeuw 84, De Leeuw 93, De Leeuw and Groenen 97]
Let’s look at a small example with four points, all dissimilarities equal, and all weights equal to one. There is a local minimum with four points in the corners of a square, with stress equal to 0.0285955. And there is another local minimum with three points forming an equilateral triangle, and the fourth point in the center. This has stress 0.0669873. We can compute stress for all points of the form αX+βY, where X and Y are the two local minima. Figure 1 has a contour plot of σ(α,β), showing the local minima at (1,0) and (0,1), and the local maximum at (0,0).
and d2ij(C):=tr AijC.
We call the space of all positive semi-definite n×n matrices cross product space. The problem of minimizing σ over n×p-dimensional configuration space is equivalent to the problem of minimizing σ over the set of matrices C in n×n-dimensional cross product space that have rank less than or equal to p. The corresponding solutions are related by the simple relationship C=XX′.
Theorem 2: [De Leeuw 93] Stress is convex on cross product space.
Proof: First, η is linear in C. Second, ρ(C)=∑∑1≤i<j≤nwijδij√tr AijC. This is the weighted sum of square roots of non-negative functions that are linear in C, and it is consequently concave. Thus σ is convex. QED
Unfortunately the subset of cross product space of all matrices with rank less than or equal to p is far from simple (see Datorro (2015)), so computational approaches to MDS prefer to work in configuration space.
Cross product space, the set of all positive semi-definite matrices, is a closed convex cone K in the linear space of all n×n symmetric matrices. This has an interesting consequence.
Theorem 3: [De Leeuw 93] Full-dimensional scaling, i.e. minimizing σ over K, is a convex programming problem. Thus in FDS all local minima are global. If wijδij>0 for all i,j then the minimum is unique.
This result has been around since about 1985. De Leeuw (1993) gives a proof, but the report it appeared in remained unpublished. A published proof is in De Leeuw and Groenen (1997). Another treatment of FDS, with a somewhat different emphasis, is in De Leeuw (2014).
Now, by a familiar theorem (Theorem 31.4 in Rockafellar (1970)), a matrix C minimizes σ over K if and only if C∈K,V−B(C)∈K,tr C(V−B(C))=0.We give a computational proof of this result for FDS that actually yields a bit more.
Theorem 4: [Expansion] For Δ∈K we have σ(C+ϵΔ)=σ(C)−2ϵ12∑tr AiC=0wiδi√tr AiΔ+ϵ tr (V−B(C))Δ+o(ϵ).Proof: Simple expansion. QED
Theorem 5: [Convex]: Suppose C is a solution to the problem of minimizing σ over K. Then
Proof: The ϵ12 term in (14) needs to vanish at a local minimum. This proves the first part. It follows that at a local minimum
σ(C+ϵΔ)=σ(C)+ϵ tr (V−B(C))Δ+o(ϵ).If V−B(C) is not positive semi-definite, then there is a Δ∈K such that tr (V−B(C))Δ<0. Thus C cannot be the minimum, which proves the second part. If we choose Δ=C we find
σ((1+ϵ)C)=σ(C)+ϵ tr (V−B(C))C+o(ϵ).and choosing ϵ small and negative shows we must have tr (V−B(C))C=0 for C to be a minimum. This proves the third part. Finally, if σ has a minimum at C, and C is positive definite, then from parts 2 and 3 we have V=B(C). Comparing off-diagonal elements shows Δ=D(C), and thus σ(C)=0. QED
If C is the solution of the FDS problem, then rank(C) defines the Gower rank of the dissimilarities. The number of positive eigenvalues of the negative of the doubly-centered matrix of squared dissimilarities, the matrix factored in classical MDS, defines the Torgerson rank of the dissimilarities. The Gower conjecture is that the Gower rank is less than or equal to the Torgerson rank. No proof and no counter examples have been found.
We compute the FDS solution using the smacof algorithm X(k+1)=V+B(X(k))in the space of all n×n configurations, using the identity matrix as a default starting point. Since we work in configuration space, not in crossproduct space, this does not guarantee convergence to the unique FDS solution, but after convergence we can easily check the necessary and sufficient conditions of theorem 5.
As a small example, consider four points with all dissimilarities equal to one, except δ14 which is equal to three. Clearly the triangle inequality is violated, and thus there certainly is no perfect fit mapping into Euclidean space.
The FDS solution turns out to have rank two, thus the Gower rank is two. The singular values of the FDS solution are
## [1] 0.4508464709 0.2125310645 0.0000001303
Gower rank two also follows from the eigenvalues of the matrix B(C), which are
## [1] 1.0000000000 1.0000000000 0.9205543464
and the stationary equations are B(γ)γ=Uγ. We have already seen an example of σ on what can be called combination space in figure 1, where X and Y are two local minima. Another interesting example uses a configuration X and its gradient Y=Dσ(X). In that case finding the minimum of σ over combination space means finding the optimal step size in the steepest descent gradient method.
For a configuration with three points equally spaced on a horizontal line and the fourth point above the middle of the three
## [,1] [,2]
## [1,] -0.1006209 -0.1006209
## [2,] 0.0000000 -0.1006209
## [3,] 0.1006209 -0.1006209
## [4,] 0.0000000 0.3018627
we have gradient
## [,1] [,2]
## 1 0.3250910 -0.08772788
## 2 0.3638044 -0.07804082
## 3 -0.6888953 -0.08772788
## 4 0.0000000 0.25349657
The contour plot for the linear combinations is
Note that the plot shows various ridges, where stress is not differentiable because some of the distances in the linear combination are zero.
in this case, where we are interested in step size, we should look at linear combinations with α=1 and β<0, i.e. of the form X−|β|Dσ(X). Figure Figure 4 shows the initial descent and an optimal step size around .3.The Ekman (1954) color data give similarities between 14 colors.
## 434 445 465 472 490 504 537 555 584 600 610 628 651
## 445 0.86
## 465 0.42 0.50
## 472 0.42 0.44 0.81
## 490 0.18 0.22 0.47 0.54
## 504 0.06 0.09 0.17 0.25 0.61
## 537 0.07 0.07 0.10 0.10 0.31 0.62
## 555 0.04 0.07 0.08 0.09 0.26 0.45 0.73
## 584 0.02 0.02 0.02 0.02 0.07 0.14 0.22 0.33
## 600 0.07 0.04 0.01 0.01 0.02 0.08 0.14 0.19 0.58
## 610 0.09 0.07 0.02 0.00 0.02 0.02 0.05 0.04 0.37 0.74
## 628 0.12 0.11 0.01 0.01 0.01 0.02 0.02 0.03 0.27 0.50 0.76
## 651 0.13 0.13 0.05 0.02 0.02 0.02 0.02 0.02 0.20 0.41 0.62 0.85
## 674 0.16 0.14 0.03 0.04 0.00 0.01 0.00 0.02 0.23 0.28 0.55 0.68 0.76
We use three different transformations of the similarities to dissimilarities. The first is 1−x, the second (1−x)3 and the third 3√1−x. We need the following iterations to find the FDS solution (up to a change in loss of 1e-15).
## power = 1.00 itel = 6936 stress = 0.0000875293
## power = 3.00 itel = 171 stress = 0.0110248119
## power = 0.33 itel = 423 stress = 0.0000000000
For the same three solutions we compute singular values of the thirteen-dimensional FDS solution.
## [1] 0.1797609824 0.1454675297 0.0843865491 0.0777136109 0.0486123551
## [6] 0.0393576522 0.0236290817 0.0162344515 0.0072756171 0.0000031164
## [11] 0.0000000009 0.0000000000 0.0000000000
##
## [1] 0.2159661347 0.1549184093 0.0000000727 0.0000000041 0.0000000000
## [6] 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
## [11] 0.0000000000 0.0000000000 0.0000000000
##
## [1] 0.1336126813 0.1139019875 0.0880453752 0.0851609618 0.0710424935
## [6] 0.0664988952 0.0561005006 0.0535112029 0.0492295395 0.0479964575
## [11] 0.0468628701 0.0410193579 0.0388896490
Thus the Gower ranks of the transformed dissimilarities are, repectively, nine (or ten), two, and thirteen. Note that for the second set of dissimilarities, with Gower rank two, the first two principal components of the thirteen-dimensional solution are the global minimizer in two dimensions. To illustrate the Gower rank in yet another way we give the thirteen non-zero eigenvalues of V+B(X), so that the Gower rank is the number of eigenvalues equal to one. All three solutions satisfy the necessary and sufficient conditions for a global FDS solution.
## [1] 1.0000000432 1.0000000222 1.0000000012 1.0000000005 1.0000000002
## [6] 1.0000000001 1.0000000000 1.0000000000 0.9999993553 0.9989115116
## [11] 0.9976821885 0.9942484083 0.9825147154
##
## [1] 1.0000000000 1.0000000000 0.9234970864 0.9079012130 0.8629365849
## [6] 0.8526920031 0.8298036209 0.8145561677 0.7932385763 0.7916517225
## [11] 0.7864426781 0.7476794757 0.7282682474
##
## [1] 1.0000000820 1.0000000241 1.0000000047 1.0000000009 1.0000000004
## [6] 1.0000000003 1.0000000001 1.0000000001 1.0000000001 1.0000000000
## [11] 0.9999999999 0.9999999689 0.9999999005
We also plot the first two principal components of the thirteen-dimensional FDS solution. Not surprisingly, they look most circular and regular for the solution with power three, because this actually is the global minimum over two-dimensional solutions. The other configurations still have quite a lot of variation in the remaining dimensions.
library(MASS)
torgerson <- function(delta, p = 2) {
doubleCenter <- function(x) {
n <- dim(x)[1]
m <- dim(x)[2]
s <- sum(x) / (n * m)
xr <- rowSums(x) / m
xc <- colSums(x) / n
return((x - outer(xr, xc, "+")) + s)
}
z <-
eigen(-doubleCenter((as.matrix (delta) ^ 2) / 2), symmetric = TRUE)
v <- pmax(z$values, 0)
return(z$vectors[, 1:p] %*% diag(sqrt(v[1:p])))
}
u <- function (i, n) {
return (ifelse (i == 1:n, 1, 0))
}
e <- function (i, j, n) {
d <- u (i, n) - u (j, n)
return (outer (d, d))
}
directSum <- function (x) {
m <- length (x)
nr <- sum (sapply (x, nrow))
nc <- sum (sapply (x, ncol))
z <- matrix (0, nr, nc)
kr <- 0
kc <- 0
for (i in 1:m) {
ir <- nrow (x[[i]])
ic <- ncol (x[[i]])
z[kr + (1:ir), kc + (1:ic)] <- x[[i]]
kr <- kr + ir
kc <- kc + ic
}
return (z)
}
repList <- function(x, n) {
z <- list()
for (i in 1:n)
z <- c(z, list(x))
return(z)
}
makeA <- function (n) {
m <- n * (n - 1) / 2
a <- list()
for (j in 1:(n - 1))
for (i in (j + 1):n) {
d <- u (i, n) - u (j, n)
e <- outer (d, d)
a <- c(a, list (e))
}
return (a)
}
makeD <- function (a, x) {
return (sapply (a, function (z)
sqrt (sum (x * (z %*% x)))))
}
makeB <- function (w, delta, d, a) {
n <- length (a)
m <- nrow (a[[1]])
b <- matrix (0, m , m)
for (i in 1:n)
b <- b + w[i] * (delta[i] / d[i]) * a[[i]]
return (b)
}
makeV <- function (w, a) {
n <- length (a)
m <- nrow (a[[1]])
v <- matrix (0, m, m)
for (i in 1:n)
v <- v + w[i] * a[[i]]
return (v)
}
inBetween <- function (alpha, beta, x, y, w, delta, a) {
z <- alpha * x + beta * y
d <- makeD (a, z)
return (sum (w * (delta - d) ^ 2))
}
biBase <- function (x, y, a) {
biBi <- function (x, y, v) {
a11 <- sum (x * (v %*% x))
a12 <- sum (x * (v %*% y))
a22 <- sum (y * (v %*% y))
return (matrix (c(a11, a12, a12, a22), 2, 2))
}
return (lapply (a, function (u) biBi (x, y, u)))
}
fullMDS <-
function (delta,
w = rep (1, length (delta)),
xini,
a,
itmax = 100,
eps = 1e-6,
verbose = TRUE) {
m <- length (a)
v <- makeV (w, a)
vv <- ginv (v)
xold <- xini
dold <- makeD (a, xini)
sold <- sum ((delta - dold) ^ 2)
bold <- makeB (w, delta, dold, a)
itel <- 1
repeat {
xnew <- vv %*% bold %*% xold
dnew <- makeD (a, xnew)
bnew <- makeB (w, delta, dnew, a)
snew <- sum ((delta - dnew) ^ 2)
if (verbose) {
cat (
formatC (itel, width = 4, format = "d"),
formatC (
sold,
digits = 10,
width = 13,
format = "f"
),
formatC (
snew,
digits = 10,
width = 13,
format = "f"
),
"\n"
)
}
if ((itel == itmax) || (abs(sold - snew) < eps))
break
itel <- itel + 1
xold <- xnew
dold <- dnew
sold <- snew
bold <- bnew
}
return (list (x = xnew, d = dnew, delta = delta, s = snew, b = bnew, v = v, itel = itel))
}
001 01/21/16 Made a start
002 01/23/16 Reorganization and further material
003 01/24/16 First upload. Figures, tables, examples.
004 01/25/16 Corrected small errors
005 01/25/16 Reference to metric transform theory
Datorro, J. 2015. Convex Optimization and Distance Geometry. Second Edition. Palo Alto, CA: Meebo Publishing.
De Leeuw, J. 1977. “Applications of Convex Analysis to Multidimensional Scaling.” In Recent Developments in Statistics, edited by J.R. Barra, F. Brodeau, G. Romier, and B. Van Cutsem, 133–45. Amsterdam, The Netherlands: North Holland Publishing Company. http://www.stat.ucla.edu/~deleeuw/janspubs/1977/chapters/deleeuw_C_77.pdf.
———. 1984. “Differentiability of Kruskal’s Stress at a Local Minimum.” Psychometrika 49: 111–13. http://www.stat.ucla.edu/~deleeuw/janspubs/1984/articles/deleeuw_A_84f.pdf.
———. 1993. “Fitting Distances by Least Squares.” Preprint Series 130. Los Angeles, CA: UCLA Department of Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/1993/reports/deleeuw_R_93c.pdf.
———. 2014. “Bounding, and Sometimes Finding, the Global Minimum in Multidimensional Scaling.” UCLA Department of Statistics. doi:10.13140/RG.2.1.5068.4248.
De Leeuw, J., and P.J.F. Groenen. 1997. “Inverse Multidimensional Scaling.” Journal of Classification 14 (3–21). http://www.stat.ucla.edu/~deleeuw/janspubs/1997/articles/deleeuw_groenen_A_97.pdf.
De Leeuw, J., P. Groenen, and P. Mair. 2016. “Differentiability of rStress at a Local Minimum.” doi:10.13140/RG.2.1.1249.8968.
Ekman, G. 1954. “Dimensions of Color Vision.” Journal of Psychology 38: 467–74.
Kruskal, J.B. 1964a. “Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis.” Psychometrika 29: 1–27.
———. 1964b. “Nonmetric Multidimensional Scaling: a Numerical Method.” Psychometrika 29: 115–29.
Maehara, H. 1986. “Metric Transforms of Finite Spaces and Connected Graphs.” Discrete Mathematics 61: 235–46.
Rockafellar, R.T. 1970. Convex Analysis. Princeton University Press.