PCA and MDS

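Principal Component Analysis (PCA) and classical Multidimensional Scaling (MDS) are closely related: when classical MDS is applied to Euclidean distances, it produces the same coordinates as the PCA scores, up to sign. The main difference is the type of input they take: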

PCA: A dataset where each row is an observation, and each column is a feature.
MDS: A distance matrix, containing the pairwise distances between observations.

Let’s look at an example:

# PCA on the USArrests dataset
# (by default, prcomp centers the columns but does not scale them)
pca_out <- prcomp(USArrests)
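
The component of pca_out used in the comparison further down is pca_out$x, the matrix of PCA scores. As a minimal sketch of what it contains, the scores can be rebuilt by centering the data and projecting it onto the principal axes. The names X_centered, scores and scores_ok are introduced here only for illustration:

```r
pca_out <- prcomp(USArrests)

# Center each column, then project onto the principal axes (the rotation matrix)
X_centered <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)
scores <- X_centered %*% pca_out$rotation

# Compare with the scores stored by prcomp
scores_ok <- isTRUE(all.equal(scores, pca_out$x, check.attributes = FALSE))
scores_ok
```

Since centering does not change the distances between observations, this is also a hint as to why a method based only on distances can reach the same coordinates.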

With MDS, we need to compute the distance matrix first:

# Compute the Euclidean distance between each pair of observations
dmat <- dist(USArrests)
# Keep k = 4 dimensions, one per column of USArrests
mds_out <- cmdscale(dmat, k = 4)

We can check that the two outputs agree up to sign (the sign of each component is arbitrary; here, the MDS coordinates are the negated PCA scores):

all.equal(pca_out$x, -mds_out,
          check.attributes = FALSE)

## [1] TRUE
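
This agreement is not a coincidence. Here is a minimal sketch of why, assuming the standard textbook construction of classical MDS (double-centering the squared distances, then taking an eigendecomposition). The names X, D2, J, B, eig and coords are introduced here for illustration, and this is not necessarily how cmdscale is implemented internally:

```r
# Centered data matrix (prcomp centers the columns by default)
X <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)
n <- nrow(X)

# Squared Euclidean distances between observations
D2 <- as.matrix(dist(USArrests))^2

# Double-centering: B = -1/2 * J D2 J, where J = I - (1/n) * 11'
J <- diag(n) - matrix(1 / n, n, n)
B <- -0.5 * J %*% D2 %*% J

# B is the Gram matrix of the centered data
gram_matches <- isTRUE(all.equal(B, X %*% t(X), check.attributes = FALSE))

# Classical MDS coordinates: leading eigenvectors scaled by sqrt(eigenvalue)
eig <- eigen(B, symmetric = TRUE)
coords <- eig$vectors[, 1:4] %*% diag(sqrt(eig$values[1:4]))

# Compare with the PCA scores, ignoring the arbitrary sign of each column
pca_out <- prcomp(USArrests)
scores_match <- isTRUE(all.equal(abs(coords), abs(pca_out$x),
                                 check.attributes = FALSE))

# The leading eigenvalues of B are (n - 1) times the PCA variances
eig_match <- isTRUE(all.equal(eig$values[1:4], (n - 1) * pca_out$sdev^2))

c(gram_matches, scores_match, eig_match)
```

In other words, the distance matrix carries enough information to rebuild the inner products of the centered data, and PCA and classical MDS then rely on the same eigendecomposition.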

So what is the point of MDS, if it takes an extra step and gives the same output? Sometimes, a distance matrix is all we have to work with:

# eurodist gives the road distances (in km) between 21 cities in Europe
mds_out <- cmdscale(eurodist, k = 2)

plot(mds_out[, 1], mds_out[, 2], type = "n",
     xlab = "MDS1", ylab = "MDS2", asp = 1)
text(mds_out[, 1], mds_out[, 2], 
     rownames(mds_out), cex = 0.6)

We can see from the plot that MDS approximately recovers the geographic arrangement of the European cities, with north and south flipped. The flip is harmless: the orientation of the MDS axes is arbitrary.
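
If we want north at the top, we can negate the second coordinate before plotting. The sketch below also asks cmdscale for its goodness-of-fit measures through the eig = TRUE argument, which gives a rough idea of how much of the distance structure two dimensions capture. The name mds_fit is introduced here for illustration, and since the sign of each axis is arbitrary, the negation may not be needed on every setup:

```r
# With eig = TRUE, cmdscale returns a list instead of a plain matrix
mds_fit <- cmdscale(eurodist, k = 2, eig = TRUE)

# Two goodness-of-fit measures for the 2-dimensional configuration
mds_fit$GOF

# Negate the second coordinate so that north ends up at the top
x <- mds_fit$points[, 1]
y <- -mds_fit$points[, 2]

plot(x, y, type = "n",
     xlab = "MDS1", ylab = "-MDS2", asp = 1)
text(x, y, rownames(mds_fit$points), cex = 0.6)
```

Road distances are not exactly Euclidean, so we should not expect a perfect fit in two dimensions (or in any number of dimensions).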

Max Turgeon

21/09/2021