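Principal Component Analysis (PCA) and classical Multidimensional Scaling (MDS) are closely related. The main difference is the type of input they take: PCA works on a data matrix (observations in rows, variables in columns), whereas classical MDS works on a matrix of pairwise distances between observations.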
Let’s look at an example:
# PCA on USArrests dataset
pca_out <- prcomp(USArrests)
With MDS, we need to compute the distance matrix first:
# Compute Euclidean distance between each pair of observations
dmat <- dist(USArrests)
# k = 4 keeps all four dimensions, matching the four variables in USArrests
mds_out <- cmdscale(dmat, k = 4)
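We can check that the two outputs are the same up to a sign (the sign of each component is arbitrary, so here we negate the MDS coordinates before comparing):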
all.equal(pca_out$x, -mds_out,
          check.attributes = FALSE)
## [1] TRUE
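To see why the two agree, it can help to carry out classical MDS by hand. The sketch below assumes the textbook recipe (square the distances, double-center them, then take an eigendecomposition); it is not a copy of what `cmdscale` does internally, and the names `D2`, `J`, `B`, `Xc` and `coords` are made up for this illustration. With Euclidean distances, the double-centered matrix is the Gram matrix of the centered data, which is why its eigenvectors reproduce the PCA scores.

```r
# Classical MDS by hand on USArrests (illustrative sketch only)
X <- as.matrix(USArrests)
n <- nrow(X)

D2 <- as.matrix(dist(X))^2           # squared Euclidean distances
J  <- diag(n) - matrix(1 / n, n, n)  # centering matrix
B  <- -0.5 * J %*% D2 %*% J          # double-centered squared distances

# For Euclidean distances, B is the Gram matrix of the centered data
Xc <- scale(X, center = TRUE, scale = FALSE)
all.equal(B, Xc %*% t(Xc), check.attributes = FALSE)

# Coordinates: leading eigenvectors scaled by the square roots of the eigenvalues
k <- 4
eig <- eigen(B, symmetric = TRUE)
coords <- eig$vectors[, 1:k] %*% diag(sqrt(eig$values[1:k]))

# Compare with cmdscale and prcomp, ignoring the arbitrary sign of each column
all.equal(abs(coords), abs(cmdscale(dist(X), k = k)), check.attributes = FALSE)
all.equal(abs(coords), abs(prcomp(X)$x), check.attributes = FALSE)
```

Comparing absolute values is a quick way to ignore sign flips column by column; it would not be appropriate if the columns could differ by more than a sign.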
So what is the point of MDS, if there is an extra step but we get the same output? Sometimes, all we have to work with is a distance matrix:
# eurodist gives the road distances (in km) between 21 cities in Europe
mds_out <- cmdscale(eurodist, k = 2)
plot(mds_out[, 1], mds_out[, 2], type = "n",
     xlab = "MDS1", ylab = "MDS2", asp = 1)
text(mds_out[, 1], mds_out[, 2],
     rownames(mds_out), cex = 0.6)
We can see from the plot that MDS recovers the geographic arrangement of European cities (with North and South flipped).
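Because the orientation of an MDS solution is arbitrary, the flip is easy to undo. The sketch below negates the second coordinate so that north ends up at the top, and also asks `cmdscale` for its eigenvalues with `eig = TRUE` so we can look at the goodness-of-fit values it reports. The variable names `fit`, `x` and `y` are chosen here for illustration.

```r
# Refit with eig = TRUE so cmdscale returns a list (points, eig, GOF, ...)
fit <- cmdscale(eurodist, k = 2, eig = TRUE)

x <- fit$points[, 1]
y <- -fit$points[, 2]  # reflect the second axis so that north is at the top

plot(x, y, type = "n",
     xlab = "MDS1", ylab = "MDS2 (reflected)", asp = 1)
text(x, y, rownames(fit$points), cex = 0.6)

# Two goodness-of-fit values for the 2-dimensional solution
fit$GOF
```

Road distances are not exactly Euclidean, so a two-dimensional configuration can only approximate them; the `GOF` values give a rough sense of how much of the structure the two dimensions capture.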