| Problem with K-Means Clustering |
One of the most popular clustering algorithms is k-means, which iteratively partitions n observations into k clusters.
The k-means problem is posed as an optimization problem: find the partition that minimizes the total squared distance between observations and their cluster centers.
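Formally (the standard within-cluster sum-of-squares objective, stated here for reference), with clusters C_1, ..., C_k and cluster means mu_1, ..., mu_k, k-means seeks

\min_{C_1,\dots,C_k} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,
\qquad \mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x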
The algorithm is simple, and, in many cases, effective.
However, k-means fails to discover clusters with non-convex shapes.
Consider, for example, the figure below, where synthetic data is arranged in two intertwined spirals.
# install.packages("mlbench")
library(mlbench)
set.seed(25)
obj <- mlbench.spirals(100, 1, 0.025)  # 100 points on two spirals, noise sd = 0.025
my.data <- 4 * obj$x                   # rescale the coordinates
plot(my.data)
Ideally the data would be clustered into 2 groups, each group corresponding to one spiral.
Using k-means to cluster this data gives an incorrect result.
kmm <- kmeans(my.data, 2)         # k-means with k = 2
plot(my.data, col = kmm$cluster)  # colour points by the k-means assignment
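The failure can be quantified against the true labels stored by mlbench (a quick optional check, not part of the original flow):

table(kmm$cluster, obj$classes)  # rows mix both true classes: k-means cuts across the spirals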
| Spectral Clustering: Outline |
Consider the task of partitioning a set of n data points into clusters.
The procedure for spectral clustering is as follows:
1. Build a similarity matrix S between all pairs of points.
2. Turn S into a sparse, symmetric affinity matrix A (e.g. via a k-nearest-neighbour filter).
3. Form the degree matrix D and the graph Laplacian U = D - A.
4. Take the eigenvectors of U corresponding to the k smallest eigenvalues.
5. Run k-means on the rows of the resulting eigenvector matrix.
| Using the R kernlab Package |
library(kernlab)
sc <- specc(my.data, centers=2)
plot(my.data, col=sc, pch=4) # estimated classes (x)
points(my.data, col=obj$classes, pch=5) # true classes (<>)
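The fitted specc object also carries a few summaries via kernlab's accessor functions; a quick optional look (this assumes the size() and centers() accessors documented for the specc class):

size(sc)     # number of points assigned to each estimated cluster
centers(sc)  # cluster centres in the original coordinates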
| Step-by-Step R Code |
[A] Represent the data points as a similarity graph.
We may compute S for this dataset using a Gaussian-type kernel on the pairwise distances:
# Choose a similarity function: s(x1, x2) = exp(-alpha * ||x1 - x2||)
s <- function(x1, x2, alpha=1) {
  exp(-alpha * norm(as.matrix(x1 - x2), type="F"))
  # exp(-alpha * norm(as.matrix((x1-x2)^2), type="F"))  # squared-distance variant
}
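As a quick sanity check, applying s to the first two points reproduces the value that appears at S[1,2] in the printout further below:

s(my.data[1,], my.data[2,])  # ~0.009, matching S[1,2] below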
# Build the full N x N similarity matrix from a similarity function
make.similarity <- function(my.data, similarity) {
  N <- nrow(my.data)
  S <- matrix(rep(NA, N^2), ncol=N)
  for(i in 1:N) {
    for(j in 1:N) {
      S[i,j] <- similarity(my.data[i,], my.data[j,])
    }
  }
  S
}
S <- make.similarity(my.data, s)
S[1:8,1:8]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.000000000 0.009005336 0.289228321 0.288069410 0.273396513 0.830105549
## [2,] 0.009005336 1.000000000 0.002607465 0.003036535 0.002510587 0.009109662
## [3,] 0.289228321 0.002607465 1.000000000 0.526515400 0.731119464 0.283445761
## [4,] 0.288069410 0.003036535 0.526515400 1.000000000 0.386525481 0.310707262
## [5,] 0.273396513 0.002510587 0.731119464 0.386525481 1.000000000 0.256728192
## [6,] 0.830105549 0.009109662 0.283445761 0.310707262 0.256728192 1.000000000
## [7,] 0.079421889 0.053381521 0.025779980 0.022879397 0.027919206 0.072337117
## [8,] 0.089694907 0.007625980 0.051989519 0.033755012 0.065391491 0.074919525
## [,7] [,8]
## [1,] 0.07942189 0.08969491
## [2,] 0.05338152 0.00762598
## [3,] 0.02577998 0.05198952
## [4,] 0.02287940 0.03375501
## [5,] 0.02791921 0.06539149
## [6,] 0.07233712 0.07491953
## [7,] 1.00000000 0.14270076
## [8,] 0.14270076 1.00000000
The next step is to compute an affinity matrix A based on S.
A must be non-negative and symmetric.
This is usually done by applying a k-nearest-neighbour filter, building a graph that connects each point only to its closest neighbours in the dataset.
make.affinity <- function(S, n.neighbors=2) {
  N <- nrow(S)
  if (n.neighbors >= N) {  # fully connected graph
    A <- S
  } else {
    A <- matrix(rep(0, N^2), ncol=N)
    for(i in 1:N) {  # for each row
      # connect i only to the n.neighbors points with the largest similarity
      # (order() returns indices directly, which also handles tied values safely)
      best.neighbors <- order(S[i,], decreasing=TRUE)[1:n.neighbors]
      for (j in best.neighbors) {
        A[i,j] <- S[i,j]
        A[j,i] <- S[i,j]  # symmetrize, so the graph is undirected
      }
    }
  }
  A
}
# In our class notation, A = W (the weighted adjacency matrix)
A <- make.affinity(S, 3)  # use 3 neighbours (includes the point itself)
A[1:8,1:8]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1.0000000 0 0.0000000 0 0.0000000 0.8301055 0 0
## [2,] 0.0000000 1 0.0000000 0 0.0000000 0.0000000 0 0
## [3,] 0.0000000 0 1.0000000 0 0.7311195 0.0000000 0 0
## [4,] 0.0000000 0 0.0000000 1 0.0000000 0.0000000 0 0
## [5,] 0.0000000 0 0.7311195 0 1.0000000 0.0000000 0 0
## [6,] 0.8301055 0 0.0000000 0 0.0000000 1.0000000 0 0
## [7,] 0.0000000 0 0.0000000 0 0.0000000 0.0000000 1 0
## [8,] 0.0000000 0 0.0000000 0 0.0000000 0.0000000 0 1
With this affinity matrix, clustering is replaced by a graph-partition problem, where connected graph components are interpreted as clusters. The graph must be partitioned such that edges connecting different clusters have low weights and edges within the same cluster have high weights. Spectral clustering constructs this type of graph.
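Before doing any linear algebra, we can check that the k-NN graph really has this structure by counting its connected components (a sketch assuming the igraph package; not part of the pipeline itself):

# install.packages("igraph")
library(igraph)
g <- graph_from_adjacency_matrix(A, mode="undirected", weighted=TRUE, diag=FALSE)
components(g)$no  # 2 connected components would match the two spirals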
| Degree Matrix D |
Next we build the degree matrix D, where each diagonal value is the degree of the respective vertex, d_ii = sum_j A_ij, and all other positions are zero:
D <- diag(apply(A, 1, sum))  # row sums of A give the vertex degrees
D[1:8,1:8]
##          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
## [1,] 2.502838 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## [2,] 0.000000 2.236614 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## [3,] 0.000000 0.000000 2.446829 0.000000 0.000000 0.000000 0.000000 0.000000
## [4,] 0.000000 0.000000 0.000000 2.422036 0.000000 0.000000 0.000000 0.000000
## [5,] 0.000000 0.000000 0.000000 0.000000 2.436745 0.000000 0.000000 0.000000
## [6,] 0.000000 0.000000 0.000000 0.000000 0.000000 2.640524 0.000000 0.000000
## [7,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.474696 0.000000
## [8,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.458517
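By construction the diagonal of D holds the row sums of A, which a one-line check confirms:

all.equal(diag(D), apply(A, 1, sum))  # TRUE: each diagonal entry is a vertex degree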
Next we compute the unnormalized graph Laplacian (U = D - A); normalized versions (L) also exist.
L has many variants; here we use the unnormalized U.
(Reference tutorials covering other variants of L:
(1) http://www.di.fc.ul.pt/~jpn/r/spectralclustering/spectralclustering.html
(2) http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5b0%5d.pdf)
U <- D - A
round(U[1:12,1:12], 1)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]  1.5  0.0  0.0  0.0  0.0 -0.8  0.0  0.0  0.0  -0.7   0.0   0.0
##  [2,]  0.0  1.2  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [3,]  0.0  0.0  1.4  0.0 -0.7  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [4,]  0.0  0.0  0.0  1.4  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [5,]  0.0  0.0 -0.7  0.0  1.4  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [6,] -0.8  0.0  0.0  0.0  0.0  1.6  0.0  0.0  0.0  -0.8   0.0   0.0
##  [7,]  0.0  0.0  0.0  0.0  0.0  0.0  1.5  0.0  0.0   0.0   0.0   0.0
##  [8,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.5  0.0   0.0   0.0   0.0
##  [9,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.3   0.0   0.0  -0.6
## [10,] -0.7  0.0  0.0  0.0  0.0 -0.8  0.0  0.0  0.0   2.2  -0.7   0.0
## [11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  -0.7   1.5   0.0
## [12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 -0.6   0.0   0.0   1.3
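Two quick remarks in code. First, the constant vector lies in the null space of U, which is why the smallest eigenvalue below is (numerically) zero. Second, for reference, the symmetric normalized Laplacian from the von Luxburg tutorial linked above can be formed from the same pieces (a sketch; this walkthrough keeps using the unnormalized U):

range(U %*% rep(1, nrow(U)))           # ~0: the constant vector has eigenvalue 0
D.inv.sqrt <- diag(1 / sqrt(diag(D)))  # D^(-1/2)
L.sym <- diag(nrow(A)) - D.inv.sqrt %*% A %*% D.inv.sqrt  # L_sym = I - D^(-1/2) A D^(-1/2)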
Supposing we want k clusters, the next step is to find the eigenvectors corresponding to the k smallest eigenvalues of U (ignoring the trivial constant eigenvector).
k   <- 3
evL <- eigen(U, symmetric=TRUE)  # eigenvalues come back in decreasing order
Z   <- evL$vectors[,(ncol(evL$vectors)-k+1):ncol(evL$vectors)]  # last k columns: smallest eigenvalues
Now, in this transformed space, we may perform standard k-means clustering to find the clusters. (Each row of Z gives one point's coordinates with respect to the k eigenvectors corresponding to the smallest eigenvalues.)
library(stats)                        # attached by default; shown for completeness
km <- kmeans(Z, centers=k, nstart=5)  # k-means on the rows of Z
plot(my.data, col=km$cluster)
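Cross-tabulating the result against the true spiral labels shows how the assignment lines up (a quick optional check; note k was set to 3 above, so the two spirals are spread over three clusters, whereas with k <- 2 the table is essentially diagonal):

table(km$cluster, obj$classes)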
If we do not know how many clusters there are, the eigenvalue spectrum shows a gap that suggests the value of k.
Choosing k this way is known as the eigengap heuristic. Here, the two (numerically) zero eigenvalues imply k = 2.
signif(evL$values,2)
## [1] 3.4e+00 3.2e+00 3.0e+00 3.0e+00 2.9e+00 2.9e+00 2.9e+00 2.8e+00
## [9] 2.8e+00 2.7e+00 2.7e+00 2.7e+00 2.6e+00 2.6e+00 2.6e+00 2.5e+00
## [17] 2.5e+00 2.5e+00 2.5e+00 2.5e+00 2.5e+00 2.4e+00 2.4e+00 2.4e+00
## [25] 2.4e+00 2.3e+00 2.3e+00 2.3e+00 2.3e+00 2.2e+00 2.2e+00 2.2e+00
## [33] 2.1e+00 2.1e+00 2.0e+00 2.0e+00 2.0e+00 1.9e+00 1.9e+00 1.9e+00
## [41] 1.8e+00 1.8e+00 1.7e+00 1.7e+00 1.7e+00 1.6e+00 1.6e+00 1.6e+00
## [49] 1.5e+00 1.5e+00 1.4e+00 1.4e+00 1.3e+00 1.3e+00 1.2e+00 1.2e+00
## [57] 1.1e+00 1.1e+00 1.0e+00 1.0e+00 9.6e-01 9.4e-01 8.6e-01 8.5e-01
## [65] 7.8e-01 7.6e-01 6.9e-01 6.9e-01 6.2e-01 6.1e-01 5.4e-01 5.3e-01
## [73] 4.7e-01 4.6e-01 4.0e-01 4.0e-01 3.4e-01 3.3e-01 2.8e-01 2.8e-01
## [81] 2.3e-01 2.2e-01 1.8e-01 1.8e-01 1.4e-01 1.3e-01 1.0e-01 9.9e-02
## [89] 7.0e-02 6.9e-02 4.5e-02 4.4e-02 2.5e-02 2.5e-02 1.1e-02 1.1e-02
## [97] 2.8e-03 2.7e-03 1.6e-16 -4.3e-16
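The same conclusion can be reached programmatically by counting eigenvalues that are numerically zero, since each zero eigenvalue of U corresponds to one connected component of the graph (a sketch; the 1e-10 tolerance is an arbitrary choice):

ev <- rev(evL$values)  # eigenvalues in ascending order
sum(abs(ev) < 1e-10)   # 2 numerically-zero eigenvalues => k = 2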
plot(1:10, rev(evL$values)[1:10], log="y")  # the 10 smallest eigenvalues, log scale
## Warning in xy.coords(x, y, xlabel, ylabel, log): 1 y value <= 0 omitted from
## logarithmic plot
abline(v=2.25, col="red", lty=2) # there are just 2 clusters as expected