| Problem with K-Means Clustering |
One of the most popular clustering algorithms is k-means, which iteratively partitions n observations into k clusters.
The k-means problem is posed as an optimization problem: find the partition that minimizes the total squared distance between observations and their cluster centers.
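Formally (the standard within-cluster sum-of-squares objective, stated here for reference), with clusters C_1, ..., C_k and cluster means mu_1, ..., mu_k, k-means seeks

\min_{C_1,\dots,C_k} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,
\qquad \mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x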
The algorithm is simple, and, in many cases, effective.
However, k-means fails to discover clusters with non-convex shapes.
Consider, for example, the figure below, where synthetic data is arranged in two intertwined spirals.
# install.packages("mlbench")
library(mlbench)
set.seed(25)
obj <- mlbench.spirals(100, 1, 0.025)  # 100 points on two spirals, noise sd = 0.025
my.data <- 4 * obj$x                   # rescale the coordinates
plot(my.data)
Ideally the data would be clustered into 2 groups, each group corresponding to one spiral.
Using k-means to cluster this data gives an incorrect result.
kmm <- kmeans(my.data, 2)         # k-means with k = 2
plot(my.data, col = kmm$cluster)  # colour points by the k-means assignment
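The failure can be quantified against the true labels stored by mlbench (a quick optional check, not part of the original flow):

table(kmm$cluster, obj$classes)  # rows mix both true classes: k-means cuts across the spirals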
| Spectral Clustering: Outline |
Consider the task of partitioning a set of n data points into clusters.
The procedure for spectral clustering is as follows:
1. Build a similarity matrix S between all pairs of points.
2. Turn S into a sparse, symmetric affinity matrix A (e.g. via a k-nearest-neighbour filter).
3. Form the degree matrix D and the graph Laplacian U = D - A.
4. Take the eigenvectors of U corresponding to the k smallest eigenvalues.
5. Run k-means on the rows of the resulting eigenvector matrix.
| Using the R kernlab Package |
library(kernlab)
sc <- specc(my.data, centers=2)
plot(my.data, col=sc, pch=4) # estimated classes (x)
points(my.data, col=obj$classes, pch=5) # true classes (<>)
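The fitted specc object also carries a few summaries via kernlab's accessor functions; a quick optional look (this assumes the size() and centers() accessors documented for the specc class):

size(sc)     # number of points assigned to each estimated cluster
centers(sc)  # cluster centres in the original coordinates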
| Step-by-Step R Code |
[A] Represent the data points as a similarity graph.
We may compute S for this dataset using a Gaussian-type kernel on the pairwise distances:
# Choose a similarity function: s(x1, x2) = exp(-alpha * ||x1 - x2||)
s <- function(x1, x2, alpha=1) {
  exp(-alpha * norm(as.matrix(x1 - x2), type="F"))
  # exp(-alpha * norm(as.matrix((x1-x2)^2), type="F"))  # squared-distance variant
}
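As a quick sanity check, applying s to the first two points reproduces the value that appears at S[1,2] in the printout further below:

s(my.data[1,], my.data[2,])  # ~0.009, matching S[1,2] below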
# Build the full N x N similarity matrix from a similarity function
make.similarity <- function(my.data, similarity) {
  N <- nrow(my.data)
  S <- matrix(rep(NA, N^2), ncol=N)
  for(i in 1:N) {
    for(j in 1:N) {
      S[i,j] <- similarity(my.data[i,], my.data[j,])
    }
  }
  S
}
S <- make.similarity(my.data, s)
S[1:8,1:8]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.000000000 0.009005336 0.289228321 0.288069410 0.273396513 0.830105549
## [2,] 0.009005336 1.000000000 0.002607465 0.003036535 0.002510587 0.009109662
## [3,] 0.289228321 0.002607465 1.000000000 0.526515400 0.731119464 0.283445761
## [4,] 0.288069410 0.003036535 0.526515400 1.000000000 0.386525481 0.310707262
## [5,] 0.273396513 0.002510587 0.731119464 0.386525481 1.000000000 0.256728192
## [6,] 0.830105549 0.009109662 0.283445761 0.310707262 0.256728192 1.000000000
## [7,] 0.079421889 0.053381521 0.025779980 0.022879397 0.027919206 0.072337117
## [8,] 0.089694907 0.007625980 0.051989519 0.033755012 0.065391491 0.074919525
## [,7] [,8]
## [1,] 0.07942189 0.08969491
## [2,] 0.05338152 0.00762598
## [3,] 0.02577998 0.05198952
## [4,] 0.02287940 0.03375501
## [5,] 0.02791921 0.06539149
## [6,] 0.07233712 0.07491953
## [7,] 1.00000000 0.14270076
## [8,] 0.14270076 1.00000000
The next step is to compute an affinity matrix A based on S.
A must be non-negative and symmetric.
This is usually done by applying a k-nearest-neighbour filter, building a graph that connects each point only to its closest neighbours in the dataset.
make.affinity <- function(S, n.neighbors=2) {
  N <- nrow(S)
  if (n.neighbors >= N) {  # fully connected graph
    A <- S
  } else {
    A <- matrix(rep(0, N^2), ncol=N)
    for(i in 1:N) {  # for each row
      # connect i only to the n.neighbors points with the largest similarity
      # (order() returns indices directly, which also handles tied values safely)
      best.neighbors <- order(S[i,], decreasing=TRUE)[1:n.neighbors]
      for (j in best.neighbors) {
        A[i,j] <- S[i,j]
        A[j,i] <- S[i,j]  # symmetrize, so the graph is undirected
      }
    }
  }
  A
}
# In our class notation, A = W (the weighted adjacency matrix)
A <- make.affinity(S, 3)  # use 3 neighbours (includes the point itself)
A[1:8,1:8]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1.0000000 0 0.0000000 0 0.0000000 0.8301055 0 0
## [2,] 0.0000000 1 0.0000000 0 0.0000000 0.0000000 0 0
## [3,] 0.0000000 0 1.0000000 0 0.7311195 0.0000000 0 0
## [4,] 0.0000000 0 0.0000000 1 0.0000000 0.0000000 0 0
## [5,] 0.0000000 0 0.7311195 0 1.0000000 0.0000000 0 0
## [6,] 0.8301055 0 0.0000000 0 0.0000000 1.0000000 0 0
## [7,] 0.0000000 0 0.0000000 0 0.0000000 0.0000000 1 0
## [8,] 0.0000000 0 0.0000000 0 0.0000000 0.0000000 0 1
With this affinity matrix, clustering is replaced by a graph-partition problem, where connected graph components are interpreted as clusters. The graph must be partitioned such that edges connecting different clusters have low weights and edges within the same cluster have high weights. Spectral clustering constructs this type of graph.
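Before doing any linear algebra, we can check that the k-NN graph really has this structure by counting its connected components (a sketch assuming the igraph package; not part of the pipeline itself):

# install.packages("igraph")
library(igraph)
g <- graph_from_adjacency_matrix(A, mode="undirected", weighted=TRUE, diag=FALSE)
components(g)$no  # 2 connected components would match the two spirals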
| Degree Matrix D |
Next we build the degree matrix D, where each diagonal value is the degree of the respective vertex, d_ii = sum_j A_ij, and all other positions are zero:
D <- diag(apply(A, 1, sum))  # row sums of A give the vertex degrees
D[1:8,1:8]
##          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
## [1,] 2.502838 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## [2,] 0.000000 2.236614 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## [3,] 0.000000 0.000000 2.446829 0.000000 0.000000 0.000000 0.000000 0.000000
## [4,] 0.000000 0.000000 0.000000 2.422036 0.000000 0.000000 0.000000 0.000000
## [5,] 0.000000 0.000000 0.000000 0.000000 2.436745 0.000000 0.000000 0.000000
## [6,] 0.000000 0.000000 0.000000 0.000000 0.000000 2.640524 0.000000 0.000000
## [7,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.474696 0.000000
## [8,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 2.458517
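By construction the diagonal of D holds the row sums of A, which a one-line check confirms:

all.equal(diag(D), apply(A, 1, sum))  # TRUE: each diagonal entry is a vertex degree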
Next we compute the unnormalized graph Laplacian (U = D - A); normalized versions (L) also exist.
L has many variants; here we use the unnormalized U.
(Reference tutorials covering other variants of L:
(1) http://www.di.fc.ul.pt/~jpn/r/spectralclustering/spectralclustering.html
(2) http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5b0%5d.pdf)
U <- D - A
round(U[1:12,1:12], 1)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]  1.5  0.0  0.0  0.0  0.0 -0.8  0.0  0.0  0.0  -0.7   0.0   0.0
##  [2,]  0.0  1.2  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [3,]  0.0  0.0  1.4  0.0 -0.7  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [4,]  0.0  0.0  0.0  1.4  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [5,]  0.0  0.0 -0.7  0.0  1.4  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [6,] -0.8  0.0  0.0  0.0  0.0  1.6  0.0  0.0  0.0  -0.8   0.0   0.0
##  [7,]  0.0  0.0  0.0  0.0  0.0  0.0  1.5  0.0  0.0   0.0   0.0   0.0
##  [8,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.5  0.0   0.0   0.0   0.0
##  [9,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.3   0.0   0.0  -0.6
## [10,] -0.7  0.0  0.0  0.0  0.0 -0.8  0.0  0.0  0.0   2.2  -0.7   0.0
## [11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  -0.7   1.5   0.0
## [12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0 -0.6   0.0   0.0   1.3
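Two quick remarks in code. First, the constant vector lies in the null space of U, which is why the smallest eigenvalue below is (numerically) zero. Second, for reference, the symmetric normalized Laplacian from the von Luxburg tutorial linked above can be formed from the same pieces (a sketch; this walkthrough keeps using the unnormalized U):

range(U %*% rep(1, nrow(U)))           # ~0: the constant vector has eigenvalue 0
D.inv.sqrt <- diag(1 / sqrt(diag(D)))  # D^(-1/2)
L.sym <- diag(nrow(A)) - D.inv.sqrt %*% A %*% D.inv.sqrt  # L_sym = I - D^(-1/2) A D^(-1/2)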
Supposing we want k clusters, the next step is to find the eigenvectors corresponding to the k smallest eigenvalues of U (ignoring the trivial constant eigenvector).
k   <- 3
evL <- eigen(U, symmetric=TRUE)  # eigenvalues come back in decreasing order
Z   <- evL$vectors[,(ncol(evL$vectors)-k+1):ncol(evL$vectors)]  # last k columns: smallest eigenvalues
Now, in this transformed space, we may perform standard k-means clustering to find the clusters. (Each row of Z gives one point's coordinates with respect to the k eigenvectors corresponding to the smallest eigenvalues.)
library(stats)                        # attached by default; shown for completeness
km <- kmeans(Z, centers=k, nstart=5)  # k-means on the rows of Z
plot(my.data, col=km$cluster)
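Cross-tabulating the result against the true spiral labels shows how the assignment lines up (a quick optional check; note k was set to 3 above, so the two spirals are spread over three clusters, whereas with k <- 2 the table is essentially diagonal):

table(km$cluster, obj$classes)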
If we do not know how many clusters there are, the eigenvalue spectrum shows a gap that suggests the value of k.
Choosing k this way is known as the eigengap heuristic. Here, the two (numerically) zero eigenvalues imply k = 2.
signif(evL$values,2)
## [1] 3.4e+00 3.2e+00 3.0e+00 3.0e+00 2.9e+00 2.9e+00 2.9e+00 2.8e+00
## [9] 2.8e+00 2.7e+00 2.7e+00 2.7e+00 2.6e+00 2.6e+00 2.6e+00 2.5e+00
## [17] 2.5e+00 2.5e+00 2.5e+00 2.5e+00 2.5e+00 2.4e+00 2.4e+00 2.4e+00
## [25] 2.4e+00 2.3e+00 2.3e+00 2.3e+00 2.3e+00 2.2e+00 2.2e+00 2.2e+00
## [33] 2.1e+00 2.1e+00 2.0e+00 2.0e+00 2.0e+00 1.9e+00 1.9e+00 1.9e+00
## [41] 1.8e+00 1.8e+00 1.7e+00 1.7e+00 1.7e+00 1.6e+00 1.6e+00 1.6e+00
## [49] 1.5e+00 1.5e+00 1.4e+00 1.4e+00 1.3e+00 1.3e+00 1.2e+00 1.2e+00
## [57] 1.1e+00 1.1e+00 1.0e+00 1.0e+00 9.6e-01 9.4e-01 8.6e-01 8.5e-01
## [65] 7.8e-01 7.6e-01 6.9e-01 6.9e-01 6.2e-01 6.1e-01 5.4e-01 5.3e-01
## [73] 4.7e-01 4.6e-01 4.0e-01 4.0e-01 3.4e-01 3.3e-01 2.8e-01 2.8e-01
## [81] 2.3e-01 2.2e-01 1.8e-01 1.8e-01 1.4e-01 1.3e-01 1.0e-01 9.9e-02
## [89] 7.0e-02 6.9e-02 4.5e-02 4.4e-02 2.5e-02 2.5e-02 1.1e-02 1.1e-02
## [97] 2.8e-03 2.7e-03 1.6e-16 -4.3e-16
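The same conclusion can be reached programmatically by counting eigenvalues that are numerically zero, since each zero eigenvalue of U corresponds to one connected component of the graph (a sketch; the 1e-10 tolerance is an arbitrary choice):

ev <- rev(evL$values)  # eigenvalues in ascending order
sum(abs(ev) < 1e-10)   # 2 numerically-zero eigenvalues => k = 2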
plot(1:10, rev(evL$values)[1:10], log="y")  # the 10 smallest eigenvalues, log scale
## Warning in xy.coords(x, y, xlabel, ylabel, log): 1 y value <= 0 omitted from
## logarithmic plot
abline(v=2.25, col="red", lty=2) # there are just 2 clusters as expected