Source: Analytics Edge Unit 6 Clustering

Techniques involved: Hierarchical clustering, visualize the cluster, K-Centroids Cluster analysis (KCCA)

Load the flower file which contains the matrix of pixel intensities of a flower image. Note that the default in R assumes that the first row in the dataset is the header. So if we didn’t specify that we have no headers in this case, we would have lost the information from the first row of the pixel intensity matrix.

Load the data

setwd("C:/Users/jzchen/Documents/Courses/Analytics Edge/Unit_6_Clustering")
flower <- read.csv("flower.csv", header = F)
str(flower)
## 'data.frame':    50 obs. of  50 variables:
##  $ V1 : num  0.0991 0.0991 0.1034 0.1034 0.1034 ...
##  $ V2 : num  0.112 0.108 0.112 0.116 0.108 ...
##  $ V3 : num  0.134 0.116 0.121 0.116 0.112 ...
##  $ V4 : num  0.138 0.138 0.121 0.121 0.112 ...
##  $ V5 : num  0.138 0.134 0.125 0.116 0.112 ...
##  $ V6 : num  0.138 0.129 0.121 0.108 0.112 ...
##  $ V7 : num  0.129 0.116 0.103 0.108 0.112 ...
##  $ V8 : num  0.116 0.103 0.103 0.103 0.116 ...
##  $ V9 : num  0.1121 0.0991 0.1078 0.1121 0.1164 ...
##  $ V10: num  0.121 0.108 0.112 0.116 0.125 ...
##  $ V11: num  0.134 0.125 0.129 0.134 0.129 ...
##  $ V12: num  0.147 0.134 0.138 0.129 0.138 ...
##  $ V13: num  0.000862 0.146552 0.142241 0.142241 0.133621 ...
##  $ V14: num  0.000862 0.000862 0.142241 0.133621 0.12931 ...
##  $ V15: num  0.142 0.142 0.134 0.121 0.116 ...
##  $ V16: num  0.125 0.125 0.116 0.108 0.108 ...
##  $ V17: num  0.1121 0.1164 0.1078 0.0991 0.0991 ...
##  $ V18: num  0.108 0.112 0.108 0.108 0.108 ...
##  $ V19: num  0.121 0.129 0.125 0.116 0.116 ...
##  $ V20: num  0.138 0.129 0.125 0.116 0.116 ...
##  $ V21: num  0.138 0.134 0.121 0.125 0.125 ...
##  $ V22: num  0.134 0.129 0.125 0.121 0.103 ...
##  $ V23: num  0.125 0.1207 0.1164 0.1164 0.0819 ...
##  $ V24: num  0.1034 0.1034 0.0991 0.0991 0.1034 ...
##  $ V25: num  0.0948 0.0905 0.0905 0.1034 0.125 ...
##  $ V26: num  0.0862 0.0862 0.0991 0.125 0.1422 ...
##  $ V27: num  0.086207 0.086207 0.103448 0.12931 0.000862 ...
##  $ V28: num  0.0991 0.1078 0.1164 0.1293 0.1466 ...
##  $ V29: num  0.116 0.134 0.134 0.121 0.142 ...
##  $ V30: num  0.121 0.138 0.142 0.129 0.138 ...
##  $ V31: num  0.121 0.134 0.142 0.134 0.129 ...
##  $ V32: num  0.116 0.134 0.129 0.116 0.112 ...
##  $ V33: num  0.108 0.112 0.116 0.108 0.108 ...
##  $ V34: num  0.1078 0.1078 0.1034 0.0991 0.1034 ...
##  $ V35: num  0.1078 0.1034 0.0991 0.0991 0.0991 ...
##  $ V36: num  0.1078 0.1034 0.1034 0.0905 0.0862 ...
##  $ V37: num  0.1078 0.1078 0.1034 0.0819 0.0733 ...
##  $ V38: num  0.0948 0.0991 0.0776 0.069 0.0733 ...
##  $ V39: num  0.0733 0.056 0.0474 0.0474 0.056 ...
##  $ V40: num  0.0474 0.0388 0.0431 0.0474 0.0603 ...
##  $ V41: num  0.0345 0.0345 0.0388 0.0474 0.0647 ...
##  $ V42: num  0.0259 0.0259 0.0345 0.0431 0.056 ...
##  $ V43: num  0.0259 0.0259 0.0388 0.0517 0.0603 ...
##  $ V44: num  0.0302 0.0302 0.0345 0.0517 0.0603 ...
##  $ V45: num  0.0259 0.0259 0.0259 0.0388 0.0474 ...
##  $ V46: num  0.0259 0.0172 0.0172 0.0259 0.0345 ...
##  $ V47: num  0.01724 0.01724 0.00862 0.02155 0.02586 ...
##  $ V48: num  0.0216 0.0129 0.0129 0.0172 0.0302 ...
##  $ V49: num  0.0216 0.0216 0.0216 0.0345 0.0603 ...
##  $ V50: num  0.0302 0.0345 0.0388 0.0603 0.0776 ...

Conver the data to matrix

flowerMatrix <- as.matrix(flower)
str(flowerMatrix)
##  num [1:50, 1:50] 0.0991 0.0991 0.1034 0.1034 0.1034 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:50] "V1" "V2" "V3" "V4" ...

To perform any type of clustering, we saw earlier that we would need to convert the matrix of pixel intensities to a vector that contains all the intensity values ranging from zero to one. And the clustering algorithm divides the intensity spectrum, the interval zero to one, into these joint clusters or intervals.

Convert to vector

flowerVector <- as.vector(flowerMatrix)
str(flowerVector)
##  num [1:2500] 0.0991 0.0991 0.1034 0.1034 0.1034 ...

Now you might be wondering why we can’t immediately convert the data frame flower to a vector.

flowerVector2 <- as.vector(flower)
str(flowerVector2)

The above command didn’t really convert the data frame to a vector. So converting the data to a matrix and then to the vector is a crucial step.

Hierarchical clustering

Calculate the pairwise distance

distance <- dist(flowerVector, method = "euclidean")

Build a cluster

clusterIntensity <- hclust(distance, method = "ward.D")

As a reminder, the Ward’s method is a minimum variance method, which tries to find compact and spherical clusters. We can think about it as trying to minimize the variance within each cluster and the distance among clusters. distance

Make a dendrogram

plot(clusterIntensity )

It seems that having two clusters or three clusters is reasonable in our case. We can actually visualize the cuts by plotting rectangles around the clusters on this tree.

rect.hclust(clusterIntensity, k = 3, border = "red")

Assign the data to clusters

Now let us split the data into these three clusters

flowerClusters <- cutree(clusterIntensity, k = 3)
flowerClusters
##    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   [35] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   [69] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [103] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [137] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [171] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [205] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [239] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [273] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [307] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [341] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3
##  [375] 2 2 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [409] 1 1 1 1 1 1 1 1 1 2 2 1 1 1 3 3 3 3 3 2 1 2 3 2 1 1 1 1 1 1 1 1 1 1
##  [443] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 3 3 3 3
##  [477] 3 2 2 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [511] 1 1 1 1 1 1 2 3 3 2 1 1 3 3 3 3 3 2 3 3 3 1 2 3 3 1 1 1 1 1 1 1 1 1
##  [545] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 2 1 2 3 3 3 1 1 3 3 3 3 3 2
##  [579] 3 3 2 2 3 3 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [613] 2 3 3 2 1 3 3 3 2 1 2 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1
##  [647] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 2 2 3 3 3 1 2 3 3 3 3 3 3 3
##  [681] 3 3 3 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
##  [715] 3 3 3 3 3 3 3 2 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1
##  [749] 1 1 1 1 1 1 1 1 1 1 1 2 3 2 1 1 2 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3
##  [783] 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 2 1 1
##  [817] 3 3 3 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 3 2 1 1 2 2 3 3 1 1 1 1 1 1 1 1
##  [851] 1 1 1 1 1 1 1 1 1 1 2 3 3 3 2 1 2 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3
##  [885] 2 1 2 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 2 2
##  [919] 3 3 3 3 3 2 2 3 3 3 3 3 3 3 3 2 1 3 3 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1
##  [953] 1 1 1 1 1 1 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 3 3 3 3 2 2 3 3
##  [987] 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3
## [1021] 3 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1055] 1 1 2 2 2 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 2 2
## [1089] 2 2 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2
## [1123] 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1 1
## [1157] 2 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 2
## [1191] 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3 3 3 3 3 2 2 2 2 2 2
## [1225] 2 2 2 2 2 2 2 3 3 3 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1259] 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1
## [1293] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2
## [1327] 2 2 2 2 3 3 3 3 3 3 3 3 3 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3
## [1361] 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 1
## [1395] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2
## [1429] 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3
## [1463] 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 2 1 1 1 1 1
## [1497] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 2 2 3 3 3 2 3
## [1531] 3 3 3 3 3 2 3 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 2 2 1
## [1565] 2 3 3 3 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 3 3 2 1 2 2 2 2 2 1 1 1 1 1 1
## [1599] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3
## [1633] 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 3 3
## [1667] 2 3 3 3 3 3 2 2 3 3 3 3 3 2 1 3 3 3 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1
## [1701] 1 1 1 1 1 1 1 1 1 1 1 2 3 3 2 2 3 3 3 3 3 3 1 3 3 3 3 3 3 2 1 2 3 3
## [1735] 3 3 3 3 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 1 3 3 3
## [1769] 3 3 3 1 1 3 3 3 3 3 3 2 1 1 2 3 3 3 3 2 3 2 1 1 1 1 1 1 1 1 1 1 1 1
## [1803] 1 1 1 1 1 1 1 1 1 2 2 1 2 3 3 3 3 3 2 1 2 3 3 2 3 3 3 2 1 1 1 1 3 3
## [1837] 3 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3
## [1871] 1 1 2 3 3 2 3 3 3 3 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1905] 1 1 1 1 1 1 1 1 1 1 2 3 3 3 3 3 1 1 2 3 3 1 3 3 3 3 1 1 1 1 1 1 1 1
## [1939] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 3 3 2 1 1
## [1973] 2 3 3 1 2 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2007] 1 1 1 1 1 1 1 1 1 1 1 2 3 1 1 1 1 3 3 1 1 3 3 3 1 1 1 1 1 1 1 1 1 1
## [2041] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2
## [2075] 3 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2109] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2143] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2177] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2211] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2245] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2279] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2313] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2347] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2381] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2415] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2449] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2483] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

And we see that flowerClusters is actually a vector that assigns each intensity value in the flower vector to a cluster.

Analyze the individual clusters

tapply(flowerVector, flowerClusters, mean)
##          1          2          3 
## 0.08574315 0.50826255 0.93147713

What we obtain is that the first cluster has a mean intensity value of 0.08, which is closest to zero, and this means that it corresponds to the darkest shape in our image. And then the third cluster here, which is closest to 1, corresponds to the fairest shade.

Now let’s see how the image was segmented. To output an image, we can use the image function in R, which takes a matrix as an input. But the variable flowerClusters, as we just saw, is a vector. So we need to convert it into a matrix.

Plot the segmented image

dim(flowerClusters) = c(50, 50)
str(flowerClusters)
##  int [1:50, 1:50] 1 1 1 1 1 1 1 1 1 1 ...
image(flowerClusters, axes = FALSE)

original image

Let us now check how the original image looked.

image(flowerMatrix, axes = FALSE, col = grey(seq(0,1, length = 256)))

Segment an MRI brain image

Load the data

healthy <- read.csv("healthy.csv", header = F)
healthyMatrix <- as.matrix(healthy)
str(healthyMatrix)
##  num [1:566, 1:646] 0.00427 0.00855 0.01282 0.01282 0.01282 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:646] "V1" "V2" "V3" "V4" ...

Plot the image

image(healthyMatrix, axes = F, col = grey(seq(0,1, length = 256)))

Convert to vector

healthyVector <- as.vector(healthyMatrix)
str(healthyVector)
##  num [1:365636] 0.00427 0.00855 0.01282 0.01282 0.01282 ...

The vector is very long and if we want to calculate the distance between each pair of points, we have to calculate 365636*(365636-1)/2 distances. Therefore, the following command won’t be executed.

distance <- dist(healthyVector, method = "euclidean")

Here we have to use a different clustering method – k means clustering

K-means clustering

The first step in k-means clustering involves specifying the number of clusters, k. But how do we select k? Well, our clusters would ideally assign each point in the image to a tissue class or a particular substance, for instance, grey matter or white matter, and so on. And these substances are known to the medical community. So setting the number of clusters depends on exactly what you’re trying to extract from the image. For the sake of our example, let’s set the number of clusters here, k, to five. And since the k-means clustering algorithm starts by randomly assigning points to clusters, we should set the seed

Choose K

k = 5
set.seed(1)

Since the k-means is an iterative method that could take very long to converge, we need to set a maximum number of iterations.

Build clusters

KMC <- kmeans(healthyVector, centers = k, iter.max = 1000)
str(KMC)
## List of 9
##  $ cluster     : int [1:365636] 3 3 3 3 3 3 3 3 3 3 ...
##  $ centers     : num [1:5, 1] 0.4818 0.1062 0.0196 0.3094 0.1842
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:5] "1" "2" "3" "4" ...
##   .. ..$ : NULL
##  $ totss       : num 5775
##  $ withinss    : num [1:5] 96.6 47.2 39.2 57.5 62.3
##  $ tot.withinss: num 303
##  $ betweenss   : num 5472
##  $ size        : int [1:5] 20556 101085 133162 31555 79278
##  $ iter        : int 2
##  $ ifault      : int 0
##  - attr(*, "class")= chr "kmeans"

The first, and most important, piece of information that we get, is the cluster vector. Which assigns each intensity value in the healthy vector to a cluster. In this case, it will be giving them values 1 through 5, since we have 5 clusters. Now recall that to output the segmented image, we need to extract this vector.

healthyClusters <- KMC$cluster

Now how can we obtain the mean intensity value within each of our 5 clusters? In hierarchical clustering, we needed to do some manual work, and use the tapply function to extract this information. In this case, we have the answers ready, under the vector centers. In fact, for instance, the mean intensity value of the first cluster is 0.48, and the mean intensity value of the last cluster is 0.18.

KMC$centers[2]
## [1] 0.1061945

One last piece of intersting informationi we can get from the summary is the size of the cluster. For instance, the largest cluster that we have is the third one, which combines 133,000 values in it. And interestingly, it’s the one that has the smallest mean intensity value, which means that it corresponds to the darkest shade in our image. Actually, if we look at all the mean intensity values, we can see that they are all less than 0.5. So they’re all pretty close to 0. And this means that our images is pretty dark. If we look at our image again, it’s indeed very dark. And we have very few points that are actually white.

Output the segmented image

First convert the vector to a matrix

dim(healthyClusters) = c(nrow(healthyMatrix), ncol(healthyMatrix))

We’re going to invoke for color here, the rainbow palette in R. And the rainbow palette, or the function rainbow, takes as an input the number of colors that we want. In this case, the number of colors would correspond to the number of clusters.

image(healthyClusters, axes = F, col = rainbow(k))

The question now is, can we use the clusters, or the classes, found by our k-means algorithm on the healthy MRI imageto identify tumors in another MRI image of a sick patient?

Detecting tumors

The tumor.csv file corresponds to an MRI brain image of a patient with oligodendroglioma, a tumor that commonly occurs in the front lobe of the brain.

tumor <- read.csv("tumor.csv", header = F)
tumorMatrix <- as.matrix(tumor)
tumorVecotr <- as.vector(tumorMatrix)

Now, we will not run the k-means algorithm again on the tumor vector. Instead, we will apply the k-means clustering results that we found using the healthy brain image on the tumor vector. In other words, we treat the healthy vector as training set and the tumor vector as a testing set. The flexclust package contains the object class KCCA, which stands for K-Centroids Cluster Analysis. We need to convert the information from the clustering algorithm to an object of the class KCCA. And this conversion is needed before we can use the predict function on the test set tumorVector.

library(flexclust)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: modeltools
## Loading required package: stats4
KMC.kcca <-as.kcca(KMC, healthyVector)

as.kcca takes as a first input the original KMC variable that stored all the information from the k-means clustering function, and the second input is the data that we clustered. And in this case, it’s the training set, which is the healthyVector. Now that R finally finished creating the object KMC.kcca, we can cluster the pixels in the tumorVector using the predict function.

tumorClusters <- predict(KMC.kcca, newdata = tumorVecotr)

And now, the tumorClusters is a vector that assigns a value 1 through 5 to each of the intensity values in the tumorVector, as predicted by the k-means algorithm.

Output the segmented image

First convert the tumor clusters to a matrix

dim(tumorClusters) <- c(nrow(tumorMatrix), ncol(tumorMatrix))
image(tumorClusters, axes = F, col = rainbow(k))

Now when we compare this image to that of the healthy brain, we can see that there is an area that standed out in this new image.