Reading Data:
setwd("G:\\R Learning\\01042017\\exercise2")
df <- read.csv("mod_semeion.csv", header = FALSE)
dim(df)
## [1] 1593 257
Plotting the image of an arbitrary digit (row 1322):
Reading row 1322 into pixels:
pixels <- df[1322, 1:256]
Converting “pixels” into a matrix:
pixels <- as.matrix(pixels)
dim(pixels)
## [1] 1 256
Reshaping the pixels into a 16 × 16 matrix:
pixels <- matrix(pixels, nrow = 16, ncol = 16)  # filled column by column (R's default)
dim(pixels)
## [1] 16 16
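The reshape above relies on matrix() filling its values column by column unless byrow = TRUE is given. A small check of the two fill orders, using toy vectors rather than data from the file:

```r
# matrix() fills column by column by default; byrow = TRUE fills row by row.
v <- 1:6
by_col <- matrix(v, nrow = 2, ncol = 3)
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)
by_col[1, ]  # 1 3 5
by_row[1, ]  # 1 2 3

# For a square 16 x 16 reshape the two fills are transposes of each other.
v256 <- 1:256
identical(matrix(v256, 16, 16), t(matrix(v256, 16, 16, byrow = TRUE)))  # TRUE
```

So if each row of mod_semeion.csv lists the image row by row (an assumption about the file's layout), pixels[i, j] after the reshape holds image row j, column i.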
Plotting the image stored in row 1322:
image(1:16, 1:16, pixels)
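image() draws z[i, j] at x = i and y = j, with y increasing upwards. If the pixels are stored row by row from the top-left corner (an assumption, not stated in the data), the column-filled matrix above therefore puts the first image row at the bottom and the digit appears upside down. A sketch that keeps the digit upright; to_image_z() and plot_digit() are hypothetical helpers, not part of any package:

```r
# Hypothetical helpers, ASSUMING the 256 values list the 16 image rows
# from top to bottom, each row from left to right.
to_image_z <- function(v, side = 16) {
  stopifnot(length(v) == side * side)
  m <- matrix(as.numeric(v), nrow = side, ncol = side, byrow = TRUE)  # m[row, col]
  # image() draws z[i, j] at x = i (left to right) and y = j (bottom to top),
  # so reverse the rows and transpose: the top image row lands at the top.
  t(m[side:1, ])
}

plot_digit <- function(v, ...) {
  z <- to_image_z(v)
  # Two grey levels: 0 -> white, 1 -> black for binary pixels
  image(seq_len(nrow(z)), seq_len(ncol(z)), z,
        col = grey(c(1, 0)), axes = FALSE, xlab = "", ylab = "", ...)
}

z_demo <- to_image_z(1:256)
z_demo[1, 16]  # 1: first stored value, drawn at the top-left
z_demo[1, 1]   # 241: first value of the last image row, drawn at the bottom-left
```

Usage: plot_digit(unlist(df[1322, 1:256])). With the original column-filled pixels matrix, image(1:16, 1:16, pixels[, 16:1]) gives the same orientation.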
Label for this image:
df[1322, 257]
## [1] 3
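K-means clustering of the 256 pixel columns, df[, 1:256]:
# kmeans() starts from random centres, so cluster numbers and counts vary between runs
cl <- kmeans(df[, 1:256], centers = 10)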
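Because kmeans() starts from randomly chosen centres, the cluster numbers and counts change between runs, and a single start can settle in a poor local optimum. A sketch that fixes the seed and keeps the best of several starts via nstart; run_kmeans(), the seed value and nstart = 25 are illustrative choices, and the simulated two-group data only stand in for df[, 1:256]:

```r
# Hypothetical wrapper: fix the RNG seed so repeated runs give the same
# clusters, and let nstart try several random starts, keeping the one with
# the lowest total within-cluster sum of squares.
# Note: set.seed() resets the global random-number stream as a side effect.
run_kmeans <- function(x, k = 10, seed = 1, nstart = 25) {
  set.seed(seed)
  kmeans(x, centers = k, nstart = nstart)
}

# Simulated stand-in for df[, 1:256]: two well-separated groups of 50 points
set.seed(123)
sim <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 5), ncol = 2))

a <- run_kmeans(sim, k = 2, seed = 42)
b <- run_kmeans(sim, k = 2, seed = 42)
identical(a$cluster, b$cluster)  # TRUE: same seed, same assignments
```

On the digit data this would be cl <- run_kmeans(df[, 1:256], k = 10); with 25 starts it takes roughly 25 times as long as a single run.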
Cross-tabulating the cluster assignments against the true digit labels:
table(cl$cluster, factor(df[,257]))
##
##        0   1   2   3   4   5   6   7   8   9
##   1    2   0   3   0 112   2   3   1   0   1
##   2    1   2 100  34   1   6   8   0  16   5
##   3    0  45   0   5  24   0   2  24   2   0
##   4    0  69  22   4   3   0   0  10   1   3
##   5  146   0   0   0   0   0  23   0   1   0
##   6    1   0   0   2   7  72   1   3   9  75
##   7    7   1   2   0   6   8 118   0   1   0
##   8    0  43   3   0   8   6   4 119   5   8
##   9    2   0  29   9   0   3   0   1 101  19
##   10   2   2   0 105   0  62   2   0  19  47
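One way to summarise this table in a single number is cluster purity: label each cluster with its most frequent digit and count the share of all images that match that label. A sketch using the counts printed above, typed in by hand, so tb_counts here is a plain matrix rather than the table object:

```r
# Counts copied by hand from the table above:
# rows = clusters 1..10, columns = digits 0..9.
counts <- c(
    2,   0,   3,   0, 112,   2,   3,   1,   0,   1,
    1,   2, 100,  34,   1,   6,   8,   0,  16,   5,
    0,  45,   0,   5,  24,   0,   2,  24,   2,   0,
    0,  69,  22,   4,   3,   0,   0,  10,   1,   3,
  146,   0,   0,   0,   0,   0,  23,   0,   1,   0,
    1,   0,   0,   2,   7,  72,   1,   3,   9,  75,
    7,   1,   2,   0,   6,   8, 118,   0,   1,   0,
    0,  43,   3,   0,   8,   6,   4, 119,   5,   8,
    2,   0,  29,   9,   0,   3,   0,   1, 101,  19,
    2,   2,   0, 105,   0,  62,   2,   0,  19,  47
)
tb_counts <- matrix(counts, nrow = 10, byrow = TRUE,
                    dimnames = list(cluster = as.character(1:10),
                                    digit = as.character(0:9)))

# Most frequent digit in each cluster, and the share of the cluster it covers
majority <- colnames(tb_counts)[apply(tb_counts, 1, which.max)]
share <- apply(tb_counts, 1, max) / rowSums(tb_counts)

# Purity: fraction of all images that carry their cluster's majority digit
purity <- sum(apply(tb_counts, 1, max)) / sum(tb_counts)
purity  # 990 / 1593, about 0.62
```

Purity is easy to read but tends to rise as the number of clusters grows, and it does not penalise one digit being split over several clusters (as happens with the 1s here).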
Plotting the same table as a mosaic plot:
tb <- table(cl$cluster, factor(df[, 257]))
mosaicplot(tb, xlab = "Clusters", ylab = "Digits")
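Because cluster numbers are arbitrary, the mosaic plot may be easier to read when the clusters are ordered by their majority digit. A sketch; order_by_majority() is a hypothetical helper, demonstrated on a toy table rather than the real one:

```r
# Hypothetical helper: put the clusters (rows) in the order of their majority
# digit (column), so the largest cells form a rough diagonal in a mosaic plot.
# Clusters sharing a majority digit keep their original relative order.
order_by_majority <- function(tb) {
  majority_col <- apply(tb, 1, which.max)
  tb[order(majority_col), , drop = FALSE]
}

# Toy 3 x 3 table standing in for the cluster-by-digit table
toy <- matrix(c(0, 5, 1,
                6, 0, 1,
                1, 1, 7),
              nrow = 3, byrow = TRUE,
              dimnames = list(cluster = c("a", "b", "c"),
                              digit = c("x", "y", "z")))
order_by_majority(toy)  # rows in the order b, a, c
```

Applied to the table above: mosaicplot(order_by_majority(tb), xlab = "Clusters", ylab = "Digits", shade = TRUE). The shade = TRUE option draws an extended mosaic plot that colours each cell by its residual under an independence model, which highlights the cluster/digit pairs that occur far more often than chance.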
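From the table and the mosaic plot, most clusters are dominated by a single digit: Cluster 1 is mostly 4s (112 of its 124 images), Cluster 2 mostly 2s (100 of 173), Cluster 5 mostly 0s (146 of 170), Cluster 7 mostly 6s (118 of 143), Cluster 8 mostly 7s (119 of 196) and Cluster 9 mostly 8s (101 of 164). Other digits are split or mixed: the 1s are spread across Clusters 3, 4 and 8, Cluster 6 holds similar numbers of 9s (75) and 5s (72), and Cluster 10 combines 3s (105) with 5s (62).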