1 Objective

Mengetahui sebuah data termasuk segmen/cluster mana dari hasil kmeans() yang sudah ada. Hal ini bisa dibayangkan seperti melakukan prediksi dengan metode k-NN.

2 Data

set.seed(2019)
idx <- sample(1:150, 100)
iris_train <- iris[idx,]
iris_test <- iris[-idx,]

Data iris_train akan digunakan untuk membuat clustering dan memperoleh centroid dari hasil kmenas(). Setelah itu data iris_test akan di-predict masuk ke cluster mana.

3 Segmentation with k-means

set.seed(2019)
cls <- kmeans(x = iris_train[,-5], centers = 3, nstart = 25, iter.max = 1000)
cls$size
[1] 35 41 24
cls$withinss
[1] 10.19200 28.05512 12.89750
cls$tot.withinss
[1] 51.14462
cls$centers
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     4.982857    3.408571     1.480000   0.2657143
2     5.963415    2.743902     4.412195   1.4365854
3     6.704167    3.050000     5.670833   2.1083333
table(iris_train$Species, cls$cluster)
            
              1  2  3
  setosa     35  0  0
  versicolor  0 31  1
  virginica   0 10 23

4 Predict

Selanjutnya kita akan menduga cluster dari masing-masing data pada iris_train berdasarkan hasil diatas. Kita harus membuat fungsi terlebih dahulu. Untuk menentukan cluster dari data baru ini, kita akan menghitung jaraknya terhadap centroid menggunakan jarak Euclidean.

\[ d = \sqrt{\sum_{i = 1}^n (x_i - y_i)^2} \]

pred.cluster <- function(x, centers) {
  # compute euclidean distance from each sample to each cluster center
  tmp <- sapply(seq_len(nrow(x)),
                function(i) apply(centers, 1,
                                  function(v) sqrt(sum((x[i, ]-v)^2))))
  max.col(-t(tmp))  # find index of min distance
}
center <- cls$centers
cluster <- pred.cluster(x = iris_test[,-5], centers = center)
table(iris_test$Species, cluster)
            cluster
              1  2  3
  setosa     15  0  0
  versicolor  0 17  1
  virginica   0  3 14
LS0tDQp0aXRsZTogICInUHJlZGljdCcgVGhlIENsdXN0ZXIgRm9yIE5ldyBEYXRhIg0KYXV0aG9yOiAiQnkgQWVwIEhpZGF5YXR1bG9oIg0KZGF0ZTogICAiMjAxOSBBdWd1c3QgMTYiDQpvdXRwdXQ6IA0KICBodG1sX25vdGVib29rOg0KICAgIG51bWJlcl9zZWN0aW9uczogeWVzDQogICAgdGhlbWU6IHNwYWNlbGFiDQogICAgZGZfcHJpbnQ6IHBhZ2VkDQogICAgdG9jOiB5ZXMNCiAgICB0b2NfZGVwdGg6IDQNCiAgICB0b2NfZmxvYXQ6IHRydWUNCi0tLQ0KDQo8c3R5bGUgdHlwZT0idGV4dC9jc3MiPg0KDQpib2R5eyAvKiBOb3JtYWwgICovDQogICAgICBmb250LXNpemU6IDE0cHg7DQogIH0NCnRkIHsgIC8qIFRhYmxlICAqLw0KICBmb250LXNpemU6IDEycHg7DQp9DQpoMS50aXRsZSB7DQogIGZvbnQtc2l6ZTogMzhweDsNCiAgY29sb3I6IGxpZ2h0Ymx1ZTsNCiAgZm9udC13ZWlnaHQ6IGJvbGQ7DQp9DQpoMSB7IC8qIEhlYWRlciAxICovDQogIGZvbnQtc2l6ZTogMjRweDsNCiAgY29sb3I6IERhcmtCbHVlOw0KfQ0KaDIgeyAvKiBIZWFkZXIgMiAqLw0KICBmb250LXNpemU6IDIwcHg7DQogIGNvbG9yOiBEYXJrQmx1ZTsNCn0NCmgzIHsgLyogSGVhZGVyIDMgKi8NCiAgZm9udC1zaXplOiAxNnB4Ow0KIyAgZm9udC1mYW1pbHk6ICJUaW1lcyBOZXcgUm9tYW4iLCBUaW1lcywgc2VyaWY7DQogIGNvbG9yOiBEYXJrQmx1ZTsNCn0NCmg0IHsgLyogSGVhZGVyIDQgKi8NCiAgZm9udC1zaXplOiAxNHB4Ow0KICBjb2xvcjogRGFya0JsdWU7DQp9DQpjb2RlLnJ7IC8qIENvZGUgYmxvY2sgKi8NCiAgICBmb250LXNpemU6IDEycHg7DQp9DQpwcmUgeyAvKiBDb2RlIGJsb2NrIC0gZGV0ZXJtaW5lcyBjb2RlIHNwYWNpbmcgYmV0d2VlbiBsaW5lcyAqLw0KICAgIGZvbnQtc2l6ZTogMTJweDsNCn0NCjwvc3R5bGU+DQoNCg0KYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0V9DQoja25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG8gPSBUUlVFKQ0Ka25pdHI6Om9wdHNfY2h1bmskc2V0KGVjaG89VFJVRSwgcmVzdWx0cz0naG9sZCcsIHdhcm5pbmc9RkFMU0UsIGZpZy5zaG93PSdob2xkJywgbWVzc2FnZT1GQUxTRSkgDQpvcHRpb25zKHNjaXBlbiA9IDk5KQ0KYGBgDQoNCg0KIyBPYmplY3RpdmUNCg0KTWVuZ2V0YWh1aSBzZWJ1YWggZGF0YSB0ZXJtYXN1ayBzZWdtZW4vY2x1c3RlciBtYW5hIGRhcmkgaGFzaWwgYGttZWFucygpYCB5YW5nIHN1ZGFoIGFkYS4gSGFsIGluaSBiaXNhIGRpYmF5YW5na2FuIHNlcGVydGkgbWVsYWt1a2FuIHByZWRpa3NpIGRlbmdhbiBtZXRvZGUgay1OTi4NCg0KIyBEYXRhDQoNCmBgYHtyfQ0Kc2V0LnNlZWQoMjAxOSkNCmlkeCA8LSBzYW1wbGUoMToxNTAsIDEwMCkNCmlyaXNfdHJhaW4gPC0gaXJpc1tpZHgsXQ0KaXJpc190ZXN0IDwtIGlyaXNbLWlkeCxdDQpgYGANCg0KRGF0YSBgaXJpc190cmFpbmAgYWthbiBkaWd1bmFrYW4gdW50dWsgbWVtYnVhdCBjbHVzdGVyaW5nIGRhbiBtZW1wZXJvbGVoIGNlbnRyb2lkIGRhcmkgaGFzaWwgYGttZW5hcygpYC4gU2V0ZWxhaCBpdHUgZGF0YSBgaXJpc190ZXN0YCBha2FuIGRpLWBwcmVkaWN0YCBtYXN1ayBrZSBjbHVzdGVyIG1hbmEuDQoNCiMgU2VnbWVudGF0aW9uIHdpdGggay1tZWFucw0KDQpgYGB7cn0NCiMgc2V0LnNlZWQoMjAxOSkNCmNscyA8LSBrbWVhbnMoeCA9IGlyaXNfdHJhaW5bLC01XSwgY2VudGVycyA9IDMsIG5zdGFydCA9IDI1LCBpdGVyLm1heCA9IDEwMDApDQpjbHMkc2l6ZQ0KY2xzJHdpdGhpbnNzDQpjbHMkdG90LndpdGhpbnNzDQpjbHMkY2VudGVycw0KdGFibGUoaXJpc190cmFpbiRTcGVjaWVzLCBjbHMkY2x1c3RlcikNCmBgYA0KDQojIFByZWRpY3QNCg0KU2VsYW5qdXRueWEga2l0YSBha2FuIG1lbmR1Z2EgY2x1c3RlciBkYXJpIG1hc2luZy1tYXNpbmcgZGF0YSBwYWRhIGBpcmlzX3RyYWluYCBiZXJkYXNhcmthbiBoYXNpbCBkaWF0YXMuIEtpdGEgaGFydXMgbWVtYnVhdCBmdW5nc2kgdGVybGViaWggZGFodWx1LiBVbnR1ayBtZW5lbnR1a2FuIGNsdXN0ZXIgZGFyaSBkYXRhIGJhcnUgaW5pLCBraXRhIGFrYW4gbWVuZ2hpdHVuZyBqYXJha255YSB0ZXJoYWRhcCBjZW50cm9pZCBtZW5nZ3VuYWthbiBqYXJhayBgRXVjbGlkZWFuYC4NCg0KJCQNCmQgPSBcc3FydHtcc3VtX3tpID0gMX1ebiAoeF9pIC0geV9pKV4yfQ0KJCQNCg0KYGBge3J9DQpwcmVkLmNsdXN0ZXIgPC0gZnVuY3Rpb24oeCwgY2VudGVycykgew0KICAjIGNvbXB1dGUgZXVjbGlkZWFuIGRpc3RhbmNlIGZyb20gZWFjaCBzYW1wbGUgdG8gZWFjaCBjbHVzdGVyIGNlbnRlcg0KICB0bXAgPC0gc2FwcGx5KHNlcV9sZW4obnJvdyh4KSksDQogICAgICAgICAgICAgICAgZnVuY3Rpb24oaSkgYXBwbHkoY2VudGVycywgMSwNCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBmdW5jdGlvbih2KSBzcXJ0KHN1bSgoeFtpLCBdLXYpXjIpKSkpDQogIG1heC5jb2woLXQodG1wKSkgICMgZmluZCBpbmRleCBvZiBtaW4gZGlzdGFuY2UNCn0NCmBgYA0KDQpgYGB7cn0NCmNlbnRlciA8LSBjbHMkY2VudGVycw0KY2x1c3RlciA8LSBwcmVkLmNsdXN0ZXIoeCA9IGlyaXNfdGVzdFssLTVdLCBjZW50ZXJzID0gY2VudGVyKQ0KdGFibGUoaXJpc190ZXN0JFNwZWNpZXMsIGNsdXN0ZXIpDQpgYGANCg0K