Clustering with k-means and hclust

Let’s go over the slides for Chapter 5 of DataCamp’s Intro to Machine Learning.

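library(dplyr)    # provides %>% and select()
library(ggplot2)  # used for the scatter plot further down
# Drop the Species label so only the four numeric measurements remain
iris_numeric <- iris %>% select(-Species)
# Fix the seed so the random starts are reproducible
set.seed(1)
km1 <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
# OR, equivalently, using the data frame built above
set.seed(1)
km2 <- kmeans(iris_numeric, centers = 3, nstart = 10)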
km1
K-means clustering with 3 clusters of sizes 50, 38, 62

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.006000    3.428000     1.462000    0.246000
2     6.850000    3.073684     5.742105    2.071053
3     5.901613    2.748387     4.393548    1.433871

Clustering vector:
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [75] 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 2 3 2 2 2 2 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2
[149] 2 3

Within cluster sum of squares by cluster:
[1] 15.15100 23.87947 39.82097
 (between_SS / total_SS =  88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"      
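The `between_SS / total_SS` line in the printout can be recomputed from the components listed above. The following is a minimal sketch in base R, assuming the same `kmeans()` call as in these notes; the names `km` and `ratio` are introduced here only for illustration.

```r
# Refit with the same settings as km1 (km is an illustrative name)
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# The total sum of squares splits into a within-cluster and a between-cluster part
km$totss
km$tot.withinss + km$betweenss

# tot.withinss is the sum of the per-cluster values in withinss
sum(km$withinss)

# Share of the total variation accounted for by the cluster assignment
ratio <- km$betweenss / km$totss
round(100 * ratio, 1)
```

A ratio close to 100% suggests the points lie close to their cluster centers relative to the overall spread of the data.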
km1$cluster
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [75] 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 2 3 2 2 2 2 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2
[149] 2 3
table(iris$Species, km1$cluster)
            
              1  2  3
  setosa     50  0  0
  versicolor  0  2 48
  virginica   0 36 14
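In the table above the largest count in each cluster is 50, 36 and 48, so 134 of the 150 flowers (about 89%) sit in a cluster dominated by their own species. Setosa is separated perfectly, while versicolor and virginica overlap. A rough sketch of that calculation follows. It takes the column maximum instead of the diagonal because k-means cluster numbers are arbitrary, and the names `km`, `tab`, `majority` and `purity` are illustrative, not part of the original notes.

```r
# Refit with the same settings as km1 (km is an illustrative name)
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
tab <- table(iris$Species, km$cluster)

# For each cluster (column), the largest count belongs to the dominant species
majority <- apply(tab, 2, max)

# Number of flowers that match the dominant species of their cluster
agree <- sum(majority)

# Fraction of all flowers that match the dominant species of their cluster
purity <- agree / sum(tab)
purity
```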
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, col=Species)) +
  geom_point()

iris
d <- dist(iris_numeric)
hier <- hclust(d)
plot(hier)
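The dendrogram by itself does not assign observations to groups. One common way to get group labels is to cut the tree with `cutree()`. The sketch below assumes three groups, to match the k-means run earlier; `hc_clusters` is an illustrative name.

```r
d <- dist(iris[, 1:4])   # Euclidean distances between all pairs of rows
hier <- hclust(d)        # the default linkage method is "complete"

# Cut the dendrogram so that exactly three groups remain
hc_clusters <- cutree(hier, k = 3)

# Compare the groups with the known species, as was done for k-means
table(iris$Species, hc_clusters)
```

The resulting table can be read the same way as the k-means one: each column is a cluster and each row a species.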

nrow(iris)
[1] 150
iris %>% head()

Scaling and standardizing data

scale(iris_numeric) %>% head()
     Sepal.Length Sepal.Width Petal.Length Petal.Width
[1,]   -0.8976739  1.01560199    -1.335752   -1.311052
[2,]   -1.1392005 -0.13153881    -1.335752   -1.311052
[3,]   -1.3807271  0.32731751    -1.392399   -1.311052
[4,]   -1.5014904  0.09788935    -1.279104   -1.311052
[5,]   -1.0184372  1.24503015    -1.335752   -1.311052
[6,]   -0.5353840  1.93331463    -1.165809   -1.048667
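With its default arguments, `scale()` subtracts each column's mean and divides by that column's standard deviation. The sketch below reproduces this by hand in base R as a check; `x`, `centered` and `manual` are illustrative names.

```r
x <- as.matrix(iris[, 1:4])
scaled <- scale(x)

# Step 1: subtract each column's mean
centered <- sweep(x, 2, colMeans(x), "-")

# Step 2: divide each column by its standard deviation
manual <- sweep(centered, 2, apply(x, 2, sd), "/")

# The manual result matches scale(), ignoring the attributes scale() attaches
isTRUE(all.equal(manual, scaled, check.attributes = FALSE))

# After scaling, every column has mean 0 (up to rounding) and standard deviation 1
round(colMeans(scaled), 10)
apply(scaled, 2, sd)
```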
mtcars %>% head()
scale(mtcars) %>% head()
                         mpg        cyl        disp         hp       drat           wt       qsec         vs         am       gear       carb
Mazda RX4          0.1508848 -0.1049878 -0.57061982 -0.5350928  0.5675137 -0.610399567 -0.7771651 -0.8680278  1.1899014  0.4235542  0.7352031
Mazda RX4 Wag      0.1508848 -0.1049878 -0.57061982 -0.5350928  0.5675137 -0.349785269 -0.4637808 -0.8680278  1.1899014  0.4235542  0.7352031
Datsun 710         0.4495434 -1.2248578 -0.99018209 -0.7830405  0.4739996 -0.917004624  0.4260068  1.1160357  1.1899014  0.4235542 -1.1221521
Hornet 4 Drive     0.2172534 -0.1049878  0.22009369 -0.5350928 -0.9661175 -0.002299538  0.8904872  1.1160357 -0.8141431 -0.9318192 -1.1221521
Hornet Sportabout -0.2307345  1.0148821  1.04308123  0.4129422 -0.8351978  0.227654255 -0.4637808 -0.8680278 -0.8141431 -0.9318192 -0.5030337
Valiant           -0.3302874 -0.1049878 -0.04616698 -0.6080186 -1.5646078  0.248094592  1.3269868  1.1160357 -0.8141431 -0.9318192 -1.1221521
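These notes do not say why scaling appears in a clustering lesson. A plausible reason is that `dist()` and `kmeans()` work on raw Euclidean distances, so columns measured in large units can dominate the result. The sketch below illustrates this on `mtcars` under that assumption; the names `scaled_cars`, `hier_raw` and `hier_scaled`, and the choice of three groups, are illustrative.

```r
# Standard deviation of each raw column: disp and hp are on much larger scales
raw_sd <- apply(mtcars, 2, sd)
round(raw_sd, 2)

# After scaling, every column has standard deviation 1
scaled_cars <- scale(mtcars)
apply(scaled_cars, 2, sd)

# Hierarchical clustering on raw versus scaled data
hier_raw    <- hclust(dist(mtcars))
hier_scaled <- hclust(dist(scaled_cars))

# Cross-tabulate the three-group cuts to see how much the grouping changes
table(raw = cutree(hier_raw, k = 3), scaled = cutree(hier_scaled, k = 3))
```

If the two groupings differ noticeably, that is a hint that the unscaled result was driven mainly by the large-unit columns.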

Final Quiz

Next week we’ll have the final quiz. It will cover clustering.
