Let’s go over the DataCamp Intro to Machine Learning Chapter 5 slides.
library(dplyr)    # provides %>% and select()
library(ggplot2)  # provides ggplot(), used further down
iris_numeric <- iris %>% select(-Species)
set.seed(1)
km1 <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
# OR, equivalently, using the data frame with Species dropped:
set.seed(1)
km2 <- kmeans(iris_numeric, centers = 3, nstart = 10)
km1
K-means clustering with 3 clusters of sizes 50, 38, 62
Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.006000    3.428000     1.462000    0.246000
2     6.850000    3.073684     5.742105    2.071053
3     5.901613    2.748387     4.393548    1.433871
Clustering vector:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[75] 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 2 3 2 2 2 2 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2
[149] 2 3
Within cluster sum of squares by cluster:
[1] 15.15100 23.87947 39.82097
(between_SS / total_SS = 88.4 %)
Available components:
[1] "cluster" "centers" "totss" "withinss" "tot.withinss" "betweenss" "size" "iter" "ifault"
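The percentage in the printout can be recomputed from the listed components. This is a minimal sketch, assuming the same seed and settings as above. The names `km` and `ratio` are only for illustration and are not part of the original notes.

```r
# Refit with the same settings as km1 (assumed; not part of the original notes).
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# The total sum of squares splits into a within-cluster and a between-cluster part.
all.equal(km$totss, km$tot.withinss + km$betweenss)  # expected to be TRUE

# Share of the total variation explained by the cluster assignment.
ratio <- km$betweenss / km$totss
round(100 * ratio, 1)  # should match the percentage in the printout above
```

A higher ratio means the clusters are tighter relative to the overall spread, but it always grows as `centers` increases, so it is not enough on its own to pick the number of clusters.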
km1$cluster
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[75] 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 2 2 2 2 3 2 2 2 2 2 2 3 3 2 2 2 2 3 2 3 2 3 2 2 3 3 2 2 2 2 2 3 2 2 2 2 3 2 2 2 3 2 2 2 3 2
[149] 2 3
table(iris$Species, km1$cluster)
              1  2  3
  setosa     50  0  0
  versicolor  0  2 48
  virginica   0 36 14
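One way to summarise this table in a single number is cluster purity: for each cluster, count its most common species, then divide the total by the number of rows. This is a rough sketch, not something from the slides. The names `km`, `tab` and `purity` are illustrative.

```r
# Refit with the same settings as km1 (assumed; not part of the original notes).
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# Rows are the true species, columns are the k-means cluster labels.
tab <- table(iris$Species, km$cluster)

# For each cluster (column), take the count of its most common species,
# then divide by the total number of observations.
purity <- sum(apply(tab, 2, max)) / sum(tab)
purity
```

With the table shown above this works out to (50 + 36 + 48) / 150, roughly 0.89. Cluster labels are arbitrary, so the columns may appear in a different order on another run, but purity does not depend on the labelling.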
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, col = Species)) +
  geom_point()
iris
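d <- dist(iris_numeric)   # Euclidean distances between all pairs of rows
hier <- hclust(d)         # hierarchical clustering (complete linkage by default)
plot(hier)                # draw the dendrogram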
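The dendrogram itself does not assign cluster labels. One possible follow-up, sketched here with base R only, is to cut the tree into three groups with `cutree()` and cross-tabulate them against the species, as was done for k-means. Choosing `k = 3` is an assumption made to match the k-means run. The names `d_iris`, `hier_iris` and `groups` are illustrative.

```r
# Same steps as above, rebuilt here so the snippet runs on its own.
d_iris <- dist(iris[, 1:4])
hier_iris <- hclust(d_iris)          # default method is "complete"

# Cut the tree so that exactly three groups remain (k = 3 is an assumption).
groups <- cutree(hier_iris, k = 3)

# Compare the hierarchical groups with the true species.
table(iris$Species, groups)
```

Passing `h =` instead of `k =` to `cutree()` cuts at a given height on the dendrogram rather than at a given number of groups.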
nrow(iris)
[1] 150
iris %>% head()
scale(iris_numeric) %>% head()
     Sepal.Length Sepal.Width Petal.Length Petal.Width
[1,]   -0.8976739  1.01560199    -1.335752   -1.311052
[2,]   -1.1392005 -0.13153881    -1.335752   -1.311052
[3,]   -1.3807271  0.32731751    -1.392399   -1.311052
[4,]   -1.5014904  0.09788935    -1.279104   -1.311052
[5,]   -1.0184372  1.24503015    -1.335752   -1.311052
[6,]   -0.5353840  1.93331463    -1.165809   -1.048667
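By default, `scale()` subtracts each column's mean and divides by that column's standard deviation. The sketch below does the same thing by hand to make this explicit. The names `x` and `manual` are illustrative.

```r
x <- iris[, 1:4]

# Centre each column on its mean, then divide by its standard deviation.
centered <- sweep(as.matrix(x), 2, colMeans(x), "-")
manual   <- sweep(centered, 2, apply(x, 2, sd), "/")

# Same numbers as scale(); only the extra attributes differ.
all.equal(manual, scale(x), check.attributes = FALSE)

# After scaling, every column has mean 0 (up to rounding) and sd 1.
round(colMeans(manual), 10)
apply(manual, 2, sd)
```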
mtcars %>% head()
scale(mtcars) %>% head()
                         mpg        cyl        disp         hp       drat           wt       qsec         vs         am       gear       carb
Mazda RX4          0.1508848 -0.1049878 -0.57061982 -0.5350928  0.5675137 -0.610399567 -0.7771651 -0.8680278  1.1899014  0.4235542  0.7352031
Mazda RX4 Wag      0.1508848 -0.1049878 -0.57061982 -0.5350928  0.5675137 -0.349785269 -0.4637808 -0.8680278  1.1899014  0.4235542  0.7352031
Datsun 710         0.4495434 -1.2248578 -0.99018209 -0.7830405  0.4739996 -0.917004624  0.4260068  1.1160357  1.1899014  0.4235542 -1.1221521
Hornet 4 Drive     0.2172534 -0.1049878  0.22009369 -0.5350928 -0.9661175 -0.002299538  0.8904872  1.1160357 -0.8141431 -0.9318192 -1.1221521
Hornet Sportabout -0.2307345  1.0148821  1.04308123  0.4129422 -0.8351978  0.227654255 -0.4637808 -0.8680278 -0.8141431 -0.9318192 -0.5030337
Valiant           -0.3302874 -0.1049878 -0.04616698 -0.6080186 -1.5646078  0.248094592  1.3269868  1.1160357 -0.8141431 -0.9318192 -1.1221521
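Scaling matters for clustering because `dist()` adds up squared differences across columns, so a column measured in large units (such as `disp`) can dominate one measured in small units (such as `wt`). The sketch below compares the column spreads before and after scaling, then clusters the scaled data. Applying `hclust()` to `mtcars` is an illustration of my own, not something taken from the slides, and the names `raw_sd`, `scaled_cars` and `hier_cars` are illustrative.

```r
# Spread of each raw column: the units differ a lot between variables.
raw_sd <- sapply(mtcars, sd)
round(raw_sd, 2)

# After scaling, every column has standard deviation 1.
scaled_cars <- scale(mtcars)
apply(scaled_cars, 2, sd)

# Hierarchical clustering on the scaled data (illustrative, not from the slides).
hier_cars <- hclust(dist(scaled_cars))
plot(hier_cars)
```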
Next week we’ll have the final quiz. It will be about clustering.