kmeans(x, centers, iter.max = 10, nstart = 1)

- x: a scaled numeric data matrix (does not work with a distance object)
- centers: integer value for the number of clusters
- iter.max = 10: how many iterations to perform (default 10); recommend increasing
- nstart = 1: number of random starting points (default 1); recommend increasing

| Criterion | k-means | PAM |
|---|---|---|
| Cluster center | Centroid (mean of points) | Medoid (actual point) |
| Optimization goal | Minimize sum of squared Euclidean distances to cluster centroid | Minimize sum of dissimilarities between points and their cluster medoid |
| Speed | Fast; scales to large data | Slower; needs all pairwise dissimilarities |
| Robust to outliers | No | Yes |
| Distance metric | Euclidean only | Any |
| Cluster shapes | Spherical/convex | Spherical/convex |
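
To make the cluster-center and distance-metric rows concrete, here is a minimal sketch. It uses the built-in USArrests data as a stand-in (an assumption, not the data from these notes), and the values k = 3, nstart = 25, iter.max = 50 and the Manhattan metric are illustrative choices only.

```r
library(cluster)  # provides pam(); a recommended package bundled with most R installs

set.seed(42)            # kmeans() uses random starting points
x <- scale(USArrests)   # stand-in data, standardized column by column

# k-means needs the numeric matrix itself; extra starts and iterations
# reduce the risk of stopping at a poor local optimum.
km <- kmeans(x, centers = 3, iter.max = 50, nstart = 25)

# PAM also accepts a dissimilarity object, so any metric can be used.
pm <- pam(dist(x, method = "manhattan"), k = 3)

km$centers              # centroids: per-cluster column means
rownames(x)[pm$id.med]  # medoids: actual observations (state names)
```

The centroids are averages and generally do not coincide with any row of the data, while each medoid is one of the observations.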
cluster package

pam(x, k, nstart)

- x: data matrix or dissimilarity matrix!
- k: number of clusters to identify
- nstart: number of random initial medoids to initialize the algorithm

multishapes data

The multishapes data set is in the factoextra package:

x y shape
1 -0.8037393 -0.8530526 1
2 0.8528507 0.3676184 1
3 0.9271795 -0.2749024 1
4 -0.7526261 -0.5115652 1
5 0.7068462 0.8106792 1
6 1.0346985 0.3946550 1
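
Both methods favor convex clusters, so ring-like shapes are a hard case for them. The sketch below is an illustration under stated assumptions: it uses two synthetic concentric rings generated in base R (hypothetical data, not multishapes itself) and k = 2. Because every point is assigned to its nearest centroid or medoid, the two clusters are separated by a straight line, so the outer ring ends up split between them instead of being recovered as one group.

```r
library(cluster)  # pam()

set.seed(1)
n <- 200                                   # points per ring
theta  <- runif(2 * n, 0, 2 * pi)          # random angles
radius <- rep(c(1, 5), each = n) + rnorm(2 * n, sd = 0.1)
rings  <- cbind(x = radius * cos(theta), y = radius * sin(theta))
truth  <- rep(1:2, each = n)               # 1 = inner ring, 2 = outer ring

km <- kmeans(rings, centers = 2, nstart = 25)
pm <- pam(rings, k = 2)

# Cross-tabulate the true ring against the assigned cluster
table(truth, kmeans = km$cluster)
table(truth, pam = pm$clustering)
```

In both tables the outer-ring row has counts in both cluster columns, which is the signature of a non-convex group being cut in two.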

Applying kmeans to multishapes data

Using kmeans to identify 5 clusters:

Applying pam to multishapes data

Using pam to identify 5 clusters:

library(tidyverse)
drugs <- read.csv('Data/IllicitDrug.csv') %>%
  column_to_rownames('State')
head(drugs)

DrugUse BingeDrink Poverty HSdrop Income
Alabama 3.3 15.6 14.5 12.6 22946
Alaska 8.5 19.8 9.4 10.9 28523
Arizona 5.4 17.4 16.6 14.4 25307
Arkansas 2.8 16.8 14.8 11.4 22114
California 6.2 16.7 15.4 14.2 29819
Colorado 6.5 19.8 9.2 9.8 31678
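
The code that follows uses an object called drugs_scaled, whose creation is not shown here. Since Income is on a far larger scale than the other columns, it was presumably produced by standardizing drugs, for example with scale(); that is an assumption. The sketch below applies scale() to just the six rows printed above (the name drugs_head is hypothetical), so the resulting values will differ from those obtained on the full data.

```r
# The six rows printed by head(drugs), entered by hand for illustration
drugs_head <- data.frame(
  DrugUse    = c(3.3, 8.5, 5.4, 2.8, 6.2, 6.5),
  BingeDrink = c(15.6, 19.8, 17.4, 16.8, 16.7, 19.8),
  Poverty    = c(14.5, 9.4, 16.6, 14.8, 15.4, 9.2),
  HSdrop     = c(12.6, 10.9, 14.4, 11.4, 14.2, 9.8),
  Income     = c(22946, 28523, 25307, 22114, 29819, 31678),
  row.names  = c("Alabama", "Alaska", "Arizona",
                 "Arkansas", "California", "Colorado")
)

# scale() centers each column at 0 and divides by its standard deviation,
# so Income no longer dominates Euclidean distances.
drugs_head_scaled <- scale(drugs_head)

round(colMeans(drugs_head_scaled), 10)   # column means after scaling
apply(drugs_head_scaled, 2, sd)          # column standard deviations after scaling
```

On the full data the analogous call would be scale(drugs), again assuming that is how drugs_scaled was built.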
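library(factoextra)  # fviz_nbclust(), fviz_pca()
library(cluster)     # pam()
library(patchwork)   # lets `+` place ggplot objects side by side

# Elbow (WSS) and average-silhouette plots for k-means and PAM.
# drugs_scaled is assumed to be the standardized drugs data.
# Each labs() call applies to the plot added just before it.
fviz_nbclust(drugs_scaled,
             FUNcluster = kmeans,
             method = 'wss') +
  labs(title = 'Plot of WSS vs k using kmeans') +
  fviz_nbclust(drugs_scaled,
               FUNcluster = kmeans,
               method = 'silhouette') +
  labs(title = 'Plot of avg silhouette vs k using kmeans') +
  fviz_nbclust(drugs_scaled,
               FUNcluster = pam,
               method = 'wss') +
  labs(title = 'Plot of WSS vs k using PAM') +
  fviz_nbclust(drugs_scaled,
               FUNcluster = pam,
               method = 'silhouette') +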
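  labs(title = 'Plot of avg silhouette vs k using PAM')

# PCA biplots colored by cluster membership. kmeans4, pam4 and drug_pca
# are assumed to have been created earlier (4-cluster fits and a PCA).
kmeans_biplot <- fviz_pca(drug_pca,
                          habillage = factor(kmeans4$cluster),
                          repel = TRUE) +
  ggtitle('K-means 4-cluster solution') +
  guides(color = 'none', shape = 'none')
pam_biplot <- fviz_pca(drug_pca,
                       habillage = factor(pam4$clustering),
                       repel = TRUE) +
  ggtitle('PAM 4-cluster solution') +
  guides(color = 'none', shape = 'none')
# Show the two biplots side by side (requires patchwork)
kmeans_biplot + pam_biplot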
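
For readers who want the numbers behind plots like these, the following is a rough sketch of how the total within-cluster sum of squares and the average silhouette width can be computed for a range of k, and of how two 4-cluster solutions can be compared with a cross-tabulation. It is not the factoextra implementation. The built-in USArrests data stands in for the drug data, and the names km4 and pm4, the range k = 2 to 8, and nstart = 25 are all illustrative assumptions.

```r
library(cluster)  # pam() and silhouette()

set.seed(2024)
x  <- scale(USArrests)  # stand-in data, standardized
d  <- dist(x)           # Euclidean distances, needed by silhouette()
ks <- 2:8               # candidate numbers of clusters

# Total within-cluster sum of squares for k-means (the elbow criterion)
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 25)$tot.withinss
})

# Average silhouette width for k-means and for PAM
kmeans_sil <- sapply(ks, function(k) {
  cl <- kmeans(x, centers = k, nstart = 25)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})
pam_sil <- sapply(ks, function(k) pam(x, k = k)$silinfo$avg.width)

data.frame(k = ks, wss = wss, kmeans_sil = kmeans_sil, pam_sil = pam_sil)

# Compare the two 4-cluster solutions: rows are k-means clusters,
# columns are PAM clusters (cluster labels themselves are arbitrary).
km4 <- kmeans(x, centers = 4, nstart = 25)
pm4 <- pam(x, k = 4)
table(kmeans = km4$cluster, pam = pm4$clustering)
```

A table with one dominant cell per row indicates that the two methods largely agree, up to a relabeling of the clusters.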