Group 14:
Kartika Nur Savira (24031554049)
Tara Tabriza Rachman (24031554107)
Dataset: https://www.kaggle.com/datasets/valakhorasani/gym-members-exercise-dataset
This notebook compares five clustering algorithms on the gym members dataset: K-means, K-medians, DBSCAN, Mean Shift, and Fuzzy C-means. The data are processed with PCA before clustering to reduce dimensionality and remove multicollinearity among the features.
install.packages("psych")
install.packages("tidyverse")
install.packages("flexclust")
install.packages("dbscan")
install.packages("meanShiftR")
install.packages("e1071")
install.packages("cluster")
install.packages("fpc")
install.packages("mclust")
install.packages("factoextra")
install.packages("gridExtra")
install.packages("corrplot")
# Import Data
data <- read.csv("/Users/savv/gym_members_exercise_tracking.csv")
knitr::kable(head(data))
| Age | Gender | Weight..kg. | Height..m. | Max_BPM | Avg_BPM | Resting_BPM | Session_Duration..hours. | Calories_Burned | Workout_Type | Fat_Percentage | Water_Intake..liters. | Workout_Frequency..days.week. | Experience_Level | BMI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 56 | Male | 88.3 | 1.71 | 180 | 157 | 60 | 1.69 | 1313 | Yoga | 12.6 | 3.5 | 4 | 3 | 30.20 |
| 46 | Female | 74.9 | 1.53 | 179 | 151 | 66 | 1.30 | 883 | HIIT | 33.9 | 2.1 | 4 | 2 | 32.00 |
| 32 | Female | 68.1 | 1.66 | 167 | 122 | 54 | 1.11 | 677 | Cardio | 33.4 | 2.3 | 4 | 2 | 24.71 |
| 25 | Male | 53.2 | 1.70 | 190 | 164 | 56 | 0.59 | 532 | Strength | 28.8 | 2.1 | 3 | 1 | 18.41 |
| 38 | Male | 46.1 | 1.79 | 188 | 158 | 68 | 0.64 | 556 | Strength | 29.2 | 2.8 | 3 | 1 | 14.39 |
| 56 | Female | 58.0 | 1.68 | 168 | 156 | 74 | 1.59 | 1116 | HIIT | 15.5 | 2.7 | 5 | 3 | 20.55 |
## Dimensions: 973 x 15
## 'data.frame': 973 obs. of 15 variables:
## $ Age : int 56 46 32 25 38 56 36 40 28 28 ...
## $ Gender : chr "Male" "Female" "Female" "Male" ...
## $ Weight..kg. : num 88.3 74.9 68.1 53.2 46.1 ...
## $ Height..m. : num 1.71 1.53 1.66 1.7 1.79 1.68 1.72 1.51 1.94 1.84 ...
## $ Max_BPM : int 180 179 167 190 188 168 174 189 185 169 ...
## $ Avg_BPM : int 157 151 122 164 158 156 169 141 127 136 ...
## $ Resting_BPM : int 60 66 54 56 68 74 73 64 52 64 ...
## $ Session_Duration..hours. : num 1.69 1.3 1.11 0.59 0.64 1.59 1.49 1.27 1.03 1.08 ...
## $ Calories_Burned : num 1313 883 677 532 556 ...
## $ Workout_Type : chr "Yoga" "HIIT" "Cardio" "Strength" ...
## $ Fat_Percentage : num 12.6 33.9 33.4 28.8 29.2 15.5 21.3 30.6 28.9 29.7 ...
## $ Water_Intake..liters. : num 3.5 2.1 2.3 2.1 2.8 2.7 2.3 1.9 2.6 2.7 ...
## $ Workout_Frequency..days.week.: int 4 4 4 3 3 5 3 3 4 3 ...
## $ Experience_Level : int 3 2 2 1 1 3 2 2 2 1 ...
## $ BMI : num 30.2 32 24.7 18.4 14.4 ...
## Age Gender Weight..kg. Height..m.
## Min. :18.00 Length:973 Min. : 40.00 Min. :1.500
## 1st Qu.:28.00 Class :character 1st Qu.: 58.10 1st Qu.:1.620
## Median :40.00 Mode :character Median : 70.00 Median :1.710
## Mean :38.68 Mean : 73.85 Mean :1.723
## 3rd Qu.:49.00 3rd Qu.: 86.00 3rd Qu.:1.800
## Max. :59.00 Max. :129.90 Max. :2.000
## Max_BPM Avg_BPM Resting_BPM Session_Duration..hours.
## Min. :160.0 Min. :120.0 Min. :50.00 Min. :0.500
## 1st Qu.:170.0 1st Qu.:131.0 1st Qu.:56.00 1st Qu.:1.040
## Median :180.0 Median :143.0 Median :62.00 Median :1.260
## Mean :179.9 Mean :143.8 Mean :62.22 Mean :1.256
## 3rd Qu.:190.0 3rd Qu.:156.0 3rd Qu.:68.00 3rd Qu.:1.460
## Max. :199.0 Max. :169.0 Max. :74.00 Max. :2.000
## Calories_Burned Workout_Type Fat_Percentage Water_Intake..liters.
## Min. : 303.0 Length:973 Min. :10.00 Min. :1.500
## 1st Qu.: 720.0 Class :character 1st Qu.:21.30 1st Qu.:2.200
## Median : 893.0 Mode :character Median :26.20 Median :2.600
## Mean : 905.4 Mean :24.98 Mean :2.627
## 3rd Qu.:1076.0 3rd Qu.:29.30 3rd Qu.:3.100
## Max. :1783.0 Max. :35.00 Max. :3.700
## Workout_Frequency..days.week. Experience_Level BMI
## Min. :2.000 Min. :1.00 Min. :12.32
## 1st Qu.:3.000 1st Qu.:1.00 1st Qu.:20.11
## Median :3.000 Median :2.00 Median :24.16
## Mean :3.322 Mean :1.81 Mean :24.91
## 3rd Qu.:4.000 3rd Qu.:2.00 3rd Qu.:28.56
## Max. :5.000 Max. :3.00 Max. :49.84
Interpretation:
The dataset contains 973 gym members and 15 variables (13 numeric, 2 categorical: Gender and Workout_Type). The numeric features cover physiological data (Age, Weight, Height, BMI, Fat_Percentage, BPM) and workout behavior (Session_Duration, Calories_Burned, Water_Intake, Workout_Frequency, Experience_Level).
## Age Gender
## 0 0
## Weight..kg. Height..m.
## 0 0
## Max_BPM Avg_BPM
## 0 0
## Resting_BPM Session_Duration..hours.
## 0 0
## Calories_Burned Workout_Type
## 0 0
## Fat_Percentage Water_Intake..liters.
## 0 0
## Workout_Frequency..days.week. Experience_Level
## 0 0
## BMI
## 0
data_num <- data[, sapply(data, is.numeric)]
cat("Number of numeric features used:", ncol(data_num), "\n")
## Number of numeric features used: 13
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | X13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 56 | 88.3 | 1.71 | 180 | 157 | 60 | 1.69 | 1313 | 12.6 | 3.5 | 4 | 3 | 30.20 |
| 46 | 74.9 | 1.53 | 179 | 151 | 66 | 1.30 | 883 | 33.9 | 2.1 | 4 | 2 | 32.00 |
| 32 | 68.1 | 1.66 | 167 | 122 | 54 | 1.11 | 677 | 33.4 | 2.3 | 4 | 2 | 24.71 |
| 25 | 53.2 | 1.70 | 190 | 164 | 56 | 0.59 | 532 | 28.8 | 2.1 | 3 | 1 | 18.41 |
| 38 | 46.1 | 1.79 | 188 | 158 | 68 | 0.64 | 556 | 29.2 | 2.8 | 3 | 1 | 14.39 |
| 56 | 58.0 | 1.68 | 168 | 156 | 74 | 1.59 | 1116 | 15.5 | 2.7 | 5 | 3 | 20.55 |
Interpretation:
Thirteen numeric features are used and no column contains missing values, so the data can be processed without imputation.
The features are renamed X1 to X13 to simplify the subsequent steps.
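The missing-value check and the renaming described above can be sketched as follows. This is a minimal illustration on a three-row toy data frame whose values are copied from the first rows of the table; the object names `toy`, `na_counts`, and `toy_num` are hypothetical, and the notebook's own code for this step is not shown.

```r
# Toy data frame standing in for `data` (hypothetical; values from the first rows)
toy <- data.frame(
  Age = c(56, 46, 32),
  Gender = c("Male", "Female", "Female"),
  Weight..kg. = c(88.3, 74.9, 68.1),
  Height..m. = c(1.71, 1.53, 1.66)
)

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep the numeric columns only, then rename them X1, X2, ...
toy_num <- toy[, sapply(toy, is.numeric)]
colnames(toy_num) <- paste0("X", seq_len(ncol(toy_num)))
print(toy_num)
```

Generating the names with `paste0` keeps the renaming valid if the number of numeric columns changes.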
The variables are correlated with one another: no off-diagonal correlation is exactly 0, although many are close to 0.
## X1 X2 X3 X4 X5
## X1 1.000000000 -0.036339635 -0.027837495 -0.0170725970 0.0359691433
## X2 -0.036339635 1.000000000 0.365321203 0.0570611305 0.0097174780
## X3 -0.027837495 0.365321203 1.000000000 -0.0176598843 -0.0147762881
## X4 -0.017072597 0.057061130 -0.017659884 1.0000000000 -0.0397514432
## X5 0.035969143 0.009717478 -0.014776288 -0.0397514432 1.0000000000
## X6 0.004353714 -0.032138091 -0.005089864 0.0366474807 0.0596355022
## X7 -0.019911904 -0.013665561 -0.010205897 0.0100509814 0.0160144382
## X8 -0.154678760 0.095443473 0.086348051 0.0020900159 0.3396586672
## X9 0.002370051 -0.225511640 -0.235520936 -0.0090557315 -0.0073016551
## X10 0.041528359 0.394275710 0.393532902 0.0316206428 -0.0029106374
## X11 0.008055163 -0.011769328 -0.011269883 -0.0290990657 -0.0106807977
## X12 -0.018675927 0.003378528 -0.010266611 0.0005448337 -0.0008881572
## X13 -0.013691370 0.853157690 -0.159468750 0.0671052310 0.0216054995
## X6 X7 X8 X9 X10
## X1 0.004353714 -0.019911904 -0.154678760 0.002370051 0.041528359
## X2 -0.032138091 -0.013665561 0.095443473 -0.225511640 0.394275710
## X3 -0.005089864 -0.010205897 0.086348051 -0.235520936 0.393532902
## X4 0.036647481 0.010050981 0.002090016 -0.009055731 0.031620643
## X5 0.059635502 0.016014438 0.339658667 -0.007301655 -0.002910637
## X6 1.000000000 -0.016648808 0.016517951 -0.016834389 0.007725998
## X7 -0.016648808 1.000000000 0.908140376 -0.581519771 0.283410977
## X8 0.016517951 0.908140376 1.000000000 -0.597615248 0.356930683
## X9 -0.016834389 -0.581519771 -0.597615248 1.000000000 -0.588682834
## X10 0.007725998 0.283410977 0.356930683 -0.588682834 1.000000000
## X11 -0.007966891 0.644140366 0.576150125 -0.537059548 0.238562571
## X12 0.001757585 0.764768119 0.694129448 -0.654362613 0.304103549
## X13 -0.032542632 -0.006492647 0.059760826 -0.119257760 0.213696572
## X11 X12 X13
## X1 0.008055163 -0.0186759269 -0.013691370
## X2 -0.011769328 0.0033785279 0.853157690
## X3 -0.011269883 -0.0102666112 -0.159468750
## X4 -0.029099066 0.0005448337 0.067105231
## X5 -0.010680798 -0.0008881572 0.021605500
## X6 -0.007966891 0.0017575852 -0.032542632
## X7 0.644140366 0.7647681189 -0.006492647
## X8 0.576150125 0.6941294479 0.059760826
## X9 -0.537059548 -0.6543626129 -0.119257760
## X10 0.238562571 0.3041035494 0.213696572
## X11 1.000000000 0.8370787094 0.001644974
## X12 0.837078709 1.0000000000 0.016031073
## X13 0.001644974 0.0160310726 1.000000000
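KMO/MSA must exceed 0.5. The variable with the lowest MSA is removed one at a time (X1, then X5, X3, and X6) until every remaining item reaches an MSA of at least 0.5; the overall MSA rises from 0.47 to 0.72.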
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.47
## MSA for each item =
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
## 0.02 0.35 0.16 0.63 0.06 0.25 0.50 0.49 0.86 0.75 0.80 0.79 0.29
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.54
## MSA for each item =
## X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13
## 0.36 0.16 0.60 0.09 0.40 0.59 0.58 0.87 0.82 0.80 0.80 0.30
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.61
## MSA for each item =
## X2 X3 X4 X6 X7 X8 X9 X10 X11 X12 X13
## 0.36 0.16 0.54 0.19 0.74 0.75 0.87 0.83 0.80 0.79 0.30
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.72
## MSA for each item =
## X2 X4 X6 X7 X8 X9 X10 X11 X12 X13
## 0.51 0.53 0.19 0.74 0.75 0.85 0.73 0.80 0.78 0.50
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = r)
## Overall MSA = 0.72
## MSA for each item =
## X2 X4 X7 X8 X9 X10 X11 X12 X13
## 0.51 0.56 0.74 0.75 0.85 0.73 0.80 0.79 0.50
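The stepwise removal visible in the KMO outputs above could be automated roughly as follows. This is a base-R sketch under stated assumptions: `kmo_msa` and `drop_low_msa` are hypothetical helpers (the notebook itself calls `KMO()`), the data are synthetic, and the 0.5 cutoff is applied to unrounded values, so a borderline item may be treated differently than in the rounded printout.

```r
# Per-item and overall MSA from a correlation matrix (hypothetical helper)
kmo_msa <- function(r) {
  r_inv <- solve(r)
  # Anti-image (partial) correlations derived from the inverse matrix
  q <- -r_inv / sqrt(outer(diag(r_inv), diag(r_inv)))
  diag(q) <- 0
  r0 <- r
  diag(r0) <- 0
  item <- colSums(r0^2) / (colSums(r0^2) + colSums(q^2))
  overall <- sum(r0^2) / (sum(r0^2) + sum(q^2))
  list(overall = overall, item = item)
}

# Drop the lowest-MSA variable until all items reach the threshold
drop_low_msa <- function(x, threshold = 0.5) {
  repeat {
    msa <- kmo_msa(cor(x))$item
    if (min(msa) >= threshold || ncol(x) <= 2) break
    x <- x[, -which.min(msa), drop = FALSE]
  }
  x
}

# Synthetic data: three variables share a common factor, two are pure noise
set.seed(42)
n <- 200
f <- rnorm(n)
toy <- data.frame(
  A = f + rnorm(n, sd = 0.5),
  B = f + rnorm(n, sd = 0.5),
  C = f + rnorm(n, sd = 0.5),
  D = rnorm(n),
  E = rnorm(n)
)

res <- kmo_msa(cor(toy))
print(round(res$item, 2))
kept <- drop_low_msa(toy)
print(colnames(kept))
```

Removing one variable per iteration matters because every MSA value changes once a variable leaves the correlation matrix.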
Principal component analysis (PCA) is a multivariate technique that reduces a large number of variables to a smaller set of components.
Z-score scaling is needed because the feature ranges differ widely: Calories_Burned can reach the thousands, while Height (m) only spans 1.5 to 2.0. Without scaling, features with large ranges would dominate the distance computations and bias the clustering.
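A small sketch of the scaling step, assuming base R's `scale()` is used (the notebook's own scaling code is not shown). The toy values are the first six rows of Calories_Burned and Height from the table above; `toy` and `toy_scaled` are illustrative names.

```r
# First six rows of two features with very different ranges
toy <- data.frame(
  Calories_Burned = c(1313, 883, 677, 532, 556, 1116),
  Height = c(1.71, 1.53, 1.66, 1.70, 1.79, 1.68)
)

# Raw ranges: hundreds for calories, a fraction of a meter for height
print(sapply(toy, function(v) diff(range(v))))

# Z-score scaling: subtract the column mean, divide by the column sd
toy_scaled <- scale(toy)

# After scaling, every column has mean 0 and standard deviation 1
print(round(colMeans(toy_scaled), 10))
print(apply(toy_scaled, 2, sd))
```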
## [1] 3.97241779 2.01436367 0.99301511 0.82277961 0.55528999 0.30782414 0.14308312
## [8] 0.11605509 0.07517148
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.099617240 -0.65940372 -0.06538785 -0.14384144 -0.008399245
## [2,] -0.005696882 -0.09097688 0.99424603 0.02099783 0.044865893
## [3,] -0.440116953 0.15513574 0.04407065 -0.18577363 -0.440966682
## [4,] -0.435887810 0.07399296 0.02110284 -0.12288184 -0.571382876
## [5,] 0.408796215 0.08320719 0.02566220 -0.36646008 -0.141992160
## [6,] -0.271758072 -0.29072163 -0.04086506 0.73988720 0.001629442
## [7,] -0.400789286 0.15495238 -0.02254232 -0.23976481 0.589730005
## [8,] -0.446212227 0.14249754 0.01358778 -0.16818811 0.329125078
## [9,] -0.075090869 -0.62578718 -0.04231333 -0.40218961 0.026782003
## [,6] [,7] [,8] [,9]
## [1,] -0.002632990 0.35104856 -0.6091448359 -0.189544735
## [2,] -0.001620776 0.02530538 -0.0008635083 0.008245426
## [3,] -0.067321913 -0.10161042 0.1160223900 -0.723191661
## [4,] -0.073596872 0.24264945 -0.0301603762 0.630316021
## [5,] -0.796334024 -0.12359334 -0.1450612238 0.019569357
## [6,] -0.509513725 -0.14481995 0.1091713262 -0.005232390
## [7,] -0.308731214 0.50252965 0.2472519228 -0.024420993
## [8,] -0.008113606 -0.63480224 -0.4634801799 0.161059984
## [9,] 0.030146211 -0.33935681 0.5528674818 0.129434537
sumvar <- sum(eigen_value)
propvar <- eigen_value / sumvar
cumvar <- cumsum(propvar)
pca_table <- data.frame(
  PC = paste0("PC", 1:length(eigen_value)),
  eigen_value = eigen_value,
  propvar = propvar * 100,
  cumulative = cumvar * 100
)
print(pca_table)
## PC eigen_value propvar cumulative
## 1 PC1 3.97241779 44.1379754 44.13798
## 2 PC2 2.01436367 22.3818185 66.51979
## 3 PC3 0.99301511 11.0335012 77.55330
## 4 PC4 0.82277961 9.1419956 86.69529
## 5 PC5 0.55528999 6.1698887 92.86518
## 6 PC6 0.30782414 3.4202683 96.28545
## 7 PC7 0.14308312 1.5898125 97.87526
## 8 PC8 0.11605509 1.2895010 99.16476
## 9 PC9 0.07517148 0.8352387 100.00000
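As a worked check of the table above, the cumulative variance can be recomputed directly from the printed eigenvalues. The snippet only re-enters the numbers shown in the output; it is an arithmetic illustration and not part of the original pipeline.

```r
# Eigenvalues as printed in the notebook output
eigen_value <- c(3.97241779, 2.01436367, 0.99301511, 0.82277961, 0.55528999,
                 0.30782414, 0.14308312, 0.11605509, 0.07517148)

# For a correlation matrix, the eigenvalues sum to the number of variables (9)
total <- sum(eigen_value)
print(total)

# Cumulative share of variance and the first component that reaches 80%
cumvar <- cumsum(eigen_value) / total
n_pc <- which(cumvar >= 0.80)[1]
print(round(cumvar * 100, 2))
print(n_pc)
```

Three components stay just below the 80% line and the fourth crosses it, which is why the threshold marker in the plots falls on PC4.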
n_pc <- which(cumvar >= 0.80)[1]
par(mfrow = c(1, 2), mar = c(5, 4, 4, 2))
# Scree plot
plot(propvar * 100, type = "b", pch = 19, col = "steelblue", lwd = 2,
     xlab = "Principal Component", ylab = "Variance (%)",
     main = "Scree Plot", frame = FALSE)
abline(v = n_pc, col = "red", lty = 2)
legend("topright", legend = paste("PC", n_pc), col = "red", lty = 2, bty = "n")
# Cumulative variance
plot(cumvar * 100, type = "b", pch = 19, col = "darkorange", lwd = 2,
     xlab = "Number of PCs", ylab = "Cumulative Variance (%)",
     main = "Cumulative Variance", frame = FALSE, ylim = c(0, 100))
abline(h = 80, col = "red", lty = 2)
legend("bottomright", legend = "80% threshold", col = "red", lty = 2, bty = "n")
Interpretation:
The scree plot shows a sharp drop in explained variance from PC1 to PC3, after which the curve flattens beyond PC4. This suggests an elbow around PC3 to PC4.
The cumulative variance plot shows that the explained variance exceeds 80% at PC4 (86.7%). Nevertheless, all principal components are used in this clustering analysis.
This is because the number of original variables is small, so dimensionality reduction is not the main need. PCA is still used to transform the data into an uncorrelated (orthogonal) space, which can improve the performance of clustering algorithms that are sensitive to correlation between variables.
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -3.5548957 -0.8833618 -0.08595574 0.517010543 -0.1447401 0.4177351
## [2,] 0.3043330 -0.1523520 -0.06144534 -1.843146738 0.3344867 -0.8900295
## [3,] 0.8742913 0.5933339 -1.08535378 -0.907137050 0.9490050 -0.9337736
## [4,] 2.7352589 0.8658108 0.90771702 0.475913884 1.0044088 0.2816337
## [5,] 2.4215596 1.1754000 0.74487711 1.564798773 0.8615264 -0.3968399
## [6,] -2.7419137 1.5571772 -0.95281782 0.007667743 0.9010904 0.4243947
## [,7] [,8] [,9]
## [1,] -0.41012131 0.009206248 0.19661338
## [2,] -0.21756741 0.337534476 0.03979506
## [3,] -0.15051135 -0.064842298 -0.12866605
## [4,] 0.44619488 0.104668701 0.45596013
## [5,] 0.35883015 0.107400094 0.38509153
## [6,] 0.09407731 0.125485492 0.01599479
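The score matrix above is presumably the scaled data projected onto the eigenvectors; the notebook does not show that step, so the sketch below is an assumption about how `scores_PC` could be obtained. It uses synthetic data and confirms that the resulting components are uncorrelated.

```r
# Synthetic data: two correlated columns and one independent column
set.seed(1)
n <- 100
f <- rnorm(n)
toy <- cbind(a = f + rnorm(n, sd = 0.6),
             b = f + rnorm(n, sd = 0.6),
             c = rnorm(n))
toy_scaled <- scale(toy)

# Eigen-decomposition of the correlation matrix
eig <- eigen(cor(toy_scaled))

# Project the scaled data onto all eigenvectors to get the PC scores
toy_scores <- toy_scaled %*% eig$vectors

# Covariance of the scores: eigenvalues on the diagonal, zeros elsewhere
score_cov <- cov(toy_scores)
print(round(score_cov, 6))
```

The vanishing off-diagonal entries are what "uncorrelated space" means in practice: each component carries variance that no other component shares.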
pca_res <- prcomp(data_scale, center = FALSE, scale. = FALSE)
fviz_pca_var(pca_res,
             axes = c(1, 2),
             col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE) +
  labs(title = "PCA: Variable Contributions to PC1 & PC2") +
  theme_minimal(base_size = 11)
Interpretation:
In the plot of variable contributions to PC1 and PC2, variables X2, X13, and X9 contribute the most, as shown by their longer arrows and more intense colors.
On PC1 (44.1%), X9 points in the opposite direction from the group X7, X11, X12, and X8, which indicates a negative correlation between X9 and those variables.
On PC2 (22.4%), X2 and X13 dominate the contributions. More generally, variables whose arrows point in the same direction are strongly positively correlated, while variables pointing in opposite directions are negatively correlated.
# Elbow Method (WSS)
wss <- sapply(1:10, function(k) {
  kmeans(scores_PC, centers = k, nstart = 20, iter.max = 100)$tot.withinss
})
# Silhouette Method
k_values <- 2:10
avg_sil_values <- sapply(k_values, function(k) {
  km <- kmeans(scores_PC, centers = k, nstart = 25, iter.max = 100)
  mean(silhouette(km$cluster, dist(scores_PC))[, 3])
})
# Best K
best_k <- k_values[which.max(avg_sil_values)]
cat("Optimal K (Silhouette):", best_k, "\n")
## Optimal K (Silhouette): 2
par(mfrow = c(1, 2), mar = c(5, 4, 4, 2))
plot(1:10, wss, type = "b", pch = 19, col = "steelblue", lwd = 2,
     xlab = "K", ylab = "Within-Cluster SS", main = "Elbow Method", frame = FALSE)
abline(v = 3, col = "red", lty = 2)
legend("topright", legend = "K = 3", col = "red", lty = 2, bty = "n")
plot(k_values, avg_sil_values, type = "b", pch = 19, col = "darkorange", lwd = 2,
     xlab = "K", ylab = "Avg Silhouette Width", main = "Silhouette Analysis", frame = FALSE)
abline(v = best_k, col = "red", lty = 2)
legend("topright", legend = paste("K =", best_k), col = "red", lty = 2, bty = "n")
Interpretation:
Elbow Method: WSS drops sharply from K=1 to K=3 and then flattens, which indicates an elbow at K=3.
Silhouette Analysis: The average silhouette width is highest at K=2 and decreases at K=3. Although K=2 has the highest silhouette, K=3 is chosen because it is supported by the Elbow Method and gives a more informative segmentation.
Cluster quality: The low-to-moderate silhouette values (< 0.35) indicate that the clusters are not strongly separated, because the data vary along a continuum rather than forming distinct groups.
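For contrast with the weak separation reported above, the same two criteria can be run on data that do contain clear groups. The sketch below uses three well-separated synthetic blobs (hypothetical data, not the gym dataset) and `silhouette()` from the cluster package, as in the notebook.

```r
library(cluster)  # provides silhouette()

# Three well-separated synthetic groups of 50 points each
set.seed(123)
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
toy <- centers[rep(1:3, each = 50), ] + matrix(rnorm(300, sd = 0.5), ncol = 2)

# Elbow criterion: total within-cluster sum of squares for K = 1..6
wss <- sapply(1:6, function(k) kmeans(toy, centers = k, nstart = 20)$tot.withinss)

# Silhouette criterion: average silhouette width for K = 2..6
k_values <- 2:6
toy_dist <- dist(toy)
avg_sil <- sapply(k_values, function(k) {
  km <- kmeans(toy, centers = k, nstart = 20)
  mean(silhouette(km$cluster, toy_dist)[, 3])
})
best_k <- k_values[which.max(avg_sil)]

print(round(wss, 1))
print(round(avg_sil, 3))
print(best_k)
```

When the groups are this distinct, both criteria point to the same K and the silhouette widths are high; the disagreement between K=2 and K=3 on the gym data is itself a sign of weak cluster structure.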
## K-means clustering with 3 clusters of sizes 218, 195, 560
##
## Cluster means:
## [,1] [,2] [,3] [,4] [,5] [,6]
## 1 0.2167406 -2.1141447 -0.060436601 -0.03880843 -0.05101343 -0.11825139
## 2 -3.3165870 0.4047151 0.052234680 0.20374952 0.13942444 0.24119045
## 3 1.0705090 0.6820788 0.005338244 -0.05584093 -0.02869078 -0.03795239
## [,7] [,8] [,9]
## 1 0.04405603 -0.07234365 -0.02468180
## 2 -0.08690642 0.03652687 -0.03078970
## 3 0.01311168 0.01544317 0.02032968
##
## Clustering vector:
## [1] 2 3 3 3 3 2 3 3 1 1 1 3 1 1 1 3 3 3 3 3 1 3 3 1 3 3 3 3 2 3 1 3 3 3 2 1 3
## [38] 1 3 3 3 3 1 3 1 3 3 1 3 1 3 2 1 1 3 1 3 3 2 3 3 3 2 3 2 3 2 3 2 1 2 3 3 3
## [75] 1 3 3 3 1 2 3 2 3 3 3 3 1 3 1 3 2 2 1 3 1 3 1 1 3 2 1 3 3 1 3 2 3 2 3 3 3
## [112] 3 2 2 3 3 3 2 1 3 3 1 1 1 2 1 3 2 1 2 3 3 2 1 3 1 3 3 3 2 1 3 1 3 3 2 3 3
## [149] 1 3 3 1 2 3 1 3 3 3 3 3 1 1 1 2 2 3 2 3 3 3 1 3 2 3 1 3 2 3 3 3 1 2 2 1 2
## [186] 3 2 1 3 3 3 3 3 1 3 3 3 2 3 3 3 2 3 3 2 2 1 3 3 2 3 3 2 3 3 3 3 3 3 1 3 2
## [223] 3 2 1 2 1 3 3 2 3 3 3 2 1 3 1 2 3 3 1 1 3 3 1 3 3 3 3 3 1 1 3 3 3 3 2 1 3
## [260] 1 3 1 3 3 1 3 3 2 3 2 3 1 3 3 2 1 2 3 2 1 3 3 3 1 3 2 1 1 1 1 3 1 1 3 3 3
## [297] 3 3 1 1 1 2 1 2 3 3 3 2 3 2 2 1 1 3 3 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3
## [334] 3 3 1 3 3 1 3 1 2 3 3 1 1 3 3 2 3 3 3 3 3 3 1 2 1 3 1 3 1 3 3 2 3 3 2 3 1
## [371] 1 3 3 2 3 1 1 3 3 1 2 1 1 3 1 3 3 3 1 3 3 2 3 3 3 2 2 3 2 3 3 3 3 3 2 1 1
## [408] 3 1 3 3 3 2 3 1 3 3 3 3 2 1 2 2 1 2 3 2 1 2 2 2 3 3 3 3 3 1 3 2 3 2 3 2 2
## [445] 3 1 1 3 1 3 3 3 3 3 3 2 1 2 1 3 3 3 3 1 3 1 3 1 3 3 1 3 2 3 3 2 1 3 3 3 1
## [482] 1 3 3 2 2 1 3 3 3 3 3 2 3 1 1 1 2 3 3 3 3 3 3 3 3 3 1 3 3 3 2 2 2 3 3 3 1
## [519] 2 3 3 3 3 3 1 3 1 3 1 2 2 3 3 3 1 3 2 3 2 1 3 3 2 3 3 3 3 1 3 3 3 1 3 3 1
## [556] 3 3 2 3 3 3 1 1 3 1 3 3 3 3 3 3 1 2 3 1 3 3 3 3 3 3 3 3 3 2 3 3 2 1 1 3 2
## [593] 3 1 1 2 3 3 3 3 3 2 3 1 3 2 3 3 2 2 2 2 3 3 1 2 3 3 1 3 3 2 2 3 3 3 3 2 3
## [630] 3 2 1 3 3 1 3 3 3 2 1 1 3 3 1 1 1 2 1 1 2 3 3 3 3 3 3 3 3 3 3 3 1 3 2 3 3
## [667] 3 3 2 2 3 3 2 2 3 3 1 3 1 3 3 1 3 3 3 3 3 1 3 3 3 3 2 3 2 3 1 1 3 3 3 3 3
## [704] 3 3 3 1 3 3 3 1 2 2 3 1 1 3 3 1 2 3 1 3 3 1 3 3 3 2 1 3 3 2 1 1 2 2 1 2 1
## [741] 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 2 3 3 1 3 3 1 3 3 3 2 3 3 3 1 2 1 3
## [778] 2 3 2 3 3 3 3 2 1 3 3 2 3 2 3 3 3 3 3 3 2 3 3 3 2 3 3 2 3 1 3 3 3 2 2 1 3
## [815] 3 1 3 3 3 2 3 3 3 3 3 3 1 3 3 3 2 2 2 1 3 2 3 3 1 1 2 3 3 3 1 2 1 3 2 1 3
## [852] 3 1 3 1 1 3 2 3 3 3 3 3 2 2 1 2 3 1 3 2 3 3 3 3 1 2 2 3 2 3 2 3 3 1 2 3 3
## [889] 3 1 3 1 2 3 3 3 2 2 1 3 1 3 2 3 3 1 2 1 3 3 2 1 1 1 1 3 1 3 3 3 2 3 2 3 1
## [926] 3 1 2 3 3 2 3 2 1 3 3 1 1 3 3 3 3 2 2 3 1 3 3 2 1 3 3 3 3 2 1 3 2 3 1 3 3
## [963] 2 3 2 3 2 3 2 3 2 1 1
##
## Within cluster sum of squares by cluster:
## [1] 1187.8856 654.9323 2805.7844
## (between_SS / total_SS = 46.9 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# Add cluster labels to the data
data_clustered <- as.data.frame(scores_PC)
data_clustered$cluster <- km_res$cluster
# Cluster distribution
data_clustered %>%
  dplyr::count(cluster) %>%
  knitr::kable(col.names = c("Cluster", "Count"),
               caption = "Cluster Distribution (K-Means)")
| Cluster | Count |
|---|---|
| 1 | 218 |
| 2 | 195 |
| 3 | 560 |
cluster1 <- subset(data_clustered, cluster == 1)
cluster2 <- subset(data_clustered, cluster == 2)
cluster3 <- subset(data_clustered, cluster == 3)
knitr::kable(head(cluster1, 10), caption = "Data Cluster 1")
|  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | cluster |
|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 0.1333676 | -2.1620845 | 0.2077551 | -1.0409090 | 1.1462374 | -0.5862504 | 0.4650611 | -0.8452888 | -0.1975062 | 1 |
| 10 | 1.1030672 | -1.5571670 | -1.0805812 | -0.2987946 | -0.2768027 | -0.4632111 | 0.5484607 | -0.0984137 | -0.1642210 | 1 |
| 11 | 1.2347599 | -4.1550594 | 0.3322366 | 0.2162707 | 0.1912985 | 0.9546327 | -0.3253190 | 0.3861987 | -0.0167115 | 1 |
| 13 | -0.7429353 | -3.1278271 | 1.0682344 | -1.0903531 | -0.1441856 | 0.4613117 | -0.4695494 | 0.2797030 | 0.3483750 | 1 |
| 14 | 0.5092599 | -1.2740350 | -0.2193996 | 1.4662842 | -0.3097752 | -1.0905298 | 0.5362785 | -0.1589099 | -0.4756315 | 1 |
| 15 | -1.1066001 | -2.7299374 | 1.1909131 | -0.3264263 | -0.7911292 | -0.7004210 | -0.1009853 | -0.4631988 | 0.3545284 | 1 |
| 21 | -1.2908732 | -0.9281818 | -0.5181428 | 0.3532140 | -0.1862299 | -1.7134619 | -0.0097926 | 0.0911675 | -0.2286129 | 1 |
| 24 | 1.4498074 | -3.8214515 | 1.1497064 | 0.5550940 | -0.1825644 | -0.4031192 | -0.1623373 | -0.1758668 | 0.2892818 | 1 |
| 31 | 0.0936604 | -2.3692291 | -0.8859936 | 0.3513297 | -0.7937854 | -1.0425243 | 0.4065206 | 0.2165840 | -0.2055142 | 1 |
| 36 | -0.2799133 | -3.4477563 | -1.3776670 | -1.1382704 | -1.2397253 | -0.4164053 | 0.5743305 | 0.3372011 | 0.1510106 | 1 |
|  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | cluster |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -3.554896 | -0.8833618 | -0.0859557 | 0.5170105 | -0.1447401 | 0.4177351 | -0.4101213 | 0.0092062 | 0.1966134 | 2 |
| 6 | -2.741914 | 1.5571772 | -0.9528178 | 0.0076677 | 0.9010904 | 0.4243947 | 0.0940773 | 0.1254855 | 0.0159948 | 2 |
| 29 | -3.414256 | -0.1816066 | 0.4373335 | 0.6577629 | 0.9690840 | -0.0837603 | 0.1827329 | -0.1234493 | -0.2782091 | 2 |
| 35 | -3.755421 | -0.4620553 | 0.4919035 | 0.7232916 | 0.9538967 | 0.4280043 | 0.1917607 | 0.2883135 | 0.3658760 | 2 |
| 52 | -3.606637 | -0.2392797 | -1.3884533 | 0.5174930 | 0.8306471 | 0.0956225 | 0.0351037 | 0.1567395 | -0.2362544 | 2 |
| 59 | -2.333756 | 1.2384319 | 0.2044168 | 0.0615809 | 0.0795927 | 0.4442417 | -0.4327536 | -0.2367311 | -0.0516105 | 2 |
| 63 | -4.057361 | -0.9416178 | 0.1411246 | 0.0567164 | 0.4695292 | 0.0454726 | -0.0416330 | 0.5442443 | 0.0742905 | 2 |
| 65 | -3.640339 | 1.5365387 | -1.5940738 | -0.6020632 | -0.2878593 | 0.0474571 | 0.2149893 | 0.1392534 | -0.1106202 | 2 |
| 67 | -4.717246 | -0.6868947 | -0.4020405 | -0.3360717 | -0.5029764 | -0.2789369 | 0.0385609 | 0.6544218 | 0.2604822 | 2 |
| 69 | -2.785245 | 1.4096767 | 0.9388239 | -0.0720908 | -0.5279174 | 0.2525443 | -0.2276002 | -0.3290233 | 0.1192882 | 2 |
|  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | cluster |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.3043330 | -0.1523520 | -0.0614453 | -1.8431467 | 0.3344867 | -0.8900295 | -0.2175674 | 0.3375345 | 0.0397951 | 3 |
| 3 | 0.8742913 | 0.5933339 | -1.0853538 | -0.9071370 | 0.9490050 | -0.9337736 | -0.1505114 | -0.0648423 | -0.1286660 | 3 |
| 4 | 2.7352589 | 0.8658108 | 0.9077170 | 0.4759139 | 1.0044088 | 0.2816337 | 0.4461949 | 0.1046687 | 0.4559601 | 3 |
| 5 | 2.4215596 | 1.1754000 | 0.7448771 | 1.5647988 | 0.8615264 | -0.3968399 | 0.3588302 | 0.1074001 | 0.3850915 | 3 |
| 7 | -1.0995397 | 0.5923718 | -0.4035556 | -0.4058413 | -1.3721097 | 0.6724907 | 0.1557596 | -0.1475879 | 0.6628256 | 3 |
| 8 | 0.6732408 | -0.0622624 | 0.8481589 | -1.4832362 | -0.1884369 | 0.0331401 | -0.6262349 | 0.1252764 | 0.1747995 | 3 |
| 12 | -0.0720606 | 0.8687009 | -0.3788614 | 1.8397674 | 0.0934775 | -0.4561576 | -0.5479242 | 0.0068691 | 0.2270595 | 3 |
| 16 | 2.0848219 | 1.7915295 | 0.2926179 | 0.8342424 | -0.7615915 | 0.7705099 | 0.0062253 | -0.0142321 | -0.1818240 | 3 |
| 17 | 0.0921168 | 0.4813353 | -1.1038968 | -0.8813410 | -0.7992718 | -0.2854199 | -0.5127807 | 0.1103230 | 0.5134098 | 3 |
| 18 | 1.9295951 | 2.2877142 | 0.4258162 | -0.4888902 | -0.6214933 | 0.0811903 | 0.6937203 | 0.0087689 | 0.0110665 | 3 |
K-Means interpretation:
K-Means partitions the data into 3 clusters based on the distance to each centroid (mean) in principal component space.
The cluster sizes are unbalanced: one cluster (560 members) is far larger than the other two (218 and 195). The between_SS / total_SS ratio of 46.9% indicates moderate separation, with slightly less than half of the total variance lying between clusters. K-Means therefore captures a reasonably representative cluster structure, but the clusters may still overlap.
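The between_SS / total_SS ratio quoted above can be recomputed by hand, which makes its meaning explicit. The sketch uses synthetic two-group data; `toy`, `total_ss`, and `between_ratio` are illustrative names.

```r
# Two overlapping synthetic groups
set.seed(7)
toy <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 3), ncol = 2))
km <- kmeans(toy, centers = 2, nstart = 20)

# Total sum of squares: squared distances to the grand mean
total_ss <- sum(sweep(toy, 2, colMeans(toy))^2)

# Between-cluster share = 1 - (within-cluster SS / total SS)
between_ratio <- (total_ss - km$tot.withinss) / total_ss
print(round(100 * between_ratio, 1))

# The same quantity from the kmeans object
print(round(100 * km$betweenss / km$totss, 1))
```

A ratio near 100% would mean tight, well-separated clusters; values around 50% leave about half of the variance inside the clusters.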
# 2. K-Medians
kmed_res <- kcca(scores_PC, k = 3, family = kccaFamily("kmedians"))
# Cluster distribution
data.frame(cluster = clusters(kmed_res)) %>%
  dplyr::count(cluster) %>%
  knitr::kable(col.names = c("Cluster", "Count"),
               caption = "Cluster Distribution (K-Medians)")
| Cluster | Count |
|---|---|
| 1 | 548 |
| 2 | 194 |
| 3 | 231 |
data_clustered <- as.data.frame(scores_PC)
data_clustered$cluster <- clusters(kmed_res)
cluster1 <- subset(data_clustered, cluster == 1)
cluster2 <- subset(data_clustered, cluster == 2)
cluster3 <- subset(data_clustered, cluster == 3)
knitr::kable(head(cluster1, 10), caption = "Data Cluster 1")
|  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | cluster |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.3043330 | -0.1523520 | -0.0614453 | -1.8431467 | 0.3344867 | -0.8900295 | -0.2175674 | 0.3375345 | 0.0397951 | 1 |
| 3 | 0.8742913 | 0.5933339 | -1.0853538 | -0.9071370 | 0.9490050 | -0.9337736 | -0.1505114 | -0.0648423 | -0.1286660 | 1 |
| 4 | 2.7352589 | 0.8658108 | 0.9077170 | 0.4759139 | 1.0044088 | 0.2816337 | 0.4461949 | 0.1046687 | 0.4559601 | 1 |
| 5 | 2.4215596 | 1.1754000 | 0.7448771 | 1.5647988 | 0.8615264 | -0.3968399 | 0.3588302 | 0.1074001 | 0.3850915 | 1 |
| 7 | -1.0995397 | 0.5923718 | -0.4035556 | -0.4058413 | -1.3721097 | 0.6724907 | 0.1557596 | -0.1475879 | 0.6628256 | 1 |
| 8 | 0.6732408 | -0.0622624 | 0.8481589 | -1.4832362 | -0.1884369 | 0.0331401 | -0.6262349 | 0.1252764 | 0.1747995 | 1 |
| 12 | -0.0720606 | 0.8687009 | -0.3788614 | 1.8397674 | 0.0934775 | -0.4561576 | -0.5479242 | 0.0068691 | 0.2270595 | 1 |
| 16 | 2.0848219 | 1.7915295 | 0.2926179 | 0.8342424 | -0.7615915 | 0.7705099 | 0.0062253 | -0.0142321 | -0.1818240 | 1 |
| 17 | 0.0921168 | 0.4813353 | -1.1038968 | -0.8813410 | -0.7992718 | -0.2854199 | -0.5127807 | 0.1103230 | 0.5134098 | 1 |
| 18 | 1.9295951 | 2.2877142 | 0.4258162 | -0.4888902 | -0.6214933 | 0.0811903 | 0.6937203 | 0.0087689 | 0.0110665 | 1 |
|  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | cluster |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | -3.554896 | -0.8833618 | -0.0859557 | 0.5170105 | -0.1447401 | 0.4177351 | -0.4101213 | 0.0092062 | 0.1966134 | 2 |
| 6 | -2.741914 | 1.5571772 | -0.9528178 | 0.0076677 | 0.9010904 | 0.4243947 | 0.0940773 | 0.1254855 | 0.0159948 | 2 |
| 29 | -3.414256 | -0.1816066 | 0.4373335 | 0.6577629 | 0.9690840 | -0.0837603 | 0.1827329 | -0.1234493 | -0.2782091 | 2 |
| 35 | -3.755421 | -0.4620553 | 0.4919035 | 0.7232916 | 0.9538967 | 0.4280043 | 0.1917607 | 0.2883135 | 0.3658760 | 2 |
| 52 | -3.606637 | -0.2392797 | -1.3884533 | 0.5174930 | 0.8306471 | 0.0956225 | 0.0351037 | 0.1567395 | -0.2362544 | 2 |
| 59 | -2.333756 | 1.2384319 | 0.2044168 | 0.0615809 | 0.0795927 | 0.4442417 | -0.4327536 | -0.2367311 | -0.0516105 | 2 |
| 63 | -4.057361 | -0.9416178 | 0.1411246 | 0.0567164 | 0.4695292 | 0.0454726 | -0.0416330 | 0.5442443 | 0.0742905 | 2 |
| 65 | -3.640339 | 1.5365387 | -1.5940738 | -0.6020632 | -0.2878593 | 0.0474571 | 0.2149893 | 0.1392534 | -0.1106202 | 2 |
| 67 | -4.717246 | -0.6868947 | -0.4020405 | -0.3360717 | -0.5029764 | -0.2789369 | 0.0385609 | 0.6544218 | 0.2604822 | 2 |
| 69 | -2.785245 | 1.4096767 | 0.9388239 | -0.0720908 | -0.5279174 | 0.2525443 | -0.2276002 | -0.3290233 | 0.1192882 | 2 |
|  | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | cluster |
|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 0.1333676 | -2.1620845 | 0.2077551 | -1.0409090 | 1.1462374 | -0.5862504 | 0.4650611 | -0.8452888 | -0.1975062 | 3 |
| 10 | 1.1030672 | -1.5571670 | -1.0805812 | -0.2987946 | -0.2768027 | -0.4632111 | 0.5484607 | -0.0984137 | -0.1642210 | 3 |
| 11 | 1.2347599 | -4.1550594 | 0.3322366 | 0.2162707 | 0.1912985 | 0.9546327 | -0.3253190 | 0.3861987 | -0.0167115 | 3 |
| 13 | -0.7429353 | -3.1278271 | 1.0682344 | -1.0903531 | -0.1441856 | 0.4613117 | -0.4695494 | 0.2797030 | 0.3483750 | 3 |
| 14 | 0.5092599 | -1.2740350 | -0.2193996 | 1.4662842 | -0.3097752 | -1.0905298 | 0.5362785 | -0.1589099 | -0.4756315 | 3 |
| 15 | -1.1066001 | -2.7299374 | 1.1909131 | -0.3264263 | -0.7911292 | -0.7004210 | -0.1009853 | -0.4631988 | 0.3545284 | 3 |
| 21 | -1.2908732 | -0.9281818 | -0.5181428 | 0.3532140 | -0.1862299 | -1.7134619 | -0.0097926 | 0.0911675 | -0.2286129 | 3 |
| 24 | 1.4498074 | -3.8214515 | 1.1497064 | 0.5550940 | -0.1825644 | -0.4031192 | -0.1623373 | -0.1758668 | 0.2892818 | 3 |
| 31 | 0.0936604 | -2.3692291 | -0.8859936 | 0.3513297 | -0.7937854 | -1.0425243 | 0.4065206 | 0.2165840 | -0.2055142 | 3 |
| 36 | -0.2799133 | -3.4477563 | -1.3776670 | -1.1382704 | -1.2397253 | -0.4164053 | 0.5743305 | 0.3372011 | 0.1510106 | 3 |
K-Medians interpretation:
K-Medians partitions the data into 3 clusters using the median as the cluster center, which makes it more robust to outliers than K-Means. The resulting clusters closely match those of K-Means, which suggests that the dataset contains no extreme outliers.
Overall, K-Medians yields a more stable and slightly more balanced partition than K-Means (548/194/231 versus 560/195/218 members), but the cluster structure does not differ meaningfully.
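To make the median-based update concrete, here is a bare-bones K-medians loop in base R. It is a sketch assuming the usual formulation (Manhattan distance for assignment, column-wise medians for the centers); `kmedians_sketch` is a hypothetical function and not the implementation used by `kcca()`.

```r
# Minimal K-medians: assign by Manhattan distance, update centers by medians
kmedians_sketch <- function(x, k, iter_max = 50) {
  x <- as.matrix(x)
  centers <- x[sample(nrow(x), k), , drop = FALSE]  # random initial centers
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter_max)) {
    # n x k matrix of Manhattan distances from every point to every center
    d <- sapply(seq_len(k), function(j) rowSums(abs(sweep(x, 2, centers[j, ]))))
    new_cluster <- max.col(-d, ties.method = "first")  # nearest center per point
    if (identical(new_cluster, cluster)) break         # assignments are stable
    cluster <- new_cluster
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- apply(members, 2, median)
    }
  }
  list(cluster = cluster, centers = centers)
}

# Synthetic data: three groups of 40 points (hypothetical, not the gym data)
set.seed(11)
true_centers <- rbind(c(0, 0), c(8, 0), c(0, 8))
toy <- true_centers[rep(1:3, each = 40), ] + matrix(rnorm(240, sd = 0.7), ncol = 2)

res <- kmedians_sketch(toy, k = 3)
print(table(res$cluster))
print(round(res$centers, 2))
```

Because a median ignores how far an extreme value lies from the rest, a single outlier moves a K-medians center much less than it moves a K-means centroid. Like K-means, this loop depends on its random start, so several restarts are advisable in practice.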
# eps = 2.0 chosen so that DBSCAN produces meaningful clusters
db_res <- dbscan(scores_PC, eps = 2.0, MinPts = 5)
data.frame(cluster = db_res$cluster) %>%
  dplyr::count(cluster) %>%
  knitr::kable(col.names = c("Cluster", "Count"),
               caption = "Cluster Distribution (DBSCAN)")
| Cluster | Count |
|---|---|
| 0 | 1 |
| 1 | 972 |
## Number of noise points: 1
DBSCAN interpretation:
With these parameters, DBSCAN produces a single main cluster of 972 points and only 1 noise point. The very small amount of noise shows that almost all of the data lie in one region of similar density, so the algorithm cannot find density-based separations. Because only one cluster is formed, evaluation metrics such as Silhouette, which require at least two clusters, cannot be computed (NA).
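The counts above can be reproduced from a label vector, assuming the usual DBSCAN convention that label 0 marks noise. The vector below simply mirrors the distribution table (1 noise point, 972 points in cluster 1); `labels` stands in for `db_res$cluster`.

```r
# Labels mirroring the table: one noise point (0) and 972 points in cluster 1
labels <- c(0, rep(1, 972))

# Noise points carry label 0
n_noise <- sum(labels == 0)

# Real clusters are the distinct non-zero labels
n_clusters <- length(setdiff(unique(labels), 0))

# Silhouette compares each point's own cluster with the nearest other cluster,
# so it is only defined when at least two clusters exist
can_compute_silhouette <- n_clusters >= 2

cat("Noise points:", n_noise, "\n")
cat("Clusters:", n_clusters, "\n")
cat("Silhouette defined:", can_compute_silhouette, "\n")
```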
ms_res <- meanShift(scores_PC)
cat("Number of clusters formed:", length(unique(ms_res$assignment)), "\n")
## Number of clusters formed: 213
data.frame(cluster = ms_res$assignment) %>%
  dplyr::count(cluster) %>%
  knitr::kable(col.names = c("Cluster", "Count"),
               caption = "Cluster Distribution (Mean Shift)")
| Cluster | Count |
|---|---|
| 1 | 101 |
| 2 | 50 |
| 3 | 68 |
| 4 | 8 |
| 5 | 4 |
| 6 | 14 |
| 7 | 66 |
| 8 | 2 |
| 9 | 1 |
| 10 | 1 |
| 11 | 1 |
| 12 | 1 |
| 13 | 4 |
| 14 | 11 |
| 15 | 31 |
| 16 | 9 |
| 17 | 8 |
| 18 | 2 |
| 19 | 11 |
| 20 | 1 |
| 21 | 4 |
| 22 | 19 |
| 23 | 1 |
| 24 | 9 |
| 25 | 10 |
| 26 | 1 |
| 27 | 3 |
| 28 | 3 |
| 29 | 1 |
| 30 | 3 |
| 31 | 18 |
| 32 | 32 |
| 33 | 1 |
| 34 | 1 |
| 35 | 31 |
| 36 | 3 |
| 37 | 17 |
| 38 | 3 |
| 39 | 5 |
| 40 | 1 |
| 41 | 5 |
| 42 | 6 |
| 43 | 2 |
| 44 | 1 |
| 45 | 8 |
| 46 | 4 |
| 47 | 3 |
| 48 | 6 |
| 49 | 7 |
| 50 | 6 |
| 51 | 5 |
| 52 | 1 |
| 53 | 1 |
| 54 | 1 |
| 55 | 15 |
| 56 | 3 |
| 57 | 3 |
| 58 | 1 |
| 59 | 2 |
| 60 | 1 |
| 61 | 1 |
| 62 | 2 |
| 63 | 1 |
| 64 | 2 |
| 65 | 4 |
| 66 | 1 |
| 67 | 19 |
| 68 | 1 |
| 69 | 9 |
| 70 | 1 |
| 71 | 6 |
| 72 | 6 |
| 73 | 9 |
| 74 | 1 |
| 75 | 1 |
| 76 | 2 |
| 77 | 1 |
| 78 | 11 |
| 79 | 1 |
| 80 | 1 |
| 81 | 2 |
| 82 | 8 |
| 83 | 6 |
| 84 | 4 |
| 85 | 2 |
| 86 | 1 |
| 87 | 1 |
| 88 | 1 |
| 89 | 5 |
| 90 | 1 |
| 91 | 1 |
| 92 | 1 |
| 93 | 5 |
| 94 | 1 |
| 95 | 5 |
| 96 | 1 |
| 97 | 1 |
| 98 | 6 |
| 99 | 1 |
| 100 | 4 |
| 101 | 1 |
| 102 | 1 |
| 103 | 1 |
| 104 | 1 |
| 105 | 10 |
| 106 | 5 |
| 107 | 6 |
| 108 | 9 |
| 109 | 6 |
| 110 | 1 |
| 111 | 2 |
| 112 | 2 |
| 113 | 1 |
| 114 | 2 |
| 115 | 15 |
| 116 | 1 |
| 117 | 1 |
| 118 | 1 |
| 119 | 1 |
| 120 | 2 |
| 121 | 3 |
| 122 | 1 |
| 123 | 8 |
| 124 | 1 |
| 125 | 2 |
| 126 | 1 |
| 127 | 4 |
| 128 | 1 |
| 129 | 1 |
| 130 | 1 |
| 131 | 1 |
| 132 | 1 |
| 133 | 1 |
| 134 | 2 |
| 135 | 1 |
| 136 | 1 |
| 137 | 1 |
| 138 | 1 |
| 139 | 1 |
| 140 | 1 |
| 141 | 3 |
| 142 | 1 |
| 143 | 1 |
| 144 | 1 |
| 145 | 1 |
| 146 | 2 |
| 147 | 2 |
| 148 | 1 |
| 149 | 1 |
| 150 | 2 |
| 151 | 1 |
| 152 | 1 |
| 153 | 1 |
| 154 | 1 |
| 155 | 1 |
| 156 | 1 |
| 157 | 1 |
| 158 | 1 |
| 159 | 1 |
| 160 | 1 |
| 161 | 1 |
| 162 | 1 |
| 163 | 1 |
| 164 | 2 |
| 165 | 2 |
| 166 | 2 |
| 167 | 2 |
| 168 | 1 |
| 169 | 1 |
| 170 | 3 |
| 171 | 1 |
| 172 | 1 |
| 173 | 4 |
| 174 | 1 |
| 175 | 2 |
| 176 | 2 |
| 177 | 2 |
| 178 | 1 |
| 179 | 2 |
| 180 | 1 |
| 181 | 1 |
| 182 | 1 |
| 183 | 2 |
| 184 | 1 |
| 185 | 1 |
| 186 | 1 |
| 187 | 1 |
| 188 | 1 |
| 189 | 1 |
| 190 | 1 |
| 191 | 1 |
| 192 | 1 |
| 193 | 1 |
| 194 | 1 |
| 195 | 1 |
| 196 | 1 |
| 197 | 1 |
| 198 | 1 |
| 199 | 1 |
| 200 | 1 |
| 201 | 1 |
| 202 | 1 |
| 203 | 1 |
| 204 | 1 |
| 205 | 1 |
| 206 | 1 |
| 207 | 1 |
| 208 | 1 |
| 209 | 1 |
| 210 | 1 |
| 211 | 1 |
| 212 | 1 |
| 213 | 1 |
Mean Shift interpretation:
Mean Shift needs no preset number of clusters; it derives one from the modes of a kernel density estimate. On this dataset it produces 213 highly unbalanced clusters: the largest holds 101 points, while many contain a single point. This fragmentation suggests that, at the bandwidth used, the data is spread out with no pronounced density peaks. Mean Shift works best on data with a clearly multimodal distribution, such as image or sensor data.
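The number of modes Mean Shift finds depends on the kernel bandwidth, and `meanShift()` is called here with the package defaults. The following base R sketch is a minimal Gaussian-kernel mean shift written only for illustration (it is not the meanShiftR implementation); `count_modes`, `merge_tol`, and the toy data are assumptions introduced for this example.

```r
# Minimal Gaussian-kernel mean shift in base R (illustrative sketch only,
# not the meanShiftR implementation).
count_modes <- function(x, bandwidth, iterations = 50, merge_tol = 0.01) {
  x <- as.matrix(x)
  pts <- x
  for (it in seq_len(iterations)) {
    for (i in seq_len(nrow(pts))) {
      # Squared distance from the current position to every data point.
      d2 <- rowSums(sweep(x, 2, pts[i, ])^2)
      w <- exp(-d2 / (2 * bandwidth^2))
      # Shift the position to the kernel-weighted mean of the data.
      pts[i, ] <- colSums(x * w) / sum(w)
    }
  }
  # Positions that ended up within merge_tol of each other share a mode.
  labels <- cutree(hclust(dist(pts), method = "single"), h = merge_tol)
  length(unique(labels))
}

# Toy data: two Gaussian blobs (not the gym dataset).
set.seed(1)
toy <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
)

n_wide   <- count_modes(toy, bandwidth = 1)     # smooth density estimate
n_narrow <- count_modes(toy, bandwidth = 0.05)  # very spiky density estimate
c(wide = n_wide, narrow = n_narrow)
```

On this toy data the wide bandwidth is expected to recover the two blobs, while the narrow one fragments them into many small modes. A similar sensitivity may contribute to the 213 clusters reported above, so trying several bandwidth values would be a reasonable check before concluding that the data has no density structure.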
fcm_res <- cmeans(scores_PC, centers = 3, m = 2, iter.max = 100)
data.frame(cluster = fcm_res$cluster) %>%
  dplyr::count(cluster) %>%
  knitr::kable(col.names = c("Cluster", "Count"),
               caption = "Cluster Distribution (FCM - Hard Assignment)")
| Cluster | Count |
|---|---|
| 1 | 221 |
| 2 | 429 |
| 3 | 323 |
knitr::kable(round(head(fcm_res$membership, 5), 3),
             caption = "Sample Membership Degrees (First 5 Rows)")
| 1 | 2 | 3 |
|---|---|---|
| 0.837 | 0.081 | 0.082 |
| 0.132 | 0.432 | 0.436 |
| 0.101 | 0.451 | 0.448 |
| 0.093 | 0.458 | 0.450 |
| 0.112 | 0.448 | 0.441 |
Fuzzy C-Means interpretation:
Fuzzy C-Means also yields 3 clusters, with 221, 429, and 323 members after hard assignment. Its distinguishing feature is the membership degree: every point has a degree of membership in all clusters at once. Many points lie near cluster boundaries, as shown by memberships that differ only slightly; in four of the five rows above, clusters 2 and 3 are nearly tied (for example 0.432 vs 0.436). Although its Silhouette is lower than that of K-Means, the method gives extra insight into how ambiguous each assignment is.
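One simple way to quantify this ambiguity is the gap between a point's largest and second-largest membership. The sketch below applies that idea to the five membership rows printed above; the 0.05 `threshold` is an arbitrary illustrative cutoff, not a value used in this notebook.

```r
# Membership degrees copied from the five rows shown above.
membership <- matrix(
  c(0.837, 0.081, 0.082,
    0.132, 0.432, 0.436,
    0.101, 0.451, 0.448,
    0.093, 0.458, 0.450,
    0.112, 0.448, 0.441),
  ncol = 3, byrow = TRUE
)

# Margin = largest membership minus second largest, per row.
top_two_margin <- apply(membership, 1, function(u) {
  s <- sort(u, decreasing = TRUE)
  s[1] - s[2]
})

# Rows below a hypothetical threshold are flagged as ambiguous.
threshold <- 0.05
ambiguous <- top_two_margin < threshold

round(top_two_margin, 3)  # 0.755 0.004 0.003 0.008 0.007
which(ambiguous)          # rows 2, 3, 4, 5
```

The same calculation could be run on the full `fcm_res$membership` matrix to see what share of members sit between two clusters.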
par(mfrow = c(2, 3), mar = c(4, 4, 2, 1))
# K-Means
plot(scores_PC, col = km_res$cluster, pch = 16, main = "K-Means")
# K-Medians
plot(scores_PC, col = clusters(kmed_res), pch = 16, main = "K-Medians")
# DBSCAN
plot(scores_PC, col = db_res$cluster + 1L, pch = 16, main = "DBSCAN (0 = Noise)")
# Mean Shift
plot(scores_PC, col = ms_res$assignment, pch = 16, main = "Mean Shift")
# Fuzzy C-Means
plot(scores_PC, col = fcm_res$cluster, pch = 16, main = "Fuzzy C-Means")
par(mfrow = c(1, 1))
Clustering visualization interpretation:
The plots from the five methods show that each algorithm produces a different grouping pattern, in line with how the method works.
- K-Means: The data splits into three fairly distinct clusters, separated mainly by vertical boundaries (that is, along PC1), with some overlap between clusters in the middle region.
- K-Medians: The pattern resembles K-Means, with points inside each cluster appearing slightly more compact, consistent with the method being more robust to outliers.
- DBSCAN: Most of the data falls into one large cluster with no clear separation, which suggests the dataset lacks a sufficiently strong density structure at the chosen parameters.
- Mean Shift: It yields a very large number of clusters scattered without a visible pattern, indicating a complex distribution with no clear global structure.
- Fuzzy C-Means: The pattern resembles K-Means but with softer boundaries, because every point has a degree of membership in more than one cluster.
Conclusion: The partition-based methods (K-Means, K-Medians, Fuzzy C-Means) give clearer and more interpretable results than the density-based methods (DBSCAN, Mean Shift).
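dist_matrix <- dist(scores_PC)
# Silhouette and Dunn index for one labeling; label 0 (DBSCAN noise) is excluded.
eval_clustering <- function(labels, data_dist, name) {
  labels <- as.integer(labels)
  valid <- labels > 0
  n_cl <- length(unique(labels[valid]))
  # Both metrics need at least two clusters.
  if (n_cl < 2) return(data.frame(Method = name, K = n_cl, Silhouette = NA, Dunn = NA))
  # Restrict the distances to non-noise points so both metrics use the same data.
  d_valid <- as.dist(as.matrix(data_dist)[valid, valid])
  sil <- mean(silhouette(labels[valid], d_valid)[, 3])
  stats <- cluster.stats(d_valid, labels[valid])
  data.frame(Method = name, K = n_cl, Silhouette = round(sil, 4), Dunn = round(stats$dunn, 4))
}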
eval_results <- rbind(
eval_clustering(km_res$cluster, dist_matrix, "K-means"),
eval_clustering(clusters(kmed_res), dist_matrix, "K-medians"),
eval_clustering(db_res$cluster, dist_matrix, "DBSCAN"),
eval_clustering(ms_res$assignment, dist_matrix, "Mean Shift"),
eval_clustering(fcm_res$cluster, dist_matrix, "Fuzzy C-means")
)
best_method <- eval_results %>%
filter(Silhouette == max(Silhouette, na.rm = TRUE)) %>%
slice(1)
knitr::kable(eval_results, caption = "Clustering Evaluation Results")
| Method | K | Silhouette | Dunn |
|---|---|---|---|
| K-means | 3 | 0.2966 | 0.1152 |
| K-medians | 3 | 0.2918 | 0.0755 |
| DBSCAN | 1 | NA | NA |
| Mean Shift | 213 | -0.0856 | 0.0738 |
| Fuzzy C-means | 3 | 0.2311 | 0.0626 |
## Best method: K-means
Evaluation metrics interpretation:
| Method | Silhouette | Dunn | Notes |
|---|---|---|---|
| K-means | ~0.30 | 0.1152 | ✅ Best: most compact and best-separated clusters |
| K-medians | ~0.29 | 0.0755 | ✅ Good |
| Fuzzy C-means | ~0.23 | 0.0626 | Fairly good |
| DBSCAN | NA | NA | ❌ Not valid (only 1 cluster) |
| Mean Shift | ~-0.09 | 0.0738 | ❌ Poor |
K-Means gives the best result, with both the highest Silhouette and the highest Dunn index (0.1152). The relatively low Silhouette values of the three partition-based methods (~0.23–0.30) indicate that the cluster structure is not sharply defined, which is common for continuous data such as individual behavior measurements.
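For readers unfamiliar with the Dunn index, the sketch below computes it by hand under its usual definition: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The `dunn_index` helper and the toy data are illustrative and are not part of this notebook, which takes the value from `cluster.stats()`.

```r
# Dunn index: smallest between-cluster distance divided by the largest
# within-cluster distance (cluster diameter). Higher is better.
dunn_index <- function(d, labels) {
  d <- as.matrix(d)
  same <- outer(labels, labels, "==")  # TRUE where two points share a cluster
  min(d[!same]) / max(d[same])
}

# Toy 1-D data (illustrative, not from the gym dataset).
toy <- c(0, 1, 2, 10, 11, 12)
labels <- c(1, 1, 1, 2, 2, 2)
dunn_index(dist(toy), labels)  # 8 / 2 = 4
```

Because the ratio uses a single minimum and a single maximum, one stray point can lower it sharply, which helps explain why all the values in the table are small.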
fviz_silhouette(silhouette(km_res$cluster, dist_matrix)) +
  labs(title = "Silhouette Plot - K-means") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))
## cluster size ave.sil.width
## 1 1 218 0.22
## 2 2 195 0.49
## 3 3 560 0.26
K-Means silhouette plot interpretation:
Each bar represents one observation in the dataset, and its length is the silhouette value (Si): how well the observation fits its own cluster compared with the nearest other cluster.
- Cluster 2 has the highest average silhouette width (0.49); its members are the most clearly separated from the other clusters and the most homogeneous.
- Clusters 1 and 3 (0.22 and 0.26) contain many short bars, meaning many of their members lie close to the boundary of another cluster.
- The red dashed line marks the overall average silhouette (about 0.30). A value at this level means the clustering has some structure, but the separation between clusters is not strong.
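The silhouette value of a point is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. The sketch below computes this by hand on toy data; `silhouette_widths` is an illustrative helper, not the `silhouette()` function from the cluster package used above.

```r
# Hand-computed silhouette widths (illustrative sketch, base R only).
silhouette_widths <- function(d, labels) {
  d <- as.matrix(d)
  sapply(seq_along(labels), function(i) {
    own <- labels == labels[i]
    if (sum(own) == 1) return(0)  # usual convention for singleton clusters
    # a: mean distance to own cluster, excluding the point itself (d[i, i] = 0).
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to any other cluster.
    b <- min(tapply(d[i, !own], labels[!own], mean))
    (b - a) / max(a, b)
  })
}

# Toy 1-D data (illustrative, not from the gym dataset).
toy <- c(0, 1, 2, 10, 11, 12)
labels <- c(1, 1, 1, 2, 2, 2)
s <- silhouette_widths(dist(toy), labels)
round(s, 3)
mean(s)  # overall average, the quantity drawn as the dashed line
```

For the point at 1, a = 1 and b = 10, so s = 0.9; values near 1 indicate a well-placed point, values near 0 a boundary point, and negative values a point that sits closer to another cluster.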
Based on the clustering analysis of 973 gym members with 13 numeric features:
| Method | Number of Clusters | Silhouette | Suitability |
|---|---|---|---|
| K-means | 3 | ~0.30 | ✅ Best |
| K-medians | 3 | ~0.29 | ✅ Good |
| Fuzzy C-means | 3 | ~0.23 | ⚠️ Fairly good |
| DBSCAN | 1 | NA | ❌ Not suitable |
| Mean Shift | 213 | ~-0.09 | ❌ Not suitable |
According to the evaluation, K-means is the best method: it has the highest Silhouette value and forms reasonably distinct clusters compared with the other methods.