Praktikum 11 APG : Clustering
Pengantar
Summary Clustering
SEKILAS ANALISIS CLUSTER
Analisis cluster (atau clustering, segmentasi data, …) adalah suatu teknik analisis yang bertujuan menemukan kesamaan antara data sesuai dengan karakteristik yang ditemukan dalam data dan mengelompokkan objek data serupa ke dalam kelompok.
Metode analisis kluster adalah teknik di mana tidak ada asumsi yang dibuat mengenai jumlah kelompok atau struktur kelompok.
Analisis cluster adalah istilah umum untuk berbagai metode numerik yang memiliki tujuan secara umum mengungkap atau menemukan kelompok atau kelompok pengamatan yang homogen dan terpisah dari kelompok lain.
UKURAN SIMILARITY DAN DISSIMILARITY
Karena analisis cluster berupaya mengidentifikasi vektor-vektor pengamatan yang memiliki kemiripan dan mengelompokkannya ke dalam kelompok-kelompok, banyak teknik yang menggunakan indeks kesamaan (similarity) atau kedekatan (proximity) antara masing-masing pasangan pengamatan.
Ukuran proximity yang umum digunakan adalah jarak antara dua pengamatan.
Pemilihan metrik jarak yang digunakan perlu mempertimbangkan (jenis variabel, sifat variabel, skala).
JENIS METRIK JARAK
[Numerik] Euclidean Distance, Squared Euclidean Distance, Normalized Squared Euclidean Distance, Manhattan Distance, Chessboard Distance, Bray-Curtis Distance, Canberra Distance, Cosine Distance, Correlation Distance, Binary Distance, Warping Distance, Canonical Warping Distance
[Boolean] Hamming Distance, Jaccard Dissimilarity, Matching Dissimilarity, Dice Dissimilarity, Rogers-Tanimoto Dissimilarity, Russell-Rao Dissimilarity, Sokal-Sneath Dissimilarity, Yule Dissimilarity
[String] Edit Distance, Damerau-Levenshtein Distance, Hamming Distance, Smith-Waterman Similarity, Needleman-Wunsch Similarity
METODE PENGELOMPOKAN HIERARKI
Divisif => merupakan metode pengelompokan dimana pada awalanya setiap objek digabungkan menjadi satu kelompok besar , kemudian di lakukan pengelompokan ulang hingga setiap objek dianggap sebagai satu kelompok terpisah.
Aglomeratif => Metode alglomerative merupakan metode pengelompokan dimana awalnya setiap objek dianggap sebagai satu cluster terpisah, kemudian dilakukan pengelompokkan satu per satu sehingga membentuk satu cluster besar.
METODE PENGELOMPOKKAN NON HIERARKI
- K-Means, C-Means, K-Medoids, K-Prototype, dll.
Penerapan 1 Hirarki
Single Linkage
dist1 = matrix(c(0,9,3,6,11,9,0,7,5,10,3,7,0,9,2,6,5,9,0,8,11,10,2,8,0),5,5)
a=dist(dist1, method = "euclidean")
single1 <- hclust(a, method = "single")
plot(single1)Complete Linkage
a=dist(dist1, method = "euclidean")
single2 <- hclust(a, method = "complete")
plot(single2)Penerapan 2 Hirarki
Data yang ditunjukkan pada Tabel 1 diambil dari Stanley dan Miller (1979) adalah nilai dari enam variabel untuk 22 pesawat tempur AS. Variabelnya adalah sebagai berikut:
FFD: tanggal penerbangan pertama, dalam beberapa bulan setelah Januari 1940;
SPR: daya spesifik, sebanding dengan daya per satuan berat;
RGF: faktor jangkauan penerbangan;
PLF: muatan sebagai sebagian kecil dari berat kotor pesawat;
SLF: faktor beban berkelanjutan;
CAR: variabel biner yang mengambil nilai 1 jika pesawat dapat mendarat di kapal induk dan 0 sebaliknya.
Lakukan analisis cluster pada data tersebut!
jet <- read.csv("jet.csv", row.names = 1)
jet## FFD SPR RGF PLF SLF CAR
## FH-1 82 1.468 3.30 0.166 0.10 no
## FJ-1 89 1.605 3.64 0.154 0.10 no
## F-86A 101 2.168 4.87 0.177 2.90 yes
## F9F-2 107 2.054 4.72 0.275 1.10 no
## F-94A 115 2.467 4.11 0.298 1.00 yes
## F3D-1 122 1.294 3.75 0.150 0.90 no
## F-89A 127 2.183 3.97 0.000 2.40 yes
## XF10F-1 137 2.426 4.65 0.117 1.80 no
## F9F-6 147 2.607 3.84 0.155 2.30 no
## F-100A 166 4.567 4.92 0.138 3.20 yes
## F4D-1 174 4.588 3.82 0.249 3.50 no
## F11F-1 175 3.618 4.32 0.143 2.80 no
## F-101A 177 5.855 4.53 0.172 2.50 yes
## F3H-2 184 2.898 4.48 0.178 3.00 no
## F-102A 187 3.880 5.39 0.101 3.00 yes
## F-8A 189 0.455 4.99 0.008 2.64 no
## F-104B 194 8.088 4.50 0.251 2.70 yes
## F-105B 197 6.502 5.20 0.366 2.90 yes
## YF-107A 201 6.081 5.65 0.106 2.90 yes
## F-106A 204 7.105 5.40 0.089 3.20 yes
## F-4B 255 8.548 4.20 0.222 2.90 no
## F-111A 328 6.321 6.45 0.187 2.00 yes
# Pilih variabel kecuali variabel FFD dan CAR dan skalakan ke N(0,1)
zjet <- scale(jet[,names(jet)!=c("FFD","CAR")])
zjet## SPR RGF PLF SLF
## FH-1 -1.04614198 -1.69627044 -0.02622718 -2.15830567
## FJ-1 -0.98826834 -1.24473653 -0.16470671 -2.15830567
## F-86A -0.75043723 0.38875379 0.10071238 0.63245061
## F9F-2 -0.79859486 0.18954766 1.23162849 -1.16160700
## F-94A -0.62412905 -0.62055730 1.49704758 -1.26127686
## F3D-1 -1.11964574 -1.09865203 -0.21086655 -1.36094673
## F-89A -0.74410070 -0.80648303 -1.94186060 0.13410128
## XF10F-1 -0.64144890 0.09658479 -0.59168524 -0.46391793
## F9F-6 -0.56498810 -0.97912835 -0.15316675 0.03443141
## F-100A 0.26298522 0.45515584 -0.34934607 0.93146021
## F4D-1 0.27185636 -1.00568916 0.93158952 1.23046981
## F11F-1 -0.13790595 -0.34166871 -0.29164627 0.53278074
## F-101A 0.80708197 -0.06278012 0.04301258 0.23377114
## F3H-2 -0.44205941 -0.12918216 0.11225234 0.73212048
## F-102A -0.02722788 1.07933507 -0.77632460 0.73212048
## F-8A -1.47406901 0.54811870 -1.84954092 0.37330896
## F-104B 1.75038014 -0.10262134 0.95466945 0.43311088
## F-105B 1.08039765 0.82700730 2.28176488 0.63245061
## YF-107A 0.90255236 1.42462571 -0.71862480 0.63245061
## F-106A 1.33512618 1.09261548 -0.91480413 0.93146021
## F-4B 1.94470041 -0.50103362 0.62001060 0.63245061
## F-111A 1.00393685 2.48705844 0.21611198 -0.26457819
## attr(,"scaled:center")
## SPR RGF PLF SLF
## 3.9444545 4.5772727 0.1682727 2.2654545
## attr(,"scaled:scale")
## SPR RGF PLF SLF
## 2.36722604 0.75298885 0.08665541 1.00331226
#Ekstrak korelasi
cor(zjet)## SPR RGF PLF SLF
## SPR 1.0000000 0.44351234 0.33176076 0.5406762
## RGF 0.4435123 1.00000000 -0.06613601 0.4655696
## PLF 0.3317608 -0.06613601 1.00000000 -0.1030198
## SLF 0.5406762 0.46556957 -0.10301985 1.0000000
# Buat distance matrix terlebih dahulu
dj<-dist(zjet)
dj## FH-1 FJ-1 F-86A F9F-2 F-94A F3D-1 F-89A
## FJ-1 0.4758243
## F-86A 3.4984570 3.2532479
## F9F-2 2.4886138 2.2441719 2.1306375
## F-94A 2.1119376 2.0219508 2.5633217 0.8758366
## F3D-1 1.0160824 0.8225039 2.5336546 1.9705482 1.8441844
## F-89A 3.1317298 2.9436524 2.4184887 3.5700056 3.7178127 2.3362161
## XF10F-1 2.5629668 2.2299523 1.3336715 1.9607600 2.3480159 1.6146042 1.7339750
## F9F-6 2.3600872 2.2489871 1.5256390 2.1836990 2.1293516 1.5074284 1.8086590
## F-100A 3.9991884 3.7464669 1.1503885 2.8421666 3.1876944 3.0984340 2.4037135
## F4D-1 3.8229876 3.7855941 2.0093629 2.8959475 2.7348968 3.1568284 3.2450873
## F11F-1 3.1578841 2.9659244 1.0356597 2.4310525 2.5946599 2.2648545 1.8616329
## F-101A 3.4394121 3.2226515 1.6709349 2.4498480 2.5901480 2.7189869 2.6284819
## F3H-2 3.3458065 3.1581760 0.6110825 2.2512026 2.4831210 2.4257640 2.2642819
## F-102A 4.2023067 3.8798901 1.3338149 3.0008012 3.5196277 3.2615453 2.4054766
## F-8A 3.8670337 3.5634353 2.1023041 3.5262459 3.9949627 2.9206791 1.5599817
## F-104B 4.2470992 4.0954933 2.6952766 3.0335569 3.0119222 3.7156752 3.8983521
## F-105B 4.8994132 4.7270546 2.8811478 2.8737426 3.0336182 4.3284176 4.9075910
## YF-107A 4.6695971 4.3354173 2.1158258 3.3825213 3.8741370 3.8324492 3.0714769
## F-106A 4.8769401 4.5793750 2.4424734 3.7889841 4.1711829 4.0716356 3.1016747
## F-4B 4.3104145 4.1904059 2.8853352 3.4051858 3.3118829 3.7962290 3.7595860
## F-111A 5.0346793 4.6504195 2.8807444 3.2192563 3.8654985 4.3302722 4.3265277
## XF10F-1 F9F-6 F-100A F4D-1 F11F-1 F-101A F3H-2
## FJ-1
## F-86A
## F9F-2
## F-94A
## F3D-1
## F-89A
## XF10F-1
## F9F-6 1.2663552
## F-100A 1.7182499 1.8936362
## F4D-1 2.6908085 1.8188508 1.9657949
## F11F-1 1.2365489 0.9253554 0.9787327 1.6099392
## F-101A 1.7358716 1.6734674 1.0977314 1.7200352 1.0826746
## F3H-2 1.4201278 1.1378652 1.0446761 1.4824324 0.5835497 1.3482988
## F-102A 1.6756089 2.3241242 0.8341884 2.7571906 1.5186016 1.7088492 1.5563316
## F-8A 1.7833364 2.4802087 2.3639198 3.7325575 2.2426592 3.0295341 2.3454549
## F-104B 2.9927250 2.7414305 2.1147736 1.9073222 2.2772802 1.3274989 2.3678197
## F-105B 3.5995850 3.5008484 2.7961738 2.4885972 3.0793581 2.4571081 2.8193780
## YF-107A 2.3164132 2.9341231 1.2548671 3.0635016 2.0963287 1.7206183 2.2186912
## F-106A 2.6363634 3.0475086 1.3695196 3.0053288 2.1850093 1.7372255 2.3970314
## F-4B 3.1169772 2.7354234 2.1843817 1.8729103 2.2811617 1.4064544 2.4703539
## F-111A 3.0189288 3.8342861 2.5353280 3.9347559 3.1936033 2.6112725 3.1527472
## F-102A F-8A F-104B F-105B YF-107A F-106A F-4B
## FJ-1
## F-86A
## F9F-2
## F-94A
## F3D-1
## F-89A
## XF10F-1
## F9F-6
## F-100A
## F4D-1
## F11F-1
## F-101A
## F3H-2
## F-102A
## F-8A 1.9120876
## F-104B 2.7645358 4.3229281
## F-105B 3.2637937 4.8721577 1.7646542
## YF-107A 0.9984890 2.7861657 2.4271270 3.0644926
## F-106A 1.3838706 3.0615878 2.3117765 3.2315468 0.6521073
## F-4B 2.8889016 4.3536817 0.5901064 2.2961139 2.5663556 2.3143814
## F-111A 2.2412774 3.8175676 2.8802528 2.7987852 1.6785219 2.1825757 3.2835233
# Analisis cluster aglomerative
cc <- hclust(dj)
cc##
## Call:
## hclust(d = dj)
##
## Cluster method : complete
## Distance : euclidean
## Number of objects: 22
#dendogram
ccd <- as.dendrogram(cc)
plot(ccd, type = "rectangle", ylab = "Height", main = "Hasil Clustering Jet")# Buat distance matrix terlebih dahulu
# Method ada beberapa pilihan
data_dist <- dist(zjet, method = "euclidean")
# Clustering
## Single linkage
single <- hclust(data_dist, method = "single")
plot(single, hang = -2, cex = 1)## Complete linkage
Complete <- hclust(data_dist, method = "complete")
plot(Complete, hang = -2, cex = 1)## Average linkage
Average <- hclust(data_dist, method = "average")
plot(Average, hang = -2, cex = 1)Penerapan 3 Package Cluster
#analisis cluster dengan package cluster
library(cluster)Agnes (Aglomerative)
clusters2 <- agnes(zjet, method = "complete")
clusters2## Call: agnes(x = zjet, method = "complete")
## Agglomerative coefficient: 0.7827311
## Order of objects:
## [1] FH-1 FJ-1 F3D-1 F9F-2 F-94A F-86A F11F-1 F3H-2 XF10F-1
## [10] F9F-6 F-89A F-8A F-100A F-102A YF-107A F-106A F-111A F4D-1
## [19] F-101A F-104B F-4B F-105B
## Height (summary):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4758 0.8758 1.4065 1.8129 2.4886 5.0347
##
## Available components:
## [1] "order" "height" "ac" "merge" "diss" "call"
## [7] "method" "order.lab" "data"
#dendogram
clust2d <- as.dendrogram(as.hclust(clusters2))
plot(clust2d, type = "rectangle", ylab = "Height", main = "Hasil Clustering JET Metode Agnes")Devisive (Diana)
#divisive
clusters3 <- diana(zjet)
clusters3## Merge:
## [,1] [,2]
## [1,] -1 -2
## [2,] -12 -14
## [3,] -17 -21
## [4,] -19 -20
## [5,] -10 -15
## [6,] -4 -5
## [7,] 1 -6
## [8,] -3 2
## [9,] -8 -9
## [10,] 5 4
## [11,] -7 -16
## [12,] 8 -13
## [13,] 12 -11
## [14,] 3 -18
## [15,] 11 9
## [16,] 7 6
## [17,] 10 -22
## [18,] 13 14
## [19,] 18 17
## [20,] 16 15
## [21,] 20 19
## Order of objects:
## [1] FH-1 FJ-1 F3D-1 F9F-2 F-94A F-89A F-8A XF10F-1 F9F-6
## [10] F-86A F11F-1 F3H-2 F-101A F4D-1 F-104B F-4B F-105B F-100A
## [19] F-102A YF-107A F-106A F-111A
## Height:
## [1] 0.4758243 1.0160824 2.4886138 0.8758366 3.9949627 1.5599817 2.4802087
## [8] 1.2663552 5.0346793 1.0356597 0.5835497 1.6709349 2.0093629 3.0793581
## [15] 0.5901064 2.2961139 3.9347559 0.8341884 1.3838706 0.6521073 2.5353280
## Divisive coefficient:
## [1] 0.7811598
##
## Available components:
## [1] "order" "height" "dc" "merge" "diss" "call"
## [7] "order.lab" "data"
#dendogram
clust3d <- as.dendrogram(as.hclust(clusters3))
plot(clust3d, type = "rectangle", ylab = "Height", main = "Hasil Clustering JET metode DIANA")+abline(h=4.5,col="red")## integer(0)
#menentukan jumlah cluster = 2, potong dendogram dengan agar jumlah cluster 2 (Lihat Garis Merah)
clusterCut <- cutree(cc, 2)
#keanggotaan cluster
clusterCut## FH-1 FJ-1 F-86A F9F-2 F-94A F3D-1 F-89A XF10F-1 F9F-6 F-100A
## 1 1 1 1 1 1 1 1 1 2
## F4D-1 F11F-1 F-101A F3H-2 F-102A F-8A F-104B F-105B YF-107A F-106A
## 2 1 2 1 2 1 2 2 2 2
## F-4B F-111A
## 2 2
Visualisasi Hullplot
library(dbscan)
jet2 <- cbind(jet,clusterCut)
hullplot(zjet,jet2$clusterCut)
## PROFILING CLUSTER
#Profiling Cluster
#Buat komponen utama dari variabel
pr2<-princomp(zjet)
summary(pr2, loadings = TRUE)## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4
## Standard deviation 1.3740969 1.0531143 0.7251298 0.54330134
## Proportion of Variance 0.4945135 0.2904654 0.1377130 0.07730809
## Cumulative Proportion 0.4945135 0.7849789 0.9226919 1.00000000
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4
## SPR 0.601 0.305 0.184 0.715
## RGF 0.541 -0.246 -0.791 -0.146
## PLF 0.106 0.885 -0.122 -0.436
## SLF 0.579 -0.250 0.571 -0.526
PROFILING DENGAN PACKAGE FACTOEXTRA
#Menggunakan package factoextra
library(factoextra)
# Hierarchical clustering
hc.res <- eclust(data_dist, "hclust", k = 2, hc_metric = "euclidean",hc_method = "single", graph = FALSE)
# Visualisasi
fviz_dend(hc.res, show_labels = TRUE, palette = "jco")fviz_cluster(hc.res,zjet)require("cluster")
#Index Silhouette
sil <- silhouette(hc.res$cluster, dist(zjet))
fviz_silhouette(sil)## cluster size ave.sil.width
## 1 1 5 0.51
## 2 2 17 0.28
Penerapan 4
Tabel berikut menampilkan beberapa indikator sosial kependudukan kabupaten di Propinsi Jawa Tengah (38 kabupaten) dan Jawa Timur (35 kabupaten). Lakukan Pengelompokkan kabupaten-kabupaten tersebut berdasarkan indikator sosial kependudukannya!
Package
library(readr) #Membaca data
library(dplyr) #Data processing
library(DT) #Menampilkan tabel agar mudah dilihat di browser
library(readxl)Import Dataset
Data_jateng <- read_excel("Dataset Pert 11.xlsx")
Data_jateng## # A tibble: 73 x 6
## Kabkota Kriminalitas Jum.RS Padat.Pend IPM UMK
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Cilacap 394 9 804 69.6 1841209
## 2 Banyumas 895 16 1265 71.3 1589000
## 3 Purbalingga 193 4 1190 68.4 1655200
## 4 Banjarnegara 183 3 858 66.5 1490000
## 5 Kebumen 256 10 932 68.8 1560000
## 6 Purworejo 159 8 692 71.9 1573000
## 7 Wonosobo 130 3 800 67.8 1585000
## 8 Magelang 391 4 1179 69.1 1742000
## 9 Boyolali 292 10 965 73.2 1651650
## 10 Klaten 316 7 1787 74.8 1661632.
## # ... with 63 more rows
Standarisasi Variabel
datatable(Data_jateng, caption = "Variabel")data_standardized <- round(scale(Data_jateng[,2:6]),4)
# Hanya memilih kolom/variabel yang berisikan indikator yang akan digunakan
# standarisasi data dilakukan untuk menyamakan semua satuan pengukuran variabel
# perlu dinormalisasi sebelum masuk ke analisis klaster
datatable(data_standardized, caption = "Data Hasil Standardisasi")data_standardized## Kriminalitas Jum.RS Padat.Pend IPM UMK
## [1,] -0.3831 0.2195 -0.4905 -0.3726 -0.0509
## [2,] 0.2998 1.3299 -0.2828 -0.0135 -0.5360
## [3,] -0.6572 -0.5737 -0.3166 -0.6100 -0.4087
## [4,] -0.6708 -0.7323 -0.4662 -0.9959 -0.7263
## [5,] -0.5713 0.3781 -0.4328 -0.5295 -0.5917
## [6,] -0.7035 0.0608 -0.5410 0.1041 -0.5667
## [7,] -0.7430 -0.7323 -0.4923 -0.7338 -0.5437
## [8,] -0.3872 -0.5737 -0.3215 -0.4655 -0.2417
## [9,] -0.5222 0.3781 -0.4180 0.3827 -0.4155
## [10,] -0.4895 -0.0978 -0.0476 0.7067 -0.3963
## [11,] -0.6026 -0.0978 0.0020 0.9709 -0.4225
## [12,] -0.6871 0.2195 -0.6162 -0.4119 -0.6263
## [13,] -0.6217 -0.0978 -0.3400 0.8615 -0.3302
## [14,] -0.4513 0.3781 -0.4301 0.3291 -0.6177
## [15,] -0.6572 -0.0978 -0.5401 -0.4222 -0.5917
## [16,] -0.7471 -0.2564 -0.6365 -0.7049 -0.5840
## [17,] -0.7417 -0.7323 -0.5712 -0.3933 -0.6398
## [18,] -0.4540 0.2195 -0.4743 -0.1353 -0.5437
## [19,] -0.5713 -0.0978 0.0601 0.6634 0.0477
## [20,] -0.5767 -0.4150 -0.2963 0.0030 -0.2468
## [21,] -0.7512 -0.7323 -0.2747 -0.0218 0.3804
## [22,] -0.5413 -0.4150 -0.3576 0.4632 0.0621
## [23,] -0.7226 -0.5737 -0.4563 -0.5233 -0.5975
## [24,] -0.7294 -0.5737 -0.4193 -0.0177 0.1188
## [25,] -0.6626 -0.7323 -0.4175 -0.7235 -0.2265
## [26,] -0.6340 -0.7323 -0.3720 -0.4944 -0.2809
## [27,] -0.6258 -0.0978 -0.2742 -1.1755 -0.5379
## [28,] -0.6353 -0.2564 -0.1165 -0.8329 -0.4821
## [29,] -0.7499 0.3781 -0.3625 -1.1734 -0.6263
## [30,] -0.7240 -0.4150 2.1780 1.4332 -0.5533
## [31,] 0.3257 0.6954 4.4472 2.0833 -0.3827
## [32,] -0.5849 -0.5737 0.7770 2.2794 -0.2534
## [33,] 0.4798 1.9644 1.3011 2.3434 0.8507
## [34,] -0.6244 -0.2564 2.1987 0.5932 -0.1972
## [35,] -0.6844 -0.7323 -0.5284 0.6345 -0.4562
## [36,] -0.7062 -0.7323 -0.6730 -0.8329 -0.6882
## [37,] 0.1090 -0.4150 -0.5522 -0.3004 -0.6882
## [38,] 0.0476 -1.0496 -0.5797 -0.5481 -0.6882
## [39,] 0.9924 0.6954 -0.4112 0.1289 -0.3782
## [40,] -0.1964 -0.0978 -0.4626 -0.2963 -0.4121
## [41,] 0.4593 0.6954 -0.3432 -0.0610 -0.2967
## [42,] 2.4265 1.9644 -0.5221 -0.4057 1.3598
## [43,] 0.0013 -0.5737 -0.5910 -1.3488 -0.3397
## [44,] 1.8349 0.2195 -0.4973 -1.1156 0.0948
## [45,] 0.5684 1.1713 -0.7275 -0.2695 0.0269
## [46,] -0.5358 -0.5737 -0.6248 -1.2580 -0.3850
## [47,] 0.2480 -0.4150 -0.6694 -1.0207 -0.4823
## [48,] 1.3059 -0.4150 -0.5441 -1.3447 0.3369
## [49,] -0.0778 0.0608 -0.3585 -0.8164 3.2823
## [50,] 3.5171 3.0748 0.7221 1.6788 3.2879
## [51,] -0.3218 0.3781 -0.1566 0.2630 3.2653
## [52,] 1.0810 0.6954 -0.3441 0.1020 0.7624
## [53,] 0.7129 -0.5737 -0.4657 -0.0280 -0.3986
## [54,] 0.4689 -0.5737 -0.5572 -0.0734 -0.5592
## [55,] 0.1635 -0.8909 -0.4414 0.3187 -0.6882
## [56,] 0.7633 -0.7323 -0.5644 -0.3004 -0.5728
## [57,] -0.3368 0.3781 -0.5973 -0.7256 -0.2832
## [58,] 0.2439 -0.5737 -0.5658 -0.8122 0.3844
## [59,] 0.0013 0.5367 -0.5522 0.1247 -0.0320
## [60,] 0.3830 2.1230 -0.3616 0.8079 3.2936
## [61,] -0.2087 -0.7323 -0.4121 -1.7534 -0.3918
## [62,] -0.4418 -0.8909 -0.4986 -2.1393 -0.4529
## [63,] -0.3368 -0.2564 -0.3576 -1.2291 -0.5366
## [64,] -0.3545 -0.8909 -0.6081 -1.2622 -0.4280
## [65,] 0.2549 0.0608 1.1799 1.2825 -0.2107
## [66,] 0.1840 -0.5737 1.0979 1.2825 -0.4370
## [67,] 2.4129 1.1713 1.8333 1.9657 1.1584
## [68,] -0.0410 -1.0496 1.0159 0.2403 0.0359
## [69,] -0.2482 -0.8909 1.6882 0.7047 0.3844
## [70,] -0.3709 -0.2564 2.6493 1.1917 0.0359
## [71,] 0.4430 -0.4150 1.4985 1.8501 -0.4370
## [72,] 4.8122 5.1370 2.8570 2.1411 3.2993
## [73,] -0.4554 -0.4150 -0.1737 0.7583 0.9932
## attr(,"scaled:center")
## Kriminalitas Jum.RS Padat.Pend IPM UMK
## 6.750548e+02 7.616438e+00 1.892562e+03 7.136562e+01 1.867701e+06
## attr(,"scaled:scale")
## Kriminalitas Jum.RS Padat.Pend IPM UMK
## 7.335450e+02 6.303945e+00 2.219230e+03 4.845347e+00 5.199989e+05
Menentukan Jumlah Cluster
jumlah_klaster <- c(1:9) # Vektor yang berisikan jumlah klaster yang ingin dilihat nilai dari total within-cluster sum of squares
within_ss <- c() #Vektor kosong yang akan diisi nilai total within-cluster sum of squares
for (i in jumlah_klaster) {
within_ss <- c(within_ss, kmeans(x = data_standardized, centers = i,
nstart = 25)$tot.withinss)
}
plot(x = jumlah_klaster, y = within_ss, type = "b",
xlab = "Number of Cluster",
ylab = "Total Within Sum of Squares",
main = "Elbow Plot")
abline(v = 7, col='red')Dari Elbow Plot di atas, dapat dilihat bahwa pada titik ke tujuh nilai total WSS mulai menunjukkan penurunan yang kurang berarti, sehingga berdasarkan plot tersebut akan ditentukan jumlah klaster yang akan dibentuk adalah tujuh klaster.
Analisis Cluster
set.seed(123)
kmeans_clustering <- kmeans(x = data_standardized, centers = 7, nstart = 25) #parameter nstart digunakan untuk memberitahu fungsi berapa kali inisiasi centroid awal (secara acak) yang akan dibentuk.
kmeans_clustering## K-means clustering with 7 clusters of sizes 9, 3, 20, 3, 8, 2, 28
##
## Cluster means:
## Kriminalitas Jum.RS Padat.Pend IPM UMK
## 1 -0.190166667 -0.4855444 1.5870444 1.20640000 -0.1813778
## 2 -0.005533333 0.8539667 -0.2922333 0.08483333 3.2804000
## 3 -0.369370000 -0.2008950 -0.3604250 0.26878500 -0.2267200
## 4 1.072800000 1.2770333 2.5272000 2.13080000 0.5421333
## 5 1.121025000 0.7945375 -0.4590375 -0.37238750 0.1712375
## 6 4.164650000 4.1059000 1.7895500 1.90995000 3.2936000
## 7 -0.407153571 -0.4490357 -0.4888000 -0.84718214 -0.4734857
##
## Clustering vector:
## [1] 3 5 7 7 7 3 7 7 3 3 3 7 3 3 7 7 7 3 3 3 3 3 7 3 7 7 7 7 7 1 4 1 4 1 3 7 7 7
## [39] 5 3 5 5 7 5 5 7 7 5 2 6 2 5 3 3 3 7 7 7 3 2 7 7 7 7 1 1 4 1 1 1 1 6 3
##
## Within cluster sum of squares by cluster:
## [1] 9.475125 4.117009 13.178720 10.603871 12.497746 5.350801 15.614834
## (between_SS / total_SS = 80.3 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
# X = miu + Z*sigmaDari output : K-means clustering with 7 clusters of sizes 9, 3, 20, 3, 8, 2, 28
dapat di ketahui bahwa klaster pertama beranggotakan 9 observasi, klaster kedua 3 observasi, klaster ketiga 20 observasi, klaster ke empat 3 observasi, klaster kelima 8 observasi, klaster keenam 2 observasi, dan terakhir klaster ketujuh 28 observasi.
Centroid
menghitung nilai rata-rata cluster dari setiap variabel
Data_jateng %>%
mutate(Klaster = kmeans_clustering$cluster) %>%
group_by(Klaster) %>%
summarise(Mean_kriminalitas = mean(Kriminalitas), Mean_jumlahRS = mean(Jum.RS), Mean_kepadatanpnddk = mean(Padat.Pend), Mean_IPM = mean(IPM), Mean_UMK = mean(UMK))## # A tibble: 7 x 6
## Klaster Mean_kriminalitas Mean_jumlahRS Mean_kepadatanpnddk Mean_IPM Mean_UMK
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 536. 4.56 5415. 77.2 1773388.
## 2 2 671 13 1244 71.8 3573506.
## 3 3 404. 6.35 1093. 72.7 1749804.
## 4 4 1462 15.7 7501 81.7 2149620.
## 5 5 1497. 12.6 874. 69.6 1956743.
## 6 6 3730 33.5 5864 80.6 3580371.
## 7 7 376. 4.79 808. 67.3 1621483.
Pengelompokkan objek ke dalam cluster
D = Data_jateng %>%
mutate(Klaster = kmeans_clustering$cluster) %>%
select(Kabkota, Klaster) %>%
arrange(Klaster)
D## # A tibble: 73 x 2
## Kabkota Klaster
## <chr> <int>
## 1 Kota_Magelang 1
## 2 Kota_Salatiga 1
## 3 Kota_Pekalongan 1
## 4 Kota_Kediri 1
## 5 Kota_Blitar 1
## 6 Kota_Probolinggo 1
## 7 Kota_Pasuruan 1
## 8 Kota_Mojokerto 1
## 9 Kota_Madiun 1
## 10 Pasuruan 2
## # ... with 63 more rows
Penerapan 5 K-Means
CLUSTERING DENGAN METODE K-MEANS
library(NbClust)Tentukan jumlah cluster dengan elbow method
#menentukan jumlah cluster k
# Elbow method
fviz_nbclust(zjet, kmeans, method = "wss") +geom_vline(xintercept = 4, linetype = 2)+labs(subtitle = "Metode Elbow")# Banyak =4 (Tergantung interpretasi analis)EVALUASI JUMLAH CLUSTER BERDASARKAN METODE GAP STATISTIK
# Gap statistic with nboot = 50 to keep the function speedy.
# recommended value: nboot= 500 for your analysis.
# Use verbose = FALSE to hide computing progression.
set.seed(123)
fviz_nbclust(zjet, kmeans, nstart = 25, method = "gap_stat", nboot = 50)+labs(subtitle = "Metode Gap Statistik")lakukan K-Means
# K-means clustering
km.res <- eclust(zjet, "kmeans", k = 3, nstart = 25,graph = FALSE)
# Gambar Klaster
fviz_cluster(km.res, geom = "point", ellipse.type = "norm", palette = "jco", ggtheme = theme_minimal())EVALUASI BERDASARKAN KOEFISIEN SILUET
require("cluster")
sil <- silhouette(km.res$cluster, dist(zjet))
fviz_silhouette(sil)## cluster size ave.sil.width
## 1 1 9 0.32
## 2 2 8 0.13
## 3 3 5 0.44
# Silhouette information
silinfo <- km.res$silinfo
names(silinfo)## [1] "widths" "clus.avg.widths" "avg.width"
# Silhouette widths of each observation
head(silinfo$widths[, 1:3], 10)## cluster neighbor sil_width
## F11F-1 1 2 0.4168754
## F-8A 1 3 0.4128519
## F-86A 1 2 0.4121347
## F3H-2 1 2 0.3994896
## F-89A 1 3 0.3448718
## XF10F-1 1 3 0.2903407
## F-102A 1 2 0.2469427
## F9F-6 1 3 0.1992800
## F-100A 1 2 0.1913406
## F-104B 2 1 0.3511907
# Average silhouette width of each cluster
silinfo$clus.avg.widths## [1] 0.3237919 0.1304181 0.4447598
# The total average (mean of all individual silhouette widths)
silinfo$avg.width## [1] 0.2809669
# The size of each clusters
km.res$size## [1] 9 8 5
# Silhouette width of observation
sil <- km.res$silinfo$widths[, 1:3]
# Objects with negative silhouette
neg_sil_index <- which(sil[, 'sil_width'] < 0)
sil[neg_sil_index, , drop = FALSE]## cluster neighbor sil_width
## F-101A 2 1 -0.04272323
## F4D-1 2 1 -0.07872371
Penerapan Data
SOAL
- File crime.csv adalah file berisi data jumlah kejahatan di 50 negara bagian di Amerika Serikat berdasarkan jenis kejahatannya, yaitu:
- Murder : pembunuhan
- Rape : pemerkosaan
- Robbery : perampokan
- Assault : penyerangan
- Burglary : perampokan
- Theft : pencurian
- Vehicles : pencurian kendaraan bermotor
- Lakukan analisis Cluster pada data tersebut!
LOAD DATASET
crime<-read.csv2(file="crime.csv",sep=",")
crime## City Murder Rape Robbery Assault Burglary Theft Vehicle Type
## 1 AK 8.6 72.7 88 401 1162 3910 604 2
## 2 AL 10.1 28.4 112 408 1159 2304 267 2
## 3 AR 8.1 28.9 80 278 1030 2305 195 1
## 4 AZ 9.3 43 169 437 1908 4337 419 2
## 5 CA 11.3 44.9 343 521 1696 3384 762 2
## 6 CO 7 42.3 145 329 1792 4231 486 2
## 7 CT 4.6 23.8 192 205 1198 2758 447 1
## 8 DC 31 52.4 754 668 1728 4131 975 2
## 9 DE 4.9 56.9 124 241 1042 3090 272 1
## 10 FL 11.7 52.7 367 605 2221 4373 598 2
## 11 GA 11.2 43.9 214 319 1453 2984 430 2
## 12 HI 4.8 31 106 103 1339 3759 328 1
## 13 IA 1.8 12.5 42 179 956 2801 158 2
## 14 ID 3.2 20 21 178 1003 2800 181 2
## 15 IL 8.9 32.4 325 434 1180 2938 628 2
## 16 IN 6 25.9 90 186 887 2333 328 1
## 17 KS 4.4 32.9 80 252 1188 3008 258 1
## 18 KY 6.7 23.1 83 222 824 1740 193 2
## 19 LA 12.8 40.1 224 482 1461 3417 442 2
## 20 MA 3.6 29.7 193 331 1071 2189 906 2
## 21 MD 9 43.6 304 476 1296 2978 545 1
## 22 ME 2 14.8 28 102 803 2347 164 2
## 23 MI 11.3 67.4 301 424 1509 3378 800 2
## 24 MN 2.5 31.8 102 148 1004 2785 288 1
## 25 MO 9.2 29.2 170 370 1136 2500 439 2
## 26 MS 11.2 25.8 65 172 1076 1845 150 2
## 27 MT 2.9 17.3 20 118 783 3314 215 1
## 28 NC 8.1 26.4 88 354 1225 2423 208 2
## 29 ND 1 11.6 7 32 385 2049 120 1
## 30 NE 3.1 24.6 51 184 748 2677 168 1
## 31 NH 2.2 21.5 24 92 755 2208 228 2
## 32 NJ 5.2 33.2 269 265 1071 2822 776 2
## 33 NM 11.5 46.9 130 538 1845 3712 343 2
## 34 NV 12.6 64.9 287 354 1604 3489 478 2
## 35 NY 10.7 30.5 514 431 1221 2924 637 2
## 36 OH 5.5 38.6 142 235 988 2574 376 1
## 37 OK 8.1 36.4 107 285 1787 3142 649 2
## 38 OR 6.6 51.1 206 286 1967 4163 402 2
## 39 PA 5.5 25.1 152 176 735 1654 354 2
## 40 RI 3.5 21.4 119 192 1294 2568 705 1
## 41 SC 8.6 41.3 99 525 1340 2846 277 2
## 42 SD 4 17.7 16 87 554 1939 99 1
## 43 TN 10.4 47 208 274 1325 2126 544 2
## 44 TX 13.5 51.6 240 354 2049 3987 714 2
## 45 UT 3.2 25.3 59 180 915 4074 223 1
## 46 VA 7.1 26.5 106 167 813 2522 219 1
## 47 VT 2 21.8 22 103 949 2697 181 1
## 48 WA 5 53.4 135 244 1861 4267 315 2
## 49 WI 3.1 20.1 73 162 783 2802 254 1
## 50 WV 5.9 18.9 41 99 625 1358 169 1
## 51 WY 5.3 21.9 22 243 817 3078 169 1
str(crime)## 'data.frame': 51 obs. of 9 variables:
## $ City : chr "AK" "AL" "AR" "AZ" ...
## $ Murder : chr "8.6" "10.1" "8.1" "9.3" ...
## $ Rape : chr "72.7" "28.4" "28.9" "43" ...
## $ Robbery : int 88 112 80 169 343 145 192 754 124 367 ...
## $ Assault : int 401 408 278 437 521 329 205 668 241 605 ...
## $ Burglary: int 1162 1159 1030 1908 1696 1792 1198 1728 1042 2221 ...
## $ Theft : int 3910 2304 2305 4337 3384 4231 2758 4131 3090 4373 ...
## $ Vehicle : int 604 267 195 419 762 486 447 975 272 598 ...
## $ Type : int 2 2 1 2 2 2 1 2 1 2 ...
PRA PEMROSESAN
crime$Murder<-as.numeric(crime$Murder)
crime$Rape<-as.numeric(crime$Rape)
crime$Type<-factor(as.character(crime$Type),level = c("1","2"),labels = c("Tipe 1","Tipe 2"))head(crime)## City Murder Rape Robbery Assault Burglary Theft Vehicle Type
## 1 AK 8.6 72.7 88 401 1162 3910 604 Tipe 2
## 2 AL 10.1 28.4 112 408 1159 2304 267 Tipe 2
## 3 AR 8.1 28.9 80 278 1030 2305 195 Tipe 1
## 4 AZ 9.3 43.0 169 437 1908 4337 419 Tipe 2
## 5 CA 11.3 44.9 343 521 1696 3384 762 Tipe 2
## 6 CO 7.0 42.3 145 329 1792 4231 486 Tipe 2
MELIHAT ANGGOTA SETIAP TIPE CITY
library(tidyverse)
crime %>% filter(Type=="Tipe 1") %>% select(City)## City
## 1 AR
## 2 CT
## 3 DE
## 4 HI
## 5 IN
## 6 KS
## 7 MD
## 8 MN
## 9 MT
## 10 ND
## 11 NE
## 12 OH
## 13 RI
## 14 SD
## 15 UT
## 16 VA
## 17 VT
## 18 WI
## 19 WV
## 20 WY
crime %>% filter(Type=="Tipe 2") %>% select(City)## City
## 1 AK
## 2 AL
## 3 AZ
## 4 CA
## 5 CO
## 6 DC
## 7 FL
## 8 GA
## 9 IA
## 10 ID
## 11 IL
## 12 KY
## 13 LA
## 14 MA
## 15 ME
## 16 MI
## 17 MO
## 18 MS
## 19 NC
## 20 NH
## 21 NJ
## 22 NM
## 23 NV
## 24 NY
## 25 OK
## 26 OR
## 27 PA
## 28 SC
## 29 TN
## 30 TX
## 31 WA
PILIH VARIABEL
crime_sel<-crime %>% select(-c(Type))
crime_sel<-data.frame(crime_sel,row.names = 1)
crime_sel## Murder Rape Robbery Assault Burglary Theft Vehicle
## AK 8.6 72.7 88 401 1162 3910 604
## AL 10.1 28.4 112 408 1159 2304 267
## AR 8.1 28.9 80 278 1030 2305 195
## AZ 9.3 43.0 169 437 1908 4337 419
## CA 11.3 44.9 343 521 1696 3384 762
## CO 7.0 42.3 145 329 1792 4231 486
## CT 4.6 23.8 192 205 1198 2758 447
## DC 31.0 52.4 754 668 1728 4131 975
## DE 4.9 56.9 124 241 1042 3090 272
## FL 11.7 52.7 367 605 2221 4373 598
## GA 11.2 43.9 214 319 1453 2984 430
## HI 4.8 31.0 106 103 1339 3759 328
## IA 1.8 12.5 42 179 956 2801 158
## ID 3.2 20.0 21 178 1003 2800 181
## IL 8.9 32.4 325 434 1180 2938 628
## IN 6.0 25.9 90 186 887 2333 328
## KS 4.4 32.9 80 252 1188 3008 258
## KY 6.7 23.1 83 222 824 1740 193
## LA 12.8 40.1 224 482 1461 3417 442
## MA 3.6 29.7 193 331 1071 2189 906
## MD 9.0 43.6 304 476 1296 2978 545
## ME 2.0 14.8 28 102 803 2347 164
## MI 11.3 67.4 301 424 1509 3378 800
## MN 2.5 31.8 102 148 1004 2785 288
## MO 9.2 29.2 170 370 1136 2500 439
## MS 11.2 25.8 65 172 1076 1845 150
## MT 2.9 17.3 20 118 783 3314 215
## NC 8.1 26.4 88 354 1225 2423 208
## ND 1.0 11.6 7 32 385 2049 120
## NE 3.1 24.6 51 184 748 2677 168
## NH 2.2 21.5 24 92 755 2208 228
## NJ 5.2 33.2 269 265 1071 2822 776
## NM 11.5 46.9 130 538 1845 3712 343
## NV 12.6 64.9 287 354 1604 3489 478
## NY 10.7 30.5 514 431 1221 2924 637
## OH 5.5 38.6 142 235 988 2574 376
## OK 8.1 36.4 107 285 1787 3142 649
## OR 6.6 51.1 206 286 1967 4163 402
## PA 5.5 25.1 152 176 735 1654 354
## RI 3.5 21.4 119 192 1294 2568 705
## SC 8.6 41.3 99 525 1340 2846 277
## SD 4.0 17.7 16 87 554 1939 99
## TN 10.4 47.0 208 274 1325 2126 544
## TX 13.5 51.6 240 354 2049 3987 714
## UT 3.2 25.3 59 180 915 4074 223
## VA 7.1 26.5 106 167 813 2522 219
## VT 2.0 21.8 22 103 949 2697 181
## WA 5.0 53.4 135 244 1861 4267 315
## WI 3.1 20.1 73 162 783 2802 254
## WV 5.9 18.9 41 99 625 1358 169
## WY 5.3 21.9 22 243 817 3078 169
SKALA ATAU STANDARISASI
crime_scl<-scale(crime_sel)
crime_scl## Murder Rape Robbery Assault Burglary Theft
## AK 0.28006198 2.64103441 -0.47960926 0.79309857 -0.10687230 1.268026694
## AL 0.59146811 -0.39926368 -0.30546457 0.84028794 -0.11398472 -0.835659642
## AR 0.17625994 -0.36494881 -0.53765749 -0.03608599 -0.41981894 -0.834349750
## AZ 0.42538484 0.60273072 0.10812905 1.03578673 1.66175038 1.827350520
## CA 0.84059301 0.73312725 1.37067802 1.60205912 1.15913911 0.579023573
## CO -0.05210455 0.55468989 -0.06601563 0.30772225 1.38673666 1.688501982
## CT -0.55035436 -0.71496055 0.27501771 -0.52820365 -0.02152322 -0.240968735
## DC 4.93039348 1.24785040 4.35290576 2.59303578 1.23500496 1.557512796
## DE -0.48807313 1.55668430 -0.21839223 -0.28551549 -0.39136925 0.193915364
## FL 0.92363464 1.26843933 1.54482271 2.16833150 2.40381324 1.874506627
## GA 0.81983260 0.66449750 0.43465034 0.24030887 0.58303279 0.055066827
## HI -0.50883354 -0.22082633 -0.34900075 -1.21582011 0.31276069 1.070233022
## IA -1.13164579 -1.49047677 -0.81338657 -0.70347843 -0.59525872 -0.184643385
## ID -0.84100008 -0.97575362 -0.96576317 -0.71021977 -0.48383075 -0.185953277
## IL 0.34234321 -0.12474467 1.24006951 1.01556272 -0.06419776 -0.005188199
## IN -0.25970864 -0.57083807 -0.46509720 -0.65628907 -0.75884447 -0.797672777
## KS -0.59187517 -0.09042979 -0.53765749 -0.21136077 -0.04523130 0.086504231
## KY -0.11438578 -0.76300138 -0.51588940 -0.41360091 -0.90820536 -1.574438653
## LA 1.15199914 0.40370443 0.50721062 1.33914694 0.60199926 0.622250004
## MA -0.75795844 -0.31004501 0.28227374 0.32120492 -0.32261582 -0.986297206
## MD 0.36310361 0.64390857 1.08769291 1.29869891 0.21081596 0.047207475
## ME -1.09012498 -1.33262834 -0.91497097 -1.22256145 -0.95799233 -0.779334291
## MI 0.84059301 2.27729671 1.06592482 0.94814934 0.71579803 0.571164221
## MN -0.98632294 -0.16592252 -0.37802486 -0.91245991 -0.48145994 -0.205601654
## MO 0.40462443 -0.34435988 0.11538508 0.58411710 -0.16851330 -0.578920836
## MS 0.81983260 -0.57770104 -0.64649792 -0.75066780 -0.31076178 -1.436900008
## MT -0.90328130 -1.16105395 -0.97301920 -1.11470004 -1.00540849 0.487331142
## NC 0.17625994 -0.53652319 -0.47960926 0.47625569 0.04248860 -0.679782510
## ND -1.29772906 -1.55224355 -1.06734757 -1.69445510 -1.94899002 -1.169682067
## NE -0.86176048 -0.66005675 -0.74808232 -0.66977174 -1.08838676 -0.347069976
## NH -1.04860416 -0.87280899 -0.94399509 -1.28997483 -1.07179111 -0.961409261
## NJ -0.42579191 -0.06984087 0.83373191 -0.12372338 -0.32261582 -0.157135655
## NM 0.88211383 0.87038676 -0.17485606 1.71666186 1.51238948 1.008668104
## NV 1.11047832 2.10572233 0.96434042 0.47625569 0.94102478 0.716562218
## NY 0.71603056 -0.25514120 2.61145891 0.99533871 0.03300536 -0.023526685
## OH -0.36351068 0.30075980 -0.08778372 -0.32596351 -0.51939287 -0.481988838
## OK 0.17625994 0.14977434 -0.34174472 0.01110338 1.37488262 0.262029741
## OR -0.13514619 1.15863172 0.37660211 0.01784472 1.80162804 1.599429335
## PA -0.36351068 -0.62574187 -0.01522343 -0.72370245 -1.11920726 -1.687089354
## RI -0.77871885 -0.87967196 -0.25467237 -0.61584104 0.20607434 -0.489848189
## SC 0.28006198 0.48606014 -0.39979295 1.62902447 0.31513150 -0.125698251
## SD -0.67491681 -1.13360205 -1.00204332 -1.32368152 -1.54832349 -1.313770172
## TN 0.65374933 0.87724973 0.39111417 -0.06305134 0.27956938 -1.068820394
## TX 1.29732200 1.19294660 0.62330708 0.47625569 1.99603429 1.368888367
## UT -0.84100008 -0.61201592 -0.69003409 -0.69673710 -0.69246185 1.482848959
## VA -0.03134415 -0.52966022 -0.34900075 -0.78437449 -0.93428425 -0.550103215
## VT -1.09012498 -0.85222006 -0.95850715 -1.21582011 -0.61185438 -0.320872139
## WA -0.46731272 1.31648016 -0.13857592 -0.26529147 1.55032241 1.735658089
## WI -0.86176048 -0.96889064 -0.58844969 -0.81808118 -1.00540849 -0.183333493
## WV -0.28046905 -1.05124635 -0.82064260 -1.24278546 -1.37999613 -2.074817346
## WY -0.40503150 -0.84535708 -0.95850715 -0.27203281 -0.92480102 0.178196662
## Vehicle
## AK 0.93978076
## AL -0.56721792
## AR -0.88918797
## AZ 0.11249661
## CA 1.64632613
## CO 0.41210763
## CT 0.23770719
## DC 2.59882085
## DE -0.54485889
## FL 0.91294992
## GA 0.16168648
## HI -0.29443775
## IA -1.05464480
## ID -0.95179326
## IL 1.04710410
## IN -0.29443775
## KS -0.60746418
## KY -0.89813158
## LA 0.21534816
## MA 2.29026622
## MD 0.67594419
## ME -1.02781396
## MI 1.81625477
## MN -0.47330999
## MO 0.20193274
## MS -1.09041925
## MT -0.79975185
## NC -0.83105449
## ND -1.22457343
## NE -1.00992674
## NH -0.74161837
## NJ 1.70893142
## NM -0.22736066
## NV 0.37633318
## NY 1.08735036
## OH -0.07979105
## OK 1.14101203
## OR 0.03647591
## PA -0.17817079
## RI 1.39143318
## SC -0.52249986
## SD -1.31848136
## TN 0.67147239
## TX 1.43167944
## UT -0.76397740
## VA -0.78186462
## VT -0.95179326
## WA -0.35257123
## WI -0.62535140
## WV -1.00545493
## WY -1.00545493
## attr(,"scaled:center")
## Murder Rape Robbery Assault Burglary Theft Vehicle
## 7.25098 34.21765 154.09804 283.35294 1207.07843 2941.96078 393.84314
## attr(,"scaled:scale")
## Murder Rape Robbery Assault Burglary Theft Vehicle
## 4.816861 14.570940 137.816437 148.338508 421.797148 763.421796 223.623288
cor(crime_scl)## Murder Rape Robbery Assault Burglary Theft Vehicle
## Murder 1.0000000 0.5784284 0.8039630 0.7812251 0.5807205 0.3609238 0.5726158
## Rape 0.5784284 1.0000000 0.5298971 0.6586707 0.7213333 0.6347214 0.5693911
## Robbery 0.8039630 0.5298971 1.0000000 0.7403202 0.5514852 0.3995175 0.7857589
## Assault 0.7812251 0.6586707 0.7403202 1.0000000 0.7104563 0.5120665 0.6377937
## Burglary 0.5807205 0.7213333 0.5514852 0.7104563 1.0000000 0.7640201 0.5785400
## Theft 0.3609238 0.6347214 0.3995175 0.5120665 0.7640201 1.0000000 0.3858359
## Vehicle 0.5726158 0.5693911 0.7857589 0.6377937 0.5785400 0.3858359 1.0000000
heatmap(cor(crime_scl),main="Heatmap Korelasi")PEMILIHAN CLUSTER BERDASARKAN METODE ELBOW
k<-c(1:10)
ss<-rep(0,10)
for(i in 1:10){
klaster_cek<-eclust(crime_scl,"kmeans",k=k[i],nstart=25,graph = FALSE)
ss[i]<-klaster_cek$tot.withinss
}
plot(ss,xlab="K",ylab="SSW",type="b",main="Metode Elbow")+abline(v=2,col="red")## integer(0)
- INTERPRETASI Berdasarkan metode elbow, maka banyak cluster terpilih adalah 2
PEMILIHAN BANYAK CLUSTER DENGAN METODE SILUET
fviz_nbclust(crime_scl, kmeans, method = "silhouette")+ labs(subtitle = "Silhouette method")
* INTERPRETASI Berdasarkan metode siluet, maka banyak
cluster terpilih adalah 2
PEMILIHAN BANYAK CLUSTER DENGAN GAP STATISTIK
fviz_nbclust(crime_scl, kmeans, nstart = 25, method = "gap_stat", nboot = 50)+ labs(subtitle = "Metode Gap Statistik")
#### CLUSTERING
# K-means clustering
crime_cluster<- eclust(crime_scl, "kmeans", k = 2, nstart = 25,graph = FALSE)
fviz_cluster(crime_cluster, geom = "point", ellipse.type = "norm",
palette = "jco", ggtheme = theme_minimal())# Tambahkan cluster ke crime
crime$cluster<-crime_cluster$cluster
crime$cluster<-as.factor(crime$cluster)table(crime$Type,crime$cluster)##
## 1 2
## Tipe 1 19 1
## Tipe 2 11 20
DUNN INDEX
library(fpc)
km_stats <- cluster.stats(dist(crime_scl), crime_cluster$cluster)
# Dun index
km_stats$dunn## [1] 0.1598884
EXTERNAL VALIDATION
table(crime$Type,crime$cluster)##
## 1 2
## Tipe 1 19 1
## Tipe 2 11 20
chisq.test(table(crime$Type,crime$cluster))##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(crime$Type, crime$cluster)
## X-squared = 15.406, df = 1, p-value = 8.671e-05
- Interpretasi Terdapat perbedaan antara grup asli dengan grup hasil cluster
RAND INDEX
# Compute cluster stats
tipe <- as.numeric(crime$Type)
clust_stats<- cluster.stats(d = dist(crime_scl),tipe, crime_cluster$cluster)
# Corrected Rand index
clust_stats$corrected.rand## [1] 0.2656452
- Interpretasi Hasil rand index sebesar 0.26
VISUALISASI KARAKTERISTIK SETIAP CLUSTER
cluster1<-crime %>% filter(cluster==1)
cluster2<-crime %>% filter(cluster==2)crime %>% group_by(cluster)%>% summarise(`rata-rata kejahatan`=mean(Murder))%>%
ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Pembunuhan") crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Rape))%>%
ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Pemerkosaan") crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Robbery))%>%
ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Perampokan") crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Assault))%>%
ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Penyerangan") crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Burglary))%>%
ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Perampokan (Burglary)")crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Vehicle))%>%
ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Pencurian Kendaraan Bermotor")- Interpretasi Secara keseluruhan , dapat disimpulkan bahwa Kota yang termasuk cluster 2 memiliki jumlah kasus kejahatan lebih tinggi dibandingkan kota yang termasuk dalam cluster 1
Tugas PPT
Nomor 1
Pembahasan
Nomor 2
Pembahasan
Soal UAS
SOal UAS 1
Dibahas waktu praktikum
Soal UAS 2
Manual K-Means
Muh.Nurtanzis Sutoyo dalam “Algoritma K-Means”
Load Data
kdata = matrix(c(6,6.7,7.4,6.7,9.2,7.4,9.3,4.5,6.4,8.5,6.9,5.8,6.3,6.4,3.9,
2.92,3.07,3.22,2.93,3.03,3.29,3.28,2.72,2.92,3.49,3.08,2.83,3.18,3.2,3.29),15,2)
kdata = as.data.frame(kdata)
kdata## V1 V2
## 1 6.0 2.92
## 2 6.7 3.07
## 3 7.4 3.22
## 4 6.7 2.93
## 5 9.2 3.03
## 6 7.4 3.29
## 7 9.3 3.28
## 8 4.5 2.72
## 9 6.4 2.92
## 10 8.5 3.49
## 11 6.9 3.08
## 12 5.8 2.83
## 13 6.3 3.18
## 14 6.4 3.20
## 15 3.9 3.29
summary(kdata)## V1 V2
## Min. :3.90 Min. :2.720
## 1st Qu.:6.15 1st Qu.:2.925
## Median :6.70 Median :3.080
## Mean :6.76 Mean :3.097
## 3rd Qu.:7.40 3rd Qu.:3.250
## Max. :9.30 Max. :3.490
Hitung Jarak
library(dplyr)
kdata1=mutate(.data=kdata, Jarak_C1 = sqrt((V1-3.9)^2+(V2-2.7)^2),
Jarak_C2 = sqrt((V1-6.6)^2+(V2-3.1)^2),
Jarak_C3 = sqrt((V1-9.3)^2+(V2-3.5)^2))
kdata1## V1 V2 Jarak_C1 Jarak_C2 Jarak_C3
## 1 6.0 2.92 2.1114924 0.6264184 3.3505820
## 2 6.7 3.07 2.8243406 0.1044031 2.6353178
## 3 7.4 3.22 3.5384177 0.8089499 1.9205208
## 4 6.7 2.93 2.8094305 0.1972308 2.6617475
## 5 9.2 3.03 5.3102636 2.6009421 0.4805206
## 6 7.4 3.29 3.5493802 0.8222530 1.9115700
## 7 9.3 3.28 5.4310588 2.7059933 0.2200000
## 8 4.5 2.72 0.6003332 2.1341040 4.8629621
## 9 6.4 2.92 2.5096613 0.2690725 2.9574313
## 10 8.5 3.49 4.6673440 1.9396134 0.8000625
## 11 6.9 3.08 3.0239709 0.3006659 2.4364729
## 12 5.8 2.83 1.9044422 0.8443341 3.5635516
## 13 6.3 3.18 2.4475294 0.3104835 3.0170184
## 14 6.4 3.20 2.5495098 0.2236068 2.9154759
## 15 3.9 3.29 0.5900000 2.7066769 5.4040818
*Dengan memperhatikan jarak terkecil didapatkan :
Hitung Pusat Cluster Baru
gg = data.frame(kdata1$Jarak_C1,kdata1$Jarak_C2,kdata1$Jarak_C3)
gg## kdata1.Jarak_C1 kdata1.Jarak_C2 kdata1.Jarak_C3
## 1 2.1114924 0.6264184 3.3505820
## 2 2.8243406 0.1044031 2.6353178
## 3 3.5384177 0.8089499 1.9205208
## 4 2.8094305 0.1972308 2.6617475
## 5 5.3102636 2.6009421 0.4805206
## 6 3.5493802 0.8222530 1.9115700
## 7 5.4310588 2.7059933 0.2200000
## 8 0.6003332 2.1341040 4.8629621
## 9 2.5096613 0.2690725 2.9574313
## 10 4.6673440 1.9396134 0.8000625
## 11 3.0239709 0.3006659 2.4364729
## 12 1.9044422 0.8443341 3.5635516
## 13 2.4475294 0.3104835 3.0170184
## 14 2.5495098 0.2236068 2.9154759
## 15 0.5900000 2.7066769 5.4040818
for (i in 1:15) {
a[i]= min(gg[i,])
print(a[i])
}## [1] 0.6264184
## [1] 0.1044031
## [1] 0.8089499
## [1] 0.1972308
## [1] 0.4805206
## [1] 0.822253
## [1] 0.22
## [1] 0.6003332
## [1] 0.2690725
## [1] 0.8000625
## [1] 0.3006659
## [1] 0.8443341
## [1] 0.3104835
## [1] 0.2236068
## [1] 0.59
Iterasi Kedua
kdata2=mutate(.data=kdata, Jarak_C1 = sqrt((V1-4.2)^2+(V2-2.72)^2),
Jarak_C2 = sqrt((V1-6.6)^2+(V2-3.06)^2),
Jarak_C3 = sqrt((V1-9)^2+(V2-3.26)^2))
kdata2## V1 V2 Jarak_C1 Jarak_C2 Jarak_C3
## 1 6.0 2.92 1.8110770 0.6161169 3.0192052
## 2 6.7 3.07 2.5243811 0.1004988 2.3078345
## 3 7.4 3.22 3.2388269 0.8158431 1.6004999
## 4 6.7 2.93 2.5088045 0.1640122 2.3235533
## 5 9.2 3.03 5.0096008 2.6001731 0.3047950
## 6 7.4 3.29 3.2503692 0.8324062 1.6002812
## 7 9.3 3.28 5.1306530 2.7089481 0.3006659
## 8 4.5 2.72 0.3000000 2.1273458 4.5322842
## 9 6.4 2.92 2.2090722 0.2441311 2.6221365
## 10 8.5 3.49 4.3683979 1.9480503 0.5503635
## 11 6.9 3.08 2.7238943 0.3006659 2.1077002
## 12 5.8 2.83 1.6037768 0.8324062 3.2287614
## 13 6.3 3.18 2.1497907 0.3231099 2.7011849
## 14 6.4 3.20 2.2517549 0.2441311 2.6006922
## 15 3.9 3.29 0.6441273 2.7097786 5.1000882
## Iterasi ke Tiga
kdata3=mutate(.data=kdata, Jarak_C1 = sqrt((V1-4.2)^2+(V2-2.72)^2),
Jarak_C2 = sqrt((V1-6.6)^2+(V2-3.06)^2),
Jarak_C3 = sqrt((V1-9)^2+(V2-3.26)^2))
kdata3## V1 V2 Jarak_C1 Jarak_C2 Jarak_C3
## 1 6.0 2.92 1.8110770 0.6161169 3.0192052
## 2 6.7 3.07 2.5243811 0.1004988 2.3078345
## 3 7.4 3.22 3.2388269 0.8158431 1.6004999
## 4 6.7 2.93 2.5088045 0.1640122 2.3235533
## 5 9.2 3.03 5.0096008 2.6001731 0.3047950
## 6 7.4 3.29 3.2503692 0.8324062 1.6002812
## 7 9.3 3.28 5.1306530 2.7089481 0.3006659
## 8 4.5 2.72 0.3000000 2.1273458 4.5322842
## 9 6.4 2.92 2.2090722 0.2441311 2.6221365
## 10 8.5 3.49 4.3683979 1.9480503 0.5503635
## 11 6.9 3.08 2.7238943 0.3006659 2.1077002
## 12 5.8 2.83 1.6037768 0.8324062 3.2287614
## 13 6.3 3.18 2.1497907 0.3231099 2.7011849
## 14 6.4 3.20 2.2517549 0.2441311 2.6006922
## 15 3.9 3.29 0.6441273 2.7097786 5.1000882