Praktikum 11 APG : Clustering

Terkadang Ambisi Tinggi dapat Menjatuhkan ke Jurang Terdalam, Lakukan versi terbaik dan Tetaplah Bersyukur

My Image

Pengantar

Summary Clustering

!! Password is required to open pdf.

Chat me untuk Pasword

SEKILAS ANALISIS CLUSTER

  • Analisis cluster (atau clustering, segmentasi data, …) adalah suatu teknik analisis yang bertujuan menemukan kesamaan antara data sesuai dengan karakteristik yang ditemukan dalam data dan mengelompokkan objek data serupa ke dalam kelompok.

  • Metode analisis kluster adalah teknik di mana tidak ada asumsi yang dibuat mengenai jumlah kelompok atau struktur kelompok.

  • Analisis cluster adalah istilah umum untuk berbagai metode numerik yang memiliki tujuan secara umum mengungkap atau menemukan kelompok atau kelompok pengamatan yang homogen dan terpisah dari kelompok lain.

UKURAN SIMILARITY DAN DISSIMILARITY

  • Karena analisis cluster berupaya mengidentifikasi vektor-vektor pengamatan yang memiliki kemiripan dan mengelompokkannya ke dalam kelompok-kelompok, banyak teknik yang menggunakan indeks kesamaan (similarity) atau kedekatan (proximity) antara masing-masing pasangan pengamatan.

  • Ukuran proximity yang umum digunakan adalah jarak antara dua pengamatan.

  • Pemilihan metrik jarak yang digunakan perlu mempertimbangkan (jenis variabel, sifat variabel, skala).

JENIS METRIK JARAK

  • [Numerik] Euclidean Distance, Squared Euclidean Distance, Normalized Squared Euclidean Distance, Manhattan Distance, Chessboard Distance, Bray-Curtis Distance, Canberra Distance, Cosine Distance, Correlation Distance, Binary Distance, Warping Distance, Canonical Warping Distance

  • [Boolean] Hamming Distance, Jaccard Dissimilarity, Matching Dissimilarity, Dice Dissimilarity, Rogers-Tanimoto Dissimilarity, Russell-Rao Dissimilarity, Sokal-Sneath Dissimilarity, Yule Dissimilarity

  • [String] Edit Distance, Damerau-Levenshtein Distance, Hamming Distance, Smith-Waterman Similarity, Needleman-Wunsch Similarity

METODE PENGELOMPOKAN HIERARKI

  • Divisif => merupakan metode pengelompokan dimana pada awalanya setiap objek digabungkan menjadi satu kelompok besar , kemudian di lakukan pengelompokan ulang hingga setiap objek dianggap sebagai satu kelompok terpisah.

  • Aglomeratif => Metode alglomerative merupakan metode pengelompokan dimana awalnya setiap objek dianggap sebagai satu cluster terpisah, kemudian dilakukan pengelompokkan satu per satu sehingga membentuk satu cluster besar.

METODE PENGELOMPOKKAN NON HIERARKI

  • K-Means, C-Means, K-Medoids, K-Prototype, dll.

Penerapan 1 Hirarki

Single Linkage

dist1 = matrix(c(0,9,3,6,11,9,0,7,5,10,3,7,0,9,2,6,5,9,0,8,11,10,2,8,0),5,5)
a=dist(dist1, method = "euclidean")
single1 <- hclust(a, method = "single")
plot(single1)

Complete Linkage

a=dist(dist1, method = "euclidean")
single2 <- hclust(a, method = "complete")
plot(single2)

Penerapan 2 Hirarki

  • Data yang ditunjukkan pada Tabel 1 diambil dari Stanley dan Miller (1979) adalah nilai dari enam variabel untuk 22 pesawat tempur AS. Variabelnya adalah sebagai berikut:

  • FFD: tanggal penerbangan pertama, dalam beberapa bulan setelah Januari 1940;

  • SPR: daya spesifik, sebanding dengan daya per satuan berat;

  • RGF: faktor jangkauan penerbangan;

  • PLF: muatan sebagai sebagian kecil dari berat kotor pesawat;

  • SLF: faktor beban berkelanjutan;

  • CAR: variabel biner yang mengambil nilai 1 jika pesawat dapat mendarat di kapal induk dan 0 sebaliknya.

  • Lakukan analisis cluster pada data tersebut!

jet <- read.csv("jet.csv", row.names = 1)
jet
##         FFD   SPR  RGF   PLF  SLF CAR
## FH-1     82 1.468 3.30 0.166 0.10  no
## FJ-1     89 1.605 3.64 0.154 0.10  no
## F-86A   101 2.168 4.87 0.177 2.90 yes
## F9F-2   107 2.054 4.72 0.275 1.10  no
## F-94A   115 2.467 4.11 0.298 1.00 yes
## F3D-1   122 1.294 3.75 0.150 0.90  no
## F-89A   127 2.183 3.97 0.000 2.40 yes
## XF10F-1 137 2.426 4.65 0.117 1.80  no
## F9F-6   147 2.607 3.84 0.155 2.30  no
## F-100A  166 4.567 4.92 0.138 3.20 yes
## F4D-1   174 4.588 3.82 0.249 3.50  no
## F11F-1  175 3.618 4.32 0.143 2.80  no
## F-101A  177 5.855 4.53 0.172 2.50 yes
## F3H-2   184 2.898 4.48 0.178 3.00  no
## F-102A  187 3.880 5.39 0.101 3.00 yes
## F-8A    189 0.455 4.99 0.008 2.64  no
## F-104B  194 8.088 4.50 0.251 2.70 yes
## F-105B  197 6.502 5.20 0.366 2.90 yes
## YF-107A 201 6.081 5.65 0.106 2.90 yes
## F-106A  204 7.105 5.40 0.089 3.20 yes
## F-4B    255 8.548 4.20 0.222 2.90  no
## F-111A  328 6.321 6.45 0.187 2.00 yes
# Pilih variabel kecuali variabel FFD dan CAR dan skalakan ke N(0,1)
zjet <- scale(jet[,names(jet)!=c("FFD","CAR")])
zjet
##                 SPR         RGF         PLF         SLF
## FH-1    -1.04614198 -1.69627044 -0.02622718 -2.15830567
## FJ-1    -0.98826834 -1.24473653 -0.16470671 -2.15830567
## F-86A   -0.75043723  0.38875379  0.10071238  0.63245061
## F9F-2   -0.79859486  0.18954766  1.23162849 -1.16160700
## F-94A   -0.62412905 -0.62055730  1.49704758 -1.26127686
## F3D-1   -1.11964574 -1.09865203 -0.21086655 -1.36094673
## F-89A   -0.74410070 -0.80648303 -1.94186060  0.13410128
## XF10F-1 -0.64144890  0.09658479 -0.59168524 -0.46391793
## F9F-6   -0.56498810 -0.97912835 -0.15316675  0.03443141
## F-100A   0.26298522  0.45515584 -0.34934607  0.93146021
## F4D-1    0.27185636 -1.00568916  0.93158952  1.23046981
## F11F-1  -0.13790595 -0.34166871 -0.29164627  0.53278074
## F-101A   0.80708197 -0.06278012  0.04301258  0.23377114
## F3H-2   -0.44205941 -0.12918216  0.11225234  0.73212048
## F-102A  -0.02722788  1.07933507 -0.77632460  0.73212048
## F-8A    -1.47406901  0.54811870 -1.84954092  0.37330896
## F-104B   1.75038014 -0.10262134  0.95466945  0.43311088
## F-105B   1.08039765  0.82700730  2.28176488  0.63245061
## YF-107A  0.90255236  1.42462571 -0.71862480  0.63245061
## F-106A   1.33512618  1.09261548 -0.91480413  0.93146021
## F-4B     1.94470041 -0.50103362  0.62001060  0.63245061
## F-111A   1.00393685  2.48705844  0.21611198 -0.26457819
## attr(,"scaled:center")
##       SPR       RGF       PLF       SLF 
## 3.9444545 4.5772727 0.1682727 2.2654545 
## attr(,"scaled:scale")
##        SPR        RGF        PLF        SLF 
## 2.36722604 0.75298885 0.08665541 1.00331226
#Ekstrak korelasi
cor(zjet)
##           SPR         RGF         PLF        SLF
## SPR 1.0000000  0.44351234  0.33176076  0.5406762
## RGF 0.4435123  1.00000000 -0.06613601  0.4655696
## PLF 0.3317608 -0.06613601  1.00000000 -0.1030198
## SLF 0.5406762  0.46556957 -0.10301985  1.0000000
# Buat distance matrix terlebih dahulu
dj<-dist(zjet)
dj
##              FH-1      FJ-1     F-86A     F9F-2     F-94A     F3D-1     F-89A
## FJ-1    0.4758243                                                            
## F-86A   3.4984570 3.2532479                                                  
## F9F-2   2.4886138 2.2441719 2.1306375                                        
## F-94A   2.1119376 2.0219508 2.5633217 0.8758366                              
## F3D-1   1.0160824 0.8225039 2.5336546 1.9705482 1.8441844                    
## F-89A   3.1317298 2.9436524 2.4184887 3.5700056 3.7178127 2.3362161          
## XF10F-1 2.5629668 2.2299523 1.3336715 1.9607600 2.3480159 1.6146042 1.7339750
## F9F-6   2.3600872 2.2489871 1.5256390 2.1836990 2.1293516 1.5074284 1.8086590
## F-100A  3.9991884 3.7464669 1.1503885 2.8421666 3.1876944 3.0984340 2.4037135
## F4D-1   3.8229876 3.7855941 2.0093629 2.8959475 2.7348968 3.1568284 3.2450873
## F11F-1  3.1578841 2.9659244 1.0356597 2.4310525 2.5946599 2.2648545 1.8616329
## F-101A  3.4394121 3.2226515 1.6709349 2.4498480 2.5901480 2.7189869 2.6284819
## F3H-2   3.3458065 3.1581760 0.6110825 2.2512026 2.4831210 2.4257640 2.2642819
## F-102A  4.2023067 3.8798901 1.3338149 3.0008012 3.5196277 3.2615453 2.4054766
## F-8A    3.8670337 3.5634353 2.1023041 3.5262459 3.9949627 2.9206791 1.5599817
## F-104B  4.2470992 4.0954933 2.6952766 3.0335569 3.0119222 3.7156752 3.8983521
## F-105B  4.8994132 4.7270546 2.8811478 2.8737426 3.0336182 4.3284176 4.9075910
## YF-107A 4.6695971 4.3354173 2.1158258 3.3825213 3.8741370 3.8324492 3.0714769
## F-106A  4.8769401 4.5793750 2.4424734 3.7889841 4.1711829 4.0716356 3.1016747
## F-4B    4.3104145 4.1904059 2.8853352 3.4051858 3.3118829 3.7962290 3.7595860
## F-111A  5.0346793 4.6504195 2.8807444 3.2192563 3.8654985 4.3302722 4.3265277
##           XF10F-1     F9F-6    F-100A     F4D-1    F11F-1    F-101A     F3H-2
## FJ-1                                                                         
## F-86A                                                                        
## F9F-2                                                                        
## F-94A                                                                        
## F3D-1                                                                        
## F-89A                                                                        
## XF10F-1                                                                      
## F9F-6   1.2663552                                                            
## F-100A  1.7182499 1.8936362                                                  
## F4D-1   2.6908085 1.8188508 1.9657949                                        
## F11F-1  1.2365489 0.9253554 0.9787327 1.6099392                              
## F-101A  1.7358716 1.6734674 1.0977314 1.7200352 1.0826746                    
## F3H-2   1.4201278 1.1378652 1.0446761 1.4824324 0.5835497 1.3482988          
## F-102A  1.6756089 2.3241242 0.8341884 2.7571906 1.5186016 1.7088492 1.5563316
## F-8A    1.7833364 2.4802087 2.3639198 3.7325575 2.2426592 3.0295341 2.3454549
## F-104B  2.9927250 2.7414305 2.1147736 1.9073222 2.2772802 1.3274989 2.3678197
## F-105B  3.5995850 3.5008484 2.7961738 2.4885972 3.0793581 2.4571081 2.8193780
## YF-107A 2.3164132 2.9341231 1.2548671 3.0635016 2.0963287 1.7206183 2.2186912
## F-106A  2.6363634 3.0475086 1.3695196 3.0053288 2.1850093 1.7372255 2.3970314
## F-4B    3.1169772 2.7354234 2.1843817 1.8729103 2.2811617 1.4064544 2.4703539
## F-111A  3.0189288 3.8342861 2.5353280 3.9347559 3.1936033 2.6112725 3.1527472
##            F-102A      F-8A    F-104B    F-105B   YF-107A    F-106A      F-4B
## FJ-1                                                                         
## F-86A                                                                        
## F9F-2                                                                        
## F-94A                                                                        
## F3D-1                                                                        
## F-89A                                                                        
## XF10F-1                                                                      
## F9F-6                                                                        
## F-100A                                                                       
## F4D-1                                                                        
## F11F-1                                                                       
## F-101A                                                                       
## F3H-2                                                                        
## F-102A                                                                       
## F-8A    1.9120876                                                            
## F-104B  2.7645358 4.3229281                                                  
## F-105B  3.2637937 4.8721577 1.7646542                                        
## YF-107A 0.9984890 2.7861657 2.4271270 3.0644926                              
## F-106A  1.3838706 3.0615878 2.3117765 3.2315468 0.6521073                    
## F-4B    2.8889016 4.3536817 0.5901064 2.2961139 2.5663556 2.3143814          
## F-111A  2.2412774 3.8175676 2.8802528 2.7987852 1.6785219 2.1825757 3.2835233
# Analisis cluster aglomerative
cc <- hclust(dj)
cc
## 
## Call:
## hclust(d = dj)
## 
## Cluster method   : complete 
## Distance         : euclidean 
## Number of objects: 22
#dendogram
ccd <- as.dendrogram(cc)
plot(ccd, type = "rectangle", ylab = "Height", main = "Hasil Clustering Jet")

# Buat distance matrix terlebih dahulu
# Method ada beberapa pilihan
data_dist <- dist(zjet, method = "euclidean")

# Clustering
## Single linkage
single <- hclust(data_dist, method = "single")
plot(single, hang = -2, cex = 1)

## Complete linkage
Complete <- hclust(data_dist, method = "complete")
plot(Complete, hang = -2, cex = 1)

## Average linkage
Average <- hclust(data_dist, method = "average")
plot(Average, hang = -2, cex = 1)

Penerapan 3 Package Cluster

#analisis cluster dengan package cluster
library(cluster)

Agnes (Aglomerative)

clusters2 <- agnes(zjet, method = "complete")
clusters2
## Call:     agnes(x = zjet, method = "complete") 
## Agglomerative coefficient:  0.7827311 
## Order of objects:
##  [1] FH-1    FJ-1    F3D-1   F9F-2   F-94A   F-86A   F11F-1  F3H-2   XF10F-1
## [10] F9F-6   F-89A   F-8A    F-100A  F-102A  YF-107A F-106A  F-111A  F4D-1  
## [19] F-101A  F-104B  F-4B    F-105B 
## Height (summary):
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4758  0.8758  1.4065  1.8129  2.4886  5.0347 
## 
## Available components:
## [1] "order"     "height"    "ac"        "merge"     "diss"      "call"     
## [7] "method"    "order.lab" "data"
#dendogram
clust2d <- as.dendrogram(as.hclust(clusters2))
plot(clust2d, type = "rectangle", ylab = "Height", main = "Hasil Clustering JET Metode Agnes")

Devisive (Diana)

#divisive
clusters3 <- diana(zjet)
clusters3
## Merge:
##       [,1] [,2]
##  [1,]   -1   -2
##  [2,]  -12  -14
##  [3,]  -17  -21
##  [4,]  -19  -20
##  [5,]  -10  -15
##  [6,]   -4   -5
##  [7,]    1   -6
##  [8,]   -3    2
##  [9,]   -8   -9
## [10,]    5    4
## [11,]   -7  -16
## [12,]    8  -13
## [13,]   12  -11
## [14,]    3  -18
## [15,]   11    9
## [16,]    7    6
## [17,]   10  -22
## [18,]   13   14
## [19,]   18   17
## [20,]   16   15
## [21,]   20   19
## Order of objects:
##  [1] FH-1    FJ-1    F3D-1   F9F-2   F-94A   F-89A   F-8A    XF10F-1 F9F-6  
## [10] F-86A   F11F-1  F3H-2   F-101A  F4D-1   F-104B  F-4B    F-105B  F-100A 
## [19] F-102A  YF-107A F-106A  F-111A 
## Height:
##  [1] 0.4758243 1.0160824 2.4886138 0.8758366 3.9949627 1.5599817 2.4802087
##  [8] 1.2663552 5.0346793 1.0356597 0.5835497 1.6709349 2.0093629 3.0793581
## [15] 0.5901064 2.2961139 3.9347559 0.8341884 1.3838706 0.6521073 2.5353280
## Divisive coefficient:
## [1] 0.7811598
## 
## Available components:
## [1] "order"     "height"    "dc"        "merge"     "diss"      "call"     
## [7] "order.lab" "data"
#dendogram
clust3d <- as.dendrogram(as.hclust(clusters3))
plot(clust3d, type = "rectangle", ylab = "Height", main = "Hasil Clustering JET metode DIANA")+abline(h=4.5,col="red")

## integer(0)
#menentukan jumlah cluster = 2, potong dendogram dengan agar jumlah cluster 2 (Lihat Garis Merah)
clusterCut <- cutree(cc, 2)

#keanggotaan cluster
clusterCut
##    FH-1    FJ-1   F-86A   F9F-2   F-94A   F3D-1   F-89A XF10F-1   F9F-6  F-100A 
##       1       1       1       1       1       1       1       1       1       2 
##   F4D-1  F11F-1  F-101A   F3H-2  F-102A    F-8A  F-104B  F-105B YF-107A  F-106A 
##       2       1       2       1       2       1       2       2       2       2 
##    F-4B  F-111A 
##       2       2

Visualisasi Hullplot

library(dbscan)
jet2 <- cbind(jet,clusterCut)
hullplot(zjet,jet2$clusterCut)

## PROFILING CLUSTER

#Profiling Cluster
#Buat komponen utama dari variabel
pr2<-princomp(zjet)
summary(pr2, loadings = TRUE)
## Importance of components:
##                           Comp.1    Comp.2    Comp.3     Comp.4
## Standard deviation     1.3740969 1.0531143 0.7251298 0.54330134
## Proportion of Variance 0.4945135 0.2904654 0.1377130 0.07730809
## Cumulative Proportion  0.4945135 0.7849789 0.9226919 1.00000000
## 
## Loadings:
##     Comp.1 Comp.2 Comp.3 Comp.4
## SPR  0.601  0.305  0.184  0.715
## RGF  0.541 -0.246 -0.791 -0.146
## PLF  0.106  0.885 -0.122 -0.436
## SLF  0.579 -0.250  0.571 -0.526

PROFILING DENGAN PACKAGE FACTOEXTRA

#Menggunakan package factoextra
library(factoextra)

# Hierarchical clustering
hc.res <- eclust(data_dist, "hclust", k = 2, hc_metric = "euclidean",hc_method = "single", graph = FALSE)

# Visualisasi
fviz_dend(hc.res, show_labels = TRUE, palette = "jco")

fviz_cluster(hc.res,zjet)

require("cluster")

#Index Silhouette
sil <- silhouette(hc.res$cluster, dist(zjet))
fviz_silhouette(sil)
##   cluster size ave.sil.width
## 1       1    5          0.51
## 2       2   17          0.28

Penerapan 4

Tabel berikut menampilkan beberapa indikator sosial kependudukan kabupaten di Propinsi Jawa Tengah (38 kabupaten) dan Jawa Timur (35 kabupaten). Lakukan Pengelompokkan kabupaten-kabupaten tersebut berdasarkan indikator sosial kependudukannya!

Package

library(readr) #Membaca data
library(dplyr) #Data processing
library(DT) #Menampilkan tabel agar mudah dilihat di browser
library(readxl)

Import Dataset

Data_jateng <- read_excel("Dataset Pert 11.xlsx")
Data_jateng
## # A tibble: 73 x 6
##    Kabkota      Kriminalitas Jum.RS Padat.Pend   IPM      UMK
##    <chr>               <dbl>  <dbl>      <dbl> <dbl>    <dbl>
##  1 Cilacap               394      9        804  69.6 1841209 
##  2 Banyumas              895     16       1265  71.3 1589000 
##  3 Purbalingga           193      4       1190  68.4 1655200 
##  4 Banjarnegara          183      3        858  66.5 1490000 
##  5 Kebumen               256     10        932  68.8 1560000 
##  6 Purworejo             159      8        692  71.9 1573000 
##  7 Wonosobo              130      3        800  67.8 1585000 
##  8 Magelang              391      4       1179  69.1 1742000 
##  9 Boyolali              292     10        965  73.2 1651650 
## 10 Klaten                316      7       1787  74.8 1661632.
## # ... with 63 more rows

Standarisasi Variabel

datatable(Data_jateng, caption = "Variabel")
data_standardized <- round(scale(Data_jateng[,2:6]),4) 
# Hanya memilih kolom/variabel yang berisikan indikator yang akan digunakan
# standarisasi data dilakukan untuk menyamakan semua satuan pengukuran variabel
# perlu dinormalisasi sebelum masuk ke analisis klaster

datatable(data_standardized, caption = "Data Hasil Standardisasi")
data_standardized
##       Kriminalitas  Jum.RS Padat.Pend     IPM     UMK
##  [1,]      -0.3831  0.2195    -0.4905 -0.3726 -0.0509
##  [2,]       0.2998  1.3299    -0.2828 -0.0135 -0.5360
##  [3,]      -0.6572 -0.5737    -0.3166 -0.6100 -0.4087
##  [4,]      -0.6708 -0.7323    -0.4662 -0.9959 -0.7263
##  [5,]      -0.5713  0.3781    -0.4328 -0.5295 -0.5917
##  [6,]      -0.7035  0.0608    -0.5410  0.1041 -0.5667
##  [7,]      -0.7430 -0.7323    -0.4923 -0.7338 -0.5437
##  [8,]      -0.3872 -0.5737    -0.3215 -0.4655 -0.2417
##  [9,]      -0.5222  0.3781    -0.4180  0.3827 -0.4155
## [10,]      -0.4895 -0.0978    -0.0476  0.7067 -0.3963
## [11,]      -0.6026 -0.0978     0.0020  0.9709 -0.4225
## [12,]      -0.6871  0.2195    -0.6162 -0.4119 -0.6263
## [13,]      -0.6217 -0.0978    -0.3400  0.8615 -0.3302
## [14,]      -0.4513  0.3781    -0.4301  0.3291 -0.6177
## [15,]      -0.6572 -0.0978    -0.5401 -0.4222 -0.5917
## [16,]      -0.7471 -0.2564    -0.6365 -0.7049 -0.5840
## [17,]      -0.7417 -0.7323    -0.5712 -0.3933 -0.6398
## [18,]      -0.4540  0.2195    -0.4743 -0.1353 -0.5437
## [19,]      -0.5713 -0.0978     0.0601  0.6634  0.0477
## [20,]      -0.5767 -0.4150    -0.2963  0.0030 -0.2468
## [21,]      -0.7512 -0.7323    -0.2747 -0.0218  0.3804
## [22,]      -0.5413 -0.4150    -0.3576  0.4632  0.0621
## [23,]      -0.7226 -0.5737    -0.4563 -0.5233 -0.5975
## [24,]      -0.7294 -0.5737    -0.4193 -0.0177  0.1188
## [25,]      -0.6626 -0.7323    -0.4175 -0.7235 -0.2265
## [26,]      -0.6340 -0.7323    -0.3720 -0.4944 -0.2809
## [27,]      -0.6258 -0.0978    -0.2742 -1.1755 -0.5379
## [28,]      -0.6353 -0.2564    -0.1165 -0.8329 -0.4821
## [29,]      -0.7499  0.3781    -0.3625 -1.1734 -0.6263
## [30,]      -0.7240 -0.4150     2.1780  1.4332 -0.5533
## [31,]       0.3257  0.6954     4.4472  2.0833 -0.3827
## [32,]      -0.5849 -0.5737     0.7770  2.2794 -0.2534
## [33,]       0.4798  1.9644     1.3011  2.3434  0.8507
## [34,]      -0.6244 -0.2564     2.1987  0.5932 -0.1972
## [35,]      -0.6844 -0.7323    -0.5284  0.6345 -0.4562
## [36,]      -0.7062 -0.7323    -0.6730 -0.8329 -0.6882
## [37,]       0.1090 -0.4150    -0.5522 -0.3004 -0.6882
## [38,]       0.0476 -1.0496    -0.5797 -0.5481 -0.6882
## [39,]       0.9924  0.6954    -0.4112  0.1289 -0.3782
## [40,]      -0.1964 -0.0978    -0.4626 -0.2963 -0.4121
## [41,]       0.4593  0.6954    -0.3432 -0.0610 -0.2967
## [42,]       2.4265  1.9644    -0.5221 -0.4057  1.3598
## [43,]       0.0013 -0.5737    -0.5910 -1.3488 -0.3397
## [44,]       1.8349  0.2195    -0.4973 -1.1156  0.0948
## [45,]       0.5684  1.1713    -0.7275 -0.2695  0.0269
## [46,]      -0.5358 -0.5737    -0.6248 -1.2580 -0.3850
## [47,]       0.2480 -0.4150    -0.6694 -1.0207 -0.4823
## [48,]       1.3059 -0.4150    -0.5441 -1.3447  0.3369
## [49,]      -0.0778  0.0608    -0.3585 -0.8164  3.2823
## [50,]       3.5171  3.0748     0.7221  1.6788  3.2879
## [51,]      -0.3218  0.3781    -0.1566  0.2630  3.2653
## [52,]       1.0810  0.6954    -0.3441  0.1020  0.7624
## [53,]       0.7129 -0.5737    -0.4657 -0.0280 -0.3986
## [54,]       0.4689 -0.5737    -0.5572 -0.0734 -0.5592
## [55,]       0.1635 -0.8909    -0.4414  0.3187 -0.6882
## [56,]       0.7633 -0.7323    -0.5644 -0.3004 -0.5728
## [57,]      -0.3368  0.3781    -0.5973 -0.7256 -0.2832
## [58,]       0.2439 -0.5737    -0.5658 -0.8122  0.3844
## [59,]       0.0013  0.5367    -0.5522  0.1247 -0.0320
## [60,]       0.3830  2.1230    -0.3616  0.8079  3.2936
## [61,]      -0.2087 -0.7323    -0.4121 -1.7534 -0.3918
## [62,]      -0.4418 -0.8909    -0.4986 -2.1393 -0.4529
## [63,]      -0.3368 -0.2564    -0.3576 -1.2291 -0.5366
## [64,]      -0.3545 -0.8909    -0.6081 -1.2622 -0.4280
## [65,]       0.2549  0.0608     1.1799  1.2825 -0.2107
## [66,]       0.1840 -0.5737     1.0979  1.2825 -0.4370
## [67,]       2.4129  1.1713     1.8333  1.9657  1.1584
## [68,]      -0.0410 -1.0496     1.0159  0.2403  0.0359
## [69,]      -0.2482 -0.8909     1.6882  0.7047  0.3844
## [70,]      -0.3709 -0.2564     2.6493  1.1917  0.0359
## [71,]       0.4430 -0.4150     1.4985  1.8501 -0.4370
## [72,]       4.8122  5.1370     2.8570  2.1411  3.2993
## [73,]      -0.4554 -0.4150    -0.1737  0.7583  0.9932
## attr(,"scaled:center")
## Kriminalitas       Jum.RS   Padat.Pend          IPM          UMK 
## 6.750548e+02 7.616438e+00 1.892562e+03 7.136562e+01 1.867701e+06 
## attr(,"scaled:scale")
## Kriminalitas       Jum.RS   Padat.Pend          IPM          UMK 
## 7.335450e+02 6.303945e+00 2.219230e+03 4.845347e+00 5.199989e+05

Menentukan Jumlah Cluster

jumlah_klaster <- c(1:9) # Vektor yang berisikan jumlah klaster yang ingin dilihat nilai dari total within-cluster sum of squares

within_ss <- c() #Vektor kosong yang akan diisi nilai total within-cluster sum of squares

for (i in jumlah_klaster) {
  within_ss <- c(within_ss, kmeans(x = data_standardized, centers = i, 
                                   nstart = 25)$tot.withinss)
}

plot(x = jumlah_klaster, y = within_ss, type = "b", 
     xlab = "Number of Cluster", 
     ylab = "Total Within Sum of Squares", 
     main = "Elbow Plot") 
abline(v = 7, col='red')

Dari Elbow Plot di atas, dapat dilihat bahwa pada titik ke tujuh nilai total WSS mulai menunjukkan penurunan yang kurang berarti, sehingga berdasarkan plot tersebut akan ditentukan jumlah klaster yang akan dibentuk adalah tujuh klaster.

Analisis Cluster

set.seed(123)
kmeans_clustering <- kmeans(x = data_standardized, centers = 7, nstart = 25) #parameter nstart digunakan untuk memberitahu fungsi berapa kali inisiasi centroid awal (secara acak) yang akan dibentuk.

kmeans_clustering
## K-means clustering with 7 clusters of sizes 9, 3, 20, 3, 8, 2, 28
## 
## Cluster means:
##   Kriminalitas     Jum.RS Padat.Pend         IPM        UMK
## 1 -0.190166667 -0.4855444  1.5870444  1.20640000 -0.1813778
## 2 -0.005533333  0.8539667 -0.2922333  0.08483333  3.2804000
## 3 -0.369370000 -0.2008950 -0.3604250  0.26878500 -0.2267200
## 4  1.072800000  1.2770333  2.5272000  2.13080000  0.5421333
## 5  1.121025000  0.7945375 -0.4590375 -0.37238750  0.1712375
## 6  4.164650000  4.1059000  1.7895500  1.90995000  3.2936000
## 7 -0.407153571 -0.4490357 -0.4888000 -0.84718214 -0.4734857
## 
## Clustering vector:
##  [1] 3 5 7 7 7 3 7 7 3 3 3 7 3 3 7 7 7 3 3 3 3 3 7 3 7 7 7 7 7 1 4 1 4 1 3 7 7 7
## [39] 5 3 5 5 7 5 5 7 7 5 2 6 2 5 3 3 3 7 7 7 3 2 7 7 7 7 1 1 4 1 1 1 1 6 3
## 
## Within cluster sum of squares by cluster:
## [1]  9.475125  4.117009 13.178720 10.603871 12.497746  5.350801 15.614834
##  (between_SS / total_SS =  80.3 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
# X = miu + Z*sigma

Dari output : K-means clustering with 7 clusters of sizes 9, 3, 20, 3, 8, 2, 28

dapat di ketahui bahwa klaster pertama beranggotakan 9 observasi, klaster kedua 3 observasi, klaster ketiga 20 observasi, klaster ke empat 3 observasi, klaster kelima 8 observasi, klaster keenam 2 observasi, dan terakhir klaster ketujuh 28 observasi.

Centroid

menghitung nilai rata-rata cluster dari setiap variabel

Data_jateng %>% 
  mutate(Klaster = kmeans_clustering$cluster) %>% 
  group_by(Klaster) %>%
  summarise(Mean_kriminalitas = mean(Kriminalitas), Mean_jumlahRS = mean(Jum.RS), Mean_kepadatanpnddk = mean(Padat.Pend), Mean_IPM = mean(IPM), Mean_UMK = mean(UMK))
## # A tibble: 7 x 6
##   Klaster Mean_kriminalitas Mean_jumlahRS Mean_kepadatanpnddk Mean_IPM Mean_UMK
##     <int>             <dbl>         <dbl>               <dbl>    <dbl>    <dbl>
## 1       1              536.          4.56               5415.     77.2 1773388.
## 2       2              671          13                  1244      71.8 3573506.
## 3       3              404.          6.35               1093.     72.7 1749804.
## 4       4             1462          15.7                7501      81.7 2149620.
## 5       5             1497.         12.6                 874.     69.6 1956743.
## 6       6             3730          33.5                5864      80.6 3580371.
## 7       7              376.          4.79                808.     67.3 1621483.

Pengelompokkan objek ke dalam cluster

D = Data_jateng %>%
  mutate(Klaster = kmeans_clustering$cluster) %>%
  select(Kabkota, Klaster) %>%
  arrange(Klaster)

D
## # A tibble: 73 x 2
##    Kabkota          Klaster
##    <chr>              <int>
##  1 Kota_Magelang          1
##  2 Kota_Salatiga          1
##  3 Kota_Pekalongan        1
##  4 Kota_Kediri            1
##  5 Kota_Blitar            1
##  6 Kota_Probolinggo       1
##  7 Kota_Pasuruan          1
##  8 Kota_Mojokerto         1
##  9 Kota_Madiun            1
## 10 Pasuruan               2
## # ... with 63 more rows

Penerapan 5 K-Means

CLUSTERING DENGAN METODE K-MEANS

library(NbClust)

Tentukan jumlah cluster dengan elbow method

#menentukan jumlah cluster k
# Elbow method
fviz_nbclust(zjet, kmeans, method = "wss") +geom_vline(xintercept = 4, linetype = 2)+labs(subtitle = "Metode Elbow")

# Banyak =4 (Tergantung interpretasi analis)

EVALUASI JUMLAH CLUSTER BERDASARKAN METODE GAP STATISTIK

# Gap statistic with nboot = 50 to keep the function speedy.
# recommended value: nboot= 500 for your analysis.
# Use verbose = FALSE to hide computing progression.
set.seed(123)
fviz_nbclust(zjet, kmeans, nstart = 25, method = "gap_stat", nboot = 50)+labs(subtitle = "Metode Gap Statistik")

lakukan K-Means

# K-means clustering
km.res <- eclust(zjet, "kmeans", k = 3, nstart = 25,graph = FALSE)

# Gambar Klaster

fviz_cluster(km.res, geom = "point", ellipse.type = "norm", palette = "jco", ggtheme = theme_minimal())

EVALUASI BERDASARKAN KOEFISIEN SILUET

require("cluster")
sil <- silhouette(km.res$cluster, dist(zjet))
fviz_silhouette(sil)
##   cluster size ave.sil.width
## 1       1    9          0.32
## 2       2    8          0.13
## 3       3    5          0.44

# Silhouette information
silinfo <- km.res$silinfo
names(silinfo)
## [1] "widths"          "clus.avg.widths" "avg.width"
# Silhouette widths of each observation
head(silinfo$widths[, 1:3], 10)
##         cluster neighbor sil_width
## F11F-1        1        2 0.4168754
## F-8A          1        3 0.4128519
## F-86A         1        2 0.4121347
## F3H-2         1        2 0.3994896
## F-89A         1        3 0.3448718
## XF10F-1       1        3 0.2903407
## F-102A        1        2 0.2469427
## F9F-6         1        3 0.1992800
## F-100A        1        2 0.1913406
## F-104B        2        1 0.3511907
# Average silhouette width of each cluster
silinfo$clus.avg.widths
## [1] 0.3237919 0.1304181 0.4447598
# The total average (mean of all individual silhouette widths)
silinfo$avg.width
## [1] 0.2809669
# The size of each clusters
km.res$size
## [1] 9 8 5
# Silhouette width of observation
sil <- km.res$silinfo$widths[, 1:3]
# Objects with negative silhouette
neg_sil_index <- which(sil[, 'sil_width'] < 0)
sil[neg_sil_index, , drop = FALSE]
##        cluster neighbor   sil_width
## F-101A       2        1 -0.04272323
## F4D-1        2        1 -0.07872371

Penerapan Data

SOAL

  • File crime.csv adalah file berisi data jumlah kejahatan di 50 negara bagian di Amerika Serikat berdasarkan jenis kejahatannya, yaitu:
  1. Murder : pembunuhan
  2. Rape : pemerkosaan
  3. Robbery : perampokan
  4. Assault : penyerangan
  5. Burglary : perampokan
  6. Theft : pencurian
  7. Vehicles : pencurian kendaraan bermotor
  • Lakukan analisis Cluster pada data tersebut!

LOAD DATASET

crime<-read.csv2(file="crime.csv",sep=",")
crime
##    City Murder Rape Robbery Assault Burglary Theft Vehicle Type
## 1    AK    8.6 72.7      88     401     1162  3910     604    2
## 2    AL   10.1 28.4     112     408     1159  2304     267    2
## 3    AR    8.1 28.9      80     278     1030  2305     195    1
## 4    AZ    9.3   43     169     437     1908  4337     419    2
## 5    CA   11.3 44.9     343     521     1696  3384     762    2
## 6    CO      7 42.3     145     329     1792  4231     486    2
## 7    CT    4.6 23.8     192     205     1198  2758     447    1
## 8    DC     31 52.4     754     668     1728  4131     975    2
## 9    DE    4.9 56.9     124     241     1042  3090     272    1
## 10   FL   11.7 52.7     367     605     2221  4373     598    2
## 11   GA   11.2 43.9     214     319     1453  2984     430    2
## 12   HI    4.8   31     106     103     1339  3759     328    1
## 13   IA    1.8 12.5      42     179      956  2801     158    2
## 14   ID    3.2   20      21     178     1003  2800     181    2
## 15   IL    8.9 32.4     325     434     1180  2938     628    2
## 16   IN      6 25.9      90     186      887  2333     328    1
## 17   KS    4.4 32.9      80     252     1188  3008     258    1
## 18   KY    6.7 23.1      83     222      824  1740     193    2
## 19   LA   12.8 40.1     224     482     1461  3417     442    2
## 20   MA    3.6 29.7     193     331     1071  2189     906    2
## 21   MD      9 43.6     304     476     1296  2978     545    1
## 22   ME      2 14.8      28     102      803  2347     164    2
## 23   MI   11.3 67.4     301     424     1509  3378     800    2
## 24   MN    2.5 31.8     102     148     1004  2785     288    1
## 25   MO    9.2 29.2     170     370     1136  2500     439    2
## 26   MS   11.2 25.8      65     172     1076  1845     150    2
## 27   MT    2.9 17.3      20     118      783  3314     215    1
## 28   NC    8.1 26.4      88     354     1225  2423     208    2
## 29   ND      1 11.6       7      32      385  2049     120    1
## 30   NE    3.1 24.6      51     184      748  2677     168    1
## 31   NH    2.2 21.5      24      92      755  2208     228    2
## 32   NJ    5.2 33.2     269     265     1071  2822     776    2
## 33   NM   11.5 46.9     130     538     1845  3712     343    2
## 34   NV   12.6 64.9     287     354     1604  3489     478    2
## 35   NY   10.7 30.5     514     431     1221  2924     637    2
## 36   OH    5.5 38.6     142     235      988  2574     376    1
## 37   OK    8.1 36.4     107     285     1787  3142     649    2
## 38   OR    6.6 51.1     206     286     1967  4163     402    2
## 39   PA    5.5 25.1     152     176      735  1654     354    2
## 40   RI    3.5 21.4     119     192     1294  2568     705    1
## 41   SC    8.6 41.3      99     525     1340  2846     277    2
## 42   SD      4 17.7      16      87      554  1939      99    1
## 43   TN   10.4   47     208     274     1325  2126     544    2
## 44   TX   13.5 51.6     240     354     2049  3987     714    2
## 45   UT    3.2 25.3      59     180      915  4074     223    1
## 46   VA    7.1 26.5     106     167      813  2522     219    1
## 47   VT      2 21.8      22     103      949  2697     181    1
## 48   WA      5 53.4     135     244     1861  4267     315    2
## 49   WI    3.1 20.1      73     162      783  2802     254    1
## 50   WV    5.9 18.9      41      99      625  1358     169    1
## 51   WY    5.3 21.9      22     243      817  3078     169    1
str(crime)
## 'data.frame':    51 obs. of  9 variables:
##  $ City    : chr  "AK" "AL" "AR" "AZ" ...
##  $ Murder  : chr  "8.6" "10.1" "8.1" "9.3" ...
##  $ Rape    : chr  "72.7" "28.4" "28.9" "43" ...
##  $ Robbery : int  88 112 80 169 343 145 192 754 124 367 ...
##  $ Assault : int  401 408 278 437 521 329 205 668 241 605 ...
##  $ Burglary: int  1162 1159 1030 1908 1696 1792 1198 1728 1042 2221 ...
##  $ Theft   : int  3910 2304 2305 4337 3384 4231 2758 4131 3090 4373 ...
##  $ Vehicle : int  604 267 195 419 762 486 447 975 272 598 ...
##  $ Type    : int  2 2 1 2 2 2 1 2 1 2 ...

PRA PEMROSESAN

crime$Murder<-as.numeric(crime$Murder)
crime$Rape<-as.numeric(crime$Rape)
crime$Type<-factor(as.character(crime$Type),level = c("1","2"),labels = c("Tipe 1","Tipe 2"))
head(crime)
##   City Murder Rape Robbery Assault Burglary Theft Vehicle   Type
## 1   AK    8.6 72.7      88     401     1162  3910     604 Tipe 2
## 2   AL   10.1 28.4     112     408     1159  2304     267 Tipe 2
## 3   AR    8.1 28.9      80     278     1030  2305     195 Tipe 1
## 4   AZ    9.3 43.0     169     437     1908  4337     419 Tipe 2
## 5   CA   11.3 44.9     343     521     1696  3384     762 Tipe 2
## 6   CO    7.0 42.3     145     329     1792  4231     486 Tipe 2

MELIHAT ANGGOTA SETIAP TIPE CITY

library(tidyverse)
crime %>% filter(Type=="Tipe 1") %>% select(City)
##    City
## 1    AR
## 2    CT
## 3    DE
## 4    HI
## 5    IN
## 6    KS
## 7    MD
## 8    MN
## 9    MT
## 10   ND
## 11   NE
## 12   OH
## 13   RI
## 14   SD
## 15   UT
## 16   VA
## 17   VT
## 18   WI
## 19   WV
## 20   WY
crime %>% filter(Type=="Tipe 2") %>% select(City)
##    City
## 1    AK
## 2    AL
## 3    AZ
## 4    CA
## 5    CO
## 6    DC
## 7    FL
## 8    GA
## 9    IA
## 10   ID
## 11   IL
## 12   KY
## 13   LA
## 14   MA
## 15   ME
## 16   MI
## 17   MO
## 18   MS
## 19   NC
## 20   NH
## 21   NJ
## 22   NM
## 23   NV
## 24   NY
## 25   OK
## 26   OR
## 27   PA
## 28   SC
## 29   TN
## 30   TX
## 31   WA

PILIH VARIABEL

crime_sel<-crime %>% select(-c(Type))
crime_sel<-data.frame(crime_sel,row.names = 1)
crime_sel
##    Murder Rape Robbery Assault Burglary Theft Vehicle
## AK    8.6 72.7      88     401     1162  3910     604
## AL   10.1 28.4     112     408     1159  2304     267
## AR    8.1 28.9      80     278     1030  2305     195
## AZ    9.3 43.0     169     437     1908  4337     419
## CA   11.3 44.9     343     521     1696  3384     762
## CO    7.0 42.3     145     329     1792  4231     486
## CT    4.6 23.8     192     205     1198  2758     447
## DC   31.0 52.4     754     668     1728  4131     975
## DE    4.9 56.9     124     241     1042  3090     272
## FL   11.7 52.7     367     605     2221  4373     598
## GA   11.2 43.9     214     319     1453  2984     430
## HI    4.8 31.0     106     103     1339  3759     328
## IA    1.8 12.5      42     179      956  2801     158
## ID    3.2 20.0      21     178     1003  2800     181
## IL    8.9 32.4     325     434     1180  2938     628
## IN    6.0 25.9      90     186      887  2333     328
## KS    4.4 32.9      80     252     1188  3008     258
## KY    6.7 23.1      83     222      824  1740     193
## LA   12.8 40.1     224     482     1461  3417     442
## MA    3.6 29.7     193     331     1071  2189     906
## MD    9.0 43.6     304     476     1296  2978     545
## ME    2.0 14.8      28     102      803  2347     164
## MI   11.3 67.4     301     424     1509  3378     800
## MN    2.5 31.8     102     148     1004  2785     288
## MO    9.2 29.2     170     370     1136  2500     439
## MS   11.2 25.8      65     172     1076  1845     150
## MT    2.9 17.3      20     118      783  3314     215
## NC    8.1 26.4      88     354     1225  2423     208
## ND    1.0 11.6       7      32      385  2049     120
## NE    3.1 24.6      51     184      748  2677     168
## NH    2.2 21.5      24      92      755  2208     228
## NJ    5.2 33.2     269     265     1071  2822     776
## NM   11.5 46.9     130     538     1845  3712     343
## NV   12.6 64.9     287     354     1604  3489     478
## NY   10.7 30.5     514     431     1221  2924     637
## OH    5.5 38.6     142     235      988  2574     376
## OK    8.1 36.4     107     285     1787  3142     649
## OR    6.6 51.1     206     286     1967  4163     402
## PA    5.5 25.1     152     176      735  1654     354
## RI    3.5 21.4     119     192     1294  2568     705
## SC    8.6 41.3      99     525     1340  2846     277
## SD    4.0 17.7      16      87      554  1939      99
## TN   10.4 47.0     208     274     1325  2126     544
## TX   13.5 51.6     240     354     2049  3987     714
## UT    3.2 25.3      59     180      915  4074     223
## VA    7.1 26.5     106     167      813  2522     219
## VT    2.0 21.8      22     103      949  2697     181
## WA    5.0 53.4     135     244     1861  4267     315
## WI    3.1 20.1      73     162      783  2802     254
## WV    5.9 18.9      41      99      625  1358     169
## WY    5.3 21.9      22     243      817  3078     169

SKALA ATAU STANDARISASI

crime_scl<-scale(crime_sel)
crime_scl
##         Murder        Rape     Robbery     Assault    Burglary        Theft
## AK  0.28006198  2.64103441 -0.47960926  0.79309857 -0.10687230  1.268026694
## AL  0.59146811 -0.39926368 -0.30546457  0.84028794 -0.11398472 -0.835659642
## AR  0.17625994 -0.36494881 -0.53765749 -0.03608599 -0.41981894 -0.834349750
## AZ  0.42538484  0.60273072  0.10812905  1.03578673  1.66175038  1.827350520
## CA  0.84059301  0.73312725  1.37067802  1.60205912  1.15913911  0.579023573
## CO -0.05210455  0.55468989 -0.06601563  0.30772225  1.38673666  1.688501982
## CT -0.55035436 -0.71496055  0.27501771 -0.52820365 -0.02152322 -0.240968735
## DC  4.93039348  1.24785040  4.35290576  2.59303578  1.23500496  1.557512796
## DE -0.48807313  1.55668430 -0.21839223 -0.28551549 -0.39136925  0.193915364
## FL  0.92363464  1.26843933  1.54482271  2.16833150  2.40381324  1.874506627
## GA  0.81983260  0.66449750  0.43465034  0.24030887  0.58303279  0.055066827
## HI -0.50883354 -0.22082633 -0.34900075 -1.21582011  0.31276069  1.070233022
## IA -1.13164579 -1.49047677 -0.81338657 -0.70347843 -0.59525872 -0.184643385
## ID -0.84100008 -0.97575362 -0.96576317 -0.71021977 -0.48383075 -0.185953277
## IL  0.34234321 -0.12474467  1.24006951  1.01556272 -0.06419776 -0.005188199
## IN -0.25970864 -0.57083807 -0.46509720 -0.65628907 -0.75884447 -0.797672777
## KS -0.59187517 -0.09042979 -0.53765749 -0.21136077 -0.04523130  0.086504231
## KY -0.11438578 -0.76300138 -0.51588940 -0.41360091 -0.90820536 -1.574438653
## LA  1.15199914  0.40370443  0.50721062  1.33914694  0.60199926  0.622250004
## MA -0.75795844 -0.31004501  0.28227374  0.32120492 -0.32261582 -0.986297206
## MD  0.36310361  0.64390857  1.08769291  1.29869891  0.21081596  0.047207475
## ME -1.09012498 -1.33262834 -0.91497097 -1.22256145 -0.95799233 -0.779334291
## MI  0.84059301  2.27729671  1.06592482  0.94814934  0.71579803  0.571164221
## MN -0.98632294 -0.16592252 -0.37802486 -0.91245991 -0.48145994 -0.205601654
## MO  0.40462443 -0.34435988  0.11538508  0.58411710 -0.16851330 -0.578920836
## MS  0.81983260 -0.57770104 -0.64649792 -0.75066780 -0.31076178 -1.436900008
## MT -0.90328130 -1.16105395 -0.97301920 -1.11470004 -1.00540849  0.487331142
## NC  0.17625994 -0.53652319 -0.47960926  0.47625569  0.04248860 -0.679782510
## ND -1.29772906 -1.55224355 -1.06734757 -1.69445510 -1.94899002 -1.169682067
## NE -0.86176048 -0.66005675 -0.74808232 -0.66977174 -1.08838676 -0.347069976
## NH -1.04860416 -0.87280899 -0.94399509 -1.28997483 -1.07179111 -0.961409261
## NJ -0.42579191 -0.06984087  0.83373191 -0.12372338 -0.32261582 -0.157135655
## NM  0.88211383  0.87038676 -0.17485606  1.71666186  1.51238948  1.008668104
## NV  1.11047832  2.10572233  0.96434042  0.47625569  0.94102478  0.716562218
## NY  0.71603056 -0.25514120  2.61145891  0.99533871  0.03300536 -0.023526685
## OH -0.36351068  0.30075980 -0.08778372 -0.32596351 -0.51939287 -0.481988838
## OK  0.17625994  0.14977434 -0.34174472  0.01110338  1.37488262  0.262029741
## OR -0.13514619  1.15863172  0.37660211  0.01784472  1.80162804  1.599429335
## PA -0.36351068 -0.62574187 -0.01522343 -0.72370245 -1.11920726 -1.687089354
## RI -0.77871885 -0.87967196 -0.25467237 -0.61584104  0.20607434 -0.489848189
## SC  0.28006198  0.48606014 -0.39979295  1.62902447  0.31513150 -0.125698251
## SD -0.67491681 -1.13360205 -1.00204332 -1.32368152 -1.54832349 -1.313770172
## TN  0.65374933  0.87724973  0.39111417 -0.06305134  0.27956938 -1.068820394
## TX  1.29732200  1.19294660  0.62330708  0.47625569  1.99603429  1.368888367
## UT -0.84100008 -0.61201592 -0.69003409 -0.69673710 -0.69246185  1.482848959
## VA -0.03134415 -0.52966022 -0.34900075 -0.78437449 -0.93428425 -0.550103215
## VT -1.09012498 -0.85222006 -0.95850715 -1.21582011 -0.61185438 -0.320872139
## WA -0.46731272  1.31648016 -0.13857592 -0.26529147  1.55032241  1.735658089
## WI -0.86176048 -0.96889064 -0.58844969 -0.81808118 -1.00540849 -0.183333493
## WV -0.28046905 -1.05124635 -0.82064260 -1.24278546 -1.37999613 -2.074817346
## WY -0.40503150 -0.84535708 -0.95850715 -0.27203281 -0.92480102  0.178196662
##        Vehicle
## AK  0.93978076
## AL -0.56721792
## AR -0.88918797
## AZ  0.11249661
## CA  1.64632613
## CO  0.41210763
## CT  0.23770719
## DC  2.59882085
## DE -0.54485889
## FL  0.91294992
## GA  0.16168648
## HI -0.29443775
## IA -1.05464480
## ID -0.95179326
## IL  1.04710410
## IN -0.29443775
## KS -0.60746418
## KY -0.89813158
## LA  0.21534816
## MA  2.29026622
## MD  0.67594419
## ME -1.02781396
## MI  1.81625477
## MN -0.47330999
## MO  0.20193274
## MS -1.09041925
## MT -0.79975185
## NC -0.83105449
## ND -1.22457343
## NE -1.00992674
## NH -0.74161837
## NJ  1.70893142
## NM -0.22736066
## NV  0.37633318
## NY  1.08735036
## OH -0.07979105
## OK  1.14101203
## OR  0.03647591
## PA -0.17817079
## RI  1.39143318
## SC -0.52249986
## SD -1.31848136
## TN  0.67147239
## TX  1.43167944
## UT -0.76397740
## VA -0.78186462
## VT -0.95179326
## WA -0.35257123
## WI -0.62535140
## WV -1.00545493
## WY -1.00545493
## attr(,"scaled:center")
##     Murder       Rape    Robbery    Assault   Burglary      Theft    Vehicle 
##    7.25098   34.21765  154.09804  283.35294 1207.07843 2941.96078  393.84314 
## attr(,"scaled:scale")
##     Murder       Rape    Robbery    Assault   Burglary      Theft    Vehicle 
##   4.816861  14.570940 137.816437 148.338508 421.797148 763.421796 223.623288
cor(crime_scl)
##             Murder      Rape   Robbery   Assault  Burglary     Theft   Vehicle
## Murder   1.0000000 0.5784284 0.8039630 0.7812251 0.5807205 0.3609238 0.5726158
## Rape     0.5784284 1.0000000 0.5298971 0.6586707 0.7213333 0.6347214 0.5693911
## Robbery  0.8039630 0.5298971 1.0000000 0.7403202 0.5514852 0.3995175 0.7857589
## Assault  0.7812251 0.6586707 0.7403202 1.0000000 0.7104563 0.5120665 0.6377937
## Burglary 0.5807205 0.7213333 0.5514852 0.7104563 1.0000000 0.7640201 0.5785400
## Theft    0.3609238 0.6347214 0.3995175 0.5120665 0.7640201 1.0000000 0.3858359
## Vehicle  0.5726158 0.5693911 0.7857589 0.6377937 0.5785400 0.3858359 1.0000000
heatmap(cor(crime_scl),main="Heatmap Korelasi")

PEMILIHAN CLUSTER BERDASARKAN METODE ELBOW

k<-c(1:10) 
ss<-rep(0,10) 
for(i in 1:10){ 
  klaster_cek<-eclust(crime_scl,"kmeans",k=k[i],nstart=25,graph = FALSE) 
  ss[i]<-klaster_cek$tot.withinss 
} 

plot(ss,xlab="K",ylab="SSW",type="b",main="Metode Elbow")+abline(v=2,col="red")

## integer(0)
  • INTERPRETASI Berdasarkan metode elbow, maka banyak cluster terpilih adalah 2

PEMILIHAN BANYAK CLUSTER DENGAN METODE SILUET

fviz_nbclust(crime_scl, kmeans, method = "silhouette")+ labs(subtitle = "Silhouette method")

* INTERPRETASI Berdasarkan metode siluet, maka banyak cluster terpilih adalah 2

PEMILIHAN BANYAK CLUSTER DENGAN GAP STATISTIK

fviz_nbclust(crime_scl, kmeans, nstart = 25, method = "gap_stat", nboot = 50)+ labs(subtitle = "Metode Gap Statistik")

#### CLUSTERING

# K-means clustering
crime_cluster<- eclust(crime_scl, "kmeans", k = 2, nstart = 25,graph = FALSE)
fviz_cluster(crime_cluster, geom = "point", ellipse.type = "norm",
palette = "jco", ggtheme = theme_minimal())

# Tambahkan cluster ke crime
crime$cluster<-crime_cluster$cluster
crime$cluster<-as.factor(crime$cluster)
table(crime$Type,crime$cluster)
##         
##           1  2
##   Tipe 1 19  1
##   Tipe 2 11 20

DUNN INDEX

library(fpc)
km_stats <- cluster.stats(dist(crime_scl), crime_cluster$cluster)
# Dun index
km_stats$dunn
## [1] 0.1598884

EXTERNAL VALIDATION

table(crime$Type,crime$cluster)
##         
##           1  2
##   Tipe 1 19  1
##   Tipe 2 11 20
chisq.test(table(crime$Type,crime$cluster))
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(crime$Type, crime$cluster)
## X-squared = 15.406, df = 1, p-value = 8.671e-05
  • Interpretasi Terdapat perbedaan antara grup asli dengan grup hasil cluster

RAND INDEX

# Compute cluster stats
tipe <- as.numeric(crime$Type)
clust_stats<- cluster.stats(d = dist(crime_scl),tipe, crime_cluster$cluster)
# Corrected Rand index
clust_stats$corrected.rand
## [1] 0.2656452
  • Interpretasi Hasil rand index sebesar 0.26

VISUALISASI KARAKTERISTIK SETIAP CLUSTER

cluster1<-crime %>% filter(cluster==1)
cluster2<-crime %>% filter(cluster==2)
crime %>% group_by(cluster)%>% summarise(`rata-rata kejahatan`=mean(Murder))%>% 
  ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Pembunuhan")

 crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Rape))%>% 
  ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Pemerkosaan")

 crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Robbery))%>% 
  ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Perampokan")

 crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Assault))%>% 
  ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Penyerangan")

 crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Burglary))%>% 
  ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Perampokan (Burglary)")

crime %>% group_by(cluster) %>% summarise(`rata-rata kejahatan`=mean(Vehicle))%>% 
  ggplot(aes(x=cluster,y=`rata-rata kejahatan`,fill=cluster))+geom_bar(stat = "identity")+theme(legend.position = "none")+ggtitle("Rata-Rata Jumlah Kejahatan Pencurian Kendaraan Bermotor")

  • Interpretasi Secara keseluruhan , dapat disimpulkan bahwa Kota yang termasuk cluster 2 memiliki jumlah kasus kejahatan lebih tinggi dibandingkan kota yang termasuk dalam cluster 1

Tugas PPT

Nomor 1

Pembahasan

Nomor 2

Pembahasan

Soal UAS

SOal UAS 1

Dibahas waktu praktikum

Soal UAS 2

Manual K-Means

Muh.Nurtanzis Sutoyo dalam “Algoritma K-Means”

Load Data

kdata = matrix(c(6,6.7,7.4,6.7,9.2,7.4,9.3,4.5,6.4,8.5,6.9,5.8,6.3,6.4,3.9,
                 2.92,3.07,3.22,2.93,3.03,3.29,3.28,2.72,2.92,3.49,3.08,2.83,3.18,3.2,3.29),15,2)
kdata = as.data.frame(kdata)
kdata
##     V1   V2
## 1  6.0 2.92
## 2  6.7 3.07
## 3  7.4 3.22
## 4  6.7 2.93
## 5  9.2 3.03
## 6  7.4 3.29
## 7  9.3 3.28
## 8  4.5 2.72
## 9  6.4 2.92
## 10 8.5 3.49
## 11 6.9 3.08
## 12 5.8 2.83
## 13 6.3 3.18
## 14 6.4 3.20
## 15 3.9 3.29
summary(kdata)
##        V1             V2       
##  Min.   :3.90   Min.   :2.720  
##  1st Qu.:6.15   1st Qu.:2.925  
##  Median :6.70   Median :3.080  
##  Mean   :6.76   Mean   :3.097  
##  3rd Qu.:7.40   3rd Qu.:3.250  
##  Max.   :9.30   Max.   :3.490

Hitung Jarak

library(dplyr)
kdata1=mutate(.data=kdata, Jarak_C1 = sqrt((V1-3.9)^2+(V2-2.7)^2),
              Jarak_C2 = sqrt((V1-6.6)^2+(V2-3.1)^2),
              Jarak_C3 = sqrt((V1-9.3)^2+(V2-3.5)^2))
kdata1
##     V1   V2  Jarak_C1  Jarak_C2  Jarak_C3
## 1  6.0 2.92 2.1114924 0.6264184 3.3505820
## 2  6.7 3.07 2.8243406 0.1044031 2.6353178
## 3  7.4 3.22 3.5384177 0.8089499 1.9205208
## 4  6.7 2.93 2.8094305 0.1972308 2.6617475
## 5  9.2 3.03 5.3102636 2.6009421 0.4805206
## 6  7.4 3.29 3.5493802 0.8222530 1.9115700
## 7  9.3 3.28 5.4310588 2.7059933 0.2200000
## 8  4.5 2.72 0.6003332 2.1341040 4.8629621
## 9  6.4 2.92 2.5096613 0.2690725 2.9574313
## 10 8.5 3.49 4.6673440 1.9396134 0.8000625
## 11 6.9 3.08 3.0239709 0.3006659 2.4364729
## 12 5.8 2.83 1.9044422 0.8443341 3.5635516
## 13 6.3 3.18 2.4475294 0.3104835 3.0170184
## 14 6.4 3.20 2.5495098 0.2236068 2.9154759
## 15 3.9 3.29 0.5900000 2.7066769 5.4040818

*Dengan memperhatikan jarak terkecil didapatkan :

Hitung Pusat Cluster Baru

gg = data.frame(kdata1$Jarak_C1,kdata1$Jarak_C2,kdata1$Jarak_C3)
gg
##    kdata1.Jarak_C1 kdata1.Jarak_C2 kdata1.Jarak_C3
## 1        2.1114924       0.6264184       3.3505820
## 2        2.8243406       0.1044031       2.6353178
## 3        3.5384177       0.8089499       1.9205208
## 4        2.8094305       0.1972308       2.6617475
## 5        5.3102636       2.6009421       0.4805206
## 6        3.5493802       0.8222530       1.9115700
## 7        5.4310588       2.7059933       0.2200000
## 8        0.6003332       2.1341040       4.8629621
## 9        2.5096613       0.2690725       2.9574313
## 10       4.6673440       1.9396134       0.8000625
## 11       3.0239709       0.3006659       2.4364729
## 12       1.9044422       0.8443341       3.5635516
## 13       2.4475294       0.3104835       3.0170184
## 14       2.5495098       0.2236068       2.9154759
## 15       0.5900000       2.7066769       5.4040818
for (i in 1:15) {
 a[i]= min(gg[i,])
 print(a[i])
}
## [1] 0.6264184
## [1] 0.1044031
## [1] 0.8089499
## [1] 0.1972308
## [1] 0.4805206
## [1] 0.822253
## [1] 0.22
## [1] 0.6003332
## [1] 0.2690725
## [1] 0.8000625
## [1] 0.3006659
## [1] 0.8443341
## [1] 0.3104835
## [1] 0.2236068
## [1] 0.59

Iterasi Kedua

kdata2=mutate(.data=kdata, Jarak_C1 = sqrt((V1-4.2)^2+(V2-2.72)^2),
              Jarak_C2 = sqrt((V1-6.6)^2+(V2-3.06)^2),
              Jarak_C3 = sqrt((V1-9)^2+(V2-3.26)^2))
kdata2
##     V1   V2  Jarak_C1  Jarak_C2  Jarak_C3
## 1  6.0 2.92 1.8110770 0.6161169 3.0192052
## 2  6.7 3.07 2.5243811 0.1004988 2.3078345
## 3  7.4 3.22 3.2388269 0.8158431 1.6004999
## 4  6.7 2.93 2.5088045 0.1640122 2.3235533
## 5  9.2 3.03 5.0096008 2.6001731 0.3047950
## 6  7.4 3.29 3.2503692 0.8324062 1.6002812
## 7  9.3 3.28 5.1306530 2.7089481 0.3006659
## 8  4.5 2.72 0.3000000 2.1273458 4.5322842
## 9  6.4 2.92 2.2090722 0.2441311 2.6221365
## 10 8.5 3.49 4.3683979 1.9480503 0.5503635
## 11 6.9 3.08 2.7238943 0.3006659 2.1077002
## 12 5.8 2.83 1.6037768 0.8324062 3.2287614
## 13 6.3 3.18 2.1497907 0.3231099 2.7011849
## 14 6.4 3.20 2.2517549 0.2441311 2.6006922
## 15 3.9 3.29 0.6441273 2.7097786 5.1000882

## Iterasi ke Tiga

kdata3=mutate(.data=kdata, Jarak_C1 = sqrt((V1-4.2)^2+(V2-2.72)^2),
              Jarak_C2 = sqrt((V1-6.6)^2+(V2-3.06)^2),
              Jarak_C3 = sqrt((V1-9)^2+(V2-3.26)^2))
kdata3
##     V1   V2  Jarak_C1  Jarak_C2  Jarak_C3
## 1  6.0 2.92 1.8110770 0.6161169 3.0192052
## 2  6.7 3.07 2.5243811 0.1004988 2.3078345
## 3  7.4 3.22 3.2388269 0.8158431 1.6004999
## 4  6.7 2.93 2.5088045 0.1640122 2.3235533
## 5  9.2 3.03 5.0096008 2.6001731 0.3047950
## 6  7.4 3.29 3.2503692 0.8324062 1.6002812
## 7  9.3 3.28 5.1306530 2.7089481 0.3006659
## 8  4.5 2.72 0.3000000 2.1273458 4.5322842
## 9  6.4 2.92 2.2090722 0.2441311 2.6221365
## 10 8.5 3.49 4.3683979 1.9480503 0.5503635
## 11 6.9 3.08 2.7238943 0.3006659 2.1077002
## 12 5.8 2.83 1.6037768 0.8324062 3.2287614
## 13 6.3 3.18 2.1497907 0.3231099 2.7011849
## 14 6.4 3.20 2.2517549 0.2441311 2.6006922
## 15 3.9 3.29 0.6441273 2.7097786 5.1000882