Definition of Cluster Analysis

Clustering is a machine learning method that belongs to unsupervised learning. Unsupervised learning is machine learning in which the data to be analyzed has no target variable; the focus is on exploring the data, for example by looking for patterns in it. Clustering looks for observations with similar patterns so that they can be grouped together. Each group produced by clustering is called a cluster. A good clustering is one in which the members of a cluster are as similar to each other as possible, while members of different clusters differ substantially. Clustering is used in many fields, such as customer segmentation, product recommendation, data profiling, and many more.

Fuzzy C-Means Algorithm

Fuzzy C-Means (FCM) is a clustering technique in which a data point does not have to belong to exactly one cluster. Instead, each data point can belong to two or more clusters, with a degree of membership in each of them.

FCM is a soft clustering method because cluster membership is determined by degrees of membership.

Steps of the Fuzzy C-Means Algorithm

Input the data.
Initialization: specify a) the number of clusters (k ≥ 2); b) the weighting exponent (w > 1); c) the maximum number of iterations; d) the threshold for the change in the objective function value (the smallest expected error).
Initialize the pseudo-partition matrix: assign random values to the fuzzy pseudo-partition matrix, subject to the constraint that the membership degrees u_ij of each data point i across all clusters j sum to one:

\[ \sum_{j=1}^k u_{i j}=1 \]

Iteration: repeat the centroid update and the membership update described below. The iteration continues as long as the maximum number of iterations has not been reached and either a) the change in the objective function value is still above the specified threshold, or b) the change in the centroid values is still above the specified threshold.
Compute the centroid of each cluster using:

\[ c_j=\frac{\sum_{i=1}^N\left(u_{i j}\right)^w x_i}{\sum_{i=1}^N\left(u_{i j}\right)^w} \]

Recompute the fuzzy pseudo-partition matrix (the degree of membership of each data point in each cluster) using:

\[ u_{i j}=\frac{D\left(x_i, c_j\right)^{\frac{-2}{w-1}}}{\sum_{l=1}^k D\left(x_i, c_l\right)^{\frac{-2}{w-1}}} \]
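The steps above can be sketched in a few lines of base R. This is only an illustrative sketch, not the implementation used by the ppclust package: the function name `fcm_sketch`, its arguments, the use of Euclidean distance for D, and the toy data are all assumptions made for this example.

```r
# Minimal Fuzzy C-Means sketch in base R (hypothetical helper, not ppclust::fcm).
# x: numeric matrix (one row per data point); k: number of clusters;
# w: weighting exponent (> 1); tol: threshold on the objective change.
fcm_sketch <- function(x, k = 2, w = 2, max_iter = 100, tol = 1e-6) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Random pseudo-partition matrix whose rows sum to one
  u <- matrix(runif(n * k), nrow = n, ncol = k)
  u <- u / rowSums(u)
  obj_old <- Inf
  for (iter in seq_len(max_iter)) {
    uw <- u^w
    # Centroid update: c_j = sum_i u_ij^w x_i / sum_i u_ij^w
    centers <- t(uw) %*% x / colSums(uw)
    # Squared Euclidean distances D(x_i, c_j)^2, one column per cluster
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    d2 <- pmax(d2, .Machine$double.eps)  # guard against division by zero
    # Membership update: u_ij proportional to D(x_i, c_j)^(-2 / (w - 1))
    inv <- d2^(-1 / (w - 1))
    u <- inv / rowSums(inv)
    # Objective function: sum_i sum_j u_ij^w D(x_i, c_j)^2
    obj <- sum(u^w * d2)
    if (abs(obj_old - obj) < tol) break
    obj_old <- obj
  }
  list(u = u, centers = centers, objective = obj, iter = iter)
}

# Toy usage: two well-separated groups of 20 points each (made-up data)
set.seed(1)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
fit <- fcm_sketch(toy, k = 2)
range(rowSums(fit$u))  # every row of the membership matrix sums to one
```

The sketch stops on the change in the objective function only; a check on the change in the centroids could be added in the same place.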

Fuzzy C-Means Algorithm Experiment

Library

Before loading the data, we first need to load the required libraries and install any packages that are not yet available.
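One possible way to find out which packages still need to be installed is sketched below. The helper name `missing_packages` is hypothetical, and the package list is simply the set of packages loaded in this walkthrough; the install call is left commented out because it needs network access and a configured CRAN mirror.

```r
# Packages loaded in this walkthrough
pkgs <- c("ppclust", "factoextra", "dplyr", "cluster", "fclust", "readxl", "psych")

# Hypothetical helper: return the subset of `pkgs` that is not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)
to_install

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```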

require(ppclust)

## Loading required package: ppclust

## Warning: package 'ppclust' was built under R version 4.1.3

require(factoextra)

## Loading required package: factoextra

## Warning: package 'factoextra' was built under R version 4.1.3

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.1.3

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

require(dplyr)

## Loading required package: dplyr

## Warning: package 'dplyr' was built under R version 4.1.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

require(cluster)

## Loading required package: cluster

require(fclust)

## Loading required package: fclust

## Warning: package 'fclust' was built under R version 4.1.3

Input Dataset

First, and most importantly, load the data. Because the data is in Excel format, it is imported into R using the read_excel() function from the readxl package. This example uses the anemia data set. Make sure the variables are numeric.

For variable 1, Gender: 1 = male, 0 = female.

For variable 6, Result: 1 = has anemia, 0 = does not have anemia.

library(readxl)
anemia <- read_excel("anemia.xlsx", col_types = c("numeric", 
    "numeric", "numeric", "numeric", "numeric", 
    "numeric"))

Next, display the data set.

anemia

Then remove the first column, because the Gender variable is not needed. Only the first 50 of the 1421 rows of the anemia data are displayed.

# Drop the first column (Gender)
x <- anemia[, -1]
# Show the first 50 rows
x[1:50, ]

Data Preprocessing

Before processing the data, we check whether there are any missing values by combining is.na() with colSums().

colSums(is.na(x))

## Hemoglobin        MCH       MCHC        MCV     Result 
##          0          0          0          0          0

As shown, there are no missing values, so no further preprocessing is needed.

Summary of Data

Below is a summary of the variables in the anemia data set. The minimum, maximum, median, mean, first quartile, and third quartile are computed and displayed for each variable.

summary(x)

##    Hemoglobin         MCH             MCHC            MCV        
##  Min.   : 6.60   Min.   :16.00   Min.   :27.80   Min.   : 69.40  
##  1st Qu.:11.70   1st Qu.:19.40   1st Qu.:29.00   1st Qu.: 77.30  
##  Median :13.20   Median :22.70   Median :30.40   Median : 85.30  
##  Mean   :13.41   Mean   :22.91   Mean   :30.25   Mean   : 85.52  
##  3rd Qu.:15.00   3rd Qu.:26.20   3rd Qu.:31.40   3rd Qu.: 94.20  
##  Max.   :16.90   Max.   :30.00   Max.   :32.50   Max.   :101.60  
##      Result      
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.4363  
##  3rd Qu.:1.0000  
##  Max.   :1.0000

Correlation

This step examines the correlation between the variables.

cor(x)

##             Hemoglobin         MCH        MCHC         MCV      Result
## Hemoglobin  1.00000000  0.01408126 -0.04259699 -0.02588463 -0.79626101
## MCH         0.01408126  1.00000000  0.01879519 -0.01594830 -0.02867752
## MCHC       -0.04259699  0.01879519  1.00000000  0.06845045  0.04806695
## MCV        -0.02588463 -0.01594830  0.06845045  1.00000000 -0.02057051
## Result     -0.79626101 -0.02867752  0.04806695 -0.02057051  1.00000000

require(psych)

## Loading required package: psych

## Warning: package 'psych' was built under R version 4.1.3

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

## The following object is masked from 'package:ppclust':
## 
##     pca

pairs.panels(x, method = "pearson")

Fuzzy Membership Matrix

Fuzzy C-Means (FCM) is based on fuzzy logic theory (Lotfi Zadeh, 1965). The membership of a data point does not have to be exactly 0 or 1; instead, its degree of membership takes a value in the range from 0 to 1.

0 = not a member at all; 1 = a full member.

“The higher the membership value, the higher the degree of membership.”
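As a small illustration of these membership degrees, the sketch below rebuilds the first five rows of the membership matrix by hand, using the values printed by summary(res.fcm) in this document, and derives a crisp cluster for each row by taking the largest membership. The object names `u` and `crisp` are chosen here for illustration only.

```r
# First five rows of the membership matrix, copied from the summary(res.fcm) output
u <- matrix(c(0.88341422, 0.03815800, 0.07842778,
              0.10639458, 0.03247906, 0.86112636,
              0.11935117, 0.04015667, 0.84049216,
              0.55350325, 0.27515474, 0.17134201,
              0.06081164, 0.91958343, 0.01960493),
            ncol = 3, byrow = TRUE,
            dimnames = list(1:5, paste("Cluster", 1:3)))

# Each data point's memberships across the three clusters sum to one
rowSums(u)

# Crisp cluster = the cluster with the highest membership degree
crisp <- unname(apply(u, 1, which.max))
crisp  # 1 3 3 1 2, the first five entries of the crisp clustering vector
```

Row 4 shows why the soft view is informative: its largest membership is only about 0.55, so the assignment to cluster 1 is much less clear-cut than for row 5.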

Number of clusters = 3

res.fcm <- fcm(x, centers=3)

Display the cluster membership degrees

as.data.frame(res.fcm$u)[1:50,]

Initial and final cluster prototype matrices

res.fcm$v0

##           Hemoglobin  MCH MCHC  MCV Result
## Cluster 1       10.6 29.5 30.4 85.0      1
## Cluster 2       14.8 20.4 28.5 91.1      0
## Cluster 3       13.4 17.7 30.2 79.0      1

res.fcm$v

##           Hemoglobin      MCH     MCHC      MCV    Result
## Cluster 1   13.39443 23.66008 30.10574 85.54657 0.4104841
## Cluster 2   13.40115 22.41175 30.42231 96.96000 0.4180492
## Cluster 3   13.44944 22.57629 30.20883 74.53194 0.4620961

summary(res.fcm)

## Summary for 'res.fcm'
## 
## Number of data objects:  1421 
## 
## Number of clusters:  3 
## 
## Crisp clustering vector:
##    [1] 1 3 3 1 2 3 1 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2
##   [38] 1 1 1 1 3 2 3 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1
##   [75] 3 1 2 3 3 2 1 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3
##  [112] 2 1 2 3 2 3 3 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3
##  [149] 2 3 3 3 1 2 3 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3
##  [186] 2 3 1 2 2 2 3 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2
##  [223] 3 3 3 1 3 1 1 3 3 1 2 3 1 3 3 1 3 3 1 2 3 1 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3
##  [260] 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2 1 1 1 1 3 2 3 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2
##  [297] 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1 3 1 2 3 3 2 1 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3
##  [334] 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3 2 1 2 3 2 3 3 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3
##  [371] 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3 2 3 3 3 1 2 3 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1
##  [408] 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 1 2 2 2 3 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1
##  [445] 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2 3 3 3 1 3 1 1 3 3 1 2 3 1 3 3 1 3 3 1 2 3 1
##  [482] 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2 1 1 1 1 3 2 3
##  [519] 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1 3 1 2 3 3 2 1
##  [556] 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3 2 1 2 3 2 3 3
##  [593] 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3 2 3 3 3 1 2 3
##  [630] 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 1 2 2 2 3
##  [667] 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2 3 3 3 1 3 1 1
##  [704] 3 3 1 2 3 1 3 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2
##  [741] 2 1 1 2 2 2 1 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1
##  [778] 1 2 1 1 2 2 2 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3
##  [815] 2 3 2 2 1 3 1 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2
##  [852] 2 2 1 3 1 2 3 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1
##  [889] 2 1 1 1 1 1 1 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2
##  [926] 2 3 3 1 3 2 2 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1
##  [963] 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2 2 1 1 2 2 2 1 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3
## [1000] 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2
## [1037] 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3 2 3 2 2 1 3 1 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1
## [1074] 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2 2 2 1 3 1 2 3 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2
## [1111] 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1 2 1 1 1 1 1 1 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3
## [1148] 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2 2 3 3 1 3 2 2 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2
## [1185] 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2 2 1 1 2 2 2 1
## [1222] 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2
## [1259] 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3 2 3 2 2 1 3 1
## [1296] 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2 2 2 1 3 1 2 3
## [1333] 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1 2 1 1 1 1 1 1
## [1370] 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2 2 3 3 1 3 2 2
## [1407] 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2
## 
## Initial cluster prototypes:
##           Hemoglobin  MCH MCHC  MCV Result
## Cluster 1       10.6 29.5 30.4 85.0      1
## Cluster 2       14.8 20.4 28.5 91.1      0
## Cluster 3       13.4 17.7 30.2 79.0      1
## 
## Final cluster prototypes:
##           Hemoglobin      MCH     MCHC      MCV    Result
## Cluster 1   13.39443 23.66008 30.10574 85.54657 0.4104841
## Cluster 2   13.40115 22.41175 30.42231 96.96000 0.4180492
## Cluster 3   13.44944 22.57629 30.20883 74.53194 0.4620961
## 
## Distance between the final cluster prototypes
##           Cluster 1 Cluster 2
## Cluster 2  131.9249          
## Cluster 3  122.5130  503.0945
## 
## Difference between the initial and final cluster prototypes
##            Hemoglobin       MCH         MCHC        MCV     Result
## Cluster 1  2.79442779 -5.839917 -0.294256064  0.5465723 -0.5895159
## Cluster 2 -1.39885014  2.011749  1.922306799  5.8599962  0.4180492
## Cluster 3  0.04943926  4.876290  0.008834311 -4.4680567 -0.5379039
## 
## Root Mean Squared Deviations (RMSD): 6.605387 
## Mean Absolute Deviation (MAD): 52.69361 
## 
## Membership degrees matrix (top and bottom 5 rows): 
##    Cluster 1  Cluster 2  Cluster 3
## 1 0.88341422 0.03815800 0.07842778
## 2 0.10639458 0.03247906 0.86112636
## 3 0.11935117 0.04015667 0.84049216
## 4 0.55350325 0.27515474 0.17134201
## 5 0.06081164 0.91958343 0.01960493
## ...
##       Cluster 1  Cluster 2  Cluster 3
## 1417 0.74595758 0.07407359 0.17996884
## 1418 0.75742631 0.13989007 0.10268362
## 1419 0.46106847 0.10007707 0.43885445
## 1420 0.20957580 0.72334558 0.06707862
## 1421 0.05358702 0.93030349 0.01610948
## 
## Descriptive statistics for the membership degrees by clusters
##           Size       Min        Q1      Mean    Median        Q3       Max
## Cluster 1  459 0.4610685 0.6016129 0.6923465 0.6866724 0.7861956 0.9728791
## Cluster 2  465 0.4431725 0.7146659 0.7832870 0.8251251 0.8862383 0.9857232
## Cluster 3  497 0.4622971 0.6836670 0.7682988 0.7941730 0.8749778 0.9913936
## 
## Dunn's Fuzziness Coefficients:
## dunn_coeff normalized 
##  0.6325960  0.4488939 
## 
## Within cluster sum of squares by cluster:
##        1        2        3 
## 13914.23 14600.97 16330.89 
## (between_SS / total_SS =  73.01%) 
## 
## Available components: 
##  [1] "u"          "v"          "v0"         "d"          "x"         
##  [6] "cluster"    "csize"      "sumsqrs"    "k"          "m"         
## [11] "iter"       "best.start" "func.val"   "comp.time"  "inpargs"   
## [16] "algorithm"  "call"

FCM with Multiple Starts

The fcm function can be run several times from different starting points to look for the best solution.

res.fcm <- fcm(x, centers=3, nstart=5)

res.fcm$func.val

## [1] 31100.48 31100.48 31100.48 31100.48 31100.48

res.fcm$iter

## [1] 69 66 75 67 66

res.fcm$best.start

## [1] 1
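The selection of the best start can be reproduced by hand from the printed values. The sketch below assumes that the best start is the run with the lowest objective function value and that ties go to the first such run; whether ppclust breaks ties exactly this way is an assumption, although it agrees with the best.start value shown above.

```r
# Objective function values and iteration counts copied from the output above
func_val <- c(31100.48, 31100.48, 31100.48, 31100.48, 31100.48)
iters <- c(69, 66, 75, 67, 66)

# Assumed rule: lowest objective value wins; which.min() returns the first minimum
best <- which.min(func_val)
best         # 1
iters[best]  # 69 iterations were needed by that start
```

All five starts reach the same objective value here, which suggests that the solution does not depend on the random initialization for this data set.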

summary(res.fcm)

## Summary for 'res.fcm'
## 
## Number of data objects:  1421 
## 
## Number of clusters:  3 
## 
## Crisp clustering vector:
##    [1] 1 3 3 1 2 3 1 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2
##   [38] 1 1 1 1 3 2 3 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1
##   [75] 3 1 2 3 3 2 1 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3
##  [112] 2 1 2 3 2 3 3 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3
##  [149] 2 3 3 3 1 2 3 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3
##  [186] 2 3 1 2 2 2 3 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2
##  [223] 3 3 3 1 3 1 1 3 3 1 2 3 1 3 3 1 3 3 1 2 3 1 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3
##  [260] 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2 1 1 1 1 3 2 3 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2
##  [297] 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1 3 1 2 3 3 2 1 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3
##  [334] 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3 2 1 2 3 2 3 3 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3
##  [371] 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3 2 3 3 3 1 2 3 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1
##  [408] 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 1 2 2 2 3 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1
##  [445] 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2 3 3 3 1 3 1 1 3 3 1 2 3 1 3 3 1 3 3 1 2 3 1
##  [482] 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2 1 1 1 1 3 2 3
##  [519] 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1 3 1 2 3 3 2 1
##  [556] 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3 2 1 2 3 2 3 3
##  [593] 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3 2 3 3 3 1 2 3
##  [630] 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 1 2 2 2 3
##  [667] 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2 3 3 3 1 3 1 1
##  [704] 3 3 1 2 3 1 3 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2
##  [741] 2 1 1 2 2 2 1 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1
##  [778] 1 2 1 1 2 2 2 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3
##  [815] 2 3 2 2 1 3 1 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2
##  [852] 2 2 1 3 1 2 3 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1
##  [889] 2 1 1 1 1 1 1 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2
##  [926] 2 3 3 1 3 2 2 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1
##  [963] 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2 2 1 1 2 2 2 1 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3
## [1000] 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2
## [1037] 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3 2 3 2 2 1 3 1 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1
## [1074] 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2 2 2 1 3 1 2 3 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2
## [1111] 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1 2 1 1 1 1 1 1 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3
## [1148] 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2 2 3 3 1 3 2 2 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2
## [1185] 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2 2 1 1 2 2 2 1
## [1222] 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2
## [1259] 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3 2 3 2 2 1 3 1
## [1296] 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2 2 2 1 3 1 2 3
## [1333] 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1 2 1 1 1 1 1 1
## [1370] 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2 2 3 3 1 3 2 2
## [1407] 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2
## 
## Initial cluster prototypes:
##           Hemoglobin  MCH MCHC  MCV Result
## Cluster 1       12.7 24.1 28.7 97.2      1
## Cluster 2       12.9 27.4 28.2 78.8      1
## Cluster 3       12.7 19.1 29.8 84.3      1
## 
## Final cluster prototypes:
##           Hemoglobin      MCH     MCHC      MCV    Result
## Cluster 1   13.39443 23.66008 30.10574 85.54657 0.4104841
## Cluster 2   13.40115 22.41175 30.42231 96.96000 0.4180492
## Cluster 3   13.44944 22.57629 30.20883 74.53194 0.4620961
## 
## Distance between the final cluster prototypes
##           Cluster 1 Cluster 2
## Cluster 2  131.9249          
## Cluster 3  122.5130  503.0945
## 
## Difference between the initial and final cluster prototypes
##           Hemoglobin        MCH      MCHC        MCV     Result
## Cluster 1  0.6944278 -0.4399175 1.4057439 -11.653428 -0.5895159
## Cluster 2  0.5011499 -4.9882512 2.2223068  18.159996 -0.5819508
## Cluster 3  0.7494393  3.4762904 0.4088343  -9.768057 -0.5379039
## 
## Root Mean Squared Deviations (RMSD): 14.23044 
## Mean Absolute Deviation (MAD): 93.62869 
## 
## Membership degrees matrix (top and bottom 5 rows): 
##    Cluster 1  Cluster 2  Cluster 3
## 1 0.88341422 0.03815800 0.07842778
## 2 0.10639458 0.03247906 0.86112636
## 3 0.11935117 0.04015667 0.84049216
## 4 0.55350325 0.27515474 0.17134201
## 5 0.06081164 0.91958343 0.01960493
## ...
##       Cluster 1  Cluster 2  Cluster 3
## 1417 0.74595758 0.07407359 0.17996884
## 1418 0.75742631 0.13989007 0.10268362
## 1419 0.46106847 0.10007707 0.43885445
## 1420 0.20957580 0.72334558 0.06707862
## 1421 0.05358702 0.93030349 0.01610948
## 
## Descriptive statistics for the membership degrees by clusters
##           Size       Min        Q1      Mean    Median        Q3       Max
## Cluster 1  459 0.4610685 0.6016129 0.6923465 0.6866724 0.7861956 0.9728791
## Cluster 2  465 0.4431725 0.7146659 0.7832870 0.8251251 0.8862383 0.9857232
## Cluster 3  497 0.4622971 0.6836670 0.7682988 0.7941730 0.8749778 0.9913936
## 
## Dunn's Fuzziness Coefficients:
## dunn_coeff normalized 
##  0.6325960  0.4488939 
## 
## Within cluster sum of squares by cluster:
##        1        2        3 
## 13914.23 14600.97 16330.89 
## (between_SS / total_SS =  73.01%) 
## 
## Available components: 
##  [1] "u"          "v"          "v0"         "d"          "x"         
##  [6] "cluster"    "csize"      "sumsqrs"    "k"          "m"         
## [11] "iter"       "best.start" "func.val"   "comp.time"  "inpargs"   
## [16] "algorithm"  "call"

Visualization

plotcluster() can be used to plot the clustering results. Several plots are available, as follows:

Pairwise Scatter Plots

plotcluster(res.fcm, cp=1, trans=TRUE)

Cluster Plot with fviz_cluster

res.fcm2 <- ppclust2(res.fcm, "kmeans")
fviz_cluster(res.fcm2, data = x, 
  ellipse.type = "convex",
  palette = "jco",
  repel = TRUE)

## Warning: ggrepel: 1399 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Cluster Plot with clusplot

res.fcm3 <- ppclust2(res.fcm, "fanny")

cluster::clusplot(scale(x), res.fcm3$cluster,  
  main = "Cluster plot of anemia data set",
  color=TRUE, labels = 2, lines = 2, cex=1)

Validation of Results

This step evaluates the clustering results. The indices used here are the Partition Entropy (PE), the Partition Coefficient (PC), the Modified Partition Coefficient (MPC), and the Fuzzy Silhouette Index.

res.fcm4 <- ppclust2(res.fcm, "fclust")
idxsf <- SIL.F(res.fcm4$Xca, res.fcm4$U, alpha=1)
idxpe <- PE(res.fcm4$U)
idxpc <- PC(res.fcm4$U)
idxmpc <- MPC(res.fcm4$U)

cat("Partition Entropy: ", idxpe)

## Partition Entropy:  0.6471561

cat("Partition Coefficient: ", idxpc)

## Partition Coefficient:  0.632596

cat("Modified Partition Coefficient: ", idxmpc)

## Modified Partition Coefficient:  0.4488939

cat("Fuzzy Silhouette Index: ", idxsf)

## Fuzzy Silhouette Index:  0.6497773
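To make these indices less of a black box, the sketch below writes out the usual textbook definitions of PC, PE, and MPC in base R. The function names are hypothetical and are not the fclust implementations; the use of the natural logarithm in PE is an assumption. As a check, the MPC formula is applied to the PC value reported above.

```r
# Hypothetical helpers; u is a membership matrix (rows = data points, columns = clusters)
partition_coefficient <- function(u) sum(u^2) / nrow(u)

partition_entropy <- function(u) {
  # Natural logarithm assumed; terms with u = 0 contribute zero
  terms <- ifelse(u > 0, u * log(u), 0)
  -sum(terms) / nrow(u)
}

modified_pc <- function(u) {
  k <- ncol(u)
  1 - k / (k - 1) * (1 - partition_coefficient(u))
}

# Completely fuzzy toy case: 4 points split evenly between 2 clusters
u_flat <- matrix(0.5, nrow = 4, ncol = 2)
partition_coefficient(u_flat)  # 0.5, i.e. 1 / k
modified_pc(u_flat)            # 0

# MPC recomputed from the PC reported above, with k = 3 clusters
mpc_from_pc <- 1 - 3 / (3 - 1) * (1 - 0.632596)
round(mpc_from_pc, 6)  # 0.448894, matching the reported MPC up to rounding
```

PC ranges from 1/k (completely fuzzy) to 1 (crisp), while MPC rescales it to the range from 0 to 1. The PC and MPC values also coincide with the Dunn's fuzziness coefficients printed by summary(res.fcm).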

FUZZY C-MEANS ALGORITHM WITH R

Annisa Suherman-Institut Teknologi Statistika dan Bisnis Muhammadiyah Semarang

2023-01-21