Clustering is a machine learning method that belongs to unsupervised learning. Unsupervised learning covers machine learning methods in which the data to be analyzed has no target variable; the focus is on exploring the data, for example by finding patterns in it. Clustering looks for data points with similar patterns so that they can be grouped together. Each group produced by clustering is called a cluster. A good clustering is one in which the members of a cluster are as similar to each other as possible, while members of different clusters differ substantially. Clustering is used in many fields, such as customer segmentation, product recommendation, data profiling, and many more.
Fuzzy C-Means (FCM) is a clustering technique in which a data point is not confined to a single cluster: it can belong to two or more clusters at once, each according to its level of membership.
FCM is a soft clustering method because cluster assignment is based on degrees of membership. For every data point i, the membership degrees across the k clusters sum to 1:
\[ \sum_{j=1}^k u_{i j}=1 \]
Iteration. Repeat the update steps to recompute the memberships u1(a1), u1(a2), u1(a3), u1(a4), u2(a1), u2(a2), u2(a3), u2(a4). The iteration continues as long as: a) the change in the objective function value is still above the specified threshold; or b) the change in the centroid values is still above the specified threshold; and, in either case, c) the maximum number of iterations has not yet been reached.
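Compute the centroid of each cluster using:
\[ c_{j}=\frac{\sum_{i=1}^N\left(u_{i j}\right)^w x_{i}}{\sum_{i=1}^N\left(u_{i j}\right)^w} \]
Then update the membership degrees using:
\[ u_{i j}=\frac{D\left(x_i, c_j\right)^{\frac{-2}{w-1}}}{\sum_{l=1}^k D\left(x_i, c_l\right)^{\frac{-2}{w-1}}} \]
Here N is the number of data points, k is the number of clusters, w > 1 is the weighting exponent (fuzzifier), x_i is the i-th data point, c_j is the centroid of cluster j, and D(x_i, c_j) is the distance between them.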
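The update rules and stopping criteria can be tried out with a small base-R sketch. This is only an illustration under stated assumptions, not the ppclust implementation used later: the function name `fcm_sketch`, its arguments, the random initialisation, the Euclidean distance and the toy data are all hypothetical choices made for this example.

```r
# Minimal Fuzzy C-Means sketch in base R (illustrative only).
# x: numeric matrix with N rows; k: number of clusters; w: fuzzifier (> 1).
fcm_sketch <- function(x, k, w = 2, tol = 1e-6, max_iter = 100, seed = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  set.seed(seed)
  # Random initial memberships, normalised so that each row sums to 1
  u <- matrix(runif(n * k), nrow = n, ncol = k)
  u <- u / rowSums(u)
  obj_old <- Inf
  for (iter in seq_len(max_iter)) {
    uw <- u^w
    # Centroids: c_j = sum_i u_ij^w x_i / sum_i u_ij^w (one row per cluster)
    centers <- (t(uw) %*% x) / colSums(uw)
    # Squared Euclidean distances D(x_i, c_j)^2 as an N x k matrix
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    d2 <- pmax(d2, .Machine$double.eps)  # guard against division by zero
    # Memberships: u_ij = D_ij^(-2/(w-1)) / sum_l D_il^(-2/(w-1))
    inv <- d2^(-1 / (w - 1))
    u <- inv / rowSums(inv)
    # Objective function: sum_i sum_j u_ij^w * D_ij^2
    obj <- sum(u^w * d2)
    # Stop once the objective changes by less than the threshold
    if (abs(obj_old - obj) < tol) break
    obj_old <- obj
  }
  list(u = u, centers = centers, iterations = iter, objective = obj)
}

# Toy data: two made-up Gaussian blobs in two dimensions
set.seed(42)
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))
fit <- fcm_sketch(toy, k = 2)
range(rowSums(fit$u))  # every row of memberships sums to 1
```

The loop stops on whichever comes first: a change in the objective below `tol`, or `max_iter` iterations. A check on centroid movement could be used in place of the objective check.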
Before loading the data, we need to load the required libraries and install any packages that are not yet available.
require(ppclust)
## Loading required package: ppclust
## Warning: package 'ppclust' was built under R version 4.1.3
require(factoextra)
## Loading required package: factoextra
## Warning: package 'factoextra' was built under R version 4.1.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.1.3
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
require(dplyr)
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 4.1.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(cluster)
## Loading required package: cluster
require(fclust)
## Loading required package: fclust
## Warning: package 'fclust' was built under R version 4.1.3
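If some of these packages are not installed yet, a short check like the following can list them first. This is a sketch: the helper name `missing_packages` is hypothetical, and the install line is left commented out because it needs internet access.

```r
# Packages used in this walkthrough (readxl and psych are loaded further down)
pkgs <- c("ppclust", "factoextra", "dplyr", "cluster", "fclust", "readxl", "psych")

# Hypothetical helper: return the packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)
to_install

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```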
The first and most important step is loading the data. Because the data is stored in Excel format, it is imported into R with the read_excel() function from the readxl package. Here we use the anemia data set. Make sure every variable is numeric.
For variable 1, Gender: 1 = male, 0 = female.
For variable 6, Result: 1 = anemic, 0 = not anemic.
library(readxl)
anemia <- read_excel("anemia.xlsx", col_types = c("numeric",
"numeric", "numeric", "numeric", "numeric",
"numeric"))
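A column that was read as text can be found and converted with a check like this one. It is a sketch on a made-up toy data frame, not the anemia file itself; the column values are invented for illustration.

```r
# Toy data frame standing in for the anemia data (values are made up)
toy <- data.frame(Hemoglobin = c(14.9, 15.9, 9.0),
                  MCV = c("83.7", "72.0", "71.2"),  # stored as text by mistake
                  stringsAsFactors = FALSE)

# Which columns are not numeric?
non_numeric <- names(toy)[!vapply(toy, is.numeric, logical(1))]
non_numeric

# Convert the offending columns, then confirm that everything is numeric
toy[non_numeric] <- lapply(toy[non_numeric], as.numeric)
stopifnot(all(vapply(toy, is.numeric, logical(1))))
```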
Next, display the data set.
anemia
Then remove the first column, because the Gender variable is not needed. Of the 1421 rows in the anemia data, only the first 50 are displayed.
x <- anemia[, -1]
x[1:50, ]
Before processing the data, we check for missing values by combining is.na() with colSums(), which counts the missing entries in each column.
colSums(is.na(x))
## Hemoglobin MCH MCHC MCV Result
## 0 0 0 0 0
There are no missing values, so no further preprocessing is needed.
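Had the check reported missing values, one simple option would be to drop the incomplete rows before clustering. The sketch below uses a made-up toy data frame; whether dropping rows or imputing values is the better choice depends on the data, so treat it as one possibility rather than the recommended step.

```r
# Toy data with two missing entries (values are made up)
toy <- data.frame(Hemoglobin = c(14.9, NA, 9.0, 14.7),
                  MCH = c(22.7, 25.4, 21.5, NA))

na_counts <- colSums(is.na(toy))  # number of missing values per column
na_counts

toy_complete <- na.omit(toy)      # keep only rows without missing values
nrow(toy_complete)
```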
Below is a summary of the variables in the anemia data set: the minimum, maximum, median, mean, first quartile and third quartile are computed and shown for each variable.
summary(x)
## Hemoglobin MCH MCHC MCV
## Min. : 6.60 Min. :16.00 Min. :27.80 Min. : 69.40
## 1st Qu.:11.70 1st Qu.:19.40 1st Qu.:29.00 1st Qu.: 77.30
## Median :13.20 Median :22.70 Median :30.40 Median : 85.30
## Mean :13.41 Mean :22.91 Mean :30.25 Mean : 85.52
## 3rd Qu.:15.00 3rd Qu.:26.20 3rd Qu.:31.40 3rd Qu.: 94.20
## Max. :16.90 Max. :30.00 Max. :32.50 Max. :101.60
## Result
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4363
## 3rd Qu.:1.0000
## Max. :1.0000
This step examines the correlation between the variables.
cor(x)
## Hemoglobin MCH MCHC MCV Result
## Hemoglobin 1.00000000 0.01408126 -0.04259699 -0.02588463 -0.79626101
## MCH 0.01408126 1.00000000 0.01879519 -0.01594830 -0.02867752
## MCHC -0.04259699 0.01879519 1.00000000 0.06845045 0.04806695
## MCV -0.02588463 -0.01594830 0.06845045 1.00000000 -0.02057051
## Result -0.79626101 -0.02867752 0.04806695 -0.02057051 1.00000000
require(psych)
## Loading required package: psych
## Warning: package 'psych' was built under R version 4.1.3
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
## The following object is masked from 'package:ppclust':
##
## pca
pairs.panels(x, method = "pearson")
Fuzzy C-Means (FCM) is based on fuzzy logic theory (Lotfi Zadeh, 1965). A data point's membership does not have to be exactly 0 or 1; its degree of membership lies in the range from 0 to 1.
0 = not a member at all; 1 = full member.
“The higher the membership value, the higher the degree of membership.”
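To make the idea concrete, the sketch below takes the membership degrees of observations 1, 4 and 5 as they appear in the membership matrix of the summary output in this document, checks that each row sums to 1, and derives a crisp label by picking the cluster with the largest membership. The variable names are illustrative.

```r
# Membership degrees of observations 1, 4 and 5 (copied from the summary output)
u <- rbind(c(0.88341422, 0.03815800, 0.07842778),
           c(0.55350325, 0.27515474, 0.17134201),
           c(0.06081164, 0.91958343, 0.01960493))
colnames(u) <- paste("Cluster", 1:3)

row_totals <- rowSums(u)            # each row should sum to 1
crisp <- apply(u, 1, which.max)     # cluster with the highest membership
strength <- apply(u, 1, max)        # how strongly the point belongs to it

row_totals
crisp
strength
```

Observation 4 is assigned to cluster 1 with a membership of only about 0.55, so it sits much closer to the boundary between clusters than observation 1 (about 0.88) does. A crisp label alone hides that difference.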
res.fcm <- fcm(x, centers=3)
Display the membership degrees of the first 50 observations, followed by the initial (v0) and final (v) cluster prototypes:
as.data.frame(res.fcm$u)[1:50,]
res.fcm$v0
## Hemoglobin MCH MCHC MCV Result
## Cluster 1 10.6 29.5 30.4 85.0 1
## Cluster 2 14.8 20.4 28.5 91.1 0
## Cluster 3 13.4 17.7 30.2 79.0 1
res.fcm$v
## Hemoglobin MCH MCHC MCV Result
## Cluster 1 13.39443 23.66008 30.10574 85.54657 0.4104841
## Cluster 2 13.40115 22.41175 30.42231 96.96000 0.4180492
## Cluster 3 13.44944 22.57629 30.20883 74.53194 0.4620961
summary(res.fcm)
## Summary for 'res.fcm'
##
## Number of data objects: 1421
##
## Number of clusters: 3
##
## Crisp clustering vector:
## [1] 1 3 3 1 2 3 1 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2
## [38] 1 1 1 1 3 2 3 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1
## [75] 3 1 2 3 3 2 1 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3
## [112] 2 1 2 3 2 3 3 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3
## [149] 2 3 3 3 1 2 3 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3
## [186] 2 3 1 2 2 2 3 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2
## [223] 3 3 3 1 3 1 1 3 3 1 2 3 1 3 3 1 3 3 1 2 3 1 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3
## [260] 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2 1 1 1 1 3 2 3 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2
## [297] 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1 3 1 2 3 3 2 1 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3
## [334] 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3 2 1 2 3 2 3 3 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3
## [371] 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3 2 3 3 3 1 2 3 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1
## [408] 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 1 2 2 2 3 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1
## [445] 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2 3 3 3 1 3 1 1 3 3 1 2 3 1 3 3 1 3 3 1 2 3 1
## [482] 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2 1 1 1 1 3 2 3
## [519] 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1 3 1 2 3 3 2 1
## [556] 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3 2 1 2 3 2 3 3
## [593] 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3 2 3 3 3 1 2 3
## [630] 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 1 2 2 2 3
## [667] 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2 3 3 3 1 3 1 1
## [704] 3 3 1 2 3 1 3 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2
## [741] 2 1 1 2 2 2 1 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1
## [778] 1 2 1 1 2 2 2 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3
## [815] 2 3 2 2 1 3 1 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2
## [852] 2 2 1 3 1 2 3 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1
## [889] 2 1 1 1 1 1 1 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2
## [926] 2 3 3 1 3 2 2 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1
## [963] 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2 2 1 1 2 2 2 1 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3
## [1000] 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2
## [1037] 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3 2 3 2 2 1 3 1 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1
## [1074] 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2 2 2 1 3 1 2 3 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2
## [1111] 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1 2 1 1 1 1 1 1 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3
## [1148] 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2 2 3 3 1 3 2 2 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2
## [1185] 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2 2 1 1 2 2 2 1
## [1222] 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2
## [1259] 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3 2 3 2 2 1 3 1
## [1296] 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2 2 2 1 3 1 2 3
## [1333] 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1 2 1 1 1 1 1 1
## [1370] 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2 2 3 3 1 3 2 2
## [1407] 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2
##
## Initial cluster prototypes:
## Hemoglobin MCH MCHC MCV Result
## Cluster 1 10.6 29.5 30.4 85.0 1
## Cluster 2 14.8 20.4 28.5 91.1 0
## Cluster 3 13.4 17.7 30.2 79.0 1
##
## Final cluster prototypes:
## Hemoglobin MCH MCHC MCV Result
## Cluster 1 13.39443 23.66008 30.10574 85.54657 0.4104841
## Cluster 2 13.40115 22.41175 30.42231 96.96000 0.4180492
## Cluster 3 13.44944 22.57629 30.20883 74.53194 0.4620961
##
## Distance between the final cluster prototypes
## Cluster 1 Cluster 2
## Cluster 2 131.9249
## Cluster 3 122.5130 503.0945
##
## Difference between the initial and final cluster prototypes
## Hemoglobin MCH MCHC MCV Result
## Cluster 1 2.79442779 -5.839917 -0.294256064 0.5465723 -0.5895159
## Cluster 2 -1.39885014 2.011749 1.922306799 5.8599962 0.4180492
## Cluster 3 0.04943926 4.876290 0.008834311 -4.4680567 -0.5379039
##
## Root Mean Squared Deviations (RMSD): 6.605387
## Mean Absolute Deviation (MAD): 52.69361
##
## Membership degrees matrix (top and bottom 5 rows):
## Cluster 1 Cluster 2 Cluster 3
## 1 0.88341422 0.03815800 0.07842778
## 2 0.10639458 0.03247906 0.86112636
## 3 0.11935117 0.04015667 0.84049216
## 4 0.55350325 0.27515474 0.17134201
## 5 0.06081164 0.91958343 0.01960493
## ...
## Cluster 1 Cluster 2 Cluster 3
## 1417 0.74595758 0.07407359 0.17996884
## 1418 0.75742631 0.13989007 0.10268362
## 1419 0.46106847 0.10007707 0.43885445
## 1420 0.20957580 0.72334558 0.06707862
## 1421 0.05358702 0.93030349 0.01610948
##
## Descriptive statistics for the membership degrees by clusters
## Size Min Q1 Mean Median Q3 Max
## Cluster 1 459 0.4610685 0.6016129 0.6923465 0.6866724 0.7861956 0.9728791
## Cluster 2 465 0.4431725 0.7146659 0.7832870 0.8251251 0.8862383 0.9857232
## Cluster 3 497 0.4622971 0.6836670 0.7682988 0.7941730 0.8749778 0.9913936
##
## Dunn's Fuzziness Coefficients:
## dunn_coeff normalized
## 0.6325960 0.4488939
##
## Within cluster sum of squares by cluster:
## 1 2 3
## 13914.23 14600.97 16330.89
## (between_SS / total_SS = 73.01%)
##
## Available components:
## [1] "u" "v" "v0" "d" "x"
## [6] "cluster" "csize" "sumsqrs" "k" "m"
## [11] "iter" "best.start" "func.val" "comp.time" "inpargs"
## [16] "algorithm" "call"
The fcm function can be run from several starting points (nstart) to look for the optimal solution.
res.fcm <- fcm(x, centers=3, nstart=5)
res.fcm$func.val
## [1] 31100.48 31100.48 31100.48 31100.48 31100.48
res.fcm$iter
## [1] 69 66 75 67 66
res.fcm$best.start
## [1] 1
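One way to read these three outputs together is to pick the start with the smallest objective value. The sketch below does this by hand with the numbers printed above. It assumes that the best start is the one with the lowest objective function value, which the text does not state explicitly.

```r
# Values copied from the output above
func_val <- c(31100.48, 31100.48, 31100.48, 31100.48, 31100.48)
iters    <- c(69, 66, 75, 67, 66)

# Assumption: the best start minimises the objective; ties go to the first index
best <- which.min(func_val)
best

# All five starts reach the same objective value, only the iteration counts differ
spread <- diff(range(func_val))
spread
```

Because every start converges to the same objective value here, the solution appears stable with respect to initialisation for this data set.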
summary(res.fcm)
## Summary for 'res.fcm'
##
## Number of data objects: 1421
##
## Number of clusters: 3
##
## Crisp clustering vector:
## [1] 1 3 3 1 2 3 1 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2
## [38] 1 1 1 1 3 2 3 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1
## [75] 3 1 2 3 3 2 1 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3
## [112] 2 1 2 3 2 3 3 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3
## [149] 2 3 3 3 1 2 3 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3
## [186] 2 3 1 2 2 2 3 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2
## [223] 3 3 3 1 3 1 1 3 3 1 2 3 1 3 3 1 3 3 1 2 3 1 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3
## [260] 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2 1 1 1 1 3 2 3 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2
## [297] 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1 3 1 2 3 3 2 1 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3
## [334] 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3 2 1 2 3 2 3 3 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3
## [371] 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3 2 3 3 3 1 2 3 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1
## [408] 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 1 2 2 2 3 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1
## [445] 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2 3 3 3 1 3 1 1 3 3 1 2 3 1 3 3 1 3 3 1 2 3 1
## [482] 2 3 1 1 2 2 3 2 1 1 3 2 3 3 3 2 3 2 1 3 3 2 1 1 1 2 3 1 3 2 1 1 1 1 3 2 3
## [519] 1 3 1 2 2 2 1 1 3 1 2 2 3 2 2 3 2 2 1 2 2 3 2 3 3 3 3 2 3 1 3 1 2 3 3 2 1
## [556] 2 2 3 2 1 2 1 2 3 1 3 1 3 1 3 3 3 2 2 2 2 2 2 2 3 2 2 3 3 3 2 1 2 3 2 3 3
## [593] 3 1 2 2 1 1 1 3 1 3 3 1 2 2 3 2 1 3 3 1 1 2 3 3 1 1 2 1 2 3 2 3 3 3 1 2 3
## [630] 2 1 3 3 3 1 1 1 3 3 1 3 2 3 1 1 3 3 3 3 3 3 2 2 3 3 3 3 2 3 2 3 1 2 2 2 3
## [667] 1 3 3 2 1 1 2 1 3 1 1 3 1 3 1 1 3 3 3 1 1 2 1 1 1 2 3 1 3 2 3 3 3 1 3 1 1
## [704] 3 3 1 2 3 1 3 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2
## [741] 2 1 1 2 2 2 1 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1
## [778] 1 2 1 1 2 2 2 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3
## [815] 2 3 2 2 1 3 1 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2
## [852] 2 2 1 3 1 2 3 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1
## [889] 2 1 1 1 1 1 1 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2
## [926] 2 3 3 1 3 2 2 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1
## [963] 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2 2 1 1 2 2 2 1 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3
## [1000] 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2
## [1037] 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3 2 3 2 2 1 3 1 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1
## [1074] 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2 2 2 1 3 1 2 3 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2
## [1111] 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1 2 1 1 1 1 1 1 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3
## [1148] 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2 2 3 3 1 3 2 2 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2
## [1185] 1 2 3 1 2 1 1 3 2 1 3 1 1 2 1 1 3 3 3 3 1 1 2 3 3 1 2 2 3 2 2 1 1 2 2 2 1
## [1222] 1 3 2 3 2 3 2 1 1 1 2 2 3 3 3 2 3 2 1 2 3 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 2
## [1259] 2 2 1 2 2 3 3 3 2 1 3 3 2 1 2 1 1 1 2 1 3 1 1 1 3 2 1 2 3 3 2 3 2 2 1 3 1
## [1296] 2 2 1 3 1 1 3 2 2 1 2 2 3 3 1 2 2 2 3 3 1 1 1 2 2 3 3 2 2 2 2 2 1 3 1 2 3
## [1333] 1 3 2 2 1 3 1 2 2 3 3 1 1 2 2 1 1 3 1 2 1 1 3 1 1 2 3 2 1 1 2 1 1 1 1 1 1
## [1370] 3 3 3 2 3 1 3 2 3 3 3 3 2 3 3 2 1 3 1 1 2 3 3 2 2 3 3 1 3 2 2 3 3 1 3 2 2
## [1407] 1 3 2 2 3 3 2 2 3 2 1 1 1 2 2
##
## Initial cluster prototypes:
## Hemoglobin MCH MCHC MCV Result
## Cluster 1 12.7 24.1 28.7 97.2 1
## Cluster 2 12.9 27.4 28.2 78.8 1
## Cluster 3 12.7 19.1 29.8 84.3 1
##
## Final cluster prototypes:
## Hemoglobin MCH MCHC MCV Result
## Cluster 1 13.39443 23.66008 30.10574 85.54657 0.4104841
## Cluster 2 13.40115 22.41175 30.42231 96.96000 0.4180492
## Cluster 3 13.44944 22.57629 30.20883 74.53194 0.4620961
##
## Distance between the final cluster prototypes
## Cluster 1 Cluster 2
## Cluster 2 131.9249
## Cluster 3 122.5130 503.0945
##
## Difference between the initial and final cluster prototypes
## Hemoglobin MCH MCHC MCV Result
## Cluster 1 0.6944278 -0.4399175 1.4057439 -11.653428 -0.5895159
## Cluster 2 0.5011499 -4.9882512 2.2223068 18.159996 -0.5819508
## Cluster 3 0.7494393 3.4762904 0.4088343 -9.768057 -0.5379039
##
## Root Mean Squared Deviations (RMSD): 14.23044
## Mean Absolute Deviation (MAD): 93.62869
##
## Membership degrees matrix (top and bottom 5 rows):
## Cluster 1 Cluster 2 Cluster 3
## 1 0.88341422 0.03815800 0.07842778
## 2 0.10639458 0.03247906 0.86112636
## 3 0.11935117 0.04015667 0.84049216
## 4 0.55350325 0.27515474 0.17134201
## 5 0.06081164 0.91958343 0.01960493
## ...
## Cluster 1 Cluster 2 Cluster 3
## 1417 0.74595758 0.07407359 0.17996884
## 1418 0.75742631 0.13989007 0.10268362
## 1419 0.46106847 0.10007707 0.43885445
## 1420 0.20957580 0.72334558 0.06707862
## 1421 0.05358702 0.93030349 0.01610948
##
## Descriptive statistics for the membership degrees by clusters
## Size Min Q1 Mean Median Q3 Max
## Cluster 1 459 0.4610685 0.6016129 0.6923465 0.6866724 0.7861956 0.9728791
## Cluster 2 465 0.4431725 0.7146659 0.7832870 0.8251251 0.8862383 0.9857232
## Cluster 3 497 0.4622971 0.6836670 0.7682988 0.7941730 0.8749778 0.9913936
##
## Dunn's Fuzziness Coefficients:
## dunn_coeff normalized
## 0.6325960 0.4488939
##
## Within cluster sum of squares by cluster:
## 1 2 3
## 13914.23 14600.97 16330.89
## (between_SS / total_SS = 73.01%)
##
## Available components:
## [1] "u" "v" "v0" "d" "x"
## [6] "cluster" "csize" "sumsqrs" "k" "m"
## [11] "iter" "best.start" "func.val" "comp.time" "inpargs"
## [16] "algorithm" "call"
plotcluster() can be used to plot the clustering results. Several plots are shown below:
plotcluster(res.fcm, cp=1, trans=TRUE)
res.fcm2 <- ppclust2(res.fcm, "kmeans")
fviz_cluster(res.fcm2, data = x,
ellipse.type = "convex",
palette = "jco",
repel = TRUE)
## Warning: ggrepel: 1399 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
res.fcm3 <- ppclust2(res.fcm, "fanny")
cluster::clusplot(scale(x), res.fcm3$cluster,
main = "Cluster plot of anemia data set",
color=TRUE, labels = 2, lines = 2, cex=1)
This step evaluates the clustering result using Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC) and the Fuzzy Silhouette Index.
res.fcm4 <- ppclust2(res.fcm, "fclust")
idxsf <- SIL.F(res.fcm4$Xca, res.fcm4$U, alpha=1)
idxpe <- PE(res.fcm4$U)
idxpc <- PC(res.fcm4$U)
idxmpc <- MPC(res.fcm4$U)
cat("Partition Entropy: ", idxpe)
## Partition Entropy: 0.6471561
cat("Partition Coefficient: ", idxpc)
## Partition Coefficient: 0.632596
cat("Modified Partition Coefficient: ", idxmpc)
## Modified Partition Coefficient: 0.4488939
cat("Fuzzy Silhouette Index: ", idxsf)
## Fuzzy Silhouette Index: 0.6497773
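The first three indices depend only on the membership matrix, so they can be reproduced by hand. The sketch below uses their usual textbook definitions in base R. The function names `pc`, `pe` and `mpc` are hypothetical, and the natural logarithm in `pe` is an assumption about the log base.

```r
# u: membership matrix with one row per observation and one column per cluster
pc  <- function(u) sum(u^2) / nrow(u)                            # Partition Coefficient
pe  <- function(u) -sum(ifelse(u > 0, u * log(u), 0)) / nrow(u)  # Partition Entropy
mpc <- function(u) 1 - ncol(u) / (ncol(u) - 1) * (1 - pc(u))     # Modified PC

# Two extreme cases with k = 3 clusters
u_crisp   <- diag(3)                    # every point fully in one cluster
u_uniform <- matrix(1 / 3, nrow = 3, ncol = 3)  # every point equally in all clusters

c(pc(u_crisp), pe(u_crisp), mpc(u_crisp))        # PC = 1, PE = 0, MPC = 1
c(pc(u_uniform), pe(u_uniform), mpc(u_uniform))  # PC = 1/3, PE = log(3), MPC = 0

# MPC recomputed from the PC reported above, with k = 3
mpc_from_pc <- 1 - 3 / 2 * (1 - 0.632596)
mpc_from_pc
```

PC and MPC are better when closer to 1 and PE is better when closer to 0, so the reported values (PC about 0.63, MPC about 0.45, PE about 0.65) indicate a partition with a noticeable amount of overlap between clusters. The recomputed MPC agrees with the reported 0.4488939 to about six decimal places.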