This post looks at the dataset ‘Recapitulation of Health HR for each Province that is utilized in Health Service Facilities (Fasyankes) in 2018’. The aim is to group the provinces by the availability of health facilities and health workers using two clustering methods: k-Means and the Self-Organizing Map (SOM).
Data Source: (http://bppsdmk.kemkes.go.id/info_sdmk/history/#)
The data consists of 34 rows and 11 columns, namely:
Nama.Provinsi: the name of each province.
Jumlah.Puskesmas: the number of community health centers (Puskesmas) in each province.
Jumlah.RS: the number of hospitals in each province.
Dokter.Spesialis: the number of specialist doctors in each province.
Dokter.Umum: the number of general practitioners in each province.
Dokter.Gigi...Spesialis: the number of dental specialists in each province.
Keperawatan: the number of nurses in each province.
Kebidanan: the number of midwives in each province.
Farmasi: the number of pharmacists in each province.
Nakes.Lainnya: the number of other health workers, not covered by the categories above, in each province.
Tenaga.Penunjang: the number of supporting staff in each province.
##  [1] "No."                     "Nama.Provinsi"
##  [3] "Jumlah.Puskesmas"        "Jumlah.RS"
##  [5] "Dokter.Spesialis"        "Dokter.Umum"
##  [7] "Dokter.Gigi...Spesialis" "Keperawatan"
##  [9] "Kebidanan"               "Farmasi"
## [11] "Nakes.Lainnya"           "Tenaga.Penunjang"
## [13] "Jumlah"
##                  Jumlah.Puskesmas Jumlah.RS Dokter.Spesialis Dokter.Umum
## ACEH                          347        70             1328        1580
## SUMATERA UTARA                575       237             3799        3378
## SUMATERA BARAT                276        82             1141        1107
## RIAU                          232        73             1209        1629
## JAMBI                         207        41              496        1006
## SUMATERA SELATAN              341        76             1563        1283
##                  Dokter.Gigi...Spesialis Keperawatan Kebidanan Farmasi
## ACEH                                 346       11635     12270    1441
## SUMATERA UTARA                       826       16221     16938    1716
## SUMATERA BARAT                       408        7978      5986    1499
## RIAU                                 450        8287      7333    1554
## JAMBI                                266        6593      5263    1102
## SUMATERA SELATAN                     275       12854     11805    1591
##                  Nakes.Lainnya Tenaga.Penunjang
## ACEH                      6513             7808
## SUMATERA UTARA            5730            10774
## SUMATERA BARAT            4362             7155
## RIAU                      3240             7345
## JAMBI                     2785             4878
## SUMATERA SELATAN          5432             9806
# Elbow method: plot the total within-cluster sum of squares for k = 1..maxCluster
wss <- function(data, maxCluster = 15) {
  SSw <- numeric(maxCluster)
  # For k = 1 the within sum of squares equals the total sum of squares
  SSw[1] <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (i in 2:maxCluster) {
    SSw[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:maxCluster, SSw, type = "o", xlab = "Number of Clusters", ylab = "Within groups sum of squares", pch = 19)
}
wss(data)
The elbow method is one way to choose the number of clusters: plot the within-cluster sum of squares against the number of clusters and look for the point where the curve starts to flatten.
In the plot produced by the code above, the curve starts to flatten at around k = 9. However, keeping k = 9 would be unwise, because the dataset has only 34 observations (one per province) and many clusters would end up with very few members.
I chose k = 3 instead, because Indonesia is commonly divided into three major regions (Western, Central, and Eastern Indonesia). This makes it easier to interpret the properties of the resulting clusters.
The k-Means clustering model is built with the kmeans function, with the number of clusters k set to 3.
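As a minimal sketch of this step, the code below runs kmeans with k = 3 on the six sample provinces and the first four variables printed earlier, rather than on the full 34-province table. The object names sample_data and km, the seed, and nstart = 25 are assumptions made for illustration, not the settings used for the results in this post.

```r
# Six sample rows and four variables copied from the head() output shown earlier
sample_data <- matrix(
  c(347,  70, 1328, 1580,
    575, 237, 3799, 3378,
    276,  82, 1141, 1107,
    232,  73, 1209, 1629,
    207,  41,  496, 1006,
    341,  76, 1563, 1283),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("ACEH", "SUMATERA UTARA", "SUMATERA BARAT", "RIAU", "JAMBI", "SUMATERA SELATAN"),
    c("Jumlah.Puskesmas", "Jumlah.RS", "Dokter.Spesialis", "Dokter.Umum")
  )
)

# k-means starts from random centers, so fix the seed for repeatability (assumed value)
set.seed(123)
km <- kmeans(sample_data, centers = 3, nstart = 25)

# Mean of each variable per cluster
print(km$centers)

# Province names grouped by the cluster they were assigned to
print(split(names(km$cluster), km$cluster))
```

Here km$centers holds the per-cluster means and km$cluster holds the cluster number of each province. Without a fixed seed, the cluster numbering can change from run to run even when the grouping itself stays the same.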
The characteristics of each cluster are shown below, namely the mean of each variable within the cluster.
## $centers
##   Jumlah.Puskesmas Jumlah.RS Dokter.Spesialis Dokter.Umum
## 1         195.9583  42.75000         684.2083    812.7917
## 2         819.2500 307.75000        7494.2500   6746.0000
## 3         347.1667  97.83333        1848.5000   2274.3333
##   Dokter.Gigi...Spesialis Keperawatan Kebidanan   Farmasi Nakes.Lainnya
## 1                210.1250    6201.917  3619.708  882.0833      2656.667
## 2               1829.7500   41295.000 19121.500 8651.0000     14248.500
## 3                544.1667   16006.333 12341.167 2086.6667      6684.833
##   Tenaga.Penunjang
## 1          4931.00
## 2         40045.25
## 3         12774.00
The cluster plot illustrates how the provinces are divided into 3 clusters.
To see clearly which provinces belong to cluster 1, 2, or 3, refer to the table of province names and their respective clusters.
The step below builds a Self-Organizing Map (SOM) model using the som function from the kohonen package.
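library(kohonen)
# som() expects a numeric matrix
data <- as.matrix(data)
# 6 x 5 grid; passing both topology names selects the first one, "rectangular"
som_grid <- somgrid(xdim = 6, ydim = 5, topo = "rectangular")
sommodel <- som(data, grid = som_grid, rlen = 500, alpha = c(0.05, 0.01), keep.data = TRUE)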
The training-progress figure plots the number of iterations against the mean distance to the closest unit, which keeps getting smaller as training proceeds. Training appears to converge from around iteration 200.
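A training-progress plot of this kind can be drawn with plot(..., type = "changes") from the kohonen package. The sketch below is illustrative only: it trains a small 2 x 2 map on the six sample provinces printed earlier, and the names sample_data, toy_grid, and toy_som, as well as the seed, are assumptions.

```r
library(kohonen)

# Six sample rows and four variables copied from the head() output shown earlier
sample_data <- matrix(
  c(347,  70, 1328, 1580,
    575, 237, 3799, 3378,
    276,  82, 1141, 1107,
    232,  73, 1209, 1629,
    207,  41,  496, 1006,
    341,  76, 1563, 1283),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("ACEH", "SUMATERA UTARA", "SUMATERA BARAT", "RIAU", "JAMBI", "SUMATERA SELATAN"),
    c("Jumlah.Puskesmas", "Jumlah.RS", "Dokter.Spesialis", "Dokter.Umum")
  )
)

set.seed(123)  # assumed seed; SOM training is randomly initialized

# A 2 x 2 grid is used because the sample has only six rows
toy_grid <- somgrid(xdim = 2, ydim = 2, topo = "rectangular")
toy_som <- som(sample_data, grid = toy_grid, rlen = 500,
               alpha = c(0.05, 0.01), keep.data = TRUE)

# Mean distance to the closest unit at each training iteration
plot(toy_som, type = "changes")

# The same values are stored in the model object
print(tail(toy_som$changes))
```

A curve that flattens out suggests that more iterations would change the map very little; a curve that is still falling at the last iteration suggests increasing rlen.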
The codes plot illustrates the characteristics of each node. For example, node (5,6) has high values on all ten variables. Each segment in a node represents one variable, and its size reflects that variable's value in the node. The variable Jumlah.Puskesmas (number of community health centers) is the green segment at the 3 o'clock position, and the remaining variables follow counterclockwise from there.
The weakness of this visualization is that it does not show which cluster a node such as (5,6) belongs to.
To overcome this, the nodes can be colored by cluster, as in the plot below. Red marks cluster 1, orange marks cluster 2, and yellow marks cluster 3.
plot(sommodel, type="codes", bgcol = rainbow(10)[som_cluster], main = "Clusters")
add.cluster.boundaries(sommodel,som_cluster)
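The som_cluster vector used above is not defined in this section. One common way to obtain such a vector, shown here as an assumption rather than as the method used for this post, is to run hierarchical clustering on the codebook vectors of the SOM units and cut the tree into 3 groups. The sketch uses a small 2 x 2 map trained on the six sample provinces printed earlier; sample_data, toy_som, and codes are illustrative names.

```r
library(kohonen)

# Six sample rows and four variables copied from the head() output shown earlier
sample_data <- matrix(
  c(347,  70, 1328, 1580,
    575, 237, 3799, 3378,
    276,  82, 1141, 1107,
    232,  73, 1209, 1629,
    207,  41,  496, 1006,
    341,  76, 1563, 1283),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("ACEH", "SUMATERA UTARA", "SUMATERA BARAT", "RIAU", "JAMBI", "SUMATERA SELATAN"),
    c("Jumlah.Puskesmas", "Jumlah.RS", "Dokter.Spesialis", "Dokter.Umum")
  )
)

set.seed(123)  # assumed seed
toy_som <- som(sample_data,
               grid = somgrid(xdim = 2, ydim = 2, topo = "rectangular"),
               rlen = 500, alpha = c(0.05, 0.01), keep.data = TRUE)

# One codebook vector (row) per SOM unit
codes <- getCodes(toy_som)

# Assumed approach: hierarchical clustering of the units, cut into 3 groups
som_cluster <- cutree(hclust(dist(codes)), k = 3)

# Color each unit by its cluster and draw the cluster borders
plot(toy_som, type = "codes", bgcol = rainbow(10)[som_cluster], main = "Clusters")
add.cluster.boundaries(toy_som, som_cluster)
```

With this construction som_cluster has one entry per SOM unit, not one per province, which is why it can be used directly to color the nodes.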
Based on these plots, the nodes of the Kohonen SOM model are grouped into 3 clusters, each with its own characteristics. These can be read from the size of the segments inside each circle: the larger a segment, the higher the cluster's average value for that variable.
From the visualization above, cluster 2 (orange nodes) has the highest values on all ten variables, while cluster 1 (red nodes) has the lowest values on all ten variables.
Below is a list of the members of clusters 1, 2, and 3 based on k-Means and SOM clustering.
Comparing the cluster results, both k-Means and SOM produce a cluster 1 containing the provinces with the lowest values on the ten variables, a cluster 2 containing the provinces with the highest values, and a cluster 3 containing the provinces in between. However, the two algorithms assign a few provinces to different clusters.
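A cross-tabulation is one way to check how far the two methods agree. The sketch below is a hedged illustration on the six sample provinces printed earlier, not a reproduction of the comparison in this post. It assumes the SOM units are clustered hierarchically into 3 groups and maps each province to the cluster of its winning unit through unit.classif; all object names are illustrative.

```r
library(kohonen)

# Six sample rows and four variables copied from the head() output shown earlier
sample_data <- matrix(
  c(347,  70, 1328, 1580,
    575, 237, 3799, 3378,
    276,  82, 1141, 1107,
    232,  73, 1209, 1629,
    207,  41,  496, 1006,
    341,  76, 1563, 1283),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("ACEH", "SUMATERA UTARA", "SUMATERA BARAT", "RIAU", "JAMBI", "SUMATERA SELATAN"),
    c("Jumlah.Puskesmas", "Jumlah.RS", "Dokter.Spesialis", "Dokter.Umum")
  )
)

set.seed(123)  # assumed seed

# k-Means assignment, one cluster number per province
km_cluster <- kmeans(sample_data, centers = 3, nstart = 25)$cluster

# SOM assignment: cluster the units, then look up each province's winning unit
toy_som <- som(sample_data,
               grid = somgrid(xdim = 2, ydim = 2, topo = "rectangular"),
               rlen = 500, alpha = c(0.05, 0.01), keep.data = TRUE)
unit_cluster <- cutree(hclust(dist(getCodes(toy_som))), k = 3)
som_cluster_by_province <- unit_cluster[toy_som$unit.classif]

# Side-by-side assignments and their cross-tabulation
comparison <- data.frame(
  Province = rownames(sample_data),
  kmeans   = as.integer(km_cluster),
  som      = as.integer(som_cluster_by_province)
)
print(comparison)
agreement <- table(kmeans = comparison$kmeans, som = comparison$som)
print(agreement)
```

Cluster numbers are arbitrary labels in both methods, so agreement shows up as each row of the table having one dominant column rather than as matching numbers.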
In a large, developing country such as Indonesia, the government must pay more attention to health issues by distributing health personnel and health facilities more equally across provinces, prioritizing the provinces in cluster 1.