Objective: To categorise countries using socio-economic and health factors that determine their overall development.

About the organisation: HELP International is an international humanitarian NGO committed to fighting poverty and providing the people of under-developed countries with basic amenities and relief in times of disasters and natural calamities.

Problem Statement: HELP International has been able to raise around $10 million. The CEO of the NGO now needs to decide how to use this money strategically and effectively, which means choosing the countries that are in the direst need of aid. Hence, your job as a data scientist is to categorise the countries using socio-economic and health factors that determine the overall development of each country, and then to suggest the countries the CEO should focus on the most.

Let’s start by loading the data and naming the data frame country.

country <- read.csv("C:\\Users\\ellaz\\OneDrive\\Desktop\\Leyla\\University of Warsaw\\USL\\Final projects\\country.csv")

We can now inspect the dataset. It contains 167 observations of 10 variables: 1 character, 2 integer and 7 numeric.

str(country)
## 'data.frame':    167 obs. of  10 variables:
##  $ country   : chr  "Afghanistan" "Albania" "Algeria" "Angola" ...
##  $ child_mort: num  90.2 16.6 27.3 119 10.3 14.5 18.1 4.8 4.3 39.2 ...
##  $ exports   : num  10 28 38.4 62.3 45.5 18.9 20.8 19.8 51.3 54.3 ...
##  $ health    : num  7.58 6.55 4.17 2.85 6.03 8.1 4.4 8.73 11 5.88 ...
##  $ imports   : num  44.9 48.6 31.4 42.9 58.9 16 45.3 20.9 47.8 20.7 ...
##  $ income    : int  1610 9930 12900 5900 19100 18700 6700 41400 43200 16000 ...
##  $ inflation : num  9.44 4.49 16.1 22.4 1.44 20.9 7.77 1.16 0.873 13.8 ...
##  $ life_expec: num  56.2 76.3 76.5 60.1 76.8 75.8 73.3 82 80.5 69.1 ...
##  $ total_fer : num  5.82 1.65 2.89 6.16 2.13 2.37 1.69 1.93 1.44 1.92 ...
##  $ gdpp      : int  553 4090 4460 3530 12200 10300 3220 51900 46900 5840 ...
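
Before transforming anything, a quick summary (a minimal sketch, not part of the original workflow) shows the range of every variable and makes the differences in scale obvious.

# Five-number summary plus mean for each column; note how different the scales are
summary(country)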

We see that we have 2 columns with integer values (income and gdpp). Let’s remove those columns so that our data contains only numeric (double) values.

drop <- c("income","gdpp")
country <- country[,!(names(country) %in% drop)]

Let’s now check whether there are any missing values in our data. It looks like the data is tidy and there are no missing values.

table(is.na(country))
## 
## FALSE 
##  1336
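
If there had been missing values, a per-column count would show where they sit; a minimal sketch:

# Number of NA values in each column (all zeros here)
colSums(is.na(country))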

Let’s keep the numeric indicators (columns 2 to 8) in a data frame called results for our clustering.

results <- country[2:8]

Let’s check for outliers, which we will then replace with the median. As we can see below, the data contains quite a few outliers.

boxplot(results)$out

##  [1] 149.00 150.00 208.00 160.00 103.00 175.00 153.00  93.80 200.00  14.20
## [11]  17.90 142.00 154.00 108.00 174.00  24.90  39.20 104.00  26.50  45.90
## [21]  47.50  32.10  46.50   7.49

These outliers would distort the clustering later, so we first need to locate them in our data frame.

sapply(results, function(x) x %in% boxplot(results)$out)

##        child_mort exports health imports inflation life_expec total_fer
##   [1,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##   [2,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##   [3,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##   [4,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##   [5,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##   [6,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##   [7,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##   [8,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##   [9,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [10,]       TRUE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [11,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [12,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [13,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [14,]       TRUE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [15,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [16,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [17,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [18,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [19,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [20,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [21,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [22,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [23,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [24,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [25,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [26,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [27,]      FALSE   FALSE  FALSE    TRUE     FALSE      FALSE     FALSE
##  [28,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [29,]       TRUE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [30,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [31,]       TRUE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [32,]       TRUE   FALSE  FALSE    TRUE     FALSE       TRUE     FALSE
##  [33,]       TRUE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [34,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [35,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [36,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [37,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [38,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [39,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [40,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [41,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [42,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [43,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [44,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [45,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [46,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [47,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [48,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [49,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [50,]      FALSE   FALSE  FALSE   FALSE      TRUE      FALSE     FALSE
##  [51,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [52,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [53,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [54,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [55,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [56,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [57,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [58,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [59,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [60,]      FALSE   FALSE  FALSE    TRUE     FALSE      FALSE     FALSE
##  [61,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [62,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [63,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [64,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [65,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [66,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [67,]       TRUE   FALSE  FALSE   FALSE     FALSE       TRUE     FALSE
##  [68,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [69,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [70,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [71,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [72,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [73,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [74,]      FALSE    TRUE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [75,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [76,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [77,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [78,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [79,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [80,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [81,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [82,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [83,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [84,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [85,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [86,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [87,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [88,]      FALSE   FALSE  FALSE   FALSE     FALSE       TRUE     FALSE
##  [89,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [90,]      FALSE   FALSE  FALSE   FALSE      TRUE      FALSE     FALSE
##  [91,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [92,]      FALSE    TRUE  FALSE    TRUE     FALSE      FALSE     FALSE
##  [93,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [94,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [95,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [96,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [97,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [98,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
##  [99,]      FALSE    TRUE  FALSE    TRUE     FALSE      FALSE     FALSE
## [100,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [101,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [102,]      FALSE   FALSE   TRUE   FALSE     FALSE      FALSE     FALSE
## [103,]      FALSE    TRUE  FALSE   FALSE     FALSE      FALSE     FALSE
## [104,]      FALSE   FALSE  FALSE   FALSE      TRUE      FALSE     FALSE
## [105,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [106,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [107,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [108,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [109,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [110,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [111,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [112,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [113,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE      TRUE
## [114,]      FALSE   FALSE  FALSE   FALSE      TRUE      FALSE     FALSE
## [115,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [116,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [117,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [118,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [119,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [120,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [121,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [122,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [123,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [124,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [125,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [126,]      FALSE   FALSE  FALSE   FALSE      TRUE      FALSE     FALSE
## [127,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [128,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [129,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [130,]      FALSE    TRUE  FALSE   FALSE     FALSE      FALSE     FALSE
## [131,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [132,]      FALSE    TRUE  FALSE    TRUE     FALSE      FALSE     FALSE
## [133,]       TRUE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [134,]      FALSE    TRUE  FALSE    TRUE     FALSE      FALSE     FALSE
## [135,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [136,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [137,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [138,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [139,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [140,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [141,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [142,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [143,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [144,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [145,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [146,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [147,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [148,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [149,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [150,]      FALSE   FALSE  FALSE   FALSE      TRUE      FALSE     FALSE
## [151,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [152,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [153,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [154,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [155,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [156,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [157,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [158,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [159,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [160,]      FALSE   FALSE   TRUE   FALSE     FALSE      FALSE     FALSE
## [161,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [162,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [163,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [164,]      FALSE   FALSE  FALSE   FALSE      TRUE      FALSE     FALSE
## [165,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [166,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE
## [167,]      FALSE   FALSE  FALSE   FALSE     FALSE      FALSE     FALSE

Now we replace the outliers with the median in columns 1 to 6 (total_fer is left unchanged).

results$child_mort[results$child_mort %in% boxplot(results)$out] <- median(results$child_mort)

results$exports[results$exports %in% boxplot(results)$out] <- median(results$exports)

results$health[results$health %in% boxplot(results)$out] <- median(results$health)

results$imports[results$imports %in% boxplot(results)$out] <- median(results$imports)

results$inflation[results$inflation %in% boxplot(results)$out] <- median(results$inflation)

results$life_expec[results$life_expec %in% boxplot(results)$out] <- median(results$life_expec)
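
A side note: boxplot(results)$out pools the outliers of all columns, so a value could in principle be replaced because it happens to match an outlier from a different variable. A stricter per-column alternative is sketched below using boxplot.stats(); it is not applied here, since the report keeps the pooled approach above (replace_outliers_with_median is just an illustrative name).

# Hedged alternative: detect and replace outliers column by column.
# boxplot.stats() uses the same 1.5 * IQR rule as boxplot(), without drawing a plot.
replace_outliers_with_median <- function(x) {
  out <- boxplot.stats(x)$out        # outliers of this column only
  x[x %in% out] <- median(x)         # replace them with the column median
  x
}
# Not run here: would apply the rule to the first six columns, leaving total_fer untouched
# results[1:6] <- lapply(results[1:6], replace_outliers_with_median)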

We apply z-score standardization below to avoid the problem of some features dominating the distance calculations solely because they tend to have larger values than others.

results_z <- as.data.frame(lapply(results, scale))
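
As a quick sanity check (a minimal sketch, not in the original workflow), each standardized column should now have a mean of roughly 0 and a standard deviation of 1.

# Means should be approximately 0 and standard deviations 1 after z-score scaling
round(colMeans(results_z), 3)
round(sapply(results_z, sd), 3)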

K-MEANS

Now we come to clustering with k-means. First we load the required package (stats ships with base R, so it only needs to be attached).

library(stats)

Let’s try with 3 clusters.

country_clusters <- kmeans(results_z, 3)
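
Note that kmeans() starts from randomly chosen centroids, so cluster sizes and labels can change from run to run. A minimal sketch of a more reproducible call fixes the seed and uses several random starts (country_clusters_repro is just an illustrative name, not an object used later).

# Fix the RNG seed and try 25 random starts; kmeans() keeps the best solution
set.seed(123)                         # arbitrary seed; any fixed value works
country_clusters_repro <- kmeans(results_z, centers = 3, nstart = 25)
country_clusters_repro$size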

We check the sizes of these clusters. In this run we get 45, 85 and 37, so the second cluster is clearly the largest.

country_clusters$size
## [1] 45 85 37

What are centers of these clusters?

country_clusters$centers
##   child_mort    exports     health    imports  inflation life_expec  total_fer
## 1  1.3396971 -0.4356283 -0.1504231 -0.1806889  0.4966408 -1.3262658  1.3856029
## 2 -0.5794842  0.3822335  0.3661154  0.4941641 -0.5528056  0.5618213 -0.5867165
## 3 -0.2981138 -0.3482858 -0.6581288 -0.9154851  0.6659361  0.3223554 -0.3373304

We attach the cluster assignment as a new column of results.

results$cluster <- (country_clusters$cluster)
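
Since the goal is ultimately to name the countries in need of aid, it helps to keep the cluster label next to the country name. A minimal sketch (country_assignments is just an illustrative name):

# Pair each country with its k-means cluster label
country_assignments <- data.frame(country = country$country,
                                  cluster = results$cluster)
head(country_assignments)
table(country_assignments$cluster)    # how many countries fall into each cluster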

Now let’s look at the general characteristics of the clusters for all columns. Firstly, we see that the mean values of child mortality are not close: the first cluster differs a lot from the other two.

aggregate(data = results, child_mort ~ cluster, mean)
##   cluster child_mort
## 1       1   80.10667
## 2       2   14.16471
## 3       3   23.83243

The mean values of exports are similar for clusters 1 and 3, while cluster 2 exports noticeably more.

aggregate(data = results, exports ~ cluster, mean)
##   cluster  exports
## 1       1 29.52467
## 2       2 45.15882
## 3       3 31.19430

For health spending, the means of the three clusters are reasonably close.

aggregate(data = results, health ~ cluster, mean)
##   cluster   health
## 1       1 6.316667
## 2       2 7.630118
## 3       3 5.025676

For imports the differences are larger: cluster 2 imports the most and cluster 3 the least.

aggregate(data = results, imports ~ cluster, mean)
##   cluster  imports
## 1       1 41.26889
## 2       2 53.62000
## 3       3 27.82070

For inflation, clusters 1 and 3 have similar means, while cluster 2 is much lower.

aggregate(data = results, inflation ~ cluster, mean)
##   cluster inflation
## 1       1  9.479667
## 2       2  3.189894
## 3       3 10.494324

For life expectancy, clusters 2 and 3 are alike, while cluster 1 is clearly lower.

aggregate(data = results, life_expec ~ cluster, mean)
##   cluster life_expec
## 1       1   60.56222
## 2       2   75.58353
## 3       3   73.67838

Finally, total fertility is much higher in cluster 1 than in the other two clusters.

aggregate(data = results, total_fer ~ cluster, mean)
##   cluster total_fer
## 1       1  5.045556
## 2       2  2.059765
## 3       3  2.437297
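
Instead of aggregating one variable at a time, the same comparison can be made for all variables in a single call; a minimal sketch using the formula interface of aggregate():

# Mean of every remaining variable by cluster
aggregate(. ~ cluster, data = results, FUN = mean)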

Let’s get more information about the output.

attributes(country_clusters)
## $names
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"      
## 
## $class
## [1] "kmeans"

We can explore more options with other packages. We load the required packages.

library(factoextra)
## Loading required package: ggplot2
## Warning in register(): Can't find generic `scale_type` in package ggplot2 to
## register S3 method.
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(flexclust)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: modeltools
## Loading required package: stats4
library(fpc)
library(clustertend)
## Package `clustertend` is deprecated.  Use package `hopkins` instead.
library(cluster)
library(ClusterR)
## Loading required package: gtools

Let’s narrow the data down to the inflation and life_expec variables so that the plots are easier to read.

results_short <- results[5:6]

results_short_z <- as.data.frame(lapply(results_short, scale))

country_clusters_km3 <- kmeans(results_short_z, 3)

Now we can create advanced plots. We see 3 clusters in different colors.

fviz_cluster(list(data=results_short_z, cluster=country_clusters_km3$cluster), 
             ellipse.type="norm", geom="point", stand=FALSE, palette="jco", ggtheme=theme_classic()) 

We also look at the statistics by cluster group. The stripes plot below shows the distance of each observation from its cluster centroid.

d1<-cclust(results_short_z, 3, dist="euclidean") 
## Found more than one class "kcca" in cache; using the first, from namespace 'flexclust'
## Also defined by 'kernlab'
## Found more than one class "kcca" in cache; using the first, from namespace 'flexclust'
## Also defined by 'kernlab'
stripes(d1)

Now let’s look at the silhouette plot for 3 clusters.

sil<-silhouette(country_clusters_km3$cluster, dist(results_short_z))

fviz_silhouette(sil)
##   cluster size ave.sil.width
## 1       1   41          0.47
## 2       2   92          0.56
## 3       3   34          0.41

Let’s check the results for 2 and 4 clusters.

country_clusters_km2 <- kmeans(results_short_z, 2)

Advanced plot for 2 clusters.

fviz_cluster(list(data=results_short_z, cluster=country_clusters_km2$cluster), 
             ellipse.type="norm", geom="point", stand=FALSE, palette="jco", ggtheme=theme_classic())

And statistics for 2 clusters. The results are not encouraging, because the second cluster is very large.

d2<-cclust(results_short_z, 2, dist="euclidean") 

stripes(d2) 

Silhouette plot for 2 clusters.

sil2<-silhouette(country_clusters_km2$cluster, dist(results_short_z))

fviz_silhouette(sil2)
##   cluster size ave.sil.width
## 1       1   90          0.61
## 2       2   77          0.25

Finally, let’s check for 4 clusters.

country_clusters_km4 <- kmeans(results_short_z, 4)

Advanced plot for 4 clusters.

fviz_cluster(list(data=results_short_z, cluster=country_clusters_km4$cluster), 
             ellipse.type="norm", geom="point", stand=FALSE, palette="jco", ggtheme=theme_classic())

And statistics for 4 clusters. Again the results are not encouraging; we can see a big difference between the second and third clusters.

d3<-cclust(results_short_z, 4, dist="euclidean") 

stripes(d3) 

Silhouette plot for 4 clusters. Four clusters could be considered as well, but the three-cluster solution looks tidier, so four clusters is not the better option.

sil3<-silhouette(country_clusters_km4$cluster, dist(results_short_z))

fviz_silhouette(sil3)
##   cluster size ave.sil.width
## 1       1   28          0.35
## 2       2   42          0.40
## 3       3   59          0.45
## 4       4   38          0.37

We can also look at group-wise boxplots for the 3-, 2- and 4-cluster solutions.

groupBWplot(results_short_z, country_clusters_km3$cluster, alpha=0.05)

groupBWplot(results_short_z, country_clusters_km2$cluster, alpha=0.05)

groupBWplot(results_short_z, country_clusters_km4$cluster, alpha=0.05)

Let’s now look at an alternative command for k-means: cclust() from the flexclust package.

k1<- cclust(results_short_z, k=3, simple=FALSE, save.data=TRUE) 

k1
## kcca object of family 'kmeans' 
## 
## call:
## cclust(x = results_short_z, k = 3, simple = FALSE, save.data = TRUE)
## 
## cluster sizes:
## 
##  1  2  3 
## 93 33 41
plot(k1)

summary(k1)
## kcca object of family 'kmeans' 
## 
## call:
## cclust(x = results_short_z, k = 3, simple = FALSE, save.data = TRUE)
## 
## cluster info:
##   size   av_dist max_dist separation
## 1   93 0.6350463 1.348737   1.095613
## 2   33 0.8400787 1.751312   1.191154
## 3   41 0.7024138 1.715194   1.015697
## 
## convergence after 8 iterations
## sum of within cluster distances: 115.5809
attributes(k1) # checking the slots of output
## $second
##   [1] 2 3 1 3 3 1 2 3 3 3 3 2 3 3 3 3 3 1 3 2 3 2 2 1 3 2 2 1 1 3 3 3 2 2 2 3 1
##  [38] 3 3 2 2 3 3 3 3 3 2 1 3 1 2 3 1 3 3 3 1 2 3 3 3 3 3 3 1 1 3 3 2 2 3 1 3 3
##  [75] 3 3 2 3 2 3 1 1 2 3 2 3 3 3 1 1 3 3 3 2 2 2 3 1 3 3 3 1 3 1 3 3 2 1 1 3 3
## [112] 3 1 1 2 1 2 3 3 2 3 3 3 2 3 3 1 3 1 1 3 3 3 3 3 3 2 2 3 3 1 3 3 3 3 3 3 2
## [149] 3 3 1 3 3 2 3 2 3 1 3 3 3 3 1 3 1 3 2
## 
## $xrange
##      inflation life_expec
## [1,] -1.787475  -2.402485
## [2,]  2.852614   1.468888
## 
## $xcent
##    inflation   life_expec 
## 4.584033e-17 5.950855e-16 
## 
## $totaldist
## [1] 212.4864
## 
## $clsim
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.1261344 0.3497999
## [2,] 0.2162703 1.0000000 0.3428963
## [3,] 0.2924045 0.2625349 1.0000000
## 
## $centers
##        inflation life_expec
## [1,] -0.54860588  0.6806653
## [2,]  1.64371195 -0.2797813
## [3,] -0.07858896 -1.3187582
## 
## $family
## An object of class "kccaFamily"
## Slot "name":
## [1] "kmeans"
## 
## Slot "dist":
## function (x, centers) 
## {
##     if (ncol(x) != ncol(centers)) 
##         stop(sQuote("x"), " and ", sQuote("centers"), " must have the same number of columns")
##     z <- matrix(0, nrow = nrow(x), ncol = nrow(centers))
##     for (k in 1:nrow(centers)) {
##         z[, k] <- sqrt(colSums((t(x) - centers[k, ])^2))
##     }
##     z
## }
## <bytecode: 0x0000000021d18d70>
## <environment: namespace:flexclust>
## 
## Slot "cent":
## function (x) 
## colMeans(x)
## <bytecode: 0x0000000021cb6f30>
## <environment: 0x0000000022fbec10>
## 
## Slot "allcent":
## function (x, cluster, k = max(cluster, na.rm = TRUE)) 
## {
##     centers <- matrix(NA, nrow = k, ncol = ncol(x))
##     for (n in 1:k) {
##         if (sum(cluster == n, na.rm = TRUE) > 0) {
##             centers[n, ] <- z@cent(x[cluster == n, , drop = FALSE])
##         }
##     }
##     centers
## }
## <environment: 0x0000000022fbec10>
## 
## Slot "wcent":
## function (x, weights) 
## colMeans(x * normWeights(weights))
## <bytecode: 0x0000000021cb6bb0>
## <environment: 0x0000000022fbec10>
## 
## Slot "weighted":
## [1] TRUE
## 
## Slot "cluster":
## function (x, centers, n = 1, distmat = NULL) 
## {
##     if (is.null(distmat)) 
##         distmat <- z@dist(x, centers)
##     if (n == 1) {
##         return(max.col(-distmat))
##     }
##     else {
##         r <- t(matrix(apply(distmat, 1, rank, ties.method = "random"), 
##             nrow = ncol(distmat)))
##         z <- list()
##         for (k in 1:n) z[[k]] <- apply(r, 1, function(x) which(x == 
##             k))
##     }
##     return(z)
## }
## <environment: 0x0000000022fbec10>
## 
## Slot "preproc":
## function (x) 
## x
## <bytecode: 0x000000001d784aa8>
## <environment: namespace:flexclust>
## 
## Slot "groupFun":
## function (cluster, group, distmat) 
## {
##     G <- levels(group)
##     x <- matrix(0, ncol = ncol(distmat), nrow = length(G))
##     for (n in 1:length(G)) {
##         x[n, ] <- colSums(distmat[group == G[n], , drop = FALSE])
##     }
##     m <- max.col(-x)
##     names(m) <- G
##     z <- m[group]
##     names(z) <- NULL
##     if (is.list(cluster)) {
##         x[cbind(1:nrow(x), m)] <- Inf
##         m <- max.col(-x)
##         names(m) <- G
##         z1 <- m[group]
##         names(z1) <- NULL
##         z <- list(z, z1)
##     }
##     z
## }
## <bytecode: 0x0000000021d30fb0>
## <environment: namespace:flexclust>
## 
## 
## $cldist
##              [,1]     [,2]
##   [1,] 0.79513922 1.968337
##   [2,] 0.21466166 1.987361
##   [3,] 0.95773956 2.149850
##   [4,] 1.49584190 2.731771
##   [5,] 0.29812083 2.173037
##   [6,] 1.15326305 2.952148
##   [7,] 0.86157655 1.535943
##   [8,] 0.76841255 2.807360
##   [9,] 0.63390107 2.642671
##  [10,] 0.42705813 1.677915
##  [11,] 0.69287360 1.973042
##  [12,] 0.70805837 1.735360
##  [13,] 1.01111263 1.242866
##  [14,] 0.48335161 2.234299
##  [15,] 0.28273584 1.949270
##  [16,] 0.48985679 2.532308
##  [17,] 0.73177442 1.581631
##  [18,] 0.87145943 1.891729
##  [19,] 0.72407450 1.442738
##  [20,] 1.11622926 1.308979
##  [21,] 0.30475208 2.175400
##  [22,] 0.65433293 1.932369
##  [23,] 0.91487459 1.484215
##  [24,] 1.03382181 2.251101
##  [25,] 0.48224626 1.860082
##  [26,] 0.36592626 2.107969
##  [27,] 1.10841733 1.560508
##  [28,] 0.84272724 1.310962
##  [29,] 0.80459305 2.426762
##  [30,] 0.60243910 2.652112
##  [31,] 0.67891596 1.754845
##  [32,] 0.47559997 1.705953
##  [33,] 0.52153473 2.277867
##  [34,] 1.01154785 1.780404
##  [35,] 0.66712669 1.727122
##  [36,] 0.10882263 2.016046
##  [37,] 0.75515464 1.340466
##  [38,] 1.61214817 2.495075
##  [39,] 1.28992302 2.447498
##  [40,] 0.74167385 2.181521
##  [41,] 0.55370548 2.418726
##  [42,] 0.40048485 2.153921
##  [43,] 0.46899604 2.514344
##  [44,] 0.78457785 2.459830
##  [45,] 0.37343351 2.418799
##  [46,] 0.44339831 1.759731
##  [47,] 0.71025968 1.778105
##  [48,] 1.06305982 1.376192
##  [49,] 0.31954011 1.785619
##  [50,] 0.11268631 1.997710
##  [51,] 0.93883692 1.202322
##  [52,] 0.25493889 2.061323
##  [53,] 0.66041832 1.421544
##  [54,] 0.64706866 2.613644
##  [55,] 0.71087685 2.740772
##  [56,] 0.75375341 1.786353
##  [57,] 0.67783523 1.398051
##  [58,] 1.00599730 1.391936
##  [59,] 0.60789217 2.601615
##  [60,] 0.84162277 1.774376
##  [61,] 0.64547253 2.641897
##  [62,] 0.80014869 1.630814
##  [63,] 0.73153320 1.350394
##  [64,] 1.36919911 1.711853
##  [65,] 0.81208038 2.630970
##  [66,] 0.61520912 1.448398
##  [67,] 0.56993108 1.571419
##  [68,] 0.29470514 1.850520
##  [69,] 0.78386186 2.452435
##  [70,] 0.85644947 1.275980
##  [71,] 0.21712412 1.936804
##  [72,] 0.70947478 2.131788
##  [73,] 0.21607278 1.947486
##  [74,] 1.17878937 2.926282
##  [75,] 0.65801572 2.706759
##  [76,] 0.80989899 2.815647
##  [77,] 1.12412244 1.313799
##  [78,] 1.16175133 3.085859
##  [79,] 0.87492210 1.582108
##  [80,] 0.52838858 2.450582
##  [81,] 0.71243390 1.735836
##  [82,] 0.75290176 2.009614
##  [83,] 1.34873745 1.452472
##  [84,] 1.06137405 1.191154
##  [85,] 0.66253531 1.354245
##  [86,] 0.79827139 1.940086
##  [87,] 0.64480204 2.597192
##  [88,] 0.45836926 1.599542
##  [89,] 0.09641503 2.012525
##  [90,] 0.97519629 1.833626
##  [91,] 0.44102633 1.694348
##  [92,] 0.60347570 2.630079
##  [93,] 0.37348281 1.808655
##  [94,] 0.46070303 1.620639
##  [95,] 1.38525049 2.107586
##  [96,] 0.72303770 1.671858
##  [97,] 0.18116400 2.234519
##  [98,] 0.31111361 2.149103
##  [99,] 0.48496547 2.500551
## [100,] 0.43341986 2.348815
## [101,] 0.52508496 1.802389
## [102,] 0.70667073 1.402252
## [103,] 0.88264173 1.420217
## [104,] 0.70926416 1.348058
## [105,] 0.26996058 2.116589
## [106,] 0.53340736 1.825331
## [107,] 0.81491922 2.320502
## [108,] 0.79454470 1.379392
## [109,] 0.48447673 2.254305
## [110,] 0.22198019 1.794574
## [111,] 0.65638008 2.667801
## [112,] 0.55608144 2.577606
## [113,] 0.62448832 2.231193
## [114,] 0.10822076 2.047172
## [115,] 0.72390947 2.308991
## [116,] 0.91522016 2.067126
## [117,] 1.00271377 1.015697
## [118,] 0.19078201 2.234250
## [119,] 0.57001342 1.694147
## [120,] 0.45053667 2.106532
## [121,] 0.96109286 1.095613
## [122,] 0.26105414 2.101320
## [123,] 0.59431037 2.572804
## [124,] 0.73079391 2.055667
## [125,] 0.35945371 1.696014
## [126,] 0.36161957 1.737757
## [127,] 0.75896330 1.502802
## [128,] 0.67955868 1.545042
## [129,] 0.79346588 2.340285
## [130,] 0.81681009 1.591208
## [131,] 0.50056079 1.769707
## [132,] 1.29980101 2.345192
## [133,] 1.75131245 1.992856
## [134,] 0.94746546 2.954573
## [135,] 0.47351799 2.086576
## [136,] 0.79436223 2.646128
## [137,] 0.18763955 1.830938
## [138,] 0.79639947 2.479623
## [139,] 0.44894298 2.494729
## [140,] 0.84592471 2.848330
## [141,] 1.27928092 3.278679
## [142,] 0.65238999 1.405210
## [143,] 0.63169056 2.373641
## [144,] 1.02717345 1.231981
## [145,] 0.72670781 2.755741
## [146,] 0.86133887 2.874972
## [147,] 0.64932855 1.561439
## [148,] 0.56203370 1.690417
## [149,] 0.14458961 2.034583
## [150,] 0.77288549 1.321377
## [151,] 0.84484201 2.266579
## [152,] 0.83683243 1.230455
## [153,] 0.11118333 2.079075
## [154,] 0.66710896 1.949594
## [155,] 1.09507894 1.105688
## [156,] 0.90092379 1.797323
## [157,] 0.52833595 1.738335
## [158,] 1.15285599 1.549193
## [159,] 0.54772512 2.583029
## [160,] 0.43042819 2.409977
## [161,] 0.28326408 1.992022
## [162,] 0.02666452 2.026610
## [163,] 0.64300166 1.703413
## [164,] 0.38964894 1.860598
## [165,] 0.88556156 1.543833
## [166,] 1.22142339 3.056036
## [167,] 1.71519394 2.158751
## 
## $k
## [1] 3
## 
## $cluster
##   [1] 3 1 2 2 1 2 1 1 1 2 1 1 1 1 2 1 1 3 1 1 1 3 1 2 1 3 3 3 3 1 1 1 3 1 1 1 3
##  [38] 2 2 1 3 1 1 1 1 1 1 2 1 3 3 1 3 1 1 2 3 1 1 2 1 1 1 2 3 3 1 1 1 3 2 2 2 1
##  [75] 1 1 1 1 1 2 3 3 1 2 3 1 1 1 3 2 1 1 1 3 3 1 1 3 1 2 1 3 2 3 1 1 3 3 3 2 1
## [112] 1 3 3 1 2 3 1 1 1 1 1 1 1 1 2 3 1 2 3 1 1 2 1 1 1 3 3 1 1 2 1 2 1 1 1 2 3
## [149] 1 1 3 1 1 1 1 3 2 2 1 1 1 2 3 1 2 2 3
## 
## $iter
## [1] 8
## 
## $converged
## [1] TRUE
## 
## $clusinfo
##   size   av_dist max_dist separation
## 1   93 0.6350463 1.348737   1.095613
## 2   33 0.8400787 1.751312   1.191154
## 3   41 0.7024138 1.715194   1.015697
## 
## $index
## numeric(0)
## 
## $call
## cclust(x = results_short_z, k = 3, simple = FALSE, save.data = TRUE)
## 
## $control
## An object of class "cclustControl"
## Slot "pol.rate":
## [1] 1 0
## 
## Slot "exp.rate":
## [1] 1e-01 1e-04
## 
## Slot "ng.rate":
## [1] 5e-01 5e-03 1e+01 1e-02
## 
## Slot "method":
## [1] "polynomial"
## 
## Slot "iter.max":
## [1] 200
## 
## Slot "tolerance":
## [1] 1e-06
## 
## Slot "verbose":
## [1] 0
## 
## Slot "classify":
## [1] "auto"
## 
## Slot "initcent":
## [1] "randomcent"
## 
## Slot "gamma":
## [1] 1
## 
## Slot "simann":
## [1]  0.30  0.95 10.00
## 
## Slot "ntry":
## [1] 5
## 
## Slot "min.size":
## [1] 2
## 
## Slot "subsampling":
## [1] 1
## 
## 
## $data
## 
## A ModelEnv with 
## 
##   design matrix column(s):  inflation life_expec 
##   number of observations: 167 
## 
## 
## $class
## [1] "kcca"
## attr(,"package")
## [1] "flexclust"

Let’s now repeat this for 4 clusters.

k2<- cclust(results_short_z, k=4, simple=FALSE, save.data=TRUE) 

k2
## kcca object of family 'kmeans' 
## 
## call:
## cclust(x = results_short_z, k = 4, simple = FALSE, save.data = TRUE)
## 
## cluster sizes:
## 
##  1  2  3  4 
## 15 81 41 30
plot(k2)

summary(k2)
## kcca object of family 'kmeans' 
## 
## call:
## cclust(x = results_short_z, k = 4, simple = FALSE, save.data = TRUE)
## 
## cluster info:
##   size   av_dist  max_dist separation
## 1   15 0.7687061 1.4039138  0.9181396
## 2   81 0.5654505 1.2073249  0.8442075
## 3   41 0.6999541 1.7473978  0.9457724
## 4   30 0.6442661 0.9814556  0.8177011
## 
## convergence after 6 iterations
## sum of within cluster distances: 105.3582
attributes(k2) # checking the slots of output
## $second
##   [1] 1 4 1 4 4 4 2 4 4 1 3 4 2 4 1 4 3 2 4 2 4 1 2 1 4 4 1 2 2 4 3 4 4 2 4 4 2
##  [38] 4 4 4 4 4 4 4 4 4 4 3 4 4 1 4 2 4 4 4 2 2 4 4 4 3 4 3 4 4 4 4 4 4 1 1 4 4
##  [75] 4 4 2 4 2 4 2 2 2 3 4 3 4 4 4 1 4 4 4 4 1 2 4 4 4 4 3 2 3 4 4 3 1 4 4 1 4
## [112] 4 2 4 4 1 3 4 4 4 3 4 4 4 4 1 2 3 1 2 4 3 3 4 4 4 4 4 4 4 4 4 4 2 4 4 1 4
## [149] 4 4 2 3 4 4 2 1 1 2 4 4 4 4 2 4 1 4 1
## 
## $xrange
##      inflation life_expec
## [1,] -1.787475  -2.402485
## [2,]  2.852614   1.468888
## 
## $xcent
##    inflation   life_expec 
## 4.584033e-17 5.950855e-16 
## 
## $totaldist
## [1] 212.4864
## 
## $clsim
##           [,1]      [,2]       [,3]      [,4]
## [1,] 1.0000000 0.0000000 0.10048256 0.5084158
## [2,] 0.0000000 1.0000000 0.08552483 0.4236559
## [3,] 0.1432680 0.2229473 1.00000000 0.1925463
## [4,] 0.3083262 0.2932215 0.08338737 1.0000000
## 
## $centers
##      inflation life_expec
## [1,]  2.127374 -0.7877371
## [2,] -0.666532  0.7358284
## [3,] -0.113546 -1.3107874
## [4,]  0.891129  0.1985412
## 
## $family
## An object of class "kccaFamily"
## Slot "name":
## [1] "kmeans"
## 
## Slot "dist":
## function (x, centers) 
## {
##     if (ncol(x) != ncol(centers)) 
##         stop(sQuote("x"), " and ", sQuote("centers"), " must have the same number of columns")
##     z <- matrix(0, nrow = nrow(x), ncol = nrow(centers))
##     for (k in 1:nrow(centers)) {
##         z[, k] <- sqrt(colSums((t(x) - centers[k, ])^2))
##     }
##     z
## }
## <bytecode: 0x0000000021d18d70>
## <environment: namespace:flexclust>
## 
## Slot "cent":
## function (x) 
## colMeans(x)
## <bytecode: 0x0000000021cb6f30>
## <environment: 0x000000001f6bb1b0>
## 
## Slot "allcent":
## function (x, cluster, k = max(cluster, na.rm = TRUE)) 
## {
##     centers <- matrix(NA, nrow = k, ncol = ncol(x))
##     for (n in 1:k) {
##         if (sum(cluster == n, na.rm = TRUE) > 0) {
##             centers[n, ] <- z@cent(x[cluster == n, , drop = FALSE])
##         }
##     }
##     centers
## }
## <environment: 0x000000001f6bb1b0>
## 
## Slot "wcent":
## function (x, weights) 
## colMeans(x * normWeights(weights))
## <bytecode: 0x0000000021cb6bb0>
## <environment: 0x000000001f6bb1b0>
## 
## Slot "weighted":
## [1] TRUE
## 
## Slot "cluster":
## function (x, centers, n = 1, distmat = NULL) 
## {
##     if (is.null(distmat)) 
##         distmat <- z@dist(x, centers)
##     if (n == 1) {
##         return(max.col(-distmat))
##     }
##     else {
##         r <- t(matrix(apply(distmat, 1, rank, ties.method = "random"), 
##             nrow = ncol(distmat)))
##         z <- list()
##         for (k in 1:n) z[[k]] <- apply(r, 1, function(x) which(x == 
##             k))
##     }
##     return(z)
## }
## <environment: 0x000000001f6bb1b0>
## 
## Slot "preproc":
## function (x) 
## x
## <bytecode: 0x000000001d784aa8>
## <environment: namespace:flexclust>
## 
## Slot "groupFun":
## function (cluster, group, distmat) 
## {
##     G <- levels(group)
##     x <- matrix(0, ncol = ncol(distmat), nrow = length(G))
##     for (n in 1:length(G)) {
##         x[n, ] <- colSums(distmat[group == G[n], , drop = FALSE])
##     }
##     m <- max.col(-x)
##     names(m) <- G
##     z <- m[group]
##     names(z) <- NULL
##     if (is.list(cluster)) {
##         x[cbind(1:nrow(x), m)] <- Inf
##         m <- max.col(-x)
##         names(m) <- G
##         z1 <- m[group]
##         names(z1) <- NULL
##         z <- list(z, z1)
##     }
##     z
## }
## <bytecode: 0x0000000021d30fb0>
## <environment: namespace:flexclust>
## 
## 
## $cldist
##              [,1]      [,2]
##   [1,] 0.82592156 1.9652293
##   [2,] 0.34113973 1.3080799
##   [3,] 0.85626949 1.5563796
##   [4,] 0.79473910 2.3680465
##   [5,] 0.17948784 1.8110237
##   [6,] 1.40391380 1.5606325
##   [7,] 0.68400924 0.9916083
##   [8,] 0.67131882 2.1321708
##   [9,] 0.52110337 2.0769234
##  [10,] 0.55723102 1.0553248
##  [11,] 0.62680441 1.9475172
##  [12,] 0.83180017 0.8442075
##  [13,] 0.83612149 1.1308162
##  [14,] 0.36649780 1.9874734
##  [15,] 0.61500149 0.9835938
##  [16,] 0.39527057 1.8992994
##  [17,] 0.73614658 1.5569774
##  [18,] 0.83565854 1.9256554
##  [19,] 0.84371275 0.9795809
##  [20,] 0.52937616 1.2450946
##  [21,] 0.18611718 1.8174219
##  [22,] 0.68569845 1.9800695
##  [23,] 0.60344798 1.0443529
##  [24,] 0.98145558 1.5980043
##  [25,] 0.45070002 1.7973749
##  [26,] 0.38693197 2.0403322
##  [27,] 1.14404793 1.4672777
##  [28,] 0.81641181 1.3698368
##  [29,] 0.77924796 2.4741504
##  [30,] 0.54785290 1.8472242
##  [31,] 0.65353069 1.7298816
##  [32,] 0.49322909 1.6415955
##  [33,] 0.53452663 2.2295660
##  [34,] 0.93809513 1.1093241
##  [35,] 0.79708296 0.8526058
##  [36,] 0.23656003 1.4112519
##  [37,] 0.73194752 1.4095989
##  [38,] 0.95881224 2.4248588
##  [39,] 0.60881257 2.1379891
##  [40,] 0.80335080 1.3086947
##  [41,] 0.55591871 2.3249594
##  [42,] 0.29377346 1.8942308
##  [43,] 0.37780803 1.8742355
##  [44,] 0.66049624 2.2956899
##  [45,] 0.33970119 1.6740500
##  [46,] 0.57258842 1.0950518
##  [47,] 0.82854516 0.8866991
##  [48,] 0.40084533 1.4252087
##  [49,] 0.36125244 1.5441716
##  [50,] 0.07705008 1.8322018
##  [51,] 0.97236349 1.3368060
##  [52,] 0.17672897 1.7363275
##  [53,] 0.63799900 1.4944539
##  [54,] 0.52422134 2.1261860
##  [55,] 0.60790519 2.1074080
##  [56,] 0.50582655 1.4645715
##  [57,] 0.65632824 1.4721191
##  [58,] 0.54976627 1.1360596
##  [59,] 0.49017601 2.0707688
##  [60,] 0.55376839 1.5392613
##  [61,] 0.52902675 2.1005672
##  [62,] 0.78871441 1.6045804
##  [63,] 0.83687094 1.1321874
##  [64,] 1.00867291 1.7476911
##  [65,] 0.79693376 2.6092950
##  [66,] 0.60536731 1.3631378
##  [67,] 0.69084833 1.0680605
##  [68,] 0.31162321 1.6035683
##  [69,] 0.80265623 1.5809629
##  [70,] 0.87063314 0.9457724
##  [71,] 0.67511456 0.9157148
##  [72,] 0.71382667 1.3361507
##  [73,] 0.53243555 1.0518848
##  [74,] 1.04860961 2.6936294
##  [75,] 0.57054434 2.0057147
##  [76,] 0.69800629 2.2314389
##  [77,] 0.42283726 1.2511945
##  [78,] 1.03844437 2.6215382
##  [79,] 0.69061862 0.9988820
##  [80,] 0.44852465 1.3867123
##  [81,] 0.67712513 1.7821862
##  [82,] 0.71788316 2.0514144
##  [83,] 0.70044536 1.4584555
##  [84,] 0.61030400 1.2044227
##  [85,] 0.68616673 1.2017386
##  [86,] 0.73706572 1.9132114
##  [87,] 0.51981395 2.1325544
##  [88,] 0.55802774 1.2847585
##  [89,] 0.06056366 1.8346162
##  [90,] 0.58127344 1.6467052
##  [91,] 0.47408573 1.5803488
##  [92,] 0.57524494 1.7473289
##  [93,] 0.38114165 1.6440185
##  [94,] 0.49532579 1.5793782
##  [95,] 1.41635445 1.8985583
##  [96,] 0.79623957 0.8529007
##  [97,] 0.13256257 1.6325563
##  [98,] 0.28449434 2.0748798
##  [99,] 0.47333558 1.6437989
## [100,] 0.42559653 1.3057550
## [101,] 0.50398808 1.7796362
## [102,] 0.68195124 1.4699011
## [103,] 0.39619338 1.4350017
## [104,] 0.69690327 1.3511989
## [105,] 0.16751658 1.7715698
## [106,] 0.50533777 1.8023136
## [107,] 0.83450737 2.3336540
## [108,] 0.79496227 1.0914190
## [109,] 0.45958424 2.2468758
## [110,] 0.77464565 0.8177011
## [111,] 0.54478959 2.0925740
## [112,] 0.53462700 1.7020205
## [113,] 0.59523364 2.2836055
## [114,] 0.07584030 1.8731122
## [115,] 0.76591042 1.4343077
## [116,] 0.75900587 1.5402278
## [117,] 0.94255041 1.0267092
## [118,] 0.10547470 1.6721317
## [119,] 0.69934015 0.9745575
## [120,] 0.54690171 1.2148094
## [121,] 1.04144267 1.0787621
## [122,] 0.16456062 1.7586312
## [123,] 0.47284161 2.0713874
## [124,] 0.81115166 1.1792377
## [125,] 0.44472574 1.3929468
## [126,] 0.58934420 1.0051423
## [127,] 0.72761305 1.5546627
## [128,] 0.69975376 1.5219539
## [129,] 0.94345789 1.3335411
## [130,] 0.78288630 1.6336828
## [131,] 0.63066998 1.0265596
## [132,] 1.20732494 2.3143315
## [133,] 1.28420832 2.0283741
## [134,] 0.83710001 2.3489645
## [135,] 0.38471505 1.9277990
## [136,] 0.66438123 2.3054733
## [137,] 0.20834728 1.6170423
## [138,] 0.80741194 2.4870282
## [139,] 0.40842697 1.7222331
## [140,] 0.73337208 2.2670689
## [141,] 1.33869005 1.8405487
## [142,] 0.74774216 1.2429752
## [143,] 0.19161374 1.5232979
## [144,] 0.83119755 1.1468346
## [145,] 0.62337674 2.1223555
## [146,] 0.75244181 2.2642546
## [147,] 0.40392544 1.2753914
## [148,] 0.59779082 1.7382067
## [149,] 0.26628626 1.3853661
## [150,] 0.88044054 1.0953127
## [151,] 0.81381484 2.3068349
## [152,] 0.90964896 1.2121377
## [153,] 0.21902572 1.4394335
## [154,] 0.76690923 1.0628277
## [155,] 1.07973541 1.1402602
## [156,] 0.93482246 1.7628214
## [157,] 0.38793589 1.2004299
## [158,] 0.49083425 1.6681514
## [159,] 0.44712713 1.9628238
## [160,] 0.30594591 1.9267008
## [161,] 0.40703350 1.2472240
## [162,] 0.67672831 0.9181396
## [163,] 0.60841539 1.7557793
## [164,] 0.51963475 1.1293116
## [165,] 0.06661262 1.5813610
## [166,] 0.79824828 2.0672520
## [167,] 1.74739782 1.8373059
## 
## $k
## [1] 4
## 
## $cluster
##   [1] 3 2 4 1 2 1 4 2 2 4 2 2 4 2 4 2 2 3 2 4 2 3 4 4 2 3 3 3 3 2 2 2 3 4 2 2 3
##  [38] 1 1 2 3 2 2 2 2 2 2 4 2 3 3 2 3 2 2 1 3 4 2 1 2 2 2 1 3 3 2 2 2 3 4 4 1 2
##  [75] 2 2 4 2 4 1 3 3 4 4 3 2 2 2 3 4 2 2 2 3 3 4 2 3 2 1 2 3 4 3 2 2 3 3 3 4 2
## [112] 2 3 3 2 4 4 2 2 2 2 2 2 2 2 4 3 2 4 3 2 2 1 2 2 2 3 3 2 2 1 2 1 4 2 2 4 3
## [149] 2 2 3 2 2 2 3 3 4 4 2 2 2 1 3 2 4 1 3
## 
## $iter
## [1] 6
## 
## $converged
## [1] TRUE
## 
## $clusinfo
##   size   av_dist  max_dist separation
## 1   15 0.7687061 1.4039138  0.9181396
## 2   81 0.5654505 1.2073249  0.8442075
## 3   41 0.6999541 1.7473978  0.9457724
## 4   30 0.6442661 0.9814556  0.8177011
## 
## $index
## numeric(0)
## 
## $call
## cclust(x = results_short_z, k = 4, simple = FALSE, save.data = TRUE)
## 
## $control
## An object of class "cclustControl"
## Slot "pol.rate":
## [1] 1 0
## 
## Slot "exp.rate":
## [1] 1e-01 1e-04
## 
## Slot "ng.rate":
## [1] 5e-01 5e-03 1e+01 1e-02
## 
## Slot "method":
## [1] "polynomial"
## 
## Slot "iter.max":
## [1] 200
## 
## Slot "tolerance":
## [1] 1e-06
## 
## Slot "verbose":
## [1] 0
## 
## Slot "classify":
## [1] "auto"
## 
## Slot "initcent":
## [1] "randomcent"
## 
## Slot "gamma":
## [1] 1
## 
## Slot "simann":
## [1]  0.30  0.95 10.00
## 
## Slot "ntry":
## [1] 5
## 
## Slot "min.size":
## [1] 2
## 
## Slot "subsampling":
## [1] 1
## 
## 
## $data
## 
## A ModelEnv with 
## 
##   design matrix column(s):  inflation life_expec 
##   number of observations: 167 
## 
## 
## $class
## [1] "kcca"
## attr(,"package")
## [1] "flexclust"

We can already say that the best option is 3 clusters. However, let’s verify this assumption, first with the elbow method.

We can see from the graph that our assumption was correct: the elbow appears at 3 clusters, at roughly 0.3 on the variance criterion.

opt<-Optimal_Clusters_KMeans(results_short_z, max_clusters=10, plot_clusters = TRUE)

We can also check the optimal number of clusters with the silhouette method. Here, too, 3 clusters appear to be the best option.

opt<-Optimal_Clusters_KMeans(results_short_z, max_clusters=10, plot_clusters=TRUE, criterion="silhouette")
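
The factoextra package offers a similar cross-check. A minimal sketch of both criteria, the within-cluster sum of squares for the elbow and the average silhouette width:

# Elbow (total within-cluster sum of squares) and average silhouette width,
# each computed for up to 10 clusters (the default k.max)
fviz_nbclust(results_short_z, kmeans, method = "wss")
fviz_nbclust(results_short_z, kmeans, method = "silhouette")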

PARTITIONING AROUND MEDOIDS

Now let’s look at the results of the partitioning around medoids (PAM) method with 3 clusters.

c1<-pam(results_short_z,3)
print(c1)
## Medoids:
##      ID  inflation life_expec
## [1,] 51  0.8504177 -1.1832540
## [2,] 67 -0.1757077  0.2496567
## [3,] 59 -0.9585662  1.1295142
## Clustering vector:
##   [1] 1 2 2 1 3 1 2 3 3 1 3 2 2 3 1 3 2 2 2 2 3 1 2 2 2 1 1 2 1 3 2 2 1 2 2 2 2
##  [38] 1 1 2 1 3 3 3 3 2 2 2 2 1 1 3 2 3 3 1 2 2 3 1 3 2 2 1 1 2 2 2 3 1 1 2 1 3
##  [75] 3 3 2 3 2 1 2 1 2 1 1 3 3 2 1 2 2 3 2 1 1 2 3 1 3 1 2 2 1 2 3 2 1 2 1 1 3
## [112] 3 1 1 3 2 1 3 2 2 2 3 3 2 2 1 2 2 1 2 2 3 1 3 3 3 1 1 3 3 1 2 1 2 3 3 1 1
## [149] 2 2 1 2 2 2 2 1 1 2 3 3 2 1 2 2 2 1 1
## Objective function:
##     build      swap 
## 0.7869725 0.7869725 
## 
## Available components:
##  [1] "medoids"    "id.med"     "clustering" "objective"  "isolation" 
##  [6] "clusinfo"   "silinfo"    "diss"       "call"       "data"
c1$medoids
##       inflation life_expec
## [1,]  0.8504177 -1.1832540
## [2,] -0.1757077  0.2496567
## [3,] -0.9585662  1.1295142
c1$clustering
##   [1] 1 2 2 1 3 1 2 3 3 1 3 2 2 3 1 3 2 2 2 2 3 1 2 2 2 1 1 2 1 3 2 2 1 2 2 2 2
##  [38] 1 1 2 1 3 3 3 3 2 2 2 2 1 1 3 2 3 3 1 2 2 3 1 3 2 2 1 1 2 2 2 3 1 1 2 1 3
##  [75] 3 3 2 3 2 1 2 1 2 1 1 3 3 2 1 2 2 3 2 1 1 2 3 1 3 1 2 2 1 2 3 2 1 2 1 1 3
## [112] 3 1 1 3 2 1 3 2 2 2 3 3 2 2 1 2 2 1 2 2 3 1 3 3 3 1 1 3 3 1 2 1 2 3 3 1 1
## [149] 2 2 1 2 2 2 2 1 1 2 3 3 2 1 2 2 2 1 1
head(c1$clustering)
## [1] 1 2 2 1 3 1
summary(c1)
## Medoids:
##      ID  inflation life_expec
## [1,] 51  0.8504177 -1.1832540
## [2,] 67 -0.1757077  0.2496567
## [3,] 59 -0.9585662  1.1295142
## Clustering vector:
##   [1] 1 2 2 1 3 1 2 3 3 1 3 2 2 3 1 3 2 2 2 2 3 1 2 2 2 1 1 2 1 3 2 2 1 2 2 2 2
##  [38] 1 1 2 1 3 3 3 3 2 2 2 2 1 1 3 2 3 3 1 2 2 3 1 3 2 2 1 1 2 2 2 3 1 1 2 1 3
##  [75] 3 3 2 3 2 1 2 1 2 1 1 3 3 2 1 2 2 3 2 1 1 2 3 1 3 1 2 2 1 2 3 2 1 2 1 1 3
## [112] 3 1 1 3 2 1 3 2 2 2 3 3 2 2 1 2 2 1 2 2 3 1 3 3 3 1 1 3 3 1 2 1 2 3 3 1 1
## [149] 2 2 1 2 2 2 2 1 1 2 3 3 2 1 2 2 2 1 1
## Objective function:
##     build      swap 
## 0.7869725 0.7869725 
## 
## Numerical information per cluster:
##      size max_diss   av_diss diameter separation
## [1,]   53 2.457705 1.1737897 4.111791  0.1741897
## [2,]   70 1.943227 0.7441961 3.265159  0.1280190
## [3,]   44 1.181654 0.3890871 1.945823  0.1280190
## 
## Isolated clusters:
##  L-clusters: character(0)
##  L*-clusters: character(0)
## 
## Silhouette plot information:
##     cluster neighbor    sil_width
## 64        1        2  0.429967358
## 27        1        2  0.421025289
## 133       1        2  0.417325377
## 95        1        2  0.407320400
## 167       1        2  0.405158852
## 156       1        2  0.404589538
## 60        1        2  0.401636866
## 56        1        2  0.391837607
## 38        1        2  0.389100793
## 1         1        2  0.385730635
## 39        1        2  0.384513677
## 22        1        2  0.368419543
## 4         1        2  0.362766156
## 51        1        2  0.346470720
## 107       1        2  0.343868576
## 148       1        2  0.342160886
## 143       1        2  0.324583893
## 138       1        2  0.306362886
## 33        1        2  0.293792806
## 73        1        2  0.292164854
## 26        1        2  0.285479486
## 100       1        2  0.280484771
## 94        1        2  0.277861731
## 80        1        2  0.275981143
## 166       1        2  0.271493343
## 41        1        2  0.256569141
## 162       1        2  0.233248615
## 110       1        2  0.216195338
## 65        1        2  0.164689299
## 117       1        2  0.153122081
## 71        1        2  0.145901149
## 85        1        2  0.137468676
## 141       1        2  0.131736702
## 126       1        2  0.131261104
## 137       1        2  0.122517810
## 114       1        2  0.114678578
## 10        1        2  0.113240097
## 98        1        2  0.109426462
## 15        1        2  0.108780771
## 109       1        2  0.107795295
## 89        1        2  0.101106930
## 50        1        2  0.089457971
## 29        1        2  0.076161502
## 6         1        2  0.070154510
## 113       1        2  0.047435571
## 157       1        2 -0.003107643
## 151       1        2 -0.012579141
## 147       1        2 -0.023803305
## 129       1        2 -0.030854425
## 70        1        2 -0.082888392
## 82        1        2 -0.104901310
## 103       1        2 -0.162318827
## 84        1        2 -0.177674136
## 144       2        3  0.447582409
## 13        2        3  0.446686369
## 20        2        3  0.432970255
## 58        2        3  0.414590975
## 150       2        3  0.399834764
## 7         2        3  0.398386317
## 19        2        3  0.395375854
## 63        2        3  0.383900031
## 121       2        3  0.383231513
## 23        2        3  0.372084139
## 37        2        1  0.362229958
## 77        2        3  0.354956773
## 28        2        3  0.353323176
## 96        2        3  0.337834290
## 152       2        3  0.337458033
## 142       2        3  0.332083868
## 102       2        1  0.331203901
## 104       2        1  0.327080756
## 67        2        3  0.326823206
## 57        2        1  0.323201544
## 35        2        3  0.319958415
## 53        2        1  0.311380185
## 119       2        3  0.308261359
## 127       2        1  0.301394683
## 79        2        3  0.295764078
## 155       2        3  0.292698256
## 158       2        3  0.288594156
## 48        2        1  0.283103400
## 130       2        1  0.274158294
## 108       2        1  0.272718251
## 66        2        1  0.255341561
## 131       2        3  0.249807530
## 12        2        3  0.248282450
## 165       2        1  0.247134996
## 83        2        3  0.222857479
## 46        2        3  0.219474968
## 88        2        3  0.216605511
## 90        2        1  0.205332937
## 47        2        3  0.195760410
## 81        2        1  0.194527239
## 163       2        1  0.194117421
## 18        2        1  0.158624483
## 164       2        3  0.133420350
## 116       2        1  0.123597551
## 3         2        1  0.108157304
## 34        2        3  0.104653809
## 24        2        1  0.093518849
## 128       2        3  0.081203788
## 125       2        3  0.079993090
## 72        2        1  0.044997235
## 17        2        3  0.025286632
## 154       2        3  0.018711760
## 91        2        3 -0.020416663
## 62        2        3 -0.037030688
## 32        2        3 -0.057677717
## 161       2        3 -0.068378437
## 124       2        3 -0.102322781
## 49        2        3 -0.108163331
## 2         2        3 -0.124721987
## 120       2        3 -0.126394861
## 31        2        3 -0.153733587
## 93        2        3 -0.175457288
## 101       2        3 -0.196950363
## 68        2        3 -0.209737081
## 40        2        3 -0.212402348
## 106       2        3 -0.223797447
## 149       2        3 -0.231660479
## 36        2        3 -0.242506642
## 25        2        3 -0.261230704
## 153       2        3 -0.309996229
## 9         3        2  0.725638364
## 61        3        2  0.725352005
## 111       3        2  0.724507220
## 59        3        2  0.724272023
## 123       3        2  0.719348817
## 54        3        2  0.717879305
## 87        3        2  0.712200409
## 55        3        2  0.711785437
## 145       3        2  0.710037680
## 159       3        2  0.709792454
## 76        3        2  0.701852271
## 16        3        2  0.693822398
## 140       3        2  0.693720798
## 8         3        2  0.692122620
## 75        3        2  0.690420854
## 146       3        2  0.687369918
## 43        3        2  0.685139975
## 160       3        2  0.679336536
## 134       3        2  0.664179831
## 136       3        2  0.659937032
## 30        3        2  0.627138776
## 78        3        2  0.604052350
## 139       3        2  0.599051654
## 44        3        2  0.579889002
## 45        3        2  0.575114891
## 14        3        2  0.572364455
## 92        3        2  0.563778257
## 74        3        2  0.561854799
## 21        3        2  0.557212035
## 5         3        2  0.555190858
## 112       3        2  0.552464322
## 118       3        2  0.543450494
## 42        3        2  0.533569795
## 99        3        2  0.531992561
## 97        3        2  0.519787172
## 105       3        2  0.508554999
## 122       3        2  0.493845084
## 135       3        2  0.464446677
## 52        3        2  0.453367537
## 69        3        2  0.372735062
## 11        3        2  0.321227997
## 132       3        2  0.313026768
## 115       3        2  0.291169685
## 86        3        2  0.270814570
## Average silhouette width per cluster:
## [1] 0.2045834 0.1566814 0.5907913
## Average silhouette width of total data set:
## [1] 0.2862601
## 
## Available components:
##  [1] "medoids"    "id.med"     "clustering" "objective"  "isolation" 
##  [6] "clusinfo"   "silinfo"    "diss"       "call"       "data"
class(c1)
## [1] "pam"       "partition"
silhouette(c1)
##     cluster neighbor    sil_width
## 64        1        2  0.429967358
## 27        1        2  0.421025289
## 133       1        2  0.417325377
## 95        1        2  0.407320400
## 167       1        2  0.405158852
## 156       1        2  0.404589538
## 60        1        2  0.401636866
## 56        1        2  0.391837607
## 38        1        2  0.389100793
## 1         1        2  0.385730635
## 39        1        2  0.384513677
## 22        1        2  0.368419543
## 4         1        2  0.362766156
## 51        1        2  0.346470720
## 107       1        2  0.343868576
## 148       1        2  0.342160886
## 143       1        2  0.324583893
## 138       1        2  0.306362886
## 33        1        2  0.293792806
## 73        1        2  0.292164854
## 26        1        2  0.285479486
## 100       1        2  0.280484771
## 94        1        2  0.277861731
## 80        1        2  0.275981143
## 166       1        2  0.271493343
## 41        1        2  0.256569141
## 162       1        2  0.233248615
## 110       1        2  0.216195338
## 65        1        2  0.164689299
## 117       1        2  0.153122081
## 71        1        2  0.145901149
## 85        1        2  0.137468676
## 141       1        2  0.131736702
## 126       1        2  0.131261104
## 137       1        2  0.122517810
## 114       1        2  0.114678578
## 10        1        2  0.113240097
## 98        1        2  0.109426462
## 15        1        2  0.108780771
## 109       1        2  0.107795295
## 89        1        2  0.101106930
## 50        1        2  0.089457971
## 29        1        2  0.076161502
## 6         1        2  0.070154510
## 113       1        2  0.047435571
## 157       1        2 -0.003107643
## 151       1        2 -0.012579141
## 147       1        2 -0.023803305
## 129       1        2 -0.030854425
## 70        1        2 -0.082888392
## 82        1        2 -0.104901310
## 103       1        2 -0.162318827
## 84        1        2 -0.177674136
## 144       2        3  0.447582409
## 13        2        3  0.446686369
## 20        2        3  0.432970255
## 58        2        3  0.414590975
## 150       2        3  0.399834764
## 7         2        3  0.398386317
## 19        2        3  0.395375854
## 63        2        3  0.383900031
## 121       2        3  0.383231513
## 23        2        3  0.372084139
## 37        2        1  0.362229958
## 77        2        3  0.354956773
## 28        2        3  0.353323176
## 96        2        3  0.337834290
## 152       2        3  0.337458033
## 142       2        3  0.332083868
## 102       2        1  0.331203901
## 104       2        1  0.327080756
## 67        2        3  0.326823206
## 57        2        1  0.323201544
## 35        2        3  0.319958415
## 53        2        1  0.311380185
## 119       2        3  0.308261359
## 127       2        1  0.301394683
## 79        2        3  0.295764078
## 155       2        3  0.292698256
## 158       2        3  0.288594156
## 48        2        1  0.283103400
## 130       2        1  0.274158294
## 108       2        1  0.272718251
## 66        2        1  0.255341561
## 131       2        3  0.249807530
## 12        2        3  0.248282450
## 165       2        1  0.247134996
## 83        2        3  0.222857479
## 46        2        3  0.219474968
## 88        2        3  0.216605511
## 90        2        1  0.205332937
## 47        2        3  0.195760410
## 81        2        1  0.194527239
## 163       2        1  0.194117421
## 18        2        1  0.158624483
## 164       2        3  0.133420350
## 116       2        1  0.123597551
## 3         2        1  0.108157304
## 34        2        3  0.104653809
## 24        2        1  0.093518849
## 128       2        3  0.081203788
## 125       2        3  0.079993090
## 72        2        1  0.044997235
## 17        2        3  0.025286632
## 154       2        3  0.018711760
## 91        2        3 -0.020416663
## 62        2        3 -0.037030688
## 32        2        3 -0.057677717
## 161       2        3 -0.068378437
## 124       2        3 -0.102322781
## 49        2        3 -0.108163331
## 2         2        3 -0.124721987
## 120       2        3 -0.126394861
## 31        2        3 -0.153733587
## 93        2        3 -0.175457288
## 101       2        3 -0.196950363
## 68        2        3 -0.209737081
## 40        2        3 -0.212402348
## 106       2        3 -0.223797447
## 149       2        3 -0.231660479
## 36        2        3 -0.242506642
## 25        2        3 -0.261230704
## 153       2        3 -0.309996229
## 9         3        2  0.725638364
## 61        3        2  0.725352005
## 111       3        2  0.724507220
## 59        3        2  0.724272023
## 123       3        2  0.719348817
## 54        3        2  0.717879305
## 87        3        2  0.712200409
## 55        3        2  0.711785437
## 145       3        2  0.710037680
## 159       3        2  0.709792454
## 76        3        2  0.701852271
## 16        3        2  0.693822398
## 140       3        2  0.693720798
## 8         3        2  0.692122620
## 75        3        2  0.690420854
## 146       3        2  0.687369918
## 43        3        2  0.685139975
## 160       3        2  0.679336536
## 134       3        2  0.664179831
## 136       3        2  0.659937032
## 30        3        2  0.627138776
## 78        3        2  0.604052350
## 139       3        2  0.599051654
## 44        3        2  0.579889002
## 45        3        2  0.575114891
## 14        3        2  0.572364455
## 92        3        2  0.563778257
## 74        3        2  0.561854799
## 21        3        2  0.557212035
## 5         3        2  0.555190858
## 112       3        2  0.552464322
## 118       3        2  0.543450494
## 42        3        2  0.533569795
## 99        3        2  0.531992561
## 97        3        2  0.519787172
## 105       3        2  0.508554999
## 122       3        2  0.493845084
## 135       3        2  0.464446677
## 52        3        2  0.453367537
## 69        3        2  0.372735062
## 11        3        2  0.321227997
## 132       3        2  0.313026768
## 115       3        2  0.291169685
## 86        3        2  0.270814570
## attr(,"Ordered")
## [1] TRUE
## attr(,"call")
## pam(x = results_short_z, k = 3)
## attr(,"class")
## [1] "silhouette"

It is easier to assess the solution with a dedicated plot. In the cluster plot below, 3 clusters look close to optimal.

fviz_cluster(c1, geom = "point", ellipse.type = "convex")
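
A silhouette plot is another convenient view of the same information. As a sketch, factoextra (already loaded for fviz_cluster) provides fviz_silhouette(), which draws the per-observation widths ordered within each cluster:

# silhouette widths of all observations, grouped and ordered by cluster
fviz_silhouette(c1)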

Let’s now also check the solution with 4 clusters.

c2<-pam(results_short_z,4)
print(c2)
## Medoids:
##      ID  inflation life_expec
## [1,] 89 -0.1723707 -1.2963786
## [2,] 46 -0.1773762  0.4381976
## [3,] 71  1.4677614 -0.1525638
## [4,] 59 -0.9585662  1.1295142
## Clustering vector:
##   [1] 1 2 3 3 4 3 2 4 4 3 4 2 2 4 3 4 2 1 2 2 4 1 2 3 2 1 1 1 1 4 2 2 1 2 2 2 1
##  [38] 3 3 2 1 4 4 4 4 2 2 3 2 1 1 4 1 4 4 3 1 2 4 3 4 2 2 3 1 1 2 2 4 1 3 3 3 4
##  [75] 4 4 2 4 2 3 1 1 2 3 1 4 4 2 1 3 2 4 2 1 1 2 4 1 4 3 2 1 3 1 4 2 1 1 1 3 4
## [112] 4 1 1 2 3 3 4 2 2 2 4 4 2 2 3 1 2 3 1 2 4 3 4 4 4 1 1 4 4 3 2 3 2 4 4 3 1
## [149] 2 2 1 2 2 2 2 1 3 3 4 4 2 3 1 2 3 3 1
## Objective function:
##     build      swap 
## 0.6657886 0.5871041 
## 
## Available components:
##  [1] "medoids"    "id.med"     "clustering" "objective"  "isolation" 
##  [6] "clusinfo"   "silinfo"    "diss"       "call"       "data"
c2$medoids
##       inflation life_expec
## [1,] -0.1723707 -1.2963786
## [2,] -0.1773762  0.4381976
## [3,]  1.4677614 -0.1525638
## [4,] -0.9585662  1.1295142
c2$clustering
##   [1] 1 2 3 3 4 3 2 4 4 3 4 2 2 4 3 4 2 1 2 2 4 1 2 3 2 1 1 1 1 4 2 2 1 2 2 2 1
##  [38] 3 3 2 1 4 4 4 4 2 2 3 2 1 1 4 1 4 4 3 1 2 4 3 4 2 2 3 1 1 2 2 4 1 3 3 3 4
##  [75] 4 4 2 4 2 3 1 1 2 3 1 4 4 2 1 3 2 4 2 1 1 2 4 1 4 3 2 1 3 1 4 2 1 1 1 3 4
## [112] 4 1 1 2 3 3 4 2 2 2 4 4 2 2 3 1 2 3 1 2 4 3 4 4 4 1 1 4 4 3 2 3 2 4 4 3 1
## [149] 2 2 1 2 2 2 2 1 3 3 4 4 2 3 1 2 3 3 1
head(c2$clustering)
## [1] 1 2 3 3 4 3
summary(c2)
## Medoids:
##      ID  inflation life_expec
## [1,] 89 -0.1723707 -1.2963786
## [2,] 46 -0.1773762  0.4381976
## [3,] 71  1.4677614 -0.1525638
## [4,] 59 -0.9585662  1.1295142
## Clustering vector:
##   [1] 1 2 3 3 4 3 2 4 4 3 4 2 2 4 3 4 2 1 2 2 4 1 2 3 2 1 1 1 1 4 2 2 1 2 2 2 1
##  [38] 3 3 2 1 4 4 4 4 2 2 3 2 1 1 4 1 4 4 3 1 2 4 3 4 2 2 3 1 1 2 2 4 1 3 3 3 4
##  [75] 4 4 2 4 2 3 1 1 2 3 1 4 4 2 1 3 2 4 2 1 1 2 4 1 4 3 2 1 3 1 4 2 1 1 1 3 4
## [112] 4 1 1 2 3 3 4 2 2 2 4 4 2 2 3 1 2 3 1 2 4 3 4 4 4 1 1 4 4 3 2 3 2 4 4 3 1
## [149] 2 2 1 2 2 2 2 1 3 3 4 4 2 3 1 2 3 3 1
## Objective function:
##     build      swap 
## 0.6657886 0.5871041 
## 
## Numerical information per cluster:
##      size max_diss   av_diss diameter separation
## [1,]   40 1.802512 0.6854609 2.537005  0.2635407
## [2,]   50 1.062252 0.5197866 1.987808  0.1490403
## [3,]   34 1.899480 0.8350715 2.813899  0.2600444
## [4,]   43 1.181654 0.3778184 1.943465  0.1490403
## 
## Isolated clusters:
##  L-clusters: character(0)
##  L*-clusters: character(0)
## 
## Silhouette plot information:
##     cluster neighbor    sil_width
## 26        1        2  0.625904767
## 33        1        2  0.624019625
## 41        1        2  0.619902296
## 98        1        2  0.609910883
## 114       1        2  0.604291305
## 109       1        2  0.600205882
## 89        1        2  0.596862110
## 138       1        2  0.595176716
## 50        1        2  0.593614228
## 65        1        2  0.570805847
## 107       1        3  0.568561863
## 113       1        2  0.568106521
## 137       1        2  0.551041487
## 29        1        2  0.549029261
## 22        1        3  0.543955610
## 94        1        3  0.516768705
## 151       1        2  0.512150887
## 148       1        3  0.510218329
## 1         1        3  0.508318449
## 82        1        2  0.489769173
## 163       1        2  0.437462101
## 81        1        2  0.427372566
## 156       1        3  0.426629332
## 18        1        2  0.417340897
## 85        1        3  0.359755249
## 130       1        2  0.329831984
## 95        1        3  0.316681725
## 127       1        2  0.307096935
## 53        1        2  0.276816944
## 66        1        2  0.255913993
## 102       1        2  0.255305809
## 27        1        3  0.253042720
## 57        1        2  0.251771308
## 167       1        3  0.190829070
## 37        1        2  0.186000112
## 51        1        3  0.173434759
## 104       1        2  0.152656866
## 70        1        2  0.134865367
## 28        1        2  0.133105629
## 108       1        2  0.051361857
## 19        2        4  0.528942037
## 7         2        4  0.518088789
## 67        2        4  0.514707768
## 119       2        4  0.512935687
## 150       2        4  0.508348727
## 96        2        4  0.507736021
## 35        2        4  0.506362726
## 63        2        4  0.504763711
## 58        2        3  0.493543328
## 23        2        4  0.491054841
## 131       2        4  0.485976769
## 142       2        4  0.477688075
## 46        2        4  0.471169269
## 12        2        4  0.446878949
## 79        2        4  0.444901504
## 13        2        1  0.444207250
## 88        2        4  0.441206812
## 144       2        1  0.431802639
## 164       2        4  0.420155012
## 152       2        4  0.418704213
## 20        2        3  0.413856854
## 47        2        4  0.406911425
## 77        2        3  0.371547909
## 125       2        4  0.354392447
## 121       2        1  0.322420453
## 161       2        4  0.282875047
## 154       2        4  0.281685864
## 34        2        4  0.270471459
## 128       2        4  0.248943888
## 91        2        4  0.244172199
## 2         2        4  0.237168260
## 83        2        3  0.224870552
## 120       2        4  0.202183114
## 32        2        4  0.199927797
## 49        2        4  0.199923580
## 17        2        4  0.187066157
## 124       2        4  0.165426440
## 155       2        1  0.151618999
## 149       2        4  0.121328337
## 62        2        4  0.113927710
## 93        2        4  0.109155940
## 36        2        4  0.106073039
## 68        2        4  0.089596567
## 40        2        4  0.042799095
## 101       2        4  0.031967394
## 31        2        4  0.031761933
## 153       2        4  0.008250875
## 106       2        4 -0.006857422
## 25        2        4 -0.041821703
## 115       2        4 -0.093887377
## 100       3        1  0.593262045
## 80        3        2  0.589825189
## 162       3        2  0.576636176
## 73        3        1  0.560543529
## 143       3        1  0.557041235
## 110       3        2  0.534437402
## 71        3        2  0.531362798
## 166       3        1  0.514573221
## 15        3        2  0.513220443
## 126       3        2  0.479369696
## 141       3        2  0.475762150
## 6         3        2  0.461559171
## 129       3        2  0.456135972
## 10        3        2  0.449893176
## 72        3        2  0.432039101
## 56        3        1  0.401267399
## 39        3        1  0.399678249
## 4         3        1  0.394890933
## 157       3        2  0.386914518
## 60        3        1  0.369023054
## 24        3        2  0.366316964
## 3         3        2  0.366189544
## 116       3        2  0.355383152
## 38        3        1  0.305787600
## 147       3        2  0.303256422
## 90        3        2  0.240149850
## 64        3        1  0.126997163
## 165       3        2  0.087437798
## 133       3        1  0.076026755
## 103       3        2  0.075707995
## 158       3        2 -0.014315733
## 117       3        1 -0.057662968
## 84        3        2 -0.081458587
## 48        3        2 -0.170992190
## 61        4        2  0.664604449
## 9         4        2  0.663422747
## 111       4        2  0.662883801
## 59        4        2  0.661475343
## 54        4        2  0.656998268
## 123       4        2  0.655400432
## 87        4        2  0.650372453
## 55        4        2  0.647511305
## 145       4        2  0.646194566
## 76        4        2  0.642080336
## 159       4        2  0.635123042
## 140       4        2  0.633832481
## 146       4        2  0.625751867
## 8         4        2  0.624297638
## 75        4        2  0.613824284
## 16        4        2  0.609123080
## 134       4        2  0.601526632
## 136       4        2  0.596532298
## 43        4        2  0.595477079
## 160       4        2  0.592297674
## 78        4        2  0.542307549
## 30        4        2  0.520463836
## 44        4        2  0.502080385
## 74        4        2  0.499259795
## 139       4        2  0.467308375
## 14        4        2  0.462735870
## 92        4        2  0.430160679
## 45        4        2  0.426634184
## 21        4        2  0.417268859
## 5         4        2  0.413480394
## 112       4        2  0.408713411
## 42        4        2  0.399566498
## 118       4        2  0.374432372
## 99        4        2  0.371351098
## 105       4        2  0.343742819
## 97        4        2  0.335197885
## 122       4        2  0.321270067
## 135       4        2  0.318444526
## 52        4        2  0.261639573
## 132       4        2  0.242106452
## 69        4        2  0.184355145
## 11        4        2  0.176704993
## 86        4        2  0.137760516
## Average silhouette width per cluster:
## [1] 0.4173972 0.2969386 0.3428312 0.4938538
## Average silhouette width of total data set:
## [1] 0.3858371
## 
## Available components:
##  [1] "medoids"    "id.med"     "clustering" "objective"  "isolation" 
##  [6] "clusinfo"   "silinfo"    "diss"       "call"       "data"
class(c2)
## [1] "pam"       "partition"

In this plot we can see that choosing 4 clusters could also be a good option.

fviz_cluster(c2, geom = "point", ellipse.type = "convex")
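
To compare the 3- and 4-cluster partitions numerically as well, we can contrast their overall average silhouette widths, again taken from the silinfo component (a quick sketch; a higher value indicates a better-separated partition):

# overall average silhouette width of each solution
c1$silinfo$avg.width  # k = 3
c2$silinfo$avg.width  # k = 4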

However, to be sure, let’s use some helper functions for deciding on the number of clusters. First, the elbow method (based on the dissimilarity criterion). The result is somewhat ambiguous: both 3 and 4 clusters look acceptable.

opt_md<-Optimal_Clusters_Medoids(results_short_z, 10, 'euclidean', plot_clusters=TRUE)
##   
## Based on the plot give the number of clusters (greater than 1) that you consider optimal?
## Warning: The plot can not be created for the specified number of clusters. This
## means the output data do not fit in the figure (plot) margins.
# fit a PAM model with 10 medoids (note: this reuses and overwrites the opt_md object created just above)
opt_md <- pam(results_short_z, 10, metric = "euclidean", stand = FALSE)
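
Because Optimal_Clusters_Medoids asks for interactive input and its clustering plots did not render here, a non-interactive alternative sketch is fviz_nbclust() from factoextra. Note that with method = "wss" it scores each k by total within-cluster sum of squares, so it only approximates the medoid-based elbow above:

# elbow-style plot: total within-cluster sum of squares for k = 1..10 PAM solutions
fviz_nbclust(results_short_z, pam, method = "wss", k.max = 10)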

Now let’s check the silhouette criterion as well. According to the average silhouette widths it reports, choosing 3 clusters is the best option.

opt_md2<-Optimal_Clusters_Medoids(results_short_z, max_clusters=10, 'euclidean', plot_clusters=TRUE, criterion="silhouette")
##   
## Based on the plot give the number of clusters (greater than 1) that you consider optimal?
## Warning: The plot can not be created for the specified number of clusters. This
## means the output data do not fit in the figure (plot) margins.
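
The same helper can also score each candidate k by average silhouette width, which avoids the interactive prompt (again only an alternative sketch, not a replacement for the output above):

# average silhouette width for k = 2..10; the highest point suggests the number of clusters
fviz_nbclust(results_short_z, pam, method = "silhouette", k.max = 10)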

THANK YOU!