The World Happiness Report is an annual survey of the state of global happiness, which ranks the world’s countries by happiness based on a handful of factors. The goal of this analysis is to find a proper way to divide the countries into clusters based on only 6 variables: GDP per capita, Social support, Life expectancy, Freedom to make life choices, Generosity and Perceptions of corruption. Although there is no “curse of dimensionality” here, an additional goal is to reduce the number of dimensions in the dataset to visualize the data better.
The dataset consists of 149 observations - one for each country, measured in the year 2020.
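The setup chunk is not shown in the report; the following preamble is inferred from the functions called later (a hedged reconstruction - package versions are not pinned):
# Packages inferred from the calls below - hedged reconstruction of the setup chunk
library(data.table)   # data.table(), setnames()
library(dplyr)        # group_by(), summarise(), arrange()
library(ggplot2)      # plots
library(Rtsne)        # t-SNE embedding
library(factoextra)   # fviz_eig(), fviz_nbclust(), eclust(), fviz_cluster(), fviz_silhouette(), fviz_dend()
library(plotly)       # interactive 3-d scatter
library(RColorBrewer) # brewer.pal()
library(reshape2)     # melt()
library(fclust)       # fuzzy k-means (FKM)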
df <- read.csv('world-happiness-report-2021.csv')
print(colnames(df))
## [1] "Country.name"
## [2] "Regional.indicator"
## [3] "Ladder.score"
## [4] "Standard.error.of.ladder.score"
## [5] "upperwhisker"
## [6] "lowerwhisker"
## [7] "Logged.GDP.per.capita"
## [8] "Social.support"
## [9] "Healthy.life.expectancy"
## [10] "Freedom.to.make.life.choices"
## [11] "Generosity"
## [12] "Perceptions.of.corruption"
## [13] "Ladder.score.in.Dystopia"
## [14] "Explained.by..Log.GDP.per.capita"
## [15] "Explained.by..Social.support"
## [16] "Explained.by..Healthy.life.expectancy"
## [17] "Explained.by..Freedom.to.make.life.choices"
## [18] "Explained.by..Generosity"
## [19] "Explained.by..Perceptions.of.corruption"
## [20] "Dystopia...residual"
We only need the columns holding the values themselves, so we will work with the variables: Country, Region, Logged GDP per capita, Social support, Life expectancy, Freedom to make life choices, Generosity and Perceptions of corruption. For aesthetic reasons, all dots in the column names are changed to underscores. We save the “Country” and “Region” labels into separate objects and keep only the numerical values inside the data frame.
df <- df[, c(1:2, 7:12)]  # keep the two label columns and the 6 raw variables
old_names <- colnames(df)
new_names <- gsub('.', '_', old_names, fixed = TRUE)
df <- setnames(df, old = old_names, new = new_names)
Regions <- df[2]     # Regional_indicator labels
Countries <- df[1]   # Country_name labels
df <- df[-(1:2)]     # numeric variables only
t-SNE stands for t-distributed Stochastic Neighbor Embedding - a non-linear statistical method of dimensionality reduction. The biggest difference from PCA is that t-SNE is a non-linear algorithm: it tries to preserve the local structure of the data and can handle outliers, but it is intended mainly for visualisation. The creators of the technique state¹ that the performance of the algorithm is fairly robust to changes in the perplexity parameter. Results of t-SNE are generally hard to reproduce, so it is usually a good idea to make a few visualisations and compare them. Below is the best plot I could make with the given data:
# t-SNE (the embedding is stochastic, so we fix the seed for reproducibility)
set.seed(42)  # hypothetical seed - the original value is not shown
tsne <- Rtsne(df, dims = 2, perplexity = 35, max_iter = 700)
tsne_df <- data.table(tsne$Y)
tsne_df$Region <- Regions$Regional_indicator  # plain vector, not a one-column data frame
tsne_df %>%
  ggplot(aes(x = V1,
             y = V2,
             color = Region)) +
  geom_point() +
  theme(legend.position = 'left')
We see three scattered clusters: the first one consists mostly of Sub-Saharan countries, the bottom-right one of Western European countries, and the middle one of the Middle East, Central and Eastern Europe, the Commonwealth of Independent States and others. That is our starting point, and we can assume for now that the countries are best clustered into 3 groups.
pca <- prcomp(df, center = TRUE, scale. = TRUE)
Principal Component Analysis is a linear dimensionality reduction technique which can be used to reduce the number of variables for modelling purposes. It produces components that are linear combinations of the input variables. The optimal number of components is usually determined by the explained variance or the eigenvalues.
fviz_eig(pca)                          # scree plot: % of variance explained
fviz_eig(pca, choice = 'eigenvalue')   # scree plot: eigenvalues
summary(pca)$importance
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 1.764773 1.134382 0.8384161 0.71998 0.5002993 0.3565696
## Proportion of Variance 0.519070 0.214470 0.1171600 0.08640 0.0417200 0.0211900
## Cumulative Proportion 0.519070 0.733540 0.8507000 0.93709 0.9788100 1.0000000
Judging by the percentage of explained variance, the first 3 principal components would be enough - they explain roughly 85% of the variance. Regarding the eigenvalues, the Kaiser rule would suggest only 2 PCs, as only for those two is the eigenvalue higher than 1. That approach would be too coarse, as only 73% of the variance would be explained. 3 components seem a reasonable number: we reduce the dimensionality by 50% while losing only 15% of the information.
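The Kaiser rule can be verified directly, since the eigenvalues of a prcomp object are the squared component standard deviations:
# Eigenvalues = squared standard deviations of the components;
# only PC1 (~3.11) and PC2 (~1.29) exceed 1, hence the Kaiser suggestion of 2 PCs
pca$sdev^2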
The first component is highly correlated with GDP, Social support, Life expectancy and Freedom to make life choices - the determinants of a developed country. There is also a reasonable correlation (in the opposite direction) with Perceptions of corruption.
The second component is highly correlated with Generosity, and there is also a relationship with Perceptions of corruption - again in the opposite direction.
The third component is almost solely correlated with Perceptions of corruption, in the opposite direction to the first two components. It will probably be a determinant of an underdeveloped country.
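These interpretations follow from the loadings in pca$rotation. The data frame pca_df used in the next chunks is not created in the code shown; a minimal reconstruction (hedged - the column layout is inferred from the pca_df[, -4] and pca_df$PC1 calls below) is:
round(pca$rotation, 2)  # signs and magnitudes behind the interpretations above

# Hedged reconstruction, not shown in the original code: the first three PCs
# plus the Region label as column 4 (hence pca_df[, -4] in later calls)
pca_df <- data.frame(pca$x[, 1:3])
pca_df$Region <- Regions$Regional_indicator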
Silhouette and Total Within Sum of Squares are the most popular techniques for finding the optimal number of clusters for the K-means algorithm.
fviz_nbclust(pca_df[, -4], kmeans, method = 'silhouette') +
  labs(subtitle = 'Silhouette method')
fviz_nbclust(pca_df[, -4], kmeans, method = 'wss') +
  labs(subtitle = 'Within Sum of Squares')
The Silhouette method suggests 3 clusters, and WSS gives similar results. Bearing in mind that we have 149 observations, one for each country in the world, that partition seems reasonable. It will probably result in clusters covering, more or less, highly developed, developing and underdeveloped countries. Choosing a smaller k would mean partitioning the 149 countries between only two clusters (developed/undeveloped, rich/poor), and some information would be lost. In this analysis k = 3 clusters is therefore recommended - not only for the K-means algorithm, but for the following methods as well.
K-means is one of the most popular clustering techniques. It assigns every observation to one of k clusters, depending on which (artificially created) cluster centre is nearest to that observation.
kmeans_ <- eclust(pca_df[, -4], k = 3,
                  hc_metric = 'euclidean', graph = FALSE)
colors <- brewer.pal(n = 3, 'Set2')
fviz_cluster(kmeans_, geom = c("point")) +
  scale_color_manual(values = colors) +
  scale_fill_manual(values = colors) +
  ggtitle('K-means with 3 clusters')
plot_ly(x = pca_df$PC1,
        y = pca_df$PC2,
        z = pca_df$PC3,
        type = 'scatter3d',
        mode = 'markers',
        color = as.factor(kmeans_$cluster))
fviz_silhouette(kmeans_, print.summary = F)
kmeans_df <- data.table(kmeans_$cluster)
kmeans_df$Region <- Regions$Regional_indicator  # plain character vector
kmeans_df$Counter <- 1
kmeans_df %>%
  group_by(V1, Region) %>%
  summarise(Count = sum(Counter), .groups = 'keep') %>%
  arrange(V1, desc(Count)) %>%
  print(n = 22)
## # A tibble: 22 x 3
## # Groups: V1, Region [22]
## V1 Region Count
## <int> <chr> <dbl>
## 1 1 Western Europe 14
## 2 1 North America and ANZ 3
## 3 1 Central and Eastern Europe 1
## 4 1 Commonwealth of Independent States 1
## 5 1 East Asia 1
## 6 1 Middle East and North Africa 1
## 7 1 Southeast Asia 1
## 8 2 Latin America and Caribbean 19
## 9 2 Central and Eastern Europe 16
## 10 2 Middle East and North Africa 12
## 11 2 Commonwealth of Independent States 11
## 12 2 Western Europe 7
## 13 2 East Asia 5
## 14 2 Southeast Asia 5
## 15 2 Sub-Saharan Africa 5
## 16 2 South Asia 2
## 17 2 North America and ANZ 1
## 18 3 Sub-Saharan Africa 31
## 19 3 South Asia 5
## 20 3 Middle East and North Africa 4
## 21 3 Southeast Asia 3
## 22 3 Latin America and Caribbean 1
kmeans_df %>%
  group_by(V1) %>%
  summarise(Count = sum(Counter), .groups = 'keep') %>%
  arrange(V1, desc(Count))
## # A tibble: 3 x 2
## # Groups: V1 [3]
## V1 Count
## <int> <dbl>
## 1 1 22
## 2 2 83
## 3 3 44
df$cluster <- factor(kmeans_$cluster, levels = 1:3, labels = letters[1:3])
melted <- reshape2::melt(df, id.vars = 'cluster')
ggplot(melted, aes(x = cluster,
                   y = value,
                   color = cluster)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = 'free_y') +
  theme(legend.position = 'bottom')
At first glance, the 2-d plot shows overlapping clusters and generally does not look very good. The situation changes when we switch to the 3-d plot (as we used 3 PCs), and the cluster silhouette plot shows that the observations are generally clustered well - the average silhouette width equals 0.41, which is a relatively high score (on a scale from -1 to 1, the bigger the better), and there are only a few observations with negative silhouette values, meaning they should belong to another cluster. In our case, all such countries sit between the 2nd and 3rd clusters - some assigned to the 2nd should belong to the 3rd and vice versa - while the 1st cluster has no such cases.
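For reference, the silhouette of an observation compares its average distance to its own cluster, a(i), with its average distance to the nearest other cluster, b(i): s(i) = (b(i) - a(i)) / max(a(i), b(i)), which lies in [-1, 1]. A hedged sketch of recomputing the average width directly with the cluster package (fviz_silhouette above reports the same quantity from the eclust object):
# Hedged sketch: recompute the average silhouette width with cluster::silhouette()
library(cluster)
sil <- silhouette(kmeans_$cluster, dist(pca_df[, -4]))
mean(sil[, 'sil_width'])  # should be close to the 0.41 reported above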
Looking at the countries’ classification and the variable distributions, the initial claims are confirmed - the clusters represent countries’ development levels:
1. The 1st cluster consists mostly of Western European countries and those belonging to the Anglosphere - with high GDP per capita, Social support, Life expectancy and Freedom to make life choices, and low Perceptions of corruption.
2. The 2nd cluster is mostly developing countries - Latin America, Central and Eastern Europe, the Middle East and the Commonwealth of Independent States - with moderate values.
3. The 3rd cluster consists of poor countries - almost entirely Sub-Saharan African - with low values for every variable except Perceptions of corruption and Generosity (a few outliers).
Even though these results make sense, for robustness two other algorithms will be applied as well.
Hierarchical clustering is based on a dissimilarity matrix. The algorithm starts by treating every observation as a separate cluster; then, iteratively, the most similar clusters are merged into one, resulting in a tree (dendrogram) which can be cut at any height to obtain a given number of clusters.
h_df <- pca$x[, 1:3]
rownames(h_df) <- Countries$Country_name
distMatrix <- dist(h_df)
groups <- hclust(distMatrix, method = 'ward.D')
fviz_dend(groups, cex = 0.5, k = 3, rect = TRUE,
          k_colors = colors)
# the same dendrogram, but with labels coloured by the K-means assignment for comparison
fviz_dend(groups, cex = 0.5, k = 3, rect = TRUE,
          k_colors = colors,
          label_cols = kmeans_df$V1[groups$order])
trees <- cutree(groups, k = 3)
dendo_df <- cbind(trees, Regions)
dendo_df$Counter <- 1
dendo_df %>%
  group_by(trees, Regional_indicator) %>%
  summarise(group = sum(Counter)) %>%
  arrange(trees, desc(group)) %>%
  `colnames<-`(c('Cluster', 'Region', 'Count')) %>%
  print(n = 18)
## # A tibble: 18 x 3
## # Groups: Cluster [3]
## Cluster Region Count
## <int> <chr> <dbl>
## 1 1 Western Europe 11
## 2 1 North America and ANZ 3
## 3 1 Southeast Asia 1
## 4 2 Latin America and Caribbean 18
## 5 2 Central and Eastern Europe 17
## 6 2 Commonwealth of Independent States 12
## 7 2 Western Europe 10
## 8 2 Middle East and North Africa 7
## 9 2 Southeast Asia 7
## 10 2 East Asia 6
## 11 2 South Asia 3
## 12 2 Sub-Saharan Africa 2
## 13 2 North America and ANZ 1
## 14 3 Sub-Saharan Africa 34
## 15 3 Middle East and North Africa 10
## 16 3 South Asia 4
## 17 3 Latin America and Caribbean 2
## 18 3 Southeast Asia 1
dendo_df %>%
  group_by(trees) %>%
  summarise(group = sum(Counter)) %>%
  arrange(trees, desc(group)) %>%
  `colnames<-`(c('Cluster', 'Count'))
## # A tibble: 3 x 2
## Cluster Count
## <int> <dbl>
## 1 1 15
## 2 2 83
## 3 3 51
df$hclust <- factor(dendo_df$trees, levels = 1:3, labels = letters[1:3])
melted <- reshape2::melt(df[-7], id.vars = 'hclust')  # df[-7] drops the K-means cluster column
ggplot(melted, aes(x = hclust,
                   y = value,
                   color = hclust)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = 'free_y') +
  theme(legend.position = 'bottom')
The results are similar to K-means, but the 1st cluster is smaller - some of its members moved to the 2nd cluster, which in turn passed several members to the 3rd (compare the two dendrograms and the regional tables). Because of the smaller number of observations in the 1st cluster, its boxplots are correspondingly narrower - but the distribution of every variable stays relatively similar to the K-means scenario.
The biggest advantage of fuzzy clustering is that each data point can belong to more than one cluster, with a membership degree for each. Observations with a maximal membership degree below 50% are classified as “unclear”, i.e. not clearly belonging to any single cluster.
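The object fuzzy_knn used below is not created in the code shown. A plausible reconstruction with the fclust package (hedged - the exact call, variant and seed are unknown; the SIL.F values for k = 2..6 in the summary further down suggest several k were compared):
# Hedged reconstruction: fuzzy k-means on the three PCs; with a vector k,
# FKM() compares the solutions by the fuzzy silhouette index (SIL.F) and keeps
# the best one, here k = 3. $clus holds the closest hard assignment (column 1)
# and the membership degree (column 2) for each observation.
set.seed(42)  # hypothetical seed - the original value is not shown
fuzzy_knn <- FKM(X = pca_df[, -4], k = 2:6)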
fuzzy_df <- data.frame(fuzzy_knn$clus)
fuzzy_df$Regions <- Regions$Regional_indicator
fuzzy_df$Counter <- 1
fuzzy_df %>%
  group_by(Cluster, Regions) %>%
  summarise(group = sum(Counter)) %>%
  arrange(Cluster, desc(group)) %>%
  print(n = 22)
## # A tibble: 22 x 3
## # Groups: Cluster [3]
## Cluster Regions group
## <dbl> <chr> <dbl>
## 1 1 Western Europe 15
## 2 1 North America and ANZ 4
## 3 1 Middle East and North Africa 2
## 4 1 Central and Eastern Europe 1
## 5 1 Commonwealth of Independent States 1
## 6 1 East Asia 1
## 7 1 Latin America and Caribbean 1
## 8 1 Southeast Asia 1
## 9 2 Sub-Saharan Africa 31
## 10 2 South Asia 5
## 11 2 Middle East and North Africa 4
## 12 2 Southeast Asia 3
## 13 2 Latin America and Caribbean 1
## 14 3 Latin America and Caribbean 18
## 15 3 Central and Eastern Europe 16
## 16 3 Commonwealth of Independent States 11
## 17 3 Middle East and North Africa 11
## 18 3 Western Europe 6
## 19 3 East Asia 5
## 20 3 Southeast Asia 5
## 21 3 Sub-Saharan Africa 5
## 22 3 South Asia 2
fuzzy_df %>%
  group_by(Cluster) %>%
  summarise(group = sum(Counter)) %>%
  arrange(Cluster)
## # A tibble: 3 x 2
## Cluster group
## <dbl> <dbl>
## 1 1 26
## 2 2 44
## 3 3 79
# reorder the levels (1, 3, 2) so that labels a/b/c match the K-means clusters
df$fclust <- factor(fuzzy_df$Cluster, levels = c(1, 3, 2), labels = letters[1:3])
melted <- reshape2::melt(df[c(-7, -8)], id.vars = 'fclust')  # drop the K-means and hclust columns
ggplot(melted, aes(x = fclust,
                   y = value,
                   color = fclust)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = 'free_y') +
  theme(legend.position = 'bottom')
fuzzy_df$Country <- Countries$Country_name
fuzzy_df$fclust <- factor(fuzzy_df$Cluster, levels = c(1, 3, 2), labels = letters[1:3])
fuzzy_df[fuzzy_df$Membership.degree < 0.5,
         c('Country', 'Membership.degree', 'fclust')] %>%
  arrange(desc(Membership.degree))
## Country Membership.degree fclust
## Obj 124 Namibia 0.4984663 b
## Obj 125 Palestinian Territories 0.4956267 b
## Obj 109 Algeria 0.4951335 b
## Obj 100 Laos 0.4870567 c
## Obj 78 Tajikistan 0.4854723 b
## Obj 19 United States 0.4833566 a
## Obj 101 Bangladesh 0.4800726 c
## Obj 74 North Cyprus 0.4740861 a
## Obj 97 Turkmenistan 0.4512418 b
## Obj 54 Thailand 0.4468175 b
## Obj 33 Kosovo 0.4285203 b
## Obj 114 Cambodia 0.4256271 b
## Obj 126 Myanmar 0.4231886 c
## Obj 147 Rwanda 0.4213193 c
## Obj 82 Indonesia 0.3734274 c
summary.fclust(fuzzy_knn)
##
## Fuzzy clustering object of class 'fclust'
##
## Number of objects:
## 149
##
## Number of clusters:
## 3
##
## Cluster sizes:
## Clus 1 Clus 2 Clus 3
## 26 44 79
##
##
## Clustering index values:
## SIL.F k=2 SIL.F k=3 SIL.F k=4 SIL.F k=5 SIL.F k=6
## 0.6486382 0.6657629 0.5805586 0.6176952 0.5940933
##
##
## Closest hard clustering partition:
## Obj 1 Obj 2 Obj 3 Obj 4 Obj 5 Obj 6 Obj 7 Obj 8 Obj 9 Obj 10
## 1 1 1 1 1 1 1 1 1 1
## Obj 11 Obj 12 Obj 13 Obj 14 Obj 15 Obj 16 Obj 17 Obj 18 Obj 19 Obj 20
## 1 3 1 1 1 3 1 3 1 3
## Obj 21 Obj 22 Obj 23 Obj 24 Obj 25 Obj 26 Obj 27 Obj 28 Obj 29 Obj 30
## 1 1 1 3 1 3 3 3 3 3
## Obj 31 Obj 32 Obj 33 Obj 34 Obj 35 Obj 36 Obj 37 Obj 38 Obj 39 Obj 40
## 1 1 3 3 3 3 3 3 3 1
## Obj 41 Obj 42 Obj 43 Obj 44 Obj 45 Obj 46 Obj 47 Obj 48 Obj 49 Obj 50
## 3 1 3 3 3 3 3 3 3 3
## Obj 51 Obj 52 Obj 53 Obj 54 Obj 55 Obj 56 Obj 57 Obj 58 Obj 59 Obj 60
## 3 3 3 3 3 3 3 3 3 3
## Obj 61 Obj 62 Obj 63 Obj 64 Obj 65 Obj 66 Obj 67 Obj 68 Obj 69 Obj 70
## 3 3 3 3 3 3 3 3 3 3
## Obj 71 Obj 72 Obj 73 Obj 74 Obj 75 Obj 76 Obj 77 Obj 78 Obj 79 Obj 80
## 3 3 3 1 3 3 1 3 3 3
## Obj 81 Obj 82 Obj 83 Obj 84 Obj 85 Obj 86 Obj 87 Obj 88 Obj 89 Obj 90
## 3 2 2 3 2 3 2 3 3 3
## Obj 91 Obj 92 Obj 93 Obj 94 Obj 95 Obj 96 Obj 97 Obj 98 Obj 99 Obj 100
## 2 2 3 3 2 2 3 2 2 2
## Obj 101 Obj 102 Obj 103 Obj 104 Obj 105 Obj 106 Obj 107 Obj 108 Obj 109 Obj 110
## 2 2 3 3 2 2 3 3 3 3
## Obj 111 Obj 112 Obj 113 Obj 114 Obj 115 Obj 116 Obj 117 Obj 118 Obj 119 Obj 120
## 2 3 2 3 2 2 2 2 2 2
## Obj 121 Obj 122 Obj 123 Obj 124 Obj 125 Obj 126 Obj 127 Obj 128 Obj 129 Obj 130
## 2 3 3 3 3 2 3 2 3 2
## Obj 131 Obj 132 Obj 133 Obj 134 Obj 135 Obj 136 Obj 137 Obj 138 Obj 139 Obj 140
## 2 3 2 2 2 2 2 2 2 2
## Obj 141 Obj 142 Obj 143 Obj 144 Obj 145 Obj 146 Obj 147 Obj 148 Obj 149
## 2 2 2 2 2 3 2 2 2
##
## Cluster memberships:
## Clus 1
## [1] "Obj 1" "Obj 2" "Obj 3" "Obj 4" "Obj 5" "Obj 6" "Obj 7" "Obj 8"
## [9] "Obj 9" "Obj 10" "Obj 11" "Obj 13" "Obj 14" "Obj 15" "Obj 17" "Obj 19"
## [17] "Obj 21" "Obj 22" "Obj 23" "Obj 25" "Obj 31" "Obj 32" "Obj 40" "Obj 42"
## [25] "Obj 74" "Obj 77"
## Clus 2
## [1] "Obj 82" "Obj 83" "Obj 85" "Obj 87" "Obj 91" "Obj 92" "Obj 95"
## [8] "Obj 96" "Obj 98" "Obj 99" "Obj 100" "Obj 101" "Obj 102" "Obj 105"
## [15] "Obj 106" "Obj 111" "Obj 113" "Obj 115" "Obj 116" "Obj 117" "Obj 118"
## [22] "Obj 119" "Obj 120" "Obj 121" "Obj 126" "Obj 128" "Obj 130" "Obj 131"
## [29] "Obj 133" "Obj 134" "Obj 135" "Obj 136" "Obj 137" "Obj 138" "Obj 139"
## [36] "Obj 140" "Obj 141" "Obj 142" "Obj 143" "Obj 144" "Obj 145" "Obj 147"
## [43] "Obj 148" "Obj 149"
## Clus 3 (First 50 objects)
## [1] "Obj 12" "Obj 16" "Obj 18" "Obj 20" "Obj 24" "Obj 26" "Obj 27" "Obj 28"
## [9] "Obj 29" "Obj 30" "Obj 33" "Obj 34" "Obj 35" "Obj 36" "Obj 37" "Obj 38"
## [17] "Obj 39" "Obj 41" "Obj 43" "Obj 44" "Obj 45" "Obj 46" "Obj 47" "Obj 48"
## [25] "Obj 49" "Obj 50" "Obj 51" "Obj 52" "Obj 53" "Obj 54" "Obj 55" "Obj 56"
## [33] "Obj 57" "Obj 58" "Obj 59" "Obj 60" "Obj 61" "Obj 62" "Obj 63" "Obj 64"
## [41] "Obj 65" "Obj 66" "Obj 67" "Obj 68" "Obj 69" "Obj 70" "Obj 71" "Obj 72"
## [49] "Obj 73" "Obj 75"
##
## Number of objects with unclear assignment (maximal membership degree <0.5):
## 15
##
## Objects with unclear assignment:
## [1] "Obj 19" "Obj 33" "Obj 54" "Obj 74" "Obj 78" "Obj 82" "Obj 97"
## [8] "Obj 100" "Obj 101" "Obj 109" "Obj 114" "Obj 124" "Obj 125" "Obj 126"
## [15] "Obj 147"
##
## Cluster sizes (without unclear assignments):
## Clus 1 Clus 2 Clus 3 No clus
## 24 39 71 15
##
## Membership degree matrix (rounded):
## Clus 1 Clus 2 Clus 3
## Obj 1 0.78 0.07 0.15
## Obj 2 0.82 0.06 0.12
## Obj 3 0.88 0.04 0.08
## Obj 4 0.69 0.08 0.24
## Obj 5 0.90 0.03 0.07
## Obj 6 0.86 0.04 0.09
## Obj 7 0.86 0.05 0.10
## Obj 8 0.89 0.03 0.08
## Obj 9 0.86 0.05 0.10
## Obj 10 0.97 0.01 0.02
## Obj 11 0.93 0.02 0.05
## Obj 12 0.31 0.07 0.62
## Obj 13 0.97 0.01 0.03
## Obj 14 0.97 0.01 0.02
## Obj 15 0.95 0.01 0.04
## Obj 16 0.17 0.05 0.77
## Obj 17 0.88 0.03 0.08
## Obj 18 0.19 0.08 0.73
## Obj 19 0.48 0.07 0.44
## Obj 20 0.32 0.08 0.60
## Obj 21 0.53 0.07 0.39
## Obj 22 0.54 0.08 0.38
## Obj 23 0.73 0.06 0.21
## Obj 24 0.16 0.05 0.79
## Obj 25 0.85 0.03 0.12
## Obj 26 0.25 0.06 0.69
## Obj 27 0.22 0.06 0.71
## Obj 28 0.09 0.06 0.86
## Obj 29 0.35 0.08 0.57
## Obj 30 0.12 0.12 0.76
## Obj 31 0.58 0.06 0.37
## Obj 32 0.72 0.10 0.18
## Obj 33 0.23 0.34 0.43
## Obj 34 0.10 0.07 0.83
## Obj 35 0.01 0.01 0.98
## Obj 36 0.02 0.01 0.97
## Obj 37 0.02 0.02 0.96
## Obj 38 0.12 0.07 0.81
## Obj 39 0.03 0.02 0.95
## Obj 40 0.77 0.04 0.19
## Obj 41 0.11 0.05 0.84
## Obj 42 0.53 0.20 0.27
## Obj 43 0.02 0.01 0.97
## Obj 44 0.16 0.05 0.79
## Obj 45 0.12 0.04 0.84
## Obj 46 0.07 0.06 0.87
## Obj 47 0.13 0.04 0.83
## Obj 48 0.03 0.03 0.94
## Obj 49 0.15 0.15 0.70
## Obj 50 0.08 0.04 0.88
## Obj 51 0.07 0.05 0.87
## Obj 52 0.00 0.00 0.99
## Obj 53 0.11 0.08 0.82
## Obj 54 0.32 0.23 0.45
## Obj 55 0.21 0.16 0.63
## Obj 56 0.34 0.11 0.55
## Obj 57 0.06 0.04 0.90
## Obj 58 0.17 0.08 0.75
## Obj 59 0.17 0.26 0.58
## Obj 60 0.10 0.08 0.82
## Obj 61 0.10 0.11 0.79
## Obj 62 0.10 0.06 0.84
## Obj 63 0.03 0.03 0.95
## Obj 64 0.11 0.17 0.72
## Obj 65 0.04 0.06 0.90
## Obj 66 0.00 0.00 0.99
## Obj 67 0.22 0.26 0.52
## Obj 68 0.17 0.20 0.63
## Obj 69 0.06 0.11 0.83
## Obj 70 0.12 0.21 0.68
## Obj 71 0.14 0.11 0.75
## Obj 72 0.01 0.01 0.99
## Obj 73 0.06 0.03 0.91
## Obj 74 0.47 0.06 0.46
## Obj 75 0.14 0.12 0.74
## Obj 76 0.04 0.04 0.93
## Obj 77 0.84 0.04 0.12
## Obj 78 0.24 0.27 0.49
## Obj 79 0.06 0.04 0.90
## Obj 80 0.11 0.14 0.75
## Obj 81 0.26 0.15 0.59
## Obj 82 0.30 0.37 0.33
## Obj 83 0.03 0.90 0.07
## Obj 84 0.06 0.03 0.91
## Obj 85 0.01 0.98 0.02
## Obj 86 0.18 0.12 0.70
## Obj 87 0.13 0.61 0.27
## Obj 88 0.08 0.06 0.86
## Obj 89 0.18 0.07 0.75
## Obj 90 0.31 0.16 0.53
## Obj 91 0.01 0.97 0.02
## Obj 92 0.03 0.90 0.08
## Obj 93 0.07 0.17 0.77
## Obj 94 0.08 0.16 0.76
## Obj 95 0.06 0.78 0.16
## Obj 96 0.03 0.91 0.06
## Obj 97 0.31 0.24 0.45
## Obj 98 0.13 0.68 0.19
## Obj 99 0.07 0.82 0.11
## Obj 100 0.21 0.49 0.30
## Obj 101 0.15 0.48 0.37
## Obj 102 0.02 0.95 0.03
## Obj 103 0.08 0.29 0.63
## Obj 104 0.11 0.17 0.72
## Obj 105 0.01 0.98 0.02
## Obj 106 0.11 0.52 0.36
## Obj 107 0.10 0.21 0.69
## Obj 108 0.17 0.31 0.52
## Obj 109 0.11 0.40 0.50
## Obj 110 0.07 0.13 0.80
## Obj 111 0.07 0.58 0.34
## Obj 112 0.10 0.28 0.62
## Obj 113 0.01 0.96 0.03
## Obj 114 0.15 0.42 0.43
## Obj 115 0.06 0.83 0.11
## Obj 116 0.02 0.93 0.05
## Obj 117 0.02 0.95 0.04
## Obj 118 0.09 0.65 0.25
## Obj 119 0.03 0.89 0.08
## Obj 120 0.01 0.96 0.03
## Obj 121 0.11 0.69 0.20
## Obj 122 0.11 0.36 0.53
## Obj 123 0.10 0.25 0.66
## Obj 124 0.08 0.42 0.50
## Obj 125 0.09 0.41 0.50
## Obj 126 0.29 0.42 0.28
## Obj 127 0.12 0.19 0.70
## Obj 128 0.07 0.81 0.12
## Obj 129 0.14 0.15 0.70
## Obj 130 0.08 0.67 0.25
## Obj 131 0.03 0.92 0.06
## Obj 132 0.10 0.30 0.61
## Obj 133 0.03 0.91 0.06
## Obj 134 0.07 0.68 0.24
## Obj 135 0.04 0.88 0.09
## Obj 136 0.05 0.86 0.09
## Obj 137 0.01 0.95 0.04
## Obj 138 0.04 0.89 0.07
## Obj 139 0.08 0.76 0.16
## Obj 140 0.11 0.72 0.17
## Obj 141 0.07 0.70 0.23
## Obj 142 0.15 0.66 0.19
## Obj 143 0.12 0.72 0.17
## Obj 144 0.04 0.88 0.08
## Obj 145 0.05 0.81 0.14
## Obj 146 0.09 0.17 0.75
## Obj 147 0.32 0.42 0.26
## Obj 148 0.02 0.91 0.07
## Obj 149 0.13 0.63 0.24
##
## Cluster summary:
## Cl.size Min.memb.deg. Max.memb.deg. Av.memb.deg. N.uncl.assignm.
## Clus 1 26 0.47 0.97 0.78 2
## Clus 2 44 0.37 0.98 0.77 5
## Clus 3 79 0.43 0.99 0.74 8
##
## Euclidean distance matrix for the prototypes (rounded):
## Clus 1 Clus 2
## Clus 2 4.47
## Clus 3 2.57 2.87
##
## Available components:
## [1] "U" "H" "F" "clus" "medoid" "value"
## [7] "criterion" "iter" "k" "m" "ent" "b"
## [13] "vp" "delta" "stand" "Xca" "X" "D"
## [19] "call"
##
##
Cluster sizes are similar to K-means, but here the 1st cluster, containing the most developed countries, is a little larger. The variable distributions stay practically the same, but what is interesting in fuzzy clustering is that unclear cluster assignments can be observed. For the given FKM algorithm there are 15 observations with a membership degree below 0.5, i.e. below a 50% chance that the observation belongs to its assigned cluster. 14 of those observations have a membership degree higher than 42% (including the United States) and only one has approximately 37% (Indonesia).
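As a final check (not in the original analysis), cross-tabulating the hard assignments makes the agreement between the methods explicit - cluster labels are arbitrary, so agreement shows up as one dominant cell per row:
# Cross-tabulate the hard assignments of the three methods
table(kmeans = kmeans_$cluster, hclust = trees)
table(kmeans = kmeans_$cluster, fuzzy = fuzzy_knn$clus[, 1])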
Every algorithm produced similar results, so the analysis shows that the world’s countries can be sensibly divided into 3 groups based on their development level:
1. Highly developed countries - mostly Western Europe and North America.
2. Developing countries - mostly Latin America, the Middle East and North Africa, and Central and Eastern Europe.
3. Underdeveloped countries - almost exclusively Sub-Saharan Africa.