Introduction

The categorization of countries based on social class, economy and health is what determines the overall development of a country. HELP international is an NGO committed to fighting poverty, providing facilities and aid in underdeveloped countries after natural disasters. The organization raised $10 million US dollars. The CEO of the NGO wants the funds to be used strategically and effectively. The CEO will decide which countries will receive the aid. A data scientist’s job is to categorize countries based on social, economic and overall health factors. Then suggest which countries should get the aid. Some of the libraries we need include:

library(dplyr)
library(tidyverse)
library(factoextra)
library(FactoMineR)
library(plotly)

Data Preprocesing

Import Data

country <- read.csv("Country-data-f.csv")

Data Inspection

We use head to see the top data

head(country)

We first separate the data according to the name

country_separate <- separate(country, col = "country.child_mort.exports.health.imports.income.inflation.life_expec.total_fer.gdpp", into = c("country", "child_mort", "exports", "health", "imports", "income", "inflation", "life_expec", "total_fer", "gdpp"), sep = ";" )
country_separate

Change data types

country_separate$country <- as.factor(country_separate$country)

# meng-assign nilai dari kolom country menjadi rownames
rownames(country_separate) <- country_separate$country

country_clean <- country_separate %>% 
  mutate_at(vars(child_mort,exports,health,imports,income,inflation,life_expec,total_fer,gdpp), as.numeric) %>% 
  select(-country)

glimpse(country_clean)

#> Rows: 167
#> Columns: 9
#> $ child_mort <dbl> 90.2, 16.6, 27.3, 119.0, 10.3, 14.5, 18.1, 4.8, 4.3, 39.2, …
#> $ exports    <dbl> 10.0, 28.0, 38.4, 62.3, 45.5, 18.9, 20.8, 19.8, 51.3, 54.3,…
#> $ health     <dbl> 7.58, 6.55, 4.17, 2.85, 6.03, 8.10, 4.40, 8.73, 11.00, 5.88…
#> $ imports    <dbl> 44.9, 48.6, 31.4, 42.9, 58.9, 16.0, 45.3, 20.9, 47.8, 20.7,…
#> $ income     <dbl> 1610, 9930, 12900, 5900, 19100, 18700, 6700, 41400, 43200, …
#> $ inflation  <dbl> 9.440, 4.490, 16.100, 22.400, 1.440, 20.900, 7.770, 1.160, …
#> $ life_expec <dbl> 56.2, 76.3, 76.5, 60.1, 76.8, 75.8, 73.3, 82.0, 80.5, 69.1,…
#> $ total_fer  <dbl> 5.82, 1.65, 2.89, 6.16, 2.13, 2.37, 1.69, 1.93, 1.44, 1.92,…
#> $ gdpp       <dbl> 553, 4090, 4460, 3530, 12200, 10300, 3220, 51900, 46900, 58…

From the glimps function above, we can see that the data has 167 rows and 10 columns. Here is the explanation of the variables:

country : Name of the country
child_mort : Death of children under 5 years of age per 1000 live births
exports : Exports of goods and services per capita. Given as %age of the GDP per capita
health : Total health spending per capita. Given as %age of GDP per capita
imports : Imports of goods and services per capita. Given as %age of the GDP per capita
income : Net income per person
inflation : The measurement of the annual growth rate of the Total GDP
life_expec : The average number of years a new born child would live if the current mortality patterns are to remain the same
total_fer : The number of children that would be born to each woman if the current age-fertility rates remain the same
gdpp : The GDP per capita. Calculated as the Total GDP divided by the total population.

Check Missing values

anyNA(country_clean)

#> [1] FALSE

No missing values

Exploratory Data Analysis

By using the summary we will have the following information

summary(country_clean)

#>    child_mort        exports            health          imports        
#>  Min.   :  2.60   Min.   :  0.109   Min.   : 1.810   Min.   :  0.0659  
#>  1st Qu.:  8.25   1st Qu.: 23.800   1st Qu.: 4.920   1st Qu.: 30.2000  
#>  Median : 19.30   Median : 35.000   Median : 6.320   Median : 43.3000  
#>  Mean   : 38.27   Mean   : 41.109   Mean   : 6.816   Mean   : 46.8902  
#>  3rd Qu.: 62.10   3rd Qu.: 51.350   3rd Qu.: 8.600   3rd Qu.: 58.7500  
#>  Max.   :208.00   Max.   :200.000   Max.   :17.900   Max.   :174.0000  
#>      income         inflation         life_expec      total_fer    
#>  Min.   :   609   Min.   : -4.210   Min.   :32.10   Min.   :1.150  
#>  1st Qu.:  3355   1st Qu.:  1.810   1st Qu.:65.30   1st Qu.:1.795  
#>  Median :  9960   Median :  5.390   Median :73.10   Median :2.410  
#>  Mean   : 17145   Mean   :  7.782   Mean   :70.56   Mean   :2.948  
#>  3rd Qu.: 22800   3rd Qu.: 10.750   3rd Qu.:76.80   3rd Qu.:3.880  
#>  Max.   :125000   Max.   :104.000   Max.   :82.80   Max.   :7.490  
#>       gdpp       
#>  Min.   :   231  
#>  1st Qu.:  1330  
#>  Median :  4660  
#>  Mean   : 12964  
#>  3rd Qu.: 14050  
#>  Max.   :105000

From the data above, each dimension has a different scale.

Data Preprocessing

Because our data has different scales, we will scale first

country_scale <- scale(country_clean)
summary(country_scale)

#>    child_mort         exports            health           imports       
#>  Min.   :-0.8845   Min.   :-1.4957   Min.   :-1.8223   Min.   :-1.9341  
#>  1st Qu.:-0.7444   1st Qu.:-0.6314   1st Qu.:-0.6901   1st Qu.:-0.6894  
#>  Median :-0.4704   Median :-0.2229   Median :-0.1805   Median :-0.1483  
#>  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
#>  3rd Qu.: 0.5909   3rd Qu.: 0.3736   3rd Qu.: 0.6496   3rd Qu.: 0.4899  
#>  Max.   : 4.2086   Max.   : 5.7964   Max.   : 4.0353   Max.   : 5.2504  
#>      income          inflation         life_expec        total_fer      
#>  Min.   :-0.8577   Min.   :-1.1344   Min.   :-4.3242   Min.   :-1.1877  
#>  1st Qu.:-0.7153   1st Qu.:-0.5649   1st Qu.:-0.5910   1st Qu.:-0.7616  
#>  Median :-0.3727   Median :-0.2263   Median : 0.2861   Median :-0.3554  
#>  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
#>  3rd Qu.: 0.2934   3rd Qu.: 0.2808   3rd Qu.: 0.7021   3rd Qu.: 0.6157  
#>  Max.   : 5.5947   Max.   : 9.1023   Max.   : 1.3768   Max.   : 3.0003  
#>       gdpp         
#>  Min.   :-0.69471  
#>  1st Qu.:-0.63475  
#>  Median :-0.45307  
#>  Mean   : 0.00000  
#>  3rd Qu.: 0.05924  
#>  Max.   : 5.02140

Clustering

K Optimum

With the factoextra library, we will use the elbow method to determine it by noting that in terms of business, it does not determine what the optimal K is

fviz_nbclust(
  x = country_scale, # data
  FUNcluster = kmeans, 
  method = "wss" # method
)

From the plot above we can see that K starts to slope at number 6, so we can conclude that the optimum K is 6.

From the country_scale data we will create a clustering with an optimum K of 6

country_cluster <- kmeans(x = country_scale,
                       centers = 6)

Number of observations in each cluster

country_cluster$size

#> [1] 37 48 31 21 27  3

From the data above, we can see that there is 1 cluster that is only filled with 3 countries, namely cluster 6

Location of cluster center/centroid

country_cluster$centers

#>   child_mort    exports       health     imports     income   inflation
#> 1 -0.2509045 -0.3371408 -0.606008744 -0.72003193 -0.1908814  0.42283666
#> 2 -0.5832573  0.3691122  0.038430150  0.54839641 -0.1135088 -0.37881810
#> 3  0.6551993 -0.6036538  0.223878019  0.07169025 -0.7068393  0.03910137
#> 4  1.9864373 -0.2369286 -0.540872412 -0.26533075 -0.6882963  0.72971904
#> 5 -0.8224879  0.1364063  0.926673853 -0.36798367  1.5400218 -0.46258692
#> 6 -0.8464575  4.9208731 -0.008138555  4.53442030  2.4322274 -0.50269428
#>   life_expec  total_fer       gdpp
#> 1  0.2222766 -0.3291179 -0.3547909
#> 2  0.4130672 -0.6588108 -0.2150469
#> 3 -0.9446404  1.0175549 -0.6054523
#> 4 -1.5446278  1.6738414 -0.6070734
#> 5  1.1111163 -0.7328648  1.7654312
#> 6  1.2231457 -1.0357477  2.4334786

To see which countries are included in each cluster

country_cluster$cluster

#>                    Afghanistan                        Albania 
#>                              3                              2 
#>                        Algeria                         Angola 
#>                              1                              4 
#>            Antigua and Barbuda                      Argentina 
#>                              2                              1 
#>                        Armenia                      Australia 
#>                              1                              5 
#>                        Austria                     Azerbaijan 
#>                              5                              1 
#>                        Bahamas                        Bahrain 
#>                              2                              2 
#>                     Bangladesh                       Barbados 
#>                              1                              2 
#>                        Belarus                        Belgium 
#>                              2                              5 
#>                         Belize                          Benin 
#>                              2                              4 
#>                         Bhutan                        Bolivia 
#>                              2                              1 
#>       Bosnia   and Herzegovina                       Botswana 
#>                              2                              3 
#>                         Brazil                         Brunei 
#>                              1                              5 
#>                       Bulgaria                   Burkina Faso 
#>                              2                              4 
#>                        Burundi                       Cambodia 
#>                              3                              2 
#>                       Cameroon                         Canada 
#>                              4                              5 
#>                     Cape Verde       Central African Republic 
#>                              2                              4 
#>                           Chad                          Chile 
#>                              4                              1 
#>                          China                       Colombia 
#>                              1                              1 
#>                        Comoros                Congo Dem. Rep. 
#>                              3                              4 
#>                     Congo Rep.                     Costa Rica 
#>                              4                              2 
#>                  Cote d'Ivoire                        Croatia 
#>                              4                              2 
#>                         Cyprus                 Czech Republic 
#>                              2                              2 
#>                        Denmark             Dominican Republic 
#>                              5                              1 
#>                        Ecuador                          Egypt 
#>                              1                              1 
#>                    El Salvador              Equatorial Guinea 
#>                              2                              4 
#>                        Eritrea                        Estonia 
#>                              3                              2 
#>                           Fiji                        Finland 
#>                              2                              5 
#>                         France                          Gabon 
#>                              5                              1 
#>                         Gambia                        Georgia 
#>                              3                              2 
#>                        Germany                          Ghana 
#>                              5                              3 
#>                         Greece                        Grenada 
#>                              5                              2 
#>                      Guatemala                         Guinea 
#>                              1                              4 
#>                  Guinea-Bissau                         Guyana 
#>                              3                              2 
#>                          Haiti                        Hungary 
#>                              4                              2 
#>                        Iceland                          India 
#>                              5                              1 
#>                      Indonesia                           Iran 
#>                              1                              1 
#>                           Iraq                        Ireland 
#>                              3                              5 
#>                         Israel                          Italy 
#>                              5                              5 
#>                        Jamaica                          Japan 
#>                              1                              5 
#>                         Jordan                     Kazakhstan 
#>                              2                              1 
#>                          Kenya                       Kiribati 
#>                              3                              3 
#>                         Kuwait                Kyrgyz Republic 
#>                              5                              2 
#>                            Lao                         Latvia 
#>                              3                              2 
#>                        Lebanon                        Lesotho 
#>                              2                              3 
#>                        Liberia                          Libya 
#>                              3                              1 
#>                      Lithuania                     Luxembourg 
#>                              2                              6 
#>                  Macedonia FYR                     Madagascar 
#>                              2                              3 
#>                         Malawi                       Malaysia 
#>                              4                              2 
#>                       Maldives                           Mali 
#>                              2                              4 
#>                          Malta                     Mauritania 
#>                              6                              4 
#>                      Mauritius           Micronesia Fed. Sts. 
#>                              2                              3 
#>                        Moldova                       Mongolia 
#>                              2                              1 
#>                     Montenegro                        Morocco 
#>                              2                              1 
#>                     Mozambique                        Myanmar 
#>                              4                              1 
#>                        Namibia                          Nepal 
#>                              3                              1 
#>                    Netherlands                    New Zealand 
#>                              5                              5 
#>                          Niger                        Nigeria 
#>                              4                              4 
#>                         Norway                           Oman 
#>                              5                              1 
#>                       Pakistan                         Panama 
#>                              4                              2 
#>                       Paraguay                           Peru 
#>                              2                              1 
#>                    Philippines                         Poland 
#>                              1                              2 
#>                       Portugal                          Qatar 
#>                              5                              5 
#>                        Romania                         Russia 
#>                              1                              1 
#>                         Rwanda                          Samoa 
#>                              3                              3 
#>                   Saudi Arabia                        Senegal 
#>                              1                              3 
#>                         Serbia                     Seychelles 
#>                              2                              2 
#>                   Sierra Leone                      Singapore 
#>                              4                              6 
#>                Slovak Republic                       Slovenia 
#>                              2                              2 
#>                Solomon Islands                   South Africa 
#>                              3                              3 
#>                    South Korea                          Spain 
#>                              2                              5 
#>                      Sri Lanka St. Vincent and the Grenadines 
#>                              1                              2 
#>                          Sudan                       Suriname 
#>                              3                              2 
#>                         Sweden                    Switzerland 
#>                              5                              5 
#>                     Tajikistan                       Tanzania 
#>                              3                              3 
#>                       Thailand                    Timor-Leste 
#>                              2                              3 
#>                           Togo                          Tonga 
#>                              3                              3 
#>                        Tunisia                         Turkey 
#>                              2                              1 
#>                   Turkmenistan                         Uganda 
#>                              1                              3 
#>                        Ukraine           United Arab Emirates 
#>                              2                              5 
#>                 United Kingdom                  United States 
#>                              5                              5 
#>                        Uruguay                     Uzbekistan 
#>                              1                              1 
#>                        Vanuatu                      Venezuela 
#>                              3                              1 
#>                        Vietnam                          Yemen 
#>                              2                              3 
#>                         Zambia 
#>                              4

Interpretation Clustring

Create a new column containing the label information of the cluster formed using k optimum

country_clean$cluster <- as.factor(country_cluster$cluster)

country_clean %>% head()

Profiling Cluster

Melakukan grouping berdasarkan cluster yang terbentuk, untuk mengetahui karakteristik dari masing-masing cluster

country_centroid <- country_clean %>% 
  group_by(cluster) %>% 
  summarise_all(mean)
country_centroid

Perform grouping based on the clusters formed, to determine the characteristics of each cluster

country_centroid %>% 
  pivot_longer(-cluster) %>% 
  group_by(name) %>% 
  summarize(
    group_min = which.min(value),
    group_max = which.max(value))

The explanation of the above plot is as follows:

Cluster 1 belongs to the minimum group in the health and imports variables
Cluster 2 is not included in all groups both minimum and maximum in all variables
Cluster 3 belongs to the minimum group in the exports and income variables.
Cluster 4 belongs to the minimum group with the variables gdpp and life_expec, and belongs to the maximum group with the variables child_mort, inflation and total_fer.
Cluster 5 Included in the maximum group with health variables
Cluster 6 belongs to the minimum group with child_mort, inflation and total_fer variables, and belongs to the maximum group with exports, gdpp, imports, income and life_expec variables.

Goodness of Fit Clustering

The goodness of clustering results can be seen from 3 values

Within Sum of Squares

the sum of the squared distances from each observation to the centroid of each cluster. From our case, we can find the WSS value below:

country_cluster$withinss

#> [1]  93.42705  96.29759 106.26163 151.18155 121.75399  20.87409

Between Sum of Squares

sum of the weighted squared distances from each centroid to the global mean

country_cluster$betweenss

#> [1] 904.2041

Total Sum of Squares

sum of squared distances from each observation to the global mean

country_cluster$totss

#> [1] 1494

The ratio between BSS and TSS is as follows

country_cluster$betweenss/country_cluster$totss

#> [1] 0.6052236

The ratio is quite good because it is close to 1

Visualisasi Clustering

we visualize it on a 2 dimensional plot, where the object is country_cluster and the data is country_clean

# visualisasi 2 dimensi
fviz_cluster(object = country_cluster,
             data = country_clean %>% select(-cluster))

PCA

create a new axis that can capture as much information (variance) as possible from the initial variables. This new axis is called the Principal Component (PC).

We want to create a visualization that simplifies cluster profiling, where the individual views and variables factor map together. Visualizations can be created using the fviz_pca_biplot() function from the factoextra package

Model PCA

# buat model PCA
country_pca <- PCA(X = country_clean, # data untuk di PCA
               scale.unit = T,
               quali.sup = 10, # quali.sup -> indeks dari kolom kategori
               graph = F)
summary(country_pca)

#> 
#> Call:
#> PCA(X = country_clean, scale.unit = T, quali.sup = 10, graph = F) 
#> 
#> 
#> Eigenvalues
#>                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7
#> Variance               4.136   1.546   1.170   0.995   0.661   0.224   0.113
#> % of var.             45.952  17.182  13.004  11.053   7.340   2.484   1.260
#> Cumulative % of var.  45.952  63.133  76.138  87.191  94.531  97.015  98.276
#>                        Dim.8   Dim.9
#> Variance               0.088   0.067
#> % of var.              0.981   0.743
#> Cumulative % of var.  99.257 100.000
#> 
#> Individuals (the 10 first)
#>                         Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2  
#> Afghanistan         |  3.230 | -2.913  1.229  0.814 |  0.096  0.004  0.001 |
#> Albania             |  1.473 |  0.430  0.027  0.085 | -0.588  0.134  0.160 |
#> Algeria             |  1.664 | -0.285  0.012  0.029 | -0.455  0.080  0.075 |
#> Angola              |  3.900 | -2.932  1.245  0.565 |  1.696  1.113  0.189 |
#> Antigua and Barbuda |  1.415 |  1.034  0.155  0.533 |  0.137  0.007  0.009 |
#> Argentina           |  2.223 |  0.022  0.000  0.000 | -1.779  1.226  0.641 |
#> Armenia             |  1.719 | -0.102  0.001  0.003 | -0.568  0.125  0.109 |
#> Australia           |  3.405 |  2.342  0.794  0.473 | -1.988  1.531  0.341 |
#> Austria             |  3.341 |  2.974  1.280  0.792 | -0.735  0.209  0.048 |
#> Azerbaijan          |  1.581 | -0.181  0.005  0.013 | -0.403  0.063  0.065 |
#>                      Dim.3    ctr   cos2  
#> Afghanistan         -0.718  0.264  0.049 |
#> Albania             -0.333  0.057  0.051 |
#> Algeria              1.222  0.763  0.539 |
#> Angola               1.525  1.190  0.153 |
#> Antigua and Barbuda -0.226  0.026  0.025 |
#> Argentina            0.870  0.387  0.153 |
#> Armenia              0.242  0.030  0.020 |
#> Australia            0.190  0.019  0.003 |
#> Austria             -0.520  0.138  0.024 |
#> Azerbaijan           0.867  0.385  0.301 |
#> 
#> Variables
#>                        Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
#> child_mort          | -0.853 17.600  0.728 |  0.240  3.720  0.058 | -0.032
#> exports             |  0.577  8.060  0.333 |  0.762 37.597  0.581 |  0.157
#> health              |  0.307  2.275  0.094 | -0.302  5.909  0.091 | -0.645
#> imports             |  0.328  2.608  0.108 |  0.835 45.134  0.698 | -0.324
#> income              |  0.810 15.876  0.657 |  0.028  0.051  0.001 |  0.326
#> inflation           | -0.393  3.732  0.154 | -0.010  0.007  0.000 |  0.695
#> life_expec          |  0.866 18.134  0.750 | -0.277  4.960  0.077 |  0.123
#> total_fer           | -0.821 16.300  0.674 |  0.193  2.410  0.037 |  0.021
#> gdpp                |  0.798 15.417  0.638 | -0.057  0.212  0.003 |  0.133
#>                        ctr   cos2  
#> child_mort           0.087  0.001 |
#> exports              2.096  0.025 |
#> health              35.597  0.417 |
#> imports              8.996  0.105 |
#> income               9.093  0.106 |
#> inflation           41.283  0.483 |
#> life_expec           1.298  0.015 |
#> total_fer            0.038  0.000 |
#> gdpp                 1.512  0.018 |
#> 
#> Supplementary categories
#>                         Dist    Dim.1   cos2 v.test    Dim.2   cos2 v.test  
#> cluster_1           |  1.253 | -0.268  0.046 -0.907 | -0.686  0.299 -3.790 |
#> cluster_2           |  1.263 |  0.832  0.434  3.347 |  0.290  0.053  1.909 |
#> cluster_3           |  1.915 | -1.746  0.832 -5.282 |  0.130  0.005  0.645 |
#> cluster_4           |  3.316 | -3.021  0.830 -7.259 |  0.804  0.059  3.158 |
#> cluster_5           |  3.036 |  2.637  0.754  7.337 | -0.954  0.099 -4.342 |
#> cluster_6           |  7.779 |  5.460  0.493  4.679 |  5.432  0.488  7.613 |
#>                      Dim.3   cos2 v.test  
#> cluster_1            0.728  0.337  4.623 |
#> cluster_2           -0.388  0.094 -2.933 |
#> cluster_3           -0.614  0.103 -3.490 |
#> cluster_4            0.354  0.011  1.598 |
#> cluster_5            0.098  0.001  0.514 |
#> cluster_6            0.212  0.001  0.341 |

From the data above we can explore again by looking at the proportion of dimensions with the plot below:

fviz_eig(country_pca, ncp = 9, addlabels = T, main = "Variance by each dimensions")

Dimensions 1 and 2 have a variance of about 63%, from these dimensions we will make a visualization

Visualisasi PCA

Individual Factor Map

The goal is to display the distribution of data

plot.PCA(
  x = country_pca,           
  choix = "ind",
  select = "contrib 5"
)

With the visual plot above, the countries of Singapore, Malta, Luxembourg, Haiti and Nigeria are outliers

Variable Factor Map

plot.PCA(x = country_pca,
         choix = "var")

The insights we can take from the plot above are:

PC1/ Dim 1 summarizes the variables: life_expec, child_mort, total_fer, income and gdpp.
PC1 / Dim 2 summarizes variables: imports and exports
Highly positively correlated variable pairs: imports-exports, income-gdpp, gdpp-life_expec
Highly negatively correlated variable pairs: child_mort-total_fer

Dimension Description

We will look at the variable contributions for dimension 1

fviz_contrib(X = country_pca,
             choice = "var",
             axes = 1)

From the plot above, we can conclude that the variables life_expec, child_mort, total_fer, income and gdpp have a contribution in dimension 1.

We will look at the variable contributions for dimension 2

fviz_contrib(X = country_pca,
             choice = "var",
             axes = 2)

From the plot above, we can conclude that the imports and exports variables have a contribution in dimension 2

Visualisasi PCA, Biplot & Cluster

# visualisasi biplot + cluster
fviz_pca_biplot(X = country_pca,
                habillage = "cluster",
                geom.ind = "point",
                addEllipses = TRUE)

By using PCA - Biplot we can draw conclusions: - child_mort is strongly positively correlated with total_fer - child_mort, total_fer and inflation, strongly negatively correlated with health, life_expec, gdpp and income

Case: Country Recomender

Which countries will receive assistance from HELP International?

From the case above, we will categorize the data with low social, economic and health class filtration. Below are the countries that are nominated to receive assistance from HELP International.

By looking at the correlation in the PCA - Biplot plot as follows

# variabel child_mort
country_clean %>% 
  filter(cluster == "4") %>% arrange(-child_mort) %>% head()

# variabel total_fer
country_clean %>%
  filter(cluster == "4") %>% arrange(-total_fer)%>% head()

# variabel inflation
country_clean %>%
  filter(cluster == "4")%>% arrange(-inflation) %>% head()

From the child_mort, total_fer and inflation variables, our recommended country is in cluster 4.

We will look for countries with strong negative correlations to child_mort, total_fer and inflation, namely health, life_expec, gdpp and income

# variabel health
country_clean %>% 
  filter(cluster == "1") %>% arrange(health) %>% head()

# variabel life_expec
country_clean %>% 
  filter(cluster == "4") %>% arrange(life_expec) %>% head()

# variabel gdpp
country_clean %>% 
  filter(cluster == "4") %>% arrange(gdpp) %>% head()

# variabel life_expec
country_clean %>% 
  filter(cluster == "4") %>% arrange(income) %>% head()

From the life_expec, gdpp and income variables, our recommendation is in cluster 4. From the health variable, our recommendation is in Cluster 1.

Conclusion

From the PCA-biplot we can conclude that the variable child_mort is strongly positively correlated with total_fer, which means that the number of deaths of children under 5 years per 1000 births is strongly correlated with the number of children born. The variables child_mort, total_fer and inflation are strongly negatively correlated with health, life_expec, gdpp and income, which means that if child mortality under 5 years old, the number of child births and inflation increase, health, the number of live births, gdpp and income per person will decrease. From the above recommendations, cluster no. 4 is a priority that receives assistance from the HELP International organization. As for cluster 1 with the value of health, we have an assumption that maybe the country uses subsidies, so the total health expenditure per capita is low, therefore cluster 1 is not recommended.

Reference

https://www.kaggle.com/datasets/rohan0301/unsupervised-learning-on-country-data

LBB Unsupervised Learning: Country Data

Muh Amri Sidiq

2023-07-24