Dataset Description

This dataset contains socio-economic and health indicators for each country. The data helps estimate each country's level of development and compare countries with one another. Possible uses include:

setting a country-specific price for a product
deciding how to distribute humanitarian aid among underdeveloped countries
categorizing countries by development level (as we do in this analysis)

The link to the source of the dataset: Unsupervised Learning on Country Data.

Dataset upload

# Load the dataset. The CSV file must be in the same directory as this Rmd file.
temp <- read.csv("Country-data.csv")

# Move the "country" column into the row names, since it holds character values.
rownames(temp) <- temp$country
temp2 <- temp[, -1]

# Standardize the remaining numeric columns (mean 0, standard deviation 1).
country_data <- scale(temp2)
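
The effect of scale() can be checked on a small example. The sketch below is only an illustration: the data frame `toy` and its values are made up and stand in for the numeric columns of the real dataset.

```r
# Hypothetical toy data (made-up values) standing in for Country-data.csv
toy <- data.frame(
  child_mort = c(90, 17, 5, 120),
  gdpp       = c(500, 4000, 50000, 3500)
)
rownames(toy) <- c("A", "B", "C", "D")

# Center each column, then divide it by its standard deviation
toy_scaled <- scale(toy)

# After scaling, every column has mean 0 and standard deviation 1
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)
```

Without this step, a variable on a large scale such as gdpp would likely dominate the Euclidean distances used for clustering.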

Variables

Summary of Unsupervised Learning on Country Dataset Variables
Variable	Description
country	Name of the country
child_mort	Deaths of children under 5 years of age per 1,000 live births
exports	Exports of goods and services per capita, given as a percentage of GDP per capita
health	Total health spending per capita, given as a percentage of GDP per capita
imports	Imports of goods and services per capita, given as a percentage of GDP per capita
income	Net income per person
inflation	The annual growth rate of the total GDP
life_expec	The average number of years a newborn child would live if current mortality patterns remained the same
total_fer	The number of children that would be born to each woman if current age-fertility rates remained the same
gdpp	The GDP per capita. Calculated as the Total GDP divided by the total population.

Creating Hierarchical Clustering

# Compute the pairwise Euclidean distances between countries
distances <- dist(country_data, method = "euclidean")

# Build the hierarchical clustering with Ward's method
clusterCountries <- hclust(distances, method = "ward.D2")

# Plot the dendrogram
plot(clusterCountries)

# Cut the tree into 3 clusters
clusterGroups <- cutree(clusterCountries, k = 3)

# Plot the cluster assigned to each country
plot(clusterGroups)
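
The cluster sizes and members can also be inspected directly. The sketch below is not part of the original analysis: it uses R's built-in USArrests dataset as a stand-in for country_data, and the names `stand_in`, `tree` and `groups` are illustrative.

```r
# Stand-in data: the built-in USArrests dataset replaces country_data here
stand_in <- scale(USArrests)
tree <- hclust(dist(stand_in, method = "euclidean"), method = "ward.D2")
groups <- cutree(tree, k = 3)

# Number of observations in each cluster
table(groups)

# Names of the observations assigned to cluster 1
names(groups)[groups == 1]
```

With the country data, the same two calls on clusterGroups would list how many countries fall into each cluster and which ones they are.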

# Mean of each standardized variable in cluster 1
colMeans(subset(country_data, clusterGroups == 1))

##  child_mort     exports      health     imports      income   inflation 
##  1.65638682 -0.63911207 -0.11236616 -0.29852849 -0.80687277 -0.06045525 
##  life_expec   total_fer        gdpp 
## -1.49637729  1.64200130 -0.67087483

# Mean of each standardized variable in cluster 2
colMeans(subset(country_data, clusterGroups == 2))

##  child_mort     exports      health     imports      income   inflation 
## -0.16494698 -0.04080723 -0.16819684  0.04937351 -0.30100537  0.12664922 
##  life_expec   total_fer        gdpp 
##  0.04115624 -0.19377211 -0.35773328

# Mean of each standardized variable in cluster 3
colMeans(subset(country_data, clusterGroups == 3))

##  child_mort     exports      health     imports      income   inflation 
## -0.80111954  0.63475270  0.61361032  0.08313756  1.57918040 -0.34683900 
##  life_expec   total_fer        gdpp 
##  1.05998899 -0.69982916  1.64803967

Conclusion

The countries are divided into 3 clusters representing development levels (Underdeveloped, Developing, Developed). To link each cluster to a development level, a few things need to be considered:

For the positive indicators (exports, health, etc.), the higher the value, the more desirable the outcome.
For the negative indicators (child_mort, inflation, etc.), the lower the value, the more desirable the outcome.

Based on these indicators and the cluster means, it is possible to assign a development level to each cluster. Specifically:

cluster 1 is Underdeveloped. Compared with the other groups, it tends to have the lowest values for the positive indicators and the highest values for the negative indicators (for example, income -0.81 and child_mort 1.66).
cluster 2 is Developing. Its values tend to fall between those of the other two groups (for example, income -0.30 and child_mort -0.16).
cluster 3 is Developed. Compared with the other groups, it tends to have the highest values for the positive indicators and the lowest values for the negative indicators (for example, income 1.58 and child_mort -0.80).
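
The comparison is easier when all cluster means sit in one table. The sketch below is a possible approach, not the method used above: it uses the built-in USArrests dataset as a stand-in for country_data, and `cluster_means` is an illustrative name.

```r
# Stand-in data: the built-in USArrests dataset replaces country_data here
stand_in <- scale(USArrests)
groups <- cutree(hclust(dist(stand_in), method = "ward.D2"), k = 3)

# One row per cluster, one column per standardized variable
cluster_means <- aggregate(as.data.frame(stand_in),
                           by = list(cluster = groups),
                           FUN = mean)
cluster_means

# Sort the clusters by one chosen indicator to rank them
cluster_means[order(cluster_means$Murder), ]
```

For the country data, sorting by an indicator such as gdpp or child_mort would give the ordering that the labels above rely on.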

As a result,

Creating K-means Clustering

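# fviz_cluster() comes from the factoextra package (which also attaches ggplot2)
library(factoextra)

# Run k-means with 3 clusters and 25 random starts
country_kmeans <- kmeans(country_data, centers = 3, nstart = 25)

# Plot the k-means clusters
fviz_cluster(country_kmeans,
             data = country_data,
             geom = "point",
             ggtheme = theme_minimal())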
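
One way to see how far k-means agrees with the hierarchical clustering is to cross-tabulate the two assignments. The sketch below assumes the built-in USArrests dataset as a stand-in for country_data. The seed is arbitrary, and the names `hc_groups` and `km` are illustrative.

```r
set.seed(42)  # arbitrary seed so the k-means result is reproducible

# Stand-in data: the built-in USArrests dataset replaces country_data here
stand_in <- scale(USArrests)
hc_groups <- cutree(hclust(dist(stand_in), method = "ward.D2"), k = 3)
km <- kmeans(stand_in, centers = 3, nstart = 25)

# Rows: hierarchical clusters; columns: k-means clusters.
# Cluster numbers are arbitrary labels, so look for one dominant cell per row.
table(hierarchical = hc_groups, kmeans = km$cluster)

# Cluster centers on the standardized scale, comparable to the colMeans() output
km$centers
```

If most observations in each row fall into a single column, the two methods are finding similar groups.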

Unsupervised Learning on Country Data

Diana Gromova

2026-04-24