This dataset contains socio-economic and health indicators for each country. These data help estimate each country's level of development and compare countries with one another. Examples of usage could be:
The link to the source of the dataset: Unsupervised Learning on Country Data.
# Load the dataset. The CSV file must be in the same directory as this Rmd file.
temp <- read.csv("Country-data.csv")
# Keep the country names as row names and drop the "country" column, since it holds character values.
rownames(temp) <- temp$country
temp2 <- temp[, -1]
# Standardize every variable (mean 0, standard deviation 1) so that no single variable dominates the distances.
country_data <- scale(temp2)
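The effect of `scale()` can be checked on a small example. The sketch below uses a made-up data frame (`toy`, with invented values that are not taken from `Country-data.csv`) and confirms that each standardized column ends up with mean 0 and standard deviation 1.

```r
# Toy data frame: names and values are invented purely for illustration
toy <- data.frame(
  country    = c("A", "B", "C", "D"),
  income     = c(1000, 5000, 20000, 40000),
  life_expec = c(55, 68, 75, 82)
)

# Same preparation steps as above: names become row names, then scale the numeric columns
rownames(toy) <- toy$country
toy_scaled <- scale(toy[, -1])

# After scaling, each column has mean 0 and standard deviation 1 (up to rounding error)
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)
```

Without this step, a variable measured in large units (such as income) would contribute far more to a Euclidean distance than one measured in small units (such as fertility).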
| Variable | Description |
|---|---|
| country | Name of the country |
| child_mort | Deaths of children under 5 years of age per 1,000 live births |
| exports | Exports of goods and services per capita, given as a percentage of GDP per capita |
| health | Total health spending per capita, given as a percentage of GDP per capita |
| imports | Imports of goods and services per capita, given as a percentage of GDP per capita |
| income | Net income per person |
| inflation | The measurement of the annual growth rate of the total GDP |
| life_expec | The average number of years a newborn child would live if current mortality patterns remained the same |
| total_fer | The number of children that would be born to each woman if current age-specific fertility rates remained the same |
| gdpp | The GDP per capita. Calculated as the Total GDP divided by the total population. |
# Compute the pairwise Euclidean distances between countries
distances <- dist(country_data, method = "euclidean")
# Build the hierarchical clustering using Ward's linkage
clusterCountries <- hclust(distances, method = "ward.D2")
# Plot the dendrogram
plot(clusterCountries)
# Cut the tree into 3 clusters
clusterGroups <- cutree(clusterCountries, k = 3)
# Plot each country's cluster assignment (observation index vs. cluster number)
plot(clusterGroups)
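A dendrogram is often easier to read when the chosen clusters are outlined on it. The following is a minimal sketch of the same pipeline on `USArrests`, a data set that ships with base R and is used here only as a stand-in so the example runs without `Country-data.csv`; the `demo_*` names are illustrative.

```r
# USArrests ships with base R; it is only a stand-in for country_data
demo_data <- scale(USArrests)
demo_tree <- hclust(dist(demo_data, method = "euclidean"), method = "ward.D2")
demo_groups <- cutree(demo_tree, k = 3)

# Number of observations that fall into each cluster
table(demo_groups)

# Dendrogram with the three clusters outlined by coloured rectangles
plot(demo_tree, cex = 0.6)
rect.hclust(demo_tree, k = 3, border = 2:4)
```

The same two calls, `table(clusterGroups)` and `rect.hclust(clusterCountries, k = 3)`, should apply to the country clustering in the same way.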
# Mean of each standardized variable in cluster 1
colMeans(subset(country_data, clusterGroups == 1))
## child_mort exports health imports income inflation
## 1.65638682 -0.63911207 -0.11236616 -0.29852849 -0.80687277 -0.06045525
## life_expec total_fer gdpp
## -1.49637729 1.64200130 -0.67087483
# Mean of each standardized variable in cluster 2
colMeans(subset(country_data, clusterGroups == 2))
## child_mort exports health imports income inflation
## -0.16494698 -0.04080723 -0.16819684 0.04937351 -0.30100537 0.12664922
## life_expec total_fer gdpp
## 0.04115624 -0.19377211 -0.35773328
# Mean of each standardized variable in cluster 3
colMeans(subset(country_data, clusterGroups == 3))
## child_mort exports health imports income inflation
## -0.80111954 0.63475270 0.61361032 0.08313756 1.57918040 -0.34683900
## life_expec total_fer gdpp
## 1.05998899 -0.69982916 1.64803967
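The three `colMeans()` calls can also be collapsed into a single table with one row per cluster. The sketch below shows one way to do this with `aggregate()`, again on the built-in `USArrests` data as a stand-in; the `demo_*` names are illustrative.

```r
# USArrests ships with base R; it is only a stand-in for country_data
demo_data <- scale(USArrests)
demo_groups <- cutree(hclust(dist(demo_data), method = "ward.D2"), k = 3)

# One row per cluster, one column per standardized variable
demo_means <- aggregate(as.data.frame(demo_data),
                        by = list(cluster = demo_groups),
                        FUN = mean)
demo_means
```

Because the variables are standardized, a positive mean indicates that a cluster sits above the overall average for that variable and a negative mean indicates that it sits below.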
The clusters are divided into 3 groups representing the development level (Underdeveloped, Developing, Developed). In order to link the groups with development levels, a few things need to be considered:
Based on the above indicators and data, it is possible to deduce which group corresponds to which development level. Specifically:
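As a result, cluster 1 (highest child mortality and fertility, lowest income and life expectancy) corresponds to the Underdeveloped group, cluster 3 (the reverse pattern) to the Developed group, and cluster 2 (values close to the overall average) to the Developing group.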
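The link between cluster numbers and development levels can be sketched directly from the cluster means printed above. The example below copies a subset of those means into a matrix and orders the clusters by mean income. Ranking by income is an assumption made for illustration (`gdpp` gives the same order here), and the names `cluster_means` and `development_level` are not part of the original analysis.

```r
# Cluster means copied from the colMeans() output above (standardized units)
cluster_means <- rbind(
  "1" = c(child_mort =  1.65638682, income = -0.80687277,
          life_expec = -1.49637729, total_fer =  1.64200130, gdpp = -0.67087483),
  "2" = c(child_mort = -0.16494698, income = -0.30100537,
          life_expec =  0.04115624, total_fer = -0.19377211, gdpp = -0.35773328),
  "3" = c(child_mort = -0.80111954, income =  1.57918040,
          life_expec =  1.05998899, total_fer = -0.69982916, gdpp =  1.64803967)
)

# Order the clusters from lowest to highest mean income (an illustrative choice)
ord <- order(cluster_means[, "income"])

# Assign the labels in that order
level_names <- c("Underdeveloped", "Developing", "Developed")
development_level <- setNames(level_names, rownames(cluster_means)[ord])
development_level
```

Cluster numbers returned by `cutree()` carry no meaning of their own, so a ranking step like this is needed whenever the clustering is rerun on different data.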
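# fviz_cluster() comes from the factoextra package
library(factoextra)
# Run k-means with 3 clusters, keeping the best of 25 random starts
country_kmeans <- kmeans(country_data, centers = 3, nstart = 25)
# Plot the k-means clusters
fviz_cluster(country_kmeans,
             data = country_data,
             geom = "point",
             ggtheme = theme_minimal())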
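To see how closely k-means agrees with the hierarchical clustering, the two assignments can be cross-tabulated. This is a minimal sketch on the built-in `USArrests` data, used as a stand-in so it runs without `Country-data.csv`; the seed value and the `demo_*` names are arbitrary choices for illustration.

```r
# k-means starts from random centers, so fix the seed for reproducibility
set.seed(42)

# USArrests ships with base R; it is only a stand-in for country_data
demo_data <- scale(USArrests)

# Cluster the same data with both methods
demo_hc <- cutree(hclust(dist(demo_data), method = "ward.D2"), k = 3)
demo_km <- kmeans(demo_data, centers = 3, nstart = 25)

# Rows: hierarchical clusters, columns: k-means clusters
agreement <- table(hierarchical = demo_hc, kmeans = demo_km$cluster)
agreement
```

Cluster numbers are arbitrary in both methods, so good agreement shows up as one dominant cell in each row rather than as large counts on the diagonal.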