The United Nations reports both the Human Development Index (HDI) and the Inequality-adjusted Human Development Index (IHDI). Although the data source is the same, these indexes represent different things. The HDI represents the national average of human development achievements in three basic dimensions: i) life expectancy (health), ii) education, and iii) income. Like all averages, it conceals disparities in human development across the population within the same country. For example, two countries with the same HDI average may have widely different improvements across the three dimensions [1]. In turn, the IHDI accounts for the distribution of a country’s achievements in the same three dimensions among its population. Access to data source and technical notes.
Naively interpreted, the HDI tells us the average development of a country regardless of how such development is distributed among its citizens, whereas the IHDI tells us how large the inequality gap is between those enjoying the highest development and those with the lowest achievements in a given country.
For this project, there are three kinds of pre-processed inequality datasets available: Adjusted index, Percentage, and Coefficients. The example shown below uses the Adjusted index and Percentage datasets, specifically those about life expectancy inequality. Each dataset has entries spanning multiple years, with one column per year.
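As a minimal sketch of how a wide table with one column per year could be reshaped into a long layout with `variable` (year) and `value` columns, the base R snippet below works on a toy data frame. The year column names (`X2010`, `X2011`) and all values are assumptions made for illustration; they are placeholders, not real IHDI figures.

```r
# Toy wide table: one row per country, one column per year.
# Column names and values are illustrative placeholders, not real IHDI data.
wide <- data.frame(
  Country = c("Albania", "Angola"),
  ISO3    = c("ALB", "AGO"),
  X2010   = c(10, 30),
  X2011   = c(11, NA),
  stringsAsFactors = FALSE
)

# Identify the year columns (assumed to be named "X" followed by four digits).
year_cols <- grep("^X[0-9]{4}$", names(wide), value = TRUE)

# Build one block of rows per year column and stack the blocks into a long table.
long <- do.call(rbind, lapply(year_cols, function(col) {
  data.frame(
    Country  = wide$Country,
    ISO3     = wide$ISO3,
    variable = as.integer(sub("^X", "", col)),  # year as an integer
    value    = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

print(long)
```

Packages such as tidyr or reshape2 offer shorter ways to do the same thing; base R is used here only to keep the sketch free of dependencies.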
In this visual review we will explore the IHDI dataset to gain insights on global, regional and country-level inequality. In order to make a regional exploration, the variable Continent was added, so each country has a reference to the region to which it belongs. The regions are shown in this figure:
The usual strategy in visual analytics is to visualize the dataset at a broad level of reading and dig into the details as patterns or anomalies emerge. In this case we are using the coefficient of human inequality dataset, which is a subset of the larger IHDI dataset. A quick exploration of the dataset (see the structure printed below) shows 1670 observations (also called datapoints) of 6 variables, covering the years 2010 to 2019. The value column presents the percentage of inequality for each country over the years. A country with a low percentage has its achievements more evenly distributed than one with a higher percentage. Zero percent means no inequality, whereas 100% means total inequality. As a reference, it is worth taking a look at the analysis of income inequality in the United States made by the think tank Economic Policy Institute: http://inequality.is/.
## 'data.frame': 1670 obs. of 6 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ HDI.Rank: int 169 69 91 148 46 81 8 18 88 58 ...
## $ Country : chr "Afghanistan" "Albania" "Algeria" "Angola" ...
## $ ISO3 : chr "AFG" "ALB" "DZA" "AGO" ...
## $ variable: int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## $ value : num NA 12.7 NA 38.8 19 10.9 7.7 7.3 13.4 14 ...
In particular, income inequality over the same period has this distribution:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 3.60 10.93 19.40 20.41 29.50 45.90 216
The lowest inequality is 3.6% and the highest is 45.9%. The median is 19.4%, meaning that half of the observations have an inequality index above 19.4%. Let’s visualize that in a chart.
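A minimal sketch of how such a summary and chart could be produced in base R is shown here. The ten values are the first entries of the `value` column in the structure output printed earlier and serve only as a stand-in for the full column; the choice of a histogram with a median line is an assumption, not necessarily the chart used in the original analysis.

```r
# First ten entries of the `value` column shown in the structure output;
# they stand in for the full column, which is not reproduced here.
value <- c(NA, 12.7, NA, 38.8, 19, 10.9, 7.7, 7.3, 13.4, 14)

# summary() reports minimum, quartiles, mean, maximum and the NA count.
print(summary(value))

# The median splits the non-missing observations in half.
med <- median(value, na.rm = TRUE)
share_above <- mean(value > med, na.rm = TRUE)

# Histogram of the values with the median marked by a dashed vertical line.
hist(value, breaks = 10, main = "Distribution of inequality", xlab = "Inequality (%)")
abline(v = med, lty = 2)
```

Missing values are dropped explicitly with `na.rm = TRUE`, since `median()` would otherwise return `NA` for a column that contains them.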
Not bad! Half of the world is in an acceptable condition. But a look at the same dataset split by region shows a concerning situation: IHDI is also unequally distributed!
First we need to mark each country with its corresponding world region. We use the dplyr library to merge two datasets: IHDI and CountryCodeRegions.csv.
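The merge could look roughly like the sketch below, which uses `dplyr::left_join()` on two toy data frames. The column layout of CountryCodeRegions.csv is not shown in this review, so the `ISO3` key and the `Continent` column in the `regions` table are assumptions, and all values are placeholders.

```r
library(dplyr)

# Toy stand-in for the long IHDI table (placeholder values, not real data).
ihdi <- data.frame(
  Country = c("Albania", "Angola", "Haiti"),
  ISO3    = c("ALB", "AGO", "HTI"),
  value   = c(10, 30, 40),
  stringsAsFactors = FALSE
)

# Toy stand-in for CountryCodeRegions.csv; its column names are assumed.
# With the real file this table would come from read.csv("CountryCodeRegions.csv").
regions <- data.frame(
  ISO3      = c("ALB", "AGO", "HTI"),
  Continent = c("Europe", "Africa", "Americas"),
  stringsAsFactors = FALSE
)

# Keep every IHDI row and attach the matching region by ISO3 code.
ihdi_regions <- left_join(ihdi, regions, by = "ISO3")

print(ihdi_regions)
```

A left join keeps countries that have no match in the region table (their `Continent` becomes `NA`), which makes missing codes easy to spot before plotting.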
The same data plotted as a combination of scatterplot and boxplot:
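One way to combine a boxplot with the individual points, sketched here with ggplot2, is to layer `geom_jitter()` over `geom_boxplot()`. The data frame is a toy example with placeholder values, and the styling choices (jitter width, transparency) are assumptions rather than the settings of the original chart.

```r
library(ggplot2)

# Toy data: placeholder values per region, not real IHDI figures.
df <- data.frame(
  Continent = rep(c("Africa", "Americas", "Europe"), each = 4),
  value     = c(30, 35, 28, 40, 20, 25, 18, 33, 8, 10, 7, 12)
)

# Boxplot per region with the individual observations jittered on top.
p <- ggplot(df, aes(x = Continent, y = value)) +
  geom_boxplot(outlier.shape = NA) +        # outliers hidden: the points layer already shows them
  geom_jitter(width = 0.15, alpha = 0.6) +  # horizontal jitter only separates overlapping points
  labs(x = "Region", y = "Inequality (%)")

print(p)
```

Showing the raw points alongside the boxes makes differences in the number of observations per region visible, which a boxplot alone would hide.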
Notice that Oceania has fewer datapoints than any other region.
A heatmap is a matrix of colored tiles displaying a numerical value at each intersection of two sets of categorical variables. A heatmap is symmetrical when the categorical variables on the x and y axes are the same and the interaction between the two readings is bidirectional, meaning that the interaction of variable A with B also applies from B to A. If the variables on the two axes differ from each other, the heatmap is asymmetrical. Non-directed networks are represented as symmetrical heatmaps, whereas directed networks are represented as asymmetrical ones.
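The link between network direction and symmetry can be checked on a small adjacency matrix. The following base R sketch uses two hypothetical three-node networks made up for illustration.

```r
nodes <- c("A", "B", "C")

# Undirected network: an edge A-B implies B-A, so the matrix equals its transpose.
undirected <- matrix(
  c(0, 1, 1,
    1, 0, 0,
    1, 0, 0),
  nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes)
)

# Directed network: an edge A -> B does not imply B -> A.
directed <- matrix(
  c(0, 1, 1,
    0, 0, 0,
    0, 1, 0),
  nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes)
)

print(isSymmetric(undirected))  # TRUE
print(isSymmetric(directed))    # FALSE
```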
The library to be used is ggplot2 and the geometry is the tile (geom_tile). The idea is to create a grid with the two categorical variables and map the fill of each tile to the numerical value.
Since we want to visualize each region separately, we need to subset the coefficient dataset by region.
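Both steps, the regional subset and the tile grid, might be sketched as follows. The data frame is a toy long table with placeholder values; the name `coef_df` and the presence of a `Continent` column are assumptions that follow the variable names used in this review.

```r
library(ggplot2)

# Toy long table with a region column (placeholder values, not real data).
coef_df <- data.frame(
  Country   = rep(c("Canada", "Haiti", "Albania"), each = 2),
  Continent = rep(c("Americas", "Americas", "Europe"), each = 2),
  variable  = rep(c(2010L, 2011L), times = 3),
  value     = c(8, 9, 40, 39, 12, 11)
)

# Keep only the rows that belong to one region.
americas <- subset(coef_df, Continent == "Americas")

# One tile per (year, country) pair, with the fill mapped to the inequality value.
p <- ggplot(americas, aes(x = factor(variable), y = Country, fill = value)) +
  geom_tile(colour = "white") +
  labs(x = "Year", y = NULL, fill = "Inequality (%)")

print(p)
```

The year is converted with `factor()` so that it is treated as a categorical axis, matching the idea of a grid built from two categorical variables.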
This chart presents the temporal evolution of IHDI in the Americas.
Let’s focus on the Americas and disaggregate the IHDI coefficient. Remember that the coefficient of human inequality is the average of the inequality in life expectancy, education and income. We need to import each dataset, extract the data for the Americas and bind them together in a single dataframe.
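A rough sketch of the bind-and-filter step with dplyr is given below. The three toy data frames stand in for the imported datasets; their names, the `dimension` label column and all values are assumptions for illustration, and each table is assumed to already carry a `Continent` column.

```r
library(dplyr)

# Toy stand-ins for the three imported datasets (placeholder values).
life <- data.frame(Country = c("Canada", "Albania"),
                   Continent = c("Americas", "Europe"),
                   variable = 2019L, value = c(5, 7))
edu  <- data.frame(Country = c("Canada", "Albania"),
                   Continent = c("Americas", "Europe"),
                   variable = 2019L, value = c(3, 12))
inc  <- data.frame(Country = c("Canada", "Albania"),
                   Continent = c("Americas", "Europe"),
                   variable = 2019L, value = c(17, 13))

# Stack the three tables, recording their source in `dimension`,
# then keep only the rows for the Americas.
americas <- bind_rows(
  list(life_expectancy = life, education = edu, income = inc),
  .id = "dimension"
) %>%
  filter(Continent == "Americas")

print(americas)
```

Keeping the source as a column rather than in separate data frames lets a single chart compare the three dimensions, for example by mapping `dimension` to colour or facets.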
We clearly see that income inequality is what contributes most to the inequality coefficient in the Americas between 2010 and 2019.
Using a choropleth map we can compare the evolution of inequality globally.
Inequality is widely spread within every region of the world except Europe. Africa is the region with the highest inequality, since the majority of its countries have an inequality index above the median. The distribution in the Americas is balanced around the median, with extreme cases such as Canada and Haiti. Overall, education is the best-performing of the three indexes composing the Human Development Index in the Americas. In 2019, the widest diversity is in income, followed by life expectancy. There are concerning cases such as Haiti, Nicaragua, Honduras, Guatemala and Bolivia.