A visual review of World’s Inequality

The United Nations reports both the Human Development Index (HDI) and the Inequality-adjusted Human Development Index (IHDI). Although the data source is the same, these indexes represent different things. The HDI represents the national average of human development achievements in three basic dimensions: i) life expectancy (health), ii) education, and iii) income. Like all averages, it conceals disparities in human development across the population within the same country. For example, two countries with the same average HDI may show widely different achievements across the three dimensions¹. In turn, the IHDI accounts for how a country's achievements in the same three dimensions are distributed among its population. Access to data source and technical notes.

Naively interpreted, the HDI tells us the average development of a country regardless of how that development is distributed among its citizens, whereas the IHDI tells us how large the inequality gap is between those enjoying the highest levels of development and those with the lowest achievements in a given country.

For this project, there are three kinds of pre-processed inequality datasets available: Adjusted index, Percentage, and Coefficients. The example shown below uses the Adjusted index and Percentage datasets, specifically those about life expectancy inequality. Each dataset has entries spanning multiple years, with one column per year.
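Because each dataset stores one column per year, a long layout with one row per country and year is usually easier to summarize and plot. The following is a minimal base R sketch of that reshaping, not the project's actual code: the `X2010`-style column names and the 2011 values are assumptions for illustration, while the `variable` and `value` names mirror the structure listing printed later in this review.

```r
# Toy wide table: one row per country, one column per year.
# Column names and the 2011 values are illustrative assumptions.
wide <- data.frame(
  Country = c("Albania", "Angola"),
  ISO3    = c("ALB", "AGO"),
  X2010   = c(12.7, 38.8),
  X2011   = c(12.5, 38.1),
  stringsAsFactors = FALSE
)

# Find the year columns by name pattern.
year_cols <- grep("^X[0-9]{4}$", names(wide), value = TRUE)

# Stack the year columns into one (variable, value) pair per row.
long <- do.call(rbind, lapply(year_cols, function(col) {
  data.frame(
    Country  = wide$Country,
    ISO3     = wide$ISO3,
    variable = as.integer(sub("^X", "", col)),
    value    = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

print(long)
```

The result has one row per country and year, ordered by year first, which is the shape expected by most plotting functions.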

In this visual review we will explore the IHDI dataset to gain insights on global, regional and country-level inequality. In order to make a regional exploration, the variable Continent was added, so each country has a reference to the region to which it belongs. The regions are shown in this figure:

A broad level dataset visualization

The usual strategy in visual analytics is to visualize the dataset at a broad level of reading and dig into the details as patterns or anomalies emerge. In this case we are using the coefficient of human inequality dataset, which is a subset of the larger IHDI dataset. A quick exploration of the dataset shows that it has 1670 datapoints (also called observations) and collects data for the years 2010 to 2019 (see the table below). The value column presents the percentage of inequality for each country over the years. A country with a low percentage has its development more evenly distributed than one with a higher percentage. Zero percent means no inequality, whereas 100% means total inequality. As a reference, it is worth taking a look at the analysis of income inequality in the United States made by the think tank Economic Policy Institute: http://inequality.is/.

## 'data.frame':    1670 obs. of  6 variables:
##  $ X       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ HDI.Rank: int  169 69 91 148 46 81 8 18 88 58 ...
##  $ Country : chr  "Afghanistan" "Albania" "Algeria" "Angola" ...
##  $ ISO3    : chr  "AFG" "ALB" "DZA" "AGO" ...
##  $ variable: int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ value   : num  NA 12.7 NA 38.8 19 10.9 7.7 7.3 13.4 14 ...

In particular, the income inequality across these years has this distribution:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    3.60   10.93   19.40   20.41   29.50   45.90     216

The lowest inequality is 3.6% and the highest is 45.9%. The median is 19.4%, meaning that half of the observed values are above 19.4%. Let’s visualize that in a chart.
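The reading of the summary above can be checked on a small scale. This sketch uses only the first ten entries of the value column as they appear in the structure listing, so its numbers differ from the full-dataset summary; the variable names are chosen here for illustration.

```r
# First ten entries of the value column, copied from the structure listing.
value <- c(NA, 12.7, NA, 38.8, 19, 10.9, 7.7, 7.3, 13.4, 14)

# Five-number summary, mean and count of missing values.
print(summary(value))

# Missing values must be dropped explicitly, otherwise the median is NA.
med <- median(value, na.rm = TRUE)

# Share of observed values above the median, and number of missing entries.
share_above <- mean(value > med, na.rm = TRUE)
n_missing   <- sum(is.na(value))

print(c(median = med, share_above = share_above, missing = n_missing))
```

Note that the missing values are excluded from the median, so "half of the dataset" refers to the countries and years that actually report a value.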

Not bad! Half of the world is in an acceptable condition. But looking at the same dataset split by region reveals a concerning situation: the IHDI is also unequally distributed!

First we need to tag each country with its corresponding world region. We use the dplyr library to merge two datasets: IHDI and CountryCodeRegions.csv.
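A merge of this kind could look like the sketch below. It uses small in-memory tables instead of reading CountryCodeRegions.csv, and it assumes that the lookup table shares an `ISO3` key with the IHDI data and stores the region in a `Continent` column; the real file may use different column names.

```r
library(dplyr)

# Toy stand-ins for the two tables. In the project the lookup would come
# from CountryCodeRegions.csv; its column names here are assumptions.
ihdi <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria", "Angola"),
  ISO3    = c("AFG", "ALB", "DZA", "AGO"),
  value   = c(NA, 12.7, NA, 38.8),
  stringsAsFactors = FALSE
)
regions <- data.frame(
  ISO3      = c("AFG", "ALB", "DZA", "AGO"),
  Continent = c("Asia", "Europe", "Africa", "Africa"),
  stringsAsFactors = FALSE
)

# left_join keeps every IHDI row and adds the matching Continent.
ihdi_regions <- left_join(ihdi, regions, by = "ISO3")

print(ihdi_regions)
```

A left join is used so that countries missing from the lookup table stay in the data with `NA` as their region, which makes gaps easy to spot.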

The same data plotted as a combination of scatterplot and boxplot:

Notice that Oceania has fewer datapoints than any other region.
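One way to combine a scatterplot and a boxplot in ggplot2 is to overlay jittered points on top of the boxes. The sketch below uses synthetic values, generated only to illustrate the layering and the per-region counts; the region names and sample sizes are assumptions, not the project's data.

```r
library(ggplot2)

# Synthetic values for illustration only; Oceania gets fewer points on purpose.
set.seed(1)
toy <- data.frame(
  Continent = rep(c("Africa", "Americas", "Oceania"), times = c(30, 20, 5)),
  value     = runif(55, min = 3.6, max = 45.9)
)

p <- ggplot(toy, aes(x = Continent, y = value)) +
  geom_boxplot(outlier.shape = NA) +       # hide outliers; the jitter layer shows every point
  geom_jitter(width = 0.2, alpha = 0.5) +  # spread points horizontally to reduce overlap
  labs(x = "Region", y = "Inequality (%)")

# Number of datapoints per region.
counts <- table(toy$Continent)
print(counts)

# Call print(p) to render the chart.
```

Showing the raw points next to the boxes makes differences in sample size visible, which a boxplot alone would hide.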

Digging deeper

A heatmap is a matrix of colored tiles displaying a numerical value at each intersection of two sets of categorical variables. A heatmap is symmetrical when the categorical variables on the x and y axes are the same and the interaction between the two readings is bidirectional, meaning that the interaction from variable A to B also applies from B to A. If the two axes differ from each other, the heatmap is asymmetrical. Non-directed networks are represented as symmetrical heatmaps, whereas directed networks are represented as asymmetrical ones.
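The distinction between symmetrical and asymmetrical matrices can be illustrated with two tiny adjacency matrices. This is a sketch with a hypothetical three-node network; the node names and ties are invented for the example.

```r
nodes <- c("A", "B", "C")

# Non-directed network: a tie between A and B reads the same in both directions.
undirected <- matrix(c(0, 1, 1,
                       1, 0, 0,
                       1, 0, 0),
                     nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# Directed network: A points to B, but B does not point back to A.
directed <- matrix(c(0, 1, 0,
                     0, 0, 1,
                     0, 0, 0),
                   nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

print(isSymmetric(undirected))  # TRUE
print(isSymmetric(directed))    # FALSE
```

A heatmap drawn from the first matrix mirrors itself across the diagonal, whereas one drawn from the second does not.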

The library to be used is ggplot2 and the geometry is geom_tile. The idea is to create a grid with the two categorical variables and map the fill of each tile to the numerical value.

The idea is to visualize each region separately, thus we need to subset the coefficient index by region.
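The subsetting and tiling steps described above can be sketched as follows. The data are synthetic and the choice of countries is arbitrary; the `Continent`, `variable` and `value` column names are assumed to match the merged dataset.

```r
library(ggplot2)

# Synthetic long-format data; values are random and for illustration only.
set.seed(2)
toy <- expand.grid(
  Country  = c("Brazil", "Canada", "Kenya"),
  variable = 2010:2012,
  stringsAsFactors = FALSE
)
toy$Continent <- ifelse(toy$Country == "Kenya", "Africa", "Americas")
toy$value     <- runif(nrow(toy), min = 3.6, max = 45.9)

# Keep a single region.
americas <- subset(toy, Continent == "Americas")

# One tile per country and year; the fill encodes the numerical value.
p <- ggplot(americas, aes(x = factor(variable), y = Country, fill = value)) +
  geom_tile(color = "white") +
  labs(x = "Year", y = NULL, fill = "Inequality (%)")

# Call print(p) to render the heatmap.
```

Converting the year to a factor keeps the x axis categorical, so each year gets a tile of equal width.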

This chart presents the temporal evolution of the IHDI in the Americas.

Decomposing aggregated statistics

Let’s focus on the Americas and disaggregate the IHDI coefficient. Remember that the coefficient of human inequality is the average of the inequalities in life expectancy, education and income. We need to import each dataset, extract the American data and bind them together in a single data frame.
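The extract-and-bind step can be sketched in base R as shown below. The three tables and all their values are made up for illustration, and the `dimension` column is a name introduced here to keep track of where each row came from.

```r
# Three toy tables, one per dimension; all values are made up.
countries  <- c("Brazil", "Canada", "Kenya")
continents <- c("Americas", "Americas", "Africa")

life      <- data.frame(Country = countries, Continent = continents,
                        value = c(10, 5, 25), stringsAsFactors = FALSE)
education <- data.frame(Country = countries, Continent = continents,
                        value = c(20, 3, 22), stringsAsFactors = FALSE)
income    <- data.frame(Country = countries, Continent = continents,
                        value = c(30, 15, 28), stringsAsFactors = FALSE)

datasets <- list(life_expectancy = life, education = education, income = income)

# Keep the American rows, tag each with its dimension, and stack them.
americas <- do.call(rbind, lapply(names(datasets), function(dim_name) {
  d <- subset(datasets[[dim_name]], Continent == "Americas")
  d$dimension <- dim_name
  d
}))

# Averaging the three dimensions per country recovers the aggregate.
coefficient <- aggregate(value ~ Country, data = americas, FUN = mean)

print(americas)
print(coefficient)
```

Keeping the dimension as a column, rather than as three separate data frames, lets a single plot compare the three components side by side.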

We clearly see that inequality in income is what accounts the most for the American IHDI coefficient between 2010 and 2019.

A review of World Inequality-Adjusted Human Development Index

Juan Salamanca

1/0/2022
