A visual review of world inequality

The United Nations reports both the Human Development Index (HDI) and the Inequality-adjusted Human Development Index (IHDI). Although both are built from the same data source, the two indexes represent different things. The HDI represents the national average of human development achievements in three basic dimensions: i) life expectancy (health), ii) education, and iii) income. Like all averages, it conceals disparities in human development across the population of a country. For example, two countries with the same average HDI may have very different distributions of achievements among their populations¹. The IHDI, in turn, accounts for how a country’s achievements in the same three dimensions are distributed among its population. See the data source and technical notes.

Naively interpreted, the HDI tells us the average development of a country regardless of how that development is distributed among its citizens, whereas the IHDI tells us how large the inequality gap is between those with the highest achievements and those with the lowest achievements in a given country.
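To make the relation between the two indexes concrete, the sketch below follows our reading of the UNDP technical notes: inequality in each dimension is measured with the Atkinson index A = 1 - g/μ (g is the geometric mean and μ the arithmetic mean of the achievements), each dimension index is discounted by (1 - A), and the IHDI is the geometric mean of the discounted indexes, just as the HDI is the geometric mean of the undiscounted ones. The coefficient of human inequality used later in this review is the plain average of the three A values, in percent. The distributions and dimension indexes are made-up toy numbers and the function name `atkinson()` is ours; the official computation has further details (for example, how each distribution is estimated) that are not reproduced here.

```r
# Atkinson inequality index with inequality aversion 1:
# A = 1 - geometric mean / arithmetic mean (0 = perfectly equal)
atkinson <- function(x) {
  1 - exp(mean(log(x))) / mean(x)
}

# Toy achievement distributions for one hypothetical country (made-up numbers)
lifeExpectancy <- c(60, 70, 75, 80)
schooling <- c(4, 8, 10, 12)
income <- c(2000, 5000, 15000, 40000)

A <- c(health = atkinson(lifeExpectancy),
       education = atkinson(schooling),
       income = atkinson(income))

# Toy dimension indexes (0-1) for the same hypothetical country
I <- c(health = 0.85, education = 0.70, income = 0.75)

hdi <- prod(I)^(1 / 3)               # geometric mean of the dimension indexes
ihdi <- prod((1 - A) * I)^(1 / 3)    # geometric mean of the inequality-discounted indexes
coefficient <- 100 * mean(A)         # coefficient of human inequality, in %

round(c(HDI = hdi, IHDI = ihdi, coefficient = coefficient), 3)
```

A perfectly equal distribution gives A = 0, so the IHDI equals the HDI; the more unequal the achievements, the further the IHDI falls below the HDI.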

For this project, three kinds of pre-processed inequality datasets are available: Adjusted index, Percentage, and Coefficients. The example shown below uses the Adjusted index and Percentage datasets, specifically for life expectancy inequality. Each dataset has entries spanning multiple years, one column per year.
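The `variable` and `value` column names in the data structure printed further down suggest that these one-column-per-year tables were reshaped into a long format (one row per country and year), for example with `reshape2::melt()`. We do not know the exact pre-processing used, so here is a minimal base R sketch of the same idea with `reshape()`. The two-country table and its `X2010`/`X2011` column names (the form `read.csv()` gives to numeric headers) are assumptions for illustration.

```r
# Toy wide table: one row per country, one column per year (made-up values)
wide <- data.frame(
  Country = c("Country A", "Country B"),
  ISO3 = c("AAA", "BBB"),
  X2010 = c(12.7, 38.8),
  X2011 = c(12.5, NA),
  stringsAsFactors = FALSE
)

# Reshape to long format: the year goes to 'variable', the number to 'value'
long <- reshape(
  wide,
  direction = "long",
  varying = list(c("X2010", "X2011")),
  v.names = "value",
  timevar = "variable",
  times = c(2010L, 2011L),
  idvar = "ISO3"
)
rownames(long) <- NULL
long
```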

In this visual review we will explore the IHDI dataset to gain insights into global, regional and country-level inequality. To explore the data by region, the variable Continent was added so that each country references the region it belongs to. The regions are shown in this figure:

# Import a map with the polygon coordinates of each country
globeMap <- read.csv2("Datasets/map.csv")

library(ggplot2)

# Plot the polygons of each country, colored by sub-region
ggplot(globeMap, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = `sub.region`)) +
  scale_fill_manual(values = c("#c8522c","#4bafd0","#d34459","#64b948","#a35ac7","#b4b335","#6a75c8","#d24699","#5dc18a","#a04a6d","#598233","#d389c3","#388864","#d07a6c","#b5a861","#886a2c","#db923b")) +
  labs(title = "Regions of the World", caption = "Empty territories have no index. Source: unknown", x = "Longitude", y = "Latitude") +
  theme_bw()

A broad-level dataset visualization

The usual strategy in visual analytics is to visualize the dataset at a broad level first and dig into the details as patterns or anomalies emerge. In this case, we use the coefficient of human inequality dataset, which is a subset of the larger IHDI dataset. A quick exploration shows that it has 1670 data points (also called observations) collected between 2010 and 2019 (see the structure below). The value column holds the percentage of inequality for each country and year. A country with a low percentage has its development more evenly distributed than one with a higher percentage: zero percent means no inequality, whereas 100% means total inequality. As a reference, it is worth looking at Inequality.is, the analysis of income inequality in the United States made by the think tank Economic Policy Institute: http://inequality.is/.

## 'data.frame':    1670 obs. of  6 variables:
##  $ X       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ HDI.Rank: int  169 69 91 148 46 81 8 18 88 58 ...
##  $ Country : chr  "Afghanistan" "Albania" "Algeria" "Angola" ...
##  $ ISO3    : chr  "AFG" "ALB" "DZA" "AGO" ...
##  $ variable: int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ value   : num  NA 12.7 NA 38.8 19 10.9 7.7 7.3 13.4 14 ...
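Rather than typing the number of observations, years or countries by hand, they can be read off the data. A minimal base R sketch on a made-up table that mirrors the `Country`, `variable` and `value` columns above; on the real data, `IHDI` would take the place of the hypothetical `toy` data frame:

```r
# Toy long table with the same kind of columns as IHDI (made-up values)
toy <- data.frame(
  Country = rep(c("Country A", "Country B", "Country C"), times = 2),
  variable = rep(c(2018L, 2019L), each = 3),
  value = c(12.7, NA, 38.8, 12.5, 20.1, NA),
  stringsAsFactors = FALSE
)

str(toy)
table(toy$variable)             # observations per year
length(unique(toy$variable))    # number of years covered
length(unique(toy$Country))     # number of countries
sum(is.na(toy$value))           # missing values
```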

In particular, the coefficient of human inequality across all years has this distribution:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    3.60   10.93   19.40   20.41   29.50   45.90     216

The lowest inequality is 3.6% and the highest is 45.9%. The median is 19.4%, meaning that half of the observations have an inequality coefficient above 19.4%. There are also 216 missing values. Let’s visualize that in a chart.
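These figures can be computed instead of copied from the summary, which also keeps reference lines in the charts at the right position. A sketch on a made-up vector: `median()` returns `NA` unless missing values are dropped with `na.rm = TRUE`, and with an odd number of values the median is itself one of them, so slightly less than half lie strictly above it.

```r
# Toy coefficient values with one missing entry (made-up numbers)
value <- c(3.6, 10.9, 12.7, 19.4, 25.0, 38.8, 45.9, NA)

# Missing values must be excluded explicitly, otherwise median() returns NA
globalMedian <- median(value, na.rm = TRUE)

# Share of non-missing observations strictly above the median
shareAbove <- mean(value[!is.na(value)] > globalMedian)

summary(value)
globalMedian
shareAbove
```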

library(ggplot2)

# Compute the global median instead of hard-coding it
globalMedian <- median(IHDI$value, na.rm = TRUE)

overallDist <- ggplot(IHDI, aes(x = value))
overallDist <- overallDist + geom_histogram(aes(y = after_stat(density)), binwidth = .5, colour = "white", fill = "grey")
overallDist <- overallDist + geom_density(alpha = .2, fill = "#FF6666")
# Draw the median as a fixed reference line and label it once
overallDist <- overallDist + geom_vline(xintercept = globalMedian, color = "red")
overallDist <- overallDist + annotate("text", x = globalMedian, y = 0.045, label = "global median")
overallDist <- overallDist + labs(title = "Global distribution of inequality", subtitle = "Aggregated from 2010-2019", x = "Coefficient of human inequality (%)")
overallDist <- overallDist + theme(legend.position = "none")
overallDist

Not bad! Half of the observations (country-years) are below 19.4%, which is an acceptable condition. But looking at the same dataset split by region shows a concerning situation: inequality itself is unequally distributed across regions!

First, we need to tag each country with its corresponding world region. We use the dplyr library to merge two datasets: IHDI and the country codes and regions table (Country codes and regions.xlsx).

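library(readxl)
library(dplyr)

# Lookup table with the ISO alpha-3 code, region and sub-region of each country
regions <- read_excel("Datasets/Country codes and regions.xlsx")

# Left join: keep every IHDI row and add its region columns by ISO3 code
IHDI <- left_join(IHDI, regions, by = c("ISO3" = "alpha-3"))
# Clean redundant columns
IHDI$name <- NULL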
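A left join keeps every row of the left table and fills the added columns with `NA` when no code matches, so unmatched countries end up with a missing region. The same idea in base R with `merge(all.x = TRUE)`: the two real rows reuse values from the structure printed earlier, while the `ZZZ` row and the small lookup table are hypothetical.

```r
ihdi <- data.frame(
  Country = c("Albania", "Angola", "Country C"),
  ISO3 = c("ALB", "AGO", "ZZZ"),
  value = c(12.7, 38.8, NA),
  stringsAsFactors = FALSE
)
regions <- data.frame(
  `alpha-3` = c("ALB", "AGO"),
  region = c("Europe", "Africa"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Left join: keep all rows of ihdi, add region where the ISO3 code matches
joined <- merge(ihdi, regions, by.x = "ISO3", by.y = "alpha-3", all.x = TRUE)
joined

# Countries without a region; they would form an NA group when plotting by region
joined[is.na(joined$region), "Country"]
```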

library(ggplot2)

# Compute the global median instead of hard-coding it
globalMedian <- median(IHDI$value, na.rm = TRUE)

overallDist <- ggplot(IHDI, aes(x = value))
overallDist <- overallDist + geom_histogram(aes(y = after_stat(density)), binwidth = .5, colour = "black", fill = "white")
overallDist <- overallDist + geom_density(alpha = .2, fill = "#FF6666")
overallDist <- overallDist + geom_vline(xintercept = globalMedian, color = "red")
overallDist <- overallDist + annotate("text", x = globalMedian, y = 0.3, label = "global median")
# One row of panels per region (facet labels show the region names)
overallDist <- overallDist + facet_grid(region ~ .)
overallDist <- overallDist + labs(title = "Global distribution of inequality by region", subtitle = "Aggregated from 2010-2019", caption = "Antarctica removed", x = "Coefficient of human inequality (%)")
overallDist <- overallDist + theme(legend.position = "none")
overallDist

The same data plotted as a combination of a scatterplot and a boxplot:

overallDist <- ggplot(IHDI, aes(x = as.factor(region), y = value, group = region))
# Jitter points horizontally only, so the plotted values are not altered
overallDist <- overallDist + geom_point(aes(color = as.factor(region)), position = position_jitter(width = 0.2, height = 0))
# Semi-transparent boxes; outliers are already visible as points
overallDist <- overallDist + geom_boxplot(alpha = 0.2, outlier.shape = NA)
overallDist <- overallDist + scale_color_manual(values = c("#ff1d1d","#db406b","#95f4da","#4e3e7b","#c9ca00","#6a9c00","#004100","#831811","#ff9f00","#3f0900","#302a87","#009b6c"))
overallDist <- overallDist + labs(title = "Global distribution of inequality by region", subtitle = "Aggregated from 2010-2019", caption = "Antarctica removed", x = "Region", y = "Coefficient of human inequality (%)")
overallDist <- overallDist + theme(legend.position = "none")
overallDist

Notice that Oceania has fewer data points than any other region.
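The difference in sample size can be quantified by counting the non-missing observations per region. A sketch with `table()` on a made-up table (regions and values are illustrative only); on the real data the same expression works with `IHDI$region` and `IHDI$value`:

```r
obs <- data.frame(
  region = c("Africa", "Africa", "Europe", "Europe", "Oceania", "Asia"),
  value = c(30.1, NA, 8.2, 9.0, 20.5, 18.3),
  stringsAsFactors = FALSE
)

# Count only observations that have a value, per region
counts <- table(obs$region[!is.na(obs$value)])
sort(counts)
```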

Digging deeper

A heatmap is a matrix of colored tiles that displays a numerical value at each intersection of two categorical variables. A heatmap is symmetrical when the categorical variables on the x and y axes are the same and the interaction between them is bidirectional, that is, the interaction of variable A on B also applies from B to A. When the two axes hold different variables, or the interaction is not bidirectional, the heatmap is asymmetrical. Undirected networks are represented as symmetrical heatmaps, whereas directed networks are represented as asymmetrical ones.
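To illustrate the symmetry argument, the sketch below builds an adjacency matrix from a small, hypothetical weighted edge list, once treating the edges as undirected (each weight written in both directions) and once as directed. `isSymmetric()` confirms that only the undirected version is symmetric, and base R's `image()` draws the matrix as a grid of colored tiles. The helper `adjacency()` is ours, not part of any package.

```r
nodes <- c("A", "B", "C")
edges <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                    weight = c(3, 1, 2), stringsAsFactors = FALSE)

# Build a node-by-node adjacency matrix from the edge list
adjacency <- function(edges, nodes, directed) {
  m <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
  for (i in seq_len(nrow(edges))) {
    m[edges$from[i], edges$to[i]] <- edges$weight[i]
    # In an undirected network the interaction applies both ways
    if (!directed) m[edges$to[i], edges$from[i]] <- edges$weight[i]
  }
  m
}

undirected <- adjacency(edges, nodes, directed = FALSE)
directed <- adjacency(edges, nodes, directed = TRUE)

isSymmetric(undirected)
isSymmetric(directed)

# Draw the undirected matrix as a heatmap of tiles
image(seq_along(nodes), seq_along(nodes), undirected,
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_along(nodes), labels = nodes)
axis(2, at = seq_along(nodes), labels = nodes)
```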

We use the ggplot2 library with the tile geometry (geom_tile()). The idea is to create a grid from the two categorical variables (country and year) and map the fill of each tile to the numerical value.
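The grid idea can also be sketched without ggplot2: `tapply()` crosses the two categorical variables into a matrix with one cell per country and year, and `image()` fills each tile according to its value. The table below is made up; the 0-50 color limits mirror the `limits = c(0,50)` used in the heatmaps, and the base R palette is our choice, not the magma scale used in the figures.

```r
toy <- data.frame(
  Country = rep(c("Country A", "Country B", "Country C"), each = 3),
  variable = rep(2017:2019, times = 3),
  value = c(30, 29, 28, 12, 12, 11, 20, NA, 19),
  stringsAsFactors = FALSE
)

# Cross the two categorical variables into a grid: one cell per country and year
grid <- tapply(toy$value, list(toy$Country, toy$variable), mean)
grid

# Fill each tile according to its value; missing cells are left blank
image(seq_len(nrow(grid)), seq_len(ncol(grid)), grid,
      col = rev(heat.colors(20)), zlim = c(0, 50), axes = FALSE,
      xlab = "Country", ylab = "Year")
axis(1, at = seq_len(nrow(grid)), labels = rownames(grid))
axis(2, at = seq_len(ncol(grid)), labels = colnames(grid))
```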

We want to visualize each region separately, so we need to subset the coefficient of human inequality dataset by region.

AfricaIHDI <- subset(IHDI, region == "Africa")
AsiaIHDI <- subset(IHDI, region == "Asia")
EuropeIHDI <- subset(IHDI, region == "Europe")
AmericasIHDI <- subset(IHDI, region == "Americas")
OceaniaIHDI <- subset(IHDI, region == "Oceania")
LatinAmericaIHDI <- subset(IHDI, `sub-region` == "Latin America and the Caribbean")
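Instead of one `subset()` call per region, `split()` returns all the regional subsets at once as a named list, which also picks up any region we forget to list by hand. A sketch on a made-up table with hypothetical countries:

```r
toy <- data.frame(
  Country = c("Country A", "Country B", "Country C", "Country D"),
  region = c("Africa", "Europe", "Europe", "Americas"),
  variable = 2019L,
  value = c(30.0, 10.0, 7.0, 25.0),
  stringsAsFactors = FALSE
)

# One data frame per region, named after the region
byRegion <- split(toy, toy$region)
names(byRegion)
byRegion[["Europe"]]
```

Each element, for example `byRegion[["Africa"]]`, can then be passed to the same plotting code in place of a hand-made subset such as `AfricaIHDI`.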

# Create plot
library(ggplot2)

# AFRICA
heatmap <- ggplot(AfricaIHDI, aes(x=Country, y=variable, fill=value)) 
heatmap <- heatmap + geom_tile()
heatmap <- heatmap + scale_fill_viridis_c(option="magma",limits = c(0,50))
heatmap <- heatmap + theme(axis.text.x = element_text(angle = 90))
heatmap <- heatmap + labs(title = "IHDI in Africa", subtitle = "From 2010-2019", caption = "Source: UN", x = "Country", y="Year")
heatmap

# ASIA
heatmap <- ggplot(AsiaIHDI, aes(x=Country, y=variable, fill=value)) 
heatmap <- heatmap + geom_tile()
heatmap <- heatmap + scale_fill_viridis_c(option="magma", limits = c(0,50))
heatmap <- heatmap + theme(axis.text.x = element_text(angle = 90))
heatmap <- heatmap + labs(title = "IHDI in Asia", subtitle = "From 2010-2019", caption = "Source: UN", x = "Country", y="Year")
heatmap

# EUROPE
heatmap <- ggplot(EuropeIHDI,aes(x=Country, y=variable, fill=value)) 
heatmap <- heatmap + geom_tile()
heatmap <- heatmap + scale_fill_viridis_c(option="magma", limits = c(0,50))
heatmap <- heatmap + theme(axis.text.x = element_text(angle = 90))
heatmap <- heatmap + labs(title = "IHDI in Europe", subtitle = "From 2010-2019", caption = "Source: UN", x = "Country", y="Year")
heatmap

# AMERICAS
heatmap <- ggplot(AmericasIHDI, aes(x=Country, y=variable, fill=value)) 
heatmap <- heatmap + geom_tile()
heatmap <- heatmap + scale_fill_viridis_c(option="magma", limits = c(0,50))
heatmap <- heatmap + theme(axis.text.x = element_text(angle = 90))
heatmap <- heatmap + labs(title = "IHDI in the Americas", subtitle = "From 2010-2019", caption = "Source: UN", x = "Country", y="Year")
heatmap

# OCEANIA
heatmap <- ggplot(OceaniaIHDI,aes(x=Country, y=variable, fill=value)) 
heatmap <- heatmap + geom_tile()
heatmap <- heatmap + scale_fill_viridis_c(option="magma", limits = c(0,50))
heatmap <- heatmap + theme(axis.text.x = element_text(angle = 90))
heatmap <- heatmap + labs(title = "IHDI in Oceania", subtitle = "From 2010-2019", caption = "Source: UN", x = "Country", y="Year")
heatmap

The following chart presents the evolution over time of the coefficient of human inequality for each country in the Americas.

chart <- ggplot(AmericasIHDI, aes(x=variable, y=value, fill=value))
chart <- chart + geom_col()
chart <- chart + facet_wrap(.~ISO3)
chart <- chart + scale_fill_viridis_c(option="magma") 
chart <- chart + labs(title = "Coefficient of human inequality in the Americas", subtitle = "Average of education, health and income inequality indexes 2010-2019", caption = "Source: UN", x = "Year", y = "Coefficient of human inequality (%)") 
chart <- chart + theme(axis.text.x = element_text(angle = 90))
chart

Decomposing aggregated statistics

Let’s focus on the Americas and disaggregate the coefficient of human inequality. Remember that this coefficient is the average of the inequality in life expectancy, education and income. We need to import each dataset, extract the data for the Americas, and bind them together in a single data frame.

IHDI_Education <- read.csv2("Datasets/Inequality_Education_2010-2019.csv")
IHDI_LifeExpectancy <- read.csv2("Datasets/Inequality_Life_Expectancy_2010-2019.csv")
IHDI_Income <- read.csv2("Datasets/Inequality_Income_2010-2019.csv")

# Merge with regions data

IHDI_Education <- left_join(IHDI_Education, regions, by=c("ISO3"="alpha-3"))
IHDI_LifeExpectancy <- left_join(IHDI_LifeExpectancy, regions, by=c("ISO3"="alpha-3"))
IHDI_Income <- left_join(IHDI_Income, regions, by=c("ISO3"="alpha-3"))

# Here we repurpose the column 'name' to tag each row with the name of its dimension
IHDI_Education$name <- "Education"
IHDI_LifeExpectancy$name <- "Life Expectancy"
IHDI_Income$name <- "Income"

# Select only data related to the Americas
A_IHDI_Education <- subset(IHDI_Education, IHDI_Education$region == 'Americas')
A_IHDI_LifeExpectancy <- subset(IHDI_LifeExpectancy, IHDI_LifeExpectancy$region == 'Americas')
A_IHDI_Income <- subset(IHDI_Income, IHDI_Income$region == 'Americas')

# Bind all the datasets into a single data frame (they share the same columns)
A_IHDIaggregated <- rbind(A_IHDI_Education, A_IHDI_Income, A_IHDI_LifeExpectancy)

We clearly see that income inequality is what accounts the most for inequality in the Americas between 2010 and 2019.
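The same comparison can be checked numerically by averaging each dimension across countries and years, for example with `aggregate()` on a long table where `name` holds the dimension, as in `A_IHDIaggregated`. The values below are made up and only illustrate the mechanics:

```r
toy <- data.frame(
  Country = rep(c("Country A", "Country B"), times = 3),
  name = rep(c("Education", "Income", "Life Expectancy"), each = 2),
  value = c(10, 20, 35, 40, 18, NA),
  stringsAsFactors = FALSE
)

# Mean inequality per dimension; the formula interface drops rows with NA
byDimension <- aggregate(value ~ name, data = toy, FUN = mean)

# Dimensions ordered from highest to lowest average inequality
byDimension[order(-byDimension$value), ]
```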

Mapping last year’s data

Using choropleth maps, we can compare the three dimensions of inequality across the Americas in 2019.

# Keep only the 2019 observations
A_IHDI_2019 <- subset(A_IHDIaggregated, variable == 2019)

# Add data to the map
mapData <- left_join(A_IHDI_2019,globeMap, by = "ISO3")

mapDataEducation <- subset(mapData,mapData$name == "Education")
mapDataLifeExpectancy <- subset(mapData,mapData$name == "Life Expectancy")
mapDataIncome <- subset(mapData,mapData$name == "Income")

# Create the ggplot object. The group aesthetic is essential: it keeps the coordinates of each polygon together, so every country (or island) is drawn as a separate shape
myMap <- ggplot(mapDataEducation, aes(x = long, y = lat, group = as.factor(group)))
# add the polygon geometry, filled by the inequality value
myMap <- myMap + geom_polygon(aes(fill = value))
# add a color palette
myMap <- myMap + scale_fill_viridis_c(option = "plasma")
# add labels
myMap <- myMap + labs(title = "Americas' inequality in Education for 2019", caption = "Grey territories have no index. Source: UN", x = "Longitude", y = "Latitude")
# apply a black-and-white theme
myMap <- myMap + theme_bw()
# plot the map
myMap
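What the `group` aesthetic does can be sketched with base R graphics: each group of coordinates is drawn as its own closed polygon. If all points were drawn as a single polygon, an extra edge would join the last corner of one shape to the first corner of the next. The two squares and their values below are hypothetical.

```r
# Hypothetical map table: two square polygons identified by 'group'
mapToy <- data.frame(
  long = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  value = rep(c(10, 40), each = 4)
)

# One data frame of coordinates per polygon
pieces <- split(mapToy, mapToy$group)

plot(mapToy$long, mapToy$lat, type = "n", xlab = "Longitude", ylab = "Latitude")
for (piece in pieces) {
  # Draw each group separately, colored by its value
  polygon(piece$long, piece$lat,
          col = if (piece$value[1] > 20) "darkred" else "gold")
}
```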

***

# Create the ggplot object. The group aesthetic is essential: it keeps the coordinates of each polygon together, so every country (or island) is drawn as a separate shape
myMap <- ggplot(mapDataLifeExpectancy, aes(x = long, y = lat, group = as.factor(group)))
# add the polygon geometry, filled by the inequality value
myMap <- myMap + geom_polygon(aes(fill = value))
# add a color palette
myMap <- myMap + scale_fill_viridis_c(option = "plasma")
# add labels
myMap <- myMap + labs(title = "Americas' inequality in Life Expectancy for 2019", caption = "Grey territories have no index. Source: UN", x = "Longitude", y = "Latitude")
# apply a black-and-white theme
myMap <- myMap + theme_bw()
# plot the map
myMap

***

# Create the ggplot object. The group aesthetic is essential: it keeps the coordinates of each polygon together, so every country (or island) is drawn as a separate shape
myMap <- ggplot(mapDataIncome, aes(x = long, y = lat, group = as.factor(group)))
# add the polygon geometry, filled by the inequality value
myMap <- myMap + geom_polygon(aes(fill = value))
# add a color palette
myMap <- myMap + scale_fill_viridis_c(option = "plasma")
# add labels
myMap <- myMap + labs(title = "Americas' inequality in Income for 2019", caption = "Grey territories have no index. Source: UN", x = "Longitude", y = "Latitude")
# apply a black-and-white theme
myMap <- myMap + theme_bw()
# plot the map
myMap

Conclusion

Inequality values are widely spread in all regions except Europe. Africa is the region with the highest inequality, since most of its countries have an inequality coefficient above the global median. The distribution in the Americas is balanced around the median, with extreme cases such as Canada and Haiti. Overall, education is the dimension with the lowest inequality in the Americas among the three that compose the Human Development Index. In 2019, there is wide diversity in income inequality, followed by life expectancy. There are concerning cases such as Haiti, Nicaragua, Honduras, Guatemala and Bolivia.

¹ http://hdr.undp.org/en/faq-page/inequality-adjusted-human-development-index-ihdi#t293n2906

A review of World Inequality-Adjusted Human Development Index

Juan Salamanca