R Notebook

How to create a cartogram

Preparation

First, we need to load the libraries:

library(sf)             # spatial data classes

## Linking to GEOS 3.5.1, GDAL 2.1.3, PROJ 4.9.2

library(rnaturalearth)  # world map data
library(readxl)         # reading excel files
library(dplyr)          # data manipulation

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)          # data manipulation
library(purrr)          # data manipulation
library(cartogram)      # cartograms creation
library(tmap)           # maps creation

Data

To create cartograms of a global development indicator, we need two datasets - one containing spatial data of the world’s countries and one non-spatial with information about the specific indicator we are interested in. The first one can be downloaded from the Natural Earth website, for example using the rnaturalearth package:

world_map = ne_countries(returnclass = "sf")

names(world_map)

##  [1] "scalerank"  "featurecla" "labelrank"  "sovereignt" "sov_a3"    
##  [6] "adm0_dif"   "level"      "type"       "admin"      "adm0_a3"   
## [11] "geou_dif"   "geounit"    "gu_a3"      "su_dif"     "subunit"   
## [16] "su_a3"      "brk_diff"   "name"       "name_long"  "brk_a3"    
## [21] "brk_name"   "brk_group"  "abbrev"     "postal"     "formal_en" 
## [26] "formal_fr"  "note_adm0"  "note_brk"   "name_sort"  "name_alt"  
## [31] "mapcolor7"  "mapcolor8"  "mapcolor9"  "mapcolor13" "pop_est"   
## [36] "gdp_md_est" "pop_year"   "lastcensus" "gdp_year"   "economy"   
## [41] "income_grp" "wikipedia"  "fips_10"    "iso_a2"     "iso_a3"    
## [46] "iso_n3"     "un_a3"      "wb_a2"      "wb_a3"      "woe_id"    
## [51] "adm0_a3_is" "adm0_a3_us" "adm0_a3_un" "adm0_a3_wb" "continent" 
## [56] "region_un"  "subregion"  "region_wb"  "name_len"   "long_len"  
## [61] "abbrev_len" "tiny"       "homepart"   "geometry"

In this example, we will need population data. In this example, we will download the yearly population data from the [GapMinder website] website[https://www.gapminder.org/]:

if(!dir.exists("data")) dir.create("data")
download.file("http://gapm.io/dl_pop", destfile = "data/pop1800_2100.xlsx")
world_pop = read_xlsx("data/pop1800_2100.xlsx", sheet = 7)

Note that the data contains number of people for a wide range of years, since 1800 to 2100.

We will need also to download a development indicator from the Our World in data website. We are interested on the global indicator for Hunger and Undernourishment. The data comes from the World Bank (2017). It is a csv table containing prevalence of undernourishment (%) from 1992 to 2016.

under = readr::read_csv("data/prevalence-of-undernourishment.csv")

names(under)

## [1] "Entity"                                                                   
## [2] "Code"                                                                     
## [3] "Year"                                                                     
## [4] "Prevalence of undernourishment (World Bank 2017 & UN FAO SOFI (2018)) (%)"

We will shorten the name of our attribute of interest:

under1 <- under %>% rename("undernourish" =
                           "Prevalence of undernourishment (World Bank 2017 & UN FAO SOFI (2018)) (%)")

Note that our task is challenging in the sense that we need to join three different datasets: an spatial object representing countries (from the Natural Earth site), a multi-year table with percentage of undernourishment population at each country (from the Our World in Development site), and a multi-year table with population at each country (from the GapMinder website).

Cleaning

As always when working with multiple datasets - some data cleaning will be necessary. Our world_map dataset has many columns unnecessary for cartograms creation and we do not need spatial data of Antarctica. Let’s get rid of them. we can also transform our data into a more appropriate projection.

world_map = world_map %>% 
  select(sovereignt) %>% 
  filter(sovereignt != "Antarctica") %>% 
  st_transform(world_map, crs = "+proj=robin")

tail(world_map)

library(stringr)
tan <- under1 %>% filter(str_detect(Entity, "Tanzania"))
tan

We need to have a common identifier to combine our spatial and non-spatial datasets, for example, names of the countries. However, there are inconsistencies between some of the names. We need to fix it manually in both tables, under1 and world_pop.

Let’s start our cleaning work with world_pop:

world_pop = world_pop %>% 
  mutate(sovereignt = name) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Tanzania", "United Republic of Tanzania")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "United States", "United States of America")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Congo, Dem. Rep.", "Democratic Republic of the Congo")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Bahamas", "The Bahamas")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Serbia", "Republic of Serbia")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Macedonia, FYR", "Macedonia")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Slovak Republic", "Slovakia")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Czech Republic", "Czechia")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Congo, Rep.", "Republic of the Congo")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Kyrgyz Republic", "Kyrgyzstan")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Lao", "Laos")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Swaziland", "eSwatini")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Cote d'Ivoire", "Ivory Coast")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Timor-Leste", "East Timor"))

Let’s check the output:

tail(world_pop)

As we have in this table much more years than needed, we will select only columns ranging from 1992.0 to 2016.0:

indice <- which(names(world_pop) == "1992.0")
indice

## [1] 196

indice <- which(names(world_pop) == "2016.0")
indice

## [1] 220

nworld_pop <- world_pop %>% select(geo, name, 196:220)

tail(nworld_pop)

We need to remove the .0 preffix attached to the year columns:

n2world_pop <- nworld_pop
names(n2world_pop) <- gsub('[.]0','',names(n2world_pop))  #remove the ".0" suffix

Note that the table is not tidy and we need to fix it:

n2world_pop$indicator = "population"

tail(n2world_pop)

tail(world_map)

Now, it’s time for our first join:

world_data = dplyr::left_join(world_map, n2world_pop, by = c("sovereignt"="name"))

world_data

In addition, we need to apply the gather function to tidy the messy data:

world_data2 <- world_data %>% na.omit() %>% 
  select(-indicator) %>% 
  gather(key = "year", value = "population", `1992`:`2016`) %>% 
  mutate(year = as.integer(year))

Let’s check that the output is OK for our purposes:

world_data2

Now, it is time to clean the remaining under1 table:

under1 <- under1 %>% 
  mutate(sovereignt = Entity) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Tanzania", "United Republic of Tanzania")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "United States", "United States of America")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Congo, Dem. Rep.", "Democratic Republic of the Congo")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Bahamas", "The Bahamas")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Serbia", "Republic of Serbia")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Macedonia, FYR", "Macedonia")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Slovak Republic", "Slovakia")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Czech Republic", "Czechia")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Congo, Rep.", "Republic of the Congo")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Kyrgyz Republic", "Kyrgyzstan")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Lao", "Laos")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Swaziland", "eSwatini")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Cote d'Ivoire", "Ivory Coast")) %>% 
  mutate(sovereignt = replace(sovereignt, sovereignt == "Timor-Leste", "East Timor"))

Let’s check the output:

tail(under1)

Note that, in contrast to the population table, this table is tidy in the sense explained by [Hadley Wickham] (https://blog.rstudio.com/2014/07/22/introducing-tidyr/): - Each column is a variable. - Each row is an observation.

However, we need to check names of the two tables we want to join:

names(under1)

## [1] "Entity"       "Code"         "Year"         "undernourish"
## [5] "sovereignt"

names(world_data2)

## [1] "sovereignt" "geo"        "year"       "population" "geometry"

Second join

Now we can join our first the enhanced spatial object, that is the one that now contains population data, and the under1 table. Note that our code is different than in the previous join (that is due that both data are tidy):

nworld_data2 = left_join(world_data2, under1, 
                         by = c("sovereignt" = "Entity",
                         "year"="Year")) %>%
                              na.omit()

nworld_data2

We can now calculate undernourished population in every country by multiplyint its population by the percentage of undernourished people:

nworld_data3 <- nworld_data2 %>% mutate(under_pop = population *  undernourish / 100)

nworld_data3

Additionally, we can calculate total undernourished global populations in each year:

nworld_data3 = nworld_data3 %>% 
  group_by(year) %>% 
  mutate(total_upop = sum(as.numeric(under_pop), na.rm = TRUE)) %>% 
  mutate(title = paste0("year: ", year, "\nTotal undernourished population (mln): ", round(total_upop/1e6, 2)))

Transforming

Finally, we are able to create our cartograms. We need to split our data into independent annual datasets, create cartograms based on the undernourished population variable, and combine all of the cartograms back into one object.

Before we create cartograms of the world, let’s create a single cartogram:

world_data_2015 = nworld_data3 %>%
  filter(year == 2015)
world_data_2015

world_carto2 = cartogram_cont(world_data_2015, "under_pop", 
                              itermax = 15, maxSizeError = 1.0001,
  prepare = "adjust", threshold = 0.05)
plot(st_geometry(world_carto2["under_pop"]))

plot(st_geometry(world_carto2), col = sf.colors(20, categorical = TRUE), 
     border="grey",
     axes = TRUE)

The resulting plot was with the sf package, showing the undernourished global populations based on WB data. Note that the number of undernourished people is largest in Asia. For a detailed revision of the global hunger facts [see this link] (https://www.worldhunger.org/africa-hunger-poverty-facts-2018/).

To make this animated, we need to a) do some more processing (to calculate the cartogram shapes for every year in our sequence) and b) switch to using the awesome tmap package for animated map creation. Note: this next step may take a few minutes.

# warning: this may make your computer's fan spin!
world_data_carto = nworld_data3 %>% 
  split(.$year) %>%
  map(cartogram_cont, "under_pop", maxSizeError = 1.5) %>% 
  do.call(rbind, .)

Animated maps can be created in a single command using the tmap package. Below we pass it the data produced in the previous command:

carto_anim = tm_shape(world_data_carto) +
  tm_polygons("under_pop", title = "Undernourished Population: ") +
  tm_facets(along = "title", free.coords = FALSE, drop.units = TRUE)+
  tm_layout(legend.outside.position = "right", legend.outside = TRUE)]]]]]]]]]]]]]]]

The last step is to save the output object as a .gif file:

  tmap_animation(carto_anim, filename = "under_pop_1992_2015.gif", delay = 150,
               width = 1326, height = 942)

Note that a series of cartograms may not be enough to understand the big issue of undernourishment. If you are engaged with any sustainable development topic and want to understand it better, you will need to explore additional graphs & charts.

Bis bald!!!