Elegant Cartography in R

Tinkering with tmap for Better Maps

Lance R. Owen, PhD

24 June 2021

In cartography, as in medicine, art and science are inseparable.

Introduction

While many data scientists and statisticians routinely use R for conducting analysis and creating data visualizations, many GIS analysts and cartographers run into frustration when trying to create professional-grade maps in R. Map outputs are often merely satisfactory and rarely good, let alone excellent. Most analysts would rather default to programs like QGIS, ArcMap, or ArcGIS Pro, all of which have more user-friendly map design capabilities.

Indeed, if your goal is to create an intricate map with complex labeling, symbology, and many layers, your best bet is to stick with GIS software like QGIS or ArcGIS Pro. However, if you simply need a basic map or map series, R can definitely satisfy. The purpose of this tutorial is to demonstrate that even if you use R to create a basic map, you can still create one with a high level of professional polish that is suitable for publication or presentation.

This tutorial will show you how, and has two goals:

  1. To outline the elements of an elegant map in the context of R.

  2. To show users how to work within the tmap package to produce elegant cartographic results that are suitable for presentations and publications.

Let’s start by loading the necessary packages (and installing them if you have not already). We’ll be using the following:

#Run the following line if you need to install these packages.
install.packages(c("tmap", "sp", "geojsonio", "stringr"))

#Load packages
library(tmap)
library(sp)
library(geojsonio)
library(stringr)

Data Prep: COVID Cases in Mexico

Let’s say that you are charged with creating a basic map of cumulative confirmed COVID-19 cases in Mexico by state. Mexico’s Ministry of Health provides COVID case data in downloadable CSV files. (Note 1: The data can be downloaded directly here.) The data used in this tutorial is from 22 June 2021. Once we load the raw CSV into R, here is what we see:

df <- read.csv('https://raw.githubusercontent.com/lancerowen23/R_User_Group/main/Mexico_Cases_June22.csv')

A subset of the raw COVID data.

cve_ent  poblacion  nombre               X31.12.2019  X01.01.2020  X02.01.2020
1        1434635    AGUASCALIENTES       0            0            0
2        3634868    BAJA CALIFORNIA      0            0            0
3        804708     BAJA CALIFORNIA SUR  0            0            0
4        1000617    CAMPECHE             0            0            0
7        5730367    CHIAPAS              0            0            0
8        3801487    CHIHUAHUA            0            0            0

It’s clear that we need to clean the data a bit to ready it for visualization. First, we need to change the state names (nombre) from all caps to proper case for labeling. Then we need to create a column called Total_Cases that sums the cases for each state. I’m also going to join a table that includes the abbreviation for each state, which will come in handy for labeling. Next, I’ll subset the data frame to include only nombre, Abbrev, and Total_Cases. Finally, we need to fix the abbreviation for the state of Nayarit, which R imports as a missing value (NA) instead of the string “NA”.

#Convert state names from all caps to proper case
df$nombre <- str_to_title(df$nombre)
#Create new field of total cases
df$Total_Cases <- rowSums(df[,4:543])
#Import and merge csv of state abbreviations.
state_codes <- read.csv('https://raw.githubusercontent.com/lancerowen23/R_User_Group/main/mexico_states_abbrv.csv')
df_merge <- merge(df, state_codes, by.x="nombre", by.y="Estado")
#Subset to have only name and total cases in the data frame.
df_final <- df_merge[c("nombre", "Abbrev", "Total_Cases")]
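#fix the abbreviation for Nayarit, which read.csv parsed as a missing value
#(matching on the state name avoids relying on a hard-coded row number)
df_final$Abbrev[df_final$nombre == "Nayarit"] <- "NA"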
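
As a side note, the two fragile spots in the wrangling above (the hard-coded column range and the “NA” abbreviation) can be illustrated on toy data. The sketch below is not the tutorial’s actual data: the two small CSV strings, their values, and the names toy_df, toy_codes, and toy_merge are invented for illustration. It assumes the daily columns follow the Xdd.mm.yyyy naming that read.csv produces, and it uses the na.strings argument of read.csv so that the literal text NA is kept as a string.

```r
# Toy stand-ins for the two CSV files (all values are made up for illustration).
cases_csv <- "cve_ent,poblacion,nombre,31-12-2019,01-01-2020,02-01-2020
18,1288571,Nayarit,0,2,3
19,5610153,Nuevo Leon,1,4,5"

codes_csv <- "Estado,Abbrev
Nayarit,NA
Nuevo Leon,NL"

# read.csv turns a header like 31-12-2019 into the syntactic name X31.12.2019.
toy_df <- read.csv(text = cases_csv)

# Select the daily columns by name pattern instead of by position.
date_cols <- grep("^X[0-9]{2}\\.[0-9]{2}\\.[0-9]{4}$", names(toy_df))
toy_df$Total_Cases <- rowSums(toy_df[, date_cols])

# With na.strings = "", only empty fields become missing values,
# so the abbreviation "NA" survives as ordinary text.
toy_codes <- read.csv(text = codes_csv, na.strings = "")

toy_merge <- merge(toy_df, toy_codes, by.x = "nombre", by.y = "Estado")
toy_merge[c("nombre", "Abbrev", "Total_Cases")]
```

Reading the abbreviation table this way would make the manual Nayarit fix unnecessary, at the cost of treating genuinely empty cells as the only missing values.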

The resulting data frame looks like this:

The head of the Mexico COVID data after some light wrangling.

nombre               Abbrev  Total_Cases
Aguascalientes       AG      26705
Baja California      BN      50339
Baja California Sur  BS      36743
Campeche             CP      11137
Chiapas              CS      12091

Using tmap

While mapping is possible with several packages, I prefer the tmap package, which offers a lot of customization in terms of the map output. If you have used ggplot2 to map, then tmap will be easy to learn: it builds maps the same way, by adding layers and other elements to a chain of commands. (Note 2: Regardless of what packages one uses, the best documentation resource, and the one I will continually link below for various functions, is Rdocumentation.org.)

To start, I am going to load a geojson of Mexico with subnational divisions that I downloaded (originally as a shapefile) from Humanitarian Data Exchange. (Note 3: I am using the geojson_read function from the geojsonio package to load the geojson. Additionally, the superb tool I used to convert the shapefile to geojson is mapshaper.org, which is set up for the quick manipulation and conversion of various GIS data formats.)

mex <- geojson_read('https://raw.githubusercontent.com/lancerowen23/R_User_Group/main/Estados_Mexico.json', what = "sp")

Then I will join the COVID data to the geojson using merge. The field containing the state name in the geojson is ADMIN_NAME.

mex_data <- merge(mex, df_final, by.x="ADMIN_NAME", by.y="nombre")
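
Because this join matches on state names, any spelling difference between the two sources silently leaves a state without data. A minimal sketch of a pre-merge check follows; the two name vectors are hypothetical stand-ins for mex$ADMIN_NAME and df_final$nombre, and the mismatch shown is invented to illustrate the idea rather than taken from the real files.

```r
# Hypothetical name vectors standing in for mex$ADMIN_NAME and df_final$nombre.
shape_names <- c("Nayarit", "Oaxaca", "Distrito Federal")
table_names <- c("Nayarit", "Oaxaca", "Ciudad De Mexico")

# Names present in the geometry but absent from the case table:
# these polygons would end up with missing Total_Cases after the merge.
missing_from_table <- setdiff(shape_names, table_names)

# Names present in the case table but absent from the geometry:
# these rows would never reach the map.
missing_from_shapes <- setdiff(table_names, shape_names)

missing_from_table
missing_from_shapes
```

If both results are empty, every state has a match; otherwise the listed names need to be reconciled before mapping.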

For the most basic visualization in tmap, one that accepts all defaults, we run the following code. (Note 4: A note on method: My preference is to create a variable for each chunk of the map and then to concatenate them. Thus I create a variable for the main map, one for the background, another for the layout elements, etc. You can just as easily (but with more visual clutter) compose all of the code in one fell swoop. However, the concatenation method is more visually digestible and thus more appropriate for the purposes of this tutorial. Concatenating also helps reinforce the principle of layering key map elements.)

mexico <-tm_shape(mex_data) + 
         tm_fill("Total_Cases")
mexico

Yikes. For many reasons, this map is far from ideal. The symbology needs improvement, as does the layout and typography in general. Let’s take each of those categories in turn.

Symbology

Colors and Bins

Our first step is to add some arguments to tm_fill that will improve the communication of the information, namely the cumulative case information by state. From the legend, we can see that the default setting is to have five bins of equal intervals, which is clearly not a good choice for this particular distribution. (Note 5: tm_fill documentation; tm_shape documentation.)

Let’s use the n argument to bump up the number of bins from 5 to 6. To change the bin method, we can specify the style argument. We’ll use natural breaks, or jenks in tmap parlance.
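
To see why equal intervals suit a skewed distribution poorly, here is a small sketch in base R. The cases vector is invented (it is not the Mexican data), chosen only to have one very large value, as state-level case counts often do. The optional last step assumes the classInt package is installed and uses its classIntervals function to compute natural breaks for comparison.

```r
# Invented, strongly right-skewed counts (not the tutorial's data).
cases <- c(11000, 12000, 15000, 18000, 22000, 27000, 37000, 50000, 120000, 650000)

# Five equal-width bins between the minimum and the maximum.
n_bins <- 5
equal_breaks <- seq(min(cases), max(cases), length.out = n_bins + 1)
equal_counts <- table(cut(cases, breaks = equal_breaks, include.lowest = TRUE))

# Nine of the ten values land in the first bin, so the map would be
# almost a single color with one outlier.
equal_counts

# Optional comparison with natural breaks, if classInt is available.
if (requireNamespace("classInt", quietly = TRUE)) {
  jenks_breaks <- classInt::classIntervals(cases, n = 6, style = "jenks")$brks
  print(jenks_breaks)
}
```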

Finally, let’s opt for a slightly different color scheme. I like the use of shades of yellow/orange/red for this sort of health information. (It doesn’t make a lot of sense in terms of common conventions to use soothing palettes of blues or greens to indicate the toll of a deadly virus.) That said, I would like to opt for more bespoke colors, as the default colors tend to look a bit harsh, and I want to soften the tone a bit to get a more elegant effect.

We’ll use the palette argument to set the colors. I’m going to use HEX codes, and we can set the ramp simply by supplying two colors in the standard R format: c("HEX1", "HEX2"). (Note 6: An invaluable tool for choosing colors to use in maps is the Instant Eyedropper. Also, if you need a colorblindness simulator to test for accessibility, download Color Oracle.)

mexico <- tm_shape(mex_data) + 
          tm_fill("Total_Cases",
                  n=6, 
                  style = "jenks", 
                  palette = c("#FFE7CF", "#F03B20"))
mexico
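
Before committing to a two-color ramp, it can help to preview the intermediate shades. The sketch below uses colorRampPalette from base R to interpolate six colors between the two HEX codes; it should approximate what the map shows, though I am assuming (not asserting) that tmap interpolates a two-color palette in the same linear way.

```r
# Interpolate six colors between the light and dark ends of the ramp.
ramp_colors <- colorRampPalette(c("#FFE7CF", "#F03B20"))(6)
ramp_colors

# Draw the swatches when working interactively.
if (interactive()) {
  barplot(rep(1, length(ramp_colors)),
          col = ramp_colors,
          border = NA,
          axes = FALSE,
          names.arg = ramp_colors,
          las = 2)
}
```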

Borders and Background

Many good data visualization specialists would say that our symbology changes should stop here, and that any additional colors or shapes on the map would be superfluous. This is where the work of cartographers and data visualization specialists can start to diverge. (Note 7: A foundational concept in data visualization is the data/ink ratio, which was developed by Edward Tufte, a leading data visualization theorist. It essentially states that all ink that is not in the service of the data at hand is gratuitous and should be erased, thereby maximizing the data/ink ratio. Thus, in our case, a map showing COVID cases for Mexican states should show no more than the Mexican states with symbology indicating the cases.) Because this is a tutorial geared towards producing polished maps, I’m going to veer towards a more cartographic perspective, which would argue for more geographic context to be shown. (Note 8: If I stopped here, our learning would also be truncated, which would be silly.) In other words, instead of just having a Mexico that floats in a white void, we should add the surrounding land masses. (Note 9: One argument for this approach (and against maximizing the data/ink ratio) is that the surrounding land masses can better orient map readers who might not be as familiar with the shape of Mexico, thereby making them more at home with the visual.)

To add the surrounding land masses using tmap, we need to create a background layer that we will simply add to the chain of code for the Mexico data. Reading through tmap code is like starting at the bottom layer of the map and moving up through the layers. Because the surrounding countries are a background of sorts, they’ll be at the beginning of the chain.

After loading the topojson, I’m going to subset the dataframe to just the countries that will be visible. This subsetting will improve processing speed.

#load world topojson
world <- topojson_read('https://raw.githubusercontent.com/lancerowen23/R_User_Group/main/world_admin0.json')
#subset world topojson to only necessary countries
bg <- world[world$GEOUNIT %in% c("United States of America", 
                                 "Belize", 
                                 "Guatemala", 
                                 "Honduras", 
                                 "Nicaragua"),]

Let’s make those surrounding countries a muted grey by setting the col argument in tm_fill to grey70. (Note 10: In R, you can specify the darkness of any shade of grey by adding a number from 0 to 100 after “grey” that corresponds to its position on the dark-to-light gradient. Thus grey20 would be a very dark grey and grey90 would be very light.)
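
To make the numbered greys concrete, the short sketch below converts a few of them to RGB and HEX values using col2rgb and rgb from base R. The three greys are simply the ones mentioned in this tutorial.

```r
# Named greys used in this tutorial.
greys <- c("grey20", "grey70", "grey90")

# col2rgb returns a matrix with one column per color (rows: red, green, blue).
grey_rgb <- col2rgb(greys)

# Convert each color to its HEX code for easy comparison.
grey_table <- data.frame(
  name = greys,
  hex  = rgb(t(grey_rgb), maxColorValue = 255)
)
grey_table
```

Higher numbers are lighter: grey20 is #333333, grey70 is #B3B3B3, and grey90 is #E5E5E5.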

Also, because the default extent of the map is based on the bottom (i.e. first) layer, we’ll need to customize the extent: adding a layer with the United States (which in this case includes the far-flung territories) would otherwise produce a map of essentially the entire globe. We set the extent using the bbox argument in the tm_shape function. We can simply create a variable for Mexico’s bounding box using the st_bbox function from the sf package.

#set a variable for the Mexico data bounding box
#(st_bbox comes from the sf package, which is installed with tmap but not attached by it)
bbox_mex <- sf::st_bbox(mex_data)
#set background with bbox_mex and colors
background <- tm_shape(bg, bbox=bbox_mex) +  
              tm_fill(col = "grey70") +
              tm_borders(col = "grey70")

#add on Mexico as we've visualized it so far 
background + mexico 
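
If the map feels cramped against the frame, one option is to pad the bounding box slightly before passing it to tm_shape. The helper below is a hypothetical illustration written in base R, not a tmap or sf function; pad_bbox and the rough coordinates for Mexico are my own assumptions. It works on any named vector with xmin, ymin, xmax, and ymax elements.

```r
# Hypothetical helper: grow a bounding box by a fraction of its width and height.
pad_bbox <- function(bb, frac = 0.05) {
  x_pad <- (bb[["xmax"]] - bb[["xmin"]]) * frac
  y_pad <- (bb[["ymax"]] - bb[["ymin"]]) * frac
  c(xmin = bb[["xmin"]] - x_pad,
    ymin = bb[["ymin"]] - y_pad,
    xmax = bb[["xmax"]] + x_pad,
    ymax = bb[["ymax"]] + y_pad)
}

# Rough, rounded extent of Mexico in degrees (an approximation for illustration).
rough_mexico <- c(xmin = -118, ymin = 14, xmax = -86, ymax = 33)

padded <- pad_bbox(rough_mexico, frac = 0.1)
padded

# With sf installed, the result could be turned back into a bbox object, e.g.
# sf::st_bbox(padded, crs = sf::st_crs(4326)), before being given to tm_shape.
```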

While we are adding continents, let’s also add a bit of color to the ocean. I’ve found that an ideal shade of very light blue is #BBD7E5. We set this color using the bg.color argument of the tm_layout function, which is also where we’ll control other major elements of the map like the title. (Note 11: tm_borders documentation.)

Let’s also add some state borders to the map with the tm_borders function. (We always want to be subtle with border colors, but their presence is important to the eye’s ability to tell one area from another, particularly if using a single-color ramp as we are.) We’ll use a very light grey (#E1E1E1) for the color and set the lwd argument to .5 for border thickness.

mexico <- tm_shape(mex_data) + 
          tm_fill("Total_Cases",
            n=6, 
            style = "jenks", 
            palette = c("#FFE7CF", "#F03B20"))  +
           #add borders to Mexico's states
           tm_borders(col = "#E1E1E1",
                     lwd = .5)

#add background color and title using the tm_layout function
layout <- tm_layout(title = "MEXICO | COVID Cases",
                    bg.color = "#BBD7E5")

background + mexico + layout

Layout

Thoughtful layouts are critical for effective cartography. The elements should be balanced, and the focal point of the map should not be obscured by titles, legends, or other map elements. Mexico’s cornucopia-esque shape makes our job easy, as there is a lot of room in the southwest and northeast corners of the map to place additional layout features.

Currently, the title of the map overruns the upper portions of Mexico’s northwestern states, so let’s move it to the top right of the map. We can do this with the title.position argument in the tm_layout function. The argument takes two values corresponding to the x and y position of the start of the title text. (Finding these particular values can take a bit of trial and error, and while you can use generic terms like “left”, “right”, “top”, “bottom”, etc., I find that actual numbers give better results.)

mexico <- tm_shape(mex_data) + 
          tm_fill("Total_Cases",
                  n=6, 
                  style = "jenks", 
                  palette = c("#FFE7CF", "#F03B20"))  +
          tm_borders(col = "grey90",
                     lwd = .5)

layout <- tm_layout(title = "MEXICO | COVID Cases",
                    bg.color = "#BBD7E5",
                    #specify title position at top right
                    title.position = c(.6, .95))

background + mexico + layout

I would also like to rename the legend (to get rid of the underscore and use a better phrasing) and use dashes instead of “to” as a separator for the values. For the former, we use the title argument in tm_fill. For the latter, we use the legend.format argument in the tm_layout function.

mexico <- tm_shape(mex_data) + 
          tm_fill("Total_Cases",
                  n=6, 
                  style = "jenks", 
                  palette = c("#FFE7CF", "#F03B20"),
                  #set title for legend other than default
                  title = "Cumulative Cases")  +
          tm_borders(col = "grey90",
                     lwd = .5)

layout <- tm_layout(title = "MEXICO | COVID Cases",
                    bg.color = "#BBD7E5",
                    title.position = c(.6, .95),
                    #add dash as separator for legend values
                    legend.format = list(text.separator = "-")) 

background + mexico + layout
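
The legend numbers themselves can also be tidied. As far as I know from the tmap 3.x documentation, the legend.format list accepts a fun element: a function that takes the numeric break values and returns the labels as text. The sketch below only demonstrates such a function on its own in base R; format_cases is a hypothetical name, and how it combines with text.separator in a given tmap version is worth checking against the documentation.

```r
# Hypothetical label function: whole numbers with a comma as thousands separator.
format_cases <- function(x) {
  formatC(x, format = "d", big.mark = ",")
}

format_cases(c(11137, 50339, 1234567))

# Possible use inside the layout (an assumption based on the tmap 3.x docs):
# legend.format = list(fun = format_cases, text.separator = "-")
```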

Typography

As many GIS professionals are aware, typography is often the most overlooked element of map design, despite the fact that it can have an enormous impact on the final presentation. Choices of font style, color, size, and position are crucial in producing a map that has a high level of professional polish. (Note 12: Font support varies in R between operating systems and output formats. This tutorial will focus on R in Windows and .png outputs.)

Font manipulation is not one of the major strengths of R, but the package extrafont allows R users far more choice when it comes to typography. After installing and loading extrafont, you’ll need to run the function font_import() to register your system’s fonts so that they can be used. (Note 13: The extrafont package currently only works with TrueType fonts.) You can then use the fonts() or fonttable() command to view the fonts that are available post-import.

install.packages("extrafont")
library(extrafont)
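#import your system's TrueType fonts (this can take a few minutes)
font_import()
#register the imported fonts with the Windows graphics devices
loadfonts(device = "win")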
#to view fonts that are available to use after conducting import
fonts()
#or
fonttable()
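
Corbel ships with Windows and Microsoft Office, so it may be missing on other systems. A small sketch of a fallback follows; pick_font is a hypothetical helper of my own, and the installed vector is invented for illustration. In practice the list of available fonts would come from extrafont’s fonts() function after the import.

```r
# Hypothetical helper: return the first preferred font that is available,
# or a generic fallback family if none of them is.
pick_font <- function(preferred, available, fallback = "sans") {
  hits <- preferred[preferred %in% available]
  if (length(hits) > 0) hits[[1]] else fallback
}

# Invented list of fonts; in practice use: installed <- extrafont::fonts()
installed <- c("Arial", "Calibri", "Georgia")

map_font <- pick_font(c("Corbel", "Calibri"), installed)
map_font
```

The resulting map_font value could then be supplied wherever a fontfamily argument is set, so the same script runs on machines without Corbel.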

For my primary font, I’m choosing Corbel, a sans-serif typeface with a contemporary yet elegant look that strikes a balance between being unique and inconspicuous. For the text color, I’m going to opt for shades of grey: dark in the case of the legend, and lighter in the case of the title (because I’m going to place it on a dark background). (Note 14: A note on shades of grey: I find that jet black text, particularly on stark white backgrounds, yields too much contrast and is jarring to the eye. A great way to soften this effect for a more professional look is to use very dark shades of grey for most of the text (assuming the map is overall light in its color scheme). For more about shades of grey in R, see note 10 above.) (Note 15: NB: Specifying typography is often what makes the code balloon to many lines. This is one reason I opt for seeing each map component as a distinct variable. Otherwise the full code becomes unwieldy.) For a dark background underneath the title, I’m setting the title.bg.color argument to #324C63, which is a dusty shade of navy blue. (Note 16: Setting the title in this way, in a light color on a dark banner, gives the map a bit of extra polish, as it distinguishes the title from other map text.)

layout <-   tm_layout(title = "MEXICO | COVID-19 Cases",
                      bg.color = "#BBD7E5",
                      #specify title font/face/color/etc.
                      title.fontfamily = "Corbel",
                      title.fontface = "plain",
                      title.color = "grey90",
                      title.bg.color = "#324C63",
                      title.bg.alpha = .8,
                      title.position = c(.59, .96),
                      title.size = 1.5,
                      legend.format = list(text.separator = "-"),
                      #specify legend font/face/color/etc.
                      legend.text.fontfamily = "Corbel",
                      legend.text.size = .8,
                      legend.text.color = "grey30",
                      legend.text.fontface = "plain",
                      legend.title.fontfamily = "Corbel",
                      legend.title.fontface = "plain",
                      legend.title.size = 1.25,
                      legend.title.color = "grey10") 

background + mexico + layout

I also want to add a footnote to clarify the source and date of the data. We can use the tm_credits function and its arguments to add that. (Note 17: tm_credits documentation.) I want to place it in the lower left corner, which means I need to move the legend up. I will make sure the x value for the position of each is the same so they will be left-justified. The title bar also needs some space in its left and right margins. A good hack for that is to simply add spaces within the quotation marks in the title argument. I’ll need to adjust the title.position to compensate.

layout <- tm_layout(title = "  MEXICO | COVID-19 Cases ",
                    bg.color = "#BBD7E5",
                    title.fontfamily = "Corbel",
                    title.fontface = "plain",
                    title.color = "grey90",
                    title.bg.color = "#324C63",
                    title.bg.alpha = .8,
                    title.position = c(.57, .96),
                    title.size = 1.5,
                    legend.format = list(text.separator = "-"),
                    legend.text.fontfamily = "Corbel",
                    legend.text.size = .8,
                    legend.text.color = "grey20",
                    legend.text.fontface = "plain",
                    legend.title.fontfamily = "Corbel",
                    legend.title.fontface = "plain",
                    legend.title.size = 1.25,
                    legend.title.color = "grey10",
                    #adjust legend position
                    legend.position = c(.02, .1))

#add credits to the lower left of the map
credits <- tm_credits("Data: Secretaría de Salud de Mexico | Data as of 22 June 2021.", 
                       position = c(.02, .01),
                       fontfamily = "Corbel",
                       col = "grey30",
                       size = .8)

background + mexico + layout + credits

Now, let’s add some labels to the states using the tm_text function. (Note 18: tm_text documentation.) This is where our state abbreviations will come in handy. (Short abbreviations are key to good labeling when dealing with states of highly uneven land areas.)

labels <- tm_text("Abbrev",
                  fontfamily = "Corbel",
                  fontface = "bold",
                  size = .7,
                  col = "grey40",
                  shadow = FALSE,
                  auto.placement = FALSE,
                  remove.overlap = TRUE)

background + mexico + labels + layout + credits 

I’d also like to add a north arrow (tm_compass) and a scale bar (tm_scale_bar), mostly to show how to adapt these elements to fit aesthetically with the map. (Note 19: tm_compass documentation; tm_scale_bar documentation.)

compass <- tm_compass(north = 0,
                      type = "arrow",
                      text.size = 1,
                      show.labels = 0,
                      size = 1,
                      text.color = "grey40",
                      color.dark = "grey40",
                      color.light = "grey90",
                      position = c(.02, .90))

scale_bar <- tm_scale_bar(width = .25,
                          text.size = 0.7,
                          text.color = "grey30",
                          color.dark = "grey40",
                          color.light = "grey90",
                          lwd = .5,
                          position = c(.62, .005))


background + mexico + labels + layout + credits + compass + scale_bar
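
To get the map out of the plotting window and into a file, tmap provides the tmap_save function. The sketch below is a self-contained illustration that uses the World dataset bundled with tmap rather than the Mexico map built above; the file name, dimensions, and resolution are arbitrary choices of mine, and the block only runs if tmap is installed.

```r
# Write to a temporary folder so the example leaves no files behind.
out_file <- file.path(tempdir(), "example_map.png")

if (requireNamespace("tmap", quietly = TRUE)) {
  library(tmap)

  # Small demonstration map built from tmap's bundled World dataset.
  data("World", package = "tmap")
  demo_map <- tm_shape(World) +
              tm_borders(col = "grey60", lwd = .5) +
              tm_layout(bg.color = "#BBD7E5")

  # Save as a PNG at print resolution.
  tmap_save(demo_map,
            filename = out_file,
            width = 8,
            height = 5,
            units = "in",
            dpi = 300)
}
```

For the map in this tutorial, the first argument would be the full chain (background + mexico + labels + layout + credits + compass + scale_bar) stored in a variable.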

What we now have is a much improved map over the one generated initially. It’s polished, professional, and ready for distribution.

Notes

This document was produced using the tufte package.