R Markdown

This is a tutorial on how to build pretty maps in R. As long as we follow a few key geomapping principles, the process is straightforward. I outline these principles below.

1. Libraries

The first thing we need to do is load a couple of libraries/packages. Please install the following packages on your computer with install.packages("PACKAGENAME"); you can then load them into your R session with library(PACKAGENAME). Also, do not forget to set the working directory before you start your project.

# setwd('XXX')
library(plyr)
library(dplyr)
library(eurostat)
library(ggmap)
library(tidyr)
library(geosphere)
library(ggplot2)
library(sp)
library(rgdal)
library(rgeos)
library(maps)
library(GADMTools)
library(maptools)
library(countrycode)

Most of the functions we use are part of dplyr (a package for cleaning and reshaping data), ggplot2 (a package for plotting, and the best thing the R community ever created) and sp (functions to handle spatial data).
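Before loading anything, it can help to check which packages are not installed yet. The following is a minimal sketch in base R; `missing_packages` is a hypothetical helper name (not part of any package), and the shortened package vector is only for illustration.

```r
# Hypothetical helper (not from any package): return the entries of `pkgs`
# that cannot be found in the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# A few of the packages used in this tutorial
needed <- c("dplyr", "ggplot2", "sp", "maps", "countrycode")
to_install <- missing_packages(needed)

# Install only what is missing (commented out so nothing is installed by accident)
# if (length(to_install) > 0) install.packages(to_install)
print(to_install)
```

An empty result means everything in `needed` is already available and can be loaded with library().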

2. Data

2.1 Dataset

Data is the key ingredient in every R application, so we need a dataset. In this tutorial, we use a geocoded dataset with the longitude and latitude of cities around the world. The dataset is part of the maps package and can easily be loaded.

data("world.cities")
head(world.cities)
##                 name country.etc   pop   lat  long capital
## 1 'Abasan al-Jadidah   Palestine  5629 31.31 34.34       0
## 2 'Abasan al-Kabirah   Palestine 18999 31.32 34.35       0
## 3       'Abdul Hakim    Pakistan 47788 30.55 72.11       0
## 4 'Abdullah-as-Salam      Kuwait 21817 29.36 47.98       0
## 5              'Abud   Palestine  2456 32.03 35.07       0
## 6            'Abwein   Palestine  3434 32.03 35.20       0
df <- world.cities
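Since everything that follows relies on the lat and long columns, a quick range check on the coordinates is a cheap safeguard. This is only a sketch: `valid_coords` is a hypothetical helper, and the small data frame simply re-types three of the rows printed above so that the example runs without the maps package.

```r
# Three rows copied from the head(world.cities) output above
cities <- data.frame(
  name        = c("'Abasan al-Jadidah", "'Abdul Hakim", "'Abdullah-as-Salam"),
  country.etc = c("Palestine", "Pakistan", "Kuwait"),
  pop         = c(5629, 47788, 21817),
  lat         = c(31.31, 30.55, 29.36),
  long        = c(34.34, 72.11, 47.98),
  capital     = c(0, 0, 0)
)

# Hypothetical helper: TRUE for rows whose coordinates are present and
# inside the valid ranges (latitude in [-90, 90], longitude in [-180, 180]).
valid_coords <- function(d) {
  !is.na(d$lat) & !is.na(d$long) & abs(d$lat) <= 90 & abs(d$long) <= 180
}

print(valid_coords(cities))
```

The same function can be applied to the full dataset, e.g. `df[!valid_coords(df), ]` lists suspicious rows.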

2.2 Shapefiles

All spatial projects need a shapefile. Usually, we are interested in country or regional shapefiles, which can be obtained from various sources. The beauty of R is that many packages let us download shapefiles quite easily. For instance, when I want to use a world shapefile, I usually use the one from the maptools package.

data(wrld_simpl)
plot(wrld_simpl)

If you want to use just one or a couple of countries, you can subset the dataset in the following way:

shape_country <- wrld_simpl %>% subset(., ISO3 %in% c('AUT', 'DEU', 'POL', 'HUN', 'CZE', 'CHE'))
plot(shape_country)

If you want to look at within-country shapefiles, two sources are important: GADM and Eurostat. While Eurostat provides NUTS shapefiles, GADM provides regional shapefiles for all countries.

spatial_nuts <- get_eurostat_geospatial(output_class = 'spdf', res = 10, year = 2016)
plot(spatial_nuts)

spatial_gadm <- gadm_sp_loadCountries(fileNames = 'GBR', level = 2, basefile = './shapefiles', simplify=0.01)
plot(spatial_gadm[[2]])

So in summary: we need a dataset with longitudes and latitudes, and we need a shapefile. In the next step, I will show you how to bring the two together to create pretty maps.

3. Plotting

The plots above are all drawn with the basic plot function in R. However, we do not use it that often because it lacks a lot of functionality. Instead, we use the most beautiful and artistic package the (R) world has ever seen: ggplot2!

With ggplot, we can plot as many layers of data as we want. In our simple case, we want to plot the locations of all capitals (one layer) on top of a world map (another layer). First, let’s get the world shapefile:

data(wrld_simpl)
shape <- wrld_simpl

Second, at the moment the shapefile is a SpatialPolygonsDataFrame. However, ggplot always needs a proper data frame to plot, so we have to transform the SpatialPolygonsDataFrame with the fortify function. The only thing we have to define is the unit of analysis. If we are interested in countries, we have to look in the original shapefile for the name of the variable that identifies the country. In the shapefile above, this variable is called ISO3. In contrast, if we are interested in, let’s say, NUTS2 regions, we have to look for the variable that identifies the NUTS2 region.

shape_df <- fortify(shape, region = 'ISO3')
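To see why the fortified data frame works with geom_polygon, it helps to look at its structure: one row per polygon vertex, with a group column telling ggplot which vertices belong to the same polygon. The sketch below uses made-up toy data (two unit squares with invented ids), not real fortify output, and only imitates the columns this tutorial relies on.

```r
library(ggplot2)

# Made-up toy data: two unit squares, one row per vertex.
# `id` and `group` values are invented for illustration.
shape_toy <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  id    = rep(c("AAA", "BBB"), each = 4),
  group = rep(c("AAA.1", "BBB.1"), each = 4)
)

# `group = group` keeps the two squares apart; without it, ggplot would
# connect all eight vertices into a single polygon.
p <- ggplot() +
  geom_polygon(data = shape_toy, aes(x = long, y = lat, group = group),
               fill = 'white', color = 'black')

# print(p) draws the two squares
```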

Once the data is fortified, we can easily plot the map as polygons and the capital locations as dots.

capital <- df %>% subset(., capital == 1)
ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group), fill = 'white', color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat), fill = 'red', color = 'red', size = 0.5)

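A few words on the ggplot functions: every geom_XXX call adds one layer to the plot, and inside aes() we map variables of the data to properties of that layer. For instance, if we map pop to size, the size of each dot depends on the population of the city: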

ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group), fill = 'white', color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat, size = pop), fill = 'red', color = 'red')

Plotting dots is cool, coloring entire regions is even more fun. Okay, let’s do it. First, let’s color countries based on the continent they belong to. In order to do this, we need the continent name in our fortified shapefile for the coloring. We can get it with the countrycode package.

shape_df <- shape_df %>% mutate(continent = countrycode(id, 'iso3c', 'continent')) %>%
  subset(., !is.na(continent))
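To get a feeling for what the countrycode call returns, here is a small sketch on a handful of ISO3 codes. The code "XXX" is deliberately not a real country code; it is included as an assumption-free way to show that unmatched codes come back as NA, which is why the rows with a missing continent are dropped above.

```r
library(countrycode)

# Three real ISO3 codes plus one invalid code ("XXX") for illustration
codes <- c("AUT", "BRA", "JPN", "XXX")

# warn = FALSE suppresses the warning about the unmatched code
continents <- countrycode(codes, origin = "iso3c", destination = "continent",
                          warn = FALSE)

print(data.frame(code = codes, continent = continents))
```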

Now, we can create the same plot but additionally specify the fill parameter inside aes() in the geom_polygon function.

ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group, fill = continent), color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat, size = pop), fill = 'red', color = 'red')

And this is basically it. Get the shapefile, load your data, combine both and here we go. Of course, there are numerous other geom functions that can be used for spatial plots.

On the parameter side, you have already seen a few above. Although the parameters depend on the function you are using, common ones are: color, fill, size, linetype, shape, alpha. Again, remember: when the parameter value is the same for all items (e.g. all dots should be red), you specify the parameter outside of the aes() parentheses. If the parameter value should depend on a variable, you specify it inside.
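The inside/outside rule can be checked on a tiny example. The sketch below re-types the coordinates and populations of three cities from the head(world.cities) output, sets the color as a constant and maps the size to pop; layer_data() from ggplot2 then shows what ggplot actually computed for each dot.

```r
library(ggplot2)

# Three cities re-typed from the head(world.cities) output
cities <- data.frame(
  long = c(34.34, 34.35, 72.11),
  lat  = c(31.31, 31.32, 30.55),
  pop  = c(5629, 18999, 47788)
)

p <- ggplot() +
  geom_point(data = cities,
             aes(x = long, y = lat, size = pop),  # inside aes(): varies with pop
             color = 'red')                       # outside aes(): same for all dots

# One row per dot, with the computed colour and size
computed <- layer_data(p, 1)
print(computed[, c("x", "y", "colour", "size")])
```

Every dot gets the same colour, while the size column differs from row to row because it is derived from pop.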

4. Beauty

Of course, these maps are super ugly so far, but it was important to understand the basic principles of mapping in R first. Okay, let’s make them pretty. First, we get rid of this ugly grey background and replace it with a nice blue representing the water on our planet.

ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group, fill = continent), color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat, size = pop), fill = 'red', color = 'red') +
  theme_void() +
  theme(panel.background = element_rect('light blue'))

You do not like the legend of the population size and want to remove it completely? Same here! Let’s do it. Also, the projection looks a bit off, so we fix this as well with the coord_equal function.

ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group, fill = continent), color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat, size = pop), fill = 'red', color = 'red') +
  theme_void() + theme(panel.background = element_rect('light blue')) + guides(size = "none") +
  coord_equal()
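A note on why the aspect ratio matters: coord_equal draws one degree of longitude as wide as one degree of latitude, which is only accurate near the equator. For a map of a single region, a common approximation is to stretch the y axis by 1 / cos(latitude); this is the idea behind ggplot2's coord_quickmap. The sketch below computes that ratio in base R; `aspect_for_latitude` is a hypothetical helper name, and the approach is an approximation rather than a real map projection.

```r
# Hypothetical helper: how many times longer one degree of latitude is
# than one degree of longitude at a given latitude (in degrees).
aspect_for_latitude <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)
}

# At the equator both are equal; at 60 degrees north, a degree of
# longitude covers only half the distance of a degree of latitude.
print(aspect_for_latitude(c(0, 60)))

# Possible usage in a regional map centered around 50 degrees north:
# ... + coord_fixed(ratio = aspect_for_latitude(50))
```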

Now, let’s also manipulate the different layers, say the fill colors of the regions. ggplot comes with a family of helpful functions called scale_PARAMETER_XXX, where “PARAMETER” is one of the aesthetic parameters discussed above (e.g. color, fill, size, shape, alpha) and “XXX” can be (1) manual, (2) continuous, (3) gradient, (4) brewer, and some more. Needless to say, scale_color_continuous manipulates the color of a layer, while scale_size_continuous changes its size. The more interesting part is when and how to use manual, continuous, and so on.

Let me start with manual. The manual functions allow you to specify a certain size/color/… value for each value of your variable. For instance, to change the colors of the continents manually, we can do something like the following.

ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group, fill = continent), color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat, size = pop), fill = 'red', color = 'red') +
  theme_void() + theme(panel.background = element_rect('light blue')) + guides(size = "none") +
  scale_fill_manual(values = c('red', 'blue', 'green', 'yellow', 'black')) +
  coord_equal()
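With an unnamed vector, the colors are assigned to the continents in the order of the factor levels (alphabetical here), which is easy to get wrong. As an alternative sketch, scale_fill_manual also accepts a named vector, so each continent is tied to its color explicitly. The toy data frame and the name `continent_colors` are made up for this example.

```r
library(ggplot2)

# Made-up toy data: one tile per continent
toy <- data.frame(
  continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
  x = 1:5,
  y = 1
)

# Named vector: the names must match the values of the fill variable
continent_colors <- c(Africa = "red", Americas = "blue", Asia = "green",
                      Europe = "yellow", Oceania = "black")

p <- ggplot(toy, aes(x = x, y = y, fill = continent)) +
  geom_tile() +
  scale_fill_manual(values = continent_colors)

# print(p) draws five tiles, one per continent, in the chosen colors
```

The same `values = continent_colors` argument can be dropped into the map above in place of the unnamed vector.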

There are a couple of other things you can do: you can change the labels of the legend items with the labels argument, you can define the items you want to show with the breaks argument, and you can change the legend title with the name argument.

ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group, fill = continent), color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat, size = pop), fill = 'red', color = 'red') +
  theme_void() + theme(panel.background = element_rect('light blue')) + guides(size = "none") +
  scale_fill_manual(name = 'Your legend title', values = c('red', 'blue', 'green', 'yellow', 'black'),
                    breaks = c("Americas", "Asia", "Africa"), labels = c("America Label", "Asia Label", "Africa Label")) +
  coord_equal()

As economists, we are mostly interested in continuous variables. Therefore, the scale_XXX_continuous function is probably the most important function for you. The function arguments are very similar to the manual one, but let’s give it a try so you can see how it works: we want to color the dots of the map based on the population size. Large capital cities should be red, small ones should be green.

ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group, fill = continent), color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat, size = pop, color = pop)) +
  theme_void() + theme(panel.background = element_rect('light blue')) + guides(size = "none") +
  scale_fill_manual(name = 'Your legend title', values = c('orange', 'blue', 'pink', 'yellow', 'black'),
                    breaks = c("Americas", "Asia", "Africa"), labels = c("America Label", "Asia Label", "Africa Label")) +
  scale_colour_continuous(name = 'Population Size', low = "green", high = 'red') +
  coord_equal()

Easy, right? What else? Mhm, I think that’s it for changing aesthetic parameters. Ah, yeah, one other thing: you can of course also change the part of the map that is displayed. For instance, if you just want to take a look at Latin America, you simply have to define xlim and ylim in coord_equal.

ggplot() +
  geom_polygon(data = shape_df , aes(x = long, y = lat, group = group, fill = continent), color = 'black') +
  geom_point(data = capital, aes(x = long, y = lat, size = pop, color = pop)) +
  theme_void() + 
  theme(panel.background = element_rect('light blue')) + guides(size = "none") +
  scale_fill_manual(name = 'Your legend title', values = c('orange', 'blue', 'pink', 'yellow', 'black'),
                    breaks = c("Americas", "Asia", "Africa"), labels = c("America Label", "Asia Label", "Africa Label")) +
  scale_colour_continuous(name = 'Population Size', low = "green", high = 'red') +
  coord_equal(xlim = c(-85, -35), ylim = c(-53, 20))
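It matters that the limits go into coord_equal and not into a scale. Limits on the coordinate system only zoom in, whereas limits on a scale (e.g. scale_x_continuous(limits = ...)) discard data outside the range, which tears polygons apart. The sketch below shows the difference on a made-up rectangle whose corners all lie outside the longitude window; the object names are hypothetical.

```r
library(ggplot2)

# Made-up rectangle that is larger than the window we want to look at
box <- data.frame(long = c(-100, -20, -20, -100),
                  lat  = c(-60, -60, 30, 30))

# Zoom: the polygon is kept intact, only the visible window changes
zoomed <- ggplot(box, aes(x = long, y = lat)) +
  geom_polygon() +
  coord_equal(xlim = c(-85, -35), ylim = c(-53, 20))

# Scale limits: vertices outside the range are treated as missing
clipped <- ggplot(box, aes(x = long, y = lat)) +
  geom_polygon() +
  scale_x_continuous(limits = c(-85, -35))

# Compare the x values ggplot keeps in each case
print(layer_data(zoomed, 1)$x)
print(suppressWarnings(layer_data(clipped, 1)$x))
```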

And as always, if you have questions, google is your best friend ;-)

Cheers, M