Introduction

This article aims at explaining how to plot shapefiles without and with attribute data using ggplot. I assume that readers know what ESRI Shapefile is.

Plotting a shapefile without attributes is easy, which follows the steps:

  1. Get the shapefile
  2. Read the shapefile into R. For example, using rgdal::readOGR.
  3. Use ggplot to plot the shapefile.
  4. DONE!

But the resulting map only has shapes but no attributes. What if you want to add the attributes to your map, like the population density of each county, or the name of each state? You will have to do tidy with broom package beforehand:

  1. Get the shapefile.
  2. Read the shapefile into R (we name it shp).
  3. Select the region variable, which should be distinct for different rows. (Or generate one!)
  4. Use broom::tidy function to transform the SpatialPolygonDataFrame into a data.frame. We name it shp_df.
  5. Add the attributes to shp_df by joining shp_df and shp@data.
  6. Use ggplot to plot shp_df.
  7. DONE!

Example

Now I will use an example to show the steps. Download and read the shapefile. This shapefile is the map of UK.

require(rgdal)
require(ggplot2)
fn <- file.path(tempdir(), "GBR_adm_gdb.zip", fsep = "\\")
download.file("http://biogeo.ucdavis.edu/data/gadm2.8/shp/GBR_adm_shp.zip", fn)
utils::unzip(fn, exdir = tempdir())
shp <- readOGR(dsn = file.path(tempdir(), "GBR_adm1.shp"), stringsAsFactors = F)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\HUANFA~1\AppData\Local\Temp\RtmpeSSQIP/GBR_adm1.shp", layer: "GBR_adm1"
## with 4 features
## It has 12 fields
## Integer64 fields read as strings:  ID_0 ID_1 CCN_1
summary(shp@data)
##      ID_0               ISO               NAME_0         
##  Length:4           Length:4           Length:4          
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##      ID_1              NAME_1             HASC_1         
##  Length:4           Length:4           Length:4          
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##     CCN_1              CCA_1              TYPE_1         
##  Length:4           Length:4           Length:4          
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##   ENGTYPE_1          NL_NAME_1          VARNAME_1        
##  Length:4           Length:4           Length:4          
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character

Draw the map without attributes

It is very easy to draw the map using plot or ggplot. Obviously ggplot is more attractive and flexible! Simply draw the map:

map <- ggplot() + geom_polygon(data = shp, aes(x = long, y = lat, group = group), colour = "black", fill = NA)
## Regions defined for each Polygons
map + theme_void()

Draw the map without country names

Now, we want to put the name of the four contries on the map. First, try to have some knowledge on the data. The attribute _NAME_1 is the country name. Fortunately, it is unique for each country!

We will tidy up the data, and create a data.frame. Using _NAME_1 as the region.

shp_df <- broom::tidy(shp, region = "NAME_1")
lapply(shp_df, class)
## $long
## [1] "numeric"
## 
## $lat
## [1] "numeric"
## 
## $order
## [1] "integer"
## 
## $hole
## [1] "logical"
## 
## $piece
## [1] "factor"
## 
## $group
## [1] "factor"
## 
## $id
## [1] "character"
head(shp_df)
##        long      lat order  hole piece     group      id
## 1 -2.030279 55.80440     1 FALSE     1 England.1 England
## 2 -2.030279 55.80431     2 FALSE     1 England.1 England
## 3 -2.029723 55.80431     3 FALSE     1 England.1 England
## 4 -2.029723 55.80403     4 FALSE     1 England.1 England
## 5 -2.029167 55.80403     5 FALSE     1 England.1 England
## 6 -2.029167 55.80375     6 FALSE     1 England.1 England

Look at the shp_df. It is a data.frame with seven cols. “long” and “lat” are the coordinates of nodes. “id” comes from the region variable NAME_1 that is used in tidy function. “group” is a combination of id and a number. It is used for plotting. Let’s try to plot the map. As we want to put the text on map, we need to explicitly tell the position of the text. Here we use the center of each country.

map <- ggplot() + geom_polygon(data = shp, aes(x = long, y = lat, group = group), colour = "black", fill = NA)
## Regions defined for each Polygons
cnames <- aggregate(cbind(long, lat) ~ id, data=shp_df, FUN=mean)
map + geom_text(data = cnames, aes(x = long, y = lat, label = id), size = 4) + theme_void()

Now we get the map of UK with the country name on it! Not too difficult.

Draw the map indicating coutries as kingdom or non-kingdom

Mapping the kingdom and non-kingdom countries. The attribute ENGTYPE_1 is useful. First, we distinguish the two types by creating a factor

require(dplyr)
shp@data <- shp@data %>% mutate(isKingdom = as.factor(ifelse(ENGTYPE_1 == "Kingdom", yes = "Kingdom", no="non-Kingdom")))

As is showed, after broom::tidy, all attributes are lost except for the region one. Can we use isKingdom as the region? The answer is NO. Look at the map below - England and Scotland are treated as one entity, which is different from the original data.

shp_df <- broom::tidy(shp, region = "isKingdom")
head(shp_df)
##        long      lat order  hole piece     group      id
## 1 -2.030279 55.80440     1 FALSE     1 Kingdom.1 Kingdom
## 2 -2.030279 55.80431     2 FALSE     1 Kingdom.1 Kingdom
## 3 -2.029723 55.80431     3 FALSE     1 Kingdom.1 Kingdom
## 4 -2.029723 55.80403     4 FALSE     1 Kingdom.1 Kingdom
## 5 -2.029167 55.80403     5 FALSE     1 Kingdom.1 Kingdom
## 6 -2.029167 55.80375     6 FALSE     1 Kingdom.1 Kingdom
map <- ggplot() + geom_polygon(data = shp_df, aes(x = long, y = lat, group = group, fill = id), colour = "black") + theme_void()
map 

The reason is that the isKingdom attribute is not unique for each row or object. In this situation, we have to generate a new unique attribute, for example, using the rownames(), which is guaranteed to be unique.

shp@data <- shp@data %>% mutate(id = row.names(.))
shp_df <- broom::tidy(shp, region = "id")
shp_df <- shp_df %>% left_join(shp@data, by = c("id"="id"))
map <- ggplot() + geom_polygon(data = shp_df, aes(x = long, y = lat, group = group, fill = isKingdom), colour = "black") + theme_void()
map 

Using rownames as the region has at least two advantages. First, it must be unique. Second, combining with left_join the attribute names in shp@data is kept, without being renamed as id in shp_df.

There are also disadvantage: an extra left_join step is needed, and an extra attribute is added to shp@data. Note not to write the shp as a file, to avoid write the id into the file.

Perhaps in the near future, the function to ggplot a SpatialPolygonDataFrame with a given attribute (which is called themetic map in cartography) would be implemented.

Summary:

  1. Plotting the shapes of a shapefile is easy. Directly call ggplot().
  2. Plotting a shapefile with attributes involves extra steps, including broom::tidy and possibly left_join.
  3. In the broom::tidy function, region must be a unique variable. Using rownames() is a good option.
  4. If region is not specified, it is automatically generated and is unique for each row in SpatialPolygonDataFrame@data, and starts from 0.