Library overview

Quick glossary of useful spatial libraries–examples will follow below.

sfis the key R library for spatial R. You’ll likely always want to load whenever doing mapping + spatial analysis

For pulling in census geometries directly from R, tigris is extremely useful. tidycensus is less straightforward to use, but can pull census estimates from decennial/ACS as well as geometries (although it has some limitations that can be annoying)

For cleaning + manipulating spatial data, rmapshaper and lwgeom are also both worth noting.

You might see references to sp and a host of other libraries on stackoverflow and elsewhere–sp provided some similar functionalities but is rapidly getting displaced by sf. I recommend trying to avoid older libraries that are falling out of favor. sf is really easy and good and it is gaining popularity for a reason.

for visualization

For exploratory visualization, I generally use base plot (extended by sf), or mapview.

For more complex or polished plots, I generally use either ggplot2 or leaflet.

leaflet and mapview make interactive/zoom-able plots.

sf Overview

sf is the library that all the others above will depend. Full documentation for the library is here.

Some quick notes:

Spatial data sample workflow

Here’s a little sample workflow that models most of the above libraries.

First, let’s load sf and get some geometries for a study area from tigris. I’ll also use a personal library I made, xwalks, to link counties to commuting zones.

suppressMessages(library(dplyr))
suppressMessages(library(sf))
## Warning: package 'sf' was built under R version 4.0.3
# let's get some Places in NJ with population attached, using tidycensus
acs.vars <- tidycensus::load_variables(year = 2018, dataset = "acs5")
acs.vars %>% head(3)

nj.pl <- 
  suppressMessages(tidycensus::get_acs(geography = "place",
                               state = "NJ",
                               variables = "B01001_001", # just population
                              geometry = TRUE,
                               year = 2018))
(nj.pl <- nj.pl %>% 
  select(GEOID, place = NAME, population = estimate))
## Simple feature collection with 545 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -75.54914 ymin: 38.92852 xmax: -73.89398 ymax: 41.25704
## geographic CRS: NAD83
## First 10 features:
##      GEOID                                 place population
## 1  3481200    Wildwood Crest borough, New Jersey       3131
## 2  3468730 South Bound Brook borough, New Jersey       4550
## 3  3442630          Magnolia borough, New Jersey       4289
## 4  3482450         Woodlynne borough, New Jersey       2915
## 5  3419360       East Newark borough, New Jersey       2665
## 6  3478110           Wenonah borough, New Jersey       2225
## 7  3401330           Andover borough, New Jersey        594
## 8  3418490          Dunellen borough, New Jersey       7284
## 9  3470380          Stanhope borough, New Jersey       3377
## 10 3472240         Tavistock borough, New Jersey          3
##                          geometry
## 1  MULTIPOLYGON (((-74.85534 3...
## 2  MULTIPOLYGON (((-74.54279 4...
## 3  MULTIPOLYGON (((-75.0498 39...
## 4  MULTIPOLYGON (((-75.10198 3...
## 5  MULTIPOLYGON (((-74.16598 4...
## 6  MULTIPOLYGON (((-75.1603 39...
## 7  MULTIPOLYGON (((-74.74864 4...
## 8  MULTIPOLYGON (((-74.47789 4...
## 9  MULTIPOLYGON (((-74.71946 4...
## 10 MULTIPOLYGON (((-75.0372 39...
# class of object
class(nj.pl)
## [1] "sf"         "data.frame"
# quick plot
plot(nj.pl["population"])

Coordinate reference systems

You might run into some issues caused by the curvature of the earth. As you probably know, maps are projected different ways to represent geography on a flat surface. The projection method is called a coordinate reference system (CRS).

The curvature of the earth is not dealt with in all spatial R libraries. The lwgeom package can extend sf to allow “geodetic” functions, which do account for the curved surface when calculating things like distance or area. See: (https://r-spatial.github.io/lwgeom/reference/geod.html)

You’ll also generally have to transform your spatial datasets to the same CRS to work with them together. Some functions will give a message/warning/error based on whether it expects long/lat or planar geometries.

A CRS can store a lot of information, but common ones can just be represented by a 4+ digit number, called an EPSG code. You can use the espg 4326 for a common long-lat crs.

library(lwgeom)
## Linking to liblwgeom 3.0.0beta1 r16016, GEOS 3.8.0, PROJ 6.3.1
nj.pl <- st_transform(nj.pl, 4326)
st_crs(nj.pl)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["World"],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]
# alternate projection -- this puts geometries in meters instead of long/lat
# this projection won't be area-preserving
nj.plm <- st_transform(nj.pl, "+proj=eqdc +lat_0=0 +lon_0=0 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0")
st_crs(nj.plm)
## Coordinate Reference System:
##   User input: +proj=eqdc +lat_0=0 +lon_0=0 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0 
##   wkt:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["World Geodetic System 1984",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6326]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Equidistant Conic"],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Latitude of 1st standard parallel",33,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8823]],
##         PARAMETER["Latitude of 2nd standard parallel",45,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8824]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
# compare calculated areas
st_area(nj.pl) %>% head()
## Units: [m^2]
## [1] 3761797.5 1898517.2 2545010.7  575681.9  318508.2 2623709.1
st_area(nj.plm) %>% head()
## Units: [m^2]
## [1] 3741279.3 1888708.8 2531294.8  572587.0  316890.8 2609535.5
(st_area(nj.pl) - st_area(nj.plm)) %>% summary()
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    926.6  14529.8  26987.8  46785.9  50974.6 973098.6
# explict call to geod_area, which expects a long/lat projection
st_geod_area(nj.plm)
## Error in st_geod_area(nj.plm): st_is_longlat(x) is not TRUE

Spatial joins & filters

These are useful for working across different geography levels. Let’s join places with commuting zones, say. I’ll use a personal library, xwalks, to link counties to CZs.

# get counties from tigris
counties <- 
  tigris::counties(state = "NJ",
                             cb = TRUE,
                             year = 2018)
# devtools::install_github("https://github.com/kmcd39/xwalks.git")
# use xwalks to add czs
(co2cz <- xwalks::co2cz)
## # A tibble: 3,142 x 4
##    countyfp cz    cz_name    statefp
##    <chr>    <chr> <chr>      <chr>  
##  1 01001    11101 Montgomery 01     
##  2 01003    11001 Mobile     01     
##  3 01005    10301 Eufaula    01     
##  4 01007    10801 Tuscaloosa 01     
##  5 01009    10700 Birmingham 01     
##  6 01011    09800 Auburn     01     
##  7 01013    11101 Montgomery 01     
##  8 01015    09600 Anniston   01     
##  9 01017    09600 Anniston   01     
## 10 01019    06600 Rome       01     
## # ... with 3,132 more rows
counties <- left_join(counties, co2cz,
                      by = c("GEOID" = "countyfp"))

# clarify column names, so we don't end up with diffferent GEOIDs for Places and
# Counties
counties <- counties %>% 
  select(countyfp = GEOID, county_name = NAME,
         cz, cz_name)

nj.pl <- nj.pl %>% 
  rename(place.id = GEOID)


# First try spatial join (this will cause an issue)
counties <- counties %>% st_transform(4326)
nj <- counties %>% st_intersection(nj.pl)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
nj.pl$place.id %>% duplicated() %>% sum()
## [1] 0
nj$place.id %>% duplicated() %>% sum()
## [1] 165
# we duplicated places!

# this is because very slight overlaps will be included in the interesection. Joins
# can be thrown off by shapes having slightly different resolutions or other issues

# The xwalks pkg has two fcns for dealing with this. (turns smaller geos into points
# to do so)
nj <- xwalks::generate.coterminous.xwalk(nj.pl, counties)
## Warning in st_point_on_surface.sf(smaller.geo): st_point_on_surface assumes
## attributes are constant over geometries of x
## Warning in st_point_on_surface.sfc(st_geometry(x)): st_point_on_surface may not
## give correct results for longitude/latitude data
nj$geometry <- nj.pl$geometry # add back in polygon geometries

nj.pl$place.id %>% duplicated() %>% sum()
## [1] 0
nj$place.id %>% duplicated() %>% sum()
## [1] 0
# places by cz count
nj %>% count(cz_name)
## Simple feature collection with 3 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -75.54914 ymin: 38.92852 xmax: -73.89398 ymax: 41.25704
## geographic CRS: WGS 84
##          cz_name   n                       geometry
## 1 Brick Township 105 MULTIPOLYGON (((-74.26661 3...
## 2         Newark 294 MULTIPOLYGON (((-74.32628 4...
## 3   Philadelphia 146 MULTIPOLYGON (((-74.51703 3...
# plots
plot(counties["cz_name"], main = "counties by cz")
plot(nj["cz_name"], main = "places by cz")

Alternatives mapping methods

  • ggplot & geom_sf
nj.czs <- counties %>% group_by(cz, cz_name) %>% summarise(., do_union = T)

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3
ggplot() +
  geom_sf(data = nj, aes(fill = population)) +
  geom_sf(data = st_boundary(nj.czs), color = "red") +
  scale_fill_binned(type = "viridis", n.breaks = 6)

  • Mapview
library(mapview)
mapview(nj.pl, zcol="population")