Quick glossary of useful spatial libraries–examples will follow below.
sfis the key R library for spatial R. You’ll likely always want to load whenever doing mapping + spatial analysis
For pulling in census geometries directly from R, tigris is extremely useful. tidycensus is less straightforward to use, but can pull census estimates from decennial/ACS as well as geometries (although it has some limitations that can be annoying)
For cleaning + manipulating spatial data, rmapshaper and lwgeom are also both worth noting.
You might see references to sp and a host of other libraries on stackoverflow and elsewhere–sp provided some similar functionalities but is rapidly getting displaced by sf. I recommend trying to avoid older libraries that are falling out of favor. sf is really easy and good and it is gaining popularity for a reason.
For exploratory visualization, I generally use base plot (extended by sf), or mapview.
For more complex or polished plots, I generally use either ggplot2 or leaflet.
leaflet and mapview make interactive/zoom-able plots.
sf Overviewsf is the library that all the others above will depend. Full documentation for the library is here.
Some quick notes:
st_sf objects are basically data.frames that include a geometry column. The geometry column will represent the spatial features associated with each row.sf into a tibble or other non-spatial data.frame. This is often useful because grouping an sf object will cause geometries to union. This can very useful, but it can also be an expensive operation that slows down non-spatial analysis.Here’s a little sample workflow that models most of the above libraries.
First, let’s load sf and get some geometries for a study area from tigris. I’ll also use a personal library I made, xwalks, to link counties to commuting zones.
suppressMessages(library(dplyr))
suppressMessages(library(sf))
## Warning: package 'sf' was built under R version 4.0.3
# let's get some Places in NJ with population attached, using tidycensus
acs.vars <- tidycensus::load_variables(year = 2018, dataset = "acs5")
acs.vars %>% head(3)
nj.pl <-
suppressMessages(tidycensus::get_acs(geography = "place",
state = "NJ",
variables = "B01001_001", # just population
geometry = TRUE,
year = 2018))
(nj.pl <- nj.pl %>%
select(GEOID, place = NAME, population = estimate))
## Simple feature collection with 545 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -75.54914 ymin: 38.92852 xmax: -73.89398 ymax: 41.25704
## geographic CRS: NAD83
## First 10 features:
## GEOID place population
## 1 3481200 Wildwood Crest borough, New Jersey 3131
## 2 3468730 South Bound Brook borough, New Jersey 4550
## 3 3442630 Magnolia borough, New Jersey 4289
## 4 3482450 Woodlynne borough, New Jersey 2915
## 5 3419360 East Newark borough, New Jersey 2665
## 6 3478110 Wenonah borough, New Jersey 2225
## 7 3401330 Andover borough, New Jersey 594
## 8 3418490 Dunellen borough, New Jersey 7284
## 9 3470380 Stanhope borough, New Jersey 3377
## 10 3472240 Tavistock borough, New Jersey 3
## geometry
## 1 MULTIPOLYGON (((-74.85534 3...
## 2 MULTIPOLYGON (((-74.54279 4...
## 3 MULTIPOLYGON (((-75.0498 39...
## 4 MULTIPOLYGON (((-75.10198 3...
## 5 MULTIPOLYGON (((-74.16598 4...
## 6 MULTIPOLYGON (((-75.1603 39...
## 7 MULTIPOLYGON (((-74.74864 4...
## 8 MULTIPOLYGON (((-74.47789 4...
## 9 MULTIPOLYGON (((-74.71946 4...
## 10 MULTIPOLYGON (((-75.0372 39...
# class of object
class(nj.pl)
## [1] "sf" "data.frame"
# quick plot
plot(nj.pl["population"])
You might run into some issues caused by the curvature of the earth. As you probably know, maps are projected different ways to represent geography on a flat surface. The projection method is called a coordinate reference system (CRS).
The curvature of the earth is not dealt with in all spatial R libraries. The lwgeom package can extend sf to allow “geodetic” functions, which do account for the curved surface when calculating things like distance or area. See: (https://r-spatial.github.io/lwgeom/reference/geod.html)
You’ll also generally have to transform your spatial datasets to the same CRS to work with them together. Some functions will give a message/warning/error based on whether it expects long/lat or planar geometries.
A CRS can store a lot of information, but common ones can just be represented by a 4+ digit number, called an EPSG code. You can use the espg 4326 for a common long-lat crs.
library(lwgeom)
## Linking to liblwgeom 3.0.0beta1 r16016, GEOS 3.8.0, PROJ 6.3.1
nj.pl <- st_transform(nj.pl, 4326)
st_crs(nj.pl)
## Coordinate Reference System:
## User input: EPSG:4326
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["unknown"],
## AREA["World"],
## BBOX[-90,-180,90,180]],
## ID["EPSG",4326]]
# alternate projection -- this puts geometries in meters instead of long/lat
# this projection won't be area-preserving
nj.plm <- st_transform(nj.pl, "+proj=eqdc +lat_0=0 +lon_0=0 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0")
st_crs(nj.plm)
## Coordinate Reference System:
## User input: +proj=eqdc +lat_0=0 +lon_0=0 +lat_1=33 +lat_2=45 +x_0=0 +y_0=0
## wkt:
## PROJCRS["unknown",
## BASEGEOGCRS["unknown",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]],
## ID["EPSG",6326]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8901]]],
## CONVERSION["unknown",
## METHOD["Equidistant Conic"],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Latitude of 1st standard parallel",33,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8823]],
## PARAMETER["Latitude of 2nd standard parallel",45,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8824]],
## PARAMETER["False easting",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
# compare calculated areas
st_area(nj.pl) %>% head()
## Units: [m^2]
## [1] 3761797.5 1898517.2 2545010.7 575681.9 318508.2 2623709.1
st_area(nj.plm) %>% head()
## Units: [m^2]
## [1] 3741279.3 1888708.8 2531294.8 572587.0 316890.8 2609535.5
(st_area(nj.pl) - st_area(nj.plm)) %>% summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 926.6 14529.8 26987.8 46785.9 50974.6 973098.6
# explict call to geod_area, which expects a long/lat projection
st_geod_area(nj.plm)
## Error in st_geod_area(nj.plm): st_is_longlat(x) is not TRUE
These are useful for working across different geography levels. Let’s join places with commuting zones, say. I’ll use a personal library, xwalks, to link counties to CZs.
# get counties from tigris
counties <-
tigris::counties(state = "NJ",
cb = TRUE,
year = 2018)
# devtools::install_github("https://github.com/kmcd39/xwalks.git")
# use xwalks to add czs
(co2cz <- xwalks::co2cz)
## # A tibble: 3,142 x 4
## countyfp cz cz_name statefp
## <chr> <chr> <chr> <chr>
## 1 01001 11101 Montgomery 01
## 2 01003 11001 Mobile 01
## 3 01005 10301 Eufaula 01
## 4 01007 10801 Tuscaloosa 01
## 5 01009 10700 Birmingham 01
## 6 01011 09800 Auburn 01
## 7 01013 11101 Montgomery 01
## 8 01015 09600 Anniston 01
## 9 01017 09600 Anniston 01
## 10 01019 06600 Rome 01
## # ... with 3,132 more rows
counties <- left_join(counties, co2cz,
by = c("GEOID" = "countyfp"))
# clarify column names, so we don't end up with diffferent GEOIDs for Places and
# Counties
counties <- counties %>%
select(countyfp = GEOID, county_name = NAME,
cz, cz_name)
nj.pl <- nj.pl %>%
rename(place.id = GEOID)
# First try spatial join (this will cause an issue)
counties <- counties %>% st_transform(4326)
nj <- counties %>% st_intersection(nj.pl)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
nj.pl$place.id %>% duplicated() %>% sum()
## [1] 0
nj$place.id %>% duplicated() %>% sum()
## [1] 165
# we duplicated places!
# this is because very slight overlaps will be included in the interesection. Joins
# can be thrown off by shapes having slightly different resolutions or other issues
# The xwalks pkg has two fcns for dealing with this. (turns smaller geos into points
# to do so)
nj <- xwalks::generate.coterminous.xwalk(nj.pl, counties)
## Warning in st_point_on_surface.sf(smaller.geo): st_point_on_surface assumes
## attributes are constant over geometries of x
## Warning in st_point_on_surface.sfc(st_geometry(x)): st_point_on_surface may not
## give correct results for longitude/latitude data
nj$geometry <- nj.pl$geometry # add back in polygon geometries
nj.pl$place.id %>% duplicated() %>% sum()
## [1] 0
nj$place.id %>% duplicated() %>% sum()
## [1] 0
# places by cz count
nj %>% count(cz_name)
## Simple feature collection with 3 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -75.54914 ymin: 38.92852 xmax: -73.89398 ymax: 41.25704
## geographic CRS: WGS 84
## cz_name n geometry
## 1 Brick Township 105 MULTIPOLYGON (((-74.26661 3...
## 2 Newark 294 MULTIPOLYGON (((-74.32628 4...
## 3 Philadelphia 146 MULTIPOLYGON (((-74.51703 3...
# plots
plot(counties["cz_name"], main = "counties by cz")
plot(nj["cz_name"], main = "places by cz")
nj.czs <- counties %>% group_by(cz, cz_name) %>% summarise(., do_union = T)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3
ggplot() +
geom_sf(data = nj, aes(fill = population)) +
geom_sf(data = st_boundary(nj.czs), color = "red") +
scale_fill_binned(type = "viridis", n.breaks = 6)
library(mapview)
mapview(nj.pl, zcol="population")