Mapping!

How did we generate that?

Essential Libraries!

sf Overview

Sample Census Data Workflow

Our initial map!
Improving it!

Coordinate Reference Systems (CRS)

CRS affecting calculations

Mapping!

Some quick sample maps to motivate the lesson!

Static
Interactive

(colors are different because they’re not capped)

How did we generate that?

I used a helper function i had written (in a package called censusrx) to pull the tract information very quickly. But this is built in turn on packages to quickly query data – including spatial data, such as area geometries, from the US Census Bureau.

This tutorial quickly goes over essential libraries for doing spatial work in R and gives an example workflow working with census data.

Essential Libraries!

sf is the key R library for spatial R. You’ll likely always want to load whenever doing mapping + spatial analysis
For pulling in census geometries directly from R, tigris is extremely useful. tidycensus is less straightforward to use, but can pull census estimates for many social attributes, as well as geometries. Resource: Reference book on working with Census Bureau data in R
You might see references to sp or other libraries on stackoverflow and elsewhere–sp provided some similar functionalities but is rapidly getting displaced by sf. I recommend trying to avoid older libraries that are falling out of favor and using sf or compatible libraries whenever possible.

For visualization:

ggplot2 in the tidyverse is great for static visualization!
mapview is great for interactive ones.
More polished or modifiable interactive maps can be made with leaflet.

sf Overview

The sf library will underlie most spatial work in R.

Some quick notes:

All functions are prefixed with st_
sf objects are basically data.frames that include a geometry column. The geometry column will represent the spatial features associated with each row.
The geometry column will be “sticky.” To drop it, you’ll have to turn the sf into a tibble or other non-spatial data.frame. This is often useful because grouping an sf object will cause geometries to union. This can very useful, but it can also be an expensive operation that slows down non-spatial analysis.

Sample Census Data Workflow

Get population for the five boroughs of NYC!

This requires looking up the state/county FIPS codes we want (FIPS codes are identifiers used by the Census Bureau to identify areas like counties, census tracts, etc.)

# county FIPS codes for the 5 boroughs of NYC
nyboros <- c("085", "005", "047", "061", "081")

# query population data with tidycensus
boro.pops <- 
  tidycensus::get_acs(
     geography = 'county'
    ,state = 'ny'
    ,county = nyboros
    ,year = 2021
    ,variables = c(pop = "B01001_001"), # just population
  )

# i like column names in lowercase
boro.pops <- boro.pops %>% 
  rename_with(tolower)

# get geometries for the 5 boroughs
boro.geos <- 
  tigris::counties(
     state = 36
    ,year = 2021
  ) %>% 
  rename_with(tolower) %>% 
  select(geoid, aland) %>% 
  filter(geoid %in% paste0('36', nyboros))

# merge and make it an sf object
boro.popsf <- boro.pops %>% 
  left_join(boro.geos) %>% 
  st_sf()

Our initial map!

# map!
ggplot(boro.popsf) +
  geom_sf(
    aes(fill = estimate)
  )

It looks awful!

Improving it!

we can make it prettier by adding water, using better color palette, neatening the legend, and removing the gridded background.

# use tigris to pull water areas.
wtr <- tigris::area_water(state = 36, county = nyboros) %>% 
  rename_with(tolower)
wtr <- wtr %>% filter(aland >= quantile(wtr$aland, .98))

# define a bkgd color
bkgd.color <- 'grey90'

ggplot(boro.popsf) +
  geom_sf(
    aes(fill = estimate)
    ,color = NA
  ) +
  # add a layer for water areas that matches background color.
  geom_sf(data = wtr
          ,fill = bkgd.color
          ,color = bkgd.color) +
  # change color palette and neaten labels.
  scale_fill_viridis_c(
    'Population'
    ,labels = scales::comma
    ,option = 'B'
  ) +
  theme_void() +
  theme(panel.background =
          element_rect(fill = bkgd.color)
          ) +
  labs(
    caption = 'Source: 2021, American Community Survey, 5-year estimates.'
  )

Coordinate Reference Systems (CRS)

If you’re doing spatial data manipulation, you might run into some issues caused by the curvature of the earth. As you probably know, maps are projected different ways to represent geography on a flat surface. The projection method is called a coordinate reference system (CRS).

The curvature of the earth is not dealt with in all spatial R libraries. The lwgeom package can extend sf to allow “geodetic” functions, which do account for the curved surface when calculating things like distance or area. See here. Sf is also being extended to allow geodetic operations. See here.

In general, you’ll have to transform your spatial datasets to the same CRS to work with them together. Some functions will give a message/warning/error based on whether it expects long/lat or planar geometries.

A CRS can store a lot of information, but common ones can just be represented by a 4+ digit number, called an EPSG code. You can use the espg 4326 for a common long-lat crs, used by google maps and elsewhere.

Changing CRS can also help your maps by shifting to appropriate projections for the area you’re mapping.

CRS affecting calculations

Example of how using geodetic operations vs planar operations (and a non-areal-preserving CRS) will yield different results:

library(lwgeom)

boro.geos %>% 
  st_transform(32618) %>% 
  mutate(calculated.area =
           st_area(geometry)
  ) %>% 
  st_transform(4326) %>% 
  mutate(calculated.geodetic.area =
           st_geod_area(geometry)
         ) %>% 
  tibble() %>% 
  select(-geometry) %>% 
  mutate(difference =
           calculated.geodetic.area - calculated.area)

## # A tibble: 5 × 5
##   geoid     aland calculated.area calculated.geodetic.area difference
##   <chr>     <dbl>           [m^2]                    [m^2]      [m^2]
## 1 36085 148982680      266246093.               266424173.    178079.
## 2 36081 281594049      469774022.               470038471.    264450.
## 3 36047 179684484      250691422.               250843279.    151857.
## 4 36061  58683562       87640496.                87694321.     53825.
## 5 36005 109233466      148502118.               148586736.     84618.

The first calculated area has a UTM CRS projection (represented by the EPSG code 32618). This map projection doesn’t preserve area, and our area calculation function doesn’t know how to compensate.

When we change the CRS to a longitude/latitude projection (using the EPSG code 4326) and use a geodetic function to calculate area, we get a different answer (see the difference column).

The second answer will be more accurate because it accounts for the curvature of the earth and doesn’t base calculations on an areal-distorted map projection.

Spatial R Tutorial

kira