tidycensus practice using Anne Arundel county, MD, data - focusing on map creation

Mapping 2020 US Census Data in R

Part 1: Census geographic data and “GIS” in R

knitr::include_graphics("US_Census_Geography.png")

1.1 Basics

Census TIGER/Line shapefiles

TIGER: Topologically Integrated Geographic Encoding and Referencing database

High-quality series of geographic datasets released by the US Census Bureau

Distributed as shapefiles, a common GIS data format comprised of several related files

## ----basic-usage------------------------------------------------
md_counties <- counties(state = "MD")

## ----show-counties----------------------------------------------
#md_counties
str(md_counties)
## ----basic-plot-------------------------------------------------
plot(md_counties$geometry)

tigris package

  • R interface to the US Census Bureau’s TIGER/Line shapefile FTP server
  • No API key necessary - just install the package and start using Census shapefiles in R!
  • To use tigris, call a function that corresponds to the Census geography you want, optionally by state or county, when appropriate
  • Defaults to 2020 unless year is otherwise specified (up to 2021 is available)

sf package and simple feature geometry

  • The sf package implements a simple features data model for vector spatial data in R
  • Vector geometries: points, lines, and polygons stored in a list-column of a data frame
  • Allows for tidy spatial data analysis

1.2 Making sense of GIS data & tigris

Datasets available in tigris__

Legal entities: units that have legal significance in the US (e.g. states, counties)

Statistical entities: units that are used to tabulate Census data but do not have legal standing (e.g. Census tracts or block groups)

Geographic features: other geographic datasets provided by the Census Bureau that are not used for demographic tabulation (e.g. roads, water)

Polygons: statistical entities Lines: geographic features Points: geographic features

## ----travis-tracts-----------------------------------------------
travis_tracts <- tracts(state = "MD", county = "Anne Arundel")

## ----travis-roads-------------------------------------------------
travis_roads <- roads(state = "MD", county = "Anne Arundel")

## ----dc-landmarks-------------------------------------------------
dc_landmarks <- landmarks("DC", type = "point")
plot(travis_tracts$geometry)

plot(travis_roads$geometry)

plot(dc_landmarks$geometry)

How tigris works

When you call a tigris function, it does the following:
1. Downloads your data from the US Census Bureau website;
2. Stores your data in a temporary directory by default;
3. Loads your data into R as a simple features object using sf::st_read()

Recommended option: use options(tigris_use_cache = TRUE) to cache downloaded shapefiles and prevent having to re-download every time you use them

Tigris features and options

Cartographic boundary shapefiles

The core TIGER/Line shapefiles include water area that belongs to US states and counties

Use the argument cb = TRUE to obtain a cartographic boundary shapefile pre-clipped to the US shoreline

Note: For some geographies, highly generalized (1:5 million and 1:20 million) shapefiles are available with the resolution argument

## ----Maryland-tiger-----------------------------------------------
md_counties <- counties("MD")

## ----Maryland-cb--------------------------------------------------
md_counties_cb <- counties("MD", cb = TRUE)
plot(md_counties$geometry)

plot(md_counties_cb$geometry)

Understanding yearly differences in TIGER/Line files
## ----get-yearly-data-------------------------------------------------------------------
annearundel90 <- tracts("MD", "Anne Arundel", cb = TRUE, year = 1990)
annearundel00 <- tracts("MD", "Anne Arundel", cb = TRUE, year = 2000)
annearundel10 <- tracts("MD", "Anne Arundel", cb = TRUE, year = 2010)
annearundel20 <- tracts("MD", "Anne Arundel", cb = TRUE, year = 2020)
## ----plot-yearly-data------------------------------------------------------------------
par(mfrow = c(2, 2))

plot(annearundel90$geometry, main = "1990")
plot(annearundel00$geometry, main = "2000")
plot(annearundel10$geometry, main = "2010")
plot(annearundel20$geometry, main = "2020")

1.3 Interactive viewing of data with mapview

## ----mapview-----------------------------------------------------
library(mapview)

mapview(annearundel20)
## ----sync--------------------------------------------------------
library(leafsync)
 
sync(mapview(annearundel90), mapview(annearundel20))
## ----all-tracts--------------------------------------------------
us_tracts <- tracts(state = "MD", cb = TRUE)

## ----------------------------------------------------------------
str(us_tracts)

Part 2: Mapping US Census data in R

2.1 Mapping Census data with tidycensus

Typical Census GIS workflows. Traditionally, getting “spatial” Census data requires:
1. Fetching shapefiles from the Census website;
2. Downloading a CSV of data, cleaning/formatting it;
3. Loading geometries and data into your GIS of choice;
4. Aligning key fields in your GIS and joining your data

tidycensus takes care of this entire process with the argument geometry = TRUE

## ----tidycensus-geometry-----------------------------------------
library(tidycensus)
options(tigris_use_cache = TRUE)

md_population <- get_decennial(
  geography = "county",
  variables = "P1_001N",
  state = "MD",
  year = 2020,
  geometry = TRUE #<<
) 


## ----show-geometry-----------------------------------------------------------
#md_population

## ----plot-geometry-----------------------------------------------------------
plot(md_population["value"])

## ----geom-sf-----------------------------------------------------------------
library(tidyverse)

md_map <- ggplot(md_population, aes(fill = value)) + 
  geom_sf()

## ----plot-geom-sf------------------------------------------------------------
md_map

2.2 Mapping Census data with tmap

The tmap package:
- Comprehensive package for thematic mapping in R
- ggplot2-like syntax, but designed in a way to feel friendly to GIS cartographers coming to R for mapping

Example: comparing the distributions of racial and ethnic groups in Anna Arundel County, MD

library(tidycensus)
library(tidyverse)

## ----get-AA-data-------------------------------------------------------------
aa_race <- get_decennial(
        state = "MD",
        county = "Anne Arundel",
        geography = "tract",
        
        variables = c(
            Hispanic = "P2_002N",
            White = "P2_005N",
            Black = "P2_006N",
            Native = "P2_007N",
            Asian = "P2_008N"
        ),
        summary_var = "P2_001N",
        year = 2020,
        geometry = TRUE)%>%
    mutate(percent = 100 * (value / summary_value))

#one GEOID does not have data - is this BWI?? 
aa_race <- aa_race%>%filter(is.na(percent)==FALSE)

#dim(aa_race)
#length(unique(aa_race$GEOID))

## ----glimpse-AA-data---------------------------------------------------------
#dplyr::glimpse(aa_race)

Basic plotting with tmap

## ----polygons-map--------------------------------------------------------
library(tmap)

aa_black <- aa_race%>%filter(variable == "Black")

#dim(aa_black)
#length(unique(aa_black$GEOID))
#one GEOID does not have data - is this BWI?? 

tm_shape(aa_black) + 
    tm_polygons() 

Choropleth mapping with tmap

Color palettes can be modified with the palette parameter, which accepts ColorBrewer and viridis palettes

If you’ve mapped with GIS software before, the style parameter implements various breaks methods, including “equal”, “quantile” and “jenks”

## ----choropleth-show-----------------------------------------------------
tm_shape(aa_black) + 
  tm_polygons(col = "percent")

## ----custom-choropleth-show----------------------------------------------
tm_shape(aa_black) + 
  tm_polygons(col = "percent",
          style = "quantile",
          n = 7,
          palette = "Purples",
          title = "2020 US Census") + 
  tm_layout(title = "Percent Black\nby Census tract",
            frame = FALSE,
            legend.outside = TRUE)

## ----jenks-show----------------------------------------------------------
tm_shape(aa_black) + 
  tm_polygons(col = "percent",
          style = "jenks",
          n = 7,
          palette = "viridis",
          title = "2020 US Census",
          legend.hist = TRUE) + 
  tm_layout(title = "Percent Black population\nby Census tract",
            frame = FALSE,
            legend.outside = TRUE)

tmap choropleth tips and tricks

Use tmaptools::palette_explorer() to interactively browse color options

Use the option tm_layout(legend.hist = TRUE) to display the distribution of data values among classes

library(sf)
tm_shape(aa_black) + 
  tm_polygons(col = "percent",
          style = "jenks",
          n = 7,
          palette = "viridis",
          title = "2020 US Census",
          legend.hist = TRUE) + 
  tm_layout(title = "Percent Black population\nby Census tract",
            frame = FALSE,
            legend.outside = TRUE)

Graduated symbol maps

Graduated symbols: using size of a symbol to represent statistical variation on a map

Implemented in tmap with tm_bubbles()

## ----bubbles-code------------------------------------------------------------
symbol_map <- tm_shape(aa_black) + 
  tm_polygons() + 
  tm_bubbles(size = "value", alpha = 0.5, 
             col = "navy") + 
  tm_layout(legend.outside = TRUE,
            legend.outside.position = "bottom")

## ----bubbles-map-------------------------------------------------------------
symbol_map

Faceted mapping

tm_facets() allows for comparative small multiples maps. It works well with long-form spatial data returned by tidycensus

## ----facet-map-code----------------------------------------------------------
facet_map <- tm_shape(aa_race) + 
  tm_facets(by = "variable", scale.factor = 4) + 
  tm_fill(col = "percent",
          style = "quantile",
          n = 7,
          palette = "Blues") + 
  tm_layout(legend.position = c(-0.7, 0.15)) #<<

## ----facet-map---------------------------------------------------------------
facet_map

Dot-density maps in tidycensus

Dot-density maps are useful alternatives to choropleth maps as they can represent internal heterogeneity of Census geographies

A brand-new function in tidycensus (GitHub version for now), as_dot_density(), helps you create group-wise dots for dot-density mapping

Map the result with tm_dots()

#requires developer version tidycensus
#developer::
https://stackoverflow.com/questions/9656016/how-to-install-development-version-of-r-packages-github-repository

#install.packages("devtools")
library(devtools)

dev_mode(on=T)

devtools::install_github("walkerke/tidycensus")

# use dev ggplot2 now
# when finished do:

dev_mode(on=F)  #and you are back to having stable ggplot2

packageVersion("tidycensus")
library(remotes)
?as_dot_density
aa_dots <- aa_race %>%
  as_dot_density(
    value = "value",
    values_per_dot = 250,
    group = "variable"
  )

aa_base <- aa_race %>%
  distinct(GEOID, .keep_all = TRUE)

dot_map <- tm_shape(aa_base) + 
  tm_polygons(col = "white") + 
  tm_shape(aa_dots) + 
  tm_dots(col = "variable", 
          palette = "Set1", 
          size = 0.01) + 
  tm_layout(legend.outside = TRUE)

## ----dot-density-map-show----------------------------------------------------
dot_map

2.3 Adding cartographic elements

Adding basemaps

Thematic maps often will use basemaps with a semi-transparent overlay to provide important geographic context to the mapped data

The easiest way to get a basemap for tmap is with the rosm package

## ----get-basemap, -----------------------------------------------------
library(rosm)

basemap <- osm.raster(
  st_bbox(aa_black), 
  zoom = 10,
  type = "cartolight",
  crop = TRUE
)

## ----show-basemap------------------------------------------------------------
tm_shape(basemap) + 
  tm_rgb()

Modify the transparency with an alpha value to show the basemap beneath the Census tract polygons

## ----map-with-basemap-show-----------------------------------------------
tm_shape(basemap) + 
  tm_rgb() + 
  tm_shape(aa_black) + 
  tm_polygons(col = "percent",
          style = "quantile",
          n = 7,
          palette = "Purples",
          title = "2020 US Census", 
          alpha = 0.6) + #<< 
  tm_layout(title = "Percent Black\nby Census tract",
            legend.outside = TRUE)

Adding cartographic elements

## ----cartographic-elements-show------------------------------------------
tm_shape(basemap) + 
  tm_rgb() + 
  tm_shape(aa_black) + 
  tm_polygons(col = "percent",
          style = "quantile",
          n = 7,
          palette = "Purples",
          title = "2020 US Census", 
          alpha = 0.6) + #<< 
  tm_layout(title = "Percent Black\nby Census tract",
            legend.outside = TRUE) + 
  tm_scale_bar(position = c("left", "BOTTOM")) + 
  tm_compass(position = c("right", "top")) + 
  tm_credits("Basemap (c) CARTO, OSM", 
             bg.color = "white",
             position = c("RIGHT", "BOTTOM"), 
             bg.alpha = 0,
             align = "right")

Part 3: Interactive mapping and advanced geometry handling

3.1 Interactive maps with tmap

Regular tmap code can generate interactive maps by setting tmap_mode(“view”)

## ----interactive-tmap-1--------------------------------------------------
tmap_mode("view")

tm_shape(aa_black) +
  tm_polygons(col = "percent",
          style = "quantile",
          n = 7,
          palette = "Purples",
          title = "Percent Black<br/>by Census tract",
          alpha = 0.6)

Customizing interactive maps As with static maps, tmap includes a wide range of options for customizing interactive maps

## ----interactive-tmap-2--------------------------------------------------
tmap_options(basemaps = c("Esri.WorldTopoMap", "Stamen.TonerLite", #<<
                          "CartoDB.DarkMatter")) #<<

tm_shape(aa_black) +
  tm_polygons(col = "percent",
          style = "quantile",
          n = 7,
          palette = "Purples",
          title = "Percent Black<br/>by Census tract",
          alpha = 0.6,
          id = "NAME") #<<

Saving interactive maps Use tmap_save() to write an interactive map to a website (also works for static plots!)

## ----save-tmap-----------------------------------------------------------
tmap_save(tmap_obj, "aa_black.html")

Interactive maps with mapview Informative interactive choropleth maps can be generated quickly with mapview::mapview() by specifying a zcol argument

## ----mapview-1-----------------------------------------------------------
mapview(aa_black, zcol = "percent")

3.2 Additional resources for interactive maps

Advanced geometry handling

Mapping population density When tidycensus calls tigris internally to get Census geometries, it dismisses all the information (besides the geometry) from the tigris object by default

The argument keep_geo_vars = TRUE keeps the information from tigris, which includes a column, ALAND for land area in meters

This can be used to calculate and map population density

mi_density <- get_decennial(
  geography = "county",
  variables = "P1_001N",
  state = "MI",
  year = 2020,
  geometry = TRUE,
  keep_geo_vars = TRUE
) %>%
  mutate(density = 1000000 * (value / ALAND))

Mapping population density Maps of population density (and everything else, typically!) should aim to minimize distortion of areas

Solution: use a projected coordinate reference system

##----pop-density-----------------------------------------------------------
tmap_mode("plot")

library(crsuggest)
# View the mi_crs object and pick a suitable CRS for the state
mi_crs <- suggest_crs(mi_density)
# Use that CRS for your map
mi_density_map <- tm_shape(mi_density, projection = 6497) + 
  tm_polygons(col = "density", style = "cont",
              palette = "Blues", title = "People/km2")

mi_density_map

The problem with national maps

A common use-case is generating a map of data for the entire United States. Let’s get some data:

## ----national-data-----------------------------------------------------------
us_percent_hispanic <- get_decennial(
  geography = "county",
  variables = "P2_002N",
  summary_var = "P2_001N",
  year = 2020,
  geometry = TRUE
) %>%
  mutate(percent = 100 * (value / summary_value))

## ----plot-national-map-------------------------------------------------------
tmap_mode("plot")

national_map <- tm_shape(us_percent_hispanic) + 
  tm_polygons(col = "percent", palette = "plasma", 
              lwd = 0.05, style = "cont",
              legend.is.portrait = FALSE,
              title = "% Hispanic, 2020 Census") + 
  tm_layout(legend.outside = TRUE, legend.outside.position = "bottom")


## ----show-national-map-------------------------------------------------------
national_map

“Shifting” and re-scaling US geometry Use the shift_geometry() function in the tigris package to move Alaska, Hawaii, and Puerto Rico to better positions for national mapping

## ----rescaled-map------------------------------------------------------------
us_rescaled <- shift_geometry(us_percent_hispanic)

rescaled_map <- tm_shape(us_rescaled) + 
  tm_polygons(col = "percent", palette = "plasma", 
              lwd = 0.05, style = "cont",
              legend.is.portrait = FALSE,
              title = "% Hispanic, 2020 Census") + 
  tm_layout(legend.outside = TRUE, legend.outside.position = "bottom",
            legend.outside.size = 0.2)


## ----rescaled-map-show-------------------------------------------------------
rescaled_map

Use preserve_area = TRUE to preserve the relative area of Alaska, Hawaii, and Puerto Rico relative to the continental US

## ----shifted-map-------------------------------------------------------------
us_shifted <- shift_geometry(us_percent_hispanic,
                              position = "outside",
                              preserve_area = TRUE)

shifted_map <- tm_shape(us_shifted) + 
  tm_polygons(col = "percent", palette = "plasma", 
              lwd = 0.05, style = "cont",
              legend.is.portrait = FALSE,
              title = "% Hispanic, 2020 Census") + 
  tm_layout(legend.position = c("left", "BOTTOM"))

## ----shifted-map-show--------------------------------------------------------
shifted_map

The problem with water areas

Locations with signficant water area (e.g. Seattle, New York City) will often have Census tracts that cover rivers and lakes where no one lives

Census maps may then misrepresent what viewers expect to see from a location

## ----king-asian----------------------------------------------------------
king_asian <- get_decennial(
  geography = "tract",
  variables = c(asian = "P2_008N",
                total = "P2_001N"),
  state = "WA",
  county = "King",
  geometry = TRUE,
  year = 2020,
  output = "wide"
) %>%
  mutate(percent = 100 * (asian / total))

mapview(king_asian, zcol = "percent")

Erasing" water areas The erase_water() function in the tigris package will remove water areas above a given area threshold (size percentile) from a spatial dataset

Performs best when data are in a projected coordinate reference system (read here for more information)

## ----erase-king-asian----------------------------------------------------
library(sf)

king_erase <- king_asian %>%
  st_transform(6596) %>% #<<
  erase_water(area_threshold = 0.9) #<<

mapview(king_erase, zcol = "percent")
Dasymetric dot-density mapping

The new as_dot_density() function in tidycensus integrates erase_water() for dasymetric dot placement, where dots will not be placed in water areas

Helpful for locations like Minneapolis, which have a lot of lakes!

## ----dasymetric-dots-----------------------------------------------------
aa_dots_dasy <- aa_race %>%
  st_transform(26914) %>%
  as_dot_density(
    value = "value",
    values_per_dot = 250,
    group = "variable",
    erase_water = TRUE
  )

hn_basemap <- osm.raster(st_bbox(aa_race),
                         zoom = 10,
                         type = "cartolight",
                         crop = TRUE)

tm_shape(hn_basemap) +
  tm_rgb() +
  tm_shape(aa_dots_dasy) +
  tm_dots(col = "variable",
          palette = "Set1",
          size = 0.01) +
  tm_layout(legend.outside = TRUE)

Part 3 exercises Using a static map you created in Part 2 of the workshop, try converting to an interactive map with tmap_mode(“view”)

Experiment with the shift_geometry() function and try different combinations of the preserve_area and position arguments. Which layout do you prefer, and why?