tidycensus practice using Anne Arundel county, MD, data - focusing on map creation
knitr::include_graphics("US_Census_Geography.png")
TIGER: Topologically Integrated Geographic Encoding and Referencing database
High-quality series of geographic datasets released by the US Census Bureau
Distributed as shapefiles, a common GIS data format comprised of several related files
## ----basic-usage------------------------------------------------
md_counties <- counties(state = "MD")
## ----show-counties----------------------------------------------
#md_counties
str(md_counties)
## ----basic-plot-------------------------------------------------
plot(md_counties$geometry)
Legal entities: units that have legal significance in the US (e.g. states, counties)
Statistical entities: units that are used to tabulate Census data but do not have legal standing (e.g. Census tracts or block groups)
Geographic features: other geographic datasets provided by the Census Bureau that are not used for demographic tabulation (e.g. roads, water)
Polygons: statistical entities Lines: geographic features Points: geographic features
## ----travis-tracts-----------------------------------------------
travis_tracts <- tracts(state = "MD", county = "Anne Arundel")
## ----travis-roads-------------------------------------------------
travis_roads <- roads(state = "MD", county = "Anne Arundel")
## ----dc-landmarks-------------------------------------------------
dc_landmarks <- landmarks("DC", type = "point")
plot(travis_tracts$geometry)
plot(travis_roads$geometry)
plot(dc_landmarks$geometry)
When you call a tigris function, it does the following:
1. Downloads your data from the US Census Bureau website;
2. Stores your data in a temporary directory by default;
3. Loads your data into R as a simple features object using sf::st_read()
Recommended option: use options(tigris_use_cache = TRUE) to cache downloaded shapefiles and prevent having to re-download every time you use them
The core TIGER/Line shapefiles include water area that belongs to US states and counties
Use the argument cb = TRUE to obtain a cartographic boundary shapefile pre-clipped to the US shoreline
Note: For some geographies, highly generalized (1:5 million and 1:20 million) shapefiles are available with the resolution argument
## ----Maryland-tiger-----------------------------------------------
md_counties <- counties("MD")
## ----Maryland-cb--------------------------------------------------
md_counties_cb <- counties("MD", cb = TRUE)
plot(md_counties$geometry)
plot(md_counties_cb$geometry)
## ----get-yearly-data-------------------------------------------------------------------
annearundel90 <- tracts("MD", "Anne Arundel", cb = TRUE, year = 1990)
annearundel00 <- tracts("MD", "Anne Arundel", cb = TRUE, year = 2000)
annearundel10 <- tracts("MD", "Anne Arundel", cb = TRUE, year = 2010)
annearundel20 <- tracts("MD", "Anne Arundel", cb = TRUE, year = 2020)
## ----plot-yearly-data------------------------------------------------------------------
par(mfrow = c(2, 2))
plot(annearundel90$geometry, main = "1990")
plot(annearundel00$geometry, main = "2000")
plot(annearundel10$geometry, main = "2010")
plot(annearundel20$geometry, main = "2020")
## ----mapview-----------------------------------------------------
library(mapview)
mapview(annearundel20)
## ----sync--------------------------------------------------------
library(leafsync)
sync(mapview(annearundel90), mapview(annearundel20))
## ----all-tracts--------------------------------------------------
us_tracts <- tracts(state = "MD", cb = TRUE)
## ----------------------------------------------------------------
str(us_tracts)
Typical Census GIS workflows. Traditionally, getting “spatial” Census data requires:
1. Fetching shapefiles from the Census website;
2. Downloading a CSV of data, cleaning/formatting it;
3. Loading geometries and data into your GIS of choice;
4. Aligning key fields in your GIS and joining your data
tidycensus takes care of this entire process with the argument geometry = TRUE
## ----tidycensus-geometry-----------------------------------------
library(tidycensus)
options(tigris_use_cache = TRUE)
md_population <- get_decennial(
geography = "county",
variables = "P1_001N",
state = "MD",
year = 2020,
geometry = TRUE #<<
)
## ----show-geometry-----------------------------------------------------------
#md_population
## ----plot-geometry-----------------------------------------------------------
plot(md_population["value"])
## ----geom-sf-----------------------------------------------------------------
library(tidyverse)
md_map <- ggplot(md_population, aes(fill = value)) +
geom_sf()
## ----plot-geom-sf------------------------------------------------------------
md_map
The tmap package:
- Comprehensive package for thematic mapping in R
- ggplot2-like syntax, but designed in a way to feel friendly to GIS cartographers coming to R for mapping
Example: comparing the distributions of racial and ethnic groups in Anna Arundel County, MD
library(tidycensus)
library(tidyverse)
## ----get-AA-data-------------------------------------------------------------
aa_race <- get_decennial(
state = "MD",
county = "Anne Arundel",
geography = "tract",
variables = c(
Hispanic = "P2_002N",
White = "P2_005N",
Black = "P2_006N",
Native = "P2_007N",
Asian = "P2_008N"
),
summary_var = "P2_001N",
year = 2020,
geometry = TRUE)%>%
mutate(percent = 100 * (value / summary_value))
#one GEOID does not have data - is this BWI??
aa_race <- aa_race%>%filter(is.na(percent)==FALSE)
#dim(aa_race)
#length(unique(aa_race$GEOID))
## ----glimpse-AA-data---------------------------------------------------------
#dplyr::glimpse(aa_race)
## ----polygons-map--------------------------------------------------------
library(tmap)
aa_black <- aa_race%>%filter(variable == "Black")
#dim(aa_black)
#length(unique(aa_black$GEOID))
#one GEOID does not have data - is this BWI??
tm_shape(aa_black) +
tm_polygons()
Color palettes can be modified with the palette parameter, which accepts ColorBrewer and viridis palettes
If you’ve mapped with GIS software before, the style parameter implements various breaks methods, including “equal”, “quantile” and “jenks”
## ----choropleth-show-----------------------------------------------------
tm_shape(aa_black) +
tm_polygons(col = "percent")
## ----custom-choropleth-show----------------------------------------------
tm_shape(aa_black) +
tm_polygons(col = "percent",
style = "quantile",
n = 7,
palette = "Purples",
title = "2020 US Census") +
tm_layout(title = "Percent Black\nby Census tract",
frame = FALSE,
legend.outside = TRUE)
## ----jenks-show----------------------------------------------------------
tm_shape(aa_black) +
tm_polygons(col = "percent",
style = "jenks",
n = 7,
palette = "viridis",
title = "2020 US Census",
legend.hist = TRUE) +
tm_layout(title = "Percent Black population\nby Census tract",
frame = FALSE,
legend.outside = TRUE)
Use tmaptools::palette_explorer() to interactively browse color options
Use the option tm_layout(legend.hist = TRUE) to display the distribution of data values among classes
library(sf)
tm_shape(aa_black) +
tm_polygons(col = "percent",
style = "jenks",
n = 7,
palette = "viridis",
title = "2020 US Census",
legend.hist = TRUE) +
tm_layout(title = "Percent Black population\nby Census tract",
frame = FALSE,
legend.outside = TRUE)
Graduated symbols: using size of a symbol to represent statistical variation on a map
Implemented in tmap with tm_bubbles()
## ----bubbles-code------------------------------------------------------------
symbol_map <- tm_shape(aa_black) +
tm_polygons() +
tm_bubbles(size = "value", alpha = 0.5,
col = "navy") +
tm_layout(legend.outside = TRUE,
legend.outside.position = "bottom")
## ----bubbles-map-------------------------------------------------------------
symbol_map
tm_facets() allows for comparative small multiples maps. It works well with long-form spatial data returned by tidycensus
## ----facet-map-code----------------------------------------------------------
facet_map <- tm_shape(aa_race) +
tm_facets(by = "variable", scale.factor = 4) +
tm_fill(col = "percent",
style = "quantile",
n = 7,
palette = "Blues") +
tm_layout(legend.position = c(-0.7, 0.15)) #<<
## ----facet-map---------------------------------------------------------------
facet_map
Dot-density maps are useful alternatives to choropleth maps as they can represent internal heterogeneity of Census geographies
A brand-new function in tidycensus (GitHub version for now), as_dot_density(), helps you create group-wise dots for dot-density mapping
Map the result with tm_dots()
#requires developer version tidycensus
#developer::
https://stackoverflow.com/questions/9656016/how-to-install-development-version-of-r-packages-github-repository
#install.packages("devtools")
library(devtools)
dev_mode(on=T)
devtools::install_github("walkerke/tidycensus")
# use dev ggplot2 now
# when finished do:
dev_mode(on=F) #and you are back to having stable ggplot2
packageVersion("tidycensus")
library(remotes)
?as_dot_density
aa_dots <- aa_race %>%
as_dot_density(
value = "value",
values_per_dot = 250,
group = "variable"
)
aa_base <- aa_race %>%
distinct(GEOID, .keep_all = TRUE)
dot_map <- tm_shape(aa_base) +
tm_polygons(col = "white") +
tm_shape(aa_dots) +
tm_dots(col = "variable",
palette = "Set1",
size = 0.01) +
tm_layout(legend.outside = TRUE)
## ----dot-density-map-show----------------------------------------------------
dot_map
Thematic maps often will use basemaps with a semi-transparent overlay to provide important geographic context to the mapped data
The easiest way to get a basemap for tmap is with the rosm package
## ----get-basemap, -----------------------------------------------------
library(rosm)
basemap <- osm.raster(
st_bbox(aa_black),
zoom = 10,
type = "cartolight",
crop = TRUE
)
## ----show-basemap------------------------------------------------------------
tm_shape(basemap) +
tm_rgb()
Modify the transparency with an alpha value to show the basemap beneath the Census tract polygons
## ----map-with-basemap-show-----------------------------------------------
tm_shape(basemap) +
tm_rgb() +
tm_shape(aa_black) +
tm_polygons(col = "percent",
style = "quantile",
n = 7,
palette = "Purples",
title = "2020 US Census",
alpha = 0.6) + #<<
tm_layout(title = "Percent Black\nby Census tract",
legend.outside = TRUE)
Adding cartographic elements
## ----cartographic-elements-show------------------------------------------
tm_shape(basemap) +
tm_rgb() +
tm_shape(aa_black) +
tm_polygons(col = "percent",
style = "quantile",
n = 7,
palette = "Purples",
title = "2020 US Census",
alpha = 0.6) + #<<
tm_layout(title = "Percent Black\nby Census tract",
legend.outside = TRUE) +
tm_scale_bar(position = c("left", "BOTTOM")) +
tm_compass(position = c("right", "top")) +
tm_credits("Basemap (c) CARTO, OSM",
bg.color = "white",
position = c("RIGHT", "BOTTOM"),
bg.alpha = 0,
align = "right")
Regular tmap code can generate interactive maps by setting tmap_mode(“view”)
## ----interactive-tmap-1--------------------------------------------------
tmap_mode("view")
tm_shape(aa_black) +
tm_polygons(col = "percent",
style = "quantile",
n = 7,
palette = "Purples",
title = "Percent Black<br/>by Census tract",
alpha = 0.6)
Customizing interactive maps As with static maps, tmap includes a wide range of options for customizing interactive maps
## ----interactive-tmap-2--------------------------------------------------
tmap_options(basemaps = c("Esri.WorldTopoMap", "Stamen.TonerLite", #<<
"CartoDB.DarkMatter")) #<<
tm_shape(aa_black) +
tm_polygons(col = "percent",
style = "quantile",
n = 7,
palette = "Purples",
title = "Percent Black<br/>by Census tract",
alpha = 0.6,
id = "NAME") #<<
Saving interactive maps Use tmap_save() to write an interactive map to a website (also works for static plots!)
## ----save-tmap-----------------------------------------------------------
tmap_save(tmap_obj, "aa_black.html")
Interactive maps with mapview Informative interactive choropleth maps can be generated quickly with mapview::mapview() by specifying a zcol argument
## ----mapview-1-----------------------------------------------------------
mapview(aa_black, zcol = "percent")
Mapping population density When tidycensus calls tigris internally to get Census geometries, it dismisses all the information (besides the geometry) from the tigris object by default
The argument keep_geo_vars = TRUE keeps the information from tigris, which includes a column, ALAND for land area in meters
This can be used to calculate and map population density
mi_density <- get_decennial(
geography = "county",
variables = "P1_001N",
state = "MI",
year = 2020,
geometry = TRUE,
keep_geo_vars = TRUE
) %>%
mutate(density = 1000000 * (value / ALAND))
Mapping population density Maps of population density (and everything else, typically!) should aim to minimize distortion of areas
Solution: use a projected coordinate reference system
##----pop-density-----------------------------------------------------------
tmap_mode("plot")
library(crsuggest)
# View the mi_crs object and pick a suitable CRS for the state
mi_crs <- suggest_crs(mi_density)
# Use that CRS for your map
mi_density_map <- tm_shape(mi_density, projection = 6497) +
tm_polygons(col = "density", style = "cont",
palette = "Blues", title = "People/km2")
mi_density_map
A common use-case is generating a map of data for the entire United States. Let’s get some data:
## ----national-data-----------------------------------------------------------
us_percent_hispanic <- get_decennial(
geography = "county",
variables = "P2_002N",
summary_var = "P2_001N",
year = 2020,
geometry = TRUE
) %>%
mutate(percent = 100 * (value / summary_value))
## ----plot-national-map-------------------------------------------------------
tmap_mode("plot")
national_map <- tm_shape(us_percent_hispanic) +
tm_polygons(col = "percent", palette = "plasma",
lwd = 0.05, style = "cont",
legend.is.portrait = FALSE,
title = "% Hispanic, 2020 Census") +
tm_layout(legend.outside = TRUE, legend.outside.position = "bottom")
## ----show-national-map-------------------------------------------------------
national_map
“Shifting” and re-scaling US geometry Use the shift_geometry() function in the tigris package to move Alaska, Hawaii, and Puerto Rico to better positions for national mapping
## ----rescaled-map------------------------------------------------------------
us_rescaled <- shift_geometry(us_percent_hispanic)
rescaled_map <- tm_shape(us_rescaled) +
tm_polygons(col = "percent", palette = "plasma",
lwd = 0.05, style = "cont",
legend.is.portrait = FALSE,
title = "% Hispanic, 2020 Census") +
tm_layout(legend.outside = TRUE, legend.outside.position = "bottom",
legend.outside.size = 0.2)
## ----rescaled-map-show-------------------------------------------------------
rescaled_map
Use preserve_area = TRUE to preserve the relative area of Alaska, Hawaii, and Puerto Rico relative to the continental US
## ----shifted-map-------------------------------------------------------------
us_shifted <- shift_geometry(us_percent_hispanic,
position = "outside",
preserve_area = TRUE)
shifted_map <- tm_shape(us_shifted) +
tm_polygons(col = "percent", palette = "plasma",
lwd = 0.05, style = "cont",
legend.is.portrait = FALSE,
title = "% Hispanic, 2020 Census") +
tm_layout(legend.position = c("left", "BOTTOM"))
## ----shifted-map-show--------------------------------------------------------
shifted_map
Locations with signficant water area (e.g. Seattle, New York City) will often have Census tracts that cover rivers and lakes where no one lives
Census maps may then misrepresent what viewers expect to see from a location
## ----king-asian----------------------------------------------------------
king_asian <- get_decennial(
geography = "tract",
variables = c(asian = "P2_008N",
total = "P2_001N"),
state = "WA",
county = "King",
geometry = TRUE,
year = 2020,
output = "wide"
) %>%
mutate(percent = 100 * (asian / total))
mapview(king_asian, zcol = "percent")
Erasing" water areas The erase_water() function in the tigris package will remove water areas above a given area threshold (size percentile) from a spatial dataset
Performs best when data are in a projected coordinate reference system (read here for more information)
## ----erase-king-asian----------------------------------------------------
library(sf)
king_erase <- king_asian %>%
st_transform(6596) %>% #<<
erase_water(area_threshold = 0.9) #<<
mapview(king_erase, zcol = "percent")
The new as_dot_density() function in tidycensus integrates erase_water() for dasymetric dot placement, where dots will not be placed in water areas
Helpful for locations like Minneapolis, which have a lot of lakes!
## ----dasymetric-dots-----------------------------------------------------
aa_dots_dasy <- aa_race %>%
st_transform(26914) %>%
as_dot_density(
value = "value",
values_per_dot = 250,
group = "variable",
erase_water = TRUE
)
hn_basemap <- osm.raster(st_bbox(aa_race),
zoom = 10,
type = "cartolight",
crop = TRUE)
tm_shape(hn_basemap) +
tm_rgb() +
tm_shape(aa_dots_dasy) +
tm_dots(col = "variable",
palette = "Set1",
size = 0.01) +
tm_layout(legend.outside = TRUE)
Part 3 exercises Using a static map you created in Part 2 of the workshop, try converting to an interactive map with tmap_mode(“view”)
Experiment with the shift_geometry() function and try different combinations of the preserve_area and position arguments. Which layout do you prefer, and why?