Code Through: tigris and sf Packages


Tired of manipulating geometries in datasets? Want more options for overlays and map features? tigris may be your answer. You have most likely used it already without realizing it, since tidycensus calls tigris behind the scenes when you request geometry. Using tigris directly gives you access to many more geographies and can reduce the amount of code it takes to manipulate large datasets.

Another fun feature of this code through is using the sf package to clip pieces of geometry so that overlapping areas are included or excluded.

Step 1: Install Libraries:

#install.packages("tigris") # only needed if you have not installed it before
#install.packages("sf") # needed to work with spatial data in sf form
#install.packages(c("dplyr", "ggplot2", "tidycensus")) # also used in this code through

Step 2: Load Libraries:

library(tigris)
library(sf)
library(dplyr)
library(tidycensus)

Now you are ready to start working with the library! Here are some helpful tips to get started.
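One helpful tip: without caching, tigris downloads a fresh shapefile from the Census Bureau every time a function such as tracts() is called. Setting the tigris_use_cache option stores the files locally so repeated calls read from disk. A minimal sketch; it only needs to run once per session (or it can live in your .Rprofile):

```r
# Store downloaded Census shapefiles in a local cache so that rerunning
# tracts(), roads(), area_water(), etc. reads from disk instead of downloading again.
options(tigris_use_cache = TRUE)

# Confirm the option is set for this session
getOption("tigris_use_cache")  # TRUE
```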

Since we are familiar with the vernacular surrounding census data, let's start by loading some tracts, using Baltimore, MD as the example. tigris gives us 35 functions for retrieving geometry; the ‘tracts()’ function below retrieves the familiar census tracts for the year indicated.

# When you search by county name, tigris reports which FIPS codes it is using.
# Here the state FIPS code is 24 (Maryland) and the county code is 510 (Baltimore city).
balt_tracts <- tracts(state = "MD", county = "Baltimore City", year = 2024)
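To double-check which FIPS codes tigris matched, you can look them up in the fips_codes data frame that ships with the package. A quick sketch, assuming its state, county, state_code, and county_code columns; no download is needed:

```r
library(tigris)

# tigris includes a lookup table of state and county FIPS codes
data("fips_codes", package = "tigris")

# Baltimore city is an independent city, listed separately from Baltimore County
balt_fips <- subset(fips_codes, state == "MD" & tolower(county) == "baltimore city")
balt_fips[, c("state_code", "county_code", "county")]
```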

With our saved data, we can use ggplot2 to create a plot of what we have captured.

library(ggplot2)

ggplot(balt_tracts) +
  geom_sf()

Now you have an easy plot of your tracts! But tigris can do more than census tracts. Let's look at what other spatial data is available before working in a census variable.

tigris has several other geographies that can be helpful when looking into community makeup. For example, school district boundaries do not always follow city or neighborhood lines. Let's look at Maryland's school districts.

md_schools <- school_districts("MD", type = "unified")
ggplot(md_schools) +
  geom_sf()

Since Baltimore sits on the Patapsco River, which flows into the Chesapeake Bay, we can use the ‘area_water()’ function to retrieve the outlines of the water features within the city limits. The ‘roads()’ function similarly returns every road in the city.

balt_water <- area_water("MD", "Baltimore City")
balt_roads <- roads("MD", "Baltimore City")
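Because these layers are ordinary sf data frames, dplyr verbs work on them directly. For example, the TIGER/Line roads layer has a route-type column, RTTYP, where "I" marks Interstates. The sketch below uses a few made-up road segments so it runs offline; only the column names FULLNAME and RTTYP mirror the real data, and the values are invented:

```r
library(sf)
library(dplyr)

# Made-up road segments standing in for roads("MD", "Baltimore City")
toy_roads <- st_sf(
  FULLNAME = c("Road A", "Road B", "Road C"),
  RTTYP = c("I", "M", "I"),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(1, 1))),
    st_linestring(rbind(c(0, 1), c(1, 0))),
    st_linestring(rbind(c(0, 0.5), c(1, 0.5)))
  )
)

# filter() keeps the sf class and the geometry column
interstates <- filter(toy_roads, RTTYP == "I")
nrow(interstates)  # 2
```

With the real layer, the same call would be filter(balt_roads, RTTYP == "I"), which makes a lighter overlay than plotting every street.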

Since these geometries are now saved, we can layer them into a single plot. Each geom_sf() call takes its own data argument, so the pieces do not need to be bound into one dataset first.

ggplot() + 
  geom_sf(data = balt_tracts, fill = "blue") + 
  geom_sf(data = balt_water, fill = "white") + 
  geom_sf(data = balt_roads, alpha = .3) # alpha sets opacity: 1 = 0% transparency, 0 = 100% transparency
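ggplot2 draws layers in the order they are added, so whichever geom_sf() comes last ends up on top. A small sketch with two made-up overlapping squares standing in for a tract and a body of water:

```r
library(sf)
library(ggplot2)

# Made-up, unprojected squares: a "tract" and a "water" body that overlap
tract <- st_sf(name = "tract", geometry = st_sfc(st_polygon(list(
  rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))))
water <- st_sf(name = "water", geometry = st_sfc(st_polygon(list(
  rbind(c(1, 1), c(3, 1), c(3, 3), c(1, 3), c(1, 1))))))

# Water added last: it is drawn over the tract, so the overlap shows as white
p_water_on_top <- ggplot() +
  geom_sf(data = tract, fill = "blue") +
  geom_sf(data = water, fill = "white")

# Tract added last: it covers the overlapping part of the water
p_tract_on_top <- ggplot() +
  geom_sf(data = water, fill = "white") +
  geom_sf(data = tract, fill = "blue")
```

Printing each plot shows the difference; only the order of the two geom_sf() calls changed.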

Using tigris, we were able to combine multiple pieces of geography into a single map, including features that aren't available in census data. This helps visualize both the physical geography of the region and the divisions within the data. Because the fill colors contrast, we can see where our geographies divide and where the roads run.

Now, let's add some census data into the mix, using poverty as the example.

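balt_pov <- get_acs(
  geography = "tract", 
  state = "MD", 
  county = "Baltimore City", 
  variables = "B17001_002", # people with income in the past 12 months below the poverty level (B17001_001 is the total)
  year = 2023, 
  geometry = TRUE) # attaches tract boundaries (fetched through tigris), so estimates and geometry stay together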
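A count of people in poverty partly reflects how many people live in each tract. To map a rate instead, get_acs() can also return the denominator through its summary_var argument (for example summary_var = "B17001_001"), which adds a summary_est column. The sketch below uses hypothetical rows shaped like that output so it runs without an API call:

```r
library(dplyr)

# Hypothetical rows shaped like get_acs() output with
#   variables = "B17001_002", summary_var = "B17001_001"
# estimate = people below poverty, summary_est = people for whom poverty status is determined
toy_pov <- data.frame(
  GEOID = c("tract_1", "tract_2"),
  estimate = c(250, 90),
  summary_est = c(1000, 900)
)

# Convert the count into a percentage of each tract's poverty universe
toy_pov <- mutate(toy_pov, pct_poverty = 100 * estimate / summary_est)
toy_pov$pct_poverty  # 25 10
```

On the real data, the same mutate() works on the sf object, and the map would use aes(fill = pct_poverty).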

Next, use the ‘sf’ library to cut the water features out of the ACS tract geometry. This step is optional, but it makes the plot less sensitive to layer order: if the layers were added in the wrong order (e.g. balt_water before balt_tracts in the ggplot() call above), the tracts would be drawn on top of the water features and hide them from view. The st_difference() function removes the portions of balt_pov’s geometry that intersect the balt_water geometry. A spatial join with st_join() is not needed first; it would only repeat each tract once for every water polygon it touches.

balt_tracts_corrected <- st_difference(balt_pov, st_union(balt_water)) # merge all water polygons into one shape, then subtract it from each tract
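A toy version of the clipping step makes both behaviors visible: st_join() duplicating a tract, and st_difference() removing the water area. The shapes are made up and unprojected. On real layers, both inputs must share a coordinate reference system (sf stops with an error if they do not); the tidycensus and tigris layers used here both come back in NAD83.

```r
library(sf)

# Helper that builds a square polygon from its lower-left corner and side length
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# One made-up 2 x 2 "tract" and two 1 x 1 "water" bodies inside it
toy_tract <- st_sf(GEOID = "tract_1", geometry = st_sfc(square(0, 0, 2)))
toy_water <- st_sf(water_id = c("a", "b"),
                   geometry = st_sfc(square(0, 0, 1), square(1, 1, 1)))

# A spatial join repeats the tract once for every water polygon it intersects
joined <- st_join(toy_tract, toy_water)
nrow(joined)  # 2

# Subtracting the unioned water keeps one row per tract, minus the water area
clipped <- st_difference(toy_tract, st_union(toy_water))
nrow(clipped)  # 1
st_area(clipped)  # 2 (the tract's 4 units minus 2 units of water)
```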

And now we can plot.

ggplot(balt_tracts_corrected) + 
  geom_sf(aes(fill = estimate)) + # ACS poverty estimate fills each tract
  geom_sf(data = balt_water, fill = "white") + # water in the area shows as white
  geom_sf(data = balt_roads, color = "black", linewidth = .03, alpha = .1) + # roads are lines, so they take color and linewidth rather than fill and size
  ggtitle("Baltimore City Poverty")

The result of this process has left us with a more informative, complete, and detailed plot of our variable and the geography. By selecting specific geometries outside the normal ACS data pull, we are able to expand on the contextual information our graphic provides.

Some additional useful functions in the tigris library expand the possibilities for this type of plot:
- state_legislative_districts() pulls the districts of a state's legislature (house = "upper" or "lower")
- metro_divisions() retrieves metropolitan divisions, the county-based subdivisions of the largest metropolitan areas
- tribal_block_groups() retrieves tribal block groups, statistical units within American Indian areas
- school_districts() helps visualize school district boundaries
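Function availability can vary between tigris versions, so a quick check of what your installed copy exports can save a confusing error. A small sketch; the commented calls at the end download shapefiles and therefore need an internet connection:

```r
# Check that the installed version of tigris exports the functions listed above
extra_fns <- c("state_legislative_districts", "metro_divisions",
               "tribal_block_groups", "school_districts")
exported <- setNames(extra_fns %in% getNamespaceExports("tigris"), extra_fns)
exported

# Example calls (each downloads a shapefile):
# md_senate <- state_legislative_districts("MD", house = "upper")
# md_house  <- state_legislative_districts("MD", house = "lower")
```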

At its core, tigris is built to further the study of census data by expanding the geographies, overlays, and divisions we can use to draw conclusions and make change.

While maps may just be lines on paper, tigris works to make those lines a tool for all R users to harness.