Primary Source

Blog: USING TIDYCENSUS AND LEAFLET TO MAP CENSUS DATA
Author: Julia Silge
Original Publish Date: Jun 24, 2017
https://juliasilge.com/blog/using-tidycensus/

tidycensus source

tidycensus: https://github.com/walkerke/tidycensus

General setup

Load Libraries

library(tidyverse)
library(tidycensus)
library(stringr)
library(leaflet)
library(sf)

Searching for variables

Getting variables from the Census or ACS requires knowing the variable ID - and there are thousands of these IDs across the different Census files. To rapidly search for variables, use the load_variables function. The function takes two required arguments: the year of the Census or endyear of the ACS sample, and the dataset - one of “sf1”, “sf3”, or “acs5”. For ideal functionality, I recommend assigning the result of this function to a variable, setting cache = TRUE to store the result on your computer for future access, and using the View function in RStudio to interactively browse for variables.

Load the American Community Survey 5-year variable list for 2015

load_variables()

Description
Load variables from a decennial Census or American Community Survey dataset to search in R
Usage
load_variables(year, dataset, cache = FALSE)

Load the 2015 acs5 variables

v15 <- load_variables(2015, "acs5", cache = TRUE)

glimpse(v15)
## Observations: 45,503
## Variables: 3
## $ name    <chr> "AIANHH", "AIHHTLI", "AITS", "AITSCE", "ANRC", "B00001...
## $ label   <chr> "FIPS AIANHH code", "American Indian Trust Land/Hawaii...
## $ concept <chr> "Selectable Geographies", "Selectable Geographies", "S...

Wow, 45,503 different variables are available, and there is no grouping variable beyond the free-text concept column.
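
Since the result is an ordinary data frame, one way to narrow it down without View is to filter on the label or concept text. The sketch below uses a tiny stand-in for v15 so that it runs without a Census API key. Only the first row is copied from the glimpse output above; the other two rows carry placeholder label and concept text that is an assumption for illustration, not the real wording in the table.

```r
library(dplyr)
library(stringr)

# Stand-in for v15 with the same three columns (name, label, concept).
# Row 1 comes from the glimpse output above; rows 2 and 3 are hypothetical.
v15_demo <- tibble(
  name    = c("AIANHH", "B01003_001", "B25077_001"),
  label   = c("FIPS AIANHH code",
              "Total (hypothetical label)",
              "Median value (hypothetical label)"),
  concept = c("Selectable Geographies",
              "Total Population (hypothetical)",
              "Median Value (hypothetical)")
)

# Case-insensitive search of the concept column
hits <- v15_demo %>%
  filter(str_detect(concept, regex("population", ignore_case = TRUE)))

hits$name
```

The same filter() call should work unchanged on the real v15 once it has been loaded.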

Let’s just follow along with Julia’s example…

I live in Arkansas so I’m going to change her code when it refers to Texas or Utah and use Arkansas instead.

Pull the population data for Arkansas by County

This example uses the B01003_001 variable (total population)

get_acs()

Description
Obtain data and feature geometry for the five-year American Community Survey
Usage
get_acs(geography, variables, endyear = 2015, output = "tidy", state = NULL, county = NULL, geometry = FALSE, keep_geo_vars = FALSE, summary_var = NULL, key = NULL, moe_level = 90, ...)

*Setting geometry = TRUE also downloads the feature geometry (the county shapes), so it is essential for the mapping we’re doing here, even though the default is FALSE.

arkansas_pop <- get_acs(geography = "county", 
                     variables = "B01003_001", 
                     state = "AR",
                     geometry = TRUE) 

glimpse(arkansas_pop)
## Observations: 75
## Variables: 6
## $ GEOID    <chr> "05011", "05015", "05019", "05025", "05063", "05067",...
## $ NAME     <chr> "Bradley County, Arkansas", "Carroll County, Arkansas...
## $ variable <chr> "B01003_001", "B01003_001", "B01003_001", "B01003_001...
## $ estimate <dbl> 11206, 27635, 22751, 8510, 36952, 17597, 43652, 40633...
## $ moe      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ geometry <simple_feature> MULTIPOLYGON(((-92.381616 3..., MULTIPOLYG...

What is this moe variable?

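It is the margin of error of the estimate, reported at the 90% confidence level by default (see moe_level = 90 in the usage above). It is useful for statistical inference, but we won’t need it for the maps we’re making here.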
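
Even though the maps here ignore moe, it is easy to turn it into an interval when you need one. The sketch below assumes the Census Bureau convention that a 90% margin of error equals 1.645 standard errors. The numbers are made up for illustration and are not taken from the Arkansas data.

```r
library(dplyr)

# Hypothetical estimates and 90% margins of error (not real ACS values)
demo <- tibble(
  estimate = c(150000, 98000),
  moe      = c(12000, 0)
)

demo <- demo %>%
  mutate(
    lower = estimate - moe,  # lower bound of the 90% interval
    upper = estimate + moe,  # upper bound of the 90% interval
    se    = moe / 1.645      # standard error implied by a 90% MOE
  )

demo
```

A moe of 0, as in the county population output above, simply gives an interval of zero width.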

Bring in Leaflet to visualize this data

color_pal <- colorQuantile(palette = "viridis", domain = arkansas_pop$estimate, n = 10)

arkansas_pop %>%
    st_transform(crs = 4326) %>%
    leaflet(width = "100%") %>%
    addProviderTiles(provider = "CartoDB.Positron") %>%
    addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ color_pal(estimate)) %>%
    addLegend("bottomright", 
              pal = color_pal, 
              values = ~ estimate,
              title = "Population percentiles",
              opacity = 1)
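
The popup expression in addPolygons() may look cryptic. The pattern "^([^,]*)" matches everything from the start of the string up to, but not including, the first comma, so the popup shows only the county name. A small check using the first two NAME values from the glimpse output above:

```r
library(stringr)

# First two NAME values from the glimpse output above
county_names <- c("Bradley County, Arkansas", "Carroll County, Arkansas")

# Keep everything before the first comma
short_names <- str_extract(county_names, "^([^,]*)")
short_names
```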

What is st_transform() doing?

Julia says, “Well, I am no cartographer and I am still fuzzy on these issues, but it is doing a projection onto a certain reference system of the spatial information contained in the sf column. The specific choice of an EPSG code of 4326 is for a given projection.”
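Here crs stands for Coordinate Reference System. EPSG:4326 is WGS 84, the plain longitude/latitude system that leaflet expects its input data to use.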
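
To see the transformation on its own, here is a minimal sketch with a single point. It assumes the incoming geometry is in NAD83 (EPSG:4269), which is my understanding of what the Census shapefiles use. The coordinates are only an approximate location for Little Rock, chosen for illustration.

```r
library(sf)

# One point (longitude, latitude) tagged as NAD83; coordinates are approximate
pt_nad83 <- st_sfc(st_point(c(-92.29, 34.75)), crs = 4269)

# Re-express the same location in WGS 84, the system leaflet expects
pt_wgs84 <- st_transform(pt_nad83, crs = 4326)

st_crs(pt_wgs84)$epsg
```

For North America the two systems differ by very little, so the coordinates barely move, but declaring the expected system keeps leaflet from warning about the datum.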

A couple more examples

Instead of quantiles, let’s use a continuous color scale, just for something different.

Use the plasma palette

color_pal <- colorNumeric(palette = "plasma", 
                          domain = arkansas_pop$estimate)
#color_pal <- colorNumeric(palette = "viridis", 
#                          domain = arkansas_pop$estimate)
arkansas_pop %>%
    st_transform(crs = 4326) %>%
    leaflet(width = "100%") %>%
    addProviderTiles(provider = "CartoDB.Positron") %>%
    addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ color_pal(estimate)) %>%
    addLegend("bottomright", 
              pal = color_pal, 
              values = ~ estimate,
              title = "County Populations",
              opacity = 1)
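
The difference between the two palette functions is easier to see outside of a map. In this sketch, colorQuantile() puts the values into equal-count bins and gives each bin one color, while colorNumeric() interpolates a color from each value's position in the range. The eight populations are the estimate values shown in the glimpse output above. Using n = 4 instead of 10 is my choice to keep the example small.

```r
library(leaflet)

# The eight county estimates visible in the glimpse output above
pop <- c(11206, 27635, 22751, 8510, 36952, 17597, 43652, 40633)

quant_pal <- colorQuantile(palette = "viridis", domain = pop, n = 4)
num_pal   <- colorNumeric(palette = "viridis", domain = pop)

quant_colors <- quant_pal(pop)  # one color per quartile bin
num_colors   <- num_pal(pop)    # color interpolated from each value

length(unique(quant_colors))
length(unique(num_colors))
```

With a skewed variable like county population, the quantile scale spreads colors evenly across counties, while the numeric scale lets a few large counties dominate the top of the range.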

Median home values per census tract in Benton County

Benton County is the northwesternmost county in Arkansas, bordering Missouri to the north and Oklahoma to the west.

benton_co_home_value <- get_acs(geography = "tract", 
                     variables = "B25077_001", 
                     state = "AR",
                     county = "Benton County",
                     geometry = TRUE)

pal <- colorNumeric(palette = "viridis", 
                    domain = benton_co_home_value$estimate)

benton_co_home_value %>%
    st_transform(crs = 4326) %>%
    leaflet(width = "100%") %>%
    addProviderTiles(provider = "CartoDB.Positron") %>%
    addPolygons(popup = ~ str_extract(NAME, "^([^,]*)"),
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ pal(estimate)) %>%
    addLegend("bottomright", 
              pal = pal, 
              values = ~ estimate,
              title = "Median Home Value",
              labFormat = labelFormat(prefix = "$"),
              opacity = 1)

Conclusion:

tidycensus is probably going to change the way that I work with US demographic data. The datasets are very rich, and the only real limitation is how efficiently you can explore them and extract the data you are looking for.
leaflet is my favorite package for mapping. It is incredible how much I can do with so little code.
Big thanks to Julia Silge, whose blog post inspired this one.
