Tidycensus

A sane(r) interface to Census Data

Binh - STAT 405

Agenda

  • Prelims
  • Core Functionality
  • Application – Affordability Percentiles
  • Conclusion

Prelims

The Census and its Data

  • The U.S. Census Bureau collects data on population, housing, and economic conditions
  • Two main data sources we care about:
    • Decennial Census – full population count every 10 years
    • American Community Survey (ACS) – ongoing, detailed socioeconomic estimates, covering virtually every unit of geography in the country
  • Additionally, the Census provides other surveys (other ACS flavors, microdata for the initiated) and sometimes hosts data from other Bureaus where pertinent

Access Methods

How do we actually get the numbers?

  1. Manual (The GUI): data.census.gov. Point, click, download data files. Fine for one county, impossible for the country.
  2. Raw API: Send HTTP GET requests to the Census servers, parse the raw JSON responses, and wrangle them into a data frame. Technically possible, but very involved.
  3. The Bridge (tidycensus): An R wrapper around the Census API, with nicer error messages. We pass a few arguments, and it does the grunt work.
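
To make option 2 concrete, here is a rough sketch of what the raw-API route involves. The URL follows the Census API's pattern for the 2023 ACS 5-year endpoint but is only built, not requested. The JSON string is a hand-written stand-in in the API's array-of-arrays shape, reusing two county values that appear later in these slides. The sketch assumes the jsonlite package is installed.

```r
library(jsonlite)

# Shape of a raw request for median household income, all Wisconsin counties
# (built for illustration only; a real call needs a network connection)
url <- paste0(
  "https://api.census.gov/data/2023/acs/acs5",
  "?get=NAME,B19013_001E&for=county:*&in=state:55"
)

# Stand-in for the response body: first row is the header, everything is a string
raw_json <- '[["NAME","B19013_001E","state","county"],
 ["Adams County, Wisconsin","59153","55","001"],
 ["Ashland County, Wisconsin","57645","55","003"]]'

m <- fromJSON(raw_json)                        # character matrix
df <- as.data.frame(m[-1, , drop = FALSE])     # drop the header row
names(df) <- m[1, ]                            # promote it to column names
df$B19013_001E <- as.numeric(df$B19013_001E)   # strings -> numbers
df$GEOID <- paste0(df$state, df$county)        # rebuild the county FIPS code
df
```

Even this toy version needs header handling, type conversion, and GEOID assembly, which is the work tidycensus takes off our hands.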

The Difference

Without tidycensus

  • Navigate the census website’s 4 personalities
  • Identify correct survey, year, variables
  • Download CSV per geography
  • Merge with geometry shapefiles manually
  • Repeat for every state if you want more granularity
  • Unwittingly get a step wrong and hate life(?)

With tidycensus

  • Take 5 minutes to get an instance up and running
get_acs(
  geography = "tract",
  variables = "B25105_001",
  state     = "WI",
  year      = 2023,
  survey    = "acs5",
  geometry  = TRUE
)

Returns a clean data frame with tract geometry attached, ready for analysis. Seems like a no-brainer.

Core Functionality

Setup

# install, if you haven't already
# install.packages("tidycensus")
library(tidycensus)

# Get a free API key at (say Binh brought you here)
# https://api.census.gov/data/key_signup.html

# Register it (add install = TRUE to save it to .Renviron on your own machine)
census_api_key("YOUR_API_KEY_HERE")
# Browse available variables
vars <- load_variables(2023, "acs5", cache = TRUE)
head(vars)
# A tibble: 6 × 4
  name        label                                   concept          geography
  <chr>       <chr>                                   <chr>            <chr>    
1 B01001A_001 Estimate!!Total:                        Sex by Age (Whi… tract    
2 B01001A_002 Estimate!!Total:!!Male:                 Sex by Age (Whi… tract    
3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years  Sex by Age (Whi… tract    
4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years   Sex by Age (Whi… tract    
5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years Sex by Age (Whi… tract    
6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years Sex by Age (Whi… tract    
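
The variable table is large, so in practice we search it rather than scroll. A small sketch of that idea in base R follows; vars_sample is a hand-copied stand-in for three rows of the output above (no API call is made), and the " > " separator is just one readable choice for the "!!"-delimited labels.

```r
# Stand-in for a slice of load_variables() output, copied from the printout above
vars_sample <- data.frame(
  name  = c("B01001A_001", "B01001A_002", "B01001A_003"),
  label = c(
    "Estimate!!Total:",
    "Estimate!!Total:!!Male:",
    "Estimate!!Total:!!Male:!!Under 5 years"
  )
)

# Keyword search on the label column
male_rows <- vars_sample[grepl("Male", vars_sample$label, fixed = TRUE), ]

# Labels are hierarchies joined by "!!"; strip the prefix and use a nicer separator
vars_sample$label_clean <- gsub(
  "!!", " > ",
  sub("^Estimate!!", "", vars_sample$label),
  fixed = TRUE
)
vars_sample$label_clean
```

The same grepl() filter works on the full vars table, for example on the concept column to find every housing-cost table.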

Pulling Data: get_acs()

The most commonly used function. Five arguments do most of the work:

wi_income <- get_acs(
  geography = "county", # can be state, tract, place, ...
  variables = c(median_income = "B19013_001"), # desired variable(s); a name relabels the `variable` column
  state     = "WI", # only needed for some geographies
  year      = 2023, # end year of the sample (the 2023 ACS5 pools 2019-2023)
  survey    = "acs5"
)

head(wi_income)
# A tibble: 6 × 5
  GEOID NAME                       variable      estimate   moe
  <chr> <chr>                      <chr>            <dbl> <dbl>
1 55001 Adams County, Wisconsin    median_income    59153  3270
2 55003 Ashland County, Wisconsin  median_income    57645  4291
3 55005 Barron County, Wisconsin   median_income    64619  2076
4 55007 Bayfield County, Wisconsin median_income    69609  2612
5 55009 Brown County, Wisconsin    median_income    77490  1375
6 55011 Buffalo County, Wisconsin  median_income    68722  5361
  • Each estimate comes with a margin of error (the moe column, at the 90% confidence level by default), which makes inference possible(!)
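
As a hedged illustration of what the moe column buys us, the sketch below reuses the Adams and Brown County rows from the output above. It assumes the MOEs are at the 90% level (z = 1.645) and uses the usual root-sum-of-squares approximation for the MOE of a difference, which treats the two estimates as independent.

```r
# Two rows copied from the wi_income output above
wi_sample <- data.frame(
  NAME     = c("Adams County, Wisconsin", "Brown County, Wisconsin"),
  estimate = c(59153, 77490),
  moe      = c(3270, 1375)
)

# 90% confidence interval around each estimate
wi_sample$lower <- wi_sample$estimate - wi_sample$moe
wi_sample$upper <- wi_sample$estimate + wi_sample$moe

# Standard error implied by a 90% margin of error
wi_sample$se <- wi_sample$moe / 1.645

# Is the Brown vs. Adams gap larger than its own margin of error?
income_gap <- wi_sample$estimate[2] - wi_sample$estimate[1]
moe_gap    <- sqrt(sum(wi_sample$moe^2))
income_gap > moe_gap
```

Here the gap (about $18,300) is several times its margin of error (about $3,500), so the difference is unlikely to be sampling noise.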

Nationwide Data at Higher Resolutions

Not all geographies can be pulled nationwide in a single call: places and tracts also require a state argument (e.g., state = "WI").

We solve this by looping over all 50 states + DC with the pull_national() helper, defined on the next slide:

pull_national()

library(purrr) # for map_dfr()

all_states <- c(state.abb, "DC")

pull_national <- function(geo, variables, yr = 2023) {
  if (geo %in% c("tract", "place")) {
    # tracts and places need a state: loop over all_states and row-bind the results
    map_dfr(all_states, \(st) {
      get_acs(
        geography = geo, variables = variables,
        state = st, year = yr, survey = "acs5", geometry = FALSE
      )
    })
  } else {
    # other geographies can be pulled nationwide, so no state argument is passed
    get_acs(
      geography = geo, variables = variables,
      year = yr, survey = "acs5", geometry = FALSE
    )
  }
}
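
One practical wrinkle with a 51-call loop is that a single failed request aborts the whole pull. A possible guard, sketched below, combines purrr::possibly() with map() and list_rbind(), the pattern purrr 1.0.0 recommends in place of the superseded map_dfr(). Note that flaky_fetch() is a hypothetical stand-in that makes no Census calls, and "ZZ" is a deliberately invalid state used to trigger the failure.

```r
library(purrr)

# Hypothetical stand-in for a per-state get_acs() call
flaky_fetch <- function(st) {
  if (st == "ZZ") stop("unknown state: ", st)
  data.frame(state = st, tract = 1:2)
}

# Wrap it so that an error returns NULL instead of stopping the loop
safe_fetch <- possibly(flaky_fetch, otherwise = NULL)

# list_rbind() silently skips the NULL left by the failed call
out <- map(c("WI", "ZZ", "DC"), safe_fetch) |> list_rbind()
out
```

The trade-off is that failures become silent, so in a real pull it would be worth comparing unique(out$state) against the requested states afterwards.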

Application – Affordability Percentiles

Motivation

Goal: Measure housing affordability in the Fox Valley (and beyond).

The Problem:

  • Raw housing costs are meaningless without local income context (California vs. Wisconsin).
  • Even if we calculate a cost-to-income ratio, how do we know if a percentage is “good” or “bad”?

The Solution: Stop guessing. We pull the ratio for every tract in the country to create an empirical national standard, then see exactly where our local tracts rank.

Variables

We need two ACS variables:

Measure                            ACS Variable                   Code
Median Monthly Housing Costs       Median Monthly Housing Costs   B25105_001
Median Household Income (annual)   Median Household Income        B19013_001

# declare the named vector of variables we will pull
acs_vars <- c(mhc = "B25105_001", inc = "B19013_001")

We then construct our affordability metric:

\[ \text{Affordability Ratio} = \frac{\text{Monthly Housing Cost}}{\text{Annual Income} / 12} \times 100 \]

Lower is better – you’re spending less of your income on housing.
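
A quick worked example of the formula, using made-up numbers rather than real ACS values: a household paying $1,200 a month on a $60,000 annual income earns $5,000 a month, so its ratio is 1200 / 5000 × 100 = 24. The helper name affordability_ratio() is ours, not part of tidycensus.

```r
# Monthly housing cost as a percentage of monthly income
affordability_ratio <- function(monthly_cost, annual_income) {
  monthly_cost / (annual_income / 12) * 100
}

affordability_ratio(1200, 60000)  # 24

# Edge cases that explain why non-finite ratios get filtered out later
affordability_ratio(1200, 0)      # Inf: zero reported income
affordability_ratio(1200, NA)     # NA: estimate suppressed or missing
```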

Building the National Baseline

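library(dplyr) # select(), mutate(), filter()
library(tidyr) # pivot_wider()

# Pulling 50 states + DC takes ~1 minute, so cache the result rather than re-pulling
# first, we run the state-by-state loop
nat_ic_tract <- pull_national("tract", acs_vars) |>
  # then we drop NAME and the MOE column
  select(GEOID, variable, estimate) |>
  # pivot so that each variable gets its own column
  pivot_wider(names_from = variable, values_from = estimate) |>
  # compute the ratio: monthly cost as a % of monthly income
  mutate(icPct = mhc / (inc / 12) * 100) |>
  # drop tracts where the ratio is missing, infinite, or zero
  filter(is.finite(icPct), icPct > 0)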

# We now have the national baseline
nrow(nat_ic_tract)
[1] 82839
summary(nat_ic_tract$icPct)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.698  16.645  20.186  21.729  25.013 771.188 
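
Since the national pull is slow, it helps to store the result on disk and reuse it. The helper below is one minimal way to do that with base R's saveRDS() and readRDS(); cache_rds() is a hypothetical name, and slow_pull() is a stand-in with invented values so the sketch runs without touching the API.

```r
# Return the cached object if the file exists; otherwise compute, save, and return it
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Stand-in for the expensive pull; counts how often it actually runs
n_calls <- 0
slow_pull <- function() {
  n_calls <<- n_calls + 1
  data.frame(GEOID = c("55001", "55003"), icPct = c(18.2, 21.4)) # invented values
}

cache_path <- file.path(tempdir(), "nat_ic_tract_demo.rds")
unlink(cache_path)                            # start from a clean slate

first  <- cache_rds(cache_path, slow_pull)    # runs slow_pull()
second <- cache_rds(cache_path, slow_pull)    # reads from disk instead
n_calls
```

In the real workflow the compute argument would be a function wrapping the pull_national() pipeline, and deleting the .rds file forces a fresh pull.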

Scoring the Fox Valley

We now have roughly 83,000 tract-level affordability ratios from across the country.

Instead of arbitrary thresholds, we use base R’s ecdf() function to rank our local Fox Valley tracts against that national distribution.

# create a percentile ranking function from the national census pull
national_benchmark <- ecdf(nat_ic_tract$icPct)

# Example: scoring a local tract w/ affordability ratio of 15 (i.e., spending 15% of income on housing)
local_ratio <- 15
percentile <- national_benchmark(local_ratio)

# spending less is better, so flip it: share of tracts with a higher ratio
score <- (1 - percentile) * 100
round(score, 1)
[1] 85.5

In this simple example, our hypothetical tract has more affordable housing than 85.5% of U.S. census tracts.
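
The function returned by ecdf() is vectorized, so a whole column of local ratios can be scored in one call. The toy sketch below uses five invented "national" ratios and three invented local ones, purely to show the mechanics.

```r
# Invented stand-in for the national distribution of ratios
toy_national <- c(10, 15, 20, 25, 30)
toy_benchmark <- ecdf(toy_national)

# Invented local tracts to score
toy_local <- data.frame(
  tract = c("A", "B", "C"),
  ratio = c(12, 20, 40)
)

# ecdf gives the share of national tracts at or below each ratio;
# flipping it gives the share that are less affordable
toy_local$score <- (1 - toy_benchmark(toy_local$ratio)) * 100
toy_local$score  # 80 40 0
```

Tract C scores 0 because its ratio exceeds every value in the toy baseline, which is how extreme tracts behave against the real one as well.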

Mapping our Results (1)

  • After some hidden wrangling, we have produced percentile measures for all tracts in the Fox Valley Area.
  • We now can display them in a map, using the geometry provided by tidycensus:
# Wisconsin tract geometry: get_acs() needs at least one variable, so we pass a placeholder
wi_geom <- get_acs(
  geography = "tract",
  variables = "B19013_001", # placeholder
  state     = "WI",
  year      = 2023,
  geometry  = TRUE # to pull the requisite geometry data
)

Mapping our Results (2)
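
A rough sketch of the join-and-plot step, assuming the sf and ggplot2 packages are installed. It is not the code behind the original map: the three unit squares and their scores are invented stand-ins for the tract geometry and percentile table, so it runs without any Census call.

```r
library(sf)
library(ggplot2)

# Helper that builds a unit square with its lower-left corner at (x0, y0)
unit_square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Stand-in for the geometry returned with geometry = TRUE
toy_geom <- st_sf(
  GEOID    = c("A", "B", "C"),
  geometry = st_sfc(unit_square(0, 0), unit_square(1, 0), unit_square(2, 0))
)

# Stand-in for the percentile scores computed earlier (invented values)
toy_scores <- data.frame(GEOID = c("A", "B", "C"), score = c(85.5, 40, 10))

# Join on GEOID, then fill each polygon by its score
toy_map <- merge(toy_geom, toy_scores, by = "GEOID")

p <- ggplot(toy_map) +
  geom_sf(aes(fill = score)) +
  scale_fill_viridis_c(name = "Affordability\npercentile") +
  theme_void()
p
```

With real data, toy_geom would be wi_geom filtered to the Fox Valley counties and toy_scores the table of tract percentiles.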

Conclusion

What tidycensus Does

  • Pulls and wrangles labyrinthine Census data automatically
  • Handles geometry so you can map immediately using sf, leaflet, and more
  • Enables reproducibility: a few lines of script replace hours of manual downloads

References

  • Walker, K. (2023). Analyzing US Census Data: Methods, Maps, and Models in R. CRC Press.
  • tidycensus documentation
  • US Census Bureau ACS documentation

Thank you for listening!