A sane(r) interface to Census Data
How do we actually get the numbers?
- data.census.gov: Point, click, download data files. Fine for one county, impossible for the country.
- tidycensus: Wraps the Census API in R (with nicer error messages). We pass a few arguments, and it does the grunt work.
Without tidycensus
# A tibble: 6 × 4
name label concept geography
<chr> <chr> <chr> <chr>
1 B01001A_001 Estimate!!Total: Sex by Age (Whi… tract
2 B01001A_002 Estimate!!Total:!!Male: Sex by Age (Whi… tract
3 B01001A_003 Estimate!!Total:!!Male:!!Under 5 years Sex by Age (Whi… tract
4 B01001A_004 Estimate!!Total:!!Male:!!5 to 9 years Sex by Age (Whi… tract
5 B01001A_005 Estimate!!Total:!!Male:!!10 to 14 years Sex by Age (Whi… tract
6 B01001A_006 Estimate!!Total:!!Male:!!15 to 17 years Sex by Age (Whi… tract
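The labels in this table encode a hierarchy, with `!!` separating the levels. As a minimal sketch in base R, the helper below splits a label into its levels. `split_label()` is a hypothetical name introduced here for illustration; it is not part of tidycensus.

```r
# Hypothetical helper (not part of tidycensus): split an ACS variable label
# into its hierarchy levels, which are separated by "!!" in the output above.
split_label <- function(label) {
  parts <- strsplit(label, "!!", fixed = TRUE)[[1]]
  # Drop the trailing colon that marks a level with sub-levels
  sub(":$", "", parts)
}

split_label("Estimate!!Total:!!Male:!!Under 5 years")
```

Reading the last element gives the most specific category, and the earlier elements give its parents.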
get_acs()
The most commonly used function. Five arguments do most of the work:
library(tidycensus)
wi_income <- get_acs(
geography = "county", # can be state, tract, place, ...
variables = c(median_income = "B19013_001"), # where you pass the desired variable(s)
state = "WI", # only needed for some geographies
year = 2023, # end year of the 5-year ACS sample (2019-2023)
survey = "acs5"
)
head(wi_income)
# A tibble: 6 × 5
GEOID NAME variable estimate moe
<chr> <chr> <chr> <dbl> <dbl>
1 55001 Adams County, Wisconsin median_income 59153 3270
2 55003 Ashland County, Wisconsin median_income 57645 4291
3 55005 Barron County, Wisconsin median_income 64619 2076
4 55007 Bayfield County, Wisconsin median_income 69609 2612
5 55009 Brown County, Wisconsin median_income 77490 1375
6 55011 Buffalo County, Wisconsin median_income 68722 5361
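Every estimate comes with a margin of error in the `moe` column. ACS margins of error are published at the 90% confidence level, so `estimate ± moe` gives an approximate 90% interval. The sketch below works offline on the first three rows copied from the output above; the column names `lower`, `upper`, and `moe_pct` are illustrative choices, not tidycensus output.

```r
# First three rows copied from the output above
wi_sample <- data.frame(
  NAME     = c("Adams County", "Ashland County", "Barron County"),
  estimate = c(59153, 57645, 64619),
  moe      = c(3270, 4291, 2076)
)

# Approximate 90% interval, and the MOE as a share of the estimate
wi_sample$lower   <- wi_sample$estimate - wi_sample$moe
wi_sample$upper   <- wi_sample$estimate + wi_sample$moe
wi_sample$moe_pct <- round(100 * wi_sample$moe / wi_sample$estimate, 1)

wi_sample
```

A large `moe_pct` is a hint that the estimate rests on a small sample and should be compared with care.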
Not all geographies can be pulled nationwide in a single call: places and tracts also require a state argument (e.g., state = "WI").
We solve this by looping over all 50 states plus DC with the pull_national() helper shown on the next slide:
pull_national()
library(purrr) # provides map_dfr()
all_states <- c(state.abb, "DC")
pull_national <- function(geo, variables, yr = 2023) {
# tracts and places need a state, so loop across all_states and row-bind the results
if (geo %in% c("tract", "place")) {
map_dfr(all_states, \(st) {
get_acs(
geography = geo, variables = variables,
state = st, year = yr, survey = "acs5", geometry = FALSE
)
})
} else {
# if not, don't pass a state argument in the data call
get_acs(
geography = geo, variables = variables,
year = yr, survey = "acs5", geometry = FALSE
)
}
}
Goal: Measure housing affordability in the Fox Valley (and beyond).
The Problem:
- Raw housing costs are meaningless without local income context (California vs. Wisconsin).
- Even if we calculate a cost-to-income ratio, how do we know if a percentage is “good” or “bad”?
The Solution: Stop guessing. We pull the ratio for every tract in the country to create an empirical national standard, then see exactly where our local tracts rank.
We need two ACS variables:
| Measure | ACS Variable | Code |
|---|---|---|
| Median Monthly Housing Costs | Median Monthly Housing Costs | B25105_001 |
| Median Household Income (annual) | Median Household Income | B19013_001 |
We then construct our affordability metric:
\[ \text{Affordability Ratio} = \frac{\text{Monthly Housing Cost}}{\text{Annual Income} / 12} \times 100 \]
Lower is better – you’re spending less of your income on housing.
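As a quick worked example of the formula, here is the ratio for a made-up tract. The dollar amounts are hypothetical and chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical tract: $1,000 monthly housing cost, $60,000 annual income
monthly_cost  <- 1000
annual_income <- 60000

# Monthly cost as a percentage of monthly income: 1000 / 5000 * 100
affordability_ratio <- monthly_cost / (annual_income / 12) * 100
affordability_ratio
```

This household spends 20% of its monthly income on housing.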
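# Pulling 50 states + DC takes ~1 minute; caching is crucial here
library(dplyr) # select(), mutate(), filter()
library(tidyr) # pivot_wider()
# assumed definition: the names must match the columns used in mutate() below
acs_vars <- c(mhc = "B25105_001", inc = "B19013_001")
# first, we use the loop
nat_ic_tract <- pull_national("tract", acs_vars) |>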
# then we get rid of MOE
select(GEOID, variable, estimate) |>
# pivot so that columns are different variables
pivot_wider(names_from = variable, values_from = estimate) |>
# making the ratio
mutate(icPct = mhc / (inc / 12) * 100) |>
filter(is.finite(icPct), icPct > 0)
# We now have the national baseline
nrow(nat_ic_tract)
[1] 82839
summary(nat_ic_tract$icPct)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.698 16.645 20.186 21.729 25.013 771.188
We now have roughly 83,000 national affordability ratios.
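Because the national pull is slow, it helps to save the result once and reuse it. The following is a minimal sketch of one way to do that with base R's `saveRDS()` and `readRDS()`. The helper `cache_rds()` and the stand-in `slow_pull()` are hypothetical names introduced here, and the stand-in returns toy data so the sketch runs offline.

```r
# Hypothetical helper: run `fetch()` once, save the result to `path`,
# and read it back from disk on later calls.
cache_rds <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  saveRDS(result, path)
  result
}

# Stand-in for the slow national pull (toy values, not real ACS data)
slow_pull <- function() {
  data.frame(GEOID = c("a", "b"), icPct = c(18.2, 24.9))
}

cache_file <- tempfile(fileext = ".rds")
first  <- cache_rds(cache_file, slow_pull) # runs slow_pull() and writes the file
second <- cache_rds(cache_file, slow_pull) # reads the saved file instead
```

In a real project, `fetch` could be a function that calls `pull_national("tract", acs_vars)`, and `path` a file inside the project folder rather than a temporary file.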
Instead of picking arbitrary thresholds, we use base R’s ecdf() function to rank our local Fox Valley tracts against this national distribution.
# create a percentile ranking function from the national census pull
national_benchmark <- ecdf(nat_ic_tract$icPct)
# Example: scoring a local tract w/ affordability ratio of 15 (i.e., spending 15% of income on housing)
local_ratio <- 15
percentile <- national_benchmark(local_ratio)
# since spending less is better --> Flip it
score <- (1 - percentile) * 100
round(score, 1)
[1] 85.5
In this simple example, our hypothetical tract has more affordable housing than 85.5% of US census tracts.
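The function returned by `ecdf()` is vectorized, so a whole column of local ratios can be scored in one step. The sketch below uses a five-value toy vector in place of the national ratios, and three hypothetical local tracts, so that the arithmetic is easy to check by hand.

```r
# Toy stand-in for the national ratios (the real vector has ~83,000 values)
toy_national  <- c(10, 15, 20, 25, 30)
toy_benchmark <- ecdf(toy_national)

# Hypothetical local tracts and their affordability ratios
local_tracts <- data.frame(
  tract = c("A", "B", "C"),
  icPct = c(12, 20, 40)
)

# The ecdf function returns the share of national values <= each ratio;
# flipping it makes a higher score mean more affordable
local_tracts$score <- round((1 - toy_benchmark(local_tracts$icPct)) * 100, 1)
local_tracts
```

Tract A beats four of the five toy values and scores 80, while tract C is costlier than all of them and scores 0.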
tidycensus:
tidycensus Does
sf, leaflet, and more