Click one of the ZIP codes in this map from an earlier lesson, and you’ll see that the pop-up window includes some new information. Specifically, each pop-up window now shows not only the ZIP code’s latest fair market rent for each rental unit size but also the ZIP code’s total number of rental units and total households - that is, rental households and owner-occupied households.
The new information makes the map more useful. It shows that although those eastern ZIP codes, like 37149 and 37118, have the area’s cheapest rental homes, actually finding an available one could be challenging. The 37149 ZIP code has only around 82 rentals, and the 37118 ZIP code has even fewer: about 58.
In fact, neither ZIP code offers very many places to live of any kind - just around 946 in the 37149 ZIP code and fewer than half as many, about 440, in the 37118 ZIP code. By contrast, there are over 23,000 places to live, nearly 12,000 of them rentals, just one ZIP code west, in 37130, the area surrounding MTSU’s campus.
“Moe” is a “what,” not a “who.” But if thinking about a guy named Moe helps you, then go with it.
In the map’s pop-up windows, the “MOE” part of the “Rentals_MOE” and “Households_MOE” labels refers to margin of error, a bit of statistical jargon that may need some explaining. The rental and total housing figures are not actual counts. Instead, they are estimates based on random sampling. Estimates are a lot more practical than actual counts. After all, the actual count of rental homes in a given ZIP code probably fluctuates all the time, as new rental homes become available and others go off the market.
But at any given moment, an actual count of rental homes available in a given ZIP code really does exist, even if nobody knows exactly what it is. Coupled with the estimate, the margin of error tells you the range in which that actual count probably falls. This range is called the estimate’s confidence interval. An example might help. The estimated number of rentals in the 37130 ZIP code is 11,891, and the margin of error is 921. To find the range in which the real count probably falls, all you have to do is subtract the margin of error from the estimate to get the low end of the range, and add the margin of error to the estimate to get the high end of the range. Do that for the 37130 ZIP code, and you’ll find that the real count of rental homes is probably somewhere between 11,891 - 921 = 10,970 and 11,891 + 921 = 12,812. The figures are for a “90 percent confidence interval,” which means we can be 90 percent sure the real count is somewhere in that range, and there’s a (tolerable) 10 percent chance that it is either above or below that range.
To put it more succinctly: We can be 90 percent sure that the 37130 ZIP code has between 10,970 and 12,812 rental homes available. The estimate isn’t entirely useless, either; 11,891 is the figure likeliest to match the real figure. So, it’s OK to use it as the figure for the ZIP code, as long as you keep in mind that it’s just an estimate.
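The same arithmetic can be sketched in a few lines of R. This is only an illustration of the subtraction and addition described above; the object names (estimate, moe, low, high) are made up for the example, and the two numbers are the 37130 figures quoted in the text.

```r
# Estimate and margin of error for the 37130 ZIP code, as quoted above
estimate <- 11891
moe <- 921

# Low end: estimate minus margin of error
low <- estimate - moe

# High end: estimate plus margin of error
high <- estimate + moe

# Show both ends of the 90 percent confidence interval
c(low = low, high = high)
```

Swap in any other ZIP code's estimate and margin of error to get that ZIP code's range.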
The rental and total household estimates and error margins all came from an application programming interface, or API, maintained by the U.S. Census Bureau - specifically, the API for the Bureau’s annual American Community Survey.
An API is simply an efficient way for an organization to share data with external users (like you), particularly when the total amount of data the organization wants to make available would be much too large to fit on a typical user’s computer. That’s certainly true of the Census Bureau’s ACS data. The data offer nearly 50,000 annual estimates of social, economic, housing, and demographic characteristics, each available for the entire U.S. as well as for individual states, regions, federal and state political districts, metro areas, counties, cities, places, school districts, neighborhoods, and - yes - ZIP codes. Using the bureau’s API lets you locate and download only the data you need.
This lesson will show you how to get the bureau’s API to give you ZIP-code-level American Community Survey estimates of housing unit and rental housing unit counts, then add them to the map. Learn that, and you’ll know how to get anything else you want out of the bureau’s API, making you incredibly valuable to any of the many nonprofit, governmental, business, and research organizations that need current, high-quality, free information about the people living in a particular area.
Like a lot of APIs, the Census API requires you to get an access code, called a “key,” then use the key whenever you request data from the API. Some API keys cost money, and sometimes a lot of money. But a Census API key is free, mainly because the Census Bureau is a taxpayer-funded government agency.
All you have to do is visit https://api.census.gov/data/key_signup.html, fill out and submit the form, check your e-mail after a few minutes, and click the activation link inside the e-mail the Census Bureau will send you. Here’s a picture of the e-mail that delivered my API key. I have blurred the key, because you’re supposed to keep your key a secret.
Remember to click the activation link inside the e-mail, and to do it within 48 hours. If you take longer to activate your key, the key will stop working.
There’s a hard way to access the Census API. I vote for using the easy way: the tidycensus package, developed by Kyle Walker, Ph.D., professor of geography at Texas Christian University, spatial data analysis consultant, and author of the (free) online book Analyzing US Census Data: Methods, Maps, and Models in R.
Install and load tidycensus. To get started, use
if (!require("tidycensus")) install.packages("tidycensus")
and library(tidycensus)
to add tidycensus to the code for
installing and loading the usual lineup of R packages:
# Installing and loading required packages
if (!require("tidyverse"))
install.packages("tidyverse")
if (!require("gtExtras"))
install.packages("gtExtras")
if (!require("leafpop"))
install.packages("leafpop")
if (!require("sf"))
install.packages("sf")
if (!require("mapview"))
install.packages("mapview")
if (!require("RColorBrewer"))
install.packages("RColorBrewer")
if (!require("tidycensus"))
install.packages("tidycensus")
library(tidyverse)
library(gtExtras)
library(sf)
library(mapview)
library(leafpop)
library(RColorBrewer)
library(tidycensus)
Provide your API key. Next, use the tidycensus
package’s census_api_key()
function to share your API key
with R. Simply replace
PasteYourAPIKeyBetweenTheseQuoteMarks
with your API key. I
suggest copying and pasting the key directly from the e-mail you
received, taking care to include all of the key’s characters but nothing
else, like the spaces before and after the key. Trying to type the key
manually is not a good idea. You’re nearly certain to make at least one
typing error, and the key won’t work unless it is exactly correct.
census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
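If you suspect stray spaces came along when you pasted the key, base R's trimws() function can strip them before the key is handed to census_api_key(). The sketch below is just one way to guard against that problem. The object names pasted_key and clean_key are hypothetical, and the string is the same placeholder used above, not a real key. The census_api_key() call is commented out so the example runs without tidycensus loaded.

```r
# A placeholder key pasted with accidental spaces on both ends (not a real key)
pasted_key <- "  PasteYourAPIKeyBetweenTheseQuoteMarks "

# trimws() removes leading and trailing whitespace
clean_key <- trimws(pasted_key)

# TRUE means no whitespace remains anywhere in the cleaned key
!grepl("[[:space:]]", clean_key)

# With tidycensus loaded, the cleaned key could then be supplied like this:
# census_api_key(clean_key)
```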
Getting the goods. This code retrieves the rental
and household counts using the tidycensus package’s
get_acs()
function and stores the results in a data frame
we’ll call Census_Data.
Census_Data <- get_acs(
  geography = "zcta",
  variables = c("DP04_0047", "DP04_0045"),
  year = 2023,
  survey = "acs5",
  output = "wide",
  geometry = FALSE
)
The get_acs()
function has a bunch of arguments that you
can use to specify exactly what data you want. In this case:
geography = "zcta"
tells tidycensus
you want the data broken down by ZIP code. Recall that a ZCTA, or ZIP
Code Tabulation Area, is a Census-created geographic area that
corresponds, at least roughly, to a ZIP code used by the U.S. Postal
Service. Each ZCTA has a corresponding ZIP code, but not all ZIP codes
have a corresponding ZCTA. The tidycensus package lets you get data for
many geographic areas other than ZCTAs. For a list, and the code
that specifies each one, see the Geography in tidycensus
section of the Basic usage of tidycensus guide.
variables = c("DP04_0047", "DP04_0045")
tells tidycensus to retrieve two variables: DP04_0047, which estimates
the number of renter-occupied housing units, and DP04_0045, which estimates the total
number of occupied housing units, whether renter-occupied or otherwise. Later in
this lesson, I’ll show you how to look up American Community Survey
variable names and what they mean. For now, understand that the
c()
part of this code is base R’s “combine” function, which combines the things
between the parentheses (and separated by commas) into a vector or a
list. Vectors and lists are both data storage structures, although they
have different properties. Here, it’s being used to build a character vector holding the
variable names you want tidycensus to retrieve.
year = 2023
tells tidycensus which
year you want data for. At present, 2023 is the latest year available.
See the Census Bureau’s release schedule to learn what data will be released
when.
survey = "acs5"
tells tidycensus
you want the five-year American Community Survey dataset. In most cases,
that’s the one you will want. It covers a five-year period ending with
the specified year. For example, the 2023 five-year dataset’s estimates
cover 2019-2023, not 2023 in particular. But five-year datasets contain
data for all places in the U.S. There is a one-year dataset that you can
retrieve using a survey = "acs1"
argument. It covers just
the year specified, so it’s more up to date than the five-year dataset
released for the same year. But it includes data only for places with
65,000 or more residents.
output = "wide"
tells tidycensus to
format the data so that each column name reflects the variable the
column contains data for. I find the alternative format,
output = "tidy"
, harder to work with, especially when
multiple variables are being requested.
geometry = FALSE
tells tidycensus
to give you just the data, rather than the data plus a “geometry” column
containing borders for the geographic region specified. I’ll say more
about the latter option later in the lesson. It can come in pretty
handy. In this case, we already have a geometry column showing the ZIP
code boundaries. We don’t need another.
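To make the c() explanation above a little more concrete, here is a small sketch of what c() produces when given the two variable names, next to what list() produces. It is only an illustration; the object names acs_variables and mixed are made up for the example.

```r
# c() combines the two variable names into a character vector
acs_variables <- c("DP04_0047", "DP04_0045")

class(acs_variables)   # "character"
length(acs_variables)  # 2

# list() builds a list, which can hold items of different types
mixed <- list("DP04_0047", 2023, FALSE)

class(mixed)           # "list"
```

Every item in a vector has the same type, which is why a vector suits a set of variable names that are all text.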
The easiest way to see what you got from the Census API would be to click on the Census_Data data frame in RStudio’s Environment tab. Doing so will open Census_Data in RStudio’s spreadsheet-like data viewer.
An alternative, and sometimes handy, approach is to use the dplyr package’s glimpse() function, which can display all of the variables in a data frame as a vertical column, along with columns to the right for the first several entries for each variable:
glimpse(Census_Data)
## Rows: 33,772
## Columns: 6
## $ GEOID <chr> "00601", "00602", "00603", "00606", "00610", "00611", "0061…
## $ NAME <chr> "ZCTA5 00601", "ZCTA5 00602", "ZCTA5 00603", "ZCTA5 00606",…
## $ DP04_0047E <dbl> 1709, 3103, 8389, 433, 2423, 218, 9103, 1047, 2471, 821, 33…
## $ DP04_0047M <dbl> 224, 379, 600, 113, 325, 100, 693, 223, 405, 229, 450, 329,…
## $ DP04_0045E <dbl> 5611, 12546, 19537, 1871, 8838, 611, 23438, 3471, 8579, 314…
## $ DP04_0045M <dbl> 258, 510, 671, 192, 459, 146, 953, 369, 524, 328, 654, 363,…
Either way you look at the Census_Data data frame, you can see that the Census API gave us 33,772 rows (one for each ZCTA in the U.S.) of data for six variables:
GEOID, a character variable consisting of each five-digit ZIP code.
NAME, a character variable showing each ZIP code preceded by “ZCTA5,” and a space.
DP04_0047E, the estimated number of rental units in each ZIP code. The “E” at the end stands for “Estimate.”
DP04_0047M, the rental unit estimate’s error margin (see the “Wait … who is Moe?” explanation above). The “M” at the end stands for “Margin.”
DP04_0045E, the estimated total number of occupied housing units (that is, households) - both rental and otherwise - in each ZIP code. Again, the “E” stands for “Estimate.”
DP04_0045M, the housing unit estimate’s error margin. Again, the “M” stands for “Margin.”
Note that the Census API doesn’t make you ask for the error margins. It just gives them to you automatically. It also automatically tacks the “E” and “M” characters onto the end of the variable name to let you know which column contains the estimates and which contains the error margins.
Keeping the original Census variable names wouldn’t hurt a thing.
They aren’t all that intuitive, though. So, let’s use the dplyr
package’s rename()
function to replace them with
names that will make it easier to keep track of what’s what. Here’s
the code, and then another glimpse() function so you can see the
results:
Census_Data <- Census_Data %>%
  rename(c("Rentals" = "DP04_0047E",
           "Rentals_MOE" = "DP04_0047M",
           "Households" = "DP04_0045E",
           "Households_MOE" = "DP04_0045M"))
glimpse(Census_Data)
## Rows: 33,772
## Columns: 6
## $ GEOID <chr> "00601", "00602", "00603", "00606", "00610", "00611", "…
## $ NAME <chr> "ZCTA5 00601", "ZCTA5 00602", "ZCTA5 00603", "ZCTA5 006…
## $ Rentals <dbl> 1709, 3103, 8389, 433, 2423, 218, 9103, 1047, 2471, 821…
## $ Rentals_MOE <dbl> 224, 379, 600, 113, 325, 100, 693, 223, 405, 229, 450, …
## $ Households <dbl> 5611, 12546, 19537, 1871, 8838, 611, 23438, 3471, 8579,…
## $ Households_MOE <dbl> 258, 510, 671, 192, 459, 146, 953, 369, 524, 328, 654, …
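With the friendlier names in place, the estimate and margin-of-error columns are easy to combine. The sketch below is not part of the lesson's map code. It uses base R and a small stand-in data frame, Example_Rows, built from the first three rows shown in the glimpse() output above, to show how the low and high ends of each rental estimate's range could be computed. The Example_Rows, Rentals_Low, and Rentals_High names are hypothetical.

```r
# Stand-in data: the first three rows from the glimpse() output above
Example_Rows <- data.frame(
  GEOID = c("00601", "00602", "00603"),
  Rentals = c(1709, 3103, 8389),
  Rentals_MOE = c(224, 379, 600)
)

# Low end of each range: estimate minus margin of error
Example_Rows$Rentals_Low <- Example_Rows$Rentals - Example_Rows$Rentals_MOE

# High end of each range: estimate plus margin of error
Example_Rows$Rentals_High <- Example_Rows$Rentals + Example_Rows$Rentals_MOE

Example_Rows
```

The same two lines should work on the full Census_Data data frame if you substitute its name for Example_Rows.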
You learned in a previous lesson how to merge a map frame with a data file to create a mappable data file that we called FMR_RuCo_Map. Here, we’ll retrieve the FMR_RuCo_Map file (remember saving a copy of it?) and use the same technique to add the rental and housing unit counts and error margins.
Reloading the map. If you have the map file stored on your computer, this code will retrieve it into a data frame called FMR_RuCo_Map:
FMR_RuCo_Map <- read_sf("FMR_RuCo_Map.shp")
If you no longer have access to the file for some reason, you can use this code to get it from my GitHub page:
download.file("https://github.com/drkblake/Data/raw/refs/heads/main/FMR_RuCo_Map.zip","FMR_RuCo_Map.zip")
unzip("FMR_RuCo_Map.zip")
FMR_RuCo_Map <- read_sf("FMR_RuCo_Map.shp")
Merging the map and the new data. With FMR_RuCo_Map
reloaded and available under RStudio’s “Environment” tab, you can use
the dplyr package’s left_join()
function to add rental
housing unit and total housing unit data from Census_Data to each ZIP
code listed in FMR_RuCo_Map. Note that the code matches the data by
comparing FMR_RuCo_Map’s “ZIP” column with Census_Data’s “GEOID” column.
Both contain ZIP codes, and both are already text variables. I threw in
another glimpse()
function to give you a peek at the
now-extended FMR_RuCo_Map data frame.
FMR_RuCo_Map <- left_join(FMR_RuCo_Map, Census_Data,
                          by = c("ZIP" = "GEOID"))
glimpse(FMR_RuCo_Map)
## Rows: 12
## Columns: 14
## $ ZIP <chr> "37037", "37086", "37128", "37129", "37153", "37167", "…
## $ Studio <dbl> 1660, 1580, 1510, 1420, 1410, 1290, 1260, 1240, 1180, 1…
## $ BR1 <dbl> 1710, 1620, 1550, 1460, 1450, 1330, 1290, 1270, 1210, 1…
## $ BR2 <dbl> 1920, 1820, 1740, 1640, 1630, 1490, 1450, 1430, 1360, 1…
## $ BR3 <dbl> 2410, 2290, 2190, 2060, 2040, 1870, 1820, 1800, 1710, 1…
## $ BR4 <dbl> 2940, 2790, 2670, 2510, 2490, 2280, 2210, 2190, 2080, 2…
## $ ZIP_Avr <dbl> 2128, 2020, 1932, 1818, 1804, 1652, 1606, 1586, 1508, 1…
## $ Rnt_Ctg <chr> "Above average", "Above average", "Above average", "Abo…
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((-86.52566 3..., MULTIPOLYG…
## $ NAME <chr> "ZCTA5 37037", "ZCTA5 37086", "ZCTA5 37128", "ZCTA5 371…
## $ Rentals <dbl> 330, 3778, 9564, 7152, 227, 9002, 71, 1811, 11891, 0, 5…
## $ Rentals_MOE <dbl> 172, 534, 1021, 748, 96, 700, 63, 402, 921, 14, 65, 51
## $ Households <dbl> 3124, 12551, 27607, 22515, 1807, 22705, 1980, 7279, 231…
## $ Households_MOE <dbl> 285, 529, 1077, 944, 333, 836, 295, 542, 966, 14, 168, …
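If the join's behavior seems mysterious, a toy version may help. The sketch below uses two tiny stand-in data frames, Toy_Map and Toy_Census, rather than the real files. The first two ZIP codes and rental figures come from the output above, while "00000" is a made-up ZIP code included only to show what happens when the right-hand table has no match. It assumes the dplyr package is installed.

```r
library(dplyr)

# Stand-in for the map data: one row per ZIP code
Toy_Map <- data.frame(
  ZIP = c("37037", "37086", "00000"),   # "00000" is a made-up, unmatched ZIP
  BR3 = c(2410, 2290, NA)
)

# Stand-in for the Census data: same codes, but in a column named GEOID
Toy_Census <- data.frame(
  GEOID = c("37037", "37086"),
  Rentals = c(330, 3778)
)

# Keep every row of Toy_Map; add Rentals wherever ZIP equals GEOID
Toy_Joined <- left_join(Toy_Map, Toy_Census, by = c("ZIP" = "GEOID"))

Toy_Joined

# Rows with no match in Toy_Census get NA in the added columns
sum(is.na(Toy_Joined$Rentals))
```

A left join keeps all rows from the first data frame, so an NA count like this is a quick check for ZIP codes that found no Census match.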
To make and show the map, all you have to do is add a few lines to
the map code from the Basic mapping lesson. I think I’ll use the code for
the map that emphasizes three-bedroom rents and uses a blue color
palette. Recall that using custom colors for the map requires
installing (if needed) and loading (once per computing session) the RColorBrewer package, without which the code’s
col.regions = brewer.pal(9, "Blues")
line won’t work.
The package got installed and loaded in the first block of code on
this page, though. So, if you’ve run that, you should be good to go. The
only thing needed here was the addition of
, "Rentals", "Rentals_MOE", "Households", "Households_MOE"
to the zcol = c()
argument’s list of FMR_RuCo_Map fields to
display in the map’s pop-up windows.
BR3_Map <- mapview(
  FMR_RuCo_Map,
  zcol = "BR3",
  col.regions = brewer.pal(9, "Blues"),
  layer.name = "Three-bedroom rent",
  popup = popupTable(
    FMR_RuCo_Map,
    feature.id = FALSE,
    row.numbers = FALSE,
    zcol = c("ZIP", "Studio", "BR1", "BR2", "BR3", "BR4",
             "Rentals", "Rentals_MOE", "Households", "Households_MOE")))
# Showing the map
BR3_Map
Earlier, I promised I’d show you how to look up American Community Survey variable names and what they mean.
The actual ACS questionnaire isn’t all that large. The 2025 survey contains eight pages of questions for each person in a household, plus four pages of questions about the entire household. As major surveys go, that’s pretty reasonable. There are separate, similar-sized questionnaires for people in group quarters and in Puerto Rico, a U.S. territory. But the Census Bureau uses that data to produce nearly 50,000 estimates of various social, economic, housing, and demographic characteristics.
Fortunately, the Bureau makes it fairly easy to search through all of those variables to find what you need. You can find a description of each variable in one of three digital documents, called codebooks, published with each ACS data release.
This code uses the tidycensus package’s load_variables() function to retrieve the codebooks for a specified ACS dataset (here, the 2023 dataset, which is the latest available as of this writing) and put all three of them under the “Environment” tab in RStudio in data frames called DetailedTables, ProfileTables, and SubjectTables:
# ACS codebooks
DetailedTables <- load_variables(2023, "acs5", cache = TRUE)
ProfileTables <- load_variables(2023, "acs5/profile", cache = TRUE)
SubjectTables <- load_variables(2023, "acs5/subject", cache = TRUE)
Of the three codebooks, ProfileTables is probably the one to start with. It’s the smallest - 1,364 variables in 2023 - and contains entries for the most-frequently-requested ACS estimates. I used it to find the variable names for both the rental unit count (DP04_0047) and the total occupied housing count (DP04_0045).
Find ProfileTables under RStudio’s Environment tab, click on it, and RStudio will open it under a tab in RStudio’s upper-left pane. You’ll then be able to scroll through the codebook. The “name” column will contain the variable names. The “label” column will explain what the variable estimates (although not always terribly clearly). The “concept” column will give you the general topic that the variable fits into.
Some tips:
Clicking View / Panes / Zoom Source will give you a better view. So will using your mouse to grab the right border of the codebook’s “label” column and move the border farther to the right.
Paying attention to the concept column can be important. For example, the ProfileTables codebook contains variables for the United States and also for Puerto Rico. If you want to estimate something about a place in the U.S., make sure you’re not trying to do so with a variable for Puerto Rico, and vice versa. Another hint is that Puerto Rico variable names always end with “PR.”
By default, the variables are sorted alphabetically by variable name. Simply scroll down until you find DP04_0045, the variable that estimates total occupied housing units. A little below it, you’ll find DP04_0047, the variable that estimates total occupied rental units:
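Scrolling works, but searching the codebook with code can be faster. The sketch below shows one possible approach using base R's grepl() function. So that it runs on its own, it searches a tiny stand-in data frame, Example_Codebook, rather than the real ProfileTables. The stand-in has the same three column names described above, but its labels and concept entries are shortened paraphrases for illustration, not the codebook's exact wording.

```r
# Stand-in codebook with the same column names as ProfileTables
# (labels and concepts are simplified for illustration, not verbatim)
Example_Codebook <- data.frame(
  name = c("DP04_0045", "DP04_0046", "DP04_0047"),
  label = c("Occupied housing units",
            "Owner-occupied housing units",
            "Renter-occupied housing units"),
  concept = rep("Selected housing characteristics", 3)
)

# Keep only the rows whose label mentions "renter", ignoring case
Renter_Rows <- Example_Codebook[
  grepl("renter", Example_Codebook$label, ignore.case = TRUE), ]

Renter_Rows
```

Once the real codebook is loaded, replacing Example_Codebook with ProfileTables should apply the same search to the full list of variables.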
Scroll around a little, and you’ll perhaps begin to understand how much this course could pay off for you. There’s a wide range of relevant, high-quality, highly localizable data here, all freely available to the relatively few people who (like, increasingly, you) know how to get it and draw insights from it.
By now, you have a pretty good idea of how R code works and how to get it to do fundamental things like retrieve, sort, filter, mutate, merge and save data. You’ve also begun to realize just how much valuable raw data the Web can offer. Starting with the next lesson, we will take a closer look at how to dig into American Community Survey data about things other than rent. We’ll also start looking at all the graphics R can produce besides the data-driven maps you have been learning to make. Finally, we’ll start looking at sources of data other than HUD and the American Community Survey.