Estimating rental home counts per ZIP code

How about a more informative map?

Click one of the ZIP codes in this map of fair market rent estimates, and you’ll see that the pop-up window includes some new information. Specifically, each pop-up window now shows not only the ZIP code’s fair market rent for each rental unit size but also the ZIP code’s total number of rental units and total households - that is, rental households and owner-occupied households.

The new information makes the map more useful. It shows that although those eastern ZIP codes, like 37149 and 37118, have the area’s cheapest rental homes, actually finding an available one could be challenging, given how few rental homes they contain. In fact, neither ZIP code offers very many places to live of any kind.

By contrast, there are over 23,000 places to live, nearly 12,000 of them rentals, just one ZIP code west, in 37130, the area surrounding MTSU’s campus.

Wait … who is Moe?

“Moe” is a “what,” not a “who.” But if thinking about a guy named Moe helps you, then go with it.

In the map’s pop-up windows, the “MOE” part of the “Renter MOE (±)” and “Occupied MOE (±)” labels refers to margin of error, a bit of statistical jargon that may need some explaining.

The rental and total housing figures are not actual counts. Instead, they are estimates based on random sampling. Estimates are a lot more practical than actual counts. After all, the actual count of rental homes in a given ZIP code probably fluctuates all the time, as new rental homes become available and others go off the market.

But at any given moment, an actual count of rental homes available in a given ZIP code really does exist, even if nobody knows exactly what it is. Coupled with the estimate, the margin of error tells you the range in which that actual count probably falls. This range is called the estimate’s confidence interval.

An example might help. Suppose the estimated number of rentals in a ZIP code is 11,891, and the margin of error is 921. To find the range in which the real count for the ZIP code probably falls, all you have to do is subtract the margin of error from the estimate to get the low end of the range, and add the margin of error to the estimate to get the high end of the range.

Do that for the example ZIP code, and you’ll find that the real count of rental homes is probably somewhere between 11,891 - 921 = 10,970 and 11,891 + 921 = 12,812. The figures are for a “90 percent confidence interval,” which means we can be 90 percent sure the real count is somewhere in that range, and there’s a (tolerable) 10 percent chance that it is either above or below that range.

To put it more succinctly: We can be 90 percent sure that the ZIP code has between 10,970 and 12,812 rental homes available. The estimate isn’t entirely useless, either; 11,891 is the figure likeliest to match the real figure. So, it’s OK to use it as the figure for the ZIP code, as long as you keep in mind that it’s just an estimate.
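
The arithmetic above is easy to check in R. Here is a minimal sketch using the example's figures; the object names are just illustrative.

```r
# Example figures from the text above
estimate <- 11891   # estimated number of rentals
moe      <- 921     # 90 percent margin of error

# Low and high ends of the 90 percent confidence interval
lower <- estimate - moe
upper <- estimate + moe

c(lower = lower, upper = upper)
```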

Here’s one way to visualize each ZIP code’s estimate and confidence interval:

The “what” and “why” of an API

The rental and total household estimates and error margins all came from an application programming interface, or API, maintained by the U.S. Census Bureau - specifically, the API for the Bureau’s annual American Community Survey.

An API is simply an efficient way for an organization to share data with external users (like you), particularly when the total amount of data the organization wants to make available would be much too large to fit on a typical user’s computer. That’s certainly true of the Census Bureau’s ACS data. The data offer nearly 50,000 annual estimates of social, economic, housing, and demographic characteristics, each available for the entire U.S. as well as for individual states, regions, federal and state political districts, metro areas, counties, cities, places, school districts, neighborhoods, and - yes - ZIP codes. Using the bureau’s API lets you locate and download only the data you need.

This lesson will show you how to get the bureau’s API to give you ZIP-code-level American Community Survey estimates of housing unit and rental housing unit counts. Learn that, and you’ll know how to get anything else you want out of the bureau’s API, making you incredibly valuable to any of the many nonprofit, governmental, business, and research organizations that need current, high-quality, free information about the people living in a particular area.

You’ll learn in the next lesson how to add the new information to the rent map.

Getting your (free) Census API key

Like a lot of APIs, the Census API requires you to get an access code, called a key, then use the key whenever you request data from the API. Some API keys cost money, and sometimes a lot of money. But a Census API key is free, mainly because the Census Bureau is a taxpayer-funded government agency.

All you have to do is visit https://api.census.gov/data/key_signup.html, fill out and submit the form, check your e-mail after a few minutes, and click the activation link inside the e-mail the Census Bureau will send you. Here’s a picture of the e-mail that delivered my API key. I have blurred the key, because you’re supposed to keep your key a secret.

Remember to click the activation link inside the e-mail, and to do it within 48 hours. If you take longer to activate your key, the key will stop working.

The tidycensus package

There’s a hard way to access the Census API. I vote for using the easy way: the tidycensus package, developed by Kyle Walker, Ph.D., professor of geography at Texas Christian University, spatial data analysis consultant, and author of the (free) online book Analyzing US Census Data: Methods, Maps, and Models in R.
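
For the curious, here is roughly what the hard way looks like: you assemble a request URL yourself, send it, and then parse whatever comes back. The sketch below only builds the URL and does not send it. The endpoint path and parameter layout reflect my understanding of the Census Bureau's data API and should be treated as assumptions to check against the bureau's documentation. The variable names and ZIP code come from this lesson, and YOUR_KEY is a placeholder.

```r
# Pieces of a hypothetical request for one ZCTA's renter-occupied estimate
base_url  <- "https://api.census.gov/data/2023/acs/acs5/profile"  # assumed endpoint
variables <- c("NAME", "DP04_0047E", "DP04_0047M")                # estimate and margin
geography <- utils::URLencode("zip code tabulation area")         # spaces become %20
zcta      <- "37130"

request_url <- paste0(
  base_url,
  "?get=", paste(variables, collapse = ","),
  "&for=", geography, ":", zcta,
  "&key=", "YOUR_KEY"   # placeholder, not a real key
)

request_url
```

With tidycensus, all of that assembly, plus the parsing of the response, is handled for you.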

Install and load tidycensus. To get started, use if (!require("tidycensus")) install.packages("tidycensus") and library(tidycensus) to add tidycensus to the code for installing and loading the usual lineup of R packages:

# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------

if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("gt"))
  install.packages("gt")
if (!require("leaflet"))
  install.packages("leaflet")
if (!require("leafpop"))
  install.packages("leafpop")
if (!require("sf"))
  install.packages("sf")
if (!require("RColorBrewer"))
  install.packages("RColorBrewer")
if (!require("classInt"))
  install.packages("classInt")     
if (!require("scales"))
  install.packages("scales")       
if (!require("htmlwidgets"))
  install.packages("htmlwidgets")  
if (!require("tidycensus"))
  install.packages("tidycensus")   # << added for ACS join

library(tidyverse)
library(gt)
library(sf)
library(leaflet)
library(leafpop)
library(RColorBrewer)
library(classInt)
library(scales)
library(htmlwidgets)   
library(tidycensus)    # << added

Provide your API key. Next, use the tidycensus package’s census_api_key() function to share your API key with R. Simply replace PasteYourAPIKeyBetweenTheseQuoteMarks with your API key. I suggest copying and pasting the key directly from the e-mail you received, taking care to include all of the key’s characters but nothing else, like the spaces before and after the key. Trying to type the key manually is not a good idea. You’re nearly certain to make at least one typing error, and the key won’t work unless it is exactly correct.

census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
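
If you are worried about stray spaces sneaking in when you paste, one option is to let R strip them. This is a minimal sketch, not part of the lesson's required code. The key shown is made up, and the census_api_key() call is commented out so the example runs without tidycensus loaded.

```r
# A made-up key pasted with accidental spaces on both ends
pasted_key <- "  0123456789abcdef0123456789abcdef01234567  "

# trimws() removes leading and trailing whitespace
clean_key <- trimws(pasted_key)

nchar(pasted_key)  # counts the stray spaces too
nchar(clean_key)   # counts only the key's characters

# census_api_key(clean_key)
```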

Getting the goods. This code retrieves the rental and household counts using the tidycensus package’s get_acs() function and stores the results in a data frame we’ll call Census_Data.

Census_Data <- get_acs(
  geography = "zcta",
  variables = c("DP04_0047", "DP04_0045"),
  year = 2024,
  survey = "acs5",
  output = "wide",
  geometry = FALSE
)

The get_acs() function has a bunch of arguments that you can use to specify exactly what data you want. In this case:

geography = "zcta" tells tidycensus you want the data broken down by ZIP code. Recall that a ZCTA, or ZIP Code Tabulation Area, is a Census-created geographic area that corresponds, at least roughly, to a ZIP code used by the U.S. Postal Service. Each ZCTA has a corresponding ZIP code, but not all ZIP codes have a corresponding ZCTA. The tidycensus package lets you get data for many geographic areas other than ZCTAs. For a list, and the code that specifies each one, see the Geography in tidycensus section of the Basic usage of tidycensus guide.
variables = c("DP04_0047", "DP04_0045") tells tidycensus to retrieve two variables: DP04_0047, which estimates the number of renter-occupied housing units, and DP04_0045, which estimates the total number of occupied housing units, whether renter-occupied or owner-occupied. In the next lesson, I’ll show you how to look up American Community Survey variable names and what they mean. For now, understand that the c() part of this code is base R’s “combine” function, which combines the things between the parentheses (and separated by commas) into a vector or a list. Vectors and lists are both data storage structures, although they have different properties. Here, it’s being used to hold the list of variable names you want tidycensus to retrieve.
year = 2024 tells tidycensus which year you want data for. Ask for the latest year the Census Bureau has released. See the Census Bureau’s release schedule to learn what data will be released when.
survey = "acs5" tells tidycensus you want the five-year American Community Survey dataset. In most cases, that’s the one you will want. It covers a five-year period ending with the specified year. For example, the 2023 five-year dataset’s estimates cover 2019-2023, not 2023 in particular. But five-year datasets contain data for all places in the U.S. There is a one-year dataset that you can retrieve using a survey = "acs1" argument. It covers just the year specified, so it’s more up to date than the five-year dataset released for the same year. But it includes data only for places with 65,000 or more residents.
output = "wide" tells tidycensus to format the data so that each column name reflects the variable the column contains data for. I find the alternative format, output = "tidy", harder to work with, especially when multiple variables are being requested.
geometry = FALSE tells tidycensus to give you just the data, rather than the data plus a “geometry” column containing borders for the geographic region specified. I’ll say more about the latter option later in the lesson. It can come in pretty handy. In this case, we already have a geometry column showing the ZIP code boundaries. We don’t need another.
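
To make the output = "wide" versus output = "tidy" distinction concrete, here is a sketch that builds both layouts by hand for a single ZCTA, using the 00601 figures that appear in the glimpse() output later in this lesson. The tidy column names (variable, estimate, moe) follow my understanding of what tidycensus returns; treat them as an assumption and compare against your own results.

```r
# Wide layout: one row per ZCTA, one column per estimate or margin
wide_layout <- data.frame(
  GEOID      = "00601",
  NAME       = "ZCTA5 00601",
  DP04_0047E = 1877,
  DP04_0047M = 271,
  DP04_0045E = 5768,
  DP04_0045M = 303
)

# Tidy layout: one row per ZCTA-variable pair
tidy_layout <- data.frame(
  GEOID    = c("00601", "00601"),
  NAME     = c("ZCTA5 00601", "ZCTA5 00601"),
  variable = c("DP04_0045", "DP04_0047"),
  estimate = c(5768, 1877),
  moe      = c(303, 271)
)

dim(wide_layout)  # rows, columns
dim(tidy_layout)
```

In the wide layout, each ZCTA stays on a single row, which is what makes it convenient to join to the rent data later.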

Glimpse what you got

The resulting data frame, Census_Data, is too big to view in a gt table, the way we have been doing with some of the data frames we have worked with so far. The easiest alternative is to find Census_Data in RStudio’s Environment tab and click on it. Doing so will open the data frame in RStudio’s spreadsheet-like data viewer.

Another, and sometimes handy, approach to viewing large data frames is to use the dplyr package’s glimpse() function, which can display all of the variables in a data frame as a vertical column, along with columns to the right for the first several cases:

glimpse(Census_Data)

## Rows: 33,772
## Columns: 6
## $ GEOID      <chr> "00601", "00602", "00603", "00606", "00610", "00611", "0061…
## $ NAME       <chr> "ZCTA5 00601", "ZCTA5 00602", "ZCTA5 00603", "ZCTA5 00606",…
## $ DP04_0047E <dbl> 1877, 3425, 8600, 381, 2725, 250, 8760, 940, 2341, 911, 375…
## $ DP04_0047M <dbl> 271, 472, 652, 130, 379, 134, 692, 260, 384, 219, 528, 298,…
## $ DP04_0045E <dbl> 5768, 12954, 20131, 1860, 9604, 679, 24426, 3580, 9063, 325…
## $ DP04_0045M <dbl> 303, 592, 780, 182, 524, 173, 977, 421, 472, 348, 713, 419,…

Either way you look at the Census_Data data frame, you can see that the Census API gave us 33,772 rows (one for each ZIP code in the U.S.) of data for six variables:

GEOID, a character variable consisting of each five-digit ZIP code.
NAME, a character variable showing each ZIP code preceded by “ZCTA5,” and a space.
DP04_0047E, the estimated number of rental units in each ZIP code. The “E” at the end stands for “Estimate.”
DP04_0047M, the rental unit estimate’s error margin (see the “Wait … who is Moe?” explanation above). The “M” at the end stands for “Margin.”
DP04_0045E, the estimated number of total housing units - both rental and otherwise - in each ZIP code. Again, the “E” stands for “Estimate.”
DP04_0045M, the housing unit estimate’s error margin. Again, the “M” stands for “Margin.”

Note that tidycensus doesn’t make you ask for the error margins. It retrieves them automatically. The “E” and “M” characters at the end of each variable name let you know which column contains the estimates (xxxx_xxxxE) and which contains the error margins (xxxx_xxxxM).
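
Those suffixes also make it possible to pick out estimate or margin columns by name. Here is a minimal sketch, assuming column names like the ones shown above. Note that a simple "ends with E" test would also catch NAME, so the pattern below insists on the full variable-code shape.

```r
# Column names as they appear in the glimpse() output above
column_names <- c("GEOID", "NAME", "DP04_0047E", "DP04_0047M",
                  "DP04_0045E", "DP04_0045M")

# Match a table prefix, an underscore, four digits, then the suffix
estimate_cols <- grep("^[A-Z]+[0-9]+_[0-9]{4}E$", column_names, value = TRUE)
margin_cols   <- grep("^[A-Z]+[0-9]+_[0-9]{4}M$", column_names, value = TRUE)

estimate_cols
margin_cols
```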

Creating better column names

Keeping the original Census variable names wouldn’t hurt a thing. They aren’t all that intuitive, though. So, let’s use the dplyr package’s transmute() function to replace them with names that will make it easier to keep track of what’s what. Handily, the transmute() function also drops any variables that don’t get renamed. So, we can use it to get rid of the NAME column, which we won’t need.

Here’s the code, and then another glimpse() function so you can see the results:

Census_Data <- Census_Data %>%
  transmute(
    ZIP = GEOID,
    Rentals         = DP04_0047E,
    Rentals_MOE     = DP04_0047M,
    Households      = DP04_0045E,
    Households_MOE  = DP04_0045M
  )

glimpse(Census_Data)

## Rows: 33,772
## Columns: 5
## $ ZIP            <chr> "00601", "00602", "00603", "00606", "00610", "00611", "…
## $ Rentals        <dbl> 1877, 3425, 8600, 381, 2725, 250, 8760, 940, 2341, 911,…
## $ Rentals_MOE    <dbl> 271, 472, 652, 130, 379, 134, 692, 260, 384, 219, 528, …
## $ Households     <dbl> 5768, 12954, 20131, 1860, 9604, 679, 24426, 3580, 9063,…
## $ Households_MOE <dbl> 303, 592, 780, 182, 524, 173, 977, 421, 472, 348, 713, …

Filtering for Rutherford County ZIP codes

Remember how retrieving the Fair Market Rent data initially got us Fair Market Rent for all monitored ZIP codes in the U.S., even though we wanted only the ZIP codes in Rutherford County? We had to filter the rent data for just the ZIP codes in Rutherford County.

The same thing needs to happen here, in the Census data. We have Census data for every ZIP code in the U.S. So, let’s filter it for just the Rutherford County ZIP codes we are working with:

# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------

ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)

Census_Data <- Census_Data %>%
  filter(ZIP %in% ZIPList)

glimpse(Census_Data)

## Rows: 12
## Columns: 5
## $ ZIP            <chr> "37037", "37085", "37086", "37118", "37127", "37128", "…
## $ Rentals        <dbl> 395, 113, 3434, 55, 1650, 10523, 8241, 11852, 0, 81, 28…
## $ Rentals_MOE    <dbl> 186, 86, 488, 59, 330, 1007, 990, 804, 14, 69, 175, 720
## $ Households     <dbl> 3128, 1992, 12887, 424, 7056, 28968, 23583, 23624, 0, 9…
## $ Households_MOE <dbl> 372, 302, 545, 155, 523, 1212, 1187, 927, 14, 198, 326,…
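
Because not every ZIP code has a matching ZCTA, it can be worth confirming that every ZIP code you asked for actually came back. Here all 12 did, as the row count shows. The sketch below illustrates the check with a deliberately incomplete, made-up result so there is something to find; returned_zips is a hypothetical stand-in for Census_Data$ZIP.

```r
ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)

# Hypothetical result that is missing two of the requested ZIP codes
returned_zips <- setdiff(ZIPList, c("37132", "37153"))

# Which requested ZIP codes did not come back?
missing_zips <- setdiff(ZIPList, returned_zips)
missing_zips

# TRUE only when every requested ZIP code is present
all(ZIPList %in% returned_zips)
```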

Where did that plot come from?

I made the plot by sneaking in a little code involving R’s plotly package. All you have to do is paste this code onto the end of your R script for fetching the Census data. It creates a data frame called plot_df that contains the ZIP, Rentals, and Rentals_MOE variables from the Census_Data data frame. Then, it produces the plot from the plot_df data frame.

# ----------------------------------------------------------
# Plotly: Rentals with ±MOE, sorted in descending order
#          + hover shows lower/upper MOE bounds
# ----------------------------------------------------------
if (!require("plotly"))
  install.packages("plotly")
library(plotly)

# Prepare an ordered data frame for plotting
plot_df <- Census_Data %>%
  select(ZIP, Rentals, Rentals_MOE) %>%
  filter(!is.na(Rentals)) %>%
  arrange(desc(Rentals)) %>%
  mutate(
    # Fix ZIP order by Rentals (descending)
    ZIP = factor(ZIP, levels = ZIP),
    # Compute bounds (cap lower at 0 since counts can't be negative)
    Lower = pmax(Rentals - Rentals_MOE, 0, na.rm = TRUE),
    Upper = Rentals + Rentals_MOE,
    # Build a clean hover label with bounds
    .hover = paste0(
      "ZIP: ", ZIP,
      "<br>Renter-occupied units: ", scales::comma(Rentals),
      ifelse(
        is.na(Rentals_MOE),
        "",
        paste0(
          "<br>MOE (±): ", scales::comma(Rentals_MOE),
          "<br>Min: ", scales::comma(Lower),
          "<br>Max: ", scales::comma(Upper)
        )
      )
    )
  )

# Horizontal scatter with symmetric x-error bars (± MOE)
Rentals_MOE_Plot <- plot_ly(
  data = plot_df,
  x = ~Rentals,
  y = ~ZIP,
  type = "scatter",
  mode = "markers",
  marker = list(size = 9, color = "#2C7FB8"),
  text = ~.hover,
  hoverinfo = "text",
  error_x = list(
    type = "data",
    array = ~Rentals_MOE,     # ±MOE
    visible = TRUE,
    color = "#636363",
    thickness = 1
  )
) %>%
  layout(
    title = "Renter-Occupied Units by ZIP (± ACS MOE)",
    xaxis = list(
      title = "Renter-occupied units",
      tickformat = ",d",
      rangemode = "tozero"
    ),
    yaxis = list(
      title = "ZIP",
      categoryorder = "array",
      categoryarray = levels(plot_df$ZIP),
      autorange = "reversed"
    ),
    margin = list(l = 90, r = 30, b = 60, t = 60),
    hoverlabel = list(align = "left")
  )

Rentals_MOE_Plot
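
One detail in that code deserves a closer look: the pmax() call that keeps the lower bound from dipping below zero. A short sketch using the first four Rutherford County rows from the glimpse() output above shows why it matters. In 37118, the margin of error (59) is larger than the estimate (55), so plain subtraction would give a negative number of rental homes.

```r
# First four rows of the filtered data, copied from the output above
zip         <- c("37037", "37085", "37086", "37118")
rentals     <- c(395, 113, 3434, 55)
rentals_moe <- c(186, 86, 488, 59)

# Uncapped lower bound: goes negative for 37118
raw_lower <- rentals - rentals_moe

# Capped lower bound: pmax() compares each value with 0 and keeps the larger
lower <- pmax(raw_lower, 0)
upper <- rentals + rentals_moe

data.frame(zip, rentals, rentals_moe, raw_lower, lower, upper)
```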

In-class exercise

Learning how to retrieve ACS data using tidycensus is a pretty big step to take in one class. So, let’s stop there. We’ll get to the map-making part next time and also have a look at where those variable names came from and how to look up names for the (many) other variables available in the ACS.

Here’s the above code, assembled and all in one place:

# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------

if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("gt"))
  install.packages("gt")
if (!require("leaflet"))
  install.packages("leaflet")
if (!require("leafpop"))
  install.packages("leafpop")
if (!require("sf"))
  install.packages("sf")
if (!require("RColorBrewer"))
  install.packages("RColorBrewer")
if (!require("classInt"))
  install.packages("classInt")     
if (!require("scales"))
  install.packages("scales")       
if (!require("htmlwidgets"))
  install.packages("htmlwidgets")  
if (!require("tidycensus"))
  install.packages("tidycensus")   # << added for ACS join

library(tidyverse)
library(gt)
library(sf)
library(leaflet)
library(leafpop)
library(RColorBrewer)
library(classInt)
library(scales)
library(htmlwidgets)   
library(tidycensus)    # << added

census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")

Census_Data <- get_acs(
  geography = "zcta",
  variables = c("DP04_0047", "DP04_0045"),
  year = 2024,
  survey = "acs5",
  output = "wide",
  geometry = FALSE
)

Census_Data <- Census_Data %>%
  transmute(
    ZIP = GEOID,
    Rentals         = DP04_0047E,
    Rentals_MOE     = DP04_0047M,
    Households      = DP04_0045E,
    Households_MOE  = DP04_0045M
  )

# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------

ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)

Census_Data <- Census_Data %>%
  filter(ZIP %in% ZIPList)

glimpse(Census_Data)

# ----------------------------------------------------------
# Plotly: Rentals with ±MOE, sorted in descending order
#          + hover shows lower/upper MOE bounds
# ----------------------------------------------------------
if (!require("plotly"))
  install.packages("plotly")
library(plotly)

# Prepare an ordered data frame for plotting
plot_df <- Census_Data %>%
  select(ZIP, Rentals, Rentals_MOE) %>%
  filter(!is.na(Rentals)) %>%
  arrange(desc(Rentals)) %>%
  mutate(
    # Fix ZIP order by Rentals (descending)
    ZIP = factor(ZIP, levels = ZIP),
    # Compute bounds (cap lower at 0 since counts can't be negative)
    Lower = pmax(Rentals - Rentals_MOE, 0, na.rm = TRUE),
    Upper = Rentals + Rentals_MOE,
    # Build a clean hover label with bounds
    .hover = paste0(
      "ZIP: ", ZIP,
      "<br>Renter-occupied units: ", scales::comma(Rentals),
      ifelse(
        is.na(Rentals_MOE),
        "",
        paste0(
          "<br>MOE (±): ", scales::comma(Rentals_MOE),
          "<br>Min: ", scales::comma(Lower),
          "<br>Max: ", scales::comma(Upper)
        )
      )
    )
  )

# Horizontal scatter with symmetric x-error bars (± MOE)
Rentals_MOE_Plot <- plot_ly(
  data = plot_df,
  x = ~Rentals,
  y = ~ZIP,
  type = "scatter",
  mode = "markers",
  marker = list(size = 9, color = "#2C7FB8"),
  text = ~.hover,
  hoverinfo = "text",
  error_x = list(
    type = "data",
    array = ~Rentals_MOE,     # ±MOE
    visible = TRUE,
    color = "#636363",
    thickness = 1
  )
) %>%
  layout(
    title = "Renter-Occupied Units by ZIP (± ACS MOE)",
    xaxis = list(
      title = "Renter-occupied units",
      tickformat = ",d",
      rangemode = "tozero"
    ),
    yaxis = list(
      title = "ZIP",
      categoryorder = "array",
      categoryarray = levels(plot_df$ZIP),
      autorange = "reversed"
    ),
    margin = list(l = 90, r = 30, b = 60, t = 60),
    hoverlabel = list(align = "left")
  )

Rentals_MOE_Plot

Pro tip

If you don’t want to have to supply your API key every time you use tidycensus, you can run this code to store your API key in your R installation’s environment. From then on, as long as you are using the same computer, you can skip running the census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks") code.

# Run once to save your key for future sessions

census_api_key("PASTE_YOUR_KEY_HERE", install = TRUE)
readRenviron("~/.Renviron")  # activate without restarting R
Sys.getenv("CENSUS_API_KEY") # optional check
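
If you want a script that works whether or not the key has been stored, one option is to check the environment variable first. Here is a minimal sketch; has_census_key() is a hypothetical helper name, not a tidycensus function.

```r
# Hypothetical helper: TRUE when a non-empty CENSUS_API_KEY is stored
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (has_census_key()) {
  message("Census API key found; no need to call census_api_key().")
} else {
  message("No stored key; call census_api_key() before requesting data.")
}
```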

For now, paste the above code into an R script window, add your personal API key (or use the “environmental install” option), run the code, and show me the results. If I can see the graphic, and if the graphic shows valid data, you’ll be good to go.