How about a more informative map?

Click one of the ZIP codes in this map of fair market rent estimates, and you’ll see that the pop-up window includes some new information. Specifically, each pop-up window now shows not only the ZIP code’s fair market rent for each rental unit size but also the ZIP code’s total number of rental units and its total number of households (renter-occupied plus owner-occupied).

The new information makes the map more useful. It shows that although those eastern ZIP codes, like 37149 and 37118, have the area’s cheapest rental homes, actually finding an available one could be challenging, given how few rental homes they contain. In fact, neither ZIP code offers very many places to live of any kind.

By contrast, there are over 23,000 places to live, nearly 12,000 of them rentals, just one ZIP code west, in 37130, the area surrounding MTSU’s campus.

Wait … who is Moe?

“Moe” is a “what,” not a “who.” But if thinking about a guy named Moe helps you, then go with it.

In the map’s pop-up windows, the “MOE” part of the “Renter MOE (±)” and “Occupied MOE (±)” labels refers to margin of error, a bit of statistical jargon that may need some explaining.

The rental and household figures are not actual counts. Instead, they are estimates based on random sampling. Estimates are a lot more practical than actual counts. After all, the actual count of rental homes in a given ZIP code probably fluctuates all the time, as new rental homes become available and others go off the market.

But at any given moment, an actual count of rental homes in a given ZIP code really does exist, even if nobody knows exactly what it is. Coupled with the estimate, the margin of error tells you the range in which that actual count probably falls. This range is called the estimate’s confidence interval.

An example might help. Suppose the estimated number of rentals in a ZIP code is 11,891, and the margin of error is 921. To find the range in which the real count for the ZIP code probably falls, all you have to do is subtract the margin of error from the estimate to get the low end of the range, and add the margin of error to the estimate to get the high end of the range.

Do that for the example ZIP code, and you’ll find that the real count of rental homes is probably somewhere between 11,891 - 921 = 10,970 and 11,891 + 921 = 12,812. The figures are for a “90 percent confidence interval,” which means we can be 90 percent sure the real count is somewhere in that range, and there’s a (tolerable) 10 percent chance that it falls either above or below that range.

To put it more succinctly: We can be 90 percent sure that the ZIP code has between 10,970 and 12,812 rental homes. The estimate itself is still useful, though: 11,891 is the single best guess at the real figure. So, it’s OK to use it as the figure for the ZIP code, as long as you keep in mind that it’s just an estimate.
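In R, the same arithmetic takes only a few lines. This minimal sketch uses the example figures above and caps the low end at zero, as the plotting code later in this lesson does, because a count can’t be negative. The variable names are just for illustration.

```r
# Example figures from the text: an ACS estimate and its 90 percent margin of error
estimate <- 11891
moe      <- 921

# Low end: subtract the MOE (capped at zero, since a count can't be negative)
lower <- max(estimate - moe, 0)
# High end: add the MOE
upper <- estimate + moe

cat("90% confidence interval:", format(lower, big.mark = ","),
    "to", format(upper, big.mark = ","), "\n")
```

The cap only matters for small estimates with large error margins. For this example it has no effect, and the range printed is the same one worked out by hand above.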

Here’s one way to visualize each ZIP code’s estimate and confidence interval:



The “what” and “why” of an API

The rental and total household estimates and error margins all came from an application programming interface, or API, maintained by the U.S. Census Bureau: specifically, the API for the bureau’s annual American Community Survey, or ACS.

An API is simply an efficient way for an organization to share data with external users (like you), particularly when the total amount of data the organization wants to make available would be much too large to fit on a typical user’s computer. That’s certainly true of the Census Bureau’s ACS data. The data offer nearly 50,000 annual estimates of social, economic, housing, and demographic characteristics, each available for the entire U.S. as well as for individual states, regions, federal and state political districts, metro areas, counties, cities, places, school districts, neighborhoods, and, yes, ZIP codes. (Technically, the bureau reports data for ZIP Code Tabulation Areas, or ZCTAs, which are its statistical approximations of ZIP code areas.) Using the bureau’s API lets you locate and download only the data you need.
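To make the idea concrete, here is roughly what a raw request to the Census API looks like, the “hard way” that the tidycensus package spares you. This is a sketch under stated assumptions: build_acs_url() is a hypothetical helper, and it assumes the bureau’s URL pattern for ACS five-year Data Profile variables (https://api.census.gov/data/YEAR/acs/acs5/profile) with “zip code tabulation area” as the geography name. It only assembles the URL string; it doesn’t contact the bureau.

```r
# Hypothetical helper: builds a Census API request URL for ACS five-year
# Data Profile variables for a set of ZIP Code Tabulation Areas (ZCTAs)
build_acs_url <- function(year, variables, zctas, key = NULL) {
  base <- paste0("https://api.census.gov/data/", year, "/acs/acs5/profile")
  # Request the NAME label plus each variable's estimate (E) and margin of error (M)
  fields <- paste(c("NAME", paste0(variables, "E"), paste0(variables, "M")),
                  collapse = ",")
  # ZCTAs go in the "for" clause; %20 is a URL-encoded space
  geo <- paste0("zip%20code%20tabulation%20area:", paste(zctas, collapse = ","))
  url <- paste0(base, "?get=", fields, "&for=", geo)
  # Append the API key only if one was supplied
  if (!is.null(key)) url <- paste0(url, "&key=", key)
  url
}

build_acs_url(2024, c("DP04_0047", "DP04_0045"), c("37130", "37118"))
```

Pasting a URL like that into a web browser should return the requested figures as rows of values in square brackets. Building, sending, and tidying up requests like this by hand gets tedious fast, which is the job tidycensus takes over.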

This lesson will show you how to get the bureau’s API to give you ZIP-code-level American Community Survey estimates of housing unit and rental housing unit counts. Learn that, and you’ll know how to get anything else you want out of the bureau’s API, making you incredibly valuable to any of the many nonprofit, governmental, business, and research organizations that need current, high-quality, free information about the people living in a particular area.

You’ll learn in the next lesson how to add the new information to the rent map.

Getting your (free) Census API key

Like a lot of APIs, the Census API requires you to get an access code, called a key, then use the key whenever you request data from the API. Some API keys cost money, and sometimes a lot of money. But a Census API key is free, mainly because the Census Bureau is a taxpayer-funded government agency.

All you have to do is visit https://api.census.gov/data/key_signup.html, fill out and submit the form, check your e-mail after a few minutes, and click the activation link inside the e-mail the Census Bureau will send you. Here’s a picture of the e-mail that delivered my API key. I have blurred the key, because you’re supposed to keep your key a secret.

Remember to click the activation link inside the e-mail, and to do it within 48 hours. If you take longer to activate your key, the key will stop working.

The tidycensus package

There’s a hard way to access the Census API. I vote for using the easy way: the tidycensus package, developed by Kyle Walker, Ph.D., professor of geography at Texas Christian University, spatial data analysis consultant, and author of the (free) online book Analyzing US Census Data: Methods, Maps, and Models in R.

Install and load tidycensus. To get started, use if (!require("tidycensus")) install.packages("tidycensus") and library(tidycensus) to add tidycensus to the code for installing and loading the usual lineup of R packages:

# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------

if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("gt"))
  install.packages("gt")
if (!require("leaflet"))
  install.packages("leaflet")
if (!require("leafpop"))
  install.packages("leafpop")
if (!require("sf"))
  install.packages("sf")
if (!require("RColorBrewer"))
  install.packages("RColorBrewer")
if (!require("classInt"))
  install.packages("classInt")     
if (!require("scales"))
  install.packages("scales")       
if (!require("htmlwidgets"))
  install.packages("htmlwidgets")  
if (!require("tidycensus"))
  install.packages("tidycensus")   # << added for ACS join

library(tidyverse)
library(gt)
library(sf)
library(leaflet)
library(leafpop)
library(RColorBrewer)
library(classInt)
library(scales)
library(htmlwidgets)   
library(tidycensus)    # << added
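If the list of if (!require()) lines starts to feel long, the same “install if missing, then load” pattern can be wrapped in a function. This is an optional sketch: missing_packages() and install_and_load() are hypothetical helper names, not part of any package, and the lesson’s own code works fine without them.

```r
# Hypothetical helper: names of packages in `pkgs` that aren't installed yet
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install whatever is missing, then attach every package
install_and_load <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(pkgs)
}
```

With those two functions defined, a single call such as install_and_load(c("tidyverse", "gt", "sf", "leaflet", "leafpop", "RColorBrewer", "classInt", "scales", "htmlwidgets", "tidycensus")) would do the same job as the block above.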

Provide your API key. Next, use the tidycensus package’s census_api_key() function to share your API key with R. Simply replace PasteYourAPIKeyBetweenTheseQuoteMarks with your API key. I suggest copying and pasting the key directly from the e-mail you received, taking care to include all of the key’s characters but nothing else, like the spaces before and after the key. Trying to type the key manually is not a good idea. You’re nearly certain to make at least one typing error, and the key won’t work unless it is exactly correct.

census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
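Because a stray space is an easy way for a pasted key to go wrong, a small check like the one below can catch it before R ever contacts the Census Bureau. It is a sketch: clean_census_key() is a hypothetical helper. It only trims whitespace from both ends and rejects keys that are empty or have a space inside; it cannot tell whether the key itself is valid.

```r
# Hypothetical helper: tidy up a pasted API key before handing it to census_api_key()
clean_census_key <- function(key) {
  key <- trimws(key)  # drop spaces, tabs, and newlines at either end
  if (!nzchar(key)) stop("The API key is empty.")
  if (grepl("[[:space:]]", key)) stop("The API key contains a space inside it.")
  key
}

clean_census_key("  abc123def456  ")
```

In the lesson’s script, you could then write census_api_key(clean_census_key("PasteYourAPIKeyBetweenTheseQuoteMarks")) instead of the line above.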

Getting the goods. This code retrieves the rental and household counts using the tidycensus package’s get_acs() function and stores the results in a data frame we’ll call Census_Data.

Census_Data <- get_acs(
  geography = "zcta",
  variables = c("DP04_0047", "DP04_0045"),
  year = 2024,
  survey = "acs5",
  output = "wide",
  geometry = FALSE
)

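The get_acs() function has a bunch of arguments that you can use to specify exactly what data you want. In this case:

geography = "zcta" asks for data by ZIP Code Tabulation Area (ZCTA), the Census Bureau’s version of ZIP codes.

variables = c("DP04_0047", "DP04_0045") names the two ACS variables we want: renter-occupied housing units (DP04_0047) and all occupied housing units, that is, households (DP04_0045).

year = 2024 picks the data year. For five-year estimates, it is the last year of the five-year period, so 2024 means 2020 through 2024.

survey = "acs5" requests the five-year ACS estimates, which pool five years of survey responses and are the only ACS estimates published for areas as small as ZIP codes.

output = "wide" gives each variable’s estimate and margin of error their own columns, with one row per ZIP code.

geometry = FALSE skips downloading the ZIP code boundary shapes, which we don’t need here.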

Glimpse what you got

The resulting data frame, Census_Data, is too big to view in a gt table, the way we have viewed some of the other data frames we have worked with so far. The easiest alternative is to find Census_Data in RStudio’s Environment tab and click on it. Doing so will open the data frame in RStudio’s spreadsheet-like data viewer.

Another handy approach to viewing large data frames is the dplyr package’s glimpse() function, which lists every variable in a data frame on its own line, followed by the first several values of that variable:

glimpse(Census_Data)
## Rows: 33,772
## Columns: 6
## $ GEOID      <chr> "00601", "00602", "00603", "00606", "00610", "00611", "0061…
## $ NAME       <chr> "ZCTA5 00601", "ZCTA5 00602", "ZCTA5 00603", "ZCTA5 00606",…
## $ DP04_0047E <dbl> 1877, 3425, 8600, 381, 2725, 250, 8760, 940, 2341, 911, 375…
## $ DP04_0047M <dbl> 271, 472, 652, 130, 379, 134, 692, 260, 384, 219, 528, 298,…
## $ DP04_0045E <dbl> 5768, 12954, 20131, 1860, 9604, 679, 24426, 3580, 9063, 325…
## $ DP04_0045M <dbl> 303, 592, 780, 182, 524, 173, 977, 421, 472, 348, 713, 419,…

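Either way you look at the Census_Data data frame, you can see that the Census API gave us 33,772 rows (one for each ZIP code area the Census Bureau tabulates) of data for six variables:

GEOID, the five-digit ZIP code.

NAME, a label for each ZIP code, like “ZCTA5 00601.”

DP04_0047E, the estimated number of renter-occupied housing units.

DP04_0047M, the margin of error for the renter-occupied estimate.

DP04_0045E, the estimated number of occupied housing units, that is, households.

DP04_0045M, the margin of error for the household estimate.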

Note that the Census API doesn’t make you ask for the error margins. It just gives them to you automatically. It also automatically tacks the “E” and “M” characters onto the end of the variable name to let you know which column contains the estimates (xxxx_xxxxE) and which contains the error margins (xxxx_xxxxM).
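Because the suffixes follow a fixed pattern, you can sort the columns into estimates and error margins by name alone. Here is a minimal base-R sketch, using the first three rows shown in the glimpse() output above as a stand-in for the full Census_Data data frame (census_sample, estimate_cols, and moe_cols are illustrative names). The pattern starts with “DP04_” on purpose: the NAME column also ends in “E,” and a bare “E$” would catch it too.

```r
# Stand-in for Census_Data: the first three rows from the glimpse() output
census_sample <- data.frame(
  GEOID      = c("00601", "00602", "00603"),
  NAME       = c("ZCTA5 00601", "ZCTA5 00602", "ZCTA5 00603"),
  DP04_0047E = c(1877, 3425, 8600),
  DP04_0047M = c(271, 472, 652),
  DP04_0045E = c(5768, 12954, 20131),
  DP04_0045M = c(303, 592, 780)
)

# Variable columns ending in "E" hold estimates; those ending in "M" hold MOEs
estimate_cols <- grep("^DP04_[0-9]+E$", names(census_sample), value = TRUE)
moe_cols      <- grep("^DP04_[0-9]+M$", names(census_sample), value = TRUE)

estimate_cols
moe_cols
```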

Creating better column names

Keeping the original Census variable names wouldn’t hurt a thing. They aren’t all that intuitive, though. So, let’s use the dplyr package’s transmute() function to replace them with names that make it easier to keep track of what’s what. Handily, transmute() keeps only the columns you mention inside it, so it also gets rid of the NAME column, which we won’t need.

Here’s the code, and then another glimpse() function so you can see the results:

Census_Data <- Census_Data %>%
  transmute(
    ZIP = GEOID,
    Rentals         = DP04_0047E,
    Rentals_MOE     = DP04_0047M,
    Households      = DP04_0045E,
    Households_MOE  = DP04_0045M
  )

glimpse(Census_Data)
## Rows: 33,772
## Columns: 5
## $ ZIP            <chr> "00601", "00602", "00603", "00606", "00610", "00611", "…
## $ Rentals        <dbl> 1877, 3425, 8600, 381, 2725, 250, 8760, 940, 2341, 911,…
## $ Rentals_MOE    <dbl> 271, 472, 652, 130, 379, 134, 692, 260, 384, 219, 528, …
## $ Households     <dbl> 5768, 12954, 20131, 1860, 9604, 679, 24426, 3580, 9063,…
## $ Households_MOE <dbl> 303, 592, 780, 182, 524, 173, 977, 421, 472, 348, 713, …
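If it helps to see what transmute() is doing here, this is a rough base-R equivalent: pick columns by their old names, then assign the new ones, which leaves out everything else, including NAME. It’s a sketch on a two-row stand-in built from the earlier glimpse() output; census_raw, rename_map, and census_clean are hypothetical names.

```r
# Stand-in for the raw Census_Data: two rows from the earlier glimpse() output
census_raw <- data.frame(
  GEOID = c("00601", "00602"), NAME = c("ZCTA5 00601", "ZCTA5 00602"),
  DP04_0047E = c(1877, 3425), DP04_0047M = c(271, 472),
  DP04_0045E = c(5768, 12954), DP04_0045M = c(303, 592)
)

# New name = old name; any column not listed here is left out
rename_map <- c(
  ZIP            = "GEOID",
  Rentals        = "DP04_0047E",
  Rentals_MOE    = "DP04_0047M",
  Households     = "DP04_0045E",
  Households_MOE = "DP04_0045M"
)

census_clean <- census_raw[, rename_map]  # keep only the mapped columns, in this order
names(census_clean) <- names(rename_map)  # then swap in the readable names
str(census_clean)
```

The dplyr version in the lesson reads more clearly, which is why the lesson uses it; the base-R version just makes the “select, then rename” logic visible.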

Filtering for Rutherford County ZIP codes

Remember how retrieving the Fair Market Rent data initially got us Fair Market Rent for all monitored ZIP codes in the U.S., even though we wanted only the ZIP codes in Rutherford County? We had to filter the rent data for just the ZIP codes in Rutherford County.

The same thing needs to happen here, in the Census data. We have Census data for every ZIP code in the U.S. So, let’s filter it for just the Rutherford County ZIP codes we are working with:

# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------

ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)

Census_Data <- Census_Data %>%
  filter(ZIP %in% ZIPList)

glimpse(Census_Data)
## Rows: 12
## Columns: 5
## $ ZIP            <chr> "37037", "37085", "37086", "37118", "37127", "37128", "…
## $ Rentals        <dbl> 395, 113, 3434, 55, 1650, 10523, 8241, 11852, 0, 81, 28…
## $ Rentals_MOE    <dbl> 186, 86, 488, 59, 330, 1007, 990, 804, 14, 69, 175, 720
## $ Households     <dbl> 3128, 1992, 12887, 424, 7056, 28968, 23583, 23624, 0, 9…
## $ Households_MOE <dbl> 372, 302, 545, 155, 523, 1212, 1187, 927, 14, 198, 326,…
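It’s worth confirming that the filter found every ZIP code on the list, because a typo in ZIPList, or a ZIP code the Census Bureau doesn’t tabulate, would simply drop out without any warning. A minimal check uses base R’s setdiff(). In this sketch, found_zips is a hypothetical stand-in for Census_Data$ZIP, with one ZIP code deliberately removed so the check has something to report.

```r
ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)

# Stand-in for Census_Data$ZIP after filtering (pretend 37153 went missing)
found_zips <- setdiff(ZIPList, "37153")

# ZIP codes on the list that never showed up in the data
missing_zips <- setdiff(ZIPList, found_zips)
if (length(missing_zips) > 0) {
  message("No Census data for: ", paste(missing_zips, collapse = ", "))
}
```

With the lesson’s actual data, setdiff(ZIPList, Census_Data$ZIP) should come back empty, since the glimpse() output above shows all 12 rows.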

Where did that plot come from?

I made the plot by sneaking in a little code involving R’s plotly package. All you have to do is paste this code onto the end of your R script for fetching the Census data. It creates a data frame called plot_df that contains the ZIP, Rentals, and Rentals_MOE variables from the Census_Data data frame, sorted from most to fewest rentals, plus each ZIP code’s lower and upper bounds and a label for the hover pop-up. Then, it produces the plot from the plot_df data frame.

# ----------------------------------------------------------
# Plotly: Rentals with ±MOE, sorted in descending order
#          + hover shows lower/upper MOE bounds
# ----------------------------------------------------------
if (!require("plotly"))
  install.packages("plotly")
library(plotly)

# Keep only the columns the plot needs, sorted by Rentals (no geometry here, since geometry = FALSE)
plot_df <- Census_Data %>%
  select(ZIP, Rentals, Rentals_MOE) %>%
  filter(!is.na(Rentals)) %>%
  arrange(desc(Rentals)) %>%
  mutate(
    # Fix ZIP order by Rentals (descending)
    ZIP = factor(ZIP, levels = ZIP),
    # Compute bounds (cap lower at 0 since counts can't be negative)
    Lower = pmax(Rentals - Rentals_MOE, 0, na.rm = TRUE),
    Upper = Rentals + Rentals_MOE,
    # Build a clean hover label with bounds
    .hover = paste0(
      "ZIP: ", ZIP,
      "<br>Renter-occupied units: ", scales::comma(Rentals),
      ifelse(
        is.na(Rentals_MOE),
        "",
        paste0(
          "<br>MOE (±): ", scales::comma(Rentals_MOE),
          "<br>Min: ", scales::comma(Lower),
          "<br>Max: ", scales::comma(Upper)
        )
      )
    )
  )

# Horizontal scatter with symmetric x-error bars (± MOE)
Rentals_MOE_Plot <- plot_ly(
  data = plot_df,
  x = ~Rentals,
  y = ~ZIP,
  type = "scatter",
  mode = "markers",
  marker = list(size = 9, color = "#2C7FB8"),
  text = ~.hover,
  hoverinfo = "text",
  error_x = list(
    type = "data",
    array = ~Rentals_MOE,     # ±MOE
    visible = TRUE,
    color = "#636363",
    thickness = 1
  )
) %>%
  layout(
    title = "Renter-Occupied Units by ZIP (± ACS MOE)",
    xaxis = list(
      title = "Renter-occupied units",
      tickformat = ",d",
      rangemode = "tozero"
    ),
    yaxis = list(
      title = "ZIP",
      categoryorder = "array",
      categoryarray = levels(plot_df$ZIP),
      autorange = "reversed"
    ),
    margin = list(l = 90, r = 30, b = 60, t = 60),
    hoverlabel = list(align = "left")
  )

Rentals_MOE_Plot
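One quirk of the plot: the hover label caps the lower bound at zero, but the error bars are symmetric, so for a ZIP code whose estimate is 0 ± 14 the bar still reaches below zero. plotly’s error-bar settings can take a separate length for each side (with symmetric = FALSE, array sets the right-hand arm and arrayminus the left-hand arm), so one possible tweak is to compute a capped left arm. Here is a base-R sketch of that calculation, using three ZIP codes’ figures from the filtered glimpse() output; plot_sample, right_arm, and left_arm are hypothetical names.

```r
# Figures for three ZIP codes, from the filtered glimpse() output
plot_sample <- data.frame(
  ZIP         = c("37130", "37118", "37132"),
  Rentals     = c(11852, 55, 0),
  Rentals_MOE = c(804, 59, 14)
)

# Right arm: the full MOE. Left arm: the MOE, but never extending past zero.
plot_sample$right_arm <- plot_sample$Rentals_MOE
plot_sample$left_arm  <- pmin(plot_sample$Rentals_MOE, plot_sample$Rentals)

plot_sample
```

After adding the same pmin() column to plot_df, the error_x list in plot_ly() would become something like list(type = "data", symmetric = FALSE, array = ~Rentals_MOE, arrayminus = ~left_arm, visible = TRUE, color = "#636363", thickness = 1), so each bar’s left end matches the “Min” value in the hover label.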

In-class exercise

Learning how to retrieve ACS data using tidycensus is a pretty big step to take in one class. So, let’s stop there. We’ll get to the map-making part next time and also have a look at where those variable names came from and how to look up names for the (many) other variables available in the ACS.

Here’s the above code, assembled and all in one place:

# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------

if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("gt"))
  install.packages("gt")
if (!require("leaflet"))
  install.packages("leaflet")
if (!require("leafpop"))
  install.packages("leafpop")
if (!require("sf"))
  install.packages("sf")
if (!require("RColorBrewer"))
  install.packages("RColorBrewer")
if (!require("classInt"))
  install.packages("classInt")     
if (!require("scales"))
  install.packages("scales")       
if (!require("htmlwidgets"))
  install.packages("htmlwidgets")  
if (!require("tidycensus"))
  install.packages("tidycensus")   # << added for ACS join

library(tidyverse)
library(gt)
library(sf)
library(leaflet)
library(leafpop)
library(RColorBrewer)
library(classInt)
library(scales)
library(htmlwidgets)   
library(tidycensus)    # << added

census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")

Census_Data <- get_acs(
  geography = "zcta",
  variables = c("DP04_0047", "DP04_0045"),
  year = 2024,
  survey = "acs5",
  output = "wide",
  geometry = FALSE
)

Census_Data <- Census_Data %>%
  transmute(
    ZIP = GEOID,
    Rentals         = DP04_0047E,
    Rentals_MOE     = DP04_0047M,
    Households      = DP04_0045E,
    Households_MOE  = DP04_0045M
  )

# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------

ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)

Census_Data <- Census_Data %>%
  filter(ZIP %in% ZIPList)

glimpse(Census_Data)

# ----------------------------------------------------------
# Plotly: Rentals with ±MOE, sorted in descending order
#          + hover shows lower/upper MOE bounds
# ----------------------------------------------------------
if (!require("plotly"))
  install.packages("plotly")
library(plotly)

# Keep only the columns the plot needs, sorted by Rentals (no geometry here, since geometry = FALSE)
plot_df <- Census_Data %>%
  select(ZIP, Rentals, Rentals_MOE) %>%
  filter(!is.na(Rentals)) %>%
  arrange(desc(Rentals)) %>%
  mutate(
    # Fix ZIP order by Rentals (descending)
    ZIP = factor(ZIP, levels = ZIP),
    # Compute bounds (cap lower at 0 since counts can't be negative)
    Lower = pmax(Rentals - Rentals_MOE, 0, na.rm = TRUE),
    Upper = Rentals + Rentals_MOE,
    # Build a clean hover label with bounds
    .hover = paste0(
      "ZIP: ", ZIP,
      "<br>Renter-occupied units: ", scales::comma(Rentals),
      ifelse(
        is.na(Rentals_MOE),
        "",
        paste0(
          "<br>MOE (±): ", scales::comma(Rentals_MOE),
          "<br>Min: ", scales::comma(Lower),
          "<br>Max: ", scales::comma(Upper)
        )
      )
    )
  )

# Horizontal scatter with symmetric x-error bars (± MOE)
Rentals_MOE_Plot <- plot_ly(
  data = plot_df,
  x = ~Rentals,
  y = ~ZIP,
  type = "scatter",
  mode = "markers",
  marker = list(size = 9, color = "#2C7FB8"),
  text = ~.hover,
  hoverinfo = "text",
  error_x = list(
    type = "data",
    array = ~Rentals_MOE,     # ±MOE
    visible = TRUE,
    color = "#636363",
    thickness = 1
  )
) %>%
  layout(
    title = "Renter-Occupied Units by ZIP (± ACS MOE)",
    xaxis = list(
      title = "Renter-occupied units",
      tickformat = ",d",
      rangemode = "tozero"
    ),
    yaxis = list(
      title = "ZIP",
      categoryorder = "array",
      categoryarray = levels(plot_df$ZIP),
      autorange = "reversed"
    ),
    margin = list(l = 90, r = 30, b = 60, t = 60),
    hoverlabel = list(align = "left")
  )

Rentals_MOE_Plot

Pro tip

If you don’t want to supply your API key every time you use tidycensus, run the code below once. The install = TRUE option saves the key to a file called .Renviron in your home directory, which R reads each time it starts. From then on, as long as you are using the same computer, you can skip running the census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks") code. If you have already saved a key this way and need to replace it, add overwrite = TRUE.

# Run once to save your key for future sessions

census_api_key("PASTE_YOUR_KEY_HERE", install = TRUE)
readRenviron("~/.Renviron")  # activate without restarting R
Sys.getenv("CENSUS_API_KEY") # optional check
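To check from inside a script whether a stored key is available before calling get_acs(), you can test the CENSUS_API_KEY environment variable, the same one the Sys.getenv() line above prints. This is a minimal sketch; has_census_key() is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if a Census API key is visible to this R session
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))  # Sys.getenv() returns "" when unset
}

if (!has_census_key()) {
  message("No stored key found; run census_api_key() with your key first.")
}
```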

For now, paste the above code into an R script window, add your personal API key (or use the “environmental install” option), run the code, and show me the results. If I can see the graphic, and if the graphic shows valid data, you’ll be good to go.