Click one of the ZIP codes in this map of fair market rent estimates, and you’ll see that the pop-up window includes some new information. Specifically, each pop-up window now shows not only the ZIP code’s fair market rent for each rental unit size but also the ZIP code’s total number of rental units and total households - that is, rental households and owner-occupied households.
The new information makes the map more useful. It shows that although those eastern ZIP codes, like 37149 and 37118, have the area’s cheapest rental homes, actually finding an available one could be challenging, given how few rental homes they contain. In fact, neither ZIP code offers very many places to live of any kind.
By contrast, there are over 23,000 places to live, nearly 12,000 of them rentals, just one ZIP code west, in 37130, the area surrounding MTSU’s campus.
“Moe” is a “what,” not a “who.” But if thinking about a guy named Moe helps you, then go with it.
In the map’s pop-up windows, the “MOE” part of the “Renter MOE (±)” and “Occupied MOE (±)” labels refers to margin of error, a bit of statistical jargon that may need some explaining.
The rental and total housing figures are not actual counts. Instead, they are estimates based on random sampling. Estimates are a lot more practical than actual counts. After all, the actual count of rental homes in a given ZIP code probably fluctuates all the time, as new rental homes become available and others go off the market.
But at any given moment, an actual count of rental homes available in a given ZIP code really does exist, even if nobody knows exactly what it is. Coupled with the estimate, the margin of error tells you the range in which that actual count probably falls. This range is called the estimate’s confidence interval.
An example might help. Suppose the estimated number of rentals in a ZIP code is 11,891, and the margin of error is 921. To find the range in which the real count for the ZIP code probably falls, all you have to do is subtract the margin of error from the estimate to get the low end of the range, and add the margin of error to the estimate to get the high end of the range.
Do that for the example ZIP code, and you’ll find that the real count of rental homes is probably somewhere between 11,891 - 921 = 10,970 and 11,891 + 921 = 12,812. The figures are for a “90 percent confidence interval,” which means we can be 90 percent sure the real count is somewhere in that range, and there’s a (tolerable) 10 percent chance that it is either above or below that range.
To put it more succinctly: We can be 90 percent sure that the ZIP code has between 10,970 and 12,812 rental homes available. The estimate isn’t entirely useless, either; 11,891 is the figure likeliest to match the real figure. So, it’s OK to use it as the figure for the ZIP code, as long as you keep in mind that it’s just an estimate.
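If you would rather let R do the arithmetic, here is a minimal sketch of the same calculation in base R. The object names (estimate, moe, lower, upper) are just illustrative choices, not anything the Census Bureau or tidycensus requires.

```r
# Example figures from the text above
estimate <- 11891  # estimated number of rentals
moe      <- 921    # margin of error (90 percent confidence level)

# Low end: estimate minus margin of error
lower <- estimate - moe

# High end: estimate plus margin of error
upper <- estimate + moe

c(lower = lower, upper = upper)
# lower = 10970, upper = 12812
```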
Here’s one way to visualize each ZIP code’s estimate and confidence interval:
The rental and total household estimates and error margins all came from an application programming interface, or API, maintained by the U.S. Census Bureau - specifically, the API for the Bureau’s annual American Community Survey.
An API is simply an efficient way for an organization to share data with external users (like you), particularly when the total amount of data the organization wants to make available would be much too large to fit on a typical user’s computer. That’s certainly true of the Census Bureau’s ACS data. The data offer nearly 50,000 annual estimates of social, economic, housing, and demographic characteristics, each available for the entire U.S. as well as for individual states, regions, federal and state political districts, metro areas, counties, cities, places, school districts, neighborhoods, and - yes - ZIP codes. Using the bureau’s API lets you locate and download only the data you need.
This lesson will show you how to get the bureau’s API to give you ZIP-code-level American Community Survey estimates of housing unit and rental housing unit counts. Learn that, and you’ll know how to get anything else you want out of the bureau’s API, making you incredibly valuable to any of the many nonprofit, governmental, business, and research organizations that need current, high-quality, free information about the people living in a particular area.
You’ll learn in the next lesson how to add the new information to the rent map.
Like a lot of APIs, the Census API requires you to get an access code, called a key, then use the key whenever you request data from the API. Some API keys cost money, and sometimes a lot of money. But a Census API key is free, mainly because the Census Bureau is a taxpayer-funded government agency.
All you have to do is visit https://api.census.gov/data/key_signup.html, fill out and submit the form, check your e-mail after a few minutes, and click the activation link inside the e-mail the Census Bureau will send you. Here’s a picture of the e-mail that delivered my API key. I have blurred the key, because you’re supposed to keep your key a secret.
Remember to click the activation link inside the e-mail, and to do it within 48 hours. If you take longer to activate your key, the key will stop working.
There’s a hard way to access the Census API. I vote for using the easy way: the tidycensus package, developed by Kyle Walker, Ph.D., professor of geography at Texas Christian University, spatial data analysis consultant, and author of the (free) online book Analyzing US Census Data: Methods, Maps, and Models in R.
Install and load tidycensus. To get started, use
if (!require("tidycensus")) install.packages("tidycensus")
and library(tidycensus) to add tidycensus to the code for
installing and loading the usual lineup of R packages:
# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------
if (!require("tidyverse"))
install.packages("tidyverse")
if (!require("gt"))
install.packages("gt")
if (!require("leaflet"))
install.packages("leaflet")
if (!require("leafpop"))
install.packages("leafpop")
if (!require("sf"))
install.packages("sf")
if (!require("RColorBrewer"))
install.packages("RColorBrewer")
if (!require("classInt"))
install.packages("classInt")
if (!require("scales"))
install.packages("scales")
if (!require("htmlwidgets"))
install.packages("htmlwidgets")
if (!require("tidycensus"))
install.packages("tidycensus") # << added for ACS join
library(tidyverse)
library(gt)
library(sf)
library(leaflet)
library(leafpop)
library(RColorBrewer)
library(classInt)
library(scales)
library(htmlwidgets)
library(tidycensus) # << added
Provide your API key. Next, use the tidycensus
package’s census_api_key() function to share your API key
with R. Simply
replace PasteYourAPIKeyBetweenTheseQuoteMarks with your API
key. I suggest copying and pasting the key directly
from the e-mail you received, taking care to include all of the key’s
characters but nothing else, like the spaces before and after the key.
Trying to type the key manually is not a good idea.
You’re nearly certain to make at least one typing error, and the key
won’t work unless it is exactly correct.
census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
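If you worry that stray spaces came along when you pasted your key, base R’s trimws() function can strip them before you hand the key to census_api_key(). This is only a sketch: the key shown is made up, and pasted_key and clean_key are illustrative names.

```r
# A made-up key with accidental spaces before and after it
pasted_key <- "  abc123notarealkey  "

# trimws() removes leading and trailing whitespace, leaving the key itself
clean_key <- trimws(pasted_key)

# TRUE: only the surrounding spaces were removed
identical(clean_key, "abc123notarealkey")
```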
Getting the goods. This code retrieves the rental
and household counts using the tidycensus
package’s get_acs() function and stores the results in a
data frame we’ll call Census_Data.
Census_Data <- get_acs(
geography = "zcta",
variables = c("DP04_0047", "DP04_0045"),
year = 2024,
survey = "acs5",
output = "wide",
geometry = FALSE
)
The get_acs() function has a bunch of arguments that you
can use to specify exactly what data you want. In this case:
geography = "zcta" tells tidycensus
you want the data broken down by ZIP code. Recall that a ZCTA, or ZIP
Code Tabulation Area, is a Census-created geographic area that
corresponds, at least roughly, to a ZIP code used by the U.S. Postal
Service. Each ZCTA has a corresponding ZIP code, but not all ZIP codes
have a corresponding ZCTA. The tidycensus package lets you get data for
many geographic areas other than ZCTAs. For a list, and the code
that specifies each one, see the Geography in tidycensus
section of the Basic usage of tidycensus guide.
variables = c("DP04_0047", "DP04_0045")
tells tidycensus to retrieve two variables: DP04_0047, which estimates
the number of renter-occupied housing units, and DP04_0045, which estimates the total
number of occupied housing units, whether renter-occupied or owner-occupied. In the
next lesson, I’ll show you how to look up American Community Survey
variable names and what they mean. For now, understand that the
c() part of this code is base R’s “combine” function, which combines the things
between the parentheses (and separated by commas) into a
vector or a list. Vectors and lists
are both data storage structures, although they have different
properties. Here, it’s being used to hold the vector of variable names you
want tidycensus to retrieve.
year = 2024 tells tidycensus which
year you want data for, matching the year used in the code above.
To learn which year is currently the latest available, and what data will be released
when, see the Census Bureau’s release schedule.
survey = "acs5" tells tidycensus
you want the five-year American Community Survey dataset. In most cases,
that’s the one you will want. It covers a five-year period ending with
the specified year. For example, the 2024 five-year dataset’s estimates
cover 2020-2024, not 2024 in particular. The advantage is that five-year datasets contain
data for all places in the U.S. There is a one-year dataset that you can
retrieve using a survey = "acs1" argument. It covers just
the year specified, so it’s more up to date than the five-year dataset
released for the same year. But it includes data only for places with
65,000 or more residents.
output = "wide" tells tidycensus to
format the data so that each column name reflects the variable the
column contains data for. I find the alternative format,
output = "tidy", harder to work with, especially when
multiple variables are being requested.
geometry = FALSE tells tidycensus
to give you just the data, rather than the data plus a “geometry” column
containing borders for the geographic region specified. I’ll say more
about the latter option later in the lesson. It can come in pretty
handy. In this case, we already have a geometry column showing the ZIP
code boundaries. We don’t need another.
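To make the c() point from the variables argument concrete, here is a small base R sketch. It builds the same pair of variable names used in the get_acs() call and checks what kind of object c() returns. The names acs_variables and mixed are illustrative only.

```r
# The same two ACS variable names passed to get_acs() in the lesson
acs_variables <- c("DP04_0047", "DP04_0045")

class(acs_variables)   # "character": a character vector, not a list
length(acs_variables)  # 2: one element per variable name

# If any of the things being combined is a list, c() returns a list instead
mixed <- c(list("DP04_0047"), 2024)
class(mixed)           # "list"
```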
The resulting data frame, Census_Data, is too
big to view in a gt table, the way we have been
doing with some of the data frames we have worked with so far. The
easiest alternative is to find Census_Data in RStudio’s
Environment tab and click on it. Doing so will open the data frame in
RStudio’s spreadsheet-like data viewer.
Another, and sometimes handy, approach to viewing large data frames
is to use the dplyr package’s glimpse() function,
which can display all of the variables in a data frame as a vertical
column, along with columns to the right for the first several cases:
glimpse(Census_Data)
## Rows: 33,772
## Columns: 6
## $ GEOID <chr> "00601", "00602", "00603", "00606", "00610", "00611", "0061…
## $ NAME <chr> "ZCTA5 00601", "ZCTA5 00602", "ZCTA5 00603", "ZCTA5 00606",…
## $ DP04_0047E <dbl> 1877, 3425, 8600, 381, 2725, 250, 8760, 940, 2341, 911, 375…
## $ DP04_0047M <dbl> 271, 472, 652, 130, 379, 134, 692, 260, 384, 219, 528, 298,…
## $ DP04_0045E <dbl> 5768, 12954, 20131, 1860, 9604, 679, 24426, 3580, 9063, 325…
## $ DP04_0045M <dbl> 303, 592, 780, 182, 524, 173, 977, 421, 472, 348, 713, 419,…
Either way you look at the Census_Data data frame, you
can see that the Census API gave us 33,772 rows (one
for each ZIP code in the U.S.) of data for six
variables:
GEOID, a character variable consisting of each five-digit ZIP code.
NAME, a character variable showing each ZIP code preceded by “ZCTA5,” and a space.
DP04_0047E, the estimated number of rental units in each ZIP code. The “E” at the end stands for “Estimate.”
DP04_0047M, the rental unit estimate’s error margin (see the “Wait … who is Moe?” explanation above). The “M” at the end stands for “Margin.”
DP04_0045E, the estimated total number of occupied housing units - both renter-occupied and owner-occupied - in each ZIP code. Again, the “E” stands for “Estimate.”
DP04_0045M, the housing unit estimate’s error margin. Again, the “M” stands for “Margin.”
Note that tidycensus doesn’t make you ask for the error margins. It just gives them to you automatically. It also automatically tacks the “E” and “M” characters onto the end of the variable name to let you know which column contains the estimates (xxxx_xxxxE) and which contains the error margins (xxxx_xxxxM).
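The “E” and “M” suffixes also make it possible to pick out the estimate columns or the margin columns by pattern. Here is a hedged sketch in base R, using a small stand-in data frame built from the first three rows of the glimpse() output above. The name wide_sample and the regular expressions are my own illustrative choices. Note that the pattern is anchored to the “DP04_” prefix, because the NAME column also happens to end in “E.”

```r
# Stand-in for the first three rows of Census_Data (values from the output above)
wide_sample <- data.frame(
  GEOID      = c("00601", "00602", "00603"),
  NAME       = c("ZCTA5 00601", "ZCTA5 00602", "ZCTA5 00603"),
  DP04_0047E = c(1877, 3425, 8600),
  DP04_0047M = c(271, 472, 652),
  DP04_0045E = c(5768, 12954, 20131),
  DP04_0045M = c(303, 592, 780)
)

# Columns holding estimates: DP04_ + four digits + "E"
estimate_cols <- grep("^DP04_[0-9]{4}E$", names(wide_sample), value = TRUE)

# Columns holding margins of error: DP04_ + four digits + "M"
margin_cols <- grep("^DP04_[0-9]{4}M$", names(wide_sample), value = TRUE)

estimate_cols  # "DP04_0047E" "DP04_0045E"
margin_cols    # "DP04_0047M" "DP04_0045M"
```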
Keeping the original Census variable names wouldn’t hurt a thing.
They aren’t all that intuitive, though. So, let’s use the
dplyr package’s transmute() function to
replace them with names that will make it easier to keep track of
what’s what. Handily, the transmute() function also drops
any variables that don’t get renamed. So, we can use it to get rid of
the NAME column, which we won’t need.
Here’s the code, and then another glimpse() function so
you can see the results:
Census_Data <- Census_Data %>%
transmute(
ZIP = GEOID,
Rentals = DP04_0047E,
Rentals_MOE = DP04_0047M,
Households = DP04_0045E,
Households_MOE = DP04_0045M
)
glimpse(Census_Data)
## Rows: 33,772
## Columns: 5
## $ ZIP <chr> "00601", "00602", "00603", "00606", "00610", "00611", "…
## $ Rentals <dbl> 1877, 3425, 8600, 381, 2725, 250, 8760, 940, 2341, 911,…
## $ Rentals_MOE <dbl> 271, 472, 652, 130, 379, 134, 692, 260, 384, 219, 528, …
## $ Households <dbl> 5768, 12954, 20131, 1860, 9604, 679, 24426, 3580, 9063,…
## $ Households_MOE <dbl> 303, 592, 780, 182, 524, 173, 977, 421, 472, 348, 713, …
Remember how retrieving the Fair Market Rent data initially got us Fair Market Rent for all monitored ZIP codes in the U.S., even though we wanted only the ZIP codes in Rutherford County? We had to filter the rent data for just the ZIP codes in Rutherford County.
The same thing needs to happen here, in the Census data. We have Census data for every ZIP code in the U.S. So, let’s filter it for just the Rutherford County ZIP codes we are working with:
# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------
ZIPList <- c(
"37127", "37128", "37129", "37130", "37132",
"37085", "37118", "37149", "37037", "37153",
"37167", "37086"
)
Census_Data <- Census_Data %>%
filter(ZIP %in% ZIPList)
glimpse(Census_Data)
## Rows: 12
## Columns: 5
## $ ZIP <chr> "37037", "37085", "37086", "37118", "37127", "37128", "…
## $ Rentals <dbl> 395, 113, 3434, 55, 1650, 10523, 8241, 11852, 0, 81, 28…
## $ Rentals_MOE <dbl> 186, 86, 488, 59, 330, 1007, 990, 804, 14, 69, 175, 720
## $ Households <dbl> 3128, 1992, 12887, 424, 7056, 28968, 23583, 23624, 0, 9…
## $ Households_MOE <dbl> 372, 302, 545, 155, 523, 1212, 1187, 927, 14, 198, 326,…
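Here, all 12 requested ZIP codes came back. Because not every ZIP code has a matching ZCTA, it can be worth checking for any that went missing after a filter like this one. The sketch below uses base R’s setdiff() on two small made-up vectors. The names requested and returned are hypothetical, and “99999” is an invented code that stands in for a ZIP code with no ZCTA.

```r
# ZIP codes we asked for (99999 is invented for illustration)
requested <- c("37127", "37130", "99999")

# ZIP codes that actually appeared in the filtered data (hypothetical)
returned <- c("37127", "37130")

# setdiff() lists the requested codes that did not come back
missing_zips <- setdiff(requested, returned)
missing_zips  # "99999"
```

With the lesson’s real objects, the equivalent check would be setdiff(ZIPList, Census_Data$ZIP), which should come back empty when every ZIP code matched.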
I made the plot by sneaking in a little code involving R’s
plotly package. All you have to do is paste this code onto
the end of your R script for fetching the Census data. It creates a data
frame called plot_df that contains the ZIP,
Rentals, and Rentals_MOE variables from the
Census_Data data frame. Then, it produces the plot from the
plot_df data frame.
# ----------------------------------------------------------
# Plotly: Rentals with ±MOE, sorted in descending order
# + hover shows lower/upper MOE bounds
# ----------------------------------------------------------
if (!require("plotly"))
install.packages("plotly")
library(plotly)
# Prepare an ordered data frame for plotting
plot_df <- Census_Data %>%
select(ZIP, Rentals, Rentals_MOE) %>%
filter(!is.na(Rentals)) %>%
arrange(desc(Rentals)) %>%
mutate(
# Fix ZIP order by Rentals (descending)
ZIP = factor(ZIP, levels = ZIP),
# Compute bounds (cap lower at 0 since counts can't be negative)
Lower = pmax(Rentals - Rentals_MOE, 0, na.rm = TRUE),
Upper = Rentals + Rentals_MOE,
# Build a clean hover label with bounds
.hover = paste0(
"ZIP: ", ZIP,
"<br>Renter-occupied units: ", scales::comma(Rentals),
ifelse(
is.na(Rentals_MOE),
"",
paste0(
"<br>MOE (±): ", scales::comma(Rentals_MOE),
"<br>Min: ", scales::comma(Lower),
"<br>Max: ", scales::comma(Upper)
)
)
)
)
# Horizontal scatter with symmetric x-error bars (± MOE)
Rentals_MOE_Plot <- plot_ly(
data = plot_df,
x = ~Rentals,
y = ~ZIP,
type = "scatter",
mode = "markers",
marker = list(size = 9, color = "#2C7FB8"),
text = ~.hover,
hoverinfo = "text",
error_x = list(
type = "data",
array = ~Rentals_MOE, # ±MOE
visible = TRUE,
color = "#636363",
thickness = 1
)
) %>%
layout(
title = "Renter-Occupied Units by ZIP (± ACS MOE)",
xaxis = list(
title = "Renter-occupied units",
tickformat = ",d",
rangemode = "tozero"
),
yaxis = list(
title = "ZIP",
categoryorder = "array",
categoryarray = levels(plot_df$ZIP),
autorange = "reversed"
),
margin = list(l = 90, r = 30, b = 60, t = 60),
hoverlabel = list(align = "left")
)
Rentals_MOE_Plot
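The pmax() step in the plotting code, which caps the lower bound at zero, matters for small ZIP codes where the margin of error is bigger than the estimate. Here is a minimal base R sketch using two rows from the filtered output above: 37118 (55 rentals, margin of error 59) and 37128 (10,523 rentals, margin of error 1,007). The object names are illustrative.

```r
# Estimates and margins of error taken from the filtered Census_Data output
rentals     <- c("37118" = 55, "37128" = 10523)
rentals_moe <- c("37118" = 59, "37128" = 1007)

# Plain subtraction gives an impossible negative count for 37118
raw_lower <- rentals - rentals_moe
raw_lower     # 37118: -4, 37128: 9516

# pmax() compares each value with 0 and keeps the larger one
capped_lower <- pmax(rentals - rentals_moe, 0)
capped_lower  # 37118: 0, 37128: 9516

# The upper bound needs no cap
upper <- rentals + rentals_moe
upper         # 37118: 114, 37128: 11530
```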
Learning how to retrieve ACS data using tidycensus is a
pretty big step to take in one class. So, let’s stop there. We’ll get to
the map-making part next time and also have a look at where those
variable names came from and how to look up names for the (many) other
variables available in the ACS.
Here’s the above code, assembled and all in one place:
# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------
if (!require("tidyverse"))
install.packages("tidyverse")
if (!require("gt"))
install.packages("gt")
if (!require("leaflet"))
install.packages("leaflet")
if (!require("leafpop"))
install.packages("leafpop")
if (!require("sf"))
install.packages("sf")
if (!require("RColorBrewer"))
install.packages("RColorBrewer")
if (!require("classInt"))
install.packages("classInt")
if (!require("scales"))
install.packages("scales")
if (!require("htmlwidgets"))
install.packages("htmlwidgets")
if (!require("tidycensus"))
install.packages("tidycensus") # << added for ACS join
library(tidyverse)
library(gt)
library(sf)
library(leaflet)
library(leafpop)
library(RColorBrewer)
library(classInt)
library(scales)
library(htmlwidgets)
library(tidycensus) # << added
census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
Census_Data <- get_acs(
geography = "zcta",
variables = c("DP04_0047", "DP04_0045"),
year = 2024,
survey = "acs5",
output = "wide",
geometry = FALSE
)
Census_Data <- Census_Data %>%
transmute(
ZIP = GEOID,
Rentals = DP04_0047E,
Rentals_MOE = DP04_0047M,
Households = DP04_0045E,
Households_MOE = DP04_0045M
)
# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------
ZIPList <- c(
"37127", "37128", "37129", "37130", "37132",
"37085", "37118", "37149", "37037", "37153",
"37167", "37086"
)
Census_Data <- Census_Data %>%
filter(ZIP %in% ZIPList)
glimpse(Census_Data)
# ----------------------------------------------------------
# Plotly: Rentals with ±MOE, sorted in descending order
# + hover shows lower/upper MOE bounds
# ----------------------------------------------------------
if (!require("plotly"))
install.packages("plotly")
library(plotly)
# Prepare an ordered data frame for plotting
plot_df <- Census_Data %>%
select(ZIP, Rentals, Rentals_MOE) %>%
filter(!is.na(Rentals)) %>%
arrange(desc(Rentals)) %>%
mutate(
# Fix ZIP order by Rentals (descending)
ZIP = factor(ZIP, levels = ZIP),
# Compute bounds (cap lower at 0 since counts can't be negative)
Lower = pmax(Rentals - Rentals_MOE, 0, na.rm = TRUE),
Upper = Rentals + Rentals_MOE,
# Build a clean hover label with bounds
.hover = paste0(
"ZIP: ", ZIP,
"<br>Renter-occupied units: ", scales::comma(Rentals),
ifelse(
is.na(Rentals_MOE),
"",
paste0(
"<br>MOE (±): ", scales::comma(Rentals_MOE),
"<br>Min: ", scales::comma(Lower),
"<br>Max: ", scales::comma(Upper)
)
)
)
)
# Horizontal scatter with symmetric x-error bars (± MOE)
Rentals_MOE_Plot <- plot_ly(
data = plot_df,
x = ~Rentals,
y = ~ZIP,
type = "scatter",
mode = "markers",
marker = list(size = 9, color = "#2C7FB8"),
text = ~.hover,
hoverinfo = "text",
error_x = list(
type = "data",
array = ~Rentals_MOE, # ±MOE
visible = TRUE,
color = "#636363",
thickness = 1
)
) %>%
layout(
title = "Renter-Occupied Units by ZIP (± ACS MOE)",
xaxis = list(
title = "Renter-occupied units",
tickformat = ",d",
rangemode = "tozero"
),
yaxis = list(
title = "ZIP",
categoryorder = "array",
categoryarray = levels(plot_df$ZIP),
autorange = "reversed"
),
margin = list(l = 90, r = 30, b = 60, t = 60),
hoverlabel = list(align = "left")
)
Rentals_MOE_Plot
Pro tip
If you don’t want to have to supply your API key every time you use
tidycensus, you can run this code to store your API key in your R
installation’s environment. From then on, as long as you are using the
same computer, you can skip running the
census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
code.
# Run once to save your key for future sessions
census_api_key("PASTE_YOUR_KEY_HERE", install = TRUE)
readRenviron("~/.Renviron") # activate without restarting R
Sys.getenv("CENSUS_API_KEY") # optional check
For now, paste the above code into an R script window, add your personal API key (or use the “environmental install” option), run the code, and show me the results. If I can see the graphic, and if the graphic shows valid data, you’ll be good to go.