Reloading the Fair Market Rent data

In an earlier lesson, we looked at how to extract ZIP-code-level Small-Area Fair Market Rent data from the U.S. Department of Housing and Urban Development, filter it for selected Rutherford County ZIP codes, and display the data in a table. Below is the code from that lesson, followed by the table the code produces.

Rerun the code. Doing so will ensure that the rent data will be available to to R on the computer you are using. Then we’ll look at some additional things we can do with the data using some of R’s data wrangling tools.

# Installing and loading required packages

if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("openxlsx"))
  install.packages("openxlsx")
if (!require("gtExtras"))
  install.packages("gtExtras")

library(tidyverse)
library(openxlsx)
library(gtExtras)

# Reading data from:
# https://www.huduser.gov/portal/datasets/fmr/smallarea/index.html#year2024

FMR <- read.xlsx(
  "https://www.huduser.gov/portal/datasets/fmr/fmr2024/fy2024_safmrs_revised.xlsx",
  sheet = 1)

# Making a list of Rutherford County ZIP codes

ZIPList <- c(
  "37127",
  "37128",
  "37129",
  "37130",
  "37132",
  "37085",
  "37118",
  "37149",
  "37037",
  "37153",
  "37167",
  "37086")

# Filtering for Rutherford ZIP codes and
# selecting columns of interest

FMR_RuCo <- FMR %>%
  filter(ZIP.Code %in% ZIPList) %>%
  select(ZIP.Code, SAFMR.0BR, SAFMR.1BR, SAFMR.2BR, SAFMR.3BR, SAFMR.4BR) %>% 
  distinct()

# Renaming the columns

colnames(FMR_RuCo) <- c("ZIP", "Studio", "BR1", "BR2", "BR3", "BR4")

# Showing the data as a table

FMR_RuCo_table <- gt(FMR_RuCo) %>%
  tab_header("Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left") %>%
  gt_theme_538
FMR_RuCo_table
Rutherford FMR, by size and ZIP
ZIP Studio BR1 BR2 BR3 BR4
37037 1660 1710 1920 2410 2940
37085 1260 1290 1450 1820 2210
37086 1580 1620 1820 2290 2790
37118 1100 1130 1270 1590 1960
37127 1240 1270 1430 1800 2190
37128 1510 1550 1740 2190 2670
37129 1420 1460 1640 2060 2510
37130 1180 1210 1360 1710 2080
37132 1180 1210 1360 1710 2080
37149 1100 1130 1270 1590 1960
37153 1410 1450 1630 2040 2490
37167 1290 1330 1490 1870 2280

Mutating

Chances are, you already know how to produce an average. You just add the numbers you want to average, then divide by however many numbers there are, right? Easy. The mutate() function can average each ZIP code’s rents, then put the average in a new column we’ll call “ZIP_Average.”

The mutate() function is part of the dplyr package, which R loaded into memory when it ran the library(tidyverse) line in the code above. The dplyr package is part of the tidyverse package. Note that mutate() won’t work unless library(tidyverse) has been run sometime since you opened RStudio, and library(tidyverse) won’t work unless install.packages("tidyverse") has been run at at least one time on the computer you are using.

That’s a general principle to keep in mind. If a function is part of base R, then all you have to do to use the function is to open R Studio. But if, like mutate(), the function is part of a package, then you have to install the package once per computer, using install.packages() and load the package once per computing session, using library(), before you can use the function. It sounds like a pain, but it’s really not. In practice, you simply begin a script by installing, if needed, and loading all the packages required by the script. That’s what’s going on above, in the section of code marked # Installing and loading required packages. And there are just a handful of packages you’ll use routinely.

This code will use mutate() to calculate the average. Then, the code will regenerate the table with the new variable included. The table is shown below.

FMR_RuCo <- FMR_RuCo %>%
  mutate(ZIP_Average = (Studio + BR1 + BR2 + BR3 + BR4) / 5)

FMR_RuCo_table <- gt(FMR_RuCo) %>% 
  tab_header("Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left") %>%
  gt_theme_538
FMR_RuCo_table
Rutherford FMR, by size and ZIP
ZIP Studio BR1 BR2 BR3 BR4 ZIP_Average
37037 1660 1710 1920 2410 2940 2128
37085 1260 1290 1450 1820 2210 1606
37086 1580 1620 1820 2290 2790 2020
37118 1100 1130 1270 1590 1960 1410
37127 1240 1270 1430 1800 2190 1586
37128 1510 1550 1740 2190 2670 1932
37129 1420 1460 1640 2060 2510 1818
37130 1180 1210 1360 1710 2080 1508
37132 1180 1210 1360 1710 2080 1508
37149 1100 1130 1270 1590 1960 1410
37153 1410 1450 1630 2040 2490 1804
37167 1290 1330 1490 1870 2280 1652

Sorting, low to high

It’s not difficult to scan through the dozen average rents in the table and discern that two ZIP codes, 37118 and 37149, tie for the cheapest average, $1,410, while the 37037 ZIP code has the most expensive average, $2,128. Identifying these extremes would be a lot more difficult, though, if the table contained (as it will in future lessons in this series) all 80 ZIP codes across Rutherford, Davidson and Williamson counties.

So it would be handy to know how to arrange the ZIP codes by their ZIP_Average values, so that the ZIP_Average values get bigger as you go from the top of the table to the bottom of the table. This code will do exactly that, using the arrange() function from the dplyr package, then regenerate the table:

# Sorting, ascending order

FMR_RuCo <- FMR_RuCo %>%
  arrange(ZIP_Average)

FMR_RuCo_table <- gt(FMR_RuCo) %>% 
  tab_header("Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left") %>%
  gt_theme_538
FMR_RuCo_table
Rutherford FMR, by size and ZIP
ZIP Studio BR1 BR2 BR3 BR4 ZIP_Average
37118 1100 1130 1270 1590 1960 1410
37149 1100 1130 1270 1590 1960 1410
37130 1180 1210 1360 1710 2080 1508
37132 1180 1210 1360 1710 2080 1508
37127 1240 1270 1430 1800 2190 1586
37085 1260 1290 1450 1820 2210 1606
37167 1290 1330 1490 1870 2280 1652
37153 1410 1450 1630 2040 2490 1804
37129 1420 1460 1640 2060 2510 1818
37128 1510 1550 1740 2190 2670 1932
37086 1580 1620 1820 2290 2790 2020
37037 1660 1710 1920 2410 2940 2128

Sorting, high to low

See the advantage? Now it’s easy to tell which ZIP codes have the lowest and highest average rents. Furthermore, it’s easy to tell how any one ZIP code’s average rent compares to the average rents for all of the other ZIP codes.

What if you wanted to sort the table in the other direction - that is, with the highest ZIP_Average figure at the top and the lowest at the bottom? You could do so by situating the dplyr package’s desc() function inside the arrange() function, like this:

# Sorting, descending order

FMR_RuCo <- FMR_RuCo %>%
  arrange(desc(ZIP_Average))

FMR_RuCo_table <- gt(FMR_RuCo) %>% 
  tab_header("Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left") %>%
  gt_theme_538
FMR_RuCo_table
Rutherford FMR, by size and ZIP
ZIP Studio BR1 BR2 BR3 BR4 ZIP_Average
37037 1660 1710 1920 2410 2940 2128
37086 1580 1620 1820 2290 2790 2020
37128 1510 1550 1740 2190 2670 1932
37129 1420 1460 1640 2060 2510 1818
37153 1410 1450 1630 2040 2490 1804
37167 1290 1330 1490 1870 2280 1652
37085 1260 1290 1450 1820 2210 1606
37127 1240 1270 1430 1800 2190 1586
37130 1180 1210 1360 1710 2080 1508
37132 1180 1210 1360 1710 2080 1508
37118 1100 1130 1270 1590 1960 1410
37149 1100 1130 1270 1590 1960 1410

Recoding

In data analysis, “recoding” means creating a variable based on the values of one or more other variables. To illustrate, let’s make a variable called Rent_Category that indicates whether each ZIP code’s ZIP_Average is above, equal to, or below the average of all the ZIP_Average values.

Finding the average. The first task is to find the average of the ZIP_Average values. The mean() function, from base R, offers a handy shortcut. The code finds the average, stores it in a variable called Average_ZIP_Average, and displays the average’s value:

# Finding the average of the ZIP_Average values

Average_ZIP_Average <- mean(FMR_RuCo$ZIP_Average)
Average_ZIP_Average
## [1] 1698.5

Weird syntax. We’re working in base R, not dplyr, here, so the syntax is a little different than what you have seen so far. Here, the FMR_RuCo$ZIP_Average$ code let’s R know that the ZIP_Average variable is a column in the FMR_RuCo data frame. The $ separates the data frame name from the variable name.

Getting logical. With the average, 1698.5, stored in Average_ZIP_Average, we can use the case_when() function from the dplyr package to add the Rent_Category variable to the FMR_RuCo data frame and make the variable’s values depend on how each ZIP_Average value compares logically to the Average_ZIP_Average value.

By “logically,” I mean that we’re going to account for all possible ways that Zip_Average could compare to Average_ZIP_Average.

The next block of code will:

# Recoding 

FMR_RuCo <- FMR_RuCo %>%
  mutate(
    Rent_Category = case_when(
      ZIP_Average > Average_ZIP_Average ~ "Above average",
      ZIP_Average == Average_ZIP_Average ~ "Average",
      ZIP_Average < Average_ZIP_Average ~ "Below average",
      .default = "Error"))

FMR_RuCo_table <- gt(FMR_RuCo) %>% 
  tab_header("Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left") %>%
  gt_theme_538
FMR_RuCo_table
Rutherford FMR, by size and ZIP
ZIP Studio BR1 BR2 BR3 BR4 ZIP_Average Rent_Category
37037 1660 1710 1920 2410 2940 2128 Above average
37086 1580 1620 1820 2290 2790 2020 Above average
37128 1510 1550 1740 2190 2670 1932 Above average
37129 1420 1460 1640 2060 2510 1818 Above average
37153 1410 1450 1630 2040 2490 1804 Above average
37167 1290 1330 1490 1870 2280 1652 Below average
37085 1260 1290 1450 1820 2210 1606 Below average
37127 1240 1270 1430 1800 2190 1586 Below average
37130 1180 1210 1360 1710 2080 1508 Below average
37132 1180 1210 1360 1710 2080 1508 Below average
37118 1100 1130 1270 1590 1960 1410 Below average
37149 1100 1130 1270 1590 1960 1410 Below average

Grouping and summarizing

Now that we have the ZIP codes simplified into ZIP codes with overall “Above average” and “Below average” rents, let’s put these categories to work.

Suppose you and two friends were hunting for a three-bedroom apartment to share. Suppose, also, that each of you had agreed to chip in $600 on each month’s rent, making your total budget 3 x $600 = $1,800 a month.

With only a dozen ZIP codes to look at in the table above, you and your friends could easily see that you probably should begin your search in the 37127 ZIP code, and maybe consider the 37130 and 37132 ZIP codes as well. But if you were searching among the 80 ZIP codes throughout Rutherford, Davidson and Williamson counties, you might appreciate a more efficient way to narrow your search.

Here’s a technique for doing that. Let’s pull the three-bedroom data into a separate data frame called Summary_BR3 and get the average, maximum and minimum three-bedroom rent for the “Above average” and “Below average” ZIP codes, along with the Count of how many ZIP codes are in each category.

Looking at those results would let you know whether you should begin your search among the “Above average” ZIP codes or the “Below average” ZIP codes - or whether your budget is too small for either category or big enough to let you shop in both categories.

# Grouping and summarizing

Summary_BR3 <- FMR_RuCo %>% 
  group_by(Rent_Category) %>% 
  summarize(Average = round(mean(BR3), digits = 0),
            Minimum = min(BR3),
            Maximum = max(BR3),
            Count = n())

Summary_BR3_table <- gt(Summary_BR3) %>% 
  tab_header("Three-bedroom stats, by rent category") %>%
  cols_align(align = "left") %>%
  gt_theme_538
Summary_BR3_table
Three-bedroom stats, by rent category
Rent_Category Average Minimum Maximum Count
Above average 2198 2040 2410 5
Below average 1727 1590 1870 7

A glance at the summary table tells you that you and your two friends likely will be apartment hunting exclusively in the seven “Below average” rent ZIP codes, where average rents range from $1,590 to $1,870 and average $1,727. You’ll probably have fewer choices in the ZIP code with the $1,870 average rent. But the figure is an average, which means at least some of the available apartments should go for less than $1,870.

Meanwhile, it looks like all, or at least most, of the five “Above average” ZIP codes will be out of your price range. Even the lowest average, $2,198, is well above your budget. Again, the figure is an average. You might get lucky and find an $1,800 apartment in that ZIP code. But it would be a long shot.

Of course, you’ll still have to look at the full table to get the actual ZIP codes that fit your budget. But the summary table can let you quickly narrow your search.

Saving your data

You might want to work with this data again at a later time. If so, you could simply rerun the code. Doing so would redownload the raw data and quickly repeat all of the work you have done. But all the same, you might want to save the analyzed data to your computer’s hard drive so you can simply load it from there.

Here’s how you would save the data to your computer using the write_csv() function from the readr package. The readr package is part of the tidyverse package, so you loaded readr into memory when you ran the library(tidyverse) code way back at the start of the script.

The code will save the FMR_RuCo data frame to your hard drive, in the same folder as this script. The file will be called FMR_RuCo.csv.” You’ll see the file show up under the “Files” tab on the right side of the screen.

You could name the file anything you like, but using a name that repeats the data frame name can make it easier to remember what’s what later on.

The .csv at the end of the file name signals that the file is in comma-separated value format. That means that, in the file, commas indicate where the columns begin and end. You could save the file in other formats, but .csv format is nearly universal.

write_csv(FMR_RuCo, "FMR_RuCo.csv")

Reading your data back into an R data frame

It’s easy to reverse the process and read your .csv file back into an R data frame. Just to illustrate how it’s done, here is code that will read the FMR_RuCo.csv file into an R data frame called Retrieved_Data:

Retrieved_Data <- read_csv("FMR_RuCo.csv")