A basic R script

A previous lesson introduced you to a basic R script that, when run, retrieved and displayed this table of fair market rent estimates for several Rutherford County ZIP codes:

Rutherford FMR, by size and ZIP
ZIP Studio BR1 BR2 BR3 BR4
37037 1900 1990 2180 2790 3400
37085 1320 1380 1520 1940 2360
37086 1730 1820 1990 2540 3100
37118 1150 1170 1320 1660 2020
37127 1360 1420 1560 1990 2430
37128 1570 1640 1800 2300 2800
37129 1570 1640 1800 2300 2800
37130 1280 1340 1470 1880 2290
37132 1280 1340 1470 1880 2290
37149 1150 1180 1320 1660 2020
37153 1670 1750 1920 2450 2990
37167 1430 1500 1640 2100 2560

Cool … but how did the script work? That’s what I hope you really want to know. Here’s the R script that produced the table. It may seem like gobbledygook. But let’s take a closer look at it to begin getting an idea of what makes every R script tick, including the ones you’ll soon be creating for your own purposes.

# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------

if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("gt"))
  install.packages("gt")

library(tidyverse)
library(readxl)
library(gt)

# ----------------------------------------------------------
# Download HUD SAFMR Excel file
# ----------------------------------------------------------

download.file(
  "https://www.huduser.gov/portal/datasets/fmr/fmr2026/fy2026_safmrs.xlsx",
  "rent.xlsx",
  mode = "wb"
)

# ----------------------------------------------------------
# Read Excel data
# ----------------------------------------------------------

FMR <- read_xlsx(path = "rent.xlsx", .name_repair = "universal")

# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------

ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)

# ----------------------------------------------------------
# Filter, select columns, and rename
# ----------------------------------------------------------

FMR_RuCo <- FMR %>%
  filter(ZIP.Code %in% ZIPList) %>%
  select(
    ZIP.Code,
    SAFMR.0BR,
    SAFMR.1BR,
    SAFMR.2BR,
    SAFMR.3BR,
    SAFMR.4BR
  ) %>%
  distinct()

colnames(FMR_RuCo) <- c("ZIP", "Studio", "BR1", "BR2", "BR3", "BR4")

# ----------------------------------------------------------
# Basic GT table
# ----------------------------------------------------------

FMR_RuCo_table <- gt(FMR_RuCo) %>%
  tab_header(title = "Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left")

FMR_RuCo_table

Objects, functions & arguments

Most of what you see in the script is either an object, a function, or an argument. Learn what each of these things is, and the script will make a lot more sense to you. Objects, functions and arguments make up commands. Commands, arranged in a logical sequence, make up scripts.

An analogy might help

Suppose I said, “Everyone, I’d like you to do something for me: Please wave.”

Everyone probably would. But probably not in exactly the same way. Most would wave a hand. Others might wave both hands. Those who happened to have some object in their hand, like a phone or a water bottle, might somehow gesture with it. Some might wave immediately. Others might hesitate. Some might do a single, quick wave. Others might wave several times.

If I don’t particularly care how people wave, any and all of the above would be fine. But if I want everyone to wave in a particular way, I would have to say something more like: “Everyone, I’d like you to do something for me: Please wave your right hand at me, as quickly as you can, and for exactly three seconds.”

The analogy, explained

In R code, an object is like the word “something” in the analogy. When I said, “Everyone, I’d like you to do something for me,” I signaled that I had in mind a task that I would like you to complete, and I referred to the task as “something.”

A function in R code is like the word “wave” in the next thing I said. The “something” I want you to do is “wave.” Furthermore, “wave” isn’t a single action. It’s more like shorthand for a whole collection of actions and processes involving signals from your brain and responses from your muscles. You don’t think about all of those component processes individually when you “wave.” You just think, “wave,” and your body knows how to do all of the things that make the “wave” happen.

Finally, an argument in R code is like “hand,” “right,” “quickly,” or “three seconds,” all of which I used to indicate what body part I wanted you to wave, and in what manner.
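
To make the analogy a little more concrete, here is a toy sketch in R. The wave() function is made up for this illustration (it is not part of R or of any package), and so are its argument names and the two object names.

```r
# Hypothetical function invented for the analogy; not part of R or any package.
wave <- function(hand = "either", speed = "any", seconds = 1) {
  paste0("Waving ", hand, " hand at ", speed, " speed for ", seconds, " second(s)")
}

# No arguments: R falls back on the defaults, like a vague "Please wave."
something <- wave()

# Arguments spell out exactly how the wave should be done.
something_specific <- wave(hand = "right", speed = "top", seconds = 3)

print(something)
print(something_specific)
```

The two objects hold different results even though the same function made both. Only the arguments changed.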

An example

In the script shown above, this is the first command that has all three parts (an object, a function, and arguments):

FMR <- read_xlsx(path = "rent.xlsx", .name_repair = "universal") 

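Here, FMR is the object, read_xlsx() is the function, and path = "rent.xlsx" and .name_repair = "universal" are the arguments. The path argument tells read_xlsx() which file to read. The .name_repair argument tells it to adjust the column names so that they are valid R names, which is why a column name containing a space ends up with a period instead. The <- arrow stores whatever the function produces in the object.

So, R uses functions to create objects, and arguments tell R precisely how to carry out functions.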
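
The same three-part pattern shows up in much simpler commands. As a minimal sketch, here it is with round(), a function built into R. The numbers and the object names are made up for illustration.

```r
# round() is the function, x and digits are its arguments,
# and rent_rounded is the object that stores the result.
rent_rounded <- round(x = 1234.567, digits = 1)
print(rent_rounded)

# Same function, different argument value, different result.
rent_whole <- round(x = 1234.567, digits = 0)
print(rent_whole)
```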

Look for other functions in the script

Take another look at the script, now that you have a better understanding of what you’re seeing. Parts of it might make more sense.

Applying several functions to the same object

Sometimes, it’s more efficient to apply several functions to the same object in a single chain. That’s what’s happening in this section of the script:

# ----------------------------------------------------------
# Filter, select columns, and rename
# ----------------------------------------------------------

FMR_RuCo <- FMR %>%
  filter(ZIP.Code %in% ZIPList) %>%
  select(
    ZIP.Code,
    SAFMR.0BR,
    SAFMR.1BR,
    SAFMR.2BR,
    SAFMR.3BR,
    SAFMR.4BR
  ) %>%
  distinct()

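We’re using the filter(), select(), and distinct() functions to do three different things to the same data. The chain starts with the FMR object and stores the finished result in a new object, FMR_RuCo.

Specifically:

filter() keeps only the rows whose ZIP.Code value appears in ZIPList.

select() keeps only the six columns named inside its parentheses and drops the rest.

distinct() removes any rows that are exact duplicates of another row.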
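
As a rough illustration of what each of the three functions does, here is a small sketch that runs the same kind of chain on a tiny, made-up data frame. The ZIP codes, dollar amounts, and the Notes column are invented for this example and are not HUD figures. The sketch assumes the dplyr package is installed, which it will be if you have installed the tidyverse.

```r
library(dplyr)

# Made-up data for illustration only; these are not real HUD figures.
toy <- data.frame(
  ZIP.Code  = c("11111", "11111", "22222", "33333"),
  SAFMR.0BR = c(1000, 1000, 1200, 900),
  SAFMR.1BR = c(1100, 1100, 1300, 950),
  Notes     = c("a", "a", "b", "c")
)

toy_zips <- c("11111", "22222")

toy_result <- toy %>%
  filter(ZIP.Code %in% toy_zips) %>%   # keep rows for the ZIP codes we want
  select(ZIP.Code, SAFMR.0BR) %>%      # keep only these two columns
  distinct()                           # drop the repeated 11111 row

print(toy_result)
```

The toy data frame starts with four rows and four columns. The result has two rows and two columns: one row was filtered out, one duplicate was dropped, and two columns were left behind.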

The “pipe operator,” which looks like this: %>%, lets you string together chains of functions like this. It comes from the magrittr package and is made available by dplyr, which is loaded along with the rest of the tidyverse at the start of the script. If you don’t like using it, you could accomplish the same things by using each function separately. But the code will be longer and more repetitive:

FMR_RuCo <- FMR

FMR_RuCo <- filter(
  FMR_RuCo,
  ZIP.Code == "37127" |
    ZIP.Code == "37128" |
    ZIP.Code == "37129" |
    ZIP.Code == "37130" |
    ZIP.Code == "37132" |
    ZIP.Code == "37085" |
    ZIP.Code == "37118" |
    ZIP.Code == "37149" |
    ZIP.Code == "37037" |
    ZIP.Code == "37153" |
    ZIP.Code == "37167" |
    ZIP.Code == "37086")

FMR_RuCo <- select(FMR_RuCo,
                   c("ZIP.Code",
                     "SAFMR.0BR",
                     "SAFMR.1BR",
                     "SAFMR.2BR",
                     "SAFMR.3BR",
                     "SAFMR.4BR"))

FMR_RuCo <- distinct(FMR_RuCo)
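
Some of the extra length in the longer version comes from spelling out a separate comparison for every ZIP code, rather than from dropping the pipe. As a minimal sketch, using a short made-up list of ZIP codes, the following shows that a chain of == comparisons joined by | and a single %in% test give the same answer.

```r
# A short, made-up list; "99999" is included as a value that should not match.
zip_codes <- c("37127", "37128", "99999", "37130")
wanted    <- c("37127", "37128", "37130")

# Long form: one comparison per ZIP code, joined with | ("or").
long_form <- zip_codes == "37127" | zip_codes == "37128" | zip_codes == "37130"

# Short form: %in% checks each value against the whole list at once.
short_form <- zip_codes %in% wanted

print(long_form)
print(short_form)
print(identical(long_form, short_form))
```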

Renaming columns

The column names HUD picked are kind of awkward: “SAFMR.0BR” stands for “Small-Area Fair Market Rent, 0 Bedrooms.” Something simpler would be better. This code uses the colnames() function to give each column a new, simpler name. The new names are assigned by position, so they have to be listed in the same order as the columns:

colnames(FMR_RuCo) <- c("ZIP", "Studio", "BR1", "BR2", "BR3", "BR4")
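
Here is a small sketch of the same renaming step on a made-up, two-column data frame, so you can see the names before and after. The object name and values are invented for illustration.

```r
# Made-up data for illustration only.
toy_rents <- data.frame(
  ZIP.Code  = c("11111", "22222"),
  SAFMR.0BR = c(1000, 1200)
)

names_before <- colnames(toy_rents)
print(names_before)

# One new name per column, in the same order as the columns.
colnames(toy_rents) <- c("ZIP", "Studio")

names_after <- colnames(toy_rents)
print(names_after)
```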

Making the table

This final part of the code uses functions from the gt package, which is responsible for making the nice table used to display the rent estimates. But even this code is just an arrangement of an object, some functions and their arguments:

# ----------------------------------------------------------
# Basic GT table
# ----------------------------------------------------------

FMR_RuCo_table <- gt(FMR_RuCo) %>%
  tab_header(title = "Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left")

FMR_RuCo_table

Order matters!

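It’s important to realize that much of what happens in the script has to happen in the order shown, starting with the top of the script.

The packages have to be installed and loaded before R can use any of the functions they supply, such as read_xlsx(), filter(), and gt().

The Excel file has to be downloaded before read_xlsx() can read it into the FMR object.

The FMR object and the ZIPList object both have to exist before filter() can use them to build FMR_RuCo.

FMR_RuCo has to exist before colnames() can rename its columns and before gt() can turn it into a table.

… and on it goes, with the script ensuring that everything needed to complete a line of code has already been made available to R by one or more preceding lines of code.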
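
To see what happens when the order is wrong, here is a minimal sketch that asks R about an object before creating it. The object name demo_rents and its values are made up. The tryCatch() function is used here only so that the example keeps running after the error instead of stopping.

```r
# Start clean, in case demo_rents is left over from an earlier run.
if (exists("demo_rents")) rm(demo_rents)

# Too soon: demo_rents has not been created yet, so nrow() fails.
too_soon <- tryCatch(
  nrow(demo_rents),
  error = function(e) "error: demo_rents does not exist yet"
)
print(too_soon)

# Create the object first, and the very same command works.
demo_rents <- data.frame(ZIP = c("11111", "22222"), Studio = c(1000, 1200))
just_right <- nrow(demo_rents)
print(just_right)
```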

Do indentation and spacing matter?

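Generally, no. R doesn’t really care about either. Coders indent and space their code mainly to make it look neat and organized. Lines that present arguments, for example, get indented under the line that contains the function they are part of. One caveat: where a line ends can matter. In a chain of functions, each %>% has to come at the end of a line, not at the start of the next one, or R will treat the first line as a finished command.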
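
As a quick check, the sketch below writes the same command twice, once crammed onto one line and once spread out with indented arguments. The numbers and object names are made up for illustration.

```r
# Compact: everything on one line, with no extra spaces.
compact <- round(x=1234.567,digits=1)

# Spread out: each argument on its own indented line.
spread_out <- round(
  x      = 1234.567,
  digits = 1
)

# R produces the same result either way.
print(identical(compact, spread_out))
```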

RStudio will do some indentation for you automatically as you type your code. If it gets messy nonetheless (it often does), RStudio will tidy it up for you. Just highlight the code you want to tidy, then click Code / Reformat Selection. Beautiful magic will occur. No matter how haphazard the spacing was, RStudio will align everything perfectly.

In-class exercise

Open an AI assistant, type, “Below is an R script. List every function included in the script, and explain what the function and each of its arguments are doing,” paste the entire script, above, and press “Enter.”

Take a few minutes to examine the response you get. Does the script make more sense now? I hope so. If there’s something you don’t understand, ask your AI assistant for clarification. If your AI assistant suggested any interesting-sounding tweaks or additions, go ahead and try them out. Don’t worry: you’re not going to break anything.

Show me the response your AI assistant generated and the assistant’s response to at least one follow-up question you asked or at least one coding tweak you tried. I just want to see evidence that you were able to engage with the assistant in some way. Once you have shown me that, you are free to leave.