A previous lesson introduced you to a basic R script that, when run, retrieved and displayed this table of fair market rent estimates for several Rutherford County ZIP codes:
Rutherford FMR, by size and ZIP

| ZIP | Studio | BR1 | BR2 | BR3 | BR4 |
|---|---|---|---|---|---|
| 37037 | 1900 | 1990 | 2180 | 2790 | 3400 |
| 37085 | 1320 | 1380 | 1520 | 1940 | 2360 |
| 37086 | 1730 | 1820 | 1990 | 2540 | 3100 |
| 37118 | 1150 | 1170 | 1320 | 1660 | 2020 |
| 37127 | 1360 | 1420 | 1560 | 1990 | 2430 |
| 37128 | 1570 | 1640 | 1800 | 2300 | 2800 |
| 37129 | 1570 | 1640 | 1800 | 2300 | 2800 |
| 37130 | 1280 | 1340 | 1470 | 1880 | 2290 |
| 37132 | 1280 | 1340 | 1470 | 1880 | 2290 |
| 37149 | 1150 | 1180 | 1320 | 1660 | 2020 |
| 37153 | 1670 | 1750 | 1920 | 2450 | 2990 |
| 37167 | 1430 | 1500 | 1640 | 2100 | 2560 |
Cool … but how did the script work? That’s what I hope you really want to know. Here’s the R script that produced the table. It may seem like gobbledygook. But let’s take a closer look at it to begin getting an idea of what makes every R script tick, including the ones you’ll soon be creating for your own purposes.
```r
# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------
if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("gt"))
  install.packages("gt")
library(tidyverse)
library(readxl)
library(gt)
# ----------------------------------------------------------
# Download HUD SAFMR Excel file
# ----------------------------------------------------------
download.file(
  "https://www.huduser.gov/portal/datasets/fmr/fmr2026/fy2026_safmrs.xlsx",
  "rent.xlsx",
  mode = "wb"
)
# ----------------------------------------------------------
# Read Excel data
# ----------------------------------------------------------
FMR <- read_xlsx(path = "rent.xlsx", .name_repair = "universal")
# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------
ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)
# ----------------------------------------------------------
# Filter, select columns, and rename
# ----------------------------------------------------------
FMR_RuCo <- FMR %>%
  filter(ZIP.Code %in% ZIPList) %>%
  select(
    ZIP.Code,
    SAFMR.0BR,
    SAFMR.1BR,
    SAFMR.2BR,
    SAFMR.3BR,
    SAFMR.4BR
  ) %>%
  distinct()
colnames(FMR_RuCo) <- c("ZIP", "Studio", "BR1", "BR2", "BR3", "BR4")
# ----------------------------------------------------------
# Basic GT table
# ----------------------------------------------------------
FMR_RuCo_table <- gt(FMR_RuCo) %>%
  tab_header(title = "Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left")
FMR_RuCo_table
```
Most of what you see in the script is either an object, a function, or an argument. Learn what each of these things is, and the script will make a lot more sense to you. Objects, functions, and arguments make up commands. Commands arranged in a logical sequence make up scripts.
An analogy might help
Suppose I said, “Everyone, I’d like you to do something for me: Please wave.”
Everyone probably would. But probably not in exactly the same way. Most would wave a hand. Others might wave both hands. Those who happened to have some object in their hand, like a phone or a water bottle, might somehow gesture with it. Some might wave immediately. Others might hesitate. Some might do a single, quick wave. Others might wave several times.
If I don’t particularly care how people wave, any and all of the above would be fine. But if I want everyone to wave in a particular way, I would have to say something more like: “Everyone, I’d like you to do something for me: Please wave your right hand at me, as quickly as you can, and for exactly three seconds.”
The analogy, explained
In R code, an object is like the word “something” in the analogy. When I said, “Everyone, I’d like you to do something for me,” I signaled that I had in mind a task that I would like you to complete, and I referred to the task as “something.” A function in R code is like the word “wave” in the next thing I said. The “something” I want you to do is “wave.” Furthermore, “wave” isn’t a single action. It’s more like shorthand for a whole collection of actions and processes involving signals from your brain and responses from your muscles. You don’t think about all of those component processes individually when you “wave.” You just think, “wave,” and your body knows how to do all of the things that make the “wave” happen. Finally, an argument in R code is like “hand,” “right,” “quickly,” or “three seconds,” all of which I used to indicate what body part I wanted you to wave, and in what manner.
An example
In the script shown above, this is the first command that has all three of these parts: an object, a function, and arguments:
```r
FMR <- read_xlsx(path = "rent.xlsx", .name_repair = "universal")
```
FMR is the object. It’s the name of
the thing you want from R, like “something” was the name I chose for
what I wanted you to do.
I could have chosen just about any name I wanted. I picked “FMR” to help myself remember that this particular object will contain Fair Market Rent data.
Once created by R, an object will show up in the “Environment” area of RStudio. Here, FMR will be a data frame containing at least one row for every ZIP code in which HUD estimates small-area fair market rents.
read_xlsx() is the function. It’s
the action I want R to complete. Specifically, read_xlsx() tells R to
read, or import, the Excel-formatted fair market rent data file that the
preceding command downloaded from HUD and stored on the computer’s hard
drive.
In an R script, you can pick out the functions by looking for terms that are followed by a left parenthesis. Somewhere, there will be a corresponding right parenthesis.
If, as here, the function’s result is being stored in an object, you’ll see <- (R’s assignment operator, an arrow made of a less-than sign and a hyphen) typed between the object and the function. Mentally, you can read it as “Create (the object) by performing (the function).” Here, the code FMR <- read_xlsx() means “Create FMR by performing read_xlsx().”
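To see the “Create (the object) by performing (the function)” pattern in isolation, here is a tiny sketch that uses base R’s paste0() function, which glues pieces of text together. The object name file_name is made up for illustration; it isn’t part of the lesson’s script.

```r
# Performing a function without <- : R shows the result but doesn't keep it
paste0("rent", ".xlsx")

# With <- : "Create file_name by performing paste0()"
# The result is stored under the (made-up) name file_name
file_name <- paste0("rent", ".xlsx")

# Typing the object's name displays what it now contains: "rent.xlsx"
file_name
```

Notice that the second paste0() line shows nothing when it runs. The result went into file_name instead of onto the screen.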
Performing read_xlsx() is a pretty involved process,
just like waving your hand is a pretty involved process. R has to locate
the file being asked for, open it, and translate it into a format the
rest of the script can work with. But all of those component processes
are built into the read_xlsx() function and don’t have to
be specified individually.
The path = "rent.xlsx" and
.name_repair = "universal" parts of the command are both
arguments.
Some functions have no arguments. Others have many, some of which are required and some of which are optional.
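Here, path = "rent.xlsx" is an argument that tells R the name of the Excel file you want the read_xlsx() function to import, and where on your computer to find it. Because no folder is named, R looks for the file in its current working directory, which is also where download.file() saved it. Meanwhile, .name_repair = "universal" tells R to convert the spreadsheet file’s column headings into names that are easy to use in R code, for example by replacing spaces with periods.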
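You can get a feel for what .name_repair = "universal" does with base R’s make.names() function, which performs a similar, though not identical, conversion. The headings below are made-up examples styled like HUD’s; they are not copied from HUD’s file.

```r
# Hypothetical spreadsheet headings containing spaces
headings <- c("ZIP Code", "SAFMR 0BR", "SAFMR 1BR")

# make.names() turns each heading into a name R code can use directly:
# here, each space becomes a period
repaired <- make.names(headings)
repaired  # "ZIP.Code" "SAFMR.0BR" "SAFMR.1BR"
```

readxl’s "universal" repair also makes sure no two column names end up identical, so in unusual cases its results can differ from make.names().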
For the read_xlsx() function, the
path = "" argument, with the file name typed between the
quotes, is always required. Omit it, or get the file name wrong, and
you’ll get an error when the script runs. The
.name_repair = "universal" argument, however, is needed
only if the column names in the Excel file aren’t compatible with
R.
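The required-versus-optional pattern shows up in many everyday R functions. Here’s a sketch using base R’s round(): its first argument, the number to round, is required, while its digits argument is optional and defaults to 0 when you leave it out. Sys.Date(), at the end, is an example of a function that takes no arguments at all.

```r
# round() needs at least one argument: the number to round
round(1993.456)               # digits left out, so it rounds to a whole number: 1993

# The optional digits argument changes how the function does its job
round(1993.456, digits = 1)   # 1993.5

# Some functions take no arguments: Sys.Date() just returns today's date
Sys.Date()
```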
Arguments always go between the function’s left and right parentheses. When there are two or more arguments, they are separated by commas, like the comma separating path = "rent.xlsx" from .name_repair = "universal". You’ll get an error if you omit a comma or accidentally type a period instead of one.
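If you’d like to watch R reject a missing or mistyped comma, base R’s parse() function can check whether a line of code is even readable as R. parse() only reads the code; it doesn’t run it, so readxl doesn’t need to be installed for this sketch. The helper name parses_ok() is made up for illustration.

```r
# Hypothetical helper: TRUE if R can read the code, FALSE if it's a syntax error
parses_ok <- function(code) {
  tryCatch({
    parse(text = code)  # reads the code without running it
    TRUE
  }, error = function(e) FALSE)
}

good        <- 'FMR <- read_xlsx(path = "rent.xlsx", .name_repair = "universal")'
no_comma    <- 'FMR <- read_xlsx(path = "rent.xlsx" .name_repair = "universal")'
period_typo <- 'FMR <- read_xlsx(path = "rent.xlsx". .name_repair = "universal")'

parses_ok(good)         # TRUE
parses_ok(no_comma)     # FALSE: R reports an "unexpected symbol" error
parses_ok(period_typo)  # FALSE
```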
Pro tip: If you want to know what a function
does, ask your AI assistant. I gave Microsoft Copilot
the prompt, “What does the R function read_xlsx() do?” and got
a response reading, “In R, the function read_xlsx() is used to
import data from an Excel .xlsx file into R.” Copilot went on to
give me details about how to use the function, what arguments it can
include, and an example of what the function looks like in a line of R
code. I got even more specific information by pasting the script’s
FMR <- read_xlsx(path = "rent.xlsx", .name_repair = "universal")
line of code into Copilot and asking Copilot to explain what the line
does.
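R also has built-in documentation that works well alongside an AI assistant. In this minimal sketch using the script’s download.file() function, help() opens R’s documentation page for the function (in RStudio, it appears in the Help pane), and args() lists the arguments the function accepts, along with any default values.

```r
# Open the help page for download.file() (the same as typing ?download.file).
# Wrapped in interactive() so it only runs when you're working at the console.
if (interactive()) help("download.file")

# Show the function's arguments and their default values
args(download.file)

# Just the argument names, as a character vector
arg_names <- names(formals(download.file))
arg_names
```

Among the names you’ll find url, destfile, and mode, the three arguments the lesson’s script supplies.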
So, R uses functions to create objects, and arguments tell R precisely how to carry out functions.
You can spot functions by looking for terms that are followed by
a (.
If you want to know what a function does, you can ask an AI assistant about the function (or, you can Google the function).
If the function’s result is being stored in an object, there will be a <- between the object and the function.
Arguments, if present, will be found between a function’s
( and ) and will be separated by a
, if there are two or more arguments.
Got it?
Take another look at the script, now that you have a better understanding of what you’re seeing. Parts of it might make more sense:
install.packages() is a function. It
installs packages. It’s used two times: once to install the tidyverse package, and once to install
the gt package. More on packages later.
Each time it is used, the name of the package is given, in quotes, as an
argument. It doesn’t have an object, because although R is being asked
to do something (install a package), that something doesn’t involve
creating an object that will need a name. Installing a package in R is
kind of like installing an application on your computer or smartphone.
You have to do it only once per device. Once you do, it’s on the device
and available for you to use, at least until you remove it. The function
is wrapped in an if() statement, a bit of more advanced R
that simply checks to see whether the package is already installed and
skips the installation if it is.
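The if (!require(...)) test can be tried out on its own. This sketch uses the stats package, which comes with every R installation, and a deliberately made-up package name, notARealPackage123, that presumably isn’t installed on your computer. require() tries to load a package and returns TRUE if it worked or FALSE if it couldn’t; suppressWarnings() just hides the warning require() gives when a package is missing.

```r
# require() tries to load a package: TRUE if it worked, FALSE if it couldn't
has_stats <- suppressWarnings(require("stats", quietly = TRUE))
has_fake  <- suppressWarnings(require("notARealPackage123", quietly = TRUE))

has_stats   # TRUE
has_fake    # FALSE

# The ! operator means "not": it flips TRUE and FALSE, so the body of
# if (!require(...)) runs only when the package couldn't be loaded
if (!has_stats) message("stats would be installed here")               # skipped
if (!has_fake)  message("notARealPackage123 would be installed here")  # runs
```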
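library() is also a function. It asks R to open a library: in this case, the tidyverse, readxl, and gt libraries. Opening a library in R is kind of like opening an application on your computer or smartphone. And just as you install an app once but open it each time you want to use it, you install a package once but have to load it with library() each time you start a new R session.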
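To see library() at work without installing anything, this sketch uses tools, a package that ships with R but isn’t loaded by default. search() lists the packages currently loaded for use in your session, and file_ext() is one of the functions tools provides.

```r
# tools comes with R, but its functions aren't available by name until it's loaded
library(tools)

# Once loaded, tools shows up in the list of attached packages...
"package:tools" %in% search()   # TRUE

# ...and its functions can be called by name, such as file_ext(),
# which pulls the extension off a file name
file_ext("rent.xlsx")           # "xlsx"
```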
The download.file() function downloads a file from the
web. In this case, it’s downloading the fair market rent data file from
HUD’s website. Arguments include the URL for the file to be downloaded,
the file name you want R to use when storing the file on your computer
(I picked “rent.xlsx”), and the download mode. In this case, the download mode is "wb", which tells R to write the downloaded file in binary form, exactly as it arrives. The setting matters mainly on Windows, where a file transferred in the default text mode can have its line endings altered, which would corrupt a binary file such as an Excel workbook. Here, again, no object is needed, because no object is being created. R is just being asked to do something.
The c() function creates an object, specifically a “vector,” from the arguments listed in quotes and separated by commas. The c stands for “combine.” Here, it’s creating a vector of Rutherford County ZIP codes. Because we want to keep the vector and use it later in the script, it needs a name. I chose “ZIPList.” The <- symbol stores the result of the c() function in that object. When run, the code creates the ZIPList object and stores it in RStudio’s “Environment” area.
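A shortened version of the script’s ZIPList, using just three of its ZIP codes, shows what c() produces. The last line checks a common beginner’s trap: R names are case-sensitive, so ZIPlist (with a lowercase l) is a different, nonexistent object.

```r
# c() combines the values into a single vector; <- stores it as ZIPList
ZIPList <- c("37127", "37128", "37129")

length(ZIPList)    # 3: one element per value combined
class(ZIPList)     # "character": the quotes make each ZIP code text

# Case matters in R names: ZIPlist was never created
exists("ZIPlist")  # FALSE
```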
Sometimes, it’s more efficient to apply several functions, one after another, to the same data. That’s what’s happening in this section of the script:
```r
# ----------------------------------------------------------
# Filter, select columns, and rename
# ----------------------------------------------------------
FMR_RuCo <- FMR %>%
  filter(ZIP.Code %in% ZIPList) %>%
  select(
    ZIP.Code,
    SAFMR.0BR,
    SAFMR.1BR,
    SAFMR.2BR,
    SAFMR.3BR,
    SAFMR.4BR
  ) %>%
  distinct()
```
We’re using the filter(), select(), and distinct() functions to do three different things, one after another, to the FMR data frame, and then storing the result in a new object, FMR_RuCo.
Specifically:

- filter() drops all rows except those associated with a ZIP code that appears in ZIPList. Its argument, ZIP.Code %in% ZIPList, uses the handy %in% operator, which asks, for each row, “Is this row’s ZIP code in ZIPList?”
- select() drops all columns except the ones we need, which are listed as the function’s arguments.
- Finally, distinct() deletes any duplicate rows.
Remember that some ZIP codes reach across county borders. When that
happens, a ZIP code gets a row for each county that includes it. Data
mischief can result if you don’t get rid of the duplicates.
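The filter(), select(), and distinct() chain is easier to follow on a tiny data frame. This sketch assumes the dplyr package (installed as part of the tidyverse) is available. The rows are invented for illustration: ZIP code 37086 appears twice, once for each of two hypothetical counties, and 38000 stands in for a ZIP code outside Rutherford County. Only the 37086 and 37127 rent figures are borrowed from the table at the top of this lesson.

```r
library(dplyr)

# Made-up rows imitating the HUD layout
FMR_demo <- data.frame(
  ZIP.Code  = c("37086", "37086", "37127", "38000"),
  County    = c("A", "B", "A", "C"),
  SAFMR.0BR = c(1730, 1730, 1360, 999),
  SAFMR.1BR = c(1820, 1820, 1420, 999)
)
ZIPList_demo <- c("37086", "37127")

FMR_demo_RuCo <- FMR_demo %>%
  filter(ZIP.Code %in% ZIPList_demo) %>%      # drops the 38000 row
  select(ZIP.Code, SAFMR.0BR, SAFMR.1BR) %>%  # drops County, so the two 37086 rows now match
  distinct()                                  # keeps just one copy of the 37086 row

FMR_demo_RuCo
```

Notice the order. distinct() removes a row only when it matches another row in every remaining column. If distinct() ran before select(), the two 37086 rows would still differ in the County column, and both would survive.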
The “pipe operator,” which looks like this: %>%, is a gift from the magrittr package, which the dplyr package relies on and makes available to you. It lets you string together chains of functions like this. The dplyr package is part of the tidyverse package loaded at the start of the script. If you don’t like using the pipe, you could get the same things accomplished by using each function separately. But the code will be longer and more complex:
```r
FMR_RuCo <- FMR
FMR_RuCo <- filter(
  FMR_RuCo,
  ZIP.Code == "37127" |
    ZIP.Code == "37128" |
    ZIP.Code == "37129" |
    ZIP.Code == "37130" |
    ZIP.Code == "37132" |
    ZIP.Code == "37085" |
    ZIP.Code == "37118" |
    ZIP.Code == "37149" |
    ZIP.Code == "37037" |
    ZIP.Code == "37153" |
    ZIP.Code == "37167" |
    ZIP.Code == "37086"
)
FMR_RuCo <- select(
  FMR_RuCo,
  c("ZIP.Code",
    "SAFMR.0BR",
    "SAFMR.1BR",
    "SAFMR.2BR",
    "SAFMR.3BR",
    "SAFMR.4BR")
)
FMR_RuCo <- distinct(FMR_RuCo)
```
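Under the hood, the pipe simply passes whatever is on its left into the function on its right as that function’s first argument. This sketch, again assuming dplyr is available and using a made-up data frame, shows that the piped and unpiped versions give identical results. It also shows that the unpiped version doesn’t have to spell out every ZIP code: %in% works without the pipe, too.

```r
library(dplyr)

# Made-up data: ZIP code 37127 appears twice
rents <- data.frame(
  ZIP = c("37127", "37128", "37127", "38000"),
  BR2 = c(1560, 1800, 1560, 999)
)
keep <- c("37127", "37128")

# Piped: each result flows into the next function as its first argument
piped <- rents %>%
  filter(ZIP %in% keep) %>%
  distinct()

# Unpiped: the same two steps, one at a time
step1   <- filter(rents, ZIP %in% keep)
unpiped <- distinct(step1)

identical(piped, unpiped)  # TRUE
```

Since version 4.1, base R also has a built-in pipe, |>, which works much the same way for simple chains like this one.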
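The column names HUD picked are kind of awkward: “SAFMR.0BR” stands for “Small-Area Fair Market Rent, 0 Bedrooms.” Something simpler would be better. This code uses the colnames() function to give each column a shorter, simpler name. Notice that here, unusually, the function sits on the left side of <-. Instead of creating a new object, the command replaces the column names of the existing FMR_RuCo object with the names listed in c(), in order: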
```r
colnames(FMR_RuCo) <- c("ZIP", "Studio", "BR1", "BR2", "BR3", "BR4")
```
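Here’s the same renaming move on a small made-up data frame, using base R only. Used on its own, without <-, colnames() simply displays the current names; placed on the left of <-, it replaces them.

```r
# A made-up two-column data frame with HUD-style column names
rents <- data.frame(
  ZIP.Code  = c("37127", "37128"),
  SAFMR.0BR = c(1360, 1570)
)

colnames(rents)   # "ZIP.Code" "SAFMR.0BR": displays the current names

# Supply one new name per column, in the same order as the columns
colnames(rents) <- c("ZIP", "Studio")

colnames(rents)   # "ZIP" "Studio"
```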
This final part of the code uses functions from the gt package, which is responsible for making the nice table used to display the rent estimates. But even this code is just an arrangement of an object, some functions, and their arguments:
```r
# ----------------------------------------------------------
# Basic GT table
# ----------------------------------------------------------
FMR_RuCo_table <- gt(FMR_RuCo) %>%
  tab_header(title = "Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left")
FMR_RuCo_table
```
FMR_RuCo_table is the object that the code
creates.
The gt() function tells R to make a table. The
FMR_RuCo argument is simply the name of the data frame the
gt() function is supposed to make a table out of.
tab_header() is a function for defining the table’s title. Its title argument holds the text of the title to be displayed, in quotes. Note that you could change the table’s title by changing the text between the quote marks.
The cols_align() function controls the alignment of the contents of each cell in the table. Its align = "left" argument tells R to align the contents left.
The pipe operator, %>%, strings the functions together for efficiency, and the <- operator stores the finished table in the FMR_RuCo_table object.
The last line, which repeats the object’s name, FMR_RuCo_table, tells R to display the table.
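Displaying an object by typing its name works for any object, not just gt tables. In this sketch with a made-up text object, creating the object displays nothing, while typing its name, or passing it to print(), shows it. Inside loops and functions, R doesn’t display results automatically, so print() is the reliable choice there.

```r
# Creating an object displays nothing...
title_text <- "Rutherford FMR, by size and ZIP"

# ...but typing its name on a line by itself displays it
title_text

# Inside a loop, a bare name displays nothing; print() is needed
for (i in 1:2) {
  title_text          # displays nothing here
  print(title_text)   # displays the title each time through the loop
}
```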
It’s important to realize that much of what happens in the script has to happen in the order shown, starting with the top of the script.
First, the script installs the two packages, tidyverse and gt, if they aren’t already installed.
Next, the script loads the tidyverse, readxl, and gt libraries. These libraries can be loaded only if the packages they are part of have already been installed. So, the package installation code has to come before this code. (The readxl package is installed along with the tidyverse package, by the way, which is why it doesn’t have to be installed separately. But library(tidyverse) doesn’t load readxl automatically, which is why the script needs its own library(readxl) command.)
Next, the script downloads the raw data from HUD and stores it on your computer as an Excel file called rent.xlsx.
Next, the script reads the rent.xlsx file and imports it into an object - specifically, a data frame - called FMR. The script can’t import rent.xlsx if rent.xlsx hasn’t already been downloaded and saved in a place where R can find it.
Next, the script creates an object - specifically, a vector - containing Rutherford County ZIP codes and names the object ZIPList.
Next, the script creates a data frame object, names it FMR_RuCo, and fills the object with the contents of the FMR data frame object. This wouldn’t work unless the FMR data frame object had already been created.
Next, the script filters FMR_RuCo to include only the ZIP codes in ZIPList. This will work only if ZIPList has been created already.
… and on it goes, with the script ensuring that everything needed to complete a line of code has already been made available to R by one or more preceding lines of code.
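A quick way to see why order matters is to use an object one line before the line that creates it. In this sketch, zips_demo is a made-up name, and tryCatch() catches the resulting error so its message can be displayed instead of stopping everything.

```r
# Out of order: zips_demo is used one line BEFORE the line that creates it
out_of_order <- tryCatch({
  found <- "37127" %in% zips_demo   # fails here: zips_demo doesn't exist yet
  zips_demo <- c("37127", "37128")  # never reached
  found
}, error = function(e) conditionMessage(e))

out_of_order   # R's "object ... not found" error message

# In order: create the object first, then use it
zips_demo <- c("37127", "37128")
"37127" %in% zips_demo   # TRUE
```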
You might be wondering whether indentation and spacing matter to R. Generally, they don’t. Coders indent and space their code mainly to make it look neat and organized. Lines that present arguments, for example, get indented under the line that contains the function they are part of.
RStudio will do some indentation for you automatically as you type your code. If it gets messy nonetheless (it often does), RStudio will tidy it up for you. Just highlight the code you want to tidy, then click Code / Reformat Selection. Beautiful magic will occur. No matter how haphazard the spacing was, RStudio will align everything perfectly.
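To convince yourself that layout generally doesn’t matter, compare the same command written two ways. One caution, shown at the end of this sketch: a space inside an operator does matter. x < - 5 isn’t an assignment at all; R reads it as “Is x less than negative 5?”

```r
# The same vector, typed cramped and typed tidily across several lines
cramped <- c("37127","37128","37129")
tidy <- c(
  "37127",
  "37128",
  "37129"
)
identical(cramped, tidy)   # TRUE: R ignores the extra spaces and line breaks

# The exception: don't split the <- arrow with a space
x <- 3
x < - 5   # a comparison ("is 3 less than -5?"), which is FALSE
x         # still 3
```

Line breaks can matter in one other common spot: when you break a piped chain across lines, leave %>% at the end of a line rather than the start of the next one. Otherwise R treats the line break as the end of the command.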
Open an AI assistant and type, “Below is an R script. List every function included in the script, and explain what the function and each of its arguments are doing.” Then paste in the entire script from above and press Enter.
Take a few minutes to examine the response you get. Does the script make more sense now? I hope so. If there’s something you don’t understand, ask your AI assistant for clarification. If your AI assistant suggested any interesting-sounding tweaks or additions, go ahead and try them out. Don’t worry: you’re not going to break anything.
Show me the response your AI assistant generated and the assistant’s response to at least one follow-up question you asked or at least one coding tweak you tried. I just want to see evidence that you were able to engage with the assistant in some way. Once you have shown me that, you are free to leave.