We will focus today on R packages - what they are, how to install and use them, and why you would want to. We’ll look especially closely at a package called R Markdown. You’ll use it to produce your lab exercises, starting with this Friday’s lab exercise. Before we look at packages, though, we’ll circle back to Monday’s data exercise and look at the solutions to the tasks it required.
As you saw in Monday’s lesson, functions make R do things. Many functions are included in “base R,” the basic version of R you get when you install R on a computer. Other functions, though, have to be added to R by installing a package. Packages contain “add-on” functions that extend R’s capabilities and make doing things in R easier than they would be in base R.
To see what I mean, use RStudio to run this part of the fair market rent script from Monday’s lesson:
# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------
if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("gt"))
  install.packages("gt")
library(tidyverse)
library(readxl)
library(gt)
# ----------------------------------------------------------
# Download HUD SAFMR Excel file
# ----------------------------------------------------------
download.file(
  "https://www.huduser.gov/portal/datasets/fmr/fmr2026/fy2026_safmrs.xlsx",
  "rent.xlsx",
  mode = "wb"
)
# ----------------------------------------------------------
# Read Excel data
# ----------------------------------------------------------
FMR <- read_xlsx(path = "rent.xlsx", .name_repair = "universal")
Among other things, the code will produce a data frame called FMR that contains current fair market rent data for all U.S. ZIP codes monitored by HUD’s Small-Area Fair Market Rent program. Click on the data frame in RStudio’s “Environment” area, and you’ll be able to scroll through 18 variables’ worth of rent-related information about 51,895 U.S. ZIP codes.
Filtering the ugly way
But we aren’t interested in all 51,895 ZIP codes. We want just the dozen ZIP codes in Rutherford County. Here’s one way to find those dozen ZIP codes and stash them in a new data frame called FMR_RuCo:
FMR_RuCo <- FMR[FMR$ZIP.Code == "37127" |
                  FMR$ZIP.Code == "37128" |
                  FMR$ZIP.Code == "37129" |
                  FMR$ZIP.Code == "37130" |
                  FMR$ZIP.Code == "37132" |
                  FMR$ZIP.Code == "37085" |
                  FMR$ZIP.Code == "37118" |
                  FMR$ZIP.Code == "37149" |
                  FMR$ZIP.Code == "37037" |
                  FMR$ZIP.Code == "37153" |
                  FMR$ZIP.Code == "37167" |
                  FMR$ZIP.Code == "37086", ]
The code works. Go ahead and run it, in fact. The code is a pain to
type, though. The FMR$ZIP.Code == part gets repeated over
and over, once for each ZIP code. So does the |, which
means “or” in R. The code means, “Include an FMR row of data in the
new FMR_RuCo data frame only if the row’s
ZIP.Code value is 37127 OR 37128 OR 37129,” and so on, up
to 37086. All that repetition increases the chances of including a typo
that will break the code. And what’s with that seemingly random
, at the end? Inside the square brackets, the comma separates the rows
you want from the columns you want. Leaving the spot after the comma
blank means “keep every column.” It is easy to forget, but the code
won’t work unless you include it.
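The role of that trailing comma is easier to see on a tiny data frame. The sketch below uses a made-up three-row data frame called toy, which is not part of the lesson's script; its ZIP codes and two-bedroom rents are borrowed from the report table in this lesson purely for illustration. Whatever comes before the comma picks rows, and whatever comes after it picks columns.

```r
# A small stand-in data frame; "toy" is a hypothetical name used only here.
toy <- data.frame(
  ZIP.Code = c("37127", "37128", "37130"),
  BR2      = c(1560, 1800, 1470)
)

# Rows where the condition is TRUE, every column (nothing after the comma)
two_rows <- toy[toy$ZIP.Code == "37127" | toy$ZIP.Code == "37130", ]

# Same rows, but only the BR2 values (a column name after the comma)
br2_only <- toy[toy$ZIP.Code == "37127" | toy$ZIP.Code == "37130", "BR2"]

print(two_rows)
print(br2_only)
```

The first result keeps both columns for the two matching rows; the second keeps only their BR2 values.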
Filtering the pretty way
Let’s try a different way. To prove that it works, run this code to delete the FMR_RuCo data frame that the ugly code produced:
rm(FMR_RuCo)
… then run this code, which will redo the filtering operation and create the FMR_RuCo data frame, but with far simpler - and more intuitive - syntax.
# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------
ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)
# ----------------------------------------------------------
# Filter to the Rutherford County ZIP codes
# ----------------------------------------------------------
FMR_RuCo <- FMR %>%
  filter(ZIP.Code %in% ZIPList)
A package made the difference
The ugly code uses base R. The pretty code uses the filter() function, which is part of the
dplyr package, one of the packages in the tidyverse collection loaded at the start
of the script.
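To see why the %in% version does the same job as the long chain of == and | tests, it may help to look at what %in% returns on its own. The sketch below uses base R only, a made-up vector named zips, and a shortened ZIPList; all three values in zips are illustrative. Note that %in% is itself a base R operator, so it is filter() and the %>% pipe, not %in%, that the tidyverse contributes.

```r
# Hypothetical example values; the middle one is deliberately not in the list.
zips    <- c("37127", "37201", "37130")
ZIPList <- c("37127", "37128", "37129", "37130")  # shortened list for the demo

# %in% checks each element of zips against the whole list at once
in_list <- zips %in% ZIPList

# The long-hand equivalent, built from == and |
long_hand <- zips == "37127" | zips == "37128" |
  zips == "37129" | zips == "37130"

print(in_list)
print(identical(in_list, long_hand))
```

Both approaches yield the same TRUE/FALSE vector, one value per ZIP code, and that vector is what decides which rows are kept.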
More later about other packages. Let’s look at one you’ll need to start using right away. The R Markdown package is essentially a basic word processor capable of incorporating R code and output into a document, then publishing the document on the Web.
Installing R Markdown
R Markdown comes pre-installed in RStudio. If it isn’t installed for some
reason, though, you can install it using the install.packages() function:
install.packages("rmarkdown")
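If you are not sure whether the package is already there, one way to check before installing is sketched below. It relies on requireNamespace() and packageVersion(), both of which ship with R; the variable name has_rmarkdown is just a label chosen for this example.

```r
# TRUE if rmarkdown is already installed, FALSE otherwise.
# "has_rmarkdown" is a hypothetical variable name for this sketch.
has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)

if (has_rmarkdown) {
  message("rmarkdown is installed, version ",
          as.character(packageVersion("rmarkdown")))
} else {
  message("rmarkdown is not installed; run install.packages(\"rmarkdown\")")
}
```

This only reports what it finds. It does not install anything on its own.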
Using R Markdown
Watch this R Markdown demo video on YouTube to learn how to create and publish a research report using R Markdown. The video’s URL is: https://youtu.be/nThLoGg8Sdg?si=5ez0d79-Snx6b95U. The code shown in the video, as well as the table produced and displayed in the report, are from a previous semester and won’t match the code and table you will be working with. But the process will be the same.
As the video explains, R Markdown is mostly a point-and-click editor.
But you have to use manually typed formatting commands called “code
chunk options” to control what R Markdown does with the blocks of code
you include. The options go at the top of each code chunk, inside the
{r} header that opens the chunk, and preceded by a
comma.
Basically, there are three code chunk options you will need:
{r, include=FALSE} tells R Markdown to run the code
in the block, but wholly in the background, without
displaying the code itself or any of the output the code produces. When
formatting your R Markdown document as a report, this is the option to
put on the first code chunk, which should contain the entire script you want
to use. If warnings or messages ever do show up once the document is
“knitted” (that is, compiled), you can suppress them by expanding the
code chunk option to read:
{r, include=FALSE, message=FALSE, warning=FALSE}. Note that
FALSE must be typed in all capital letters.
{r, echo=FALSE} tells R Markdown to run the code in
the block, but display only the output from the code
instead of showing both the code and the output. So, suppose you want to
display a table produced, but not shown, when R Markdown ran the code in
the {r, include=FALSE} code chunk. Simply put the name of
the table, like FMR_RuCo_table, in a code chunk that begins
with this option. The FMR_RuCo_table code will tell R
Markdown to show the table, but the {r, echo=FALSE} will
tell R Markdown to do it without showing the FMR_RuCo_table
code. Here, again, you can suppress any warnings or messages that happen
to show up by expanding the code chunk option to read
{r, echo=FALSE, message=FALSE, warning=FALSE}.
Finally, {r, eval=FALSE} tells R Markdown to
display the code in the chunk but do nothing else with
it - that is, don’t show its output, run it, or even check it
for errors. When producing a report-style R Markdown document, this is
the option you use in the final code chunk, the one that displays the R
code for whatever output you displayed in the report.
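To make the three options concrete, here is a rough sketch that writes a bare-bones report-style R Markdown file from R. Everything specific in it is a placeholder chosen for illustration: the title, the file name chunk_demo.Rmd, and a two-row rents data frame standing in for the full script. The two ZIP codes and two-bedroom rents come from the report table in this lesson. In practice you would type the chunks directly into RStudio's editor, as the video shows.

```r
# Build a minimal report-style R Markdown file as a character vector.
# Title, file name, and the "rents" data frame are placeholders for this sketch.
rmd_lines <- c(
  "---",
  "title: \"Chunk option demo\"",
  "output: html_document",
  "---",
  "",
  "```{r, include=FALSE}",
  "# Runs silently: neither the code nor its output appears in the report",
  "rents <- data.frame(ZIP = c(\"37127\", \"37128\"), BR2 = c(1560, 1800))",
  "```",
  "",
  "Here are two fair market rents.",
  "",
  "```{r, echo=FALSE}",
  "rents",
  "```",
  "",
  "```{r, eval=FALSE}",
  "# Shown to the reader, but never run",
  "rents <- data.frame(ZIP = c(\"37127\", \"37128\"), BR2 = c(1560, 1800))",
  "```"
)

# Save the file in R's temporary folder so nothing in your project is touched
rmd_path <- file.path(tempdir(), "chunk_demo.Rmd")
writeLines(rmd_lines, rmd_path)

# To knit the file, uncomment the next line (requires the rmarkdown package):
# rmarkdown::render(rmd_path)
```

The file has one chunk per option, in the order a report would use them: the silent setup chunk first, the output-only chunk in the middle, and the code-only chunk last.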
Now for some practice. Here is the complete basic fair market rent script from Monday, which you can copy and paste:
# ----------------------------------------------------------
# Install & load required packages
# ----------------------------------------------------------
if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("gt"))
  install.packages("gt")
library(tidyverse)
library(readxl)
library(gt)
# ----------------------------------------------------------
# Download HUD SAFMR Excel file
# ----------------------------------------------------------
download.file(
  "https://www.huduser.gov/portal/datasets/fmr/fmr2026/fy2026_safmrs.xlsx",
  "rent.xlsx",
  mode = "wb"
)
# ----------------------------------------------------------
# Read Excel data
# ----------------------------------------------------------
FMR <- read_xlsx(path = "rent.xlsx", .name_repair = "universal")
# ----------------------------------------------------------
# Rutherford County ZIP Codes
# ----------------------------------------------------------
ZIPList <- c(
  "37127", "37128", "37129", "37130", "37132",
  "37085", "37118", "37149", "37037", "37153",
  "37167", "37086"
)
# ----------------------------------------------------------
# Filter, select columns, and rename
# ----------------------------------------------------------
FMR_RuCo <- FMR %>%
  filter(ZIP.Code %in% ZIPList) %>%
  select(
    ZIP.Code,
    SAFMR.0BR,
    SAFMR.1BR,
    SAFMR.2BR,
    SAFMR.3BR,
    SAFMR.4BR
  ) %>%
  distinct()
colnames(FMR_RuCo) <- c("ZIP", "Studio", "BR1", "BR2", "BR3", "BR4")
# ----------------------------------------------------------
# Basic GT table
# ----------------------------------------------------------
FMR_RuCo_table <- gt(FMR_RuCo) %>%
  tab_header(title = "Rutherford FMR, by size and ZIP") %>%
  cols_align(align = "left")
FMR_RuCo_table
Use R Markdown, the above script, what you learned from the video,
and the include=FALSE, echo=FALSE, and
eval=FALSE code chunk options to create an R Markdown
document that looks something like what you see below after you knit it.
If you can get it to work, then Friday’s graded lab exercise will be a
breeze, because all you have to do is publish the report on RPubs.com
and send me the URL.
Your R Markdown-produced report should look something like this:
(your name)
(the date)
Here are fair market rents for various apartment sizes in Rutherford County ZIP codes.
Rutherford FMR, by size and ZIP

| ZIP | Studio | BR1 | BR2 | BR3 | BR4 |
|---|---|---|---|---|---|
| 37037 | 1900 | 1990 | 2180 | 2790 | 3400 |
| 37085 | 1320 | 1380 | 1520 | 1940 | 2360 |
| 37086 | 1730 | 1820 | 1990 | 2540 | 3100 |
| 37118 | 1150 | 1170 | 1320 | 1660 | 2020 |
| 37127 | 1360 | 1420 | 1560 | 1990 | 2430 |
| 37128 | 1570 | 1640 | 1800 | 2300 | 2800 |
| 37129 | 1570 | 1640 | 1800 | 2300 | 2800 |
| 37130 | 1280 | 1340 | 1470 | 1880 | 2290 |
| 37132 | 1280 | 1340 | 1470 | 1880 | 2290 |
| 37149 | 1150 | 1180 | 1320 | 1660 | 2020 |
| 37153 | 1670 | 1750 | 1920 | 2450 | 2990 |
| 37167 | 1430 | 1500 | 1640 | 2100 | 2560 |