Intro

As you work on your final paper, I want to finish out the course with a series of lessons about some of the practical things R can do. This is Part 1 of the series; we’ll cover Parts 2 and 3 in the coming weeks. The goal is to show you some things you might use R for even if you never willingly go near an inferential statistic again for as long as you live, all without taking up too much of the time you will need to finish your paper.

Specifically, you’ll learn how to make the following maps, graphic and tables, which explore the percentage of people age 25 or older who have a bachelor’s degree or higher within Rutherford County, across the Nashville Metropolitan Statistical Area, and throughout all of Tennessee:


Percent with a bachelor’s degree or higher
Rutherford County, 2019-2023. Source: American Community Survey.


Percent with a bachelor’s degree or higher
Nashville MSA, 2019-2023. Source: American Community Survey.


Estimates by County
(2019–2023 ACS 5-Year Estimates)
County                        Percent (%)  Margin of Error  Count (age 25+)
Williamson County, Tennessee         61.8              1.2          167,620
Davidson County, Tennessee           47.3              0.7          493,272
Wilson County, Tennessee             37.2              1.2          105,888
Rutherford County, Tennessee         34.4              1.1          223,304
Sumner County, Tennessee             32.6              1.2          138,609
Maury County, Tennessee              26.5              1.5           73,091
Cheatham County, Tennessee           25.1              2.3           29,614
Dickson County, Tennessee            21.4              1.9           38,559
Robertson County, Tennessee          21.1              1.6           50,891
Smith County, Tennessee              17.9              2.3           14,180
Cannon County, Tennessee             16.9              3.2           10,397
Hickman County, Tennessee            13.6              2.2           18,044
Macon County, Tennessee              11.8              2.3           17,135
Trousdale County, Tennessee          10.1              2.7            8,490
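Those margins of error matter whenever you compare two areas. The Census Bureau publishes ACS margins of error at the 90 percent confidence level, so dividing a margin of error by 1.645 gives an approximate standard error. The sketch below applies that convention, along with the Bureau’s usual approximate test for a difference between two estimates, to a few rows copied from the table above. It assumes the two estimates are independent, and the helper names acs_ci() and acs_diff_z() are made up for this illustration; they are not tidycensus functions.

```r
# A few rows copied from the "Estimates by County" table above.
counties <- data.frame(
  county  = c("Wilson", "Rutherford", "Sumner", "Maury", "Cheatham"),
  percent = c(37.2, 34.4, 32.6, 26.5, 25.1),
  moe     = c(1.2, 1.1, 1.2, 1.5, 2.3),
  stringsAsFactors = FALSE
)

# ACS margins of error are published at the 90% confidence level,
# which corresponds to z = 1.645 under a normal approximation.
z90 <- 1.645

# Hypothetical helper: 90% confidence interval for each estimate.
acs_ci <- function(estimate, moe) {
  data.frame(lower = estimate - moe, upper = estimate + moe)
}

# Hypothetical helper: approximate z statistic for the difference
# between two estimates, assuming they are independent.
acs_diff_z <- function(est1, moe1, est2, moe2) {
  se1 <- moe1 / z90
  se2 <- moe2 / z90
  (est1 - est2) / sqrt(se1^2 + se2^2)
}

cbind(counties, acs_ci(counties$percent, counties$moe))

# Wilson vs. Rutherford
z_rw <- acs_diff_z(37.2, 1.2, 34.4, 1.1)
# Maury vs. Cheatham (their 90% intervals overlap)
z_mc <- acs_diff_z(26.5, 1.5, 25.1, 2.3)

round(c(wilson_vs_rutherford = z_rw, maury_vs_cheatham = z_mc), 2)

# TRUE means the difference is statistically significant at the 90% level.
abs(c(z_rw, z_mc)) > z90
```

By this rough test, Wilson’s lead over Rutherford is statistically meaningful at the 90 percent level, while the gap between Maury and Cheatham is not.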

Estimates by Region
(2019–2023 ACS 5-Year Estimates)
Region        Percent (%)  Margin of Error  Count (age 25+)
Davidson             47.3              0.7          493,272
Doughnut             35.4              1.4          715,926
Non-doughnut         16.9              2.3          179,896
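The region rows can be rebuilt from the county rows. The Doughnut count equals the combined count of the six counties that border Davidson (Williamson, Wilson, Rutherford, Sumner, Cheatham and Robertson), and the Non-doughnut count equals that of the other seven. The Percent and Margin of Error columns match simple, unweighted averages of the county figures. The sketch below reproduces those rows and, for comparison only, adds a population-weighted percentage in which each county counts in proportion to its population age 25 and older. That weighting is an illustrative addition, not how the table above was built, and because it starts from rounded county percentages, the weighted result is approximate.

```r
# County rows copied from the "Estimates by County" table above.
tn <- data.frame(
  county  = c("Williamson", "Davidson", "Wilson", "Rutherford", "Sumner",
              "Maury", "Cheatham", "Dickson", "Robertson", "Smith",
              "Cannon", "Hickman", "Macon", "Trousdale"),
  percent = c(61.8, 47.3, 37.2, 34.4, 32.6, 26.5, 25.1, 21.4, 21.1, 17.9,
              16.9, 13.6, 11.8, 10.1),
  moe     = c(1.2, 0.7, 1.2, 1.1, 1.2, 1.5, 2.3, 1.9, 1.6, 2.3,
              3.2, 2.2, 2.3, 2.7),
  count   = c(167620, 493272, 105888, 223304, 138609, 73091, 29614, 38559,
              50891, 14180, 10397, 18044, 17135, 8490),
  stringsAsFactors = FALSE
)

# Counties that share a border with Davidson form the "doughnut".
doughnut <- c("Williamson", "Wilson", "Rutherford", "Sumner",
              "Cheatham", "Robertson")

tn$region <- ifelse(tn$county == "Davidson", "Davidson",
                    ifelse(tn$county %in% doughnut, "Doughnut", "Non-doughnut"))

# Summarize each region: unweighted means (which match the table above),
# summed counts, and a population-weighted percentage for comparison.
regions <- do.call(rbind, lapply(split(tn, tn$region), function(d) {
  data.frame(
    region           = d$region[1],
    percent_mean     = round(mean(d$percent), 1),
    moe_mean         = round(mean(d$moe), 1),
    count            = sum(d$count),
    percent_weighted = round(sum(d$percent * d$count) / sum(d$count), 1),
    stringsAsFactors = FALSE
  )
}))

regions
```

For margins of error on aggregated estimates, tidycensus also provides helper functions such as moe_sum() and moe_prop().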

Percent with a bachelor’s degree or higher
Tennessee, 2019-2023. Source: American Community Survey.


That’s a lot. And a lot going on “under the hood,” in the code behind the output. But we’ll tackle it in manageable bites over what remains of the semester. And you’ll know how to make similar maps, graphics and tables of all sorts of other local population characteristics, for any area or set of areas anywhere in the U.S.

The American Community Survey

The data for the above come from the latest release of the American Community Survey (ACS), an ongoing survey conducted by the U.S. Census Bureau that collects detailed demographic, social, economic, and housing information from a representative sample of U.S. households each year.

Unlike the decennial census, which counts every resident every 10 years, the ACS provides annual estimates that allow researchers, policymakers, and communities to track trends in population characteristics, income, education, employment, and housing at national, state, and local levels. ACS estimates are used for program planning, policy decisions, and allocation of federal funds. They’re also immensely valuable for media professionals working in journalism, public relations, advertising, and political communication. ACS data can show you which areas of your community are growing or shrinking, rich or poor, white collar or blue collar, diverse or segregated, transient or settled, and much more.

ACS data also is especially well suited to showing off R’s tabling, graphing and mapping capabilities. And, handily, tidycensus, one of R’s best-designed, best-documented add-on packages, makes working with ACS data a breeze.

Why I picked education

Interest in ACS measures of renter and homeowner costs surges when local housing prices go up or down. Population change gets attention when schools grow overcrowded or begin to empty. Public assistance figures move front and center when political fights erupt over the social safety net’s cost or sufficiency.

Education levels don’t make the news nearly as often. But few single measures can tell you as much about an area as the percentage of its residents who hold at least a bachelor’s degree. Education correlates strongly with digital literacy, news consumption, political engagement, and communication channel preferences, all of which interest media scholars. It’s also closely associated with leisure time and disposable income, factors that can tell you whether an area might be a suitable location for certain kinds of businesses or ad campaigns. Finally, looking at education levels can point journalists to important stories, including ones about community conditions, disparities, health outcomes, labor exploitation, and economic development.

Education-related variables are also among the easier ACS measures to work with. And once you learn how to work with education measures, you’ll know how to work with the ACS’s other measures, too.

Getting started

You will need a Census API key. Go ahead and get one by completing and submitting the Census Bureau’s online key request form (https://api.census.gov/data/key_signup.html). It’s free, and it usually arrives by e-mail within a few minutes. The e-mail will include instructions on how to verify and activate your key. A key that isn’t activated will expire after a short while, so be sure to follow the instructions as soon as possible after you receive the e-mail.

The key is essentially an access code that lets the Census Bureau’s servers know you are an authorized user. Your key is good for life, so store it somewhere safe, and don’t share it with anyone.
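Under the hood, tidycensus includes your key with each request it sends to the Census Bureau’s data API. The sketch below only assembles a request URL, without sending it, so you can see where the key goes. It assumes the Census Data API’s standard query layout (get=, for=, in=, key=); build_acs_url() is a hypothetical helper, and tidycensus builds its own requests with its own internal code. The trailing E in DP02_0068PE asks for the estimate, a suffix tidycensus normally adds for you.

```r
# Hypothetical helper that assembles (but does not send) a Census Data API
# request for ACS 5-year data profile variables.
build_acs_url <- function(year, variables, geo_for, geo_in, key) {
  base <- sprintf("https://api.census.gov/data/%d/acs/acs5/profile",
                  as.integer(year))
  query <- paste0(
    "get=", paste(c("NAME", variables), collapse = ","),
    "&for=", utils::URLencode(geo_for),  # encodes spaces as %20
    "&in=", utils::URLencode(geo_in),
    "&key=", key                         # the key travels with every request
  )
  paste0(base, "?", query)
}

# Percent with a bachelor's degree or higher for every county in
# Tennessee (state FIPS code 47). "YOUR_KEY_HERE" is a placeholder.
url <- build_acs_url(2023, "DP02_0068PE", "county:*", "state:47",
                     "YOUR_KEY_HERE")
url
```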

Once you have your key, you can plug it into the script below by finding the line that reads:

census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")

… and, well, replacing PasteYourAPIKeyBetweenTheseQuoteMarks with your key. Be sure you don’t accidentally delete the quote marks. Also be sure you don’t accidentally include a space before or after your key, or any other stray characters from the e-mail in which you received the key.
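Stray spaces and quote marks, the problems described above, are easy to strip out in code. The hypothetical helper clean_census_key() below trims them before the key reaches tidycensus. The commented-out lines show census_api_key()’s install = TRUE option, which saves the key to your .Renviron file. That option is worth considering because, with default settings, a published R Markdown document displays its code, including any key typed into it.

```r
# Hypothetical helper: tidy up a key copied from the Census Bureau e-mail.
clean_census_key <- function(key) {
  key <- trimws(key)             # drop leading/trailing spaces, tabs, line breaks
  key <- gsub("[\"']", "", key)  # drop stray quote marks copied with the key
  if (grepl("[[:space:]]", key)) {
    stop("The key still contains spaces or line breaks. Copy it again.")
  }
  key
}

clean_census_key("  abc123def456  \n")   # a made-up key, not a real one

# With a real key, tidycensus can save it to your .Renviron file so it
# never has to appear in a published R Markdown document:
# tidycensus::census_api_key(clean_census_key("PasteYourAPIKeyBetweenTheseQuoteMarks"),
#                            install = TRUE)
# readRenviron("~/.Renviron")   # make the stored key available right away
```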

Your job this week is to get your Census API key, plug it into the code below, and make sure the code runs correctly. Then use R Markdown to produce and publish a document showing the Rutherford County map and graph that the Part 1 script creates, and submit that document’s URL. We’ll add the rest of the output during the coming weeks.

My Practical R Part 1 YouTube video will walk you through using the code and making the R Markdown document. Here’s the script you will need for Part 1:

Part 1 R code

# ============================================================
# 0. INSTALL AND LOAD REQUIRED PACKAGES
# ============================================================

if (!require("tidyverse"))
  install.packages("tidyverse")
if (!require("tidycensus"))
  install.packages("tidycensus")
if (!require("sf"))
  install.packages("sf")
if (!require("mapview"))
  install.packages("mapview")
if (!require("leaflet"))
  install.packages("leaflet")
if (!require("leaflet.extras2"))
  install.packages("leaflet.extras2")
if (!require("leafpop"))  # provides popupTable(), used in section 7
  install.packages("leafpop")
if (!require("gt"))
  install.packages("gt")
if (!require("gtExtras"))
  install.packages("gtExtras")
if (!require("plotly"))
  install.packages("plotly")

library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
library(leaflet)
library(leafpop)
library(gt)
library(gtExtras)
library(plotly)

# ============================================================
# 1. CENSUS API KEY. 

# NOTE: Replace PasteYourAPIKeyBetweenTheseQuoteMarks with your
# actual API key. If you don't, the script won't work.

# ============================================================

census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")

# ============================================================
# 2. LOAD ACS CODEBOOKS
# ============================================================

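# Each codebook lists variable codes, labels and concepts. They aren't used
# directly below; browse them with View(ProfileTables), etc., to look up
# the codes for other measures.
DetailedTables <- load_variables(2023, "acs5", cache = TRUE)
SubjectTables  <- load_variables(2023, "acs5/subject", cache = TRUE)
ProfileTables  <- load_variables(2023, "acs5/profile", cache = TRUE)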

# ============================================================
# 3. DEFINE VARIABLES OF INTEREST
# ============================================================

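# DP02_0059:  population age 25 and older (a count)
# DP02_0068P: percent of that population with a bachelor's degree or higher
# With output = "wide", get_acs() adds E (estimate) and M (margin of error)
# to these names, producing Count_E, Count_M, Percent_E and Percent_M.
VariableList <- c(
  Count_ = "DP02_0059",
  Percent_ = "DP02_0068P"
)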

# ============================================================
# 4. FETCH COUNTY SUBDIVISION DATA (TENNESSEE)
# ============================================================

mydata <- get_acs(
  geography = "county subdivision",
  state = "TN",
  variables = VariableList,
  year = 2023,
  survey = "acs5",
  output = "wide",
  geometry = TRUE
)

# ============================================================
# 5. CLEAN AND REFORMAT GEOGRAPHIC NAMES
# ============================================================

mydata <- mydata %>%
  separate_wider_delim(
    NAME,
    delim  = ", ",
    names  = c("Division", "County", "State")
  ) %>%
  mutate(County = str_remove(County, " County"))

# ============================================================
# 6. FILTER FOR A SINGLE COUNTY
# ============================================================

filtereddata <- mydata %>%
  filter(County %in% c("Rutherford"))

# ============================================================
# 7. MAP DATA FOR SINGLE COUNTY SUBDIVISIONS
# ============================================================

mapdata <- filtereddata %>%
  rename(
    Percent = Percent_E,
    PercentEM = Percent_M,
    Count = Count_E,
    CountEM = Count_M
  ) %>%
  st_as_sf()

mapviewOptions(basemaps.color.shuffle = FALSE)

DivisionMap <- mapview(
  mapdata,
  zcol = "Percent",
  layer.name = "Percent",
  popup = popupTable(
    mapdata,
    feature.id  = FALSE,
    row.numbers = FALSE,
    zcol = c("State", "County", "Division", "Percent",
             "PercentEM", "Count", "CountEM")
  )
)

DivisionMap

# ============================================================
# 8. INTERACTIVE PLOTLY GRAPH OF ESTIMATES WITH ERROR BARS
# ============================================================

mygraph <- plot_ly(
  data = st_drop_geometry(filtereddata),  # plain data frame; the graph needs no map geometry
  x = ~Percent_E,
  y = ~reorder(Division, Percent_E),
  type = 'scatter',
  mode = 'markers',
  error_x = list(
    type = "data",
    array = ~Percent_M,
    visible = TRUE
  ),
  marker = list(color = "#099d91", size = 10)
) %>%
  layout(
    title = "Estimates by Area (Interactive)",
    xaxis = list(title = "2019–2023 ACS Estimate"),
    yaxis = list(title = "", automargin = TRUE)
  )

mygraph
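If the cleanup in sections 5 and 7 of the script feels opaque, it can help to watch it work on a single row without calling the API. The sketch below mimics those steps in base R, using strsplit(), sub() and names() in place of separate_wider_delim(), str_remove() and rename(), so it runs with no packages installed. It assumes get_acs() returns a NAME column in the form "Subdivision, County, State" and, because of output = "wide", estimate and margin-of-error columns ending in E and M. The division name and all of the numbers are invented.

```r
# One invented row shaped like get_acs(..., output = "wide") output.
row <- data.frame(
  NAME      = "Example Division, Rutherford County, Tennessee",
  Count_E   = 1000,
  Count_M   = 120,
  Percent_E = 30.5,
  Percent_M = 4.2,
  stringsAsFactors = FALSE
)

# Section 5: split NAME on ", " into three columns, then drop " County".
parts <- strsplit(row$NAME, ", ", fixed = TRUE)[[1]]
row$Division <- parts[1]
row$County   <- sub(" County", "", parts[2], fixed = TRUE)
row$State    <- parts[3]
row$NAME     <- NULL

# Section 7: give the E (estimate) and M (margin of error) columns
# friendlier names for the map popups.
names(row)[names(row) == "Percent_E"] <- "Percent"
names(row)[names(row) == "Percent_M"] <- "PercentEM"
names(row)[names(row) == "Count_E"]   <- "Count"
names(row)[names(row) == "Count_M"]   <- "CountEM"

row
```

One difference worth knowing: separate_wider_delim() stops with an error if a NAME doesn’t split into exactly three pieces, a useful safety check that this base R version skips.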