Practical R Part 3

Intro

This last lesson in the “Practical R” series will show you how to make the Region table and statewide map in the output shown below. The region table shows the result of combining the counties that share a border with Davidson County into what Nashvillians ofen call “The Doughnut,” and the rest of the region’s counties (except for Davidson County, where Nashville is) into the “non-Doughnut” counties.

These geographic designations are meaningful in many ways. Davidson County is clearly the economic hub of the region. The counties that share a border with it - the “Doughnut” counties - have economies and cultures that are distinct from Davidson County but nonetheless closely tied to it, or are least more closely tied than to it than the counties farther away from Davidson.

The “Estimates by Region” table illustrates what I’m talking about. Generally - and certainly setting aside the incredibly high education level in Williamson County - education levels tend to decline as you move geographically outward from Davidson County. Many other ACS variables show the same pattern, or show it in reverse.

It’s a little table, but a lot goes into making it. See About the code for an explanation of the case_when(), group_by() and summarize() functions, which make the Estimates by Region table possible and are all-around super handy data wrangling functions.

The map, meanwhile, is just a statewide version of the Nashville MSA map. The main difference in the code is that the statewide map code simply omits the code that filters the mydata data frame for just the counties in the Nashville MSA.

But if you don’t have time for all of that, you can just skip to the Code, copy it, incorporate it into an R Markdown document, knit and publish the document, submit the URL, and be done.

Output from previous lessons

Percent with a bachelor’s degree or higher
Rutherford County, 2019-2023. Source: American Community Survey

Percent with a bachelor’s degree or higher
Nashville MSA, 2019-2023. Source: American Community Survey.

County	Percent (%)	Margin of Error	Count
Estimates by County
(2019–2023 ACS 5-Year ACS)
Williamson County, Tennessee	61.8	1.2	167,620
Davidson County, Tennessee	47.3	0.7	493,272
Wilson County, Tennessee	37.2	1.2	105,888
Rutherford County, Tennessee	34.4	1.1	223,304
Sumner County, Tennessee	32.6	1.2	138,609
Maury County, Tennessee	26.5	1.5	73,091
Cheatham County, Tennessee	25.1	2.3	29,614
Dickson County, Tennessee	21.4	1.9	38,559
Robertson County, Tennessee	21.1	1.6	50,891
Smith County, Tennessee	17.9	2.3	14,180
Cannon County, Tennessee	16.9	3.2	10,397
Hickman County, Tennessee	13.6	2.2	18,044
Macon County, Tennessee	11.8	2.3	17,135
Trousdale County, Tennessee	10.1	2.7	8,490

Region table and statewide map

Region	Percent (%)	Margin of Error	Count
Estimates by Region
(2019–2023 ACS 5-Year ACS)
Davidson	47.3	0.7	493,272
Doughnut	35.4	1.4	715,926
Non-doughnut	16.9	2.3	179,896

Percent with a bachelor’s degree or higher
Tennessee, 2019-2023. Source: American Community Survey.

Code

Here is the code for making the region table and the statewide map.

# ============================================================
# 0. INSTALL AND LOAD REQUIRED PACKAGES
# ============================================================

if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("leaflet")) install.packages("leaflet")
if (!require("leaflet.extras2")) install.packages("leaflet.extras2")
if (!require("gt")) install.packages("gt")
if (!require("gtExtras")) install.packages("gtExtras")
if (!require("plotly")) install.packages("plotly")

library(tidyverse)
library(tidycensus)
library(sf)
library(leaflet)
library(leaflet.extras2)
library(gt)
library(gtExtras)
library(plotly)

# ============================================================
# 1. CENSUS API KEY
# ============================================================

# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")

# ============================================================
# 2. LOAD ACS CODEBOOKS
# ============================================================

DetailedTables <- load_variables(2023, "acs5", cache = TRUE)
SubjectTables  <- load_variables(2023, "acs5/subject", cache = TRUE)
ProfileTables  <- load_variables(2023, "acs5/profile", cache = TRUE)

# ============================================================
# 3. DEFINE VARIABLES OF INTEREST
# ============================================================

VariableList =
  c(
    Count_   = "DP02_0059",
    Percent_ = "DP02_0068P"
  )

# ============================================================
# 4. FETCH COUNTY SUBDIVISION DATA (TENNESSEE)
# ============================================================

mydata <- get_acs(
  geography = "county subdivision",
  state = "TN",
  variables = VariableList,
  year = 2023,
  survey = "acs5",
  output = "wide",
  geometry = TRUE
)

# ============================================================
# 5. CLEAN AND REFORMAT GEOGRAPHIC NAMES
# ============================================================

mydata <- mydata %>%
  separate_wider_delim(
    NAME,
    delim  = ", ",
    names  = c("Division", "County", "State")
  ) %>%
  mutate(County = str_remove(County, " County"))

# ============================================================
# NOTE: See Practical R parts 1 & 2 for sections 6 through 11
# of this script
# ============================================================

# ============================================================
# 12. REGION-LEVEL AGGREGATION AND TABLE
# ============================================================
CountyLevelData <- get_acs(
  geography = "county",
  state = "TN",
  variables = VariableList,
  year = 2023,
  survey = "acs5",
  output = "wide",
  geometry = FALSE
)

SelectedCounties <- CountyLevelData %>% 
  filter(NAME %in% c(
    "Cannon County, Tennessee","Cheatham County, Tennessee",
    "Davidson County, Tennessee","Dickson County, Tennessee",
    "Hickman County, Tennessee","Macon County, Tennessee",
    "Maury County, Tennessee","Robertson County, Tennessee",
    "Rutherford County, Tennessee","Smith County, Tennessee",
    "Sumner County, Tennessee","Trousdale County, Tennessee",
    "Williamson County, Tennessee","Wilson County, Tennessee")
  ) %>%
  select(NAME, Percent_E, Percent_M, Count_E) %>%
  rename(
    County  = NAME,
    Percent = Percent_E,
    Range   = Percent_M,
    Count   = Count_E
  ) %>%
  arrange(desc(Percent))

SelectedCounties <- SelectedCounties %>%
  mutate(
    CountyClean = str_remove(County, ", Tennessee"),
    Region = case_when(
      CountyClean == "Davidson County" ~ "Davidson",
      CountyClean %in% c(
        "Rutherford County","Williamson County","Cheatham County",
        "Robertson County","Sumner County","Wilson County"
      ) ~ "Doughnut",
      TRUE ~ "Non-doughnut"
    )
  )

RegionTableData <- SelectedCounties %>%
  group_by(Region) %>%
  summarise(
    Percent = mean(Percent, na.rm = TRUE),
    Range   = mean(Range, na.rm = TRUE),
    Count   = sum(Count, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Percent))

RegionTable <- RegionTableData %>%
  gt() %>%
  gt_theme_espn() %>%
  cols_label(
    Percent = "Percent (%)",
    Range   = "Margin of Error",
    Count   = "Count"
  ) %>%
  fmt_number(columns = c(Percent, Range), decimals = 1) %>%
  fmt_number(columns = c(Count), decimals = 0, use_seps = TRUE) %>%
  tab_header(
    title = "Estimates by Region",
    subtitle = "(2019–2023 ACS 5-Year ACS)"
  )

RegionTable

# ============================================================
# 13. MAP ENTIRE STATE
# ============================================================

mapdata3 <- mydata %>%
  rename(
    Percent   = Percent_E,
    PercentEM = Percent_M,
    Count     = Count_E,
    CountEM   = Count_M
  ) %>%
  st_as_sf() %>%
  st_transform(4326)

# Get Tennessee county boundaries for overlay
county_borders <- get_acs(
  geography = "county",
  state = "TN",
  variables = "B01001_001",  # dummy population variable
  year = 2023,
  geometry = TRUE
) %>%
  st_as_sf() %>%
  st_transform(4326)

# Viridis color palette
pal <- colorNumeric(
  palette = "viridis",
  domain = mapdata3$Percent
)

# Build leaflet map with county borders
DivisionMap3 <- leaflet(mapdata3) %>%
  # Base maps
  addProviderTiles("CartoDB.Positron", group = "Positron (Light)") %>%
  addProviderTiles("Esri.WorldStreetMap", group = "ESRI Streets") %>%
  addProviderTiles("Esri.WorldImagery", group = "ESRI Satellite") %>%
  # Main polygons
  addPolygons(
    fillColor = ~pal(Percent),
    fillOpacity = 0.7,
    color = "#333333",
    weight = 1,
    group = "Data Layer",
    popup = ~paste0(
      "<b>", Division, "</b><br>",
      "County: ", County, "<br>",
      "Percent: ", Percent, "%<br>",
      "Percent MOE: ±", PercentEM, "<br>",
      "Count: ", Count, "<br>",
      "Count MOE: ±", CountEM
    )
  ) %>%
  # County borders overlay
  addPolylines(
    data = county_borders,
    color = "black",
    weight = 1,
    opacity = 0.8,
    group = "County Borders"
  ) %>%
  # Legend
  addLegend(
    pal = pal,
    values = mapdata3$Percent,
    position = "topright",
    title = "Percent"
  ) %>%
  addLayersControl(
    baseGroups = c(
      "Positron (Light)",
      "ESRI Streets",
      "ESRI Satellite"
    ),
    overlayGroups = c("Data Layer", "County Borders"),
    options = layersControlOptions(
      collapsed = TRUE,
      position = "bottomleft"   # ?? Move control here
    )
  )

DivisionMap3

About the code

The code uses three key functions, case_when(), group_by() and summarize(), to distill the “Estimates by County” table’s data into the “Estimates by Region” table’s data.

The case_when() function

First, the case_when() function looks at each county’s name in the SelectedCounties data frame, classifies the county as either “Davidson,” “Doughnut,” or “Non-doughnut,” and puts the classification result in a new column called Region. Here’s the relevant section of the code:

SelectedCounties <- SelectedCounties %>%
  mutate(
    CountyClean = str_remove(County, ", Tennessee"),
    Region = case_when(
      CountyClean == "Davidson County" ~ "Davidson",
      CountyClean %in% c(
        "Rutherford County","Williamson County","Cheatham County",
        "Robertson County","Sumner County","Wilson County"
      ) ~ "Doughnut",
      TRUE ~ "Non-doughnut"
    )
  )

For more about how to use case_when(), see:

The group_by() and summarize() functions

Next, the group_by() and summarize() functions work together to collapse the data by the Region variable, then, within each region, average the percentages and error margins and add up the counts. Here’s the code:

RegionTableData <- SelectedCounties %>%
  group_by(Region) %>%
  summarise(
    Percent = mean(Percent, na.rm = TRUE),
    Range   = mean(Range, na.rm = TRUE),
    Count   = sum(Count, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Percent))

For more about how to use group_by() and summarize(), see:

Making the statewide map

Meanwhile, the code for making the statewide map works the same as the code for making the Nashville MSA map. The main difference is that the statewide map code omits this bit from the MSA map code:

filtereddata2 <- mydata %>%
  filter(
    County %in% c(
      "Cannon","Cheatham","Davidson","Dickson","Hickman",
      "Macon","Maury","Robertson","Rutherford","Smith",
      "Sumner","Trousdale","Williamson","Wilson"
    )
  )

With that filter omitted, the code simply uses all of the data in mydata data frame, rather than just the data for the counties in the Nashville MSA. So, when you map the data, you get borders and data for all counties in the state.