This last lesson in the “Practical R” series will show you how to make the Region table and statewide map in the output shown below. The region table shows the result of combining the counties that share a border with Davidson County into what Nashvillians ofen call “The Doughnut,” and the rest of the region’s counties (except for Davidson County, where Nashville is) into the “non-Doughnut” counties.
These geographic designations are meaningful in many ways. Davidson County is clearly the economic hub of the region. The counties that share a border with it - the “Doughnut” counties - have economies and cultures that are distinct from Davidson County but nonetheless closely tied to it, or are least more closely tied than to it than the counties farther away from Davidson.
The “Estimates by Region” table illustrates what I’m talking about. Generally - and certainly setting aside the incredibly high education level in Williamson County - education levels tend to decline as you move geographically outward from Davidson County. Many other ACS variables show the same pattern, or show it in reverse.
It’s a little table, but a lot goes into making it. See About the code for an explanation of the
case_when(), group_by() and
summarize() functions, which make the Estimates by Region
table possible and are all-around super handy data wrangling
functions.
The map, meanwhile, is just a statewide version of the Nashville MSA map. The main difference in the code is that the statewide map code simply omits the code that filters the mydata data frame for just the counties in the Nashville MSA.
But if you don’t have time for all of that, you can just skip to the Code, copy it, incorporate it into an R Markdown document, knit and publish the document, submit the URL, and be done.
Percent with a bachelor’s degree or higher
Rutherford County, 2019-2023. Source: American Community
Survey
Percent with a bachelor’s degree or higher
Nashville MSA, 2019-2023. Source: American Community
Survey.
| Estimates by County | |||
| (2019–2023 ACS 5-Year ACS) | |||
| County | Percent (%) | Margin of Error | Count |
|---|---|---|---|
| Williamson County, Tennessee | 61.8 | 1.2 | 167,620 |
| Davidson County, Tennessee | 47.3 | 0.7 | 493,272 |
| Wilson County, Tennessee | 37.2 | 1.2 | 105,888 |
| Rutherford County, Tennessee | 34.4 | 1.1 | 223,304 |
| Sumner County, Tennessee | 32.6 | 1.2 | 138,609 |
| Maury County, Tennessee | 26.5 | 1.5 | 73,091 |
| Cheatham County, Tennessee | 25.1 | 2.3 | 29,614 |
| Dickson County, Tennessee | 21.4 | 1.9 | 38,559 |
| Robertson County, Tennessee | 21.1 | 1.6 | 50,891 |
| Smith County, Tennessee | 17.9 | 2.3 | 14,180 |
| Cannon County, Tennessee | 16.9 | 3.2 | 10,397 |
| Hickman County, Tennessee | 13.6 | 2.2 | 18,044 |
| Macon County, Tennessee | 11.8 | 2.3 | 17,135 |
| Trousdale County, Tennessee | 10.1 | 2.7 | 8,490 |
| Estimates by Region | |||
| (2019–2023 ACS 5-Year ACS) | |||
| Region | Percent (%) | Margin of Error | Count |
|---|---|---|---|
| Davidson | 47.3 | 0.7 | 493,272 |
| Doughnut | 35.4 | 1.4 | 715,926 |
| Non-doughnut | 16.9 | 2.3 | 179,896 |
Percent with a bachelor’s degree or higher
Tennessee, 2019-2023. Source: American Community Survey.
Here is the code for making the region table and the statewide map.
# ============================================================
# 0. INSTALL AND LOAD REQUIRED PACKAGES
# ============================================================
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("leaflet")) install.packages("leaflet")
if (!require("leaflet.extras2")) install.packages("leaflet.extras2")
if (!require("gt")) install.packages("gt")
if (!require("gtExtras")) install.packages("gtExtras")
if (!require("plotly")) install.packages("plotly")
library(tidyverse)
library(tidycensus)
library(sf)
library(leaflet)
library(leaflet.extras2)
library(gt)
library(gtExtras)
library(plotly)
# ============================================================
# 1. CENSUS API KEY
# ============================================================
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# ============================================================
# 2. LOAD ACS CODEBOOKS
# ============================================================
DetailedTables <- load_variables(2023, "acs5", cache = TRUE)
SubjectTables <- load_variables(2023, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2023, "acs5/profile", cache = TRUE)
# ============================================================
# 3. DEFINE VARIABLES OF INTEREST
# ============================================================
VariableList =
c(
Count_ = "DP02_0059",
Percent_ = "DP02_0068P"
)
# ============================================================
# 4. FETCH COUNTY SUBDIVISION DATA (TENNESSEE)
# ============================================================
mydata <- get_acs(
geography = "county subdivision",
state = "TN",
variables = VariableList,
year = 2023,
survey = "acs5",
output = "wide",
geometry = TRUE
)
# ============================================================
# 5. CLEAN AND REFORMAT GEOGRAPHIC NAMES
# ============================================================
mydata <- mydata %>%
separate_wider_delim(
NAME,
delim = ", ",
names = c("Division", "County", "State")
) %>%
mutate(County = str_remove(County, " County"))
# ============================================================
# NOTE: See Practical R parts 1 & 2 for sections 6 through 11
# of this script
# ============================================================
# ============================================================
# 12. REGION-LEVEL AGGREGATION AND TABLE
# ============================================================
CountyLevelData <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2023,
survey = "acs5",
output = "wide",
geometry = FALSE
)
SelectedCounties <- CountyLevelData %>%
filter(NAME %in% c(
"Cannon County, Tennessee","Cheatham County, Tennessee",
"Davidson County, Tennessee","Dickson County, Tennessee",
"Hickman County, Tennessee","Macon County, Tennessee",
"Maury County, Tennessee","Robertson County, Tennessee",
"Rutherford County, Tennessee","Smith County, Tennessee",
"Sumner County, Tennessee","Trousdale County, Tennessee",
"Williamson County, Tennessee","Wilson County, Tennessee")
) %>%
select(NAME, Percent_E, Percent_M, Count_E) %>%
rename(
County = NAME,
Percent = Percent_E,
Range = Percent_M,
Count = Count_E
) %>%
arrange(desc(Percent))
SelectedCounties <- SelectedCounties %>%
mutate(
CountyClean = str_remove(County, ", Tennessee"),
Region = case_when(
CountyClean == "Davidson County" ~ "Davidson",
CountyClean %in% c(
"Rutherford County","Williamson County","Cheatham County",
"Robertson County","Sumner County","Wilson County"
) ~ "Doughnut",
TRUE ~ "Non-doughnut"
)
)
RegionTableData <- SelectedCounties %>%
group_by(Region) %>%
summarise(
Percent = mean(Percent, na.rm = TRUE),
Range = mean(Range, na.rm = TRUE),
Count = sum(Count, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(Percent))
RegionTable <- RegionTableData %>%
gt() %>%
gt_theme_espn() %>%
cols_label(
Percent = "Percent (%)",
Range = "Margin of Error",
Count = "Count"
) %>%
fmt_number(columns = c(Percent, Range), decimals = 1) %>%
fmt_number(columns = c(Count), decimals = 0, use_seps = TRUE) %>%
tab_header(
title = "Estimates by Region",
subtitle = "(2019–2023 ACS 5-Year ACS)"
)
RegionTable
# ============================================================
# 13. MAP ENTIRE STATE
# ============================================================
mapdata3 <- mydata %>%
rename(
Percent = Percent_E,
PercentEM = Percent_M,
Count = Count_E,
CountEM = Count_M
) %>%
st_as_sf() %>%
st_transform(4326)
# Get Tennessee county boundaries for overlay
county_borders <- get_acs(
geography = "county",
state = "TN",
variables = "B01001_001", # dummy population variable
year = 2023,
geometry = TRUE
) %>%
st_as_sf() %>%
st_transform(4326)
# Viridis color palette
pal <- colorNumeric(
palette = "viridis",
domain = mapdata3$Percent
)
# Build leaflet map with county borders
DivisionMap3 <- leaflet(mapdata3) %>%
# Base maps
addProviderTiles("CartoDB.Positron", group = "Positron (Light)") %>%
addProviderTiles("Esri.WorldStreetMap", group = "ESRI Streets") %>%
addProviderTiles("Esri.WorldImagery", group = "ESRI Satellite") %>%
# Main polygons
addPolygons(
fillColor = ~pal(Percent),
fillOpacity = 0.7,
color = "#333333",
weight = 1,
group = "Data Layer",
popup = ~paste0(
"<b>", Division, "</b><br>",
"County: ", County, "<br>",
"Percent: ", Percent, "%<br>",
"Percent MOE: ±", PercentEM, "<br>",
"Count: ", Count, "<br>",
"Count MOE: ±", CountEM
)
) %>%
# County borders overlay
addPolylines(
data = county_borders,
color = "black",
weight = 1,
opacity = 0.8,
group = "County Borders"
) %>%
# Legend
addLegend(
pal = pal,
values = mapdata3$Percent,
position = "topright",
title = "Percent"
) %>%
addLayersControl(
baseGroups = c(
"Positron (Light)",
"ESRI Streets",
"ESRI Satellite"
),
overlayGroups = c("Data Layer", "County Borders"),
options = layersControlOptions(
collapsed = TRUE,
position = "bottomleft" # ?? Move control here
)
)
DivisionMap3
The code uses three key functions, case_when(),
group_by() and summarize(), to distill the
“Estimates by County” table’s data into the “Estimates by Region”
table’s data.
The case_when() function
First, the case_when() function looks at each county’s
name in the SelectedCounties data frame, classifies the
county as either “Davidson,” “Doughnut,” or “Non-doughnut,” and puts the
classification result in a new column called Region. Here’s
the relevant section of the code:
SelectedCounties <- SelectedCounties %>%
mutate(
CountyClean = str_remove(County, ", Tennessee"),
Region = case_when(
CountyClean == "Davidson County" ~ "Davidson",
CountyClean %in% c(
"Rutherford County","Williamson County","Cheatham County",
"Robertson County","Sumner County","Wilson County"
) ~ "Doughnut",
TRUE ~ "Non-doughnut"
)
)
For more about how to use case_when(), see:
The group_by() and summarize()
functions
Next, the group_by() and summarize()
functions work together to collapse the data by the Region
variable, then, within each region, average the percentages and error
margins and add up the counts. Here’s the code:
RegionTableData <- SelectedCounties %>%
group_by(Region) %>%
summarise(
Percent = mean(Percent, na.rm = TRUE),
Range = mean(Range, na.rm = TRUE),
Count = sum(Count, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(Percent))
For more about how to use group_by() and summarize(),
see:
Making the statewide map
Meanwhile, the code for making the statewide map works the same as the code for making the Nashville MSA map. The main difference is that the statewide map code omits this bit from the MSA map code:
filtereddata2 <- mydata %>%
filter(
County %in% c(
"Cannon","Cheatham","Davidson","Dickson","Hickman",
"Macon","Maury","Robertson","Rutherford","Smith",
"Sumner","Trousdale","Williamson","Wilson"
)
)
With that filter omitted, the code simply uses all of the data in
mydata data frame, rather than just the data for the
counties in the Nashville MSA. So, when you map the data, you get
borders and data for all counties in the state.