This R Markdown file fulfills the Practical R Part 3
assignment. It produces a region‑level summary table for the 14 counties
that make up the Nashville metropolitan statistical area and a statewide
map showing the percentage of adults with at least a bachelor’s degree.
The document follows the numbered sections of the assignment and
includes an explanation of how the code works. To ensure the map renders
reliably even when the Census geometry API is unavailable, the
geographic shapes are retrieved using the tigris
package rather than via geometry = TRUE in
tidycensus::get_acs().
The ACS codebooks provide metadata for variables. Loading them caches the metadata for later use. We won’t display their contents here because they are large.
DetailedTables <- load_variables(2023, "acs5", cache = TRUE)
SubjectTables <- load_variables(2023, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2023, "acs5/profile", cache = TRUE)
The assignment calls for two measures from the ACS: the count and
percentage of adults age 25 or older with a bachelor’s degree or higher.
These correspond to DP02_0059 (estimate) and
DP02_0068P (percentage).
VariableList <- c(
Count_ = "DP02_0059",
Percent_ = "DP02_0068P"
)
VariableList
## Count_ Percent_
## "DP02_0059" "DP02_0068P"
We first download ACS estimates for all minor civil divisions (county
subdivisions) in Tennessee using get_acs(). We set
geometry = FALSE here because we will acquire the shapes
separately using tigris. After downloading, we split
the NAME field into Division,
County and State and strip the “County” suffix
for readability.
mydata <- get_acs(
geography = "county subdivision",
state = "TN",
variables = VariableList,
year = 2023,
survey = "acs5",
output = "wide",
geometry = FALSE,
show_progress = FALSE
)
mydata <- mydata %>%
separate_wider_delim(
NAME,
delim = ", ",
names = c("Division", "County", "State")
) %>%
mutate(County = str_remove(County, " County"))
head(mydata)
## # A tibble: 6 × 8
## GEOID Division County State Count_E Count_M Percent_E Percent_M
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 4700190002 District 1 Anderson Tennessee 6994 682 24 4.1
## 2 4700190192 District 2 Anderson Tennessee 7777 404 18.4 4.8
## 3 4700190382 District 3 Anderson Tennessee 7917 706 30.9 4.4
## 4 4700190572 District 4 Anderson Tennessee 6449 589 10.9 3
## 5 4700190762 District 5 Anderson Tennessee 6925 746 16.8 4.3
## 6 4700190952 District 6 Anderson Tennessee 6197 436 32.5 4.3
This step reproduces the “Estimates by Region” table from the
assignment. We first fetch county‑level ACS estimates (without
geometry), then filter for the 14 counties that make up the Nashville
metropolitan statistical area. Using case_when(), we
classify each county as either Davidson,
Doughnut (the six counties surrounding Davidson) or
Non‑doughnut. Finally, we compute the average
percentage of college‑educated adults, the average margin of error, and
the total count for each region.
CountyLevelData <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2023,
survey = "acs5",
output = "wide",
geometry = FALSE,
show_progress = FALSE
)
SelectedCounties <- CountyLevelData %>%
filter(NAME %in% c(
"Cannon County, Tennessee", "Cheatham County, Tennessee",
"Davidson County, Tennessee", "Dickson County, Tennessee",
"Hickman County, Tennessee", "Macon County, Tennessee",
"Maury County, Tennessee", "Robertson County, Tennessee",
"Rutherford County, Tennessee", "Smith County, Tennessee",
"Sumner County, Tennessee", "Trousdale County, Tennessee",
"Williamson County, Tennessee","Wilson County, Tennessee"
)) %>%
select(NAME, Percent_E, Percent_M, Count_E) %>%
rename(
County = NAME,
Percent = Percent_E,
Range = Percent_M,
Count = Count_E
) %>%
arrange(desc(Percent))
SelectedCounties <- SelectedCounties %>%
mutate(
CountyClean = str_remove(County, ", Tennessee"),
Region = case_when(
CountyClean == "Davidson County" ~ "Davidson",
CountyClean %in% c(
"Rutherford County", "Williamson County",
"Cheatham County", "Robertson County",
"Sumner County", "Wilson County"
) ~ "Doughnut",
TRUE ~ "Non-doughnut"
)
)
RegionTableData <- SelectedCounties %>%
group_by(Region) %>%
summarise(
Percent = mean(Percent, na.rm = TRUE),
Range = mean(Range, na.rm = TRUE),
Count = sum(Count, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(Percent))
RegionTable <- RegionTableData %>%
gt() %>%
gt_theme_espn() %>%
cols_label(
Percent = "Percent (%)",
Range = "Margin of Error",
Count = "Count"
) %>%
fmt_number(columns = c(Percent, Range), decimals = 1) %>%
fmt_number(columns = c(Count), decimals = 0, use_seps = TRUE) %>%
tab_header(
title = "Estimates by Region",
subtitle = "(2019–2023 ACS 5‑Year ACS)"
)
RegionTable
| Estimates by Region | |||
| (2019–2023 ACS 5‑Year ACS) | |||
| Region | Percent (%) | Margin of Error | Count |
|---|---|---|---|
| Davidson | 47.3 | 0.7 | 493,272 |
| Doughnut | 35.4 | 1.4 | 715,926 |
| Non-doughnut | 16.9 | 2.3 | 179,896 |
To build the statewide map, we first rename the ACS columns to shorter names and convert the estimates and margins of error to numeric. We then obtain cartographic boundary files for minor civil divisions and counties using tigris. After joining the estimates onto the subdivision shapes, we build an interactive Leaflet map.
# Rename and convert variables so that Percent and PercentEM are numeric
mapdata3 <- mydata %>%
rename(
Percent = Percent_E,
PercentEM = Percent_M,
Count = Count_E,
CountEM = Count_M
) %>%
mutate(
Percent = as.numeric(Percent),
PercentEM = as.numeric(PercentEM)
)
# Retrieve subdivision and county shapes via tigris
tn_subdivs <- county_subdivisions(state = "TN", cb = TRUE, year = 2023) %>%
st_transform(4326)
county_borders <- counties(state = "TN", cb = TRUE, year = 2023) %>%
st_transform(4326)
# Join the estimates onto the subdivision shapes using GEOID
map_sf <- tn_subdivs %>%
left_join(mapdata3, by = "GEOID")
# Colour palette for the map
pal <- colorNumeric(
palette = "viridis",
domain = map_sf$Percent
)
DivisionMap3 <- leaflet(map_sf) %>%
addProviderTiles("CartoDB.Positron", group = "Positron (Light)") %>%
addProviderTiles("Esri.WorldStreetMap", group = "ESRI Streets") %>%
addProviderTiles("Esri.WorldImagery", group = "ESRI Satellite") %>%
addPolygons(
fillColor = ~pal(Percent),
fillOpacity = 0.7,
color = "#333333",
weight = 1,
group = "Data Layer",
popup = ~paste0(
"<b>", NAME, "</b><br>",
"County: ", County, "<br>",
"Percent: ", Percent, "%<br>",
"Margin of Error: ±", PercentEM, "<br>",
"Count: ", Count, "<br>",
"Count MOE: ±", CountEM
)
) %>%
addPolylines(
data = county_borders,
color = "black",
weight = 1,
opacity = 0.8,
group = "County Borders"
) %>%
addLegend(
pal = pal,
values = map_sf$Percent,
position = "topright",
title = "Percent"
) %>%
addLayersControl(
baseGroups = c(
"Positron (Light)",
"ESRI Streets",
"ESRI Satellite"
),
overlayGroups = c("Data Layer", "County Borders"),
options = layersControlOptions(
collapsed = TRUE,
position = "bottomleft"
)
)
DivisionMap3
Several key functions make this script work:
get_acs() from
tidycensus downloads ACS estimates. We call it twice:
once for county subdivisions and once for counties. In both cases we
request estimates (output = "wide") and turn off geometry
because the Census geometry service can fail.
county_subdivisions() and
counties() from tigris
fetch cartographic boundary shapefiles for minor civil divisions and
counties. These functions are more reliable than requesting geometry
through get_acs().
case_when() classifies counties
into Davidson, Doughnut and
Non‑doughnut groups based on their names.
group_by() and
summarise() compute average percentages,
average margins of error and total counts for each region.
The gt package formats the regional summary into a clean table, while leaflet constructs an interactive statewide map with base layers, shaded polygons, pop‑ups and a legend.