Data from the US Census Bureau reveal clear geographic inequality across the state of Maryland. Part A will explore the percentage of the population in Maryland that has a graduate level degree, showing a large disparity between the counties with the highest and lowest percentages. We will also explore how margins of error affect interpretation of the results. Part B will explore median household income in Maryland at both the county and census tract level. While county-level mapping shows abrupt changes in income between counties, census tract-level mapping shows a more nuanced perspective, revealing small clusters of high- and low-income pockets within counties. Overall, it’s important to consider the geographic scale at which a variable is mapped, because different scales can reveal different geographic patterns. The data for this project is from the US Census Bureau’s 2024 1-year and 5-year American Community Surveys.
library(tidycensus)
library(tidyverse)
library(plotly)
library(tidyr)
library(scales)
library(mapview)
library(tigris)
library(sf)
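We will use the get_acs() function to retrieve the percentage of the population that has a graduate degree (variable code “DP02_0066P”) at the county level for the state of Maryland. Note that the call below sets neither survey nor year, so get_acs() falls back to its defaults, and its default survey is the 5-year ACS ("acs5"). The output is consistent with that: all 24 Maryland counties and county equivalents are returned, whereas the 1-year ACS only publishes estimates for areas with a population of 65,000 or more. To request 1-year data, set survey = "acs1" and year explicitly in the call.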
# fetch data on the percentage of the population that has a graduate degree (DP02_0066P)
md_grad <- get_acs(
  geography = "county", # at the county level
  state = "MD", # for the state of Maryland
  variables = c(percent_grad = "DP02_0066P")
)
md_grad
## # A tibble: 24 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 24001 Allegany County, Maryland percent_grad 9.3 0.8
## 2 24003 Anne Arundel County, Maryland percent_grad 20.6 0.7
## 3 24005 Baltimore County, Maryland percent_grad 19.1 0.4
## 4 24009 Calvert County, Maryland percent_grad 16.6 1.1
## 5 24011 Caroline County, Maryland percent_grad 6.7 1
## 6 24013 Carroll County, Maryland percent_grad 15.7 0.7
## 7 24015 Cecil County, Maryland percent_grad 11.2 1
## 8 24017 Charles County, Maryland percent_grad 14.9 0.8
## 9 24019 Dorchester County, Maryland percent_grad 7.4 1.1
## 10 24021 Frederick County, Maryland percent_grad 20.5 0.7
## # ℹ 14 more rows
# sort the values in descending order
arrange(md_grad, desc(estimate))
## # A tibble: 24 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 24027 Howard County, Maryland percent_grad 34.1 0.9
## 2 24031 Montgomery County, Maryland percent_grad 33.4 0.5
## 3 24003 Anne Arundel County, Maryland percent_grad 20.6 0.7
## 4 24021 Frederick County, Maryland percent_grad 20.5 0.7
## 5 24005 Baltimore County, Maryland percent_grad 19.1 0.4
## 6 24029 Kent County, Maryland percent_grad 18.1 2.2
## 7 24510 Baltimore city, Maryland percent_grad 18.1 0.5
## 8 24041 Talbot County, Maryland percent_grad 17.7 1.7
## 9 24025 Harford County, Maryland percent_grad 17 0.7
## 10 24009 Calvert County, Maryland percent_grad 16.6 1.1
## # ℹ 14 more rows
Answer: Howard and Montgomery Counties have the largest percentages of graduate degree holders.
# sort the values in ascending order
arrange(md_grad, estimate)
## # A tibble: 24 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 24011 Caroline County, Maryland percent_grad 6.7 1
## 2 24019 Dorchester County, Maryland percent_grad 7.4 1.1
## 3 24039 Somerset County, Maryland percent_grad 7.6 1.6
## 4 24001 Allegany County, Maryland percent_grad 9.3 0.8
## 5 24043 Washington County, Maryland percent_grad 9.9 0.8
## 6 24015 Cecil County, Maryland percent_grad 11.2 1
## 7 24045 Wicomico County, Maryland percent_grad 11.7 1.1
## 8 24023 Garrett County, Maryland percent_grad 12.4 1.6
## 9 24047 Worcester County, Maryland percent_grad 13.9 1.3
## 10 24017 Charles County, Maryland percent_grad 14.9 0.8
## # ℹ 14 more rows
Answer: Caroline, Dorchester, and Somerset Counties have the smallest percentages of graduate degree holders.
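To put the disparity between the top and bottom of the ranking into numbers, here is a minimal sketch in base R. The estimates are copied by hand from the two sorted outputs above rather than pulled from md_grad, and the object names (grad, top, bottom, gap, ratio) are only illustrative.

```r
# Estimates copied from the sorted outputs above
# (percent of the population with a graduate degree)
grad <- data.frame(
  county   = c("Howard", "Montgomery", "Caroline", "Dorchester", "Somerset"),
  estimate = c(34.1, 33.4, 6.7, 7.4, 7.6)
)

# Rows with the highest and lowest estimates
top    <- grad[which.max(grad$estimate), ]
bottom <- grad[which.min(grad$estimate), ]

# Gap in percentage points and ratio between the two extremes
gap   <- top$estimate - bottom$estimate
ratio <- top$estimate / bottom$estimate

cat(top$county, "vs", bottom$county, ":",
    round(gap, 1), "percentage points,",
    round(ratio, 1), "times as high\n")
```

With the full md_grad tibble loaded, the same calculation could be run on md_grad$estimate directly instead of on hand-copied values.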
# create an error bar plot from the md_grad data frame
md_grad_plot_errorbar <- ggplot(md_grad, aes(x = estimate,
                                             y = reorder(NAME, estimate))) + # reorder counties by estimate
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), # error bars span estimate +/- moe
                width = 0.5, linewidth = 0.5) +
  geom_point(color = "darkblue", size = 2) +
  scale_x_continuous(labels = label_percent(scale = 1)) + # format x axis as percentages
  scale_y_discrete(labels = function(x) str_remove(x, " County, Maryland|, Maryland")) + # clean up y labels
  labs(title = "Percentage of Population with a Graduate Degree in Maryland, 2024 ACS", # add title
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.", # add caption
       x = "ACS estimate", # add axis labels
       y = "Maryland County") +
  theme_minimal(base_size = 12)
# make error bar plot interactive
ggplotly(md_grad_plot_errorbar, tooltip = "x")
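The y-axis labels in the plot are shortened with a regular expression that has two alternatives. The sketch below shows what that pattern does to three of the NAME values from the output above. It uses base R's sub() in place of stringr's str_remove() so that it runs without extra packages; the pattern itself is unchanged, and the object names are illustrative.

```r
# Three NAME values taken from the output above
county_names <- c("Howard County, Maryland",
                  "Baltimore County, Maryland",
                  "Baltimore city, Maryland")

# Same pattern as in the plot code: strip " County, Maryland" when present,
# otherwise strip only ", Maryland"
short_labels <- sub(" County, Maryland|, Maryland", "", county_names)

print(short_labels)
```

Because the second alternative removes only ", Maryland", Baltimore city keeps the word "city" and stays distinguishable from Baltimore County on the axis.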
Answer: The resulting error bar plot shows that Howard and Montgomery Counties have a much higher percentage of graduate degree holders than the other counties in Maryland. Once margins of error are taken into account, however, comparing the remaining counties becomes much more difficult. For example, many of the counties with estimates between 10% and 20% have overlapping error bars, so their relative ranking is uncertain.
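One way to make the comparison between two counties concrete is the standard check for a statistically significant difference between two ACS estimates. The sketch below assumes the margins of error are at the 90 percent confidence level (the level at which ACS margins of error are published), so each standard error is the MOE divided by 1.645, and it treats the two estimates as independent. The function name is_significant is hypothetical, and the inputs are copied by hand from the output above.

```r
# Test whether two ACS estimates differ at the 90 percent confidence level.
# Assumes both MOEs are 90 percent margins of error and the estimates
# are independent.
is_significant <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645                      # standard error of estimate 1
  se2 <- moe2 / 1.645                      # standard error of estimate 2
  z <- abs(est1 - est2) / sqrt(se1^2 + se2^2)
  z > z_crit
}

# Howard (34.1 +/- 0.9) vs Montgomery (33.4 +/- 0.5)
is_significant(34.1, 0.9, 33.4, 0.5)   # FALSE

# Anne Arundel (20.6 +/- 0.7) vs Baltimore County (19.1 +/- 0.4)
is_significant(20.6, 0.7, 19.1, 0.4)   # TRUE

# Kent (18.1 +/- 2.2) vs Talbot (17.7 +/- 1.7)
is_significant(18.1, 2.2, 17.7, 1.7)   # FALSE
```

Under these assumptions, Howard and Montgomery cannot be told apart from each other, even though both are clearly separated from the rest of the state.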
# investigate variables from the 2024 5-year American Community Survey
vars <- load_variables(2024, "acs5")
# explore the variables
view(vars)
# filter to a variable of interest
# B19013 = Median household income (in the past 12 months)
vars %>%
filter(str_detect(name, "B19013"))
## # A tibble: 10 × 4
## name label concept geography
## <chr> <chr> <chr> <chr>
## 1 B19013A_001 Estimate!!Median household income in the past … Median… <NA>
## 2 B19013B_001 Estimate!!Median household income in the past … Median… <NA>
## 3 B19013C_001 Estimate!!Median household income in the past … Median… <NA>
## 4 B19013D_001 Estimate!!Median household income in the past … Median… <NA>
## 5 B19013E_001 Estimate!!Median household income in the past … Median… <NA>
## 6 B19013F_001 Estimate!!Median household income in the past … Median… <NA>
## 7 B19013G_001 Estimate!!Median household income in the past … Median… <NA>
## 8 B19013H_001 Estimate!!Median household income in the past … Median… <NA>
## 9 B19013I_001 Estimate!!Median household income in the past … Median… <NA>
## 10 B19013_001 Estimate!!Median household income in the past … Median… <NA>
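There are several versions of the median household income variable. The lettered tables (B19013A through B19013I) report median household income by the race or ethnicity of the householder. We will select variable code B19013_001, which covers all households regardless of race.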
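The difference between the lettered variables and the overall one can also be picked out programmatically. This is a small base R sketch using the ten variable names from the output above. The names acs_names and is_race_iteration are illustrative, and grepl() stands in for str_detect().

```r
# The ten variable names returned by the filter above
acs_names <- c(paste0("B19013", LETTERS[1:9], "_001"), "B19013_001")

# A letter (A through I) right after the table number marks a
# race/ethnicity breakdown of the table
is_race_iteration <- grepl("^B19013[A-I]_", acs_names)

# Only the overall median household income variable remains
print(acs_names[!is_race_iteration])
```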
With a variable selected, we will now retrieve this data for Maryland at the county level.
# fetch data on the median household income (B19013_001) at the county level for Maryland
md_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "MD",
  year = 2024,
  survey = "acs5",
  geometry = TRUE,
  progress_bar = FALSE
)
mapview(md_income, zcol = "estimate")
This map makes it look like median household income changes abruptly at county boundaries. Let’s visualize the same variable at the census tract level to see it with more granularity.
Next, look at the same data at the census tract level.
# fetch data on the median household income (B19013_001) at the census tract level for Maryland
md_tract_income <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "MD",
  year = 2024,
  survey = "acs5",
  geometry = TRUE,
  progress_bar = FALSE
)
mapview(md_tract_income, zcol = "estimate")
Looking at the data by census tract shows that changes in median household income are more gradual than the county map suggests. The range of median household income is also greater at the census tract level than at the county level.
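The claim about range can be checked numerically by comparing the minimum and maximum estimates at each geography. The sketch below defines a hypothetical helper, estimate_range(), and runs it on two small toy data frames whose values are made up purely to show the mechanics; they are not ACS figures. The helper only needs an estimate column, so it should work the same way on md_income and md_tract_income.

```r
# Hypothetical helper: min, max, and spread of the estimate column.
# na.rm = TRUE because some areas may have missing estimates.
estimate_range <- function(df) {
  rng <- range(df$estimate, na.rm = TRUE)
  c(min = rng[1], max = rng[2], spread = rng[2] - rng[1])
}

# Toy values for illustration only (NOT real ACS data)
toy_county <- data.frame(estimate = c(60000, 90000, 120000))
toy_tract  <- data.frame(estimate = c(25000, 60000, NA, 120000, 200000))

print(estimate_range(toy_county))
print(estimate_range(toy_tract))

# With the real objects loaded, the equivalent calls would be:
# estimate_range(md_income)
# estimate_range(md_tract_income)
```

A wider spread at the tract level is expected in general, since county medians summarize over many tracts with very different incomes.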
# choropleth map using ggplot
md_tract_income_map <- ggplot(md_tract_income) +
  geom_sf(aes(fill = estimate), color = NA) +
  scale_fill_viridis_c(
    labels = label_dollar(),
    name = "Median household\nincome"
  ) +
  labs(
    title = "Median Household Income by Census Tract in Maryland",
    subtitle = "2024 ACS 5-year estimates",
    caption = "Data acquired with R using tidycensus",
    x = NULL,
    y = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text = element_blank(),
    panel.grid = element_blank(),
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )
md_tract_income_map
This choropleth map shows the median household income for census tracts in Maryland. It highlights pockets of high- and low-income areas within counties and reveals a more gradual spatial pattern of change in median household income.