I. Introduction

Data from the US Census Bureau reveal substantial inequality across Maryland. Part A examines the percentage of the population with a graduate degree, which differs sharply between the highest- and lowest-ranked counties, and shows how margins of error affect the interpretation of those results. Part B examines median household income at both the county and census tract levels. County-level mapping shows abrupt changes in income at county boundaries, whereas tract-level mapping gives a more nuanced picture, revealing pockets of high and low income within counties. The geographic scale at which a variable is mapped therefore matters, because different scales reveal different spatial patterns. The data for this project come from the US Census Bureau’s 2024 American Community Survey (ACS).

II. Data Preparation

Load packages

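library(tidycensus) # access to Census Bureau data (get_acs(), load_variables())
library(tidyverse)  # dplyr, ggplot2, stringr, tidyr, and related packages
library(plotly)     # interactive plots via ggplotly()
library(scales)     # axis and legend labels (label_percent(), label_dollar())
library(mapview)    # interactive maps
library(tigris)     # Census geometries
library(sf)         # simple features (spatial data frames)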

Retrieve Data

We will use the get_acs() function to retrieve American Community Survey data: the percentage of the population that has a graduate degree (variable code “DP02_0066P”) for each county in Maryland. The call below does not set the survey argument, so get_acs() uses its default, the 5-year ACS (survey = "acs5"). The 1-year ACS is published only for areas with at least 65,000 residents and would not cover every Maryland county.

# fetch data on the percentage of the population that has a graduate degree (DP02_0066P) 

md_grad <- get_acs(
  geography = "county", # at the county level
  state = "MD", # for the state of Maryland
  variables = c(percent_grad = "DP02_0066P"))

md_grad
## # A tibble: 24 × 5
##    GEOID NAME                          variable     estimate   moe
##    <chr> <chr>                         <chr>           <dbl> <dbl>
##  1 24001 Allegany County, Maryland     percent_grad      9.3   0.8
##  2 24003 Anne Arundel County, Maryland percent_grad     20.6   0.7
##  3 24005 Baltimore County, Maryland    percent_grad     19.1   0.4
##  4 24009 Calvert County, Maryland      percent_grad     16.6   1.1
##  5 24011 Caroline County, Maryland     percent_grad      6.7   1  
##  6 24013 Carroll County, Maryland      percent_grad     15.7   0.7
##  7 24015 Cecil County, Maryland        percent_grad     11.2   1  
##  8 24017 Charles County, Maryland      percent_grad     14.9   0.8
##  9 24019 Dorchester County, Maryland   percent_grad      7.4   1.1
## 10 24021 Frederick County, Maryland    percent_grad     20.5   0.7
## # ℹ 14 more rows

III. Analysis

Part A

Question 1. Which counties in Maryland have the largest percentages of graduate degree holders?

# sort the values in descending order
arrange(md_grad, desc(estimate))
## # A tibble: 24 × 5
##    GEOID NAME                          variable     estimate   moe
##    <chr> <chr>                         <chr>           <dbl> <dbl>
##  1 24027 Howard County, Maryland       percent_grad     34.1   0.9
##  2 24031 Montgomery County, Maryland   percent_grad     33.4   0.5
##  3 24003 Anne Arundel County, Maryland percent_grad     20.6   0.7
##  4 24021 Frederick County, Maryland    percent_grad     20.5   0.7
##  5 24005 Baltimore County, Maryland    percent_grad     19.1   0.4
##  6 24029 Kent County, Maryland         percent_grad     18.1   2.2
##  7 24510 Baltimore city, Maryland      percent_grad     18.1   0.5
##  8 24041 Talbot County, Maryland       percent_grad     17.7   1.7
##  9 24025 Harford County, Maryland      percent_grad     17     0.7
## 10 24009 Calvert County, Maryland      percent_grad     16.6   1.1
## # ℹ 14 more rows

Answer: Howard and Montgomery Counties have the largest percentages of graduate degree holders.

Question 2. Which counties in Maryland have the smallest percentages of graduate degree holders?

# sort the values in ascending order
arrange(md_grad, estimate)
## # A tibble: 24 × 5
##    GEOID NAME                        variable     estimate   moe
##    <chr> <chr>                       <chr>           <dbl> <dbl>
##  1 24011 Caroline County, Maryland   percent_grad      6.7   1  
##  2 24019 Dorchester County, Maryland percent_grad      7.4   1.1
##  3 24039 Somerset County, Maryland   percent_grad      7.6   1.6
##  4 24001 Allegany County, Maryland   percent_grad      9.3   0.8
##  5 24043 Washington County, Maryland percent_grad      9.9   0.8
##  6 24015 Cecil County, Maryland      percent_grad     11.2   1  
##  7 24045 Wicomico County, Maryland   percent_grad     11.7   1.1
##  8 24023 Garrett County, Maryland    percent_grad     12.4   1.6
##  9 24047 Worcester County, Maryland  percent_grad     13.9   1.3
## 10 24017 Charles County, Maryland    percent_grad     14.9   0.8
## # ℹ 14 more rows

Answer: Caroline, Dorchester, and Somerset Counties have the smallest percentages of graduate degree holders.
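
To put the disparity between the top and bottom of the ranking in numbers, the following sketch compares Howard County (34.1 ± 0.9) with Caroline County (6.7 ± 1.0), using the values printed in the tables above. The variable names are illustrative. The margin of error for the difference uses the standard approximation for ACS estimates (the square root of the sum of the squared margins of error), which assumes the two estimates are independent.

```r
# Estimates and margins of error copied from the tables above
howard   <- c(estimate = 34.1, moe = 0.9)
caroline <- c(estimate = 6.7,  moe = 1.0)

# Gap in percentage points and as a ratio
gap   <- howard[["estimate"]] - caroline[["estimate"]]
ratio <- howard[["estimate"]] / caroline[["estimate"]]

# Approximate margin of error of the difference (assumes independent estimates)
gap_moe <- sqrt(howard[["moe"]]^2 + caroline[["moe"]]^2)

cat(sprintf("Gap: %.1f points (+/- %.1f); ratio: %.1f to 1\n", gap, gap_moe, ratio))
```

Under these assumptions, the gap is roughly 27 percentage points, and Howard County's share of graduate degree holders is about five times Caroline County's.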

Question 3. Make a margin of error plot. Does this method work well for your state?

# create an error plot
md_grad_plot_errorbar <- ggplot(md_grad, # start from the md_grad data frame
                                aes(x = estimate,
                                    y = reorder(NAME, estimate))) + # order counties by estimate
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), # create error bars
                width = 0.5, linewidth = 0.5) +
  geom_point(color = "darkblue", size = 2) + 
  scale_x_continuous(labels = label_percent(scale = 1)) + # format x axis as percentages
  scale_y_discrete(labels = function(x) str_remove(x, " County, Maryland|, Maryland")) + # clean up y labels
  labs(title = "Percentage of Population with a Graduate Degree in Maryland, 2024 ACS", # add title
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.", # add caption
       x = "ACS estimate", # add labels
       y = "Maryland County") + 
  theme_minimal(base_size = 12)

# make error bar plot interactive
ggplotly(md_grad_plot_errorbar, tooltip = "x")

Answer: The error bar plot shows that Howard and Montgomery Counties have a much higher percentage of graduate degree holders than any other county in Maryland. For the remaining counties, the margins of error make comparisons much harder: many counties with estimates between 10% and 20% have overlapping error bars, so their relative ranking is uncertain.
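
One way to make such comparisons less subjective is a significance test on the difference between two estimates. The sketch below is an illustration rather than part of the original analysis. It assumes the margins of error are at the 90% confidence level at which ACS margins of error are published (critical value 1.645) and that the estimates are independent. The helper name is_significant_diff() is hypothetical.

```r
# Hypothetical helper: TRUE when two ACS estimates differ at the 90% level.
is_significant_diff <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / z_crit  # convert 90% margins of error to standard errors
  se2 <- moe2 / z_crit
  z   <- abs(est1 - est2) / sqrt(se1^2 + se2^2)
  z > z_crit
}

# Values copied from the sorted table in Question 1
howard_vs_montgomery   <- is_significant_diff(34.1, 0.9, 33.4, 0.5)
howard_vs_anne_arundel <- is_significant_diff(34.1, 0.9, 20.6, 0.7)
kent_vs_talbot         <- is_significant_diff(18.1, 2.2, 17.7, 1.7)

print(c(howard_vs_montgomery, howard_vs_anne_arundel, kent_vs_talbot))
```

Under these assumptions, Howard and Montgomery cannot be distinguished from each other, and neither can Kent and Talbot, while Howard is clearly distinct from third-ranked Anne Arundel.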

Part B

Load variables

# investigate variables from the 2024 5-year American Community Survey
vars <- load_variables(2024, "acs5")

# explore the variables
view(vars)

# filter to a variable of interest
# B19013 = Median household income (in the past 12 months)
vars %>%
  filter(str_detect(name, "B19013"))
## # A tibble: 10 × 4
##    name        label                                           concept geography
##    <chr>       <chr>                                           <chr>   <chr>    
##  1 B19013A_001 Estimate!!Median household income in the past … Median… <NA>     
##  2 B19013B_001 Estimate!!Median household income in the past … Median… <NA>     
##  3 B19013C_001 Estimate!!Median household income in the past … Median… <NA>     
##  4 B19013D_001 Estimate!!Median household income in the past … Median… <NA>     
##  5 B19013E_001 Estimate!!Median household income in the past … Median… <NA>     
##  6 B19013F_001 Estimate!!Median household income in the past … Median… <NA>     
##  7 B19013G_001 Estimate!!Median household income in the past … Median… <NA>     
##  8 B19013H_001 Estimate!!Median household income in the past … Median… <NA>     
##  9 B19013I_001 Estimate!!Median household income in the past … Median… <NA>     
## 10 B19013_001  Estimate!!Median household income in the past … Median… <NA>

The filter returns ten versions of median household income. The lettered tables (B19013A through B19013I) report it separately by race and ethnicity of the householder, while B19013_001 covers all households. We will use B19013_001.
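
The distinction between the overall table and its lettered variants can be made explicit with a regular expression. The sketch below uses a hand-typed vector of the ten variable names shown above rather than the full vars table, and base R's grepl() in place of str_detect() so that it runs without extra packages. The object names are illustrative.

```r
# The ten variable names returned by the filter above
income_vars <- c(paste0("B19013", LETTERS[1:9], "_001"), "B19013_001")

# Lettered tables (B19013A to B19013I): a capital letter follows the table number
lettered <- income_vars[grepl("^B19013[A-I]_", income_vars)]

# Overall table: the underscore follows the table number directly
overall <- income_vars[grepl("^B19013_", income_vars)]

print(overall)
print(length(lettered))
```

Anchoring the pattern with ^ and including the underscore keeps the match from picking up the lettered tables by accident.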

With a variable selected, we will now retrieve this data for Maryland at the county level.

Retrieve Data

# fetch data on the median household income (B19013_001) at the county level for Maryland 
md_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "MD",
  year = 2024,
  survey = "acs5",
  geometry = TRUE, 
  progress_bar = FALSE
)

Create an interactive map using mapview()

mapview(md_income, zcol = "estimate")

This map makes it look as though median household income changes abruptly at county boundaries. Let’s map the same variable at the census tract level to see it with more granularity.

Retrieve the data at the census tract level

# fetch data on the median household income (B19013_001) at the census tract level for Maryland 
md_tract_income <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = "MD",
  year = 2024,
  survey = "acs5",
  geometry = TRUE,
  progress_bar = FALSE
)

mapview(md_tract_income, zcol = "estimate")

At the census tract level, median household income changes more gradually across space than the county map suggests. The tract-level estimates also span a wider range than the county-level estimates.
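
The range comparison can be checked directly from the two objects. The sketch below wraps the calculation in a hypothetical helper, estimate_range(), and demonstrates it on made-up toy values (not real ACS figures) so that it runs without a Census API call. With the real data, the calls would be estimate_range(md_income) and estimate_range(md_tract_income). Missing values are dropped because tract-level estimates can be missing.

```r
# Hypothetical helper: smallest and largest estimate, ignoring missing values
estimate_range <- function(df) {
  range(df$estimate, na.rm = TRUE)
}

# Toy data for illustration only; these are NOT real ACS figures
toy_county <- data.frame(estimate = c(60000, 90000, 120000))
toy_tract  <- data.frame(estimate = c(25000, 58000, NA, 95000, 180000))

county_range <- estimate_range(toy_county)
tract_range  <- estimate_range(toy_tract)

# Width of each range: the tract-level spread is wider in this toy example
print(diff(county_range))
print(diff(tract_range))
```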

Create a choropleth map

# choropleth map using ggplot
md_tract_income_map <- ggplot(md_tract_income) +
  geom_sf(aes(fill = estimate), color = NA) +
  scale_fill_viridis_c(
    labels = label_dollar(),
    name = "Median household\nincome"
  ) +
  labs(
    title = "Median Household Income by Census Tract in Maryland",
    subtitle = "2024 ACS 5-year estimates",
    caption = "Data acquired with R using tidycensus",
    x = NULL,
    y = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text = element_blank(),
    panel.grid = element_blank(),
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )

md_tract_income_map

This choropleth map shows median household income for census tracts in Maryland. It highlights pockets of high and low income within counties and reveals a more gradual spatial pattern of change in median household income.