Introduction

This report explores socioeconomic patterns using data from the American Community Survey (ACS). It has two main analyses, Part A and Part B. Part A is a non-spatial analysis of the percentage of the population with a graduate degree across Florida counties. Part B is a spatial analysis of median household income across the same counties.

The goal of this report is to demonstrate how ACS data can be accessed, processed and visualized using R. Both interactive and static visualizations are used to better understand patterns and variability across geographic regions.

Data Sources

U.S. Census Bureau, American Community Survey 5-Year Estimates (2019–2023). Retrieved using the tidycensus package in R.

Data Preparation

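library(tidyverse)   # dplyr, stringr, ggplot2 and other data-wrangling tools
library(tidycensus)  # access to the Census Bureau APIs
library(scales)      # axis and legend label formatting
library(plotly)      # interactive conversion of ggplot objects
library(ggiraph)     # interactive ggplot2 geoms
library(sf)          # spatial (simple features) data
library(mapview)     # interactive web maps

# A Census API key is required; it can be stored once with
# census_api_key("YOUR_KEY", install = TRUE)

# Set state
state_name <- "Florida"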

The data for both Part A and Part B come from the U.S. Census Bureau's American Community Survey 5-year estimates. The tidycensus package retrieves the data directly into R through the Census Bureau's API, which requires an API key issued by the Census Bureau.

Part A: Non-spatial Analysis

Retrieving Graduate Degree Data

In this section, the graduate degree data is retrieved from the ACS Data Profile, cleaned, and sorted from the highest to the lowest percentage.

# Percentage with a graduate degree (ACS Data Profile variable)
grad_deg <- get_acs(
  geography = "county",
  state = state_name,
  variables = "DP02_0066P",
  survey = "acs5",
  year = 2023
)
## Getting data from the 2019-2023 5-year ACS
## Using the ACS Data Profile
# Keep the key columns, strip ", Florida" from the county names, and sort high to low
grad_deg_clean <- grad_deg %>%
  select(GEOID, NAME, estimate, moe) %>%
  mutate(
    county = str_remove(NAME, paste0(", ", state_name))
  ) %>%
  arrange(desc(estimate))

Counties with the Highest and Lowest Percentages

The data is used to answer two questions directly. Which Florida counties have the highest percentages of graduate degree holders, and which have the lowest?

highest_counties <- grad_deg_clean %>%
  slice_max(order_by = estimate, n = 5)

lowest_counties <- grad_deg_clean %>%
  slice_min(order_by = estimate, n = 5)

highest_counties %>% select(county, estimate, moe)
## # A tibble: 5 × 3
##   county           estimate   moe
##   <chr>               <dbl> <dbl>
## 1 Alachua County       24.7   0.9
## 2 Leon County          21     0.8
## 3 St. Johns County     18.4   0.8
## 4 Sarasota County      16.9   0.6
## 5 Collier County       16.4   0.7
lowest_counties %>% select(county, estimate, moe)
## # A tibble: 5 × 3
##   county           estimate   moe
##   <chr>               <dbl> <dbl>
## 1 Hendry County         2.5   1  
## 2 Glades County         2.8   1  
## 3 Dixie County          2.9   1.1
## 4 Hamilton County       3.3   1.1
## 5 Lafayette County      3.3   1.7

The results show clear variation in graduate degree attainment across counties, ranging from 24.7% in Alachua County to 2.5% in Hendry County. The highest percentages are in Alachua, Leon, and St. Johns counties, and the lowest are in Hendry, Glades, and Dixie counties.

Margin of Error Plot

# Dot plot of each county's estimate with horizontal error bars for the MOE
grad_plot <- ggplot(
  grad_deg_clean,
  aes(x = estimate, y = reorder(county, estimate))
) +
  # Horizontal error bars; geom_errorbarh() is deprecated in recent ggplot2
  geom_errorbar(
    aes(xmin = estimate - moe, xmax = estimate + moe),
    width = 0.2,
    orientation = "y",
    color = "black"
  ) +
  geom_point(color = "darkred", size = 2.5) +
  scale_x_continuous(labels = label_number(suffix = "%")) +
  labs(
    title = "Adults with a Graduate Degree by County",
    subtitle = "Florida (2019–2023 ACS)",
    x = "Percentage (%)",
    y = NULL
  ) +
  theme_minimal()

grad_plot
Fig 1.1: This plot shows both the estimated percentage and the margin of error for each county. Data: ACS 5-Year Estimates. Error bars represent the 90% margin of error.


This plot shows each estimate together with its uncertainty. Counties with wide error bars should be interpreted cautiously. When the error bars of two counties overlap, their true values may not be significantly different.
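Overlapping error bars can be checked with a formal test instead of by eye. The sketch below follows the approach the Census Bureau describes for comparing two ACS estimates. Each 90% MOE is divided by 1.645 to get a standard error. The difference is treated as significant at the 90% level when |Z| > 1.645. The sketch assumes 90% MOEs (the tidycensus default) and ignores any covariance between the two estimates. The helper name acs_significant_difference is hypothetical.

```r
# Sketch: test whether two ACS estimates differ significantly.
# Assumes 90% MOEs (tidycensus default, moe_level = 90) and no covariance
# between the estimates. The helper name is hypothetical.
acs_significant_difference <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  # Convert each 90% MOE to a standard error
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  # Z statistic for the difference between the two estimates
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)
  # Significant at the 90% level when |Z| exceeds the critical value
  list(z = z, significant = abs(z) > z_crit)
}

# Estimates and MOEs (percent with a graduate degree) from the tables above
alachua_vs_leon  <- acs_significant_difference(24.7, 0.9, 21.0, 0.8)
hendry_vs_glades <- acs_significant_difference(2.5, 1.0, 2.8, 1.0)

round(alachua_vs_leon$z, 2)
round(hendry_vs_glades$z, 2)
```

With these values, Alachua and Leon differ significantly (Z ≈ 5.05). Hendry and Glades do not (Z ≈ -0.35), even though the Glades estimate is slightly higher.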

Interactive Plot

ggplotly(grad_plot)

The interactive version makes county-level values easier to explore. Hovering over a point displays its values.

Interactive Plot (optional ggiraph version)

# Same plot, with ggiraph tooltips and hover highlighting keyed on GEOID
grad_plot_ggiraph <- ggplot(
  grad_deg_clean,
  aes(
    x = estimate,
    y = reorder(county, estimate),
    tooltip = paste0(
      "County: ", county,
      "<br>Estimate: ", round(estimate, 1), "%",
      "<br>MOE: ±", round(moe, 1)
    ),
    data_id = GEOID
  )
) +
  # Horizontal error bars; geom_errorbarh() is deprecated in recent ggplot2
  geom_errorbar(
    aes(xmin = estimate - moe, xmax = estimate + moe),
    width = 0.2,
    orientation = "y",
    color = "black"
  ) +
  geom_point_interactive(color = "darkred", size = 2.5) +
  scale_x_continuous(labels = label_number(suffix = "%")) +
  labs(
    title = "Adults with a Graduate Degree by County",
    subtitle = paste("Counties in", state_name),
    x = "Percentage (%)",
    y = NULL
  ) +
  theme_minimal()

girafe(ggobj = grad_plot_ggiraph)

The ggiraph version adds custom tooltips showing each county's name, estimate, and MOE. It also supports hover highlighting through data_id, which makes county-by-county comparison easier.

Part B: Spatial Analysis

Exploring Variables

For this portion, Census data on median household income in Florida counties was selected for the spatial analysis. It was chosen because county education levels and household income are expected to be positively correlated. This portion also served as practice with Census Bureau income data for my own term project.

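The first step is to identify the correct variable. load_variables() lists every ACS 5-year variable for 2023. Filtering the labels for "Median household income" shows that B19013_001 is the estimate for all households. B19013A_001 through B19013I_001 are breakdowns by the race and Hispanic origin of the householder. The median household income data (B19013_001) is then retrieved from the Census Bureau and cleaned in R.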

acs_vars <- load_variables(2023, "acs5")

acs_vars %>%
  filter(str_detect(label, "Median household income")) %>%
  select(name, label, concept)
## # A tibble: 25 × 3
##    name        label                                                     concept
##    <chr>       <chr>                                                     <chr>  
##  1 B19013A_001 Estimate!!Median household income in the past 12 months … Median…
##  2 B19013B_001 Estimate!!Median household income in the past 12 months … Median…
##  3 B19013C_001 Estimate!!Median household income in the past 12 months … Median…
##  4 B19013D_001 Estimate!!Median household income in the past 12 months … Median…
##  5 B19013E_001 Estimate!!Median household income in the past 12 months … Median…
##  6 B19013F_001 Estimate!!Median household income in the past 12 months … Median…
##  7 B19013G_001 Estimate!!Median household income in the past 12 months … Median…
##  8 B19013H_001 Estimate!!Median household income in the past 12 months … Median…
##  9 B19013I_001 Estimate!!Median household income in the past 12 months … Median…
## 10 B19013_001  Estimate!!Median household income in the past 12 months … Median…
## # ℹ 15 more rows
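The lettered suffixes in the output above (A through I) mark race and Hispanic-origin iterations of the base table B19013. The sketch below picks out the all-households variable with a regular expression, using the variable names printed above. It assumes that a base-table variable is the table ID followed directly by an underscore and a three-digit line number.

```r
# Sketch: separate the all-households variable from the lettered
# race/ethnicity iterations of table B19013 (names copied from the output above).
var_names <- c("B19013A_001", "B19013B_001", "B19013C_001", "B19013D_001",
               "B19013E_001", "B19013F_001", "B19013G_001", "B19013H_001",
               "B19013I_001", "B19013_001")

# Keep only the base table: "B19013_" followed by a three-digit line number
total_var <- var_names[grepl("^B19013_[0-9]{3}$", var_names)]
total_var
```

With the full variable list, the same pattern could be applied as acs_vars %>% filter(str_detect(name, "^B19013_[0-9]{3}$")).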

Retrieving and Cleaning Spatial Data

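# Median household income (all households), with county boundaries attached
income_data <- get_acs(
  geography = "county",
  state = state_name,
  variables = "B19013_001",
  survey = "acs5",
  year = 2023,
  geometry = TRUE
)

# Strip ", Florida" from the county names
income_clean <- income_data %>%
  mutate(
    county = str_remove(NAME, paste0(", ", state_name))
  )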

Interactive Map Creation

mapview(
  income_clean,
  zcol = "estimate",
  layer.name = "Median Household Income"
)

This interactive map lets users pan, zoom, and click individual counties to explore how income varies across Florida. It would pair well with an interactive map of graduate degree rates for comparing education and income side by side.

Choropleth Map Creation

income_map <- ggplot(income_clean) +
  geom_sf(aes(fill = estimate), color = "white", linewidth = 0.2) +
  scale_fill_viridis_c(
    labels = dollar_format()
  ) +
  labs(
    title = "Median Household Income by County",
    subtitle = "Florida (2019–2023 ACS)",
    fill = "Income",
    caption = "Data: ACS 5-Year Estimates"
  ) +
  theme_minimal()

income_map

This choropleth map highlights spatial patterns in income. On the continuous viridis color scale, lighter (yellow) shades mark higher median incomes, so higher-income counties stand out clearly from lower-income ones.

Analysis

The results from both analyses reveal clear geographic patterns in education and income. Counties with higher graduate degree attainment often appear to align with higher-income regions. This is consistent with a relationship between education and economic outcomes, although visual comparison alone does not measure its strength or establish causation.
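This visual alignment could be quantified with a correlation coefficient. The sketch below joins the two county datasets on GEOID and computes Spearman's rank correlation. This measures whether counties with more graduate degree holders also tend to have higher median incomes, without assuming a linear relationship. The function name is hypothetical. The sketch assumes both inputs are plain data frames with GEOID and estimate columns.

```r
# Sketch: rank correlation between graduate degree share and median income.
# Assumes both inputs are plain data frames (no sf geometry) with GEOID and
# estimate columns. The function name is hypothetical.
education_income_correlation <- function(grad_df, income_df) {
  # Join the two county datasets on GEOID, renaming the estimate columns
  joined <- merge(
    grad_df[, c("GEOID", "estimate")],
    income_df[, c("GEOID", "estimate")],
    by = "GEOID",
    suffixes = c("_grad", "_income")
  )
  # Spearman's rho compares ranks, so it captures monotonic (not only linear) trends
  cor(joined$estimate_grad, joined$estimate_income,
      method = "spearman", use = "complete.obs")
}
```

It could be called as education_income_correlation(grad_deg_clean, sf::st_drop_geometry(income_clean)). The geometry is dropped first because subsetting columns of an sf object keeps the geometry column.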

The margin of error (MOE) plot shows why uncertainty must be considered when interpreting ACS data. The spatial visualizations add geographic context, which helps identify clusters of higher and lower income.
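One common way to summarize the uncertainty of a single ACS estimate is the coefficient of variation (CV), which is the standard error divided by the estimate. The sketch below assumes 90% MOEs, so SE = MOE / 1.645. It uses four counties from the tables above. acs_cv is a hypothetical helper name.

```r
# Sketch: coefficient of variation (CV) as a reliability measure for ACS estimates.
# Assumes 90% MOEs (tidycensus default): SE = MOE / 1.645, CV = SE / estimate.
acs_cv <- function(estimate, moe) {
  (moe / 1.645) / estimate * 100  # CV expressed as a percentage
}

# Graduate degree estimates and MOEs copied from the tables above
counties <- data.frame(
  county   = c("Alachua County", "Leon County", "Hendry County", "Lafayette County"),
  estimate = c(24.7, 21.0, 2.5, 3.3),
  moe      = c(0.9, 0.8, 1.0, 1.7)
)
counties$cv <- round(acs_cv(counties$estimate, counties$moe), 1)

# Least reliable (highest CV) first
counties[order(-counties$cv), ]
```

Lafayette County's CV is about 31%, compared with about 2% for Alachua County. MOEs of similar absolute size weigh far more heavily on small estimates.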

Overall, this exercise shows how combining statistical and spatial analysis can provide a more comprehensive understanding of socioeconomic patterns.