This report explores socioeconomic patterns using data from the American Community Survey (ACS). Two main analyses are conducted, presented as Part A and Part B. Part A is a non-spatial analysis of the percentage of the population with a graduate degree across counties in Florida. Part B is a spatial analysis of median household income across the same counties.
The goal of this report is to demonstrate how ACS data can be accessed, processed and visualized using R. Both interactive and static visualizations are used to better understand patterns and variability across geographic regions.
U.S. Census Bureau, American Community Survey 5-Year Estimates. Retrieved using the tidycensus package in R.
library(tidyverse)
library(tidycensus)
library(ggplot2)
library(scales)
library(plotly)
library(ggiraph)
library(sf)
library(mapview)
# Set state
state_name <- "Florida"
The data for both Part A and Part B come from the U.S. Census Bureau’s American Community Survey 5-year estimates. The tidycensus package retrieves the data directly into R using an API key obtained from the Census Bureau.
In this portion, the graduate degree data are retrieved, cleaned, and sorted by estimate in descending order.
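Because the retrieval step depends on an API key, it can help to confirm that a key is available before calling `get_acs()`. The following is a minimal sketch, assuming the key is stored in the `CENSUS_API_KEY` environment variable, which is where `tidycensus::census_api_key()` places it; the helper name `has_census_key` is hypothetical and not part of tidycensus.

```r
# Hypothetical helper: TRUE if a Census API key is visible to this R session.
# Assumes the key lives in the CENSUS_API_KEY environment variable.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (!has_census_key()) {
  # "YOUR_KEY" is a placeholder, not a real key
  message(
    "No Census API key found. One way to set it is ",
    "tidycensus::census_api_key(\"YOUR_KEY\", install = TRUE)."
  )
}
```

Checking the environment variable rather than hard-coding the key keeps the key out of the report source.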
grad_deg <- get_acs(
  geography = "county",
  state = state_name,
  variables = "DP02_0066P",
  survey = "acs5",
  year = 2023
)
## Getting data from the 2019-2023 5-year ACS
## Using the ACS Data Profile
grad_deg_clean <- grad_deg %>%
  select(GEOID, NAME, estimate, moe) %>%
  mutate(
    county = str_remove(NAME, paste0(", ", state_name))
  ) %>%
  arrange(desc(estimate))
Two questions are asked of the data: which Florida counties have the largest percentages of graduate degree holders, and which have the smallest?
highest_counties <- grad_deg_clean %>%
  slice_max(order_by = estimate, n = 5)
lowest_counties <- grad_deg_clean %>%
  slice_min(order_by = estimate, n = 5)
highest_counties %>% select(county, estimate, moe)
## # A tibble: 5 × 3
## county estimate moe
## <chr> <dbl> <dbl>
## 1 Alachua County 24.7 0.9
## 2 Leon County 21 0.8
## 3 St. Johns County 18.4 0.8
## 4 Sarasota County 16.9 0.6
## 5 Collier County 16.4 0.7
lowest_counties %>% select(county, estimate, moe)
## # A tibble: 5 × 3
## county estimate moe
## <chr> <dbl> <dbl>
## 1 Hendry County 2.5 1
## 2 Glades County 2.8 1
## 3 Dixie County 2.9 1.1
## 4 Hamilton County 3.3 1.1
## 5 Lafayette County 3.3 1.7
The results show clear variation in graduate degree attainment across counties. The highest percentages are in Alachua (24.7%), Leon (21.0%), and St. Johns (18.4%) counties; the lowest are in Hendry (2.5%), Glades (2.8%), and Dixie (2.9%) counties.
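Whether two county estimates differ by more than sampling error can be checked with a simple z-test on the published margins of error. The sketch below assumes the ACS convention that MOEs are reported at the 90% confidence level (so the standard error is MOE / 1.645) and treats the two estimates as independent; the function name `acs_differs` is hypothetical, and the inputs are copied from the tables above.

```r
# Hypothetical helper: TRUE if two ACS estimates differ at the 90% level.
# Assumes MOEs are 90% margins of error and the estimates are independent.
acs_differs <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645            # convert MOE to standard error
  se2 <- moe2 / 1.645
  z <- abs(est1 - est2) / sqrt(se1^2 + se2^2)
  z > z_crit
}

acs_differs(24.7, 0.9, 21.0, 0.8)  # Alachua vs. Leon
acs_differs(2.5, 1.0, 2.8, 1.0)    # Hendry vs. Glades
```

With these inputs the first comparison is flagged as different and the second is not, which matches the visual impression that the bottom-ranked counties cannot be reliably ordered.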
grad_plot <- ggplot(
  grad_deg_clean,
  aes(x = estimate, y = reorder(county, estimate))
) +
  geom_errorbarh(
    aes(xmin = estimate - moe, xmax = estimate + moe),
    height = 0.2,
    color = "black"
  ) +
  geom_point(color = "darkred", size = 2.5) +
  scale_x_continuous(labels = label_number(suffix = "%")) +
  labs(
    title = "Adults with a Graduate Degree by County",
    subtitle = "Florida (2019–2023 ACS)",
    x = "Percentage (%)",
    y = NULL
  ) +
  theme_minimal()
grad_plot
Fig 1.1. Estimated percentage of adults with a graduate degree for each county, with error bars representing the margin of error. Data: ACS 5-Year Estimates.
This plot shows the estimated percentage and the uncertainty associated with each estimate. Counties with wide error bars (a range of 10-20%) are difficult to interpret and should be compared cautiously, since their true values may or may not differ significantly from those of other counties.
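One way to put a number on how uncertain an estimate is relative to its size is the coefficient of variation (CV). The sketch below assumes ACS margins of error are published at the 90% confidence level, so the standard error is MOE / 1.645; the function name `acs_cv` is hypothetical, and the inputs are copied from the tables above.

```r
# Hypothetical helper: coefficient of variation (in percent) for an ACS estimate.
# Assumes the MOE is a 90% margin of error.
acs_cv <- function(estimate, moe) {
  se <- moe / 1.645              # convert MOE to standard error
  100 * se / estimate
}

acs_cv(24.7, 0.9)  # Alachua County: roughly 2%
acs_cv(2.5, 1.0)   # Hendry County: roughly 24%
```

Although the two margins of error are similar in absolute terms, the relative uncertainty is far larger for the county with the small estimate.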
ggplotly(grad_plot)
## `height` was translated to `width`.
The interactive version allows users to explore county-level values more easily by hovering over each point.
grad_plot_ggiraph <- ggplot(
  grad_deg_clean,
  aes(
    x = estimate,
    y = reorder(county, estimate),
    tooltip = paste0(
      "County: ", county,
      "<br>Estimate: ", round(estimate, 1), "%",
      "<br>MOE: ±", round(moe, 1)
    ),
    data_id = GEOID
  )
) +
  geom_errorbarh(
    aes(xmin = estimate - moe, xmax = estimate + moe),
    height = 0.2,
    color = "black"
  ) +
  geom_point_interactive(color = "darkred", size = 2.5) +
  scale_x_continuous(labels = label_number(suffix = "%")) +
  labs(
    title = "Adults with a Graduate Degree by County",
    subtitle = paste("Counties in", state_name),
    x = "Percentage (%)",
    y = NULL
  ) +
  theme_minimal()
girafe(ggobj = grad_plot_ggiraph)
## `height` was translated to `width`.
The ggiraph version provides a more interactive visualization that makes county-by-county comparison easier.
For this portion, Census data on median household income in Florida counties was selected for spatial analysis. The reasoning is that education levels and household income are expected to be positively correlated across counties. This portion also served as practice with Census Bureau income data for my own term project.
The first task is to retrieve the median household income data (B19013_001) from the Census Bureau. The data are then cleaned in R.
acs_vars <- load_variables(2023, "acs5")
acs_vars %>%
  filter(str_detect(label, "Median household income")) %>%
  select(name, label, concept)
## # A tibble: 25 × 3
## name label concept
## <chr> <chr> <chr>
## 1 B19013A_001 Estimate!!Median household income in the past 12 months … Median…
## 2 B19013B_001 Estimate!!Median household income in the past 12 months … Median…
## 3 B19013C_001 Estimate!!Median household income in the past 12 months … Median…
## 4 B19013D_001 Estimate!!Median household income in the past 12 months … Median…
## 5 B19013E_001 Estimate!!Median household income in the past 12 months … Median…
## 6 B19013F_001 Estimate!!Median household income in the past 12 months … Median…
## 7 B19013G_001 Estimate!!Median household income in the past 12 months … Median…
## 8 B19013H_001 Estimate!!Median household income in the past 12 months … Median…
## 9 B19013I_001 Estimate!!Median household income in the past 12 months … Median…
## 10 B19013_001 Estimate!!Median household income in the past 12 months … Median…
## # ℹ 15 more rows
income_data <- get_acs(
  geography = "county",
  state = state_name,
  variables = "B19013_001",
  survey = "acs5",
  year = 2023,
  geometry = TRUE
)
income_clean <- income_data %>%
  mutate(
    county = str_remove(NAME, paste0(", ", state_name))
  )
mapview(
  income_clean,
  zcol = "estimate",
  layer.name = "Median Household Income"
)
This interactive map allows users to explore income variation across counties. It pairs well with the interactive graduate degree plot for comparing graduate degree rates with income levels.
income_map <- ggplot(income_clean) +
  geom_sf(aes(fill = estimate), color = "white", linewidth = 0.2) +
  scale_fill_viridis_c(
    labels = label_dollar()
  ) +
  labs(
    title = "Median Household Income by County",
    subtitle = "Florida (2019–2023 ACS)",
    fill = "Income",
    caption = "Data: ACS 5-Year Estimates"
  ) +
  theme_minimal()
income_map
This choropleth map highlights spatial patterns in income, making higher-income counties easy to distinguish from lower-income ones.
The results from both analyses reveal clear geographic patterns in education and income. Counties with higher graduate degree attainment often align with higher-income regions, which suggests a relationship between education and economic outcomes.
The margin of error (MOE) plot emphasizes the importance of considering uncertainty when interpreting ACS data. The spatial visualizations also provide clear geographic context, which helps to identify clusters of higher and lower income.
This exercise demonstrates how combining statistical and spatial analysis can provide a more comprehensive understanding of socioeconomic patterns.
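The visual alignment between education and income could be checked numerically by joining the two tables on `GEOID` and computing a rank correlation. The sketch below shows only the mechanics: the data frames `grad_toy` and `income_toy` are hypothetical stand-ins with made-up values, not ACS estimates. With the real objects, the geometry column of `income_clean` would likely need to be dropped first (for example with `sf::st_drop_geometry()`) before joining to `grad_deg_clean`.

```r
# Toy stand-ins for grad_deg_clean and income_clean.
# All values are invented for illustration and are NOT ACS estimates.
grad_toy <- data.frame(
  GEOID = c("001", "002", "003", "004"),
  grad_pct = c(5, 10, 15, 20)
)
income_toy <- data.frame(
  GEOID = c("004", "003", "002", "001"),
  median_income = c(80000, 65000, 70000, 50000)
)

# Join on the county identifier so each row pairs both measures
combined <- merge(grad_toy, income_toy, by = "GEOID")

# Spearman rank correlation is less sensitive to outlying counties
rho <- cor(combined$grad_pct, combined$median_income, method = "spearman")
rho
```

A rank correlation is used here as one reasonable choice; it does not account for the margins of error on either variable.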