In this lab, I explore Louisiana’s socio-economic landscape through two factors: educational attainment and household income. By analyzing data at the parish level, I aim to understand how these factors vary across the state.
Using data from the American Community Survey (ACS) and visualization tools in R, I examine parish-level patterns in graduate degree attainment and median household income. The Census Bureau treats Louisiana's parishes as county equivalents, so the code requests geography = "county".
We use the tidycensus package to retrieve ACS data for Louisiana.
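# Load the packages used throughout the lab (ggplot2 is attached by tidyverse)
library(tidycensus)
library(tidyverse)
library(plotly)
library(tigris)
library(sf)
library(mapview)
library(leaflet)
# Pull education data from the ACS, focusing on graduate degrees
# (table B15003: educational attainment for the population 25 years and over)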
grad_data <- get_acs(
geography = "county",
variables = c(
masters = "B15003_022",
professional = "B15003_023",
doctorate = "B15003_024"
),
state = "LA",
survey = "acs5",
year = 2021,
quiet = TRUE
)
# Sum the three graduate-degree counts per parish and combine their MOEs with the
# Census Bureau's root-sum-of-squares approximation for sums
grad_data <- grad_data %>%
group_by(GEOID, NAME) %>%
summarise(
grad_degree_count = sum(estimate, na.rm = TRUE),
moe = sqrt(sum(moe^2, na.rm = TRUE)) # MOE of a sum (approximation)
)
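The root-sum-of-squares rule above is the Census Bureau's standard approximation for the MOE of a sum of ACS estimates, and it assumes the component errors are independent. A minimal base-R sketch of the arithmetic, using made-up MOEs for one parish; moe_of_sum is a hypothetical helper written for illustration (tidycensus ships its own moe_sum() for this purpose).

```r
# Approximate MOE of a sum of ACS estimates: square each MOE, add, take the root.
# moe_of_sum is a hypothetical helper, not part of tidycensus.
moe_of_sum <- function(moes) {
  sqrt(sum(moes^2, na.rm = TRUE))
}

# Made-up MOEs for master's, professional, and doctorate counts in one parish
category_moes <- c(masters = 400, professional = 150, doctorate = 120)
combined_moe <- moe_of_sum(category_moes)
print(round(combined_moe, 1))  # prints 443.7
```

The combined MOE (about 444) is well below the simple sum of the three MOEs (670), because independent errors partly offset one another.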
# Get the population 25 years and over (B15003 universe) as the percentage denominator
total_pop <- get_acs(
geography = "county",
variables = "B15003_001",
state = "LA",
survey = "acs5",
year = 2021,
quiet = TRUE
)
# Merge and compute graduate degree percentage
grad_data <- grad_data %>%
left_join(total_pop %>% select(GEOID, total_pop = estimate), by = "GEOID") %>%
mutate(
grad_degree_pct = (grad_degree_count / total_pop) * 100,
moe_pct = (moe / total_pop) * 100 # Approximate MOE of the percentage; treats the denominator as exact
)
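The moe_pct column divides the summed MOE by the denominator, which treats the 25-and-over population as if it had no sampling error. The Census Bureau's formula for the MOE of a proportion p = X / Y also subtracts the denominator's contribution, falling back to the ratio formula when the value under the square root is negative. The base-R sketch below implements that formula; moe_proportion is a hypothetical helper (tidycensus offers moe_prop() for the same job), and the denominator MOE of 100 is an assumed value, since the pipeline above does not keep it.

```r
# MOE of a proportion p = num / denom (Census Bureau approximation).
# moe_proportion is a hypothetical helper written for illustration.
moe_proportion <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  radicand <- moe_num^2 - p^2 * moe_denom^2
  # If the radicand is negative, fall back to the ratio formula
  radicand <- ifelse(radicand < 0, moe_num^2 + p^2 * moe_denom^2, radicand)
  sqrt(radicand) / denom
}

# Acadia Parish count, MOE, and denominator from the table above;
# the denominator MOE (100) is an assumed value
simple_pct <- 539 / 38314 * 100
full_pct <- moe_proportion(4840, 38314, 539, 100) * 100
round(c(simple = simple_pct, full = full_pct), 3)
```

When the denominator's MOE is small relative to the numerator's, the two results are nearly identical, so the simpler formula used in the lab is a reasonable approximation under that assumption.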
# Inspect first few rows
head(grad_data)
## # A tibble: 6 × 7
## # Groups: GEOID [6]
## GEOID NAME grad_degree_count moe total_pop grad_degree_pct moe_pct
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 22001 Acadia Parish… 4840 539. 38314 12.6 1.41
## 2 22003 Allen Parish,… 1969 338. 16245 12.1 2.08
## 3 22005 Ascension Par… 22791 1543. 81640 27.9 1.89
## 4 22007 Assumption Pa… 1567 329. 14934 10.5 2.20
## 5 22009 Avoyelles Par… 3200 530. 27356 11.7 1.94
## 6 22011 Beauregard Pa… 4370 527. 24337 18.0 2.17
# Keep the five parishes with the highest and the five with the lowest graduate-degree shares
highest <- grad_data %>%
arrange(desc(grad_degree_pct)) %>%
drop_na(grad_degree_pct) %>%
head(5)
lowest <- grad_data %>%
arrange(grad_degree_pct) %>%
drop_na(grad_degree_pct) %>%
head(5)
filtered_grad_data <- bind_rows(highest, lowest)
# Display results
filtered_grad_data
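The selection above works because arrange() ignores groups by default and head() is not group-aware. If you swap in dplyr's slice_max() and slice_min(), call ungroup() first: grad_data is still grouped by GEOID after summarise(), and slice_* functions operate within groups, so each would return every parish. The same logic in base R, as a sketch; top_bottom_n is a hypothetical helper and the toy data are made up.

```r
# Return the n highest and n lowest rows of df by column col, dropping NAs first.
# top_bottom_n is a hypothetical helper; the lab itself uses arrange() + head().
top_bottom_n <- function(df, col, n = 5) {
  df <- df[!is.na(df[[col]]), , drop = FALSE]
  highest <- head(df[order(df[[col]], decreasing = TRUE), , drop = FALSE], n)
  lowest <- head(df[order(df[[col]]), , drop = FALSE], n)
  rbind(highest, lowest)
}

# Made-up toy data: six parishes, one with a missing percentage
toy <- data.frame(
  parish = c("A", "B", "C", "D", "E", "F"),
  pct = c(12.6, NA, 36.9, 10.1, 27.9, 18.0)
)
top_bottom_n(toy, "pct", n = 2)
```

As with the arrange() version, a parish can appear in both halves if there are fewer than 2n non-missing rows.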
## # A tibble: 10 × 7
## # Groups: GEOID [10]
## GEOID NAME grad_degree_count moe total_pop grad_degree_pct moe_pct
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 22071 Orleans Pari… 101316 2324. 274825 36.9 0.846
## 2 22061 Lincoln Pari… 9562 780. 26493 36.1 2.95
## 3 22033 East Baton R… 102287 2878. 289379 35.3 0.995
## 4 22103 St. Tammany … 63254 2104. 180157 35.1 1.17
## 5 22055 Lafayette Pa… 53111 2005. 162259 32.7 1.24
## 6 22035 East Carroll… 501 159. 4960 10.1 3.20
## 7 22007 Assumption P… 1567 329. 14934 10.5 2.20
## 8 22117 Washington P… 3307 470. 31218 10.6 1.51
## 9 22091 St. Helena P… 819 251. 7523 10.9 3.34
## 10 22123 West Carroll… 777 154. 6913 11.2 2.22
The following plot shows the share of residents aged 25 and over holding a graduate degree in the five highest- and five lowest-ranked parishes, with error bars spanning each estimate plus or minus its margin of error.
grad_plot <- ggplot(filtered_grad_data, aes(x = grad_degree_pct, y = reorder(NAME, grad_degree_pct))) +
geom_errorbar(aes(xmin = grad_degree_pct - moe_pct, xmax = grad_degree_pct + moe_pct), width = 0.4, color = "black") +
geom_point(color = "darkred", size = 2) +
scale_x_continuous(labels = scales::percent_format(scale = 1)) + # Format as percentage
labs(
title = "Graduate Degree Holders in Louisiana (ACS 2021)",
subtitle = "Top 5 and Bottom 5 Parishes by Graduate Degree Percentage",
x = "Share of Residents Aged 25+ with a Graduate Degree",
y = "Parish",
caption = "Source: U.S. Census Bureau (ACS 5-Year 2021)"
) +
theme_minimal()
# Display plot
print(grad_plot)
Figure 1: This chart highlights the Louisiana parishes with the highest and lowest shares of graduate degree holders. The error bars show ACS margins of error at the 90% confidence level; they are generally wider in small, rural parishes such as East Carroll and St. Helena, whose survey samples are smaller.
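Overlapping error bars suggest that neighboring ranks may not differ reliably. The Census Bureau's test for a difference between two ACS estimates converts each 90% MOE to a standard error (MOE / 1.645) and compares |X1 - X2| / sqrt(SE1^2 + SE2^2) with 1.645. A base-R sketch using the rounded Orleans and Lincoln values from the table above; it ignores any correlation between the two estimates and inherits the simplified moe_pct, and acs_differ is a hypothetical helper.

```r
# Test whether two ACS estimates differ at the 90% confidence level.
# acs_differ is a hypothetical helper; inputs are assumed to be 90% MOEs.
acs_differ <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  z <- abs(est1 - est2) / sqrt(se1^2 + se2^2)
  list(z = z, significant = z > z_crit)
}

# Orleans (36.9%, MOE 0.846) vs. Lincoln (36.1%, MOE 2.95)
result <- acs_differ(36.9, 0.846, 36.1, 2.95)
round(result$z, 2)   # about 0.43, well below 1.645
result$significant   # FALSE
```

Under these assumptions the first- and second-ranked parishes are statistically indistinguishable, so the ordering within the top five should be read loosely.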
grad_plot_interactive <- ggplot(filtered_grad_data, aes(y = reorder(NAME, grad_degree_pct), x = grad_degree_pct)) +
geom_col(fill = "darkred") +
labs(
title = "Graduate Degree Holders in Louisiana (ACS 2021)",
subtitle = "Top 5 and Bottom 5 Parishes by Graduate Degree Percentage",
x = "Graduate Degree Holders, Residents Aged 25+ (%)",
y = "Parish"
) +
theme_minimal()
# Convert to interactive
ggplotly(grad_plot_interactive)
Figure 2: Interactive bar chart displaying graduate degree percentages for selected Louisiana parishes.
We now retrieve and analyze median household income data.
income_data <- get_acs(
geography = "county",
variables = "B19013_001",
state = "LA",
survey = "acs5",
year = 2021,
quiet = TRUE
) %>%
rename(median_income = estimate) # Rename for clarity
# Inspect first few rows
head(income_data)
## # A tibble: 6 × 5
## GEOID NAME variable median_income moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 22001 Acadia Parish, Louisiana B19013_001 42368 3789
## 2 22003 Allen Parish, Louisiana B19013_001 47660 4627
## 3 22005 Ascension Parish, Louisiana B19013_001 86256 3532
## 4 22007 Assumption Parish, Louisiana B19013_001 42831 3982
## 5 22009 Avoyelles Parish, Louisiana B19013_001 37903 4966
## 6 22011 Beauregard Parish, Louisiana B19013_001 57130 4205
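Rather than requesting geometry through get_acs(), we load parish boundaries separately with the tigris package and join them to the income table by GEOID. Because tigris also downloads from Census Bureau servers, caching the shapefiles locally is what keeps repeated runs working if those servers are briefly unavailable.
# Cache downloaded shapefiles so repeated runs don't depend on the Census servers
options(tigris_use_cache = TRUE)
# Load Louisiana parish (county-equivalent) boundaries for the same vintage as the ACS data
la_counties <- counties(state = "LA", cb = TRUE, year = 2021)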
# GEOID needs to be a character for a successful merge
income_data <- income_data %>% mutate(GEOID = as.character(GEOID))
la_counties <- la_counties %>% mutate(GEOID = as.character(GEOID))
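County GEOIDs are five-character FIPS codes: a two-digit state code followed by a three-digit county code (Louisiana is 22, so Orleans Parish is 22071). Keeping them as character matters because numeric conversion drops leading zeros, which would break joins for states whose codes start with 0. A base-R illustration, using Autauga County, Alabama (01001) as the example:

```r
# Numeric conversion silently drops the leading zero of a FIPS code
geoid_chr <- "01001"                      # Autauga County, Alabama
geoid_num <- as.numeric(geoid_chr)
print(geoid_num)                          # 1001

# Restore the zero-padded, five-character form
restored <- sprintf("%05d", as.integer(geoid_num))

# Split a Louisiana GEOID into its state and county parts
la_geoid <- "22071"                       # Orleans Parish
state_fips <- substr(la_geoid, 1, 2)
county_fips <- substr(la_geoid, 3, 5)
```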
# Merge income data with spatial data. Keeping only the income columns avoids a
# NAME.x / NAME.y clash, so the parish NAME comes from la_counties
income_data_sf <- la_counties %>%
left_join(income_data %>% select(GEOID, median_income, moe), by = "GEOID")
# Missing incomes stay NA (not 0) so maps show them as missing rather than as $0
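Both the boundary layer and the ACS table carry a NAME column, so joining on GEOID alone keeps both copies under suffixed names, and a check for a plain NAME column then fails. Base R's merge() uses the same default suffixes (.x and .y) as dplyr's left_join(), so it can illustrate the issue without extra packages; the toy tables below are made up apart from the two incomes copied from the output above.

```r
# Two toy tables that share a non-key NAME column
boundaries <- data.frame(GEOID = c("22001", "22003"), NAME = c("Acadia", "Allen"))
acs <- data.frame(
  GEOID = c("22001", "22003"),
  NAME = c("Acadia Parish, Louisiana", "Allen Parish, Louisiana"),
  median_income = c(42368, 47660)
)

# Joining on GEOID keeps both NAME columns, renamed NAME.x and NAME.y
collided <- merge(boundaries, acs, by = "GEOID", all.x = TRUE)
print(names(collided))

# Selecting only the needed ACS columns first avoids the collision
clean <- merge(boundaries, acs[, c("GEOID", "median_income")], by = "GEOID", all.x = TRUE)
print(names(clean))
```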
# Create static map
income_map <- ggplot(income_data_sf) +
geom_sf(aes(fill = median_income), color = "white", linewidth = 0.2) +
scale_fill_viridis_c(name = "Median Income ($)", option = "turbo") +
labs(
title = "Median Household Income by Parish in Louisiana (ACS 2021)",
subtitle = "Data from the U.S. Census Bureau (ACS 5-Year 2021)",
caption = "Source: tidycensus"
) +
theme_minimal()
# Display static plot
print(income_map)
Figure 3: Median household income by parish in Louisiana based on ACS 2021 data.
# mapview shades parishes with missing income in its NA color, so no imputation is needed
mapview(income_data_sf, zcol = "median_income", legend = TRUE)
Figure 4: Interactive map of Louisiana parishes displaying median household income levels.
We now join the education and income tables by GEOID to explore potential relationships between the two measures.
merged_data <- grad_data %>%
left_join(income_data %>% select(GEOID, median_income), by = "GEOID") %>%
drop_na(median_income, grad_degree_pct) # Ensure no missing data
# Inspect first few rows
head(merged_data)
## # A tibble: 6 × 8
## # Groups: GEOID [6]
## GEOID NAME grad_degree_count moe total_pop grad_degree_pct moe_pct
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 22001 Acadia Parish… 4840 539. 38314 12.6 1.41
## 2 22003 Allen Parish,… 1969 338. 16245 12.1 2.08
## 3 22005 Ascension Par… 22791 1543. 81640 27.9 1.89
## 4 22007 Assumption Pa… 1567 329. 14934 10.5 2.20
## 5 22009 Avoyelles Par… 3200 530. 27356 11.7 1.94
## 6 22011 Beauregard Pa… 4370 527. 24337 18.0 2.17
## # ℹ 1 more variable: median_income <dbl>
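The joined table makes it possible to quantify the relationship rather than only map it. A minimal sketch, assuming a data frame with the grad_degree_pct and median_income columns built above; relationship_summary is a hypothetical helper, and the five-row toy data frame is made up so the block runs on its own. Spearman's rank correlation is included alongside Pearson's because it is less sensitive to a few extreme parishes.

```r
# Pearson and Spearman correlations between graduate-degree share and income.
# relationship_summary is a hypothetical helper written for illustration.
relationship_summary <- function(df) {
  cols <- c("grad_degree_pct", "median_income")
  complete <- df[stats::complete.cases(df[, cols]), , drop = FALSE]
  c(
    n = nrow(complete),
    pearson = stats::cor(complete$grad_degree_pct, complete$median_income),
    spearman = stats::cor(complete$grad_degree_pct, complete$median_income,
                          method = "spearman")
  )
}

# Made-up toy values, not ACS results
toy <- data.frame(
  grad_degree_pct = c(10, 15, 20, 25, 30),
  median_income = c(40000, 45000, 60000, 80000, 55000)
)
relationship_summary(toy)
```

On the lab data the call would be relationship_summary(ungroup(merged_data)); stats::cor.test() adds a p-value if a formal test is wanted.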
# Prepare labels for the leaflet map. Missing incomes stay NA so the palette's
# na.color (gray) marks them; only missing parish names get a placeholder
income_data_sf <- income_data_sf %>%
mutate(NAME = ifelse(is.na(NAME), "Unknown Parish", NAME))
# Build the palette once and reuse it for the fills and the legend
income_pal <- colorNumeric("YlOrRd", domain = income_data_sf$median_income, na.color = "gray")
# leaflet expects WGS84 longitude/latitude, so reproject from NAD83 first
income_data_sf %>%
sf::st_transform(4326) %>%
leaflet() %>%
addTiles() %>%
addPolygons(
fillColor = ~income_pal(median_income),
color = "black",
weight = 1,
opacity = 1,
fillOpacity = 0.7,
label = ~paste0(NAME, ": ", scales::dollar(median_income))
) %>%
addLegend(
pal = income_pal,
values = ~median_income,
title = "Median Income ($)",
position = "bottomright"
)
Figure 5: Interactive choropleth map showing the distribution of median household income across Louisiana parishes.
This analysis is consistent with what I expected: education and income appear closely linked, especially in urban parishes such as East Baton Rouge and Orleans, which rank among the highest for graduate degree attainment. Rural parishes lag behind, highlighting the educational and economic divide within the state. Because the comparison here is visual rather than a formal statistical test, that link is descriptive. If I had more time, I’d explore whether cost-of-living adjustments change these findings, since some lower-income areas might still offer decent purchasing power.
Packages used: tidycensus, tidyverse, ggplot2, plotly, leaflet, mapview, tigris, sf.