The following R libraries are used to address these questions:
# Load necessary libraries
library(tidyverse)
library(tidycensus)
library(classInt)
library(sf)
library(scales)
library(plotly)
library(ggiraph)
library(mapview)
The U.S. Census Bureau’s American Community Survey (ACS) provides socioeconomic and demographic data for the United States. Users can browse the data at https://data.census.gov/ or retrieve it programmatically with tools such as tidycensus. tidycensus is an R package that allows users to interface with a select number of the US Census Bureau’s data APIs and return tidyverse-ready data frames, optionally with simple feature geometry included (Walker).
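ACS variable codes can also be looked up from within R. The sketch below is an illustrative addition, not part of the original analysis: `find_acs_variables()` is a hypothetical helper that wraps tidycensus's `load_variables()` and filters the returned `label` column by a search pattern. It assumes tidycensus is installed and that a network connection is available when the helper is called.

```r
# Hypothetical helper (not part of the original analysis): search the ACS
# variable dictionary for labels matching a pattern.
find_acs_variables <- function(pattern, year = 2021, dataset = "acs5/profile") {
  # load_variables() downloads the variable list for the given year and
  # dataset and returns a tibble that includes name and label columns
  vars <- tidycensus::load_variables(year, dataset)
  vars[grepl(pattern, vars$label, ignore.case = TRUE), c("name", "label")]
}

# Example call (requires network access), left commented out:
# find_acs_variables("graduate or professional degree")
```

The `"acs5/profile"` dataset is the default here because the Data Profile tables include both counts and the percentage variables (codes ending in `P`).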
In Part A, the percentage of each Washington county’s population holding a graduate degree in 2021 is retrieved and visualized to answer the following questions:
The ACS data is plotted to visualize and compare the percentage of graduate degree holders in each county. The plot also includes the margin of error for each estimate.
# Set the Census API key for this session
# Get your own API key at https://api.census.gov/data/key_signup.html
# The key is read from the CENSUS_API_KEY environment variable instead of being hard-coded
census_api_key(Sys.getenv("CENSUS_API_KEY"))
#Get ACS 5-year estimates for graduate degree percentages in WA counties
wa_grad_degrees <- get_acs(
geography = "county",
state = "WA",
variables = "DP02_0066P",
survey = "acs5",
year = 2021
)
#Find top 5 counties with the highest percentage of graduate degree holders
top_counties <- wa_grad_degrees %>%
arrange(desc(estimate)) %>%
head(5)
#Top 5 Counties with Highest Graduate Degree Holders:
print(top_counties)
## # A tibble: 5 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 53075 Whitman County, Washington DP02_0066P 22.6 1.9
## 2 53033 King County, Washington DP02_0066P 22.1 0.3
## 3 53055 San Juan County, Washington DP02_0066P 22.1 1.1
## 4 53031 Jefferson County, Washington DP02_0066P 18.8 1.4
## 5 53029 Island County, Washington DP02_0066P 14.1 1.1
#Find bottom 5 counties with the lowest percentage of graduate degree holders
bottom_counties <- wa_grad_degrees %>%
arrange(estimate) %>%
head(5)
#Bottom 5 Counties with Lowest Graduate Degree Holders:
print(bottom_counties)
## # A tibble: 5 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 53025 Grant County, Washington DP02_0066P 5.1 0.7
## 2 53015 Cowlitz County, Washington DP02_0066P 5.7 0.6
## 3 53001 Adams County, Washington DP02_0066P 5.8 1.4
## 4 53027 Grays Harbor County, Washington DP02_0066P 6.1 0.8
## 5 53045 Mason County, Washington DP02_0066P 6.1 0.9
#Tidy and transform Washington Graduate Degrees by splitting NAME column into distinct county and state columns and removing " County" suffix
wa_grad_degrees <- wa_grad_degrees %>%
separate_wider_delim(cols=NAME, delim=", ", names=c("county", "state")) %>%
mutate(county = str_remove(county, " County"),
# get decimal/percentage values for estimates (makes plotting better with scales library)
estimate = estimate/100,
moe = moe/100) %>%
arrange(-estimate)
#Create ggplot object for estimates with error margins
wa_plot <- ggplot(wa_grad_degrees, aes(x=estimate, y=reorder(county,estimate))) +
geom_errorbar(aes(xmin=estimate - moe, xmax=estimate+moe),
width=0.5, linewidth=0.5) +
geom_point(color="green", size=2) +
#Format x-axis labels as percentage
scale_x_continuous(labels=label_percent()) +
labs(title = "Percentage of Graduate Degree Holders in Washington State",
subtitle = "Including margin of error from ACS 5-Year Estimates (2021)",
caption = "Data Source: U.S. Census Bureau, American Community Survey 5-Year Estimates (2021)",
x = "ACS Estimate (Percentage)",
y = "County") +
theme_minimal(base_size=12)
#Convert ggplot object to a plotly object with ggplotly() to make it interactive
ggplotly(wa_plot, tooltip = "x", height = 600)
Whitman, King, San Juan, Jefferson, and Island counties have the highest percentages of graduate degree holders in Washington State. This is likely influenced by the presence of major universities, such as Washington State University in Whitman County and the University of Washington in King County, which attract faculty, researchers, and graduate students. Additionally, San Juan and Jefferson counties are known for attracting successful retirees, many of whom hold advanced degrees.
Grant, Cowlitz, Adams, Grays Harbor, and Mason counties possess the smallest percentages of graduate degree holders in Washington State. Several factors may contribute to this, most notably that these counties have economies centered on industries such as agriculture, which traditionally do not require graduate-level education.
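Because each ACS estimate carries a margin of error, small gaps between counties in the rankings may not reflect real differences. The sketch below is an illustrative addition, not part of the original analysis. It uses the estimates and margins of error printed in the tables above and a hypothetical helper, `acs_diff_significant()`, which assumes the margins are published at the 90 percent confidence level (the ACS default), so each standard error is the margin divided by 1.645.

```r
# Hypothetical helper: test whether two ACS estimates differ at the 90% level.
# Assumes both margins of error are at the 90% confidence level (z = 1.645).
acs_diff_significant <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645  # convert each margin of error to a standard error
  se2 <- moe2 / 1.645
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)
  abs(z) > z_crit
}

# Values copied from the printed tables (percent, margin of error)
acs_diff_significant(22.6, 1.9, 22.1, 0.3)  # Whitman vs. King
acs_diff_significant(22.6, 1.9, 14.1, 1.1)  # Whitman vs. Island
acs_diff_significant(5.1, 0.7, 5.7, 0.6)    # Grant vs. Cowlitz
```

Under these assumptions, the Whitman-King and Grant-Cowlitz gaps fall within sampling error, while the Whitman-Island gap does not. The exact order within each group of five should therefore be read with caution.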
In Part B, each tract’s count of households with no vehicle available within Los Angeles County, California in 2021 is retrieved and visualized to answer the following question:
To retrieve spatial ACS data, we will use get_acs() from the tidycensus package with geometry = TRUE to include spatial boundaries for the census tracts.
The ACS data is plotted to visualize and compare each tract’s count of households with no access to personal vehicles in Los Angeles County.
#Get ACS 5-year estimates at the tract level for Los Angeles County, CA
ca_acs_data <- get_acs(
geography = "tract",
state = "CA",
county = "Los Angeles",
variables = "B08201_002", #Households with No Vehicles Available
survey = "acs5",
year = 2021,
geometry = TRUE
) %>%
rename(Tract = NAME, No_Vehicle_Households = estimate) %>%
select(Tract, No_Vehicle_Households, geometry)
#Remove missing values, tracts with at most one no-vehicle household, and the island Census Tracts (5990 and 5991) in Los Angeles County
ca_acs_data <- ca_acs_data %>%
  filter(!is.na(No_Vehicle_Households),
         No_Vehicle_Households > 1,
         !Tract %in% c("Census Tract 5991, Los Angeles County, California",
                       "Census Tract 5990, Los Angeles County, California"))
#Define a yellow-to-red color scale
color_palette <- colorRampPalette(c("lightyellow","orange", "red","darkred"))
#Display an interactive map
mapview(ca_acs_data,
zcol = "No_Vehicle_Households",
legend = TRUE,
col.regions = color_palette(100))
A choropleth map is created to visualize this data in a static view.
#Categorize data into four bins and assign labels
ca_acs_data <- ca_acs_data %>%
mutate(Category = cut(No_Vehicle_Households,
breaks = c(0, 100, 200, 500, Inf),
labels = c("Low (0-100)","Low-Medium (101-200)",
"Medium-High (201-500)", "High (501+)"),
include.lowest = TRUE))
#Create a choropleth map using ggplot2 with four categories
ggplot(data = ca_acs_data) +
geom_sf(aes(fill = Category), linewidth = 0) +
scale_fill_manual(values = c("Low (0-100)" = "lightyellow",
"Low-Medium (101-200)" = "orange",
"Medium-High (201-500)" = "red",
"High (501+)" = "darkred"),
name = "Count of Households with No Vehicle") +
labs(title = "Households with No Vehicles in Los Angeles County",
caption = "Source: ACS 2021 5-Year Estimates") +
theme_classic() +
theme(
plot.background = element_rect(fill = "white", color = "black", linewidth = 2),
panel.background = element_blank(),
legend.position = "right",
axis.text.x = element_text(angle = 45, hjust = 1)
)
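The bin labels passed to cut() are typed by hand, so they can drift out of step with the breaks vector. As a minimal sketch (an addition, not the author's code), the hypothetical helper `make_bin_labels()` below derives range labels directly from the breaks. It assumes integer counts, right-closed bins as cut() uses by default, and Inf as the last break.

```r
# Hypothetical helper: build "lo-hi" labels from a breaks vector so the labels
# always match the bins. Assumes integer counts, right-closed intervals
# (the cut() default), and Inf as the last break.
make_bin_labels <- function(breaks) {
  n <- length(breaks) - 1
  lower <- breaks[seq_len(n)]
  upper <- breaks[-1]
  # With include.lowest = TRUE the first bin includes its lower bound;
  # every later bin starts one above the previous upper bound
  lower[-1] <- lower[-1] + 1
  ifelse(is.infinite(upper),
         paste0(lower, "+"),
         paste0(lower, "-", upper))
}

breaks <- c(0, 100, 200, 500, Inf)
counts <- c(0, 100, 101, 350, 900)  # toy values for illustration only
bins <- cut(counts, breaks = breaks, labels = make_bin_labels(breaks),
            include.lowest = TRUE)
print(data.frame(counts, bins))
```

Generating the labels this way means that changing a break automatically updates the legend text. The same label vector can supply the names for a manual fill scale.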
The results show that tracts in downtown Los Angeles have higher counts of households with no vehicle. This makes sense, as residents of these areas live within walking distance of places such as employment offices and grocery stores. This walkable setting allows residents to meet daily needs without relying on a car. Additionally, limited parking availability and frequent traffic congestion in the downtown area can deter vehicle ownership. Residents may find it more convenient and cost-effective to use public transportation or other mobility options.
Walker, K. (n.d.). tidycensus: Load US Census boundary and attribute data as ‘tidyverse’ and ‘sf’-ready data frames. Retrieved February 17, 2025, from https://walker-data.com/tidycensus/