Part A: Graduate Degree Holders in Washington State

The following R libraries are used throughout Parts A and B:

# Load necessary libraries
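library(tidyverse)   # data wrangling and plotting (dplyr, tidyr, stringr, ggplot2)
library(tidycensus)  # access to U.S. Census Bureau APIs, including the ACS
library(classInt)    # class intervals (breaks) for choropleth maps
library(sf)          # simple features for spatial data
library(scales)      # axis label formatting, e.g. label_percent()
library(plotly)      # interactive plots via ggplotly()
library(ggiraph)     # interactive ggplot2 graphics
library(mapview)     # interactive web maps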

Data Acquisition

The U.S. Census Bureau’s American Community Survey (ACS) provides socioeconomic and demographic data for the United States. Users can view the data at https://data.census.gov/ or retrieve it programmatically with tools such as tidycensus, an R package that interfaces with a select number of the U.S. Census Bureau’s data APIs and returns tidyverse-ready data frames, optionally with simple feature geometry included (Walker).

In Part A, the percentage of each county’s population aged 25 and over that holds a graduate or professional degree is retrieved from the 2017-2021 ACS 5-year estimates and visualized to answer the following questions:

1. Which county populations in Washington State have the highest percentage of graduate degree holders?

2. Which county populations in Washington State have the smallest percentage of graduate degree holders?

Data Processing/Visualization

The ACS data is plotted to visualize and compare the percentage of graduate degree holders in each county. The plot also shows the margin of error for each estimate.
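ACS margins of error (MOEs) are published at the 90% confidence level, so error bars drawn as estimate ± MOE span roughly a 90% confidence interval. The sketch below shows one way to convert a published MOE into a standard error and a 95% interval, using Whitman County’s estimate and MOE as reported in this part. The helpers `moe_to_se()` and `ci95()` are hypothetical names for illustration; 1.645 and 1.96 are the standard normal critical values for 90% and 95% confidence.

```r
# Convert an ACS margin of error (published at the 90% level) to a standard error.
# Hypothetical helper; 1.645 is the z-value for 90% confidence.
moe_to_se <- function(moe) moe / 1.645

# Build a 95% confidence interval from an estimate and its 90% MOE.
# Hypothetical helper; 1.96 is the z-value for 95% confidence.
ci95 <- function(estimate, moe) {
  se <- moe_to_se(moe)
  c(lower = estimate - 1.96 * se, upper = estimate + 1.96 * se)
}

# Whitman County: 22.6% with a 90% MOE of 1.9 percentage points
whitman_ci <- ci95(22.6, 1.9)
round(whitman_ci, 2)
```

The 95% interval is wider than the plotted error bar because it uses a higher confidence level on the same standard error.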

# Set Census API key for the current session
# Get your own API key at https://api.census.gov/data/key_signup.html
# (add install = TRUE to save the key to .Renviron for future sessions)

census_api_key("YOUR_CENSUS_API_KEY", overwrite = TRUE)

# Get 2017-2021 ACS 5-year estimates of the percentage of residents aged 25+
# with a graduate or professional degree (DP02_0066P) for each WA county
wa_grad_degrees <- get_acs(
  geography = "county",
  state = "WA",
  variables = "DP02_0066P",
  survey = "acs5",
  year = 2021
)

#Find top 5 counties with the highest percentage of graduate degree holders
top_counties <- wa_grad_degrees %>%
  arrange(desc(estimate)) %>%
  head(5)

# Top 5 counties with the highest percentage of graduate degree holders:
print(top_counties)
## # A tibble: 5 × 5
##   GEOID NAME                         variable   estimate   moe
##   <chr> <chr>                        <chr>         <dbl> <dbl>
## 1 53075 Whitman County, Washington   DP02_0066P     22.6   1.9
## 2 53033 King County, Washington      DP02_0066P     22.1   0.3
## 3 53055 San Juan County, Washington  DP02_0066P     22.1   1.1
## 4 53031 Jefferson County, Washington DP02_0066P     18.8   1.4
## 5 53029 Island County, Washington    DP02_0066P     14.1   1.1
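Several of the top estimates are close together, so a quick significance check helps interpret the ranking. The sketch below follows the approach the Census Bureau describes for comparing two ACS estimates: convert each 90% MOE to a standard error, compute a z-score for the difference, and treat |z| > 1.645 as significant at the 90% level. The helper `acs_diff_test()` is hypothetical, and the inputs are copied from the table above.

```r
# Test whether two ACS estimates differ significantly at the 90% confidence level.
# Hypothetical helper: MOEs are converted to standard errors before computing z.
acs_diff_test <- function(est1, moe1, est2, moe2, z_crit = 1.645) {
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)
  list(z = z, significant = abs(z) > z_crit)
}

# Whitman (22.6 +/- 1.9) vs. King (22.1 +/- 0.3)
whitman_vs_king <- acs_diff_test(22.6, 1.9, 22.1, 0.3)

# Jefferson (18.8 +/- 1.4) vs. Island (14.1 +/- 1.1)
jefferson_vs_island <- acs_diff_test(18.8, 1.4, 14.1, 1.1)
```

With these inputs, the Whitman vs. King difference is not significant (z of about 0.43), so the top of the ranking is effectively a tie, whereas Jefferson’s lead over Island is significant (z of about 4.3).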
#Find bottom 5 counties with the lowest percentage of graduate degree holders
bottom_counties <- wa_grad_degrees %>%
  arrange(estimate) %>%
  head(5)

# Bottom 5 counties with the lowest percentage of graduate degree holders:
print(bottom_counties)
## # A tibble: 5 × 5
##   GEOID NAME                            variable   estimate   moe
##   <chr> <chr>                           <chr>         <dbl> <dbl>
## 1 53025 Grant County, Washington        DP02_0066P      5.1   0.7
## 2 53015 Cowlitz County, Washington      DP02_0066P      5.7   0.6
## 3 53001 Adams County, Washington        DP02_0066P      5.8   1.4
## 4 53027 Grays Harbor County, Washington DP02_0066P      6.1   0.8
## 5 53045 Mason County, Washington        DP02_0066P      6.1   0.9
# Tidy: split NAME ("King County, Washington") into county and state columns,
# drop the " County" suffix, and convert percentages to proportions
# so that scales::label_percent() formats the axis correctly
wa_grad_degrees <- wa_grad_degrees %>%
  separate_wider_delim(cols = NAME, delim = ", ", names = c("county", "state")) %>%
  mutate(county = str_remove(county, " County"),
         estimate = estimate / 100,
         moe = moe / 100) %>%
  arrange(desc(estimate))

#Create ggplot object for estimates with error margins
wa_plot <- ggplot(wa_grad_degrees, aes(x=estimate, y=reorder(county,estimate))) +
  geom_errorbar(aes(xmin=estimate - moe, xmax=estimate+moe),
                width=0.5, linewidth=0.5) +
  geom_point(color="green", size=2) +
  #Format x-axis labels as percentage
  scale_x_continuous(labels=label_percent()) +
  labs(title = "Percentage of Graduate Degree Holders in Washington State",
       subtitle = "Including margin of error from ACS 5-Year Estimates (2021)",
       caption = "Data Source: U.S. Census Bureau, American Community Survey 5-Year Estimates (2021)",
       x = "ACS Estimate (Percentage)",
       y = "County") +
  theme_minimal(base_size=12)

# Convert the ggplot object to an interactive plotly chart
ggplotly(wa_plot, tooltip = "x", height = 600)
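The tidying step above splits county names such as "King County, Washington" on the comma. The base-R sketch below reproduces that transformation on two sample names to show why the delimiter matters: splitting on "," alone leaves a leading space in the state name, which `trimws()` (or splitting on ", ") removes. It uses base R string functions in place of the tidyr and stringr calls, purely for illustration.

```r
# Sample NAME values in the format returned by get_acs() for counties
names_raw <- c("King County, Washington", "Grays Harbor County, Washington")

# Split on the comma only; the second piece keeps its leading space
parts <- strsplit(names_raw, ",", fixed = TRUE)
state_untrimmed <- vapply(parts, `[`, character(1), 2)

# Drop the " County" suffix from the first piece and trim the state name
county <- sub(" County$", "", vapply(parts, `[`, character(1), 1))
state  <- trimws(state_untrimmed)

data.frame(county, state)
```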

Results

  1. Which county populations in Washington State have the highest percentage of graduate degree holders?

Whitman (22.6%), King (22.1%), San Juan (22.1%), Jefferson (18.8%), and Island (14.1%) counties have the highest percentages of graduate degree holders in Washington State. This is likely driven in large part by the presence of major universities, such as Washington State University in Whitman County and the University of Washington in King County, which attract faculty, researchers, and graduate students. Additionally, San Juan and Jefferson counties are known for attracting successful retirees, many of whom hold advanced degrees.

  2. Which county populations in Washington State have the smallest percentage of graduate degree holders?

Grant (5.1%), Cowlitz (5.7%), Adams (5.8%), Grays Harbor (6.1%), and Mason (6.1%) counties have the smallest percentages of graduate degree holders in Washington State. Several factors may contribute to this, most notably that these counties’ economies are centered on resource-based industries such as agriculture, timber, and fishing, which traditionally do not require graduate-level education.

Part B: Households with no access to automobiles in Los Angeles County, CA

In Part B, the number of households with no vehicle available in each census tract within Los Angeles County, California, is retrieved from the 2021 ACS 5-year estimates and visualized to answer the following question:

Which tracts have low, medium, and high counts of households with no access to a vehicle?

Data Acquisition

To retrieve spatial ACS data, we use get_acs() from the tidycensus package with geometry = TRUE, which attaches census tract boundaries to the estimates as simple feature (sf) geometry.

Data Processing/Visualization

The ACS data is plotted to visualize and compare each tract’s count of households with no access to personal vehicles in Los Angeles County.

#Get ACS 5-year estimates at the tract level for Los Angeles County, CA
ca_acs_data <- get_acs(
  geography = "tract",
  state = "CA",
  county = "Los Angeles",
  variables = "B08201_002",  #Households with No Vehicles Available
  survey = "acs5",
  year = 2021,
  geometry = TRUE
) %>%
  rename(Tract = NAME, No_Vehicle_Households = estimate) %>%
  select(Tract, No_Vehicle_Households, geometry)

# Remove missing values, tracts with 0 or 1 no-vehicle households,
# and the two island tracts (5990 and 5991) in Los Angeles County
island_tracts <- c("Census Tract 5990, Los Angeles County, California",
                   "Census Tract 5991, Los Angeles County, California")
ca_acs_data <- ca_acs_data %>%
  filter(!is.na(No_Vehicle_Households),
         No_Vehicle_Households > 1,
         !Tract %in% island_tracts)

#Define a yellow-to-red color scale
color_palette <- colorRampPalette(c("lightyellow","orange", "red","darkred"))

#Display an interactive map
mapview(ca_acs_data, 
        zcol = "No_Vehicle_Households", 
        legend = TRUE, 
        col.regions = color_palette(100))
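The palette passed to mapview() is built with colorRampPalette() from base R’s grDevices package, which returns a function rather than a fixed set of colors; calling that function with n interpolates n colors between the anchor colors. A minimal sketch using the same four anchor colors:

```r
# colorRampPalette() returns an interpolating function, not a vector of colors
ramp <- colorRampPalette(c("lightyellow", "orange", "red", "darkred"))

# Ask the function for 5 evenly spaced colors along the ramp (hex strings)
five_colors <- ramp(5)
five_colors
```

The first and last colors are the end anchors ("lightyellow" and "darkred"); calling the function with 100, as in the map above, simply yields a finer gradient.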

A static choropleth map is also created, grouping tracts into four categories of no-vehicle household counts.

#Categorize data into four bins and assign labels
ca_acs_data <- ca_acs_data %>%
  mutate(Category = cut(No_Vehicle_Households,
                        breaks = c(0, 100, 200, 500, Inf),
                        labels = c("Low (0-100)","Low-Medium (101-200)", 
                                   "Medium-High (201-500)", "High (501+)"),
                        include.lowest = TRUE))

#Create a choropleth map using ggplot2 with four categories
ggplot(data = ca_acs_data) +
  geom_sf(aes(fill = Category), color = NA) +  # no tract outlines
  scale_fill_manual(values = c("Low (0-100)" = "lightyellow",
                               "Low-Medium (101-200)" = "orange",
                               "Medium-High (201-500)" = "red", 
                               "High (501+)" = "darkred"),
                    name = "Count of Households with No Vehicle") +
  labs(title = "Households with No Vehicles in Los Angeles County",
       caption = "Source: ACS 2021 5-Year Estimates") +
  theme_classic() +
  theme(
    plot.background = element_rect(fill = "white", color = "black", linewidth = 2),
    panel.background = element_blank(),
    legend.position = "right",
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
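The binning step above relies on cut(), which uses right-closed intervals by default, so each break value belongs to the lower category, while include.lowest = TRUE keeps a count of exactly 0 in the first bin. A minimal sketch on made-up counts placed on and around the boundaries, using break points of 0, 100, 200, and 500 so that the bins line up with the category labels:

```r
# Made-up household counts chosen to sit on and around the category boundaries
counts <- c(0, 100, 101, 200, 201, 500, 501, 2000)

# Right-closed bins: [0, 100], (100, 200], (200, 500], (500, Inf]
categories <- cut(counts,
                  breaks = c(0, 100, 200, 500, Inf),
                  labels = c("Low (0-100)", "Low-Medium (101-200)",
                             "Medium-High (201-500)", "High (501+)"),
                  include.lowest = TRUE)

data.frame(counts, categories)
```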

Results

The results show that tracts in downtown Los Angeles have higher counts of households with no vehicle. This is plausible because residents of these areas live within walking distance of workplaces, grocery stores, and other daily destinations, which lets them meet daily needs without relying on a car. Additionally, limited parking and frequent traffic congestion downtown can discourage vehicle ownership, so residents may find public transportation or other mobility options more convenient and cost-effective.
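Because the map shows raw counts, tracts with more households overall will tend to show higher counts of car-free households. One way to normalize is to divide by each tract’s total number of households, which table B08201 reports as B08201_001. The base-R sketch below uses made-up numbers for two hypothetical tracts; it illustrates the idea and is not part of the analysis above.

```r
# Hypothetical tracts: same no-vehicle count, very different total households
tracts <- data.frame(
  tract = c("Hypothetical Tract A", "Hypothetical Tract B"),
  no_vehicle = c(400, 400),         # analogous to B08201_002
  total_households = c(800, 4000)   # analogous to B08201_001
)

# Share of households with no vehicle available
tracts$share_no_vehicle <- tracts$no_vehicle / tracts$total_households
tracts
```

Both tracts fall in the same count category, yet half of Tract A’s households are car-free compared with a tenth of Tract B’s.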

Works Cited

Walker, K. (n.d.). tidycensus: Load US Census boundary and attribute data as ‘tidyverse’ and ‘sf’-ready data frames. Retrieved February 17, 2025, from https://walker-data.com/tidycensus/