Income and Education in Washington State

Matthew Sommer

2024-04-01


Introduction

Achieving a high level of education is often considered a way to boost income. Many people go to college or graduate school in order to qualify for higher-paying jobs. The US Census Bureau collects and publishes data on educational attainment, income, and many other demographic variables. Using the tidycensus package in R, this analysis compares the share of residents with a graduate degree and median household income across the counties of Washington State.

Data

Data for this analysis comes from the American Community Survey (ACS) five-year estimates covering 2017 to 2021. The data is accessed through the tidycensus package in R, which provides easy access to Census Bureau APIs and returns results in a tidy format. From this data, the percentage of residents aged 25 and over who hold a graduate or professional degree is retrieved for each county in Washington State.

The analysis also uses median household income for each county, which is likewise pulled from the Census Bureau API through tidycensus.

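# import packages: tidycensus for interacting with the Census API,
#    tidyverse for data cleaning and tidying,
#    and scales, plotly, sf, and mapview for plotting and mapping
library(tidycensus)
library(tidyverse)
library(scales)   # label and axis formatting
library(plotly)   # interactive plots
library(sf)       # spatial data handling
library(mapview)  # interactive maps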

After the data is retrieved through the API call, the highest and lowest percentages are determined and the results are displayed in Table 1.

# select the percentage of residents with a graduate degree by 
#    county in washington state. geometry is included for later analysis
#    results are not included for api calls to limit clutter in markdown 
#    document

wa_grad <- get_acs(geography = "county",
        variables = "DP02_0066P",
        year = 2021,
        state = "WA",
        geometry = TRUE)
# keep only the counties with the highest and lowest percentages
wa_grad_desc <- wa_grad %>% 
  filter(estimate == max(estimate) |
         estimate == min(estimate))

# drop the geometry column so the table stays readable
knitr::kable(st_drop_geometry(wa_grad_desc),
             caption = "Table 1: Grant County has the lowest percentage of residents
             with a graduate degree at 5.1%. Whitman County, home of Washington
             State University, has the highest proportion of residents with a 
             graduate degree at 22.6%.")
Table 1: Grant County has the lowest percentage of residents with a graduate degree at 5.1%. Whitman County, home of Washington State University, has the highest proportion of residents with a graduate degree at 22.6%.
GEOID | NAME | variable | estimate | moe
53025 | Grant County, Washington | DP02_0066P | 5.1 | 0.7
53075 | Whitman County, Washington | DP02_0066P | 22.6 | 1.9

Analysis

Educational Attainment

ACS data is provided as estimates with a margin of error, which the Census Bureau publishes at the 90% confidence level. To get a full picture of the data, it is important to visualize the uncertainty around the estimates. Figure 1 shows the ACS estimate of the percentage of residents with a graduate degree in each county, with a bar spanning the margin of error on either side. Some estimates have a much wider range of possible values than others, and the wider the margin of error, the less certain the estimate is. Because the margins of error differ between counties, wherever two bars overlap the ranking shown in Figure 1 may not reflect the true order, so comparisons between counties with similar estimates should be treated with caution.
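A more formal version of this comparison follows the approach the Census Bureau describes for comparing two ACS estimates: convert each 90% margin of error to a standard error by dividing by 1.645, then check whether the difference is larger than 1.645 times the standard error of the difference. The base R sketch below assumes the margins of error are at the 90% level and that the two estimates are independent (no covariance term); `acs_differs()` is a hypothetical helper written for illustration, not part of tidycensus (tidycensus does offer its own `significance()` function for this purpose).

```r
# Sketch: do two ACS estimates differ significantly?
# Assumptions: margins of error are at the 90% confidence level and the
# two estimates are independent, so no covariance term is included.
acs_differs <- function(est1, est2, moe1, moe2, z_crit = 1.645) {
  # convert each 90% margin of error to a standard error
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  # standard error of the difference between two independent estimates
  se_diff <- sqrt(se1^2 + se2^2)
  # z statistic for the difference
  z <- (est1 - est2) / se_diff
  abs(z) > z_crit
}

# values from Table 1: Whitman (22.6 +/- 1.9) vs. Grant (5.1 +/- 0.7)
acs_differs(22.6, 5.1, 1.9, 0.7)
# returns TRUE
```

The function is vectorized, so it can also be applied to whole columns of estimates and margins of error at once.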

# plot the grad degree data with margin of error bars
wa_grad_plot <- ggplot(data = wa_grad) +
  geom_point(aes(x = estimate,
                 y = reorder(NAME, estimate)),
             color = 'darkred') +
  geom_errorbar(aes(x = estimate,
                    y = reorder(NAME, estimate),
                    xmin = estimate - moe,
                    xmax = estimate + moe)) +
  labs(title = "Residents with Graduate Degree, Washington, ACS 2017-2021",
       subtitle = "Error bars represent margin of error around estimated percent value.",
       x = "Percent",
       y = "County") +
  scale_y_discrete(labels = function(x) str_remove(x, " County, Washington")) +
  theme_minimal()

wa_grad_plot
Figure 1: Estimated percentage of residents with a graduate degree, with margin-of-error bars. Because the margin of error differs between counties, the true order could differ from the one depicted here.

Figure 2 shows an interactive version of Figure 1, which can be used to zoom into a section of the plot and compare values that are close to each other. Zooming in on the top three counties shows that Whitman and San Juan counties both have larger margins of error than King County. Given those wider ranges, it is possible that King or San Juan County actually has the highest proportion of residents with a graduate degree.
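The same zoom-in check can be done in code. One informal screening rule is that a county could plausibly have the highest true value if the upper end of its margin-of-error range reaches at least the highest lower end among all counties. The sketch below applies that rule; `plausible_top()` is a hypothetical helper, the rule is a rough overlap check rather than a formal significance test, and the example numbers are invented (they are not ACS values). With the real data, it could be applied to `st_drop_geometry(wa_grad)`.

```r
# Sketch: which areas could plausibly have the highest true value?
# Informal rule (an assumption, not a Census Bureau method): keep every
# area whose upper bound (estimate + moe) reaches the highest lower
# bound (estimate - moe) found in the data.
plausible_top <- function(df) {
  upper <- df$estimate + df$moe
  best_lower <- max(df$estimate - df$moe)
  df$NAME[upper >= best_lower]
}

# hypothetical example data (NOT real ACS estimates)
example <- data.frame(
  NAME = c("County A", "County B", "County C", "County D"),
  estimate = c(22.0, 21.0, 19.5, 10.0),
  moe = c(2.0, 0.8, 2.5, 1.0),
  stringsAsFactors = FALSE
)
plausible_top(example)
# returns "County A" "County B" "County C"
```

In this invented example, County B has the highest lower bound even though County A has the highest point estimate, which is exactly the kind of situation the error bars in Figure 1 warn about.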

# make plot interactive
ggplotly(wa_grad_plot, tooltip = "x")

Figure 2: An interactive version of the plot in Figure 1.

Median Income

The tidycensus package can also be used to get median household income by county. Figure 3 shows an interactive map of that data, in which the size of each circle represents the county's median income. Hovering over a marker shows the county name as a tooltip, and clicking a marker opens a pop-up with information about the county, including the income estimate and its margin of error.

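# median household income by county, to compare with the education data
wa_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "WA",
  year = 2021,
  geometry = TRUE)
# convert county polygons to center points for the income circles
inc_centroids <- st_centroid(wa_income)

# interactive map: circle size is based on the income estimate
mapview(inc_centroids, 
        cex = "estimate",
        label = "NAME",
        legend = FALSE)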

Now the median income data can be overlaid on the educational attainment data. Figure 4 shows a static map with the same representation of income as Figure 3, drawn over a choropleth map of the graduate degree data from Figures 1 and 2. One surprising observation from this map is that Whitman County has one of the highest proportions of residents with a graduate degree but a much lower median income than other counties with similar levels of educational attainment.
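The observation about Whitman County can be framed as a residual: fit a simple linear model of median income on the graduate-degree percentage and look for counties that fall far below the fitted line. The sketch below assumes the two ACS tables have already been reduced to one row per county with hypothetical columns `NAME`, `grad_pct`, and `income` (for example, by joining `st_drop_geometry()` versions of `wa_grad` and `wa_income` on `GEOID`). The example data are invented, and a straight-line fit is only one simple modeling choice.

```r
# Sketch: find counties whose income is lowest relative to what their
# graduate-degree share would predict under a simple linear fit.
# Column names NAME, grad_pct, and income are hypothetical.
income_shortfall <- function(df, n = 1) {
  fit <- lm(income ~ grad_pct, data = df)
  df$residual <- residuals(fit)
  # most negative residuals first: income well below the fitted line
  head(df[order(df$residual), c("NAME", "grad_pct", "income", "residual")], n)
}

# invented example data (NOT real ACS values)
example <- data.frame(
  NAME = c("A", "B", "C", "D", "E"),
  grad_pct = c(5, 8, 12, 16, 22),
  income = c(50000, 56000, 64000, 72000, 48000),
  stringsAsFactors = FALSE
)
income_shortfall(example)
```

Here county "E" has the highest graduate share but the lowest income, so it shows up first, much like Whitman County on the map. Note that a single extreme county also pulls the fitted line toward itself, so residuals are a rough screen rather than a conclusion.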

# two ggplot data layers: a choropleth map of graduate degrees and
#    a proportional symbol map of median household income
ggplot() +
  geom_sf(data = wa_grad,
          color = "black",
          aes(fill = estimate)) +
  geom_sf(data = inc_centroids,
          aes(size = estimate),
          color = "darkred") +
  labs(title = "Educational Attainment Compared to Income",
       subtitle = "By County in Washington State - ACS 2017-2021",
       size = "Median Income",
       fill = "% Grad Degree") +
  theme_classic()
Figure 4: In most counties, higher shares of graduate degree holders appear to go along with higher median incomes, but Whitman County, home of Washington State University, has a high proportion of residents with graduate degrees and a relatively low median income.

Pullman, Washington, home of Washington State University, one of the largest universities in the state, is in Whitman County. The university might explain the high proportion of graduate degree holders. On the other hand, college students tend to earn little while in school, which might explain the relatively low median income. Another possibility is that university faculty earn less than graduate degree holders working in private industry. Further analysis could examine income by profession, or apply a statistical test to the county-level data to determine whether educational attainment and income are truly correlated.
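As a first step toward that statistical analysis, the two county tables can be joined on `GEOID` and checked with a rank correlation test. The sketch below uses base R's `merge()` and `cor.test()`; choosing Spearman's method is an assumption (it is less sensitive to a single outlier such as Whitman County than Pearson's), the test ignores the margins of error, and `join_grad_income()` is a hypothetical helper. The example tables only mimic the columns returned by `get_acs()`; with the real data, `grad` and `income` would be `st_drop_geometry(wa_grad)` and `st_drop_geometry(wa_income)`.

```r
# Sketch: join education and income tables by GEOID and test for a
# monotonic association with Spearman's rank correlation.
# Inputs mimic get_acs() output (GEOID, NAME, estimate) without geometry.
join_grad_income <- function(grad, income) {
  merged <- merge(
    grad[, c("GEOID", "NAME", "estimate")],
    income[, c("GEOID", "estimate")],
    by = "GEOID",
    suffixes = c("_grad", "_income")
  )
  # give the two estimate columns descriptive names
  names(merged)[names(merged) == "estimate_grad"] <- "grad_pct"
  names(merged)[names(merged) == "estimate_income"] <- "income"
  merged
}

# invented example tables (NOT real ACS values); rows deliberately
# listed in different orders to show that merge() aligns them by GEOID
grad <- data.frame(GEOID = c("1", "2", "3", "4"),
                   NAME = c("A", "B", "C", "D"),
                   estimate = c(6, 9, 14, 20),
                   stringsAsFactors = FALSE)
income <- data.frame(GEOID = c("4", "3", "2", "1"),
                     estimate = c(90000, 75000, 60000, 52000),
                     stringsAsFactors = FALSE)

merged <- join_grad_income(grad, income)
result <- cor.test(merged$grad_pct, merged$income, method = "spearman")
result$estimate  # Spearman's rho
result$p.value
```

With only a few counties, a perfect rank ordering can still fail to reach conventional significance, so the p-value matters as much as rho when this is run on the 39 Washington counties.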

Conclusion

Educational attainment and income may be positively correlated, but further analysis would be required to confirm it. Mapping these demographic variables together gives an initial look at the data and helps decide which analysis to pursue next. The American Community Survey is an excellent resource for this type of demographic analysis, and the tidycensus package makes it easy to access from R.