Using the tidycensus package, we pull American Community Survey (ACS) data at the county level for the state of Virginia. All data come from the 2021 release of the 5-year ACS, which covers 2017-2021. Virginia was chosen because the author currently resides there and was aware of the high levels of educational attainment and income, particularly in Northern Virginia.
#loading all the libraries
library(sf)
library(ggplot2)
library(dplyr)
library(tidyr)
library(mapview)
library(mapedit)
library(mapboxapi)
library(leafsync)
library(spdep)
library(segregation)
library(ggiraph)
library(tidycensus)
library(tidyverse)
library(plotly)
library(here)
library(skimr)
library(janitor)
library(scales)
Using the ACS variable “DP02_0066P”, we examine 5-year ACS estimates of the percentage of graduate degree holders at the county level in Virginia.
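ACS variable codes are easy to mistype or misread, so it can help to confirm what a code measures before pulling data. The sketch below assumes the lookup table returned by tidycensus's `load_variables()`, which has `name` and `label` columns. `describe_acs_variable()` is a hypothetical helper written for this illustration and is not part of tidycensus.

```r
# Hypothetical helper (not a tidycensus function): return the label for one
# or more ACS variable codes. By default it downloads the variable table with
# tidycensus::load_variables(), which needs an internet connection; a table
# can also be passed in directly through var_table.
describe_acs_variable <- function(codes,
                                  year = 2021,
                                  dataset = "acs5/profile",
                                  var_table = tidycensus::load_variables(year, dataset)) {
  # Keep only the rows whose code matches, and only the descriptive columns
  var_table[var_table$name %in% codes, c("name", "label")]
}
```

Calling `describe_acs_variable("DP02_0066P")` should return a single row whose label describes the variable. Detailed tables (codes starting with `B`) live in the `"acs5"` dataset rather than `"acs5/profile"`.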
#We use the get_acs function to pull ACS data for a given variable at the county level for the state of VA
virginia_grad_degree <- get_acs(
geography = "county",
variables = "DP02_0066P",
state = "VA",
year = 2021,
geometry = TRUE,
progress_bar = FALSE
)
#Generating a table from the data showing the 5 counties with the lowest levels of graduate degrees
least_grad <- virginia_grad_degree %>%
arrange(estimate)
table_least_grad <- least_grad[1:5, 1:5]
knitr::kable(table_least_grad,
caption = "Counties with Lowest % of Graduate Degrees")
| GEOID | NAME | variable | estimate | moe | geometry |
|---|---|---|---|---|---|
| 51051 | Dickenson County, Virginia | DP02_0066P | 2.0 | 0.8 | MULTIPOLYGON (((-82.55384 3… |
| 51105 | Lee County, Virginia | DP02_0066P | 2.4 | 0.9 | MULTIPOLYGON (((-83.67461 3… |
| 51670 | Hopewell city, Virginia | DP02_0066P | 3.2 | 1.4 | MULTIPOLYGON (((-77.33764 3… |
| 51167 | Russell County, Virginia | DP02_0066P | 3.5 | 1.0 | MULTIPOLYGON (((-82.40224 3… |
| 51081 | Greensville County, Virginia | DP02_0066P | 3.9 | 1.4 | MULTIPOLYGON (((-77.76712 3… |
#Generating a table from the data showing the 5 counties with the highest levels of graduate degrees
most_grad <- virginia_grad_degree %>%
arrange(desc(estimate))
table_most_grad <- most_grad[1:5, 1:5]
knitr::kable(table_most_grad,
caption = "Counties with Highest % of Graduate Degrees")
| GEOID | NAME | variable | estimate | moe | geometry |
|---|---|---|---|---|---|
| 51610 | Falls Church city, Virginia | DP02_0066P | 49.1 | 3.7 | MULTIPOLYGON (((-77.19471 3… |
| 51013 | Arlington County, Virginia | DP02_0066P | 41.0 | 1.0 | MULTIPOLYGON (((-77.17228 3… |
| 51510 | Alexandria city, Virginia | DP02_0066P | 34.1 | 1.2 | MULTIPOLYGON (((-77.1438 38… |
| 51059 | Fairfax County, Virginia | DP02_0066P | 32.0 | 0.4 | MULTIPOLYGON (((-77.31648 3… |
| 51600 | Fairfax city, Virginia | DP02_0066P | 31.0 | 2.5 | MULTIPOLYGON (((-77.3348 38… |
Because the data are estimates, there is a degree of uncertainty involved. The charts below plot the graduate degree estimate for every Virginia county, with error bars showing the margin of error. As the plots show, even when accounting for the margin of error, Falls Church has by far the highest rate of graduate degree holders, at an estimated 49.1% of people. This is partly explained by Falls Church being a small independent city rather than a full county, and it is a well-known area that attracts many college graduates working in the DC metro area.
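The claim that Falls Church stays on top even after allowing for error can be checked with a little arithmetic. The sketch below is an illustration rather than part of the original analysis. It relies on ACS margins of error being published at the 90% confidence level, so dividing by 1.645 gives an approximate standard error, and it assumes the two estimates are independent. The helper names `moe_to_se()` and `diff_z()` are hypothetical.

```r
# ACS margins of error are published at the 90% confidence level,
# so dividing by 1.645 recovers an approximate standard error.
moe_to_se <- function(moe, z = 1.645) moe / z

# z-score for the difference between two estimates
# (assumes the two estimates are independent)
diff_z <- function(est1, moe1, est2, moe2) {
  (est1 - est2) / sqrt(moe_to_se(moe1)^2 + moe_to_se(moe2)^2)
}

# Falls Church city (49.1, MOE 3.7) vs. Arlington County (41.0, MOE 1.0),
# values taken from the table of highest-ranked counties
z <- diff_z(49.1, 3.7, 41.0, 1.0)

# TRUE means the difference is statistically significant at the 90% level
abs(z) > 1.645
```

With these inputs `z` is roughly 3.5, so the gap between the top two localities is larger than sampling error alone would suggest. The intervals also do not overlap: 49.1 - 3.7 = 45.4 is above 41.0 + 1.0 = 42.0.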
#creating an errorbar plot
va_plot_grad_errorbar <- ggplot(virginia_grad_degree,
aes(x = estimate,
y = reorder(NAME, estimate))) +
geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
width = 0.5, linewidth = 0.5) +
geom_point(color = "darkturquoise", size = 2) +
#dark turquoise was chosen to more closely align with the RMD theme colors
scale_y_discrete(labels = function(x) str_remove(x, " County, Virginia|, Virginia")) +
labs(title = "% of Population with a Graduate Degree, 2017-2021 ACS",
subtitle = "Counties in Virginia",
caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
x = "ACS estimate",
y = "") +
theme_minimal(base_size = 12)
va_plot_grad_errorbar
Now here is the plotly errorbar chart, which allows for interactivity.
ggplotly(va_plot_grad_errorbar, tooltip = "x")
Here is an interactive map with two county-level layers: the graduate degree percentage analyzed above and median household income. Toggling the layers on and off shows that many of the counties with the highest rates of graduate degrees also have the highest median household incomes (and vice versa), even if the two do not align perfectly.
#get the acs median household income data for counties in VA
#B19013_001 covers all households (B19013A_001 is limited to White-alone householders)
VA_median_household_income <- get_acs(
geography = "county",
variables = "B19013_001",
state = 'VA',
year = 2021,
geometry = TRUE,
progress_bar = FALSE
)
#map both pulls of data from the ACS as separate layers
mapview(list(VA_median_household_income['estimate'],virginia_grad_degree['estimate']),
map.types = 'CartoDB.Positron',
layer.name = c("Household Median Income 2021 5-Yr. ACS", "% of Pop. w/ a Graduate Degree 2021 5-Yr. ACS"),
burst = TRUE,
hide = TRUE)
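The visual impression from toggling the two layers can also be summarized as a single number. The sketch below is one possible way to do that, not the author's method: it joins the two pulls on `GEOID` and computes a Spearman rank correlation between the estimates. `estimate_correlation()` is a hypothetical helper; it assumes both inputs have `GEOID` and `estimate` columns, as tidycensus output does.

```r
# Hypothetical helper: join two county-level pulls on GEOID and return the
# Spearman rank correlation between their estimates.
estimate_correlation <- function(x, y) {
  # Drop sf geometry if present so the inputs behave as plain data frames
  if (inherits(x, "sf")) x <- sf::st_drop_geometry(x)
  if (inherits(y, "sf")) y <- sf::st_drop_geometry(y)
  x <- as.data.frame(x)[, c("GEOID", "estimate")]
  y <- as.data.frame(y)[, c("GEOID", "estimate")]
  # Inner join on GEOID; suffixes keep the two estimate columns apart
  joined <- merge(x, y, by = "GEOID", suffixes = c("_x", "_y"))
  cor(joined$estimate_x, joined$estimate_y,
      method = "spearman", use = "complete.obs")
}
```

With the objects from this document, the call would be `estimate_correlation(virginia_grad_degree, VA_median_household_income)`. A value near 1 would support the pattern seen on the map, while a rank correlation says nothing about why the two measures move together.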