1. Introduction

Do Maryland counties with a higher percentage of graduate degree holders also have higher incomes? This study examines the percentage of graduate degree holders in each Maryland county using the 2017-2021 American Community Survey (ACS) 5-year estimates. It also looks at other data, including median household income at the county level. Graduate degrees include both Master’s and Doctorate degrees in any field of study. The target population is men and women aged 25 years and over, the universe the ACS uses for educational attainment.

This analysis assesses the percentage of inhabitants who hold graduate degrees at the county level in the state of Maryland, United States. Several R packages are used: tidyverse for data wrangling; tidycensus to programmatically access, download, and prepare U.S. Census data; sf to work with spatial data; ggplot2 to visualize the data; and ggiraph to make the plots more intuitive through interactivity.

tidycensus provides an R interface to the Decennial Census (a complete count of the U.S. population conducted every 10 years), the American Community Survey (1-year estimates for geographies with populations of 65,000 and greater, and 5-year estimates for geographies down to the block group), the Population Estimates Program, and the Public Use Microdata Series APIs.
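The population threshold mentioned above determines which ACS product can cover a given geography. As a minimal sketch, the helper below checks that threshold. The name `acs1_available()` is hypothetical (it is not part of tidycensus), and the populations passed to it are illustrative placeholders rather than Census figures.

```r
# Hypothetical helper: TRUE when a geography meets the 65,000-person
# threshold for 1-year ACS estimates described above.
acs1_available <- function(population) {
  population >= 65000
}

# Illustrative placeholder populations (not actual Census counts)
example_pops <- c(large_county = 330000, small_county = 20000)
acs1_available(example_pops)
```

Several Maryland counties fall below this threshold, so a comparison across all counties relies on the 5-year estimates, which get_acs() requests by default (survey = "acs5").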

Session A. “Working with the 2021 American Community Survey with R and tidycensus”

# Load packages, with a short description of each

library(tidycensus) # to access, download, and prepare ACS census data
library(tidyverse) # to transform and present data more clearly - readr, dplyr, purrr, ggplot2
library(tigris) # to download TIGER/Line shapefiles
options(tigris_use_cache = TRUE) # cache downloaded shapefiles for reuse

library(sf) # to support simple features (sf), a standardized way to encode spatial vector data
library(mapview) # allows for interactive viewing of spatial data in R
library(plotly) # to easily create interactive plots
library(ggiraph) # extension to ggplot2 to make plots interactive

2. Data Access, Download, and Preparation

2.1 Data Access, Download Datasets of Interest.

The R package tidycensus requests data at an aggregation level (geography, i.e. the enumeration units), here the state of Maryland and its counties, and can download and merge the census geometries with the data. The code below calls the Census Bureau's open data API and retrieves the variable that holds the percentage of graduate degree holders (DP02_0066P).
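Before requesting a variable, it can help to confirm what its code means. The sketch below wraps tidycensus's load_variables() in a small helper. The helper name `lookup_acs_variable()` is hypothetical, and the sketch assumes that the `name` column returned for the "acs5/profile" dataset uses the same code that is passed to get_acs().

```r
# Hypothetical helper (not part of tidycensus): look up the label of an
# ACS variable code. Requires the tidycensus package and internet access.
lookup_acs_variable <- function(code, year = 2021, dataset = "acs5/profile") {
  if (!requireNamespace("tidycensus", quietly = TRUE)) {
    stop("Please install the tidycensus package first.")
  }
  # load_variables() returns one row per variable with its name and label
  vars <- tidycensus::load_variables(year, dataset, cache = TRUE)
  vars[vars$name == code, ]
}

# Usage (not run here because it needs network access):
# lookup_acs_variable("DP02_0066P")
```

A Census API key can be registered for the session with census_api_key() before calling get_acs().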

2.2 Non-spatial American Community Survey (ACS) Data Viewing

library(tidycensus)

md_education <- get_acs( # use the tidycensus get_acs() function to request data from the Census API
  geography = "county", # requested geography
  state = "MD", # requested state of Maryland
  variables = "DP02_0066P", # percent of population with a graduate degree
  year = 2021) # end year of the 2017-2021 5-year ACS (the default survey is "acs5")

Which county in Maryland has the highest estimated percentage of graduate degree holders?

arrange(md_education, desc(estimate)) # arrange education data (graduate degrees) for Maryland in descending order
## # A tibble: 24 × 5
##    GEOID NAME                          variable   estimate   moe
##    <chr> <chr>                         <chr>         <dbl> <dbl>
##  1 24027 Howard County, Maryland       DP02_0066P     32.9   0.8
##  2 24031 Montgomery County, Maryland   DP02_0066P     32.5   0.5
##  3 24003 Anne Arundel County, Maryland DP02_0066P     18.4   0.6
##  4 24021 Frederick County, Maryland    DP02_0066P     18.4   0.7
##  5 24041 Talbot County, Maryland       DP02_0066P     18.4   1.6
##  6 24005 Baltimore County, Maryland    DP02_0066P     17.8   0.4
##  7 24510 Baltimore city, Maryland      DP02_0066P     17     0.4
##  8 24009 Calvert County, Maryland      DP02_0066P     16     1.3
##  9 24025 Harford County, Maryland      DP02_0066P     16     0.7
## 10 24029 Kent County, Maryland         DP02_0066P     15.5   2.1
## # ℹ 14 more rows

Table 1. Education estimates for Maryland at the county level. The estimate column gives the percentage of the population with graduate degrees (in descending order), and the moe column gives the associated margin of error, which represents the uncertainty in the estimate. Howard and Montgomery counties have the highest percentages of graduate degree holders at 32.9% (+/- 0.8% moe) and 32.5% (+/- 0.5% moe), respectively. The margins of error (moe) for both counties are small, suggesting that these estimates are relatively precise.
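To make the margin of error concrete, the sketch below turns the two highest estimates into interval bounds and a coefficient of variation. It assumes the default 90% confidence level at which ACS margins of error are published (hence the divisor 1.645), and the values are copied by hand from Table 1 rather than pulled from the API.

```r
# Values copied from Table 1 (percent of population with a graduate degree)
top2 <- data.frame(
  county   = c("Howard", "Montgomery"),
  estimate = c(32.9, 32.5),
  moe      = c(0.8, 0.5)
)

# Interval bounds: estimate minus/plus the margin of error
top2$lower <- top2$estimate - top2$moe
top2$upper <- top2$estimate + top2$moe

# Standard error (assuming a 90% MOE) and coefficient of variation in percent
top2$se     <- top2$moe / 1.645
top2$cv_pct <- 100 * top2$se / top2$estimate

print(top2)
```

A small coefficient of variation is one common way to express that an estimate is reliable relative to its size.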

Which county in Maryland has the lowest estimated percentage of graduate degree holders?

arrange(md_education, estimate) # arrange education data in ascending order
## # A tibble: 24 × 5
##    GEOID NAME                        variable   estimate   moe
##    <chr> <chr>                       <chr>         <dbl> <dbl>
##  1 24039 Somerset County, Maryland   DP02_0066P      5.1   1  
##  2 24011 Caroline County, Maryland   DP02_0066P      7.5   1.2
##  3 24001 Allegany County, Maryland   DP02_0066P      8.2   0.8
##  4 24019 Dorchester County, Maryland DP02_0066P      9     1.3
##  5 24043 Washington County, Maryland DP02_0066P      9.4   0.7
##  6 24015 Cecil County, Maryland      DP02_0066P     10.2   0.9
##  7 24045 Wicomico County, Maryland   DP02_0066P     11.7   1  
##  8 24023 Garrett County, Maryland    DP02_0066P     11.8   1.5
##  9 24047 Worcester County, Maryland  DP02_0066P     12     1.2
## 10 24017 Charles County, Maryland    DP02_0066P     12.6   0.9
## # ℹ 14 more rows

Table 2. The ten Maryland counties with the lowest percentage of the population holding graduate degrees, in ascending order (lowest to highest). Somerset and Caroline counties have the lowest percentages at 5.1% (+/- 1.0% moe) and 7.5% (+/- 1.2% moe), respectively.

2.3 Spatial ACS Data Mapping and Analysis

Get the education estimate data from the 2017-2021 5-year ACS, this time with geometry, and take a look at the columns.

Use interactive viewing with mapview() to gain deeper insight into the distribution of the education data in Maryland.

library(tidycensus)

md_education <- get_acs( # use the tidycensus get_acs() function to request data from the Census API
  geography = "county", # requested geography
  state = "MD", # requested state of Maryland
  variables = "DP02_0066P", # percent of population with a graduate degree
  geometry = TRUE, # also download the county geometries and join them to the data
  progress_bar = FALSE, # suppress the download progress bar
  year = 2021) # end year of the 2017-2021 5-year ACS

library(mapview) # allows for interactive viewing of spatial data in R
library(leaflet)
mapview(md_education, zcol = "estimate")

Figure 1. Interactive mapview of the estimated percentage of the population holding graduate degrees at the county level in Maryland. The yellow-lightblue-darkblue-purple color ramp portrays the counties from the highest percentage of graduate degree holders in yellow to the lowest percentage in dark blue-purple. The two highest counties were Howard County at 32.9% (0.8% moe) and Montgomery County at 32.5% (0.5% moe). The small margins of error indicate a low level of uncertainty in these two estimates.

Create a plot using ggiraph to visualize the percentage of graduate degree holders across the state of Maryland at the county level, including an error bar to show the breadth of the margin of error (moe), which represents the level of uncertainty in the ACS estimate.

# Reload the required packages

library(ggiraph)
library(scales)
library(tidyverse)

md_education_plot_ggiraph <- ggplot(md_education, aes(x = estimate, y = reorder(NAME, estimate),
                                                      tooltip = estimate,
                                                      data_id = GEOID)) + # data_id links each point to its hover state
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), # error bars span estimate +/- moe
                width = 0.5, linewidth = 0.5) +
  geom_point_interactive(color = "darkred", size = 2) +
  scale_x_continuous(labels = label_percent(scale = 1)) + # estimates are already in percent, so do not multiply by 100
  scale_y_discrete(labels = function(x) str_remove(x, " County, Maryland|, Maryland")) +
  labs(title = "Percentage Population with Graduate Degrees, 2017-2021, ACS", # add labels
       subtitle = "Counties in Maryland",
       caption = "Data acquired with R and tidycensus",
       x = "ACS estimate",
       y = "") +
  theme_minimal(base_size = 12)

md_education_plot_ggiraph

Figure 2. Static point estimates of the percentage of the population holding graduate degrees. The error bars show the uncertainty in each estimate (from the estimate minus the moe to the estimate plus the moe). Kent County has the largest margin of error in the ACS estimates.

Provide interactivity through ggiraph to bring chart elements to life: specify which geom (layer) to interact with, render the plot with girafe(), and customize the interactivity.

library(ggiraph)

girafe(ggobj = md_education_plot_ggiraph) %>%
  girafe_options(opts_hover(css = "fill:cyan;")) # highlight a point in cyan on hover

Figure 3. Interactive point estimates of the percentage of the population holding graduate degrees in Maryland at the county level. The red dots represent the estimates, and hovering over a dot shows the percentage of degree holders. The error bars show the range of uncertainty of each estimate. There are two obvious outliers, Howard County (32.9% +/- 0.8%) and Montgomery County (32.5% +/- 0.5%). The margins of error were retrieved from Table 1.

How do the two counties with the highest percentage of degree holders compare in population? Montgomery County, with a population of about 1.06 million, has a narrower margin of error than Howard County, with a population of about 330,000. A larger population generally means a larger ACS sample, which is consistent with the more precise estimate for Montgomery County (source: Maryland Counties by Population).
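Whether the gap between the two leading counties is meaningful can be checked with an approximate significance test for two ACS estimates. The sketch below is a simplified version that assumes the two estimates are independent and that the margins of error are at the 90% level. The function name `acs_diff_z()` is hypothetical.

```r
# Approximate z statistic for the difference between two ACS estimates.
# A 90% margin of error is converted to a standard error via SE = moe / 1.645.
acs_diff_z <- function(est1, moe1, est2, moe2) {
  se1 <- moe1 / 1.645
  se2 <- moe2 / 1.645
  abs(est1 - est2) / sqrt(se1^2 + se2^2)
}

# Howard (32.9 +/- 0.8) versus Montgomery (32.5 +/- 0.5), values from Table 1
z <- acs_diff_z(32.9, 0.8, 32.5, 0.5)
z
z > 1.645 # TRUE would indicate a significant difference at the 90% level
```

Here z falls below 1.645, so under this approximation the 0.4-point gap between Howard and Montgomery counties is not statistically distinguishable at the 90% level.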

2.4 Digging Deeper: Data Preparation and Viewing of Tidy Data

tidycensus returns tidy (long-form) data by default, here from the 2017 to 2021 5-year ACS estimates; this format is helpful for groupwise analysis and visualizations. What is the median income distribution in Howard and Montgomery Counties relative to the rest of Maryland?

Take a look at the Maryland county-level incomes for 2021 to gain insight into the affordability of a graduate degree.

library(tidycensus)
maryland_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  state = "MD",
  year = 2021,
  geometry = TRUE, 
  progress_bar = FALSE
)

Take a look at the income data estimate table for Maryland to gain deeper insight into why the two county outliers exist.

maryland_income # view the first 10 rows of the table to understand what the columns mean so they can be mapped
## Simple feature collection with 24 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -79.48765 ymin: 37.91172 xmax: -75.04894 ymax: 39.72304
## Geodetic CRS:  NAD83
## First 10 features:
##    GEOID                             NAME   variable estimate  moe
## 1  24047       Worcester County, Maryland B19013_001    71262 2787
## 2  24003    Anne Arundel County, Maryland B19013_001   108048 1910
## 3  24033 Prince George's County, Maryland B19013_001    91124 1389
## 4  24025         Harford County, Maryland B19013_001    98495 1917
## 5  24015           Cecil County, Maryland B19013_001    81817 3281
## 6  24011        Caroline County, Maryland B19013_001    63027 3391
## 7  24023         Garrett County, Maryland B19013_001    58011 3632
## 8  24029            Kent County, Maryland B19013_001    64451 7427
## 9  24041          Talbot County, Maryland B19013_001    79349 4240
## 10 24045        Wicomico County, Maryland B19013_001    63610 2190
##                          geometry
## 1  MULTIPOLYGON (((-75.66061 3...
## 2  MULTIPOLYGON (((-76.83849 3...
## 3  MULTIPOLYGON (((-77.07995 3...
## 4  MULTIPOLYGON (((-76.0921 39...
## 5  MULTIPOLYGON (((-76.23326 3...
## 6  MULTIPOLYGON (((-76.01505 3...
## 7  MULTIPOLYGON (((-79.48765 3...
## 8  MULTIPOLYGON (((-76.27737 3...
## 9  MULTIPOLYGON (((-76.34647 3...
## 10 MULTIPOLYGON (((-75.92033 3...

Table 3. The first ten records of the spatial data for Maryland county-level median household income. The estimate column holds the median household income and the moe column holds the associated margin of error, representing the uncertainty in the estimate. The geometry column connects the income data to the county polygons.

Interactive viewing with mapview()

library(mapview) # allows for interactive viewing of spatial data in R
mapview(maryland_income, zcol = "estimate")

Figure 4. Median household income estimates for Maryland at the county level from the 2017-2021 5-year ACS, showing income variation within the state. Howard County has the highest estimated median income at 129,549 US dollars (in yellow), and Montgomery County is third highest at 117,345 US dollars (in light orange). These are high incomes relative to the lowest county median income in Maryland, 48,661 US dollars (in purple), in the southernmost county of the state.
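The question of whether education and income move together can be explored by joining the two sets of estimates by county. The sketch below is only a rough check: it uses the 11 counties for which both values appear in the printed output of this report (the education tables, the income table, and the Figure 4 caption), typed in by hand, not all 24 counties.

```r
# Percent of population with a graduate degree, from Tables 1 and 2
education <- data.frame(
  county = c("Howard", "Montgomery", "Anne Arundel", "Talbot", "Harford",
             "Kent", "Worcester", "Garrett", "Wicomico", "Cecil", "Caroline"),
  pct_grad = c(32.9, 32.5, 18.4, 18.4, 16.0, 15.5, 12.0, 11.8, 11.7, 10.2, 7.5)
)

# Median household income (US dollars), from the income table and Figure 4
income <- data.frame(
  county = c("Howard", "Montgomery", "Anne Arundel", "Talbot", "Harford",
             "Kent", "Worcester", "Garrett", "Wicomico", "Cecil", "Caroline"),
  median_income = c(129549, 117345, 108048, 79349, 98495,
                    64451, 71262, 58011, 63610, 81817, 63027)
)

# Join on county name and compute the Pearson correlation
combined <- merge(education, income, by = "county")
r <- cor(combined$pct_grad, combined$median_income)
r
```

For this subset the coefficient comes out strongly positive, which is consistent with a pattern linking graduate degrees and income. It does not show which way the relationship runs, and a full analysis would join the complete get_acs() results on GEOID instead.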

Small Area Spatial Demographic Data for Howard County, the county with the highest median income in Maryland. We have gained insight into the distribution of the 5-year ACS median income estimates at the county level; now let's look at the census tract-level data.

howard_income <- get_acs( # access the ACS data through the Census API
  geography = "tract", # request geography by tract
  variables = "B19013_001", # request variable for median household income
  state = "MD", # request the state of Maryland
  county = "Howard", # request the county of Howard
  year = 2021, # set the 2021 ACS explicitly rather than relying on the package default
  geometry = TRUE, # request the geometry for the tracts to link income data
  progress_bar = FALSE
  )

mapview(howard_income, zcol = "estimate") # map the census tract median income level data

Figure 5. Median household income estimates for Howard County, Maryland at the tract level from the 2021 ACS, showing income variation across Howard County census tracts. The median income ranged from a high of 239,173 US dollars (in yellow) to a low of 64,447 US dollars (in purple). Howard County is a wealthy county.

3. Conclusions and Future Work

3.1 Conclusion

  1. Howard and Montgomery counties have the highest percentages of graduate degree holders at 32.9% (+/- 0.8% moe) and 32.5% (+/- 0.5% moe), respectively. The margins of error are small, suggesting that the estimates are relatively precise.

  2. Montgomery County's estimate has a smaller margin of error than Howard County's. With a 2022 population of about 1.06 million, Montgomery County yields a more precise estimate than counties with smaller populations such as Howard County, which has about 330,000 inhabitants.

  3. Howard County has the highest estimated median household income at 129,549 US dollars, and Montgomery County is third highest at 117,345 US dollars. Both are high relative to the lowest county median income in Maryland, 48,661 US dollars, in the southernmost county of the state.

  4. High median incomes likely make it easier for inhabitants to pursue further education, including obtaining a graduate degree.

3.2 Future Work

  Q1. Are there patterns in proximity to universities that offer graduate degrees at the county level?

  Q2. Are there patterns in access to financial support for students seeking graduate degrees at the county level?

  Q3. Are there differences across fields of study at the county level?

  Q4. How does Maryland compare to other states in the percentage of graduate degree holders?

References

Plotly. plotly

Census Code Meanings.

Moraga, Paula. The sf package for spatial analysis.

Sievert, C. (xxxx) Interactive Web-Based Data Visualization with R, plotly, and shiny (Chapman & Hall/CRC The R Series) (p. 96). CRC Press. Kindle Edition.

Walker, K. (2023). Analyzing U.S. Census Data: Methods, Maps and Models in R.