Is there a county-level pattern in Maryland between the share of graduate degree holders and median household income? This study examines the percentage of graduate degree holders in each Maryland county using the 2021 estimate (2017 to 2021 ACS 5-year estimate). It also looks at other data, including median household income at the county level. Graduate degrees include both Master’s and Doctorate degrees in any field of study. The target population is men and women 25 years of age and older, the universe the ACS uses for educational attainment.
This analysis will assess the percentage of inhabitants that hold graduate degrees at the county level in the state of Maryland, United States. Several R-packages will be used including tidyverse for data wrangling; tidycensus to programmatically access, download and prepare U.S. Census data; sf to work with spatial data; ggplot2 to visualize the data; and ggiraph to make the plots more intuitive through interactivity.
tidycensus provides an R interface to the Decennial Census (a complete count of the U.S. population conducted every 10 years), the American Community Survey (1-year estimates for geographies with populations of 65,000 and greater, and 5-year estimates for geographies down to the block group), the Population Estimates Program, and the Public Use Microdata Series APIs.
Session A. “Working with the 2021 American Community Survey with R and tidycensus”
# Load packages, with a short description of each
library(tidycensus) # access, download, and prepare ACS census data
library(tidyverse) # transform and present data more clearly: readr, dplyr, purrr, ggplot2
library(tigris) # download TIGER/Line shapefiles
options(tigris_use_cache = TRUE) # cache downloaded shapefiles for reuse
library(sf) # simple features: a standardized way to encode spatial vector data
library(mapview) # interactive viewing of spatial (sf) data
library(plotly) # easily create interactive plots
library(ggiraph) # extension to ggplot2 that makes plots interactive
tidycensus takes an aggregation level (geography, the enumeration unit) and a legal entity (here the state of Maryland and its counties), and can download and merge the matching census geometries. The code below calls the Census Bureau's open data API and requests the variable that holds the percentage of graduate degree holders (DP02_0066P).
library(tidycensus)
md_education <- get_acs( # use the tidycensus get_acs function to query the Census API
  geography = "county", # requested geography
  state = "MD", # requested state of Maryland
  variables = "DP02_0066P", # education data: percent graduate degree holders
  year = 2021) # end year of the 2017-2021 5-year ACS
Which county in Maryland has the highest estimated percentage of graduate degree holders?
arrange(md_education, desc(estimate)) # arrange education data (graduate degrees) for Maryland in descending order
## # A tibble: 24 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 24027 Howard County, Maryland DP02_0066P 32.9 0.8
## 2 24031 Montgomery County, Maryland DP02_0066P 32.5 0.5
## 3 24003 Anne Arundel County, Maryland DP02_0066P 18.4 0.6
## 4 24021 Frederick County, Maryland DP02_0066P 18.4 0.7
## 5 24041 Talbot County, Maryland DP02_0066P 18.4 1.6
## 6 24005 Baltimore County, Maryland DP02_0066P 17.8 0.4
## 7 24510 Baltimore city, Maryland DP02_0066P 17 0.4
## 8 24009 Calvert County, Maryland DP02_0066P 16 1.3
## 9 24025 Harford County, Maryland DP02_0066P 16 0.7
## 10 24029 Kent County, Maryland DP02_0066P 15.5 2.1
## # ℹ 14 more rows
Table 1. Percentage of the population holding graduate degrees in Maryland at the county level, sorted in descending order. The estimate column gives the percentage, and the moe column gives the associated margin of error, which represents the uncertainty of the estimate. Howard and Montgomery counties have the highest percentages of graduate degree holders at 32.9% (+/- 0.8% moe) and 32.5% (+/- 0.5% moe), respectively. The margins of error for both counties are small relative to the estimates, suggesting the estimates would align closely with the published ACS 1-year estimates.
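Whether Howard County's estimate is meaningfully higher than Montgomery County's can be checked with the margins of error. The following is a minimal sketch in base R, assuming the moe values are at the 90 percent confidence level (the level the ACS publishes and the tidycensus default) and that the two estimates are independent. The helper name compare_estimates is hypothetical and not part of any package.

```r
# Values copied by hand from Table 1 (percent with a graduate degree)
howard     <- list(estimate = 32.9, moe = 0.8)
montgomery <- list(estimate = 32.5, moe = 0.5)

z_crit <- 1.645 # critical value for a 90 percent confidence level

# Hypothetical helper: z-test for the difference between two estimates.
# Each moe is converted to a standard error by dividing by z_crit.
compare_estimates <- function(a, b, z = z_crit) {
  se_a <- a$moe / z
  se_b <- b$moe / z
  z_score <- abs(a$estimate - b$estimate) / sqrt(se_a^2 + se_b^2)
  list(z_score = z_score, significant = z_score > z)
}

result <- compare_estimates(howard, montgomery)
print(result)
```

With these inputs the z-score stays below the critical value, so under the stated assumptions the two counties cannot be ranked against each other with confidence on this measure.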
Which county in Maryland has the lowest estimated percentage of graduate degree holders?
arrange(md_education, estimate) # arrange education data in ascending order
## # A tibble: 24 × 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 24039 Somerset County, Maryland DP02_0066P 5.1 1
## 2 24011 Caroline County, Maryland DP02_0066P 7.5 1.2
## 3 24001 Allegany County, Maryland DP02_0066P 8.2 0.8
## 4 24019 Dorchester County, Maryland DP02_0066P 9 1.3
## 5 24043 Washington County, Maryland DP02_0066P 9.4 0.7
## 6 24015 Cecil County, Maryland DP02_0066P 10.2 0.9
## 7 24045 Wicomico County, Maryland DP02_0066P 11.7 1
## 8 24023 Garrett County, Maryland DP02_0066P 11.8 1.5
## 9 24047 Worcester County, Maryland DP02_0066P 12 1.2
## 10 24017 Charles County, Maryland DP02_0066P 12.6 0.9
## # ℹ 14 more rows
Table 2. The ten Maryland counties with the lowest percentage of the population holding graduate degrees, in ascending order (lowest to highest). Somerset and Caroline counties have the lowest percentages at 5.1% (+/- 1.0% moe) and 7.5% (+/- 1.2% moe), respectively.
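A margin of error of about one percentage point matters more for a 5.1% estimate than for a 32.9% estimate. One common way to express this is the coefficient of variation (CV), the standard error divided by the estimate. The sketch below applies it to four rows typed in by hand from the tables above, assuming 90 percent margins of error. Any cutoff for an acceptable CV is a convention rather than a rule, so none is applied here.

```r
# Rows copied by hand from the two tables above (illustrative subset only)
counties <- data.frame(
  NAME     = c("Somerset", "Caroline", "Howard", "Montgomery"),
  estimate = c(5.1, 7.5, 32.9, 32.5),
  moe      = c(1.0, 1.2, 0.8, 0.5)
)

# Convert the 90 percent margin of error to a standard error
counties$se <- counties$moe / 1.645

# Coefficient of variation, in percent of the estimate
counties$cv_pct <- 100 * counties$se / counties$estimate

# Least reliable (highest CV) first
counties_sorted <- counties[order(-counties$cv_pct), ]
print(counties_sorted)
```

The counties with the lowest percentages end up with the highest relative uncertainty, even though their margins of error are similar in absolute terms.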
Get the education estimate data from the 2017-2021 5-Year ACS and take a look at the columns
Interactive viewing with mapview() to gain deeper insight into data distribution of education data in Maryland.
library(tidycensus)
md_education <- get_acs( # use the tidycensus get_acs function to query the Census API
  geography = "county", # requested geography
  state = "MD", # requested state of Maryland
  variables = "DP02_0066P", # education data: percent graduate degree holders
  geometry = TRUE, # also download county geometries for mapping
  progress_bar = FALSE,
  year = 2021) # end year of the 2017-2021 5-year ACS
library(mapview) # allows for interactive viewing of spatial data in R
library(leaflet)
mapview(md_education, zcol = "estimate")
Figure 1. Interactive mapview of the estimated percentage of the population holding graduate degrees at the county level in Maryland. The yellow-light blue-dark blue-purple color ramp portrays the counties from the highest percentage of graduate degree holders in yellow to the lowest percentage in dark blue-purple. The two highest counties were Howard County at 32.9% (0.8% moe) and Montgomery County at 32.5% (0.5% moe). These two estimates should align closely with the published ACS estimates because of their low level of uncertainty.
Create a plot with ggplot2 and ggiraph to visualize the percentage of graduate degree holders across Maryland at the county level. Each estimate gets an error bar showing the breadth of its margin of error (moe), which represents the uncertainty of the estimate.
# Reload the packages needed for plotting
library(ggiraph)
library(scales)
library(tidyverse)
md_education_plot_ggiraph <- ggplot(md_education, aes(x = estimate, y = reorder(NAME, estimate),
                                                      tooltip = estimate,
                                                      data_id = GEOID)) + # data_id links hover events to each county
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), # error bars span estimate +/- moe
                width = 0.5, linewidth = 0.5) +
  geom_point_interactive(color = "darkred", size = 2) +
  scale_x_continuous(labels = label_percent(scale = 1)) + # estimates are already in percent units
  scale_y_discrete(labels = function(x) str_remove(x, " County, Maryland|, Maryland")) +
  labs(title = "Percentage Population with Graduate Degrees, 2017-2021, ACS", # add labels
       subtitle = "Counties in Maryland",
       caption = "Data acquired with R and tidycensus",
       x = "ACS estimate",
       y = "") +
  theme_minimal(base_size = 12)
md_education_plot_ggiraph
Figure 2. Static plot of the point estimates of the percentage of the population holding graduate
degrees. Each error bar spans the estimate minus its margin of error to
the estimate plus its margin of error. Kent County has the largest
margin of error in the ACS estimate.
Provide interactivity through ggiraph to bring chart elements to life: specify which geom (layer) to interact with, render the plot with girafe(), and customize the interactivity.
library(ggiraph)
girafe(ggobj = md_education_plot_ggiraph) %>%
  girafe_options(opts_hover(css = "fill:cyan;")) # highlight a point in cyan on hover
Figure 3. Interactive plot of the point estimates of the percentage of the population holding graduate degrees in Maryland at the county level. The red dots represent the estimates, and hovering over a dot shows the percentage for that county. Error bars show the range of uncertainty of each estimate. There are two obvious outliers, Howard County (32.9% +/- 0.8%) and Montgomery County (32.5% +/- 0.5%). The margins of error were retrieved from Table 1.
How do the two counties with the highest percentages of degree holders compare in population? Montgomery County has a population of 1,057,201 and a narrower margin of error than Howard County, which has a population of 330,000. A larger population generally means a larger survey sample, which suggests that the Montgomery County estimate is the more precise of the two (population figures from Maryland Counties by Population).
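As a rough plausibility check on the link between population and margin of error, sampling error often shrinks approximately with the square root of the sample size. The sketch below assumes, purely for illustration, that the ACS sample size is proportional to county population and that the two percentages are similar. The actual ACS sample design is more complex, so this is an approximation only.

```r
# Population figures as quoted in the text above
pop_montgomery <- 1057201
pop_howard     <- 330000

# Margins of error from Table 1 (percentage points)
moe_montgomery <- 0.5
moe_howard     <- 0.8

# Under the square-root assumption, the smaller county's margin of error
# should be larger by roughly this factor
expected_ratio <- sqrt(pop_montgomery / pop_howard)

# Ratio actually observed in the ACS estimates
observed_ratio <- moe_howard / moe_montgomery

print(c(expected = expected_ratio, observed = observed_ratio))
```

The two ratios are of similar magnitude, which is consistent with population size explaining much of the difference in uncertainty between the two counties.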
Next, take a look at Maryland county-level median household incomes for 2021 to gain insight into the affordability of a graduate degree.
library(tidycensus)
maryland_income <- get_acs(
geography = "county",
variables = "B19013_001",
state = "MD",
year = 2021,
geometry = TRUE,
progress_bar = FALSE
)
Take a look at the income data estimate table for Maryland to gain deeper insight into why the two county outliers exist.
maryland_income # view the first 10 lines of the table to understand what the columns mean so they can be mapped
## Simple feature collection with 24 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -79.48765 ymin: 37.91172 xmax: -75.04894 ymax: 39.72304
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME variable estimate moe
## 1 24047 Worcester County, Maryland B19013_001 71262 2787
## 2 24003 Anne Arundel County, Maryland B19013_001 108048 1910
## 3 24033 Prince George's County, Maryland B19013_001 91124 1389
## 4 24025 Harford County, Maryland B19013_001 98495 1917
## 5 24015 Cecil County, Maryland B19013_001 81817 3281
## 6 24011 Caroline County, Maryland B19013_001 63027 3391
## 7 24023 Garrett County, Maryland B19013_001 58011 3632
## 8 24029 Kent County, Maryland B19013_001 64451 7427
## 9 24041 Talbot County, Maryland B19013_001 79349 4240
## 10 24045 Wicomico County, Maryland B19013_001 63610 2190
## geometry
## 1 MULTIPOLYGON (((-75.66061 3...
## 2 MULTIPOLYGON (((-76.83849 3...
## 3 MULTIPOLYGON (((-77.07995 3...
## 4 MULTIPOLYGON (((-76.0921 39...
## 5 MULTIPOLYGON (((-76.23326 3...
## 6 MULTIPOLYGON (((-76.01505 3...
## 7 MULTIPOLYGON (((-79.48765 3...
## 8 MULTIPOLYGON (((-76.27737 3...
## 9 MULTIPOLYGON (((-76.34647 3...
## 10 MULTIPOLYGON (((-75.92033 3...
Table 4. Spatial data for the first ten records for Maryland county-level data for median household income in the estimate column with associated margins of error for income in the moe column representing the uncertainty associated with this estimate. Geometry is also populated to connect the income data to the county polygons.
Interactive viewing with mapview()
library(mapview) # allows for interactive viewing of spatial data in R
mapview(maryland_income, zcol = "estimate")
Figure 4. Median household income estimate for Maryland at the county level from the 2017-2021 5-year ACS (the get_acs default), showing income variation within the state. Howard County has the highest estimated median income at 129,549 US dollars (in yellow), and Montgomery County is third highest at 117,345 US dollars (in light orange). These are high incomes relative to the lowest county median income in Maryland, 48,661 US dollars (in purple), in the southernmost county in the state.
Small Area Spatial Demographic Data for Howard County, the county with the highest median income in Maryland. We have gained insight into the distribution of the ACS median income estimate at the county level; now let's look at the census tract-level data.
howard_income <- get_acs( # access the Census ACS data API
  geography = "tract", # request geography by tract
  variables = "B19013_001", # request variable for median household income
  state = "MD", # request the state of Maryland
  county = "Howard", # request the county of Howard
  year = 2021, # stated explicitly to match the 2021 ACS estimates used above
  geometry = TRUE, # request the geometry for the tracts to link income data
  progress_bar = FALSE
)
mapview(howard_income, zcol = "estimate") # map the census tract median income data
Figure 5. Median household income estimate for Howard County, Maryland, at the tract level from the 2021 ACS estimate, showing income variation across Howard County census tracts. The median income ranged from a high of 239,173 US dollars (in yellow) to a low of 64,447 US dollars (in purple). Howard County is a wealthy county.
Howard and Montgomery counties have the highest percentages of graduate degree holders at 32.9% (+/- 0.8% moe) and 32.5% (+/- 0.5% moe), respectively. The margins of error are low, suggesting the estimates would align closely with the published ACS estimates.
Montgomery County's estimate has a smaller margin of error than Howard County's. With a 2022 population of 1,057,021, Montgomery County provides a more precise estimate than counties with smaller populations, such as Howard County with 330,000 inhabitants.
Howard County has the highest estimated median income at 129,549 US dollars. Montgomery County was third highest at 117,345 US dollars (in light orange). These two counties have high income estimates relative to the lowest county median income in Maryland, 48,661 US dollars (in purple), in the southernmost county in the state.
High median incomes likely make it easier for inhabitants to pursue further education, including a graduate degree, although this descriptive analysis cannot establish the direction of that relationship.
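The pattern between income and graduate degrees can be quantified by joining the two data sets on GEOID and computing a correlation. The sketch below is a minimal base R illustration that uses only the 11 counties whose values appear in both the education tables and the income table or figure captions above, typed in by hand. It is not the full 24-county analysis, and the column names pct_grad and median_income are chosen here for illustration.

```r
# Percent with a graduate degree (from Tables 1 and 2)
education <- data.frame(
  GEOID    = c("24027", "24031", "24003", "24041", "24025", "24029",
               "24047", "24023", "24045", "24015", "24011"),
  pct_grad = c(32.9, 32.5, 18.4, 18.4, 16.0, 15.5,
               12.0, 11.8, 11.7, 10.2, 7.5)
)

# Median household income in US dollars (from the income table and the
# Figure 4 caption), deliberately listed in a different row order
income <- data.frame(
  GEOID         = c("24047", "24003", "24025", "24015", "24011", "24023",
                    "24029", "24041", "24045", "24027", "24031"),
  median_income = c(71262, 108048, 98495, 81817, 63027, 58011,
                    64451, 79349, 63610, 129549, 117345)
)

# Join on the county identifier so rows line up regardless of order
merged <- merge(education, income, by = "GEOID")

# Linear (Pearson) and rank-based (Spearman) correlation
r_pearson  <- cor(merged$pct_grad, merged$median_income)
r_spearman <- cor(merged$pct_grad, merged$median_income, method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))
```

With the full objects, the same join could be made between md_education and maryland_income on GEOID, dropping the geometry from one of them first (for example with sf::st_drop_geometry). A positive correlation on this subset describes an association only and does not show that income causes higher educational attainment.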
Q1: Are there patterns in proximity to universities that offer graduate degrees at the county level? Q2: Are there patterns in access to financial support for students seeking graduate degrees at the county level? Q3: Are there differences across fields of study at the county level? Q4: How does Maryland compare with other states in the percentage of graduate degree holders?
Plotly.
Census Code Meanings.
Moraga, Paula. The sf package for spatial analysis.
Sievert, C. (xxxx) Interactive Web-Based Data Visualization with R, plotly, and shiny (Chapman & Hall/CRC The R Series) (p. 96). CRC Press. Kindle Edition.
Walker, K. (2023). Analyzing U.S. Census Data: Methods, Maps and Models in R.