Massachusetts consists of 14 counties and has an estimated population
of 7.1 million people. This project uses US Census data accessed through
tidycensus and consists of two parts. The first part assesses, through
non-spatial means, the percentage of the population in each county
that has a graduate degree. The second part assesses median household
income by Census tract within Suffolk County using ggplot2 and
mapview. This county contains the city of Boston and the
surrounding metropolitan area.
The first part of this project pulls US Census data on the percentage
of the population in the state of Massachusetts that has a graduate
degree, using 5-year ACS data from 2020-2024. To do this, the data is
imported with tidycensus, arranged with
tidyverse, and plotted with ggplot2 and
plotly.
The required packages are first loaded with library():
tidycensus, tidyverse, scales, plotly,
ggiraph, and knitr.
# load packages
library(tidycensus)
library(tidyverse)
library(scales)
library(plotly)
library(ggiraph)
library(knitr)
Next, the get_acs() function from tidycensus is
called. geography is set to “county” to define the census
geography. variables is set to “DP02_0066P”, the census code for the
variable of interest: the percentage of the population
that has a graduate degree. state defines the state of interest
using its two-letter code. Lastly, year is set to 2024 to
select the 2020-2024 5-year ACS.
# import percent graduate degree data at the county level for Massachusetts using get_acs()
perc_grad_deg <- get_acs(
geography = "county",
variables = "DP02_0066P",
state = "MA",
year = 2024
)
kable(perc_grad_deg)
| GEOID | NAME | variable | estimate | moe |
|---|---|---|---|---|
| 25001 | Barnstable County, Massachusetts | DP02_0066P | 22.4 | 0.7 |
| 25003 | Berkshire County, Massachusetts | DP02_0066P | 18.7 | 0.9 |
| 25005 | Bristol County, Massachusetts | DP02_0066P | 11.4 | 0.4 |
| 25007 | Dukes County, Massachusetts | DP02_0066P | 24.4 | 4.5 |
| 25009 | Essex County, Massachusetts | DP02_0066P | 17.9 | 0.4 |
| 25011 | Franklin County, Massachusetts | DP02_0066P | 18.5 | 1.0 |
| 25013 | Hampden County, Massachusetts | DP02_0066P | 12.1 | 0.5 |
| 25015 | Hampshire County, Massachusetts | DP02_0066P | 27.6 | 1.1 |
| 25017 | Middlesex County, Massachusetts | DP02_0066P | 30.5 | 0.4 |
| 25019 | Nantucket County, Massachusetts | DP02_0066P | 21.9 | 3.9 |
| 25021 | Norfolk County, Massachusetts | DP02_0066P | 28.4 | 0.5 |
| 25023 | Plymouth County, Massachusetts | DP02_0066P | 16.1 | 0.5 |
| 25025 | Suffolk County, Massachusetts | DP02_0066P | 23.5 | 0.6 |
| 25027 | Worcester County, Massachusetts | DP02_0066P | 16.8 | 0.4 |
To identify the counties with the largest and smallest percentages of
the population holding a graduate degree, the tidyverse function
arrange() was called with desc() to sort the counties in the data
table from highest to lowest. Table 1 shows the census data in that
order. Middlesex County, with an estimated 30.5% +/- 0.4% of the
population holding a graduate degree, has the largest percentage of the
14 counties. Bristol County, with 11.4% +/- 0.4%, has the lowest
percentage.
# arrange data in descending order to identify the largest and smallest percentage counties
ordered_perc_grad_deg <- arrange(perc_grad_deg,
desc(estimate))
kable(ordered_perc_grad_deg)
| GEOID | NAME | variable | estimate | moe |
|---|---|---|---|---|
| 25017 | Middlesex County, Massachusetts | DP02_0066P | 30.5 | 0.4 |
| 25021 | Norfolk County, Massachusetts | DP02_0066P | 28.4 | 0.5 |
| 25015 | Hampshire County, Massachusetts | DP02_0066P | 27.6 | 1.1 |
| 25007 | Dukes County, Massachusetts | DP02_0066P | 24.4 | 4.5 |
| 25025 | Suffolk County, Massachusetts | DP02_0066P | 23.5 | 0.6 |
| 25001 | Barnstable County, Massachusetts | DP02_0066P | 22.4 | 0.7 |
| 25019 | Nantucket County, Massachusetts | DP02_0066P | 21.9 | 3.9 |
| 25003 | Berkshire County, Massachusetts | DP02_0066P | 18.7 | 0.9 |
| 25011 | Franklin County, Massachusetts | DP02_0066P | 18.5 | 1.0 |
| 25009 | Essex County, Massachusetts | DP02_0066P | 17.9 | 0.4 |
| 25027 | Worcester County, Massachusetts | DP02_0066P | 16.8 | 0.4 |
| 25023 | Plymouth County, Massachusetts | DP02_0066P | 16.1 | 0.5 |
| 25013 | Hampden County, Massachusetts | DP02_0066P | 12.1 | 0.5 |
| 25005 | Bristol County, Massachusetts | DP02_0066P | 11.4 | 0.4 |
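As a cross-check on reading the sorted table by eye, the two extremes can also be extracted directly. The sketch below is a minimal base R example, assuming a small data frame (the name `grad_deg` is hypothetical and not an object from the analysis) that re-enters four rows of Table 1 by hand.

```r
# Hand-entered subset of Table 1 (illustrative stand-in for perc_grad_deg)
grad_deg <- data.frame(
  NAME = c("Middlesex", "Norfolk", "Hampden", "Bristol"),
  estimate = c(30.5, 28.4, 12.1, 11.4),
  moe = c(0.4, 0.5, 0.5, 0.4),
  stringsAsFactors = FALSE
)

# Rows holding the largest and the smallest estimate
highest <- grad_deg[which.max(grad_deg$estimate), ]
lowest <- grad_deg[which.min(grad_deg$estimate), ]

highest$NAME
lowest$NAME
```

With the full tibble returned by get_acs(), dplyr's slice_max(perc_grad_deg, estimate, n = 1) and slice_min(perc_grad_deg, estimate, n = 1) should give the same two rows.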
An interactive margin of error plot was then created for further analysis.
First, the ggplot() function was used to set the
characteristics of the plot, including the data, symbology, and
labels. The plot was saved as an object and passed to
ggplotly() to produce an interactive version. The resulting
margin of error plot, Figure 1, shows that the error bar for the county
with the largest percentage, Middlesex, does not overlap with that of any
other county, so confidence in this ranking is high. The error bars of the
lowest two counties, however, do overlap: Bristol County spans 11.0% to
11.8% and Hampden County spans 11.6% to 12.6%. Hampden County could
therefore have a lower percentage of its population with a graduate degree
than Bristol County, which reduces confidence in the initial assessment
based on the estimate values alone.
# create a margin of error plot
errorplot_perc_grad_degree <- ggplot(perc_grad_deg, aes(x= estimate,
y = reorder(NAME, estimate))) +
geom_errorbar(aes(xmin = estimate - moe,
xmax = estimate + moe),
width = 1,
linewidth = 0.5) +
geom_point(color = "darkred",
size = 2) +
scale_y_discrete(labels = function(x) str_remove(x, " County, Massachusetts|, Massachusetts")) +
labs(title = "Percentage of Population with Graduate Degrees",
subtitle = "Counties in Massachusetts",
caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
x = "ACS estimate (%)",
y = "") +
theme_minimal(base_size = 10)
# convert plot into an interactive plot with plotly
ggplotly(errorplot_perc_grad_degree, tooltip = "x")
Figure 1 - Interactive margin of error plot of the percentage of the population with a graduate degree in each county of Massachusetts. The error bar for the county with the largest percentage does not overlap with that of any other county, so confidence in that ranking is high. The error bars of the lowest two counties do overlap, so Hampden County could have a lower percentage than Bristol County, which reduces confidence in a ranking based on the estimate values alone.
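The overlap reading of Figure 1 can be reproduced numerically. The sketch below is a minimal base R example using the estimates and margins of error from the county table; the helper names `intervals_overlap()` and `z_difference()` are hypothetical. It also computes the usual z statistic for comparing two ACS estimates, relying on the fact that ACS margins of error are published at the 90% confidence level, so dividing by 1.645 gives a standard error.

```r
# Estimates and 90% margins of error copied from the county table
est <- c(Middlesex = 30.5, Norfolk = 28.4, Hampden = 12.1, Bristol = 11.4)
moe <- c(Middlesex = 0.4, Norfolk = 0.5, Hampden = 0.5, Bristol = 0.4)

# TRUE when the intervals estimate +/- moe of counties a and b overlap
intervals_overlap <- function(a, b) {
  (est[[a]] - moe[[a]]) <= (est[[b]] + moe[[b]]) &&
    (est[[b]] - moe[[b]]) <= (est[[a]] + moe[[a]])
}

# z statistic for the difference between two estimates;
# 1.645 converts a 90% margin of error into a standard error
z_difference <- function(a, b) {
  se_a <- moe[[a]] / 1.645
  se_b <- moe[[b]] / 1.645
  abs(est[[a]] - est[[b]]) / sqrt(se_a^2 + se_b^2)
}

top_overlap <- intervals_overlap("Middlesex", "Norfolk")
bottom_overlap <- intervals_overlap("Hampden", "Bristol")
bottom_z <- z_difference("Hampden", "Bristol")
```

With these values the top two counties do not overlap and the bottom two do, matching the figure. The z statistic for Hampden versus Bristol comes out at roughly 1.8, a little above the 1.645 cutoff, which suggests that overlapping error bars are a conservative warning sign rather than proof that the two estimates cannot be told apart.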
The second part of this project assesses the median household income
within Suffolk County, Massachusetts. To conduct this assessment, the
necessary packages were loaded, and tidycensus was
used to pull a list of variables and identify the code of the
variable of interest. tigris and
tidycensus were then used to pull the data of interest
along with the associated census tract geography. Finally,
ggplot() and ggiraph were used to produce
an interactive graduated symbol map for spatial data analysis.
Additional packages necessary for US Census spatial data analysis are loaded.
library(mapview)
library(tigris)
library(sf)
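The load_variables() function is used to produce a list
of US Census variable codes. Filtering for labels containing “Median
Income”, one finds code “B06011_001”, defined as “Estimate!!Median income
in the past 12 months –!!Total”. Note that table B06011 reports median
income for individuals; the ACS code for median household income
specifically is B19013_001.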
vars <- load_variables(2021, "acs5")
View(vars)
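Instead of scrolling through the table opened by View(), the labels can be searched programmatically. The sketch below is a minimal base R example that assumes a tiny hand-typed stand-in (`vars_demo`, a hypothetical name) for the table returned by load_variables(); apart from the B06011_001 label quoted above, the labels are abbreviated for illustration.

```r
# Hand-typed stand-in for the table returned by load_variables(2021, "acs5");
# labels other than B06011_001 are abbreviated for illustration
vars_demo <- data.frame(
  name = c("B01003_001", "B06011_001", "B19013_001"),
  label = c(
    "Estimate!!Total",
    "Estimate!!Median income in the past 12 months --!!Total",
    "Estimate!!Median household income in the past 12 months"
  ),
  stringsAsFactors = FALSE
)

# Case-insensitive search of the label column
hits <- grepl("median income", vars_demo$label, ignore.case = TRUE)
median_income_vars <- vars_demo[hits, ]
median_income_vars$name
```

On the real table, the tidyverse equivalent would be filter(vars, str_detect(label, regex("median income", ignore_case = TRUE))). The exact search phrase matters: a label that reads "Median household income" does not contain the phrase "median income".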
The get_acs() function is called again to pull the median
household income data from the US Census at the census tract level
for Suffolk County. geometry is set to TRUE so that the
associated tract geometry is returned with the estimates as an sf
object, which allows for spatial analysis and for processing with tigris.
# pull median income by tract in Suffolk County, Massachusetts for analysis of the Boston area
suffolk_income <- get_acs(
geography = "tract",
variables = "B06011_001",
state = "MA",
county = "Suffolk",
# year set explicitly to match the 2020-2024 5-year ACS named in the map subtitle
year = 2024,
geometry = TRUE
)
Suffolk County is characterized by several small islands and the city
of Boston which surrounds a bay. US Census tract data does not adjust
tracts to account for water features. As such, the
erase_water() function was called to clip the tracts around
the water features making for a more visually accurate display of the
data.
# remove water from data
sf_use_s2(FALSE)
suffolk_erase <- erase_water(suffolk_income,
area_threshold = 0.5,
year = 2021)
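Conceptually, clipping tracts around water is a geometric difference between each tract polygon and the water polygons. The sketch below illustrates that idea with sf on made-up toy shapes; it is not the internal code of erase_water(), and the coordinates are hypothetical.

```r
library(sf)

# Toy 4 x 4 square "tract" and a "water" rectangle covering its right half
# (made-up coordinates with no CRS, so areas are in plain square units)
tract <- st_sfc(st_polygon(list(rbind(c(0, 0), c(4, 0), c(4, 4), c(0, 4), c(0, 0)))))
water <- st_sfc(st_polygon(list(rbind(c(2, 0), c(4, 0), c(4, 4), c(2, 4), c(2, 0)))))

# Keep only the part of the tract that is not covered by water
land <- st_difference(tract, water)

tract_area <- as.numeric(st_area(tract))
land_area <- as.numeric(st_area(land))
```

Here the clipped shape keeps half of the original area, in the same way that coastal tracts shrink to their land portion once water is erased.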
mapview() provides an interactive map of the resulting
data (Figure 2). From this, one finds higher median incomes towards
Downtown Boston with lower incomes concentrated in the south-west region
of the county.
# use mapview() to display data as an interactive map
mapview(suffolk_erase, zcol = "estimate")
To analyze the median incomes across the county more closely, an
interactive graduated symbol map was produced using
ggiraph in conjunction with ggplot2. Much like
with plotly, the characteristics of the map are first
defined with ggplot().
The girafe() function was then called to make each symbol
interactive. This map (Figure 3) continues to highlight the
concentration of higher median incomes located around Downtown Boston
with lower incomes in the south-west region.
# use ggplot() and ggiraph to display as interactive graduated symbols map ----
centroids <- st_centroid(suffolk_erase)
suf_inc <- ggplot() +
geom_sf(data = suffolk_erase, color = "ivory2", fill = "#0C2340") +
geom_sf_interactive(data = centroids, aes(size = estimate, tooltip = estimate),
alpha = 0.5, color = "#BD3039") +
theme_void() +
labs(title = "Median Household Income Totals by Census Tract",
subtitle = "2020-2024 5-year ACS, Suffolk County, Massachusetts",
size = "ACS Estimate",
caption = "Data acquired with R and tidycensus.") +
scale_size_area(max_size = 6)
girafe(ggobj = suf_inc) %>%
girafe_options(opts_hover(css = "fill:cyan;"))
Figure 3 - Interactive Graduated Symbol Map of Median Household Income by Census Tract for Suffolk County, Massachusetts. The map continues to highlight the concentration of higher median incomes located around Downtown Boston with lower incomes in the south-west region.
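The map code uses scale_size_area(), which scales the area of each symbol, rather than its radius, in proportion to the value, with a value of zero mapping to a size of zero. The sketch below shows roughly what that implies, using hypothetical income values rather than figures from the ACS pull; it approximates the idea and is not ggplot2's internal code.

```r
# Hypothetical tract incomes in dollars (not values from the ACS pull)
income <- c(25000, 50000, 100000)
max_size <- 6

# Area-proportional sizing: the symbol's linear size grows with the square
# root of the value, so its area grows in proportion to the value itself
symbol_size <- max_size * sqrt(income / max(income))

# Relative areas compared with the smallest symbol
relative_area <- symbol_size^2 / symbol_size[1]^2
relative_area
```

Under this sizing a tract with four times the income gets a symbol with four times the area but only twice the width, which keeps large values from visually dominating the map.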
This project identified Middlesex County as having the highest percentage of the population with a graduate degree. Bristol County was found to have the lowest percentage. The project also assessed the median household incomes for Suffolk County finding the highest median incomes towards the Downtown Boston region with the lowest median incomes concentrated mostly in the south-west region of the county.