Introduction

Massachusetts consists of 14 counties and has an estimated population of 7.1 million people. This project uses US Census data accessed through tidycensus and consists of two parts. The first part assesses the percentage of the population in each county that holds a graduate degree through non-spatial means. The second part assesses median household income by Census tract within Suffolk County using mapview, ggplot2, and ggiraph. This county contains the city of Boston and the surrounding metropolitan area.

Graduate Degree Holding Population

The first part of this project pulls US Census data on the percentage of the population in each Massachusetts county that has a graduate degree, using 5-year ACS data from 2020-2024. The data is imported with tidycensus, arranged with tidyverse, and plotted with ggplot2 and plotly.

Load Packages

Packages are first loaded with library(): tidycensus, tidyverse, scales, plotly, ggiraph, and knitr.

# load packages

library(tidycensus)
library(tidyverse)
library(scales)
library(plotly)
library(ggiraph)
library(knitr)

Import Data

Next, the get_acs() function from tidycensus is called. geography is set to “county” to define the census geography. variables is set to “DP02_0066P”, the census code for the variable of interest, which is the percentage of the population that has a graduate degree. state defines the state of interest using its two-letter code. Lastly, year is set to 2024, the end year of the 2020-2024 5-year ACS.

# import percent graduate degree data at the county level for Massachusetts using get_acs()

perc_grad_deg <- get_acs(
  geography = "county",
  variables = "DP02_0066P",
  state = "MA",
  year = 2024
)

kable(perc_grad_deg)
| GEOID | NAME | variable | estimate | moe |
|-------|------|----------|----------|-----|
| 25001 | Barnstable County, Massachusetts | DP02_0066P | 22.4 | 0.7 |
| 25003 | Berkshire County, Massachusetts | DP02_0066P | 18.7 | 0.9 |
| 25005 | Bristol County, Massachusetts | DP02_0066P | 11.4 | 0.4 |
| 25007 | Dukes County, Massachusetts | DP02_0066P | 24.4 | 4.5 |
| 25009 | Essex County, Massachusetts | DP02_0066P | 17.9 | 0.4 |
| 25011 | Franklin County, Massachusetts | DP02_0066P | 18.5 | 1.0 |
| 25013 | Hampden County, Massachusetts | DP02_0066P | 12.1 | 0.5 |
| 25015 | Hampshire County, Massachusetts | DP02_0066P | 27.6 | 1.1 |
| 25017 | Middlesex County, Massachusetts | DP02_0066P | 30.5 | 0.4 |
| 25019 | Nantucket County, Massachusetts | DP02_0066P | 21.9 | 3.9 |
| 25021 | Norfolk County, Massachusetts | DP02_0066P | 28.4 | 0.5 |
| 25023 | Plymouth County, Massachusetts | DP02_0066P | 16.1 | 0.5 |
| 25025 | Suffolk County, Massachusetts | DP02_0066P | 23.5 | 0.6 |
| 25027 | Worcester County, Massachusetts | DP02_0066P | 16.8 | 0.4 |

Largest and Smallest Percentage Counties

To identify the counties with the largest and smallest percentages of graduate degree holders, the tidyverse function arrange() was called with desc() to sort the data table from highest to lowest. Table 1 shows the census data ordered from highest to lowest percentage by county. Middlesex County, with an estimated 30.5% +/- 0.4% of the population holding a graduate degree, has the largest percentage of the 14 counties. Bristol County, with 11.4% +/- 0.4%, has the lowest percentage.

# arrange data in descending order to identify the largest and smallest percentage counties

ordered_perc_grad_deg <- arrange(perc_grad_deg,
                                 desc(estimate))

kable(ordered_perc_grad_deg)
Table 1: County data arranged in descending order. Middlesex County, with an estimated 30.5% +/- 0.4% of the population holding a graduate degree, has the largest percentage of the 14 counties. Bristol County, with 11.4% +/- 0.4%, has the lowest percentage.
| GEOID | NAME | variable | estimate | moe |
|-------|------|----------|----------|-----|
| 25017 | Middlesex County, Massachusetts | DP02_0066P | 30.5 | 0.4 |
| 25021 | Norfolk County, Massachusetts | DP02_0066P | 28.4 | 0.5 |
| 25015 | Hampshire County, Massachusetts | DP02_0066P | 27.6 | 1.1 |
| 25007 | Dukes County, Massachusetts | DP02_0066P | 24.4 | 4.5 |
| 25025 | Suffolk County, Massachusetts | DP02_0066P | 23.5 | 0.6 |
| 25001 | Barnstable County, Massachusetts | DP02_0066P | 22.4 | 0.7 |
| 25019 | Nantucket County, Massachusetts | DP02_0066P | 21.9 | 3.9 |
| 25003 | Berkshire County, Massachusetts | DP02_0066P | 18.7 | 0.9 |
| 25011 | Franklin County, Massachusetts | DP02_0066P | 18.5 | 1.0 |
| 25009 | Essex County, Massachusetts | DP02_0066P | 17.9 | 0.4 |
| 25027 | Worcester County, Massachusetts | DP02_0066P | 16.8 | 0.4 |
| 25023 | Plymouth County, Massachusetts | DP02_0066P | 16.1 | 0.5 |
| 25013 | Hampden County, Massachusetts | DP02_0066P | 12.1 | 0.5 |
| 25005 | Bristol County, Massachusetts | DP02_0066P | 11.4 | 0.4 |
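
As an alternative to sorting the full table and reading off the first and last rows, the two extremes can be pulled directly. The sketch below assumes dplyr's slice_max() and slice_min() and uses a small hand-entered subset of Table 1 (the object grad_demo is a stand-in, not the object returned by get_acs()) so that it runs without a Census API call.

```r
library(dplyr)

# Hand-entered subset of Table 1; a stand-in for the get_acs() result
grad_demo <- tibble(
  NAME = c("Middlesex", "Norfolk", "Hampden", "Bristol"),
  estimate = c(30.5, 28.4, 12.1, 11.4),
  moe = c(0.4, 0.5, 0.5, 0.4)
)

# Row with the largest estimate and row with the smallest estimate
highest <- slice_max(grad_demo, estimate, n = 1)
lowest <- slice_min(grad_demo, estimate, n = 1)

# Stack the two rows into one small summary table
bind_rows(highest, lowest)
```

The same two calls should work on the full county table, since it has the same estimate column.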

Margin of Error Assessment

An interactive margin of error plot was created for further analysis. First, the ggplot() function was used to set the characteristics of the plot, including data, symbology, and labels. The plot was saved as an object and passed to ggplotly() to produce an interactive version. The resulting margin of error plot, Figure 1, shows that the error bar of the largest percentage county, Middlesex, does not overlap with that of any other county, making confidence in this ranking high. The error bars of the lowest two counties, however, do overlap (Bristol spans 11.0% to 11.8% and Hampden spans 11.6% to 12.6%). This indicates that Hampden County could potentially have a lower percentage of graduate degree holders than Bristol County, reducing confidence in the initial assessment based on the estimate value alone.

# create a margin of error plot

errorplot_perc_grad_degree <- ggplot(perc_grad_deg, aes(x= estimate,
                                                        y = reorder(NAME, estimate))) +
  geom_errorbar(aes(xmin = estimate - moe,
                    xmax = estimate + moe),
                width = 1,
                linewidth = 0.5) +
  geom_point(color = "darkred",
             size = 2) +
  scale_y_discrete(labels = function(x) str_remove(x, " County, Massachusetts|, Massachusetts")) +
  labs(title = "Percentage of Population with Graduate Degrees",
       subtitle = "Counties in Massachusetts",
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
       x = "ACS estimate (%)",
       y = "") +
  theme_minimal(base_size = 10)
  
# convert plot into an interactive plot with plotly

ggplotly(errorplot_perc_grad_degree, tooltip = "x")

Figure 1 - Interactive margin of error plot of the percentage of the population with a graduate degree in each county of Massachusetts. The error bar of the largest percentage county does not overlap with that of any other county, making confidence in this ranking high. The error bars of the lowest two counties, however, do overlap. This indicates that Hampden County could potentially have a lower percentage of graduate degree holders than Bristol County, reducing confidence in the initial assessment based on the estimate value alone.
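
The visual overlap reading of Figure 1 can also be checked numerically. The sketch below is a minimal illustration using hand-entered values from Table 1; the helper names intervals_overlap() and differs_at_moe_level() are hypothetical and not part of tidycensus. The second helper applies the commonly used comparison for ACS estimates, in which the difference between two estimates is set against the combined margin of error, sqrt(moe1^2 + moe2^2). It assumes the margins are the 90 percent margins that the ACS publishes by default.

```r
library(dplyr)

# Hand-entered values from Table 1, with interval bounds added
moe_demo <- tibble(
  NAME = c("Middlesex", "Norfolk", "Hampden", "Bristol"),
  estimate = c(30.5, 28.4, 12.1, 11.4),
  moe = c(0.4, 0.5, 0.5, 0.4)
) %>%
  mutate(lower = estimate - moe, upper = estimate + moe)

# Hypothetical helper: TRUE when the two error bars share any values
intervals_overlap <- function(df, a, b) {
  x <- filter(df, NAME == a)
  y <- filter(df, NAME == b)
  x$lower <= y$upper && y$lower <= x$upper
}

# Hypothetical helper: TRUE when the gap between the estimates is larger
# than the combined margin of error of the two estimates
differs_at_moe_level <- function(df, a, b) {
  x <- filter(df, NAME == a)
  y <- filter(df, NAME == b)
  abs(x$estimate - y$estimate) > sqrt(x$moe^2 + y$moe^2)
}

intervals_overlap(moe_demo, "Middlesex", "Norfolk")
intervals_overlap(moe_demo, "Hampden", "Bristol")
differs_at_moe_level(moe_demo, "Hampden", "Bristol")
```

For Hampden and Bristol the error bars overlap, yet the 0.7-point gap between the estimates is larger than the combined margin of about 0.64. Overlapping bars are therefore best read as a conservative screen rather than a formal test of whether two counties differ.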

Median Household Income Assessment of Suffolk County

The second part of this project assesses the median household income within Suffolk County, Massachusetts. To conduct this assessment, the necessary packages were loaded, and tidycensus was used to pull a list of variables and identify the code of the variable of interest. tidycensus and tigris were then used to pull the data along with the associated census tract geography. Finally, mapview was used for a first interactive view, and ggplot2 and ggiraph were used to produce an interactive graduated symbol map for spatial data analysis.

Load Additional Packages

Additional packages necessary for US Census spatial data analysis are loaded.

library(mapview)
library(tigris)
library(sf)

View US Census List of Variables

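The load_variables() function is used to produce a list of US Census variable codes. Filtering for codes defined by “Median Income”, one finds code “B06011_001” defined as “Estimate!!Median income in the past 12 months –!!Total”. Note that this code reports median income for individuals; the ACS code for median household income is “B19013_001”.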

vars <- load_variables(2021, "acs5")

View(vars)
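
Rather than scrolling through the viewer, the variable list can be filtered in code. The sketch below is a minimal illustration, assuming the table has name and label columns as returned by load_variables(). The object vars_demo is a hand-built stand-in: only its first row comes from the text above, and the other two rows are placeholders, so the example runs without a network call.

```r
library(dplyr)
library(stringr)

# Stand-in for the load_variables() result; rows 2 and 3 are placeholders
vars_demo <- tibble(
  name = c("B06011_001", "X00001_001", "X00002_001"),
  label = c(
    "Estimate!!Median income in the past 12 months –!!Total",
    "Estimate!!Total",
    "Estimate!!Placeholder label"
  )
)

# Keep rows whose label mentions "median income", ignoring case
income_vars <- vars_demo %>%
  filter(str_detect(label, regex("median income", ignore_case = TRUE)))

income_vars
```

Replacing vars_demo with vars should apply the same filter to the full variable list.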

Pull Identified Data

The get_acs() function is called again to pull the median household income data from the US Census at the census tract level for Suffolk County. geometry is set to TRUE so that tidycensus also downloads the tract boundaries through tigris and returns an sf object suitable for spatial analysis.

# pulling median income by tract in Suffolk County, Massachusetts for analysis of Boston area
suffolk_income <- get_acs(
  geography = "tract",
  variables = "B06011_001",
  state = "MA",
  county = "Suffolk",
  year = 2024, # set explicitly to match the 2020-2024 5-year ACS named in the map subtitle
  geometry = TRUE
)

Clean Up Data

Suffolk County is characterized by several small islands and the city of Boston, which surrounds a bay. US Census tract boundaries are not adjusted for water features, so many tracts extend over open water. As such, the erase_water() function was called to clip the tracts around the water features, making for a more visually accurate display of the data.

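# remove water from data

# turn off spherical geometry (s2) so the erase runs with planar operations
sf_use_s2(FALSE)

# area_threshold is a percentile cutoff: 0.5 erases only the larger half of water areas
suffolk_erase <- erase_water(suffolk_income,
                             area_threshold = 0.5,
                             year = 2021)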

Display Data

mapview() provides an interactive map of the resulting data (Figure 2). From this, one finds higher median incomes towards Downtown Boston with lower incomes concentrated in the south-west region of the county.

# use mapview() to display data as an interactive map

mapview(suffolk_erase, zcol = "estimate")

Graduated Symbol Map

To analyze the median incomes across the county more accurately, an interactive graduated symbol map was produced using ggiraph in conjunction with ggplot2. Much like with plotly, the characteristics of the map are defined by mapping the data with ggplot(). Then girafe() was called to create an interactive element for each symbol. This map (Figure 3) continues to highlight the concentration of higher median incomes located around Downtown Boston with lower incomes in the south-west region.

# use ggplot() and ggiraph to display as interactive graduated symbols map ----

centroids <- st_centroid(suffolk_erase)

suf_inc <- ggplot() + 
  geom_sf(data = suffolk_erase, color = "ivory2", fill = "#0C2340") + 
  geom_sf_interactive(data = centroids, aes(size = estimate, tooltip = estimate),
          alpha = 0.5, color = "#BD3039") + 
  theme_void() + 
  labs(title = "Median Household Income Totals by Census Tract",
       subtitle = "2020-2024 5-year ACS, Suffolk County, Massachusetts",
       size = "ACS Estimate",
       caption = "Data acquired with R and tidycensus.") + 
  scale_size_area(max_size = 6)

girafe(ggobj = suf_inc) %>%
  girafe_options(opts_hover(css = "fill:cyan;"))

Figure 3 - Interactive Graduated Symbol Map of Median Household Income by Census Tract for Suffolk County, Massachusetts. The map continues to highlight the concentration of higher median incomes located around Downtown Boston with lower incomes in the south-west region.
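
The map above places each symbol at the tract centroid computed with st_centroid(). Once water has been erased, a tract may become concave or split into pieces, and its centroid can then fall outside the remaining shape. The sketch below illustrates the effect on a hypothetical U-shaped polygon, not on the Suffolk County data. It compares st_centroid() with sf's st_point_on_surface(), which returns a point guaranteed to lie on the geometry. This is offered as an alternative to consider, not as the method used for Figure 3.

```r
library(sf)

# Hypothetical U-shaped polygon standing in for a tract with a notch cut out
u_shape <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(3, 0), c(3, 3), c(2, 3), c(2, 1),
  c(1, 1), c(1, 3), c(0, 3), c(0, 0)
))))

# Geometric centroid versus a point guaranteed to lie on the polygon
centroid <- st_centroid(u_shape)
surface_point <- st_point_on_surface(u_shape)

# Check whether each point actually touches the polygon
centroid_inside <- st_intersects(centroid, u_shape, sparse = FALSE)[1, 1]
surface_inside <- st_intersects(surface_point, u_shape, sparse = FALSE)[1, 1]

c(centroid = centroid_inside, point_on_surface = surface_inside)
```

If any symbols in the map appear to sit over water, swapping st_centroid() for st_point_on_surface() when building centroids is one possible adjustment.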

Image credit: Pete Loeser, 12 April 2019

Conclusion

This project identified Middlesex County as having the highest percentage of the population with a graduate degree. Bristol County was found to have the lowest percentage, although its margin of error overlaps with that of Hampden County. The project also assessed the median household incomes for Suffolk County, finding the highest median incomes towards the Downtown Boston region and the lowest median incomes concentrated mostly in the south-west region of the county.