Introduction

This report documents US Census data on the percentage of New York State residents who hold a graduate degree. The data is organized at the county level.

Data Preparation

Load Packages

The following R packages were used to obtain, tidy, organize, and plot the US Census data.

library(tidycensus) #used to obtain US Census data
library(here) #used for easy referencing 
library(tidyverse) #used to tidy data
library(ggplot2) #used to plot data
library(scales) #customise appearance of plots
library(plotly) #read in package to allow interactive plots
library(htmlwidgets) #read in package to save as HTML file 

Obtaining and Preparing Data

To focus on graduate degree holders within New York, only the relevant data was downloaded from the US Census Bureau's American Community Survey (ACS). Using get_acs() from tidycensus, the percentage of graduate degree holders (“DP02_0066P”) for each county in New York, covering the period from 2017 through 2021, was obtained and read into R.

#Graduate degree holders = ny_education3
ny_education3 <- get_acs(
  geography = "county", #at the county level
  state = "NY", #for New York state
  variables = c(percent_graduate = "DP02_0066P"), #% with a graduate degree
  year = 2021, #5-year ACS ending in 2021 (2017-2021)
  output = "wide" #display wide format
)

Analysis

Evaluating the Data

According to the data, approximately 31% of residents within Tompkins County held graduate degrees; this was the highest percentage within New York. Approximately 6% of residents within Orleans County held graduate degrees; this was the lowest percentage within the state.

Sort data based on the percentages of graduates

This information was obtained by sorting the percentage of graduates (“DP02_0066P”) data for the state of New York, first in descending and then in ascending order.

names(ny_education3) #identify the correct names of the columns
## [1] "GEOID"             "NAME"              "percent_graduateE"
## [4] "percent_graduateM"
#2Q) Which counties in the selected state have the largest percentages of graduate degree holders?
#2A) Tompkins County, New York with 31%

arrange(ny_education3, desc(percent_graduateE)) #sort by decreasing order
## # A tibble: 62 × 4
##    GEOID NAME                         percent_graduateE percent_graduateM
##    <chr> <chr>                                    <dbl>             <dbl>
##  1 36109 Tompkins County, New York                 31                 1.6
##  2 36061 New York County, New York                 30.6               0.5
##  3 36119 Westchester County, New York              25.8               0.5
##  4 36001 Albany County, New York                   21.8               0.8
##  5 36059 Nassau County, New York                   21.5               0.3
##  6 36091 Saratoga County, New York                 19.1               0.8
##  7 36087 Rockland County, New York                 18.9               0.7
##  8 36079 Putnam County, New York                   18.1               1.3
##  9 36055 Monroe County, New York                   18                 0.4
## 10 36027 Dutchess County, New York                 17.6               0.7
## # ℹ 52 more rows
#3Q) Which have the smallest percentages?
#3A) Orleans County, New York with 5.9%
arrange(ny_education3, percent_graduateE) #sort by ascending order
## # A tibble: 62 × 4
##    GEOID NAME                        percent_graduateE percent_graduateM
##    <chr> <chr>                                   <dbl>             <dbl>
##  1 36073 Orleans County, New York                  5.9               0.9
##  2 36121 Wyoming County, New York                  7                 0.8
##  3 36005 Bronx County, New York                    7.6               0.2
##  4 36035 Fulton County, New York                   7.9               1.1
##  5 36049 Lewis County, New York                    7.9               1.1
##  6 36017 Chenango County, New York                 8.2               1  
##  7 36057 Montgomery County, New York               8.4               0.9
##  8 36115 Washington County, New York               8.6               0.9
##  9 36011 Cayuga County, New York                   8.9               0.8
## 10 36037 Genesee County, New York                  8.9               0.8
## # ℹ 52 more rows
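
As an alternative to reading the first row of each sorted table, the highest and lowest counties can be pulled out directly. The following is a minimal sketch using slice_max() and slice_min() from dplyr. The ny_subset table is a hypothetical stand-in that copies only five rows from the output above; in the report itself the same calls would be applied to ny_education3.

```r
library(dplyr)

# Hypothetical stand-in: five rows copied by hand from the sorted output above
ny_subset <- tibble(
  NAME = c("Tompkins County, New York", "New York County, New York",
           "Bronx County, New York", "Wyoming County, New York",
           "Orleans County, New York"),
  percent_graduateE = c(31, 30.6, 7.6, 7, 5.9),
  percent_graduateM = c(1.6, 0.5, 0.2, 0.8, 0.9)
)

# Keep only the single row with the largest / smallest estimate
highest <- slice_max(ny_subset, percent_graduateE, n = 1)
lowest  <- slice_min(ny_subset, percent_graduateE, n = 1)

highest$NAME
lowest$NAME
```

By default both functions keep tied rows, so more than one county may be returned if several share the same estimate.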

Visualizing the Data

To help visualize the percentage of the population in each county that holds a graduate degree, the data was plotted using ggplot2.

Details were added to the plot to improve readability. These included a title, subtitle, caption, and x-axis label. The appearance was also changed: the wording “County, New York” was removed from each county label, the default background was replaced with a minimal theme, the point color was changed to steel blue, and additional axis breaks were added.

Static Graph

ggplot(ny_education3, aes(x = percent_graduateE, 
                          y = reorder(NAME, percent_graduateE)))+ # sort by value
  geom_errorbar(aes(xmin = percent_graduateE - percent_graduateM, #add margin of error information 
                    xmax = percent_graduateE + percent_graduateM),
                width = 0.5, linewidth = 0.5)+ # depict margin of error with lines
  geom_point(color = "steelblue", size = 2)+ #change how it looks
  scale_x_continuous(n.breaks=12) + #create more breaks in the data for better understanding
  scale_y_discrete(labels = function(x) str_remove(x, " County, New York|, New York")) +
  labs(title = "Percentage of Graduates, 2017-2021 ACS",
         subtitle = "Counties in New York",
         caption = "Data acquired with R and tidycensus",
         x = "ACS Percentages",
         y = "") + 
  theme_minimal(base_size = 12)
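
The label cleaning done inside scale_y_discrete() can be checked on its own. The short sketch below applies the same regular expression to three county names taken from the output above; county_names and short_names are illustrative names that do not appear in the original code.

```r
library(stringr)

# Three county names copied from the sorted output above
county_names <- c("Tompkins County, New York",
                  "New York County, New York",
                  "Orleans County, New York")

# Same pattern as in the plot: drop " County, New York",
# or just ", New York" for any name without the word "County"
short_names <- str_remove(county_names, " County, New York|, New York")

short_names
```

Note that New York County keeps the label "New York", since only the suffix after the county name is removed.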

Interactive Graph

The margin of error was also included for each county; the lower and upper bounds are depicted with lines extending from each point. Interactive features were added so that users can hover over a point to see its value and focus on specific counties.
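
The error bar arithmetic can be illustrated with the two highest counties. The sketch below is an illustration only: it computes the lower and upper bounds (estimate minus and plus the margin of error, which the ACS publishes at the 90 percent confidence level) and checks whether the two ranges overlap. The names top_two, bounds, and intervals_overlap are hypothetical and the values are copied from the sorted output earlier in this report.

```r
library(dplyr)

# Estimates and margins of error copied from the sorted output
top_two <- tibble(
  NAME = c("Tompkins County, New York", "New York County, New York"),
  percent_graduateE = c(31, 30.6),
  percent_graduateM = c(1.6, 0.5)
)

# Lower and upper ends of each error bar
bounds <- mutate(
  top_two,
  lower = percent_graduateE - percent_graduateM,
  upper = percent_graduateE + percent_graduateM
)

# Two ranges overlap when each lower bound sits at or below the other upper bound
intervals_overlap <- bounds$lower[1] <= bounds$upper[2] &&
  bounds$lower[2] <= bounds$upper[1]

bounds
intervals_overlap
```

Because the two ranges overlap, the ordering of Tompkins County and New York County at the top of the list should be read with some caution. This overlap check is a rough guide rather than a formal significance test.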

Finally, the figure was saved in order to be incorporated in this HTML publication.

ny_plot_errorbar <- ggplot(ny_education3, aes(x = percent_graduateE, 
                                              y = reorder(NAME, percent_graduateE)))+ # sort by value
  geom_errorbar(aes(xmin = percent_graduateE - percent_graduateM, #add margin of error information 
                    xmax = percent_graduateE + percent_graduateM),
                width = 0.5, linewidth = 0.5)+ # depict margin of error with lines
  geom_point(color = "steelblue", size = 2)+ #change how it looks
  scale_x_continuous(n.breaks=12) + #create more breaks in the data for better understanding
  scale_y_discrete(labels = function(x) str_remove(x, " County, New York|, New York")) +
  labs(title = "Percentage of Graduates, 2017-2021 ACS",
         subtitle = "Counties in New York",
         caption = "Data acquired with R and tidycensus",
         x = "ACS Percentages",
         y = "") + 
  theme_minimal(base_size = 12)

ggplotly(ny_plot_errorbar, tooltip = "x") #convert to an interactive plot that can be displayed in HTML
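
The call above displays the interactive plot but does not write a file. One way to save it as a standalone HTML file is saveWidget() from htmlwidgets. The sketch below is an assumption about how that step could look, not the code used for this report: demo_data, demo_plot, and demo_widget are small illustrative objects, and the output file name is hypothetical.

```r
library(ggplot2)
library(plotly)
library(htmlwidgets)

# Tiny illustrative data set (two values copied from the output above)
demo_data <- data.frame(
  NAME = c("Tompkins", "Orleans"),
  percent_graduateE = c(31, 5.9)
)

demo_plot <- ggplot(demo_data, aes(x = percent_graduateE, y = NAME)) +
  geom_point(color = "steelblue", size = 2)

# Convert the ggplot object to an interactive plotly widget
demo_widget <- ggplotly(demo_plot, tooltip = "x")

# Hypothetical output location; written to a temporary directory here
out_file <- file.path(tempdir(), "ny_plot_errorbar.html")

# selfcontained = FALSE writes supporting files to a folder next to the HTML
# file and avoids the pandoc dependency of a single-file export
saveWidget(demo_widget, file = out_file, selfcontained = FALSE)

file.exists(out_file)
```

In the report itself, ggplotly(ny_plot_errorbar, tooltip = "x") would take the place of demo_widget.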

Issues

While the graph presents enough information to tell which counties had the highest and lowest percentages of residents who hold graduate degrees, the figure is congested. With 62 counties on a single axis, the figure is somewhat overwhelming. The interactive aspect of the graph helps in identifying information; however, the figure still seems too busy.
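
One possible way to reduce the congestion is to plot only the counties at either end of the ranking. The sketch below is a suggestion rather than part of the original analysis. The helper keep_extremes() and the six-row ny_subset table are hypothetical; the values are copied from the sorted output earlier in this report, and in practice the helper would be applied to ny_education3 with a larger n.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: keep the n highest and n lowest rows by estimate
keep_extremes <- function(data, n) {
  bind_rows(
    slice_max(data, percent_graduateE, n = n, with_ties = FALSE),
    slice_min(data, percent_graduateE, n = n, with_ties = FALSE)
  )
}

# Hypothetical stand-in: six rows copied by hand from the sorted output
ny_subset <- tibble(
  NAME = c("Tompkins County, New York", "New York County, New York",
           "Westchester County, New York", "Bronx County, New York",
           "Wyoming County, New York", "Orleans County, New York"),
  percent_graduateE = c(31, 30.6, 25.8, 7.6, 7, 5.9),
  percent_graduateM = c(1.6, 0.5, 0.5, 0.2, 0.8, 0.9)
)

extremes <- keep_extremes(ny_subset, n = 2)

# Same layers as the full plot, drawn on the reduced data
extremes_plot <- ggplot(extremes, aes(x = percent_graduateE,
                                      y = reorder(NAME, percent_graduateE))) +
  geom_errorbar(aes(xmin = percent_graduateE - percent_graduateM,
                    xmax = percent_graduateE + percent_graduateM),
                width = 0.5, linewidth = 0.5) +
  geom_point(color = "steelblue", size = 2) +
  labs(x = "ACS Percentages", y = "") +
  theme_minimal(base_size = 12)

extremes_plot
```

The helper assumes the data has at least 2 * n rows; with fewer rows the same county would appear in both halves.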