This report documents US Census data on the percentage of New York State residents who hold a graduate degree. The data are organized at the county level.
The following R packages were used to obtain, tidy, organize, and plot the US Census data.
library(tidycensus) #used to obtain US Census data
library(here) #used for easy referencing
library(tidyverse) #used to tidy data
library(ggplot2) #used to plot data
library(scales) #customise appearance of plots
library(plotly) #read in package to allow interactive plots
library(htmlwidgets) #read in package to save as HTML file
To keep the analysis focused on graduate degree holders in New York, only the relevant variable was downloaded from the US Census Bureau's American Community Survey (ACS). Using get_acs() from tidycensus, the percentage of graduate degree holders (“DP02_0066P”) was obtained for the state of New York at the county level, from the 2017-2021 5-year ACS, and read into R.
#Graduate or professional degree = ny_education3
ny_education3 <- get_acs(
  geography = "county", #at the county level
  state = "NY", #for New York state
  variables = c(percent_graduate = "DP02_0066P"), #% with a graduate or professional degree
  year = 2021, #end year of the 2017-2021 5-year ACS (the default survey in get_acs())
  output = "wide" #display wide format
)
According to the data, approximately 31% of residents in Tompkins County held graduate degrees, the highest percentage in New York. Approximately 6% of residents in Orleans County held graduate degrees, the lowest percentage in the state.
These figures were obtained by sorting the county-level percentages of graduates (“DP02_0066P”) in descending and then ascending order.
names(ny_education3) #identify the correct names of the columns
## [1] "GEOID" "NAME" "percent_graduateE"
## [4] "percent_graduateM"
#2Q) Which counties in the selected state have the largest percentages of graduate degree holders?
#2A) Tompkins County, New York with 31%
arrange(ny_education3, desc(percent_graduateE)) #sort in decreasing order
## # A tibble: 62 × 4
## GEOID NAME percent_graduateE percent_graduateM
## <chr> <chr> <dbl> <dbl>
## 1 36109 Tompkins County, New York 31 1.6
## 2 36061 New York County, New York 30.6 0.5
## 3 36119 Westchester County, New York 25.8 0.5
## 4 36001 Albany County, New York 21.8 0.8
## 5 36059 Nassau County, New York 21.5 0.3
## 6 36091 Saratoga County, New York 19.1 0.8
## 7 36087 Rockland County, New York 18.9 0.7
## 8 36079 Putnam County, New York 18.1 1.3
## 9 36055 Monroe County, New York 18 0.4
## 10 36027 Dutchess County, New York 17.6 0.7
## # ℹ 52 more rows
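The `percent_graduateM` column in the output above is the margin of error for each estimate (tidycensus returns ACS margins at the 90% confidence level by default). The following is a minimal sketch, not part of the original analysis, of how the ranges for the top two counties could be compared. The object name `top_two` and the helper columns `lower` and `upper` are hypothetical, and the values are copied from the first two rows of the output.

```r
library(tibble)
library(dplyr)

# Values copied from the first two rows of the sorted output above
top_two <- tibble(
  NAME = c("Tompkins County, New York", "New York County, New York"),
  percent_graduateE = c(31, 30.6),
  percent_graduateM = c(1.6, 0.5)
)

# Lower and upper bounds: estimate minus / plus the margin of error
top_two <- top_two %>%
  mutate(
    lower = percent_graduateE - percent_graduateM,
    upper = percent_graduateE + percent_graduateM
  )

# Two ranges overlap when each lower bound is at or below the other upper bound
ranges_overlap <- top_two$lower[1] <= top_two$upper[2] &&
  top_two$lower[2] <= top_two$upper[1]

top_two
ranges_overlap
```

Overlapping ranges are only an informal check, not a formal significance test, but they suggest that the ordering of the first two counties should be read with some caution.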
#3Q) Which have the smallest percentages?
#3A) Orleans County, New York with 5.9%
arrange(ny_education3, percent_graduateE) #sort by ascending order
## # A tibble: 62 × 4
## GEOID NAME percent_graduateE percent_graduateM
## <chr> <chr> <dbl> <dbl>
## 1 36073 Orleans County, New York 5.9 0.9
## 2 36121 Wyoming County, New York 7 0.8
## 3 36005 Bronx County, New York 7.6 0.2
## 4 36035 Fulton County, New York 7.9 1.1
## 5 36049 Lewis County, New York 7.9 1.1
## 6 36017 Chenango County, New York 8.2 1
## 7 36057 Montgomery County, New York 8.4 0.9
## 8 36115 Washington County, New York 8.6 0.9
## 9 36011 Cayuga County, New York 8.9 0.8
## 10 36037 Genesee County, New York 8.9 0.8
## # ℹ 52 more rows
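As an alternative to reading the first row of each sorted table, the highest and lowest counties can be pulled out directly. The following is a minimal sketch using `slice_max()` and `slice_min()` from dplyr. The object `ny_sample` is a hypothetical four-row stand-in for `ny_education3`, built from values shown in the outputs above, so that the example runs without a Census API key.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for ny_education3: four rows copied from the outputs above
ny_sample <- tibble(
  NAME = c("Tompkins County, New York", "New York County, New York",
           "Orleans County, New York", "Wyoming County, New York"),
  percent_graduateE = c(31, 30.6, 5.9, 7)
)

# Row with the largest estimate
highest <- slice_max(ny_sample, percent_graduateE, n = 1)

# Row with the smallest estimate
lowest <- slice_min(ny_sample, percent_graduateE, n = 1)

highest$NAME
lowest$NAME
```

On the full data set, the same two calls applied to `ny_education3` should return the Tompkins County and Orleans County rows reported above.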
To help visualize the percentage of each county's population holding a graduate degree, the data were plotted using ggplot2.
Details were added to the plot to improve readability and understanding. These included a title, a subtitle, a caption, and an x-axis label. The appearance was also changed: the wording “County, New York” was removed from each county label, the default grey background was removed by using a minimal theme, the point color was changed to steel blue, and additional axis breaks were added.
ggplot(ny_education3, aes(x = percent_graduateE,
                          y = reorder(NAME, percent_graduateE))) + # sort by value
  geom_errorbar(aes(xmin = percent_graduateE - percent_graduateM, #add margin error information
                    xmax = percent_graduateE + percent_graduateM),
                width = 0.5, linewidth = 0.5) + # depict margin error info with lines
  geom_point(color = "steelblue", size = 2) + #change how it looks
  scale_x_continuous(n.breaks = 12) + #create more breaks in the data for better understanding
  scale_y_discrete(labels = function(x) str_remove(x, " County, New York|, New York")) +
  labs(title = "Percentage of Graduates, 2017-2021 ACS",
       subtitle = "Counties in New York",
       caption = "Data acquired with R and tidycensus",
       x = "ACS Percentages",
       y = "") +
  theme_minimal(base_size = 12)
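The label function passed to `scale_y_discrete()` above shortens each county name with a regular expression. A small sketch of what that pattern does on its own is shown here; the vector `county_names` is a hypothetical sample taken from names in the earlier output.

```r
library(stringr)

# Sample of county names as they appear in the NAME column
county_names <- c("Tompkins County, New York",
                  "New York County, New York",
                  "Orleans County, New York")

# Remove the first match of either " County, New York" or ", New York"
short_names <- str_remove(county_names, " County, New York|, New York")

short_names
# → "Tompkins" "New York" "Orleans"
```

The second alternative in the pattern (`, New York`) acts as a fallback for any name that lacks the word "County"; every name in the sample above is handled by the first alternative.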
Additionally, the margin of error was included for each county; the lower and upper bounds of each estimate are depicted with lines extending from each point. Interactive aspects were added to the plot so that users can hover over a point and focus on specific details.
Finally, the figure was saved in order to be incorporated in this HTML publication.
ny_plot_errorbar <- ggplot(ny_education3, aes(x = percent_graduateE,
                                              y = reorder(NAME, percent_graduateE))) + # sort by value
  geom_errorbar(aes(xmin = percent_graduateE - percent_graduateM, #add margin error information
                    xmax = percent_graduateE + percent_graduateM),
                width = 0.5, linewidth = 0.5) + # depict margin error info with lines
  geom_point(color = "steelblue", size = 2) + #change how it looks
  scale_x_continuous(n.breaks = 12) + #create more breaks in the data for better understanding
  scale_y_discrete(labels = function(x) str_remove(x, " County, New York|, New York")) +
  labs(title = "Percentage of Graduates, 2017-2021 ACS",
       subtitle = "Counties in New York",
       caption = "Data acquired with R and tidycensus",
       x = "ACS Percentages",
       y = "") +
  theme_minimal(base_size = 12)
ggplotly(ny_plot_errorbar, tooltip = "x") #convert the plot to an interactive widget that can be displayed in HTML
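The `ggplotly()` call above returns an interactive widget; writing that widget to a standalone file is a separate step that the report does not show. The following is a minimal sketch, assuming `htmlwidgets::saveWidget()` is the intended way to save it. The toy data frame, the object names, and the output file name are all hypothetical, and a temporary directory is used so the example does not touch the project folder.

```r
library(ggplot2)
library(plotly)
library(htmlwidgets)

# Toy data standing in for ny_education3 (two values taken from the report)
toy <- data.frame(county = c("Tompkins", "Orleans"), pct = c(31, 5.9))

# Build a static plot, then convert it to an interactive plotly widget
toy_plot <- ggplot(toy, aes(x = pct, y = county)) +
  geom_point(color = "steelblue", size = 2)
toy_widget <- ggplotly(toy_plot, tooltip = "x")

# Hypothetical output path; selfcontained = FALSE avoids the pandoc requirement
# and writes supporting files to a folder next to the HTML file
out_file <- file.path(tempdir(), "ny_plot_errorbar.html")
saveWidget(toy_widget, file = out_file, selfcontained = FALSE)

file.exists(out_file)
```

In the report itself, the same `saveWidget()` call could be applied to the widget returned by `ggplotly(ny_plot_errorbar, tooltip = "x")`, with `selfcontained = TRUE` if a single portable file is preferred and pandoc is available.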
While the graph presents enough information to identify which counties had the highest and lowest percentages of residents holding graduate degrees, the figure is congested. There appear to be too many counties, and the figure is a little overwhelming. The interactive aspect of the graph helps identify information; however, the figure still seems too busy.