For this portion of the lab, I elected to examine the percentage of the
population in Illinois with graduate degrees. The data were gathered at
the county level using tidycensus, from the 2017-2021 5-year ACS.
The first step, as with any R project, is to load the relevant packages.
# Loading Packages ----
library(tidycensus)
library(tidyverse)
library(scales)
library(plotly)
library(ggiraph)
Next, we gather the pertinent data using get_acs() from
the tidycensus package. We are examining the variable
DP02_0066P, which is available at the county level and gives the
percentage of the population with a graduate degree.
# Getting the 5 yr ACS data for counties with grad degrees in IL
IL_grad_degree <- get_acs(
  geography = "county",
  state = "IL",
  variables = "DP02_0066P",
  year = 2021
)
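To make the shape of the returned data concrete, here is a minimal sketch that does not call the Census API. It assumes the usual long format that get_acs() returns for a single variable (columns GEOID, NAME, variable, estimate, and moe). The county names, GEOIDs, and values are invented for illustration and are not real ACS figures.

```r
# Hypothetical rows in the shape get_acs() returns for one variable.
# Names, GEOIDs, and values are invented, not real ACS estimates.
toy_acs <- data.frame(
  GEOID = c("00001", "00002", "00003"),
  NAME = c("Alpha County, Illinois", "Beta County, Illinois", "Gamma County, Illinois"),
  variable = "DP02_0066P",
  estimate = c(8.1, 4.9, 7.3),
  moe = c(0.9, 1.8, 1.4),
  stringsAsFactors = FALSE
)

# Lower and upper bounds implied by the margin of error
toy_acs$lower <- toy_acs$estimate - toy_acs$moe
toy_acs$upper <- toy_acs$estimate + toy_acs$moe

# tidycensus reports MOEs at the 90% confidence level by default,
# so dividing by 1.645 gives an approximate standard error
toy_acs$se <- toy_acs$moe / 1.645

toy_acs[, c("NAME", "estimate", "lower", "upper")]
```

The lower and upper columns are the same quantities the error bars in the plot are built from.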
With the data gathered, the first plot is created using
ggplot. The counties are reordered so that the largest estimate of
graduate education appears at the top of the y-axis and the smallest
at the bottom. Additionally, error bars are drawn using the variable
moe, the margin of error that get_acs() returns alongside each estimate.
# Creating plot of IL counties with grad degrees
IL_graded_error <- ggplot(IL_grad_degree, aes(x = estimate,
                                              y = reorder(NAME, estimate))) +
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), # Creating error bars
                width = 0.5, linewidth = 0.5, color = "red") +
  geom_point(color = "purple", size = 2) +
  scale_x_continuous(labels = percent_format(scale = 1)) +
  scale_y_discrete(labels = function(x) str_remove(x, " County, Illinois")) + # Removing " County, Illinois"
  labs(title = "Percent of Population w/ Graduate Degree, 2017-2021 ACS",
       subtitle = "Counties in Illinois",
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
       x = "ACS estimate",
       y = "County") +
  theme_bw(base_size = 10) +
  theme(axis.text.y = element_text(size = 8, face = "bold")) # Changing y-axis text size
IL_graded_error
Figure 1. Percentage of the population by county with graduate education.
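The ordering in Figure 1 comes from reorder(), which can be seen in isolation with a small sketch. The data below are hypothetical, and base R's sub() stands in for str_remove() so the example needs no packages.

```r
# Hypothetical values, for illustration only (not real ACS estimates)
toy <- data.frame(
  NAME = c("Alpha County, Illinois", "Beta County, Illinois", "Gamma County, Illinois"),
  estimate = c(5.2, 12.8, 8.4),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument,
# from smallest to largest
ordered_names <- reorder(toy$NAME, toy$estimate)
levels(ordered_names)

# Equivalent of the label function passed to scale_y_discrete():
# strip the shared suffix from each level
short_names <- sub(" County, Illinois", "", levels(ordered_names), fixed = TRUE)
short_names
```

Because ggplot2 draws the first factor level at the bottom of a discrete y-axis, sorting the levels from smallest to largest estimate places the county with the largest estimate at the top.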
Next, we create two interactive plots. The first uses
plotly. The process is fairly straightforward: the
original plot, which is stored in the variable IL_graded_error, is
passed into the ggplotly function. The tooltip is set to
show both the x- and y-axis variables, although their labels are not
cleaned up.
# Creating interactive plots
ggplotly(IL_graded_error, tooltip = c("x", "y"))
Figure 2. Percentage of the population by county with graduate education. This plot is interactive and shows the exact value for each point.
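One way the tooltip could be cleaned up is to build the label text ourselves and map it to a text aesthetic, which ggplotly() can display through tooltip = "text". The sketch below uses hypothetical data, and the label column is a helper introduced only for this example. ggplot2 may warn that text is an unknown aesthetic, since only plotly uses it.

```r
library(ggplot2)
library(plotly)

# Hypothetical values, for illustration only (not real ACS estimates)
toy <- data.frame(
  NAME = c("Alpha County, Illinois", "Beta County, Illinois", "Gamma County, Illinois"),
  estimate = c(5.2, 12.8, 8.4),
  moe = c(1.1, 0.6, 1.5),
  stringsAsFactors = FALSE
)

# Readable tooltip text per county; `label` is a hypothetical helper column
toy$label <- paste0(
  sub(" County, Illinois", "", toy$NAME, fixed = TRUE),
  ": ", toy$estimate, "% (+/- ", toy$moe, ")"
)

# Map the label to the `text` aesthetic, which ggplot2 ignores but plotly reads
toy_plot <- ggplot(toy, aes(x = estimate, y = reorder(NAME, estimate))) +
  geom_point(aes(text = label), color = "purple", size = 2)

# Show only the custom text in the tooltip; print toy_widget to view it
toy_widget <- ggplotly(toy_plot, tooltip = "text")
```

With this approach the tooltip shows a single line such as "Alpha: 5.2% (+/- 1.1)" rather than the raw aesthetic names.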
The second interactive plot uses ggiraph. Again, we pass
the original plot variable, IL_graded_error, through the
girafe function, setting the highlight color for each point
to red.
girafe(ggobj = IL_graded_error) %>%
girafe_options(opts_hover(css = "fill:red;"))
Figure 3. Percentage of the population by county with graduate education. This plot is interactive and shows the exact value for each point.
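One caveat worth checking: as far as I understand ggiraph, tooltips and hover styling attach only to layers drawn with its *_interactive geoms, so a plot built with plain geom_point() may render without any hover behavior. The sketch below shows the interactive variant on hypothetical data. The tooltip text and the choice of NAME as data_id are assumptions made for this example.

```r
library(ggplot2)
library(ggiraph)

# Hypothetical values, for illustration only (not real ACS estimates)
toy <- data.frame(
  NAME = c("Alpha County, Illinois", "Beta County, Illinois", "Gamma County, Illinois"),
  estimate = c(5.2, 12.8, 8.4),
  stringsAsFactors = FALSE
)

# geom_point_interactive() accepts tooltip and data_id aesthetics;
# data_id identifies which element the hover CSS is applied to
toy_interactive <- ggplot(toy, aes(x = estimate, y = reorder(NAME, estimate))) +
  geom_point_interactive(
    aes(tooltip = paste0(estimate, "%"), data_id = NAME),
    color = "purple", size = 2
  )

# Render the widget and turn hovered points red; print toy_girafe to view it
toy_girafe <- girafe(ggobj = toy_interactive)
toy_girafe <- girafe_options(toy_girafe, opts_hover(css = "fill:red;"))
```

The same pattern should carry over to the county plot by swapping geom_point() for geom_point_interactive() with tooltip and data_id mapped to the relevant columns.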
Examining all three figures, which are built on the same data, shows that Champaign County in central Illinois has the highest percentage of population with graduate degrees by a wide margin (about 4.5 percentage points higher than DuPage County). Considering Champaign County is home to the University of Illinois, it is no surprise that graduate education is common there. DuPage, Lake, and Cook counties, ranked 2, 3, and 5 respectively, encompass Chicago and its surrounding suburbs, which also house universities, such as Northwestern. Jackson County, number 4 on the list, is home to Southern Illinois University. Of note, the error bars for Cook County are extremely tight, which is due to the county's large population and the correspondingly large ACS sample.
The county with the least graduate education, at less than 2.5%, is Brown County, located in the western portion of central Illinois.
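The rankings and the gap between the top two counties can also be checked numerically rather than read off the plot. The sketch below uses hypothetical data. It compares the gap against the margin of error of a difference, computed as the square root of the sum of the squared MOEs, which is the standard ACS approximation.

```r
# Hypothetical values, for illustration only (not real ACS estimates)
toy <- data.frame(
  NAME = c("Alpha County, Illinois", "Beta County, Illinois", "Gamma County, Illinois"),
  estimate = c(5.2, 12.8, 8.4),
  moe = c(1.1, 0.6, 1.5),
  stringsAsFactors = FALSE
)

# Rank counties from the highest to the lowest estimate
ranked <- toy[order(-toy$estimate), ]
ranked$rank <- seq_len(nrow(ranked))

# Gap between the top two counties, in percentage points
gap <- ranked$estimate[1] - ranked$estimate[2]

# Approximate margin of error for the difference of two estimates
gap_moe <- sqrt(ranked$moe[1]^2 + ranked$moe[2]^2)

# TRUE when the gap exceeds its margin of error
gap_exceeds_moe <- gap > gap_moe
gap_exceeds_moe
```

Applied to IL_grad_degree, the same steps would show whether the gap between Champaign and DuPage counties is larger than the uncertainty in the two estimates.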