Examining Graduate Degrees in Illinois

Introduction

For this portion of the lab, I elected to examine the percentage of the population in each Illinois county that holds a graduate degree. The data come from the 2017-2021 American Community Survey (ACS) 5-year estimates, retrieved at the county level with tidycensus.

Data Preparation

The first step, as with any R project, is to load the relevant packages.

# Loading Packages ----
library(tidycensus)
library(tidyverse)
library(scales)
library(plotly)
library(ggiraph)

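Next, we gather the pertinent data using get_acs() from the tidycensus package. We are examining the ACS Data Profile variable DP02_0066P, which is available at the county level and gives the percentage of the population aged 25 and over holding a graduate or professional degree. Setting year = 2021 with the default 5-year survey returns the 2017-2021 estimates.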

# Getting the 5 yr ACS data for counties with grad degrees in IL
IL_grad_degree <- get_acs(
  geography = "county",
  state = "IL",
  variables = "DP02_0066P",
  year = 2021
)
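get_acs() returns each estimate alongside a moe column, and by default (moe_level = 90) that margin corresponds to a 90% confidence interval. The following is a minimal sketch, assuming a data frame with the estimate and moe columns that get_acs() produces, of how the margin converts to a standard error and to bounds at another confidence level. add_ci_columns() is a hypothetical helper, not part of tidycensus.

```r
# Hypothetical helper: convert ACS margins of error (published at the 90%
# confidence level) into standard errors and confidence bounds at `level`.
# Assumes `estimate` and `moe` columns, as returned by get_acs().
add_ci_columns <- function(acs_df, level = 0.90) {
  z_90  <- 1.645                        # factor the Census Bureau uses for 90% MOEs
  z_new <- qnorm(1 - (1 - level) / 2)   # about 1.96 when level = 0.95
  acs_df$se    <- acs_df$moe / z_90
  acs_df$lower <- acs_df$estimate - z_new * acs_df$se
  acs_df$upper <- acs_df$estimate + z_new * acs_df$se
  acs_df
}
```

For example, add_ci_columns(IL_grad_degree, level = 0.95) would add 95% bounds; alternatively, passing moe_level = 95 to get_acs() requests 95% margins directly.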

Plotting the Data

With the data gathered, the first plot is created with ggplot. Counties are reordered by their estimates so the county with the highest percentage of graduate degree holders appears at the top of the y-axis and the lowest at the bottom. Error bars are drawn from the moe (margin of error) column, spanning estimate - moe to estimate + moe.

# Creating plot of IL counties with grad degrees
IL_graded_error <- ggplot(IL_grad_degree, aes(x = estimate, 
                                        y = reorder(NAME, estimate))) + 
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe), # Creating Error Bars
                width = 0.5, linewidth = 0.5, color = "red") +
  geom_point(color = "purple", size = 2) + 
  scale_x_continuous(labels = percent_format(scale = 1)) + 
  scale_y_discrete(labels = function(x) str_remove(x, " County, Illinois")) + # Strip " County, Illinois" from labels
  labs(title = "Percent of Population w/ Graduate Degree, 2017-2021 ACS",
       subtitle = "Counties in Illinois",
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
       x = "ACS estimate",
       y = "County") + 
  theme_bw(base_size = 10) +
  theme(axis.text.y = element_text(size = 8, face = "bold")) # Smaller, bold y-axis labels
IL_graded_error
Figure 1. Percentage of the population by county with graduate education.
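The ordering and labeling in Figure 1 come from two small steps: reorder() inside aes() and the label function in scale_y_discrete(). This sketch uses hypothetical county names and values (not ACS figures) to show what each step does in base R.

```r
# Hypothetical names and values, used only to illustrate the behavior;
# these are not ACS estimates.
county_names <- c("Alpha County, Illinois", "Beta County, Illinois",
                  "Gamma County, Illinois")
estimates <- c(12, 30, 5)

# reorder() turns the names into a factor whose levels are sorted by estimate
# (ascending). ggplot2 draws the first level of a discrete y-axis at the
# bottom, so the largest estimate ends up at the top of the plot.
ordered_names <- reorder(county_names, estimates)
levels(ordered_names)
# → "Gamma County, Illinois" "Alpha County, Illinois" "Beta County, Illinois"

# The label function only changes what is displayed; sub() with fixed = TRUE
# matches stringr::str_remove() for this literal suffix.
display_labels <- sub(" County, Illinois", "", levels(ordered_names), fixed = TRUE)
display_labels
# → "Gamma" "Alpha" "Beta"
```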

Interactive Plots

Next, we create two interactive plots. The first uses plotly. The process is straightforward: the original plot, stored in IL_graded_error, is passed to the ggplotly() function. The tooltip is set to show both the x- and y-axis variables, although the tooltip labels and values are not cleaned up.

# Creating interactive plots
ggplotly(IL_graded_error, tooltip = c("x", "y"))

Figure 2. Percentage of the population by county with graduate education. This plot is interactive and shows the exact value for each point.
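One way to tidy the plotly tooltip is to build the hover text yourself, map it to ggplot's text aesthetic, and pass tooltip = "text" to ggplotly(). This is a minimal sketch, assuming a data frame with get_acs()-style NAME, estimate, and moe columns; make_tooltip_text() and plot_with_clean_tooltip() are hypothetical helper names.

```r
library(ggplot2)
library(plotly)

# Hypothetical helper: format hover text as "<county>: <estimate>% (+/- <moe>)",
# dropping the " County, Illinois" suffix from get_acs()-style names.
make_tooltip_text <- function(name, estimate, moe) {
  county <- sub(" County, Illinois", "", name, fixed = TRUE)
  sprintf("%s: %.1f%% (+/- %.1f)", county, estimate, moe)
}

plot_with_clean_tooltip <- function(acs_df) {
  acs_df$tooltip <- make_tooltip_text(acs_df$NAME, acs_df$estimate, acs_df$moe)
  # Mapping `text` in the global aes() makes it available to ggplotly()
  p <- ggplot(acs_df, aes(x = estimate, y = reorder(NAME, estimate),
                          text = tooltip)) +
    geom_point(color = "purple", size = 2)
  # Show only the custom text instead of the raw x and y mappings
  ggplotly(p, tooltip = "text")
}
```

With the data from get_acs() loaded, plot_with_clean_tooltip(IL_grad_degree) would produce a plotly widget whose hover text reads as a short county label with its estimate and margin of error.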

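The second interactive plot uses ggiraph. Again, we pass the original plot, IL_graded_error, to girafe(), and girafe_options() with opts_hover() requests a red fill on hover. Note that ggiraph only attaches tooltips and hover effects to its interactive geoms (for example, geom_point_interactive() with tooltip and data_id aesthetics), so a plot built with the standard geom_point() needs those layers swapped in before the hover styling takes effect.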

girafe(ggobj = IL_graded_error) %>%
  girafe_options(opts_hover(css = "fill:red;"))

Figure 3. Percentage of the population by county with graduate education. This plot is interactive and shows the exact value for each point.
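A minimal sketch of rebuilding the county plot with ggiraph's interactive point geom, so that each point carries a tooltip and a data_id for opts_hover() to target. It assumes get_acs()-style NAME, estimate, and moe columns; make_girafe_plot() is a hypothetical helper, and only the point and error-bar layers of IL_graded_error are reproduced.

```r
library(ggplot2)
library(ggiraph)

# Hypothetical helper: county plot with interactive points.
# Assumes get_acs()-style columns NAME, estimate, and moe.
make_girafe_plot <- function(acs_df) {
  acs_df$county <- sub(" County, Illinois", "", acs_df$NAME, fixed = TRUE)
  p <- ggplot(acs_df, aes(x = estimate, y = reorder(county, estimate))) +
    geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),
                  width = 0.5, linewidth = 0.5, color = "red") +
    # tooltip supplies the hover text; data_id lets opts_hover() find the point
    geom_point_interactive(
      aes(tooltip = sprintf("%s: %.1f%%", county, estimate), data_id = county),
      color = "purple", size = 2
    ) +
    labs(x = "ACS estimate", y = "County")
  # Style the hovered point; stroke is set too so the outline matches the fill
  girafe_options(girafe(ggobj = p), opts_hover(css = "fill:red;stroke:red;"))
}
```

Calling make_girafe_plot(IL_grad_degree) would return a girafe widget in which hovering a point shows its county and estimate.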

Analysis

All three figures, which are built on the same data, show that Champaign County in central Illinois has the highest percentage of graduate degree holders by a wide margin (about 4.5 percentage points above DuPage County). Because Champaign County is home to the University of Illinois, this high share is unsurprising. DuPage, Lake, and Cook counties, ranked 2, 3, and 5 respectively, cover Chicago and its surrounding suburbs, which also house universities such as Northwestern. Jackson County, number 4 on the list, is home to Southern Illinois University. Of note, the error bars for Cook County are extremely tight, reflecting the large survey sample that comes with its large population.

The county with the lowest percentage of graduate degree holders, at less than 2.5%, is Brown County, located in west-central Illinois.
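Whether a gap such as the one between Champaign and DuPage counties exceeds sampling error can be checked from the margins of error. This is a minimal sketch using the Census Bureau's approximation for the MOE of a difference between two estimates, sqrt(moe1^2 + moe2^2), which ignores any covariance between them; compare_estimates() is a hypothetical helper.

```r
# Hypothetical helper: compare two ACS estimates using the approximate MOE
# of their difference, sqrt(moe1^2 + moe2^2). Covariance is ignored.
compare_estimates <- function(est1, moe1, est2, moe2) {
  difference <- est1 - est2
  moe_diff   <- sqrt(moe1^2 + moe2^2)
  # ACS MOEs are 90% margins, so |difference| > moe_diff is equivalent to
  # |difference / SE| > 1.645, i.e. significant at the 90% level
  list(difference = difference,
       moe = moe_diff,
       significant_90 = abs(difference) > moe_diff)
}
```

The Champaign and DuPage rows of IL_grad_degree (NAME values "Champaign County, Illinois" and "DuPage County, Illinois") would supply the four arguments. tidycensus::moe_sum() computes the same root-sum-of-squares margin if you prefer to stay within that package.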