Overview

For this lab, we’ll work with the tidycensus package to work with data from the U.S. Census Bureau. To do this, we’ll investigate two phenomenon for the state of Michigan:

Part A: the percentage of the population (at a county level) with a graduate-level degree
Part B: TBD

Part A will investigate the data in a aspatial way, while Part B will focus on a spatial analysis.

library(tidycensus)
library(kableExtra)
library(dplyr)
library(stringr)
library(ggplot2)
library(forcats)
library(plotly)
library(sf)
library(mapview)
library(leafpop)

Part A: Graduate Degree Rates

For Part A, we’ll investigate the graduate degree rates for the state of Michigan. While this information is at different geographic resolutions, we’ll focus on county-level data.

Data Construction

In order to get the data we need, we’ll use the tidycensus::get_acs() function.

mi_county_graduate_degrees <- get_acs(
  geography = "county",
  variables = "DP02_0066P",
  state = "Michigan"
)

This results in a dataset with 83 rows, which corresponds to the number of counties in our dataset. Let’s take a peek at the dataframe as it is from the tidycensus package.

mi_county_graduate_degrees %>%
  head(5) %>%
  kbl() %>%
  kable_material(c("striped", "hover"))

GEOID	NAME	variable	estimate	moe
26001	Alcona County, Michigan	DP02_0066P	8.1	1.2
26003	Alger County, Michigan	DP02_0066P	6.4	1.1
26005	Allegan County, Michigan	DP02_0066P	8.4	0.7
26007	Alpena County, Michigan	DP02_0066P	6.5	1.1
26009	Antrim County, Michigan	DP02_0066P	13.2	1.2

The data is already tidy enough for visualization, which is great! Since we’re only focused on 1 state, we can trim the county names to only be the actual county name & remove the “, Michigan” part. To do this, we’ll make use of the stringr package.

mi_county_graduate_degrees <- mi_county_graduate_degrees %>%
  mutate(NAME = str_remove(NAME, ", Michigan"))

mi_county_graduate_degrees %>%
  head(5) %>%
  kbl() %>%
  kable_material(c("striped", "hover"))

GEOID	NAME	variable	estimate	moe
26001	Alcona County	DP02_0066P	8.1	1.2
26003	Alger County	DP02_0066P	6.4	1.1
26005	Allegan County	DP02_0066P	8.4	0.7
26007	Alpena County	DP02_0066P	6.5	1.1
26009	Antrim County	DP02_0066P	13.2	1.2

Results

First, let’s look at our extreme values: which county has the highest and lowest percentage of graduate degree holders?

max_pct <- max(mi_county_graduate_degrees$estimate)
min_pct <- min(mi_county_graduate_degrees$estimate)
max_county <- mi_county_graduate_degrees$NAME[mi_county_graduate_degrees$estimate == max_pct]
min_county <- mi_county_graduate_degrees$NAME[mi_county_graduate_degrees$estimate == min_pct]

data.frame(
  Category = c("Highest", "Lowest"),
  County = c(max_county, min_county),
  Percent = c(scales::percent(max_pct/100), scales::percent(min_pct/100))
) %>%
  kbl() %>%
  kable_material(c("striped", "hover"))

Category	County	Percent
Highest	Washtenaw County	31%
Lowest	Lake County	3%

And we can create an error bar chart to visualize all of the counties & their margins-of-error (MoE).

mi_county_graduate_degrees %>%
  mutate(NAME = fct_reorder(NAME, estimate)) %>%
  ggplot(aes(x=NAME, y=estimate)) +
  geom_pointrange(aes(ymin=estimate-moe, ymax=estimate+moe)) +
  scale_y_continuous(labels = scales::percent_format(scale=1)) +
  coord_flip() +
  theme_minimal() +
  labs(
    title = "Graduate degree percentages in Michigan",
    y = "Percent of Residents with Gradute Degree",
    x = "County"
  )

We can clearly see in the plot that Washtenaw County stands out as an outlier for the rate of graduate degree holders. This makes sense though - Washtenaw County contains the city of Ann Arbor, home of the University of Michigan; plenty of Wasthenaw County residents will be faculty with higher education degrees (including some Ph.D. students!).

We can use the plotly package to make this type of chart interactive.

mi_county_graduate_degrees %>%
  mutate(NAME = fct_reorder(NAME, estimate)) %>%
  plot_ly(
  type = "scatter",
  y = ~NAME,
  x = ~estimate/100,
  mode = "markers",
  error_x = ~list(array = moe/100,
                        color = '#000000'),
  height = 800
  ) %>%
  layout(
    title = "Graduate degree percentages in Michigan",
    xaxis = list(
      title = "Percent of Residents with Gradute Degree",
      tickformat = ".0%"
    ),
    yaxis = list(title = "County")
  )

Part B: Median Age

For a spatial analysis of U.S. Census Data, we’ll pivot to a different variable: median age.

Data Construction

Assuming that we don’t know which variable contains this information in the ACS data, we’ll use the tidycensus::load_variables function to find it. The table below shows a preview of what’s returned by the function for the 2023 American Community Survey 5-Year Estimates.

load_variables(2023, "acs5") %>%
  head(5) %>%
  kbl() %>%
  kable_material(c("striped", "hover"))

name	label	concept	geography
B01001A_001	Estimate!!Total:	Sex by Age (White Alone)	tract
B01001A_002	Estimate!!Total:!!Male:	Sex by Age (White Alone)	tract
B01001A_003	Estimate!!Total:!!Male:!!Under 5 years	Sex by Age (White Alone)	tract
B01001A_004	Estimate!!Total:!!Male:!!5 to 9 years	Sex by Age (White Alone)	tract
B01001A_005	Estimate!!Total:!!Male:!!10 to 14 years	Sex by Age (White Alone)	tract

Thumbing through table, we find that the overall median age variable is "B01002A_001". So then we go back to our get_acs() function to get this table; except this time, we’ll also use the function to fetch the spatial information for the county. The table below shows a preview of what the data looks like (very similar to the prior table, but with the additional spatial column).

mi_median_age <- get_acs(
  geography = "county",
  variables = "B01002A_001",
  state = "Michigan",
  geometry = TRUE
) %>%
  mutate(NAME = str_remove(NAME, ", Michigan"))

mi_median_age %>%
  head(5) %>%
  kbl() %>%
  kable_material(c("striped", "hover"))

GEOID	NAME	variable	estimate	moe	geometry
26041	Delta County	B01002A_001	49.7	0.5	MULTIPOLYGON (((-86.65922 4…
26055	Grand Traverse County	B01002A_001	44.5	0.3	MULTIPOLYGON (((-85.58321 4…
26007	Alpena County	B01002A_001	48.7	0.6	MULTIPOLYGON (((-83.21207 4…
26103	Marquette County	B01002A_001	40.0	0.4	MULTIPOLYGON (((-87.40383 4…
26147	St. Clair County	B01002A_001	45.6	0.2	MULTIPOLYGON (((-82.99626 4…

Another data construction step we need for creating a proportional symbol map: we need to calculate the centroid of each county - this is very easily done via the sf::st_centroid function.

mi_county_centroids <- st_centroid(mi_median_age, of_largest_polygon = TRUE)

Results

Now that we have the dataset we need, we can calculate both an static & interactive spatial view of the data. Let’s start with the static map - this will be a proportional symbol map.

ggplot() +
  geom_sf(data = mi_median_age, fill = "lightgrey") +
  geom_sf(data = mi_county_centroids, aes(size = estimate, color = estimate)) +
  scale_color_continuous(limits=c(30, 60), breaks=seq(30, 60, by=10)) +
  scale_size_continuous(limits=c(30, 60), breaks=seq(30, 60, by=10)) +
  guides(color= guide_legend(), size=guide_legend()) +
  theme_void() +
  labs(
    title = "Median Age for Counties in Michigan",
    color = "Median Age",
    size = "Median Age"
  )

From this static map, we can see a relatively clear pattern: the southern side of Michigan is seemingly younger than the northern part of the state. This also makes sense - northern Michigan is known to be the area retirees in Michigan live to enjoy a more natural, serene life.

To make the map more interactive, we can use the same dataset but instead use the mapview R package.

mapview(mi_median_age, alpha.regions = 0.2, legend = FALSE, popup = FALSE, label = FALSE) +
  mapview(
    mi_county_centroids,
    layer.name = "Median Age",
    zcol = "estimate",
    popup = popupTable(
      mi_county_centroids,
      zcol=c("NAME", "estimate", "moe")
    )
  )

GEOG 588 Lab 5

Alan Budd

2025-02-17

Overview

Part A: Graduate Degree Rates

Data Construction

Results

Part B: Median Age

Data Construction

Results