Background
I hold a PhD in Environmental Science, and I am always interested in the educational attainment of populations, particularly questions about PhD holders. For this project, I was inspired by the University of Virginia’s Racial Dot Map (no longer available because its data became outdated). I wanted to see the spatial distribution of PhD holders in New York State, which is my home state and the state where I earned my PhD.
I followed this and this tutorial.
Load Libraries
library(tidyverse)
library(sf)
library(tidycensus)
library(ggshadow)
Get Census Data
The US Census American Community Survey (ACS) covers a wide variety of demographic information, including educational attainment. The census variable codes are complex to navigate, and I like the challenge of sourcing this data myself. The tidycensus R package simplifies the process somewhat. It returns both the estimates and the matching geometries, at any level from the block group up to the nation.
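When `summary_var` is set, `get_acs()` returns the denominator next to each estimate in `summary_est` and `summary_moe` columns. The sketch below shows one way to turn those columns into the share of adults holding a doctorate, with an approximate margin of error from `tidycensus::moe_prop()`. The rows are made-up illustrative numbers, not real ACS values, and the GEOIDs are hypothetical.

```r
library(tibble)
library(dplyr)

# Illustrative rows shaped like get_acs() output with summary_var set.
# All numbers and GEOIDs are hypothetical, NOT real ACS values.
toy_acs <- tibble(
  GEOID       = c("36000000001", "36000000002"),
  variable    = "B15003_025",          # doctorate degree holders
  estimate    = c(40, 5),
  moe         = c(15, 6),
  summary_est = c(2000, 1500),         # B15003_001: population 25 and over
  summary_moe = c(120, 110)
)

toy_share <- toy_acs %>%
  mutate(
    # share of the population 25 and over holding a doctorate
    pct_phd = estimate / summary_est,
    # approximate margin of error for a derived proportion
    pct_phd_moe = tidycensus::moe_prop(estimate, summary_est, moe, summary_moe)
  )

toy_share
```

ACS tract-level estimates can carry large margins of error, so the MOE column helps show how uncertain small tract counts are.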
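# Get the most recent American Community Survey 5-year estimates (2017-2021) for educational attainment
ny_ed <- tidycensus::get_acs(
  # at census tract level
  geography = "tract",
  # B15003_025: population 25 years and over with a doctorate degree
  variables = "B15003_025",
  # B15003_001: total population 25 years and over (returned as summary_est / summary_moe)
  summary_var = "B15003_001",
  state = "NY",
  # pin the 2017-2021 5-year release instead of relying on the package default
  year = 2021,
  survey = "acs5",
  geometry = TRUE
)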
# Get New York county boundaries to draw as a separate layer.
# Only the geometry is used, so any variable works; B01001_001 is total population.
counties <- tidycensus::get_acs(
  state = "NY",
  geography = "county",
  variables = "B01001_001",
  year = 2021,
  geometry = TRUE
)
# 17 tracts in the dataset come back with empty geometries; drop them before sampling points
ny_ed_clean <- ny_ed %>%
  filter(!st_is_empty(.))
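The sketch below shows what `st_is_empty()` returns on a hypothetical two-feature layer where one geometry is empty. It uses base subsetting instead of `dplyr::filter()`, which has the same effect on an `sf` object.

```r
library(sf)

# Hypothetical layer: a unit square and an empty polygon (st_polygon() with no rings)
toy <- st_sf(
  id = c("square", "empty"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    st_polygon()
  )
)

# One logical per feature: FALSE for the square, TRUE for the empty polygon
st_is_empty(toy)

# Keep only features that have a geometry
toy_clean <- toy[!st_is_empty(toy), ]
toy_clean$id
```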
View Census Tracts
# Quick interactive check of the tracts (requires the mapview package)
mapview::mapview(ny_ed_clean)
Process Data
Dot density mapping takes the population of a shape (in this case, a census tract) and represents it as dots placed inside that shape. For large areas, it is good practice to let one dot stand for a group of people. Here, each dot represents 10 people. We don’t know where each individual lives, so the dots are placed randomly within each census tract. Their positions are therefore approximate at the sub-tract level. (This step takes a while.)
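The sketch below isolates the mechanism on two hypothetical 10 x 10 squares with made-up doctorate counts of 23 and 8. It relies on `st_sample()` accepting a `size` vector with one entry per feature, which places that many random points inside each polygon.

```r
library(sf)

set.seed(42)  # make the random dot placement reproducible

# Hypothetical planar square "tract" starting at x0 (no CRS)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 10, 0), c(x0 + 10, 10),
                        c(x0, 10), c(x0, 0))))
}

tracts <- st_sf(
  estimate = c(23, 8),                       # made-up counts
  geometry = st_sfc(square(0), square(20))
)

# One dot per 10 people, rounded up: 23 -> 3 dots, 8 -> 1 dot
dots_per_tract <- ceiling(tracts$estimate / 10)

# Sample dots_per_tract[i] random points inside tract i
dots <- st_sample(tracts, size = dots_per_tract, type = "random")

# Total dots, and how many fall inside each tract
length(dots)
lengths(st_intersects(tracts, dots))
```

Because `ceiling()` rounds up, the tract with 8 people still gets one dot. Tracts with small counts are therefore slightly overstated. `floor()` would drop such tracts instead.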
# Divide each tract's doctorate count by 10 so that each dot represents 10 people.
# ceiling() rounds up, so any tract with at least one doctorate holder gets a dot.
num_dots <- ceiling(ny_ed_clean$estimate / 10)
# Place num_dots[i] random points inside tract i (size takes one value per polygon)
sf_dots <- st_sample(ny_ed_clean, size = num_dots, type = "random") %>%
  # make sure every feature is a single POINT
  st_cast("POINT") %>%
  # pull the coordinates out into a matrix
  st_coordinates() %>%
  # convert to a tibble with lon/lat columns for ggplot
  as_tibble() %>%
  setNames(c("lon", "lat"))
# Plot the dots
phd_map <- ggplot() +
  # draw county boundaries in white
  geom_sf(data = counties, fill = "transparent", colour = "white", lwd = 3) +
  # add the dots and make them glow pink with the ggshadow package
  ggshadow::geom_glowpoint(data = sf_dots, aes(lon, lat), color = "#FB2576") +
  # plot in longitude/latitude (WGS 84) without graticule lines
  coord_sf(crs = 4326, datum = NA) +
  # large base text size to suit the very large output image
  theme_void(base_size = 48) +
  # dark background
  theme(
    plot.background = element_rect(fill = "#212121", color = NA),
    panel.background = element_rect(fill = "#212121", color = NA),
    legend.background = element_rect(fill = "#212121", color = NA)
  )
phd_map
# Save the map as a very large PNG. Point sizes are fixed physical sizes, so on a
# smaller canvas each dot would cover far more of the map and look huge.
dir.create("Plots", showWarnings = FALSE)
ggsave("Plots/phd_points_clean.png", plot = phd_map, dpi = 320, width = 85, height = 70, units = "cm")
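To get a sense of how large this image is, the arithmetic below converts the physical size to pixels as inches times dpi, with 2.54 cm per inch. ggplot2 sizes points in millimetres, so a bigger canvas makes each dot a smaller fraction of the map.

```r
# Output settings from the ggsave() call
width_cm  <- 85
height_cm <- 70
dpi       <- 320

# centimetres -> inches -> pixels
width_px  <- round(width_cm / 2.54 * dpi)
height_px <- round(height_cm / 2.54 * dpi)

c(width_px = width_px, height_px = height_px)  # 10709 x 8819 pixels
```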