Dot Density Map of NY PhDs

Kinga Stryszowska-Hill

November 2022

Background

I hold a PhD in Environmental Science, and I am always interested in a population's education levels, particularly questions about PhD holders. This project was inspired by the University of Virginia’s Racial Dot Map (no longer available because its data are outdated). I was curious about the spatial distribution of PhD holders in New York State, my home state and the state that granted my PhD.

I followed this and this tutorial.

Load Libraries

library(tidyverse) 
library(sf)
library(tidycensus)
library(ggshadow)

Get Census Data

The US Census Bureau's American Community Survey (ACS) offers a wide variety of demographic information, including educational attainment. The census variables are complex to navigate, and I like to challenge myself with sourcing this data. The tidycensus R package simplifies the process somewhat: it provides not only the numerical data but also the spatial data (from block-group level up to national level).
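One way to navigate the variable codes is to search a variable table by keyword. The sketch below uses a hypothetical helper, `find_vars()`, on a small hand-made table that mimics the `name`, `label`, and `concept` columns returned by `tidycensus::load_variables()`. The three rows are copied from table B15003 for illustration only; they are not the full table.

```r
# Hypothetical helper: keep rows whose label matches a keyword (case-insensitive)
find_vars <- function(vars, pattern) {
  vars[grepl(pattern, vars$label, ignore.case = TRUE), , drop = FALSE]
}

# Toy stand-in for the table returned by tidycensus::load_variables()
toy_vars <- data.frame(
  name    = c("B15003_001", "B15003_022", "B15003_025"),
  label   = c("Estimate!!Total:",
              "Estimate!!Total:!!Bachelor's degree",
              "Estimate!!Total:!!Doctorate degree"),
  concept = "EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER",
  stringsAsFactors = FALSE
)

# Search the labels for the doctorate variable
hits <- find_vars(toy_vars, "doctorate")
print(hits$name)

# With network access, the same search could run on the real table:
# acs_vars <- tidycensus::load_variables(2021, "acs5")
# find_vars(acs_vars, "doctorate")
```

Searching the label column is usually enough to find a single variable; the concept column helps when the same label appears in several tables.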

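# Get the most recent ACS 5-year estimates (2017 - 2021) for educational attainment
# (the ACS is collected continuously; 5-year estimates pool five years of responses)
# Note: the default year depends on the installed tidycensus version; pass `year =` to pin a period
ny_ed <- tidycensus::get_acs(
  # at census tract level
  geography = "tract",
  # people with a doctorate degree
  variables = "B15003_025",
  # total population aged 25 and over (the universe of table B15003)
  summary_var = "B15003_001",
  state = "NY",
  geometry = TRUE
)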

# get the shapes of NY counties because I want to map those separately
counties <- tidycensus::get_acs(
  state = "NY",
  geography = "county",
  # Total Population
  variables = "B01001_001",
  geometry = TRUE
)

# For some reason, there are 17 empty polygons in the dataset. We will drop those
ny_ed_clean <- ny_ed %>% 
  filter(!st_is_empty(.))
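To see what the filter above does, here is a minimal sketch on a toy `sf` object (made-up data, not census tracts) in which one of two features has an empty geometry. `st_is_empty()` flags that feature, and the subset keeps only the one that can be drawn or sampled.

```r
library(sf)

# Toy data: one empty polygon and one unit square
toy <- st_sf(
  id = c("empty", "square"),
  geometry = st_sfc(
    st_polygon(),
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
  )
)

# TRUE marks features with no coordinates
empty_flag <- st_is_empty(toy)

# Keep only the non-empty features
toy_clean <- toy[!empty_flag, ]
print(toy_clean$id)
```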

View Census Tracts

mapview::mapview(ny_ed_clean)

Process Data

The dot density process takes the population of a shape (in this case a census tract) and turns it into dots placed at random inside that shape. For large areas, it is good practice to let one dot stand for a group of individuals; here, each dot represents 10 PhD holders, rounded up within each tract. We don’t know where each individual lives, so the dots are distributed randomly across each census tract to approximate their locations. This step takes a while to run.
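The idea can be tried on a toy example before running it on every tract in the state. The sketch below assumes two made-up square "tracts" with invented counts; `make_square()` and `people_per_dot` are hypothetical names introduced only for this illustration.

```r
library(sf)

# Hypothetical helper: square polygon with lower-left corner (x0, y0) and side s
make_square <- function(x0, y0, s) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s), c(x0, y0 + s), c(x0, y0)
  )))
}

# Toy "tracts" with made-up counts of PhD holders
toy_tracts <- st_sf(
  estimate = c(25, 4),
  geometry = st_sfc(make_square(0, 0, 1), make_square(2, 0, 1))
)

# One dot per 10 people, rounded up so small tracts still get a dot
people_per_dot <- 10
n_dots <- ceiling(toy_tracts$estimate / people_per_dot)

# Passing a vector of sizes samples each polygon separately
set.seed(42)
dots <- st_sample(toy_tracts, size = n_dots, type = "random")

# Coordinates as a plain matrix, one row per dot
xy <- st_coordinates(dots)
print(n_dots)
print(nrow(xy))
```

Rounding up with `ceiling()` means a tract with between 1 and 9 PhD holders still gets one dot, which slightly overstates sparse areas; `round()` or `floor()` would be alternatives with the opposite trade-off.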

# Divide the population in the census tract by 10. Each dot will represent 10 people
num_dots <- ceiling(select(as.data.frame(ny_ed_clean), estimate) / 10)

# Turn the number of people in each census tract into its own set of randomly located points. Each dot will represent 10 people
sf_dots <- purrr::map_df(names(num_dots), 
                  # generate the points in each polygon
                  ~ st_sample(ny_ed_clean, size = num_dots[,.x], type = "random") %>% 
                    # cast the geom set as 'POINT' data
                    st_cast("POINT") %>%                                          
                    # pull out coordinates into a matrix
                    st_coordinates() %>%                                          
                    # convert to tibble
                    as_tibble() %>%                                               
                    # name the coordinate columns
                    setNames(c("lon","lat"))) 

# Plot the dots
ggplot() +
  # map county boundaries
  geom_sf(data = counties, fill = "transparent", colour = "white", lwd = 3) +
  # add dots and make them glow pink with the ggshadow package
  ggshadow::geom_glowpoint(data = sf_dots, aes(lon, lat), color="#FB2576") + 
  # set coordinate system
  coord_sf(crs = 4326, datum = NA) +
  # Make the base size large
  theme_void(base_size = 48) +
  # dark background
  theme(
        plot.background = element_rect(fill = "#212121", color = NA), 
        panel.background = element_rect(fill = "#212121", color = NA),
        legend.background = element_rect(fill = "#212121", color = NA))

# Save the image as a very large PNG. The large canvas is needed so the dots render small relative to the map; on a smaller canvas they appear far too large
ggsave("Plots/phd_points_clean.png", dpi = 320, width = 85, height = 70, units = "cm")
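To get a feel for how large the saved file is, the canvas size can be converted to pixels. This is a quick arithmetic sketch using the width, height, and dpi from the `ggsave()` call; the variable names are introduced here for illustration, and the exact pixel counts written by the graphics device may differ by a pixel.

```r
# Canvas settings taken from the ggsave() call
width_cm  <- 85
height_cm <- 70
dpi       <- 320

# 1 inch = 2.54 cm, so pixels = cm / 2.54 * dpi
px_width  <- round(width_cm / 2.54 * dpi)
px_height <- round(height_cm / 2.54 * dpi)
print(c(px_width, px_height))
```

That works out to roughly 10,700 by 8,800 pixels, which is why each glowing dot stays small relative to the state outline.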

Final Map

PhDs in New York are clustered around cities