Cartography

Not even a real lab today – just a brief tour of the cartographical capabilities built into R. By way of caveats:

  • R can do full blown geographic computation – proximty math, density counts, geographical statistics. But that’s not what we’re talking about today.

  • You can do proper cartography. But we’re just doing a thin simulacrum.

1. Simple Features

When I started working with geographical data in the late 00s, we were stuck with horrid *.shp files.

And these files were horrid:



The .shp files would be read with rgdal::readOGR(..., layer = "shapefile")\ or something comparably naff. Simple features, an open source geospatial data storage standard, has simplified things.

We’re going to do a simple exercise: visit the Ohio Secretary of State’s website (https://www.ohiosos.gov/elections/ohio-candidates/district-maps/) and download the most recent Congressional district shapefile.



Open the zipfile in an accessible directory, and load it

library(tidyverse)
library(magrittr)
library(sf)

s1 <- "March_2nd_CD_SHPs.shp" %>% 
  read_sf

We can plot s1

s1 %>%
  select(DISTRICT) %>% 
  plot

Whoooooaaa hang on we can do dplyr bullshit to an sf object?!?

Confirm this for yourself

s1 %>%
  class
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"

Ohhhhh this is awesome – all these independently developed R standards are interoperable! What a miracle Hadley has been for this computational 1

Let’s see if we can make the figure more interesting by coloring each district by their Trump vote share

t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/national_cong119_district_data.csv" %>%
  read_csv()

t1
## # A tibble: 435 × 37
##    CONG119     STATE DISTRICT TOTPOP20 `NH White` NH Black plus NH Black and W…¹
##    <chr>       <chr> <chr>       <dbl>      <dbl>                          <dbl>
##  1 AK-AT-LARGE AK    AT-LARGE   733391      0.575                         0.0355
##  2 AL-01       AL    01         717754      0.723                         0.170 
##  3 AL-02       AL    02         717754      0.416                         0.499 
##  4 AL-03       AL    03         717754      0.700                         0.212 
##  5 AL-04       AL    04         717754      0.802                         0.0759
##  6 AL-05       AL    05         717754      0.678                         0.184 
##  7 AL-06       AL    06         717755      0.712                         0.182 
##  8 AL-07       AL    07         717754      0.388                         0.533 
##  9 AR-01       AR    01         752509      0.733                         0.181 
## 10 AR-02       AR    02         752710      0.663                         0.213 
## # ℹ 425 more rows
## # ℹ abbreviated name: ¹​`NH Black plus NH Black and White`
## # ℹ 31 more variables: `NH Asian plus NH Asian and White` <dbl>,
## #   `NH American Indian plus NH American Indian and White` <dbl>,
## #   `NH Pacific Islander plus NH Pacific Islander and White` <dbl>,
## #   `NH Some Other Race plus NH Some Other Race and White` <dbl>,
## #   `NH Other multiple-race` <dbl>, Hispanic <dbl>, TOT_HOUS22_est <dbl>, …

Ok to join these attributes to our shapefile we’ll need a common DISTRICT variable, which we can simply make from t1

s1 <- s1 %>% 
  mutate(
    DISTRICT = DISTRICT %>% 
      as.numeric
  ) %>% 
  left_join(
    t1 %>% 
      filter(STATE == "OH") %>% 
      mutate(
        DISTRICT = DISTRICT %>% 
          as.numeric
      )
  )

s1 %>% 
  ggplot(
    aes(
      fill = G20PRERTRU_pct
    )
  ) +
  geom_sf() +
  scale_fill_gradient2(
    high = "#e91b23",
    low = "#0048c0",
    mid = "#753272",
    midpoint = .5
  )

I hope you’re experiencing something akin to Pip in Great Expectations at the end of the novel, where he realizes that all the time toiling in the forge with Joe Gargery at the start, which he found to be dreary and servile, this experience actually instilled in him character and loyalty.

Or maybe it’s like in Karate Kid where Daniel learns that he wasn’t just sanding Mr Miyagi’s deck. Whatever, I’m aware your generation is averse to books.

Now it’s all just little stuff, like graphical conventions, and geographical conventions. To embed the intuition, let’s make a simple figure for changes in counties’ presidential votes.

We’ll load a nice data set for counties’ presidential vote

# install if necessary
# devtools::install_github("UrbanInstitute/urbnmapr")

library(urbnmapr)

t2 <- "https://github.com/thomasjwood/ps7160/raw/refs/heads/master/countyvote/pres_vote_by_county_1868_2024.rds" %>% 
  url %>% 
  readRDS

s_states <- "states" %>% 
  get_urbn_map(sf = TRUE)

s_counties <- "counties" %>% 
  get_urbn_map(sf = TRUE)

Say we want to depict the change in presidential vote for every county from 1992-2024 – we might

t3 <- s_counties %>% 
  full_join(
    t2 %>% 
      select(
        fips, election_year, r_change
      ) %>% 
      filter(
        election_year %>% 
          is_in(
            seq(1992, 2024, 4)
          )
      ) %>% 
      rename(
        county_fips = fips
      ) 
  )

t3
## Simple feature collection with 28130 features and 8 fields (with 12 geometries empty)
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -2600000 ymin: -2363000 xmax: 2516374 ymax: 732352.2
## Projected CRS: NAD27 / US National Atlas Equal Area
## First 10 features:
##    county_fips state_abbv state_fips    county_name fips_class state_name
## 1        04015         AZ         04  Mohave County         H1    Arizona
## 2        04015         AZ         04  Mohave County         H1    Arizona
## 3        04015         AZ         04  Mohave County         H1    Arizona
## 4        04015         AZ         04  Mohave County         H1    Arizona
## 5        04015         AZ         04  Mohave County         H1    Arizona
## 6        04015         AZ         04  Mohave County         H1    Arizona
## 7        04015         AZ         04  Mohave County         H1    Arizona
## 8        04015         AZ         04  Mohave County         H1    Arizona
## 9        04015         AZ         04  Mohave County         H1    Arizona
## 10       12035         FL         12 Flagler County         H1    Florida
##    election_year    r_change                       geometry
## 1           1992 -12.5871235 MULTIPOLYGON (((-1321573 -8...
## 2           1996   1.1791508 MULTIPOLYGON (((-1321573 -8...
## 3           2000   6.2862648 MULTIPOLYGON (((-1321573 -8...
## 4           2004   5.9546176 MULTIPOLYGON (((-1321573 -8...
## 5           2008   2.5251611 MULTIPOLYGON (((-1321573 -8...
## 6           2012   4.8266617 MULTIPOLYGON (((-1321573 -8...
## 7           2016   5.3850410 MULTIPOLYGON (((-1321573 -8...
## 8           2020  -0.9755463 MULTIPOLYGON (((-1321573 -8...
## 9           2024   2.0835188 MULTIPOLYGON (((-1321573 -8...
## 10          1992 -12.2409199 MULTIPOLYGON (((1785548 -15...

Ok we have that nice field t3$r_change to that looks like counties shifting lefty righty over time. So we might just go

t3 %>% 
  ggplot(
    aes(
      fill = r_change
    )
  ) +
  geom_sf() +
  facet_wrap(~election_year)

Hmmm yucky defaults let’s refine

t3 %>% 
  filter(
    r_change %>% 
      is.na %>% 
      not
  ) %>% 
  ggplot(
    aes(
      fill = r_change
    )
  ) +
  geom_sf() +
  facet_wrap(
    ~election_year,
    ncol = 3
  ) +
  scale_fill_gradient2(
    high = "#e91b23",
    low = "#0048c0",
    mid = "#753272",
    midpoint = .5
  )

Ooofff this still sucks because of the freaking county boundaries! This wrecks county choropleths constantly – check out this, where the simple historical quirk of smaller counties in east wrecked the ability to infer something from the fill dimension of the map

t3 %>% 
  filter(
    r_change %>% 
      is.na %>% 
      not
  ) %>% 
  ggplot(
    aes(
      fill = r_change
    )
  ) +
  geom_sf(
    color = "transparent"
  ) +
  facet_wrap(
    ~election_year,
    ncol = 3
  ) +
  scale_fill_gradient2(
    high = "#e91b23",
    low = "#0048c0",
    mid = "#753272",
    midpoint = 0
  )

A little better, but we can even better accentuate the change by removing stability

t3 %>% 
  filter(
    r_change %>% 
      is.na %>% 
      not
  ) %>% 
  ggplot(
    aes(
      fill = r_change
    )
  ) +
  geom_sf(
    color = "transparent"
  ) +
  facet_wrap(
    ~election_year,
    ncol = 3
  ) +
  scale_fill_gradient2(
    high = "#e91b23",
    low = "#0048c0",
    mid = "white",
    midpoint = 0
  )

A little better, but let’s also add state boundaries for clarity

t3 %>% 
  filter(
    r_change %>% 
      is.na %>% 
      not
  ) %>% 
  ggplot() +
  geom_sf(
    aes(
      fill = r_change
    ),
    color = "transparent"
  ) +
  geom_sf(
    linewidth  = .3,
    color = "grey30",
    fill = "transparent",
    data = s_states
  ) +
  facet_wrap(
    ~election_year,
    ncol = 3
  ) +
  scale_fill_gradient2(
    high = "#e91b23",
    low = "#0048c0",
    mid = "white",
    midpoint = 0
  )

Let’s just clean up the aesthetics

p1 <- t3 %>% 
  filter(
    r_change %>% 
      is.na %>% 
      not
  ) %>% 
  ggplot() +
  geom_sf(
    aes(
      fill = r_change
    ),
    color = "transparent"
  ) +
  geom_sf(
    linewidth  = .1,
    color = "grey30",
    fill = "transparent",
    data = s_states
  ) +
  facet_wrap(
    ~election_year,
    ncol = 3
  ) +
  labs(
    title = "Change in two-party presidential vote by county, 1992-2024",
    fill = "GOP two-party (%) -\npreceding cycle's GOP two-party (%)",
  ) +
  scale_y_continuous(breaks = NULL) +
  scale_x_continuous(breaks = NULL) +
  scale_fill_gradient2(
    low =  "#2fabe1",
    midpoint = 0,
    high = "#e41b22",
    mid = "white",
    guide = guide_colorbar(
      direction = "horizontal"
    )
  ) + 
  theme_minimal(
    base_family = "Public Sans"
  ) +
  theme(
    panel.background  = 
      element_rect(color = "grey95",
                   fill = "grey95"),
    panel.grid = element_blank(),
    strip.text = element_text(
      size =  8
    ),
    strip.background = element_rect(color = "grey95",
                                    fill = "grey95"),
    legend.background = element_rect(color = "white",
                                     fill = "white"),
    legend.margin = margin(-.1, 0, 0, 0, "cm"),
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold", hjust = 0),
    plot.caption = element_text(face = "italic", size = 7),
    # legend.position = c(.75, .25),
    legend.position = "bottom",
    strip.text.y = element_text(angle = 0),
    legend.title.position = "left",
    legend.title = element_text(hjust = .5),
    axis.text = element_blank(),
    axis.ticks = element_blank(),      
    axis.line = element_blank()
  )


Ok an exercise!

An excercise

How would we replicate this map in R?


  1. Now it’s separately a super interesting question of sociology/economics/business management, that these decentralized, unplanned, spontaneous orders, among actors with no profit motives or other incentives to coordinator, why do software emerging from such a development work so well? No managers, no one explicitly coordinating, no one forcing a new dev to make their software interoperable, no one even getting paid or promised an equity windfall – and yet this software crushes proprietary slop. What’s our theory of human behavior/organizational incentives to account for this?↩︎