Not even a real lab today – just a brief tour of the cartographical capabilities built into R. By way of caveats:
R can do full blown geographic computation – proximty math, density counts, geographical statistics. But that’s not what we’re talking about today.
You can do proper cartography. But we’re just doing a thin simulacrum.
When I started working with geographical data in the late 00s, we
were stuck with horrid *.shp
files.
And these files were horrid:
The .shp
files would be read with
rgdal::readOGR(..., layer = "shapefile")\
or something
comparably naff. Simple features, an open source geospatial data storage
standard, has simplified things.
We’re going to do a simple exercise: visit the Ohio Secretary of State’s website (https://www.ohiosos.gov/elections/ohio-candidates/district-maps/) and download the most recent Congressional district shapefile.
Open the zipfile in an accessible directory, and load it
library(tidyverse)
library(magrittr)
library(sf)
s1 <- "March_2nd_CD_SHPs.shp" %>%
read_sf
We can plot s1
s1 %>%
select(DISTRICT) %>%
plot
Whoooooaaa hang on we can do dplyr
bullshit to an
sf
object?!?
Confirm this for yourself
s1 %>%
class
## [1] "sf" "tbl_df" "tbl" "data.frame"
Ohhhhh this is awesome – all these independently developed R standards are interoperable! What a miracle Hadley has been for this computational 1
Let’s see if we can make the figure more interesting by coloring each district by their Trump vote share
t1 <- "https://github.com/thomasjwood/code_lab/raw/main/data/national_cong119_district_data.csv" %>%
read_csv()
t1
## # A tibble: 435 × 37
## CONG119 STATE DISTRICT TOTPOP20 `NH White` NH Black plus NH Black and W…¹
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 AK-AT-LARGE AK AT-LARGE 733391 0.575 0.0355
## 2 AL-01 AL 01 717754 0.723 0.170
## 3 AL-02 AL 02 717754 0.416 0.499
## 4 AL-03 AL 03 717754 0.700 0.212
## 5 AL-04 AL 04 717754 0.802 0.0759
## 6 AL-05 AL 05 717754 0.678 0.184
## 7 AL-06 AL 06 717755 0.712 0.182
## 8 AL-07 AL 07 717754 0.388 0.533
## 9 AR-01 AR 01 752509 0.733 0.181
## 10 AR-02 AR 02 752710 0.663 0.213
## # ℹ 425 more rows
## # ℹ abbreviated name: ¹`NH Black plus NH Black and White`
## # ℹ 31 more variables: `NH Asian plus NH Asian and White` <dbl>,
## # `NH American Indian plus NH American Indian and White` <dbl>,
## # `NH Pacific Islander plus NH Pacific Islander and White` <dbl>,
## # `NH Some Other Race plus NH Some Other Race and White` <dbl>,
## # `NH Other multiple-race` <dbl>, Hispanic <dbl>, TOT_HOUS22_est <dbl>, …
Ok to join these attributes to our shapefile we’ll need a common
DISTRICT
variable, which we can simply make from
t1
s1 <- s1 %>%
mutate(
DISTRICT = DISTRICT %>%
as.numeric
) %>%
left_join(
t1 %>%
filter(STATE == "OH") %>%
mutate(
DISTRICT = DISTRICT %>%
as.numeric
)
)
s1 %>%
ggplot(
aes(
fill = G20PRERTRU_pct
)
) +
geom_sf() +
scale_fill_gradient2(
high = "#e91b23",
low = "#0048c0",
mid = "#753272",
midpoint = .5
)
I hope you’re experiencing something akin to Pip in Great Expectations at the end of the novel, where he realizes that all the time toiling in the forge with Joe Gargery at the start, which he found to be dreary and servile, this experience actually instilled in him character and loyalty.
Or maybe it’s like in Karate Kid where Daniel learns that he wasn’t just sanding Mr Miyagi’s deck. Whatever, I’m aware your generation is averse to books.
Now it’s all just little stuff, like graphical conventions, and geographical conventions. To embed the intuition, let’s make a simple figure for changes in counties’ presidential votes.
We’ll load a nice data set for counties’ presidential vote
# install if necessary
# devtools::install_github("UrbanInstitute/urbnmapr")
library(urbnmapr)
t2 <- "https://github.com/thomasjwood/ps7160/raw/refs/heads/master/countyvote/pres_vote_by_county_1868_2024.rds" %>%
url %>%
readRDS
s_states <- "states" %>%
get_urbn_map(sf = TRUE)
s_counties <- "counties" %>%
get_urbn_map(sf = TRUE)
Say we want to depict the change in presidential vote for every county from 1992-2024 – we might
t3 <- s_counties %>%
full_join(
t2 %>%
select(
fips, election_year, r_change
) %>%
filter(
election_year %>%
is_in(
seq(1992, 2024, 4)
)
) %>%
rename(
county_fips = fips
)
)
t3
## Simple feature collection with 28130 features and 8 fields (with 12 geometries empty)
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -2600000 ymin: -2363000 xmax: 2516374 ymax: 732352.2
## Projected CRS: NAD27 / US National Atlas Equal Area
## First 10 features:
## county_fips state_abbv state_fips county_name fips_class state_name
## 1 04015 AZ 04 Mohave County H1 Arizona
## 2 04015 AZ 04 Mohave County H1 Arizona
## 3 04015 AZ 04 Mohave County H1 Arizona
## 4 04015 AZ 04 Mohave County H1 Arizona
## 5 04015 AZ 04 Mohave County H1 Arizona
## 6 04015 AZ 04 Mohave County H1 Arizona
## 7 04015 AZ 04 Mohave County H1 Arizona
## 8 04015 AZ 04 Mohave County H1 Arizona
## 9 04015 AZ 04 Mohave County H1 Arizona
## 10 12035 FL 12 Flagler County H1 Florida
## election_year r_change geometry
## 1 1992 -12.5871235 MULTIPOLYGON (((-1321573 -8...
## 2 1996 1.1791508 MULTIPOLYGON (((-1321573 -8...
## 3 2000 6.2862648 MULTIPOLYGON (((-1321573 -8...
## 4 2004 5.9546176 MULTIPOLYGON (((-1321573 -8...
## 5 2008 2.5251611 MULTIPOLYGON (((-1321573 -8...
## 6 2012 4.8266617 MULTIPOLYGON (((-1321573 -8...
## 7 2016 5.3850410 MULTIPOLYGON (((-1321573 -8...
## 8 2020 -0.9755463 MULTIPOLYGON (((-1321573 -8...
## 9 2024 2.0835188 MULTIPOLYGON (((-1321573 -8...
## 10 1992 -12.2409199 MULTIPOLYGON (((1785548 -15...
Ok we have that nice field t3$r_change
to that looks
like counties shifting lefty righty over time. So we might just go
t3 %>%
ggplot(
aes(
fill = r_change
)
) +
geom_sf() +
facet_wrap(~election_year)
Hmmm yucky defaults let’s refine
t3 %>%
filter(
r_change %>%
is.na %>%
not
) %>%
ggplot(
aes(
fill = r_change
)
) +
geom_sf() +
facet_wrap(
~election_year,
ncol = 3
) +
scale_fill_gradient2(
high = "#e91b23",
low = "#0048c0",
mid = "#753272",
midpoint = .5
)
Ooofff this still sucks because of the freaking county boundaries! This wrecks county choropleths constantly – check out this, where the simple historical quirk of smaller counties in east wrecked the ability to infer something from the fill dimension of the map
t3 %>%
filter(
r_change %>%
is.na %>%
not
) %>%
ggplot(
aes(
fill = r_change
)
) +
geom_sf(
color = "transparent"
) +
facet_wrap(
~election_year,
ncol = 3
) +
scale_fill_gradient2(
high = "#e91b23",
low = "#0048c0",
mid = "#753272",
midpoint = 0
)
A little better, but we can even better accentuate the change by removing stability
t3 %>%
filter(
r_change %>%
is.na %>%
not
) %>%
ggplot(
aes(
fill = r_change
)
) +
geom_sf(
color = "transparent"
) +
facet_wrap(
~election_year,
ncol = 3
) +
scale_fill_gradient2(
high = "#e91b23",
low = "#0048c0",
mid = "white",
midpoint = 0
)
A little better, but let’s also add state boundaries for clarity
t3 %>%
filter(
r_change %>%
is.na %>%
not
) %>%
ggplot() +
geom_sf(
aes(
fill = r_change
),
color = "transparent"
) +
geom_sf(
linewidth = .3,
color = "grey30",
fill = "transparent",
data = s_states
) +
facet_wrap(
~election_year,
ncol = 3
) +
scale_fill_gradient2(
high = "#e91b23",
low = "#0048c0",
mid = "white",
midpoint = 0
)
Let’s just clean up the aesthetics
p1 <- t3 %>%
filter(
r_change %>%
is.na %>%
not
) %>%
ggplot() +
geom_sf(
aes(
fill = r_change
),
color = "transparent"
) +
geom_sf(
linewidth = .1,
color = "grey30",
fill = "transparent",
data = s_states
) +
facet_wrap(
~election_year,
ncol = 3
) +
labs(
title = "Change in two-party presidential vote by county, 1992-2024",
fill = "GOP two-party (%) -\npreceding cycle's GOP two-party (%)",
) +
scale_y_continuous(breaks = NULL) +
scale_x_continuous(breaks = NULL) +
scale_fill_gradient2(
low = "#2fabe1",
midpoint = 0,
high = "#e41b22",
mid = "white",
guide = guide_colorbar(
direction = "horizontal"
)
) +
theme_minimal(
base_family = "Public Sans"
) +
theme(
panel.background =
element_rect(color = "grey95",
fill = "grey95"),
panel.grid = element_blank(),
strip.text = element_text(
size = 8
),
strip.background = element_rect(color = "grey95",
fill = "grey95"),
legend.background = element_rect(color = "white",
fill = "white"),
legend.margin = margin(-.1, 0, 0, 0, "cm"),
panel.grid.minor = element_blank(),
plot.title = element_text(face = "bold", hjust = 0),
plot.caption = element_text(face = "italic", size = 7),
# legend.position = c(.75, .25),
legend.position = "bottom",
strip.text.y = element_text(angle = 0),
legend.title.position = "left",
legend.title = element_text(hjust = .5),
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.line = element_blank()
)
Ok an exercise!
How would we replicate this map in R?