First, we load some packages. tidyverse is helpful for general data manipulation, while sf will give us access to some powerful mapping tools.
We can use st_read() to read in the shapefile (.shp) of the 2022 New Jersey plan. I downloaded this shapefile from Dave’s Redistricting.
nj <- st_read("../data/district-shapes/nj_2022.shp")
## Reading layer `nj_2022' from data source
## `/Users/tylersimko/Documents/redist_workshop/data/district-shapes/nj_2022.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 12 features and 17 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -75.56359 ymin: 38.78866 xmax: -73.88506 ymax: 41.35761
## Geodetic CRS: WGS 84
Let’s take a look at this data. In particular, take a look at the geometry column, which you may have never seen in an R object before. This column contains the geographic information for each district.
nj
## Simple feature collection with 12 features and 17 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -75.56359 ymin: 38.78866 xmax: -73.88506 ymax: 41.35761
## Geodetic CRS: WGS 84
## First 10 features:
## id NAME TotalPop PopDevPc DemPct RepPct WhitePct Minority
## 1 1 1 773585 4.30e-07 0.6117779 0.3655424 0.6305235 0.3694765
## 2 2 2 773584 -8.60e-07 0.4615078 0.5141808 0.6738186 0.3261814
## 3 3 3 773586 1.72e-06 0.5399854 0.4390755 0.6587511 0.3412489
## 4 4 4 773586 1.72e-06 0.3680028 0.6105812 0.8197053 0.1802947
## 5 5 5 773585 4.30e-07 0.5381354 0.4404828 0.6120787 0.3879213
## 6 6 6 773586 1.72e-06 0.5804096 0.3948309 0.4515813 0.5484187
## 7 7 7 773584 -8.60e-07 0.4660477 0.5083813 0.7346289 0.2653710
## 8 8 8 773584 -8.60e-07 0.7573265 0.2211649 0.2680707 0.7319293
## 9 9 9 773584 -8.60e-07 0.6067528 0.3708425 0.4088846 0.5911154
## 10 10 10 773584 -8.60e-07 0.8149814 0.1682238 0.2029640 0.7970360
## BlackPct Hispanic PacificP AsianPct NativePc TotalVAP Margin
## 1 0.17844344 0.12632564 0.00161897 0.06003238 0.01627234 604707 0.11177791
## 2 0.12637696 0.14781000 0.00167327 0.04250452 0.01964466 616758 -0.03849221
## 3 0.14183861 0.09794999 0.00140184 0.09167181 0.01360326 614191 0.03998539
## 4 0.04219612 0.09093918 0.00084701 0.03250386 0.01145078 589130 -0.13199718
## 5 0.07020702 0.14914016 0.00131678 0.16666776 0.01059484 610580 0.03813545
## 6 0.12717887 0.22057558 0.00149212 0.19698467 0.01436822 611210 0.08040958
## 7 0.05877189 0.10787427 0.00098287 0.08696494 0.01050963 609441 -0.03395227
## 8 0.11181778 0.48949803 0.00222952 0.13332353 0.02234100 611790 0.25732646
## 9 0.10910322 0.39060566 0.00159533 0.10124300 0.01912904 601128 0.10675280
## 10 0.52712018 0.20071643 0.00195830 0.07503913 0.01460112 597968 0.31498139
## color opacity geometry
## 1 #00008b 0.5 POLYGON ((-74.93752 39.8902...
## 2 #ff9999 0.5 POLYGON ((-75.22902 39.7580...
## 3 #2020ff 0.5 POLYGON ((-74.74953 40.2608...
## 4 #960018 0.5 POLYGON ((-74.07647 40.3616...
## 5 #6060ff 0.5 POLYGON ((-74.34493 41.1920...
## 6 #2020ff 0.5 POLYGON ((-74.28338 40.4248...
## 7 #ffa7a7 0.5 POLYGON ((-75.1896 40.59178...
## 8 #00008b 0.5 POLYGON ((-74.2224 40.67513...
## 9 #00008b 0.5 POLYGON ((-74.17959 40.8872...
## 10 #00008b 0.5 POLYGON ((-74.24209 40.6617...
class(nj)
## [1] "sf" "data.frame"
head(nj)
## Simple feature collection with 6 features and 17 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -75.56359 ymin: 38.78866 xmax: -73.88506 ymax: 41.35761
## Geodetic CRS: WGS 84
## id NAME TotalPop PopDevPc DemPct RepPct WhitePct Minority BlackPct
## 1 1 1 773585 4.30e-07 0.6117779 0.3655424 0.6305235 0.3694765 0.17844344
## 2 2 2 773584 -8.60e-07 0.4615078 0.5141808 0.6738186 0.3261814 0.12637696
## 3 3 3 773586 1.72e-06 0.5399854 0.4390755 0.6587511 0.3412489 0.14183861
## 4 4 4 773586 1.72e-06 0.3680028 0.6105812 0.8197053 0.1802947 0.04219612
## 5 5 5 773585 4.30e-07 0.5381354 0.4404828 0.6120787 0.3879213 0.07020702
## 6 6 6 773586 1.72e-06 0.5804096 0.3948309 0.4515813 0.5484187 0.12717887
## Hispanic PacificP AsianPct NativePc TotalVAP Margin color
## 1 0.12632564 0.00161897 0.06003238 0.01627234 604707 0.11177791 #00008b
## 2 0.14781000 0.00167327 0.04250452 0.01964466 616758 -0.03849221 #ff9999
## 3 0.09794999 0.00140184 0.09167181 0.01360326 614191 0.03998539 #2020ff
## 4 0.09093918 0.00084701 0.03250386 0.01145078 589130 -0.13199718 #960018
## 5 0.14914016 0.00131678 0.16666776 0.01059484 610580 0.03813545 #6060ff
## 6 0.22057558 0.00149212 0.19698467 0.01436822 611210 0.08040958 #2020ff
## opacity geometry
## 1 0.5 POLYGON ((-74.93752 39.8902...
## 2 0.5 POLYGON ((-75.22902 39.7580...
## 3 0.5 POLYGON ((-74.74953 40.2608...
## 4 0.5 POLYGON ((-74.07647 40.3616...
## 5 0.5 POLYGON ((-74.34493 41.1920...
## 6 0.5 POLYGON ((-74.28338 40.4248...
nj$geometry
## Geometry set for 12 features
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -75.56359 ymin: 38.78866 xmax: -73.88506 ymax: 41.35761
## Geodetic CRS: WGS 84
## First 5 geometries:
## POLYGON ((-74.93752 39.89029, -74.93551 39.8894...
## POLYGON ((-75.22902 39.75806, -75.22266 39.7575...
## POLYGON ((-74.74953 40.26085, -74.7496 40.26174...
## POLYGON ((-74.07647 40.36169, -74.0771 40.3613,...
## POLYGON ((-74.34493 41.19205, -74.33217 41.1861...
class(nj$geometry)
## [1] "sfc_POLYGON" "sfc"
We can useredist.plot.map() to make a simple plot. Since we have an sf column, the function knows how to reach the geographic information:
# Underneath, this is just a ggplot() if you are familiar with those!
redist.plot.map(nj)
We can use the plan argument to color by district:
redist.plot.map(nj, plan = id)
zoom_to takes a logical vector which lets you zoom the map onto a certain area:
redist.plot.map(nj, plan = id, zoom_to = (id == 6))
Instead of plans, you can also color by values in your data frame with the fill argument. There are built-in color scales you can use too, like those for partisanship:
redist.plot.map(nj, fill = DemPct,
boundaries = TRUE,
title = "New Jersey 2022 Congressional") +
scale_fill_party_c()
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
redist.plot.map(nj, fill = DemPct,
title = "New Jersey 2022 Congressional",
boundaries = TRUE) +
scale_fill_party_c() +
geom_sf_label(aes(label = id))
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data
Let’s try downloading your own shapefile and making a map. Go to this list of official 2020 maps on Dave’s Redistricting and open whichever map seems interesting to you.
Look around for a few minutes, then click on the “Export Map” button in the top right (it is an arrow pointing northeast), and choose to export “District Shapes (as Shapefile, .zip).” Save that .zip folder in this folder and unzip it. Finally, read it in and make a nice map. Please ask if you have questions!
A simple evaluation statistic is the number of “competitive” districts based on some criteria. A common metric is looking at districts within 5 or 10% of 50% partisan vote share.
nrow(nj)
## [1] 12
sum(nj$DemPct > 0.45 & nj$DemPct < 0.55)
## [1] 4
sum(nj$DemPct > 0.4 & nj$DemPct < 0.6)
## [1] 6
ifelse(nj$DemPct > 0.5 & nj$DemPct < 0.55,
"Lean Dem.", "Not Lean Dem.")
## [1] "Not Lean Dem." "Not Lean Dem." "Lean Dem." "Not Lean Dem."
## [5] "Lean Dem." "Not Lean Dem." "Not Lean Dem." "Not Lean Dem."
## [9] "Not Lean Dem." "Not Lean Dem." "Not Lean Dem." "Not Lean Dem."
We can also measure the number of districts for which a single minority group is the majority. Such districts are often called Majority-Minority Districts:
sum(nj$BlackPct > 0.5)
## [1] 1
sum(nj$Hispanic > 0.5)
## [1] 0
Opportunity Districts are districts where a majority group is not necessarily above 50%, but still has sufficient electoral power to elect candidates of their choice. 40% is a common cutoff:
sum(nj$BlackPct > 0.4)
## [1] 1
sum(nj$Hispanic > 0.4)
## [1] 1
Coalition Districts occur when groups of minority voters form the majority:
sum((1 - nj$WhitePct) > 0.5)
## [1] 5
sum(nj$BlackPct + nj$Hispanic > 0.4)
## [1] 3
Compactness: many states require that districts be “compact.” Few states define what they mean by compactness, but generally it contains the idea that people in a district should live “near” to each other. A circular district would then be more compact than a long, thin district that meanders down a highway.
Polsby-Popper compactness is one measure, given by the following formula, where \(A_D\) is the area of the district and \(P_D\) is the perimeter of the district:
\[ P = 4\pi \frac{A_D}{P_D} \]
Polsby-Popper scores fall between 0 and 1, with more compact districts being closer to 1.
comp_polsby(nj$id, nj)
## [1] 0.3938246 0.2539342 0.2417608 0.2773128 0.2487897 0.1539016 0.2061013
## [8] 0.1091466 0.1656088 0.1173216 0.2157762 0.1791445
Here are lots of other ways of measuring compactness.
Often, you want to combine multiple geographies. For example, overlaying a shapefile of counties on to new congressional districts.
Let’s read a shapefile of voting precincts in New Jersey and figure out which congressional district each of these is assigned to. This data comes from the 2020 Census data published by the ALARM Project.
nj_vtd <- read_rds("../data/nj_vtd.rds")
nj_cd <- st_read("../data/district-shapes/POLYGON.shp")
## Reading layer `POLYGON' from data source
## `/Users/tylersimko/Documents/redist_workshop/data/district-shapes/POLYGON.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 12 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -75.56359 ymin: 38.78866 xmax: -73.88506 ymax: 41.35761
## Geodetic CRS: WGS 84
p <- redist.plot.map(nj_vtd, plan = county)
ggsave('../figs/nj_vtds.pdf', p)
## Saving 7 x 5 in image
colnames(nj)
## [1] "id" "NAME" "TotalPop" "PopDevPc" "DemPct" "RepPct"
## [7] "WhitePct" "Minority" "BlackPct" "Hispanic" "PacificP" "AsianPct"
## [13] "NativePc" "TotalVAP" "Margin" "color" "opacity" "geometry"
p <- ggplot(nj_vtd) +
geom_sf(fill = NA, color = "black") +
geom_sf(data = nj_cd, fill = NA, color = "blue")
ggsave('../figs/nj_both.pdf', p)
## Saving 7 x 5 in image
When dealing with multiple geographies, you want to make sure they are in the same projection. In R, we can check that with the st_crs() function:
# st_crs = coordinate reference system
st_crs(nj_vtd)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
st_crs(nj_cd)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
nj_vtd <- st_transform(nj_vtd, 4326)
nj_cd <- st_transform(nj_cd, 4326)
The geo_match() function in the geomander package will match across these two geographies. From a smaller to a larger geography, the function will tell you which row of the larger geography contains each row of the smaller geography.
districts <- geo_match(nj_vtd, nj_cd)
nj_vtd$cd_2020 <- districts
p <- redist.plot.map(nj_vtd, plan = cd_2020)
ggsave('../figs/nj_2020.pdf', p)
## Saving 7 x 5 in image
This is useful for many reasons. For example, now for every precinct you know (1) which county it is in and (2) which district it was assigned to. How could you find out whether the county is split between two districts?