In doing analyses of precinct-level voting data from the Howard County 2014 general election (using data from the Maryland State Board of Elections and the Howard County Board of Elections), one problem is that maps drawn using that data are inherently misleading: Precincts in western Howard County are much larger than precincts in Columbia and eastern Howard County, so they visually dominate the maps.
One way to reduce this effect is to create cartograms, maps in which the visual sizes of the geographic subdivisions are based not on their actual geographic area but on some other variable associated with them. In particular, for political maps of Howard County I would like to display precincts sized according to the number of registered voters in each precinct.
Creating a voter-based cartogram of Howard County precincts requires distorting the precinct boundaries so as to change the area of each precinct while preserving its overall shape as much as possible, and also preserving its relationship to its neighboring precincts. This requires some relatively sophisticated mathematics, and unfortunately there is no existing R package that can do it well. Instead I use scapetoad, a Java-based application available for Microsoft Windows, Mac OS X, and Linux.
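The algorithm is complicated, but the target it aims for is easy to state: if the total map area stays fixed, each precinct's share of that area should equal its share of registered voters. The following is a minimal sketch of that arithmetic only, under the assumption that total area is held constant; it does not distort any boundaries (that is scapetoad's job), and the function and variable names are hypothetical.

```r
# Minimal sketch: the area each region would have in an ideal cartogram
# where area is proportional to registered voters.
# Assumption: the total map area is held fixed while regions are resized.
cartogram_target_areas <- function(areas, voters) {
  stopifnot(length(areas) == length(voters), all(voters >= 0))
  sum(areas) * voters / sum(voters)
}

# Hypothetical example: a large rural precinct and a small, dense one
areas  <- c(rural = 30, urban = 2)      # arbitrary area units
voters <- c(rural = 1000, urban = 3000)
cartogram_target_areas(areas, voters)   # → rural = 8, urban = 24
```

In this made-up example the rural precinct shrinks from 30 to 8 units and the urban one grows from 2 to 24, which is exactly the visual rebalancing described above.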
In this document I describe the overall process of preparing Howard County map data for use with the scapetoad application. In the next document (part 2) I describe running scapetoad to produce the desired cartograms.
I use the R statistical package run from the RStudio development environment, along with the dplyr package to do data manipulation and the sp and rgdal packages to read and write mapping data.
library("dplyr", warn.conflicts = FALSE)
library("sp")
library("rgdal")
## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.1, released 2014/09/24
## Path to GDAL shared files: /Library/Frameworks/GDAL.framework/Versions/1.11/Resources/gdal
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: (autodetected)
The rgdal package also requires installing the GDAL geospatial data library (and the PROJ.4 projection library it uses) on the underlying operating system.
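Because this setup depends on several packages plus system libraries, it can help to check up front which packages are missing. The sketch below uses only base R's `requireNamespace()`; the helper name `missing_packages` is hypothetical. Note that rgdal has since been retired from CRAN, so on a current R installation this check may well report it as missing.

```r
# Minimal sketch (hypothetical helper): report which of the packages this
# analysis uses are not installed, without attaching any of them.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(c("dplyr", "sp", "rgdal"))
if (length(missing) > 0) {
  message("Please install: ", paste(missing, collapse = ", "))
}
```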
The scapetoad utility works only with map data in ESRI shapefile format, and requires that the variable used to determine precinct sizes be included with that data. The scapetoad utility also has a handy feature whereby you can include other map layers and distort them in tandem with the original map. I use this to create maps of the Howard County council districts, Maryland state legislative districts, and US Congressional districts that match the precinct cartogram.
First I download a CSV-format dataset containing Howard County 2014 general election turnout statistics by precinct. This dataset contains the number of registered voters per precinct on election day. I store a copy of the data in the local file hocomd-2014-general-election-turnout-by-precinct.csv.
download.file("https://raw.githubusercontent.com/frankhecker/hocodata/master/datasets/hocomd-2014-general-election-turnout-by-precinct.csv",
"hocomd-2014-general-election-turnout-by-precinct.csv",
method = "curl")
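If you re-run this analysis often, you may prefer not to refetch files you already have. Here is a minimal sketch of a download-once wrapper around `download.file()`; the helper name `download_if_missing` is hypothetical, and extra arguments such as `method = "curl"` are passed straight through.

```r
# Minimal sketch (hypothetical helper): download a file only if a local
# copy does not already exist, so re-running the analysis skips the fetch.
download_if_missing <- function(url, destfile, ...) {
  if (!file.exists(destfile)) {
    download.file(url, destfile, ...)
  }
  invisible(destfile)
}

# Usage, mirroring the call above:
# download_if_missing("https://raw.githubusercontent.com/frankhecker/hocodata/master/datasets/hocomd-2014-general-election-turnout-by-precinct.csv",
#                     "hocomd-2014-general-election-turnout-by-precinct.csv",
#                     method = "curl")
```

The trade-off is that a cached file will not pick up later corrections to the source data; deleting the local copy forces a fresh download.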
Then I download spatial data from data.howardcountymd.gov specifying the boundaries of Howard County election precincts, county council districts, Maryland state legislative districts, and US Congressional districts. I choose to use the ESRI shapefile format, and store the data locally in the files Voting_Precincts.zip, Council_Districts.zip, Legislative_Districts.zip, and Congressional_Districts.zip.
(Notes: The council district map uses the new council boundaries in effect for the 2014 primary and general elections. The data.howardcountymd.gov site mistakenly lists these boundaries as not taking effect until December 2014. Also, strictly speaking the “legislative districts” map shows the House of Delegates districts. A Maryland State Senate district may be composed of multiple House of Delegates districts; thus, for example, State Senate district 9 is the combination of House of Delegates districts 9A and 9B.)
download.file("https://data.howardcountymd.gov/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=general:Voting_Precincts&outputFormat=shape-zip",
"Voting_Precincts.zip",
method = "curl")
download.file("https://data.howardcountymd.gov/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=general:Council_Districts&outputFormat=shape-zip",
"Council_Districts.zip",
method = "curl")
download.file("https://data.howardcountymd.gov/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=general:Legislative_Districts&outputFormat=shape-zip",
"Legislative_Districts.zip",
method = "curl")
download.file("https://data.howardcountymd.gov/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=general:Congressional_Districts&outputFormat=shape-zip",
"Congressional_Districts.zip",
method = "curl")
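The four download calls differ only in the layer name, so the request URL can be built from that name. The following is a minimal sketch; the helper `wfs_shape_zip_url` is hypothetical, and it simply reproduces the GeoServer query string used in the calls above.

```r
# Minimal sketch (hypothetical helper): build the GeoServer WFS request URL
# used above for a given layer name.
wfs_shape_zip_url <- function(layer,
                              base = "https://data.howardcountymd.gov/geoserver/ows") {
  paste0(base,
         "?service=WFS&version=1.0.0&request=GetFeature",
         "&typeName=general:", layer,
         "&outputFormat=shape-zip")
}

layers <- c("Voting_Precincts", "Council_Districts",
            "Legislative_Districts", "Congressional_Districts")

# Uncomment to fetch all four layers into <layer>.zip files:
# for (layer in layers) {
#   download.file(wfs_shape_zip_url(layer), paste0(layer, ".zip"), method = "curl")
# }
```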
The shapefile data is downloaded as compressed .zip files, so I then unzip the data into the current directory. Each .zip file produces five separate files when uncompressed, with suffixes .cst, .dbf, .prj, .shp, and .shx. (Each also produces a sixth file, wfsrequest.txt, that can be ignored.)
unzip("Voting_Precincts.zip", overwrite = TRUE)
unzip("Council_Districts.zip", overwrite = TRUE)
unzip("Legislative_Districts.zip", overwrite = TRUE)
unzip("Congressional_Districts.zip", overwrite = TRUE)
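Before reading the data it can be worth confirming that extraction produced the files a shapefile needs. In the ESRI format the .shp, .shx, and .dbf files are mandatory, while the .prj file holds the coordinate reference system. The sketch below checks for those four; the helper name is hypothetical, and the layer name Voting_PrecinctsPolygon is the one used with readOGR() below.

```r
# Minimal sketch (hypothetical helper): list which core shapefile components
# are missing for a layer. Assumption: .shp, .shx, .dbf, and .prj are the
# files needed here; the .cst file is not required to read the geometry.
missing_shapefile_parts <- function(layer, dir = ".") {
  exts <- c("shp", "shx", "dbf", "prj")
  paths <- file.path(dir, paste0(layer, ".", exts))
  exts[!file.exists(paths)]
}

# For example, after unzipping:
# missing_shapefile_parts("Voting_PrecinctsPolygon")  # character(0) if complete
```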
First I read in the turnout data.
turnout_df <- read.csv("hocomd-2014-general-election-turnout-by-precinct.csv",
stringsAsFactors = FALSE)
str(turnout_df)
## 'data.frame': 118 obs. of 5 variables:
## $ Precinct : chr "001-001" "001-002" "001-003" "001-004" ...
## $ Polling.Place: chr "St. Augustine School Gym" "Elkridge Landing Middle School Gym" "Mayfield Woods Middle School Cafeteria" "Ilchester Elementary School Cafeteria" ...
## $ Reg.Voters : int 2051 524 1803 2738 664 1805 516 1789 1558 1407 ...
## $ Cards.Cast : int 976 255 765 1354 293 649 161 754 648 536 ...
## $ Pct.Turnout : num 47.6 48.7 42.4 49.5 44.1 ...
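Since Reg.Voters is the variable that will drive the cartogram, a quick consistency check is reassuring: Pct.Turnout should equal 100 × Cards.Cast / Reg.Voters, up to rounding. The following is a minimal sketch; the helper name and tolerance are assumptions, and the sample rows are the first five values shown in the str() output above.

```r
# Minimal sketch (hypothetical helper): check that Pct.Turnout agrees with
# Cards.Cast / Reg.Voters. The default tolerance is half a unit in the first
# decimal place plus a little slack for floating-point error.
turnout_consistent <- function(df, tol = 0.051) {
  computed <- 100 * df$Cards.Cast / df$Reg.Voters
  all(abs(computed - df$Pct.Turnout) <= tol)
}

# The first five precincts, copied from the str() output above
sample_df <- data.frame(Reg.Voters  = c(2051, 524, 1803, 2738, 664),
                        Cards.Cast  = c(976, 255, 765, 1354, 293),
                        Pct.Turnout = c(47.6, 48.7, 42.4, 49.5, 44.1))
turnout_consistent(sample_df)  # → TRUE
```

On the full dataset the same call would be `turnout_consistent(turnout_df)`.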
Then I read in the shapefile data for the precinct boundaries.
precinct_map <- readOGR(dsn = ".",
layer = "Voting_PrecinctsPolygon")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "Voting_PrecinctsPolygon"
## with 118 features and 9 fields
## Feature type: wkbPolygon with 2 dimensions
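Note that the layer name passed to readOGR() is the base name of the .shp file, Voting_PrecinctsPolygon, not the name of the .zip archive. A minimal sketch for discovering it without extracting anything, using base R's `unzip(list = TRUE)`; both helper names are hypothetical.

```r
# Minimal sketch (hypothetical helpers): find the layer name(s) to pass to
# readOGR() by looking for .shp files among a set of file names.
shp_layer_names <- function(files) {
  shp <- files[grepl("\\.shp$", files, ignore.case = TRUE)]
  sub("\\.shp$", "", basename(shp), ignore.case = TRUE)
}

# List the archive's contents (without extracting) and pick out the layers
zip_layer_names <- function(zipfile) {
  shp_layer_names(unzip(zipfile, list = TRUE)$Name)
}

# For example:
# zip_layer_names("Voting_Precincts.zip")  # should give "Voting_PrecinctsPolygon"
```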
str(precinct_map@data)
## 'data.frame': 118 obs. of 9 variables:
## $ PRECINCT20: Factor w/ 118 levels "1-01","1-02",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ CNGDISTRIC: Factor w/ 3 levels "2","3","7": 2 2 1 3 3 1 2 2 1 3 ...
## $ LEGDISTRIC: Factor w/ 4 levels "12","13","9A",..: 1 2 1 4 1 2 1 2 2 1 ...
## $ CODISTRICT: Factor w/ 5 levels "1","2","3","4",..: 1 1 2 1 2 2 2 1 2 2 ...
## $ POLLINGPLA: Factor w/ 76 levels "ALTHOLTON HIGH SCHOOL",..: 62 16 44 34 76 12 44 15 44 33 ...
## $ ADDRESS_2 : Factor w/ 77 levels "10220 WETHERBURN RD",..: 37 53 55 28 25 50 55 51 55 59 ...
## $ CITY : Factor w/ 16 levels "Clarksville",..: 4 4 4 5 5 4 4 4 4 5 ...
## $ ZIP : Factor w/ 19 levels "20723","20759",..: 13 13 13 9 9 13 13 13 13 9 ...
## $ LOCATION_2: Factor w/ 12 levels "ALL PURPOSE ROOM",..: 7 7 4 4 7 4 4 4 7 3 ...
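Now I have to merge the registered voter data in turnout_df with the map data in precinct_map. Among other things, this requires converting the precinct designators to a common format: the turnout data uses designators such as ‘001-001’ (election district and precinct number, each zero-padded to three digits), while the shapefile uses ‘1-01’. I convert the former to the latter format (‘0-00’, e.g., ‘6-35’).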
voters_df <- turnout_df %>%
mutate(PRECINCT20 = gsub("^00([0-9])-0([0-9][0-9])$", "\\1-\\2", Precinct)) %>%
select(PRECINCT20, Reg.Voters)
str(voters_df)
## 'data.frame': 118 obs. of 2 variables:
## $ PRECINCT20: chr "1-01" "1-02" "1-03" "1-04" ...
## $ Reg.Voters: int 2051 524 1803 2738 664 1805 516 1789 1558 1407 ...
precinct_map@data <- precinct_map@data %>%
left_join(voters_df, by = "PRECINCT20")
## Warning: joining character vector and factor, coercing into character
## vector
str(precinct_map@data)
## 'data.frame': 118 obs. of 10 variables:
## $ PRECINCT20: chr "1-01" "1-02" "1-03" "1-04" ...
## $ CNGDISTRIC: Factor w/ 3 levels "2","3","7": 2 2 1 3 3 1 2 2 1 3 ...
## $ LEGDISTRIC: Factor w/ 4 levels "12","13","9A",..: 1 2 1 4 1 2 1 2 2 1 ...
## $ CODISTRICT: Factor w/ 5 levels "1","2","3","4",..: 1 1 2 1 2 2 2 1 2 2 ...
## $ POLLINGPLA: Factor w/ 76 levels "ALTHOLTON HIGH SCHOOL",..: 62 16 44 34 76 12 44 15 44 33 ...
## $ ADDRESS_2 : Factor w/ 77 levels "10220 WETHERBURN RD",..: 37 53 55 28 25 50 55 51 55 59 ...
## $ CITY : Factor w/ 16 levels "Clarksville",..: 4 4 4 5 5 4 4 4 4 5 ...
## $ ZIP : Factor w/ 19 levels "20723","20759",..: 13 13 13 9 9 13 13 13 13 9 ...
## $ LOCATION_2: Factor w/ 12 levels "ALL PURPOSE ROOM",..: 7 7 4 4 7 4 4 4 7 3 ...
## $ Reg.Voters: int 2051 524 1803 2738 664 1805 516 1789 1558 1407 ...
Finally I write out a new shapefile for use as input to the scapetoad utility.
writeOGR(precinct_map,
dsn = ".",
layer = "Precinct_VotersPolygon",
driver = "ESRI Shapefile",
overwrite_layer = TRUE)
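Shapefile attributes are stored in a dBase (.dbf) file, which limits field names to 10 characters; the existing names such as CNGDISTRIC and POLLINGPLA appear to have been truncated for that reason. Since scapetoad needs to find the sizing variable by name, a quick check before writing can help. A minimal sketch, with a hypothetical helper name:

```r
# Minimal sketch (hypothetical helper): flag attribute names longer than the
# 10-character limit that dBase (.dbf) files impose on field names.
long_field_names <- function(df, limit = 10) {
  nms <- names(df)
  nms[nchar(nms) > limit]
}

# For example:
# long_field_names(precinct_map@data)  # character(0): "Reg.Voters" is exactly 10 characters
```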
At this point there is nothing further to be done in R. The next document (part 2) describes the process of running the scapetoad application.
I used the following R environment in doing the analysis for this example:
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rgdal_0.9-1 sp_1.0-17 dplyr_0.4.0 RCurl_1.95-4.3
## [5] bitops_1.0-6
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.1 DBI_0.3.1 digest_0.6.4 evaluate_0.5.5
## [5] formatR_1.0 grid_3.1.2 htmltools_0.2.6 knitr_1.7
## [9] lattice_0.20-29 lazyeval_0.1.10 magrittr_1.0.1 parallel_3.1.2
## [13] Rcpp_0.11.3 rmarkdown_0.5.1 stringr_0.6.2 tools_3.1.2
## [17] yaml_2.1.13
The underlying GDAL library for the rgdal package is from the KyngChaos GDAL Complete distribution version 1.11 for Mac OS X.
You can find the source code for this analysis and others at my HoCoData repository on GitHub. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.