There is a social media subculture of people who make maps of election results. Prominent people include @JMilesColeman, @gelliottmorris, @SenhorRaposa, @donnermaps, @ElectionMapsCo, and others.

You, too, can join this group! It is easy to create your own (election) maps using R.

The following is inspired, in part, by an GIS in R in R tutorial by Nick Eubank. You should check it out and other online sources if you want to know more.

Shapefiles

To start, you need to understand a common file format for maps, called a shapefile. ESRI, a Geographic Information Systems (GIS) software company, created the shapefile format. ESRI was one of the first companies in the GIS space, and their shapefile format gained widespread adoption.

There are other types of GIS files. Among the more common alternatives to shapefiles are JSON and KML files. These formats are common enough that many software programs can read, write, and manipulate them. JSON - JavaScript Object Notation - was designed to be used with JAVA. Google created .KML or .KMZ - Google Keyhole Markup Language - files.

We focus on shapefiles because of their popularity. Shapefiles really are several files that serve different purposes. A typical shapefile requires there three files, and may have others:

There are other optional files that are associated with a shapefile. I don’t endorse Wikipedia as a source for academic papers, but this is a good run down.

GIS in R packages

You’ll need some R packages to work with shapefiles and GIS data, generally. Use ’install.packages()to installsp,raster, andrgdal` for the following examples.

library(sp)
library(raster)
library(rgdal)
## rgdal: version: 1.4-4, (SVN revision 833)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
##  Path to GDAL shared files: C:/Users/drmic/Documents/R/win-library/3.6/rgdal/gdal
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: C:/Users/drmic/Documents/R/win-library/3.6/rgdal/proj
##  Linking to sp version: 1.3-1

Toy examples to understand how GIS works

Let’s start with some toy examples to understand how GIS works in R and other software packages.

We begin will plotting some points, which are simply x and y coordinates. (There are, of course, complexities when working with geographic spatial data on a 3-dimenstional globe that is projected onto a plane.)

toy.coordinates <- rbind(c(1.5, 2), c(2.5, 2), c(0.5, 0.5), c(1, 0.25), c(1.5, 0), c(2, 0), c(2.5, 0), c(3, 0.25), c(3.5, 0.5))

my.first.points <- SpatialPoints(toy.coordinates)  # ..converted into a spatial object
plot(my.first.points)

Here I’ve created an R object toy.coordinates that contains x and y coordinates. the function SpatialPoints() converts these into an R spatial object my.first.points that is then passed to plot(). Smile! You’ve plotted your first spatial object in R!

Electronic maps can be composed of points, but they often have lines that define borders. These borders are often called polygons.

This code snippet demonstrates how a polygon is a set of lines connected at points. The points are sometimes called “nodes” and the lines are called “edges”. The Polygon() function creates an object that is a set of points, but knows to connect these points with lines.

# create polyon objects from coordinates.  Each object is a single geometric
# polygon defined by a bounding line.
house1.building <- Polygon(rbind(c(1, 1), c(2, 1), c(2, 0), c(1, 0)))

house1.roof <- Polygon(rbind(c(1, 1), c(1.5, 2), c(2, 1)))
## Warning in Polygon(rbind(c(1, 1), c(1.5, 2), c(2, 1))): less than 4
## coordinates in polygon
house2.building <- Polygon(rbind(c(3, 1), c(4, 1), c(4, 0), c(3, 0)))

house2.roof <- Polygon(rbind(c(3, 1), c(3.5, 2), c(4, 1)))
## Warning in Polygon(rbind(c(3, 1), c(3.5, 2), c(4, 1))): less than 4
## coordinates in polygon
house2.door <- Polygon(rbind(c(3.25, 0.75), c(3.75, 0.75), c(3.75, 0), c(3.25, 
    0)), hole = TRUE)

# create lists of polygon objects from polygon objects and unique ID A
# `Polygons` is like a single observation.
h1 <- Polygons(list(house1.building, house1.roof), "house1")
h2 <- Polygons(list(house2.building, house2.roof, house2.door), "house2")

# create spatial polygons object from lists A SpatialPolygons is like a
# shapefile or layer.
houses <- SpatialPolygons(list(h1, h2))
plot(houses)

Often, we want to associate values with the polygons and associate these values with colors. These colors are sometimes referred to as a “choropleth”. Here, I creat a raster() object that is a grid with associated values, and then plot out the values, with a default color ramp moving from red to green.

Rasters

Rasters are more complex than points and polygons, and are not used much in the elections context. Still, you might run into them, so it is useful to understand what they are.

Rasters are generally grid data with a variable grid area. When assigned attribute data, they can be powerful visulaization tools.

basic_raster <- raster(ncol = 5, nrow = 10, xmn = 0, xmx = 5, ymn = 0, ymx = 10)
basic_raster
## class      : RasterLayer 
## dimensions : 10, 5, 50  (nrow, ncol, ncell)
## resolution : 1, 1  (x, y)
## extent     : 0, 5, 0, 10  (xmin, xmax, ymin, ymax)
## crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
hasValues(basic_raster)
## [1] FALSE
values(basic_raster) <- 1:50  # Note 50 is the total number of cells in the grid. 

plot(basic_raster)

An important lesson for the observant is that even through the raster object is two-dimensional, the values are one-dimensional. This is generally true for geographic objects. Recall shapefiles are define two-dimensional .shp objects that are linked to a .dbf file, which has one row for each shape object.

Let’s look at some actual raster data, elevations at latitude and longitude points in San Franscisco.

To start, we need to get the data:

https://github.com/nickeubank/gis_in_r/blob/master/RGIS1_Data/SanFranciscoNorth.dem

setwd("D:/DropBox/Dropbox/Election Science/Class GIS")

raster_from_file <- raster("sanfrancisconorth.dem")
plot(raster_from_file)

These data are arranged onto a latitude and longitude grid and have elevation data for each square in the grid. Since there is so much of these data, they look nearly continuous.

Where to get GIS data for Election Science

There are a number of resources to get shapefiles to plot election data.

Census Bureau

The Census Bureau’s TIGER program maps the geography of the entire United States. If you need maps of states, counties, and cities (what the Census Bureau calls places), this is a good starting point. The Census Bureau also maps other features, like roads and water. Census population data are released in geographies known as census tracts, block groups, and blocks, which also have maps. Mapping population data into census blocks is a powerful visualization tool.

For election analyses, the Census Bureau also releases shapefiles for recent congressional and state legislative districts.

The Census Bureau uses codes to identify geographic locations. One important code is called the FIPS code - for Federal Information Processing System. There are state and county FIPS codes.

Precinct Data

The Census Bureau also releases data for precincts, wards, and election districts, what the Bureau generically calls “Voting Tabulation Districts” or VTDs. These data are not necessarily the current boundaries, nor are they entirely correct. We have released 2016 VTD boundary shapefiles with statewide election results on the Harvard Dataverse. Harvard and Stanford produced data for the 2008 election, using the 2010 census geography (these data have lots of errors). Some states are fairly good about releasing their data.

Historical Data

The National Historical Geographic Information System has a treasure trove of historical GIS and statistical data.

Florida Example

Let’s look at an example map of Florida counties from the Census Bureau. To get Florida’s shapefile data, we need to know that Florida’s FIPS code is 12.

Florida_County <- readOGR("D:/DropBox/Dropbox/Election Science/Class GIS","tl_2010_12_county10")
## OGR data source with driver: ESRI Shapefile 
## Source: "D:\DropBox\Dropbox\Election Science\Class GIS", layer: "tl_2010_12_county10"
## with 67 features
## It has 17 fields
## Integer64 fields read as strings:  ALAND10 AWATER10
plot(Florida_County)

The FIPS code is in the field COUNTYFP10, and listing it out explains why it is important to read keys as string formats.

Florida_County$COUNTYFP10
##  [1] 017 009 037 011 047 063 093 087 031 081 071 029 069 065 033 045 103
## [18] 067 023 129 013 086 057 117 085 083 003 073 109 101 121 007 115 019
## [35] 123 055 051 039 015 043 131 125 027 097 059 041 053 001 079 035 005
## [52] 061 075 105 099 095 107 111 021 119 091 049 089 127 077 113 133
## 67 Levels: 001 003 005 007 009 011 013 015 017 019 021 023 027 029 ... 133

We can add some options, like coloring borders and adding titles. Note how this

plot(Florida_County, border = "red")
title(main="Florida", sub ="Mail Ballots Requested")

Notice that unlike ggplot(), there is no + to continue the command (there is a way, though, to do these plots inggplot()).

Now, let’s plot some data. Florida posts election data, with reports on early voting on a daily basis during an election. Let’s visualize some of these early voter data.

To start, these data are not in a good format for R to read, as they have numbers with commas in them (why anyone would do this in a data file is beyond me). This code snippet can handle this issue.

setClass("num.with.commas")
setAs("character", "num.with.commas", 
      function(from) as.numeric(gsub(",", "", from) ) )

This code scrapes a data file for the 2018 general election. I won’t dwell on it since we’ve discussed data scraping in the past.

working.dir <- "D:/DropBox/Dropbox/Election Science/Class GIS"
  
url <- "http://fvrselectionfiles.elections.myflorida.com/countyballotreportfiles/Stats_10481_AbsProvided.txt"
destfile <- paste(working.dir, "Stats_10481_AbsProvided.txt", sep="")
download.file(url, destfile, mode = "wb")

readcolumns = c(rep("character", 5), rep("num.with.commas",5),"character")

mail_provided.file <- paste(working.dir,"Stats_10481_AbsProvided.txt", sep="")
mail_provided <- read.csv(mail_provided.file, header = TRUE, sep = "\t", quote = "", dec = ".",  colClasses = readcolumns, fill = TRUE, stringsAsFactors = FALSE)

Always a good idea to see the data, so take a look at it!

Do you see a problem?

Well, there is one. We don’t have a key to merge these data together. Here is the county identifier, which is not a FIPS code. This is a fairly common problem when working with data from different sources.

mail_provided$CountyName
##  [1] "State Totals" "Alachua"      "Baker"        "Bay"         
##  [5] "Bradford"     "Brevard"      "Broward"      "Calhoun"     
##  [9] "Charlotte"    "Citrus"       "Clay"         "Collier"     
## [13] "Columbia"     "DeSoto"       "Dixie"        "Duval"       
## [17] "Escambia"     "Flagler"      "Franklin"     "Gadsden"     
## [21] "Gilchrist"    "Glades"       "Gulf"         "Hamilton"    
## [25] "Hardee"       "Hendry"       "Hernando"     "Highlands"   
## [29] "Hillsborough" "Holmes"       "Indian River" "Jackson"     
## [33] "Jefferson"    "Lafayette"    "Lake"         "Lee"         
## [37] "Leon"         "Levy"         "Liberty"      "Madison"     
## [41] "Manatee"      "Marion"       "Martin"       "Miami-Dade"  
## [45] "Monroe"       "Nassau"       "Okaloosa"     "Okeechobee"  
## [49] "Orange"       "Osceola"      "Palm Beach"   "Pasco"       
## [53] "Pinellas"     "Polk"         "Putnam"       "Santa Rosa"  
## [57] "Sarasota"     "Seminole"     "St. Johns"    "St. Lucie"   
## [61] "Sumter"       "Suwannee"     "Taylor"       "Union"       
## [65] "Volusia"      "Wakulla"      "Walton"       "Washington"

To address this problem, we need a lookup table that has a FIPS code for each Florida county. I have one for just this purpose (I’ll need to insert these data into this Markdown document). We’ll talk in a later class how to do this merge.

countycodesfile <- paste(working.dir,"/county_codes.csv", sep="")
countycodes <- read.csv(countycodesfile, header = TRUE, sep = ",", colClasses = c("character", "numeric", "character", "character"))

mail_provided_index <- merge(mail_provided, countycodes, by.x ="CountyName", by.y = "CountyName")

Florida_County <- merge(Florida_County, mail_provided_index, by.x ="COUNTYFP10", by.y = "COUNTYFP10")

We can now do a plot by county of the mail ballots requested in the 2018 general election.

spplot(Florida_County, "GrandTotal", main = list(label = "Florida Mail Ballots Requested"))

Not surprisingly, the counties that have larger populations had more mail ballots.

Color Brewer

Maybe you don’t like the colors. There are plenty of options here using RColorBrewer.

library(RColorBrewer)
display.brewer.all()

You can also find more information online.

Let’s use a different color ramp, going from orange to red.

Here, I set an R object using brewer.pal() which creates 7 equally-spaced bins with increasing shades from orange to red. I then pass this to ssplot(). The number of cuts - or but points - is one less than the number of bins, or 6 cuts.

my.palette <- brewer.pal(n = 7, name = "OrRd")

spplot(Florida_County, "GrandTotal" , col.regions = my.palette, cuts = 6, col = "transparent")

What if I don’t like the fact that the breaks are evenly spaced, and want to group these data into bins with an equal number of cases?

library(classInt)

breaks.qt <- classIntervals(Florida_County$GrandTotal, n = 6, style = "quantile", intervalClosure = "right")

spplot(Florida_County, "GrandTotal", col = "transparent", col.regions = my.palette, at = breaks.qt$brks)

Overlaying plots

Let’s overlay Florida’s congressional districts

Florida_CD <- readOGR("D:/DropBox/Dropbox/Election Science/Class GIS", "tl_rd13_12_cd113")
## OGR data source with driver: ESRI Shapefile 
## Source: "D:\DropBox\Dropbox\Election Science\Class GIS", layer: "tl_rd13_12_cd113"
## with 27 features
## It has 12 fields
## Integer64 fields read as strings:  ALAND AWATER
plot(Florida_CD)

Is this really the current districts? (No.)

The Census Bureau only has a national map (when I last checked). Let’s use a map from the Florida state Senate website: http://www.flsenate.gov/Session/Redistricting/Plan/SC14-1905

Florida_CD <- readOGR("D:/DropBox/Dropbox/Election Science/Class GIS", "sc14-1905")
## OGR data source with driver: ESRI Shapefile 
## Source: "D:\DropBox\Dropbox\Election Science\Class GIS", layer: "sc14-1905"
## with 27 features
## It has 104 fields
## Integer64 fields read as strings:  IDEALPOP10 DEV10 PARTS RINGS
plot(Florida_CD)

plot(Florida_County)
plot(Florida_CD, border="blue", add = T)

# Dot plots

Another way to display data on a map to convey the density of data is to create a dot plot. These look like geocoded points, but in reality they are dots places on a map relative to the density of the data. A little random x and y noise are added so that the dots do not stack on top of one another.

# install.packages("maptools")

library(maptools)
## Checking rgeos availability: TRUE
set.seed(47)

ballots.per.dot = 1000

dots.d <- dotsInPolys(Florida_County, as.integer(Florida_County$TotalDem/ballots.per.dot))
dots.d$party <- "Democrat"

dots.r <- dotsInPolys(Florida_County, as.integer(Florida_County$TotalRep/ballots.per.dot))
dots.r$party <- "Republican"

dots.npa <- dotsInPolys(Florida_County, as.integer(Florida_County$TotalNpa/ballots.per.dot))
dots.npa$party <- "None"

# identical(names(dots.d[[1]]), names(dots.npa[[2]]) )
# names(dots.r[[1]]) <- names(dots.d[[2]])
# names(dots.npa[[1]]) <- names(dots.d[[2]])

# Gather all the dots into a single SpatialPoints
dots.all <- rbind(dots.d, dots.r, dots.npa)
proj4string(dots.all) <- CRS(proj4string(Florida_County))

# Since party is a string, order is alphabetical. You can change if you
# want by making these categoricals!
my.palette <- c("blue", "yellow", "red")
point.size <- 0.5

# Make sure stored as a factor
dots.all$party <- as.factor(dots.all$party)

# Make sp.layout list for the actually boundaries
Florida.layer <- list("sp.polygons", Florida_County)

spplot(dots.all, "party", sp.layout = Florida.layer, col.regions = my.palette, 
       cex = point.size, main = "Mail Ballots Requested by Party")

library(ggmap)
## Loading required package: ggplot2
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(rgeos)
## rgeos version: 0.5-1, (SVN revision 614)
##  GEOS runtime version: 3.6.1-CAPI-1.10.1 
##  Linking to sp version: 1.3-1 
##  Polygon checking: TRUE
library(maptools)

b <- bbox(Florida_County)
(lnd.b1 <- ggmap(get_map(location = b)))
## Source : http://tile.stamen.com/terrain/7/32/52.png
## Source : http://tile.stamen.com/terrain/7/33/52.png
## Source : http://tile.stamen.com/terrain/7/34/52.png
## Source : http://tile.stamen.com/terrain/7/35/52.png
## Source : http://tile.stamen.com/terrain/7/32/53.png
## Source : http://tile.stamen.com/terrain/7/33/53.png
## Source : http://tile.stamen.com/terrain/7/34/53.png
## Source : http://tile.stamen.com/terrain/7/35/53.png
## Source : http://tile.stamen.com/terrain/7/32/54.png
## Source : http://tile.stamen.com/terrain/7/33/54.png
## Source : http://tile.stamen.com/terrain/7/34/54.png
## Source : http://tile.stamen.com/terrain/7/35/54.png
## Source : http://tile.stamen.com/terrain/7/32/55.png
## Source : http://tile.stamen.com/terrain/7/33/55.png
## Source : http://tile.stamen.com/terrain/7/34/55.png
## Source : http://tile.stamen.com/terrain/7/35/55.png

map.file <- "D:/DropBox/Dropbox/Election Science/Class GIS/tl_2010_12_county10.shp"
county.map <- readShapePoly(map.file)
## Warning: readShapePoly is deprecated; use rgdal::readOGR or sf::st_read
county.map.f <- fortify(county.map, region = "COUNTYFP10")

county.map.f <- merge(county.map.f, mail_provided_index, by.x ="id", by.y = "COUNTYFP10")

lnd.b1 + geom_polygon(data = county.map.f, aes(x = long, y = lat, group = group, fill = GrandTotal), 
    alpha = 0.5) + scale_fill_continuous(low = "green", high = "red")