In this analysis I use precinct-level results from the Howard County 2014 general election (courtesy of the Maryland State Board of Elections) to look at Allan Kittleman’s margin of victory across the county on election day in the race for Howard County Executive. I’m interested in the general question of whether there was an “enthusiasm gap” in which Kittleman’s election-day results were particularly lopsided, e.g., due to increased turnout of Republican voters or unusually high support for Kittleman from Democrats and unaffiliated voters.
I present the data as a map of Howard County with the precincts colored according to Kittleman’s absolute and relative margins of victory, and with county council boundaries added. The map is based on precinct and council boundaries made available by the Howard County GIS division on the data.howardcountymd.gov site.
For this analysis I use the R statistical package run from the RStudio development environment, along with the dplyr and tidyr packages to do data manipulation and the ggplot2 package to draw the maps.
library("dplyr", warn.conflicts = FALSE)
library("tidyr")
library("ggplot2")
I also need to load packages for manipulating spatial data in R. I first load the sp package, a prerequisite for the other spatial data packages, and then the rgdal package, which I use to read the boundary data downloaded from data.howardcountymd.gov.
library("sp")
library("rgdal")
## rgdal: version: 0.9-1, (SVN revision 518)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.1, released 2014/09/24
## Path to GDAL shared files: /Library/Frameworks/GDAL.framework/Versions/1.11/Resources/gdal
## Loaded PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480]
## Path to PROJ.4 shared files: (autodetected)
The rgdal package also requires installing the GDAL mapping library on the underlying operating system.
How would one best measure relative voter enthusiasm for Allan Kittleman vs. Courtney Watson? One measure would be how each candidate outperformed their “expected” vote, for example, how many votes Kittleman attracted in a given precinct vs. the number of registered Republicans in that precinct, and ditto for Watson vis-a-vis the number of registered Democrats. A related measure would look at Republican turnout (i.e., as a percentage of registered Republicans) in a given precinct vs. Democratic turnout.
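As a rough sketch of the first measure, the following compares each party's votes to its registration in a precinct. The figures and column names here are hypothetical and chosen only for illustration; they are not actual Howard County data.

```r
# Hypothetical figures for illustration only; not actual election data.
example_df <- data.frame(
  Precinct       = c("1-01", "1-02"),
  REP.Votes      = c(400, 150),
  DEM.Votes      = c(300, 450),
  REP.Registered = c(500, 300),
  DEM.Registered = c(600, 900),
  stringsAsFactors = FALSE
)

# Votes received per registered voter of the candidate's own party.
example_df$REP.Ratio <- example_df$REP.Votes / example_df$REP.Registered
example_df$DEM.Ratio <- example_df$DEM.Votes / example_df$DEM.Registered

# A positive gap means the Republican candidate did better relative to
# registration than the Democratic candidate did.
example_df$Ratio.Gap <- example_df$REP.Ratio - example_df$DEM.Ratio
print(example_df)
```

Note that such a ratio mixes together turnout and crossover voting, so on its own it cannot say which of the two produced a gap.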
In this document I confine myself to looking at simple margins of victory in each precinct. The Maryland State Board of Elections has now made available precinct-level data (in Microsoft Excel format) giving party turnout in the 2014 general election. I’ll take a look at that data in a later analysis.
As I mentioned above, this analysis is for election day voting only. Absentee ballots and votes cast at early voting centers are not included in the per-precinct totals as reported by the Maryland State Board of Elections. I’m not aware of any good method to assign absentee and early voting results to individual precincts.
First I download the CSV-format data file from the Maryland State Board of Elections containing Howard County 2014 general election results by precinct, and store a copy of the data in the local file Howard_By_Precinct_2014_General.csv.
download.file("http://elections.state.md.us/elections/2014/election_data/Howard_By_Precinct_2014_General.csv",
"Howard_By_Precinct_2014_General.csv",
method = "curl")
Then I download spatial data from data.howardcountymd.gov specifying the boundaries of Howard County election precincts and county council districts, using the new council boundaries in effect for the 2014 primary and general elections. (The data.howardcountymd.gov site mistakenly lists these boundaries as not taking effect until December 2014.) I choose to use the GeoJSON format, and store the data locally in the files precincts.json and districts.json.
download.file("https://data.howardcountymd.gov/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=general:Voting_Precincts&outputFormat=application/json",
"precincts.json",
method = "curl")
download.file("https://data.howardcountymd.gov/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=general:Council_Districts&outputFormat=application/json",
"districts.json",
method = "curl")
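When re-running the analysis it may be convenient to skip files that have already been downloaded. The helper below is a minimal sketch of that idea; download_if_missing() is a hypothetical name of my own, not part of base R or of the original analysis.

```r
# Hypothetical helper: download a file only if no local copy exists yet.
# Returns TRUE (invisibly) if a download was attempted, FALSE otherwise.
download_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))
  }
  download.file(url, destfile, ...)
  invisible(TRUE)
}

# Demonstration with a file that already exists, so no network access occurs.
existing_file <- tempfile(fileext = ".csv")
writeLines("placeholder", existing_file)
attempted <- download_if_missing("http://example.invalid/data.csv", existing_file)
print(attempted)
```

The trade-off is that a stale local copy is never refreshed, so the local file needs to be deleted by hand if the source data is updated.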
I then read in the CSV file for election results. I remove extraneous spaces from the names of the offices to make it easier to filter the results by office.
hoco_ge14_df <- read.csv("Howard_By_Precinct_2014_General.csv", stringsAsFactors = FALSE)
hoco_ge14_df$Office.Name <- gsub(" *$", "", hoco_ge14_df$Office.Name)
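To see what the regular expression does, here is a small sketch applied to made-up strings (the padded values are assumptions for illustration, not copied from the data file). The pattern " *$" matches any run of spaces at the end of a string, so only trailing spaces are removed and interior spaces are left alone.

```r
# Illustrative strings with and without trailing spaces.
office_names <- c("County Executive   ", "Governor / Lt. Governor", "County Council ")

# Remove trailing spaces only.
trimmed <- gsub(" *$", "", office_names)
print(trimmed)

# After trimming, an exact comparison works as expected.
print(trimmed == "County Executive")
```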
Finally I read in the GeoJSON precinct and council district boundary data.
precincts_spdf <- readOGR("precincts.json", "OGRGeoJSON")
## OGR data source with driver: GeoJSON
## Source: "precincts.json", layer: "OGRGeoJSON"
## with 118 features and 9 fields
## Feature type: wkbPolygon with 2 dimensions
districts_spdf <- readOGR("districts.json", "OGRGeoJSON")
## OGR data source with driver: GeoJSON
## Source: "districts.json", layer: "OGRGeoJSON"
## with 5 features and 5 fields
## Feature type: wkbPolygon with 2 dimensions
Recall that the readOGR() function returns a special type of data structure, a “SpatialPolygonsDataFrame” containing both a regular data frame and a list of polygons containing map data for the precincts.
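The two parts of such an object can be inspected directly. The following sketch builds a toy SpatialPolygonsDataFrame by hand using the sp package (the square, the ID "a", and the NAME column are all made up for illustration) and then looks at its data frame and its list of polygons.

```r
library("sp")

# A unit square, listed clockwise so that sp treats it as an outer ring.
square <- Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)), hole = FALSE)
toy_polys <- SpatialPolygons(list(Polygons(list(square), ID = "a")))

# Attribute table; its row names must match the polygon IDs.
toy_data <- data.frame(NAME = "Toy precinct", row.names = "a",
                       stringsAsFactors = FALSE)
toy_spdf <- SpatialPolygonsDataFrame(toy_polys, toy_data)

print(class(toy_spdf@data))       # the regular data frame
print(length(toy_spdf@polygons))  # one list entry per feature
```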
The processing of the data is identical to that done for the last example I did analyzing Allan Kittleman’s victory margins in the County Executive race. For brevity I consolidate everything into a single data processing pipeline. The operations in the pipeline are as follows:
- Filter on Office.Name to get the results of the County Executive race.
- Combine the Election.District and Election.Precinct variables into a single variable Precinct having the form ‘0-00’.
- Take the values of the Party variable in different rows and convert them into column variables REP and DEM, taking the values from the Election.Night.Votes variable.
- Create a REP.Margin variable containing the absolute Republican margin of victory and a Pct.REP.Margin variable containing the Republican margin of victory in percentage terms (rounded to one digit past the decimal place).
ak_margins_df <- hoco_ge14_df %>%
filter(Office.Name == "County Executive") %>%
select(Election.District, Election.Precinct, Party, Election.Night.Votes) %>%
filter(Party != "BOT") %>%
mutate(Precinct = paste(as.character(Election.District),
"-",
formatC(Election.Precinct, width = 2, flag = "0"),
sep = "")) %>%
select(-Election.District, -Election.Precinct) %>%
spread(Party, Election.Night.Votes) %>%
mutate(REP.Margin = REP - DEM,
Pct.REP.Margin = round(100 * (REP - DEM) / (REP + DEM), 1))
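As a worked example of the last two steps, the following sketch applies the same expressions to a single precinct, using the vote counts that appear for precinct 3-02 in the results printed in this document (936 votes for Kittleman, 281 for Watson). The variable names are my own and are used only for illustration.

```r
# District 3, precinct 2, formatted as in the pipeline.
precinct <- paste(as.character(3), "-",
                  formatC(2, width = 2, flag = "0"),
                  sep = "")

# Election-day votes for precinct 3-02.
rep_votes <- 936
dem_votes <- 281

# Absolute margin and margin as a percentage of the two-party vote.
rep_margin <- rep_votes - dem_votes
pct_rep_margin <- round(100 * (rep_votes - dem_votes) / (rep_votes + dem_votes), 1)

print(precinct)        # "3-02"
print(rep_margin)      # 655
print(pct_rep_margin)  # 53.8
```

Note that the percentage is computed relative to the combined Republican and Democratic vote only, so votes for other candidates and write-ins do not enter into it.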
As a check I print the five precincts in which Allan Kittleman received his highest margins on election day in terms of absolute votes:
ak_margins_df %>% arrange(desc(REP.Margin)) %>% head(5)
## Precinct DEM REP REP.Margin Pct.REP.Margin
## 1 3-02 281 936 655 53.8
## 2 4-04 221 843 622 58.5
## 3 5-19 292 910 618 51.4
## 4 4-03 192 769 577 60.0
## 5 4-05 149 718 569 65.6
and the five precincts in which Courtney Watson received her highest margins on election day in terms of absolute votes:
ak_margins_df %>% arrange(REP.Margin) %>% head(5)
## Precinct DEM REP REP.Margin Pct.REP.Margin
## 1 6-09 511 195 -316 -44.8
## 2 5-03 585 273 -312 -36.4
## 3 6-17 527 216 -311 -41.9
## 4 6-19 503 192 -311 -44.7
## 5 5-04 532 245 -287 -36.9
I also print summary statistics for the entire data set:
summary(ak_margins_df)
## Precinct DEM REP REP.Margin
## Length:118 Min. : 12.0 Min. : 10.0 Min. :-316.00
## Class :character 1st Qu.:209.2 1st Qu.:197.0 1st Qu.:-112.00
## Mode :character Median :303.0 Median :304.5 Median : 18.00
## Mean :313.8 Mean :355.5 Mean : 41.74
## 3rd Qu.:424.5 3rd Qu.:480.0 3rd Qu.: 141.00
## Max. :757.0 Max. :936.0 Max. : 655.00
## Pct.REP.Margin
## Min. :-47.00
## 1st Qu.:-19.98
## Median : 4.15
## Mean : 3.22
## 3rd Qu.: 21.43
## Max. : 65.60
Among other things this gives ranges for Kittleman’s margins in terms of votes (-316 to 655) and percentages (-47% to 65.6%). I use these later when assigning colors to the precincts based on Kittleman’s absolute or percentage margins of victory.
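One way to turn ranges like these into color scale limits is to take the largest absolute value, round it up to a convenient step, and use that as a symmetric bound so that a margin of zero sits in the middle of the scale. The sketch below illustrates the idea; symmetric_limits() is a hypothetical helper, and the step sizes of 100 votes and 10 percentage points are my assumptions.

```r
# Hypothetical helper: symmetric limits around zero, rounded up to a step.
symmetric_limits <- function(x, step) {
  bound <- ceiling(max(abs(x), na.rm = TRUE) / step) * step
  c(-bound, bound)
}

vote_limits <- symmetric_limits(c(-316, 655), step = 100)
pct_limits  <- symmetric_limits(c(-47, 65.6), step = 10)

print(vote_limits)  # -700  700
print(pct_limits)   # -70  70
```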
Now comes the fun part: actually mapping the data. First I convert the precincts and council district map data to normal data frames usable with the ggplot() function.
precincts_spdf@data$id <- rownames(precincts_spdf@data)
precincts_points <- fortify(precincts_spdf, region = "id")
precincts_df <- full_join(precincts_points, precincts_spdf@data, by = "id")
districts_spdf@data$id <- rownames(districts_spdf@data)
districts_points <- fortify(districts_spdf, region = "id")
districts_df <- full_join(districts_points, districts_spdf@data, by = "id")
Then I add the margins data to the precinct map data.
precincts_df <- precincts_df %>%
mutate(Precinct = as.character(PRECINCT20)) %>%
left_join(ak_margins_df, by = "Precinct")
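A left join quietly leaves missing values for any map precinct that has no matching row in the results, and such precincts would be drawn without a fill color. A quick check of the precinct identifiers on both sides can catch this. The sketch below uses made-up identifiers to show the idea.

```r
# Made-up precinct identifiers for illustration.
map_precincts     <- c("1-01", "1-02", "1-03")
results_precincts <- c("1-01", "1-03", "1-04")

# Precincts on the map with no election results (would get NA margins).
unmatched_on_map <- setdiff(map_precincts, results_precincts)

# Precincts with results that do not appear on the map (would be dropped).
unmatched_in_results <- setdiff(results_precincts, map_precincts)

print(unmatched_on_map)
print(unmatched_in_results)
```

In the real data the two vectors would come from the PRECINCT20 variable of the map data and the Precinct variable of the margins data; both results being empty indicates that every precinct matched.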
Since I want to label the county council districts I next compute the centroids of the districts in order to position the labels on the map.
district_centers <- coordinates(districts_spdf)
district_centers_df <- as.data.frame(district_centers)
names(district_centers_df) <- c("long", "lat")
district_centers_df$District <- as.character(districts_spdf@data$DISTRICT20)
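To illustrate what coordinates() returns for polygons, the sketch below applies the same steps to two hand-built rectangles whose centers are easy to verify by eye. The rectangles, their IDs, and the DISTRICT column are invented for the example, and rect_ring() is a hypothetical helper.

```r
library("sp")

# Hypothetical helper: a rectangle as a clockwise (outer) ring.
rect_ring <- function(xmin, ymin, xmax, ymax) {
  Polygon(cbind(c(xmin, xmin, xmax, xmax, xmin),
                c(ymin, ymax, ymax, ymin, ymin)),
          hole = FALSE)
}

toy_polys <- SpatialPolygons(list(
  Polygons(list(rect_ring(0, 0, 2, 1)), ID = "a"),
  Polygons(list(rect_ring(2, 0, 3, 3)), ID = "b")
))
toy_spdf <- SpatialPolygonsDataFrame(
  toy_polys,
  data.frame(DISTRICT = c(1, 2), row.names = c("a", "b"))
)

# One row of label coordinates per polygon, then the label text.
toy_centers_df <- as.data.frame(coordinates(toy_spdf))
names(toy_centers_df) <- c("long", "lat")
toy_centers_df$District <- as.character(toy_spdf@data$DISTRICT)
print(toy_centers_df)
```

For rectangles the label point is simply the middle of the shape. For irregularly shaped districts a centroid can fall near an edge or even outside the polygon, in which case the label positions may need adjusting by hand.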
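Next I plot Allan Kittleman’s victory margins by precinct, starting with the absolute margins in votes. This plot contains three layers:
- The precincts, drawn as polygons filled according to the REP.Margin variable.
- The county council district boundaries, drawn as white outlines with no fill.
- The council district labels, drawn as white text at the district centroids.
I also tweak the plot as follows:
- I use coord_equal() so that one unit of longitude is drawn at the same length as one unit of latitude.
- I use a fill gradient running from blue to red, with limits of -700 and 700 votes.
- I remove the axis titles and axis text.
- I add a plot title.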
g <- ggplot() +
geom_polygon(data = precincts_df,
aes(x = long, y = lat, group = group, fill = REP.Margin)) +
geom_polygon(data = districts_df,
aes(x = long, y = lat, group = group),
fill = NA,
colour = "white") +
geom_text(data = district_centers_df,
aes(x = long, y = lat, label = District),
size = 5,
colour = "white",
show_guide = FALSE) +
coord_equal() +
scale_fill_gradient("Margin (Votes)",
limits = c(-700, 700),
low = "blue",
high = "red",
space = "Lab",
guide = "colourbar") +
theme(axis.title = element_blank(), axis.text = element_blank()) +
ggtitle("Allan Kittleman 2014 Margins by Precinct (Votes)")
print(g)
The second plot shows Allan Kittleman’s victory margins in terms of percentage of votes in each precinct. This graph is produced identically to the previous one, except that I use the Pct.REP.Margin variable to color the precincts (instead of REP.Margin) and I set the maximum red color to be used for a 70% winning margin for Kittleman and the maximum blue color to be used for a 70% winning margin for Courtney Watson.
g <- ggplot() +
geom_polygon(data = precincts_df,
aes(x = long, y = lat, group = group, fill = Pct.REP.Margin)) +
geom_polygon(data = districts_df,
aes(x = long, y = lat, group = group),
fill = NA,
colour = "white") +
geom_text(data = district_centers_df,
aes(x = long, y = lat, label = District),
size = 5,
colour = "white",
show_guide = FALSE) +
coord_equal() +
scale_fill_gradient("Margin (% of Vote)",
limits = c(-70, 70),
low = "blue",
high = "red",
space = "Lab",
guide = "colourbar") +
theme(axis.title = element_blank(), axis.text = element_blank()) +
ggtitle("Allan Kittleman 2014 Margins by Precinct (% of Vote)")
print(g)
The maps above look pretty red, not just in county council district 5, a traditional Republican stronghold, but also in large swaths of council districts 1 and 4. The only precincts where Courtney Watson had truly lopsided margins of victory appear to be in the Columbia portions of districts 2, 3, and 4.
While informative as to where Allan Kittleman had the most success (and the least), these plots do exaggerate the size of Kittleman’s margins of victory. That’s because visually the plot is dominated by those precincts having the largest geographic area, which happen to be the precincts in western Howard County in which Kittleman received his largest margins of victory.
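A rough back-of-the-envelope calculation gives a sense of the scale of the exaggeration. The sketch below multiplies the rounded per-precinct means from the summary output above by the number of precincts to approximate the countywide election-day totals. Because it starts from rounded means, the result is only approximate.

```r
# Values taken from the summary output above (means are rounded).
n_precincts <- 118
mean_dem <- 313.8
mean_rep <- 355.5

# Approximate countywide election-day totals.
total_dem <- n_precincts * mean_dem
total_rep <- n_precincts * mean_rep

# Approximate overall margin in votes and as a percentage of the two-party vote.
approx_margin <- total_rep - total_dem
approx_pct <- round(100 * approx_margin / (total_rep + total_dem), 1)

print(round(approx_margin))
print(approx_pct)
```

By this estimate the overall election-day margin was on the order of 6 percentage points, far smaller than the expanse of red on the maps might suggest.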
One way to address this issue is to draw a cartogram, a special type of map in which areas are distorted so that their size is in proportion to some underlying variable, such as (in this case) the number of registered voters. That’s a project for the future if and when I have time.
I used the following R environment in doing the analysis for this example:
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] maptools_0.8-34 rgdal_0.9-1 sp_1.0-17 ggplot2_1.0.0
## [5] tidyr_0.1 dplyr_0.4.0 RCurl_1.95-4.3 bitops_1.0-6
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.1 colorspace_1.2-4 DBI_0.3.1 digest_0.6.4
## [5] evaluate_0.5.5 foreign_0.8-61 formatR_1.0 grid_3.1.2
## [9] gtable_0.1.2 htmltools_0.2.6 knitr_1.7 labeling_0.3
## [13] lattice_0.20-29 lazyeval_0.1.10 magrittr_1.0.1 MASS_7.3-35
## [17] munsell_0.4.2 parallel_3.1.2 plyr_1.8.1 proto_0.3-10
## [21] Rcpp_0.11.3 reshape2_1.4 rgeos_0.3-8 rmarkdown_0.5.1
## [25] scales_0.2.4 stringr_0.6.2 tools_3.1.2 yaml_2.1.13
The underlying GDAL library for the rgdal package is from the KyngChaos GDAL Complete distribution version 1.11 for Mac OS X.
You can find the source code for this analysis and others at my HoCoData repository on GitHub. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.