In this document I analyze how the share of the vote for Governor Larry Hogan and Lt. Governor Boyd Rutherford in the 2018 Maryland gubernatorial election varied based on the density of registered voters in each county. I also look at how Hogan’s share of the vote varied based on the voter density of each precinct in Howard County. This analysis is based on election day voting only; it does not include early voting or absentee or provisional ballots.
For readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information, check out the tutorial “Getting started with the Tidyverse”.
I use the tidyverse package of functions for general data manipulation and plotting and the knitr package to create a formatted table. I also use the tools package to get the md5sum() function.
I use the precinct-level data for the 2018 Maryland gubernatorial election compiled by the Baltimore Sun Data Desk. For more information on how I obtained this data, see the “References” section.
The main variables of interest to me in this data are as follows (descriptions are from the Baltimore Sun’s file where available, and otherwise are based on my interpretation):
JURIS: four-letter code for the jurisdiction (county or Baltimore City)
county: county in Maryland where the precinct is located
NUMBER: precinct number in its most compact form
ghost: = 1 if the precinct is a ghost precinct
active_qualified_total: total number of active, qualified voters as of the close of registration
hogan: number of votes received by Larry Hogan and Boyd K. Rutherford
total_votes: total number of votes received
perc_hogan: percentage of votes received by Larry Hogan and Boyd K. Rutherford
area_mi: area in square miles of the precinct
As the file notes, the vote totals are from the election day precinct results, and do not reflect early voting or absentee or provisional ballots.
The dataset also contains a variable density that is calculated as the number of people voting (total_votes) divided by the area of the precinct in square miles (area_mi). I do not use this variable in my analysis, instead calculating the voter density using the total number of registered voters. I think this is a better proxy for the population density; it also simplifies calculating voter density for an entire county.
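To illustrate why working from registered voters and area makes the county-level figure easy to compute, here is a small sketch in base R. The two precincts and their values are hypothetical, not taken from the dataset. The county density is total voters divided by total area, which is generally not the same as the average of the precinct densities.

```r
# Hypothetical precincts: registered voters and area in square miles
voters  <- c(3000, 500)
area_mi <- c(1.5, 10)

# Density of each precinct (voters per square mile)
precinct_density <- voters / area_mi

# County-level density: total voters over total area
county_density <- sum(voters) / sum(area_mi)

# The unweighted mean of the precinct densities is a different quantity
mean_of_densities <- mean(precinct_density)

print(precinct_density)
print(county_density)
print(mean_of_densities)
```

Summing the numerator and denominator separately before dividing is the same approach the county summary later in this document takes.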
I check to make sure that I have the file I expect, and stop the analysis if not.
stopifnot(md5sum("../maryland-2018-governor-precinct-map/output/results_processed.csv") == "c09cb0e3df2dee03d85aa1aa14838aab")
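For readers who have not used md5sum() before, the following sketch shows how this kind of check behaves. It uses a throwaway temporary file with made-up contents rather than the actual results file, so the hash values here have nothing to do with the one above.

```r
library(tools)

# Write a small temporary file to checksum (hypothetical contents)
tmp <- tempfile(fileext = ".csv")
writeLines(c("JURIS,hogan", "HOWA,100"), tmp)

# md5sum() returns a named character vector: names are paths, values are hashes
expected <- unname(md5sum(tmp))

# An unchanged file passes the check silently
stopifnot(md5sum(tmp) == expected)

# After the file changes, the hash no longer matches the recorded value
writeLines(c("JURIS,hogan", "HOWA,101"), tmp)
changed <- unname(md5sum(tmp)) != expected
print(changed)
```

If the comparison inside stopifnot() is not TRUE, R raises an error and the rest of the analysis does not run.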
I begin by reading in the CSV file of processed results:
results <- read_csv("../maryland-2018-governor-precinct-map/output/results_processed.csv")
## Parsed with column specification:
## cols(
## .default = col_integer(),
## file_id = col_character(),
## JURIS = col_character(),
## NAME = col_character(),
## NUMBER = col_character(),
## preid = col_character(),
## NUMBER2 = col_character(),
## area = col_double(),
## name_1 = col_character(),
## county = col_character(),
## juris_1 = col_character(),
## area_mi = col_double(),
## perc_hogan = col_double(),
## perc_jealous = col_double(),
## perc_quinn = col_double(),
## perc_schlackman = col_double(),
## perc_write_in = col_double(),
## max_perc = col_double(),
## winner = col_character(),
## perc = col_double(),
## density = col_double()
## # ... with 5 more columns
## )
## See spec(...) for full column specifications.
I first create a table juris_county mapping jurisdiction codes to display names for each jurisdiction, as follows: I select the JURIS and county variables, keep only the rows where the county variable is valid (some rows have no data for this field), drop duplicate rows, and remove the trailing “County” from the display names.
juris_county <- results %>%
  select(JURIS, county) %>%
  filter(!is.na(county)) %>%
  unique() %>%
  mutate(county = str_replace(county, " *County$", ""))
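As a quick illustration of what the regular expression " *County$" does, here is a sketch applied to a few sample names. The names are hypothetical examples I chose for illustration; I am assuming the county column holds strings of this general form.

```r
library(stringr)

# Hypothetical display names as they might appear in the county column
county <- c("Howard County", "Baltimore City", "Prince George's County")

# " *County$" matches optional spaces followed by "County" at the end of the string
short <- str_replace(county, " *County$", "")
print(short)
```

Names that do not end in “County” (such as “Baltimore City”) are left unchanged, because the pattern is anchored to the end of the string.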
For the analysis by jurisdiction (county or Baltimore City) I need to summarize the data for all precincts in each jurisdiction and compute figures for voter density and for the percentage of the vote for Hogan and Rutherford.
I do this as follows: I group the precincts by jurisdiction, sum the area, registered voters, total votes, and Hogan votes within each group, compute the voter density and the Hogan percentage from those sums, and join the result with the juris_county table in order to add the county variable for display purposes.
county_results <- results %>%
  group_by(JURIS) %>%
  summarize(area_mi = sum(area_mi, na.rm = TRUE),
            active_qualified = sum(active_qualified, na.rm = TRUE),
            total_votes = sum(total_votes, na.rm = TRUE),
            hogan = sum(hogan, na.rm = TRUE)) %>%
  mutate(voters_per_sqmi = active_qualified / area_mi,
         perc_hogan = 100 * hogan / total_votes) %>%
  select(JURIS, voters_per_sqmi, perc_hogan) %>%
  inner_join(juris_county, by = "JURIS")
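To make the group-and-summarize step concrete, here is a minimal sketch run on a tiny made-up data frame. The jurisdiction codes "AAAA" and "BBBB" and all of the numbers are hypothetical; only the column names follow the code above.

```r
library(dplyr)

# Hypothetical precinct rows for two made-up jurisdictions
toy <- data.frame(
  JURIS            = c("AAAA", "AAAA", "BBBB"),
  area_mi          = c(1, 3, 10),
  active_qualified = c(2000, 3000, 1000),
  total_votes      = c(800, 1200, 500),
  hogan            = c(300, 700, 400)
)

# Sum within each jurisdiction, then derive density and percentage from the sums
toy_summary <- toy %>%
  group_by(JURIS) %>%
  summarize(area_mi = sum(area_mi),
            active_qualified = sum(active_qualified),
            total_votes = sum(total_votes),
            hogan = sum(hogan)) %>%
  mutate(voters_per_sqmi = active_qualified / area_mi,
         perc_hogan = 100 * hogan / total_votes)

print(toy_summary)
```

In this toy data, "AAAA" ends up with 5000 voters over 4 square miles (1250 per square mile) and 1000 of 2000 votes (50 percent), while "BBBB" has 100 voters per square mile and 80 percent.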
For the analysis by precinct in Howard County I extract the data as follows:
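hoco_results <- results %>%
  filter(JURIS == "HOWA") %>%
  filter(ghost == 0) %>%
  # Assumption: the three conditions below drop rows missing the fields used later
  filter(!(is.na(active_qualified))) %>%
  filter(!(is.na(area_mi))) %>%
  filter(!(is.na(perc_hogan))) %>%
  mutate(voters_per_sqmi = active_qualified / area_mi) %>%
  select(NUMBER, voters_per_sqmi, perc_hogan)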
I want to look at the percentage of the vote for Hogan and Rutherford as a function of the density of active and qualified voters in each county and (for Howard County) each precinct.
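One way to look at this relationship is a scatter plot of vote share against voter density. The following is only a sketch under my own assumptions: the data frame toy_results is a hypothetical stand-in for county_results with made-up values that say nothing about the actual results, and the log-scaled x axis is a choice I am assuming would suit densities that span several orders of magnitude.

```r
library(ggplot2)

# Hypothetical stand-in for county_results (all values are made up)
toy_results <- data.frame(
  county          = c("A", "B", "C", "D"),
  voters_per_sqmi = c(40, 250, 900, 4000),
  perc_hogan      = c(78, 66, 55, 38)
)

# Scatter plot of vote share against voter density, with a log-scaled x axis
p <- ggplot(toy_results, aes(x = voters_per_sqmi, y = perc_hogan)) +
  geom_point() +
  geom_text(aes(label = county), vjust = -0.8) +
  scale_x_log10() +
  labs(x = "Registered voters per square mile",
       y = "Hogan-Rutherford share of election day vote (%)")

print(p)
```

The same code could be pointed at county_results or hoco_results, substituting NUMBER for the label in the precinct-level case.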
In Maryland, votes cast during the early voting period are not allocated to particular precincts. The graphs for Maryland and for Howard County therefore reflect only votes cast on election day.
The data source does not contain a value for the actual population density for each precinct. For this analysis I decided to use the density of registered voters instead, since it was easy to compute both at a precinct and county level and should be at least roughly comparable to population density.
The precinct-level data is from the maryland-2018-governor-precinct-map GitHub repository maintained by the Baltimore Sun Data Desk.
I obtained a copy of the data by cloning the repository:
git clone
I used the following R environment in doing the analysis above:
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.1 LTS
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/
## locale:
## attached base packages:
## [1] tools stats graphics grDevices utils datasets methods
## [8] base
## other attached packages:
## [1] bindrcpp_0.2.2 knitr_1.20 forcats_0.3.0 stringr_1.3.1
## [5] dplyr_0.7.6 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1
## [9] tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.18 highr_0.7 cellranger_1.1.0 pillar_1.3.0
## [5] compiler_3.5.1 plyr_1.8.4 bindr_0.1.1 digest_0.6.16
## [9] lubridate_1.7.4 jsonlite_1.5 evaluate_0.11 nlme_3.1-137
## [13] gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.2 rlang_0.2.2
## [17] cli_1.0.0 rstudioapi_0.7 yaml_2.2.0 haven_1.1.2
## [21] withr_2.1.2 xml2_1.2.0 httr_1.3.1 hms_0.4.2
## [25] rprojroot_1.3-2 grid_3.5.1 tidyselect_0.2.4 glue_1.3.0
## [29] R6_2.2.2 readxl_1.1.0 rmarkdown_1.10 modelr_0.1.2
## [33] magrittr_1.5 backports_1.1.2 scales_1.0.0 htmltools_0.3.6
## [37] rvest_0.3.2 assertthat_0.2.0 colorspace_1.3-2 stringi_1.2.4
## [41] lazyeval_0.2.1 munsell_0.5.0 broom_0.5.0 crayon_1.3.4
You can find the source code for this analysis and others at my public politics GitLab repository. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.