Introduction

In this document I analyze how the share of the vote for Governor Larry Hogan and Lt. Governor Boyd Rutherford in the 2018 Maryland gubernatorial election varied based on the density of registered voters in each county. I also look at how Hogan’s share of the vote varied based on the voter density of each precinct in Howard County. This analysis is based on election day voting only; it does not include early voting or absentee or provisional ballots.

For readers unfamiliar with the R statistical software and the additional Tidyverse software I use to manipulate and plot data, I’ve included some additional explanation of various steps. For more information, check out the tutorial “Getting started with the Tidyverse”.

Setup and data preparation

Libraries

I use the tidyverse collection of packages for general data manipulation and plotting and the knitr package to create a formatted table. I also use the tools package to get the md5sum() function.

library("tidyverse")
library("knitr")
library("tools")

Data sources

I use the precinct-level data for the 2018 Maryland gubernatorial election compiled by the Baltimore Sun Data Desk. For more information on how I obtained this data, see the “References” section.

The main variables of interest to me in this data are as follows (descriptions are from the Baltimore Sun’s README.md file where available, and otherwise are based on my interpretation):

  • JURIS: four-letter code for the jurisdiction (county or Baltimore City)
  • county: county in Maryland where the precinct is located
  • NUMBER: precinct number in its most compact form
  • ghost: = 1 if the precinct is a ghost precinct
  • active_qualified_total: total number of active, qualified voters as of the close of registration
  • hogan: number of votes received by Larry Hogan and Boyd K. Rutherford
  • total_votes: total number of votes received
  • perc_hogan: percentage of votes received by Larry Hogan and Boyd K. Rutherford
  • area_mi: area in square miles of the precinct

As the README.md file notes, the vote totals are from the election day precinct results, and do not reflect early voting or absentee or provisional ballots.

The dataset also contains a variable density that is calculated as the number of people voting (total_votes) divided by the area of the precinct in square miles (area_mi). I do not use this variable in my analysis, instead calculating the voter density using the total number of registered voters. I think this is a better proxy for the population density; it also simplifies calculating voter density for an entire county.

I check to make sure that I have the file I expect, and stop the analysis if not.

stopifnot(md5sum("../maryland-2018-governor-precinct-map/output/results_processed.csv") == "c09cb0e3df2dee03d85aa1aa14838aab")
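The following is a minimal sketch of how this kind of checksum guard behaves, using a throwaway temporary file rather than the actual results file. The file contents ("hello") and the missing file name are made up for illustration; only md5sum() and stopifnot() come from the analysis itself.

```r
library("tools")

# Write a tiny file with known contents (no trailing newline).
tmp <- tempfile(fileext = ".txt")
writeBin(charToRaw("hello"), tmp)

# md5sum() returns a character vector named by file path; drop the name.
checksum <- unname(md5sum(tmp))
print(checksum)

# For a file that does not exist md5sum() returns NA instead of raising
# an error. Comparing NA to the expected checksum gives NA, which
# stopifnot() also treats as a failure, so the guard covers both a
# changed file and a missing one.
missing_checksum <- unname(md5sum(file.path(tempdir(), "no-such-file.csv")))
print(is.na(missing_checksum))
```

Because the expected checksum is hard-coded, any update to the upstream data stops the analysis until the new file has been reviewed and the checksum updated.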

Reading in and preparing the data

I begin by reading in the CSV file of processed results:

results <- read_csv("../maryland-2018-governor-precinct-map/output/results_processed.csv")
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   file_id = col_character(),
##   JURIS = col_character(),
##   NAME = col_character(),
##   NUMBER = col_character(),
##   preid = col_character(),
##   NUMBER2 = col_character(),
##   area = col_double(),
##   name_1 = col_character(),
##   county = col_character(),
##   juris_1 = col_character(),
##   area_mi = col_double(),
##   perc_hogan = col_double(),
##   perc_jealous = col_double(),
##   perc_quinn = col_double(),
##   perc_schlackman = col_double(),
##   perc_write_in = col_double(),
##   max_perc = col_double(),
##   winner = col_character(),
##   perc = col_double(),
##   density = col_double()
##   # ... with 5 more columns
## )
## See spec(...) for full column specifications.

I first create a table juris_county mapping jurisdiction codes to display names for each jurisdiction, as follows:

  1. Start with the full set of results.
  2. Select only the JURIS and county variables.
  3. Filter for rows in which the county variable is valid. (Some rows have no data for this field.)
  4. Eliminate duplicate rows.
  5. For all jurisdictions except Baltimore City remove the word “County” from the end of the name.

juris_county <- results %>%
  select(JURIS, county) %>%
  filter(!is.na(county)) %>%
  unique() %>%
  mutate(county = str_replace(county, "  *County$", ""))
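To see what the str_replace() call above does, here is a small sketch applied to a few sample names. The input strings are assumptions about how the county variable is formatted, not values copied from the data file.

```r
library("stringr")

# Hypothetical sample values for the county variable.
county_names <- c("Howard County", "Prince George's County",
                  "Baltimore County", "Baltimore City")

# The pattern matches one or more spaces followed by "County" at the
# very end of the string, so names not ending in "County" are untouched.
short_names <- str_replace(county_names, "  *County$", "")
print(short_names)
```

Baltimore County becomes "Baltimore" while "Baltimore City" keeps its full name, which keeps the two jurisdictions distinguishable in plots and tables.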

Data by County

For the analysis by jurisdiction (county or Baltimore City) I need to summarize the data for all precincts in each jurisdiction and compute figures for voter density and for the percentage of the vote for Hogan and Rutherford.

I do this as follows:

  1. Start with the full set of results.
  2. Group the results by jurisdiction.
  3. Sum the key data for each jurisdiction, ignoring any missing values.
  4. Calculate the number of active and qualified voters per square mile in each jurisdiction, along with the percentage of the vote received by Hogan and Rutherford.
  5. Retain only the variables of interest.
  6. Join the resulting table with the juris_county table in order to add the county variable for display purposes.

county_results <- results %>%
  group_by(JURIS) %>%
  summarize(area_mi = sum(area_mi, na.rm = TRUE),
            active_qualified = sum(active_qualified, na.rm = TRUE),
            total_votes = sum(total_votes, na.rm = TRUE),
            hogan = sum(hogan, na.rm = TRUE)) %>%
  mutate(voters_per_sqmi = active_qualified / area_mi,
         perc_hogan = 100 * hogan / total_votes) %>%
  select(JURIS, voters_per_sqmi, perc_hogan) %>%
  inner_join(juris_county, by = "JURIS")
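The sketch below uses made-up numbers to show why the county density is computed from summed voters and summed area rather than by averaging precinct densities. The jurisdiction codes "AAAA" and "BBBB" and all of the figures are hypothetical.

```r
library("dplyr")

# Hypothetical precincts: AAAA has one dense and one sparse precinct,
# BBBB has a single precinct.
toy <- data.frame(
  JURIS = c("AAAA", "AAAA", "BBBB"),
  area_mi = c(1, 9, 4),
  active_qualified = c(900, 100, 400),
  stringsAsFactors = FALSE
)

toy_density <- toy %>%
  group_by(JURIS) %>%
  summarize(mean_precinct_density = mean(active_qualified / area_mi),
            total_area = sum(area_mi, na.rm = TRUE),
            total_voters = sum(active_qualified, na.rm = TRUE)) %>%
  # Jurisdiction-wide density: total voters over total area.
  mutate(voters_per_sqmi = total_voters / total_area)

print(toy_density)
```

For AAAA the summed approach gives 1,000 voters over 10 square miles, while the unweighted mean of the precinct densities is pulled far upward by the one small, dense precinct.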

Data by Precinct in Howard County

For the analysis by precinct in Howard County I extract the data as follows:

  1. Start with the full set of results.
  2. Filter for the rows corresponding to Howard County precincts.
  3. Filter for non-ghost precincts.
  4. Filter for precincts that have valid data for area, number of active and qualified voters, and percentage of the vote for Hogan and Rutherford.
  5. Calculate the number of active and qualified voters per square mile in each precinct.
  6. Retain only the variables of interest.

hoco_results <- results %>%
  filter(JURIS == "HOWA") %>%
  filter(ghost == 0) %>%
  filter(!(is.na(area_mi))) %>%
  filter(!(is.na(active_qualified))) %>%
  filter(!(is.na(perc_hogan))) %>%
  mutate(voters_per_sqmi = active_qualified / area_mi) %>%
  select(NUMBER, voters_per_sqmi, perc_hogan)
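As an illustration of how these filters behave, the sketch below runs the same steps on a few made-up precincts. The precinct numbers and values are hypothetical, and the separate filter() calls are combined into one call with several conditions, which dplyr treats the same way as chaining them.

```r
library("dplyr")

# Hypothetical precincts: one ghost precinct and one with a missing area.
toy <- data.frame(
  NUMBER = c("1-01", "1-02", "1-03", "1-04"),
  ghost = c(0, 1, 0, 0),
  area_mi = c(2.0, 0.5, NA, 4.0),
  active_qualified = c(1000, 0, 500, 2000),
  perc_hogan = c(55.0, NA, 60.0, 45.0),
  stringsAsFactors = FALSE
)

toy_results <- toy %>%
  # Conditions separated by commas must all hold for a row to be kept.
  filter(ghost == 0,
         !is.na(area_mi),
         !is.na(active_qualified),
         !is.na(perc_hogan)) %>%
  mutate(voters_per_sqmi = active_qualified / area_mi) %>%
  select(NUMBER, voters_per_sqmi, perc_hogan)

print(toy_results)
```

Only precincts 1-01 and 1-04 survive: 1-02 is a ghost precinct and 1-03 has no area, so its density could not be computed.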

Analysis

I want to look at the percentage of the vote for Hogan and Rutherford as a function of the density of active and qualified voters in each county and (for Howard County) each precinct.

Hogan 2018 Election Day Vote Share by County Voter Density

I first plot Hogan’s vote share against the number of registered voters per square mile in each county. I use a logarithmic scale for the voter density because it varies so widely. I also add a trend line and color the county names based on whether Hogan and Rutherford received a majority of the election day vote in that county or not.

(Note that this is not exactly the same as Hogan and Rutherford winning a given county, since with more than two candidates in the race it’s possible to win a county with less than 50% of the vote.)

county_color <- ifelse(county_results$perc_hogan >= 50, "red", "blue")
county_results %>%
  ggplot(aes(x = voters_per_sqmi, y = perc_hogan, label = county)) +
  geom_text(size = 2.5, color = county_color) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_log10() +
  scale_y_continuous(breaks=seq(30,90,10)) +
  annotation_logticks(sides="b") +
  theme_classic() +
  labs(x = "Registered Voters Per Square Mile",
       y = "% Vote for Hogan/Rutherford") +
  ggtitle("Hogan 2018 Election Day Vote Share by County Voter Density")

The correlation coefficient between the two variables is -0.72. The correlation coefficient using the log of the density (i.e., what is being plotted above) is -0.74. Both of these are fairly strong correlations. (The correlation coefficient is negative because Hogan’s election day vote share decreases as the density of registered voters increases.)

The following table lists the counties in order of increasing voter density, along with Hogan’s election day vote share in each. The only counties in which Hogan received less than 50% of the vote are those in which the voter density exceeds 1,000 per square mile.

Hogan 2018 Vote Share and Voter Density

County            Registered Voters Per Square Mile  % Vote for Hogan/Rutherford
Garrett           30                                 88.6
Dorchester        37                                 76.1
Somerset          40                                 71.1
Kent              44                                 77.5
Caroline          62                                 83.0
Worcester         81                                 76.5
Queen Anne’s      96                                 87.2
Talbot            98                                 79.2
Allegany          100                                84.5
Wicomico          159                                69.0
Cecil             186                                78.9
Saint Mary’s      194                                77.6
Washington        207                                79.1
Charles           244                                50.5
Frederick         259                                70.9
Carroll           267                                84.9
Calvert           304                                77.6
Harford           410                                79.0
Howard            848                                59.2
Baltimore         912                                63.4
Anne Arundel      929                                71.5
Prince George’s   1195                               27.2
Montgomery        1304                               46.7
Baltimore City    4775                               32.1
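
The county-level correlation coefficients quoted above can be approximated from this table. The sketch below is not the original computation: it re-enters the rounded table values by hand, so its results may differ slightly from figures computed on the unrounded data.

```r
# Values transcribed from the table, in the same row order
# (Garrett first, Baltimore City last).
voters_per_sqmi <- c(30, 37, 40, 44, 62, 81, 96, 98, 100, 159, 186, 194,
                     207, 244, 259, 267, 304, 410, 848, 912, 929, 1195,
                     1304, 4775)
perc_hogan <- c(88.6, 76.1, 71.1, 77.5, 83.0, 76.5, 87.2, 79.2, 84.5, 69.0,
                78.9, 77.6, 79.1, 50.5, 70.9, 84.9, 77.6, 79.0, 59.2, 63.4,
                71.5, 27.2, 46.7, 32.1)

# Pearson correlation using the raw density.
r_raw <- cor(voters_per_sqmi, perc_hogan)

# Pearson correlation using the log of the density, matching the
# logarithmic x-axis used in the plot.
r_log <- cor(log10(voters_per_sqmi), perc_hogan)

print(round(c(raw = r_raw, log = r_log), 2))
```

The base of the logarithm does not affect the result, since changing the base only rescales the variable by a constant factor.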

Hogan 2018 Election Day Vote Share by Precinct Density in Howard County

I next plot Hogan’s election day vote against the voters per square mile in each precinct in Howard County. Again I use a logarithmic scale for the voter density, add a trend line, and color the precinct numbers based on whether Hogan and Rutherford received a majority of the election day vote in that precinct or not. (Again, this is not necessarily the same as winning the precinct.)

precinct_color <- ifelse(hoco_results$perc_hogan >= 50, "red", "blue")
hoco_results %>%
  ggplot(aes(x = voters_per_sqmi, y = perc_hogan, label = NUMBER)) +
  geom_text(size = 1.5, color = precinct_color) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_log10() +
  scale_y_continuous(breaks=seq(30,90,10)) +
  theme_classic() +
  annotation_logticks(sides="b") +
  labs(x = "Registered Voters Per Square Mile",
       y = "% Vote for Hogan/Rutherford") +
  ggtitle("Hogan 2018 Election Day Vote Share by Precinct Voter Density (Howard County)")

The correlation coefficient between the two variables is -0.59 (-0.56 using the log of the density). This is not quite as strong a correlation as seen for county voter density.

Caveats

In Maryland, votes cast during the early voting period are not allocated to particular precincts. The graphs for Maryland and for Howard County therefore reflect only votes cast on election day.

The data source does not contain a value for the actual population density for each precinct. For this analysis I decided to use the density of registered voters instead, since it was easy to compute both at a precinct and county level and should be at least roughly comparable to population density.

References

The precinct-level data is from the maryland-2018-governor-precinct-map GitHub repository maintained by the Baltimore Sun Data Desk.

I obtained a copy of the data by cloning the repository:

git clone https://github.com/baltimore-sun-data/maryland-2018-governor-precinct-map.git

Environment

I used the following R environment in doing the analysis above:

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.1 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] tools     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2  knitr_1.20      forcats_0.3.0   stringr_1.3.1  
##  [5] dplyr_0.7.6     purrr_0.2.5     readr_1.1.1     tidyr_0.8.1    
##  [9] tibble_1.4.2    ggplot2_3.0.0   tidyverse_1.2.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.18     highr_0.7        cellranger_1.1.0 pillar_1.3.0    
##  [5] compiler_3.5.1   plyr_1.8.4       bindr_0.1.1      digest_0.6.16   
##  [9] lubridate_1.7.4  jsonlite_1.5     evaluate_0.11    nlme_3.1-137    
## [13] gtable_0.2.0     lattice_0.20-35  pkgconfig_2.0.2  rlang_0.2.2     
## [17] cli_1.0.0        rstudioapi_0.7   yaml_2.2.0       haven_1.1.2     
## [21] withr_2.1.2      xml2_1.2.0       httr_1.3.1       hms_0.4.2       
## [25] rprojroot_1.3-2  grid_3.5.1       tidyselect_0.2.4 glue_1.3.0      
## [29] R6_2.2.2         readxl_1.1.0     rmarkdown_1.10   modelr_0.1.2    
## [33] magrittr_1.5     backports_1.1.2  scales_1.0.0     htmltools_0.3.6 
## [37] rvest_0.3.2      assertthat_0.2.0 colorspace_1.3-2 stringi_1.2.4   
## [41] lazyeval_0.2.1   munsell_0.5.0    broom_0.5.0      crayon_1.3.4

Source code

You can find the source code for this analysis and others at my politics public GitLab repository. This document and its source code are available for unrestricted use, distribution, and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.