Introduction

In the 2014 gubernatorial general election in Howard County, Maryland, one subject of interest was the relative turnout of Republican vs. Democratic voters, with higher Republican turnout seen as the key to Allan Kittleman’s election as County Executive (not to mention other Republican victories from Larry Hogan on down). A related question is whether turnout was depressed in certain Howard County council districts due to a lack of opposition to the incumbent council members. (In particular, in districts 3 and 4, Jen Terrasa and Mary-Kay Sigaty respectively had no declared Republican opponents.)

In part 2 of this series I do some basic exploration of the turnout data, using a version of the statewide precinct-level dataset created in part 1.

Load packages

For this analysis I use the R statistical package run from the RStudio development environment, along with the dplyr and tidyr packages to do data manipulation and the ggplot2 package to create plots.

library("dplyr", warn.conflicts = FALSE)
library("tidyr", warn.conflicts = FALSE)
library("ggplot2")

Data sources

The Maryland State Board of Elections has published a number of turnout-related reports (in both PDF and Microsoft Excel format) as part of its 2014 general election reports. For this analysis I use my version of the state-wide per-precinct turnout statistics [Excel], as found in my hocodata GitHub repository.

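# Recent versions of R can fetch https:// URLs with the default method;
# method = "wget" also works if the external wget program is installed
download.file(
    "https://raw.githubusercontent.com/frankhecker/hocodata/master/datasets/gg14-turnout-by-party-by-precinct.csv",
    "gg14-turnout-by-party-by-precinct.csv")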

I then load the CSV file into the dataframe precinct_turnout.

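# stringsAsFactors = FALSE keeps Party as character data (the read.csv
# default only since R 4.0.0), so party names work as lookup keys later
precinct_turnout <- read.csv("gg14-turnout-by-party-by-precinct.csv",
                             stringsAsFactors = FALSE)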

Exploring turnout by party statewide

As a first step I look at turnout statistics for the entire state of Maryland by party, creating a new dataframe party_turnout as follows:

party_turnout <- precinct_turnout %>%
    group_by(Party) %>%
    summarise(Polls = sum(Polls),
              Early_Voting = sum(Early_Voting),
              Absentee = sum(Absentee),
              Provisional = sum(Provisional),
              Eligible_Voters = sum(Eligible_Voters)) %>%
    mutate(Actual_Voters = Polls + Early_Voting + Absentee + Provisional,
           Turnout = round(100 * Actual_Voters / Eligible_Voters, 1))
print.data.frame(party_turnout)
##           Party  Polls Early_Voting Absentee Provisional Eligible_Voters
## 1      Democrat 711724       189188    30866       20962         2036281
## 2         Green   2172          319      116         100            8445
## 3   Libertarian   4372          589      168         200           14477
## 4 Other Parties   9329         2200      966         248           34470
## 5    Republican 445609        87039    16963        7879          949564
## 6  Unaffiliated 177724        28330     5572        5673          658428
##   Actual_Voters Turnout
## 1        952740    46.8
## 2          2707    32.1
## 3          5329    36.8
## 4         12743    37.0
## 5        557490    58.7
## 6        217299    33.0
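As a quick cross-check of the turnout formula (ballots cast by any method, divided by eligible voters), here is a minimal base-R sketch that retypes the statewide party totals printed above into a small data frame and recomputes Actual_Voters and Turnout without dplyr. The data frame name state_totals is hypothetical, used only for illustration.

```r
# Statewide 2014 general election totals by party, copied by hand from the
# party_turnout output above (state_totals is an illustrative name)
state_totals <- data.frame(
  Party = c("Democrat", "Green", "Libertarian", "Other Parties",
            "Republican", "Unaffiliated"),
  Polls = c(711724, 2172, 4372, 9329, 445609, 177724),
  Early_Voting = c(189188, 319, 589, 2200, 87039, 28330),
  Absentee = c(30866, 116, 168, 966, 16963, 5572),
  Provisional = c(20962, 100, 200, 248, 7879, 5673),
  Eligible_Voters = c(2036281, 8445, 14477, 34470, 949564, 658428),
  stringsAsFactors = FALSE
)

# Actual voters = ballots cast at the polls, during early voting,
# by absentee ballot, and by provisional ballot
state_totals$Actual_Voters <- with(state_totals,
  Polls + Early_Voting + Absentee + Provisional)

# Turnout as a percentage of eligible voters, rounded to one decimal place
state_totals$Turnout <- round(100 * state_totals$Actual_Voters /
                                state_totals$Eligible_Voters, 1)

print(state_totals[, c("Party", "Actual_Voters", "Turnout")])
```

The recomputed Actual_Voters and Turnout columns should match the dplyr output above row for row.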

The total number of registered voters belonging to the smaller parties (Greens, Libertarians, and other parties) is very small compared to the number of unaffiliated voters (57392 vs. 658428), much less the number of Democrats and Republicans. Also, turnout for the smaller parties (32.1%, 36.8%, and 37.0% respectively) is closer to turnout for unaffiliated voters (33.0%) than to turnout for Democrats (46.8%) or Republicans (58.7%). I therefore recreate the party_turnout dataframe, this time assigning precinct-level data for the smaller parties to an “Other” category along with unaffiliated voters, and recalculating the turnout statistics:

party_categories <- c("Democrat" = "Democrat",
                      "Green" = "Other",
                      "Libertarian" = "Other",
                      "Other Parties" = "Other",
                      "Republican" = "Republican",
                      "Unaffiliated" = "Other")
party_turnout <- precinct_turnout %>%
    mutate(Party = party_categories[Party]) %>%
    group_by(Party) %>%
    summarise(Polls = sum(Polls),
              Early_Voting = sum(Early_Voting),
              Absentee = sum(Absentee),
              Provisional = sum(Provisional),
              Eligible_Voters = sum(Eligible_Voters)) %>%
    mutate(Actual_Voters = Polls + Early_Voting + Absentee + Provisional,
           Turnout = round(100 * Actual_Voters / Eligible_Voters, 1))
print.data.frame(party_turnout)
##        Party  Polls Early_Voting Absentee Provisional Eligible_Voters
## 1   Democrat 711724       189188    30866       20962         2036281
## 2      Other 193597        31438     6822        6221          715820
## 3 Republican 445609        87039    16963        7879          949564
##   Actual_Voters Turnout
## 1        952740    46.8
## 2        238078    33.3
## 3        557490    58.7
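The recoding step treats the named character vector party_categories as a lookup table: indexing it with a party name returns that party's group. One caveat: if Party is a factor (the read.csv default before R 4.0.0), indexing uses the factor's integer codes rather than its labels, and gives the right answer here only because the factor levels and the entries of party_categories happen to be in the same alphabetical order. The following base-R sketch, using statewide counts retyped from the output above, does the lookup on character values, aggregates to the three groups, and then shows the factor pitfall on a small example. The names by_party, by_group, f, wrong, and right are hypothetical.

```r
# Lookup table mapping each registration category to one of three groups
party_categories <- c("Democrat" = "Democrat",
                      "Green" = "Other",
                      "Libertarian" = "Other",
                      "Other Parties" = "Other",
                      "Republican" = "Republican",
                      "Unaffiliated" = "Other")

# Statewide eligible and actual voters by party, copied from the output above
by_party <- data.frame(
  Party = c("Democrat", "Green", "Libertarian", "Other Parties",
            "Republican", "Unaffiliated"),
  Eligible_Voters = c(2036281, 8445, 14477, 34470, 949564, 658428),
  Actual_Voters = c(952740, 2707, 5329, 12743, 557490, 217299),
  stringsAsFactors = FALSE
)

# Look up by character value; unname() drops the names carried by the lookup
by_party$Group <- unname(party_categories[as.character(by_party$Party)])

# Sum the counts within each group, then recompute turnout
by_group <- aggregate(cbind(Eligible_Voters, Actual_Voters) ~ Group,
                      data = by_party, FUN = sum)
by_group$Turnout <- round(100 * by_group$Actual_Voters /
                            by_group$Eligible_Voters, 1)
print(by_group)

# Pitfall: a factor whose levels are ordered differently from party_categories
# maps to the wrong entry when used directly as an index
f <- factor("Republican", levels = c("Republican", "Democrat"))
wrong <- unname(party_categories[f])               # uses code 1: "Democrat"
right <- unname(party_categories[as.character(f)]) # uses the label: "Republican"
```

Converting with as.character() (or reading the CSV with stringsAsFactors = FALSE) makes the lookup depend only on the party names, not on level order.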

I then create a simple bar chart showing statewide turnout percentages for Democrats, Republicans, and other voters:

ggplot(party_turnout, aes(x = Party, y = Turnout)) +
    geom_bar(stat = "identity")

There were large differences in turnout among the three groups of voters, ranging from 33.3% for other voters to 58.7% for Republicans, who turned out at the highest rate. Of course, there were many more registered Democrats (2036281) than Republicans (949564) or other voters (715820). However, because of the differences in turnout, the Democratic edge among those actually voting was smaller, as shown in the following graph.

ggplot(party_turnout, aes(x = Party, y = Actual_Voters)) +
    geom_bar(stat = "identity")

Even so, the number of Democrats actually voting (952740) was still larger than the number of Republicans and other voters combined (557490 + 238078 = 795568).
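To put a number on how much the turnout gap narrowed the Democratic advantage, the following base-R sketch compares the ratio of Democrats to Republicans among registered voters and among actual voters, using the statewide three-group totals retyped from the table above. The variable names are illustrative only.

```r
# Statewide three-group totals copied from the party_turnout output above
eligible <- c(Democrat = 2036281, Other = 715820, Republican = 949564)
actual   <- c(Democrat = 952740,  Other = 238078, Republican = 557490)

# Democrats per Republican among registered voters and among actual voters
ratio_registered <- eligible[["Democrat"]] / eligible[["Republican"]]
ratio_actual     <- actual[["Democrat"]] / actual[["Republican"]]
print(round(c(registered = ratio_registered, actual = ratio_actual), 2))

# Democrats minus Republicans and other voters combined, among actual voters
dem_lead <- actual[["Democrat"]] - (actual[["Republican"]] + actual[["Other"]])
print(dem_lead)
```

With these figures there are roughly 2.14 registered Democrats per registered Republican but only about 1.71 Democratic voters per Republican voter, while the Democratic lead over Republicans and others combined is 157172 votes.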

Exploring turnout by party in Howard County

Next I repeat the analysis above using only data for Howard County, filtering the precinct_turnout dataframe to create a hoco_party_turnout dataframe. As when creating the statewide party_turnout dataframe, I again assign precinct-level data for the smaller parties to the “Other” category along with unaffiliated voters:

hoco_party_turnout <- precinct_turnout %>%
    filter(LBE == "Howard") %>%
    mutate(Party = party_categories[Party]) %>%
    group_by(Party) %>%
    summarise(Polls = sum(Polls),
              Early_Voting = sum(Early_Voting),
              Absentee = sum(Absentee),
              Provisional = sum(Provisional),
              Eligible_Voters = sum(Eligible_Voters)) %>%
    mutate(Actual_Voters = Polls + Early_Voting + Absentee + Provisional,
           Turnout = round(100 * Actual_Voters / Eligible_Voters, 1))
print.data.frame(hoco_party_turnout)
##        Party Polls Early_Voting Absentee Provisional Eligible_Voters
## 1   Democrat 37784        12279     1249         853           93408
## 2      Other 15087         2969      399         417           46592
## 3 Republican 27657         6183      833         448           55440
##   Actual_Voters Turnout
## 1         52165    55.8
## 2         18872    40.5
## 3         35121    63.3

I then create a simple bar chart showing Howard County turnout percentages for Democrats, Republicans, and other voters:

ggplot(hoco_party_turnout, aes(x = Party, y = Turnout)) +
    geom_bar(stat = "identity")

Voters in all three groups turned out at a higher rate in Howard County than the corresponding groups statewide. Interestingly, Democratic turnout in Howard County was substantially higher than statewide Democratic turnout (55.8% vs. 46.8%, a difference of 9.0 percentage points), while Republican turnout was only somewhat higher (63.3% vs. 58.7%, a difference of 4.6 points).
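The comparison between Howard County and the state can be made explicit with a short base-R sketch that subtracts the statewide turnout percentages from the Howard County ones, both copied from the outputs above, and also expresses each gap relative to the statewide rate. The variable names are hypothetical.

```r
# Turnout percentages from the party_turnout (statewide) and
# hoco_party_turnout (Howard County) outputs above
statewide_turnout <- c(Democrat = 46.8, Other = 33.3, Republican = 58.7)
howard_turnout    <- c(Democrat = 55.8, Other = 40.5, Republican = 63.3)

# Percentage-point difference, Howard County minus statewide
turnout_gap <- howard_turnout - statewide_turnout
print(round(turnout_gap, 1))

# The same gap relative to the statewide rate, in percent
print(round(100 * turnout_gap / statewide_turnout, 1))
```

Looking at both measures guards against reading too much into raw point differences between groups whose baseline turnout rates differ.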

However, this effect was offset by Republicans making up a larger share of registered voters in Howard County than statewide: unlike the state as a whole, Howard County had fewer registered Democrats (93408) than registered Republicans and other voters combined (55440 + 46592 = 102032).

The differences in the number of people actually voting are shown in the following graph.

ggplot(hoco_party_turnout, aes(x = Party, y = Actual_Voters)) +
    geom_bar(stat = "identity")

Unlike the statewide result, in Howard County the total number of Democrats actually voting (52165) was smaller than the number of Republicans and other voters combined (35121 + 18872 = 53993).
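The registered-voter and actual-voter comparisons for both areas can be collected in one small table. This base-R sketch uses the three-group totals retyped from the statewide and Howard County outputs above; the data frame name totals and the column Dem_Margin are illustrative only.

```r
# Three-group totals copied from the statewide and Howard County outputs above
totals <- data.frame(
  Area       = c("Statewide", "Statewide", "Howard", "Howard"),
  Measure    = c("Eligible", "Actual", "Eligible", "Actual"),
  Democrat   = c(2036281, 952740, 93408, 52165),
  Republican = c(949564, 557490, 55440, 35121),
  Other      = c(715820, 238078, 46592, 18872),
  stringsAsFactors = FALSE
)

# Democrats minus Republicans and other voters combined:
# positive means Democrats outnumber everyone else
totals$Dem_Margin <- totals$Democrat - (totals$Republican + totals$Other)
print(totals)
```

The sign of Dem_Margin is positive in both statewide rows and negative in both Howard County rows, matching the comparisons in the text.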

Appendix

I used the following R environment in doing the analysis for this example:

sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_1.0.1  tidyr_0.3.1    dplyr_0.4.3    RCurl_1.95-4.7
## [5] bitops_1.0-6  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.2      knitr_1.11       magrittr_1.5     MASS_7.3-43     
##  [5] munsell_0.4.2    colorspace_1.2-6 R6_2.1.1         stringr_1.0.0   
##  [9] plyr_1.8.3       tools_3.2.2      parallel_3.2.2   grid_3.2.2      
## [13] gtable_0.1.2     DBI_0.3.1        htmltools_0.2.6  lazyeval_0.1.10 
## [17] yaml_2.1.13      assertthat_0.1   digest_0.6.8     reshape2_1.4.1  
## [21] formatR_1.2.1    evaluate_0.8     rmarkdown_0.8.1  labeling_0.3    
## [25] stringi_1.0-1    scales_0.3.0     proto_0.3-10

You can find the source code for this analysis and others at my HoCoData repository on GitHub. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.