In the 2014 gubernatorial general election in Howard County, Maryland, a subject of interest was relative turnout for Republican vs. Democratic voters, with higher Republican turnout seen as the key for Allan Kittleman’s election as County Executive (not to mention other Republican victories from Larry Hogan on down). A related question is whether turnout was depressed in certain Howard County council districts due to lack of opposition to the incumbent council members. (In particular, in districts 3 and 4 Jen Terrasa and Mary-Kay Sigaty respectively had no declared Republican opponents.)
In part 2 of this series I do some basic exploration of the turnout data, using a version of the statewide precinct-level dataset created in part 1.
For this analysis I use the R statistical package run from the RStudio development environment, along with the dplyr and tidyr packages to do data manipulation and the ggplot2 package to create plots.
library("dplyr", warn.conflicts = FALSE)
library("tidyr", warn.conflicts = FALSE)
library("ggplot2")
The Maryland State Board of Elections has published a number of turnout-related reports (in both PDF and Microsoft Excel format) as part of its 2014 general election reports. For this analysis I use my version of the state-wide per-precinct turnout statistics [Excel], as found in my hocodata GitHub repository.
download.file(
  "https://raw.githubusercontent.com/frankhecker/hocodata/master/datasets/gg14-turnout-by-party-by-precinct.csv",
  "gg14-turnout-by-party-by-precinct.csv",
  method = "wget")
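Note that method = "wget" assumes the wget program is installed on the system. As a minimal sketch of a more portable variant, the hypothetical helper fetch_if_missing below (not part of the original analysis) lets download.file choose its default method and skips the download when the file is already present. The demonstration uses a temporary file so that it runs without network access.

```r
# Hypothetical helper (an illustration, not the author's code):
# download `url` to `dest` unless `dest` already exists.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # No `method` argument: download.file() uses its platform default.
    download.file(url, dest)
  }
  invisible(dest)
}

# Demonstration without network access: the destination already exists,
# so no download is attempted and the existing file is left untouched.
existing <- tempfile(fileext = ".csv")
writeLines("Party,Polls", existing)
result <- fetch_if_missing("https://example.invalid/unused.csv", existing)
```

For the real data, the same helper would be called with the GitHub URL and the local file name used above.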
I then load the CSV file into the dataframe precinct_turnout.
precinct_turnout <- read.csv("gg14-turnout-by-party-by-precinct.csv")
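Before going further it can help to confirm that the loaded data has the columns the rest of the analysis relies on. The following is a sketch only: missing_columns is a hypothetical helper, and toy_turnout is a made-up stand-in for precinct_turnout with invented values, used so the example runs on its own.

```r
# Column names used later in this analysis.
expected_columns <- c("LBE", "Party", "Polls", "Early_Voting", "Absentee",
                      "Provisional", "Eligible_Voters")

# Hypothetical helper: return the expected columns that are absent from df.
missing_columns <- function(df, expected = expected_columns) {
  setdiff(expected, names(df))
}

# Toy stand-in for precinct_turnout (invented values, illustration only).
toy_turnout <- data.frame(
  LBE = c("Howard", "Howard"),
  Party = c("Democrat", "Republican"),
  Polls = c(10, 8),
  Early_Voting = c(3, 2),
  Absentee = c(1, 1),
  Provisional = c(0, 1),
  Eligible_Voters = c(30, 20),
  stringsAsFactors = FALSE
)

complete_check <- missing_columns(toy_turnout)           # nothing missing
partial_check  <- missing_columns(toy_turnout["Party"])  # all but Party
```

Calling missing_columns(precinct_turnout) on the real dataframe should likewise return an empty character vector if the CSV file has the expected layout.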
As a first step I look at turnout statistics for the entire state of Maryland by party, creating a new dataframe party_turnout as follows:
I group the precinct-level rows by party and summarise each group, creating the variables Polls, etc., to total the number of people casting votes at the polls, etc.
I then create the variables Actual_Voters and Turnout to hold the total numbers of people voting and those totals as a percentage of eligible voters.
party_turnout <- precinct_turnout %>%
  group_by(Party) %>%
  summarise(Polls = sum(Polls),
            Early_Voting = sum(Early_Voting),
            Absentee = sum(Absentee),
            Provisional = sum(Provisional),
            Eligible_Voters = sum(Eligible_Voters)) %>%
  mutate(Actual_Voters = Polls + Early_Voting + Absentee + Provisional,
         Turnout = round(100 * Actual_Voters / Eligible_Voters, 1))
print.data.frame(party_turnout)
##           Party  Polls Early_Voting Absentee Provisional Eligible_Voters
## 1      Democrat 711724       189188    30866       20962         2036281
## 2         Green   2172          319      116         100            8445
## 3   Libertarian   4372          589      168         200           14477
## 4 Other Parties   9329         2200      966         248           34470
## 5    Republican 445609        87039    16963        7879          949564
## 6  Unaffiliated 177724        28330     5572        5673          658428
##   Actual_Voters Turnout
## 1        952740    46.8
## 2          2707    32.1
## 3          5329    36.8
## 4         12743    37.0
## 5        557490    58.7
## 6        217299    33.0
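As a cross-check on the table above, the Actual_Voters and Turnout columns can be recomputed from the other printed totals. This sketch uses base R only and copies three rows of the printed output by hand; the dataframe name totals is an assumption made for the example.

```r
# Statewide totals copied from the printed party_turnout table.
totals <- data.frame(
  Party = c("Democrat", "Republican", "Unaffiliated"),
  Polls = c(711724, 445609, 177724),
  Early_Voting = c(189188, 87039, 28330),
  Absentee = c(30866, 16963, 5572),
  Provisional = c(20962, 7879, 5673),
  Eligible_Voters = c(2036281, 949564, 658428),
  stringsAsFactors = FALSE
)

# Actual voters are the sum of the four ways of casting a ballot.
totals$Actual_Voters <- with(totals,
                             Polls + Early_Voting + Absentee + Provisional)

# Turnout is actual voters as a percentage of eligible voters.
totals$Turnout <- round(100 * totals$Actual_Voters / totals$Eligible_Voters, 1)

print(totals[, c("Party", "Actual_Voters", "Turnout")])
```

The recomputed values agree with the corresponding rows printed above.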
The total number of voters who are members of the smaller parties (Greens, Libertarians, and other parties) is very small compared to the number of unaffiliated voters, much less the number of Democrats and Republicans. Also, the turnout levels for the smaller parties (32.1%, 36.8%, and 37% respectively) are more similar to the turnout for unaffiliated voters (33%) than to the turnout for Democrats (46.8%) or Republicans (58.7%). I therefore recreate the party_turnout dataframe, this time assigning precinct-level data for the smaller parties into the “Other” category along with unaffiliated voters, and recalculating the turnout statistics:
party_categories <- c("Democrat" = "Democrat",
                      "Green" = "Other",
                      "Libertarian" = "Other",
                      "Other Parties" = "Other",
                      "Republican" = "Republican",
                      "Unaffiliated" = "Other")
party_turnout <- precinct_turnout %>%
  # as.character() makes the lookup go by party name even if Party is a factor.
  mutate(Party = unname(party_categories[as.character(Party)])) %>%
  group_by(Party) %>%
  summarise(Polls = sum(Polls),
            Early_Voting = sum(Early_Voting),
            Absentee = sum(Absentee),
            Provisional = sum(Provisional),
            Eligible_Voters = sum(Eligible_Voters)) %>%
  mutate(Actual_Voters = Polls + Early_Voting + Absentee + Provisional,
         Turnout = round(100 * Actual_Voters / Eligible_Voters, 1))
print.data.frame(party_turnout)
##        Party  Polls Early_Voting Absentee Provisional Eligible_Voters
## 1   Democrat 711724       189188    30866       20962         2036281
## 2      Other 193597        31438     6822        6221          715820
## 3 Republican 445609        87039    16963        7879          949564
##   Actual_Voters Turnout
## 1        952740    46.8
## 2        238078    33.3
## 3        557490    58.7
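One subtlety when recoding with a named vector: R indexes by a factor's integer codes, not by its labels. Since read.csv created factors by default in R versions before 4.0, a lookup by a factor Party column gives the right answer only when the factor levels happen to be in the same order as the names in the lookup vector (as they are for alphabetically sorted party names). The sketch below uses a deliberately reordered factor, an assumption made purely to expose the difference.

```r
party_categories <- c("Democrat" = "Democrat",
                      "Green" = "Other",
                      "Libertarian" = "Other",
                      "Other Parties" = "Other",
                      "Republican" = "Republican",
                      "Unaffiliated" = "Other")

party_names <- c("Unaffiliated", "Democrat", "Green")

# A factor whose levels are deliberately NOT in alphabetical order
# (an assumption made here only to expose the pitfall).
party_factor <- factor(party_names,
                       levels = c("Unaffiliated", "Green", "Democrat"))

# Lookup by character string: matches on the names of party_categories.
by_name <- unname(party_categories[party_names])

# Lookup by factor: silently uses the integer codes 1, 3, 2 as positions.
by_code <- unname(party_categories[party_factor])

# Converting the factor to character first restores lookup by name.
by_string <- unname(party_categories[as.character(party_factor)])
```

Here by_name and by_string agree, while by_code assigns the unaffiliated voters to the Democratic category. Wrapping the index in as.character() makes the recoding independent of how the column was read.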
I then create a simple bar chart showing statewide turnout percentages for Democrats, Republicans, and other voters:
ggplot(party_turnout, aes(x = Party, y = Turnout)) +
  geom_bar(stat = "identity")
There were clearly significant differences in turnout among the three groups of voters, with Republicans turning out at the highest rate. Of course there were many more registered Democrats (2036281) than Republicans (949564) or other voters (715820). However, due to the differences in turnout, the Democratic edge among those actually voting was smaller, as shown in the following graph.
ggplot(party_turnout, aes(x = Party, y = Actual_Voters)) +
  geom_bar(stat = "identity")
Even so, the total number of Democrats actually voting (952740) was still larger than the number of Republicans and other voters combined (557490 + 238078 = 795568).
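To put a number on how much the Democratic edge shrank, the following sketch compares each group's share of registered voters with its share of those actually voting, and computes the number of Democrats per Republican in each case. It uses base R and the statewide totals printed earlier; the column and variable names introduced here are assumptions for the example.

```r
# Statewide totals copied from the three-category party_turnout table.
statewide <- data.frame(
  Party = c("Democrat", "Other", "Republican"),
  Eligible_Voters = c(2036281, 715820, 949564),
  Actual_Voters = c(952740, 238078, 557490),
  stringsAsFactors = FALSE
)

# Each group's percentage share of registered voters and of actual voters.
statewide$Registered_Share <-
  round(100 * statewide$Eligible_Voters / sum(statewide$Eligible_Voters), 1)
statewide$Voting_Share <-
  round(100 * statewide$Actual_Voters / sum(statewide$Actual_Voters), 1)

# Democrats per Republican, among registered voters and among actual voters.
is_dem <- statewide$Party == "Democrat"
is_rep <- statewide$Party == "Republican"
ratio_registered <- statewide$Eligible_Voters[is_dem] /
  statewide$Eligible_Voters[is_rep]
ratio_voting <- statewide$Actual_Voters[is_dem] /
  statewide$Actual_Voters[is_rep]

print(statewide)
```

On these figures there were about 2.14 registered Democrats for every registered Republican, but only about 1.71 Democratic voters for every Republican voter.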
Next I repeat the analysis above using only data for Howard County, filtering on the precinct_turnout dataframe to create a hoco_party_turnout dataframe. As when creating the state-wide party_turnout dataframe, I again assign precinct-level data for the smaller parties into the “Other” category along with unaffiliated voters:
hoco_party_turnout <- precinct_turnout %>%
  filter(LBE == "Howard") %>%
  # as.character() makes the lookup go by party name even if Party is a factor.
  mutate(Party = unname(party_categories[as.character(Party)])) %>%
  group_by(Party) %>%
  summarise(Polls = sum(Polls),
            Early_Voting = sum(Early_Voting),
            Absentee = sum(Absentee),
            Provisional = sum(Provisional),
            Eligible_Voters = sum(Eligible_Voters)) %>%
  mutate(Actual_Voters = Polls + Early_Voting + Absentee + Provisional,
         Turnout = round(100 * Actual_Voters / Eligible_Voters, 1))
print.data.frame(hoco_party_turnout)
##        Party Polls Early_Voting Absentee Provisional Eligible_Voters
## 1   Democrat 37784        12279     1249         853           93408
## 2      Other 15087         2969      399         417           46592
## 3 Republican 27657         6183      833         448           55440
##   Actual_Voters Turnout
## 1         52165    55.8
## 2         18872    40.5
## 3         35121    63.3
I then create a simple bar chart showing Howard County turnout percentages for Democrats, Republicans, and other voters:
ggplot(hoco_party_turnout, aes(x = Party, y = Turnout)) +
  geom_bar(stat = "identity")
Voters in all three groups turned out at a higher rate in Howard County than the corresponding groups statewide. Interestingly, Democratic turnout in Howard County was well above statewide Democratic turnout (55.8% vs. 46.8%, a gap of 9.0 percentage points), while Republican turnout was only somewhat higher (63.3% vs. 58.7%, a gap of 4.6 percentage points).
However, this effect was offset by the smaller Democratic registration advantage in Howard County: unlike the case statewide, the number of Democratic registered voters in Howard County (93408) was less than the combined number of Republican registered voters and other registered voters (55440 + 46592 = 102032).
The difference in the numbers of those actually voting is shown in the following graph.
ggplot(hoco_party_turnout, aes(x = Party, y = Actual_Voters)) +
  geom_bar(stat = "identity")
Unlike the case state-wide, in Howard County the total number of Democrats actually voting (52165) was smaller than the number of Republicans and other voters combined (35121 + 18872 = 53993).
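The statewide and Howard County comparisons can be placed side by side by computing the Democratic margin over all other voters combined, both among registered voters and among those actually voting. This is a base R sketch using the totals printed in the tables above; dem_margin is a hypothetical helper introduced for the example.

```r
# Totals copied from the statewide and Howard County tables above.
turnout <- data.frame(
  Area = c("Maryland", "Maryland", "Maryland", "Howard", "Howard", "Howard"),
  Party = rep(c("Democrat", "Other", "Republican"), times = 2),
  Eligible_Voters = c(2036281, 715820, 949564, 93408, 46592, 55440),
  Actual_Voters = c(952740, 238078, 557490, 52165, 18872, 35121),
  stringsAsFactors = FALSE
)

# Hypothetical helper: Democrats minus all other voters for one column.
dem_margin <- function(df, column) {
  is_dem <- df$Party == "Democrat"
  sum(df[[column]][is_dem]) - sum(df[[column]][!is_dem])
}

# One column per area, one row per measure; a negative value means
# Democrats were outnumbered by Republicans and other voters combined.
margins <- sapply(split(turnout, turnout$Area), function(area) {
  c(Registered = dem_margin(area, "Eligible_Voters"),
    Voting = dem_margin(area, "Actual_Voters"))
})

print(margins)
```

Both margins are positive for Maryland as a whole and negative for Howard County, which is the contrast described above.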
I used the following R environment in doing the analysis for this example:
sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_1.0.1 tidyr_0.3.1 dplyr_0.4.3 RCurl_1.95-4.7
## [5] bitops_1.0-6
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.2 knitr_1.11 magrittr_1.5 MASS_7.3-43
## [5] munsell_0.4.2 colorspace_1.2-6 R6_2.1.1 stringr_1.0.0
## [9] plyr_1.8.3 tools_3.2.2 parallel_3.2.2 grid_3.2.2
## [13] gtable_0.1.2 DBI_0.3.1 htmltools_0.2.6 lazyeval_0.1.10
## [17] yaml_2.1.13 assertthat_0.1 digest_0.6.8 reshape2_1.4.1
## [21] formatR_1.2.1 evaluate_0.8 rmarkdown_0.8.1 labeling_0.3
## [25] stringi_1.0-1 scales_0.3.0 proto_0.3-10
You can find the source code for this analysis and others at my HoCoData repository on GitHub. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.