This analysis uses precinct-level results from the Howard County 2014 general election (courtesy of the Maryland State Board of Elections) to look at Allan Kittleman’s margins of victory across the county on election day. I’m interested in the general question of whether there seemed to be an “enthusiasm gap” in which Kittleman’s election-day results were particularly lopsided, e.g., due to increased turnout of Republican voters or unusually high support for Kittleman from Democrats and unaffiliated voters.
In this document I present the data in the form of histograms.
For this analysis I use the R statistical package run from the RStudio development environment, along with the dplyr and tidyr packages to do data manipulation and the ggplot2 package to draw the histograms.
library("dplyr", warn.conflicts = FALSE)
library("tidyr")
library("ggplot2")
How would one best measure relative voter enthusiasm for Allan Kittleman vs. Courtney Watson? One measure would be how each candidate outperformed their “expected” vote on election day, for example, how many votes Kittleman attracted in a given precinct vs. the number of registered Republicans in that precinct, and ditto for Watson vis-a-vis the number of registered Democrats. A related measure would look at Republican turnout (i.e., as a percentage of registered Republicans) in a given precinct vs. Democratic turnout.
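As a rough sketch of the first measure, the snippet below compares each candidate's votes with the number of registered voters of that candidate's party. All numbers here are made up for illustration (they are not Howard County data), and the column names `Reg.DEM`, `Reg.REP`, and `Ratio.Gap` are my own assumptions.

```r
# Hypothetical data: two precincts with invented vote and registration counts.
votes <- data.frame(
  Precinct = c("A", "B"),
  DEM      = c(400, 300),    # votes for the Democratic candidate (made up)
  REP      = c(500, 300),    # votes for the Republican candidate (made up)
  Reg.DEM  = c(1000, 600),   # registered Democrats (made up)
  Reg.REP  = c(1000, 500),   # registered Republicans (made up)
  stringsAsFactors = FALSE
)

# Votes received per registered voter of the candidate's own party.
votes$Dem.Ratio <- votes$DEM / votes$Reg.DEM
votes$Rep.Ratio <- votes$REP / votes$Reg.REP

# Positive values mean the Republican candidate did better relative to
# party registration than the Democratic candidate did.
votes$Ratio.Gap <- votes$Rep.Ratio - votes$Dem.Ratio
print(votes)
```

A ratio like this mixes turnout and crossover voting, so it would only be a starting point for the questions raised above.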
In this document I confine myself to looking at simple margins of victory in each precinct. The Maryland State Board of Elections has made available precinct-level data (in Microsoft Excel format) giving party turnout in the 2014 general election. I’ll take a look at that data later.
As I mentioned above, this analysis is for election day voting only. Absentee ballots and votes cast at early voting centers are not included in the per-precinct totals.
First I download the CSV-format data containing Howard County 2014 general election results, and store a copy of the data in the local file Howard_By_Precinct_2014_General.csv.
download.file("http://elections.state.md.us/elections/2014/election_data/Howard_By_Precinct_2014_General.csv",
"Howard_By_Precinct_2014_General.csv",
method = "curl")
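If you rerun the analysis often, it may be worth skipping the download when a local copy already exists. The helper below is a minimal sketch of that idea; `download_if_missing()` is a hypothetical function of my own, not part of base R or of the original analysis.

```r
# Hypothetical helper: download a file only when no local copy exists.
# Returns TRUE if a download was attempted, FALSE if the file was already there.
download_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    return(FALSE)
  }
  download.file(url, destfile, ...)
  TRUE
}

# Example call (commented out so the sketch does not touch the network):
# download_if_missing(
#   "http://elections.state.md.us/elections/2014/election_data/Howard_By_Precinct_2014_General.csv",
#   "Howard_By_Precinct_2014_General.csv",
#   method = "curl")
```

Extra arguments such as `method = "curl"` are passed through to `download.file()` unchanged.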
I then read in the CSV file for election results, and remove extraneous spaces from the names of the offices.
hoco_g14_df <- read.csv("Howard_By_Precinct_2014_General.csv", stringsAsFactors = FALSE)
hoco_g14_df$Office.Name <- gsub(" *$", "", hoco_g14_df$Office.Name)
str(hoco_g14_df)
## 'data.frame': 6887 obs. of 11 variables:
## $ County : int 14 14 14 14 14 14 14 14 14 14 ...
## $ Election.District : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Election.Precinct : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Candidate.Name : chr "Anthony G. Brown" "Larry Hogan" "Shawn Quinn" "Charles U. Smith" ...
## $ Party : chr "DEM" "REP" "LIB" "DEM" ...
## $ Office.Name : chr "Governor / Lt. Governor" "Governor / Lt. Governor" "Governor / Lt. Governor" "Governor / Lt. Governor" ...
## $ Office.District : chr "" "" "" "" ...
## $ Winner : chr "" "Y" "" "" ...
## $ Write.In. : chr "" "" "" "Y" ...
## $ Election.Night.Votes : int 333 617 22 0 0 1 512 431 0 2 ...
## $ Election.Night.Votes.Against: int NA NA NA NA NA NA NA NA NA NA ...
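To see what the `gsub()` call does, here is a small sketch on a made-up character vector. The padded strings are my own examples, not values copied from the data file.

```r
# Made-up office names, some padded with trailing spaces.
office <- c("County Executive   ", "Governor / Lt. Governor", "County Council ")

# The pattern " *$" matches any run of spaces at the end of the string,
# so replacing it with "" strips trailing spaces and leaves the rest alone.
cleaned <- gsub(" *$", "", office)

print(nchar(office))   # lengths before cleaning
print(nchar(cleaned))  # lengths after cleaning
print(cleaned)
```

Without this step a comparison such as `Office.Name == "County Executive"` would fail to match any padded value.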
I start by looking at the results for the County Executive race. I filter the rows based on Office.Name and select only the columns of interest.
temp1_df <- hoco_g14_df %>%
filter(Office.Name == "County Executive") %>%
select(Election.District, Election.Precinct, Party, Election.Night.Votes)
head(temp1_df)
## Election.District Election.Precinct Party Election.Night.Votes
## 1 1 1 DEM 445
## 2 1 1 REP 512
## 3 1 1 BOT 0
## 4 1 2 DEM 98
## 5 1 2 REP 153
## 6 1 2 BOT 0
The rows where the Party variable has the value ‘BOT’ correspond to votes for write-in candidates. I do a quick summarization to show that the number of write-in votes is pretty small:
temp1_df %>% group_by(Party) %>% summarize(Votes = sum(Election.Night.Votes))
## Source: local data frame [3 x 2]
##
## Party Votes
## 1 BOT 81
## 2 DEM 37026
## 3 REP 41951
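To put "pretty small" in numbers, the sketch below computes the write-in share of the election-day vote from the three totals printed above. The variable names are my own.

```r
# Election-day totals for County Executive, taken from the summary above.
totals <- c(BOT = 81, DEM = 37026, REP = 41951)

# Write-in votes as a percentage of all election-day votes cast in the race.
total_votes   <- sum(totals)
write_in_share <- 100 * totals[["BOT"]] / total_votes

print(total_votes)               # 79058
print(round(write_in_share, 2))  # 0.1, i.e. about a tenth of one percent
```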
To simplify the analysis I filter out the write-in votes. I also combine the Election.District and Election.Precinct variables into a single variable Precinct having the form ‘0-00’, and then discard the original variables:
temp2_df <- temp1_df %>%
filter(Party != "BOT") %>%
mutate(Precinct = paste(as.character(Election.District),
"-",
formatC(Election.Precinct, width = 2, flag = "0"),
sep = "")) %>%
select(-Election.District, -Election.Precinct)
head(temp2_df)
## Party Election.Night.Votes Precinct
## 1 DEM 445 1-01
## 2 REP 512 1-01
## 3 DEM 98 1-02
## 4 REP 153 1-02
## 5 DEM 377 1-03
## 6 REP 362 1-03
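An alternative way to build the same ‘0-00’ labels is `sprintf()`, which does the zero-padding and the pasting in one call. This is only a sketch on three district/precinct pairs taken from the results in this document; it is not what the analysis itself uses.

```r
# Three district/precinct pairs that appear in the results (as integers,
# matching the column types read from the CSV file).
district <- c(1L, 1L, 6L)
precinct <- c(1L, 2L, 21L)

# "%d" prints the district as-is; "%02d" pads the precinct to two digits.
labels_sprintf <- sprintf("%d-%02d", district, precinct)

# The paste()/formatC() construction gives the same result.
labels_paste <- paste(as.character(district),
                      "-",
                      formatC(precinct, width = 2, flag = "0"),
                      sep = "")

print(labels_sprintf)
```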
The problem now is that I want to compute margins of victory, and to do that most easily I need to have both parties’ vote totals on the same row. Enter the spread() function from the tidyr package. It takes the values of the Party variable in different rows and converts them into multiple column variables named after the parties themselves. The values for the new variables are taken from the Election.Night.Votes variable in the original rows for the parties.
temp3_df <- temp2_df %>%
spread(Party, Election.Night.Votes)
head(temp3_df)
## Precinct DEM REP
## 1 1-01 445 512
## 2 1-02 98 153
## 3 1-03 377 362
## 4 1-04 564 772
## 5 1-05 121 170
## 6 1-06 304 326
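Readers using a newer tidyr (1.0.0 or later) may prefer `pivot_wider()`, which supersedes `spread()`. The sketch below assumes such a version is installed and applies it to the first two precincts shown above; the names `long_df` and `wide_df` are my own.

```r
library("tidyr")  # assumes tidyr >= 1.0.0, newer than the version used here

# Long-format rows for the first two precincts, as in the output above.
long_df <- data.frame(
  Party                = c("DEM", "REP", "DEM", "REP"),
  Election.Night.Votes = c(445L, 512L, 98L, 153L),
  Precinct             = c("1-01", "1-01", "1-02", "1-02"),
  stringsAsFactors     = FALSE
)

# One row per precinct, one column per party, filled with the vote counts.
wide_df <- pivot_wider(long_df,
                       names_from  = Party,
                       values_from = Election.Night.Votes)
print(wide_df)
```

The result is a tibble rather than a plain data frame, but it has the same Precinct, DEM, and REP columns.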
The final calculation is pretty simple: I just compute a Rep.Margin variable containing the absolute Republican margin of victory and a Pct.Rep.Margin variable containing the Republican margin of victory in percentage terms.
ak_margins_df <- temp3_df %>%
mutate(Rep.Margin = REP - DEM,
Pct.Rep.Margin = round(100 * (REP - DEM) / (REP + DEM), 1))
head(ak_margins_df)
## Precinct DEM REP Rep.Margin Pct.Rep.Margin
## 1 1-01 445 512 67 7.0
## 2 1-02 98 153 55 21.9
## 3 1-03 377 362 -15 -2.0
## 4 1-04 564 772 208 15.6
## 5 1-05 121 170 49 16.8
## 6 1-06 304 326 22 3.5
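As a quick check of the percentage formula, the sketch below wraps it in a small function and applies it to two precincts from the table above. The function name `pct_margin()` is my own. Note that the denominator is the two-party vote, since write-in votes were filtered out earlier.

```r
# Republican margin as a percentage of the combined two-party vote,
# rounded to one decimal place (same formula as in the mutate() call).
pct_margin <- function(rep_votes, dem_votes) {
  round(100 * (rep_votes - dem_votes) / (rep_votes + dem_votes), 1)
}

# Precinct 1-01: (512 - 445) / (512 + 445) = 67 / 957
print(pct_margin(512, 445))  # 7

# Precinct 1-03: (362 - 377) / (362 + 377) = -15 / 739
print(pct_margin(362, 377))  # -2
```

A negative value means Courtney Watson carried the precinct.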
Now I want to see how precincts varied in terms of the Republican margin of victory. The easiest way to do this is using a histogram; I first look at the percentage margins of victory:
g <- ggplot(ak_margins_df, aes(x = Pct.Rep.Margin))
g <- g + geom_histogram()
print(g)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
The x axis corresponds to Allan Kittleman’s different margins of victory in various precincts, and the height of the bars corresponds to the number of precincts for which Kittleman’s margins of victory were in a given range.
To make this easier to see, I change the width of the bars so that each covers a range of 10 percentage points (0 to 10, 10 to 20, and so on).
g <- ggplot(ak_margins_df, aes(x = Pct.Rep.Margin))
g <- g + geom_histogram(binwidth = 10)
g <- g + xlab("Margin of Victory (Percentage)")
g <- g + ylab("Number of Precincts")
g <- g + ggtitle("Allan Kittleman Precinct-Level Margins of Victory (%)")
print(g)
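To see what the histogram is counting, the sketch below bins a handful of margins by hand with base R's `cut()` and `table()`. It uses only the six precincts shown in the earlier head() output, so the counts are illustrative rather than countywide, and the bin edges are my own choice.

```r
# Percentage margins for the first six precincts (from the output above).
pct <- c(7.0, 21.9, -2.0, 15.6, 16.8, 3.5)

# Bins 10 percentage points wide; right = FALSE makes each bin include its
# lower edge and exclude its upper edge, e.g. [0,10).
bins   <- cut(pct, breaks = seq(-10, 30, by = 10), right = FALSE)
counts <- table(bins)
print(counts)
```

Exactly where ggplot2 places its bin edges depends on the version. In more recent releases the `boundary` or `center` argument of geom_histogram() can be used to pin them down, for example to force an edge at zero.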
A couple of interesting points about this histogram: First, there were no precincts in which Courtney Watson ran 50 or more percentage points ahead of Allan Kittleman, and only 5 precincts in which her margin of victory was greater than 40%. On the other hand, Kittleman had what looks like 12 precincts in which he ran more than 40% ahead of Watson.
I confirm this by sorting and filtering the ak_margins_df data frame to show where Kittleman won by 40% or more:
ak_margins_df %>%
arrange(desc(Pct.Rep.Margin)) %>%
filter(Pct.Rep.Margin >= 40.0)
## Precinct DEM REP Rep.Margin Pct.Rep.Margin
## 1 4-05 149 718 569 65.6
## 2 4-03 192 769 577 60.0
## 3 4-01 146 558 412 58.5
## 4 4-04 221 843 622 58.5
## 5 3-06 198 688 490 55.3
## 6 4-02 215 737 522 54.8
## 7 3-02 281 936 655 53.8
## 8 4-06 249 800 551 52.5
## 9 5-19 292 910 618 51.4
## 10 5-20 242 596 354 42.2
## 11 3-01 219 537 318 42.1
## 12 5-11 206 487 281 40.5
and where he lost by 40% or more:
ak_margins_df %>%
arrange(Pct.Rep.Margin) %>%
filter(Pct.Rep.Margin <= -40.0)
## Precinct DEM REP Rep.Margin Pct.Rep.Margin
## 1 6-21 297 107 -190 -47.0
## 2 6-09 511 195 -316 -44.8
## 3 6-19 503 192 -311 -44.7
## 4 6-17 527 216 -311 -41.9
## 5 6-34 390 166 -224 -40.3
The other interesting thing in the histogram has to do with the precincts in which the results were relatively close (0-10% margin either way). It looks as if Allan Kittleman carried about twice as many of these swing precincts as Courtney Watson.
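Rather than reading the counts off the histogram, the close precincts can be counted directly. The sketch below shows the idea on the six precincts from the earlier head() output only, so its counts are not the countywide figures. The same expressions applied to the full ak_margins_df data frame would give those.

```r
# Margins for the first six precincts (a small subset, for illustration).
margins <- data.frame(
  Precinct       = c("1-01", "1-02", "1-03", "1-04", "1-05", "1-06"),
  Pct.Rep.Margin = c(7.0, 21.9, -2.0, 15.6, 16.8, 3.5),
  stringsAsFactors = FALSE
)

# Close precincts: margin within 10 percentage points either way.
close_kittleman <- sum(margins$Pct.Rep.Margin > 0 & margins$Pct.Rep.Margin <= 10)
close_watson    <- sum(margins$Pct.Rep.Margin < 0 & margins$Pct.Rep.Margin >= -10)

print(c(Kittleman = close_kittleman, Watson = close_watson))
```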
For completeness I also show summary statistics for the percentage margins of victory:
summary(ak_margins_df$Pct.Rep.Margin)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -47.00 -19.98 4.15 3.22 21.42 65.60
The maximum and minimum margins of victory are as listed above. The median value of 4.15 means that on election day Allan Kittleman carried half of all precincts by a margin of 4.15 percentage points or more.
Next I look at victory margins in terms of absolute numbers of votes as opposed to percentages.
g <- ggplot(ak_margins_df, aes(x = Rep.Margin))
g <- g + geom_histogram(binwidth = 100)
g <- g + xlab("Margin of Victory (Votes)")
g <- g + ylab("Number of Precincts")
g <- g + ggtitle("Allan Kittleman Precinct-Level Margins of Victory (Votes)")
print(g)
As with the other histogram, this histogram is skewed to the right (both literally and figuratively). There were no precincts in which Courtney Watson’s election-day margin was more than 400 votes, but several where Allan Kittleman’s election-day margin of victory was that large or larger.
There are 118 precincts in Howard County. In voting on election day Allan Kittleman won truly lopsided victories (40% or more winning margin) in about 10% of them, more than twice as many as Courtney Watson. This is consistent with a greater number of precincts that are overwhelmingly Republican (vs. Democratic), higher Republican turnout (vs. Democratic turnout), or a decided move to Kittleman on the part of swing voters. The evidence presented thus far isn’t sufficient to distinguish among these possibilities.
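The arithmetic behind those two claims, using the counts from the filtered tables above (12 precincts for Kittleman, 5 for Watson), is sketched here. The variable names are my own.

```r
n_precincts        <- 118  # precincts in Howard County
kittleman_lopsided <- 12   # precincts Kittleman won by 40% or more
watson_lopsided    <- 5    # precincts Watson won by 40% or more

# Share of all precincts that Kittleman won by a lopsided margin.
print(round(100 * kittleman_lopsided / n_precincts, 1))  # 10.2

# Ratio of Kittleman's lopsided precincts to Watson's.
print(kittleman_lopsided / watson_lopsided)              # 2.4
```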
In my next document I’ll present the same data as above, but in the form of a precinct map.
I used the following R environment in doing the analysis for this example:
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_1.0.0 tidyr_0.1 dplyr_0.4.0 RCurl_1.95-4.3
## [5] bitops_1.0-6
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.1 colorspace_1.2-4 DBI_0.3.1 digest_0.6.4
## [5] evaluate_0.5.5 formatR_1.0 grid_3.1.2 gtable_0.1.2
## [9] htmltools_0.2.6 knitr_1.7 labeling_0.3 lazyeval_0.1.10
## [13] magrittr_1.0.1 MASS_7.3-35 munsell_0.4.2 parallel_3.1.2
## [17] plyr_1.8.1 proto_0.3-10 Rcpp_0.11.3 reshape2_1.4
## [21] rmarkdown_0.5.1 scales_0.2.4 stringr_0.6.2 tools_3.1.2
## [25] yaml_2.1.13
You can find the source code for this analysis and others at my HoCoData repository on GitHub. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.