Introduction

This analysis uses precinct-level results from the Howard County 2014 general election (courtesy of the Maryland State Board of Elections) to look at Allan Kittleman’s margins of victory across the county on election day. I’m interested in the general question of whether there seemed to be an “enthusiasm gap” in which Kittleman’s election-day results were particularly lopsided, e.g., due to increased turnout of Republican voters or unusually high support for Kittleman from Democrats and unaffiliated voters.

In this document I present the data in the form of histograms.

Load packages

For this analysis I use the R statistical environment, run from the RStudio development environment, along with the dplyr and tidyr packages for data manipulation and the ggplot2 package to draw the histograms.

library("dplyr", warn.conflicts = FALSE)
library("tidyr")
library("ggplot2")

General approach

How would one best measure relative voter enthusiasm for Allan Kittleman vs. Courtney Watson? One measure would be how each candidate outperformed their “expected” vote on election day, for example, how many votes Kittleman attracted in a given precinct vs. the number of registered Republicans in that precinct, and ditto for Watson vis-a-vis the number of registered Democrats. A related measure would look at Republican turnout (i.e., as a percentage of registered Republicans) in a given precinct vs. Democratic turnout.
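
A minimal sketch of the first measure, assuming it is computed as election-day votes divided by the number of voters registered with the candidate’s party. The vote counts are the precinct 1-01 figures from the results file; the registration counts are invented placeholders (no registration data is loaded in this document), and the helper name vote_to_registration_ratio is hypothetical.

```r
# Hypothetical helper: votes received per registered voter of the candidate's party.
vote_to_registration_ratio <- function(votes, registered) {
    stopifnot(all(registered > 0))
    votes / registered
}

# Election-day votes for precinct 1-01 (from the results file).
kittleman_votes <- 512
watson_votes <- 445

# PLACEHOLDER registration counts, made up for illustration only.
hypothetical_registered_rep <- 800
hypothetical_registered_dem <- 1000

rep_ratio <- vote_to_registration_ratio(kittleman_votes, hypothetical_registered_rep)
dem_ratio <- vote_to_registration_ratio(watson_votes, hypothetical_registered_dem)

# A gap above zero would suggest Kittleman outperformed his party's
# registration by more than Watson did in this precinct.
print(rep_ratio - dem_ratio)
```

With real registration figures, the same two ratios could be computed for every precinct and compared.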

In this document I confine myself to looking at simple margins of victory in each precinct. The Maryland State Board of Elections has made available precinct-level data (in Microsoft Excel format) giving party turnout in the 2014 general election. I’ll take a look at that data later.

As I mentioned above, this analysis is for election day voting only. Absentee ballots and votes cast at early voting centers are not included in the per-precinct totals.

Loading the data

First I download the CSV-format data containing Howard County 2014 general election results, and store a copy of the data in the local file Howard_By_Precinct_2014_General.csv.

download.file("http://elections.state.md.us/elections/2014/election_data/Howard_By_Precinct_2014_General.csv",
              "Howard_By_Precinct_2014_General.csv",
              method = "curl")
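
When re-running the analysis it may be convenient to skip the download if a local copy already exists. The following is a sketch only; download_if_missing is a hypothetical helper, not part of the original analysis, and the demonstration uses a temporary file and a placeholder URL so that it never touches the network.

```r
# Hypothetical helper: download a file only if no local copy exists yet.
download_if_missing <- function(url, destfile, ...) {
    if (file.exists(destfile)) {
        message("Using existing copy of ", destfile)
        return(invisible(FALSE))
    }
    download.file(url, destfile, ...)
    invisible(TRUE)
}

# A temporary file stands in for an already-downloaded CSV, so the
# placeholder URL below is never actually requested.
local_copy <- tempfile(fileext = ".csv")
writeLines("County,Election.District", local_copy)
downloaded <- download_if_missing("http://example.invalid/placeholder.csv", local_copy)
print(downloaded)
```

In the analysis itself the helper would be called with the State Board of Elections URL, the local file name, and method = "curl", exactly as in the download.file() call above.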

I then read in the CSV file for election results, and strip trailing spaces from the names of the offices.

hoco_g14_df <- read.csv("Howard_By_Precinct_2014_General.csv", stringsAsFactors = FALSE)
hoco_g14_df$Office.Name <- gsub("  *$", "", hoco_g14_df$Office.Name)
str(hoco_g14_df)
## 'data.frame':    6887 obs. of  11 variables:
##  $ County                      : int  14 14 14 14 14 14 14 14 14 14 ...
##  $ Election.District           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Election.Precinct           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Candidate.Name              : chr  "Anthony G. Brown" "Larry Hogan" "Shawn Quinn" "Charles U. Smith" ...
##  $ Party                       : chr  "DEM" "REP" "LIB" "DEM" ...
##  $ Office.Name                 : chr  "Governor / Lt. Governor" "Governor / Lt. Governor" "Governor / Lt. Governor" "Governor / Lt. Governor" ...
##  $ Office.District             : chr  "" "" "" "" ...
##  $ Winner                      : chr  "" "Y" "" "" ...
##  $ Write.In.                   : chr  "" "" "" "Y" ...
##  $ Election.Night.Votes        : int  333 617 22 0 0 1 512 431 0 2 ...
##  $ Election.Night.Votes.Against: int  NA NA NA NA NA NA NA NA NA NA ...
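
To see what the gsub() pattern does, here is a small sketch on toy strings, assuming the raw office names are padded with trailing spaces. The pattern is one space followed by zero or more spaces, anchored at the end of the string, so it matches any run of one or more trailing spaces.

```r
# Toy office names; the first is padded with trailing spaces.
office_names <- c("County Executive   ", "Governor / Lt. Governor")

# "  *$" is one space followed by zero or more spaces, anchored at the end.
trimmed <- gsub("  *$", "", office_names)

# " +$" expresses the same thing more directly.
trimmed_plus <- gsub(" +$", "", office_names)

print(trimmed)
print(identical(trimmed, trimmed_plus))
```

In R 3.2.0 and later, trimws(x, which = "right") is another option; note that it also strips trailing tabs and newlines, not just spaces.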

Data processing

I start by looking at the results for the County Executive race. I filter the rows based on Office.Name and select only the columns of interest.

temp1_df <- hoco_g14_df %>%
    filter(Office.Name == "County Executive") %>%
    select(Election.District, Election.Precinct, Party, Election.Night.Votes)
head(temp1_df)
##   Election.District Election.Precinct Party Election.Night.Votes
## 1                 1                 1   DEM                  445
## 2                 1                 1   REP                  512
## 3                 1                 1   BOT                    0
## 4                 1                 2   DEM                   98
## 5                 1                 2   REP                  153
## 6                 1                 2   BOT                    0

The rows where the Party variable has the value ‘BOT’ correspond to votes for write-in candidates. I do a quick summarization to show that the number of write-in votes is pretty small:

temp1_df %>% group_by(Party) %>% summarize(Votes = sum(Election.Night.Votes))
## Source: local data frame [3 x 2]
## 
##   Party Votes
## 1   BOT    81
## 2   DEM 37026
## 3   REP 41951
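
To put “pretty small” in numbers, this sketch converts the three totals above into shares of all election-day votes cast for County Executive. The totals are copied from the summary output; nothing else is assumed.

```r
# Election-day vote totals for County Executive, copied from the summary above.
votes <- c(BOT = 81, DEM = 37026, REP = 41951)

# Each group's share of all election-day votes, as a percentage.
share_pct <- round(100 * votes / sum(votes), 2)
print(share_pct)
```

The write-in share comes to roughly a tenth of one percent, so dropping those rows should have no visible effect on the margins.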

To simplify the analysis I filter out the write-in votes. I also combine the Election.District and Election.Precinct variables into a single variable Precinct having the form ‘0-00’, and then discard the original variables:

temp2_df <- temp1_df %>%
    filter(Party != "BOT") %>%
    mutate(Precinct = paste(as.character(Election.District),
                            "-",
                            formatC(Election.Precinct, width = 2, flag = "0"),
                            sep = "")) %>%
    select(-Election.District, -Election.Precinct)
head(temp2_df)
##   Party Election.Night.Votes Precinct
## 1   DEM                  445     1-01
## 2   REP                  512     1-01
## 3   DEM                   98     1-02
## 4   REP                  153     1-02
## 5   DEM                  377     1-03
## 6   REP                  362     1-03
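
As an aside, the same ‘0-00’ label can be built in one call with sprintf(). The sketch below uses toy district and precinct numbers, not the full data frame, and checks that the two approaches agree.

```r
# Toy district and precinct numbers (integers, as in the results file).
district <- c(1L, 1L, 6L)
precinct <- c(1L, 2L, 21L)

# paste() plus formatC(), zero-padding the precinct number to two digits.
label_paste <- paste(district, "-", formatC(precinct, width = 2, flag = "0"), sep = "")

# The same label in a single sprintf() call: %02d pads to width 2 with zeros.
label_sprintf <- sprintf("%d-%02d", district, precinct)

print(label_sprintf)
print(identical(label_paste, label_sprintf))
```

The %d conversion expects whole numbers, which holds here because read.csv() reads both columns as integers.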

The problem now is that I want to compute margins of victory, and to do that most easily I need to have both parties’ vote totals on the same row. Enter the spread() function from the tidyr package. It takes the values of the Party variable in different rows and converts them into multiple column variables named after the parties themselves. The values for the new variables are taken from the Election.Night.Votes variable in the original rows for the parties.

temp3_df <- temp2_df %>%
    spread(Party, Election.Night.Votes)
head(temp3_df)
##   Precinct DEM REP
## 1     1-01 445 512
## 2     1-02  98 153
## 3     1-03 377 362
## 4     1-04 564 772
## 5     1-05 121 170
## 6     1-06 304 326
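
Readers on a newer tidyr may prefer pivot_wider(), which the tidyr documentation recommends over spread() for new code. The sketch below assumes tidyr 1.0.0 or later is installed and uses only the first three precincts, copied from the output above, rather than the full data frame.

```r
library("tidyr")

# Long-format rows for the first three precincts, copied from the output above.
long_df <- data.frame(
    Party = c("DEM", "REP", "DEM", "REP", "DEM", "REP"),
    Election.Night.Votes = c(445, 512, 98, 153, 377, 362),
    Precinct = c("1-01", "1-01", "1-02", "1-02", "1-03", "1-03"),
    stringsAsFactors = FALSE
)

# pivot_wider() plays the role of spread():
# one row per Precinct, one column per Party.
wide_df <- pivot_wider(long_df,
                       names_from = Party,
                       values_from = Election.Night.Votes)
print(wide_df)
```

The names_from and values_from arguments correspond to the two arguments passed to spread() above.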

The final calculation is pretty simple: I just compute a Rep.Margin variable containing the absolute Republican margin of victory and a Pct.Rep.Margin variable containing the Republican margin of victory in percentage terms.

ak_margins_df <- temp3_df %>%
    mutate(Rep.Margin = REP - DEM,
           Pct.Rep.Margin = round(100 * (REP - DEM) / (REP + DEM), 1))
head(ak_margins_df)
##   Precinct DEM REP Rep.Margin Pct.Rep.Margin
## 1     1-01 445 512         67            7.0
## 2     1-02  98 153         55           21.9
## 3     1-03 377 362        -15           -2.0
## 4     1-04 564 772        208           15.6
## 5     1-05 121 170         49           16.8
## 6     1-06 304 326         22            3.5
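
To make the percentage calculation concrete: it is the Republican margin divided by the two-party vote (write-ins having been dropped), times 100. The sketch below wraps the formula in a hypothetical helper, pct_rep_margin, and applies it to the first three precincts shown above.

```r
# Hypothetical helper: Republican margin as a percentage of the two-party vote.
pct_rep_margin <- function(rep_votes, dem_votes) {
    round(100 * (rep_votes - dem_votes) / (rep_votes + dem_votes), 1)
}

# Precincts 1-01, 1-02 and 1-03, using the vote counts shown above.
rep_votes <- c(512, 153, 362)
dem_votes <- c(445, 98, 377)

margins <- pct_rep_margin(rep_votes, dem_votes)
print(margins)
```

For precinct 1-01 this is 100 * 67 / 957, which rounds to 7.0. A precinct with no two-party votes at all would produce NaN, since the denominator would be zero.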

A histogram of precincts by victory margin

Now I want to see how precincts varied in terms of the Republican margin of victory. The easiest way to do this is with a histogram; I first look at the percentage margins of victory:

g <- ggplot(ak_margins_df, aes(x = Pct.Rep.Margin))
g <- g + geom_histogram()
print(g)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

The x axis corresponds to Allan Kittleman’s different margins of victory in various precincts, and the height of the bars corresponds to the number of precincts for which Kittleman’s margins of victory were in a given range.

To make this easier to see, I change the width of the bars so that each covers a range of 10 percentage points (0-10%, 10-20%, and so on).

g <- ggplot(ak_margins_df, aes(x = Pct.Rep.Margin))
g <- g + geom_histogram(binwidth = 10)
g <- g + xlab("Margin of Victory (Percentage)")
g <- g + ylab("Number of Precincts")
g <- g + ggtitle("Allan Kittleman Precinct-Level Margins of Victory (%)")
print(g)
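
One caveat: whether the bars line up exactly with those ranges depends on where ggplot2 places the bin edges. In recent ggplot2 releases the default, as I understand it, centers bins on multiples of the bin width, which would put a zero margin in the middle of a bar. The sketch below assumes such a release and uses the boundary argument to request edges at multiples of 10 instead; the six margins are taken from the table above purely as sample data.

```r
library("ggplot2")

# Six percentage margins taken from the table above, just to have data to plot.
toy_df <- data.frame(Pct.Rep.Margin = c(7.0, 21.9, -2.0, 15.6, 16.8, 3.5))

# boundary = 0 asks for bin edges at multiples of the bin width
# (..., -10, 0, 10, 20, ...), so no bar straddles a zero margin.
g <- ggplot(toy_df, aes(x = Pct.Rep.Margin)) +
    geom_histogram(binwidth = 10, boundary = 0) +
    xlab("Margin of Victory (Percentage)") +
    ylab("Number of Precincts")
print(g)
```

With the edges pinned at zero, every bar contains only Kittleman wins or only Watson wins, which is the reading the discussion below relies on.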

A couple of interesting points about this histogram: First, there were no precincts in which Courtney Watson ran 50 or more percentage points ahead of Allan Kittleman, and only 5 precincts in which her margin of victory was greater than 40%. On the other hand, Kittleman had what looks like 12 precincts in which he ran more than 40% ahead of Watson.

I confirm this by sorting and filtering the ak_margins_df data frame to show where Kittleman won by 40% or more:

ak_margins_df %>%
    arrange(desc(Pct.Rep.Margin)) %>%
    filter(Pct.Rep.Margin >= 40.0)
##    Precinct DEM REP Rep.Margin Pct.Rep.Margin
## 1      4-05 149 718        569           65.6
## 2      4-03 192 769        577           60.0
## 3      4-01 146 558        412           58.5
## 4      4-04 221 843        622           58.5
## 5      3-06 198 688        490           55.3
## 6      4-02 215 737        522           54.8
## 7      3-02 281 936        655           53.8
## 8      4-06 249 800        551           52.5
## 9      5-19 292 910        618           51.4
## 10     5-20 242 596        354           42.2
## 11     3-01 219 537        318           42.1
## 12     5-11 206 487        281           40.5

and where he lost by 40% or more:

ak_margins_df %>%
    arrange(Pct.Rep.Margin) %>%
    filter(Pct.Rep.Margin <= -40.0)
##   Precinct DEM REP Rep.Margin Pct.Rep.Margin
## 1     6-21 297 107       -190          -47.0
## 2     6-09 511 195       -316          -44.8
## 3     6-19 503 192       -311          -44.7
## 4     6-17 527 216       -311          -41.9
## 5     6-34 390 166       -224          -40.3

The other interesting thing in the histogram has to do with the precincts in which the results were relatively close (0-10% margin either way). It looks as if Allan Kittleman carried about twice as many of these swing precincts as Courtney Watson.
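
Rather than reading the counts off the bars, the close precincts can be tallied directly. The sketch below shows the idea on the first six precincts only (margins copied from the table shown earlier), so its counts are not the countywide result; applying the same two expressions to ak_margins_df$Pct.Rep.Margin would give the real numbers.

```r
# Percentage margins for the first six precincts, from the table shown earlier.
# These six are only a stand-in for the full ak_margins_df data frame.
pct_margin <- c(7.0, 21.9, -2.0, 15.6, 16.8, 3.5)

# "Close" here means a margin of at most 10 points either way.
close_rep <- sum(pct_margin > 0 & pct_margin <= 10)
close_dem <- sum(pct_margin < 0 & pct_margin >= -10)

print(c(Kittleman = close_rep, Watson = close_dem))
```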

For completeness I also show summary statistics for the percentage margins of victory:

summary(ak_margins_df$Pct.Rep.Margin)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -47.00  -19.98    4.15    3.22   21.42   65.60

The maximum and minimum margins of victory are as listed above. The median value of 4.15 means that on election day Allan Kittleman carried half of all precincts by a 4.15% or greater margin.

Next I look at victory margins in terms of absolute numbers of votes as opposed to percentages.

g <- ggplot(ak_margins_df, aes(x = Rep.Margin))
g <- g + geom_histogram(binwidth = 100)
g <- g + xlab("Margin of Victory (Votes)")
g <- g + ylab("Number of Precincts")
g <- g + ggtitle("Allan Kittleman Precinct-Level Margins of Victory (Votes)")
print(g)

As with the other histogram, this histogram is skewed to the right (both literally and figuratively). There were no precincts in which Courtney Watson’s election-day margin was more than 400 votes, but several where Allan Kittleman’s election-day margin of victory was that large or larger.

Conclusion

There are 118 precincts in Howard County. In voting on election day, Allan Kittleman won truly lopsided victories (a winning margin of 40% or more) in 12 of them, or about 10%, more than twice as many as Courtney Watson’s 5. This is consistent with a greater number of precincts that are overwhelmingly Republican (vs. Democratic), higher Republican turnout (vs. Democratic turnout), or a decided move to Kittleman on the part of swing voters. The evidence presented thus far isn’t sufficient to distinguish among these possibilities.
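
The arithmetic behind those figures can be checked in a few lines. The counts are the ones reported in the tables above (12 precincts for Kittleman, 5 for Watson) and the precinct total is the 118 stated in this section.

```r
n_precincts <- 118   # total precincts in Howard County, as stated above
rep_lopsided <- 12   # precincts Kittleman won by 40 points or more
dem_lopsided <- 5    # precincts Watson won by 40 points or more

# Share of all precincts, as a percentage, and the ratio between the two counts.
rep_share <- round(100 * rep_lopsided / n_precincts, 1)
dem_share <- round(100 * dem_lopsided / n_precincts, 1)
ratio <- rep_lopsided / dem_lopsided

print(c(rep_share = rep_share, dem_share = dem_share, ratio = ratio))
```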

In my next document I’ll present the same data as above, but in the form of a precinct map.

Appendix

I used the following R environment in doing the analysis for this example:

sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_1.0.0  tidyr_0.1      dplyr_0.4.0    RCurl_1.95-4.3
## [5] bitops_1.0-6  
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1   colorspace_1.2-4 DBI_0.3.1        digest_0.6.4    
##  [5] evaluate_0.5.5   formatR_1.0      grid_3.1.2       gtable_0.1.2    
##  [9] htmltools_0.2.6  knitr_1.7        labeling_0.3     lazyeval_0.1.10 
## [13] magrittr_1.0.1   MASS_7.3-35      munsell_0.4.2    parallel_3.1.2  
## [17] plyr_1.8.1       proto_0.3-10     Rcpp_0.11.3      reshape2_1.4    
## [21] rmarkdown_0.5.1  scales_0.2.4     stringr_0.6.2    tools_3.1.2     
## [25] yaml_2.1.13

You can find the source code for this analysis and others at my HoCoData repository on GitHub. This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.