Data 606 Final Project Proposal - Election Data 2017-2018

Heather Geiger - April 8, 2018

Introduction

For my final project, I will analyze election data from the period after the presidential inauguration (January 20, 2017) through March 27, 2018.

This data is from FiveThirtyEight.

See here:

https://github.com/fivethirtyeight/data/tree/master/special-elections
https://fivethirtyeight.com/features/be-skeptical-of-anyone-who-tells-you-they-know-how-democrats-can-win-in-november/

From FiveThirtyEight: This data includes “both state and federal special elections as well as regularly scheduled 2017 elections in New Jersey and Virginia (except for New Jersey General Assembly).”

For each geography (district or state), the file contains:

  1. Median household income.
  2. The percentage of its residents over age 25 who possess a bachelor’s degree or higher.
  3. The difference between Hillary Clinton’s 2016 margin and Barack Obama’s 2012 margin.
  4. The improvement in the Democratic candidate’s margin in the 2017 or 2018 special election over the district’s normal partisan lean. (FiveThirtyEight defines “partisan lean” as the average difference between how a constituency voted in the past two presidential elections and how the country voted overall, with 2016 results weighted 75 percent and 2012 results weighted 25 percent.)
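A minimal sketch of how items 3 and 4 relate to the partisan lean definition, assuming margins are expressed as Democratic share minus Republican share in percentage points. The function names and the district and national numbers are hypothetical illustrations, not values from the FiveThirtyEight file.

```r
# Partisan lean as described by FiveThirtyEight: the district's margin relative
# to the national margin, weighting 2016 at 75% and 2012 at 25%.
# Margins are Democratic minus Republican, in percentage points (assumption).
partisan_lean <- function(dist_2016, natl_2016, dist_2012, natl_2012) {
  0.75 * (dist_2016 - natl_2016) + 0.25 * (dist_2012 - natl_2012)
}

# Item 4: how far the 2017/2018 Democratic margin exceeded the partisan lean.
dem_margin_improvement <- function(dem_margin_2017_18, lean) {
  dem_margin_2017_18 - lean
}

# Item 3: Clinton's 2016 margin minus Obama's 2012 margin in the same district.
clinton_vs_obama <- function(dist_2016, dist_2012) {
  dist_2016 - dist_2012
}

# Hypothetical district: D+10 in 2016 (nation D+2) and D+5 in 2012 (nation D+4).
lean <- partisan_lean(dist_2016 = 10, natl_2016 = 2, dist_2012 = 5, natl_2012 = 4)
lean                              # 0.75 * 8 + 0.25 * 1 = 6.25
dem_margin_improvement(20, lean)  # a D+20 special election result gives 13.75
clinton_vs_obama(10, 5)           # 5
```

Note that a district can show a large improvement over its lean even while Clinton underperformed Obama there; the two quantities are computed from different elections.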

Loading data and libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(stringr)
elections <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/special-elections/special-elections.csv",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE)
dim(elections)
## [1] 200   7
head(elections)
##      Date         State          Race Median Household Income
## 1 3/27/18       Alabama         HD-21                $65,548 
## 2 3/13/18  Pennsylvania       18th CD                $62,283 
## 3 3/13/18     Tennessee         SD-14                $48,252 
## 4  3/6/18      Oklahoma         HD-51                $57,202 
## 5 2/27/18      Kentucky         HD-89                $37,858 
## 6 2/27/18 New Hampshire Belknap HD-03                $48,893 
##   % Bachelor's Degree or Higher Clinton Margin Improvement Over Obama
## 1                        42.10%                                    6%
## 2                        35.10%                                   -3%
## 3                        22.10%                                   -8%
## 4                        18.10%                                   -8%
## 5                        16.10%                                   -5%
## 6                        25.10%                                  -13%
##   2017-2018 Dem Margin Improvement Over Partisan Lean
## 1                                                 24%
## 2                                                 22%
## 3                                                  1%
## 4                                                 21%
## 5                                                 29%
## 6                                                 19%

Cleaning data - basic cleaning and formatting

Let’s change column names to something more R-friendly, convert Date to date format, etc.

colnames(elections)[4:ncol(elections)] <- c("Median.income",
        "Percent.bachelors",
        "Clinton.vs.Obama.margin",
        "Dem.margin.improvement")

# %y parses two-digit years directly ("18" becomes 2018), so no year correction is needed.
elections$Date <- as.Date(elections$Date,format="%m/%d/%y")
head(elections$Date[order(elections$Date)])
## [1] "2017-01-31" "2017-02-14" "2017-02-25" "2017-02-28" "2017-02-28"
## [6] "2017-02-28"
elections$Median.income <- str_replace_all(elections$Median.income,pattern='\\$|,',replacement='')
elections$Median.income <- as.numeric(trimws(elections$Median.income))
## Warning: NAs introduced by coercion
head(elections$Median.income[order(elections$Median.income)])
## [1] 20915 26265 30791 35910 36116 36200
elections$Percent.bachelors <- str_replace_all(elections$Percent.bachelors,pattern='\\%',replacement='')
elections$Percent.bachelors <- as.numeric(elections$Percent.bachelors)
## Warning: NAs introduced by coercion
head(elections$Percent.bachelors[order(elections$Percent.bachelors)])
## [1]  9.2 11.0 11.9 12.9 13.0 13.2
# Got a warning that NAs were introduced by coercion for median income and percent Bachelor's. Let's check this out.

elections[is.na(elections$Median.income) | is.na(elections$Percent.bachelors),]
##           Date         State               Race Median.income
## 168 2017-09-12 New Hampshire      Belknap HD-09            NA
## 188 2017-05-23 New Hampshire Hillsborough HD-44            NA
##     Percent.bachelors Clinton.vs.Obama.margin Dem.margin.improvement
## 168                NA                    -17%                    26%
## 188                NA                     -6%                    -1%
#Good to know - for two elections, we don't have info about income or education in those districts.

elections$Clinton.vs.Obama.margin <- str_replace_all(elections$Clinton.vs.Obama.margin,pattern='\\%',replacement='')
elections$Clinton.vs.Obama.margin <- as.numeric(elections$Clinton.vs.Obama.margin)
head(elections$Clinton.vs.Obama.margin[order(elections$Clinton.vs.Obama.margin)])
## [1] -33 -25 -24 -23 -22 -21
tail(elections$Clinton.vs.Obama.margin[order(elections$Clinton.vs.Obama.margin)])
## [1] 21 21 22 22 23 35
elections$Dem.margin.improvement <- str_replace_all(elections$Dem.margin.improvement,pattern='\\%',replacement='')
elections$Dem.margin.improvement <- as.numeric(elections$Dem.margin.improvement)
head(elections$Dem.margin.improvement[order(elections$Dem.margin.improvement)])
## [1] -37 -30 -24 -23 -23 -20
tail(elections$Dem.margin.improvement[order(elections$Dem.margin.improvement)])
## [1] 42 46 47 48 61 84
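The four cleaning steps above repeat the same strip-and-convert pattern. A minimal sketch of a single helper applied to all numeric columns at once; `clean_numeric` and the toy data frame are hypothetical, with the toy values copied from the first two rows of the head() output above.

```r
# Strip dollar signs, thousands separators, percent signs and stray whitespace,
# then convert to numeric. Entries that still are not numbers become NA
# (with R's usual "NAs introduced by coercion" warning).
clean_numeric <- function(x) {
  as.numeric(gsub("[$,%]", "", trimws(x)))
}

toy <- data.frame(
  Median.income = c("$65,548 ", "$62,283 "),
  Percent.bachelors = c("42.10%", "35.10%"),
  Clinton.vs.Obama.margin = c("6%", "-3%"),
  Dem.margin.improvement = c("24%", "22%"),
  stringsAsFactors = FALSE
)

numeric_cols <- c("Median.income", "Percent.bachelors",
                  "Clinton.vs.Obama.margin", "Dem.margin.improvement")
# Apply the same cleaning to every listed column in one step.
toy[numeric_cols] <- lapply(toy[numeric_cols], clean_numeric)
str(toy)
```

On the real data, the same call would be `elections[numeric_cols] <- lapply(elections[numeric_cols], clean_numeric)`, which keeps the cleaning rule in one place if the CSV format changes.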

Cleaning data - categorizing by race type

Let’s also take a closer look at the “Race” column, for New Jersey and Virginia, and then for the remaining states.

new_jersey <- elections[elections$State == "New Jersey",]
new_jersey_minus_governor <- new_jersey[new_jersey$Race != "Governor",]
new_jersey_minus_governor$Race <- str_replace_all(new_jersey_minus_governor$Race,pattern='SD\\-',replacement='')
new_jersey_minus_governor$Race <- as.numeric(new_jersey_minus_governor$Race)
new_jersey_minus_governor$Race[order(new_jersey_minus_governor$Race)]
##  [1]  1  2  3  5  6  7  8  9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25
## [24] 26 27 29 30 31 32 33 34 35 36 37 38 39 40
setdiff(1:40,new_jersey_minus_governor$Race[order(new_jersey_minus_governor$Race)])
## [1]  4 19 28

New Jersey races include the governor’s race as well as results from a number of districts. The FiveThirtyEight description says this data does NOT include General Assembly results. Based on the description here (https://ballotpedia.org/New_Jersey_State_Senate_elections,_2017), I am going to assume these are State Senate results.

A web search also shows that the three missing districts are ones where there was only one major party candidate (https://ballotpedia.org/New_Jersey_State_Senate_District_4, https://ballotpedia.org/New_Jersey_State_Senate_District_19, https://ballotpedia.org/New_Jersey_State_Senate_District_28).

What about Virginia?

virginia <- elections[elections$State == "Virginia",]
virginia_minus_governor_etc <- virginia[virginia$Race != "Governor" & virginia$Race != "Attorney General" & virginia$Race != "Lt. Governor",]
virginia_minus_governor_etc$Race <- str_replace_all(virginia_minus_governor_etc$Race,pattern='HD\\-',replacement='')
virginia_minus_governor_etc$Race <- as.numeric(virginia_minus_governor_etc$Race)
virginia_minus_governor_etc$Race[order(virginia_minus_governor_etc$Race)]
##  [1]   1   2   3   7   8   9  10  12  13  17  18  20  21  23  25  26  27
## [18]  28  29  30  31  32  33  34  38  40  42  49  50  51  54  55  56  58
## [35]  59  60  62  64  65  66  67  68  72  73  81  82  83  84  85  86  87
## [52]  88  91  93  94  96  97  98  99 100
setdiff(1:100,virginia_minus_governor_etc$Race)
##  [1]  4  5  6 11 14 15 16 19 22 24 35 36 37 39 41 43 44 45 46 47 48 52 53
## [24] 57 61 63 69 70 71 74 75 76 77 78 79 80 89 90 92 95

Similar to New Jersey, I checked a few of the districts not included here and found that they were ones where there was only one major party candidate (https://ballotpedia.org/Virginia_House_of_Delegates_District_4, https://ballotpedia.org/Virginia_House_of_Delegates_District_95).

Now, let’s look more at the other states.

elections_minus_VA_and_NJ <- elections[elections$State != "New Jersey" & elections$State != "Virginia",]

races_minus_VA_and_NJ <- elections_minus_VA_and_NJ$Race
grep('^SD|^HD',races_minus_VA_and_NJ,value=TRUE,perl=TRUE)
##  [1] "HD-21"             "SD-14"             "HD-51"            
##  [4] "HD-89"             "HD-120"            "HD-49"            
##  [7] "HD-86"             "SD-27"             "HD-72"            
## [10] "HD-175"            "SD-54"             "HD-23B"           
## [13] "HD-144"            "HD-97"             "HD-129"           
## [16] "HD-39"             "HD-35"             "SD-10"            
## [19] "HD-06"             "HD-99"             "HD-111"           
## [22] "SD-17"             "SD-17"             "HD-58"            
## [25] "SD-03"             "HD-133"            "SD-37"            
## [28] "SD-45"             "HD-76"             "HD-119"           
## [31] "HD-151"            "HD-113"            "HD-109"           
## [34] "SD-08"             "HD-117"            "HD-26"            
## [37] "HD-56"             "SD-26"             "SD-39"            
## [40] "HD-07, Position 1" "HD-23"             "SD-07"            
## [43] "HD-01"             "SD-31"             "HD-31, Position 2"
## [46] "SD-06"             "SD-45"             "HD-04"            
## [49] "HD-31"             "SD-40"             "HD-116"           
## [52] "HD-46"             "SD-13"             "HD-82"            
## [55] "HD-50"             "SD-28"             "SD-16"            
## [58] "SD-44"             "HD-75"             "HD-70"            
## [61] "HD-48"             "HD-95"             "HD-84"            
## [64] "SD-30"             "SD-32"             "HD-28"            
## [67] "SD-02"             "HD-68"             "SD-32"            
## [70] "HD-115"            "SD-02"             "SD-10"            
## [73] "HD-32B"            "HD-89"
grep('^SD|^HD',races_minus_VA_and_NJ,value=TRUE,perl=TRUE,invert=TRUE)
##  [1] "18th CD"                  "Belknap HD-03"           
##  [3] "AD-58"                    "U.S. Senate"             
##  [5] "Worcester & Middlesex SD" "Treasurer"               
##  [7] "1st Berkshire HD"         "Sullivan HD-01"          
##  [9] "Hillsborough HD-15"       "3rd CD"                  
## [11] "3rd Essex HD"             "Strafford HD-13"         
## [13] "Bristol & Norfolk SD"     "Rockingham HD-04"        
## [15] "Belknap HD-09"            "Grafton HD-09"           
## [17] "Merrimack HD-18"          "5th CD"                  
## [19] "6th CD"                   "At-Large CD"             
## [21] "AD-09"                    "Carroll HD-06"           
## [23] "Hillsborough HD-44"       "4th CD"                  
## [25] "34th CD"
grep('^SD|^HD|CD',races_minus_VA_and_NJ,value=TRUE,perl=TRUE,invert=TRUE)
##  [1] "Belknap HD-03"            "AD-58"                   
##  [3] "U.S. Senate"              "Worcester & Middlesex SD"
##  [5] "Treasurer"                "1st Berkshire HD"        
##  [7] "Sullivan HD-01"           "Hillsborough HD-15"      
##  [9] "3rd Essex HD"             "Strafford HD-13"         
## [11] "Bristol & Norfolk SD"     "Rockingham HD-04"        
## [13] "Belknap HD-09"            "Grafton HD-09"           
## [15] "Merrimack HD-18"          "AD-09"                   
## [17] "Carroll HD-06"            "Hillsborough HD-44"
grep('SD|HD|CD',races_minus_VA_and_NJ,value=TRUE,perl=TRUE,invert=TRUE)
## [1] "AD-58"       "U.S. Senate" "Treasurer"   "AD-09"
elections_minus_VA_and_NJ[grep('SD|HD|CD',races_minus_VA_and_NJ,perl=TRUE,invert=TRUE),]
##           Date     State        Race Median.income Percent.bachelors
## 21  2018-01-16 Wisconsin       AD-58         63758              29.0
## 28  2017-12-12   Alabama U.S. Senate         44758              24.0
## 32  2017-11-18 Louisiana   Treasurer         45652              23.0
## 185 2017-05-23  New York       AD-09        103602              36.9
##     Clinton.vs.Obama.margin Dem.margin.improvement
## 21                       -2                     27
## 28                       -6                     31
## 32                       -2                     10
## 185                     -11                     38

I will assume “HD” means (State) House District, “SD” means (State) Senate District, and “CD” means Congressional District.

Spot-checking a few races, the designation still holds when “HD” or “SD” appears anywhere in the name, even if the name does not start with it.

Most special elections fall into these categories, except for one Louisiana special election for Treasurer and one Alabama special election for U.S. Senate.

I will also assume that “AD” (Assembly District) in Wisconsin and New York is equivalent to “HD” in most other states, since the lower chamber in both states is called the State Assembly.
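The naming rules above can be collected into one lookup. A minimal sketch, assuming the HD/AD/SD/CD conventions hold for every state; `classify_district` is a hypothetical helper, and the example names are taken from the race lists printed above.

```r
# Map a race name to a district type using the conventions assumed above:
# HD/AD = state house (assembly), SD = state senate, CD = congressional.
# Word boundaries (\\b) let the code match anywhere in the name, as in
# "Belknap HD-03" or "3rd Essex HD", without matching letters inside a word.
classify_district <- function(race) {
  out <- rep(NA_character_, length(race))
  out[grepl("\\b(HD|AD)\\b", race, perl = TRUE)] <- "State House"
  out[grepl("\\bSD\\b", race, perl = TRUE)] <- "State Senate"
  out[grepl("\\bCD\\b", race, perl = TRUE)] <- "Congressional"
  out  # statewide offices such as "Treasurer" stay NA
}

classify_district(c("HD-21", "Belknap HD-03", "AD-58", "Worcester & Middlesex SD",
                    "18th CD", "At-Large CD", "Treasurer"))
```

The word boundaries are a slight tightening over the plain `grep('HD', ...)` patterns used in this proposal; on the race names listed above, both approaches give the same categories.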

# Start from the raw race names and overwrite them group by group.
race_short_names <- elections$Race
is_nj <- elections$State == "New Jersey"
is_va <- elections$State == "Virginia"
is_other <- !is_nj & !is_va

# New Jersey: every non-governor race is a regularly scheduled State Senate race.
race_short_names[is_nj & grepl('SD',elections$Race)] <- "State Senate standard election"

# Virginia: every HD race is a regularly scheduled House of Delegates race;
# Governor, Attorney General and Lt. Governor keep their names.
race_short_names[is_va & grepl('HD',elections$Race)] <- "State House standard election"

# All other states held special elections, categorized by district type.
race_short_names[is_other & grepl('SD',elections$Race)] <- "State Senate special election"
race_short_names[is_other & grepl('HD|AD',elections$Race)] <- "State House special election"
race_short_names[is_other & grepl('CD',elections$Race)] <- "National House special election"

elections[race_short_names == "State House special election",c("Date","State","Race")]
##           Date          State               Race
## 1   2018-03-27        Alabama              HD-21
## 4   2018-03-06       Oklahoma              HD-51
## 5   2018-02-27       Kentucky              HD-89
## 6   2018-02-27  New Hampshire      Belknap HD-03
## 7   2018-02-27    Connecticut             HD-120
## 8   2018-02-20       Kentucky              HD-49
## 9   2018-02-17      Louisiana              HD-86
## 11  2018-02-13        Florida              HD-72
## 12  2018-02-13        Georgia             HD-175
## 14  2018-02-12      Minnesota             HD-23B
## 15  2018-02-06       Missouri             HD-144
## 16  2018-02-06       Missouri              HD-97
## 17  2018-02-06       Missouri             HD-129
## 18  2018-02-06       Missouri              HD-39
## 19  2018-01-23   Pennsylvania              HD-35
## 21  2018-01-16      Wisconsin              AD-58
## 22  2018-01-16           Iowa              HD-06
## 23  2018-01-16 South Carolina              HD-99
## 24  2018-01-09        Georgia             HD-111
## 27  2017-12-19        Florida              HD-58
## 30  2017-12-05   Pennsylvania             HD-133
## 35  2017-11-14       Oklahoma              HD-76
## 37  2017-11-07        Georgia             HD-119
## 38  2017-11-07       Missouri             HD-151
## 39  2017-11-07 South Carolina             HD-113
## 43  2017-11-07       Michigan             HD-109
## 47  2017-11-07  Massachusetts   1st Berkshire HD
## 48  2017-11-07        Georgia             HD-117
## 49  2017-11-07  New Hampshire     Sullivan HD-01
## 55  2017-11-07  New Hampshire Hillsborough HD-15
## 70  2017-11-07        Georgia              HD-26
## 97  2017-11-07          Maine              HD-56
## 126 2017-11-07     Washington  HD-07, Position 1
## 130 2017-11-07       Missouri              HD-23
## 139 2017-11-07       Michigan              HD-01
## 151 2017-11-07     Washington  HD-31, Position 2
## 153 2017-11-07  Massachusetts       3rd Essex HD
## 160 2017-11-07        Georgia              HD-04
## 161 2017-10-24  New Hampshire    Strafford HD-13
## 163 2017-09-26 South Carolina              HD-31
## 164 2017-09-26  New Hampshire   Rockingham HD-04
## 166 2017-09-26        Florida             HD-116
## 167 2017-09-12       Oklahoma              HD-46
## 168 2017-09-12  New Hampshire      Belknap HD-09
## 169 2017-09-05  New Hampshire      Grafton HD-09
## 171 2017-08-08           Iowa              HD-82
## 172 2017-08-08       Missouri              HD-50
## 175 2017-07-18  New Hampshire    Merrimack HD-18
## 177 2017-07-11       Oklahoma              HD-75
## 179 2017-06-20 South Carolina              HD-70
## 180 2017-06-20 South Carolina              HD-48
## 182 2017-06-15      Tennessee              HD-95
## 183 2017-05-30 South Carolina              HD-84
## 185 2017-05-23       New York              AD-09
## 186 2017-05-23  New Hampshire      Carroll HD-06
## 188 2017-05-23  New Hampshire Hillsborough HD-44
## 190 2017-05-09       Oklahoma              HD-28
## 192 2017-04-25    Connecticut              HD-68
## 196 2017-02-28    Connecticut             HD-115
## 199 2017-02-14      Minnesota             HD-32B
## 200 2017-01-31           Iowa              HD-89
elections[race_short_names == "State Senate special election",c("Date","State","Race")]
##           Date         State                     Race
## 3   2018-03-13     Tennessee                    SD-14
## 10  2018-02-13      Oklahoma                    SD-27
## 13  2018-02-12     Minnesota                    SD-54
## 20  2018-01-16     Wisconsin                    SD-10
## 25  2018-01-09       Georgia                    SD-17
## 26  2017-12-19     Tennessee                    SD-17
## 29  2017-12-12          Iowa                    SD-03
## 31  2017-12-05 Massachusetts Worcester & Middlesex SD
## 33  2017-11-14      Oklahoma                    SD-37
## 34  2017-11-14      Oklahoma                    SD-45
## 46  2017-11-07      Missouri                    SD-08
## 106 2017-11-07      New York                    SD-26
## 107 2017-11-07       Georgia                    SD-39
## 135 2017-11-07    Washington                    SD-07
## 148 2017-11-07    Washington                    SD-31
## 154 2017-11-07       Georgia                    SD-06
## 159 2017-11-07    Washington                    SD-45
## 162 2017-10-17 Massachusetts     Bristol & Norfolk SD
## 165 2017-09-26       Florida                    SD-40
## 170 2017-08-22  Rhode Island                    SD-13
## 173 2017-08-08      Missouri                    SD-28
## 174 2017-07-25 New Hampshire                    SD-16
## 176 2017-07-11      Oklahoma                    SD-44
## 187 2017-05-23      New York                    SD-30
## 189 2017-05-16       Georgia                    SD-32
## 191 2017-04-29     Louisiana                    SD-02
## 195 2017-02-28   Connecticut                    SD-32
## 197 2017-02-28   Connecticut                    SD-02
## 198 2017-02-25      Delaware                    SD-10
elections[race_short_names == "State House standard election",c("Date","State","Race")]
##           Date    State   Race
## 57  2017-11-07 Virginia  HD-20
## 59  2017-11-07 Virginia  HD-01
## 60  2017-11-07 Virginia  HD-02
## 61  2017-11-07 Virginia  HD-12
## 63  2017-11-07 Virginia  HD-34
## 64  2017-11-07 Virginia  HD-86
## 65  2017-11-07 Virginia  HD-25
## 67  2017-11-07 Virginia  HD-03
## 71  2017-11-07 Virginia  HD-33
## 73  2017-11-07 Virginia  HD-38
## 74  2017-11-07 Virginia  HD-72
## 75  2017-11-07 Virginia  HD-27
## 76  2017-11-07 Virginia  HD-42
## 77  2017-11-07 Virginia  HD-17
## 79  2017-11-07 Virginia  HD-93
## 80  2017-11-07 Virginia  HD-85
## 81  2017-11-07 Virginia  HD-62
## 84  2017-11-07 Virginia  HD-49
## 87  2017-11-07 Virginia  HD-81
## 88  2017-11-07 Virginia  HD-87
## 90  2017-11-07 Virginia  HD-28
## 91  2017-11-07 Virginia  HD-21
## 92  2017-11-07 Virginia  HD-56
## 94  2017-11-07 Virginia  HD-31
## 98  2017-11-07 Virginia  HD-55
## 99  2017-11-07 Virginia  HD-10
## 100 2017-11-07 Virginia  HD-32
## 101 2017-11-07 Virginia  HD-84
## 102 2017-11-07 Virginia  HD-26
## 103 2017-11-07 Virginia  HD-08
## 104 2017-11-07 Virginia  HD-09
## 112 2017-11-07 Virginia  HD-73
## 113 2017-11-07 Virginia  HD-51
## 114 2017-11-07 Virginia  HD-96
## 115 2017-11-07 Virginia  HD-65
## 116 2017-11-07 Virginia  HD-82
## 117 2017-11-07 Virginia  HD-91
## 119 2017-11-07 Virginia  HD-23
## 120 2017-11-07 Virginia  HD-59
## 121 2017-11-07 Virginia  HD-88
## 123 2017-11-07 Virginia  HD-29
## 124 2017-11-07 Virginia  HD-30
## 125 2017-11-07 Virginia  HD-98
## 128 2017-11-07 Virginia  HD-18
## 129 2017-11-07 Virginia  HD-66
## 131 2017-11-07 Virginia  HD-64
## 132 2017-11-07 Virginia  HD-50
## 134 2017-11-07 Virginia  HD-68
## 136 2017-11-07 Virginia  HD-07
## 138 2017-11-07 Virginia  HD-97
## 140 2017-11-07 Virginia  HD-94
## 141 2017-11-07 Virginia  HD-13
## 142 2017-11-07 Virginia  HD-54
## 144 2017-11-07 Virginia  HD-67
## 145 2017-11-07 Virginia  HD-83
## 146 2017-11-07 Virginia  HD-58
## 147 2017-11-07 Virginia  HD-40
## 149 2017-11-07 Virginia  HD-99
## 152 2017-11-07 Virginia HD-100
## 155 2017-11-07 Virginia  HD-60
elections[race_short_names == "State Senate standard election",c("Date","State","Race")]
##           Date      State  Race
## 36  2017-11-07 New Jersey SD-01
## 40  2017-11-07 New Jersey SD-33
## 41  2017-11-07 New Jersey SD-32
## 42  2017-11-07 New Jersey SD-03
## 44  2017-11-07 New Jersey SD-27
## 45  2017-11-07 New Jersey SD-36
## 50  2017-11-07 New Jersey SD-18
## 51  2017-11-07 New Jersey SD-31
## 52  2017-11-07 New Jersey SD-37
## 53  2017-11-07 New Jersey SD-20
## 54  2017-11-07 New Jersey SD-30
## 56  2017-11-07 New Jersey SD-34
## 58  2017-11-07 New Jersey SD-06
## 62  2017-11-07 New Jersey SD-29
## 66  2017-11-07 New Jersey SD-38
## 68  2017-11-07 New Jersey SD-07
## 69  2017-11-07 New Jersey SD-24
## 78  2017-11-07 New Jersey SD-13
## 82  2017-11-07 New Jersey SD-10
## 85  2017-11-07 New Jersey SD-05
## 86  2017-11-07 New Jersey SD-22
## 93  2017-11-07 New Jersey SD-35
## 95  2017-11-07 New Jersey SD-14
## 105 2017-11-07 New Jersey SD-11
## 109 2017-11-07 New Jersey SD-12
## 110 2017-11-07 New Jersey SD-15
## 111 2017-11-07 New Jersey SD-39
## 118 2017-11-07 New Jersey SD-17
## 122 2017-11-07 New Jersey SD-25
## 127 2017-11-07 New Jersey SD-26
## 133 2017-11-07 New Jersey SD-40
## 137 2017-11-07 New Jersey SD-09
## 143 2017-11-07 New Jersey SD-23
## 150 2017-11-07 New Jersey SD-08
## 156 2017-11-07 New Jersey SD-16
## 157 2017-11-07 New Jersey SD-21
## 158 2017-11-07 New Jersey SD-02
elections[race_short_names == "National House special election",c("Date","State","Race")]
##           Date          State        Race
## 2   2018-03-13   Pennsylvania     18th CD
## 96  2017-11-07           Utah      3rd CD
## 178 2017-06-20 South Carolina      5th CD
## 181 2017-06-20        Georgia      6th CD
## 184 2017-05-25        Montana At-Large CD
## 193 2017-04-11         Kansas      4th CD
## 194 2017-04-04     California     34th CD
elections[!(race_short_names %in% c("State House special election","State Senate special election","State House standard election","State Senate standard election","National House special election")),]
##           Date      State             Race Median.income Percent.bachelors
## 28  2017-12-12    Alabama      U.S. Senate         44758              24.0
## 32  2017-11-18  Louisiana        Treasurer         45652              23.0
## 72  2017-11-07   Virginia         Governor         66149              36.9
## 83  2017-11-07   Virginia Attorney General         66149              36.9
## 89  2017-11-07   Virginia     Lt. Governor         66149              36.9
## 108 2017-11-07 New Jersey         Governor         73702              37.5
##     Clinton.vs.Obama.margin Dem.margin.improvement
## 28                       -6                     31
## 32                       -2                     10
## 72                        1                      6
## 83                        1                      4
## 89                        1                      3
## 108                      -4                      2

Looks good! Next, let’s split this description into a few separate columns: state vs. national, special vs. standard, and position.

# National = U.S. House or U.S. Senate races; everything else is a state race.
state_vs_national <- rep("State",times=nrow(elections))
state_vs_national[race_short_names %in% c("National House special election","U.S. Senate")] <- "National"

# Only the New Jersey and Virginia races were regularly scheduled elections.
special_vs_standard <- rep("Special",times=nrow(elections))
special_vs_standard[elections$State %in% c("New Jersey","Virginia")] <- "Standard"

# Legislative races are "House" or "Senate"; executive races keep their office name.
positions <- rep("House",times=nrow(elections))
positions[grepl('Senate',race_short_names)] <- "Senate"
is_executive <- !grepl('House|Senate',race_short_names)
positions[is_executive] <- race_short_names[is_executive]

Add all of this to “elections”.

elections <- data.frame(Race.description = race_short_names,
    State.vs.national = state_vs_national,
    Special.vs.standard = special_vs_standard,
    Position = positions,
    elections,
    stringsAsFactors=FALSE)

Research questions

Following the analysis plan in this article (https://fivethirtyeight.com/features/be-skeptical-of-anyone-who-tells-you-they-know-how-democrats-can-win-in-november/), I will examine the correlation between presidential election performance in 2016 compared to 2012 (Clinton.vs.Obama.margin) and Democratic performance in 2017/2018 in the same districts, in both standard and special elections.

I will analyze standard and special elections both separately and together. Pooling them gives a nice large sample size, but special elections may have characteristics that differ from standard elections, so it is also worth analyzing them separately.

I will also check whether education and income (Percent.bachelors and Median.income) act as confounders in these results.
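The analysis plan above could be sketched as follows, assuming a data frame with the cleaned column names used in this proposal. To keep the block self-contained, it runs on simulated data (`sim`, built with an arbitrary seed and made-up distributions); none of its output says anything about the real elections.

```r
# SIMULATED data with the same column names as the cleaned elections table.
set.seed(606)
n <- 200
sim <- data.frame(
  Clinton.vs.Obama.margin = round(rnorm(n, mean = -2, sd = 10)),
  Median.income = round(rlnorm(n, meanlog = log(63000), sdlog = 0.3)),
  Percent.bachelors = pmin(pmax(rnorm(n, mean = 33, sd = 12), 5), 80),
  Special.vs.standard = sample(c("Special", "Standard"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
sim$Dem.margin.improvement <- 0.5 * sim$Clinton.vs.Obama.margin + rnorm(n, 8, 12)

# 1. Pearson correlation with all elections pooled.
cor.test(sim$Dem.margin.improvement, sim$Clinton.vs.Obama.margin)

# 2. The same correlation within special and standard elections separately.
by_type <- split(sim, sim$Special.vs.standard)
sapply(by_type, function(d) cor(d$Dem.margin.improvement, d$Clinton.vs.Obama.margin))

# 3. Adjust for possible confounders. lm() drops rows with missing
#    income or education by default (na.action = na.omit).
fit <- lm(Dem.margin.improvement ~ Clinton.vs.Obama.margin + Median.income +
            Percent.bachelors, data = sim)
summary(fit)$coefficients
```

If the Clinton.vs.Obama.margin coefficient in step 3 stays close to its unadjusted value, that would suggest income and education are not driving the relationship; a large change would point the other way.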

Cases

table(race_short_names)
## race_short_names
##                Attorney General                        Governor 
##                               1                               2 
##                    Lt. Governor National House special election 
##                               1                               7 
##    State House special election   State House standard election 
##                              61                              60 
##   State Senate special election  State Senate standard election 
##                              29                              37 
##                       Treasurer                     U.S. Senate 
##                               1                               1

Combining state house and state senate elections, there are 97 standard state legislative elections (60 + 37) and 90 special state legislative elections (61 + 29).

There are also 8 national legislative special elections (7 U.S. House and 1 U.S. Senate), 1 state-level special election for Treasurer, and 4 other standard elections (Attorney General, 2 Governor races, and Lt. Governor).

So we have a total of 99 special elections and 101 standard elections to look at.
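The tallies above can be double-checked with a few lines of arithmetic. The counts are copied from the table(race_short_names) output; in the real pipeline they could come from table() directly instead of being typed in.

```r
# Counts copied from table(race_short_names) above.
counts <- c("Attorney General" = 1, "Governor" = 2, "Lt. Governor" = 1,
            "National House special election" = 7,
            "State House special election" = 61,
            "State House standard election" = 60,
            "State Senate special election" = 29,
            "State Senate standard election" = 37,
            "Treasurer" = 1, "U.S. Senate" = 1)

standard_legislative <- sum(counts[c("State House standard election",
                                     "State Senate standard election")])
special_legislative <- sum(counts[c("State House special election",
                                    "State Senate special election")])
national_special <- sum(counts[c("National House special election", "U.S. Senate")])
other_standard <- sum(counts[c("Attorney General", "Governor", "Lt. Governor")])

total_special <- special_legislative + national_special + counts[["Treasurer"]]
total_standard <- standard_legislative + other_standard
c(special = total_special, standard = total_standard, all = sum(counts))
```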

Data collection and type of study

This is an observational study: the data are actual election results (people’s real voting behavior), not survey data or an experiment.

Response and explanatory variables

Explanatory = Presidential election performance in 2016 as compared to 2012 (Clinton.vs.Obama.margin). Also possibly income and education (Median.income and Percent.bachelors).

Response = Democratic performance in 2017/2018 election relative to expectation (Dem.margin.improvement)

These are all numeric variables. After looking at the summary statistics, I may also convert some of them to categorical variables if their distributions suggest it.
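If a categorical version turns out to be useful, one option (an assumption, not a decision made in this proposal) is to bin a variable at its tertiles with base R's quantile() and cut(). The ten margins below are made-up stand-ins for the real Clinton.vs.Obama.margin column.

```r
# Hypothetical stand-in values for Clinton.vs.Obama.margin.
margin <- c(-33, -8, -2, 0, 4, 35, -12, 6, -5, 10)

# Tertile break points: minimum, 1/3 quantile, 2/3 quantile, maximum.
breaks <- quantile(margin, probs = c(0, 1/3, 2/3, 1))

# include.lowest = TRUE keeps the minimum value inside the first bin.
margin_group <- cut(margin, breaks = breaks, include.lowest = TRUE,
                    labels = c("Lower third", "Middle third", "Upper third"))
table(margin_group)
```

Tertiles give roughly equal group sizes, which keeps each category usable; fixed cut points (for example, negative vs. positive margin) would be easier to interpret but could leave some groups small.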

Relevant summary statistics

Let’s make histograms of all variables of interest.

For now, I will not separate standard and special elections.

elections_long_for_hist <- gather(elections[,c("Median.income","Percent.bachelors","Clinton.vs.Obama.margin","Dem.margin.improvement")])

ggplot(elections_long_for_hist,
aes(value)) + geom_histogram(fill="lightgrey",col="black",bins=9) + facet_wrap(~key,scales="free") +
xlab("") +
ylab("Number of election districts or states")
## Warning: Removed 4 rows containing non-finite values (stat_bin).

Also run summary on each variable.

aggregate(value ~ key,FUN=summary,data=elections_long_for_hist)
##                       key   value.Min. value.1st Qu. value.Median
## 1 Clinton.vs.Obama.margin    -33.00000      -8.00000     -2.00000
## 2  Dem.margin.improvement    -37.00000       0.00000      5.00000
## 3           Median.income  20915.00000   50174.50000  63483.00000
## 4       Percent.bachelors      9.20000      23.85000     31.65000
##     value.Mean value.3rd Qu.   value.Max.
## 1     -1.83500       4.00000     35.00000
## 2      8.29000      15.25000     84.00000
## 3  67541.53535   79099.25000 177551.00000
## 4     33.86212      41.70000     74.90000

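The Democrats’ margin of improvement in 2017/2018 is strongly right-skewed: the mean (8.29) is above the median (5), and the maximum (84) lies far above the 3rd quartile (15.25). However, with 200 elections, the sampling distribution of the mean should be approximately normal, so we should still be able to make inferences about it.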

Median income and percent with a bachelor’s degree are also somewhat right-skewed (in both cases the mean is above the median), but again this should be fine with such a large sample size (198 districts with non-missing values).
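A quick way to quantify the skew and to check the large-sample argument without assuming normality is sketched below. The helpers `skewness` and `boot_mean_ci` are hypothetical (moment-based skewness and a percentile bootstrap for the mean), and they run here on SIMULATED right-skewed data rather than the election columns.

```r
# Moment-based sample skewness: g1 = m3 / m2^(3/2).
# Positive values indicate a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Percentile bootstrap confidence interval for the mean.
boot_mean_ci <- function(x, reps = 5000, level = 0.95) {
  x <- x[!is.na(x)]
  means <- replicate(reps, mean(sample(x, replace = TRUE)))
  alpha <- (1 - level) / 2
  quantile(means, c(alpha, 1 - alpha))
}

# SIMULATED right-skewed data (shifted exponential), n = 200.
set.seed(1)
y <- rexp(200, rate = 1 / 8) - 4
skewness(y)
mean(y) > median(y)
ci <- boot_mean_ci(y)
ci
```

Applied to Dem.margin.improvement, a bootstrap interval close to the usual t-interval would support the claim that the skew does not undermine inference on the mean at this sample size.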