For my final project, I will analyze election data from the period after the 2017 presidential inauguration (January 20, 2017) through March 27, 2018.
This data is from FiveThirtyEight.
See here:
https://github.com/fivethirtyeight/data/tree/master/special-elections
https://fivethirtyeight.com/features/be-skeptical-of-anyone-who-tells-you-they-know-how-democrats-can-win-in-november/
From FiveThirtyEight: This data includes “both state and federal special elections as well as regularly scheduled 2017 elections in New Jersey and Virginia (except for New Jersey General Assembly).”
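For each geography (district or state), the file contains the election date, the state, the race, the median household income, the percentage with a bachelor's degree or higher, Clinton's 2016 margin improvement over Obama's 2012 margin, and the 2017-2018 Democratic margin improvement over the partisan lean: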
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(stringr)
elections <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/special-elections/special-elections.csv",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE)
dim(elections)
## [1] 200 7
head(elections)
## Date State Race Median Household Income
## 1 3/27/18 Alabama HD-21 $65,548
## 2 3/13/18 Pennsylvania 18th CD $62,283
## 3 3/13/18 Tennessee SD-14 $48,252
## 4 3/6/18 Oklahoma HD-51 $57,202
## 5 2/27/18 Kentucky HD-89 $37,858
## 6 2/27/18 New Hampshire Belknap HD-03 $48,893
## % Bachelor's Degree or Higher Clinton Margin Improvement Over Obama
## 1 42.10% 6%
## 2 35.10% -3%
## 3 22.10% -8%
## 4 18.10% -8%
## 5 16.10% -5%
## 6 25.10% -13%
## 2017-2018 Dem Margin Improvement Over Partisan Lean
## 1 24%
## 2 22%
## 3 1%
## 4 21%
## 5 29%
## 6 19%
Let’s change the column names to something more R-friendly, convert Date to a date format, and convert the income and percentage columns to numbers.
colnames(elections)[4:ncol(elections)] <- c("Median.income",
"Percent.bachelors",
"Clinton.vs.Obama.margin",
"Dem.margin.improvement")
# The raw dates use two-digit years (e.g. "3/27/18"), so parse them with %y rather than %Y
elections$Date <- as.Date(elections$Date,format="%m/%d/%y")
head(elections$Date[order(elections$Date)])
## [1] "2017-01-31" "2017-02-14" "2017-02-25" "2017-02-28" "2017-02-28"
## [6] "2017-02-28"
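The raw dates use two-digit years, for example "3/27/18" in the first row. The base-R sketch below shows how `as.Date()` treats such a string under the four-digit `%Y` format and under the two-digit `%y` format. The example string comes from the data; nothing else here is from the original analysis.

```r
# Date string in the same m/d/yy form as the raw "Date" column
raw_date <- "3/27/18"

# %Y expects a four-digit year, so "18" is read literally as the year 18 AD
as_four_digit <- as.Date(raw_date, format = "%m/%d/%Y")

# %y expects a two-digit year; R maps 00-68 to 2000-2068 and 69-99 to 1969-1999
as_two_digit <- as.Date(raw_date, format = "%m/%d/%y")

as.POSIXlt(as_four_digit)$year + 1900  # calendar year stored by the %Y parse
as_two_digit
```

With `%y`, the dates already land in 2017-2018, so no separate year adjustment is needed.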
elections$Median.income <- str_replace_all(elections$Median.income,pattern='\\$|,',replacement='')
elections$Median.income <- as.numeric(trimws(elections$Median.income))
## Warning: NAs introduced by coercion
head(elections$Median.income[order(elections$Median.income)])
## [1] 20915 26265 30791 35910 36116 36200
elections$Percent.bachelors <- str_replace_all(elections$Percent.bachelors,pattern='\\%',replacement='')
elections$Percent.bachelors <- as.numeric(elections$Percent.bachelors)
## Warning: NAs introduced by coercion
head(elections$Percent.bachelors[order(elections$Percent.bachelors)])
## [1] 9.2 11.0 11.9 12.9 13.0 13.2
# Got a warning that NAs were introduced by coercion for median income and percent bachelor's. Let's check this out.
elections[is.na(elections$Median.income) | is.na(elections$Percent.bachelors),]
## Date State Race Median.income
## 168 2017-09-12 New Hampshire Belknap HD-09 NA
## 188 2017-05-23 New Hampshire Hillsborough HD-44 NA
## Percent.bachelors Clinton.vs.Obama.margin Dem.margin.improvement
## 168 NA -17% 26%
## 188 NA -6% -1%
# Good to know: for two elections, we don't have income or education data for the district.
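Base R's `complete.cases()` is a slightly more general way to find rows with any missing value, and `colSums(is.na(...))` counts missing values per column. The small data frame below uses three rows taken from the output above (HD-21, Belknap HD-09, SD-14). It is only an illustration, not the full data set.

```r
# Three rows copied from the output above; NA marks the missing district data
toy_rows <- data.frame(
  Race = c("HD-21", "Belknap HD-09", "SD-14"),
  Median.income = c(65548, NA, 48252),
  Percent.bachelors = c(42.1, NA, 22.1),
  stringsAsFactors = FALSE
)

# complete.cases() is TRUE only for rows with no NA in any column
incomplete_rows <- toy_rows[!complete.cases(toy_rows), ]
incomplete_rows

# Number of missing values in each column
colSums(is.na(toy_rows))
```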
elections$Clinton.vs.Obama.margin <- str_replace_all(elections$Clinton.vs.Obama.margin,pattern='\\%',replacement='')
elections$Clinton.vs.Obama.margin <- as.numeric(elections$Clinton.vs.Obama.margin)
head(elections$Clinton.vs.Obama.margin[order(elections$Clinton.vs.Obama.margin)])
## [1] -33 -25 -24 -23 -22 -21
tail(elections$Clinton.vs.Obama.margin[order(elections$Clinton.vs.Obama.margin)])
## [1] 21 21 22 22 23 35
elections$Dem.margin.improvement <- str_replace_all(elections$Dem.margin.improvement,pattern='\\%',replacement='')
elections$Dem.margin.improvement <- as.numeric(elections$Dem.margin.improvement)
head(elections$Dem.margin.improvement[order(elections$Dem.margin.improvement)])
## [1] -37 -30 -24 -23 -23 -20
tail(elections$Dem.margin.improvement[order(elections$Dem.margin.improvement)])
## [1] 42 46 47 48 61 84
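The same strip-then-convert pattern is repeated for all four columns. A single helper can do it in one place. `parse_number_string()` is a hypothetical name, not part of the original analysis. Like the original code, it lets `as.numeric()` warn when an entry cannot be converted. The example inputs are values from the `head()` output above.

```r
# Hypothetical helper: remove "$", "," and "%" and convert to numeric.
# Entries that are still not numbers after stripping become NA (with a warning).
parse_number_string <- function(x) {
  cleaned <- gsub("[$,%]", "", trimws(x))
  as.numeric(cleaned)
}

parse_number_string(c("$65,548", "42.10%", "-13%", "24%"))

# On the raw character columns, it could be applied to all four at once, e.g.:
# elections[4:7] <- lapply(elections[4:7], parse_number_string)
```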
Let’s also take a closer look at the “Race” column, for New Jersey and Virginia, and then for the remaining states.
new_jersey <- elections[elections$State == "New Jersey",]
new_jersey_minus_governor <- new_jersey[new_jersey$Race != "Governor",]
new_jersey_minus_governor$Race <- str_replace_all(new_jersey_minus_governor$Race,pattern='SD\\-',replacement='')
new_jersey_minus_governor$Race <- as.numeric(new_jersey_minus_governor$Race)
new_jersey_minus_governor$Race[order(new_jersey_minus_governor$Race)]
## [1] 1 2 3 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25
## [24] 26 27 29 30 31 32 33 34 35 36 37 38 39 40
setdiff(1:40,new_jersey_minus_governor$Race[order(new_jersey_minus_governor$Race)])
## [1] 4 19 28
New Jersey races include the governor’s race, as well as results from a number of districts. FiveThirtyEight's description says this data does NOT include General Assembly results, so based on the description here (https://ballotpedia.org/New_Jersey_State_Senate_elections,_2017), I am going to assume these are State Senate results.
Searching online, we also find that the three missing districts (4, 19, and 28) are ones where there was only one major-party candidate (https://ballotpedia.org/New_Jersey_State_Senate_District_4, https://ballotpedia.org/New_Jersey_State_Senate_District_19, https://ballotpedia.org/New_Jersey_State_Senate_District_28).
What about Virginia?
virginia <- elections[elections$State == "Virginia",]
virginia_minus_governor_etc <- virginia[virginia$Race != "Governor" & virginia$Race != "Attorney General" & virginia$Race != "Lt. Governor",]
virginia_minus_governor_etc$Race <- str_replace_all(virginia_minus_governor_etc$Race,pattern='HD\\-',replacement='')
virginia_minus_governor_etc$Race <- as.numeric(virginia_minus_governor_etc$Race)
virginia_minus_governor_etc$Race[order(virginia_minus_governor_etc$Race)]
## [1] 1 2 3 7 8 9 10 12 13 17 18 20 21 23 25 26 27
## [18] 28 29 30 31 32 33 34 38 40 42 49 50 51 54 55 56 58
## [35] 59 60 62 64 65 66 67 68 72 73 81 82 83 84 85 86 87
## [52] 88 91 93 94 96 97 98 99 100
setdiff(1:100,virginia_minus_governor_etc$Race)
## [1] 4 5 6 11 14 15 16 19 22 24 35 36 37 39 41 43 44 45 46 47 48 52 53
## [24] 57 61 63 69 70 71 74 75 76 77 78 79 80 89 90 92 95
As with New Jersey, we checked a few of the districts not included here and found that they were ones where there was only one major-party candidate (https://ballotpedia.org/Virginia_House_of_Delegates_District_4, https://ballotpedia.org/Virginia_House_of_Delegates_District_95).
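The New Jersey and Virginia checks follow the same steps: strip the district prefix, convert it to a number, and take `setdiff()` against the full range of districts. The hypothetical helper below (`missing_districts()` is not from the original analysis) wraps those steps. It is shown on a made-up five-district chamber.

```r
# Hypothetical helper: which district numbers in 1..n_districts have no row?
# `races` holds labels such as "SD-01"; `prefix` is a regex stripped before conversion.
missing_districts <- function(races, prefix, n_districts) {
  numbers <- as.numeric(sub(prefix, "", races))
  setdiff(seq_len(n_districts), numbers)
}

# Made-up input: a five-district chamber with districts 2 and 4 absent
missing_districts(c("SD-01", "SD-03", "SD-05"), prefix = "^SD-", n_districts = 5)
```

Applied to the raw Virginia labels, e.g. `missing_districts(virginia$Race[grepl("^HD-", virginia$Race)], "^HD-", 100)`, it should return the same districts as the `setdiff()` call above.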
Now, let’s look more at the other states.
elections_minus_VA_and_NJ <- elections[elections$State != "New Jersey" & elections$State != "Virginia",]
races_minus_VA_and_NJ <- elections_minus_VA_and_NJ$Race
grep('^SD|^HD',races_minus_VA_and_NJ,value=TRUE,perl=TRUE)
## [1] "HD-21" "SD-14" "HD-51"
## [4] "HD-89" "HD-120" "HD-49"
## [7] "HD-86" "SD-27" "HD-72"
## [10] "HD-175" "SD-54" "HD-23B"
## [13] "HD-144" "HD-97" "HD-129"
## [16] "HD-39" "HD-35" "SD-10"
## [19] "HD-06" "HD-99" "HD-111"
## [22] "SD-17" "SD-17" "HD-58"
## [25] "SD-03" "HD-133" "SD-37"
## [28] "SD-45" "HD-76" "HD-119"
## [31] "HD-151" "HD-113" "HD-109"
## [34] "SD-08" "HD-117" "HD-26"
## [37] "HD-56" "SD-26" "SD-39"
## [40] "HD-07, Position 1" "HD-23" "SD-07"
## [43] "HD-01" "SD-31" "HD-31, Position 2"
## [46] "SD-06" "SD-45" "HD-04"
## [49] "HD-31" "SD-40" "HD-116"
## [52] "HD-46" "SD-13" "HD-82"
## [55] "HD-50" "SD-28" "SD-16"
## [58] "SD-44" "HD-75" "HD-70"
## [61] "HD-48" "HD-95" "HD-84"
## [64] "SD-30" "SD-32" "HD-28"
## [67] "SD-02" "HD-68" "SD-32"
## [70] "HD-115" "SD-02" "SD-10"
## [73] "HD-32B" "HD-89"
grep('^SD|^HD',races_minus_VA_and_NJ,value=TRUE,perl=TRUE,invert=TRUE)
## [1] "18th CD" "Belknap HD-03"
## [3] "AD-58" "U.S. Senate"
## [5] "Worcester & Middlesex SD" "Treasurer"
## [7] "1st Berkshire HD" "Sullivan HD-01"
## [9] "Hillsborough HD-15" "3rd CD"
## [11] "3rd Essex HD" "Strafford HD-13"
## [13] "Bristol & Norfolk SD" "Rockingham HD-04"
## [15] "Belknap HD-09" "Grafton HD-09"
## [17] "Merrimack HD-18" "5th CD"
## [19] "6th CD" "At-Large CD"
## [21] "AD-09" "Carroll HD-06"
## [23] "Hillsborough HD-44" "4th CD"
## [25] "34th CD"
grep('^SD|^HD|CD',races_minus_VA_and_NJ,value=TRUE,perl=TRUE,invert=TRUE)
## [1] "Belknap HD-03" "AD-58"
## [3] "U.S. Senate" "Worcester & Middlesex SD"
## [5] "Treasurer" "1st Berkshire HD"
## [7] "Sullivan HD-01" "Hillsborough HD-15"
## [9] "3rd Essex HD" "Strafford HD-13"
## [11] "Bristol & Norfolk SD" "Rockingham HD-04"
## [13] "Belknap HD-09" "Grafton HD-09"
## [15] "Merrimack HD-18" "AD-09"
## [17] "Carroll HD-06" "Hillsborough HD-44"
grep('SD|HD|CD',races_minus_VA_and_NJ,value=TRUE,perl=TRUE,invert=TRUE)
## [1] "AD-58" "U.S. Senate" "Treasurer" "AD-09"
elections_minus_VA_and_NJ[grep('SD|HD|CD',races_minus_VA_and_NJ,perl=TRUE,invert=TRUE),]
## Date State Race Median.income Percent.bachelors
## 21 2018-01-16 Wisconsin AD-58 63758 29.0
## 28 2017-12-12 Alabama U.S. Senate 44758 24.0
## 32 2017-11-18 Louisiana Treasurer 45652 23.0
## 185 2017-05-23 New York AD-09 103602 36.9
## Clinton.vs.Obama.margin Dem.margin.improvement
## 21 -2 27
## 28 -6 31
## 32 -2 10
## 185 -11 38
Going to assume “HD” means (State) House District, “SD” means (State) Senate District, and “CD” means Congressional District.
Also looked at a few and designation still holds if name has “HD” or “SD” in name even if does not start with this string.
Most special elections fall into these categories, except one Louisiana special election for Treasurer and an Alabama special election for Senate.
Going to assume that “AD” in Wisconsin and New York is equivalent to “HD” in most other states.
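The naming assumptions above can be captured in a small vectorized classifier. `classify_chamber()` is a hypothetical helper, not part of the original analysis. It checks “SD” first, then “HD” or “AD”, then “CD”, and leaves any other label unchanged. The example labels are race names from the listings above.

```r
# Hypothetical classifier following the assumptions above:
# "SD" = state senate, "HD"/"AD" = state house/assembly, "CD" = congressional district.
classify_chamber <- function(race) {
  ifelse(grepl("SD", race), "State Senate",
    ifelse(grepl("HD|AD", race), "State House",
      ifelse(grepl("CD", race), "National House", race)))
}

classify_chamber(c("18th CD", "Worcester & Middlesex SD", "Belknap HD-03",
                   "AD-58", "Treasurer"))
```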
# Short description of each race: NJ and VA races are regularly scheduled ("standard"),
# all other races are special elections classified by chamber; unmatched races keep their name.
is_va_or_nj <- elections$State %in% c("Virginia","New Jersey")
race_short_names <- case_when(
elections$State == "New Jersey" & grepl('SD',elections$Race) ~ "State Senate standard election",
elections$State == "Virginia" & grepl('HD',elections$Race) ~ "State House standard election",
!is_va_or_nj & grepl('SD',elections$Race) ~ "State Senate special election",
!is_va_or_nj & grepl('HD|AD',elections$Race) ~ "State House special election",
!is_va_or_nj & grepl('CD',elections$Race) ~ "National House special election",
TRUE ~ elections$Race)
elections[race_short_names == "State House special election",c("Date","State","Race")]
## Date State Race
## 1 2018-03-27 Alabama HD-21
## 4 2018-03-06 Oklahoma HD-51
## 5 2018-02-27 Kentucky HD-89
## 6 2018-02-27 New Hampshire Belknap HD-03
## 7 2018-02-27 Connecticut HD-120
## 8 2018-02-20 Kentucky HD-49
## 9 2018-02-17 Louisiana HD-86
## 11 2018-02-13 Florida HD-72
## 12 2018-02-13 Georgia HD-175
## 14 2018-02-12 Minnesota HD-23B
## 15 2018-02-06 Missouri HD-144
## 16 2018-02-06 Missouri HD-97
## 17 2018-02-06 Missouri HD-129
## 18 2018-02-06 Missouri HD-39
## 19 2018-01-23 Pennsylvania HD-35
## 21 2018-01-16 Wisconsin AD-58
## 22 2018-01-16 Iowa HD-06
## 23 2018-01-16 South Carolina HD-99
## 24 2018-01-09 Georgia HD-111
## 27 2017-12-19 Florida HD-58
## 30 2017-12-05 Pennsylvania HD-133
## 35 2017-11-14 Oklahoma HD-76
## 37 2017-11-07 Georgia HD-119
## 38 2017-11-07 Missouri HD-151
## 39 2017-11-07 South Carolina HD-113
## 43 2017-11-07 Michigan HD-109
## 47 2017-11-07 Massachusetts 1st Berkshire HD
## 48 2017-11-07 Georgia HD-117
## 49 2017-11-07 New Hampshire Sullivan HD-01
## 55 2017-11-07 New Hampshire Hillsborough HD-15
## 70 2017-11-07 Georgia HD-26
## 97 2017-11-07 Maine HD-56
## 126 2017-11-07 Washington HD-07, Position 1
## 130 2017-11-07 Missouri HD-23
## 139 2017-11-07 Michigan HD-01
## 151 2017-11-07 Washington HD-31, Position 2
## 153 2017-11-07 Massachusetts 3rd Essex HD
## 160 2017-11-07 Georgia HD-04
## 161 2017-10-24 New Hampshire Strafford HD-13
## 163 2017-09-26 South Carolina HD-31
## 164 2017-09-26 New Hampshire Rockingham HD-04
## 166 2017-09-26 Florida HD-116
## 167 2017-09-12 Oklahoma HD-46
## 168 2017-09-12 New Hampshire Belknap HD-09
## 169 2017-09-05 New Hampshire Grafton HD-09
## 171 2017-08-08 Iowa HD-82
## 172 2017-08-08 Missouri HD-50
## 175 2017-07-18 New Hampshire Merrimack HD-18
## 177 2017-07-11 Oklahoma HD-75
## 179 2017-06-20 South Carolina HD-70
## 180 2017-06-20 South Carolina HD-48
## 182 2017-06-15 Tennessee HD-95
## 183 2017-05-30 South Carolina HD-84
## 185 2017-05-23 New York AD-09
## 186 2017-05-23 New Hampshire Carroll HD-06
## 188 2017-05-23 New Hampshire Hillsborough HD-44
## 190 2017-05-09 Oklahoma HD-28
## 192 2017-04-25 Connecticut HD-68
## 196 2017-02-28 Connecticut HD-115
## 199 2017-02-14 Minnesota HD-32B
## 200 2017-01-31 Iowa HD-89
elections[race_short_names == "State Senate special election",c("Date","State","Race")]
## Date State Race
## 3 2018-03-13 Tennessee SD-14
## 10 2018-02-13 Oklahoma SD-27
## 13 2018-02-12 Minnesota SD-54
## 20 2018-01-16 Wisconsin SD-10
## 25 2018-01-09 Georgia SD-17
## 26 2017-12-19 Tennessee SD-17
## 29 2017-12-12 Iowa SD-03
## 31 2017-12-05 Massachusetts Worcester & Middlesex SD
## 33 2017-11-14 Oklahoma SD-37
## 34 2017-11-14 Oklahoma SD-45
## 46 2017-11-07 Missouri SD-08
## 106 2017-11-07 New York SD-26
## 107 2017-11-07 Georgia SD-39
## 135 2017-11-07 Washington SD-07
## 148 2017-11-07 Washington SD-31
## 154 2017-11-07 Georgia SD-06
## 159 2017-11-07 Washington SD-45
## 162 2017-10-17 Massachusetts Bristol & Norfolk SD
## 165 2017-09-26 Florida SD-40
## 170 2017-08-22 Rhode Island SD-13
## 173 2017-08-08 Missouri SD-28
## 174 2017-07-25 New Hampshire SD-16
## 176 2017-07-11 Oklahoma SD-44
## 187 2017-05-23 New York SD-30
## 189 2017-05-16 Georgia SD-32
## 191 2017-04-29 Louisiana SD-02
## 195 2017-02-28 Connecticut SD-32
## 197 2017-02-28 Connecticut SD-02
## 198 2017-02-25 Delaware SD-10
elections[race_short_names == "State House standard election",c("Date","State","Race")]
## Date State Race
## 57 2017-11-07 Virginia HD-20
## 59 2017-11-07 Virginia HD-01
## 60 2017-11-07 Virginia HD-02
## 61 2017-11-07 Virginia HD-12
## 63 2017-11-07 Virginia HD-34
## 64 2017-11-07 Virginia HD-86
## 65 2017-11-07 Virginia HD-25
## 67 2017-11-07 Virginia HD-03
## 71 2017-11-07 Virginia HD-33
## 73 2017-11-07 Virginia HD-38
## 74 2017-11-07 Virginia HD-72
## 75 2017-11-07 Virginia HD-27
## 76 2017-11-07 Virginia HD-42
## 77 2017-11-07 Virginia HD-17
## 79 2017-11-07 Virginia HD-93
## 80 2017-11-07 Virginia HD-85
## 81 2017-11-07 Virginia HD-62
## 84 2017-11-07 Virginia HD-49
## 87 2017-11-07 Virginia HD-81
## 88 2017-11-07 Virginia HD-87
## 90 2017-11-07 Virginia HD-28
## 91 2017-11-07 Virginia HD-21
## 92 2017-11-07 Virginia HD-56
## 94 2017-11-07 Virginia HD-31
## 98 2017-11-07 Virginia HD-55
## 99 2017-11-07 Virginia HD-10
## 100 2017-11-07 Virginia HD-32
## 101 2017-11-07 Virginia HD-84
## 102 2017-11-07 Virginia HD-26
## 103 2017-11-07 Virginia HD-08
## 104 2017-11-07 Virginia HD-09
## 112 2017-11-07 Virginia HD-73
## 113 2017-11-07 Virginia HD-51
## 114 2017-11-07 Virginia HD-96
## 115 2017-11-07 Virginia HD-65
## 116 2017-11-07 Virginia HD-82
## 117 2017-11-07 Virginia HD-91
## 119 2017-11-07 Virginia HD-23
## 120 2017-11-07 Virginia HD-59
## 121 2017-11-07 Virginia HD-88
## 123 2017-11-07 Virginia HD-29
## 124 2017-11-07 Virginia HD-30
## 125 2017-11-07 Virginia HD-98
## 128 2017-11-07 Virginia HD-18
## 129 2017-11-07 Virginia HD-66
## 131 2017-11-07 Virginia HD-64
## 132 2017-11-07 Virginia HD-50
## 134 2017-11-07 Virginia HD-68
## 136 2017-11-07 Virginia HD-07
## 138 2017-11-07 Virginia HD-97
## 140 2017-11-07 Virginia HD-94
## 141 2017-11-07 Virginia HD-13
## 142 2017-11-07 Virginia HD-54
## 144 2017-11-07 Virginia HD-67
## 145 2017-11-07 Virginia HD-83
## 146 2017-11-07 Virginia HD-58
## 147 2017-11-07 Virginia HD-40
## 149 2017-11-07 Virginia HD-99
## 152 2017-11-07 Virginia HD-100
## 155 2017-11-07 Virginia HD-60
elections[race_short_names == "State Senate standard election",c("Date","State","Race")]
## Date State Race
## 36 2017-11-07 New Jersey SD-01
## 40 2017-11-07 New Jersey SD-33
## 41 2017-11-07 New Jersey SD-32
## 42 2017-11-07 New Jersey SD-03
## 44 2017-11-07 New Jersey SD-27
## 45 2017-11-07 New Jersey SD-36
## 50 2017-11-07 New Jersey SD-18
## 51 2017-11-07 New Jersey SD-31
## 52 2017-11-07 New Jersey SD-37
## 53 2017-11-07 New Jersey SD-20
## 54 2017-11-07 New Jersey SD-30
## 56 2017-11-07 New Jersey SD-34
## 58 2017-11-07 New Jersey SD-06
## 62 2017-11-07 New Jersey SD-29
## 66 2017-11-07 New Jersey SD-38
## 68 2017-11-07 New Jersey SD-07
## 69 2017-11-07 New Jersey SD-24
## 78 2017-11-07 New Jersey SD-13
## 82 2017-11-07 New Jersey SD-10
## 85 2017-11-07 New Jersey SD-05
## 86 2017-11-07 New Jersey SD-22
## 93 2017-11-07 New Jersey SD-35
## 95 2017-11-07 New Jersey SD-14
## 105 2017-11-07 New Jersey SD-11
## 109 2017-11-07 New Jersey SD-12
## 110 2017-11-07 New Jersey SD-15
## 111 2017-11-07 New Jersey SD-39
## 118 2017-11-07 New Jersey SD-17
## 122 2017-11-07 New Jersey SD-25
## 127 2017-11-07 New Jersey SD-26
## 133 2017-11-07 New Jersey SD-40
## 137 2017-11-07 New Jersey SD-09
## 143 2017-11-07 New Jersey SD-23
## 150 2017-11-07 New Jersey SD-08
## 156 2017-11-07 New Jersey SD-16
## 157 2017-11-07 New Jersey SD-21
## 158 2017-11-07 New Jersey SD-02
elections[race_short_names == "National House special election",c("Date","State","Race")]
## Date State Race
## 2 2018-03-13 Pennsylvania 18th CD
## 96 2017-11-07 Utah 3rd CD
## 178 2017-06-20 South Carolina 5th CD
## 181 2017-06-20 Georgia 6th CD
## 184 2017-05-25 Montana At-Large CD
## 193 2017-04-11 Kansas 4th CD
## 194 2017-04-04 California 34th CD
elections[!(race_short_names %in% c("State House special election","State Senate special election","State House standard election","State Senate standard election","National House special election")),]
## Date State Race Median.income Percent.bachelors
## 28 2017-12-12 Alabama U.S. Senate 44758 24.0
## 32 2017-11-18 Louisiana Treasurer 45652 23.0
## 72 2017-11-07 Virginia Governor 66149 36.9
## 83 2017-11-07 Virginia Attorney General 66149 36.9
## 89 2017-11-07 Virginia Lt. Governor 66149 36.9
## 108 2017-11-07 New Jersey Governor 73702 37.5
## Clinton.vs.Obama.margin Dem.margin.improvement
## 28 -6 31
## 32 -2 10
## 72 1 6
## 83 1 4
## 89 1 3
## 108 -4 2
Looks good! However, let’s also make a few separate columns to describe the elections: state vs. national, special vs. standard, and position.
state_vs_national <- rep("State",times=nrow(elections))
state_vs_national[race_short_names == "National House special election" | race_short_names == "U.S. Senate"] <- "National"
special_vs_standard <- rep("Special",times=nrow(elections))
special_vs_standard[elections$State == "New Jersey" | elections$State == "Virginia"] <- "Standard"
positions <- rep("House",times=nrow(elections))
positions[grep('Senate',race_short_names)] <- "Senate"
positions[grep('House|Senate',race_short_names,invert=TRUE,perl=TRUE)] <- race_short_names[grep('House|Senate',race_short_names,invert=TRUE,perl=TRUE)]
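The three vectors above can also be built with vectorized `ifelse()` calls. This sketch runs on a short made-up set of descriptions and states, and it uses new `toy_*` names so it does not overwrite the real vectors.

```r
# Made-up inputs in the same form as race_short_names and elections$State
toy_desc  <- c("State House special election", "U.S. Senate",
               "State Senate standard election", "Governor")
toy_state <- c("Iowa", "Alabama", "New Jersey", "Virginia")

toy_scope <- ifelse(toy_desc %in% c("National House special election", "U.S. Senate"),
                    "National", "State")
toy_type <- ifelse(toy_state %in% c("New Jersey", "Virginia"), "Standard", "Special")

# Senate and House races collapse to the chamber; other offices keep their own name
toy_position <- ifelse(grepl("Senate", toy_desc), "Senate",
                       ifelse(grepl("House", toy_desc), "House", toy_desc))

data.frame(toy_desc, toy_scope, toy_type, toy_position)
```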
Add all this to “elections” and save.
elections <- data.frame(Race.description = race_short_names,
State.vs.national = state_vs_national,
Special.vs.standard = special_vs_standard,
Position = positions,
elections,
stringsAsFactors=FALSE)
Following the analysis plan in this article (https://fivethirtyeight.com/features/be-skeptical-of-anyone-who-tells-you-they-know-how-democrats-can-win-in-november/), I will examine the correlation between presidential election performance in 2016 compared to 2012 (Clinton.vs.Obama.margin) and Democratic performance in 2017/2018 in those districts, in either standard or special elections.
I will try analyzing standard and special elections both separately and together. Pooling them gives a nice large sample size, but special elections may also have characteristics that differ from standard elections, so it would also be good to analyze them separately.
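The following sketch computes a pooled correlation and then one correlation per election type using `split()` and `sapply()`. The six rows are made up and chosen so that the two groups trend in opposite directions. This illustrates why the separate analysis is worth running: a pooled correlation can hide group-level relationships. It says nothing about the real data.

```r
# Made-up toy data: y rises with x for "Special" and falls with x for "Standard"
toy_corr <- data.frame(
  Special.vs.standard = rep(c("Special", "Standard"), each = 3),
  Clinton.vs.Obama.margin = c(-10, 0, 10, -10, 0, 10),
  Dem.margin.improvement = c(0, 5, 10, 10, 5, 0),
  stringsAsFactors = FALSE
)

# Pooled Pearson correlation across all rows
pooled <- cor(toy_corr$Clinton.vs.Obama.margin, toy_corr$Dem.margin.improvement)

# One correlation per election type; complete.obs drops rows with NA
by_type <- sapply(split(toy_corr, toy_corr$Special.vs.standard), function(d)
  cor(d$Clinton.vs.Obama.margin, d$Dem.margin.improvement, use = "complete.obs"))

pooled
by_type
```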
I will also check findings against possible confounding influences of education and income on the results.
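One common way to check for confounding is to compare the slope from a simple regression with the slope from a multiple regression that also includes income and education. The sketch below uses simulated data (not the FiveThirtyEight data) in which education affects both variables, with a known true slope of 0.5. All variable names and coefficients are assumptions chosen for illustration.

```r
set.seed(42)
n <- 200
# Simulated (not real) district-level covariates
percent_bachelors <- runif(n, 10, 70)
median_income <- 30000 + 800 * percent_bachelors + rnorm(n, sd = 5000)
clinton_vs_obama <- -20 + 0.6 * percent_bachelors + rnorm(n, sd = 5)
# Simulated response: true slope on clinton_vs_obama is 0.5
dem_improvement <- 2 + 0.5 * clinton_vs_obama + 0.1 * percent_bachelors + rnorm(n, sd = 1)

sim <- data.frame(dem_improvement, clinton_vs_obama, median_income, percent_bachelors)

# Slope without and with the possible confounders
unadjusted <- coef(lm(dem_improvement ~ clinton_vs_obama, data = sim))[["clinton_vs_obama"]]
adjusted <- coef(lm(dem_improvement ~ clinton_vs_obama + median_income + percent_bachelors,
                    data = sim))[["clinton_vs_obama"]]
c(unadjusted = unadjusted, adjusted = adjusted)
```

In this simulation the unadjusted slope absorbs part of the education effect, while the adjusted slope should land close to the true 0.5.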
table(race_short_names)
## race_short_names
## Attorney General Governor
## 1 2
## Lt. Governor National House special election
## 1 7
## State House special election State House standard election
## 61 60
## State Senate special election State Senate standard election
## 29 37
## Treasurer U.S. Senate
## 1 1
If we roll together state house and senate elections, there are 97 standard state legislative elections to look at, and 90 special state legislative elections.
There are also 8 national legislative special elections, 1 state-level special election for treasurer, and 4 other standard elections (for AG/Governor/Lt. Governor).
So we have a total of 99 special elections and 101 standard elections to look at.
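The totals above can be checked directly from the `table(race_short_names)` counts. The counts are copied from that output. The grouping into special and standard elections follows the text.

```r
# Counts copied from the table(race_short_names) output above
counts <- c("Attorney General" = 1, "Governor" = 2, "Lt. Governor" = 1,
            "National House special election" = 7,
            "State House special election" = 61, "State House standard election" = 60,
            "State Senate special election" = 29, "State Senate standard election" = 37,
            "Treasurer" = 1, "U.S. Senate" = 1)

special_types  <- c("National House special election", "State House special election",
                    "State Senate special election", "Treasurer", "U.S. Senate")
standard_types <- setdiff(names(counts), special_types)

sum(counts[special_types])   # 99
sum(counts[standard_types])  # 101
sum(counts)                  # 200
```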
These are observational data, based on people’s actual voting behavior rather than on survey responses.
Explanatory = Presidential election performance in 2016 as compared to 2012 (Clinton.vs.Obama.margin). Also possibly income and education (Median.income and Percent.bachelors).
Response = Democratic performance in 2017/2018 election relative to expectation (Dem.margin.improvement)
These are all numeric variables. However, after looking at the summary statistics, I could also convert them to categorical variables if the distributions suggest it.
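If a numeric variable is later converted to categories, one option is to cut it at its terciles with `quantile()` and `cut()`. The values below are made up. Note that `cut()` stops with an error if ties make the quantile breaks non-unique, so real data may need adjusted breaks.

```r
# Made-up values standing in for a numeric variable such as Clinton.vs.Obama.margin
x <- c(-9, -6, -3, -1, 0, 2, 4, 7, 12)

# Tercile breaks: minimum, 1/3 and 2/3 quantiles, maximum
breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))

# include.lowest = TRUE keeps the minimum value in the first bin
x_cat <- cut(x, breaks = breaks, include.lowest = TRUE,
             labels = c("low", "middle", "high"))
table(x_cat)
```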
Let’s make histograms of all variables of interest.
For now, do not separate standard and special elections.
elections_long_for_hist <- gather(elections[,c("Median.income","Percent.bachelors","Clinton.vs.Obama.margin","Dem.margin.improvement")])
ggplot(elections_long_for_hist,
aes(value)) + geom_histogram(fill="lightgrey",col="black",bins=9) + facet_wrap(~key,scales="free") +
xlab("") +
ylab("Number of election districts or states")
## Warning: Removed 4 rows containing non-finite values (stat_bin).
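The warning about 4 removed rows should match the two districts with missing income and education data found earlier (2 + 2 missing values). The sketch below counts missing values per variable in long-format data like that produced by `gather()`. Its rows reuse values from the earlier output (HD-21, Belknap HD-09, SD-14) and are only illustrative.

```r
# Long-format rows (key/value) built from values shown earlier in the output
long <- data.frame(
  key = rep(c("Median.income", "Percent.bachelors", "Dem.margin.improvement"), each = 3),
  value = c(65548, NA, 48252, 42.1, NA, 22.1, 24, 26, 1),
  stringsAsFactors = FALSE
)

# Missing values per variable; ggplot2 drops these rows and warns about them
na_counts <- tapply(is.na(long$value), long$key, sum)
na_counts
```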
Also run summary on each variable.
aggregate(value ~ key,FUN=summary,data=elections_long_for_hist)
## key value.Min. value.1st Qu. value.Median
## 1 Clinton.vs.Obama.margin -33.00000 -8.00000 -2.00000
## 2 Dem.margin.improvement -37.00000 0.00000 5.00000
## 3 Median.income 20915.00000 50174.50000 63483.00000
## 4 Percent.bachelors 9.20000 23.85000 31.65000
## value.Mean value.3rd Qu. value.Max.
## 1 -1.83500 4.00000 35.00000
## 2 8.29000 15.25000 84.00000
## 3 67541.53535 79099.25000 177551.00000
## 4 33.86212 41.70000 74.90000
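We find that the Democrats’ margin of improvement in 2017/2018 is very right-skewed (mean 8.29 vs. median 5, with a maximum of 84). However, we should still be able to make inferences about means given the large sample size (n = 200), since the sampling distribution of the mean should then be approximately normal.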
Median income and the percentage with a bachelor’s degree are also somewhat right-skewed (their means exceed their medians), but again this should be fine with such a large sample size.
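Comparing the mean with the median, as above, is a quick indicator of skew. A moment-based skewness coefficient puts a number on it. `sample_skewness()` is a hypothetical helper without small-sample bias correction. It could be applied to a column such as `elections$Dem.margin.improvement`, but no result for the real data is claimed here.

```r
# Hypothetical helper: moment-based sample skewness (no bias correction), NAs dropped
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3/2)
}

sample_skewness(c(1, 2, 3))      # symmetric: 0
sample_skewness(c(0, 0, 0, 10))  # long right tail: positive
```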