Load Packages:
suppressWarnings(suppressMessages(library(operators)))
suppressWarnings(suppressMessages(library(plyr)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(reshape2)))
suppressWarnings(suppressMessages(library(RCurl)))
suppressWarnings(suppressMessages(library(knitr)))
The data source for congressional results was is Time Magazine’s GitHub Repo. I’m grateful for their work in tidying the data and providing results for each congressional election from 2004-2012 as both as json and csv file. The link I used for the 2012 data below can be found \(\href{https://github.com/TimeMagazine/congressional-election-results.git}{here}\)
First, I loaded and cleanded the data. The file contains records of senate elections, marked as “S” in the district field, and special elections. Special elections are indicated by additional text in the district field. We will excise this info for our purposes. Those congressional districts marked “00” are at-large districts for states with only one seat.
url <- "https://raw.githubusercontent.com/TimeMagazine/congressional-election-results/master/data/results_2012.csv"
cd2012 <- getURL(url)
cd2012 <- read.csv(textConnection(cd2012))
#cd2012 <- read.csv("results_2012.csv",header=T)
cd2012$votes <- as.numeric(as.character(cd2012$votes))
## Warning: NAs introduced by coercion
cd2012 <- subset(cd2012,district != "S")
cd2012$district[cd2012$district == "01 - FULL TERM"] <- "01"
cd2012$district[cd2012$district == "04 - FULL TERM"] <- "04"
cd2012$district[cd2012$district == "10 - FULL TERM"] <- "10"
cd2012$district[cd2012$district == "11 - FULL TERM"] <- "11"
kable(subset(cd2012, state=="CA" & district=="37"))
| id | year | name | state | district | votes | parties | |
|---|---|---|---|---|---|---|---|
| 138 | 2012_CA_37 | 2012 | Bass, Karen R. | CA | 37 | 207039 | Democratic |
| 139 | 2012_CA_37 | 2012 | Osborne, Morgan | CA | 37 | 32541 | Write-In,Republican,Republican |
By examining the records for the California 37th, we see that some major parties are not simply listed “Democratic” or “Republican”, but some local variation or party alliance. To simplify the results, we will lump all of the Republican and Democratic groups into their umbrella label. This will still leave a significant amount of independent and third parties in our dataset that are aligned with neither of the two.
d <- c("Democratic,Working Families","Working Families,Democratic",
"Democratic,Progressive,Working Families","Democratic-Farmer Labor",
"Democratic,Working Families,Independence","Democratic,Independence,Working Families",
"Democratic-Nonpartisan League","Write-In,Democratic,Democratic","Democratic")
r <- c("Republican,Tax Revolt,Conservative,Independence","Republican,Tax Revolt,Conservative",
"Republican,Tax Revolt","Republican,Conservative","Conservative,Republican",
"Republican,Conservative,Libertarian","Republican,Conservative,Independence",
"Independence,Republican","Independent Party,Republican", "Republican,Constitution",
"Write-In,Republican,Republican","Conservative,Independence,Republican","Republican")
for (i in 1:length(cd2012$parties)){
if (cd2012$parties[i] %in% d){
cd2012$parties[i] <- "Democratic"
}
else if (cd2012$parties[i] %in% r){
cd2012$parties[i] <- "Republican"
}
}
kable(subset(cd2012, state=="CA" & district=="37"))
| id | year | name | state | district | votes | parties | |
|---|---|---|---|---|---|---|---|
| 138 | 2012_CA_37 | 2012 | Bass, Karen R. | CA | 37 | 207039 | Democratic |
| 139 | 2012_CA_37 | 2012 | Osborne, Morgan | CA | 37 | 32541 | Republican |
The replacements worked! Now we will perform a a number of operations and aggregations to return useful dataframes.
# calc total vote by state
totalVoteByState <- aggregate(votes ~ state, cd2012, sum)
# calc total vote by district
totalVoteByDistrict <- aggregate(votes ~ state + district, cd2012, sum)
totalVoteByDistrict <- totalVoteByDistrict[order(totalVoteByDistrict$state,totalVoteByDistrict$district),]
# add threshold of simple majority based on each district's vote total
totalVoteByDistrict$simpleMajority <- round(totalVoteByDistrict$votes/2 + 1,0)
kable(head(totalVoteByState,10))
| state | votes |
|---|---|
| AK | 288840 |
| AL | 1927122 |
| AR | 1038054 |
| AZ | 2173317 |
| CA | 12204357 |
| CO | 2450839 |
| CT | 1466511 |
| DE | 388059 |
| FL | 7513534 |
| GA | 3553587 |
kable(head(totalVoteByDistrict,10))
| state | district | votes | simpleMajority | |
|---|---|---|---|---|
| 1 | AK | 00 | 288840 | 144421 |
| 8 | AL | 01 | 196374 | 98188 |
| 51 | AL | 02 | 283683 | 141842 |
| 94 | AL | 03 | 273447 | 136724 |
| 132 | AL | 04 | 268777 | 134390 |
| 167 | AL | 05 | 290957 | 145480 |
| 196 | AL | 06 | 307529 | 153766 |
| 222 | AL | 07 | 306355 | 153178 |
| 9 | AR | 01 | 246843 | 123422 |
| 52 | AR | 02 | 286598 | 143300 |
Next, I created a dataframe for total votes for each party in each district. I added a column for each party’s percentage of the vote in the district as well as a total of the number of votes “wasted” for each party. Our critera for wasted votes are:
districtParty <- aggregate(votes ~ state + district + parties, cd2012, sum)
districtParty <- districtParty[order(districtParty$state,districtParty$district),]
districtParty <- join(districtParty,totalVoteByDistrict,by=c("state","district"),type="left")
colnames(districtParty)[5] <- "totalVote"
districtParty <- districtParty[,c(1,2,5,6,4,3)]
districtParty$pctVote <- districtParty$votes/districtParty$totalVote
districtParty$wastedVote <- 0
for (i in 1:nrow(districtParty)){
if (districtParty$votes[i] < districtParty$simpleMajority[i]){
districtParty$wastedVote[i] <- districtParty$votes[i]
}
else if (districtParty$votes[i] >= districtParty$simpleMajority[i]){
districtParty$wastedVote[i] <- districtParty$votes[i] - districtParty$simpleMajority[i]
}
}
kable(head(districtParty,10))
| state | district | totalVote | simpleMajority | votes | parties | pctVote | wastedVote |
|---|---|---|---|---|---|---|---|
| AK | 00 | 288840 | 144421 | 82927 | Democratic | 0.2871036 | 82927 |
| AK | 00 | 288840 | 144421 | 15028 | Libertarian | 0.0520288 | 15028 |
| AK | 00 | 288840 | 144421 | 5589 | Non-Affiliated | 0.0193498 | 5589 |
| AK | 00 | 288840 | 144421 | 185296 | Republican | 0.6415178 | 40875 |
| AL | 01 | 196374 | 98188 | 196374 | Republican | 1.0000000 | 98186 |
| AL | 02 | 283683 | 141842 | 103092 | Democratic | 0.3634056 | 103092 |
| AL | 02 | 283683 | 141842 | 180591 | Republican | 0.6365944 | 38749 |
| AL | 03 | 273447 | 136724 | 98141 | Democratic | 0.3589032 | 98141 |
| AL | 03 | 273447 | 136724 | 175306 | Republican | 0.6410968 | 38582 |
| AL | 04 | 268777 | 134390 | 69706 | Democratic | 0.2593451 | 69706 |
Next, I created a column to indicate which party won in each district. To do so, I iterated over every district to find which party had the highest percentage of the vote. Winners were marked with “1” in the “win” column so that we could later aggregate results by party.
allStates <- unique(districtParty$state)
allDistricts <- unique(districtParty$district)
districtParty$win <- 0 #create column to indicate wins
for (i in 1:length(allDistricts)){
for (j in 1:length(allStates)){
row_to_find <- data.frame(state=allStates[j], district=allDistricts[i])
if(nrow(merge(row_to_find,districtParty))>0){
subFrame <- subset(districtParty,state==allStates[j] & district==allDistricts[i])
w <- which.max(subFrame$pctVote)
districtParty[districtParty$state==allStates[j] &
districtParty$district==allDistricts[i],]$win[w] <- 1
}
}
}
Calculate total seats won by state and overall:
seatsWon <- aggregate(win ~ parties,districtParty,sum)
seatsWon <- subset(seatsWon,win!=0)
seatsWon
## parties win
## 12 Democratic 199
## 39 Republican 234
seatsWonByState <- aggregate(win ~ state + parties,districtParty,sum)
seatsWonByState <- subset(seatsWonByState,win!=0)
seatsWonByState <- seatsWonByState[order(seatsWonByState$state,-seatsWonByState$win),]
kable(head(seatsWonByState,10))
| state | parties | win | |
|---|---|---|---|
| 156 | AK | Republican | 1 |
| 157 | AL | Republican | 6 |
| 19 | AL | Democratic | 1 |
| 158 | AR | Republican | 4 |
| 21 | AZ | Democratic | 5 |
| 159 | AZ | Republican | 4 |
| 22 | CA | Democratic | 38 |
| 160 | CA | Republican | 15 |
| 161 | CO | Republican | 4 |
| 23 | CO | Democratic | 3 |
A quick detour to get a rough comparison of the total votes and wasted votes by party:
totalWastedVoteByParty <- aggregate(wastedVote ~ parties, districtParty, sum)
totalWastedVoteByParty <- totalWastedVoteByParty[order(-totalWastedVoteByParty$wastedVote),]
totalVoteByParty <- aggregate(votes ~ parties, districtParty, sum)
totalVoteByParty <- totalVoteByParty[order(-totalVoteByParty$votes),]
kable(head(totalWastedVoteByParty,6))
| parties | wastedVote | |
|---|---|---|
| 12 | Democratic | 33763208 |
| 39 | Republican | 25146263 |
| 21 | Libertarian | 1302730 |
| 16 | Independent | 485517 |
| 25 | No Party Affiliation | 374428 |
| 14 | Green | 342912 |
kable(head(totalVoteByParty,6))
| parties | votes | |
|---|---|---|
| 12 | Democratic | 59637698 |
| 39 | Republican | 58265212 |
| 21 | Libertarian | 1302730 |
| 16 | Independent | 485517 |
| 25 | No Party Affiliation | 374428 |
| 14 | Green | 342912 |
Phenomenal! By looking at the at the aggregate, we see that Democrats and Republicans received roughly the same number of votes nationally. Although Democrats received 1,372,486 more votes, Republicans won a 35 seat majority in the House of Representatives. We also see that there are far more wasted votes for Democrats - 8,616,945 more to be precise - just about a third more wasted votes than for Republicans.
Now we will start to create our misapportionment metric by diving into the data for wasted votes for each party by state.
numDistricts <- data.frame(table(totalVoteByDistrict$state))
colnames(numDistricts) <- c("state","number")
numDistricts[9,2] <- 27 # correct, missing two from redistricting
sum(numDistricts$number) # check all 435 are there
## [1] 435
totalWastedVoteByPartyByState <- aggregate(wastedVote ~ state + parties, districtParty, sum)
totalVoteByPartyByState <- aggregate(votes ~ state + parties, districtParty, sum)
totalVoteByPartyByState <- totalVoteByPartyByState[order(totalVoteByPartyByState$state,-totalVoteByPartyByState$votes),]
totalWastedVoteByPartyByState <- totalWastedVoteByPartyByState[order(totalWastedVoteByPartyByState$state,-totalWastedVoteByPartyByState$wastedVote),]
kable(head(totalVoteByPartyByState,10))
| state | parties | votes | |
|---|---|---|---|
| 156 | AK | Republican | 185296 |
| 18 | AK | Democratic | 82927 |
| 104 | AK | Libertarian | 15028 |
| 146 | AK | Non-Affiliated | 5589 |
| 157 | AL | Republican | 1233624 |
| 19 | AL | Democratic | 693498 |
| 158 | AR | Republican | 637591 |
| 20 | AR | Democratic | 304770 |
| 70 | AR | Green | 57706 |
| 105 | AR | Libertarian | 37987 |
kable(head(totalWastedVoteByPartyByState,10))
| state | parties | wastedVote | |
|---|---|---|---|
| 18 | AK | Democratic | 82927 |
| 156 | AK | Republican | 40875 |
| 104 | AK | Libertarian | 15028 |
| 146 | AK | Non-Affiliated | 5589 |
| 19 | AL | Democratic | 540320 |
| 157 | AL | Republican | 423234 |
| 20 | AR | Democratic | 304770 |
| 158 | AR | Republican | 118560 |
| 70 | AR | Green | 57706 |
| 105 | AR | Libertarian | 37987 |
Our first attempt will take a simplified look at the results by ignoring third parties. We will subset the dataframe to only include Democrats and Republicans and then join it with the number of districts and total vote for each state. This will let us create new columns for the efficiency gap and the number of seats misapportioned.
The efficiency gap is calculated by the absolute value in the difference of wasted votes for each party, divided by the total number of votes in the state. The number of misapportioned seats is a product of the efficiency gap and the number of seats/districts in the state.
I also created columns for the advantage given to either party. This advantage is simply the seat misapportionment in their favor. The seat advantage for each state is determined by the party with fewer wasted votes.
twoPartyModel <- subset(totalWastedVoteByPartyByState,parties=="Democratic" | totalWastedVoteByPartyByState$parties=="Republican")
twoPartyModel <- dcast(twoPartyModel,state ~ parties,value.var="wastedVote")
# bind number districts and total vote
twoPartyModel <- join(twoPartyModel,numDistricts,by="state",type="left")
twoPartyModel <- join(twoPartyModel,totalVoteByState,by="state",type="left")
twoPartyModel$efficiency.gap <- abs(twoPartyModel$Democratic-twoPartyModel$Republican)/twoPartyModel$votes
twoPartyModel$seats.misapportioned <- twoPartyModel$efficiency.gap*twoPartyModel$number
colnames(twoPartyModel)[2:5] <- c("w.votes.D","w.votes.R","seats","total.votes")
twoPartyModel$adv.R <- 0
twoPartyModel$adv.D <- 0
for (i in 1:nrow(twoPartyModel)){
if (twoPartyModel$w.votes.D[i] < twoPartyModel$w.votes.R[i]){
twoPartyModel$adv.D[i] <- twoPartyModel$seats.misapportioned[i]
}
else if (twoPartyModel$w.votes.R[i] < twoPartyModel$w.votes.D[i]){
twoPartyModel$adv.R[i] <- twoPartyModel$seats.misapportioned[i]
}
}
sum(twoPartyModel$adv.R)
## [1] 37.18521
sum(twoPartyModel$adv.D)
## [1] 8.228584
By summing the total advantages for each party, we see that the Republicans have roughly a 29 seat advantage based on the efficiency gap. This measure is remarkably close to the excess number of seats Republicans hold in proportion to their share of the vote.
# Republican Seats in proportional system
(58265212/sum(totalVoteByParty$votes))*435
## [1] 208.982
# Libertarian Seats in proportional system
(1302730/sum(totalVoteByParty$votes))*435
## [1] 4.672549
We calculated above that Republicans won 234 seats in the 2012 election, but we see here that in a strictly proportional system they would have won 209. Of course, seats in the House of Representatives aren’t meant to be proportionally allocated, but here we may be able to disprove the notion that the Republican advantage is the product of political advantage in many smaller, less-populated districts. Our measure of the efficiency gap shows that they hold 29 seats more than they should. This demonstrates how perfect political symmetry - parity in wasted votes - differs from strict proportionality. We demonstrate that Republicans hold 29 seats more than they should, which would actually put them at 4 seats fewer than a strictly proportional allotmment. We see here that symmetry in wasted votes does not manifest strictly proportional outcomes. Some symmetric elections may benefit one of the two major parties at the cost of the other, but it appears in this case that the Republicans picked up most of the five seats Liberatarians would have won.
Now we will add some metrics to our two-party model to see how a symmetric election may have turned out.
twoPartyStateTotals <- subset(totalVoteByPartyByState,parties=="Democratic" | totalVoteByPartyByState$parties=="Republican")
twoPartyStateTotals <- dcast(twoPartyStateTotals,state ~ parties,value.var="votes")
colnames(twoPartyStateTotals)[2:3] <- c("d.votes","r.votes")
twoPartyModel <- join(twoPartyModel,twoPartyStateTotals,by="state",type="left")
# calculate projected seats
twoPartyModel$proj.D.seats <- (twoPartyModel$d.votes/twoPartyModel$total.votes)*twoPartyModel$seats
twoPartyModel$proj.R.seats <- (twoPartyModel$r.votes/twoPartyModel$total.votes)*twoPartyModel$seats
display <- twoPartyModel
#display[,2:9] <- round(display[,2:9],2)
kable(display[,c(1,4,7,12,13)])
| state | seats | seats.misapportioned | proj.D.seats | proj.R.seats |
|---|---|---|---|---|
| AK | 1 | 0.1455893 | 0.2871036 | 0.6415178 |
| AL | 7 | 0.4252985 | 2.5190341 | 4.4809659 |
| AR | 4 | 0.7175349 | 1.1743898 | 2.4568703 |
| AZ | 9 | 0.2580921 | 3.9216304 | 4.6863697 |
| CA | 53 | 1.2802878 | 32.1043754 | 19.6725347 |
| CO | 7 | 0.3431882 | 3.0859546 | 3.2668698 |
| CT | 5 | 0.9623896 | 3.2433476 | 1.7057151 |
| DE | 1 | 0.1903164 | 0.6440593 | 0.3343744 |
| FL | 27 | 2.9047930 | 12.1906488 | 13.7506657 |
| GA | 14 | 0.9293826 | 5.7080820 | 8.2894754 |
| HI | 2 | 0.3019603 | 1.3490258 | 0.6509742 |
| IA | 4 | 0.1096082 | 2.0121201 | 1.8925944 |
| ID | 2 | 0.3749705 | 0.6558284 | 1.2808642 |
| IL | 18 | 1.0471468 | 9.7638073 | 7.8567970 |
| IN | 9 | 1.3883687 | 4.0266283 | 4.7639194 |
| KS | 4 | 0.0627830 | 0.7393317 | 2.8021317 |
| KY | 6 | 0.7122713 | 2.3539121 | 3.5324701 |
| LA | 6 | 0.7682076 | 1.2635545 | 4.0209273 |
| MA | 9 | 0.3879411 | 6.4761451 | 2.1714945 |
| MD | 8 | 0.5785490 | 5.0451608 | 2.6620387 |
| ME | 2 | 0.5334815 | 1.2332614 | 0.7667386 |
| MI | 14 | 2.3112845 | 7.1244616 | 6.3863620 |
| MN | 8 | 0.1169094 | 4.4474870 | 3.4486441 |
| MO | 8 | 0.9638537 | 3.3470728 | 4.3756075 |
| MS | 4 | 0.0277807 | 1.3620477 | 2.3295797 |
| MT | 1 | 0.3946763 | 0.4271876 | 0.5325134 |
| NC | 13 | 2.7841548 | 6.5786734 | 6.3378995 |
| ND | 1 | 0.3682878 | 0.4178849 | 0.5495987 |
| NE | 3 | 0.6455137 | 1.0727520 | 1.9272480 |
| NH | 2 | 0.4085112 | 0.9997537 | 0.9138644 |
| NJ | 12 | 2.0638059 | 6.5609951 | 5.2300038 |
| NM | 3 | 0.2758823 | 1.6976062 | 1.2998526 |
| NV | 4 | 0.2595020 | 1.8621360 | 1.8782758 |
| NY | 27 | 2.0979047 | 17.2385141 | 9.3760276 |
| OH | 16 | 3.6932604 | 7.5091061 | 8.1559140 |
| OK | 5 | 0.8161184 | 1.5473006 | 3.2311991 |
| OR | 5 | 0.7597368 | 2.7839943 | 2.0164479 |
| PA | 18 | 4.3056870 | 9.0498016 | 8.7794030 |
| RI | 2 | 0.6688927 | 1.0889133 | 0.7577967 |
| SC | 7 | 1.4879715 | 2.9022655 | 4.0092606 |
| SD | 1 | 0.3510095 | 0.4255027 | 0.5744973 |
| TN | 9 | 0.2157263 | 3.1389991 | 5.3973430 |
| TX | 36 | 1.8254567 | 13.8561480 | 20.8049834 |
| UT | 4 | 0.2132312 | 1.2986684 | 2.5943536 |
| VA | 11 | 2.3454557 | 5.3209992 | 5.5294050 |
| VT | 1 | 0.0130324 | 0.7201472 | 0.2331779 |
| WA | 10 | 0.3250112 | 5.4443818 | 4.5556182 |
| WI | 8 | 1.1720342 | 4.0386575 | 3.9184214 |
| WV | 3 | 0.0323316 | 1.2026167 | 1.7973833 |
| WY | 1 | 0.0486101 | 0.2386891 | 0.6900852 |
Add actual seats won to the model and plot the results:
seatsWonByState <- dcast(seatsWonByState,state ~ parties,value.var="win",fill="0")
seatsWonByState$Democratic <- as.numeric(as.character(seatsWonByState$Democratic))
seatsWonByState$Republican <- as.numeric(as.character(seatsWonByState$Republican))
colnames(seatsWonByState)[2:3] <- c("act.D.seats","act.R.seats")
twoPartyModel <- join(twoPartyModel,seatsWonByState,by="state",type="left")
twoPartyModel$proj.diff <- abs(twoPartyModel$proj.R.seats - twoPartyModel$proj.D.seats)
twoPartyModel$act.diff <- abs(twoPartyModel$act.R.seats - twoPartyModel$act.D.seats)
ggplot(twoPartyModel) +
geom_point(aes(x=state, y=proj.D.seats),color="red") +
geom_point(aes(x=state, y=proj.R.seats),color="black")
ggplot(twoPartyModel) +
geom_point(aes(y=efficiency.gap, x=act.diff),color="black") +
geom_point(aes(y=efficiency.gap, x=proj.diff),color="white")
ggplot(twoPartyModel) +
geom_point(aes(x=state, y=seats.misapportioned),color="black") +
geom_point(aes(x=state, y=adv.R),color="red") +
geom_point(aes(x=state, y=adv.D),color="blue") +
ylim(0.5,5) +
ggtitle("Misapportioned Congressional Districts Seats By Party")
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 32 rows containing missing values (geom_point).
## Warning: Removed 42 rows containing missing values (geom_point).
Now we calculate Democrat:Republican vote ratio and plot against efficiency gap. Now we want to look at the distribution for the efficiency gap based on who has more votes in a state. States with a DR ratio less than 1 have more Republican votes, and those with a ratio greater than 1 have more Demcratic voters
twoPartyModel$marg.victory <- abs(twoPartyModel$d.votes-twoPartyModel$r.votes)/twoPartyModel$total.votes
twoPartyModel$voteratio <- ifelse(twoPartyModel$r.votes<twoPartyModel$d.votes,
twoPartyModel$r.votes/twoPartyModel$d.votes,
twoPartyModel$d.votes/twoPartyModel$r.votes)
twoPartyModel$majority <- ifelse(twoPartyModel$r.votes>twoPartyModel$d.votes,"R","D")
ggplot(twoPartyModel,aes(x=voteratio, y=efficiency.gap)) +
geom_point(aes(colour=majority)) +
scale_colour_manual(values=c("blue","red")) +
stat_smooth(se=TRUE,level=0.95) +
ylim(-0.2,0.4) + xlim(1,4)
## Warning: Removed 50 rows containing non-finite values (stat_smooth).
## Warning: Removed 50 rows containing missing values (geom_point).
ggplot(twoPartyModel,aes(x=voteratio, y=efficiency.gap,colour=majority)) +
geom_point() +
scale_colour_manual(values=c("blue","red")) +
geom_smooth(se=FALSE,method=lm,level=0.95)
ggplot(twoPartyModel,aes(x=marg.victory, y=efficiency.gap,colour=majority)) +
geom_point() +
scale_colour_manual(values=c("blue","red")) +
geom_smooth(se=FALSE,method=lm,level=0.95)
rgap <- subset(twoPartyModel,voteratio<1)
dgap <- subset(twoPartyModel,voteratio>1)
summary(rgap$efficiency.gap)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.006945 0.049450 0.124300 0.133100 0.191900 0.394700
summary(dgap$efficiency.gap)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
##
twoPartyModel[,c(1,6,10,11,20)]
## state efficiency.gap d.votes r.votes majority
## 1 AK 0.145589254 82927 185296 R
## 2 AL 0.060756921 693498 1233624 R
## 3 AR 0.179383731 304770 637591 R
## 4 AZ 0.028676903 946994 1131663 R
## 5 CA 0.024156373 7392703 4530012 D
## 6 CO 0.049026884 1080454 1143796 R
## 7 CT 0.192477929 951281 500290 D
## 8 DE 0.190316421 249933 129757 D
## 9 FL 0.107584926 3392402 3826522 R
## 10 GA 0.066384473 1448869 2104098 R
## 11 HI 0.150980146 285008 137531 D
## 12 IA 0.027402051 772387 726505 D
## 13 ID 0.187485241 208297 406814 R
## 14 IL 0.058174825 2743702 2207818 D
## 15 IN 0.154263188 1142554 1351760 R
## 16 KS 0.015695743 195505 740981 R
## 17 KY 0.118711889 684744 1027582 R
## 18 LA 0.128034606 359190 1143027 R
## 19 MA 0.043104563 2080594 697637 D
## 20 MD 0.072318627 1626872 858406 D
## 21 ME 0.266740751 427819 265982 D
## 22 MI 0.165091749 2327985 2086804 D
## 23 MN 0.014613678 1560984 1210409 D
## 24 MO 0.120481707 1119554 1463586 R
## 25 MS 0.006945186 411398 703635 R
## 26 MT 0.394676283 204939 255468 R
## 27 NC 0.214165756 2218357 2137167 D
## 28 ND 0.368287790 131869 173433 R
## 29 NE 0.215171226 276239 496276 R
## 30 NH 0.204255606 340925 311636 D
## 31 NJ 0.171983824 1794407 1430386 D
## 32 NM 0.091960755 422189 323269 D
## 33 NV 0.064875501 453310 457239 R
## 34 NY 0.077700175 4128031 2245236 D
## 35 OH 0.230828775 2412451 2620251 R
## 36 OK 0.163223687 410324 856872 R
## 37 OR 0.151947354 949660 687839 D
## 38 PA 0.239204835 2793538 2710070 D
## 39 RI 0.334446368 232679 161926 D
## 40 SC 0.212567357 742805 1026129 R
## 41 SD 0.351009465 153789 207640 R
## 42 TN 0.023969590 796513 1369562 R
## 43 TX 0.050707131 2949900 4429270 R
## 44 UT 0.053307799 324309 647873 R
## 45 VA 0.213223247 1806025 1876761 R
## 46 VT 0.013032386 208600 67543 D
## 47 WA 0.032501116 1636726 1369540 D
## 48 WI 0.146504274 1445015 1401995 D
## 49 WV 0.010777199 257101 384253 R
## 50 WY 0.048610103 57573 166452 R
The summary stats for the two efficiency gaps show that the two parties have roughly the same distributions. The mean for the Democratic efficiency gap is a just short of the Republican, but the median efficiency gap for Dems is higher. The Republican efficiency gap, however, is skewed higher given three states where the efficiency gap exceeds 0.35. (Intepretation: roughly one-third of the seats in those states are misapportioned towards Republicans).
Sampling - first create simulated results
sim <- data.frame(a=1:99,b=99:1)
sim$wasted.b <- ifelse(sim$b>=51,sim$b-51,sim$b)
sim$wasted.a <- ifelse(sim$a>=51,sim$a-51,sim$a)
sim$margin <- abs(sim$a-sim$b)
sim$vote.ratio <- ifelse(sim$a<sim$b,sim$a/sim$b,sim$b/sim$a)
sim$efficiency.gap <- abs(sim$wasted.a-sim$wasted.b)/100
# split at 0.33 and 0.93
packed <- sim[sim$vote.ratio<0.33,]
cracked <- sim[sim$vote.ratio>0.33 & sim$vote.ratio<0.93,]
competitive <- sim[sim$vote.ratio>0.93,]
f1 <- lm(efficiency.gap~vote.ratio,packed)
f2 <- lm(efficiency.gap~vote.ratio,cracked)
f3 <- lm(efficiency.gap~vote.ratio,competitive)
The linear regression fits we get for this model gives us the piecewise function:
\[Vote\:Ratio = VR\] \[ Efficiency\:Gap = \begin{cases} 0.467 - 1.51 \times VR, & VR < 0.33 \\ -0.224 + 0.782 \times VR, & 0.33 \leq VR \leq 0.93 \\ 0.510 - 0.510 \times VR, & 0.93 < VR \leq 1 \end{cases} \]
# efficiency gap by district
foo <- dcast(districtParty,state + district + totalVote ~ parties,value.var="wastedVote",fill=0)
foo$efficiency.gap <- abs(foo$Democratic-foo$Republican)/foo$totalVote
foo <- foo[,c("state","district","totalVote","Republican","Democratic","efficiency.gap")]
colnames(foo)[4:5] <- c("w.votes.R","w.votes.D")
bar <- dcast(districtParty,state + district ~ parties,value.var="votes",fill=0)
bar <- bar[,c("state","district","Republican","Democratic")]
colnames(bar)[3:4] <- c("votes.R","votes.D")
bar$vote.ratio <- ifelse(bar$votes.D<bar$votes.R,bar$votes.D/bar$votes.R,bar$votes.R/bar$votes.D)
foo <- join(foo,bar,by=c("state","district"),type="left")
foo$majority <- ifelse(foo$votes.R>foo$votes.D,"R","D")
foo$margin <- (abs(foo$votes.R-foo$votes.D)/foo$totalVote)*100
e2 <- aggregate(efficiency.gap~state,foo,mean) # agg. mean efficiency gap by state based on all districts
colnames(e2)[2] <- "mean.efficiency.gap"
twoPartyModel <- join(twoPartyModel,e2,by="state",type="left")
ggplot(sim) + geom_point(aes(x=margin,y=efficiency.gap))
ggplot(sim) + geom_point(aes(x=vote.ratio,y=efficiency.gap)) # interesting piece-wise function
ggplot() +
geom_point(data=sim, aes(x=vote.ratio,y=efficiency.gap)) +
geom_point(data=foo, aes(x=vote.ratio,y=efficiency.gap,color=majority)) +
scale_colour_manual(values=c("blue","red")) # +
# geom_point(data=twoPartyModel, aes(x=voteratio,y=mean.efficiency.gap,color=majority))
sampleState <- function(stateAbbrev){
record <- subset(twoPartyModel,state==stateAbbrev)
margin <- record["marg.victory"]
s <- as.numeric(record["seats"])
# create vector with "D","R",and "I" each repeated for number of votes in state
u <- c(rep("D",record["d.votes"]),rep("R",record["r.votes"]),
rep("I",record["total.votes"]-record["d.votes"]-record["r.votes"]))
# randomize
u <- sample(u)
# split
chunk <- split(u, ceiling(seq_along(u)/(length(u)/12)))
}
x = data.frame(num = 1:26, let = letters, LET = LETTERS) set.seed(10) derp <- split(x, sample(rep(1:2, 13)))
Here we’ll delve briefly into the results for third parties:
thirdParties <- subset(districtParty,districtParty$parties!="Democratic" & districtParty$parties!="Republican")
thirdParties$pctVote <- round(thirdParties$pctVote*100,1)
summary(thirdParties$pctVote)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.800 2.000 3.095 3.375 46.000
There are a remarkable number of independent candidates picked up more than 10% of the vote. California and Florida each have four district where third parties picked up a significant votes. The Green Party also picked up 16% of the vote in the Arizona 3rd and Liberatarian parties fared vary well in districts in AZ, KS, LA, and TX.
kable(subset(thirdParties,thirdParties$pctVote>10))
| state | district | totalVote | simpleMajority | votes | parties | pctVote | wastedVote | win | |
|---|---|---|---|---|---|---|---|---|---|
| 26 | AR | 03 | 245660 | 122831 | 39318 | Green | 16.0 | 39318 | 0 |
| 54 | AZ | 07 | 127827 | 63914 | 23338 | Libertarian | 18.3 | 23338 | 0 |
| 85 | CA | 13 | 288582 | 144292 | 38146 | No Party Preference | 13.2 | 38146 | 0 |
| 103 | CA | 23 | 216003 | 108002 | 57842 | No Party Preference | 26.8 | 57842 | 0 |
| 116 | CA | 29 | 150281 | 75142 | 38994 | No Party Preference | 25.9 | 38994 | 0 |
| 122 | CA | 33 | 318520 | 159261 | 146660 | No Party Preference | 46.0 | 146660 | 0 |
| 181 | CO | 05 | 307237 | 153620 | 53318 | Unaffiliated | 17.4 | 53318 | 0 |
| 221 | FL | 04 | 315470 | 157736 | 75236 | No Party Affiliation | 23.8 | 75236 | 0 |
| 262 | FL | 20 | 244285 | 122144 | 29553 | No Party Affiliation | 12.1 | 29553 | 0 |
| 265 | FL | 21 | 284400 | 142201 | 63137 | No Party Affiliation | 22.2 | 63137 | 0 |
| 271 | FL | 25 | 200229 | 100116 | 48763 | No Party Affiliation | 24.4 | 48763 | 0 |
| 337 | IL | 02 | 297712 | 148857 | 40006 | Independent | 13.4 | 40006 | 0 |
| 411 | KS | 03 | 293762 | 146882 | 92675 | Libertarian | 31.5 | 92675 | 0 |
| 442 | LA | 04 | 249531 | 124766 | 61637 | Libertarian | 24.7 | 61637 | 0 |
| 445 | LA | 05 | 260216 | 130109 | 37486 | No Party Affiliation | 14.4 | 37486 | 0 |
| 447 | LA | 06 | 306713 | 153358 | 32185 | Libertarian | 10.5 | 32185 | 0 |
| 448 | LA | 06 | 306713 | 153358 | 30975 | No Party Affiliation | 10.1 | 30975 | 0 |
| 469 | MA | 07 | 252836 | 126419 | 41199 | Independent | 16.3 | 41199 | 0 |
| 630 | MS | 03 | 293322 | 146662 | 58605 | Reform | 20.0 | 58605 | 0 |
| 993 | TN | 06 | 241241 | 120622 | 34766 | Independent | 14.4 | 34766 | 0 |
| 1059 | TX | 17 | 179262 | 89632 | 35978 | Libertarian | 20.1 | 35978 | 0 |
| 1064 | TX | 19 | 192063 | 96032 | 28824 | Libertarian | 15.0 | 28824 | 0 |