A generous election simulation

Somebody on some site the other day was talking about Nate Silver's forecasts, forecasts in general, skepticism about odds, and what-have-you. So I decided to make a quick-and-dirty simulation in an afternoon with the following assumptions:

The most Mitt-friendly state polls of the last month or so are true.
In fact, even those are systematically a couple points off - 2, to be precise.
There's a wide uniform “margin of error” - here specified to be 5. Maybe I should've just done a wide normal, but I wanted more mass at the tails because that might be more Mitt-friendly, as he needs the extreme results.

Of course, this is so unrealistic, it's almost about two different people, Mittens and Bronco Bama, but let's roll with it. I found, on http://www.electoral-vote.com, some polling data. Here's what it looks like:

library("plyr")
library("ggplot2")
polls <- read.csv("http://www.electoral-vote.com/evp2012/Pres/pres_polls.csv")
head(polls)

##     Day Len   State EV Dem GOP Ind   Date  X X.1 X.2 X.3 X.4 X.5 X.6
## 1 226.5   4 Alabama  9  36  54  NA Aug 16 NA  NA  NA  NA  NA  NA  NA
## 2 177.0   3 Alabama  9  36  51  NA Jun 27 NA  NA  NA  NA  NA  NA  NA
## 3   1.0   1 Alabama  9  39  60  NA Jan 01 NA  NA  NA  NA  NA  NA  NA
## 4   1.0   1  Alaska  3  38  62  NA Jan 01 NA  NA  NA  NA  NA  NA  NA
## 5 306.5   2 Arizona 11  46  53  NA Nov 03 NA  NA  NA  NA  NA  NA  NA
## 6 294.0   1 Arizona 11  44  52  NA Oct 21 NA  NA  NA  NA  NA  NA  NA
##           Pollster
## 1 Capital Survey-4
## 2 Capital Survey-3
## 3  Election 2008-1
## 4  Election 2008-1
## 5            PPP-2
## 6      Rasmussen-1

And there's stuff all the way down. Note that there are multiple polls for some states and some less interesting states have no polling data in the set, but, rather, have only 2008 voting data (these are polls with \( Day = 1.0 \) or \( Date = Jan 01 \)). Those are also the boring sorts of states that aren't going to change in the election.

The first order of business is to compute the difference between Mittens and Bronco (adding 2 to “unskew”“) and add a "margin of error” variable which will just be a constant (3).

polls1 <- ddply(polls, .(State, Day, Date, Pollster, EV), summarise, diff = (GOP + 
    2 - Dem), MoE = 5)
polls1 <- subset(ddply(polls1, .(State), transform, newdate = (max(Day) - Day)), 
    ((newdate < 30) & (Pollster != "Angus-Reid-3")))

I'm strapped for time at the moment, so I'm being lazy and looking at an odd definition of “most recent” polls - I look at the newest poll and go back 30 days. I'll go back later and change it to take the best Mittens poll in the last month unless there hasn't been one, in which case it will take the most recent Mittens poll. It's just a little too much mucking around with \( ddply \) for me right now. I'll also make an “average” poll data frame.

bestresult <- ddply(polls1, .(State, EV, MoE), summarise, maxdiff = max(diff))
avgresult <- ddply(polls1, .(State, EV, MoE), summarise, maxdiff = mean(diff))

This is far too generous, but we'll work with it for the moment. I also calculate an average result which is an average of the last month's polls with our bias. Now we'll just do 10,000 simulations of each. I'll write a function that outputs how many EVs Mittens gets.

simulationresult <- function(statematrix) {
    n <- dim(statematrix)[1]
    randvector <- runif(n, -1, 1)
    votevector <- statematrix[, 4] + randvector * statematrix[, 3]
    simulationresult <- sum((votevector > 0) * statematrix[, 2])
}
simlength <- 1e+05
z <- rep(0, simlength)
y <- rep(0, simlength)
for (i in 1:simlength) {
    z[i] = simulationresult(bestresult)
    y[i] = simulationresult(avgresult)
}
percent <- sum(z >= 270)/length(z)
colorz <- (z < 270)
percenty <- sum(y >= 270)/length(y)
colory <- (y < 270)

So it looks like Mittens wins 99.839 percent of the time in the generous simulation and 48.488 percent of the time in the average simulation.

qplot(z, geom = "histogram", binwidth = 1, fill = colorz)

plot of chunk unnamed-chunk-5

qplot(y, geom = "histogram", binwidth = 1, fill = colory)

plot of chunk unnamed-chunk-5

The first plot is the histogram of Mittens' results in the generous, best poll result + 2 scenario. The second plot is the histogram of the Mittens' results in the average of the last month's polls + 2 scenario. If we cherry-pick the best poll and add 2 points, Mittens will win. If we take the average of recent polls and add 2 points, Bronco wins more often than not (as of 11/4, 51.512 percent of the time).