I previously made a Mittens-friendly simulation of the election out of curiosity (http://rpubs.com/gzt/mittenssimulation), but here's a rather simple “unbiased” simulation based on the same data. I'll walk through what I do below, and I'll show you my source code because why not. This is a lazy simulation and the code is ugly and hacked together, but it does what I say it does.
library("plyr")
library("ggplot2")
polls <- read.csv("http://www.electoral-vote.com/evp2012/Pres/pres_polls.csv")
head(polls)
## Day Len State EV Dem GOP Ind Date X X.1 X.2 X.3 X.4 X.5 X.6
## 1 226.5 4 Alabama 9 36 54 NA Aug 16 NA NA NA NA NA NA NA
## 2 177.0 3 Alabama 9 36 51 NA Jun 27 NA NA NA NA NA NA NA
## 3 1.0 1 Alabama 9 39 60 NA Jan 01 NA NA NA NA NA NA NA
## 4 1.0 1 Alaska 3 38 62 NA Jan 01 NA NA NA NA NA NA NA
## 5 306.5 2 Arizona 11 46 53 NA Nov 03 NA NA NA NA NA NA NA
## 6 294.0 1 Arizona 11 44 52 NA Oct 21 NA NA NA NA NA NA NA
## Pollster
## 1 Capital Survey-4
## 2 Capital Survey-3
## 3 Election 2008-1
## 4 Election 2008-1
## 5 PPP-2
## 6 Rasmussen-1
This is what the polling data look like and how to get them, and the file continues like this all the way down. Note that some states have multiple polls, while some less interesting states have no polling data in the set and instead carry only 2008 voting results (these are the rows with \( Day = 1.0 \) or \( Date = Jan 01 \)). Those are also the boring sorts of states that aren't going to change in this election.
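If you're curious which states fall into that 2008-only bucket, a quick throwaway check along these lines would do it (this snippet and the name only2008 are mine, not part of the simulation):

# States whose only "poll" in the file is the 2008 result (Day == 1 rows):
only2008 <- ddply(polls, .(State), summarise, haspolls = any(Day > 1))
subset(only2008, !haspolls)$State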
The first order of business is to compute the margin between Mittens and Bronco and add a “margin of error” variable, which is just whichever is larger of 5 and half the range of a state's margins. I also removed a pollster who gave very bad results.
polls1 <- ddply(polls, .(State, Day, Date, Pollster, EV), summarise,
                diff = GOP - Dem)
# Keep only polls within 30 days of each state's most recent poll,
# and drop the one bad pollster:
polls1 <- subset(ddply(polls1, .(State), transform, newdate = max(Day) - Day),
                 (newdate < 30) & (Pollster != "Angus-Reid-3"))
# Margin of error: half the spread of a state's margins, floored at 5:
polls1 <- ddply(polls1, .(State), transform,
                MoE = max(0.5 * (max(diff) - min(diff)), 5))
I'm strapped for time at the moment, so I'm being lazy and using an odd definition of “most recent” polls: for each state, I take its newest poll and go back 30 days from there. I'll go back later and change it to take the polls from the last month, unless a state hasn't had one, in which case it would take that state's single most recent poll (a sketch of that rule follows). It's just a little too much mucking around with \( ddply \) for me right now.
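For reference, here's roughly what that intended rule might look like. This is a hypothetical sketch against the raw polls frame, and lastday and recent are names I've made up; it is not what the code above does:

# Hypothetical version of the intended rule: keep each state's polls from
# the final 30 days of the season, falling back to the single most recent
# poll for states with nothing in that window.
lastday <- max(polls$Day)
recent <- ddply(polls, .(State), function(df) {
    inwindow <- subset(df, Day > lastday - 30)
    if (nrow(inwindow) > 0) inwindow else df[which.max(df$Day), ]
})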
# Per-state point estimate: the median margin across the retained polls.
avgresult <- ddply(polls1, .(State, EV, MoE), summarise, maxdiff = median(diff))
simulationresult <- function(statematrix) {
    # One simulated election: jitter each state's margin by N(0, MoE)
    # noise and total the electoral votes of states where the GOP leads.
    # Columns: 2 = EV, 3 = MoE, 4 = maxdiff (median margin).
    n <- nrow(statematrix)
    randvector <- rnorm(n, 0, 1)
    votevector <- statematrix[, 4] + randvector * statematrix[, 3]
    sum((votevector > 0) * statematrix[, 2])
}
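A single call gives one simulated election, returning Mittens' electoral-vote total for that draw. For instance (the set.seed call is mine, added just to make the draw repeatable):

set.seed(42)  # not in the original; makes this one draw reproducible
simulationresult(avgresult)  # one simulated GOP electoral-vote total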
simlength <- 1e+05
y <- rep(0, simlength)
for (i in 1:simlength) {
    y[i] <- simulationresult(avgresult)
}
# Share of simulations where Mittens reaches 270 electoral votes:
percenty <- sum(y >= 270)/length(y)
z <- (y < 270)  # TRUE where Bronco wins; used to color the histogram
So it looks like Mittens wins 16.4 percent of the time.
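As an aside, the explicit loop could be written more idiomatically with \( replicate \), and with \( 10^5 \) draws the Monte Carlo error on that 16.4% figure is negligible. A quick sketch, using the objects defined above:

# Equivalent, slightly more idiomatic alternative to the for loop:
y <- replicate(simlength, simulationresult(avgresult))
# Back-of-the-envelope Monte Carlo standard error on the win probability:
# sqrt(p * (1 - p) / n) is about 0.0012 here, so the headline number is
# stable to roughly a tenth of a percentage point.
sqrt(percenty * (1 - percenty)/simlength)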
# Histogram of simulated GOP electoral-vote totals, colored by winner:
qplot(y, geom = "histogram", binwidth = 1, fill = z)
If we take the average of recent polls, Bronco wins more often than not (as of 11/4, 83.6 percent of the time). Of course, some of these polls are of low quality, some are “known” to be skewed, and perhaps the most recent polls should be weighted more heavily than polls even a couple of weeks old, but I haven't done anything with that. You can see everything I did right here, and you can fiddle with the parameters yourself. Leaving that one pollster in means more Obama victories, and less noise also means more Obama victories. I picked 5 as my default “margin of error” out of a hat; it coincidentally produces results rather close to Silver's. A different way of producing a margin of error is certainly possible (one variant is sketched below), but I think this amount of noise is more generous than reality.
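For example, one hypothetical alternative is to base each state's noise on the standard deviation of its poll margins rather than half their range. This is my sketch, not something the code above does, and the floors of 1 and 5 are arbitrary choices of mine:

# Hypothetical variant: per-state MoE from the spread of its poll margins,
# falling back to 5 for states with only a single poll.
polls2 <- ddply(polls1, .(State), transform,
                MoE = if (length(diff) > 1) max(sd(diff), 1) else 5)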