Polling data is usually wrong, and often more wrong than pollsters report. A 2022 study out of Berkeley found that even among polls taken within one week of an election, only 60% had the actual election result fall within the poll's reported margin of error.
Below, I analyze PPIC polls for presidential and gubernatorial races in California from 2012-2022 and find that, on average, the margin between the Democratic and Republican candidates reported in these polls is ~5.6 percentage points off from the actual election margin. (This is not a knock on PPIC; as noted above, this is a problem for the whole polling industry, and PPIC is in fact rated very highly by FiveThirtyEight.)
Most campaigns make decisions not based on the polling toplines but on subgroup results (e.g., party, age group, ethnicity), where even the polls' own margins of error get wider and where this analysis finds errors exceeding 30 percentage points.
Understanding the error of a pollster's previous polls can help reduce the error of current polls. For example, using the average of PPIC's error among NPP voters from 2012-2016, I can adjust the numbers in their 2018 polls and reduce the error among NPP voters from over 30 percentage points to ~12 percentage points. This kind of adjustment can improve a campaign's understanding of the election and its strategic decision-making.
From 2012 to 2022, candidate campaigns spent over $3 million per election cycle on polling and survey research. In a regression analysis, the average effect of spending on polling was slightly positive for Democratic campaigns and slightly negative for Republican campaigns, but neither result was significantly different from no effect at a p <= 0.05 threshold. In short, campaigns are spending a lot of money on polling, but it isn't producing reliably positive results for either side.
Make sure your data team is providing you with analysis that helps you make the best decisions from your most important data. If you don't have a data team but do have historical polling data you would like analyzed to improve your decision-making on future polls, call me at (916) 594-0961 or email me at datainstate@gmail.com. Keep an eye out for this same analysis on ballot measures in the next couple weeks.
I'm going to tell you something you probably already know: polls are wrong. You probably also know what I will tell you next: polls are more wrong than even they say. Polls being wrong isn't very controversial, nor is it the main problem I want to address here. Polls are wrong, and polls will always be wrong. As someone who's studied statistics extensively, I see dealing with imperfect data as just part of working with data. "Analytics is the art of being less wrong." Most of statistics and data analysis is working with estimates (just like a public opinion poll provides an estimate of public opinion) and measuring our confidence in that estimate (e.g., a margin of error of ±3 points), never really assuming that you have the right answer. You have a range of possibilities that probably includes the right answer. The main problem with polls, in my experience, is that most people make strategic decisions based on polling numbers as if the polls aren't estimates but rather precise measurements.
I have been in countless meetings and on countless calls where the pollster shares the margin of error at the beginning of the presentation, and then it’s never mentioned again. Then, when the campaign and consultants make decisions on messaging, where to buy air-time, how much to spend, which demographics to target, etc., it’s based on the point estimates, not just of the overall polling sample, but usually of specific demographic groups and regions, where the margin of error would be even wider than what was reported. Rarely is the margin of error taken into account or the polling acknowledged as one dataset that could be complemented by additional data. Thousands, hundreds of thousands, and sometimes even millions of dollars are spent based on information we all know is wrong, but nobody treats it as wrong.
Probably the best study to date on this matter comes out of Berkeley. In their paper, "Election Polls Are 95% Confident But Only 60% Accurate," Kotak and Moore analyze 1,931 polls across 14 election cycles. Their findings? Of polls taken a year out from the election, only 40% have the election result fall within their 95% confidence interval, and among polls taken within a week of the election, the number only improves to 60%. More importantly, they found that even when the margin of error and the inaccuracy of previous margins of error are reported, people still tend to overestimate the accuracy of the current poll: "even when informed about polls' low historical accuracy, people continue to have excessive faith in the current poll's predictive accuracy."
Kotak and Moore estimate that margins of error would have to be twice as wide for polls one week out from the election to be 95% accurate, and three times as wide for polls a year out from the election. This aligns with what many others have reported. For example, FiveThirtyEight has reported, "Taken all together, the polls in our pollster-ratings database have a weighted-average error of 6.0 points since 1998," with 6 points being twice the commonly reported ±3-point margin of error. Pew Research also reports, "The real margin of error is often about double the one reported…closer to 6 percentage points, not the 3 points implied by a typical margin of error." And these figures are for the toplines. Confidence intervals for smaller populations within the polling sample would be even larger.
Evidence is abundant that polls are inaccurate. So why do we still treat them as if they're not? Here again, Kotak and Moore provide incredible insight. The margin of error reported by most polls only takes into account sampling error that arises from random variation. However, several other types of error can arise from polling. Those who have sat through any statistics course that touches on survey research have probably heard about the Bradley effect. If you're familiar with political polls, you have probably seen that significantly more people say they plan to vote than actually do. A political survey measures what someone is willing to share with a poll taker at that moment, not necessarily how they would act if a ballot were placed in their hand. Kotak and Moore point out that there are at least five other known sources of error in public opinion polling beyond the sampling error due to random variation, but statistical methods haven't been developed that reliably measure these other forms of error. They say, "The overwhelming response to the difficulty quantifying these aspects of total survey error is to ignore them." In my experience, this has been the overwhelming approach taken, not just by pollsters, but also by campaign decision-makers when using political polls to determine strategy.
The Public Policy Institute of California makes its survey data publicly available. To provide examples of measuring polling error, I will compare their survey data to the actual results of the elections. I will look at data from their 'Californians and Their Government' polls (not other polls like Californians and the Environment or Californians and Education) taken during an election year (excluding polls from off years like 2021 and 2019). I will compare polling results to the actual outcomes of the general election for presidential and gubernatorial races from 2012 through 2022. Only questions asking how someone will vote are included, as opposed to whether they favor a candidate or approve of an elected official who is also a candidate. I use the survey weighting methodology provided in the survey data for the weighting of respondents. Results for likely voters and results for all surveyed will be reported separately. In a future post, I will measure the error between polls and the actual results of ballot measures.
It’s important to note that PPIC’s polling methodology is very reliable, and they are honest and upfront about the error in their polling. FiveThirtyEight gives them a 2.5-star rating, placing them in the top 15 percent of pollsters rated by FiveThirtyEight. An excerpt from the methodology for the October 2022 PPIC survey is included below. The point of this analysis is not to suggest that PPIC’s polls are poor, but rather that they face the same struggle in measuring public opinion that all pollsters do, and that they have more success overcoming those challenges than the vast majority of other pollsters.
The sampling error, taking design effects from weighting into consideration, is ±3.9 percent at the 95-percent confidence level for the total unweighted sample of 1,715 adults. This means that 95 times out of 100, the results will be within 3.9 percentage points of what they would be if all adults in California were interviewed. The sampling error for unweighted subgroups is larger: for the 1,439 registered voters, the sampling error is ±4.5 percent; for the 1,111 likely voters, it is ±5.1 percent. For the sampling errors of additional subgroups, please see the table at the end of this section. Sampling error is only one type of error to which surveys are subject. Results may also be affected by factors such as question wording, question order, and survey timing
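To make the relationship between sample size and sampling error concrete, here is a minimal sketch in Python. It assumes simple random sampling and an illustrative design effect of 2.7; the design effects PPIC actually applies vary by subgroup, so these figures will only roughly track the excerpt above.

```python
import math

def margin_of_error(n, p=0.5, design_effect=1.0, z=1.96):
    """Approximate 95% margin of error for a proportion, inflated by a weighting design effect."""
    return z * math.sqrt(design_effect) * math.sqrt(p * (1 - p) / n)

# Sample sizes taken from the October 2022 PPIC methodology excerpt above;
# design_effect=2.7 is an illustrative assumption, not PPIC's published value.
for label, n in [("all adults", 1715), ("registered voters", 1439), ("likely voters", 1111)]:
    print(f"{label:>17}: ±{margin_of_error(n, design_effect=2.7) * 100:.1f} pts")
```

The point of the sketch is simply that the margin of error grows as the group gets smaller, which is why subgroup estimates carry wider intervals than the topline.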
The first measure we will use for how accurate the polls are is the absolute value of the difference in margins: what margin did a poll have a candidate winning or losing by, what margin did they actually win or lose by, and how many percentage points apart are those two values?
As an example, let's first look at the October poll for the 2022 gubernatorial race. In this poll, the PPIC survey had Newsom leading Dahle by just under 20 points among likely voters. Among all survey respondents, he led by over 24 points. The margin of the certified results was ~18.5 points. This means the error of the poll among all survey respondents was over 6 points, and among likely voters it was ~1.5 points.
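A minimal sketch of that arithmetic in Python; the poll margins below are placeholders consistent with the approximate figures quoted above, not the exact survey values.

```python
def abs_margin_error(poll_margin, actual_margin):
    """Absolute difference, in points, between a poll's D-R margin and the certified D-R margin."""
    return abs(poll_margin - actual_margin)

# October 2022 gubernatorial race, using the approximate margins quoted above
print(abs_margin_error(poll_margin=19.9, actual_margin=18.5))  # likely voters: ~1.4 points
print(abs_margin_error(poll_margin=24.6, actual_margin=18.5))  # all respondents: ~6.1 points
```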
That's an incredibly accurate poll and is right in line with FiveThirtyEight's reporting that 2022 was the most accurate year for pollsters since 1998. Unfortunately, polls aren't always that accurate.
Across all the polls in our analysis set, the average error was 7.8 percentage points off from the actual election result among likely voters and about 5.6 points off among all survey respondents. This is approximately in line with FiveThirtyEight's analysis of polls nationwide, where they found the weighted average since 1998 has been about 6 points off from actual election results. We're seeing a lot of consistency here with the findings of others' research.
Let’s take a look at a more optimistic measure of the polling error to make us feel better, one where we let some of the error in one direction cancel out the error in the other direction. Let’s look at the bias.
For this example, we will use the September polls from 2018 and 2022 and look only at the likely voter results. Here we can see that for the 2018 election, the margin of victory for Newsom was 23.9 percentage points, while the PPIC survey had him leading Cox by ~11.8 percentage points. The poll had Cox performing better than he actually did, so it was biased towards the Republican candidate and against the Democrat. In the 2022 election, though, the September PPIC poll had Newsom leading Dahle by ~27.5 percentage points, and he only ended up winning by ~18.4 percentage points. In this case, the poll was biased towards the Democratic candidate by ~9 percentage points.
In the previous calculation of the error, I would take the absolute value of both of these errors (9.22 stays 9.22, and -12.13 becomes 12.13) and then calculate the average to determine how far off the survey was. In that example, the average would be ~10.6 percentage points. But this time, I'm calculating in a way that takes the bias into account, so some of the error cancels out. This time we leave the negative value as negative when we take the average, resulting in an average of ~-1.45. The negative value indicates that, on average, the PPIC polls used in this example were biased towards the Republican candidates.
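A minimal sketch of the difference between the two measures, using the two signed errors from this example (sign convention: survey margin minus election margin, so a negative value means the poll understated the Democratic margin).

```python
# Signed errors from the September 2022 and September 2018 likely-voter polls above
errors = [9.22, -12.13]

avg_abs_error = sum(abs(e) for e in errors) / len(errors)  # average absolute error, roughly the ~10.6 points cited above
avg_bias = sum(errors) / len(errors)                       # ~-1.45; negative = net bias toward the Republican candidate
print(avg_abs_error, avg_bias)
```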
In my calculations, a negative value of the difference between the survey margin and the election margin indicates bias towards Republicans, and a positive value indicates bias towards Democrats. Changing the order of the arithmetic would flip this around, but either way, bias towards one side has to result in negative numbers and bias towards the other side in positive numbers. I trust readers to understand this is not meant to indicate that bias towards one party or the other is better or worse.
Now, let's look at the average for our whole analysis set. We can see that, on average, the polling has been biased ~5 points towards Republicans among the reported likely voter numbers and ~1 point towards Republicans among the full set of survey respondents. It looks like the fact that voters who are less likely to vote tend to lean towards Democrats helps make the numbers based on the full set of respondents a little more accurate, by offsetting the polls' bias towards Republicans.
See, I told you this would look better. Especially in the numbers for all survey respondents, this measure of error looks a lot better! But it may be overly optimistic for campaigns making decisions on polling data.
For most campaigns, the toplines are not the numbers they make decisions off of. You probably usually find yourself digging a little deeper into the numbers to figure out who you should be targeting and trying to persuade and/or turn out to vote.
For this example, we'll take a look at the grouping that signals most strongly how people will vote in partisan races: party. There's generally a narrower range between the floor and the ceiling of how these groups will vote in partisan elections, and it's easy to see why. A Republican is pretty likely in any given election year to vote for Republican candidates and less likely to vote for Democratic candidates. In the same way, Democrats will be pretty reliable voters for Democratic candidates. Even California voters without a party preference will tend to have a party they favor and generally vote for election after election, although this is more difficult to measure. Still, as we're about to see, there is significant error even among this reliable identifier of party support in the polls.
The first way we will calculate this is by weighting the support the survey had for a candidate among a party by that party’s share of the vote on election day. For example, in the table below you can see the reported support among likely voters for PPIC’s October 2022 poll. The party column indicates the party of survey respondents. The survey_dem_yes column indicates the support among that party for the Democratic candidate. In 2022, about 49% of voters who turned out for the general election were Democrats, and about 29% of voters who turned out for the general election were Republicans. Not all of these voters participated in the Gubernatorial election, but the vast majority (over 98%) did. If the polling measures by party were correct, then the weighted average of the support in the survey by share of turnout should get us close to the actual result. When we do this for our example poll, we see that Newsom should have received about 55% of the vote in the 2022 election based on the polling after re-weighting for partisan turnout. We can calculate this for Dahle, the Republican candidate, as well and then get the margin based on the polls’ partisan support for the candidates and compare it to the actual election margin, similar to how we did above.
| survey | survey_date | office | election | party | survey_dem_yes | turnout_share | weighted_support_dem |
|---|---|---|---|---|---|---|---|
| likely | 2022-10-26 | gov | g22 | dem | 0.90864502 | 0.48952726 | 0.444806509 |
| likely | 2022-10-26 | gov | g22 | rep | 0.07361947 | 0.28774491 | 0.021183628 |
| likely | 2022-10-26 | gov | g22 | oth | 0.16753853 | 0.05235065 | 0.008770751 |
| likely | 2022-10-26 | gov | g22 | npp | 0.47248371 | 0.17037717 | 0.080500440 |
| Total |  |  |  |  |  |  | 0.5552613 |
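A minimal sketch of this re-weighting, using the values from the table above.

```python
# (party, polled support for Newsom among likely voters, share of actual 2022 turnout)
rows = [
    ("dem", 0.90864502, 0.48952726),
    ("rep", 0.07361947, 0.28774491),
    ("oth", 0.16753853, 0.05235065),
    ("npp", 0.47248371, 0.17037717),
]

# Weight each party's polled support by that party's share of election-day turnout
implied_dem_share = sum(support * turnout for _, support, turnout in rows)
print(round(implied_dem_share, 4))  # ~0.5553, the total shown in the table above
```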
The average error based on the margins calculated this way gets us slightly better results than when calculated on the toplines, but not much. For the numbers reported among likely voters, our error still averages out to ~5 percentage points from 2012 to 2022, and ~4.7 percentage points among all survey respondents. It's also important to understand that, as with our measure of bias above, this measure is likely a little optimistic for these groups, because overestimating support for one candidate in one group can be canceled out by underestimating that candidate's support in another group.
Hopefully, everything we have done so far is something you have seen before. Even if you don’t have a data team, none of the calculations above require specialized statistical or analytical knowledge. They were all calculated using simple arithmetic. If you have old polls of your own for previous elections and campaigns you have worked on, you can do everything that’s been done so far on your own, with or without a data team that provides your team with specialized analytical capabilities. If you would still like to have someone else do it for you, please contact me (916-594-0961 or datainstate@gmail.com) and I would be happy to work with you to meet your analytical needs.
The next step, however, does use statistical techniques to get estimates of support by party based on election data. While we can see right away how everyone as a whole voted on election day, to get reliable estimates of how subgroups (such as political parties, age groups, ethnicity, and more) voted in the actual election, a little more math is required. That’s what I do here.
If I take the estimates of support by party derived from election data and weight them by each party's share of the electorate, as I did above for the polling estimates of support by party, the average error of these estimates is incredibly low for the presidential and gubernatorial campaigns from 2012 to 2022. In fact, it is about one third of one percentage point, at ~0.350 percentage points. Because these estimates were based on election data, comparing them to polling data from before the election may seem unfair, but when strategists want to know the result of a past election they worked on, they don't look at their polling; they look at the election results. In that same vein, the point here is to question why someone would base decisions about previous demographic-group support on polling data when reliable estimates based on election data are available instead. It's also important to note that, in the same way I measured the error and bias of the polls using their point estimates rather than the range of values determined by their margins of error, the estimates from election data also have confidence intervals that were not taken into account in this measure of error.
Now we're getting to the good stuff. In my experience, this is where decisions on polling are made. The PowerPoint goes up, we go through a quick overview of methodology, a slide for the toplines, and then the slides of subgroup performance are where people sit up and start talking about who to mail, where to go up with ads, etc. The slide on how subgroups move based on questions that include some form of "would this make you more likely or less likely to vote for…" or "Having heard that, who would you vote for…" is also a big one, determining how to message to these groups. The inaccuracy for these subgroups is what costs campaigns the most, and it is higher than for the whole survey because the sample sizes are smaller. A snapshot of the table showing the margins of error for different groups from the October 2022 PPIC survey is included below. You can see that the margin of error for Democrats is significantly higher than for all adults in the survey, and for Republicans and No Party Preference voters it's more than double.
Here, now, we take a look at how far the survey's point estimate of each party's support for the candidates is from the estimate we get from the election results.
When taking this measure of support for Democratic candidates, we can see that, on average, the PPIC polls have underestimated Democratic voters' support for Democratic candidates by a little over 12 percentage points and overestimated Republican support for Democratic candidates by over 11 percentage points, among all survey respondents. Those errors shrink among likely voters. Noticeably, the error among NPP voters is significant, with the survey underestimating support for the Democratic candidate by about 33 percentage points among both all survey respondents and likely voters. This error among NPP voters is not particularly surprising, since they are also the most likely to have responded that they are unsure of who they plan to vote for, which allows for much greater swings once the election actually occurs.
For Republican candidates, interestingly, we see a significant underestimation of their support among Republican voters. This is surprising! Having that large of an error among one of the major parties and its support for a major-party candidate seems odd. Looking into it, though, among the PPIC polls in our analysis set, the Republican candidate averages only about 76% support among Republican survey respondents! I feel pretty confident that far more Republicans vote for the Republican gubernatorial and presidential candidates in California. For example, in the September 2014 survey, Neel Kashkari got support from only 64% of Republicans who were likely-voter respondents to the survey. It seems unlikely to me that in an election where Neel Kashkari got about 40% of the vote, and Republicans made up only about 34% of the electorate, he received only 64% support among voters in his own party.
Alright, so we've just spent a lot of time measuring and demonstrating how wrong the polls are. Now let's see if measuring it helps. Does knowing how wrong your previous polls have been help you make better decisions with polls from the current election? Most importantly, I'm here to help campaigns use their data to make better decisions. Campaigns usually make decisions at the subgroup level, and we've already demonstrated that polling estimates for these subgroups can be off by over 30 points, even when averaged over the 2012 through 2022 elections (particularly among NPP voters, who are often the most targeted partisan group in competitive campaigns). Because of this, I want to see if my analysis is particularly helpful in making that data (polling subgroup estimates) more accurate and actionable. How can we do this? Well, I have estimates of how wrong historical polls have been among these subgroups, and I can use that. The reasons polls were wrong in the past aren't always the reasons they'll be wrong in the future, so I want to hedge against that. The easiest route forward seems to be to adjust the current polls by half the average error of previous polls (for clients, more in-depth modeling and tuning of these hyperparameters would be used to determine the best approach based on the client's historical polling, but for now, let's take it easy).
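Here is a minimal sketch of that adjustment rule. The function name and the numbers are hypothetical stand-ins for illustration, not the more in-depth modeling I would use for a client.

```python
def adjust_support(current_poll_support, historical_errors, hedge=0.5):
    """Shift a poll's subgroup estimate by a fraction of that subgroup's average historical error.

    historical_errors: signed errors from past cycles, measured as
    (support estimated from election data) - (polled support).
    """
    avg_error = sum(historical_errors) / len(historical_errors)
    return current_poll_support + hedge * avg_error

# Hypothetical illustration: a poll shows 47% support for the Democrat among NPP voters,
# and past polls understated that support by ~30 points on average.
print(adjust_support(0.47, historical_errors=[0.28, 0.33, 0.29]))  # ~0.62
```

The hedge factor of one half is what keeps a bad year in the historical record from pushing the adjustment too far in the other direction.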
Let's start with the 2018 election. Below is the average error for the 2018 polls in our analysis set. We can see that in 2018, PPIC's polls underestimated Democratic voters' support for Newsom by ~16 points in the survey of all adults and ~14 points in the survey of likely voters. The polls underestimated NPP support for Newsom by ~31 and ~30 points in the two surveys, overestimated support among Republican voters by ~9 points in both surveys, and overestimated support among third-party voters by ~16 points. For a campaign, having this much error among NPP voters in particular is likely to lead campaign strategists to make suboptimal decisions. Can this be fixed?
To see if I can improve this, I take the average error for each party's support for the Democratic candidate in PPIC polls from 2012-2016. I then take half that error, to hedge and make sure we don't overcorrect, and adjust the 2018 polls accordingly. Below you can see the improvement, and it is significant! Republican and Democratic support both improve a fair amount. We do see the adjustment increase our error among third-party voters, which is regrettable but not too costly, as third-party voters made up only about four and a half percent of the electorate in 2018. The increased error among that small group would not be significantly costly, and its small size is likely part of what makes it more difficult to estimate.
Among NPP voters, we see a huge improvement! Looking at your campaign’s budget for polling, would you rather look back after the end of the campaign and see that your decisions on messaging and targeting NPP voters were based on numbers that were about 30 points off of the numbers from election day, or would you rather see that those decisions were based on numbers that were about 12 points off?
Re-weighting these support levels, as we did above, based on actual turnout of these groups, we can see that the adjusted values bring us closer to the actual share of the vote received by the Democratic candidate (indicated by the horizontal black line) than just weighting the survey values alone by actual turnout. It looks like we can be pretty confident that in 2018, adjusting PPIC survey values by half the average error of the polls for the previous three elections improves the accuracy of the polls and, likely, decisions that would have been made on the polls.
What about for 2022? This analysis working once is encouraging but not particularly convincing (at least for me) of its value. I’d like to make sure it’s not a one-off success. I’m also curious to see how this strategy holds when there is a gap. The dataset does not include surveys from 2020, so the next example will adjust 2022 polling values based on the average error from 2012-2018. Does this kind of understanding of polling error still help improve accuracy, even with a gap of a whole election survey? Finally, as mentioned above, 2022 was the best cycle for polling in over a generation. Polls were the most accurate they have been since 1998, and as we saw above, the PPIC October 2022 survey was only off by 1.5 percentage points. Can we improve on the most accurate polling performance thus far in this century?
Let's start by looking at the average error in 2022 of support by party compared to estimates derived from actual election data. Here we can see that the average error among these groups is significantly less than it was in 2018. Particularly among NPP voters, we see that the survey has about the same average error as our adjusted numbers did in the last election. It's going to be pretty difficult to improve on this performance.
And we still manage to improve the estimates. I guess good analytics can provide value even when the polling data is already at its best.
We still see the error for third-party voters go up, but again, they typically don't make up a large part of the electorate. The battleground for most competitive campaigns is largely fought over NPP voters, and this approach reduces the error there by half. The original survey data underestimated Democratic support among NPP voters by ~12 points, but the adjusted numbers overestimate the Democratic vote among these voters by only ~6 points. Would you rather look back after the election and see that your decisions on messaging and targeting for NPP voters were based on numbers that were 12 points off from the election data, or 6 points off?
If we take these support numbers and weight them by the share of actual turnout in the 2022 election, we can see that even in the aggregate, our error-adjusted numbers improve the estimates compared to the actual election. In particular, the numbers from the survey of likely voters land almost exactly on the actual share of the vote Newsom received, 59.2%.
Campaigns spend a lot of money on polling, and for many campaigns, polling data will be the most important dataset they have to make decisions from. Because of this, campaigns invest heavily in it. From 2012 to 2022, candidate committees for state office in California spent an average of over $3 million per election cycle on polling, peaking in 2022 when spending exceeded $5 million. That's over $3 million each election cycle spent on data that usually falls outside even its own reported margin of error. This data can be incredibly helpful in reducing the uncertainty around campaign strategic decisions, but only if campaigns know how to use it appropriately.
A regression analysis found that, controlling for variables like party registration advantage, the share of NPP voters in a district, and total expenditure advantage (how much more or less a campaign spent than the opposing candidate's campaign), Democratic campaign spending on polling in state legislative races is positively correlated with better campaign performance, but not statistically significantly at a p <= 0.05 threshold, suggesting that Democratic campaigns might be able to improve the value they extract from polling data. For Republicans, spending on polling was actually negatively correlated! Luckily for them, that finding was also not statistically significant at p <= 0.05, but it still suggests Republicans often have a difficult time translating their polling data into effective campaign strategy. Millions of dollars are spent each election cycle on polling data, yet this spending does not produce an effect significantly different from no effect at all. Part of that may be because the polling results are so different from the actual results on election day, even in polls taken within one week of the election. Particularly at the subgroup level, where campaigns often make strategic decisions about whom to target and how to message, the numbers campaign strategists see may be so far removed from real-world behavior as to be unhelpful.
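For readers curious what such a regression looks like in code, here is a minimal sketch. The file name and column names are hypothetical placeholders, not the actual model specification or data used in the analysis above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one row per state legislative campaign.
# Column names are placeholders, not the variables from the analysis above.
races = pd.read_csv("races.csv")

model = smf.ols(
    "vote_share ~ polling_spend + party_reg_advantage + npp_share + expenditure_advantage",
    data=races,
).fit()

# Check the polling_spend coefficient and whether its p-value clears the 0.05 threshold
print(model.summary())
```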
If you have a data team, they should be helping you make the best decisions with your most important data. Make sure they are providing you with a solid understanding of how accurately your previous polling reflected election outcomes, especially at the level you use polling data to make decisions. If you don’t have a data team, or they need a little help in providing the analysis you need, please contact me at 916-594-0961 or email at datainstate@gmail.com and I’d be happy to help. Keep an eye out for the analysis for ballot measures in the next couple weeks.