Abstract

Data for Progress recently published a report with an interesting new way of polling for the primaries. Their intention appears to be to provide some rank-based voting statistics alongside “Not Considering” and “Considering” statistics for each candidate. They asked voters over 40 questions dealing with voter demographics and voter choice. Specifically, they asked voters to check all candidates they would consider voting for, to rank those candidates, and lastly to indicate which candidates they would not consider voting for. The headline statistics from this were that Biden has a 49% support level, Warren has a 40% support level, and Bernie has the highest “would not consider” percentage at 28%.

This is an interesting way to look at some of the lesser-known candidates. The “would not consider” figures for Tulsi Gabbard (24%), Bill de Blasio (25%), and several lesser-known candidates (Moulton, Bennet, Williamson, and Gravel, all over 20%) show that while these candidates are at less than 5% of overall consideration, this isn’t necessarily because they are not well known. Then again, perhaps their lack of name recognition is what drives these numbers. I did not attempt to explore this in the writeup; I just thought it was an interesting thing to note.

This writeup attempts to validate the data that was published. I found it difficult to filter the data in a way that would produce the same results as the survey summary indicates. This is strange, as the survey data clearly indicates that 465 respondents were polled on their preferences for the Democratic Party. When filtering for these respondents (done below), the statistics don’t line up with the statistics presented in the report. A mistake could have been made somewhere between when the data was analyzed and when the report was published (or perhaps they were using something to further filter the data). The survey still provides some interesting information, and below we will investigate some of the results.

Demographic data for our Survey

Worries About Representativeness of Survey Data

As seen below, only 11.7% of survey respondents did not vote in 2016:

vote_2016 Freq
Clinton 42.4
Trump 38.5
Johnson 2.8
Stein 2.2
McMullin 0.4
Other 1.2
did not vote 11.7

It’s not just the overall vote breakdown but also the vote by age that is off here. According to Census data, the voter breakdown for 2016 by age group should look like the figure below.

Voter by age group


But the share who did not vote, by age group, in our dataset looks like:

age5 presvote16post vote_count percent_vote_by_age
18-29 did not vote 30 30.303030
30-39 did not vote 38 15.573771
40-49 did not vote 24 13.186813
50-64 did not vote 21 8.823529
65+ did not vote 12 3.921569

Below are the age demographics for our sample:

age Freq
18-29 9.3
30-39 22.8
40-49 17.0
50-64 22.3
65+ 28.6
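The two age tables above can be cross-checked against each other with a little arithmetic. The sketch below assumes that `percent_vote_by_age` is the share of each age group that did not vote, backs out the implied size of each group, and then recomputes the age shares and the overall non-voting rate. The group sizes are derived from the published figures, not read from the raw data.

```python
# Figures copied from the "did not vote" table above:
# age group -> (non-voter count, percent of that age group)
did_not_vote = {
    "18-29": (30, 30.303030),
    "30-39": (38, 15.573771),
    "40-49": (24, 13.186813),
    "50-64": (21, 8.823529),
    "65+": (12, 3.921569),
}

# Back out each group's size: count / (percent / 100), rounded to whole people.
group_size = {age: round(n / (pct / 100)) for age, (n, pct) in did_not_vote.items()}
total = sum(group_size.values())

# Share of the whole sample in each age group, to one decimal place.
age_share = {age: round(100 * size / total, 1) for age, size in group_size.items()}

# Overall non-voting rate across all age groups.
non_voters = sum(n for n, _ in did_not_vote.values())
overall_nonvoting = round(100 * non_voters / total, 1)

print(group_size)
print(age_share)
print(overall_nonvoting)
```

Under that assumption the implied sizes are 99, 244, 182, 238, and 306 respondents (1,069 in total), which reproduces both the age shares listed above and the 11.7% non-voting figure. This is only a consistency check between the tables; it says nothing about whether the sample is representative.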

Now for the data

The dataset itself is a little confusing, and the key provided by Data for Progress doesn’t line up completely with the dataset. To check that my imputation methods are correct, I lined up the statistics that Data for Progress published with my imputations. I am unsure what filter they used for Democratic Party voters, as when I filter for Democratic Party voters there are slightly over 500 observations. However, it was easy enough to match their sample by dropping all rows that weren’t polled on candidate preferences. Still, I don’t get the same results they displayed in their writeup. I created a pivot table in Excel directly on the dataset, and I get the same results as I print out below. I think the fact that the dataset published by Data for Progress doesn’t line up with the graphics they display in their writeup is pretty damaging to the credibility of the survey, but it’s easy to make mistakes in data crunching. Perhaps I made some mistakes below; I will proceed while setting these issues aside.
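To make the two filters discussed above concrete, here is a minimal sketch on made-up rows. The column names (`party`, `consider_biden`, `consider_warren`) and the use of `None` for a question that was never asked are assumptions for illustration, not the survey's actual variable names or coding.

```python
# Hypothetical rows; None marks a question the respondent was never shown.
rows = [
    {"id": 1, "party": "Democrat", "consider_biden": 1, "consider_warren": 0},
    {"id": 2, "party": "Democrat", "consider_biden": None, "consider_warren": None},
    {"id": 3, "party": "Independent", "consider_biden": 0, "consider_warren": 1},
    {"id": 4, "party": "Republican", "consider_biden": None, "consider_warren": None},
]
preference_cols = ["consider_biden", "consider_warren"]

# Filter 1: keep respondents who identify with the Democratic Party.
by_party = [r for r in rows if r["party"] == "Democrat"]

# Filter 2: keep respondents who answered at least one candidate-preference question.
by_polled = [r for r in rows if any(r[c] is not None for c in preference_cols)]

print([r["id"] for r in by_party])
print([r["id"] for r in by_polled])
```

In this toy data the two filters select different respondents (ids 1 and 2 versus ids 1 and 3), which is the kind of mismatch that could explain getting just over 500 rows one way and a smaller sample the other way.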

Support for candidates

would_support would_not_support
Joe Biden 50.95 20.63
Elizabeth Warren 40.42 16
Bernie Sanders 34.11 27.58
Pete Buttigieg 33.05 13.26
Kamala Harris 31.79 16.21
Beto O’Rourke 27.58 18.11
Cory Booker 19.37 17.26
Amy Klobuchar 14.53 18.53
Stacey Abrams 13.26 13.68
Kirsten Gillibrand 11.58 22.74
Julián Castro 11.37 13.89
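One way to probe which sample the table above was computed on is to ask which respondent counts are arithmetically compatible with the published percentages. The sketch below is a rough check, not a reconstruction of anyone's method: `is_consistent` is a hypothetical helper, and the search range of 400 to 550 is an arbitrary choice around the sample sizes mentioned earlier.

```python
def is_consistent(percentages, n, tol=0.005):
    """True if every percentage is within rounding distance of 100 * k / n
    for some whole number of respondents k."""
    for pct in percentages:
        k = round(pct * n / 100)  # nearest whole number of respondents
        if abs(100 * k / n - pct) > tol + 1e-9:
            return False
    return True

# would_support and would_not_support values copied from the table above.
published = [
    50.95, 20.63, 40.42, 16.00, 34.11, 27.58, 33.05, 13.26, 31.79, 16.21,
    27.58, 18.11, 19.37, 17.26, 14.53, 18.53, 13.26, 13.68, 11.58, 22.74,
    11.37, 13.89,
]

# Sample sizes in the searched range that could have produced every value.
candidates = [n for n in range(400, 551) if is_consistent(published, n)]
print(candidates)
```

With these figures, a base of 475 respondents passes the check and a base of 465 does not. That hints that the table may rest on a slightly different respondent count than the 465 mentioned earlier, though it cannot say which filter Data for Progress actually applied.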

Let’s take a look at some of the results.

Opposition to other candidates

  • Below I build out some tables based on some of the frontrunners (Biden, Bernie, Kamala, Booker, Warren, Beto, Klobuchar, Buttigieg)
  • I display how voters who said they would choose these candidates said they wouldn’t vote for other candidates
    • Anti-Biden, anti-Bernie, anti-Kamala, and anti-Warren tables are built
    • Voters selected here merely indicated they would vote for the candidate, not that they would only vote for the candidate
      • Later I will explore voters who rank certain candidates as their #1 choice and how they feel about other candidates
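The tables described in the list above boil down to one conditional percentage: among respondents who would consider candidate A, what share would not consider candidate B. A minimal sketch on made-up respondents follows; the `consider` / `not_consider` field names and the helper `pct_against` are hypothetical, not taken from the survey file.

```python
# Hypothetical respondents: the set of candidates each would consider
# and the set each would not consider.
respondents = [
    {"consider": {"Biden", "Harris"}, "not_consider": {"Sanders"}},
    {"consider": {"Sanders", "Warren"}, "not_consider": {"Biden"}},
    {"consider": {"Biden", "Warren"}, "not_consider": set()},
    {"consider": {"Sanders"}, "not_consider": {"Biden", "Harris"}},
]

def pct_against(respondents, supporter_of, target):
    """Among respondents who would consider `supporter_of`, the percent
    who would not consider `target`. Returns None for an empty group."""
    group = [r for r in respondents if supporter_of in r["consider"]]
    if not group:
        return None
    against = sum(target in r["not_consider"] for r in group)
    return round(100 * against / len(group), 2)

print(pct_against(respondents, "Sanders", "Biden"))  # both Sanders considerers reject Biden
print(pct_against(respondents, "Biden", "Sanders"))  # one of two Biden considerers rejects Sanders
```

The helper does not stop a respondent from listing the same candidate in both sets. That may be one way small non-zero values for a candidate's own supporters could arise, but this is a guess rather than something the data documentation confirms.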

Anti-Biden vote

percent_would_not_consider
biden_aginstbiden 3.31
bernie_aginstbiden 22.22
kamala_aginstbiden 15.89
beto_aginstbiden 11.45
booker_aginstbiden 21.74
Klobuchar_aginstbiden 18.84
Warren_aginstbiden 26.04
buttigigeg_aginstbiden 20.38

Anti-Bernie vote

percent_would_not_consider
biden_aginstbernie 24.38
bernie_aginstbernie 3.70
kamala_aginstbernie 33.11
beto_aginstbernie 32.82
booker_aginstbernie 34.78
Klobuchar_aginstbernie 37.68
Warren_aginstbernie 23.44
buttigigeg_aginstbernie 31.21

Anti-Kamala vote

percent_would_not_consider
biden_aginstkamala 10.74
bernie_aginstkamala 19.14
kamala_aginstkamala 1.99
beto_aginstkamala 9.16
booker_aginstkamala 9.78
Klobuchar_aginstkamala 5.80
Warren_aginstkamala 11.46
buttigigeg_aginstkamala 12.74

Anti-Warren vote

percent_would_not_consider
biden_aginstWarren 14.46
bernie_aginstWarren 9.26
kamala_aginstWarren 9.93
beto_aginstWarren 16.03
booker_aginstWarren 16.30
Klobuchar_aginstWarren 14.49
Warren_aginstWarren 3.12
buttigigeg_aginstWarren 14.01

The tables below are built on voters choosing either Warren, Biden, or Bernie as their number 1 ranked choice.

percent_bernie_voters_dissaprove
Joe Biden 8.60
Bernie Sanders 0.00
Kamala Harris 5.91
Beto O’Rourke 7.53
Cory Booker 7.53
Amy Klobuchar 8.60
Elizabeth Warren 1.08
John Hickenlooper 9.14
Kirsten Gillibrand 8.60
John Delaney 6.99
Julián Castro 5.91
Stacey Abrams 4.84
Tammy Baldwin 5.91
Bill DeBlasio 8.06
Tulsi Gabbard 6.45
Pete Buttigieg 5.38
Jay Inslee 6.99
Tim Ryan 6.45
Seth Moulton 6.99
Eric Swalwell 6.45
Andrew Yang 6.45
Marianne Williamson 7.53
Mike Gravel 6.45
Steve Bullock 7.53
Michael Bennet 8.60
Wayne Messam 6.45
None 6.99

percent_biden_voters_dissaprove
Joe Biden 1.08
Bernie Sanders 13.98
Kamala Harris 5.38
Beto O’Rourke 6.99
Cory Booker 5.91
Amy Klobuchar 8.06
Elizabeth Warren 8.06
John Hickenlooper 15.59
Kirsten Gillibrand 14.52
John Delaney 13.44
Julián Castro 8.06
Stacey Abrams 11.29
Tammy Baldwin 12.37
Bill DeBlasio 15.59
Tulsi Gabbard 15.59
Pete Buttigieg 8.60
Jay Inslee 11.83
Tim Ryan 10.75
Seth Moulton 13.98
Eric Swalwell 10.75
Andrew Yang 16.13
Marianne Williamson 15.05
Mike Gravel 15.59
Steve Bullock 13.44
Michael Bennet 10.75
Wayne Messam 14.52
None 15.59

percent_Warren_voters_dissaprove
Joe Biden 5.38
Bernie Sanders 8.06
Kamala Harris 3.23
Beto O’Rourke 3.23
Cory Booker 3.76
Amy Klobuchar 4.84
Elizabeth Warren 0.54
John Hickenlooper 4.84
Kirsten Gillibrand 4.30
John Delaney 4.30
Julián Castro 1.08
Stacey Abrams 1.08
Tammy Baldwin 3.23
Bill DeBlasio 6.99
Tulsi Gabbard 10.22
Pete Buttigieg 2.15
Jay Inslee 4.30
Tim Ryan 4.30
Seth Moulton 5.91
Eric Swalwell 3.76
Andrew Yang 4.30
Marianne Williamson 3.23
Mike Gravel 4.84
Steve Bullock 4.84
Michael Bennet 3.23
Wayne Messam 3.76
None 6.45

average_refuse_vote_for_other_candidates
avg_wont_vote_bernievoters 6.571087
avg_wont_vote_bidenvoters 11.589008
avg_wont_vote_warrenvoters 4.301075
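As a check on how the averages above relate to the per-candidate columns, the sketch below recomputes the Bernie figure from the rounded values in the percent_bernie_voters_dissaprove table. It assumes the average is a plain unweighted mean over every row of that column; that is an inference from the numbers, not something stated in the source.

```python
# Values copied from the percent_bernie_voters_dissaprove table, in row order
# (Joe Biden through Wayne Messam, then "None").
bernie_col = [
    8.60, 0.00, 5.91, 7.53, 7.53, 8.60, 1.08, 9.14, 8.60, 6.99,
    5.91, 4.84, 5.91, 8.06, 6.45, 5.38, 6.99, 6.45, 6.99, 6.45,
    6.45, 7.53, 6.45, 7.53, 8.60, 6.45, 6.99,
]

# Assumption: simple mean over all rows, with no weighting and no rows dropped.
avg_bernie = sum(bernie_col) / len(bernie_col)

print(len(bernie_col))
print(round(avg_bernie, 2))
```

The mean over all 27 rows comes out at about 6.57, in line with the published 6.571087 up to rounding of the inputs. If that reading is right, the average includes the candidate's own row and the "None" row, so it is not purely a rate of rejecting other candidates.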

Interesting Result

If you look back at my last report on Emerson data, you will see that the data seemed to support the idea that Bernie voters were “anti-Warren” and opposed to other candidates. This is the way the poll was reported in the news, alongside the fact that Bernie was in the lead in the poll. Here we can see that Biden voters are nearly twice as likely as Bernie voters and nearly three times as likely as Warren voters to not support other candidates. We also see in this dataset that 8% of Warren voters would not consider Bernie and 10% of Bernie supporters would not support Warren. This is strangely high, given that it’s above each group’s average “would not consider” rate and considering how close these two are in policy positions. I hate the idea of the left eating itself alive, but it does appear that this trend continues in this survey data, even if to a lesser extent.
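The comparison of Biden voters with Bernie and Warren voters rests on the three averages in the average_refuse_vote_for_other_candidates table. A quick check of the ratios, using those published values as given:

```python
# Averages copied from the average_refuse_vote_for_other_candidates table.
avg_wont_vote = {
    "bernie": 6.571087,
    "biden": 11.589008,
    "warren": 4.301075,
}

# How many times higher the Biden average is than each of the other two.
biden_vs_bernie = avg_wont_vote["biden"] / avg_wont_vote["bernie"]
biden_vs_warren = avg_wont_vote["biden"] / avg_wont_vote["warren"]

print(round(biden_vs_bernie, 2))
print(round(biden_vs_warren, 2))
```

The ratios come out at about 1.76 and 2.69, so the Biden average is somewhat under two times the Bernie average and somewhat under three times the Warren average.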