Abstract
Data for Progress recently published a report with an interesting new way of polling for the primaries. Their intention appears to have been to provide rank-based voting statistics alongside “Not Considering” and “Considering” statistics for each candidate. They asked voters over 40 questions covering voter demographics and voter choice. Specifically, they asked voters to check all candidates they would consider voting for, to rank those candidates, and lastly to mark which candidates they would not consider voting for. The headline statistics from the report were that Biden has a 49% support level, Warren has a 40% support level, and Bernie has the highest “would not consider” percentage at 28%.
This is an interesting way to look at some of the lesser-known candidates. The “would not consider” figures for Tulsi Gabbard (24%), Bill de Blasio (25%), and several lesser-known candidates (Moulton, Bennet, Williamson, and Gravel, all over 20%) show that although these candidates are under 5% in overall consideration, their low standing isn’t necessarily just a matter of not being well known. Then again, perhaps their lack of name recognition is what drives these numbers. I did not attempt to explore this in the writeup; I just thought it was interesting to note.
This writeup attempts to validate the data that was published. I found it difficult to filter the data in a way that would reproduce the results in the survey summary. This is strange, as the survey data clearly indicates that 465 survey respondents were polled on their preferences for the Democratic Party. When filtering for these respondents (done below), the statistics don’t line up with those presented in the report. A mistake could have been made somewhere between when the data was analyzed and when the report was published (or perhaps they used something to filter the data further). The survey still provides some interesting information, and below we attempt to investigate some of the results.
As seen below, only 11.7% of survey respondents did not vote in 2016:
vote_2016 | Freq |
---|---|
Clinton | 42.4 |
Trump | 38.5 |
Johnson | 2.8 |
Stein | 2.2 |
McMullin | 0.4 |
Other | 1.2 |
did not vote | 11.7 |
It’s not just the overall vote breakdown but also the vote by age that is off here. According to Census data, the voter breakdown for 2016 by age group should look like the figure below:
Voter by age group
But the “did not vote” share by age group in our dataset looks like:
age5 | presvote16post | vote_count | percent_vote_by_age |
---|---|---|---|
18-29 | did not vote | 30 | 30.303030 |
30-39 | did not vote | 38 | 15.573771 |
40-49 | did not vote | 24 | 13.186813 |
50-64 | did not vote | 21 | 8.823529 |
65+ | did not vote | 12 | 3.921569 |
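
The percentages in this table appear to be within-group shares: 30 non-voters at 30.3% implies roughly 99 respondents aged 18-29. A minimal sketch of that calculation follows. The column names `age5` and `presvote16post` come from the table above, but the rows and the function name `non_voter_share_by_age` are made up for illustration and are not the survey data or the method actually used.

```python
from collections import Counter

# Hypothetical toy rows; only the column names age5 and presvote16post
# are taken from the table above.
rows = [
    {"age5": "18-29", "presvote16post": "did not vote"},
    {"age5": "18-29", "presvote16post": "Clinton"},
    {"age5": "18-29", "presvote16post": "Trump"},
    {"age5": "65+", "presvote16post": "did not vote"},
    {"age5": "65+", "presvote16post": "Clinton"},
    {"age5": "65+", "presvote16post": "Trump"},
    {"age5": "65+", "presvote16post": "Trump"},
    {"age5": "65+", "presvote16post": "Clinton"},
]


def non_voter_share_by_age(rows):
    """Return {age group: percent of that group who did not vote}."""
    # Denominator: everyone in the age group, voters and non-voters alike.
    totals = Counter(r["age5"] for r in rows)
    # Numerator: only respondents who reported not voting.
    non_voters = Counter(
        r["age5"] for r in rows if r["presvote16post"] == "did not vote"
    )
    return {age: 100.0 * non_voters[age] / totals[age] for age in totals}


shares = non_voter_share_by_age(rows)
for age, pct in sorted(shares.items()):
    print(f"{age}: {pct:.1f}% did not vote")
```

Dividing by the age-group total rather than the whole sample is what makes the rows comparable with turnout-by-age figures.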
Below are the age demographics for our sample:
age | Freq |
---|---|
18-29 | 9.3 |
30-39 | 22.8 |
40-49 | 17.0 |
50-64 | 22.3 |
65+ | 28.6 |
The dataset itself is a little confusing, and the key provided by Data for Progress doesn’t line up completely with the dataset. To check that my imputation methods were correct, I compared the statistics that Data for Progress published against my own. I am unsure what filter they used for Democratic Party voters: when I filter for Democratic Party voters, there are slightly over 500 observations. However, it was easy enough to approximate their filter by dropping all rows that weren’t polled on candidate preferences. Still, I don’t get the same results they displayed in their writeup. I also created a pivot table in Excel directly on the dataset, and I get the same results that I print below. I think the fact that the dataset published by Data for Progress doesn’t line up with the graphics in their writeup is pretty damaging to the credibility of the survey, but it’s easy to make mistakes when crunching data. Perhaps I made some mistakes below. I will proceed while setting these issues aside.
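
The filtering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names `consider_biden` and `consider_warren`, the use of `None` for "question not shown", and the helper `asked_about_candidates` are all hypothetical and do not come from the Data for Progress key.

```python
# Hypothetical respondents; None marks a question the respondent was not shown.
respondents = [
    {"id": 1, "consider_biden": 1, "consider_warren": 0},
    {"id": 2, "consider_biden": None, "consider_warren": None},
    {"id": 3, "consider_biden": 0, "consider_warren": 1},
]

# Assumed names for the candidate-preference columns.
PREFERENCE_COLUMNS = ["consider_biden", "consider_warren"]


def asked_about_candidates(row):
    """True if the respondent answered at least one preference question."""
    return any(row[col] is not None for col in PREFERENCE_COLUMNS)


# Keep only rows that were polled on candidate preferences.
polled = [r for r in respondents if asked_about_candidates(r)]
print(len(polled), "of", len(respondents), "respondents kept")
```

Filtering on "was asked the question" rather than on a party field is one way to sidestep uncertainty about how Democratic voters were defined.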
Support for candidates
candidate | would_support | would_not_support |
---|---|---|
Joe Biden | 50.95 | 20.63 |
Elizabeth Warren | 40.42 | 16 |
Bernie Sanders | 34.11 | 27.58 |
Pete Buttigieg | 33.05 | 13.26 |
Kamala Harris | 31.79 | 16.21 |
Beto O’Rourke | 27.58 | 18.11 |
Cory Booker | 19.37 | 17.26 |
Amy Klobuchar | 14.53 | 18.53 |
Stacey Abrams | 13.26 | 13.68 |
Kirsten Gillibrand | 11.58 | 22.74 |
Julián Castro | 11.37 | 13.89 |
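
A table like the one above can be built by counting, for each candidate, how many filtered respondents checked them as “would consider” or “would not consider” and dividing by the number of respondents. The sketch below assumes a hypothetical data layout (one set of names per respondent under the invented keys `consider` and `reject`); the real dataset is laid out differently.

```python
# Hypothetical respondents: each set holds the candidates that person
# checked as "would consider" or "would not consider".
respondents = [
    {"consider": {"Joe Biden", "Elizabeth Warren"}, "reject": {"Bernie Sanders"}},
    {"consider": {"Bernie Sanders"}, "reject": {"Joe Biden"}},
    {"consider": {"Joe Biden"}, "reject": set()},
    {"consider": {"Elizabeth Warren", "Bernie Sanders"}, "reject": set()},
]


def support_table(respondents, candidates):
    """Return {candidate: (would_support %, would_not_support %)}."""
    n = len(respondents)
    table = {}
    for name in candidates:
        would = sum(name in r["consider"] for r in respondents)
        would_not = sum(name in r["reject"] for r in respondents)
        table[name] = (round(100 * would / n, 2), round(100 * would_not / n, 2))
    return table


table = support_table(
    respondents, ["Joe Biden", "Elizabeth Warren", "Bernie Sanders"]
)
for name, (would, would_not) in table.items():
    print(f"{name} | {would} | {would_not} |")
```

Because respondents can check several candidates, the columns are not expected to sum to 100.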
Anti-Biden vote
variable | 1 |
---|---|
biden_aginstbiden | 3.31 |
bernie_aginstbiden | 22.22 |
kamala_aginstbiden | 15.89 |
beto_aginstbiden | 11.45 |
booker_aginstbiden | 21.74 |
Klobuchar_aginstbiden | 18.84 |
Warren_aginstbiden | 26.04 |
buttigigeg_aginstbiden | 20.38 |
Anti-Bernie vote
variable | 1 |
---|---|
biden_aginstbernie | 24.38 |
bernie_aginstbernie | 3.70 |
kamala_aginstbernie | 33.11 |
beto_aginstbernie | 32.82 |
booker_aginstbernie | 34.78 |
Klobuchar_aginstbernie | 37.68 |
Warren_aginstbernie | 23.44 |
buttigigeg_aginstbernie | 31.21 |
Anti-Kamala vote
variable | 1 |
---|---|
biden_aginstkamala | 10.74 |
bernie_aginstkamala | 19.14 |
kamala_aginstkamala | 1.99 |
beto_aginstkamala | 9.16 |
booker_aginstkamala | 9.78 |
Klobuchar_aginstkamala | 5.80 |
Warren_aginstkamala | 11.46 |
buttigigeg_aginstkamala | 12.74 |
Anti-Warren vote
variable | 1 |
---|---|
biden_aginstWarren | 14.46 |
bernie_aginstWarren | 9.26 |
kamala_aginstWarren | 9.93 |
beto_aginstWarren | 16.03 |
booker_aginstWarren | 16.30 |
Klobuchar_aginstWarren | 14.49 |
Warren_aginstWarren | 3.12 |
buttigigeg_aginstWarren | 14.01 |
The tables below are built on voters who chose Warren, Biden, or Bernie as their first-ranked choice.
candidate | percent_bernie_voters_dissaprove |
---|---|
Joe Biden | 8.60 |
Bernie Sanders | 0.00 |
Kamala Harris | 5.91 |
Beto O’Rourke | 7.53 |
Cory Booker | 7.53 |
Amy Klobuchar | 8.60 |
Elizabeth Warren | 1.08 |
John Hickenlooper | 9.14 |
Kirsten Gillibrand | 8.60 |
John Delaney | 6.99 |
Julián Castro | 5.91 |
Stacey Abrams | 4.84 |
Tammy Baldwin | 5.91 |
Bill DeBlasio | 8.06 |
Tulsi Gabbard | 6.45 |
Pete Buttigieg | 5.38 |
Jay Inslee | 6.99 |
Tim Ryan | 6.45 |
Seth Moulton | 6.99 |
Eric Swalwell | 6.45 |
Andrew Yang | 6.45 |
Marianne Williamson | 7.53 |
Mike Gravel | 6.45 |
Steve Bullock | 7.53 |
Michael Bennet | 8.60 |
Wayne Messam | 6.45 |
None | 6.99 |
candidate | percent_biden_voters_dissaprove |
---|---|
Joe Biden | 1.08 |
Bernie Sanders | 13.98 |
Kamala Harris | 5.38 |
Beto O’Rourke | 6.99 |
Cory Booker | 5.91 |
Amy Klobuchar | 8.06 |
Elizabeth Warren | 8.06 |
John Hickenlooper | 15.59 |
Kirsten Gillibrand | 14.52 |
John Delaney | 13.44 |
Julián Castro | 8.06 |
Stacey Abrams | 11.29 |
Tammy Baldwin | 12.37 |
Bill DeBlasio | 15.59 |
Tulsi Gabbard | 15.59 |
Pete Buttigieg | 8.60 |
Jay Inslee | 11.83 |
Tim Ryan | 10.75 |
Seth Moulton | 13.98 |
Eric Swalwell | 10.75 |
Andrew Yang | 16.13 |
Marianne Williamson | 15.05 |
Mike Gravel | 15.59 |
Steve Bullock | 13.44 |
Michael Bennet | 10.75 |
Wayne Messam | 14.52 |
None | 15.59 |
candidate | percent_Warren_voters_dissaprove |
---|---|
Joe Biden | 5.38 |
Bernie Sanders | 8.06 |
Kamala Harris | 3.23 |
Beto O’Rourke | 3.23 |
Cory Booker | 3.76 |
Amy Klobuchar | 4.84 |
Elizabeth Warren | 0.54 |
John Hickenlooper | 4.84 |
Kirsten Gillibrand | 4.30 |
John Delaney | 4.30 |
Julián Castro | 1.08 |
Stacey Abrams | 1.08 |
Tammy Baldwin | 3.23 |
Bill DeBlasio | 6.99 |
Tulsi Gabbard | 10.22 |
Pete Buttigieg | 2.15 |
Jay Inslee | 4.30 |
Tim Ryan | 4.30 |
Seth Moulton | 5.91 |
Eric Swalwell | 3.76 |
Andrew Yang | 4.30 |
Marianne Williamson | 3.23 |
Mike Gravel | 4.84 |
Steve Bullock | 4.84 |
Michael Bennet | 3.23 |
Wayne Messam | 3.76 |
None | 6.45 |
voter_group | average_refuse_vote_for_other_candidates |
---|---|
avg_wont_vote_bernievoters | 6.571087 |
avg_wont_vote_bidenvoters | 11.589008 |
avg_wont_vote_warrenvoters | 4.301075 |
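
These averages appear to be plain column means over all 27 rows of each table, including the candidate’s own row and the None row. As a quick check, the sketch below averages the values copied from the `percent_bernie_voters_dissaprove` column; it reproduces the reported 6.571087 up to the rounding of the two-decimal table values. The variable names are my own.

```python
from statistics import mean

# Values copied from the percent_bernie_voters_dissaprove column,
# in table order (Joe Biden first, None last).
bernie_column = [
    8.60, 0.00, 5.91, 7.53, 7.53, 8.60, 1.08, 9.14, 8.60,
    6.99, 5.91, 4.84, 5.91, 8.06, 6.45, 5.38, 6.99, 6.45,
    6.99, 6.45, 6.45, 7.53, 6.45, 7.53, 8.60, 6.45, 6.99,
]

# Simple unweighted mean across every row of the column.
avg_wont_vote_bernievoters = mean(bernie_column)
print(round(avg_wont_vote_bernievoters, 2))
```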
Interesting Result
If you look back at my last report on the Emerson data, you will see that the data seemed to support the idea that Bernie voters were “anti-Warren” and opposed to other candidates as well. That is how the poll was reported in the news, alongside the fact that Bernie was leading in the poll. Here we can see that Biden voters are nearly twice as likely as Bernie voters (11.6% vs. 6.6% on average) and nearly three times as likely as Warren voters (4.3%) to not support other candidates. We also see in this dataset that about 8% of Warren voters would not consider Bernie and about 9% of Bernie supporters would not consider Warren (9.26% in the anti-Warren table). Considering how close these two are in policy positions, this is strangely high: it is above each group’s average “would not consider” rate. I hate the idea of the left eating itself alive, but it does appear that this trend continues in this survey data, even if to a lesser extent.
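
The comparison of Biden voters against Bernie and Warren voters can be checked directly from the three averages in the `average_refuse_vote_for_other_candidates` table. The short sketch below only does that arithmetic; the dictionary keys and variable names are my own labels.

```python
# Averages copied from the average_refuse_vote_for_other_candidates table.
avg_refusal = {
    "bernie": 6.571087,
    "biden": 11.589008,
    "warren": 4.301075,
}

# How many times more likely Biden voters are to rule out other candidates.
biden_vs_bernie = avg_refusal["biden"] / avg_refusal["bernie"]
biden_vs_warren = avg_refusal["biden"] / avg_refusal["warren"]

print(f"Biden vs. Bernie voters: {biden_vs_bernie:.2f}x")
print(f"Biden vs. Warren voters: {biden_vs_warren:.2f}x")
```

The ratios come out at roughly 1.8 and 2.7, so the comparison holds in rounded terms.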