Political writing usually discusses results at the statewide level. Here, using publicly available data, we will instead drill down to see what we can learn about the individual donor. This paper puts the same questions to three different levels: all political parties lumped together, Republicans v. Democrats, and finally the candidates against each other regardless of party. We will look at the number of donors per zip code, the median donation per zip, the participation rate per zip, and the people who gave multiple times per zip.
Primarily we will be working with the Federal Election data for NY State as provided by Udacity. We will supplement this dataset with ZipPop2, a dataset that provides the population per zip code as of 2010, and with Candidate_Party, a dataset that provides each candidate's political party and that party's color.
## # A tibble: 100,000 x 12
## cmte_id contbr_nm contbr_city Zip
## <fctr> <fctr> <fctr> <chr>
## 1 C00575795 SHIMANSKY, REBA NEW YORK 10023
## 2 C00575795 WEBB, LOYANA RIDGEWOOD 11385
## 3 C00580100 FELDMAN, ROB COMMACK 11725
## 4 C00577130 KING, MICHAEL RIDGEWOOD 11385
## 5 C00577130 DUNLOP, DAVID S. NEWBURGH 12550
## 6 C00575795 PRIORE, PATRICK NEW YORK 10011
## 7 C00458844 GANZ, ZEV HEWLETT 11557
## 8 C00575795 SMAYLOVSKY, BELLA BROOKLYN 11229
## 9 C00575795 COOPER, ANDREW BROOKLYN 11226
## 10 C00577130 ALLISON, MATT BROOKLYN 11206
## # ... with 99,990 more rows, and 8 more variables: contbr_employer <fctr>,
## # contbr_occupation <fctr>, contb_receipt_amt <dbl>,
## # contb_receipt_dt <fctr>, Last_Name <chr>, Population <int>,
## # Party <fctr>, Party_Color <fctr>
## [1] -9300
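The supplementing described above amounts to two left joins. The following is a minimal sketch in base R, not the author's actual code: the toy rows, the population figures, and the choice of `Last_Name` as the join key for Candidate_Party are all assumptions made for illustration.

```r
# Toy stand-ins for the three datasets (all values are made up).
contrib <- data.frame(
  contbr_nm = c("DONOR A", "DONOR B", "DONOR C"),
  Zip       = c("10023", "11385", "12550"),  # keep zips as character so leading zeros survive
  Last_Name = c("Clinton", "Trump", "Sanders"),
  contb_receipt_amt = c(27, 50, 15),
  stringsAsFactors = FALSE
)
ZipPop2 <- data.frame(
  Zip        = c("10023", "11385"),
  Population = c(1000L, 2000L),              # hypothetical populations
  stringsAsFactors = FALSE
)
Candidate_Party <- data.frame(
  Last_Name   = c("Clinton", "Sanders", "Trump"),
  Party       = c("Democrat", "Democrat", "Republican"),
  Party_Color = c("Blue", "Blue", "Red"),
  stringsAsFactors = FALSE
)

# Left joins: every contribution is kept even when a zip has no population row.
merged <- merge(contrib, ZipPop2, by = "Zip", all.x = TRUE)
merged <- merge(merged, Candidate_Party, by = "Last_Name", all.x = TRUE)
print(merged)
```

Using `all.x = TRUE` means a contribution from a zip missing in the population table shows up with `Population` set to `NA` instead of silently disappearing.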
The number of donations per zip might just be another way of saying how many people live in each zip. Look at how the graph of donations per zip across all of NYS's thousands of zips and the graph of population per zip are almost interchangeable. Looked at this way, we are comparing a NYC zip code full of apartment buildings with a farming community: the first might have tens of thousands of residents while the second might have tens.
But let us zoom in by applying a log scale to the y-axis of the number of donors per zip.
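As a rough illustration of that step, the sketch below counts donations per zip and draws them with a logarithmic y-axis using base R graphics. The toy counts are invented, and the original plots may well have been built differently (for example with ggplot2).

```r
# Toy data: one row per donation; the counts per zip are made up.
donations <- data.frame(
  Zip = c("10023", "10023", "10023", "10023", "11385", "11385", "12550"),
  stringsAsFactors = FALSE
)

# Number of donations per zip code.
per_zip <- as.data.frame(table(Zip = donations$Zip), stringsAsFactors = FALSE)
names(per_zip) <- c("Zip", "Donations")

# log = "y" puts the bar heights on a log scale, so small zips stay visible
# next to very large ones. All heights must be greater than zero.
barplot(per_zip$Donations,
        names.arg = per_zip$Zip,
        log = "y",
        ylab = "Donations per zip (log scale)")
```

On a log axis, equal distances represent equal ratios, so a zip with 10 donations sits halfway between one with 1 and one with 100.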
Now NYS donor participation can be seen for its vibrancy. In the non-logged view it looked as if only a few people participated and the rest were apathetic. Now that the log scale has compressed the gap between the most and least populous zips, one can see a lot more involvement.
What was the median donation per zip in NYS for the 2016 presidential election?
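One way to compute that figure is a grouped median. This is a sketch on invented amounts. Dropping negative amounts (refunds) before taking the median is an assumption here, not something the original analysis states.

```r
# Toy contributions (amounts are made up; the negative row stands in for a refund).
contrib <- data.frame(
  Zip = c("10023", "10023", "10023", "10023", "11385", "11385"),
  contb_receipt_amt = c(25, 50, 100, -50, 10, 30),
  stringsAsFactors = FALSE
)

# Assumption: refunds are excluded so they do not drag the median down.
positive <- contrib[contrib$contb_receipt_amt > 0, ]

# Median donation within each zip code.
median_per_zip <- aggregate(contb_receipt_amt ~ Zip, data = positive, FUN = median)
print(median_per_zip)
```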
Now let us look at the median donation across NYS on a log scale. As with the number of donations per zip, on the log scale the median donation per zip for the rest of NYS is no longer obscured by NYC.
Which zip codes were the most enthusiastic, i.e., which had the highest percentage of donors? We will call this our Enthusiasm level, and it is a way of putting all zips on the same playing field. Our Enthusiasm level is the participation rate: the number of donors divided by a zip code's population.
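The definition above can be sketched as follows. The data are invented, and two details are assumptions: a "donor" is counted as a distinct contributor name within a zip, and the rate is multiplied by 100 to express it as a percentage.

```r
# Toy contributions: DONOR A gave twice but should count as one donor.
contrib <- data.frame(
  contbr_nm = c("DONOR A", "DONOR A", "DONOR B", "DONOR C", "DONOR D"),
  Zip       = c("10023", "10023", "10023", "11385", "11385"),
  stringsAsFactors = FALSE
)
zip_pop <- data.frame(
  Zip        = c("10023", "11385"),
  Population = c(1000L, 50L),   # hypothetical populations
  stringsAsFactors = FALSE
)

# One row per distinct (name, zip) pair, then count donors per zip.
donors <- unique(contrib[, c("contbr_nm", "Zip")])
donors_per_zip <- as.data.frame(table(Zip = donors$Zip), stringsAsFactors = FALSE)
names(donors_per_zip)[2] <- "Donors"

# Enthusiasm = donors / population, expressed as a percentage.
enth <- merge(donors_per_zip, zip_pop, by = "Zip")
enth$Enthusiasm <- 100 * enth$Donors / enth$Population
print(enth)
```

With these toy numbers the small zip (2 donors out of 50 residents) scores 4%, far above the large zip (2 out of 1,000, or 0.2%), which is exactly the leveling effect the statistic is meant to have.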
What occupations were the most likely to make a donation to a presidential candidate in 2016?
Of all of New York State's 20 million people, who donated the most times?
## contbr_nm Donation.Times
## Length:46623 Min. : 1.000
## Class :character 1st Qu.: 1.000
## Mode :character Median : 1.000
## Mean : 2.094
## 3rd Qu.: 2.000
## Max. :199.000
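A summary like the one above can be produced by counting rows per contributor name. The sketch below uses placeholder names and invented counts. Note that it treats identical names as the same person, which is an assumption that can merge distinct donors.

```r
# Toy contributions: one row per donation.
contrib <- data.frame(
  contbr_nm = c("DONOR A", "DONOR B", "DONOR A", "DONOR C", "DONOR A", "DONOR B"),
  stringsAsFactors = FALSE
)

# Count how many times each name appears.
times <- as.data.frame(table(contbr_nm = contrib$contbr_nm), stringsAsFactors = FALSE)
names(times)[2] <- "Donation.Times"

# Most frequent donors first.
times <- times[order(-times$Donation.Times), ]

# Keep only people who gave more than once.
repeat_donors <- times[times$Donation.Times > 1, ]
print(repeat_donors)
summary(times$Donation.Times)
```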
Our first question, again, is which zip codes gave the most times, now for each party. This is an absolute count and is not the same as Enthusiasm, which is a percentage of the zip's population.
There is a big difference between the number of Republican and Democratic donors. According to this graph only the Democrats donated. But it turns out that we have to log and zoom in on our data to detect the Republican donors.
Now let us look at the difference between the two parties' median donations across the entire state. The median donation for Republicans was two times that of the Democrats, $50 v. $25.
And which zip codes gave the most times to each party? In the ordered bar plots it looks like the Republicans had the higher rate.
## cmte_id contbr_nm contbr_city Zip
## C00580100:187 Length:336 Length:336 10165 : 1
## C00574624: 74 Class :character Class :character 10302 : 1
## C00573519: 49 Mode :character Mode :character 10314 : 1
## C00458844: 16 10454 : 1
## C00579458: 6 10470 : 1
## C00577981: 2 10509 : 1
## (Other) : 2 (Other):330
## contbr_employer contbr_occupation
## Length:336 RETIRED :141
## Class :character INFORMATION REQUESTED : 34
## Mode :character INFORMATION REQUESTED PER BEST EFFORTS: 11
## SALES : 5
## SELF-EMPLOYED : 5
## PHYSICIAN : 4
## (Other) :136
## contb_receipt_amt contb_receipt_dt Last_Name Population
## Min. : 2.00 12-Jul-16: 14 Trump :187 Min. : 2
## 1st Qu.: 25.75 11-Jul-16: 11 Cruz : 74 1st Qu.: 1592
## Median : 50.00 8-Aug-16 : 9 Carson : 49 Median : 3698
## Mean : 159.18 1-Jul-16 : 8 Rubio : 16 Mean : 9137
## 3rd Qu.: 100.00 9-Aug-16 : 8 Bush : 6 3rd Qu.:11456
## Max. :2700.00 19-Jul-16: 7 Huckabee: 2 Max. :99598
## (Other) :279 (Other) : 2
## Party Party_Color Freq Enthusiasm
## Conservative: 0 Blue : 0 Min. : 1.00 Min. : 0.00244
## Democrat : 0 Green : 0 1st Qu.: 3.00 1st Qu.: 0.08200
## Green : 0 Orange: 0 Median : 9.00 Median : 0.41503
## Libeterian : 0 Red :336 Mean : 24.37 Mean : 3.41790
## Republican :336 Yellow: 0 3rd Qu.: 30.00 3rd Qu.: 2.08818
## Max. :210.00 Max. :81.51724
##
## Times.Given
## Min. : 1.00
## 1st Qu.: 1.00
## Median : 1.00
## Mean : 1.78
## 3rd Qu.: 2.00
## Max. :36.00
##
## cmte_id contbr_nm contbr_city Zip
## C00575795:611 Length:1124 Length:1124 10001 : 1
## C00577130:513 Class :character Class :character 10002 : 1
## C00458844: 0 Mode :character Mode :character 10003 : 1
## C00500587: 0 10004 : 1
## C00573519: 0 10005 : 1
## C00574624: 0 10006 : 1
## (Other) : 0 (Other):1118
## contbr_employer contbr_occupation contb_receipt_amt
## Length:1124 RETIRED :183 Min. : 1.00
## Class :character NOT EMPLOYED :168 1st Qu.: 15.00
## Mode :character TEACHER : 36 Median : 27.00
## PROFESSOR : 24 Mean : 74.01
## INFORMATION REQUESTED: 19 3rd Qu.: 50.00
## ATTORNEY : 17 Max. :2700.00
## (Other) :677
## contb_receipt_dt Last_Name Population Party
## 29-Feb-16: 17 Clinton :611 Min. : 2 Conservative: 0
## 30-Apr-16: 14 Sanders :513 1st Qu.: 1670 Democrat :1124
## 31-May-16: 14 Bush : 0 Median : 5460 Green : 0
## 6-Nov-16 : 13 Carson : 0 Mean : 14403 Libeterian : 0
## 25-Oct-16: 12 Christie: 0 3rd Qu.: 19161 Republican : 0
## 30-Mar-16: 12 Cruz : 0 Max. :109931
## (Other) :1042 (Other) : 0
## Party_Color Freq Enthusiasm Times.Given
## Blue :1124 Min. : 1.00 Min. : 0.00107 Min. : 1.000
## Green : 0 1st Qu.: 6.00 1st Qu.: 0.06357 1st Qu.: 1.000
## Orange: 0 Median : 22.00 Median : 0.31572 Median : 3.000
## Red : 0 Mean : 80.47 Mean : 2.68706 Mean : 4.018
## Yellow: 0 3rd Qu.: 68.00 3rd Qu.: 1.52381 3rd Qu.: 5.000
## Max. :2582.00 Max. :100.00000 Max. :46.000
##
But if we look instead at the numbers of all donations made in the state, a different picture emerges.
Which occupations most frequently gave to each party?
**Do the two parties have any occupations in common among their top 10? Yes: attorneys, physicians, and the retired. More interesting, though, are the self-described groups, or "tribes", that they do not have in common. In a face-off, the Republicans would field Engineers, Homemakers, Sales, and the Self-Employed against the Democrats' Consultants, Not Employed, Professors and Teachers, and Writers. Two of these labels strike me as particularly politicized in these times: *Homemaker* and *Not Employed*.**
Which party was best able to inspire people to give multiple times?
Did the Republicans and Democrats have any contributors in common in their top ten list? No.
intersect(Rep.Rank.Multi.Donors$contbr_nm,Dem.Rank.Multi.Donors$contbr_nm)
## character(0)
Finally, let us put our questions to the candidates themselves. If we focused on all 25 we would lose the forest for the trees, but 2016 had the great fortune to have 3 viable candidates with seemingly very different bases of support: a mainstream Democrat who leaned right, a populist Republican (something that has not been a main attraction since at least 1970), and a populist Democrat who almost won the party nomination. Having these 3 viable candidates gives us the rare opportunity to refract each in turn and see the others in a different light. In other words, instead of everyone lining up behind their usual more-money-for-schools Democrat or increase-the-military-budget Republican, this cycle we chose among a more-money-for-schools Democrat, an anti-trade, free-college-and-healthcare-for-all Democrat, and an anti-trade, anti-elite, more-schooling-is-not-the-answer Republican.
How do the candidates' donations compare? As in almost every part of this paper, we are looking at the number of donations, not at what the dollars totaled.
Here are all of the donations made to Trump, Clinton, and Sanders. Considering how very different the candidates are, the donations look similar.
Clinton clearly received a greater number of donations, but let us look into this a little closer. Trump has the highest median donation.
Did any candidates inspire their donors to give multiple times?
Were some occupations more likely to give to one candidate rather than another?
Do our three candidates have any occupations in common in their respective Top 5 lists? Trump and Sanders share only Retired donors.
And what occupations do Clinton and Sanders have in common? They have the most overlap: Retired people, Lawyers, and Teachers.
Now let us look at our three candidates' respective levels of Enthusiasm.
Looking back on my exploration, what I expected to see was that, as one zoomed in closer and closer, what is thought of as the NYS presidential polling results would disaggregate into something not at all like what statewide polls would have one think. Was it successful?
NY is thought of as a reliably Blue, uncontested state. Popular polling results present it like the plot on the left, when in fact it is more like the plot on the right. Granted, the difference is that the right side has the y-axis on a log10 scale, but I would suggest this does not exaggerate the Republican presence; it dims the light on NYC so the rest of the state can be seen.
If people were asked to name the state shown in the scatterplot of all donations below, how many would answer that it is a reliably blue state like NY?
Here is how polling is often presented. It is the same information as above but leaves one with a very different impression.
Now here is a little dissonance. Trump supporters were characterized as the Left Behind, but in fact Trump's median contribution was the highest of the three major candidates.
Which is the real populist? I think one would have to agree it is the candidate who has the category Not Employed to himself, Bernie Sanders. The other populist plays best not with people who want a job tomorrow but rather with those whose working careers are over. Nostalgia, perhaps.
And finally, what of our own statistic, Enthusiasm? The 3D graph below pits a zip code's population against its Enthusiasm. For each of the candidates, as the zip code gets less populous (that is, as one goes down the vertical axis), the Enthusiasm, on the diagonal axis, grows. But while for Clinton and Sanders the difference is small, for Trump the difference is pronounced.
Below, Clinton really sets herself apart. The diagonal line measures how often the same person gave. Clinton is almost alone in getting people to donate over and over.
So in conclusion I come away surprised by how many Republican donors there are in the very blue state of NY, by how many people were inspired by Clinton, and by the fact that Sanders did not do better in NY. All of these things became visible by drilling down to the zip code level.
What were some of the struggles in this paper? They are apparent. Things like manually assigning colors on some plots either took way too long or just never got done. Another time sink was turning the axis labels 45 degrees. It would work on 1 or 2 of 3 plots but not all three, with seemingly identical code on the same data set. Exasperating. Clearly something was different in the data but I could not spot it. Also, in an earlier iteration of this paper, when on occasion I "Viewed" the data I would find that the columns had been multiplying. Another time I found that some of my columns were data frames INSIDE of my data frames, and they did not behave at all like I wanted. It also took me a long time to find out that when one applies filter to a factor, it does not drop the unwanted levels but only reduces their counts to zero. This is so counterintuitive, and I do not know when that would be the desired effect. I wish I had been able to import images of NYTimes polls.
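The factor behavior described above can be reproduced in a few lines. This sketch uses base R row subsetting on invented data. To my understanding `dplyr::filter()` keeps unused factor levels in the same way, and `droplevels()` is the usual remedy.

```r
# Toy data with a three-level factor (values are made up).
contrib <- data.frame(
  Party = factor(c("Democrat", "Republican", "Democrat", "Green")),
  contb_receipt_amt = c(27, 50, 15, 10)
)

# Keep only Republican rows. The rows are gone, but the factor still
# remembers all of its original levels.
rep_kept <- contrib[contrib$Party == "Republican", ]
print(table(rep_kept$Party))      # Democrat and Green appear with a count of 0

# droplevels() removes the levels that no longer occur in the data.
rep_dropped <- droplevels(rep_kept)
print(table(rep_dropped$Party))   # only Republican remains
```

This would also explain the rows of zeros in the `summary()` output earlier in the paper, where parties and candidates with no rows in the subset are still listed.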
Where to go from here? That is easy. Do a longitudinal study. If Udacity assigned a sequel, I would drill down on individual contributors through as many election cycles as I could get and see how many, if any, were swing voters and, if any were, try to find a correlation with who knows what. Do they always vote against the party that is in? Do they vote according to the economic cycle? Does the amount that they give parallel whether they are switching parties? That would be interesting to know. And perhaps take it up a notch and compare voters from different states. This would be one big data set. The NYS 2016 data was 100 megs. A hundred megs here and a hundred megs there and pretty soon…. This leads me to the next thing. Is it possible to set up AWS on a Windows 10 machine and use Amazon's machines to hold the data and to run RStudio on it?