Now that the 2016 Presidential Election cycle has long been wrapped up, there is a trove of data on it. One area of campaigning that is seen as an important indicator of winning is campaign contributions.
For my Data Analyst Nanodegree from Udacity, I was tasked with finding and analyzing a dataset in the wild using R. To accomplish this, I set out to analyze contributions from Alabama to the different presidential campaigns during the 2016 presidential election cycle. Why Alabama? It was randomly chosen.
Data was downloaded from http://classic.fec.gov/disclosurep/PDownload.do.
Some questions I hope to answer:
1) Which candidates raised the most money?
2) How did donation sums and counts progress over time?
3) Which candidates had the lowest/highest average donations?
Importing relevant Libraries
## Skim summary statistics
## n obs: 55209
## n variables: 18
##
## Variable type: character
## variable missing complete n min max empty n_unique
## 1 cand_id 0 55209 55209 9 9 0 23
## 2 cand_nm 0 55209 55209 9 25 0 23
## 3 cmte_id 0 55209 55209 9 9 0 23
## 4 contb_receipt_dt 0 55209 55209 9 9 0 646
## 5 contbr_city 0 55209 55209 0 22 2 697
## 6 contbr_employer 20 55189 55209 0 38 281 4737
## 7 contbr_nm 0 55209 55209 4 40 0 16980
## 8 contbr_occupation 3 55206 55209 0 38 285 2785
## 9 contbr_st 0 55209 55209 2 2 0 1
## 10 contbr_zip 0 55209 55209 0 9 28 6666
## 11 election_tp 0 55209 55209 0 5 211 4
## 12 file_num 0 55209 55209 7 7 0 144
## 13 form_tp 0 55209 55209 4 5 0 3
## 14 memo_cd 0 55209 55209 0 1 40016 2
## 15 memo_text 0 55209 55209 0 67 43505 66
## 16 receipt_desc 0 55209 55209 0 67 54308 19
## 17 tran_id 0 55209 55209 5 20 0 55072
##
## Variable type: numeric
## variable missing complete n mean sd min p25 median
## 1 contb_receipt_amt 0 55209 55209 116.53 361.54 -7300 20 38
## p75 max hist
## 1 100 10800 ▁▁▁▇▁▁▁▁
The only numeric variable here, donation amount, had an interesting distribution. The mean was 116, the min a strange -7300, the max 10800, and the median 38.
Some variables that jumped out to me:
- Committee ID
- Candidate ID
- Candidate Name
- Contributor City
- Contributor Zip
- Contributor Occupation
- Receipt Amount
- Receipt Date
- Election Type
It will be interesting to do some exploration of giving amounts and counts over time, as well as some spatial plotting. Another possibility is analyzing different attributes of contributors (city, occupation, etc.) to see if there are any common features.
There are some columns which may not provide much insight:
- Receipt description
- Memo code
- Memo text
- Contributor state (all values are from AL)
- Contributor employer (I am doubtful there are enough insightful values across the 55,000 rows)
- Contributor name (these should be used to create IDs and then removed, to de-identify the rows; if anything, the resulting contributor ID allows some analysis of how many contributions the average Alabaman gave)
- Form type (not sure what this represents)
- File number (not sure what this represents)
There are few missing values in these columns; the only substantial number is in contributor employer.
Let’s now look at individual column values with the table() function.
##
## C00458844 C00500587 C00573519 C00574624 C00575449 C00575795 C00577130
## 1153 2 4372 8049 338 19789 7086
## C00577312 C00577981 C00578492 C00578658 C00578757 C00579458 C00580100
## 235 225 23 8 15 294 13148
## C00580159 C00580399 C00580480 C00581199 C00581215 C00581876 C00583146
## 4 21 101 38 3 194 2
## C00605568 C00623884
## 87 22
## [1] 23
##
## P00003392 P20002671 P20002721 P20003281 P20003984 P40003576 P60003670
## 19789 87 23 2 38 338 194
## P60005915 P60006046 P60006111 P60006723 P60007168 P60007242 P60007671
## 4372 101 8049 1153 7086 235 8
## P60007697 P60008059 P60008398 P60008521 P60008885 P60009685 P60022654
## 15 294 4 21 3 2 22
## P80001571 P80003478
## 13148 225
## [1] 23
##
## Bush, Jeb Carson, Benjamin S.
## 294 4372
## Christie, Christopher J. Clinton, Hillary Rodham
## 21 19789
## Cruz, Rafael Edward 'Ted' Fiorina, Carly
## 8049 235
## Graham, Lindsey O. Huckabee, Mike
## 15 225
## Jindal, Bobby Johnson, Gary
## 4 87
## Kasich, John R. Lessig, Lawrence
## 194 2
## McMullin, Evan O'Malley, Martin Joseph
## 22 8
## Paul, Rand Perry, James R. (Rick)
## 338 2
## Rubio, Marco Sanders, Bernard
## 1153 7086
## Santorum, Richard J. Stein, Jill
## 23 38
## Trump, Donald J. Walker, Scott
## 13148 101
## Webb, James Henry Jr.
## 3
## [1] 23
There were 23 candidates that received donations in this election cycle in Alabama. This dataset does not contain any information on candidate political party or gender. This will be added shortly.
A caveat here is that this candidate information only represents candidates that received at least one donation, so it does not serve as a list of all candidates that ran for president in this cycle.
At a cursory look, the candidate names in this format are unwieldy. Let’s use the last name for interpretability when doing EDA.
For most of the candidates listed here, I am able to identify which party they ran in. Apparently Lessig and McMullin ran as a Democrat and an Independent, respectively.
## [1] 23
Now each row is assigned a party ID. Values were assigned using if-else statements, then condensed into one column, “party”.
Each candidate now has a gender.
As stated earlier, last names are now being used for each candidate. This will make plots less unwieldy for EDA.
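The code for these steps is not shown in this write-up, so the following is only a minimal sketch of how the last-name and party columns could be built. It swaps the if-else statements mentioned above for a named-vector lookup. The sample names, the `party_lookup` vector, and the text party labels are illustrative assumptions (the printed output further down suggests the original stored party as a numeric code).

```r
# Toy sample of candidate names in the "Last, First" format seen in the table above
cand_nm <- c("Clinton, Hillary Rodham", "Cruz, Rafael Edward 'Ted'",
             "O'Malley, Martin Joseph", "Johnson, Gary", "McMullin, Evan")

# Keep everything before the first comma as the last name
last_name <- sub(",.*$", "", cand_nm)

# Hypothetical lookup table: last name -> party label
party_lookup <- c("Clinton"  = "Democrat",
                  "Cruz"     = "Republican",
                  "O'Malley" = "Democrat",
                  "Johnson"  = "Libertarian",
                  "McMullin" = "Independent")

# Index the lookup by last name; unname() drops the names carried over from the lookup
party <- unname(party_lookup[last_name])

print(data.frame(last_name, party))
```

A lookup like this keeps the candidate-to-party mapping in one place, which may be easier to audit than a chain of if-else statements.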
The block above anonymizes the rows while still creating a donor index. This allows for some stats by donor, like number of donations, sums, means, etc.
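Since the anonymizing code itself is not reproduced here, this is a hedged sketch of one way to turn contributor names into an integer donor index. The names and amounts are made up, and `match()` against the unique names is an assumption about the approach, not necessarily what the original block did.

```r
# Toy contributor names and amounts (made up for illustration)
contbr_nm <- c("DOE, JANE", "SMITH, JOHN", "DOE, JANE", "ROE, RICHARD", "SMITH, JOHN")
amount    <- c(25, 100, 50, 10, 250)

# Give each distinct name an integer ID, numbered in order of first appearance
nameID <- match(contbr_nm, unique(contbr_nm))

# Keep only the ID and the amount; leaving the name column out de-identifies the rows
donations <- data.frame(nameID, amount)

# Per-donor statistics: number of donations and total amount given
n_by_donor     <- tapply(donations$amount, donations$nameID, length)
total_by_donor <- tapply(donations$amount, donations$nameID, sum)

print(data.frame(nameID = as.integer(names(n_by_donor)),
                 n      = as.vector(n_by_donor),
                 total  = as.vector(total_by_donor)))
```

Note that an index built from names alone treats two different people with the same name as one donor.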
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -7300.0 20.0 38.0 116.5 100.0 10800.0
A min of -$7300? How is this possible? Let’s look at that record and its donor ID.
## cmte_id cand_id cand_nm contbr_city contbr_zip contbr_employer
## 1: C00580100 P80001571 Trump BIRMINGHAM 35209
## contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc
## 1: -7300 2016-12-01 Refund
## memo_text form_tp election_tp party cand_gdr nameID
## 1: SB28A G2016 1 1 5151
## cmte_id cand_id cand_nm contbr_city contbr_zip
## 1: C00605568 P20002671 Johnson BIRMINGHAM 35209
## 2: C00580100 P80001571 Trump BIRMINGHAM 35209
## 3: C00580100 P80001571 Trump BIRMINGHAM 35209
## contbr_employer contbr_occupation
## 1: TWICE REQUESTED, NOT YET RECEIVED TWICE REQUESTED, NOT YET RECEIVED
## 2:
## 3: COBB THEATERS OWNER
## contb_receipt_amt contb_receipt_dt receipt_desc memo_text form_tp
## 1: 2000 2016-09-30 SA17A
## 2: -7300 2016-12-01 Refund SB28A
## 3: 10000 2016-11-02 SA17A
## election_tp party cand_gdr nameID
## 1: G2016 4 1 5151
## 2: G2016 1 1 5151
## 3: G2016 1 1 5151
The receipt description states “Refund”. This explains why there are some negative values. Let’s explore this further.
## cmte_id cand_id cand_nm contbr_city contbr_zip contbr_employer
## 1: C00580100 P80001571 Trump ALBERTVILLE 35951 RETIRED
## 2: C00580100 P80001571 Trump ALBERTVILLE 35951 RETIRED
## 3: C00580100 P80001571 Trump ALBERTVILLE 35951 RETIRED
## 4: C00580100 P80001571 Trump ATHENS 35613 RETIRED
## 5: C00580100 P80001571 Trump ATHENS 35613 RETIRED
## ---
## 930: C00580100 P80001571 Trump JASPER 35504 SELF-EMPLOYED
## 931: C00575795 P00003392 Clinton WETUMPKA 360931736
## 932: C00575795 P00003392 Clinton HUNTSVILLE 358013606
## 933: C00575795 P00003392 Clinton TUSCALOOSA 354042917
## 934: C00575795 P00003392 Clinton TUSCALOOSA 354042917
## contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc
## 1: RETIRED -28 2016-10-13
## 2: RETIRED -28 2016-10-20
## 3: RETIRED -28 2016-10-27
## 4: RETIRED -80 2016-10-18
## 5: RETIRED -80 2016-10-25
## ---
## 930: DENTIST -80 2016-10-18
## 931: -75 2016-09-04 Refund
## 932: -100 2016-10-01 Refund
## 933: -5 2016-10-14 Refund
## 934: -100 2016-10-14 Refund
## memo_text form_tp election_tp party cand_gdr nameID
## 1: SA18 G2016 1 1 4
## 2: SA18 G2016 1 1 4
## 3: SA18 G2016 1 1 4
## 4: SA18 G2016 1 1 26
## 5: SA18 G2016 1 1 26
## ---
## 930: SA18 G2016 1 1 16877
## 931: SB28A G2016 2 0 2060
## 932: SB28A G2016 2 0 90
## 933: SB28A G2016 2 0 1521
## 934: SB28A G2016 2 0 1521
There are 934 negative contributions (~1.7% of all records). It is possible that these are refunds. Let’s look at their receipt descriptions with table().
##
##
## 379
## REATTRIBUTION TO SPOUSE
## 38
## REATTRIBUTION TO SPOUSE; SEE REDESIGNATION
## 1
## REDESIGNATION TO CRUZ FOR SENATE
## 71
## REDESIGNATION TO GENERAL
## 180
## REDESIGNATION TO PRESIDENTIAL GENERAL
## 1
## Refund
## 264
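The tabulation above can be sketched roughly as follows. The data frame is a made-up stand-in for the real data, with only the two relevant columns (`contb_receipt_amt` and `receipt_desc`, named as in the dataset), and base R subsetting is an assumption about how the rows were filtered.

```r
# Toy stand-in for the contributions data (values are made up)
donations <- data.frame(
  contb_receipt_amt = c(50, -28, -100, 250, -75, -2700),
  receipt_desc      = c("", "", "Refund", "", "Refund", "REDESIGNATION TO GENERAL"),
  stringsAsFactors  = FALSE
)

# Keep only the rows with a negative amount
negatives <- donations[donations$contb_receipt_amt < 0, ]

# How many there are, and what share of all rows they make up
n_negative     <- nrow(negatives)
share_negative <- n_negative / nrow(donations)

# Count the negative rows by receipt description (blank descriptions get their own level)
desc_counts <- table(negatives$receipt_desc)
print(desc_counts)
```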
For the majority of these, there is no description. Others were redesignated to presidential campaigns (Hillary Victory Fund, redesignation). These could be adjusted to have positive values, with the election type changed to G2016 (for presidential contributions). Others were reattributed to spouses. However, all reattributed and redesignated values (to spouse and to general/senate) occur in the dataset twice, and people who asked for refunds wanted their donation rescinded. The safest approach is to remove all negative values.
Now all negative values are removed. Let’s look at the new variable summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.24 23.00 40.00 124.15 100.00 10800.00
Interesting. The mean, median, and first quartile went up slightly, while the third quartile stayed at 100. With the mean well above the median, the data skew right. Let’s look at the number of records at the higher end:
## [1] 556
## [1] 832
## [1] 1873
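A rough sketch of the filtering and the high-end counts follows. The amounts are made up, and the three cutoffs are purely hypothetical: the write-up does not say which thresholds produced the counts above, nor whether zero amounts were dropped along with the negative ones (here they are).

```r
# Toy donation amounts, including refunds (values are made up)
amt <- c(-7300, -28, 0.24, 20, 38, 100, 250, 600, 2700, 10800)

# Drop negative (and zero) amounts
positive <- amt[amt > 0]
print(summary(positive))

# Count records above a few cutoffs (hypothetical values, chosen for illustration)
cutoffs <- c(1000, 500, 250)
n_above <- sapply(cutoffs, function(k) sum(positive > k))
names(n_above) <- cutoffs
print(n_above)
```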
There aren’t many values above 500, so they will be excluded from the following “quick and dirty” plot.
The variable is now ready for EDA!
#### Donation Histogram
The distribution includes some very large outliers, which distort the plot. To get a sense of what it is showing, we need to zoom in a bit more using coord_cartesian() and setting the limits to 0:500.
That is cleaner, and we can see some patterns. Most donations were in the ranges of $0-100. Large upticks of donations occur at certain intervals (200, 250, 100, 50), as one would expect.
Let’s zoom in even further, on donations under 100.
Looking at the values zoomed in below 100, the most common values are 10, 25, 50, and 100. The majority of donations were below $50.
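The plots are not included here, but the same patterns can be checked numerically. The sketch below uses made-up amounts and base R only. Note one difference from the plotting approach: it filters the data before binning, whereas coord_cartesian() only changes the visible window of a ggplot2 plot and keeps all of the data.

```r
# Toy donation amounts (made up); the real column is contb_receipt_amt
amt <- c(10, 10, 25, 25, 25, 35, 50, 50, 100, 100, 250, 1000, 2700)

# Most frequent amounts, largest count first
common <- sort(table(amt), decreasing = TRUE)
print(head(common, 3))

# Share of donations strictly below $50
share_under_50 <- mean(amt < 50)
print(share_under_50)

# Bin counts for the 0-500 window in $50-wide bins, computed without drawing a plot
zoomed <- hist(amt[amt <= 500], breaks = seq(0, 500, by = 50), plot = FALSE)
print(zoomed$counts)
```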
## cand_nm count totalDonations meanDonation medianDonation
## 1: Trump 12864 1997848.40 155.30538 80.00
## 2: Clinton 19615 1917914.35 97.77794 25.00
## 3: Cruz 7723 790043.98 102.29755 50.00
## 4: Carson 4338 595010.17 137.16233 50.00
## 5: Rubio 1114 512313.73 459.88665 100.00
## 6: Sanders 7029 301434.87 42.88446 27.00
## 7: Bush 285 279389.00 980.31228 265.00
## 8: Huckabee 222 86866.50 391.29054 100.00
## 9: Paul 334 57723.71 172.82548 50.00
## 10: Walker 100 55356.00 553.56000 250.00
## 11: Kasich 193 54883.00 284.36788 100.00
## 12: Fiorina 235 31294.00 133.16596 50.00
## 13: Johnson 86 25696.85 298.80058 100.00
## 14: Santorum 23 11108.00 482.95652 100.00
## 15: Stein 37 5577.00 150.72973 50.00
## 16: Christie 21 3645.00 173.57143 100.00
## 17: O'Malley 8 2890.64 361.33000 20.16
## 18: Jindal 4 2759.00 689.75000 104.50
## 19: Graham 15 2355.00 157.00000 25.00
## 20: McMullin 21 1865.00 88.80952 50.00
## 21: Lessig 2 1000.00 500.00000 500.00
## 22: Webb 3 650.00 216.66667 100.00
## 23: Perry 2 600.00 300.00000 300.00
## cand_nm count totalDonations meanDonation medianDonation
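A per-candidate summary like the table above could be produced along these lines. This is a base R sketch on made-up rows; the `1:` row labels in the printed output suggest the original used data.table, so the split/lapply approach here is a stand-in rather than the original code. Column names are copied from the table.

```r
# Toy donations (made up); column names follow the dataset
donations <- data.frame(
  cand_nm           = c("Trump", "Trump", "Clinton", "Clinton", "Clinton", "Sanders"),
  contb_receipt_amt = c(80, 250, 25, 25, 100, 27),
  stringsAsFactors  = FALSE
)

# Split the amounts by candidate and compute one summary row per candidate
by_cand <- do.call(rbind, lapply(
  split(donations$contb_receipt_amt, donations$cand_nm),
  function(x) data.frame(count          = length(x),
                         totalDonations = sum(x),
                         meanDonation   = mean(x),
                         medianDonation = median(x))
))

# rbind() keeps the candidate names as row names; copy them into a column
by_cand$cand_nm <- rownames(by_cand)

# Sort from the largest total raised to the smallest
by_cand <- by_cand[order(-by_cand$totalDonations), ]
print(by_cand)
```

Reporting the median next to the mean is useful here because a handful of large donations can pull the mean far above a typical gift.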
Not surprisingly, Trump and Clinton raised the most money, followed by Cruz, Carson, and Rubio.
Clinton had by far the highest number of distinct donations, followed by Trump, Cruz, Sanders, and Carson. Republican candidates appear to have much higher average donations, while the Democratic candidates received many more small donations.
Bush and Walker had the highest average donation, though this doesn’t take into account sample size. I would predict that the candidates that received the most donations had lower mean donations.
Among candidates with 100+ donations, Scott Walker and Jeb Bush, both Republicans, had the highest median donations. The Democratic candidates, Clinton and Sanders, had the lowest median donations.
Republican candidates outraised Democratic candidates by almost a 2:1 margin. No other party came close, though Libertarian candidates get an honorable mention with 3rd.
## election_tp count totalDonations meanDonation medianDonation
## 1: P2016 35160 4247493.10 120.8047 40
## 2: G2016 18906 2410760.46 127.5130 40
## 3: 206 78970.64 383.3526 240
## 4: O2016 2 1000.00 500.0000 500
More money was raised in the Primary than in the General: about 1.8 times as much ($4.25 million versus $2.41 million).
## cand_nm election_tp count totalDonations meanDonation medianDonation
## 1: Clinton G2016 12341 1022292.72 82.83711 25.000
## 2: Cruz P2016 7572 753105.98 99.45932 50.000
## 3: Clinton P2016 7274 895621.63 123.12643 25.000
## 4: Sanders P2016 7029 301434.87 42.88446 27.000
## 5: Trump P2016 6424 628089.48 97.77233 40.000
## 6: Trump G2016 6244 1292418.92 206.98573 87.140
## 7: Carson P2016 4320 587275.17 135.94333 50.000
## 8: Rubio P2016 1103 494813.73 448.60719 100.000
## 9: Paul P2016 314 49505.74 157.66159 50.000
## 10: Bush P2016 284 279369.00 983.69366 282.500
## 11: Fiorina P2016 235 31294.00 133.16596 50.000
## 12: Huckabee P2016 221 85866.50 388.53620 100.000
## 13: Trump 196 77340.00 394.59184 282.000
## 14: Kasich P2016 193 54883.00 284.36788 100.000
## 15: Cruz G2016 151 36938.00 244.62252 35.000
## 16: Walker P2016 100 55356.00 553.56000 250.000
## 17: Johnson G2016 84 22896.85 272.58155 100.000
## 18: Santorum P2016 23 11108.00 482.95652 100.000
## 19: Christie P2016 21 3645.00 173.57143 100.000
## 20: Stein G2016 20 1426.00 71.30000 27.000
## 21: Paul G2016 20 8217.97 410.89850 281.875
## 22: Carson G2016 18 7735.00 429.72222 100.000
## 23: McMullin G2016 16 1315.00 82.18750 50.000
## 24: Stein P2016 15 3151.00 210.06667 100.000
## 25: Graham P2016 15 2355.00 157.00000 25.000
## 26: Rubio G2016 11 17500.00 1590.90909 2400.000
## 27: McMullin 5 550.00 110.00000 100.000
## 28: O'Malley 4 80.64 20.16000 20.160
## 29: O'Malley P2016 4 2810.00 702.50000 50.000
## 30: Jindal P2016 4 2759.00 689.75000 104.500
## 31: Webb P2016 3 650.00 216.66667 100.000
## 32: Stein O2016 2 1000.00 500.00000 500.000
## 33: Perry P2016 2 600.00 300.00000 300.000
## 34: Lessig P2016 2 1000.00 500.00000 500.000
## 35: Johnson P2016 2 2800.00 1400.00000 1400.000
## 36: Huckabee 1 1000.00 1000.00000 1000.000
## 37: Bush G2016 1 20.00 20.00000 20.000
## cand_nm election_tp count totalDonations meanDonation medianDonation
With this data now grouped and separated out, let’s look at the most popular candidates in Alabama (Clinton, Trump, Sanders, Cruz, Rubio, and Carson) and categorize money raised by race.
Observations:
1) Clinton raised the most money of any candidate in the Primaries
2) Cruz raised the most money of any Republican candidate during the Primary by a substantial margin, followed by Trump, Carson, and Rubio
3) Trump raised the most money in total, with the majority coming during the General
4) Trump outraised Clinton in the General
I next want to look at donations as time-series, by candidate and race.
## contb_receipt_dt count totalDonations meanDonation medianDonation
## 1: 2014-12-22 1 2000 2000 2000
## 2: 2014-12-29 1 128 128 128
## 3: 2014-12-30 2 1250 625 625
## 4: 2015-02-10 1 25 25 25
## 5: 2015-03-03 3 750 250 250
## ---
## 638: 2016-12-20 1 150 150 150
## 639: 2016-12-22 1 250 250 250
## 640: 2016-12-25 1 250 250 250
## 641: 2016-12-29 1 35 35 35
## 642: 2016-12-30 1 200 200 200
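The daily table above can be approximated with a sketch like the following. The rows are made up, the dates are assumed to be already parsed into ISO format as shown in the output, and base R `tapply()` stands in for whatever grouping code the original used.

```r
# Toy donations with parsed dates (values are made up)
dt  <- as.Date(c("2016-02-29", "2016-02-29", "2016-03-01",
                 "2016-07-11", "2016-07-11", "2016-07-11"))
amt <- c(100, 2700, 50, 25, 40, 80)

# Group by calendar day: number of donations and total raised
counts <- tapply(amt, dt, length)
totals <- tapply(amt, dt, sum)

daily <- data.frame(date           = as.Date(names(totals)),
                    count          = as.vector(counts),
                    totalDonations = as.vector(totals))
print(daily)

# Days with the largest total and with the most donations
top_sum_day   <- daily$date[which.max(daily$totalDonations)]
top_count_day <- daily$date[which.max(daily$count)]
```

Looking at sums and counts separately matters in this dataset: a day can lead on total dollars because of a single large gift while another day leads on the number of donors.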
Observations:
1) The day with the largest sum of donations was a day in January, before the primary season, which started in February
2) The day with the second largest sum was February 29th, the day before “Super Tuesday” (March 1st), on which there were 12 primaries (https://www.washingtonpost.com/graphics/politics/2016-election/primaries/schedule/)
3) The next largest days were in mid to late July, mirroring the dates of the Republican and Democratic Conventions (https://en.wikipedia.org/wiki/2016_Democratic_National_Convention, https://en.wikipedia.org/wiki/2016_Republican_National_Convention)
4) After the conventions (General Election race), the daily donation sums picked up consistently until the end of the race in the beginning of November
5) Donations during the General Election race (after July 1st) seem to have a cyclical pattern, which I hypothesize represents weekly cycles
Observations:
1) We see multiple trends here: rising and falling around Super Tuesday (March 1st), then spikes in July (due to party conventions), and then steady increases between August and November
2) The days with the highest number of donations were July 11/12, which each had over 1000 donations, a week before the Republican Party Convention (July 18)
3) The number of donations per day steadily rose after July (during the General Election race), from August to November
4) The number of donations per day rose as primary season approached (the first primary was February 1st, 2016)
While there were not many donations (by count) early in the election cycle, the average donations were extremely high compared to donations after November 2015. This suggests a few donors with deeper pockets making donations early.
Similar to the plot above, days earlier in the campaign cycle had significantly higher median donations than days in the last year of the election cycle.
## contb_receipt_dt cand_nm count totalDonations meanDonation
## 1: 2014-12-22 Paul 1 2000 2000
## 2: 2014-12-29 Paul 1 128 128
## 3: 2014-12-30 Paul 2 1250 625
## 4: 2015-02-10 Graham 1 25 25
## 5: 2015-03-03 Carson 3 750 250
## ---
## 2928: 2016-12-20 Trump 1 150 150
## 2929: 2016-12-22 Trump 1 250 250
## 2930: 2016-12-25 Trump 1 250 250
## 2931: 2016-12-29 Trump 1 35 35
## 2932: 2016-12-30 Trump 1 200 200
## medianDonation
## 1: 2000
## 2: 128
## 3: 625
## 4: 25
## 5: 250
## ---
## 2928: 150
## 2929: 250
## 2930: 250
## 2931: 35
## 2932: 200
Looking at the sum of donations for all candidates here is hard. Let’s look at the major candidates instead, to get a better view.
For interpretability purposes.
Filtering donations by candidate
Observations:
1) Carson had large donation totals in the beginning, and for a while was the leading fundraiser
2) Trump raised the majority of his donations after he won the nomination (post July 2016)
3) Clinton consistently outraised Sanders early on, and after winning the nomination had an upward trend in donations
4) Among Republican candidates other than Trump, Carson was the initial leader in fundraising, but when his numbers fell, Rubio’s rose, and when Rubio’s fell, Cruz’s rose. Here we can see donors most likely reacting to performance in polls and primary races. Until he won the nomination, Trump was consistently behind in fundraising among candidates
Observations:
1) Among Republican candidates in the Primary, Carson had a consistently higher number of donations per day, until Cruz overtook him
2) The number of daily donations for Trump and Clinton, the eventual Primary winners, did not pick up until July 2016, where you see a large spike in Trump donations and a sharp positive trend for Clinton which grew stronger over time
3) Rubio consistently had the lowest number of donations, save for Trump early in the primary season. This is interesting because for a time Rubio was raising large amounts, ahead of other candidates (in the chart above), hinting that his average donation was the highest of any candidate
Observations:
1) Rubio, as hinted at above, had days with high median amounts
2) Clinton and Trump had donations early on that were high in median value, while Carson and Sanders had days with the lowest amounts
3) Clinton’s median donation decreased over time, as she received more donations
4) Aside from some major spikes, Cruz’s median donations were mostly small amounts
I want to analyze some of the fundraising over time focusing on the Democratic candidates, Clinton and Sanders. While I do not have any polling data, knowing that Clinton won (https://en.wikipedia.org/wiki/Alabama_Democratic_primary,_2016), it will be interesting to analyze with that in mind.
Clinton consistently outraised Sanders. You can see Sanders trend up to his peak before March 1st, after which donations started falling. I would hypothesize that this has to do with his performance on Super Tuesday (March 1st). Another interesting trend is the sharp increase in July, when the Democratic Convention occurred, followed by consistent increases as the General Election approached.
Looking at the donation counts, we see that Sanders was in fact competitive with Clinton, at least in contrast to the total daily sums raised. For a period of time, Sanders even had a higher number of daily donations than Clinton did.
Observations:
1) Carson and Cruz were the early leading fundraisers
2) Rubio had a large spike in December 2015, which seems to have also signalled the end of Carson’s candidacy, as he doesn’t appear to be leading at any other point
3) Trump was a late bloomer in terms of fundraising. There weren’t consistently large daily sums of donations until he won the nomination between May 2016 and July 2016
4) Late in Trump’s campaign there was a consistent decrease in donation sums
Here we can see a transition between candidates receiving the most donations per day, initially with Carson, then Cruz, then Trump after May 2016. A lower level of time granularity could help here.
Perhaps a bit too granular. Let’s group by week.
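One way to group by week is to map every date to the Monday that starts its week and then aggregate on that. The sketch below is a base R illustration on made-up values. Starting weeks on Monday is an assumption, since the write-up does not say how weeks were defined.

```r
# Toy dates and amounts (values are made up)
dates   <- as.Date(c("2016-02-29", "2016-03-01", "2016-03-06", "2016-03-07"))
amounts <- c(100, 50, 25, 40)

# "%u" is the weekday number, 1 = Monday ... 7 = Sunday
weekday <- as.integer(format(dates, "%u"))

# Step back to the Monday that starts each date's week
week_start <- dates - (weekday - 1)

# Sum donations per week
weekly <- aggregate(amounts, by = list(week = week_start), FUN = sum)
names(weekly)[2] <- "totalDonations"
print(weekly)
```

Keeping the week as a real date (rather than a week number) means it can go straight onto a time axis when plotting.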
Right away, we can see how decreasing time granularity makes it easier to discern trends. We see an overall positive trend in donation sums from July 2015 to November 2016. We can also see the 2 distinct giving periods: the Primary season from July 2015 to March 1st, then the General Election season from July 2016 to November 2016. Weeks in the General season had higher amounts of giving on average than weeks in the Primary season.
In contrast to the sum raised by week in the previous plot, we see a clear positive trend from the beginning of the election cycle in March 2015 to November 2016. In other words, the number of donations generally increased as the Election neared. We also see some large spikes: the week of “Super Tuesday” (March 1st), the Republican and Democratic Conventions (2 weeks apart in July), and the week before the election (November 1st, 2016).
While a bit of a mess, we can see how Trump and Clinton’s fundraising efforts progressed during the General Election season (July 2016 on), as well as Rubio’s big week in December 2015. Let’s use facet wrap to make things cleaner.
Now this is a lot cleaner. Observations:
1) Carson started off strong compared to the other Republican candidates, but soon faltered and gave way to Cruz
2) Aside from a few big weeks, Rubio did not raise consistently higher amounts
3) Sanders did not raise nearly as much money as Clinton at any period
4) The majority of Trump donations occurred after he had won the nomination
Observations:
1) Sanders was competitive with Clinton on number of donations, however, and you can see his campaign gaining steam throughout the primaries, until it fell off after March 1st, the week of 12 primaries (Super Tuesday)
2) Cruz had a consistently higher number of donations per week than any other Republican candidate in the primaries
3) The Republican Convention was a boon for Trump’s campaign, which had the highest weekly donation total of any candidate
#### Sum of Donations by Political Party per week
Republicans had consistently higher weekly donation totals than Democrats did.
This is a key plot due to the following insights it reveals:
1) During the primary season, each Republican candidate (Trump, Cruz, Carson, Rubio) raised similar amounts, and Trump, the eventual General nominee, lost to Cruz by a decent margin and barely edged out Carson for second most money raised.
2) Trump eventually won the state during the General Election, and this aligns with him raising substantially more than Clinton during the General.
3) This answers a simpler question: which candidates raised the most money. Trump and Clinton predictably raised the most.
Why is this plot interesting?
1) It shows which candidates had higher average donations. This hints at the demographics of the donors, if you make the assumption that people with higher incomes give more than people with lower incomes.
2) The two Democratic candidates in the race, Sanders and Clinton, had the lowest average donation amount.
3) Jeb Bush, who didn’t make it far in the primary season, had by far the highest average donation among candidates with 100+ donations.
This plot shows:
1) The highest weekly amounts raised in the cycle occurred during the Party Conventions in July
2) Fundraising in the General (post July 2016) was consistently higher on a weekly basis than in the primaries
3) Aside from a spike in December 2015 (due to a large Rubio value), the highest amounts raised in the Primary season came in the weeks around Super Tuesday, March 1st. As this day involved 12 primaries, it is an important day for all campaigns, and it showed here with higher donations.
When analyzing a publicly available dataset like this, there is a very high chance it will require significant pre-processing and feature creation. This was very true here. This can be time-consuming when you take domain knowledge into account: coming in, I didn’t know much about campaign donations, so when cleaning the data I had to ensure I knew what I was dealing with and understood what each column meant.
For demographic information, there was a lot of missing, incomplete, or inconsistent information. For example, the state of the occupation and employer fields made analysis of donor demographics impossible here. As such, many features that could have provided great insight were thrown away.
I was able to answer my main, overarching questions, find some interesting trends, and create some very informative plots. This dataset was also ripe for time-series analysis, and I was able to get a new look at political campaigning, albeit in one out of 50 states.
I would like to take some of the spatial data here, like cities and zip codes, join them with outside population data, and analyze the counties/zip codes/cities that raised the most money. Another possible project would be looking at the time series data in conjunction with polling data, and seeing if future donations could be modeled based on polling data and primary scheduling.