Measuring Presidential Campaign Contributions from Alabama in the 2016 Cycle

Now that the 2016 presidential election cycle has long since wrapped up, there is a trove of data on it. Campaign contributions are one area of campaigning that is often seen as an important indicator of winning.

For my Udacity Data Analyst Nanodegree, I was tasked with finding a dataset in the wild and analyzing it using R. To accomplish this, I set out to analyze contributions from Alabama to the different presidential campaigns during the 2016 election cycle. Why Alabama? It was chosen at random.

Data was downloaded from http://classic.fec.gov/disclosurep/PDownload.do.

Some questions I hope to answer:
1) Which candidates raised the most money?
2) How did donation sums and counts progress over time?
3) Which candidates had the lowest and highest average donations?

Importing relevant libraries

Loading the dataset and first look

## Skim summary statistics
##  n obs: 55209 
##  n variables: 18 
## 
## Variable type: character 
##             variable missing complete     n min max empty n_unique
## 1            cand_id       0    55209 55209   9   9     0       23
## 2            cand_nm       0    55209 55209   9  25     0       23
## 3            cmte_id       0    55209 55209   9   9     0       23
## 4   contb_receipt_dt       0    55209 55209   9   9     0      646
## 5        contbr_city       0    55209 55209   0  22     2      697
## 6    contbr_employer      20    55189 55209   0  38   281     4737
## 7          contbr_nm       0    55209 55209   4  40     0    16980
## 8  contbr_occupation       3    55206 55209   0  38   285     2785
## 9          contbr_st       0    55209 55209   2   2     0        1
## 10        contbr_zip       0    55209 55209   0   9    28     6666
## 11       election_tp       0    55209 55209   0   5   211        4
## 12          file_num       0    55209 55209   7   7     0      144
## 13           form_tp       0    55209 55209   4   5     0        3
## 14           memo_cd       0    55209 55209   0   1 40016        2
## 15         memo_text       0    55209 55209   0  67 43505       66
## 16      receipt_desc       0    55209 55209   0  67 54308       19
## 17           tran_id       0    55209 55209   5  20     0    55072
## 
## Variable type: numeric 
##            variable missing complete     n   mean     sd   min p25 median
## 1 contb_receipt_amt       0    55209 55209 116.53 361.54 -7300  20     38
##   p75   max     hist
## 1 100 10800 ▁▁▁▇▁▁▁▁

The only numeric variable here, donation amount, had an interesting distribution. The mean was about $116.53 and the median $38, with a strange minimum of -$7,300 and a maximum of $10,800.

Data Review and Cleaning

Some variables that jumped out to me:
- Committee ID
- Candidate ID
- Candidate Name
- Contributor City
- Contributor ZIP
- Contributor Occupation
- Receipt Amount
- Receipt Date
- Election Type

It will be interesting to explore donation amounts and counts over time, as well as to do some spatial plotting. Another possibility is analyzing different attributes of contributors (city, occupation, etc.) to see if they share any common features.

There are some columns which may not provide much insight:
- receipt description
- memo code
- memo text
- contributor state (all values are from AL)
- contributor employer (I am doubtful there are enough insightful values across the 55,000 rows)
- contributor name (if anything, these can be used to create a contributor ID, for example to analyze how many contributions the average Alabaman gave, and then removed to de-identify the rows)
- form type (not sure what this represents)
- file number (not sure what this represents)

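Few values are truly missing (NA): only 20 in contributor employer and 3 in contributor occupation. Several columns do, however, contain many empty strings, notably memo code, memo text, and receipt description.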

Let’s now look at individual column values with the table() function.
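Since the R code behind these tables isn't shown, here is a minimal Python sketch of what table() does, using `collections.Counter`. This is not the original R code. The helper name `table` and the handful of sample IDs, copied from the output that follows, are for illustration only.

```python
from collections import Counter

def table(values):
    """Count occurrences of each distinct value, sorted by value like R's table()."""
    counts = Counter(values)
    return dict(sorted(counts.items()))

# A few candidate IDs taken from the output below (not the full 55,209-row column)
cand_ids = ["P80001571", "P00003392", "P80001571", "P20002671", "P00003392", "P80001571"]

counts = table(cand_ids)
print(counts)
print(len(counts))  # number of distinct values, like length(unique(x)) in R
```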

Candidate information columns: Committee ID, Candidate ID, and Candidate Name

## 
## C00458844 C00500587 C00573519 C00574624 C00575449 C00575795 C00577130 
##      1153         2      4372      8049       338     19789      7086 
## C00577312 C00577981 C00578492 C00578658 C00578757 C00579458 C00580100 
##       235       225        23         8        15       294     13148 
## C00580159 C00580399 C00580480 C00581199 C00581215 C00581876 C00583146 
##         4        21       101        38         3       194         2 
## C00605568 C00623884 
##        87        22 
## [1] 23
## 
## P00003392 P20002671 P20002721 P20003281 P20003984 P40003576 P60003670 
##     19789        87        23         2        38       338       194 
## P60005915 P60006046 P60006111 P60006723 P60007168 P60007242 P60007671 
##      4372       101      8049      1153      7086       235         8 
## P60007697 P60008059 P60008398 P60008521 P60008885 P60009685 P60022654 
##        15       294         4        21         3         2        22 
## P80001571 P80003478 
##     13148       225 
## [1] 23
## 
##                 Bush, Jeb       Carson, Benjamin S. 
##                       294                      4372 
##  Christie, Christopher J.   Clinton, Hillary Rodham 
##                        21                     19789 
## Cruz, Rafael Edward 'Ted'            Fiorina, Carly 
##                      8049                       235 
##        Graham, Lindsey O.            Huckabee, Mike 
##                        15                       225 
##             Jindal, Bobby             Johnson, Gary 
##                         4                        87 
##           Kasich, John R.          Lessig, Lawrence 
##                       194                         2 
##            McMullin, Evan   O'Malley, Martin Joseph 
##                        22                         8 
##                Paul, Rand    Perry, James R. (Rick) 
##                       338                         2 
##              Rubio, Marco          Sanders, Bernard 
##                      1153                      7086 
##      Santorum, Richard J.               Stein, Jill 
##                        23                        38 
##          Trump, Donald J.             Walker, Scott 
##                     13148                       101 
##     Webb, James Henry Jr. 
##                         3 
## [1] 23

There were 23 candidates who received donations from Alabama in this election cycle. This dataset does not contain any information on candidate political party or gender. Both will be added shortly.

A caveat here is that this candidate information only covers candidates who received at least one donation, so it does not serve as a list of all candidates who ran for president in this cycle.

Even at a cursory glance, the full candidate names in this format are unwieldy. Let’s use last names for interpretability during EDA.

I can identify the party of most of the candidates listed here. Apparently Lessig and McMullin ran as a Democrat and an Independent, respectively.

- Creating party vectors containing candidate names

## [1] 23

- Assigning Party IDs to Candidates

Now each row is assigned a party ID. Values were assigned using if-else statements, then condensed into one column, “party”.
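The if-else approach can also be expressed as a single lookup table. Below is a minimal Python sketch, not the original R code, keyed by candidate last name and using party labels as strings. The numeric party codes that appear in the R output (for example 1 for Trump, 2 for Clinton, and 4 for Johnson) are not reproduced. The assignments are the parties each candidate ran in during 2016, including Lessig as a Democrat and McMullin as an Independent.

```python
# Party of each of the 23 candidates in the Alabama file, keyed by last name
PARTY_BY_CANDIDATE = {
    **dict.fromkeys(
        ["Bush", "Carson", "Christie", "Cruz", "Fiorina", "Graham", "Huckabee",
         "Jindal", "Kasich", "Paul", "Perry", "Rubio", "Santorum", "Trump", "Walker"],
        "Republican"),
    **dict.fromkeys(["Clinton", "Lessig", "O'Malley", "Sanders", "Webb"], "Democrat"),
    "Johnson": "Libertarian",
    "Stein": "Green",
    "McMullin": "Independent",
}

def assign_party(rows):
    """Add a 'party' key to each row dict based on its 'cand_nm' value."""
    for row in rows:
        # A KeyError here flags a candidate missing from the lookup table
        row["party"] = PARTY_BY_CANDIDATE[row["cand_nm"]]
    return rows

rows = assign_party([{"cand_nm": "Trump"}, {"cand_nm": "Clinton"}, {"cand_nm": "Johnson"}])
print([r["party"] for r in rows])
print(len(PARTY_BY_CANDIDATE))  # should cover all 23 candidates
```

A dictionary keeps the mapping in one place and fails loudly on an unknown name. A chain of if-else statements would silently fall through instead.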

- Assigning Candidate Genders

Each candidate now has a gender.

- Cleaning candidate names

As planned earlier, each candidate is now referred to by last name only, which makes plots less unwieldy during EDA.
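The table output above shows that the candidate names follow a "Last, First Middle" pattern, so the last name is everything before the first comma. Here is a minimal Python sketch under that assumption. The helper name `last_name` is hypothetical.

```python
def last_name(full_name):
    """Return the part of a 'Last, First Middle' name before the first comma."""
    return full_name.split(",", 1)[0].strip()

names = ["Trump, Donald J.", "Clinton, Hillary Rodham", "Cruz, Rafael Edward 'Ted'",
         "O'Malley, Martin Joseph", "Webb, James Henry Jr."]
print([last_name(n) for n in names])
```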

- Dropping columns with little descriptive information

- Cleaning the Date column
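The skim output shows contb_receipt_dt stored as a 9-character string, while the cleaned output later shows ISO dates such as 2016-12-01. The sketch below assumes the raw values look like `01-DEC-16` (day, abbreviated month, two-digit year). That format is an assumption consistent with the 9-character width, not something confirmed above. Under it, a Python version of the conversion might be:

```python
from datetime import date, datetime

def parse_receipt_date(raw: str) -> date:
    """Parse a date string assumed to look like '01-DEC-16' into a datetime.date."""
    # %b matches month abbreviations case-insensitively; %y maps '16' to 2016
    return datetime.strptime(raw, "%d-%b-%y").date()

d = parse_receipt_date("01-DEC-16")
print(d.isoformat())
```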

- Replacing the name column with unique IDs

This step anonymizes the rows while still creating a per-donor index. This allows for statistics by donor, such as number of donations, sums, and means.
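One way to build such an index is to hand out integers in order of first appearance and then drop the names. Below is a minimal Python sketch with hypothetical donor names. The numbering scheme is an assumption and will not necessarily reproduce the nameID values in the output below.

```python
from collections import defaultdict

def anonymize(rows):
    """Replace each row's contbr_nm with an integer nameID (1, 2, ... by first appearance)."""
    ids = {}
    for row in rows:
        name = row.pop("contbr_nm")
        # setdefault only assigns a new ID the first time a name is seen
        row["nameID"] = ids.setdefault(name, len(ids) + 1)
    return rows

def per_donor_stats(rows):
    """Return {nameID: (number_of_donations, total_amount)} from anonymized rows."""
    stats = defaultdict(lambda: [0, 0.0])
    for row in rows:
        entry = stats[row["nameID"]]
        entry[0] += 1
        entry[1] += row["contb_receipt_amt"]
    return {donor: tuple(v) for donor, v in stats.items()}

# Hypothetical donors, for illustration only
rows = anonymize([
    {"contbr_nm": "DOE, JANE", "contb_receipt_amt": 25.0},
    {"contbr_nm": "ROE, RICHARD", "contb_receipt_amt": 100.0},
    {"contbr_nm": "DOE, JANE", "contb_receipt_amt": 50.0},
])
print(per_donor_stats(rows))
```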

EDA

Univariate - Donations

Donation Summary

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -7300.0    20.0    38.0   116.5   100.0 10800.0

A minimum of -$7,300? How is this possible? Let’s look at that record, along with the same donor’s other contributions (nameID 5151).

##      cmte_id   cand_id cand_nm contbr_city contbr_zip contbr_employer
## 1: C00580100 P80001571   Trump  BIRMINGHAM      35209                
##    contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc
## 1:                               -7300       2016-12-01       Refund
##    memo_text form_tp election_tp party cand_gdr nameID
## 1:             SB28A       G2016     1        1   5151
##      cmte_id   cand_id cand_nm contbr_city contbr_zip
## 1: C00605568 P20002671 Johnson  BIRMINGHAM      35209
## 2: C00580100 P80001571   Trump  BIRMINGHAM      35209
## 3: C00580100 P80001571   Trump  BIRMINGHAM      35209
##                      contbr_employer                 contbr_occupation
## 1: TWICE REQUESTED, NOT YET RECEIVED TWICE REQUESTED, NOT YET RECEIVED
## 2:                                                                    
## 3:                     COBB THEATERS                             OWNER
##    contb_receipt_amt contb_receipt_dt receipt_desc memo_text form_tp
## 1:              2000       2016-09-30                          SA17A
## 2:             -7300       2016-12-01       Refund             SB28A
## 3:             10000       2016-11-02                          SA17A
##    election_tp party cand_gdr nameID
## 1:       G2016     4        1   5151
## 2:       G2016     1        1   5151
## 3:       G2016     1        1   5151

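The receipt description states “Refund”, which explains why there are some negative values. Note that this donor’s $10,000 contribution to Trump minus the $7,300 refund leaves $2,700, the 2016 per-election limit on individual contributions to a candidate. Let’s explore this further.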

##        cmte_id   cand_id cand_nm contbr_city contbr_zip contbr_employer
##   1: C00580100 P80001571   Trump ALBERTVILLE      35951         RETIRED
##   2: C00580100 P80001571   Trump ALBERTVILLE      35951         RETIRED
##   3: C00580100 P80001571   Trump ALBERTVILLE      35951         RETIRED
##   4: C00580100 P80001571   Trump      ATHENS      35613         RETIRED
##   5: C00580100 P80001571   Trump      ATHENS      35613         RETIRED
##  ---                                                                   
## 930: C00580100 P80001571   Trump      JASPER      35504   SELF-EMPLOYED
## 931: C00575795 P00003392 Clinton    WETUMPKA  360931736                
## 932: C00575795 P00003392 Clinton  HUNTSVILLE  358013606                
## 933: C00575795 P00003392 Clinton  TUSCALOOSA  354042917                
## 934: C00575795 P00003392 Clinton  TUSCALOOSA  354042917                
##      contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc
##   1:           RETIRED               -28       2016-10-13             
##   2:           RETIRED               -28       2016-10-20             
##   3:           RETIRED               -28       2016-10-27             
##   4:           RETIRED               -80       2016-10-18             
##   5:           RETIRED               -80       2016-10-25             
##  ---                                                                  
## 930:           DENTIST               -80       2016-10-18             
## 931:                                 -75       2016-09-04       Refund
## 932:                                -100       2016-10-01       Refund
## 933:                                  -5       2016-10-14       Refund
## 934:                                -100       2016-10-14       Refund
##      memo_text form_tp election_tp party cand_gdr nameID
##   1:              SA18       G2016     1        1      4
##   2:              SA18       G2016     1        1      4
##   3:              SA18       G2016     1        1      4
##   4:              SA18       G2016     1        1     26
##   5:              SA18       G2016     1        1     26
##  ---                                                    
## 930:              SA18       G2016     1        1  16877
## 931:             SB28A       G2016     2        0   2060
## 932:             SB28A       G2016     2        0     90
## 933:             SB28A       G2016     2        0   1521
## 934:             SB28A       G2016     2        0   1521

There are 934 negative contributions (~1.7% of all rows). It is possible that these are all refunds. Let’s check by looking at their descriptions with table().

## 
##                                            
##                                        379 
##                    REATTRIBUTION TO SPOUSE 
##                                         38 
## REATTRIBUTION TO SPOUSE; SEE REDESIGNATION 
##                                          1 
##           REDESIGNATION TO CRUZ FOR SENATE 
##                                         71 
##                   REDESIGNATION TO GENERAL 
##                                        180 
##      REDESIGNATION TO PRESIDENTIAL GENERAL 
##                                          1 
##                                     Refund 
##                                        264

The largest group (379) has no description, and 264 are marked as refunds. The rest were either redesignated (e.g. to the general election, to Cruz’s Senate campaign, or to the Hillary Victory Fund) or reattributed to spouses. One option would be to flip the redesignated values to positive and change their election type to G2016 (presidential general election contributions). However, all redistributed values (to spouses and to general/Senate campaigns) already occur in the dataset twice, and donors who received refunds wanted their donations rescinded. The safest approach, then, is to remove all negative values.
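Here is a minimal Python sketch of this step, followed by an R-style summary() of what remains. The amounts are toy values, a few of which echo records shown above; they are not the real data. `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation as R's default quantile() type, so the quartiles correspond to what R's summary() reports.

```python
import statistics

def r_style_summary(values):
    """Min, quartiles, mean, and max, like R's summary() on a numeric vector."""
    # method="inclusive" interpolates linearly between order statistics (R's default type 7)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"Min.": min(values), "1st Qu.": q1, "Median": median,
            "Mean": statistics.fmean(values), "3rd Qu.": q3, "Max.": max(values)}

# Toy amounts; negatives stand in for refunds, redesignations, and reattributions
amounts = [-7300, 10000, 2000, 25, 50, -28, 100, 10]

kept = [a for a in amounts if a >= 0]  # remove all negative values
removed_share = 1 - len(kept) / len(amounts)
print(r_style_summary(kept))
print(f"removed {removed_share:.1%} of records")
```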

Removing negative donations

Now all negative values are removed. Let’s look at the new variable summary:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.24    23.00    40.00   124.15   100.00 10800.00

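Interesting. The mean rose from 116.5 to 124.15, the median from 38 to 40, and the first quartile from 20 to 23, while the third quartile stayed at 100. With the mean far above the median, the data is strongly right-skewed, with a long tail of large donations. Let’s look at the number of records at the high end: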

## [1] 556
## [1] 832
## [1] 1873

There aren’t many values above 500, so they will be excluded from the following “quick and dirty” plot.

The variable is now ready for EDA!

Donation Histogram

The distribution includes some very large outliers, which distort the plot. To see what it shows, we need to zoom in using coord_cartesian() and set the x-axis limits to 0-500. This zooms in without dropping any data.

That is cleaner, and we can see some patterns. Most donations fell in the $0-100 range. Large spikes occur at round amounts ($50, $100, $200, $250), as one would expect.

Let’s zoom in even further, on donations under 100.

Looking at the values zoomed in below 100, the most common values are 10, 25, 50, and 100. The majority of donations were below $50.
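Finding the most common amounts is a frequency count restricted to the zoomed-in range. Below is a minimal Python sketch with toy amounts, not the real data. The helper name and its $100 cutoff parameter are illustrative.

```python
from collections import Counter

def most_common_amounts(amounts, below=100, k=4):
    """Return the k most frequent donation amounts at or below a cutoff."""
    counts = Counter(a for a in amounts if a <= below)
    return [amount for amount, _ in counts.most_common(k)]

# Toy amounts, for illustration only
amounts = [25, 50, 10, 25, 100, 50, 25, 10, 35, 100, 250, 500, 50, 100, 10, 25]
print(most_common_amounts(amounts))
```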

Donations by Candidate

- Summary statistics

##      cand_nm count totalDonations meanDonation medianDonation
##  1:    Trump 12864     1997848.40    155.30538          80.00
##  2:  Clinton 19615     1917914.35     97.77794          25.00
##  3:     Cruz  7723      790043.98    102.29755          50.00
##  4:   Carson  4338      595010.17    137.16233          50.00
##  5:    Rubio  1114      512313.73    459.88665         100.00
##  6:  Sanders  7029      301434.87     42.88446          27.00
##  7:     Bush   285      279389.00    980.31228         265.00
##  8: Huckabee   222       86866.50    391.29054         100.00
##  9:     Paul   334       57723.71    172.82548          50.00
## 10:   Walker   100       55356.00    553.56000         250.00
## 11:   Kasich   193       54883.00    284.36788         100.00
## 12:  Fiorina   235       31294.00    133.16596          50.00
## 13:  Johnson    86       25696.85    298.80058         100.00
## 14: Santorum    23       11108.00    482.95652         100.00
## 15:    Stein    37        5577.00    150.72973          50.00
## 16: Christie    21        3645.00    173.57143         100.00
## 17: O'Malley     8        2890.64    361.33000          20.16
## 18:   Jindal     4        2759.00    689.75000         104.50
## 19:   Graham    15        2355.00    157.00000          25.00
## 20: McMullin    21        1865.00     88.80952          50.00
## 21:   Lessig     2        1000.00    500.00000         500.00
## 22:     Webb     3         650.00    216.66667         100.00
## 23:    Perry     2         600.00    300.00000         300.00
##      cand_nm count totalDonations meanDonation medianDonation
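The summary table above, with a count, total, mean, and median per candidate, is a group-by aggregation. Below is a minimal Python sketch of the same idea. It uses toy rows rather than the real data and a hypothetical helper, `summarize_by`. The original analysis used R, and its grouping code isn't shown.

```python
import statistics
from collections import defaultdict

def summarize_by(rows, key):
    """Group rows by `key`; return (group, count, total, mean, median) tuples by total, descending."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["contb_receipt_amt"])
    out = [
        (group, len(amts), sum(amts), statistics.fmean(amts), statistics.median(amts))
        for group, amts in groups.items()
    ]
    return sorted(out, key=lambda r: r[2], reverse=True)

# Toy rows, for illustration only
rows = [
    {"cand_nm": "Trump", "contb_receipt_amt": 80.0},
    {"cand_nm": "Clinton", "contb_receipt_amt": 25.0},
    {"cand_nm": "Trump", "contb_receipt_amt": 250.0},
    {"cand_nm": "Clinton", "contb_receipt_amt": 27.0},
    {"cand_nm": "Clinton", "contb_receipt_amt": 100.0},
]
for r in summarize_by(rows, "cand_nm"):
    print(r)
```

Passing `key="election_tp"` instead would give a breakdown by race, like the table further below.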

- Which Candidates raised the most money?

Not surprisingly, Trump and Clinton raised the most money, followed by Cruz, Carson, and Rubio.

- Which Candidates had the most donations?

Clinton had by far the highest number of distinct donations, followed by Trump, Cruz, Sanders, and Carson. Republican candidates appear to have much higher average donations, while the Democratic candidates received many more small donations.

- Which Candidates had the highest average donation?

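Among candidates with at least 100 donations, Bush and Walker had the highest average donations (about $980 and $554). Jindal’s average ($689.75) is higher than Walker’s but rests on only 4 donations, which is why sample size matters here. I would predict that the candidates who received the most donations had lower mean donations.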

- Which Candidates had the highest median donation?

Among candidates with at least 100 donations, Jeb Bush and Scott Walker, both Republicans, had the highest median donations ($265 and $250). The two major Democratic candidates, Clinton and Sanders, had among the lowest median donations ($25 and $27).

- What is the distribution of donations for each candidate?

How much money was raised by party?

Republican candidates outraised Democratic candidates by just over 2:1 (about $4.48M vs. $2.22M). No other party came close, though the Libertarian candidate, Johnson, gets an honorable mention for third.
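The party comparison can be checked directly against the candidate totals in the summary table above. This Python sketch copies those totals (USD, after removing negative donations) and groups them by the party each candidate ran in.

```python
# Candidate totals (USD) copied from the "Donations by Candidate" summary table,
# grouped by the party each candidate ran in
TOTALS_BY_PARTY = {
    "Republican": {"Trump": 1997848.40, "Cruz": 790043.98, "Carson": 595010.17,
                   "Rubio": 512313.73, "Bush": 279389.00, "Huckabee": 86866.50,
                   "Paul": 57723.71, "Walker": 55356.00, "Kasich": 54883.00,
                   "Fiorina": 31294.00, "Santorum": 11108.00, "Christie": 3645.00,
                   "Jindal": 2759.00, "Graham": 2355.00, "Perry": 600.00},
    "Democrat": {"Clinton": 1917914.35, "Sanders": 301434.87, "O'Malley": 2890.64,
                 "Lessig": 1000.00, "Webb": 650.00},
    "Libertarian": {"Johnson": 25696.85},
    "Green": {"Stein": 5577.00},
    "Independent": {"McMullin": 1865.00},
}

# Sum each party's candidates, rounding to cents to hide float noise
party_totals = {party: round(sum(c.values()), 2) for party, c in TOTALS_BY_PARTY.items()}
ranking = sorted(party_totals, key=party_totals.get, reverse=True)
ratio = party_totals["Republican"] / party_totals["Democrat"]
print(ranking)
print(party_totals)
print(f"Republican/Democrat ratio: {ratio:.3f}")
```

The party totals sum to $6,738,224.20, which matches the sum of the four rows in the by-race table below. The Republican/Democrat ratio comes out slightly above 2.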

How much money was raised by race (election type)?

##    election_tp count totalDonations meanDonation medianDonation
## 1:       P2016 35160     4247493.10     120.8047             40
## 2:       G2016 18906     2410760.46     127.5130             40
## 3:               206       78970.64     383.3526            240
## 4:       O2016     2        1000.00     500.0000            500

More money was raised in the primary: about $4.25M, roughly 1.76 times the $2.41M raised in the general. Another $78,971 came from 206 donations with no election type recorded.

How much money was raised by Candidate for each race?

##      cand_nm election_tp count totalDonations meanDonation medianDonation
##  1:  Clinton       G2016 12341     1022292.72     82.83711         25.000
##  2:     Cruz       P2016  7572      753105.98     99.45932         50.000
##  3:  Clinton       P2016  7274      895621.63    123.12643         25.000
##  4:  Sanders       P2016  7029      301434.87     42.88446         27.000
##  5:    Trump       P2016  6424      628089.48     97.77233         40.000
##  6:    Trump       G2016  6244     1292418.92    206.98573         87.140
##  7:   Carson       P2016  4320      587275.17    135.94333         50.000
##  8:    Rubio       P2016  1103      494813.73    448.60719        100.000
##  9:     Paul       P2016   314       49505.74    157.66159         50.000
## 10:     Bush       P2016   284      279369.00    983.69366        282.500
## 11:  Fiorina       P2016   235       31294.00    133.16596         50.000
## 12: Huckabee       P2016   221       85866.50    388.53620        100.000
## 13:    Trump               196       77340.00    394.59184        282.000
## 14:   Kasich       P2016   193       54883.00    284.36788        100.000
## 15:     Cruz       G2016   151       36938.00    244.62252         35.000
## 16:   Walker       P2016   100       55356.00    553.56000        250.000
## 17:  Johnson       G2016    84       22896.85    272.58155        100.000
## 18: Santorum       P2016    23       11108.00    482.95652        100.000
## 19: Christie       P2016    21        3645.00    173.57143        100.000
## 20:    Stein       G2016    20        1426.00     71.30000         27.000
## 21:     Paul       G2016    20        8217.97    410.89850        281.875
## 22:   Carson       G2016    18        7735.00    429.72222        100.000
## 23: McMullin       G2016    16        1315.00     82.18750         50.000
## 24:    Stein       P2016    15        3151.00    210.06667        100.000
## 25:   Graham       P2016    15        2355.00    157.00000         25.000
## 26:    Rubio       G2016    11       17500.00   1590.90909       2400.000
## 27: McMullin                 5         550.00    110.00000        100.000
## 28: O'Malley                 4          80.64     20.16000         20.160
## 29: O'Malley       P2016     4        2810.00    702.50000         50.000
## 30:   Jindal       P2016     4        2759.00    689.75000        104.500
## 31:     Webb       P2016     3         650.00    216.66667        100.000
## 32:    Stein       O2016     2        1000.00    500.00000        500.000
## 33:    Perry       P2016     2         600.00    300.00000        300.000
## 34:   Lessig       P2016     2        1000.00    500.00000        500.000
## 35:  Johnson       P2016     2        2800.00   1400.00000       1400.000
## 36: Huckabee                 1        1000.00   1000.00000       1000.000
## 37:     Bush       G2016     1          20.00     20.00000         20.000
##      cand_nm election_tp count totalDonations meanDonation medianDonation

With the data now grouped and separated out, let’s look at the most popular candidates in Alabama (Clinton, Trump, Sanders, Cruz, Rubio, and Carson) and break down the money each raised by race.

Observations:
1) Clinton raised the most money of any candidate in the primaries
2) Cruz raised the most money of any Republican candidate during the primary by a substantial margin, followed by Trump, Carson, and Rubio
3) Trump raised the most money in total, with the majority coming during the general
4) Trump outraised Hillary in the general

I next want to look at donations as time-series, by candidate and race.

Time Series Analysis of Donations

##      contb_receipt_dt count totalDonations meanDonation medianDonation
##   1:       2014-12-22     1           2000         2000           2000
##   2:       2014-12-29     1            128          128            128
##   3:       2014-12-30     2           1250          625            625
##   4:       2015-02-10     1             25           25             25
##   5:       2015-03-03     3            750          250            250
##  ---                                                                  
## 638:       2016-12-20     1            150          150            150
## 639:       2016-12-22     1            250          250            250
## 640:       2016-12-25     1            250          250            250
## 641:       2016-12-29     1             35           35             35
## 642:       2016-12-30     1            200          200            200

- How much money was raised per day?

Observations:
1) The day with the largest sum of donations was in January, before the primary season started in February
2) The day with the second-largest sum was February 29th, the day before “Super Tuesday” (March 1st), on which 12 primaries were held (https://www.washingtonpost.com/graphics/politics/2016-election/primaries/schedule/)
3) The next-largest days were in mid to late July, coinciding with the Republican and Democratic Conventions (https://en.wikipedia.org/wiki/2016_Democratic_National_Convention, https://en.wikipedia.org/wiki/2016_Republican_National_Convention)
4) After the conventions (the General Election race), daily donation sums picked up consistently until the end of the race in early November
5) Donations during the General Election race (after July 1st) seem to follow a cyclical pattern, which I hypothesize reflects weekly cycles

- How many donations were there per day?

Observations:
1) We see multiple trends here: counts rising and falling around Super Tuesday (March 1st), spikes in July (due to the party conventions), and steady increases between August and November
2) The days with the highest number of donations were July 11 and 12, each with over 1,000 donations, a week before the Republican Party Convention (July 18)
3) The number of donations per day rose steadily after July (during the General Election race), from August to November
4) The number of donations per day rose as primary season approached (the first contest, the Iowa caucuses, was held on February 1st, 2016)

- What was the mean and median donations per day?

- Mean

Although there were few donations (by count) early in the election cycle, their average amounts were extremely high compared to donations after November 2015. This suggests that a few donors with deep pockets gave early.

- Median

Similar to the plot above, days early in the campaign cycle had much higher median donations than days in the last year of the election cycle.

Time Series Donations by Candidate

##       contb_receipt_dt cand_nm count totalDonations meanDonation
##    1:       2014-12-22    Paul     1           2000         2000
##    2:       2014-12-29    Paul     1            128          128
##    3:       2014-12-30    Paul     2           1250          625
##    4:       2015-02-10  Graham     1             25           25
##    5:       2015-03-03  Carson     3            750          250
##   ---                                                           
## 2928:       2016-12-20   Trump     1            150          150
## 2929:       2016-12-22   Trump     1            250          250
## 2930:       2016-12-25   Trump     1            250          250
## 2931:       2016-12-29   Trump     1             35           35
## 2932:       2016-12-30   Trump     1            200          200
##       medianDonation
##    1:           2000
##    2:            128
##    3:            625
##    4:             25
##    5:            250
##   ---               
## 2928:            150
## 2929:            250
## 2930:            250
## 2931:             35
## 2932:            200

- How much total money was raised per day per candidate?

Daily donation sums for all candidates at once are hard to read. Let’s look at the major candidates instead, to get a better view.

- Distribution for candidates with 1000+ donations?

For interpretability, let’s restrict the plots to candidates with 1,000+ donations.
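Selecting the candidates with 1,000+ donations is a simple threshold on per-candidate counts. Here is a Python sketch using the donation counts from the summary table above. The helper name `major_candidates` is hypothetical.

```python
from collections import Counter

# Donations per candidate after removing negatives, from the summary table above
DONATION_COUNTS = Counter({
    "Trump": 12864, "Clinton": 19615, "Cruz": 7723, "Carson": 4338, "Rubio": 1114,
    "Sanders": 7029, "Bush": 285, "Huckabee": 222, "Paul": 334, "Walker": 100,
    "Kasich": 193, "Fiorina": 235, "Johnson": 86, "Santorum": 23, "Stein": 37,
    "Christie": 21, "O'Malley": 8, "Jindal": 4, "Graham": 15, "McMullin": 21,
    "Lessig": 2, "Webb": 3, "Perry": 2,
})

def major_candidates(counts, min_donations=1000):
    """Return, alphabetically, the candidates with at least min_donations donations."""
    return sorted(name for name, n in counts.items() if n >= min_donations)

major = major_candidates(DONATION_COUNTS)
print(major)
```

This yields the same six candidates discussed throughout: Carson, Clinton, Cruz, Rubio, Sanders, and Trump. The rows can then be filtered by checking `row["cand_nm"]` against a set built from this list.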

Filtering donations by candidate

Observations:
1) Carson had large donation totals in the beginning, and for a while was the leading fundraiser
2) Trump raised the majority of his donations after he won the nomination (after July 2016)
3) Clinton consistently outraised Sanders early on, and after winning the nomination had an upward trend in donations
4) Among Republican candidates other than Trump, Carson was the initial fundraising leader. When his numbers fell, Rubio’s rose, and when Rubio’s fell, Cruz’s rose. This most likely reflects donors reacting to performance in polls and primary races. Until he won the nomination, Trump consistently trailed the other candidates in fundraising

- Number of Donations per major candidate over time?

Observations:
1) Among Republican candidates in the primary, Carson had a consistently higher number of donations per day until Cruz overtook him
2) Daily donation counts for Trump and Clinton, the eventual primary winners, did not pick up until July 2016, when you see a large spike in Trump donations and a sharp positive trend for Clinton that grew stronger over time
3) Rubio consistently had the lowest number of donations, save for Trump early in the primary season. This is interesting because for a time Rubio was raising large amounts, ahead of other candidates (in the chart above), hinting that his average donation was the highest of the major candidates

- Median Donations over time per major candidate?

Observations:
1) Rubio, as hinted at above, had days with high median amounts
2) Clinton and Trump had early donations that were high in median value, while Carson and Sanders had the days with the lowest amounts
3) Clinton’s median donation decreased over time as she received more donations
4) Aside from some major spikes, Cruz’s median donations were mostly small amounts

- Donations over time for Democratic Candidates

I want to analyze fundraising over time for the Democratic candidates, Clinton and Sanders. I do not have any polling data, but knowing that Clinton won the Alabama primary (https://en.wikipedia.org/wiki/Alabama_Democratic_primary,_2016), it will be interesting to analyze the data with that in mind.

Clinton consistently outraised Sanders. You can see Sanders’s donations trend up to a peak just before March 1st, after which they started falling. I would hypothesize that this has to do with his performance on Super Tuesday (March 1st). Another interesting trend is the sharp increase in July, when the Democratic Convention occurred, followed by consistent increases as the General Election approached.

- How many donations did each Democratic candidate receive per day over time?

Looking at the donation counts, we see that Sanders was in fact competitive with Clinton, in contrast to the daily sums raised. For a period of time, Sanders even had a higher number of daily donations than Clinton did.

- Donations over time for Republican Candidates

Observations:
1) Carson and Cruz were the leading fundraisers early on
2) Rubio had a large spike in December 2015, which also seems to have marked the end of Carson’s fundraising lead, as he doesn’t appear to be leading at any later point
3) Trump was a late bloomer in terms of fundraising. There weren’t consistently large daily sums of donations until he secured the nomination (presumptive in May 2016, formal in July 2016)
4) Late in Trump’s campaign, donation sums decreased consistently

Here we can see a transition in which candidate received the most donations per day: initially Carson, then Cruz, then Trump after May 2016. A coarser time granularity could help here.

Donation Time series by week

Daily data is perhaps a bit too granular. Let’s group by week.
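Grouping by week means mapping each date to the start of its week and then aggregating. Below is a minimal Python sketch. Treating Monday as the start of the week is an assumption, since the convention behind the original R plots isn't stated. The dates and amounts are toy values.

```python
from collections import defaultdict
from datetime import date, timedelta

def week_start(d: date) -> date:
    """Return the Monday on or before d (assumed week start)."""
    return d - timedelta(days=d.weekday())

def weekly_totals(donations):
    """Aggregate (date, amount) pairs into {week_start: (count, total)}, in date order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, amount in donations:
        wk = week_start(d)
        sums[wk] += amount
        counts[wk] += 1
    return {wk: (counts[wk], sums[wk]) for wk in sorted(sums)}

# Toy donations around Super Tuesday, for illustration only
donations = [
    (date(2016, 2, 29), 100.0),  # Monday before Super Tuesday
    (date(2016, 3, 1), 50.0),    # Super Tuesday
    (date(2016, 3, 6), 25.0),    # Sunday of the same week
    (date(2016, 3, 7), 40.0),    # following Monday
]
for wk, (n, total) in weekly_totals(donations).items():
    print(wk, n, total)
```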

- Sum of Weekly Donations

Right away, we can see how a coarser time granularity makes trends easier to discern. We see an overall positive trend in donation sums from July 2015 to November 2016. We can also see the two distinct giving periods: the primary season from July 2015 to March 1st, 2016, then the General Election season from July 2016 to November 2016. Weeks in the general season had higher amounts of giving on average than weeks in the primary season.

- Number of Weekly Donations

Compared with the weekly sums in the previous plot, the weekly counts show an even clearer positive trend, from the beginning of the election cycle in March 2015 to November 2016. In other words, the number of donations generally increased as the election neared. We also see some large spikes: the week of “Super Tuesday” (March 1st), the Republican and Democratic Conventions (in consecutive weeks in July), and the week before the election (November 1st, 2016).

- Sum of Weekly Donations by Candidate

This plot is a bit of a mess, but we can see how Trump’s and Clinton’s fundraising efforts progressed during the General Election season (July 2016 on), as well as Rubio’s big week in December 2015. Let’s use facet_wrap() to make things cleaner.

Now this is a lot cleaner. Observations:
1) Carson started off strong compared to the other Republican candidates, but soon faltered and gave way to Cruz
2) Aside from a few big weeks, Rubio did not consistently raise large amounts
3) Sanders did not raise nearly as much money as Clinton in any period
4) The majority of Trump’s donations occurred after he had won the nomination

- Count of Donations by major candidate per week

Observations:
1) Sanders was, however, competitive with Clinton on number of donations, and you can see his campaign gaining steam throughout the primaries until it fell off after the week of March 1st (Super Tuesday, with 12 primaries)
2) Cruz consistently had more donations per week than any other Republican candidate in the primaries
3) The Republican Convention was a boon for Trump’s campaign: that week had the highest weekly donation count of any candidate

- Sum of Donations by Political Party per week

Republicans had consistently higher weekly donation totals than Democrats did.

Final Plots

1. Total Donation Amounts of Major Candidates by Election Type

This is a key plot because of the insights it reveals:
1) During the primary season, the Republican candidates (Trump, Cruz, Carson, Rubio) raised comparable amounts. Trump, the eventual nominee, trailed Cruz by a decent margin and barely edged out Carson for the second-most money raised.
2) Trump went on to win the state in the General Election, which aligns with his raising substantially more than Clinton during the general.
3) It also answers a simpler question, namely which candidates raised the most money. Predictably, Trump and Clinton raised the most.

2. Average Donation by Candidate

Why is this plot interesting?
1) It shows which candidates had higher average donations. This hints at donor demographics, if you assume that people with higher incomes give more than people with lower incomes.
2) The two major Democratic candidates, Sanders and Clinton, had the lowest average donation amounts.
3) Jeb Bush, who didn’t make it far in the primary season, had by far the highest average donation among candidates with 100+ donations.

3. Weekly Donation Sums

This plot shows:
1) The highest weekly amounts raised in the cycle occurred during the party conventions in July.
2) Fundraising in the general (after July 2016) was consistently higher on a weekly basis than in the primaries.
3) Aside from a spike in December 2015 (due to a large Rubio value), the highest amounts raised in the primary season came in the weeks around Super Tuesday, March 1st. With 12 primaries on that day, it was an important date for all campaigns, and this shows up here as higher donations.

Project Take-aways and Reflection

Where did I run into difficulties?

When analyzing a publicly available dataset like this, there is a very high chance it will require significant preprocessing and feature creation. This was very true here. A lack of domain knowledge makes this even more time-consuming. Coming in, I didn’t know much about campaign donations, so while cleaning the data I had to make sure I knew what I was dealing with and understood what each column meant.

The demographic information was often missing, incomplete, or inconsistent. For example, the occupation and employer fields made analysis of donor demographics impractical here. As a result, many features that could have provided great insight were thrown away.

Where did I find success?

I was able to answer my main, overarching questions, find some interesting trends, and create some very informative plots. This dataset was also ripe for time-series analysis, and I was able to get a new look at political campaigning, albeit in only one of the 50 states.

How could the analysis be enriched in future work (e.g. additional data and analyses)?

I would like to take some of the spatial data here, like cities and ZIP codes, join it with outside population data, and analyze which counties, ZIP codes, and cities raised the most money. Another possible project would be to look at the time-series data in conjunction with polling data, and see whether future donations could be modeled from polling data and the primary schedule.