2016 Florida Presidential Campaign Contribution

— Krishna Thapa

Abstract

Election season is here! As usual, this episode of American politics is going to be fun to watch. Can we predict who is going to win? If this were a horse race, would you bet on a particular horse by looking at “horse-related data”? Let’s find out!

As of July 20, 2016, FEC records show that 827.6 million US dollars have been collected for the election effort. Florida is one of the battleground states, and it would be interesting to know how much each candidate or party has received there so far. In this study, we’re going to explore campaign contributions for the 2016 presidential race in the state of Florida. We’re going to ask questions like: which party received more contributions, and which regions support Democrats vs. Republicans?


Dataset

I downloaded the dataset from the FEC website: http://fec.gov/disclosurep/pnational.do

The dataset for Florida has more than 200k observations so far. We take a random sample of just 10k for this analysis.
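As a rough sketch of the sampling step, the snippet below draws rows without replacement from a data frame. The data frame `full`, the seed, and the tiny sizes are stand-ins invented for illustration; the actual study draws 10,000 rows from the 200k+ row Florida file.

```r
# Hypothetical stand-in for the full Florida file (made-up amounts).
set.seed(2016)  # arbitrary seed, used only to make the draw reproducible
full <- data.frame(
  cand_nm = sample(c("Bush, Jeb", "Carson, Benjamin S."), 50, replace = TRUE),
  contb_receipt_amt = round(runif(50, min = 5, max = 500), 2),
  stringsAsFactors = FALSE
)

# Pick row indices without replacement, then keep only those rows.
n_sample <- 10                      # the study uses 10000 here
idx <- sample(nrow(full), n_sample)
contrib <- full[idx, ]

nrow(contrib)
```

Fixing the seed is a design choice: without it, every run of the analysis would work on a different 10k sample and the numbers quoted in the text would drift.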

## 'data.frame':    10000 obs. of  18 variables:
##  $ cmte_id          : Factor w/ 21 levels "C00458844","C00500587",..: 4 7 13 6 1 6 3 6 7 7 ...
##  $ cand_id          : Factor w/ 21 levels "P00003392","P20002671",..: 10 12 16 1 11 1 8 1 12 12 ...
##  $ cand_nm          : Factor w/ 21 levels "Bush, Jeb","Carson, Benjamin S.",..: 5 16 1 4 15 4 2 4 16 16 ...
##  $ contbr_nm        : Factor w/ 7450 levels "ABBOTT, GREG",..: 4843 5838 1195 7419 4716 2251 3420 3001 6594 6098 ...
##  $ contbr_city      : Factor w/ 506 levels "ALACHUA","ALFORD",..: 270 103 82 220 5 297 266 402 331 188 ...
##  $ contbr_st        : Factor w/ 1 level "FL": 1 1 1 1 1 1 1 1 1 1 ...
##  $ contbr_zip       : int  331662741 327387837 323271510 321597714 327029051 341088232 329047496 337162928 328065071 330352224 ...
##  $ contbr_employer  : Factor w/ 2654 levels "",".","(GOV&PRES RONALD REAGAN-SENIOR GOVERNM",..: 1226 1701 884 828 1975 1975 1975 1859 1689 1393 ...
##  $ contbr_occupation: Factor w/ 1740 levels "","3D ARTIST",..: 752 1036 1505 1133 1349 1579 1349 1579 1429 1056 ...
##  $ contb_receipt_amt: num  216 10 50 100 200 50 250 10 15 10 ...
##  $ contb_receipt_dt : Factor w/ 432 levels "01-APR-15","01-APR-16",..: 412 358 326 9 335 43 257 358 51 193 ...
##  $ receipt_desc     : Factor w/ 18 levels "","* EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_cd          : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_text        : Factor w/ 35 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 2 1 1 1 1 1 1 2 2 ...
##  $ form_tp          : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ file_num         : int  1077664 1077648 1047239 1066337 1047126 1073673 1073637 1058609 1077404 1079424 ...
##  $ tran_id          : Factor w/ 9995 levels "A01D606C58A5144BEB81",..: 6031 7172 3606 1041 5314 1502 3037 691 7788 9652 ...
##  $ election_tp      : Factor w/ 3 levels "","G2016","P2016": 3 3 3 3 3 3 3 3 3 3 ...

We see that there are 21 candidates who received campaign contributions (cand_nm has 21 levels). Strangely, the zip code associated with each contributor is a huge number; the values look like nine-digit ZIP+4 codes stored as integers. The column election_tp has codes to indicate which election the contribution was made for (general, primary, runoff, etc.).

Before making plots, let’s find the number of negative contributions (it should be 0):

## [1] 192

Since a non-zero number of contributions have negative values, we convert the column to its absolute value and make a histogram of the contribution amount.
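The check and the conversion described here can be sketched in a few lines of base R. The `amounts` vector is made up for illustration and stands in for the contb_receipt_amt column.

```r
# Made-up contribution amounts; negative entries mimic the 192 found in the sample.
amounts <- c(216, 10, -50, 100, -200, 50)

# Count the negative entries (the text expects 0 but finds some).
n_negative <- sum(amounts < 0)

# Replace every amount by its absolute value, as done in this study.
amounts_abs <- abs(amounts)

n_negative
amounts_abs
```

Taking the absolute value treats a negative entry as if it were a fresh contribution of the same size. If the negative rows turn out to be refunds, dropping them or netting them against the original contribution would be a reasonable alternative.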

## 
##       G2016 P2016 
##     1   141  9858

We also notice that 141 observations in the sample are allocated to the general election, 9,858 to the primary election, and one has no election type recorded.

The memo for each contribution has some interesting information, which is not investigated further in this study. Long names containing the strings “;” or “/” were dropped for the above plot.

Data Cleaning

The presidential campaign contribution data comes with 18 columns, some of which are of no interest for this analysis. Take, for instance, file_num, which is the file number associated with a particular contribution.

We also have the contbr_zip column, which is supposed to hold the zip code associated with the contributor. Unfortunately, most of the codes are more than 5 digits long. Keeping only the first five digits seems to produce sensible results.

When we look at campaign contributions, we might want to know which party a particular contribution went to. The downloaded data records which candidate each contribution was for, and this can be translated to a party. So we add a new “party” column to the data frame and assign each row a label based on the candidate’s party affiliation.

We might also be interested in knowing when each contribution came in. How did contributions vary over time for a given candidate? To that end, we create four new columns (month, day, year, and mth_yr) from the contb_receipt_dt column. The month column is also converted from a character to a numerical value. We then drop unwanted columns from our data frame.
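A minimal sketch of the truncation, assuming the codes are stored as integers as in the str() output. The first two values are taken from that output; the third is a code that is already five digits long.

```r
# Two nine-digit codes from the sample plus one plain five-digit code.
contbr_zip <- c(331662741L, 327387837L, 32615L)

# Work on the text form so that "first five digits" is a simple substring.
zip_chr <- as.character(contbr_zip)
zip <- as.numeric(substr(zip_chr, 1, 5))

zip
```

Going through character strings works here because Florida zip codes do not start with 0. For states whose codes have a leading zero, integer storage would already have dropped it and the substring would pick the wrong digits.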
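One way to add such a column is a named lookup vector, sketched below. The vector `party_of` is hypothetical and lists only four candidates; the spellings for Clinton and Sanders are assumed, and the real table would need one entry per cand_nm level.

```r
# Hypothetical candidate-to-party lookup (only a few candidates shown).
party_of <- c(
  "Bush, Jeb"               = "R",
  "Carson, Benjamin S."     = "R",
  "Clinton, Hillary Rodham" = "D",  # assumed spelling
  "Sanders, Bernard"        = "D"   # assumed spelling
)

# Example candidate names; the last one is deliberately absent from the lookup.
cand_nm <- c("Sanders, Bernard", "Bush, Jeb",
             "Clinton, Hillary Rodham", "Unknown, Person")

# Index the lookup by name; names missing from the table come back as NA.
party <- unname(party_of[cand_nm])

party
```

Leaving unmatched candidates as NA rather than guessing makes it easy to spot anyone the lookup forgot, for example with `table(is.na(party))`.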
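The date split could look something like the following. It relies on the DD-MON-YY layout visible in the str() output ("01-APR-15"); the third example date is made up, and splitting on "-" is only one of several ways to do this.

```r
# Dates in the DD-MON-YY layout used by contb_receipt_dt.
contb_receipt_dt <- c("01-APR-15", "01-APR-16", "27-JUL-15")

# Split each string on "-" and stack the pieces into a 3-column matrix.
parts <- do.call(rbind, strsplit(contb_receipt_dt, "-", fixed = TRUE))

dates <- data.frame(
  day    = parts[, 1],
  # Convert the month abbreviation to a number via the built-in month.abb.
  month  = match(parts[, 2], toupper(month.abb)),
  year   = parts[, 3],
  mth_yr = paste(parts[, 2], parts[, 3], sep = "-"),
  stringsAsFactors = FALSE
)

dates
```

Matching against `toupper(month.abb)` avoids depending on the machine's locale, which date parsing with a "%b" format would.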

Above plot, with aes(fill=c(year, party)):

Now, I would also like to see the campaign contributions county-wise. To that end, I take the city name of each observation and use ggmap/maptools to extract the latitude/longitude for that city. Then, thanks to an answer to this stackoverflow question, I map each (lat, long) value to a county. There were a couple of cities for which the lookup could not find an appropriate county (got NAs). These special cases were handled manually (via Google search). I did not consider peculiar cases like Longboat Key and Boca Grande, cities which fall under two different counties. Out of 10000 samples, 177 were dropped.

Hint: geocode performs poorly for coastal/beach cities.

## 'data.frame':    9263 obs. of  23 variables:
##  $ search_city      : Factor w/ 506 levels "ALACHUA florida",..: 1 1 1 1 1 1 1 2 2 3 ...
##  $ cand_id          : Factor w/ 21 levels "P00003392","P20002671",..: 12 1 1 10 12 12 12 12 12 12 ...
##  $ cand_nm          : Factor w/ 21 levels "Bush, Jeb","Carson, Benjamin S.",..: 16 4 4 5 16 16 16 16 16 16 ...
##  $ contbr_nm        : Factor w/ 7450 levels "ABBOTT, GREG",..: 1721 7387 7387 3942 5692 731 6042 3680 3680 7430 ...
##  $ contbr_city      : chr  "ALACHUA" "ALACHUA" "ALACHUA" "ALACHUA" ...
##  $ contbr_employer  : Factor w/ 2654 levels "",".","(GOV&PRES RONALD REAGAN-SENIOR GOVERNM",..: 1042 1095 1095 1975 2108 1689 1701 1701 1689 1920 ...
##  $ contbr_occupation: Factor w/ 1740 levels "","3D ARTIST",..: 1484 1038 1038 1349 1429 1036 1036 1036 1036 408 ...
##  $ contb_receipt_amt: num  300 19 5 50 15 50 250 30 25 10 ...
##  $ contb_receipt_dt : Factor w/ 432 levels "01-APR-15","01-APR-16",..: 426 386 407 67 372 91 103 376 248 363 ...
##  $ receipt_desc     : Factor w/ 18 levels "","* EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_cd          : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
##  $ memo_text        : Factor w/ 35 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 2 1 1 1 2 1 2 2 2 2 ...
##  $ form_tp          : Factor w/ 3 levels "SA17A","SA18",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ election_tp      : Factor w/ 3 levels "","G2016","P2016": 3 3 3 3 3 3 3 3 3 3 ...
##  $ zip              : num  32615 32615 32615 32615 32616 ...
##  $ party            : Factor w/ 4 levels "D","G","L","R": 1 1 1 4 1 1 1 1 1 1 ...
##  $ day              : Factor w/ 31 levels "01","02","03",..: 31 28 29 5 27 7 8 27 18 26 ...
##  $ month            : Factor w/ 12 levels "APR","AUG","DEC",..: 5 4 9 10 4 8 6 8 9 9 ...
##  $ year             : Factor w/ 4 levels "13","14","15",..: 4 4 4 3 4 4 3 4 4 4 ...
##  $ mth_yr           : Factor w/ 18 levels "APR-15","APR-16",..: 7 6 13 15 6 11 8 11 13 13 ...
##  $ lon              : num  -82.5 -82.5 -82.5 -82.5 -82.5 ...
##  $ lat              : num  29.8 29.8 29.8 29.8 29.8 ...
##  $ county           : Factor w/ 60 levels "alachua","baker",..: 1 1 1 1 1 1 1 27 27 50 ...

Analysis

Now, we’re ready to make some plots.

How does contribution vary with date?

I would like to know if people are more charitable to political causes in some seasons or months of the year. One can expect contributions to rise as election day approaches. For simplicity, let’s just exclude the Libertarian and Green parties.

Except for the month of May, Republican candidates received more contributions. Perhaps the number of Republican candidates (n = 14) has something to do with it?

Total contributions by year

We see that contributions spiked in July of 2015. The data also suggests that contributions started coming in from March of 2015, perhaps because some candidates announced their candidacy around then. The median contribution from early contributors was generally high in early 2015.

Day of month

Day of week

People seem to be contributing more on Monday and Tuesday, and less so on Saturday and Sunday.
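A day-of-week tally like this one could be produced as sketched below. The three dates are examples, and the fixed English labels are a choice made here so that the result does not depend on the machine's locale.

```r
# Example receipt dates, already converted to ISO format.
d <- as.Date(c("2015-04-01", "2016-04-01", "2016-04-03"))

# "%u" gives the ISO weekday number: 1 = Monday, ..., 7 = Sunday.
dow_num <- as.integer(format(d, "%u"))

# Map the numbers to fixed labels and keep them in calendar order.
dow_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
dow <- factor(dow_labels[dow_num], levels = dow_labels)

# Number of contributions per weekday (zero counts are kept).
counts <- table(dow)
counts
```

Storing the weekday as an ordered set of factor levels keeps a bar chart in Monday-to-Sunday order instead of alphabetical order.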

Where were the highest contributors employed?

Retirees contributed over $400k. Surprisingly, the unemployed also contributed handsomely, enough to land above the 95th percentile. It would be interesting to see if the same categories hold true for other states.

Let’s look at contribution to party candidates separately

Republicans

Democrats

Ratio of contributions (D/R)

We see that contributions to Republican candidates started getting larger than those to Democrats from July 2015, and this continues until May of 2016.
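The D/R ratio per month can be computed roughly as follows. The small `contrib` data frame is invented for illustration; the column names follow the ones used in this study.

```r
# Made-up contributions for two months and two parties.
contrib <- data.frame(
  mth_yr = c("JUN-15", "JUN-15", "JUL-15", "JUL-15", "JUL-15"),
  party  = c("D", "R", "D", "R", "R"),
  contb_receipt_amt = c(100, 50, 40, 60, 100),
  stringsAsFactors = FALSE
)

# Total amount per (month, party) cell: rows are months, columns are parties.
totals <- tapply(contrib$contb_receipt_amt,
                 list(contrib$mth_yr, contrib$party), sum)

# Ratio above 1 means Democrats collected more that month; below 1, Republicans.
ratio <- totals[, "D"] / totals[, "R"]
ratio
```

A month in which one party received nothing shows up as NA in `totals`, so the ratio for that month is NA as well rather than silently becoming 0 or infinite.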

Party wise contributions

There are 2 contributions for the Green party and one for the Libertarian party. Excluding those, let’s see what the box plot looks like for Democrats and Republicans.

## 
##    D    G    L    R 
## 5584    2    1 3675

Interestingly, Democrats received more contributions than Republicans.
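Restricting the data to the two major parties and comparing them might look like the sketch below. The data frame is called `final` as in the summary output later in this report, but its contents here are made up.

```r
# Made-up rows covering all four party labels.
final <- data.frame(
  party = factor(c("D", "D", "D", "R", "R", "G", "L")),
  contb_receipt_amt = c(10, 27, 50, 25, 100, 250, 1000)
)

# Keep Democrats and Republicans only, and drop the now-empty factor levels
# so they do not appear as empty boxes in a box plot.
major <- droplevels(subset(final, party %in% c("D", "R")))

# Number of contributions and median amount per party.
n_by_party <- table(major$party)
median_by_party <- tapply(major$contb_receipt_amt, major$party, median)

n_by_party
median_by_party
```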

Which region sent contributions the most?

Let’s see if geographic location can reveal anything about campaign contribution.

Which county contributed the most?

Palm Beach County sent the largest amount of contributions, followed by Brevard County. Most of the counties with the biggest contributions give mainly to Republicans; Democrats seem to receive a comparatively higher share from counties that send comparatively less.

Seven counties (Calhoun, Franklin, Glades, Holmes, Lafayette, Liberty, and Monroe) are missing (or not available in the sample I picked). The southern counties sent a larger number of contributions. The median contribution is 27 for Democrats and 50 for Republicans.

Who contributes more?

Counties in southern Florida seem to be contributing more, both in number and in amount. I wonder how this correlates with median income. Anyway, moving along…

Which party gets more love from where?

Just like earlier, let’s make a ratio plot.

The median contribution to Republicans is higher in almost all counties.

Final Plots and Summary

Distribution of campaign contribution

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.45    19.00    35.00   211.80   100.00 10800.00
## final$party: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.45   13.00   27.00  108.50   50.00 5000.00 
## -------------------------------------------------------- 
## final$party: G
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   25.00   81.25  137.50  137.50  193.80  250.00 
## -------------------------------------------------------- 
## final$party: L
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    1000    1000    1000    1000    1000 
## -------------------------------------------------------- 
## final$party: R
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    25.0    50.0   368.4   200.0 10800.0

Description

The plot above shows a right-skewed distribution of contributions. About 60% of the samples were for Democratic candidates. However, Republican candidates received about twice as much in total contributions. This is because the typical Republican contribution was larger: the median was 27 for Democrats and 50 for Republicans, and the mean was 108.50 versus 368.40.

Interestingly, contributions made in 2015 were higher than those made in 2016. The median contribution for Democrats in ’15 was 50 and that for Republicans was 100. This makes sense because the early backers of candidates in both parties came from the right tail of the distribution, meaning they tended to be wealthier donors than the grass-roots supporters who contributed smaller amounts on average.
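The figures in this description can be cross-checked from the counts and means printed earlier. The snippet below is a back-of-the-envelope calculation; since the printed means are rounded, the totals are approximate.

```r
# Number of contributions per party, from the party table.
n_all <- c(D = 5584, G = 2, L = 1, R = 3675)

# Mean contribution per party, from the summary output (rounded values).
mean_amt <- c(D = 108.50, R = 368.40)

# Share of all contributions that went to Democrats.
share_D <- n_all[["D"]] / sum(n_all)

# Approximate total amount per party = count * mean.
total <- n_all[c("D", "R")] * mean_amt

# How many times more money Republicans received than Democrats.
ratio_R_to_D <- total[["R"]] / total[["D"]]

round(share_D, 3)       # about 0.603
round(total)            # D about 605864, R about 1353870
round(ratio_R_to_D, 2)  # about 2.23
```

So Democrats account for roughly 60% of the contributions by count, while Republicans collected a bit more than twice as much money, driven by the larger mean contribution.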

Contribution to republican candidates

Description

There were 15 candidates on the Republican side who received some campaign contribution from the state of Florida. The mean contribution received by most candidates is generally less than $100, but there were a few exceptions. Chris Christie and Rick Santorum each received a median contribution of $2600. Only four candidates (Jeb Bush, Ted Cruz, Marco Rubio, and Benjamin Carson) received more than $100,000 in contributions. Among all these candidates, Ted Cruz received the largest number of contributions (median of 50).

We now know that Donald Trump won the Florida primary with ~45% of the votes, followed by Marco Rubio at ~27%, Ted Cruz at ~17%, and John Kasich at ~7%. Campaign contributions from Florida residents seem to be a not-so-good indicator of the final winner.

County wide contribution ratio

Description

In the above plot, D/R is the ratio of the amount received by Democrats to that received by Republicans. The white counties are either sending very little or have no observations in the sample. Looking at the rest of the counties, we see that contributions are mostly going to Republicans. Democrats seem to be receiving comparatively less than Republicans from the southern counties. Floridians on the west coast seem to be sending roughly equal amounts to Democrats and Republicans.


Reflection

Looking back, it was probably naive to think that one could predict the “winning horse” from this limited set of data. Donald Trump won the primary election in Florida despite having fewer contributions (in both number and amount). Jeb Bush, despite being a former governor of the state and one of the few candidates to have received more than $100k in contributions from Floridians, left the race before the Florida primary. Ted Cruz came in third despite having a larger number of contributors.

Among Democrats, Bernie Sanders received a larger number of contributions but a smaller amount (the median Bernie supporter contributed $27, to Hillary’s $38). Hillary Clinton eventually won the Florida primary.

Party-wise, I was surprised to see that there were many more contributions to Democrats than to Republicans despite the latter having many candidates. It was also surprising to see that many counties sent much more to Republicans than to Democrats, especially since Florida is a battleground state. The higher median contribution to Republicans explains most of this peculiar pattern. The distribution of contributions, with a long right tail for each party and each year, was itself very interesting.

To be able to predict a winner in these races, one should of course take into account what is happening nationally. Trump, for instance, was winning primaries in other states, which carried momentum over to Florida (and the same goes for Hillary). Some sort of national sentiment poll in these states prior to the election could have better predictive power than the campaign contribution data. That being said, campaign contribution data does add value to our understanding of the race. Candidates with fewer contributors and less money to spend are more likely to lose.

The dataset itself is rich; its main limitation is that campaign contributors are a self-selected group. Religious voters could contribute larger amounts to a particular candidate, for instance, but the overall percentage of voters for whom religion is the primary ballot concern could be far smaller. Other factors like unemployment rate, race, gender, level of education, and candidates’ overall popularity/likability among the general population could give us much more robust tools for modeling the data. Furthermore, similar data (along with results) from earlier elections could potentially be used to train a model which can then be used to predict a winner in future elections.