NC Election Contribution Analysis

In this exploratory data analysis, we will look at all the presidential campaign contributions from the 2016 presidential election that originated in the state of North Carolina. The data comes from the Federal Election Commission and can be found here.

First, let's import the data and take a look at what we are working with in the dataset.

## [1] 158456     18
## Classes 'tbl_df', 'tbl' and 'data.frame':    158456 obs. of  18 variables:
##  $ cmte_id          : chr  "C00580100" "C00577130" "C00577130" "C00575795" ...
##  $ cand_id          : chr  "P80001571" "P60007168" "P60007168" "P00003392" ...
##  $ cand_nm          : chr  "Trump, Donald J." "Sanders, Bernard" "Sanders, Bernard" "Clinton, Hillary Rodham" ...
##  $ contbr_nm        : chr  "SELLATI, KASEY" "LYANSKY, YAN" "HAYWARD, KELLY" "DREHMEL, CLAIRE" ...
##  $ contbr_city      : chr  "CARY" "MC LEANSVILLE" "PINEVILLE" "APEX" ...
##  $ contbr_st        : chr  "NC" "NC" "NC" "NC" ...
##  $ contbr_zip       : int  27519 273019765 281347558 275398332 282114253 28037 27524 286552648 281177030 28054 ...
##  $ contbr_employer  : chr  "SAVER MAGAZINE" "BENNETT COLLEGE" "AMERICAN AIRLINES" "NORTH CAROLINA STATE UNIVERSITY" ...
##  $ contbr_occupation: chr  "SELF EMPLOYED" "PROFESSOR" "CUSTOMER SERVICE AGENT" "PROFESSOR" ...
##  $ contb_receipt_amt: num  52 27 33 60 116 ...
##  $ contb_receipt_dt : chr  "28-NOV-16" "05-MAR-16" "06-MAR-16" "26-APR-16" ...
##  $ receipt_desc     : chr  NA NA NA NA ...
##  $ memo_cd          : chr  "X" NA NA "X" ...
##  $ memo_text        : chr  NA "* EARMARKED CONTRIBUTION: SEE BELOW" "* EARMARKED CONTRIBUTION: SEE BELOW" "* HILLARY VICTORY FUND" ...
##  $ form_tp          : chr  "SA18" "SA17A" "SA17A" "SA18" ...
##  $ file_num         : int  1146165 1077404 1077404 1091718 1091718 1146165 1146165 1091718 1091718 1146165 ...
##  $ tran_id          : chr  "SA18.168022" "VPF7BKXT595" "VPF7BKZ8J22" "C4768028" ...
##  $ election_tp      : chr  "G2016" "P2016" "P2016" "P2016" ...
##  - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 158456 obs. of  5 variables:
##  - attr(*, "spec")=List of 2
##   ..- attr(*, "class")= chr "col_spec"

This data contains 158,456 contributions to various presidential campaigns, described by 18 variables such as the contribution amount, the name of the contributor, and the contribution date. However, a quick look shows that the date (contb_receipt_dt) is stored as a chr vector, which needs to be corrected since date will be a key variable. The most direct way to do that is to convert it with the as.Date() function. I’m also going to clean up the zip codes (contbr_zip) by removing the extra digits and keeping only the five-digit root.
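As a rough illustration of that cleanup, here is a minimal sketch in base R. It rebuilds a three-row toy data frame from values shown in the str() output above instead of loading the real file, and the name contributions is only a placeholder for whatever the imported data frame is called. It also assumes an English locale, since %b matches month abbreviations in the current locale.

```r
# Toy rows copied from the str() output above; `contributions` is a placeholder name.
contributions <- data.frame(
  contb_receipt_dt = c("28-NOV-16", "05-MAR-16", "06-MAR-16"),
  contbr_zip       = c(27519L, 273019765L, 281347558L),
  stringsAsFactors = FALSE
)

# Parse day-month-year strings such as "28-NOV-16" into Date objects.
# %b is locale dependent, so this assumes English month abbreviations.
contributions$contb_receipt_dt <- as.Date(contributions$contb_receipt_dt,
                                          format = "%d-%b-%y")

# Keep only the first five digits of each zip code (ZIP+4 values have nine).
# The result is a character vector, which suits a label better than a number.
contributions$contbr_zip <- substr(as.character(contributions$contbr_zip), 1, 5)

str(contributions)
```

Taking the first five characters is safe here because North Carolina zip codes begin with 27 or 28, so no leading zeros are lost when the column is read as an integer.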

Contribution Distributions

There’s not a whole lot of numeric data in this table apart from one of the main data points, the actual contribution amounts, so let’s take a look at the summary of that variable.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -5400.00    15.00    27.00    98.84    80.00 10800.00

Let’s take a look at a histogram of the distribution of campaign contribution amounts:

There are some negative contribution amounts here, which indicate refunds of contributions for various reasons. I don’t think they’ll be useful to include because I want to look at the population of what was actually given, so I’ll filter those out and see how the distribution of the contributions looks now.
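The filtering step might look something like the sketch below. The amounts are toy values (a few taken from the output above, the zero added for illustration), and contributions and positive are placeholder names. Dropping zero amounts along with the negative ones is an assumption on my part; it is convenient because a log scale cannot show zero.

```r
# Toy amounts; `contributions` and `positive` are placeholder names.
contributions <- data.frame(
  contb_receipt_amt = c(52, 27, -5400, 33, 0, 10800)
)

# Keep only rows with a positive amount. which() also drops any NA amounts,
# which a bare logical index would turn into all-NA rows.
keep     <- which(contributions$contb_receipt_amt > 0)
positive <- contributions[keep, , drop = FALSE]

summary(positive$contb_receipt_amt)
```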

That’s still a pretty heavily skewed data set, so let’s take a look at the distribution with a log10 scale on the contribution amounts. I’ll also break down how many contributions there were per political party in this next graph. To do this, I added a column that records the party of each contribution based on the candidate: Clinton, Webb, O’Malley, and Sanders were Democrats; McMullin, Johnson, and Stein were grouped as Independent; and the rest were grouped as Republicans. I also turned this column into a factor, along with the candidate name column, so I can perform other operations on it later.
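One way to build the party column is sketched below in base R. It assumes the candidate names follow the "Surname, First" pattern seen in the str() output and matches on the surname only. The "Stein, Jill" spelling, the 100 dollar amount, and the names contributions, party, and log10_amt are illustrative assumptions. The log10 column is only a stand-in for a log-scaled axis in the plot.

```r
# Toy rows; names follow the "Surname, First" pattern from the str() output.
contributions <- data.frame(
  cand_nm = c("Trump, Donald J.", "Sanders, Bernard",
              "Clinton, Hillary Rodham", "Stein, Jill"),
  contb_receipt_amt = c(52, 27, 60, 100),
  stringsAsFactors = FALSE
)

# Everything before the first comma is treated as the surname.
surname <- sub(",.*$", "", contributions$cand_nm)

democrats    <- c("Clinton", "Webb", "O'Malley", "Sanders")
independents <- c("McMullin", "Johnson", "Stein")

# Any candidate not listed above falls through to "Republican".
party <- ifelse(surname %in% democrats, "Democrat",
         ifelse(surname %in% independents, "Independent", "Republican"))

# Store party and candidate name as factors for later grouping.
contributions$party   <- factor(party,
                                levels = c("Democrat", "Republican", "Independent"))
contributions$cand_nm <- factor(contributions$cand_nm)

# log10 of the amounts (all positive here), for a log-scaled histogram.
contributions$log10_amt <- log10(contributions$contb_receipt_amt)

table(contributions$party)
```

Matching on surname keeps the lookup lists short, but it would misclassify two candidates who share a surname, so it is worth checking the unique candidate names first.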