R Bridge Final Project: Campaign Donations by Committees

The Federal Election Commission (FEC) provides data on committees, candidates and campaign finances for the current election cycle and for election cycles through 1980 in regards to all registered congressional, senate and presidential candidates.

During this campaign year (2015-2016), committee-based contributions are frequently front-page news stories. Committees are defined by the FEC as federal political action committees and party committees, campaign committees for presidential, house and senate candidates, as well as groups or organizations who are spending money for or against candidates for federal office. This project will take a look committee based contributions during this election cycle.

I. Getting the Data

The datasets are split into two parts, i.e., a candidate master file and a committee transaction file. The committee file (committ) contains all the transactions with a foreign key reference to the candidate, which will require the candidate master file (cand) to reverse the alias. Header files were also available provided by the FEC, which were used to make assembling the data easier. All the data acquired from the FEC site was imported into a GitHub repo and getURL, from the RCurl package, was used to get the text.

# Candidate Data HTML
cand_f <- ("https://raw.githubusercontent.com/Liam-O/R_BridgeFinal_csv/master/cn.txt")
#Committee Data HTML
committ_f <- "https://raw.githubusercontent.com/Liam-O/R_BridgeFinal_csv/master/itpas2.txt"


# Data Headers
cand_head <- "https://raw.githubusercontent.com/Liam-O/R_BridgeFinal_csv/master/cn_header_file.csv"
committ_head <- "https://raw.githubusercontent.com/Liam-O/R_BridgeFinal_csv/master/pas2_header_file.csv"

# Read Data
cand <- read.table(text = getURL(cand_f),
                   header = FALSE,
                   sep = "|",
                   quote = "",
                   col.names = read.csv(text = getURL(cand_head), sep = ",", header = FALSE,
                                        stringsAsFactors = FALSE ),
                   comment.char = "",
                   fill = TRUE,
                   strip.white = TRUE,
                   stringsAsFactors = FALSE)
committ <- read.table(text = getURL(committ_f),
                     header = FALSE,
                     sep = "|",
                     quote = "",
                     col.names = read.csv(text = getURL(committ_head), sep = ",", header = FALSE,
                                          stringsAsFactors = FALSE ),
                     comment.char = "",
                     fill = TRUE,
                     strip.white = TRUE,
                     stringsAsFactors = FALSE)

II. Joining and Subsetting the Data

The committee and candidate datasets contain much more data than is required by this routine analysis, so they both will be subset then joined using the respective candidate key, i.e. CAND_ID:

# Candidate subset for the 2016 Presidential Election
cand_sub <- subset(cand, CAND_OFFICE == "P" & CAND_ELECTION_YR == 2016, select = c(1,2,4,6))
# Committee subset
committ_sub <- subset(committ, select = c(1, 4, 7, 8, 14, 15, 17))
# The merged data frame for election donations for the 2016 presidential candidates
elec_don <- merge(x = committ_sub, y = cand_sub, by = "CAND_ID")

Taking a look at at the campaign totals for each candidate, we see how much money each candidate was given by a committee as of June 30th, 2016 (the oldest transaction date in the dataset):

elec_don_sum <- aggregate(TRANSACTION_AMT ~ CAND_NAME, elec_don, FUN = sum)
elec_don_sum <- elec_don_sum[order(-elec_don_sum$TRANSACTION_AMT),]
row.names(elec_don_sum) <- c(1:nrow(elec_don_sum))
elec_don_sum

##                                            CAND_NAME TRANSACTION_AMT
## 1                                          BUSH, JEB        87261552
## 2                TRUMP, DONALD J. / MICHAEL R. PENCE        63783172
## 3                                       RUBIO, MARCO        48665104
## 4  / KAINE, TIMOTHY MICHAEL, CLINTON, HILLARY RODHAM        29891265
## 5                          CRUZ, RAFAEL EDWARD "TED"        29025599
## 6                            CHRISTIE, CHRISTOPHER J        23416057
## 7                                     KASICH, JOHN R        19900219
## 8                                         PAUL, RAND         6865191
## 9                                   SANDERS, BERNARD         6317961
## 10                          CARSON, BENJAMIN S SR MD         5324216
## 11                                    FIORINA, CARLY         3875776
## 12                                 GRAHAM, LINDSEY O         3557034
## 13                                     JINDAL, BOBBY         3054106
## 14                                    HUCKABEE, MIKE         2955746
## 15                                     WALKER, SCOTT         2415197
## 16                             PERRY, JAMES R (RICK)         2389621
## 17                           O'MALLEY, MARTIN JOSEPH          454929
## 18                                      BIDEN, JOE R          306446
## 19                              SANTORUM, RICHARD J.          179130
## 20                                  PATAKI, GEORGE E          124583
## 21                 WILLIAM BILL WELD, GARY JOHNSON /           12747
## 22                                     CLINTON, BILL            1874
## 23                             PETERSEN, AUSTIN WADE            1000
## 24                                       SMITH, RYAN            1000
## 25                        TRUMP, THE MUSLIM DICTATOR             445
## 26                                   WALKER, DORIS V             245
## 27                               WITTER, JUSTIN PAUL              76

From above, the number one recipient of committee funds, Jeb Bush, got pushed out of the race fairly early. The number two and four candidates, Trump and Clinton respectively, won their party’s nomination. Sander’s, who was in the primary almost to the end, ranks 9th in terms of committee funding. This summary of the dataset shows that the candidate that raises the most money is not necessarily the one who does the best at the ballot box. I am sure the respective committees are upset about their nearly $90 million investment in Bush who flamed out relatively early in the race.

Since there were a lot of candidates in this years election, we will look at the big two from each big party, i.e. Trump-Cruz for the Republicans and Clinton-Sanders for the Democrats. While we narrow down the dataset a bit further, the transaction date (TRANSACTION_DT) formatting is a non-zero padded numeric. In order to properly do analysis with this field, we will reformat it now in a way that R will understand:

# Renaming big 4 candidates:
# TRUMP
elec_don$CAND_NAME[elec_don$CAND_ID == "P80001571"] <- "TRUMP"
# CRUZ
elec_don$CAND_NAME[elec_don$CAND_ID == "P60006111"] <- "CRUZ"
# CLINTON
elec_don$CAND_NAME[elec_don$CAND_ID == "P00003392"] <- "CLINTON"
# SANDERS
elec_don$CAND_NAME[elec_don$CAND_ID == "P60007168"] <- "SANDERS"

# Subset of big 4 candidates:
elec_don <- subset(elec_don, CAND_NAME == c("TRUMP", "CRUZ", "CLINTON", "SANDERS"))

# Reformat Date:
elec_don$TRANSACTION_DT <- formatC(elec_don$TRANSACTION_DT, width = 8, format = "d", flag = "0")
elec_don$TRANSACTION_DT <- as.Date(as.character(elec_don$TRANSACTION_DT), format = "%m%d%Y")

III. Viewing Data

With our dataset structured the way we want it, ggplot will be used to view the data. As a preliminary analysis, the distribution of the funding will be views using density and box plots. Since the transactions, TRANSACTION_AMT, cover a large range, log scales will be used where appropriate.

ggplot(data = elec_don) + geom_density(aes(x = TRANSACTION_AMT), fill = "grey50") + facet_wrap(~CAND_NAME) +
    scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))

ggplot(data = elec_don, aes(x = CAND_NAME, y = TRANSACTION_AMT)) + geom_boxplot() +
    scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))

# Mean of the contributions to candidates:
aggregate(TRANSACTION_AMT ~ CAND_NAME, elec_don, FUN = mean)

##   CAND_NAME TRANSACTION_AMT
## 1   CLINTON        4395.990
## 2      CRUZ       26102.402
## 3   SANDERS        2835.232
## 4     TRUMP       33609.618

# Median of the contributions to candidates:
aggregate(TRANSACTION_AMT ~ CAND_NAME, elec_don, FUN = median)

##   CAND_NAME TRANSACTION_AMT
## 1   CLINTON           468.0
## 2      CRUZ          2064.5
## 3   SANDERS            50.0
## 4     TRUMP           203.5

From the distribution plots above, it is clear that Clinton and Cruz received the largest frequency of large checks. From the summary statistics, Cruz had the largest distribution of large checks, with a median of around $2,000, and Sanders had the smallest with was around $50. Trump received the largest average donation of around $34,000, but his median donation is only around $200. From the density plot, Trump received a large frequency of donations around $100, with few very small donations. The outliers from the very large donations pushed the median value to the right.

Looking at the types of committee (ENTITY_TYP) contributions and the dates the money was given (TRANSACTION_DT) can give not only give the heartbeat of the respective campaigns during the election cycle, but the individuals/entities driving the election. The following plot, using the facet grid utility of the ggplot2 package condenses all this information into one plot.

ggplot(na.exclude(elec_don), aes(x = TRANSACTION_DT, y = TRANSACTION_AMT)) +
    geom_point(aes(color = ENTITY_TP), size = 2, alpha = .5) +
    facet_grid(~CAND_NAME) +
    scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) + 
    scale_x_date(labels = date_format("%b %y")) + theme(axis.text.x = element_text(angle = 45))

From the plot above, Clinton began fundraising early on in the game with a lot of organizational entities donating well in excess of $100, while Sanders started off with a varied amount of entities donating less than $100. In the early part of the election, Cruz was getting isolated, but substantial, donations from PACs and Trump was still in the “self-funding” stage.

Towards the end of the primary, Clinton was getting a large frequency of contributions of varied amounts from organizations and PACs while Sanders was still getting a moderate amount of contributions from varied entities with an increasing growth in the size of the donations from organizations and even a couple of PACs. Contributions to Cruz shot up from mostly organizational funding, probably as a result of his rival’s success in the polls. The surge in funding for Cruz appeared to be a bit too late as Trump surged ahead in the delegate count. Once Cruz dropped out, Trump’s “self-funding” tactic put him at a large disadvantage relative to his rival who has been getting frequent, large contributions from organizations and PACs throughout the election cycle. Only then do we see contributions from PACs, individuals and large contributions from organizations.