1 Overview

This report examines the 2016 Presidential Campaign Finance contributor report for the state of Pennsylvania, as provided by the Federal Election Commission (FEC). Its main objective is to describe the flow of financial contributions to US presidential candidates in terms of their size, date and geography within the state of Pennsylvania.

The report documents how, in the course of the 2016 presidential campaign, Hillary Clinton received nearly $13m in contributions from individuals in the state of Pennsylvania, far more than the next closest candidate, Donald Trump, at approximately $4m. In fact, Clinton’s total was more than that of all 23 other candidates combined.

Nevertheless, when grouping contributions by unique individuals, it becomes evident that Clinton actually gained her $9m advantage from fewer distinct individuals than Trump. The stark difference in percentage of a candidate’s contributions coming from unique individuals (as opposed to repeat donations from the same individual) may reveal something important about the candidates’ actual base of support.

It is also interesting to compare the geographic distribution of political contributions and presidential votes. The Democrats’ fundraising advantage was geographically concentrated in urban centers. Democrats outraised Republicans in only 26 of 67 counties despite raising an additional $6.4m.

We would expect to find a relationship between the party raising the most money in a county and the party receiving the most votes. In 50 counties, the party raising the most money did see its candidate receive the most votes. However, in 16 counties, Trump received more votes than Clinton despite Democrats outraising Republicans in the same county. In only 1 county did Clinton outperform Trump at the ballot box despite Republicans outraising Democrats. In fact, in nearly all counties, Clinton’s share of the vote was less than the share of political contributions that went to Democratic candidates.

I have hidden the code in the final document for easier viewing, but please do see the original .Rmd or .md for the code producing this report in GitHub.

2 Importing the Data

The original data, as provided by the Federal Election Commission (FEC), was found here, but it appears to be no longer available while the FEC updates its website. Please find the csv file in the GitHub repository until a new link can be given. Please note that the data does not include any super-PAC money, which by its nature is not officially donated to the candidates themselves.

Here is a brief look at our initial raw dataset.

cmte_id cand_id cand_nm contbr_nm contbr_city contbr_st contbr_zip contbr_employer contbr_occupation contb_receipt_amt contb_receipt_dt receipt_desc memo_cd memo_text form_tp file_num tran_id election_tp
C00580100 P80001571 Trump, Donald J. ROONEY, JEAN ALLENTOWN PA 18103 INFORMATION REQUESTED INFORMATION REQUESTED 75.30 07-OCT-16 NA X NA SA18 1146165 SA18.134904 G2016
C00577130 P60007168 Sanders, Bernard LEONTOVICH, M DOWNINGTOWN PA 193352266 NOT EMPLOYED NOT EMPLOYED 15.00 04-MAR-16 NA NA * EARMARKED CONTRIBUTION: SEE BELOW SA17A 1077404 VPF7BKWAXT2 P2016
C00577130 P60007168 Sanders, Bernard LEONTOVICH, M DOWNINGTOWN PA 193352266 NOT EMPLOYED NOT EMPLOYED 10.00 05-MAR-16 NA NA * EARMARKED CONTRIBUTION: SEE BELOW SA17A 1077404 VPF7BKXAZE6 P2016
C00577130 P60007168 Sanders, Bernard LEONTOVICH, M DOWNINGTOWN PA 193352266 NOT EMPLOYED NOT EMPLOYED 10.00 06-MAR-16 NA NA * EARMARKED CONTRIBUTION: SEE BELOW SA17A 1077404 VPF7BM0HEC7 P2016
C00575795 P00003392 Clinton, Hillary Rodham KRAMER, VICKI PHILADELPHIA PA 191064153 N/A RETIRED 21.64 06-APR-16 NA X * HILLARY VICTORY FUND SA18 1091718 C4675238 P2016
C00577130 P60007168 Sanders, Bernard KERNS, MICHAEL PHILADELPHIA PA 191252423 GERMANTOWN FRIENDS SCHOOL TECHNICAL DIRECTOR 15.00 05-MAR-16 NA NA * EARMARKED CONTRIBUTION: SEE BELOW SA17A 1077404 VPF7BKWKFX7 P2016

After taking a first look at the data, here are a few initial observations:

Missing Data
Variable Number Missing
cmte_id 0
cand_id 0
cand_nm 0
contbr_nm 0
contbr_city 9
contbr_st 0
contbr_zip 23
contbr_employer 1,834
contbr_occupation 1,643
contb_receipt_amt 0
contb_receipt_dt 0
receipt_desc 241,328
memo_cd 193,406
memo_text 160,839
form_tp 0
file_num 0
tran_id 0
election_tp 452

Unique Contributors Total Observations Percent Unique
56,881 243,796 0.233
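
The report's own code is written in R and is hidden in this rendering, so what follows is only an independent Python sketch of how the two summaries above could be computed. It assumes each record is a dict keyed by the column names shown earlier, that a missing value appears as None, an empty string or the literal "NA", and that a contributor is identified by contbr_nm alone. The function names and the sample rows are invented for illustration.

```python
from collections import Counter


def count_missing(rows, columns):
    """Count, per column, the rows whose value is None, '' or 'NA'."""
    missing = Counter({col: 0 for col in columns})
    for row in rows:
        for col in columns:
            if row.get(col) in (None, "", "NA"):
                missing[col] += 1
    return dict(missing)


def unique_share(rows, key="contbr_nm"):
    """Return (unique contributors, total observations, ratio to 3 decimals)."""
    names = {row[key] for row in rows}
    return len(names), len(rows), round(len(names) / len(rows), 3)


# Made-up rows, not taken from the FEC file.
sample = [
    {"contbr_nm": "DOE, JANE", "contbr_city": "ERIE", "election_tp": "P2016"},
    {"contbr_nm": "DOE, JANE", "contbr_city": "", "election_tp": "G2016"},
    {"contbr_nm": "ROE, RICHARD", "contbr_city": "YORK", "election_tp": "NA"},
]
print(count_missing(sample, ["contbr_nm", "contbr_city", "election_tp"]))
print(unique_share(sample))
```

Note that "Percent Unique" in the table is reported as a proportion: 56,881 / 243,796 is roughly 0.233.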

3 Data Wrangling

3.1 Investigating election_tp

First, we can solve a few small problems in election_tp.

election_tp n
G2016 92,431
G2106 1
O2016 43
P2016 150,868
P2020 1
NA 452
  • G2106 must be a simple error. We can recode that.

  • O2016 is ‘Other 2016’. All of these donations seem to be for Jill Stein’s recount effort after the November General election. We should exclude this data because the focus of the report is to investigate how fundraising (prior to the election) impacted the outcome of the election.

  • P2020 seems like a celebratory donation after the election so we would want to remove it anyway. This will happen when we clean up contb_receipt_dt.

  • It is not clear why 452 are blank at the moment, and so they should remain in the dataset.
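
The first two decisions in the list above can be sketched in a few lines. This is a Python illustration only (the report itself was written in R), and clean_election_tp is a hypothetical helper name; P2020 and blank values are deliberately left alone here because the text handles them in later steps.

```python
def clean_election_tp(rows):
    """Recode the G2106 typo to G2016 and drop O2016 (recount) rows."""
    cleaned = []
    for row in rows:
        code = row["election_tp"]
        if code == "O2016":
            continue  # post-election recount donations are out of scope
        if code == "G2106":
            row = {**row, "election_tp": "G2016"}  # copy rather than mutate
        cleaned.append(row)
    return cleaned


# One made-up row per code that appears in the table above.
sample = [{"election_tp": c} for c in ["G2016", "G2106", "O2016", "P2016", "P2020", "NA"]]
codes = [row["election_tp"] for row in clean_election_tp(sample)]
print(codes)
```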

3.2 Investigating contb_receipt_dt

contb_receipt_dt needs to be converted from an integer to a Date format. Moreover, 2,475 contributions after the date of the election will be excluded.
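
As a rough Python sketch of this step (not the report's R code), the raw values look like 07-OCT-16 in the sample rows, so a small parser plus a cutoff at election day, 8 November 2016, is enough. The two-digit year is assumed to mean 20xx, and parse_receipt_date and drop_post_election are hypothetical names.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

ELECTION_DAY = date(2016, 11, 8)


def parse_receipt_date(text):
    """Parse a DD-MON-YY string such as '07-OCT-16' (year assumed 20YY)."""
    day, month, year = text.split("-")
    return date(2000 + int(year), MONTHS[month.upper()], int(day))


def drop_post_election(rows):
    """Keep only contributions received on or before election day."""
    return [r for r in rows
            if parse_receipt_date(r["contb_receipt_dt"]) <= ELECTION_DAY]


# Made-up rows around the cutoff.
sample = [{"contb_receipt_dt": d} for d in ["07-OCT-16", "08-NOV-16", "09-NOV-16"]]
kept = [r["contb_receipt_dt"] for r in drop_post_election(sample)]
print(kept)
```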

3.3 Investigating contb_receipt_amt

Starting with a histogram or density plot, we can see a few very negative values in the dataset.

We can take a look at the most negative contributions and note that many have a receipt_desc marked as “Refund”. Some are reattributed to other individuals; others have no explanation. Altogether there are 2,486 non-positive contributions.

contbr_nm contb_receipt_amt receipt_desc
CARANGI, JOE -93308 Refund
FOLINO, J.A. -8100 Refund
BALL, GEORGE -7700 Refund
SCHRECK, JOHN -6175 Refund
PHILLIPS, MARY -5400 Refund
MYERS, SETH C. MR. -5400 REATTRIBUTION TO SPOUSE

On the other hand, we have some very large contributions: 150 contributions exceed the maximum legal limit of $2,700 (see note 1).

Ideally, it would be possible to match up positive and negative contributions from each contributor, but there seems to be no reliable way to do that from the data at hand. contbr_nm is not a primary key (people have the same names and small errors are easy to find).

The simplifying solution taken was to remove all contributions below zero and above $2,700. This is a fairly crude measure: an individual can make multiple smaller contributions that together exceed the maximum limit, and such totals would ideally be capped at the limit rather than excluded. At the same time, considering the number of observations in the dataset, these cases represent a small fraction. Moreover, even if legitimate, these large contributions are outliers by any measure and would mostly serve to conceal patterns in the vast majority of the data.
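
To make the trade-off concrete, here is a small Python sketch (the report's actual code is in R and is not shown) of both the crude filter that was used and the capping alternative described above. Keeping amounts of exactly zero, and grouping by contributor name, candidate and election type for the cap, are my assumptions; the function names and sample rows are invented.

```python
from collections import defaultdict

LIMIT = 2700.0  # per-election individual limit cited in the report


def keep_plausible_amounts(rows, limit=LIMIT):
    """Crude filter: drop amounts below zero or above the limit."""
    return [r for r in rows if 0 <= r["contb_receipt_amt"] <= limit]


def capped_totals(rows, limit=LIMIT):
    """Alternative: sum per (contributor, candidate, election), then cap."""
    totals = defaultdict(float)
    for r in rows:
        key = (r["contbr_nm"], r["cand_nm"], r["election_tp"])
        totals[key] += r["contb_receipt_amt"]
    return {key: min(total, limit) for key, total in totals.items()}


# Made-up rows: A stays under the limit per gift but not in total.
sample = [
    {"contbr_nm": "A", "cand_nm": "X", "election_tp": "P2016", "contb_receipt_amt": 2000.0},
    {"contbr_nm": "A", "cand_nm": "X", "election_tp": "P2016", "contb_receipt_amt": 1500.0},
    {"contbr_nm": "B", "cand_nm": "X", "election_tp": "P2016", "contb_receipt_amt": 5400.0},
    {"contbr_nm": "B", "cand_nm": "X", "election_tp": "P2016", "contb_receipt_amt": -5400.0},
]
print(len(keep_plausible_amounts(sample)))
print(capped_totals(sample))
```

In this toy example the filter keeps both of A's gifts (3,500 in total, above the limit) and drops both of B's rows, whereas capping holds A at 2,700 and nets B's refund to zero.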

3.4 Investigating contbr_zip

contbr_zip data includes some five digit and some nine digit zip codes. I broke these into separate columns to enable analysis by zip code.

Sorting this list and viewing the head or tail reveals obvious errors. Quite a few are given as “99999”; some are a single digit; and others belong to another state. However, since all states are listed as “PA”, I chose not to exclude them. These are probably simple errors or perhaps people who live in two states.
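
The split into five-digit and four-digit parts might look like the following Python sketch (an illustration, not the R code behind the report). split_zip and looks_valid are hypothetical helpers, and the validity check only flags the two kinds of malformed values mentioned above; it does not test whether a zip code actually lies in Pennsylvania.

```python
def split_zip(raw):
    """Split a raw zip value into (zip5, plus4); plus4 is '' unless 9 digits."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    zip5 = digits[:5]
    plus4 = digits[5:] if len(digits) == 9 else ""
    return zip5, plus4


def looks_valid(zip5):
    """Flag the obvious errors: wrong length or the 99999 placeholder."""
    return len(zip5) == 5 and zip5 != "99999"


# Two values from the sample rows above, plus the error types described.
for raw in ["193352266", "18103", "99999", "7"]:
    zip5, plus4 = split_zip(raw)
    print(raw, zip5, plus4, looks_valid(zip5))
```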

3.5 Investigating receipt description

receipt_desc, memo_cd, and memo_text are variables that are not particularly well documented. Many, but not all of them are contributions flagged for refund. Others are redesignations or reattributions. Given that there are so few of them compared to the rest of the dataset, I will exclude any observations with special notes to ensure cleanliness of the data, even if a very small number of legitimate contributions are excluded.

receipt_desc n
NA 238,054
REDESIGNATION FROM PRIMARY 177
REATTRIBUTION FROM SPOUSE 129
SEE REATTRIBUTION 93
REATTRIBUTION / REDESIGNATION REQUESTED 57
* EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING 47

3.6 Investigating memo text

memo_cd marks whether a memo is attached to the contribution (though not always). These memos, found in memo_text, note earmarked contributions or specific funds such as “Hillary Victory Fund”. This seems to pose no harm to the dataset.

memo_text n
NA 156,345
* EARMARKED CONTRIBUTION: SEE BELOW 57,895
* HILLARY VICTORY FUND 22,990
*BEST EFFORTS UPDATE 335
EARMARKED FROM MAKE DC LISTEN 281
18

3.7 Adding party affiliation

The dataset did not have a variable for party affiliation so we can add it and represent it as a factor.
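
One simple way to add such a variable is a lookup keyed on cand_nm. The Python sketch below is illustrative only (the report used R, where the result would be a factor). The lookup covers just the candidates whose names appear in this report's tables and sample rows, so the remaining candidates would have to be added by hand, and the "Unknown" fallback is my own placeholder.

```python
# Partial lookup: only candidate names that appear verbatim in this report.
PARTY_BY_CANDIDATE = {
    "Clinton, Hillary Rodham": "Democrat",
    "Sanders, Bernard": "Democrat",
    "Trump, Donald J.": "Republican",
    "McMullin, Evan": "Independent",
    "Stein, Jill": "Green",
}


def add_party(rows, lookup=PARTY_BY_CANDIDATE):
    """Return copies of the rows with a 'party' field added."""
    return [{**r, "party": lookup.get(r["cand_nm"], "Unknown")} for r in rows]


sample = [{"cand_nm": "Stein, Jill"}, {"cand_nm": "Trump, Donald J."},
          {"cand_nm": "Someone, Else"}]
parties = [r["party"] for r in add_party(sample)]
print(parties)
```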

3.8 Assign blank election_tp

My last step in data wrangling will be to assign a value to observations with an empty election_tp. Nearly all of the 250 observations still carrying a blank election_tp at this stage were for Donald Trump.

Candidate No. election_tp missing
McMullin, Evan 19
Stein, Jill 4
Trump, Donald J. 227

Observations for candidates other than Trump could only be assigned to “P2016”. Trump officially secured his party’s nomination on 20 July 2016. Hence, any donation before that date will be marked for the primary and thereafter for the general.
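
The assignment rule just described can be written as a single function. This is a Python sketch under stated assumptions rather than the report's R code: receipt_date is a hypothetical field holding an already parsed date, blanks are assumed to appear as None, an empty string or "NA", and a donation made on 20 July itself is treated as a general-election donation.

```python
from datetime import date

NOMINATION_DATE = date(2016, 7, 20)  # cutoff date as stated in the report


def fill_election_tp(row):
    """Return the existing election_tp, or assign one using the rule above."""
    if row["election_tp"] not in (None, "", "NA"):
        return row["election_tp"]
    if row["cand_nm"] != "Trump, Donald J.":
        return "P2016"
    return "P2016" if row["receipt_date"] < NOMINATION_DATE else "G2016"


# Made-up rows covering each branch.
sample = [
    {"cand_nm": "Stein, Jill", "election_tp": "NA", "receipt_date": date(2016, 9, 1)},
    {"cand_nm": "Trump, Donald J.", "election_tp": "NA", "receipt_date": date(2016, 7, 19)},
    {"cand_nm": "Trump, Donald J.", "election_tp": "NA", "receipt_date": date(2016, 7, 20)},
    {"cand_nm": "Trump, Donald J.", "election_tp": "G2016", "receipt_date": date(2016, 5, 1)},
]
assigned = [fill_election_tp(r) for r in sample]
print(assigned)
```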

4 Exploratory Data Analysis

Before really diving into the analysis, a few univariate plots help sense-check the data and ensure that data wrangling was sufficient.

4.1 The Most Common Contribution Amounts

A histogram of contb_receipt_amt shows a highly right-skewed plot, with the vast majority of contributions less than $100 and a few contributions much higher, including at the maximum limit. We can also notice the lack of smoothness in the distribution because people tend to contribute in round numbers when they donate.

4.2 The Pace of Donations over the Campaign

We can plot the number of contributions on each day to determine the pace at which contributions were received. The plot below shows just how long the campaign was, with the first contributions coming in as early as 2014. The frequency, of course, really picks up in 2016. The plot also suggests there were considerable day-to-day differences in the number of contributions. Some days were clearly more popular for donating than others, even in the heat of the campaign.

I have also colored the plot by election_tp to show contributions designated for the primary or general elections. We notice the shift shortly after June 2016, when most contributions are marked for the general. Nevertheless, we see small numbers of contributions marked for the primary even after the primaries had finished. We can also observe a greater number of contributions for the primary (when there were more candidates) than for the general.

4.3 Repetition in Contributors

Do most contributors usually just give once? Or do they donate repeatedly? The output below does not tell us anything about the amounts donated. Instead, it shows that while some people donate repeatedly, it is most common just to donate once. There is an inverse relationship between the number of contributions and the number of people making that number of contributions.

The plot makes this inverse relationship visible: as the number of donations from a single contributor increases, the number of people making that number of donations decreases. (Although no points are visible at the far right of the x-axis, there must be at least one observation that far out, since the default scale extends to include it.)

We can see this disparity in a table as well. One can observe a huge drop between the number of people donating once versus those donating twice. Moreover, this trend continues as the number of donations from a single person increases.

donation_count n
1 27,868
2 7,884
3 4,134
4 2,795
5 2,020
6 1,500
7 1,253
8 1,038
9 846
10 725
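
A table like this is a count of counts: first the number of contributions per contributor, then the number of contributors sharing each of those counts. A minimal Python sketch (not the report's R code), again assuming that contbr_nm identifies a person and using made-up names:

```python
from collections import Counter


def donation_count_distribution(names):
    """Map donation_count -> number of contributors with that many donations."""
    per_person = Counter(names)                   # name -> donations made
    distribution = Counter(per_person.values())   # donations made -> people
    return dict(sorted(distribution.items()))


# Made-up contributor names, one entry per contribution.
names = ["A", "A", "A", "B", "C", "C", "D"]
print(donation_count_distribution(names))
```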

At the same time, there are indeed a handful of individuals who donated more than 100 times. The extent to which this pattern varies by candidate will be explored later in the report.

contbr_nm n
COMELLA, JOHN 186
BETHEA, DAMON 179
SHOVLIN, MARIE 150
ROSOFF, ANDREW 142
SHORT, CHRIS 139
LIBERTIN, MARY 135
WEITKAMP, RICKY 131
EDWARDS, JOHN P 130
LONCAR, BRANDON 130
HANN, STEVE E 125

4.4 Aggregate Donations by Party and Candidate

With some sense checks done, we can try to answer some basic questions. Which party (and which candidate) raised the largest amount of money? First, it is clear that Democrats outraised Republicans by a huge margin.

Party Total Funds Raised ($)
Democrat 15,547,628.93
Republican 9,172,421.80
Independent 93,322.61
Green 25,729.73

Of course in comparison to the major parties, Independent and Green parties are almost non-existent.

The party variable is an aggregation of individual candidates, so what we really want to examine is the candidates themselves. Clinton dominated the field. She raised over $12.9m, while the rest of the field combined raised only $11.9m. We can first see the raw disparity in a table.

We can also see Clinton’s advantage, both in the amount of money raised and in the relative paucity of Democratic challengers compared to the much wider Republican field.

We might also like to visualize the aggregate funds by candidate in a waffle plot, where each square represents $250,000 (and each column equals $1m). Note that the necessary rounding to make each square conceals actual values, but it is easier to discern that Clinton raised about $13m, Trump about $4m, and Sanders about $2.5m.

4.5 The Distribution of Contributions across Candidates

Clinton has a clear aggregate advantage in fundraising, but we can also examine the distribution in order to describe a typical contribution. We should be able to answer, for instance, if a candidate mostly receives many small donations or fewer large ones.

Given that only a few candidates heavily dominate their parties, I will focus on just the top candidates. Good options to visualize these distributions include boxplots, density plots and violin plots. Because our data is so heavily skewed, “zooming” in on contributions below $200 is the best way to see the differences in the range where the bulk of the data lies.

From the boxplots below, it appears that Clinton has the lowest median contribution (nearly the same as Sanders), considerably lower than Trump’s. However, this is misleading, as will be made clear below.

An unweighted violin plot is another method to highlight the shape of a distribution. We can see bumps at typical donation amounts: $25, $50 and $100. More than any other candidate, Sanders received support in small amounts, as reflected in his first quartile (Q1) in the boxplot above.

One problem with the above box and violin plots, though, is that we have treated each contribution individually, which hides the fact that individuals are able to donate more than once. Perhaps more interesting than a typical Trump or Clinton contribution is a typical Trump or Clinton contributor. If a contributor donates $5 one hundred times, for most questions it would be preferable to represent that contributor as $500, which can be achieved by grouping. This grouping is not perfect because there are some small errors in the name list, where the contbr_nm of a single person is spelled slightly differently, but it nevertheless achieves its purpose.

Having done this grouping, the table below shows an interesting result. Despite Clinton having far more instances of contributions in the dataset, Trump actually had more unique contributors. The percentage of instances of contributions coming from a unique contributor differs widely between Trump and the other leading candidates, including Clinton.

Candidate No. Unique Contributors No. Contributions Percent of Contributions from Unique Contributors
Trump 22,102 28,673 0.771
Clinton 18,586 117,417 0.158
Sanders 7,274 59,201 0.123
Cruz 3,025 14,973 0.202

After grouping by individual contributors (instead of contributions), we get a very different picture. Trump had by far the smallest median total per distinct contributor of any candidate. The grassroots Sanders was second. Clinton’s was the highest.

One last way to see this discrepancy would be to hold each unique contributor to only one donation. We can filter the data to retain only the single largest donation from each unique contributor. Clinton still leads in aggregate funds received, but the amount of funds removed from repeat donors is considerably more than other candidates, especially Trump.

While interesting, the plot above ignores a lot of perfectly valid contributions. To really show the discrepancy between grouping by contributions and grouping by contributors, I will focus only on the two general election candidates, Clinton and Trump. In the plot below, it is easier to see how the choice between counting individual contributions and counting individual contributors makes all the difference in portraying the distribution of a candidate’s fundraising profile.

4.6 Daily Candidate Totals

In addition to knowing the aggregate totals for each candidate, and understanding the increments in which it was received by examining the respective distributions, we can also examine the strength of a campaign at a particular date in time. For many questions, more important than who gave the contribution is just how much the candidate received, and on what date. We can create such a dataset with the information we have. This will let us track a candidate’s progress in fundraising over time.
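
Such a dataset amounts to a running total per candidate, ordered by date. The Python sketch below shows one way to build it (an illustration, not the R code used for the report). receipt_date is a hypothetical field holding a parsed date, and the rows are made up.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def cumulative_by_day(rows):
    """Return candidate -> list of (date, cumulative amount), in date order."""
    daily = defaultdict(lambda: defaultdict(float))
    for r in rows:
        daily[r["cand_nm"]][r["receipt_date"]] += r["contb_receipt_amt"]
    result = {}
    for cand, by_day in daily.items():
        days = sorted(by_day)
        running = accumulate(by_day[d] for d in days)
        result[cand] = list(zip(days, running))
    return result


# Made-up rows, deliberately out of date order.
sample = [
    {"cand_nm": "A", "receipt_date": date(2016, 3, 2), "contb_receipt_amt": 25.0},
    {"cand_nm": "A", "receipt_date": date(2016, 3, 1), "contb_receipt_amt": 10.0},
    {"cand_nm": "A", "receipt_date": date(2016, 3, 1), "contb_receipt_amt": 15.0},
    {"cand_nm": "B", "receipt_date": date(2016, 3, 2), "contb_receipt_amt": 50.0},
]
totals = cumulative_by_day(sample)
print(totals)
```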

From the plot below, we can see how early Clinton accumulated her large aggregate advantage over the rest of the field. The growth in Trump’s support, on the other hand, breaks very late, mostly after Clinton has already secured her nomination and not long before his own nomination was sealed. The lines of most other Republican candidates stop before Trump’s surge even begins. This could suggest preference for other Republican candidates in Pennsylvania, or just a reflection of Trump’s “self-funding” campaign style.

4.7 Contributions by Zip Code

The last few steps of our analysis have a geographic focus. It is well known that political support varies widely by geography, commonly along urban and rural divides. Our dataset has the zip code of each contribution, so that will be our starting point.

First, we have to remove a few errors in our data, namely zip codes that are clearly not from PA. One way to get this data would be from the zipcode package. However, we also need to be able to link zip codes with counties. I found a record of this data here at unitedstateszipcodes.org. After downloading the free version, I joined latitude, longitude and county names to a data frame of contributions summed per zip code. Considering that a zip code represents such a small area, it can be difficult to visualize. However, we can get a sense of the geographic disparity if we redundantly map contribution totals to point size, color, and alpha shading. Enable the zoom function below, particularly in the Philadelphia region, to see the concentration of financing.

The map highlights the concentration of contributions around Philadelphia and Pittsburgh. A simple table though perhaps does a better job of communicating the magnitude of the disparity between zip codes. Below we can see that one single zip code in downtown Philadelphia contributed over $900,000, more than double the next highest zip code.

4.8 Contributions by County

For some purposes, zip code is too small a geographic unit to communicate clear trends. We can aggregate zip codes into counties. Before looking at county totals by party, however, we should first observe the disparity in total contributions. If we fill the county map by total contributions (the map on the left), we can see that the majority of contributions come from a few counties surrounding the two major metropolitan areas, Philadelphia and Pittsburgh.

This should not be too surprising, as it closely mirrors the population that the map represents (shown on the right). We could get county population data from a number of places, but I have chosen to scrape the latest census data from pennsylvania-demographics.com. The original population data can be found here. Mouse over the county for exact figures.

Now we can explore which party these contributions favored in each county. We could group contributions by any candidate, but it makes the most sense to group by Democratic vs. Republican donations, and then fill with opposing party colors. As shown in the map below, the Philadelphia and Pittsburgh metro areas, Lackawanna County (the Scranton metro area), and Centre County (home to State College) posted large Democratic advantages in fundraising. Many other counties, particularly on the eastern border, witnessed only slight Democratic advantages.

The map is very useful for seeing regional trends, but it can be difficult to grasp finer differences between counties. A dumbbell plot is a useful tool for depicting contributions to both Democratic and Republican candidates in each county. It shows the wide gap in larger counties, like Philadelphia, between Democratic and Republican sums. At the same time, it shows the extent to which these few counties account for most of the state’s political contributions. Allegheny County, for instance, accounted for the largest Republican total of any county, but this amount still fell far short of its Democratic contributions. Most small counties have slight Republican edges in fundraising. Mouse over the lines for exact figures.

4.9 Comparing Political Contributions to Vote Totals

Ultimately, campaigns are about votes. We should expect to find a relationship between political contributions and vote totals. Namely, we would expect that greater contributions to a party (or a candidate) from a certain area suggests greater political support for that party, and therefore that support will translate into a greater number of votes. We can test the extent to which this plausible relationship holds in this case by comparing the campaign finance data to the election results.

I scraped the election data from the New York Times, available here. As the plot below shows, despite a large fundraising advantage, Clinton narrowly lost the election in Pennsylvania, and all of its 20 electoral college votes, to Trump by fewer than 50,000 votes.

While there is no prize for winning the number of votes in a given county, examining vote counts by county is an interesting way to analyze voting patterns across the state, which we can then compare to our county map of political contributions.

Not surprisingly, there are a number of similarities between the electoral map and the campaign finance map. The strongest Democratic fundraiser, Philadelphia, voted heavily for Clinton. Several other counties, such as Allegheny, Centre or Lackawanna, that had strong Democratic fundraising advantages, saw narrow Clinton edges in their vote counts. Most other counties voted more heavily for Trump.

Instead of a map, we might prefer to view these vote totals more directly in a plot. We can produce another dumbbell plot similar to the previous one mapping contributions, but now replaced with votes for Clinton (blue) and Trump (red). We get a similar pattern. The largest counties voted for Clinton by wide margins, while the smaller counties voted more heavily for Trump.

Now we can identify counties where expectations were not met given what we know about their political contributions and election votes. From our campaign finance data, we saw that Democrats outraised Republicans in 26 of 67 counties (they tend to win fewer, more populous counties). In the election though, Clinton received more votes than Trump in only 11 counties. Comparing campaign finance data and actual votes, we find that in 50 counties, the party raising the most funds also had the candidate with the most votes.

In 16 counties however, Democrats outraised Republicans, while at the same time Trump outperformed Clinton in terms of votes. The reverse was true in only one county, Lehigh, where Clinton outperformed Trump in terms of votes despite Republicans outraising Democrats. We can compare these totals in the table below. Note that “Expected” in the Outcome column only means that the party raising the most money in that county also received the most votes.
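
The classification and the county arithmetic can be checked with a short Python sketch (my own illustration, not the report's R code). Only the label "Expected" comes from the report; the two other labels are hypothetical, ties are arbitrarily assigned to the Republican side, and the dollar and vote figures passed to classify are made up.

```python
def classify(dem_funds, rep_funds, clinton_votes, trump_votes):
    """Compare the party that led in funds with the candidate that led in votes."""
    money_leader = "D" if dem_funds > rep_funds else "R"
    vote_leader = "D" if clinton_votes > trump_votes else "R"
    if money_leader == vote_leader:
        return "Expected"
    if money_leader == "D":
        return "Trump votes despite Democratic funds"   # hypothetical label
    return "Clinton votes despite Republican funds"     # hypothetical label


# County counts quoted in the text.
TOTAL_COUNTIES = 67
DEM_OUTRAISED = 26
TRUMP_DESPITE_DEM_FUNDS = 16
CLINTON_DESPITE_REP_FUNDS = 1

expected = TOTAL_COUNTIES - TRUMP_DESPITE_DEM_FUNDS - CLINTON_DESPITE_REP_FUNDS
clinton_counties = (DEM_OUTRAISED - TRUMP_DESPITE_DEM_FUNDS) + CLINTON_DESPITE_REP_FUNDS
print(expected, clinton_counties)

# Made-up figures for three counties, one per outcome.
print(classify(900_000, 100_000, 500_000, 100_000))
print(classify(60_000, 40_000, 20_000, 45_000))
print(classify(40_000, 60_000, 80_000, 75_000))
```

The counts quoted in the text are mutually consistent: 67 - 16 - 1 gives the 50 "Expected" counties, and (26 - 16) + 1 gives the 11 counties where Clinton led in votes.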

We can also highlight counties which reverse expectations of the earlier campaign funds map. Luzerne and Wayne are two good examples of this reversal. In Luzerne County, Democrats outraised Republicans by nearly $100,000, but Trump garnered about 25,000 more votes in the county. In the state’s northeast corner of Wayne County, Democrats outraised Republicans by more than 2 to 1. Counting votes however, Trump had a more than 2 to 1 advantage over Clinton. Note that the dark blue county of Lackawanna in the state’s northeast may appear to be outlined at first, but that is only due to its neighbors.

Setting our marker at the 50% threshold was a useful but in many ways arbitrary decision, given that electoral college votes are determined for the state as a whole. For a closer look at each county, we can compare the difference between a party’s share of political contributions and a candidate’s share of votes. We could do this for either party/candidate, but I will do it for Clinton. It can be read in reverse to know the performance of Trump.

As seen below, in nearly all counties, Clinton’s vote share percentage (marked in green) is less than the share of political contributions for Democratic candidates (marked in orange). In some cases this difference is fairly minimal. In Philadelphia County, 90% of political contributions went to Democratic candidates, and 84% of votes went to Clinton. In other cases, the difference is substantial. In Montour County, for example, 84% of political contributions went to Democratic candidates, but just 35% of votes went to Clinton. There are just a few cases where Clinton’s vote share exceeds the Democrats’ share of political contributions, most of which involve very low percentages in heavily Republican areas. Lehigh was the only county where Republicans outraised Democrats yet Clinton earned more than 50% of the vote.
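
The comparison itself is a difference of two shares. In the Python sketch below (illustrative only, since the report's R code is hidden), both shares are computed on a two-party basis, which is my assumption: the report may have used totals that include other parties. The inputs are made-up counts chosen so that the shares match the Philadelphia (90% vs. 84%) and Montour (84% vs. 35%) percentages quoted above.

```python
def share(part, other):
    """Two-party share of `part` relative to `part + other`."""
    return part / (part + other)


def vote_minus_funding_share(dem_funds, rep_funds, clinton_votes, trump_votes):
    """Clinton's vote share minus the Democratic share of contributions."""
    return share(clinton_votes, trump_votes) - share(dem_funds, rep_funds)


# Made-up counts that reproduce the percentages quoted in the text.
philadelphia_gap = vote_minus_funding_share(900, 100, 840, 160)
montour_gap = vote_minus_funding_share(84, 16, 35, 65)
print(round(philadelphia_gap, 2), round(montour_gap, 2))
```

A negative value means Clinton's vote share fell short of the Democratic share of contributions: about 6 percentage points in the Philadelphia example and 49 in the Montour example.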

5 Conclusion

This investigation has hopefully demonstrated the following key points:

Future research in this area should look to investigate:


  1. According to the FEC, $2,700 is the maximum amount an individual can contribute to a federal candidate per election. Primary and general elections are considered separate elections.