This report examines the 2016 Presidential Campaign Finance contributor report for the state of Pennsylvania, as provided by the Federal Election Commission (FEC). Its main objective is to describe the flow of financial contributions to US presidential candidates in terms of their size, date and geography within the state of Pennsylvania.
The report documents how, in the course of the 2016 presidential campaign, Hillary Clinton received nearly $13m in contributions from individuals in the state of Pennsylvania, far more than the next closest candidate, Donald Trump, at approximately $4m. In fact, Clinton’s total was more than all 23 other candidates combined.
Nevertheless, when grouping contributions by unique individuals, it becomes evident that Clinton actually gained her $9m advantage from fewer distinct individuals than Trump. The stark difference in percentage of a candidate’s contributions coming from unique individuals (as opposed to repeat donations from the same individual) may reveal something important about the candidates’ actual base of support.
It is also interesting to compare the geographic distribution of political contributions and presidential votes. The Democrats’ fundraising advantage was geographically concentrated in urban centers. Democrats outraised Republicans in only 26 of 67 counties despite raising an additional $6.4m.
We would expect to find a relationship between the party raising the most money in a county and the party receiving the most votes. In 50 counties, the party raising the most money did see its candidate receive the most votes. However, in 16 counties, Trump received more votes than Clinton despite Democrats outraising Republicans in the same county. In only 1 county did Clinton outperform Trump at the ballot box despite Republicans outraising Democrats. In fact, in nearly all counties, Clinton’s share of the vote was less than the share of political contributions going to Democratic candidates.
I have hidden the code in the final document for easier viewing, but please do see the original .Rmd or .md for the code producing this report in GitHub.
The original data, as provided by the Federal Election Commission (FEC), was found here, but it appears to be no longer available while the FEC updates its website. Please find the csv file in the GitHub repository until the new link can be given. Please note that the data does not include any super-PAC money, which by its nature is not officially donated to candidates themselves.
Here is a brief look at our initial raw dataset.
| cmte_id | cand_id | cand_nm | contbr_nm | contbr_city | contbr_st | contbr_zip | contbr_employer | contbr_occupation | contb_receipt_amt | contb_receipt_dt | receipt_desc | memo_cd | memo_text | form_tp | file_num | tran_id | election_tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C00580100 | P80001571 | Trump, Donald J. | ROONEY, JEAN | ALLENTOWN | PA | 18103 | INFORMATION REQUESTED | INFORMATION REQUESTED | 75.30 | 07-OCT-16 | NA | X | NA | SA18 | 1146165 | SA18.134904 | G2016 |
| C00577130 | P60007168 | Sanders, Bernard | LEONTOVICH, M | DOWNINGTOWN | PA | 193352266 | NOT EMPLOYED | NOT EMPLOYED | 15.00 | 04-MAR-16 | NA | NA |  | SA17A | 1077404 | VPF7BKWAXT2 | P2016 |
| C00577130 | P60007168 | Sanders, Bernard | LEONTOVICH, M | DOWNINGTOWN | PA | 193352266 | NOT EMPLOYED | NOT EMPLOYED | 10.00 | 05-MAR-16 | NA | NA |  | SA17A | 1077404 | VPF7BKXAZE6 | P2016 |
| C00577130 | P60007168 | Sanders, Bernard | LEONTOVICH, M | DOWNINGTOWN | PA | 193352266 | NOT EMPLOYED | NOT EMPLOYED | 10.00 | 06-MAR-16 | NA | NA |  | SA17A | 1077404 | VPF7BM0HEC7 | P2016 |
| C00575795 | P00003392 | Clinton, Hillary Rodham | KRAMER, VICKI | PHILADELPHIA | PA | 191064153 | N/A | RETIRED | 21.64 | 06-APR-16 | NA | X |  | SA18 | 1091718 | C4675238 | P2016 |
| C00577130 | P60007168 | Sanders, Bernard | KERNS, MICHAEL | PHILADELPHIA | PA | 191252423 | GERMANTOWN FRIENDS SCHOOL | TECHNICAL DIRECTOR | 15.00 | 05-MAR-16 | NA | NA |  | SA17A | 1077404 | VPF7BKWKFX7 | P2016 |
After taking a first look at the data, here are a few initial observations:
election_tp is the lone exception, but has relatively few missing values.

| Variable | Number Missing |
|---|---|
| cmte_id | 0 |
| cand_id | 0 |
| cand_nm | 0 |
| contbr_nm | 0 |
| contbr_city | 9 |
| contbr_st | 0 |
| contbr_zip | 23 |
| contbr_employer | 1,834 |
| contbr_occupation | 1,643 |
| contb_receipt_amt | 0 |
| contb_receipt_dt | 0 |
| receipt_desc | 241,328 |
| memo_cd | 193,406 |
| memo_text | 160,839 |
| form_tp | 0 |
| file_num | 0 |
| tran_id | 0 |
| election_tp | 452 |
We have an equal number of unique cand_id and cand_nm, which suggests there are no errors there. We would likely want to treat candidates as a factor variable. Last name is sufficient to distinguish candidates so we can create that variable for clearer plots.
There are 56,881 unique contbr_nm values out of 243,796 observations. That means a lot of repeat donors, as shown below. How this share varies by candidate is something that will be explored.
| Unique Contributors | Total Observations | Percent Unique |
|---|---|---|
| 56,881 | 243,796 | 0.233 |
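
The share in the table is simply the number of distinct contbr_nm values divided by the number of rows. The original analysis was written in R and its code is hidden here, so the following is only a minimal Python sketch of that arithmetic; the function name `unique_share` is hypothetical, and the toy names are the six rows of the sample table above.

```python
def unique_share(names):
    """Return (n_unique, n_total, share) for a list of contributor names."""
    n_total = len(names)
    n_unique = len(set(names))
    share = n_unique / n_total if n_total else 0.0
    return n_unique, n_total, share


# contbr_nm values from the six sample rows shown earlier in the report.
sample_names = [
    "ROONEY, JEAN",
    "LEONTOVICH, M",
    "LEONTOVICH, M",
    "LEONTOVICH, M",
    "KRAMER, VICKI",
    "KERNS, MICHAEL",
]

n_unique, n_total, share = unique_share(sample_names)
print(n_unique, n_total, round(share, 3))

# The same division applied to the counts reported for the full dataset.
full_share = round(56881 / 243796, 3)
print(full_share)
```

Note that the result is a proportion (0.233), so it corresponds to roughly 23% of observations.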
contbr_st is all PA as it should be.
contbr_zip has a mix of 5 digit and 9 digit zip codes. Looking at min and max values alone suggests at least some errors.
contb_receipt_amt: Our only true numeric variable surprisingly has many negative values.
contb_receipt_dt: We should convert to date format.
tran_id: Surprisingly not a primary key so it is unclear what this variable represents.
election_tp: A few correctable errors and some missing data.
The dataset lacks any party affiliation data, but this can easily be added as a new variable derived from the candidate.
I can drop the following variables: cmte_id, cand_id, contbr_city, contbr_st, contbr_employer, contbr_occupation, form_tp, file_num, tran_id.
First, we can solve a few small problems in election_tp.
| election_tp | n |
|---|---|
| G2016 | 92,431 |
| G2106 | 1 |
| O2016 | 43 |
| P2016 | 150,868 |
| P2020 | 1 |
| NA | 452 |
G2106 must be a simple error. We can recode that.
O2016 is ‘Other 2016’. All of these donations seem to be for Jill Stein’s recount effort after the November General election. We should exclude this data because the focus of the report is to investigate how fundraising (prior to the election) impacted the outcome of the election.
P2020 seems like a celebratory donation after the election so we would want to remove it anyway. This will happen when we clean up contb_receipt_dt.
It is not clear why 452 are blank at the moment, and so they should remain in the dataset.
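The election_tp rules described above (recode the G2106 typo, drop the O2016 recount donations, leave blanks alone for now) could be expressed roughly as follows. This is a Python sketch rather than the report's hidden R code; the function name and the dictionary-per-row layout are assumptions made for illustration, and P2020 is deliberately left untouched because the text removes it later through the date filter.

```python
def clean_election_tp(rows):
    """Recode the 'G2106' typo and drop 'O2016' rows; keep everything else."""
    cleaned = []
    for row in rows:
        tp = row["election_tp"]
        if tp == "O2016":
            # Recount donations made after the general election: excluded.
            continue
        if tp == "G2106":
            # Copy the row so the caller's data is not modified in place.
            row = {**row, "election_tp": "G2016"}
        cleaned.append(row)
    return cleaned


# Toy rows: one per code discussed in the text, plus a blank value.
rows = [
    {"election_tp": "G2016"},
    {"election_tp": "G2106"},
    {"election_tp": "O2016"},
    {"election_tp": "P2016"},
    {"election_tp": ""},
]
cleaned = clean_election_tp(rows)
print([r["election_tp"] for r in cleaned])
```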
contb_receipt_dt needs to be converted from an integer to a Date format. Moreover, 2,475 contributions after the date of the election will be excluded.
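As an illustration of the date step, the sketch below parses strings in the `07-OCT-16` form seen in the sample table and keeps only contributions made on or before election day. It is a Python sketch, not the original R code. Three things are assumptions: two-digit years are read as 20xx, election day is taken as 8 November 2016, and contributions dated on election day itself are kept. The third row is a made-up example.

```python
from datetime import date

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}
ELECTION_DAY = date(2016, 11, 8)  # assumed cut-off for this sketch


def parse_receipt_date(text):
    """Parse 'DD-MON-YY' (e.g. '07-OCT-16'), assuming years are 20YY."""
    day, month, year = text.strip().split("-")
    return date(2000 + int(year), MONTHS[month.upper()], int(day))


def drop_post_election(rows):
    """Keep rows dated on or before ELECTION_DAY."""
    return [
        r for r in rows
        if parse_receipt_date(r["contb_receipt_dt"]) <= ELECTION_DAY
    ]


rows = [
    {"contbr_nm": "ROONEY, JEAN", "contb_receipt_dt": "07-OCT-16"},
    {"contbr_nm": "KRAMER, VICKI", "contb_receipt_dt": "06-APR-16"},
    {"contbr_nm": "HYPOTHETICAL, DONOR", "contb_receipt_dt": "15-NOV-16"},
]
kept = drop_post_election(rows)
print([r["contbr_nm"] for r in kept])
```

Parsing the month with an explicit lookup table avoids depending on the system locale for month abbreviations.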
Starting with a histogram or density plot, we can see a few very negative values in the dataset.
We can take a look at the most negative contributions and note that many have a receipt_desc marked as “Refund”. Some are reattributed to other individuals; others have no explanation. Altogether there are 2,486 non-positive contributions.
| contbr_nm | contb_receipt_amt | receipt_desc |
|---|---|---|
| CARANGI, JOE | -93308 | Refund |
| FOLINO, J.A. | -8100 | Refund |
| BALL, GEORGE | -7700 | Refund |
| SCHRECK, JOHN | -6175 | Refund |
| PHILLIPS, MARY | -5400 | Refund |
| MYERS, SETH C. MR. | -5400 | REATTRIBUTION TO SPOUSE |
On the other hand, we have some very large contributions. 150 contributions exceed the maximum legal limit of $2,700.
Ideally, it would be possible to match up positive and negative contributions from each contributor, but there seems to be no reliable way to do that from the data at hand. contbr_nm is not a primary key (people have the same names and small errors are easy to find).
The simplifying solution taken was to remove all contributions below zero and above $2,700. This is a fairly crude measure: an individual can make several smaller contributions that together exceed the limit, and ideally those would be capped at the limit rather than excluded. At the same time, considering the number of observations in the dataset, these cases represent a small fraction. Moreover, even if legitimate, these large contributions are outliers by any measure and would mostly serve to conceal patterns in the vast majority of the data.
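The two options discussed here, excluding out-of-range contributions versus capping a contributor's total, can be contrasted in a short Python sketch. This is not the report's code. The helper names are hypothetical, the amounts are a mix of values from the tables above and one invented $5,000 entry, and keeping contributions of exactly zero is an assumption drawn from a literal reading of "below zero".

```python
LIMIT = 2700.00  # individual limit cited in the text


def within_limits(amount, limit=LIMIT):
    """True if a single contribution is not negative and not above the limit."""
    return 0 <= amount <= limit


def capped_total(amounts, limit=LIMIT):
    """Alternative approach: sum one contributor's gifts and cap at the limit."""
    return min(sum(amounts), limit)


# 75.30 and 15.00 come from the sample table, -5400.00 from the refund table;
# 5000.00 is an invented over-limit contribution.
amounts = [75.30, 15.00, -5400.00, 5000.00, 2700.00]
kept = [a for a in amounts if within_limits(a)]
print(kept)

# A contributor giving 2,000 and then 1,500 would be capped, not excluded.
print(capped_total([2000.00, 1500.00]))
```

The filter is the approach the report takes; the capping helper only shows what the more careful alternative would look like if contributors could be matched reliably.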
contbr_zip data includes some five digit and some nine digit zip codes. I broke these into separate columns to enable analysis by zip code.
Sorting this list and viewing the head or tail reveals obvious errors. Quite a few are given as “99999”; some are a single digit; and others belong to another state. However, since all states are listed as “PA”, I chose not to exclude them. These are probably simple errors or perhaps people who live in two states.
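A minimal Python sketch of the zip code split might look like this (it is not the report's R code). The function names are hypothetical. The `looks_like_pa` check assumes that Pennsylvania zip codes begin with the three-digit prefixes 150 through 196, and it only flags suspicious values rather than dropping them, in line with the decision above.

```python
def split_zip(raw):
    """Split a raw zip into (zip5, plus4); malformed values give ('', '')."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    if len(digits) == 9:
        return digits[:5], digits[5:]
    if len(digits) == 5:
        return digits, ""
    return "", ""


def looks_like_pa(zip5):
    """Rough plausibility check, assuming PA prefixes run from 150 to 196."""
    return len(zip5) == 5 and 150 <= int(zip5[:3]) <= 196


# The first two values come from the sample table; the rest mimic the
# error types described in the text.
for raw in ["193352266", "18103", "99999", "7"]:
    zip5, plus4 = split_zip(raw)
    print(repr(raw), zip5, plus4, looks_like_pa(zip5))
```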
receipt_desc, memo_cd, and memo_text are variables that are not particularly well documented. Many, but not all of them are contributions flagged for refund. Others are redesignations or reattributions. Given that there are so few of them compared to the rest of the dataset, I will exclude any observations with special notes to ensure cleanliness of the data, even if a very small number of legitimate contributions are excluded.
| receipt_desc | n |
|---|---|
| NA | 238,054 |
| REDESIGNATION FROM PRIMARY | 177 |
| REATTRIBUTION FROM SPOUSE | 129 |
| SEE REATTRIBUTION | 93 |
| REATTRIBUTION / REDESIGNATION REQUESTED | 57 |
|  | 47 |
memo_cd marks whether a memo is attached to the contribution (though not always). These memos, found in memo_text, note earmarked contributions or specific funds such as “Hillary Victory Fund”. This seems to pose no harm to the dataset.
| memo_text | n |
|---|---|
| NA | 156,345 |
|  | 57,895 |
|  | 22,990 |
| *BEST EFFORTS UPDATE | 335 |
| EARMARKED FROM MAKE DC LISTEN | 281 |
|  | 18 |
The dataset did not have a variable for party affiliation so we can add it and represent it as a factor.
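One way to derive both a last-name variable and a party variable from cand_nm is a simple lookup, sketched below in Python (the report itself used R). The mapping is deliberately partial: it covers only candidates named in this report, since the full list of 24 candidates is not shown here, and any other name returns `None`. The names `PARTY_BY_LAST_NAME`, `last_name` and `party_of` are hypothetical.

```python
# Partial mapping, limited to candidates mentioned in this report.
PARTY_BY_LAST_NAME = {
    "Clinton": "Democrat",
    "Sanders": "Democrat",
    "Trump": "Republican",
    "Cruz": "Republican",
    "Stein": "Green",
    "McMullin": "Independent",
}


def last_name(cand_nm):
    """'Trump, Donald J.' -> 'Trump' (cand_nm is 'Last, First' in the data)."""
    return cand_nm.split(",")[0].strip()


def party_of(cand_nm):
    """Return the party for a candidate name, or None if not in the mapping."""
    return PARTY_BY_LAST_NAME.get(last_name(cand_nm))


for name in ["Trump, Donald J.", "Clinton, Hillary Rodham", "Sanders, Bernard"]:
    print(last_name(name), party_of(name))
```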
My last step in data wrangling will be to assign an election type to observations with empty election_tp values. Nearly all of the 250 remaining observations with a blank election_tp were for Donald Trump.
| Candidate | No. election_tp missing |
|---|---|
| McMullin, Evan | 19 |
| Stein, Jill | 4 |
| Trump, Donald J. | 227 |
Observations for candidates other than Trump could only be assigned to “P2016”. Trump officially secured his party’s nomination on 20 July 2016. Hence, any donation before that date will be marked for the primary and thereafter for the general.
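The assignment rule for blank election_tp values can be written as a small function. The Python below is a sketch of that rule, not the original code. It uses the 20 July 2016 cut-off stated above and assumes that a Trump contribution dated exactly on that day counts toward the general election, which the text leaves ambiguous.

```python
from datetime import date

NOMINATION_DATE = date(2016, 7, 20)  # cut-off date as given in the text


def fill_election_tp(cand_last_name, receipt_date, election_tp):
    """Return election_tp, filling blanks using the rule described above."""
    if election_tp:
        return election_tp  # already labeled: leave unchanged
    if cand_last_name == "Trump" and receipt_date >= NOMINATION_DATE:
        return "G2016"
    return "P2016"


# Hypothetical blank rows, chosen to exercise each branch of the rule.
print(fill_election_tp("Trump", date(2016, 6, 1), ""))
print(fill_election_tp("Trump", date(2016, 9, 1), ""))
print(fill_election_tp("Stein", date(2016, 9, 1), ""))
print(fill_election_tp("Clinton", date(2016, 9, 1), "G2016"))
```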
Before really diving into the analysis, a few univariate plots help sense-check the data and ensure that data wrangling was sufficient.
A histogram of contb_receipt_amt shows a highly right-skewed plot, with the vast majority of contributions less than $100 and a few contributions much higher, including at the maximum limit. We can also notice the lack of smoothness in the distribution because people tend to contribute in round numbers when they donate.
We can plot the number of contributions on each day to determine the pace at which contributions were received. The plot below shows just how long the campaign was with the first contributions coming in as early as 2014. The frequency though of course really picks up in 2016. The plot also suggests there were considerable day-to-day differences in the number of contributions. Some days were clearly more popular for donating than others, even in the heat of the campaign.
I have also colored the plot by election_tp to show contributions designated for the primary or general elections. We notice the shift shortly after June 2016, when most contributions are marked for the general election. Nevertheless, we see small numbers of contributions marked for the primary even after the primaries had finished. We can also observe a greater number of primary contributions (when there were more candidates) than general contributions.
Do most contributors usually just give once? Or do they donate repeatedly? The output below does not tell us anything about the amounts donated. Instead, it shows that while some people donate repeatedly, it is most common just to donate once. There is an inverse relationship between the number of contributions and the number of people making that number of contributions.
As the number of donations from a single contributor increases, the number of people making that number of donations decreases. (Although we cannot see any points on the far right of the x-axis, there must be at least one observation that far out, since the default scale extends to include it.)
We can see this disparity in a table as well. One can observe a huge drop between the number of people donating once versus those donating twice. Moreover, this trend continues as the number of donations from a single person increases.
| donation_count | n |
|---|---|
| 1 | 27,868 |
| 2 | 7,884 |
| 3 | 4,134 |
| 4 | 2,795 |
| 5 | 2,020 |
| 6 | 1,500 |
| 7 | 1,253 |
| 8 | 1,038 |
| 9 | 846 |
| 10 | 725 |
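
A table like the one above is a count of counts: first the number of donations per contributor, then the number of contributors at each donation count. A minimal Python sketch with `collections.Counter` follows; it is not the report's R code, the function name is hypothetical, and the toy names are the rows of the sample table rather than the full dataset.

```python
from collections import Counter


def donation_count_distribution(names):
    """Map donation_count -> number of contributors with that many donations."""
    per_contributor = Counter(names)          # name -> number of donations
    return Counter(per_contributor.values())  # count -> number of names


# contbr_nm values from the sample rows at the top of the report.
names = [
    "ROONEY, JEAN",
    "LEONTOVICH, M",
    "LEONTOVICH, M",
    "LEONTOVICH, M",
    "KRAMER, VICKI",
    "KERNS, MICHAEL",
]
dist = donation_count_distribution(names)
for donation_count in sorted(dist):
    print(donation_count, dist[donation_count])
```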
At the same time, there are indeed a handful of individuals who donated more than 100 times. The extent to which this pattern varies by candidate will be explored later in the report.
| contbr_nm | n |
|---|---|
| COMELLA, JOHN | 186 |
| BETHEA, DAMON | 179 |
| SHOVLIN, MARIE | 150 |
| ROSOFF, ANDREW | 142 |
| SHORT, CHRIS | 139 |
| LIBERTIN, MARY | 135 |
| WEITKAMP, RICKY | 131 |
| EDWARDS, JOHN P | 130 |
| LONCAR, BRANDON | 130 |
| HANN, STEVE E | 125 |
With some sense checks done, we can try to answer some basic questions. Which party (and which candidate) raised the largest amount of money? First, it is clear that Democrats outraised Republicans by a huge margin.
| Party | Total Funds Raised ($) |
|---|---|
| Democrat | 15,547,628.93 |
| Republican | 9,172,421.80 |
| Independent | 93,322.61 |
| Green | 25,729.73 |
Of course in comparison to the major parties, Independent and Green parties are almost non-existent.
The party variable is an aggregation of individual candidates, so what we really want to examine is the candidates themselves. Clinton dominated the field. She raised over $12.9m, while the rest of the field combined raised only $11.9m. We can first see the raw disparity in a table.
We can also see Clinton’s advantage, both in the amount of money raised and in the relative paucity of Democratic challengers compared to the much wider Republican field.
We might also like to visualize the aggregate funds by candidate in a waffle plot, where each square represents $250,000 (and each column equals $1m). Note that the necessary rounding to make each square conceals actual values, but it is easier to discern that Clinton raised about $13m, Trump about $4m, and Sanders about $2.5m.
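The rounding behind the waffle plot is easy to check by hand. The Python sketch below is illustrative only (it is not the plotting code) and uses the approximate totals quoted in this report rather than exact figures. With $250,000 per square and four squares per column, each column stands for $1m.

```python
SQUARE = 250_000          # dollars represented by one square
SQUARES_PER_COLUMN = 4    # so one column represents $1m


def waffle_squares(total, square=SQUARE):
    """Number of squares after rounding a dollar total to the nearest square."""
    return round(total / square)


# Approximate totals as quoted in the text, not exact values.
approx_totals = {"Clinton": 12_900_000, "Trump": 4_000_000, "Sanders": 2_500_000}

squares = {name: waffle_squares(total) for name, total in approx_totals.items()}
for name, n in squares.items():
    print(name, n, "squares =", n / SQUARES_PER_COLUMN, "columns")
```

Clinton's roughly $12.9m rounds up to 52 squares, or 13 full columns, which is why the plot reads as about $13m.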
Clinton has a clear aggregate advantage in fundraising, but we can also examine the distribution in order to describe a typical contribution. We should be able to answer, for instance, if a candidate mostly receives many small donations or fewer large ones.
Given that only a few candidates heavily dominate their parties, I will just focus on the top candidates. Good options to visualize these distributions include boxplots, density plots and violin plots. Because our data is so heavily skewed, “zooming” in on contributions below $200 is the best way to see differences in the range where the bulk of the data lies.
From the boxplots below, it appears that Clinton has the lowest median contribution, nearly the same as Sanders and considerably lower than Trump. However, this is misleading, as will be made clear below.
An unweighted violin plot is another method to highlight the shape of a distribution. We can see bumps at typical donation amounts: $25, $50, $100. More than any other candidate, Sanders received support in small amounts, as reflected in his first quartile (Q1) in the boxplot above.
One problem with the above box and violin plots, though, is that we have treated each contribution individually, which hides the fact that individuals are able to donate more than once. Perhaps more interesting than a typical Trump or Clinton contribution is a typical Trump or Clinton contributor. If a contributor donates $5 one hundred times, for most questions it would be preferable to represent that contributor as $500, which can be achieved by grouping. This grouping is not perfect because there are some small errors in the name list, where the contbr_nm of a single person is spelled slightly differently, but it nevertheless achieves its purpose.
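The grouping idea can be sketched as follows: sum contributions per (candidate, contributor) pair, then take the median of those per-contributor totals for each candidate. This is an illustrative Python version, not the report's R code. The function names are hypothetical, and the toy rows are the six sample rows shown at the top of the report.

```python
from collections import defaultdict
from statistics import median


def totals_per_contributor(rows):
    """Sum amounts for each (candidate, contributor name) pair."""
    totals = defaultdict(float)
    for candidate, name, amount in rows:
        totals[(candidate, name)] += amount
    return dict(totals)


def median_contributor_total(totals):
    """Median per-contributor total for each candidate."""
    by_candidate = defaultdict(list)
    for (candidate, _name), total in totals.items():
        by_candidate[candidate].append(total)
    return {cand: median(vals) for cand, vals in by_candidate.items()}


# (candidate, contbr_nm, contb_receipt_amt) from the sample table.
rows = [
    ("Trump", "ROONEY, JEAN", 75.30),
    ("Sanders", "LEONTOVICH, M", 15.00),
    ("Sanders", "LEONTOVICH, M", 10.00),
    ("Sanders", "LEONTOVICH, M", 10.00),
    ("Clinton", "KRAMER, VICKI", 21.64),
    ("Sanders", "KERNS, MICHAEL", 15.00),
]
totals = totals_per_contributor(rows)
medians = median_contributor_total(totals)
print(totals[("Sanders", "LEONTOVICH, M")])
print(medians)
```

Because names are matched exactly, spelling variants of the same person stay separate, which is the imperfection noted above.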
Having done this grouping, the table below shows an interesting result. Despite Clinton having far more instances of contributions in the dataset, Trump actually had more unique contributors. The percentage of instances of contributions coming from a unique contributor differs widely between Trump and the other leading candidates, including Clinton.
| Candidate | No. Unique Contributors | No. Contributions | Percent of Contributions from Unique Contributors |
|---|---|---|---|
| Trump | 22,102 | 28,673 | 0.771 |
| Clinton | 18,586 | 117,417 | 0.158 |
| Sanders | 7,274 | 59,201 | 0.123 |
| Cruz | 3,025 | 14,973 | 0.202 |
After grouping by individual contributors (instead of contributions), we get a very different picture. Of any candidate, Trump had by far the smallest median distinct contributor total. The grassroots Sanders was second. Clinton’s was the highest.
One last way to see this discrepancy would be to hold each unique contributor to only one donation. We can filter the data to retain only the single largest donation from each unique contributor. Clinton still leads in aggregate funds received, but the amount of funds removed from repeat donors is considerably more than other candidates, especially Trump.
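Holding each contributor to a single donation amounts to keeping the maximum amount per (candidate, contributor) pair and then summing by candidate. The Python below is a sketch under that reading, not the original R code. The function names are hypothetical, and the rows are again the sample rows from the top of the report.

```python
from collections import defaultdict


def largest_donation_per_contributor(rows):
    """Keep only the single largest amount for each (candidate, name) pair."""
    best = {}
    for candidate, name, amount in rows:
        key = (candidate, name)
        if key not in best or amount > best[key]:
            best[key] = amount
    return best


def totals_by_candidate(best):
    """Sum the retained donations for each candidate."""
    totals = defaultdict(float)
    for (candidate, _name), amount in best.items():
        totals[candidate] += amount
    return dict(totals)


rows = [
    ("Trump", "ROONEY, JEAN", 75.30),
    ("Sanders", "LEONTOVICH, M", 15.00),
    ("Sanders", "LEONTOVICH, M", 10.00),
    ("Sanders", "LEONTOVICH, M", 10.00),
    ("Clinton", "KRAMER, VICKI", 21.64),
    ("Sanders", "KERNS, MICHAEL", 15.00),
]
best = largest_donation_per_contributor(rows)
print(totals_by_candidate(best))
```

In this toy input, Sanders drops from $50 across four contributions to $30 across two contributors, which is the kind of reduction the plot describes for candidates with many repeat donors.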
While interesting, the plot above ignores a lot of perfectly valid contributions. To really show the discrepancy between grouping by contributions and grouping by contributors, I will focus only on the two general election candidates, Clinton and Trump. In the plot below, it is easier to see how much the choice between individual contributions and individual contributors changes the picture of a candidate’s fundraising profile.
In addition to knowing the aggregate totals for each candidate, and understanding the increments in which it was received by examining the respective distributions, we can also examine the strength of a campaign at a particular date in time. For many questions, more important than who gave the contribution is just how much the candidate received, and on what date. We can create such a dataset with the information we have. This will let us track a candidate’s progress in fundraising over time.
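A dataset of this kind is a running total per candidate, ordered by date. The sketch below shows one way to build it in Python with `itertools.accumulate`; it is not the report's R code, the function name is hypothetical, and the rows are the Sanders contributions from the sample table.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate


def cumulative_by_candidate(rows):
    """Return {candidate: [(date, running_total), ...]} sorted by date."""
    daily = defaultdict(lambda: defaultdict(float))
    for candidate, day, amount in rows:
        daily[candidate][day] += amount  # total received on each day

    result = {}
    for candidate, per_day in daily.items():
        days = sorted(per_day)
        running = accumulate(per_day[d] for d in days)
        result[candidate] = list(zip(days, running))
    return result


# (candidate, contb_receipt_dt, contb_receipt_amt) from the sample table.
rows = [
    ("Sanders", date(2016, 3, 4), 15.00),
    ("Sanders", date(2016, 3, 5), 10.00),
    ("Sanders", date(2016, 3, 5), 15.00),
    ("Sanders", date(2016, 3, 6), 10.00),
]
series = cumulative_by_candidate(rows)
for day, total in series["Sanders"]:
    print(day.isoformat(), total)
```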
From the plot below, we can see how early Clinton accumulated her large aggregate advantage over the rest of the field. The growth in Trump’s support, on the other hand, breaks very late, mostly after Clinton has already secured her nomination and not long before his own nomination was sealed. The lines of most other Republican candidates stop before Trump’s surge even begins. This could suggest preference for other Republican candidates in Pennsylvania, or just a reflection of Trump’s “self-funding” campaign style.
The last few steps to our analysis have a geographic focus. It is well known that political support varies widely by geography, commonly along urban and rural divides. Our dataset has the zip code of each contribution so that will be our starting point.
First, we have to remove a few errors in our data, namely zip codes that are clearly not from PA. One way to get this data would be from the zipcode package. However, we also need to be able to link zip codes with counties. I found a record of this data here at unitedstateszipcodes.org. After downloading the free version, I joined latitude, longitude and county names to a data frame of contributions summed per zip code. Considering that a zip code represents such a small area, it can be difficult to visualize. However, we can get a sense of the geographic disparity if we redundantly map contribution totals to point size, color, and alpha shading. Enable the zoom function below, particularly in the Philadelphia region, to see the concentration of financing.
The map highlights the concentration of contributions around Philadelphia and Pittsburgh. A simple table though perhaps does a better job of communicating the magnitude of the disparity between zip codes. Below we can see that one single zip code in downtown Philadelphia contributed over $900,000, more than double the next highest zip code.
For some purposes, zip code is too small of a geographic unit to communicate clear trends. We can aggregate zip codes into counties. Before looking at county totals by party however, we should first observe the disparity in total contributions. If we fill the county map by total contributions (the map on the left), we can see that the majority of contributions come from a few counties surrounding the two major metropolitan areas, Philadelphia and Pittsburgh.
This of course though should not be too surprising as it closely mirrors the population that the map represents (shown on the right). We could get county population data from a number of places, but I have chosen to scrape the latest census data from pennsylvania-demographics.com. The original population data can be found here. Mouse over the county for exact figures.
Now we can explore the political direction towards which these contributions were directed in each county. We could group contributions by any candidate but it makes the most sense to group by Democrat vs. Republican donations, and then fill with opposing party colors. As shown in the map below, the Philadelphia and Pittsburgh metro areas, Lackawanna county (the Scranton metro area), and Centre (home to State College) posted large Democratic advantages in fundraising. Many other counties, particularly on the eastern border, witnessed only slight Democratic advantages.
The map is very useful for seeing regional trends, but it can be difficult to grasp finer differences between counties. A dumbbell plot is a useful tool depicting contributions to both Democratic and Republican candidates for each county. It shows the wide gap in larger counties, like Philadelphia, between Democrat and Republican sums. At the same time, it shows the extent to which these few counties account for most of the state’s political contributions. Allegheny County, for instance, accounted for the largest Republican total of any county, but this amount still fell far short of its Democratic contributions. Most small counties have slight Republican edges in fundraising. Mouse over the lines for exact figures.
Ultimately, campaigns are about votes. We should expect to find a relationship between political contributions and vote totals. Namely, we would expect that greater contributions to a party (or a candidate) from a certain area suggests greater political support for that party, and therefore that support will translate into a greater number of votes. We can test the extent to which this plausible relationship holds in this case by comparing the campaign finance data to the election results.
I scraped the election data from the New York Times, available here. As the plot below shows, despite a large fundraising advantage, Clinton narrowly lost the election in Pennsylvania, and all of its 20 electoral college votes, to Trump by fewer than 50,000 votes.
While there is no prize for winning the number of votes in a given county, examining vote counts by county is an interesting way to analyze voting patterns across the state, which we can then compare to our county map of political contributions.
Not surprisingly, there are a number of similarities between the electoral map and the campaign finance map. The strongest Democratic fundraiser, Philadelphia, voted heavily for Clinton. Several other counties, such as Allegheny, Centre or Lackawanna, that had strong Democratic fundraising advantages, saw narrow Clinton edges in their vote counts. Most other counties voted more heavily for Trump.
Instead of a map, we might prefer to view these vote totals more directly in a plot. We can produce another dumbbell plot similar to the previous one mapping contributions, but now replaced with votes for Clinton (blue) and Trump (red). We get a similar pattern. The largest counties voted for Clinton by wide margins, while the smaller counties voted more heavily for Trump.
Now we can identify counties where expectations were not met given what we know about their political contributions and election votes. From our campaign finance data, we saw that Democrats outraised Republicans in 26 of 67 counties (they tend to win fewer, more populous counties). In the election though, Clinton received more votes than Trump in only 11 counties. Comparing campaign finance data and actual votes, we find that in 50 counties, the party raising the most funds also had the candidate with the most votes.
In 16 counties however, Democrats outraised Republicans, while at the same time Trump outperformed Clinton in terms of votes. The reverse was true in only one county, Lehigh, where Clinton outperformed Trump in terms of votes despite Republicans outraising Democrats. We can compare these totals in the table below. Note that “Expected” in the Outcome column only means that the party raising the most money in that county also received the most votes.
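The Outcome label can be reproduced with a simple comparison of the two leaders in each county. The Python below is a sketch of that logic, not the code behind the table. The function name and the label wording for the two unexpected cases are hypothetical, ties are ignored, and the county figures in the example are invented purely to exercise each branch.

```python
def outcome(dem_raised, rep_raised, clinton_votes, trump_votes):
    """Label a county by whether the fundraising leader also led in votes."""
    money_leader = "D" if dem_raised > rep_raised else "R"
    vote_leader = "D" if clinton_votes > trump_votes else "R"
    if money_leader == vote_leader:
        return "Expected"
    if money_leader == "D":
        return "Democrats raised more, Trump won"
    return "Republicans raised more, Clinton won"


# Invented figures: (dem_raised, rep_raised, clinton_votes, trump_votes).
examples = {
    "County A": (500_000, 100_000, 60_000, 40_000),
    "County B": (30_000, 20_000, 10_000, 25_000),
    "County C": (40_000, 50_000, 52_000, 48_000),
}
labels = {name: outcome(*figures) for name, figures in examples.items()}
for name, label in labels.items():
    print(name, label)

# The three groups reported in the text account for every county.
print(50 + 16 + 1)
```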
We can also highlight counties which reverse expectations of the earlier campaign funds map. Luzerne and Wayne are two good examples of this reversal. In Luzerne County, Democrats outraised Republicans by nearly $100,000, but Trump garnered about 25,000 more votes in the county. In Wayne County, in the state’s northeast corner, Democrats outraised Republicans by more than 2 to 1. Counting votes, however, Trump had a more than 2 to 1 advantage over Clinton. Note that the dark blue county of Lackawanna in the state’s northeast may appear to be outlined at first, but that is only due to its neighbors.
Setting our marker at the 50% threshold was a useful, but in many ways arbitrary decision given that electoral college votes are determined for the state as a whole. For a closer look at each county, we can compare the difference between a party’s share in political contributions and a candidate’s share of votes. We could do this for either party/candidate, but I will do it for Clinton. It can be read in reverse to know the performance of Trump.
As seen below, in nearly all counties, Clinton’s vote share (marked in green) is less than the share of political contributions for Democratic candidates (marked in orange). In some cases this difference is fairly minimal. In Philadelphia County, 90% of political contributions went to Democratic candidates, and 84% of votes went to Clinton. In other cases, the difference is substantial. In Montour County, for example, 84% of political contributions went to Democratic candidates, but just 35% of votes went to Clinton. There are just a few cases where Clinton’s vote share exceeds the Democrats’ share of political contributions, most of which involve very low percentages in heavily Republican areas. Lehigh is the only county where Republicans outraised Democrats but Clinton still earned more than 50% of the vote.
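The comparison reduces to a difference between two shares, expressed in percentage points. The Python sketch below is illustrative and is not the report's code. It assumes both shares are two-way (Democrat versus Republican money, Clinton versus Trump votes), which may not match the exact definition used for the plot, and the input numbers are invented so that they reproduce the rounded percentages quoted above rather than actual county totals.

```python
def share(a, b):
    """Two-way share of a in (a + b)."""
    return a / (a + b)


def gap_in_points(dem_raised, rep_raised, clinton_votes, trump_votes):
    """Clinton vote share minus Democratic money share, in percentage points."""
    vote_share = share(clinton_votes, trump_votes)
    money_share = share(dem_raised, rep_raised)
    return round(100 * (vote_share - money_share), 1)


# Invented inputs matching the quoted shares: 90% vs 84%, and 84% vs 35%.
philadelphia_like = gap_in_points(900, 100, 840, 160)
montour_like = gap_in_points(840, 160, 350, 650)
print(philadelphia_like, montour_like)
```

A negative gap means Clinton's vote share fell short of the Democratic share of contributions, which is the pattern in nearly all counties.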
This investigation has hopefully demonstrated the following key points:
Clinton amassed a massive aggregate fundraising advantage over all other candidates in Pennsylvania (roughly $13m to $4m for Trump and $2.5m for Sanders).
Grouping by unique contributors instead of individual contributions reveals that repeat donors were crucial to Clinton’s aggregate fundraising advantage. Despite raising far less money, most of which did not arrive until his party’s nomination had been sealed, Trump actually had more unique contributors than Clinton.
One zip code in downtown Philadelphia contributed a total of over $900,000 (mostly to Democrats), more than double the next highest zip code.
Trump received more votes than Clinton in 16 counties in which Democrats outraised Republicans, while Clinton received more votes in only 1 county in which Republicans outraised Democrats.
Future research in this area should look to investigate:
How does Pennsylvania fit into the larger national picture in 2016?
How do Pennsylvania’s 2016 results compare with those of previous elections?
To what extent, if any, does the story presented here change when also considering super PAC contributions for each candidate?