Presidential campaign finance data contains a great deal of valuable information, so it has always been at center stage during presidential elections. At a high level, it shows the demographic and geographic support for each party and candidate. The pattern of donation flows also reflects the current standing of the candidates in the race.
For this exploratory data analysis, I examine the 2016 presidential campaign finance data from the Federal Election Commission. The data is current through November 20, 2015.
I chose Ohio in particular because it is often considered a “swing state”, meaning that no party has historically enjoyed consistent, strong state-wide support in presidential elections. It therefore gives us a more “neutral” view of the donations to each party and candidate in the 2016 presidential race.
In this analysis, I try to answer three questions. First, which party and candidates have garnered the most support in Ohio, as reflected by the donations received? Second, is there a difference in donations between the genders, and if so, what are the patterns? Lastly, where do donations come from geographically? In particular, do big counties tend to donate large amounts both in aggregate and per capita?
By answering these questions, we can understand a bit more about the 2016 presidential election and, possibly, how the race may develop in Ohio as 2016 approaches.
# Load the Data
oh = read.csv("/Users/garymu/Dropbox/Udacity/DAND/project4/OH.csv",
              stringsAsFactors = FALSE)
str(oh)
## 'data.frame': 9598 obs. of 18 variables:
## $ cmte_id : chr "C00458844" "C00458844" "C00458844" "C00577130" ...
## $ cand_id : chr "P60006723" "P60006723" "P60006723" "P60007168" ...
## $ cand_nm : chr "Rubio, Marco" "Rubio, Marco" "Rubio, Marco" "Sanders, Bernard" ...
## $ contbr_nm : chr "STROPKAY, ANNA T. MS." "STROPKAY, ANNA T. MS." "HOOVER, JERRY" "BRACKMAN, MATTHEW" ...
## $ contbr_city : chr "SEVEN HILLS" "SEVEN HILLS" "DOVER" "COLUMBUS" ...
## $ contbr_st : chr "OH" "OH" "OH" "OH" ...
## $ contbr_zip : int 441315955 441315955 446227695 432013514 430568049 440223963 452427345 433519313 432351371 432351371 ...
## $ contbr_employer : chr "RETIRED" "RETIRED" "SELF-EMPLOYED" "ABBOTT NUTRITION" ...
## $ contbr_occupation: chr "RETIRED" "RETIRED" "OWNER" "DESIGNER" ...
## $ contb_receipt_amt: num 14 14 250 100 50 100 50 100 100 100 ...
## $ contb_receipt_dt : chr "8-May-15" "18-May-15" "16-Apr-15" "23-Aug-15" ...
## $ receipt_desc : chr "nan" "" "" "" ...
## $ memo_cd : chr "X" "X" "" "" ...
## $ memo_text : chr "TRANSFER FROM RUBIO VICTORY" "TRANSFER FROM RUBIO VICTORY" "" "* EARMARKED CONTRIBUTION: SEE BELOW" ...
## $ form_tp : chr "SA18" "SA18" "SA17A" "SA17A" ...
## $ file_num : int 1029436 1029436 1029436 1029414 1029414 1029436 1029414 1029436 1029436 1029436 ...
## $ tran_id : chr "SA18.749749.2.0615" "SA18.752863.2.0615" "SA17.793795" "VPF7BEZ5R79" ...
## $ election_tp : chr "P2016" "P2016" "P2016" "P2016" ...
dim(oh)
## [1] 9541 27
The Ohio 2016 presidential campaign finance data from the Federal Election Commission website has 9,598 observations and 18 variables. Each observation represents one donation transaction.
After processing the data and adding variables to help with the analysis, the new data set has 9 additional variables (27 in total), and 57 observations were dropped, leaving 9,541. These observations were dropped because their donation amounts were negative. While this may be human error (it’s impossible to give negative dollars), I filtered them out of the dataset for a cleaner view of the donations.
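The filtering step described above can be sketched as follows. This is a minimal illustration on a toy data frame, not the original cleaning code: only the column names `cand_nm` and `contb_receipt_amt` come from the data set, while `toy`, `toy_clean`, `n_dropped` and the values are made up for the example.

```r
# Toy stand-in for the raw FEC extract (values are invented for illustration).
toy <- data.frame(
  cand_nm = c("Rubio, Marco", "Sanders, Bernard", "Rubio, Marco"),
  contb_receipt_amt = c(250, -100, 14),
  stringsAsFactors = FALSE
)

# Keep only rows with a non-negative receipt amount.
toy_clean <- toy[toy$contb_receipt_amt >= 0, ]

# Count how many rows the filter removed.
n_dropped <- nrow(toy) - nrow(toy_clean)
n_dropped
```

Comparing the row counts before and after the filter is a quick way to confirm how many observations were removed.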
total_donation = sum(oh$contb_receipt_amt)
total_donation
## [1] 4422674
As of November 20, 2015, donations made to presidential candidates in Ohio totaled about 4.42 million US dollars. Where do these dollars flow?
At first glance, the Republican party seems to take the majority share of donor contributions: Republican candidates received about 3.5 million dollars in total, almost 4 times as much as the Democratic party, which received only 892 thousand dollars.
If we look merely at the aggregated donations, it seems the Republican party is leading the Democrats in donor support in Ohio; however, a closer look at the candidate level tells a different story.
## Source: local data frame [2 x 4]
##
## party money_received cand_num donation_per_cand
## (chr) (dbl) (int) (dbl)
## 1 democrat 892230.1 4 223057.5
## 2 republican 3530444.2 17 207673.2
It turns out the Democratic party has only 4 candidates, while the Republican party has more than 4 times as many - 17 candidates in total. Therefore, the total amount of donations received by the Republican party could be inflated by the higher number of candidates in the race so far.
Looking at the donations received per candidate, we get a different view. Each Democratic candidate received on average 223 thousand dollars, which is 7% more than the average received per Republican candidate (208 thousand dollars).
However, this is still an aggregate view, and it does not reflect the variation in donations between candidates. We need to drill down to the individual candidate level for additional patterns.
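A per-party summary like the one above can be sketched in base R. The original output appears to come from a different toolchain, so treat this as an illustrative assumption: the data frame `toy` and its values are invented, and only the column names mirror the tables in this report.

```r
# Toy donations: one row per transaction (values are invented).
toy <- data.frame(
  party = c("democrat", "democrat", "democrat",
            "republican", "republican", "republican"),
  cand_nm = c("A", "A", "B", "C", "D", "E"),
  contb_receipt_amt = c(100, 50, 150, 300, 200, 100),
  stringsAsFactors = FALSE
)

# Total dollars and number of distinct candidates per party.
money_received <- tapply(toy$contb_receipt_amt, toy$party, sum)
cand_num <- tapply(toy$cand_nm, toy$party, function(x) length(unique(x)))

# Combine into one summary table, adding dollars per candidate.
by_party <- data.frame(
  party = names(money_received),
  money_received = as.numeric(money_received),
  cand_num = as.integer(cand_num),
  donation_per_cand = as.numeric(money_received) / as.integer(cand_num)
)
by_party
```

Dividing by the number of distinct candidates, rather than the number of transactions, is what separates this per-candidate view from an average donation size.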
## Source: local data frame [21 x 5]
##
## party cand_last money_received avg_donation_amt donor_num
## (chr) (chr) (dbl) (dbl) (int)
## 1 republican Christie 300.00 100.0000 3
## 2 republican Pataki 1250.00 625.0000 2
## 3 republican Webb 2500.00 357.1429 7
## 4 republican Jindal 3000.00 600.0000 5
## 5 republican Perry 3185.00 318.5000 10
## 6 democrat Lessig 4012.77 182.3986 22
## 7 republican Huckabee 4948.00 141.3714 35
## 8 democrat O'Malley 10350.00 1478.5714 7
## 9 republican Santorum 10598.46 365.4641 29
## 10 republican Graham 16775.00 621.2963 27
## .. ... ... ... ... ...
Looking at the donations received by each candidate, a couple of things jump out:
First, most of the total donations in Ohio were received by one candidate - John Kasich. This should not be surprising, as Kasich has been the Governor of Ohio since 2011 and seems able to garner the majority of monetary support from local donors. In fact, almost 50% of all dollars donated in Ohio went to Kasich (which equates to 61% of the donations made to the Republican party), and the remaining 50% was split among the other 20 candidates.
Second, within each party, donations are concentrated on only a few front-runners. For the Democratic party, Hillary Clinton and Bernie Sanders are the two leading candidates, taking 98% of total Democratic donations in Ohio, of which 85% goes to Clinton.
For the Republican party, Kasich has the majority of the donations in Ohio and leads the rest of the Republican candidates by a wide margin. The other Republican front-runners - Carson, Rubio, Cruz and Bush - take a total of 32% of Republican donations, and the remaining 7% goes to the other 12 Republican candidates.
From the chart above, we can clearly see who the “front-runners” are in Ohio (defined by the donations received within each party).
Because most of the donations are concentrated on only a few candidates, let’s examine the donations for the candidates who received more than 5% of total dollars in their political party.
For the Democratic party, they are Hillary Clinton and Bernie Sanders; for the Republican party, they are John Kasich, Marco Rubio, Ben Carson, Ted Cruz and Jeb Bush.
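The 5% cut-off can be expressed as a within-party share. The sketch below uses invented candidates and amounts. The column names `party`, `cand_last` and `money_received` follow the candidate table above, while `party_share` and `front_runners` are hypothetical names introduced here.

```r
# Toy candidate-level totals (values are invented).
by_cand <- data.frame(
  party = c("democrat", "democrat", "democrat", "republican", "republican"),
  cand_last = c("A", "B", "C", "D", "E"),
  money_received = c(800, 180, 20, 960, 40),
  stringsAsFactors = FALSE
)

# Each candidate's share of the dollars received by their own party.
by_cand$party_share <- by_cand$money_received /
  ave(by_cand$money_received, by_cand$party, FUN = sum)

# Keep candidates with more than 5% of their party's dollars.
front_runners <- by_cand[by_cand$party_share > 0.05, ]
front_runners
```

Computing the share within each party, rather than against the state-wide total, keeps a smaller party's leading candidates from being filtered out.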
Kasich received the largest amount of donations in Ohio, and the boxplot above shows that a few donors made very large donations to Kasich, including one donation of more than 15 thousand dollars.
We know the donation data includes both individual donors and companies. Because individual donors are likely to donate smaller amounts than firms, this may indicate that some companies are making significant donations to a couple of candidates, which we will examine later.
If we filter out the outliers (donors who made very large donations), it is still apparent that Kasich receives large amounts per donation: he has the highest mean donation, and his median donation is much higher than that of every other candidate.
Bush, on the other hand, also has a high average donation amount. This could mean he received donations from small individual donors and possibly a few large donations from companies.
Looking at the average donation made to each candidate, Kasich received $1,766 on average, followed by Bush with an average of $1,442, well ahead of Rubio, whose average donation is $610.
This is also consistent with the theory that big donors, most likely companies or corporations in Ohio, are backing Republican candidates, specifically Kasich and Bush.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 40.0 100.0 463.5 250.0 16200.0
The histogram shows the distribution of donations made in Ohio, which is heavily right-skewed (it has a long tail toward higher amounts). This means a few donors made very large donations. In fact, there is one donation of $16,200 that stretches the distribution.
Because the mean donation is higher than the 3rd quartile (the mean is $463, the 3rd quartile is $250), it is apparent that most donations are small. Since the distribution is heavily skewed, the median is the more robust statistic for describing the center of the data, and most donations are centered around $100.
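A small worked example shows why the median is the safer summary here. The vector below is invented, apart from borrowing the $16,200 maximum from the summary above; a single large value is enough to pull the mean above the 3rd quartile while leaving the median untouched.

```r
# Invented donation amounts with one very large outlier.
amounts <- c(25, 40, 50, 100, 100, 100, 250, 250, 16200)

mean_amt <- mean(amounts)               # pulled up by the outlier
median_amt <- median(amounts)           # unaffected by the outlier
q3_amt <- quantile(amounts, 0.75)       # 3rd quartile

c(mean = mean_amt, median = median_amt, q3 = unname(q3_amt))
```

In this toy vector the mean exceeds the 3rd quartile, the same pattern seen in the Ohio summary.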
## Source: local data frame [2 x 4]
##
## entity total_donations avg_donations donor_num
## (chr) (dbl) (dbl) (int)
## 1 company 41000 4556 9
## 2 individual 4381674 460 9532
In Ohio, 99% of donation dollars come from individual donors. However, while only 9 of the 9,541 donors were companies, their average donation exceeded that of individuals by almost 10-fold: each company donated about $4,500 on average, while the average donation by individuals was around $460.
Plotting donations on a scatter plot by amount and by the date each donation was made makes it easier to see the distribution of donations to the two parties.
Most of the donations appear to fall between $10 and $1,000, with a few large donations over $5,000.
Not surprisingly, the largest donation came from a company, Benesch Friedlander Coplan & Aronoff LLP (also known as Benesch Law Firm), a business law firm headquartered in Cleveland, Ohio. Benesch Law Firm made the single largest donation in Ohio, to Kasich, and the firm seems to have a special relationship with the Ohio Governor. The firm’s former employee, Billie Fiore, made political contributions to Kasich dating back to 2010 and again in 2011, and she was appointed to the Board of Trustees of Central Ohio Technical College in 2011.
From the chart, we can see that the majority of the donations are in small amounts. In fact, 75% of the donations were $250 or less.
## 0% 25% 50% 75% 100%
## 0 40 100 250 16200
Therefore, zooming in on donations smaller than $250, we are able to see more detail about the majority of the donors in Ohio.
Looking at this 75% of donations, it’s easy to see that most were made after April 2015, as most candidates officially announced their candidacy after April 2015 (except for Ted Cruz, who announced in March 2015).
From the histogram above, we can also see that most donations are round numbers like $50 and $100, which suggests that most donors give in $50 increments.
A New York Times article quoted Crowdpac research finding that “the wage and wealth gap between men and women plays a role [in fund-raising gap]”, and that in states where the gender income gap is smaller, more of politicians’ big donors are women. It goes on to say that “women give more to liberals and to other women.”
In the Ohio presidential campaign finance data, we are able to predict each donor’s gender from their first name. This allows us to examine whether the points made by the research hold true:
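One way to check this reading of the histogram is to compute the share of donations that are exact multiples of $50. The amounts below are invented for illustration, and `share_multiple_of_50` is a hypothetical name; on the real data the same expression would be applied to `contb_receipt_amt`.

```r
# Invented donation amounts.
amounts <- c(14, 25, 50, 50, 100, 100, 250, 27.5)

# TRUE where the amount is an exact multiple of $50.
is_multiple_of_50 <- amounts %% 50 == 0

# Share of donations that fall on a $50 increment.
share_multiple_of_50 <- mean(is_multiple_of_50)
share_multiple_of_50
```

A share well above one half would support the "$50 increments" reading; a lower share would suggest the spikes in the histogram are less dominant than they look.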
Is a particular gender more likely to make political donation than the other? Does it also reflect on the amount donated to political candidates?
Are female donors more likely to donate to liberal candidates and/or to women candidates?
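The first step in predicting gender is extracting a first name from `contbr_nm`, which stores names as `LAST, FIRST ...`. The sketch below shows one possible way to do that in base R and then map names to genders. The lookup table `name_gender` is a hypothetical stand-in for whatever name-to-gender data is actually used, and names missing from it would come back as `NA`.

```r
# Example names in the format used by the contbr_nm column.
contbr_nm <- c("STROPKAY, ANNA T. MS.", "HOOVER, JERRY", "BRACKMAN, MATTHEW")

# Drop everything up to and including the first comma, then trim spaces.
after_comma <- trimws(sub("^[^,]*,", "", contbr_nm))

# The first whitespace-delimited token is taken as the first name.
first_name <- sub("[[:space:]].*$", "", after_comma)

# Hypothetical lookup table; a real analysis would use a much larger source.
name_gender <- c(ANNA = "female", JERRY = "male", MATTHEW = "male")
contbr_gender <- unname(name_gender[first_name])

data.frame(contbr_nm, first_name, contbr_gender)
```

Because any name-based lookup is probabilistic, ambiguous or unmatched names are better left as `NA` than forced into a category.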
## Source: local data frame [2 x 4]
##
## contbr_gender total_donation avg_donation count
## (chr) (dbl) (dbl) (int)
## 1 female 1637453 433.4179 3778
## 2 male 2658239 478.2725 5558
In Ohio, there are more male donors than female donors for the presidential election: 5,558 men and 3,778 women donated to presidential candidates, which equates to about 47% more male donors than female donors. And the inequality does not stop there.
Looking at the average amount donated, men donated $478.27 on average while women donated $433.42, a 10% difference ($44.85) between the genders.
To test whether the difference in donation amounts between genders is statistically significant, we can use a one-tailed hypothesis test. The null hypothesis is that there is no difference in mean donation amount between genders, and the alternative hypothesis is that the mean donation from women is less than that from men.
##
## Welch Two Sample t-test
##
## data: female_donation and male_donation
## t = -2.5014, df = 8307.7, p-value = 0.006195
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -15.35582
## sample estimates:
## mean of x mean of y
## 433.4179 478.2725
From the test output, we can see that donations from women are indeed significantly lower than those from men in Ohio (p = 0.006). According to the American Association of University Women (AAUW), Ohio ranks toward the bottom in gender income equality in the USA (33rd), with women earning only 78% of the wage earned by men. Hence, the presidential campaign finance data in Ohio is consistent with the idea that gender income inequality is reflected in political campaign financing.
The distribution of donation amounts by gender is also heavily right-skewed, especially for male donors: a few male donors gave more than $5,000, while the majority of donors of both genders are concentrated at the lower end.
Hence, we need to transform the axis to get a more granular look at the majority of donations for both genders.
With log10-transformed donation amounts, the distribution is closer to normal. We can see from the chart that not only did more men donate to presidential candidates at every donation level, they also gave larger amounts than women, especially at the higher end.
Therefore, it also supports the hypothesis that income inequality between genders in a state is reflected in political contributions.
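A one-tailed Welch test of this kind can be run with `t.test()` from base R's stats package, which uses the Welch (unequal variance) form by default. The vector names `female_donation` and `male_donation` match the test output above, but the values here are invented, so the resulting statistics will not match the Ohio figures.

```r
# Invented donation amounts for each group.
female_donation <- c(25, 50, 50, 100, 100, 250)
male_donation <- c(50, 100, 100, 250, 250, 500, 1000)

# alternative = "less" tests whether mean(x) - mean(y) < 0,
# i.e. whether women's mean donation is lower than men's.
res <- t.test(female_donation, male_donation, alternative = "less")

res$method      # which test was run
res$p.value     # one-sided p-value
```

The order of the two arguments matters for a one-tailed test: swapping them would require `alternative = "greater"` to test the same hypothesis.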
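The log10 transformation can be sketched in base R as follows. This is not the original plotting code, and the amounts are invented. Note that the data contains donations of $0 (the minimum in the summary above), and `log10(0)` is `-Inf`, so this sketch assumes zero amounts are dropped before transforming.

```r
# Invented donation amounts, including a zero.
amounts <- c(0, 10, 25, 100, 100, 250, 1000, 16200)

# Drop non-positive amounts, since log10() is not finite for them.
positive <- amounts[amounts > 0]
log_amounts <- log10(positive)

# Bin the transformed values without drawing the plot.
h <- hist(log_amounts, breaks = 10, plot = FALSE)
h$counts
```

On the log scale, $10, $100 and $1,000 are evenly spaced, which spreads out the crowded low end of the distribution.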
Another claim from the research is that women tend to donate more to liberal and/or women candidates.
Although it is complex to determine whether an individual candidate is liberal or conservative, and to what degree, we paint with a broad brush here and treat the Democratic party as liberal and the Republican party as conservative.
## Source: local data frame [4 x 5]
## Groups: contbr_gender [?]
##
## contbr_gender party total_donations avg_donations donor_num
## (chr) (chr) (dbl) (dbl) (int)
## 1 female democrat 506440.2 383 1322
## 2 female republican 1131012.7 461 2456
## 3 male democrat 364326.7 276 1321
## 4 male republican 2293912.0 541 4237
Looking at female donors only, Ohio women do not seem to donate more to liberals than to conservatives - only 35% of women donated to the Democratic party, and the average amount donated to Democrats is about 17% less than the average donated to Republicans ($383 vs. $461).
However, compared with men in Ohio, women donate relatively more to liberals. Ohio men donated $276 on average to Democrats and $541 to Republicans, a 49% difference, and only about 24% of men donated to Democrats.
So while Ohio women do not donate more to liberals than to conservatives, they have a higher tendency than Ohio men to donate to candidates from the liberal party.
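The within-gender shares quoted above can be reproduced directly from the summary table. The sketch below re-enters the `avg_donations` and `donor_num` columns from that table by hand. `share_of_gender` and `dem` are hypothetical names introduced for this example.

```r
# Values copied from the gender-by-party summary table above.
by_gender_party <- data.frame(
  contbr_gender = c("female", "female", "male", "male"),
  party = c("democrat", "republican", "democrat", "republican"),
  avg_donations = c(383, 461, 276, 541),
  donor_num = c(1322, 2456, 1321, 4237),
  stringsAsFactors = FALSE
)

# Share of each gender's donors that gave to each party.
by_gender_party$share_of_gender <- by_gender_party$donor_num /
  ave(by_gender_party$donor_num, by_gender_party$contbr_gender, FUN = sum)

# Percentage of women and of men who donated to Democrats.
dem <- by_gender_party[by_gender_party$party == "democrat", ]
round(100 * dem$share_of_gender, 1)
```

Comparing shares within each gender, rather than raw donor counts, controls for the fact that there are more male donors overall.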
## Source: local data frame [4 x 5]
## Groups: contbr_gender [?]
##
## contbr_gender cand_gender total_donations avg_donations donor_num
## (chr) (chr) (dbl) (dbl) (int)
## 1 female female_candidate 469151.6 508 923
## 2 female male_candidate 1168301.4 409 2855
## 3 male female_candidate 323900.4 396 817
## 4 male male_candidate 2334338.4 492 4741
When it comes to donations by candidate gender, female donors tend to donate more on average to female candidates, while men donate more to male candidates, and cross-gender donation amounts tend to be lower than same-gender ones.
While only 24% of women in Ohio donated to female candidates (Clinton and Fiorina), their average donation to female candidates exceeds that to male candidates by 24% - $508 vs. $409.
Among men, the disparity is even more apparent: only 15% of men donated to female candidates, and their average donation to female candidates is much lower than to male candidates - $396 vs. $492, a 19% difference.
Hence, it also seems true that women tend to give larger political donations to female candidates than to male candidates.
Looking at the distribution of donors by geographical location in Ohio, it is not surprising to see that most of the donors are concentrated in big cities like Cincinnati, Columbus, Cleveland and Toledo.
But what about donation amounts by location: do big cities tend to donate more per capita than smaller ones? Let’s look at aggregate donation amounts and donations per capita by county for comparison:
The total donation amount is highest in the big counties containing the cities mentioned above, as there are more donors in densely populated areas. However, looking at donations per capita by county, some counties, such as Adams, Belmont and Monroe, show higher donations per capita. We don’t know whether this is attributable to higher income per capita in those counties or to people there being particularly active in donating to political candidates. The higher donations per capita may indicate that those counties play important roles in the primary election in Ohio; however, more data is needed to determine this.
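The per-capita comparison can be sketched by joining county donation totals to a population table. Everything in this example is invented: the county labels, the amounts, and the population figures are placeholders, and the real analysis would need an external population source.

```r
# Invented donations, one row per transaction.
donations <- data.frame(
  county = c("County A", "County A", "County A", "County B", "County B"),
  contb_receipt_amt = c(1000, 2500, 500, 500, 300),
  stringsAsFactors = FALSE
)

# Invented population figures (placeholder for an external data source).
population <- data.frame(
  county = c("County A", "County B"),
  population = c(1000000, 10000),
  stringsAsFactors = FALSE
)

# Total donations per county.
totals <- aggregate(contb_receipt_amt ~ county, data = donations, FUN = sum)
names(totals)[2] <- "total_donation"

# Join on county and compute dollars donated per resident.
by_county <- merge(totals, population, by = "county")
by_county$donation_per_capita <- by_county$total_donation / by_county$population
by_county
```

In this toy example the larger county has the higher total but the lower per-capita figure, which is the kind of divergence described above.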
In Ohio, the distribution of campaign donations by candidate shows an interesting fact: more than 90% of the donations are received by the top 2-5 candidates, and the “front-runners” in each party lead the rest by a great margin.
As we saw in the analysis above, Ohio women seem to donate more to liberal parties and candidates than men do. When it comes to candidate gender, we see a clearer trend: women have a much higher tendency than men to donate to female candidates, and male donors donate predominantly to male candidates rather than female ones.
While we need more data to corroborate this, it seems that counties with high total donations do not necessarily donate more per capita. Some counties have relatively low gross donations, but each person donates more, which could indicate strong support for a particular candidate in that county. Therefore, counties with high donations per capita could be influential in the local primary election.
I encountered several issues when examining the 2016 presidential campaign finance data. The most common one is inadequate data. For example, one of the questions I set out to answer concerns the relationship between gender and donations. While I have the name of each donor, their gender is not recorded in the original dataset. I was able to use extra packages and data to fill in the gender, but the result is not deterministic; rather, it is a probabilistic view of each donor’s gender. Therefore, we need to interpret the data with caution.
Despite the lack of some necessary data, some fields in the dataset are useful for linking to key external data, which helped with the analysis. Geographical location is one example. Although we have zip codes and county names for each donor, we don’t have latitude and longitude. With extra information from other datasets and packages like zipcode and maptools, I was able to connect the dots and conduct the analysis I set out to perform.
Lastly, the dataset I have is for Ohio only. Although it helps me understand the state of the race in Ohio, I am not able to extrapolate the conclusions nationwide.
By looking at the presidential donation data in Ohio, I can get a glimpse of the state of the presidential race at its current stage.
It’s not surprising that a candidate like John Kasich holds a “home field advantage”: as the state’s Governor, he was able to secure about 50% of total donations in Ohio. Beyond that, 90% of the donations go to the top few candidates across the Republican and Democratic parties, which is only a superficial indication of the support each candidate has in Ohio, but could also indicate who the front-runners may be in the Ohio primary election.
Gender does play an important role in presidential campaign finance. Women donate less than men both in aggregate and on average. While we don’t have sufficient data to examine why, we do see other important patterns in gender and campaign finance - women tend to donate more to liberals and to female candidates.
Lastly, big counties have high gross donations, but some smaller counties have high donations per capita. We don’t know the exact reason for the divergence, but high donations per capita may indicate strong support for a particular candidate, and such counties may play an influential role in the local primary election.
The campaign finance data I analyzed is for Ohio only, and it would not be appropriate to extrapolate to other states or to the nation. Future work could therefore analyze campaign finance data for all 50 states and check whether the insights from Ohio hold true at the national level.
Mapping, Stackoverflow: https://uchicagoconsulting.wordpress.com/tag/r-ggplot2-maps-visualization/
Gender Wage Gap by State, AAUW: http://www.aauw.org/resource/archive-data-gender-wage-gap-by-state-and-congressional-district/
Men Dominate in Political Giving; Hillary Clinton’s Donors Are an Exception, New York Times: http://www.nytimes.com/2015/10/15/upshot/the-gender-gap-in-political-giving.html?_r=0
Money Race, Crowdpac: https://www.crowdpac.com/money-race
Federal Election Commission: http://fec.gov/