In this exploratory data analysis we will take a look at all of the campaign contributions from the 2016 presidential election that originated in the state of North Carolina. The data comes from the Federal Election Commission and can be found here.
First, let’s import the data and take a look at what we are working with in the dataset.
## [1] 158456 18
## Classes 'tbl_df', 'tbl' and 'data.frame': 158456 obs. of 18 variables:
## $ cmte_id : chr "C00580100" "C00577130" "C00577130" "C00575795" ...
## $ cand_id : chr "P80001571" "P60007168" "P60007168" "P00003392" ...
## $ cand_nm : chr "Trump, Donald J." "Sanders, Bernard" "Sanders, Bernard" "Clinton, Hillary Rodham" ...
## $ contbr_nm : chr "SELLATI, KASEY" "LYANSKY, YAN" "HAYWARD, KELLY" "DREHMEL, CLAIRE" ...
## $ contbr_city : chr "CARY" "MC LEANSVILLE" "PINEVILLE" "APEX" ...
## $ contbr_st : chr "NC" "NC" "NC" "NC" ...
## $ contbr_zip : int 27519 273019765 281347558 275398332 282114253 28037 27524 286552648 281177030 28054 ...
## $ contbr_employer : chr "SAVER MAGAZINE" "BENNETT COLLEGE" "AMERICAN AIRLINES" "NORTH CAROLINA STATE UNIVERSITY" ...
## $ contbr_occupation: chr "SELF EMPLOYED" "PROFESSOR" "CUSTOMER SERVICE AGENT" "PROFESSOR" ...
## $ contb_receipt_amt: num 52 27 33 60 116 ...
## $ contb_receipt_dt : chr "28-NOV-16" "05-MAR-16" "06-MAR-16" "26-APR-16" ...
## $ receipt_desc : chr NA NA NA NA ...
## $ memo_cd : chr "X" NA NA "X" ...
## $ memo_text : chr NA "* EARMARKED CONTRIBUTION: SEE BELOW" "* EARMARKED CONTRIBUTION: SEE BELOW" "* HILLARY VICTORY FUND" ...
## $ form_tp : chr "SA18" "SA17A" "SA17A" "SA18" ...
## $ file_num : int 1146165 1077404 1077404 1091718 1091718 1146165 1146165 1091718 1091718 1146165 ...
## $ tran_id : chr "SA18.168022" "VPF7BKXT595" "VPF7BKZ8J22" "C4768028" ...
## $ election_tp : chr "G2016" "P2016" "P2016" "P2016" ...
## - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 158456 obs. of 5 variables:
## - attr(*, "spec")=List of 2
## ..- attr(*, "class")= chr "col_spec"
So this data contains 158,456 contributions to the various election campaigns and 18 variables holding information such as the contribution amount, the name of the contributor, and the contribution date. However, a quick look shows that the date (contb_receipt_dt) is a chr vector, so that will need to be corrected since date will be a key variable. The best way to do that is to convert it using the as.Date() function. I’m also going to clean up the zip codes and remove the extra digits, keeping only the five-digit root.
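The two clean-up steps described above can be sketched in base R as follows. This is a minimal illustration on a few values copied from the str() output, not the original code: the vector names are made up for the example, and the `%d-%b-%y` format string assumes an English locale so that the month abbreviations are recognised.

```r
# A few raw values copied from the str() output above
raw_dates <- c("28-NOV-16", "05-MAR-16", "26-APR-16")
raw_zips  <- c(27519L, 273019765L, 281347558L)

# %d = day of month, %b = abbreviated month name, %y = two-digit year
# (assumes an English locale so that "NOV", "MAR", "APR" are recognised)
clean_dates <- as.Date(raw_dates, format = "%d-%b-%y")

# Keep only the first five digits of each zip code (the five-digit root)
clean_zips <- substr(as.character(raw_zips), 1, 5)

print(clean_dates)
print(clean_zips)
```

Storing the cleaned zip codes as character strings rather than integers keeps them usable as join keys later on.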
There’s not a whole lot of numeric data in this table apart from one of the main data points, the actual contribution amounts, so let’s take a look at the summary of that data.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5400.00 15.00 27.00 98.84 80.00 10800.00
Let’s take a look at the histogram of the distribution of campaign contribution amounts.
There are some negative contribution amounts here, which indicate refunds of contributions for various reasons. I don’t think they’ll be useful to include because I want to look at the population of what was actually given, so I’ll filter those out and see how the distribution of the contributions looks now.
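As a rough sketch of that filtering step, assuming refunds are exactly the rows with a non-positive amount (the toy data frame `contribs` below is hypothetical and only borrows the column name from the dataset):

```r
# Toy data: two of the rows are refunds, recorded as negative amounts
contribs <- data.frame(
  contb_receipt_amt = c(52, 27, -5400, 33, 60, -25, 116)
)

# Keep only strictly positive amounts, i.e. money that was actually given
given <- contribs[contribs$contb_receipt_amt > 0, , drop = FALSE]

summary(given$contb_receipt_amt)
```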
That’s still a pretty heavily skewed data set, so let’s take a look at the distribution with a log10 scale on the contribution amounts. I’ll also break down how many contributions there were per political party in this next graph. To do this I added a column that records the party of each contribution based on the candidate. Clinton, Webb, O’Malley, and Sanders were Democrats; McMullin, Johnson, and Stein were Independent; and the rest were grouped as Republicans. I also turned this column into a factor, along with the candidate name column, so I can perform other operations on it later.
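The party column described above could be built along these lines. This is only a sketch on a handful of candidate names taken from the tables in this write-up; the object names are hypothetical. Note that under the rule as stated, any candidate not explicitly listed (Lawrence Lessig, for example) ends up in the Republican group.

```r
cand_nm <- c("Trump, Donald J.", "Sanders, Bernard", "Clinton, Hillary Rodham",
             "Johnson, Gary", "Stein, Jill", "Cruz, Rafael Edward 'Ted'",
             "Lessig, Lawrence")

democrats    <- c("Clinton, Hillary Rodham", "Sanders, Bernard",
                  "O'Malley, Martin Joseph", "Webb, James Henry Jr.")
independents <- c("Johnson, Gary", "McMullin, Evan", "Stein, Jill")

# Listed Democrats get "D", listed Independents get "I",
# and everyone else falls through to "R"
party <- ifelse(cand_nm %in% democrats, "D",
         ifelse(cand_nm %in% independents, "I", "R"))

# Store both columns as factors so they can be used for grouping and faceting
party   <- factor(party, levels = c("D", "I", "R"))
cand_nm <- factor(cand_nm)

table(party)
```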
As shown here, the great majority of the contributions fall between 10 and 100 dollars, something that couldn’t be told from the histograms above. On the log10 scale the distribution of contributions also looks almost normal. Let’s facet wrap these and see how each distribution looks by party.
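To see why the log10 scale helps, here is a small sketch that bins a few made-up dollar amounts by order of magnitude. The amounts are illustrative only, and binning with cut() stands in for the histogram itself.

```r
# Made-up contribution amounts spanning several orders of magnitude
amounts <- c(5, 15, 25, 27, 50, 80, 250, 500, 2700)

# On the log10 scale, each unit step is a factor of ten in dollars
log_amt <- log10(amounts)

# Breaks at 0, 1, 2, 3, 4 correspond to $1, $10, $100, $1,000, $10,000
bins <- cut(log_amt, breaks = 0:4, include.lowest = TRUE)

table(bins)
```

With equal-width bins in log space, the crowd of small contributions gets spread across several bins instead of being squeezed into the first bar of a linear histogram.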
We still can’t see the Independent contribution distribution at all. The next step is to subset the data to only Independents and then run the histogram again to get a good look at the Independents’ data. Still, from this graph we can see that the Democrats may have gotten more contributions, but those contributions were generally smaller in dollar amount. We’ll take a look at the actual distribution of party contributions after this next graph.
Now there is a clear picture of the distribution of Independent contributions. They are dwarfed by the contributions to the Democratic and Republican parties. Let’s look at the actual numbers of the distribution and see what they say.
## $D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.38 10.00 25.00 77.67 50.00 5000.00
##
## $I
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 37.0 100.0 210.3 250.0 2700.0
##
## $R
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8 25.0 50.0 167.4 100.0 10800.0
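Per-party summaries in the `$D`, `$I`, `$R` layout shown above are what tapply() prints when it applies summary() to each group. A minimal sketch on toy data (the data frame and its values are hypothetical):

```r
contribs <- data.frame(
  party = factor(c("D", "D", "D", "I", "I", "R", "R", "R")),
  amt   = c(10, 25, 50, 37, 100, 25, 50, 100)
)

# One six-number summary per party, printed as $D, $I, $R
by_party <- tapply(contribs$amt, contribs$party, summary)
print(by_party)

# The same idea for a single statistic, here the median per party
med <- tapply(contribs$amt, contribs$party, median)
print(med)
```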
Now in graph form:
Here we can see that the Republicans definitely have more outliers at the high end of the scale. And although the Democrats had a higher count of contributions, their mean and median contribution values were lower than those of both the Independent candidates and the Republican candidates. What does that mean in terms of the total amount of contributions? Which party, and which candidate, received the most contributions by count and by dollar amount? We’ll take a look at that in the next section.
The first graph we’ll look at is the count of total contributions to each party.
So the Democrats received more than double the number of contributions that the Republicans did in the 2016 presidential election, but what about the actual amount of the contributions? Given that the Republicans averaged more money per contribution in our earlier distributions, the results should be a lot closer than in this graph.
Yep, in terms of dollar amounts the Republican candidates received close to 7.75 million dollars while the Democrats are just a little above 8.5 million. So it seems that the higher average contribution from Republican contributors was almost enough to counteract the Democrats’ edge in the number of contributions. And in a presidential election where the money raised was counted in billions, this small difference wouldn’t have been very noticeable.
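The relationship between the two graphs is simply total = count × mean. As a back-of-the-envelope check, the sketch below multiplies the per-party contribution counts (obtained by summing the per-candidate count tables further down) by the per-party means from the summaries above. Because the means are rounded, the results are only approximate.

```r
# Counts: sums of the per-candidate count tables later in this write-up
n_contrib <- c(D = 109899, R = 45902)

# Means: taken from the per-party summaries shown earlier
mean_amt <- c(D = 77.67, R = 167.4)

# Approximate total dollars per party, in millions
approx_total <- n_contrib * mean_amt
round(approx_total / 1e6, 2)
```

The Democrats have roughly 2.4 times as many contributions, but the Republican mean is roughly 2.2 times larger, which is why the dollar totals end up so close.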
Again, these are contributions to the parties regardless of candidate. Next we’ll break down the numbers by candidate, but first let’s take a look at the Independents separately. In both of these graphs we can’t really tell how much the Independents received. Let’s look at those contributions.
The Independent candidates barely garnered over $125,000 in total, from only a little over 600 individual contributions. This shows again that third parties in America really aren’t considered viable, and the individual contributions back up that belief.
Let’s see the contribution histogram faceted by candidate before digging deeper into the data. Since these numbers are very spread out in their amounts, I’ve applied a log10 transformation to both the x axis and the counts.
Well, that shows that Clinton, Sanders, Trump, Carson, and Cruz got the majority of contributions during the campaign. And as we’ll see in later graphs, when it came to contributions Hillary Clinton led the way handily. Let’s try looking at the data another way and see if we can discern more info about the contribution distributions.
While this is an interesting look at the data, there is still too much noise to see anything properly, as with the histogram above. It appears that the best way to look at the campaign contributions was the earlier facet wrap, but to add more info to that let’s look at the box plots of the contribution distributions for each candidate.
One interesting thing here is that Trump has a higher median contribution amount than his main opponents Ben Carson and Ted Cruz, although both Cruz and Carson definitely have more outlier contributions above the $5,000 mark. Now that the distribution of the contributions has been looked at by candidate, let’s take a look at the number of contributions each candidate received.
And now the total money amounts contributed:
Again, because of the large values for Hillary Clinton it’s hard to get a feel for the data of all the candidates. So I’ve broken down the graphs above by party and then combined them into one graph using the grid.arrange function. Let’s see if that paints a clearer picture of the data.
Here are the tables for each one, since it’s hard to see some of the values, especially for O’Malley and Webb.
## # A tibble: 4 x 2
## cand_nm `Contribution Count`
## <fctr> <int>
## 1 Clinton, Hillary Rodham 69508
## 2 O'Malley, Martin Joseph 52
## 3 Sanders, Bernard 40313
## 4 Webb, James Henry Jr. 26
## # A tibble: 16 x 2
## cand_nm `Contribution Count`
## <fctr> <int>
## 1 Bush, Jeb 523
## 2 Carson, Benjamin S. 7852
## 3 Christie, Christopher J. 36
## 4 Cruz, Rafael Edward 'Ted' 12394
## 5 Fiorina, Carly 698
## 6 Graham, Lindsey O. 158
## 7 Huckabee, Mike 225
## 8 Jindal, Bobby 20
## 9 Kasich, John R. 431
## 10 Lessig, Lawrence 24
## 11 Paul, Rand 663
## 12 Perry, James R. (Rick) 5
## 13 Rubio, Marco 2200
## 14 Santorum, Richard J. 23
## 15 Trump, Donald J. 20520
## 16 Walker, Scott 130
## # A tibble: 3 x 2
## cand_nm `Contribution Count`
## <fctr> <int>
## 1 Johnson, Gary 340
## 2 McMullin, Evan 82
## 3 Stein, Jill 185
Now here are the same graphs just with contribution amounts instead of counts.
Here are the tables for these graphs as well:
## # A tibble: 4 x 2
## cand_nm `Total Money`
## <fctr> <dbl>
## 1 Clinton, Hillary Rodham 6746156.34
## 2 O'Malley, Martin Joseph 13693.25
## 3 Sanders, Bernard 1762192.81
## 4 Webb, James Henry Jr. 13850.00
## # A tibble: 16 x 2
## cand_nm `Total Money`
## <fctr> <dbl>
## 1 Bush, Jeb 484586.45
## 2 Carson, Benjamin S. 1183718.56
## 3 Christie, Christopher J. 19495.00
## 4 Cruz, Rafael Edward 'Ted' 1054865.23
## 5 Fiorina, Carly 120046.86
## 6 Graham, Lindsey O. 241147.08
## 7 Huckabee, Mike 89465.08
## 8 Jindal, Bobby 21285.00
## 9 Kasich, John R. 161739.16
## 10 Lessig, Lawrence 7161.75
## 11 Paul, Rand 128991.30
## 12 Perry, James R. (Rick) 1555.00
## 13 Rubio, Marco 517081.24
## 14 Santorum, Richard J. 4240.00
## 15 Trump, Donald J. 3514914.41
## 16 Walker, Scott 135700.00
## # A tibble: 3 x 2
## cand_nm `Total Money`
## <fctr> <dbl>
## 1 Johnson, Gary 80125.21
## 2 McMullin, Evan 8092.00
## 3 Stein, Jill 39430.75
Hillary Clinton stands head and shoulders above the rest of the candidates in both the number of contributions and the money coming from those contributions. Bernie Sanders came in second in the number of contributions, but his dollar amount was third, behind Donald Trump. Even though Sanders had 19,793 more contributions, Trump raised $1,752,721.60 more in total money.
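The Sanders versus Trump comparison can be worked out directly from the count and total tables above. The sketch below just redoes that arithmetic and adds the implied average contribution for each candidate; the variable names are made up for the example.

```r
# Figures copied from the count and total tables above
sanders_n     <- 40313
trump_n       <- 20520
sanders_total <- 1762192.81
trump_total   <- 3514914.41

# How many more contributions Sanders received
count_gap <- sanders_n - trump_n

# How many more dollars Trump raised
dollar_gap <- trump_total - sanders_total

# Implied average contribution per candidate
avg <- c(Sanders = sanders_total / sanders_n,
         Trump   = trump_total / trump_n)

print(count_gap)
print(format(dollar_gap, nsmall = 2, big.mark = ","))
print(round(avg, 2))
```

The averages make the gap concrete: Sanders received about twice as many contributions, while Trump's average contribution was about four times as large.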
With such a high level of support coming from the state, it is surprising that Hillary lost it in the election. Obviously money doesn’t translate to electoral victories all the time, but for the most part it does. A deeper look at this data would examine other battleground states and see if Clinton lost those states despite receiving much more money than her opponent Donald Trump.
OK, now let’s take a look at some of the location data. The election data includes the zipcode of every contributor, which was cleaned up earlier. In order to convert those zipcodes to actual location data I imported the zipcode package and loaded the zipcode dataframe. This dataframe contains the longitude and latitude of every zipcode in the nation. I then merged it with the table of contribution counts per zipcode, which I had turned into a dataframe, and plotted each zipcode colored by its count of contributions.
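The count-then-merge step can be sketched in base R as follows. The lookup table here is a tiny hand-made stand-in for the zipcode dataframe, with approximate coordinates included purely for illustration; the column names `latitude` and `longitude` are assumptions.

```r
# Five-digit zip codes of individual contributions (toy data)
zips <- c("27517", "27517", "28211", "27705", "27517", "28211")

# Count contributions per zip code and turn the table into a data frame
zip_counts <- as.data.frame(table(zip = zips), stringsAsFactors = FALSE)
names(zip_counts)[2] <- "n_contrib"

# Hand-made stand-in for the zip code lookup (approximate coordinates)
zip_lookup <- data.frame(
  zip       = c("27517", "27705", "28211"),
  latitude  = c(35.9, 36.0, 35.2),
  longitude = c(-79.0, -78.9, -80.8),
  stringsAsFactors = FALSE
)

# Attach a latitude/longitude pair to every zip code that has contributions
zip_geo <- merge(zip_counts, zip_lookup, by = "zip")
print(zip_geo)
```

Each row of `zip_geo` is then one point on the map, with `n_contrib` available for the color scale.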
I also imported the North Carolina map broken down into individual counties in order to map distributions based on the location data from the zipcode database.
Getting the geospatial data to work properly was a real struggle at first. There aren’t a whole lot of clear tutorials that describe how to do exactly what I wanted with these graphs. But after many hours of reading and trial and error I was able to piece things together to demonstrate one of the key components of this data, the geographical information. With these maps we can see clearly where the contributors were located, and this kind of data is most likely crucial to future campaigns looking to raise contribution money.
There are also a few outliers that fall in South Carolina and Virginia, showing that there may be some problems with the data set that need to be cleaned up. Either way, I’ll adjust my limits and try to exclude them from future maps. You can see the largest clusters of contributions form around the major metropolitan areas of Raleigh/Durham, Charlotte, and the Triad area (Winston-Salem, Greensboro, High Point). You can also see smaller groupings around Asheville and Wilmington. This is to be expected, as the larger cities will have more contributions simply because there are more people.
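One simple way to exclude stray points is to keep only coordinates inside a bounding box around the state. The sketch below assumes approximate limits for North Carolina (roughly 33.8 to 36.6 degrees latitude and -84.4 to -75.4 degrees longitude) and uses approximate city coordinates as toy data; it is not necessarily how the limits were adjusted in the original maps.

```r
# Toy points with approximate coordinates (for illustration only)
pts <- data.frame(
  place     = c("Raleigh NC", "Charlotte NC", "Asheville NC",
                "Richmond VA", "Charleston SC"),
  latitude  = c(35.78, 35.23, 35.60, 37.54, 32.78),
  longitude = c(-78.64, -80.84, -82.55, -77.44, -79.93),
  stringsAsFactors = FALSE
)

# Approximate bounding box for North Carolina (an assumption)
lat_lim <- c(33.8, 36.6)
lon_lim <- c(-84.4, -75.4)

# Keep only the points that fall inside the box
in_box <- pts$latitude  >= lat_lim[1] & pts$latitude  <= lat_lim[2] &
          pts$longitude >= lon_lim[1] & pts$longitude <= lon_lim[2]
nc_pts <- pts[in_box, ]

print(nc_pts$place)
```

A rectangle is a crude filter: points just across the border can still fall inside it, so a stricter clean-up would check the zip code or state field rather than the coordinates alone.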
Let’s see if we can get a better look by using a density map of contributions:
So we’ve done counts but what about actual money contributed? One would expect it to match up with the counts location data but let’s see.
And here we see that the contribution amounts do indeed match up with the counts for the most part. But let’s look at major donors, which we’ll classify as zip codes whose total contribution amount is over $10,000.
Here are the top six locations by zipcode that contributed the most money to presidential campaigns:
## zip city Dollars_Cont
## 183 27517 Chapel Hill 386673.6
## 180 27514 Chapel Hill 352602.6
## 514 28211 Charlotte 345161.4
## 269 27705 Durham 326565.2
## 510 28207 Charlotte 318752.9
## 182 27516 Chapel Hill 298211.2
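A ranking like the one above can be produced by summing the amounts per zip code, keeping the zip codes over the $10,000 threshold, and sorting. The sketch below uses toy data and base R; the column name `Dollars_Cont` mirrors the table above, everything else is hypothetical.

```r
contribs <- data.frame(
  zip = c("27517", "27517", "28211", "28211", "27705", "27601"),
  amt = c(8000, 4500, 2700, 9000, 5000, 250),
  stringsAsFactors = FALSE
)

# Total dollars contributed per zip code
zip_totals <- aggregate(amt ~ zip, data = contribs, FUN = sum)
names(zip_totals)[2] <- "Dollars_Cont"

# "Major donor" zip codes: more than $10,000 in total
major <- zip_totals[zip_totals$Dollars_Cont > 10000, ]

# Sort from largest to smallest and show the top six
major <- major[order(-major$Dollars_Cont), ]
head(major, 6)
```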
Let’s take a look at the distribution of contributions for these top contributing zip codes:
It makes sense that if the Democrats raised the most money in the state, the zipcodes with the largest contribution amounts contributed overwhelmingly to the Democratic Party. This is not a surprise, since four of the six are located in Chapel Hill and Durham, home to the University of North Carolina and Duke University respectively. Durham also has a large African American population, which in recent history has been aligned with the Democratic Party. Still, it is a little unorthodox, as the Republican Party is often considered the party of the rich and well off in the United States. Before running my earlier studies I would have hypothesized that the Republicans would have raised more money and that the largest contributions would come from the more Republican-leaning suburbs of the larger cities. The numbers show that this is clearly not the case, for this presidential election at least.
Let’s take a look at the box plot graphs for the same data:
The median values of three of the six zipcodes are well above the median of the Democratic contributions. Zipcode 28207 is located in Charlotte and had 50% of its contributions at or above the $100 mark. This is probably explained by the income statistics for the area, where the median income is $127,729. Here are the summary stats for each of the top zipcodes’ contributions as well.
## $`27514`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 25.0 50.0 171.3 100.0 2700.0
##
## $`27516`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 19.00 30.00 74.59 75.00 2700.00
##
## $`27517`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.5 15.0 40.0 112.2 100.0 2700.0
##
## $`27705`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 15.00 27.00 97.76 75.00 2700.00
##
## $`28207`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.5 25.0 100.0 532.0 500.0 2700.0
##
## $`28211`
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.93 25.00 75.00 314.58 250.00 5400.00
In fact, every one of these zipcodes has a median above the Democrats’ overall median of $25, and five of the six have a mean above the Democrats’ overall mean of $77.67 (27516, at $74.59, falls just short). I haven’t looked at the income data for each of these zipcodes, but most likely the people who live in them are on the higher end of the income spectrum.
In this next plot I tried to combine all the data above into one plot, to show which areas of North Carolina contributed more to each political party and how much money they contributed. I computed the percentage of contributions going to the Democrats to color the points, and then adjusted the alpha of each point by tying it to the contribution amount of the majority party. If more than fifty percent of the contributions went to Republicans, I used the total Republican contribution to set the point’s alpha, and vice versa for the Democrats.
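The coloring and alpha logic described above can be sketched like this. Several details are assumptions made for the example: the percentage is computed from dollar totals over the two major parties only, ties at exactly fifty percent go to the Republican side, and alpha is scaled by the largest majority amount so that it lands between 0 and 1.

```r
# Toy per-zip totals by party (hypothetical values)
zip_party <- data.frame(
  zip       = c("27517", "28037", "28211"),
  dem_total = c(300000, 20000, 50000),
  rep_total = c(50000, 60000, 45000),
  stringsAsFactors = FALSE
)

# Share of the two-party money that went to the Democrats (drives the color)
zip_party$pct_dem <- with(zip_party, dem_total / (dem_total + rep_total))

# Amount raised by whichever party got the majority in that zip code
zip_party$majority_amt <- with(zip_party,
                               ifelse(pct_dem > 0.5, dem_total, rep_total))

# Scale to (0, 1] so it can be used as a point alpha (an assumed scaling)
zip_party$alpha <- zip_party$majority_amt / max(zip_party$majority_amt)

print(zip_party)
```

With this set-up, a strongly one-sided zip code that gave a lot of money appears as a saturated point, while a small or closely split zip code fades into the background.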
With this map we can see two solidly Democratic areas in the state, at least in terms of the percentage of contributions: the Durham/Chapel Hill area and the Asheville area. The larger metropolitan areas such as Raleigh, Charlotte, and Winston-Salem/Greensboro definitely lean towards the Democrats, but they are much closer to purple and a 50/50 split in contributions than to solidly blue.
The main questions I wanted to answer with this data analysis were which party/candidate had the most contributions in North Carolina, which party/candidate raised the most money, and where in North Carolina these contributions were coming from.
This graph is probably the clearest representation that the Democrats received the greatest number of contributions, but it also shows that when it comes to larger contributions, i.e. contributions over $100, the Republicans match if not exceed the Democrats. Seeing as people in the upper income ranges typically lean Republican, this is to be expected.
This plot further breaks down the first plot and shows the best overview of the contributions by party AND candidate. It’s a deeper dive into the data, and one can see what happened with the candidates in the primaries without actually knowing what happened, and can then extrapolate that to the presidential race nationwide.
This next graph is perhaps the most important, and I think it shows why Clinton lost the state despite outraising Trump there. If one uses contributions as a stand-in for overall support, you can see why Clinton lost the state. As noted above, the only true Democratic strongholds were Durham, Chapel Hill, and Asheville, while the major metropolitan centers were split more evenly. Combine that with the overall rural support for Donald Trump and it would have been enough to swing the election to Trump despite the overwhelming financial support for Clinton coming from the state.
The data contained info on almost 160,000 contributions. I was definitely surprised by how much more money Hillary Clinton raised than her opponent Donald Trump, given her loss in the general election. For future evaluations I would like to look at other battleground states Clinton lost and see if the trend of outraising her opponent yet losing the state continued.
I was also surprised that the larger cities in North Carolina didn’t lean more strongly Democratic in their contributions, and that it was the smaller towns of Durham and Chapel Hill that led the way in terms of the overall amount of contributions.
Overall, though, most things went as expected. More of the Republican donations came from the less urban zipcodes in the state, while the more urban zipcodes leaned Democratic. The Republicans had a higher mean and median contribution amount, which went a long way toward making up for their smaller number of contributions.