By Susan Li
March 6 2017
This is an exploration of 2016 US presidential campaign donations in the state of Massachusetts. For this exploratory data analysis, I am researching the 2016 presidential campaign finance data from the Federal Election Commission. The dataset contains financial contribution transactions from April 18 2015 to November 24 2016.
Throughout the analysis, I will attempt to answer the following questions:
## [1] 295667 18
## 'data.frame': 295667 obs. of 18 variables:
## $ cmte_id : chr "C00577130" "C00577130" "C00577130" "C00577130" ...
## $ cand_id : chr "P60007168" "P60007168" "P60007168" "P60007168" ...
## $ cand_nm : chr "Sanders, Bernard" "Sanders, Bernard" "Sanders, Bernard" "Sanders, Bernard" ...
## $ contbr_nm : chr "LEDWELL, BENJAMIN" "LEDWELL, BENJAMIN" "LEDWELL, BENJAMIN" "LEDWELL, BENJAMIN" ...
## $ contbr_city : chr "NEWBURYPORT" "NEWBURYPORT" "NEWBURYPORT" "NEWBURYPORT" ...
## $ contbr_st : chr "MA" "MA" "MA" "MA" ...
## $ contbr_zip : int 19504700 19504700 19504700 19504700 10269501 2420 21392903 24621313 25542718 12016408 ...
## $ contbr_employer : chr "ANDOVER POLICE, MA." "ANDOVER POLICE, MA." "ANDOVER POLICE, MA." "ANDOVER POLICE, MA." ...
## $ contbr_occupation: chr "POLICE OFFICER" "POLICE OFFICER" "POLICE OFFICER" "POLICE OFFICER" ...
## $ contb_receipt_amt: num 40 35 50 27 100 ...
## $ contb_receipt_dt : chr "04-Mar-16" "04-Mar-16" "06-Mar-16" "06-Mar-16" ...
## $ receipt_desc : chr "" "" "" "" ...
## $ memo_cd : chr "" "" "" "" ...
## $ memo_text : chr "* EARMARKED CONTRIBUTION: SEE BELOW" "* EARMARKED CONTRIBUTION: SEE BELOW" "* EARMARKED CONTRIBUTION: SEE BELOW" "* EARMARKED CONTRIBUTION: SEE BELOW" ...
## $ form_tp : chr "SA17A" "SA17A" "SA17A" "SA17A" ...
## $ file_num : int 1077404 1077404 1077404 1077404 1077404 1146165 1091718 1091718 1091718 1077404 ...
## $ tran_id : chr "VPF7BKWGAE6" "VPF7BKWGCP3" "VPF7BKYF9S6" "VPF7BM0K9E6" ...
## $ election_tp : chr "P2016" "P2016" "P2016" "P2016" ...
This dataset contains 295667 contributions and 18 variables. To start, I want to take a glance at how the contributions are distributed.
I realized that there were so many outliers (extremely high and extremely low values) that it was impossible to see details. There were negative contributions too.
##
## 5 10 100 50 25
## 16780 26856 34241 36978 39546
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -84240.0 15.0 28.0 116.1 100.0 86940.0
Transforming to log10 gives a better view of the distribution of contributions. The distribution looks roughly normal on that scale, and the data show that most donors made small contributions.
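A minimal sketch of the log10 transformation, using a small made-up vector rather than the real `ma$contb_receipt_amt` column. Because log10 is undefined for zero and negative amounts, the sketch keeps only positive values first; whether the original plot did the same is an assumption on my part.

```r
# Hypothetical amounts, including a refund (negative) and a large outlier.
amt <- c(-50, 5, 10, 25, 25, 50, 100, 250, 2700, 86940)

# log10 is only defined for positive values, so drop the rest first.
positive_amt <- amt[amt > 0]
log_amt <- log10(positive_amt)

# Bin the transformed values without drawing a plot;
# the breaks are whole powers of ten (1, 10, 100, ...).
h <- hist(log_amt, breaks = 0:5, plot = FALSE)
bins <- data.frame(
  from  = 10^head(h$breaks, -1),
  to    = 10^h$breaks[-1],
  count = h$counts
)
print(bins)
```

Each bin covers one order of magnitude, which is why a long right tail becomes readable on this scale.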
It is interesting to see how people donate. The most frequent amount is $25, followed by $50, then $100. The minimum donation was -$84240 and the maximum donation was $86940.
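The most frequent amounts can be read off a sorted frequency table. Here is a small sketch on invented amounts (not the real data), using only base R:

```r
# Hypothetical contribution amounts.
amt <- c(25, 25, 25, 50, 50, 100, 10, 5, 25, 50)

# Count each distinct amount, then sort ascending by count.
freq <- sort(table(amt))

# The last entries are the most frequent amounts.
top3 <- tail(freq, 3)
print(top3)

# The single most common amount, converted back to a number.
most_common <- as.numeric(names(freq)[length(freq)])
print(most_common)
```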
To perform an in-depth analysis, I decided to omit the negative contributions, which I believe were refunds, and the contributions that exceed the $2700 limit, because they break the Federal Election Campaign Act limit and will be refunded. This means 5897 contributions are omitted.
sum(ma$contb_receipt_amt >= 2700)
## [1] 3244
sum(ma$contb_receipt_amt < 0)
## [1] 2653
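The filtering step itself could look like the sketch below. It runs on a toy data frame, and the names `ma_toy` and `ma_clean` are mine. Note one assumption: the summaries later in this post show a maximum of exactly 2700.00, so the sketch keeps amounts equal to the limit and drops only those above it.

```r
# Toy stand-in for the contributions table.
ma_toy <- data.frame(
  contbr_nm         = c("A", "B", "C", "D", "E"),
  contb_receipt_amt = c(-100, 25, 2700, 5400, 50)
)

limit <- 2700

# Keep non-negative amounts that do not exceed the limit.
keep <- ma_toy$contb_receipt_amt >= 0 & ma_toy$contb_receipt_amt <= limit
ma_clean <- ma_toy[keep, ]

# How many rows were dropped?
n_removed <- nrow(ma_toy) - nrow(ma_clean)
print(n_removed)
```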
I will need to add more variables, such as candidate party affiliation, donors’ gender and donors’ zip codes.
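As a rough illustration of two of these derived variables, the sketch below cleans zip codes and looks up a party for a candidate. The `contbr_zip` values are taken from the `str()` output above. The padding rule (values longer than 5 digits are treated as ZIP+4) and the small `party_lookup` table are my assumptions, not necessarily what the original code did.

```r
# ZIP codes were read as integers, so leading zeros are lost.
zip_raw <- c(19504700L, 10269501L, 2420L, 21392903L)

# Assumption: more than 5 digits means ZIP+4 (9 digits once padded);
# otherwise it is a plain 5-digit ZIP.
zip_chr <- ifelse(nchar(zip_raw) > 5,
                  sprintf("%09d", zip_raw),
                  sprintf("%05d", zip_raw))
zip5 <- substr(zip_chr, 1, 5)
print(zip5)

# Hypothetical lookup table covering only a few candidates.
party_lookup <- c(
  "Clinton, Hillary Rodham" = "democrat",
  "Sanders, Bernard"        = "democrat",
  "Trump, Donald J."        = "republican",
  "Stein, Jill"             = "others"
)
cand_nm <- c("Sanders, Bernard", "Trump, Donald J.", "Stein, Jill")
party <- unname(party_lookup[cand_nm])
print(party)
```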
After processing the data, I have added 5 additional variables to help with the analysis and removed 5897 observations because they were either negative amounts or amounts that exceed $2700.
The additional variables are:
After adding the variables, I wonder what the contribution distribution looks like across the parties, candidates, genders and occupations.
## # A tibble: 3 × 5
## party sum_party number_of_candidate mean_party n
## <chr> <dbl> <int> <dbl> <int>
## 1 democrat 25832080.8 5 5166416.2 243358
## 2 others 270771.3 3 90257.1 981
## 3 republican 4605409.9 17 270906.5 24556
## [1] 268895
Until November 2016, the total number of donations made to presidential candidates was near 269K. The Democratic party took more than 243K, almost 10 times the number of donations made to the Republican party.
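A per-party summary like the table above can be sketched in base R as follows. The toy data and column name `amt` are invented. `mean_party` is computed as total divided by number of candidates, which matches the printed table (25832080.8 / 5 = 5166416.2).

```r
# Toy contributions.
toy <- data.frame(
  party   = c("democrat", "democrat", "democrat",
              "republican", "republican", "others"),
  cand_nm = c("Clinton", "Sanders", "Clinton", "Trump", "Rubio", "Stein"),
  amt     = c(100, 27, 50, 200, 75, 25),
  stringsAsFactors = FALSE
)

# One group per party, then one summary value per group.
groups <- split(toy, toy$party)
by_party <- data.frame(
  party               = names(groups),
  sum_party           = vapply(groups, function(g) sum(g$amt), numeric(1)),
  number_of_candidate = vapply(groups, function(g) length(unique(g$cand_nm)),
                               integer(1)),
  n                   = vapply(groups, nrow, integer(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
by_party$mean_party <- by_party$sum_party / by_party$number_of_candidate
print(by_party)

# Ratio of Democratic to Republican donation counts, from the table above.
ratio <- 243358 / 24556
print(round(ratio, 1))
```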
##
## Bush, Jeb Carson, Benjamin S.
## 388 2591
## Christie, Christopher J. Clinton, Hillary Rodham
## 133 147534
## Cruz, Rafael Edward 'Ted' Fiorina, Carly
## 5624 469
## Gilmore, James S III Graham, Lindsey O.
## 1 110
## Huckabee, Mike Jindal, Bobby
## 91 1
## Johnson, Gary Kasich, John R.
## 457 755
## Lessig, Lawrence McMullin, Evan
## 130 20
## O'Malley, Martin Joseph Pataki, George E.
## 269 3
## Paul, Rand Perry, James R. (Rick)
## 490 2
## Rubio, Marco Sanders, Bernard
## 1578 95408
## Santorum, Richard J. Stein, Jill
## 15 504
## Trump, Donald J. Walker, Scott
## 12256 49
## Webb, James Henry Jr.
## 17
There were 25 candidates in total. Hillary Clinton was the leader in the number of contributions, followed by Bernard Sanders, then Donald Trump.
## # A tibble: 2 × 3
## gender sum_gen n_gen
## <chr> <dbl> <int>
## 1 female 15029545 150055
## 2 male 15678717 118840
It is interesting that women made a lot more donations than men, about a 26% difference. Was it because of Hillary Clinton? We will find out later.
Who are those donors?
## # A tibble: 10 × 4
## contbr_occupation sum_occu mean_occu n
## <ord> <dbl> <dbl> <int>
## 1 RETIRED 4480345.1 108.43830 41317
## 2 NOT EMPLOYED 1417174.5 53.55103 26464
## 3 TEACHER 389587.2 56.29060 6921
## 4 ATTORNEY 1313684.0 212.50146 6182
## 5 PROFESSOR 876504.6 142.56744 6148
## 6 PHYSICIAN 842674.2 160.11290 5263
## 7 CONSULTANT 805573.5 192.12342 4193
## 8 SOFTWARE ENGINEER 361221.3 96.48006 3744
## 9 HOMEMAKER 686431.1 205.39530 3342
## 10 ENGINEER 309927.2 99.68709 3109
When we count the number of contributions, retired people take first place, followed by people who are not employed, with teachers in third. Homemakers and engineers are at the bottom of the top 10.
## Min. 1st Qu. Median Mean 3rd Qu.
## "2014-09-25" "2016-03-12" "2016-06-01" "2016-06-01" "2016-09-18"
## Max.
## "2016-12-30"
It is also interesting to see when people made contributions. The date distribution appears bimodal, with one peak around March 2016 and another close to the election.
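The raw `contb_receipt_dt` column is text such as "04-Mar-16", so it has to be parsed before dates can be summarized. Here is a minimal sketch, assuming an English locale (the `%b` month abbreviation is locale dependent); the third date is invented for illustration.

```r
# Two dates from the str() output above plus one made-up date.
dt_chr <- c("04-Mar-16", "06-Mar-16", "24-Nov-16")

# %d = day, %b = abbreviated month name, %y = two-digit year.
dt <- as.Date(dt_chr, format = "%d-%b-%y")
print(dt)

# Count contributions per calendar month.
month_label <- format(dt, "%Y-%m")
per_month <- table(month_label)
print(per_month)
```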
There are 268895 contributions and 18 variables. The variables that interest me and that I will be using are:
Other observations:
The main features in the dataset are party, candidate and contribution amount. I’d like to find the answers to the questions at the beginning of this report. I’d also like to try to use a combination of variables to build a logistic regression model to predict a donor’s contribution party.
Gender, occupation, time of the contribution and location are likely to contribute to the contribution amount and contribution party. I think occupation probably contributes most to the average contribution amount, and gender probably contributes most to the contribution party.
I created 5 variables:
I omitted negative contributions because I believe they were refunds, and I omitted contributions that exceed $2700 because they break the Federal Election Campaign Act limit and will be refunded.
## # A tibble: 3 × 5
## party sum_party number_of_candidate mean_party n
## <ord> <dbl> <int> <dbl> <int>
## 1 democrat 25832080.8 5 5166416.2 243358
## 2 others 270771.3 3 90257.1 981
## 3 republican 4605409.9 17 270906.5 24556
## ma$cand_nm
## Jindal, Bobby Perry, James R. (Rick)
## 250.00 750.00
## Gilmore, James S III Pataki, George E.
## 2700.00 3950.00
## Santorum, Richard J. McMullin, Evan
## 7620.10 9305.00
## Huckabee, Mike Webb, James Henry Jr.
## 11048.00 12100.09
## Walker, Scott Paul, Rand
## 46345.00 75241.48
## Lessig, Lawrence Fiorina, Carly
## 88483.86 111371.48
## Stein, Jill Graham, Lindsey O.
## 112948.03 147830.00
## Johnson, Gary Christie, Christopher J.
## 148518.27 161570.00
## O'Malley, Martin Joseph Carson, Benjamin S.
## 206496.39 269372.60
## Bush, Jeb Kasich, John R.
## 399839.00 410268.30
## Cruz, Rafael Edward 'Ted' Rubio, Marco
## 447206.14 622812.22
## Trump, Donald J. Sanders, Bernard
## 1887235.59 4603428.95
## Clinton, Hillary Rodham
## 20921571.54
## [1] 30708262
The total contribution amount made to the presidential candidates grossed over 30 million US dollars in Massachusetts. We can easily see where the money went.
The Democratic party takes the majority share of donor contributions. It received more than 25.8 million US dollars in total, which is 5.6 times what the Republican party received. It gets worse for the Republicans when it comes to the average amount per candidate, as there were 17 Republican candidates and only 5 Democratic candidates.
As with the number of contributions, Hillary Clinton received the largest contribution amount, followed by Bernard Sanders, then Donald Trump.
This is no surprise, as Massachusetts is the home of the Kennedy family and routinely votes for the Democratic party in federal elections. Hillary Clinton also has decades-deep roots in Massachusetts politics.
To see contribution patterns between parties and candidates, I start with boxplots.
However, it is very hard to compare contributions among all parties at a glance because there are so many outliers. I will apply a log scale and remove the ‘others’ party from now on, because my analysis is focused on the Democratic party and the Republican party.
## ma$party: democrat
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04 15.00 27.00 106.10 75.00 2700.00
## --------------------------------------------------------
## ma$party: republican
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.80 27.17 50.00 187.50 100.00 2700.00
Now it is much better. Although the Republican party has the higher median and mean, the Democratic contributions show more variation, and on the log scale their distribution is more spread out. This indicates that the Democratic party has more big and small donors.
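To make the spread comparison concrete, here is a small sketch that uses only the quartiles printed above. It compares the interquartile range in dollars with its width on the log10 scale, which is the scale the boxplot uses. The data frame `q` is just a container I made up for those printed numbers.

```r
# First and third quartiles copied from the summaries above.
q <- data.frame(
  party = c("democrat", "republican"),
  q1    = c(15.00, 27.17),
  q3    = c(75.00, 100.00),
  stringsAsFactors = FALSE
)

# Interquartile range in dollars.
q$iqr_dollars <- q$q3 - q$q1

# Width of the same box on a log10 axis: log10(q3) - log10(q1).
q$iqr_log10 <- log10(q$q3 / q$q1)

print(q)
```

In dollars the Republican box is slightly wider, while on the log scale the Democratic box is wider, so the impression of "more spread out" depends on the axis.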
## ma$cand_nm: Bush, Jeb
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5 100 267 1031 2700 2700
## --------------------------------------------------------
## ma$cand_nm: Carson, Benjamin S.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 25 50 104 100 2700
## --------------------------------------------------------
## ma$cand_nm: Christie, Christopher J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20 250 1000 1215 2700 2700
## --------------------------------------------------------
## ma$cand_nm: Clinton, Hillary Rodham
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04 19.00 36.30 141.80 100.00 2700.00
## --------------------------------------------------------
## ma$cand_nm: Cruz, Rafael Edward 'Ted'
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 25.00 50.00 79.52 100.00 2700.00
## --------------------------------------------------------
## ma$cand_nm: Fiorina, Carly
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.0 25.0 100.0 237.5 200.0 2700.0
## --------------------------------------------------------
## ma$cand_nm: Gilmore, James S III
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2700 2700 2700 2700 2700 2700
## --------------------------------------------------------
## ma$cand_nm: Graham, Lindsey O.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5 500 1000 1344 2300 2700
## --------------------------------------------------------
## ma$cand_nm: Huckabee, Mike
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.0 16.0 50.0 121.4 100.0 2700.0
## --------------------------------------------------------
## ma$cand_nm: Jindal, Bobby
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 250 250 250 250 250 250
## --------------------------------------------------------
## ma$cand_nm: Kasich, John R.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 50.0 200.0 543.4 500.0 2700.0
## --------------------------------------------------------
## ma$cand_nm: Lessig, Lawrence
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.0 100.0 250.0 680.6 500.0 2700.0
## --------------------------------------------------------
## ma$cand_nm: O'Malley, Martin Joseph
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 100.0 500.0 767.6 1000.0 2700.0
## --------------------------------------------------------
## ma$cand_nm: Pataki, George E.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 250 625 1000 1317 1850 2700
## --------------------------------------------------------
## ma$cand_nm: Paul, Rand
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 25.0 50.0 153.6 100.0 2700.0
## --------------------------------------------------------
## ma$cand_nm: Perry, James R. (Rick)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 250.0 312.5 375.0 375.0 437.5 500.0
## --------------------------------------------------------
## ma$cand_nm: Rubio, Marco
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.05 25.00 75.00 394.70 250.00 2700.00
## --------------------------------------------------------
## ma$cand_nm: Sanders, Bernard
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 27.00 48.25 50.00 2700.00
## --------------------------------------------------------
## ma$cand_nm: Santorum, Richard J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 17.55 100.00 508.00 500.00 2700.00
## --------------------------------------------------------
## ma$cand_nm: Trump, Donald J.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.80 28.00 72.02 154.00 150.00 2700.00
## --------------------------------------------------------
## ma$cand_nm: Walker, Scott
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 75.0 250.0 500.0 945.8 1000.0 2700.0
## --------------------------------------------------------
## ma$cand_nm: Webb, James Henry Jr.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 100.0 100.1 250.0 711.8 500.0 2700.0
Now the picture looks interesting. Christopher Christie, Lindsey Graham and George Pataki have among the highest medians, Jeb Bush has the greatest interquartile range, while Hillary Clinton and Bernard Sanders seem to have the lowest medians. But Hillary Clinton has more outliers (big-pocket donors) than anyone else. Bernard Sanders has a significant number of outliers as well.
Now let’s examine within parties.
## Source: local data frame [22 x 5]
## Groups: party [2]
##
## party cand_nm sum_can mean_can n
## <chr> <chr> <dbl> <dbl> <int>
## 1 republican Jindal, Bobby 250.00 250.0000 1
## 2 republican Perry, James R. (Rick) 750.00 375.0000 2
## 3 republican Gilmore, James S III 2700.00 2700.0000 1
## 4 republican Pataki, George E. 3950.00 1316.6667 3
## 5 republican Santorum, Richard J. 7620.10 508.0067 15
## 6 republican Huckabee, Mike 11048.00 121.4066 91
## 7 democrat Webb, James Henry Jr. 12100.09 711.7700 17
## 8 republican Walker, Scott 46345.00 945.8163 49
## 9 republican Paul, Rand 75241.48 153.5540 490
## 10 democrat Lessig, Lawrence 88483.86 680.6451 130
## # ... with 12 more rows
Within each party, the majority of the donations were received by only a few candidates. For the Democratic party, Hillary Clinton and Bernard Sanders took almost 99% of all donations to the party, with 81% going to Hillary Clinton alone. For the Republican party, Donald Trump led the way, taking 41% of all donations to the party. Donald Trump, Marco Rubio, Ted Cruz, John Kasich and Jeb Bush together took about 82% of all donations to the Republican party; the remaining 18% was shared by the other 12 Republican candidates.
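The shares can be recomputed directly from the totals printed earlier in this post. A short sketch follows; the helper `share()` is a name I made up, and the figures are copied from the per-candidate and per-party tables. With these printed totals, the top five Republican candidates come out at roughly 81.8%.

```r
# Party totals and candidate totals as printed above.
dem_total <- 25832080.8
rep_total <- 4605409.9
clinton   <- 20921571.54
sanders   <- 4603428.95
rep_top5  <- c(Trump = 1887235.59, Rubio = 622812.22, Cruz = 447206.14,
               Kasich = 410268.30, Bush = 399839.00)

# Percentage share, rounded to one decimal place.
share <- function(x, total) round(100 * x / total, 1)

clinton_share  <- share(clinton, dem_total)
dem_top2_share <- share(clinton + sanders, dem_total)
trump_share    <- share(rep_top5[["Trump"]], rep_total)
rep_top5_share <- share(sum(rep_top5), rep_total)

print(c(clinton = clinton_share, dem_top2 = dem_top2_share,
        trump = trump_share, rep_top5 = rep_top5_share))
```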
From the above charts, we are able to see who the top candidates were in each party in Massachusetts. Later I will examine in detail the following candidates, who received at least 9% of total donations in their party.
## [1] "Clinton, Hillary Rodham" "Sanders, Bernard"
## [3] "Trump, Donald J." "Rubio, Marco"
## [5] "Cruz, Rafael Edward 'Ted'"
We have seen earlier that women made 26% more contributions than men. Is that the same for the amount of money donated? And do women tend to donate more to liberals and/or to a woman candidate?
## ma$gender: female
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04 15.00 27.00 99.78 72.00 2700.00
## --------------------------------------------------------
## ma$gender: male
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.24 19.00 35.00 131.10 100.00 2700.00
On average, men donated $131.10 and women donated $99.78, a 31% difference between genders. Women contributed much less than men when we look at the median, mean and third quartile.
## # A tibble: 2 × 3
## gender sum_gen n
## <chr> <dbl> <int>
## 1 female 14934709 149682
## 2 male 15502782 118232
However, when we look at the total contribution amount between genders, they were very close.
## Source: local data frame [10 x 3]
## Groups: cand_nm [?]
##
## cand_nm gender sum_gen_can
## <chr> <chr> <dbl>
## 1 Clinton, Hillary Rodham female 11598864.9
## 2 Clinton, Hillary Rodham male 9322706.6
## 3 Cruz, Rafael Edward 'Ted' female 137480.1
## 4 Cruz, Rafael Edward 'Ted' male 309726.0
## 5 Rubio, Marco female 178444.7
## 6 Rubio, Marco male 444367.6
## 7 Sanders, Bernard female 1987548.9
## 8 Sanders, Bernard male 2615880.1
## 9 Trump, Donald J. female 437974.0
## 10 Trump, Donald J. male 1449261.5
Women in Massachusetts contributed a little less than 15 million US dollars in total to the presidential campaign in 2016, of which more than 11 million dollars went to Hillary Clinton. This supports the idea that Massachusetts women donate more to liberals and/or to a woman candidate.
Earlier we have seen that retired people make the largest number of contributions. How about the total contribution amount and the average contribution amount across the top 10 occupations?
## # A tibble: 10 × 4
## contbr_occupation sum_occu mean_occu n
## <ord> <dbl> <dbl> <int>
## 1 RETIRED 4480345.1 108.43830 41317
## 2 NOT EMPLOYED 1417174.5 53.55103 26464
## 3 TEACHER 389587.2 56.29060 6921
## 4 ATTORNEY 1313684.0 212.50146 6182
## 5 PROFESSOR 876504.6 142.56744 6148
## 6 PHYSICIAN 842674.2 160.11290 5263
## 7 CONSULTANT 805573.5 192.12342 4193
## 8 SOFTWARE ENGINEER 361221.3 96.48006 3744
## 9 HOMEMAKER 686431.1 205.39530 3342
## 10 ENGINEER 309927.2 99.68709 3109
Again, retired people take first place in terms of total contribution amount, followed by people who are not employed, with attorneys in third. However, when we look at the average contribution amount, attorneys come first and homemakers take second place (presumably most homemakers are women). People who are not employed contribute the least on average. This does make sense.
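A table of total, mean and count per occupation, limited to the most frequent occupations, could be built along these lines. This is a base R sketch on toy data with an invented `amt` column; the original post may well have used a different approach.

```r
# Toy contributions with an occupation label.
toy <- data.frame(
  contbr_occupation = c("RETIRED", "RETIRED", "RETIRED",
                        "ATTORNEY", "ATTORNEY", "TEACHER"),
  amt = c(100, 50, 150, 500, 100, 25),
  stringsAsFactors = FALSE
)

# One vector of amounts per occupation.
groups <- split(toy$amt, toy$contbr_occupation)

occu <- data.frame(
  contbr_occupation = names(groups),
  sum_occu  = vapply(groups, sum, numeric(1)),
  mean_occu = vapply(groups, mean, numeric(1)),
  n         = vapply(groups, length, integer(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Keep the occupations with the most contributions (top 2 here, top 10 in the post).
top <- head(occu[order(-occu$n), ], 2)
print(top)
```

Sorting the same table by `mean_occu` instead of `n` gives the ranking by average contribution.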
Surprisingly, software engineers in Massachusetts have been stingy, given their above-average income and long history as a reliable source of presidential donations. Perhaps this article can answer my question.
I want to dive deeper and investigate the distribution of contribution amounts among occupations. A boxplot sounds like a good idea. But this one is hard to read because there are so many outliers.
## top_occu_df$contbr_occupation: ATTORNEY
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.04 25.00 50.00 212.60 200.00 2700.00
## --------------------------------------------------------
## top_occu_df$contbr_occupation: CONSULTANT
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.24 25.00 50.00 191.50 100.00 2700.00
## --------------------------------------------------------
## top_occu_df$contbr_occupation: ENGINEER
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 25.00 40.50 98.23 100.00 2700.00
## --------------------------------------------------------
## top_occu_df$contbr_occupation: HOMEMAKER
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 10.0 25.0 202.9 100.0 2700.0
## --------------------------------------------------------
## top_occu_df$contbr_occupation: NOT EMPLOYED
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05 13.50 27.00 53.55 50.00 2700.00
## --------------------------------------------------------
## top_occu_df$contbr_occupation: PHYSICIAN
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.44 25.00 50.00 159.80 100.00 2700.00
## --------------------------------------------------------
## top_occu_df$contbr_occupation: PROFESSOR
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.45 23.00 50.00 142.20 100.00 2700.00
## --------------------------------------------------------
## top_occu_df$contbr_occupation: RETIRED
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5 20.0 35.0 108.1 100.0 2700.0
## --------------------------------------------------------
## top_occu_df$contbr_occupation: SOFTWARE ENGINEER
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 35.00 93.86 100.00 2700.00
## --------------------------------------------------------
## top_occu_df$contbr_occupation: TEACHER
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.97 15.00 25.00 56.15 50.00 2700.00
This looks much better. After I filtered out outliers (donations that are extremely high), a boxplot confirms my observation above. The median contributions of teachers, homemakers and people who are not employed are relatively low.
It is still apparent that attorneys made large contributions, with the highest average donation and the largest variability. Some of them contributed 4 times their median.
Surprisingly, homemakers had the 2nd highest average contribution amount, but the median contribution in this group is among the lowest. It suggests that the distribution of the data is right skewed with many outliers. Also my presumption is that most of the homemakers are women.
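One quick way to see the skew is the ratio of mean to median: the further it is above 1, the more the mean is pulled up by a few large donations. The sketch below uses medians and means copied from the summaries above for four of the occupations.

```r
# Median and mean per occupation, as printed above.
occ <- data.frame(
  occupation = c("ATTORNEY", "HOMEMAKER", "NOT EMPLOYED", "TEACHER"),
  median     = c(50, 25, 27, 25),
  mean       = c(212.60, 202.90, 53.55, 56.15),
  stringsAsFactors = FALSE
)

# A ratio well above 1 points to a right-skewed distribution.
occ$mean_to_median <- round(occ$mean / occ$median, 1)

# Most skewed group first.
occ_sorted <- occ[order(-occ$mean_to_median), ]
print(occ_sorted)
```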
Men had higher donation amounts than women.
We know that Hillary Clinton raised the most money and had the most supporters in Massachusetts. But was this always true throughout the campaign? When I look at the above 2 graphs, I notice 2 things:
This only reinforces my question: what if Bernard Sanders had run against Donald Trump? Even Donald Trump himself famously stated: “I would rather run against Crooked Hillary Clinton than Bernie Sanders and that will happen because the books are cooked against Bernie!”
It is interesting to see each top candidate’s time series trend. Ted Cruz had slow and steady growth in contribution amount, which ended as soon as he suspended his campaign in May 2016. Marco Rubio dropped out even earlier, in March 2016. Donald Trump’s contributions grew steadily until around September 2016. His campaign probably did not spend a lot of money in Massachusetts.
As a side note, although Donald Trump did not win in Massachusetts, “A Third of Massachusetts Voters Picked Trump” and “The Trump effect happened in Massachusetts, too”.
Where do those donors reside?
It looks like more Republican donors are concentrated around the Boston area, which makes sense as Boston is the largest city in Massachusetts. But look how blue the state is!
In this section, I will attempt to apply logistic regression to predict a donor’s contributing party given his (or her) location (latitude, longitude), gender and donation amount. I will be taking the following steps:
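A minimal sketch of this kind of workflow on simulated data is shown here. Everything in it is made up for illustration: the data frame `sim`, the rough latitude and longitude ranges, the coefficients used to generate the party labels, and the random 80/20 split (the real analysis used a manual split).

```r
set.seed(42)
n <- 2000

# Simulated donors inside a rough, assumed bounding box.
sim <- data.frame(
  contb_receipt_amt = round(rexp(n, rate = 1 / 100), 2),
  gender    = factor(sample(c("female", "male"), n, replace = TRUE)),
  latitude  = runif(n, 41.5, 42.8),
  longitude = runif(n, -73.4, -70.0)
)

# Made-up relationship used only to generate party labels.
lin <- -2.5 + 1.0 * (sim$gender == "male") + 0.0004 * sim$contb_receipt_amt
sim$party <- factor(ifelse(runif(n) < plogis(lin), "republican", "democrat"),
                    levels = c("democrat", "republican"))

# Random 80/20 split into training and test sets.
train_idx <- sample(seq_len(n), size = 0.8 * n)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# With a factor response, glm models the probability of the second level.
fit <- glm(party ~ ., family = binomial(link = "logit"), data = train)

# Predicted probability of "republican", turned into a class at 0.5.
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "republican", "democrat"),
               levels = levels(sim$party))

# Confusion matrix and accuracy on the held-out data.
conf_mat <- table(predicted = pred, actual = test$party)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```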
##
## Call:
## glm(formula = party ~ ., family = binomial(link = "logit"), data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.1990 -0.5264 -0.3468 -0.3227 2.6417
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.520e+01 1.206e+00 29.196 < 2e-16 ***
## contb_receipt_amt 3.798e-04 1.544e-05 24.591 < 2e-16 ***
## gendermale 1.000e+00 1.475e-02 67.788 < 2e-16 ***
## latitude -7.499e-01 2.751e-02 -27.253 < 2e-16 ***
## longitude -8.896e-02 1.233e-02 -7.217 5.31e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 150246 on 239877 degrees of freedom
## Residual deviance: 143860 on 239873 degrees of freedom
## (122 observations deleted due to missingness)
## AIC: 143870
##
## Number of Fisher Scoring iterations: 5
##
## model_pred_direction democrat republican
## democrat 26150 1761
## republican 0 3
## [1] "Accuracy 0.936913376800172"
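Wow! The 0.94 accuracy on the test set looks like a very good result. However, the model predicted ‘republican’ for only 3 test observations, so most of this accuracy reflects how heavily Democratic the test set is. The result is also based on the manual split of the data I created earlier, so it may not be precise enough.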
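To put the accuracy in context, it helps to compare it with a baseline that always predicts ‘democrat’. The sketch below rebuilds the confusion matrix printed above and computes both numbers. The names `baseline` and `recall_rep` are mine.

```r
# Confusion matrix from the output above (rows: predicted, columns: actual).
conf_mat <- matrix(
  c(26150, 0,     # actual democrat: predicted democrat, predicted republican
    1761,  3),    # actual republican: predicted democrat, predicted republican
  nrow = 2,
  dimnames = list(predicted = c("democrat", "republican"),
                  actual    = c("democrat", "republican"))
)

# Share of correct predictions.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Accuracy of always guessing "democrat".
baseline <- sum(conf_mat[, "democrat"]) / sum(conf_mat)

# Share of actual Republican donations the model identified.
recall_rep <- conf_mat["republican", "republican"] /
  sum(conf_mat[, "republican"])

print(round(c(accuracy = accuracy, baseline = baseline,
              recall_republican = recall_rep), 4))
```

The model beats the always-democrat baseline by only about 0.0001, and it identifies very few of the Republican donations, so accuracy alone is not a good measure on such imbalanced data.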
For a certain period of time, Bernard Sanders received more donations and gained more popularity than Hillary Clinton.
In Massachusetts, the financial donations to the presidential campaign were distributed unevenly. This is especially true for the Democratic party: 99% of its donations went to two candidates, and Hillary Clinton took 81%. It is obvious that Massachusetts is among the bluest of states, and Clinton has decades-deep roots in Massachusetts politics.
The total contribution across occupations differs substantially. If I had been asked which occupation contributed the most to presidential candidates in Massachusetts in 2016, I would have guessed ‘lawyers’ or ‘CEOs’. Wrong.
The top occupation isn’t really an ‘occupation’, but individuals who list their occupation as ‘retired’ in federal documents. Unlike lawyers or consultants, whose donations may be covered by their companies, retired people more likely pay from their own pockets.
It continues to surprise me that software engineers are among the lowest in total contribution, considering their presumably above-average salary. But any further conclusion requires better knowledge of the industry’s political background.
Hillary Clinton dominated in contribution amount and number of contributions. The closer to the election, the more supporters with more money came to her.
On the other hand, Bernard Sanders had steady growth in donation amount and number of donors until he gave up his run.
The original Massachusetts 2016 presidential campaign contributions data contains over 295000 entries from April 2015 until November 2016. Throughout the analysis, I had to deal with several issues:
By analyzing Massachusetts financial donation data, I found several interesting characteristics:
The analysis I conducted is for the state of Massachusetts only. It would be interesting to analyze campaign finance data for some swing states such as Ohio or Florida, as well as campaign finance data nationwide. I am sure the picture would be very different.
Although the election is over, Americans have seen a post-election surge in donations. There will be more interesting financial contribution data to analyze.
The R Markdown file used to make this post is available here.