Sirron Melville

Analysis of the Financial Contributions to the 2016 Presidential Campaigns in Massachusetts

Introduction

This report is an exploratory data analysis of the financial contributions made to the 2016 Presidential Campaign in the state of Massachusetts. The dataset, provided by the Federal Election Commission, contains the financial contributions to the campaign from April 18th 2015 to November 24th 2016.

The following questions will be answered by the analysis:

Univariate Plots Section

## [1] 295667     18
## 'data.frame':    295667 obs. of  18 variables:
##  $ cmte_id          : chr  "C00577130" "C00577130" "C00577130" "C00577130" ...
##  $ cand_id          : chr  "P60007168" "P60007168" "P60007168" "P60007168" ...
##  $ cand_nm          : chr  "Sanders, Bernard" "Sanders, Bernard" "Sanders, Bernard" "Sanders, Bernard" ...
##  $ contbr_nm        : chr  "LEDWELL, BENJAMIN" "LEDWELL, BENJAMIN" "LEDWELL, BENJAMIN" "LEDWELL, BENJAMIN" ...
##  $ contbr_city      : chr  "NEWBURYPORT" "NEWBURYPORT" "NEWBURYPORT" "NEWBURYPORT" ...
##  $ contbr_st        : chr  "MA" "MA" "MA" "MA" ...
##  $ contbr_zip       : int  19504700 19504700 19504700 19504700 10269501 2420 21392903 24621313 25542718 12016408 ...
##  $ contbr_employer  : chr  "ANDOVER POLICE, MA." "ANDOVER POLICE, MA." "ANDOVER POLICE, MA." "ANDOVER POLICE, MA." ...
##  $ contbr_occupation: chr  "POLICE OFFICER" "POLICE OFFICER" "POLICE OFFICER" "POLICE OFFICER" ...
##  $ contb_receipt_amt: num  40 35 50 27 100 ...
##  $ contb_receipt_dt : chr  "04-MAR-16" "04-MAR-16" "06-MAR-16" "06-MAR-16" ...
##  $ receipt_desc     : chr  "" "" "" "" ...
##  $ memo_cd          : chr  "" "" "" "" ...
##  $ memo_text        : chr  "* EARMARKED CONTRIBUTION: SEE BELOW" "* EARMARKED CONTRIBUTION: SEE BELOW" "* EARMARKED CONTRIBUTION: SEE BELOW" "* EARMARKED CONTRIBUTION: SEE BELOW" ...
##  $ form_tp          : chr  "SA17A" "SA17A" "SA17A" "SA17A" ...
##  $ file_num         : int  1077404 1077404 1077404 1077404 1077404 1146165 1091718 1091718 1091718 1077404 ...
##  $ tran_id          : chr  "VPF7BKWGAE6" "VPF7BKWGCP3" "VPF7BKYF9S6" "VPF7BM0K9E6" ...
##  $ election_tp      : chr  "P2016" "P2016" "P2016" "P2016" ...

In this dataset, there are 295667 contributions and 18 variables. Below is a visualization of the distribution of contributions.

From this, I noticed that there were a lot of outliers in the data and that better visualizations were needed to answer questions accurately.

## 
##     5    10   100    50    25 
## 16780 26856 34241 36978 39546
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -84240.0     15.0     28.0    116.1    100.0  86940.0

The data was scaled logarithmically to better represent the distribution of the contributions. On the log scale, the distribution above is roughly Gaussian, and it indicates that most donors contributed small amounts.

A summary of the data shows that the most frequent amount donated is $25, followed by $50 and $100. The minimum and maximum donations were -$84,240 and $86,940 respectively.

Individuals are only permitted to donate up to $2700 per election to a candidate because of the contribution limit set by the FEC (Federal Election Commission). For the analysis, I omitted the negative contributions and the contributions above $2700, as these were probably refunds and redesignations. 5897 contributions are either negative or redesignated contributions.

sum(ma$contb_receipt_amt >= 2700)
## [1] 3244
sum(ma$contb_receipt_amt < 0)
## [1] 2653
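A minimal base-R sketch of the cleaning rule described above, run on a few hypothetical rows rather than the real `ma` data frame. It keeps amounts from $0 to $2,700 inclusive. Note that the `>= 2700` count printed above also includes donations of exactly $2,700, which the per-candidate summaries later in the report show were kept.

```r
# Hypothetical toy rows standing in for the report's `ma` data frame.
ma <- data.frame(
  cand_nm = c("Sanders, Bernard", "Clinton, Hillary Rodham", "Trump, Donald J.",
              "Clinton, Hillary Rodham", "Rubio, Marco"),
  contb_receipt_amt = c(27, -250, 2700, 5400, 100),
  stringsAsFactors = FALSE
)

# Negative amounts are treated as refunds and amounts above the $2,700
# per-election limit as redesignations; exactly $2,700 is a legal maximum.
keep <- ma$contb_receipt_amt >= 0 & ma$contb_receipt_amt <= 2700
n_removed <- sum(!keep)
ma <- ma[keep, ]

print(n_removed)  # number of rows dropped by the rule
```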

More variables need to be created for the analysis: donors’ gender, donors’ location (derived from their ZIP code) and the party affiliation of the candidate.

Five variables were created (they are listed in the Univariate Analysis section), and 5897 contributions were removed because they were negative, refunded or redesignated.

Created variables:

With the added variables, I can look at the distribution of contributions by candidate, gender, party and occupation.

## # A tibble: 3 x 5
##        party  sum_party number_of_candidate mean_party      n
##        <chr>      <dbl>               <int>      <dbl>  <int>
## 1   democrat 25832080.8                   5  5166416.2 243358
## 2     others   270771.3                   3    90257.1    981
## 3 republican  4605409.9                  17   270906.5  24556

## [1] 268895

In the cleaned dataset, 268,895 donations were made to the presidential candidates. The Democratic party received 243,358 of them, roughly 10 times as many as the Republican party (24,556 donations).

## 
##                 Bush, Jeb       Carson, Benjamin S. 
##                       388                      2591 
##  Christie, Christopher J.   Clinton, Hillary Rodham 
##                       133                    147534 
## Cruz, Rafael Edward 'Ted'            Fiorina, Carly 
##                      5624                       469 
##      Gilmore, James S III        Graham, Lindsey O. 
##                         1                       110 
##            Huckabee, Mike             Jindal, Bobby 
##                        91                         1 
##             Johnson, Gary           Kasich, John R. 
##                       457                       755 
##          Lessig, Lawrence            McMullin, Evan 
##                       130                        20 
##   O'Malley, Martin Joseph         Pataki, George E. 
##                       269                         3 
##                Paul, Rand    Perry, James R. (Rick) 
##                       490                         2 
##              Rubio, Marco          Sanders, Bernard 
##                      1578                     95408 
##      Santorum, Richard J.               Stein, Jill 
##                        15                       504 
##          Trump, Donald J.             Walker, Scott 
##                     12256                        49 
##     Webb, James Henry Jr. 
##                        17

Hillary Clinton led the 25 candidates in the number of contributions with 147,534 donations; Bernard Sanders (95,408) and Donald Trump (12,256) were second and third respectively.

## # A tibble: 2 x 3
##   gender  sum_gen  n_gen
##    <chr>    <dbl>  <int>
## 1 female 15029545 150055
## 2   male 15678717 118840

Women made about 56% of the donations (150,055 of 268,895). Further analysis may help us determine whether Hillary Clinton’s candidacy was the reason for this.

Who are those contributors?

## # A tibble: 10 x 4
##    contbr_occupation  sum_occu mean_occu     n
##                <ord>     <dbl>     <dbl> <int>
##  1           RETIRED 4480345.1 108.43830 41317
##  2      NOT EMPLOYED 1417174.5  53.55103 26464
##  3           TEACHER  389587.2  56.29060  6921
##  4          ATTORNEY 1313684.0 212.50146  6182
##  5         PROFESSOR  876504.6 142.56744  6148
##  6         PHYSICIAN  842674.2 160.11290  5263
##  7        CONSULTANT  805573.5 192.12342  4193
##  8 SOFTWARE ENGINEER  361221.3  96.48006  3744
##  9         HOMEMAKER  686431.1 205.39530  3342
## 10          ENGINEER  309927.2  99.68709  3109

The top 3 occupations of contributors are retirees, people who are not employed and teachers. Homemakers and engineers round out the top ten.

##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "2014-09-25" "2016-03-12" "2016-06-01" "2016-06-01" "2016-09-18" 
##         Max. 
## "2016-12-30"

The distribution of contribution dates above is somewhat bimodal: most of the contributions were made around March/April 2016 and close to the election.

Univariate Analysis

What is the structure of your dataset?

After cleaning, the dataset contains 268,895 contributions. Of its 18 original variables, the ones used in the analysis are:

  • cand_nm: Candidate Name
  • contbr_zip: Contributor ZIP Code
  • contbr_nm: Contributor name (first name will be used in gender prediction)
  • contbr_occupation: Contributor Occupation
  • contb_receipt_amt: Amount of Contribution
  • contb_receipt_dt: Date of Contribution

Other observations:

  • Most of the donations are small.
  • The median donation amount is $28.
  • Most of the donations went to the Democratic party.
  • Hillary Clinton received most of the donations.
  • 56% of the donations were made by women.
  • Retirees donated the most.

What is/are the main features of interest in your dataset?

The main features of interest in the dataset are candidate, contribution amount and party. Analysis using these variables will help answer the aforementioned questions. A combination of variables can also be used to build a logistic regression model to predict the party a donor contributed to.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

The party that receives a contribution and the contribution amount may be affected by gender, location, occupation and time of the contribution. The average contribution amount may be influenced by occupation, and gender may play a role in which party receives the contribution.

Did you create any new variables from existing variables in the dataset?

5 variables were created:

  • party: party affiliation of the candidate.
  • contbr_first_nm: the first name of the contributor, used to predict gender.
  • gender: contributor’s gender.
  • latitude: donor’s latitude, to be rendered on a map.
  • longitude: donor’s longitude, to be rendered on a map.
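The `party` and `contbr_first_nm` columns could be derived roughly as follows. This is a sketch, not the report’s code: the candidate lists are read off the party table (5 Democrats, 3 others, 17 Republicans), the helper names `assign_party` and `first_name` are hypothetical, and the first-name rule assumes FEC names are stored as "LAST, FIRST MIDDLE". The resulting first names would then be passed to the gender package, which is not called here.

```r
# Candidate lists read off the party table (5 Democrats, 3 others; the rest Republican).
democrats <- c("Clinton, Hillary Rodham", "Sanders, Bernard",
               "O'Malley, Martin Joseph", "Lessig, Lawrence",
               "Webb, James Henry Jr.")
others <- c("Johnson, Gary", "Stein, Jill", "McMullin, Evan")

# Hypothetical helper: any candidate not listed above is treated as Republican,
# which is only safe because the set of 25 candidates is known in advance.
assign_party <- function(cand_nm) {
  ifelse(cand_nm %in% democrats, "democrat",
         ifelse(cand_nm %in% others, "others", "republican"))
}

# Hypothetical helper: names look like "LAST, FIRST MIDDLE"; keep FIRST.
first_name <- function(contbr_nm) {
  sub("^[^,]*,[[:space:]]*([^ ]+).*$", "\\1", contbr_nm)
}

# Hypothetical donors for illustration.
donors <- data.frame(
  contbr_nm = c("LEDWELL, BENJAMIN", "DOE, JANE A.", "ROE, RICHARD"),
  cand_nm = c("Sanders, Bernard", "Stein, Jill", "Trump, Donald J."),
  stringsAsFactors = FALSE
)
donors$party <- assign_party(donors$cand_nm)
donors$contbr_first_nm <- first_name(donors$contbr_nm)
print(donors)
```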

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Negative contributions and contributions over $2700 were omitted. The FEC limits individual donations to $2700 per candidate per election, so these amounts were most likely refunded or redesignated.

Bivariate Plots Section

## # A tibble: 3 x 5
##        party  sum_party number_of_candidate mean_party      n
##        <ord>      <dbl>               <int>      <dbl>  <int>
## 1   democrat 25832080.8                   5  5166416.2 243358
## 2     others   270771.3                   3    90257.1    981
## 3 republican  4605409.9                  17   270906.5  24556

## ma$cand_nm
##             Jindal, Bobby    Perry, James R. (Rick) 
##                    250.00                    750.00 
##      Gilmore, James S III         Pataki, George E. 
##                   2700.00                   3950.00 
##      Santorum, Richard J.            McMullin, Evan 
##                   7620.10                   9305.00 
##            Huckabee, Mike     Webb, James Henry Jr. 
##                  11048.00                  12100.09 
##             Walker, Scott                Paul, Rand 
##                  46345.00                  75241.48 
##          Lessig, Lawrence            Fiorina, Carly 
##                  88483.86                 111371.48 
##               Stein, Jill        Graham, Lindsey O. 
##                 112948.03                 147830.00 
##             Johnson, Gary  Christie, Christopher J. 
##                 148518.27                 161570.00 
##   O'Malley, Martin Joseph       Carson, Benjamin S. 
##                 206496.39                 269372.60 
##                 Bush, Jeb           Kasich, John R. 
##                 399839.00                 410268.30 
## Cruz, Rafael Edward 'Ted'              Rubio, Marco 
##                 447206.14                 622812.22 
##          Trump, Donald J.          Sanders, Bernard 
##                1887235.59                4603428.95 
##   Clinton, Hillary Rodham 
##               20921571.54

## [1] 30708262

In Massachusetts, the total amount of contributions to the presidential candidates was $30,708,262 USD. Most of that money went to Hillary Clinton, Bernard Sanders and Donald Trump.

The Democratic party received $25,832,081 USD, about 5.6 times as much as the Republican party, which received $4,605,410 USD. This is even more striking because there were 17 Republican candidates and only 5 Democratic candidates, so the average Democratic candidate raised about 19 times as much as the average Republican candidate ($5,166,416 vs. $270,906 USD).
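The party ratios can be reproduced from the party table printed earlier. The sketch below simply re-enters those printed totals and candidate counts, so its results are only as precise as the rounded output.

```r
# Totals and candidate counts copied from the party table printed earlier.
party_totals <- data.frame(
  party = c("democrat", "others", "republican"),
  sum_party = c(25832080.8, 270771.3, 4605409.9),
  number_of_candidate = c(5L, 3L, 17L),
  stringsAsFactors = FALSE
)
# Average raised per candidate, as in the report's mean_party column.
party_totals$mean_party <- party_totals$sum_party / party_totals$number_of_candidate
# Each party's share of all money raised.
party_totals$share <- party_totals$sum_party / sum(party_totals$sum_party)

dem_row <- party_totals[party_totals$party == "democrat", ]
rep_row <- party_totals[party_totals$party == "republican", ]
ratio_total <- dem_row$sum_party / rep_row$sum_party            # total raised
ratio_per_candidate <- dem_row$mean_party / rep_row$mean_party  # per candidate

print(party_totals)
print(c(ratio_total = ratio_total, ratio_per_candidate = ratio_per_candidate))
```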

Hillary Clinton received the largest total amount, followed by Bernard Sanders and Donald Trump.

Massachusetts is a historically blue state and Hillary Clinton has strong political roots there.

Below, boxplots are used to show contribution patterns between candidates and parties.

It is hard to compare the contributions between the parties without scaling the data logarithmically because of the many outliers. Below, I apply a log scale and focus the analysis on the Democratic and Republican parties by removing the “others” group.

## ma$party: democrat
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.04   15.00   27.00  106.10   75.00 2700.00 
## -------------------------------------------------------- 
## ma$party: republican
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.80   27.17   50.00  187.50  100.00 2700.00

While the Republican party has a higher median and mean contribution amount, on the log scale the Democratic distribution is more spread out, meaning that the Democratic party drew donors giving both small and large amounts.
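Whether one distribution is "more spread out" depends on the scale. Using the quartiles printed above, this small sketch compares the interquartile range in dollars and in log10 units: the Republican IQR is wider in dollars, while the Democratic IQR is wider on the log scale used for the boxplot.

```r
# First and third quartiles copied from the party summaries above.
quartiles <- data.frame(
  party = c("democrat", "republican"),
  q1 = c(15.00, 27.17),
  q3 = c(75.00, 100.00),
  stringsAsFactors = FALSE
)
quartiles$iqr_dollars <- quartiles$q3 - quartiles$q1               # linear scale
quartiles$iqr_log10 <- log10(quartiles$q3) - log10(quartiles$q1)   # log scale

print(quartiles)
```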

## ma$cand_nm: Bush, Jeb
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       5     100     267    1031    2700    2700 
## -------------------------------------------------------- 
## ma$cand_nm: Carson, Benjamin S.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1      25      50     104     100    2700 
## -------------------------------------------------------- 
## ma$cand_nm: Christie, Christopher J.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      20     250    1000    1215    2700    2700 
## -------------------------------------------------------- 
## ma$cand_nm: Clinton, Hillary Rodham
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.04   19.00   36.30  141.80  100.00 2700.00 
## -------------------------------------------------------- 
## ma$cand_nm: Cruz, Rafael Edward 'Ted'
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   25.00   50.00   79.52  100.00 2700.00 
## -------------------------------------------------------- 
## ma$cand_nm: Fiorina, Carly
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0    25.0   100.0   237.5   200.0  2700.0 
## -------------------------------------------------------- 
## ma$cand_nm: Gilmore, James S III
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2700    2700    2700    2700    2700    2700 
## -------------------------------------------------------- 
## ma$cand_nm: Graham, Lindsey O.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       5     500    1000    1344    2300    2700 
## -------------------------------------------------------- 
## ma$cand_nm: Huckabee, Mike
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0    16.0    50.0   121.4   100.0  2700.0 
## -------------------------------------------------------- 
## ma$cand_nm: Jindal, Bobby
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     250     250     250     250     250     250 
## -------------------------------------------------------- 
## ma$cand_nm: Kasich, John R.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    10.0    50.0   200.0   543.4   500.0  2700.0 
## -------------------------------------------------------- 
## ma$cand_nm: Lessig, Lawrence
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     5.0   100.0   250.0   680.6   500.0  2700.0 
## -------------------------------------------------------- 
## ma$cand_nm: O'Malley, Martin Joseph
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    10.0   100.0   500.0   767.6  1000.0  2700.0 
## -------------------------------------------------------- 
## ma$cand_nm: Pataki, George E.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     250     625    1000    1317    1850    2700 
## -------------------------------------------------------- 
## ma$cand_nm: Paul, Rand
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    25.0    50.0   153.6   100.0  2700.0 
## -------------------------------------------------------- 
## ma$cand_nm: Perry, James R. (Rick)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   250.0   312.5   375.0   375.0   437.5   500.0 
## -------------------------------------------------------- 
## ma$cand_nm: Rubio, Marco
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.05   25.00   75.00  394.70  250.00 2700.00 
## -------------------------------------------------------- 
## ma$cand_nm: Sanders, Bernard
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   15.00   27.00   48.25   50.00 2700.00 
## -------------------------------------------------------- 
## ma$cand_nm: Santorum, Richard J.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   17.55  100.00  508.00  500.00 2700.00 
## -------------------------------------------------------- 
## ma$cand_nm: Trump, Donald J.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.80   28.00   72.02  154.00  150.00 2700.00 
## -------------------------------------------------------- 
## ma$cand_nm: Walker, Scott
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    75.0   250.0   500.0   945.8  1000.0  2700.0 
## -------------------------------------------------------- 
## ma$cand_nm: Webb, James Henry Jr.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   100.0   100.1   250.0   711.8   500.0  2700.0

The visualization above shows that, among candidates with more than one donation, Christopher Christie, Lindsey Graham and George Pataki have the highest median ($1,000), and Jeb Bush has the greatest IQR (interquartile range). Bernard Sanders and Hillary Clinton have the lowest medians, but they also have the most outliers (big donors).

Below, I will look into how the candidates did within their parties.

## # A tibble: 22 x 5
## # Groups:   party [2]
##         party                cand_nm  sum_can  mean_can     n
##         <chr>                  <chr>    <dbl>     <dbl> <int>
##  1 republican          Jindal, Bobby   250.00  250.0000     1
##  2 republican Perry, James R. (Rick)   750.00  375.0000     2
##  3 republican   Gilmore, James S III  2700.00 2700.0000     1
##  4 republican      Pataki, George E.  3950.00 1316.6667     3
##  5 republican   Santorum, Richard J.  7620.10  508.0067    15
##  6 republican         Huckabee, Mike 11048.00  121.4066    91
##  7   democrat  Webb, James Henry Jr. 12100.09  711.7700    17
##  8 republican          Walker, Scott 46345.00  945.8163    49
##  9 republican             Paul, Rand 75241.48  153.5540   490
## 10   democrat       Lessig, Lawrence 88483.86  680.6451   130
## # ... with 12 more rows

In each party, most of the money went to only a few candidates. Hillary Clinton (81%) and Bernard Sanders (18%) together received 99% of the total amount given to the Democratic party. Donald Trump received 41% of the total amount given to the Republican party, and Donald Trump, Marco Rubio, Ted Cruz, John Kasich and Jeb Bush together received 82%. The other 12 Republican candidates accounted for the remaining 18%.

It is clear who the top candidates in each party were in Massachusetts. Below, the analysis will continue to examine the candidates who received at least 9% of the total donations received by their party.

## [1] "Clinton, Hillary Rodham"   "Sanders, Bernard"         
## [3] "Trump, Donald J."          "Rubio, Marco"             
## [5] "Cruz, Rafael Edward 'Ted'"
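The within-party shares and the 9% cut-off can be checked directly against the per-candidate totals printed earlier. A minimal sketch in base R, with the totals copied from that output (Republicans listed in descending order so the top five are the first five entries):

```r
# Per-candidate totals (USD) copied from the sums printed earlier.
dem_totals <- c("Clinton, Hillary Rodham" = 20921571.54, "Sanders, Bernard" = 4603428.95,
                "O'Malley, Martin Joseph" = 206496.39, "Lessig, Lawrence" = 88483.86,
                "Webb, James Henry Jr." = 12100.09)
rep_totals <- c("Trump, Donald J." = 1887235.59, "Rubio, Marco" = 622812.22,
                "Cruz, Rafael Edward 'Ted'" = 447206.14, "Kasich, John R." = 410268.30,
                "Bush, Jeb" = 399839.00, "Carson, Benjamin S." = 269372.60,
                "Christie, Christopher J." = 161570.00, "Graham, Lindsey O." = 147830.00,
                "Fiorina, Carly" = 111371.48, "Paul, Rand" = 75241.48,
                "Walker, Scott" = 46345.00, "Huckabee, Mike" = 11048.00,
                "Santorum, Richard J." = 7620.10, "Pataki, George E." = 3950.00,
                "Gilmore, James S III" = 2700.00, "Perry, James R. (Rick)" = 750.00,
                "Jindal, Bobby" = 250.00)

# Each candidate's share of their own party's total.
shares <- c(dem_totals / sum(dem_totals), rep_totals / sum(rep_totals))

# Candidates with at least 9% of their party's total.
top_candidates <- names(shares)[shares >= 0.09]

# Combined share of the five largest Republican fundraisers (vector is sorted).
top5_rep_share <- sum(rep_totals[1:5]) / sum(rep_totals)

print(top_candidates)
print(round(100 * top5_rep_share, 1))
```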

We saw that women made 56% of the contributions. Further questions are: Do they also account for 56% of the contribution amount? Whom do they donate to: liberal and/or female candidates?

## ma$gender: female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.04   15.00   27.00   99.78   72.00 2700.00 
## -------------------------------------------------------- 
## ma$gender: male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.24   19.00   35.00  131.10  100.00 2700.00

Men donated $131.10 and women $99.78 on average. Although women made more donations, they gave less per donation than men, as the differences in the median ($27 vs. $35), mean and third quartile ($72 vs. $100) show.

## # A tibble: 2 x 3
##   gender  sum_gen      n
##    <chr>    <dbl>  <int>
## 1 female 14934709 149682
## 2   male 15502782 118232

The visualization above shows that the total contribution amounts by gender are very close. This is because, even though women donated less on average, they made more donations.
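The two gender tables are linked by a simple identity: total = average × number of donations. Re-entering the printed totals shows how women can account for about 56% of the donations but only about 49% of the dollars.

```r
# Totals and counts copied from the gender table above.
gender_totals <- data.frame(
  gender = c("female", "male"),
  sum_gen = c(14934709, 15502782),
  n = c(149682L, 118232L),
  stringsAsFactors = FALSE
)
gender_totals$mean_gen <- gender_totals$sum_gen / gender_totals$n
gender_totals$share_of_count <- gender_totals$n / sum(gender_totals$n)
gender_totals$share_of_amount <- gender_totals$sum_gen / sum(gender_totals$sum_gen)

print(gender_totals, digits = 4)
```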

## # A tibble: 10 x 3
## # Groups:   cand_nm [?]
##                      cand_nm gender sum_gen_can
##                        <chr>  <chr>       <dbl>
##  1   Clinton, Hillary Rodham female  11598864.9
##  2   Clinton, Hillary Rodham   male   9322706.6
##  3 Cruz, Rafael Edward 'Ted' female    137480.1
##  4 Cruz, Rafael Edward 'Ted'   male    309726.0
##  5              Rubio, Marco female    178444.7
##  6              Rubio, Marco   male    444367.6
##  7          Sanders, Bernard female   1987548.9
##  8          Sanders, Bernard   male   2615880.1
##  9          Trump, Donald J. female    437974.0
## 10          Trump, Donald J.   male   1449261.5

In Massachusetts, women contributed about 15 million USD to the 2016 presidential campaign. Almost 12 million USD went to Hillary Clinton and approximately 2 million USD went to Bernard Sanders. This is consistent with the assumption that women in Massachusetts donate more to liberal and/or female candidates, although men gave more than women to Bernard Sanders (about 2.6 million USD vs. 2.0 million USD), so the gender gap is specific to Hillary Clinton.

We saw that retirees made the most contributions; now we will analyze the total and average contribution amounts across the top 10 occupations.

## # A tibble: 10 x 4
##    contbr_occupation  sum_occu mean_occu     n
##                <ord>     <dbl>     <dbl> <int>
##  1           RETIRED 4480345.1 108.43830 41317
##  2      NOT EMPLOYED 1417174.5  53.55103 26464
##  3           TEACHER  389587.2  56.29060  6921
##  4          ATTORNEY 1313684.0 212.50146  6182
##  5         PROFESSOR  876504.6 142.56744  6148
##  6         PHYSICIAN  842674.2 160.11290  5263
##  7        CONSULTANT  805573.5 192.12342  4193
##  8 SOFTWARE ENGINEER  361221.3  96.48006  3744
##  9         HOMEMAKER  686431.1 205.39530  3342
## 10          ENGINEER  309927.2  99.68709  3109

Looking at the visualizations above, retirees, people who are not employed and attorneys are the top three in terms of total contribution amount. Attorneys and homemakers are the top two in average contribution amount. People who are not employed contribute the least on average, which is expected.
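The three rankings (by number of donations, by total and by average) all come from the same grouped summary. A sketch of that summary in base R on hypothetical toy rows; the function name `top_occupations` and its `by` argument are illustrative, not taken from the report.

```r
# Hypothetical toy rows; the real input would be the cleaned contributions.
occ <- data.frame(
  contbr_occupation = c("RETIRED", "RETIRED", "TEACHER", "ATTORNEY",
                        "RETIRED", "TEACHER", "HOMEMAKER"),
  contb_receipt_amt = c(50, 100, 25, 500, 10, 25, 1000),
  stringsAsFactors = FALSE
)

# Summarise donations per occupation and keep the k highest by `by`.
top_occupations <- function(df, k = 10, by = c("n", "sum_occu", "mean_occu")) {
  by <- match.arg(by)
  g <- df$contbr_occupation
  a <- df$contb_receipt_amt
  n <- tapply(a, g, length)
  out <- data.frame(
    contbr_occupation = names(n),
    sum_occu = as.numeric(tapply(a, g, sum)),
    mean_occu = as.numeric(tapply(a, g, mean)),
    n = as.integer(n),
    stringsAsFactors = FALSE
  )
  out <- out[order(out[[by]], decreasing = TRUE), ]
  rownames(out) <- NULL
  head(out, k)
}

by_count <- top_occupations(occ, k = 2)
by_mean <- top_occupations(occ, k = 1, by = "mean_occu")
print(by_count)
print(by_mean)
```

Ranking by `n` rewards frequent small donors (retirees here), while ranking by `mean_occu` rewards a few large donations, which is why the occupation rankings above differ.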

Above is a boxplot of the contribution amount distribution among the various occupations. It is difficult to analyze because of all the outliers.

## top_occu_df$contbr_occupation: ATTORNEY
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.04   25.00   50.00  212.60  200.00 2700.00 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: CONSULTANT
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.24   25.00   50.00  191.50  100.00 2700.00 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: ENGINEER
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   25.00   40.50   98.23  100.00 2700.00 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: HOMEMAKER
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    10.0    25.0   202.9   100.0  2700.0 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: NOT EMPLOYED
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.05   13.50   27.00   53.55   50.00 2700.00 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: PHYSICIAN
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.44   25.00   50.00  159.80  100.00 2700.00 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: PROFESSOR
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.45   23.00   50.00  142.20  100.00 2700.00 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: RETIRED
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.5    20.0    35.0   108.1   100.0  2700.0 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: SOFTWARE ENGINEER
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   15.00   35.00   93.86  100.00 2700.00 
## -------------------------------------------------------- 
## top_occu_df$contbr_occupation: TEACHER
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.97   15.00   25.00   56.15   50.00 2700.00

The boxplot above excludes outliers and gives a better picture of the typical donation. The median contributions of homemakers, teachers and people who are not employed are relatively low ($25, $25 and $27 respectively).

Attorneys made the largest contributions on average: their third quartile ($200) is four times their median ($50), they had the most variability, and they had the highest average donation ($212.60).

Bivariate Analysis

Talk about some of the interesting findings you observed in this part of the investigation.

  • The Democratic party received most of the total contributions in Massachusetts (84%).
  • There were 5 Democratic candidates and 17 Republican candidates, so the average amount raised per candidate differs sharply between the parties (about $5.2 million vs. $0.27 million).
  • A few candidates received the majority of contributions in each party.
  • While there are more female donors, men donate more on average.
  • The majority of the contributions from female donors went to the Democratic party and/or the female candidate.
  • Retirees made the largest number of contributions, while engineers and software engineers are among the lowest of the top 10 occupations in number of contributions.
  • Attorneys had the highest average contribution amount and the greatest IQR (interquartile range); people who are not employed had the lowest average contribution amount and one of the smallest IQRs.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Homemakers had the second-highest average contribution amount but were among the lowest in median contribution. This suggests that their distribution is right-skewed, with a few large donations pulling the mean up. One might also suspect that most homemakers are women.
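A tiny hypothetical example of the mean/median gap described above: one large donation among many small ones pulls the mean far above the median, while the median and a trimmed mean barely move.

```r
# Hypothetical donations: mostly small, plus one at the $2,700 limit.
amounts <- c(10, 10, 25, 25, 25, 50, 100, 2700)

c(median = median(amounts),
  mean = mean(amounts),
  trimmed_mean = mean(amounts, trim = 0.125))  # drops the lowest and highest value
```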

What was the strongest relationship you found?

Men donated more than women on average even though there were more women donors.

Multivariate Plots Section

Hillary Clinton raised the most money and had the most donors in Massachusetts. This wasn’t always the case throughout the campaign process. The above two visualizations show that:

Above is the time series trend for the top candidates. Hillary Clinton had steady growth in contribution amount, as did Bernard Sanders until he dropped out and endorsed Hillary Clinton. Ted Cruz had slow but consistent growth in contribution amount, which ended when he suspended his campaign in May 2016. Donald Trump’s contribution amount grew steadily from March 2016 until around September 2016. He was quoted as saying that he wanted to compete in Massachusetts, a predominantly Democratic state, and he even set up a Massachusetts headquarters.
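Cumulative curves like these can be built by parsing the FEC date strings (e.g. "04-MAR-16") and taking a running sum per candidate. A sketch on hypothetical rows; the month lookup and the helper name `parse_fec_date` are assumptions, written out explicitly so the parsing does not depend on the system locale, and the helper assumes all years fall between 2000 and 2099.

```r
# Explicit month lookup so parsing does not depend on the system locale.
fec_months <- c(JAN = 1L, FEB = 2L, MAR = 3L, APR = 4L, MAY = 5L, JUN = 6L,
                JUL = 7L, AUG = 8L, SEP = 9L, OCT = 10L, NOV = 11L, DEC = 12L)

# Parse "DD-MON-YY" strings such as "04-MAR-16"; assumes years 2000 to 2099.
parse_fec_date <- function(x) {
  parts <- do.call(rbind, strsplit(x, "-", fixed = TRUE))
  as.Date(sprintf("20%s-%02d-%s", parts[, 3], fec_months[parts[, 2]], parts[, 1]))
}

# Hypothetical contributions for two candidates.
contribs_ts <- data.frame(
  cand_nm = c("Trump, Donald J.", "Clinton, Hillary Rodham",
              "Trump, Donald J.", "Clinton, Hillary Rodham"),
  contb_receipt_dt = c("06-MAR-16", "04-MAR-16", "04-MAR-16", "01-APR-16"),
  contb_receipt_amt = c(50, 100, 25, 10),
  stringsAsFactors = FALSE
)
contribs_ts$date <- parse_fec_date(contribs_ts$contb_receipt_dt)

# Sort by candidate, then date, and take a running total within each candidate.
contribs_ts <- contribs_ts[order(contribs_ts$cand_nm, contribs_ts$date), ]
contribs_ts$cum_amt <- ave(contribs_ts$contb_receipt_amt, contribs_ts$cand_nm,
                           FUN = cumsum)
print(contribs_ts)
```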

Where in Massachusetts do the contributors reside?

As stated above, Massachusetts is a historically Democratic state. Most of the Republican supporters seem to be concentrated around Boston, the largest city in Massachusetts.
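Mapping donors requires turning `contbr_zip` into a 5-digit ZIP code before joining on latitude and longitude. In the `str()` output, ZIP+4 values such as 19504700 have lost their leading zero (01950-4700, listed with the city NEWBURYPORT). A sketch of the cleanup and join, assuming a lookup table with `zip`, `latitude` and `longitude` columns like the one in the zipcode package; the helper name `zip5` is hypothetical and the two coordinates below are approximate, for illustration only.

```r
# Convert the integer ZIP codes in `contbr_zip` to 5-digit character ZIPs.
zip5 <- function(z) {
  # More than 5 digits means a ZIP+4 that lost its leading zero(s): pad to 9.
  padded <- ifelse(nchar(as.character(z)) > 5,
                   formatC(z, width = 9, flag = "0", format = "d"),
                   formatC(z, width = 5, flag = "0", format = "d"))
  substr(padded, 1, 5)
}

# Values taken from the str() output above.
donors <- data.frame(contbr_zip = c(19504700L, 21392903L, 2420L))
donors$zip <- zip5(donors$contbr_zip)

# Hypothetical lookup table; coordinates are approximate, for illustration only.
zip_lookup <- data.frame(
  zip = c("01950", "02139"),
  latitude = c(42.81, 42.36),
  longitude = c(-70.88, -71.10),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps donors whose ZIP is missing from the lookup (with NA coordinates).
geo <- merge(donors, zip_lookup, by = "zip", all.x = TRUE)
print(geo)
```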

Predictive Modeling

Below, I will apply a logistic regression model to the data in order to predict the party a donor contributed to, using their gender, donation amount and location (latitude, longitude). The steps to be taken are as follows:
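A sketch of these steps on simulated data, since the real `train` and `test` frames are not shown. The simulated coefficients, the column set and the random 90/10 split are all assumptions for illustration. For reference, the model output below implies the report trained on 240,000 rows (239,878 used plus 122 dropped for missing values) and tested on 27,914.

```r
set.seed(2016)
n <- 5000

# Simulated donors; every column and coefficient here is made up for illustration.
sim <- data.frame(
  contb_receipt_amt = round(rlnorm(n, meanlog = log(30), sdlog = 1.2), 2),
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  latitude = runif(n, min = 41.3, max = 42.9)
)
log_odds <- -3 + 1.0 * (sim$gender == "male") + 4e-4 * sim$contb_receipt_amt
sim$party <- factor(ifelse(runif(n) < plogis(log_odds), "republican", "democrat"),
                    levels = c("democrat", "republican"))

# Random 90/10 split instead of splitting by row position.
train_idx <- sample(n, size = 0.9 * n)
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

# With a two-level factor response, glm models P(second level) = P("republican").
model <- glm(party ~ ., family = binomial(link = "logit"), data = train)
p_rep <- predict(model, newdata = test, type = "response")
pred <- factor(ifelse(p_rep > 0.5, "republican", "democrat"),
               levels = c("democrat", "republican"))

conf_mat <- table(model_pred_direction = pred, actual = test$party)
accuracy <- mean(pred == test$party)
print(conf_mat)
```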

## 
## Call:
## glm(formula = party ~ ., family = binomial(link = "logit"), data = train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1990  -0.5264  -0.3468  -0.3227   2.6417  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        3.520e+01  1.206e+00  29.196  < 2e-16 ***
## contb_receipt_amt  3.798e-04  1.544e-05  24.591  < 2e-16 ***
## gendermale         1.000e+00  1.475e-02  67.788  < 2e-16 ***
## latitude          -7.499e-01  2.751e-02 -27.253  < 2e-16 ***
## longitude         -8.896e-02  1.233e-02  -7.217 5.31e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 150246  on 239877  degrees of freedom
## Residual deviance: 143860  on 239873  degrees of freedom
##   (122 observations deleted due to missingness)
## AIC: 143870
## 
## Number of Fisher Scoring iterations: 5

Interpretation of the Results of the Logistic Regression Model

  • The log odds of contributing to the Republican party decrease by 0.75 for a one-unit (one-degree) increase in latitude.
  • The log odds of contributing to the Republican party decrease by 0.09 for a one-unit increase in abs(longitude).
  • The log odds of contributing to the Republican party increase by 0.0004 for each additional dollar contributed.
  • Holding all other variables constant, a male donor is more likely to contribute to the Republican party; the gendermale coefficient of 1.0 corresponds to roughly 2.7 times the odds.
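Log-odds coefficients are easier to read as odds ratios, exp(β). A short sketch using the estimates printed in the model summary:

```r
# Estimates copied from the glm summary above (intercept omitted).
coefs <- c(contb_receipt_amt = 3.798e-04, gendermale = 1.000e+00,
           latitude = -7.499e-01, longitude = -8.896e-02)

odds_ratios <- exp(coefs)  # multiplicative change in odds per one-unit increase
or_per_1000_usd <- exp(1000 * coefs[["contb_receipt_amt"]])  # per extra $1,000

print(round(odds_ratios, 3))
print(round(or_per_1000_usd, 2))
```

Read this way, each extra degree of latitude multiplies the odds of a Republican contribution by about 0.47, being male multiplies them by about 2.7, and an extra $1,000 multiplies them by about 1.46, holding the other variables fixed.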

Assessing the Accuracy of the Predictive Model

##                     
## model_pred_direction democrat republican
##           democrat      26150       1761
##           republican        0          3
## [1] "Accuracy 0.936913376800172"

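An accuracy of 0.94 on the test set looks good, but it mostly reflects class imbalance: predicting “democrat” for every donor would already score 26150/27914 ≈ 0.937, and the model identifies only 3 of the 1,764 Republican donors in the test set. The estimate is also based on a single manual split of the data, so it may not be precise.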
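The comparison with a predict-everyone-as-Democrat baseline can be computed from the printed confusion matrix (rows are predictions, columns are actual parties):

```r
# Confusion matrix from the output above: rows = predicted, columns = actual.
conf_mat <- matrix(c(26150, 0, 1761, 3), nrow = 2,
                   dimnames = list(predicted = c("democrat", "republican"),
                                   actual = c("democrat", "republican")))

accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
# Accuracy of always predicting the majority class ("democrat").
baseline <- sum(conf_mat[, "democrat"]) / sum(conf_mat)
# Share of actual Republican donors that the model labels as Republican.
recall_republican <- conf_mat["republican", "republican"] / sum(conf_mat[, "republican"])

print(round(c(accuracy = accuracy, baseline = baseline,
              recall = recall_republican), 4))
```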

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

  • The contribution amount Hillary Clinton received increased closer to the election. The donors who contributed large sums seemed to support her.
  • Donald Trump received fewer donations closer to the election.

Were there any interesting or surprising findings?

For a while, it seemed as though Bernard Sanders was more popular than Hillary Clinton because he received more donations.

Final Plots

A Few Candidates Received Most of the Donations.

The financial contributions to the presidential campaign in Massachusetts were distributed unevenly. In the Democratic party, 99% of the contributions went to Hillary Clinton (81%) and Bernard Sanders (18%). Massachusetts is a historically Democratic state, and Hillary Clinton also has strong political ties there.

Contribution by Occupation

There is a lot of disparity in the total contribution across occupations. One might assume that attorneys and engineers would be the major contributors, but it turned out that retirees contributed the most money to the 2016 Presidential campaign in Massachusetts.

It is surprising that engineers and software engineers contributed the smallest totals among the top ten occupations, especially since they likely earn above-average salaries. To gain further insight, one would need to know the political leanings of the industry.

Time Series of Top Candidates

Hillary Clinton was far ahead of the other candidates in both the number of contributions and the contribution amount received in the run-up to the election.

Bernard Sanders was on par with Hillary Clinton in the contribution amount received and ahead in the number of contributions received until he pulled out and decided to endorse her.

Reflection

Challenges

The downloaded dataset for the 2016 Presidential campaign for the state of Massachusetts from April 2015 to November 2016 contained 295667 donations. The challenges that I encountered during the analysis are listed below:

  • In the dataset, there were several negative contributions and contributions that were over the $2700 contribution limit set by the FEC. I attributed the negative contributions to refunds and redesignation of funds. As a result, I removed the contributions that were negative and above $2700 from the dataset.
  • I had to add a gender column to the dataset so that I could analyze the relationship between gender and contributions. R’s gender package was used because it encodes gender based on names and dates of birth using historical datasets, which allows it to report the probability that a name was male or female.
  • In order to accurately represent a donor’s geographic location, I had to add latitude and longitude columns to the dataset by using the latitude and longitude information for U.S. ZIP codes from the zipcode package. I then used ggmap to visualize spatial data on top of a static Google Map of Massachusetts.
  • I used a logistic regression model to try to predict a donor’s contributing party based on their gender, donation amount and location (latitude, longitude).

These were all challenges due to the fact that I had to either change the data or utilize packages and models that were new to me.

Successes

  • The success of this project was due to the many packages that R offers and the statistical computations that can be done in R. Some important packages were the dplyr, gender, ggmap, ggplot2 and zipcode packages.
  • The project was a success due to the interesting findings that were revealed as a result of an in-depth analysis of the dataset.

Conclusion

The analysis of the financial contributions to the 2016 Presidential campaign for the state of Massachusetts provided some interesting revelations:

  • Most of the donations went to a few candidates.
  • Massachusetts is mostly a Democratic state.
  • Women seem to donate to liberal and/or female candidates.
  • Retirees are the group that made the largest number of contributions.
  • Engineers and software engineers made among the fewest contributions of the top 10 occupations and are in the bottom 4 of those occupations in average contribution amount, despite likely having above-average salaries.
  • Bernard Sanders received more contributions than Hillary Clinton until he dropped out of the Presidential campaign.

Future Work

This analysis covered only the state of Massachusetts. Analyzing swing states such as Florida, Nevada or North Carolina, or the whole U.S., would likely provide very different and interesting insights.

There was a post-election surge in contributions to groups that pledged to fight Donald Trump’s proposed policies. That would be another dataset worth exploring.