Cincinnati Money in Politics

Sections

  • Packages Required
  • Introduction
  • Data Preparation
  • Analysis and Trends
  • Self-Direct Analysis

1. Introduction

The goal of this exercise is to investigate and provide insights about compiled contributions to politic campaigns from people in the great area of Cincinnati. It contains about 57K of observations taken from 2015 until end of 2018.

The information is organized in tabular form that describes: Last Name, First Name, Address, Employer, Contributor Occupation, Receipt Date, Receipt Amount, Contributor Aggregate YTD, Committee Name, Committee Type and Committee Party Affiliation.

Source: http://asayanalytics.com/cinci_politics

Variables

Variable Type Description Possible values
contributor_last_name Label Last name of contributor NA
contributor_first_name Label First name of contributor NA
contributor_street_1 Label Address Line 1 of contributor NA
contributor_employer Label Name of employer of contributor NA
contributor_occupation Label Occupation of contributor Multiple values
contribution_receipt_date Continuous Date when the contribution was made Years: 2015 to 2018
contribution_receipt_amount Continuous Amount of individual contribution
contributor_aggregate_ytd Continuous Amount aggregated YTD for a contributor
committee_name Label Committee name 638 possible different committees
committee_type Categorical Committee Type House, PAC, Party, Presidential, Senate
committee_party_affiliation Categorical Committee Type DEMOCRATIC PARTY, DEMOCRATIC-FARMER-LABOR, GREEN PARTY, INDEPENDENT, LIBERTARIAN PARTY, REPUBLICAN PARTY

2. Required Packages

The packages required for this markdown are:

3. Data Preparation

Sample of 10% of data after cleaning process

Summary Data

The total of observations is 57,351. A cleaning process has been performed over the original data. Missing values have been normalized assigning “unavailable”, as well different variations of NA values were found which were turned into standard NA. As well the column contributor_employer many forms of “RETIRED”. About six records reported before 2015 were removed from dataset.

Column committee_qualified was created to identify if the committee type is qualified. It was created originally using the content from column committee_type, after splitting qualified content in a new column, the original column committee_type kept only the committee_type PAC or Party.

Summary of total and mean of individual and aggregated contributions by year

Total of individual contributions by year

Total of aggregated contributions by year

Summary of total and mean of individual and aggregated contributions by Committee Type

Summary of total and mean of individual and aggregated contributions by Committee Party Affiliation

5. Direct Analysis

Top Contributors

Donations over 100K

These people have contributed more than $100K since 2015

Distribution by Committee Type

Number of contributions over 100

This table shows people with many contributions. Their names were goggled to try to find more detailed information. Some of them donate as well for other causes like animal rescue and Children’s Hospital among others. A possible theory is that they are volunteers and that could be the reason why they donated money many times even many on the same day but small contributions.

Employer distribution

Xavier Contributions

6. Self-Directed Analysis

Refunds

Analysis per Year

Committees with more refunds

Total Refunds over $1000

* Refund Rate = Total_Refund/Total_Contribution

Refunds per month

January and February present a higher rate of refunds. This could be related people request the refund after tax year is closed, otherwise that money will count on tax report for the fiscal year. Probably people ask for refunds at the beginning of the year once the tax report starts over to avoid to get a lower tax refund.

* Refund Rate = Total_Refund/Total_Contribution

Refunds by Committee Type

* Refund Rate = Total_Refund/Total_Contribution

6.2 Additional Self-Directed Analysis

This section performs an ANOVA test to the dataset of Cincinnati in Politics to observe contributions across Committee Types and Committee Affiliations.

I am interested to see whether contributions are more likely to take place in some committee type or committee affiliation.

The process to understand changes in the behavior of contribution is performed using ANOVA test to analyze if there is a statistically significant difference in contributions, between different committees type or affiliation.

Therefore, we proceed to compare the means of contribution amount among all 5 committee type, then other among all 6 committee affiliations and check if differences are statistically significant.

6.2.1 Committee Type


Null Hypothesis: All 5 committee type means are equal, there is no relationship between committee type and contribution amount.

H0: U1 = U2 = U3 = U4 + U5

Alternative Hypothesis: Not all 5 committee type means are equal, there is a relationship between committee type and contribution amount.

H1: not all U are equal

Linear Model


Call:
lm(formula = contribution_receipt_amount ~ committee_type, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-31174   -424   -235    -69  65626 

Coefficients:
                           Estimate Std. Error t value
(Intercept)                  539.00      15.11  35.662
committee_typePAC            -68.32      45.02  -1.518
committee_typeParty          -64.94      18.82  -3.450
committee_typePresidential  -413.24      19.19 -21.537
committee_typeSenate        -220.37      20.70 -10.645
                                       Pr(>|t|)    
(Intercept)                < 0.0000000000000002 ***
committee_typePAC                      0.129128    
committee_typeParty                    0.000561 ***
committee_typePresidential < 0.0000000000000002 ***
committee_typeSenate       < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1514 on 57346 degrees of freedom
Multiple R-squared:  0.01122,   Adjusted R-squared:  0.01115 
F-statistic: 162.7 on 4 and 57346 DF,  p-value: < 0.00000000000000022
       House          PAC        Party Presidential       Senate 
      539.00       470.67       474.06       125.76       318.63 

Anova

Good, my F value is 162.7 and p-value is very low less than 0.05. The variation of contribution amount among different committee types is larger than the variation of contribution amount within each committee type. We accept the alternative hypothesis H1 that there is a significant relationship between committee type and contribution amount.

                  Df       Sum Sq   Mean Sq F value              Pr(>F)
committee_type     4   1491577409 372894352   162.7 <0.0000000000000002
Residuals      57346 131401601816   2291382                            
                  
committee_type ***
Residuals         
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hold on! ANOVA indicates that not all means are equal. However the categorical variable “commite_type” has more than 2 levels (5 types), and it might be that it’s just one committee type that is not equal to others. ANOVA doesn’t provide which groups are different from others.

POST HOC TEST

To determine which groups are different from others, it is required to conduct a POST HOC TEST or a post hoc pair comparison. There are many post hoc test available for analysis of variance. Here, it is Tukey with the R function TukeyHSD

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = lm.ct)

$committee_type
                          diff       lwr        upr     p adj
PAC-House            -68.32456 -191.1351   54.48599 0.5509514
Party-House          -64.94162 -116.2913  -13.59191 0.0050871
Presidential-House  -413.23877 -465.5765 -360.90103 0.0000000
Senate-House        -220.37146 -276.8392 -163.90367 0.0000000
Party-PAC              3.38294 -116.2826  123.04847 0.9999920
Presidential-PAC    -344.91421 -465.0070 -224.82139 0.0000000
Senate-PAC          -152.04690 -273.9963  -30.09750 0.0060478
Presidential-Party  -348.29715 -392.7567 -303.83765 0.0000000
Senate-Party        -155.42984 -204.6845 -106.17520 0.0000000
Senate-Presidential  192.86731  142.5835  243.15116 0.0000000

Results and Interpretation

From the POST HOC output (looking at “diff” and “p adj” columns), it can be seen which committee types have significant differences in contribution amount from others. For example that:

There is no significant difference in contribution amount between PAC-House and Party-PAC.Not quite a lot of difference between Party-House and Senate-PAC. Finally, there is a significant difference in contribution amount between Presidential-House, Senate-House, Presidential-PAC, Presidential-PAC Presidential-Party, Senate-Party and Senate- Presidential.

6.2.2 Committee Party Affiliation


Null Hypothesis: All 6 committee affiliation means are equal, there is no relationship between committee affiliation and contribution amount.

H0: U1 = U2 = U3 = U4 + U5

Alternative Hypothesis: Not all 6 committee affiliation means are equal, there is a relationship between committee type and contribution amount.

H1: not all U are equal

Linear Model


Call:
lm(formula = contribution_receipt_amount ~ committee_party_affiliation, 
    data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-31311   -236   -214   -139  65861 

Coefficients:
                                                   Estimate Std. Error
(Intercept)                                         238.717      7.633
committee_party_affiliationDEMOCRATIC-FARMER-LABOR  115.225    195.412
committee_party_affiliationGREEN PARTY                8.936    271.761
committee_party_affiliationINDEPENDENT               65.004    172.535
committee_party_affiliationLIBERTARIAN PARTY        -92.048    167.202
committee_party_affiliationREPUBLICAN PARTY         372.624     13.657
                                                   t value
(Intercept)                                         31.273
committee_party_affiliationDEMOCRATIC-FARMER-LABOR   0.590
committee_party_affiliationGREEN PARTY               0.033
committee_party_affiliationINDEPENDENT               0.377
committee_party_affiliationLIBERTARIAN PARTY        -0.551
committee_party_affiliationREPUBLICAN PARTY         27.285
                                                              Pr(>|t|)    
(Intercept)                                        <0.0000000000000002 ***
committee_party_affiliationDEMOCRATIC-FARMER-LABOR               0.555    
committee_party_affiliationGREEN PARTY                           0.974    
committee_party_affiliationINDEPENDENT                           0.706    
committee_party_affiliationLIBERTARIAN PARTY                     0.582    
committee_party_affiliationREPUBLICAN PARTY        <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1513 on 57345 degrees of freedom
Multiple R-squared:  0.01285,   Adjusted R-squared:  0.01276 
F-statistic: 149.3 on 5 and 57345 DF,  p-value: < 0.00000000000000022
       DEMOCRATIC PARTY DEMOCRATIC-FARMER-LABOR             GREEN PARTY 
                 238.72                  353.94                  247.65 
            INDEPENDENT       LIBERTARIAN PARTY        REPUBLICAN PARTY 
                 303.72                  146.67                  611.34 

Anova

Good, my F value is 162.7 and p-value is very low less than 0.05. The variation of contribution amount among different committee affiliations is larger than the variation of contribution amount within each committee affiliation We accept the alternative hypothesis H1 that there is a significant relationship between committee affiliation and contribution amount.

                               Df       Sum Sq   Mean Sq F value
committee_party_affiliation     5   1707214648 341442930   149.3
Residuals                   57345 131185964577   2287662        
                                         Pr(>F)    
committee_party_affiliation <0.0000000000000002 ***
Residuals                                          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hold on! ANOVA indicates that not all means are equal. However the categorical variable “commite_affiliation” has more than 2 levels (6 parties), and it might be that it’s just one committee affiliation that is not equal to others. ANOVA doesn’t provide which groups are different from others.

POST HOC TEST

To determine which groups are different from others, it is required to conduct a POST HOC TEST or a post hoc pair comparison. There are many post hoc test available for analysis of variance. Here, it is Tukey with the R function TukeyHSD

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = lm.ca)

$committee_party_affiliation
                                                 diff         lwr
DEMOCRATIC-FARMER-LABOR-DEMOCRATIC PARTY   115.225350  -441.64211
GREEN PARTY-DEMOCRATIC PARTY                 8.936243  -765.50133
INDEPENDENT-DEMOCRATIC PARTY                65.003796  -426.66876
LIBERTARIAN PARTY-DEMOCRATIC PARTY         -92.048447  -568.52551
REPUBLICAN PARTY-DEMOCRATIC PARTY          372.623701   333.70608
GREEN PARTY-DEMOCRATIC-FARMER-LABOR       -106.289108 -1059.65608
INDEPENDENT-DEMOCRATIC-FARMER-LABOR        -50.221554  -792.44587
LIBERTARIAN PARTY-DEMOCRATIC-FARMER-LABOR -207.273797  -939.52063
REPUBLICAN PARTY-DEMOCRATIC-FARMER-LABOR   257.398351  -299.97909
INDEPENDENT-GREEN PARTY                     56.067553  -860.74710
LIBERTARIAN PARTY-GREEN PARTY             -100.984689 -1009.74077
REPUBLICAN PARTY-GREEN PARTY               363.687459  -411.11690
LIBERTARIAN PARTY-INDEPENDENT             -157.052243  -841.03027
REPUBLICAN PARTY-INDEPENDENT               307.619905  -184.63017
REPUBLICAN PARTY-LIBERTARIAN PARTY         464.672148   -12.40083
                                                upr     p adj
DEMOCRATIC-FARMER-LABOR-DEMOCRATIC PARTY   672.0928 0.9917528
GREEN PARTY-DEMOCRATIC PARTY               783.3738 1.0000000
INDEPENDENT-DEMOCRATIC PARTY               556.6764 0.9990173
LIBERTARIAN PARTY-DEMOCRATIC PARTY         384.4286 0.9940060
REPUBLICAN PARTY-DEMOCRATIC PARTY          411.5413 0.0000000
GREEN PARTY-DEMOCRATIC-FARMER-LABOR        847.0779 0.9995714
INDEPENDENT-DEMOCRATIC-FARMER-LABOR        692.0028 0.9999634
LIBERTARIAN PARTY-DEMOCRATIC-FARMER-LABOR  524.9730 0.9664120
REPUBLICAN PARTY-DEMOCRATIC-FARMER-LABOR   814.7758 0.7763029
INDEPENDENT-GREEN PARTY                    972.8822 0.9999779
LIBERTARIAN PARTY-GREEN PARTY              807.7714 0.9995782
REPUBLICAN PARTY-GREEN PARTY              1138.4918 0.7640104
LIBERTARIAN PARTY-INDEPENDENT              526.9258 0.9867125
REPUBLICAN PARTY-INDEPENDENT               799.8700 0.4781781
REPUBLICAN PARTY-LIBERTARIAN PARTY         941.7451 0.0614002

Results and Interpretation

From the POST HOC output (looking at “diff” and “p adj” columns), it can be seen which committee affiliations have significant differences in contribution amount from others. For example that:

There is a significant difference in contribution amount between REPUBLICAN PARTY-DEMOCRATIC PARTY. A little close to be significant between REPUBLICAN PARTY-LIBERTARIAN PARTY. The other comparisons are not statistically significant different.

6.2.3 Committee Type + Affiliation

Linear Model

The idea here is to understand if having more than one IV can explain better changes in contribution amount.

Now, let’s see a model using Committee Type and Committee Party Affiliation to predict Contribution Amount. Then, we are going to compare if the model combined explains better than the two simple model seen previously.


Call:
lm(formula = contribution_receipt_amount ~ committee_party_affiliation + 
    committee_type, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-31395   -338   -241      1  65759 

Coefficients:
                                                   Estimate Std. Error
(Intercept)                                         410.424     15.887
committee_party_affiliationDEMOCRATIC-FARMER-LABOR    6.345    194.663
committee_party_affiliationGREEN PARTY               18.811    270.544
committee_party_affiliationINDEPENDENT               20.654    171.868
committee_party_affiliationLIBERTARIAN PARTY        -28.445    166.504
committee_party_affiliationREPUBLICAN PARTY         354.046     13.969
committee_typePAC                                  -205.979     45.106
committee_typeParty                                 -69.138     18.742
committee_typePresidential                         -386.026     19.131
committee_typeSenate                               -144.986     20.805
                                                   t value
(Intercept)                                         25.834
committee_party_affiliationDEMOCRATIC-FARMER-LABOR   0.033
committee_party_affiliationGREEN PARTY               0.070
committee_party_affiliationINDEPENDENT               0.120
committee_party_affiliationLIBERTARIAN PARTY        -0.171
committee_party_affiliationREPUBLICAN PARTY         25.345
committee_typePAC                                   -4.567
committee_typeParty                                 -3.689
committee_typePresidential                         -20.179
committee_typeSenate                                -6.969
                                                               Pr(>|t|)
(Intercept)                                        < 0.0000000000000002
committee_party_affiliationDEMOCRATIC-FARMER-LABOR             0.973996
committee_party_affiliationGREEN PARTY                         0.944568
committee_party_affiliationINDEPENDENT                         0.904347
committee_party_affiliationLIBERTARIAN PARTY                   0.864353
committee_party_affiliationREPUBLICAN PARTY        < 0.0000000000000002
committee_typePAC                                      0.00000496845602
committee_typeParty                                            0.000225
committee_typePresidential                         < 0.0000000000000002
committee_typeSenate                                   0.00000000000323
                                                      
(Intercept)                                        ***
committee_party_affiliationDEMOCRATIC-FARMER-LABOR    
committee_party_affiliationGREEN PARTY                
committee_party_affiliationINDEPENDENT                
committee_party_affiliationLIBERTARIAN PARTY          
committee_party_affiliationREPUBLICAN PARTY        ***
committee_typePAC                                  ***
committee_typeParty                                ***
committee_typePresidential                         ***
committee_typeSenate                               ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1505 on 57341 degrees of freedom
Multiple R-squared:  0.0222,    Adjusted R-squared:  0.02205 
F-statistic: 144.6 on 9 and 57341 DF,  p-value: < 0.00000000000000022

Anova

                               Df       Sum Sq   Mean Sq F value
committee_party_affiliation     5   1707214648 341442930   150.7
committee_type                  4   1242816217 310704054   137.1
Residuals                   57341 129943148361   2266147        
                                         Pr(>F)    
committee_party_affiliation <0.0000000000000002 ***
committee_type              <0.0000000000000002 ***
Residuals                                          
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing against simple model of committee type:

Analysis of Variance Table

Model 1: contribution_receipt_amount ~ committee_type
Model 2: contribution_receipt_amount ~ committee_party_affiliation + committee_type
  Res.Df          RSS Df  Sum of Sq      F                Pr(>F)    
1  57346 131401601816                                               
2  57341 129943148361  5 1458453456 128.72 < 0.00000000000000022 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing against simple model of committee party affiliation:

Analysis of Variance Table

Model 1: contribution_receipt_amount ~ committee_party_affiliation
Model 2: contribution_receipt_amount ~ committee_party_affiliation + committee_type
  Res.Df          RSS Df  Sum of Sq      F                Pr(>F)    
1  57345 131185964577                                               
2  57341 129943148361  4 1242816217 137.11 < 0.00000000000000022 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results shows that model with two IVs did indeed provide a significantly better fit to the data compared to model

Results and Interpretation

Model of committee type + committee party affiliation fits better than the simple linear models to explain changes in contribution amount.

From the perspective of committee affiliation, Republican Party is significant and all committee types are significant.

Reinaldo Quintero

2019-10-22