Cincinnati Money in Politics
Sections
- Packages Required
- Introduction
- Data Preparation
- Analysis and Trends
- Self-Direct Analysis
1. Introduction
The goal of this exercise is to investigate and provide insights about compiled contributions to politic campaigns from people in the great area of Cincinnati. It contains about 57K of observations taken from 2015 until end of 2018.
The information is organized in tabular form that describes: Last Name, First Name, Address, Employer, Contributor Occupation, Receipt Date, Receipt Amount, Contributor Aggregate YTD, Committee Name, Committee Type and Committee Party Affiliation.
Source: http://asayanalytics.com/cinci_politics
Variables
| Variable | Type | Description | Possible values |
|---|---|---|---|
| contributor_last_name | Label | Last name of contributor | NA |
| contributor_first_name | Label | First name of contributor | NA |
| contributor_street_1 | Label | Address Line 1 of contributor | NA |
| contributor_employer | Label | Name of employer of contributor | NA |
| contributor_occupation | Label | Occupation of contributor | Multiple values |
| contribution_receipt_date | Continuous | Date when the contribution was made | Years: 2015 to 2018 |
| contribution_receipt_amount | Continuous | Amount of individual contribution | |
| contributor_aggregate_ytd | Continuous | Amount aggregated YTD for a contributor | |
| committee_name | Label | Committee name | 638 possible different committees |
| committee_type | Categorical | Committee Type | House, PAC, Party, Presidential, Senate |
| committee_party_affiliation | Categorical | Committee Type | DEMOCRATIC PARTY, DEMOCRATIC-FARMER-LABOR, GREEN PARTY, INDEPENDENT, LIBERTARIAN PARTY, REPUBLICAN PARTY |
2. Required Packages
The packages required for this markdown are:
3. Data Preparation
Sample of 10% of data after cleaning process
Summary Data
The total of observations is 57,351. A cleaning process has been performed over the original data. Missing values have been normalized assigning “unavailable”, as well different variations of NA values were found which were turned into standard NA. As well the column contributor_employer many forms of “RETIRED”. About six records reported before 2015 were removed from dataset.
Column committee_qualified was created to identify if the committee type is qualified. It was created originally using the content from column committee_type, after splitting qualified content in a new column, the original column committee_type kept only the committee_type PAC or Party.
Summary of total and mean of individual and aggregated contributions by year
Total of individual contributions by year
Total of aggregated contributions by year
Summary of total and mean of individual and aggregated contributions by Committee Type
Summary of total and mean of individual and aggregated contributions by Committee Party Affiliation
4. Simple Analysis & Trends
Amount contributed behavior over time from 2015 to present
Money raised last two election cycles
Association amount contributed and committee type
5. Direct Analysis
Top Contributors
Donations over 100K
These people have contributed more than $100K since 2015
Distribution by Committee Type
Number of contributions over 100
This table shows people with many contributions. Their names were goggled to try to find more detailed information. Some of them donate as well for other causes like animal rescue and Children’s Hospital among others. A possible theory is that they are volunteers and that could be the reason why they donated money many times even many on the same day but small contributions.
Employer distribution
Xavier Contributions
6. Self-Directed Analysis
Refunds
Analysis per Year
Committees with more refunds
Total Refunds over $1000
* Refund Rate = Total_Refund/Total_ContributionRefunds per month
January and February present a higher rate of refunds. This could be related people request the refund after tax year is closed, otherwise that money will count on tax report for the fiscal year. Probably people ask for refunds at the beginning of the year once the tax report starts over to avoid to get a lower tax refund.
* Refund Rate = Total_Refund/Total_ContributionRefunds by Committee Type
* Refund Rate = Total_Refund/Total_Contribution6.2 Additional Self-Directed Analysis
This section performs an ANOVA test to the dataset of Cincinnati in Politics to observe contributions across Committee Types and Committee Affiliations.
I am interested to see whether contributions are more likely to take place in some committee type or committee affiliation.
The process to understand changes in the behavior of contribution is performed using ANOVA test to analyze if there is a statistically significant difference in contributions, between different committees type or affiliation.
Therefore, we proceed to compare the means of contribution amount among all 5 committee type, then other among all 6 committee affiliations and check if differences are statistically significant.
6.2.1 Committee Type
Null Hypothesis: All 5 committee type means are equal, there is no relationship between committee type and contribution amount.
H0: U1 = U2 = U3 = U4 + U5
Alternative Hypothesis: Not all 5 committee type means are equal, there is a relationship between committee type and contribution amount.
H1: not all U are equal
Linear Model
Call:
lm(formula = contribution_receipt_amount ~ committee_type, data = df)
Residuals:
Min 1Q Median 3Q Max
-31174 -424 -235 -69 65626
Coefficients:
Estimate Std. Error t value
(Intercept) 539.00 15.11 35.662
committee_typePAC -68.32 45.02 -1.518
committee_typeParty -64.94 18.82 -3.450
committee_typePresidential -413.24 19.19 -21.537
committee_typeSenate -220.37 20.70 -10.645
Pr(>|t|)
(Intercept) < 0.0000000000000002 ***
committee_typePAC 0.129128
committee_typeParty 0.000561 ***
committee_typePresidential < 0.0000000000000002 ***
committee_typeSenate < 0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1514 on 57346 degrees of freedom
Multiple R-squared: 0.01122, Adjusted R-squared: 0.01115
F-statistic: 162.7 on 4 and 57346 DF, p-value: < 0.00000000000000022
House PAC Party Presidential Senate
539.00 470.67 474.06 125.76 318.63
Anova
Good, my F value is 162.7 and p-value is very low less than 0.05. The variation of contribution amount among different committee types is larger than the variation of contribution amount within each committee type. We accept the alternative hypothesis H1 that there is a significant relationship between committee type and contribution amount.
Df Sum Sq Mean Sq F value Pr(>F)
committee_type 4 1491577409 372894352 162.7 <0.0000000000000002
Residuals 57346 131401601816 2291382
committee_type ***
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hold on! ANOVA indicates that not all means are equal. However the categorical variable “commite_type” has more than 2 levels (5 types), and it might be that it’s just one committee type that is not equal to others. ANOVA doesn’t provide which groups are different from others.
POST HOC TEST
To determine which groups are different from others, it is required to conduct a POST HOC TEST or a post hoc pair comparison. There are many post hoc test available for analysis of variance. Here, it is Tukey with the R function TukeyHSD
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = lm.ct)
$committee_type
diff lwr upr p adj
PAC-House -68.32456 -191.1351 54.48599 0.5509514
Party-House -64.94162 -116.2913 -13.59191 0.0050871
Presidential-House -413.23877 -465.5765 -360.90103 0.0000000
Senate-House -220.37146 -276.8392 -163.90367 0.0000000
Party-PAC 3.38294 -116.2826 123.04847 0.9999920
Presidential-PAC -344.91421 -465.0070 -224.82139 0.0000000
Senate-PAC -152.04690 -273.9963 -30.09750 0.0060478
Presidential-Party -348.29715 -392.7567 -303.83765 0.0000000
Senate-Party -155.42984 -204.6845 -106.17520 0.0000000
Senate-Presidential 192.86731 142.5835 243.15116 0.0000000
Results and Interpretation
From the POST HOC output (looking at “diff” and “p adj” columns), it can be seen which committee types have significant differences in contribution amount from others. For example that:
There is no significant difference in contribution amount between PAC-House and Party-PAC.Not quite a lot of difference between Party-House and Senate-PAC. Finally, there is a significant difference in contribution amount between Presidential-House, Senate-House, Presidential-PAC, Presidential-PAC Presidential-Party, Senate-Party and Senate- Presidential.
6.2.2 Committee Party Affiliation
Null Hypothesis: All 6 committee affiliation means are equal, there is no relationship between committee affiliation and contribution amount.
H0: U1 = U2 = U3 = U4 + U5
Alternative Hypothesis: Not all 6 committee affiliation means are equal, there is a relationship between committee type and contribution amount.
H1: not all U are equal
Linear Model
Call:
lm(formula = contribution_receipt_amount ~ committee_party_affiliation,
data = df)
Residuals:
Min 1Q Median 3Q Max
-31311 -236 -214 -139 65861
Coefficients:
Estimate Std. Error
(Intercept) 238.717 7.633
committee_party_affiliationDEMOCRATIC-FARMER-LABOR 115.225 195.412
committee_party_affiliationGREEN PARTY 8.936 271.761
committee_party_affiliationINDEPENDENT 65.004 172.535
committee_party_affiliationLIBERTARIAN PARTY -92.048 167.202
committee_party_affiliationREPUBLICAN PARTY 372.624 13.657
t value
(Intercept) 31.273
committee_party_affiliationDEMOCRATIC-FARMER-LABOR 0.590
committee_party_affiliationGREEN PARTY 0.033
committee_party_affiliationINDEPENDENT 0.377
committee_party_affiliationLIBERTARIAN PARTY -0.551
committee_party_affiliationREPUBLICAN PARTY 27.285
Pr(>|t|)
(Intercept) <0.0000000000000002 ***
committee_party_affiliationDEMOCRATIC-FARMER-LABOR 0.555
committee_party_affiliationGREEN PARTY 0.974
committee_party_affiliationINDEPENDENT 0.706
committee_party_affiliationLIBERTARIAN PARTY 0.582
committee_party_affiliationREPUBLICAN PARTY <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1513 on 57345 degrees of freedom
Multiple R-squared: 0.01285, Adjusted R-squared: 0.01276
F-statistic: 149.3 on 5 and 57345 DF, p-value: < 0.00000000000000022
DEMOCRATIC PARTY DEMOCRATIC-FARMER-LABOR GREEN PARTY
238.72 353.94 247.65
INDEPENDENT LIBERTARIAN PARTY REPUBLICAN PARTY
303.72 146.67 611.34
Anova
Good, my F value is 162.7 and p-value is very low less than 0.05. The variation of contribution amount among different committee affiliations is larger than the variation of contribution amount within each committee affiliation We accept the alternative hypothesis H1 that there is a significant relationship between committee affiliation and contribution amount.
Df Sum Sq Mean Sq F value
committee_party_affiliation 5 1707214648 341442930 149.3
Residuals 57345 131185964577 2287662
Pr(>F)
committee_party_affiliation <0.0000000000000002 ***
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hold on! ANOVA indicates that not all means are equal. However the categorical variable “commite_affiliation” has more than 2 levels (6 parties), and it might be that it’s just one committee affiliation that is not equal to others. ANOVA doesn’t provide which groups are different from others.
POST HOC TEST
To determine which groups are different from others, it is required to conduct a POST HOC TEST or a post hoc pair comparison. There are many post hoc test available for analysis of variance. Here, it is Tukey with the R function TukeyHSD
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = lm.ca)
$committee_party_affiliation
diff lwr
DEMOCRATIC-FARMER-LABOR-DEMOCRATIC PARTY 115.225350 -441.64211
GREEN PARTY-DEMOCRATIC PARTY 8.936243 -765.50133
INDEPENDENT-DEMOCRATIC PARTY 65.003796 -426.66876
LIBERTARIAN PARTY-DEMOCRATIC PARTY -92.048447 -568.52551
REPUBLICAN PARTY-DEMOCRATIC PARTY 372.623701 333.70608
GREEN PARTY-DEMOCRATIC-FARMER-LABOR -106.289108 -1059.65608
INDEPENDENT-DEMOCRATIC-FARMER-LABOR -50.221554 -792.44587
LIBERTARIAN PARTY-DEMOCRATIC-FARMER-LABOR -207.273797 -939.52063
REPUBLICAN PARTY-DEMOCRATIC-FARMER-LABOR 257.398351 -299.97909
INDEPENDENT-GREEN PARTY 56.067553 -860.74710
LIBERTARIAN PARTY-GREEN PARTY -100.984689 -1009.74077
REPUBLICAN PARTY-GREEN PARTY 363.687459 -411.11690
LIBERTARIAN PARTY-INDEPENDENT -157.052243 -841.03027
REPUBLICAN PARTY-INDEPENDENT 307.619905 -184.63017
REPUBLICAN PARTY-LIBERTARIAN PARTY 464.672148 -12.40083
upr p adj
DEMOCRATIC-FARMER-LABOR-DEMOCRATIC PARTY 672.0928 0.9917528
GREEN PARTY-DEMOCRATIC PARTY 783.3738 1.0000000
INDEPENDENT-DEMOCRATIC PARTY 556.6764 0.9990173
LIBERTARIAN PARTY-DEMOCRATIC PARTY 384.4286 0.9940060
REPUBLICAN PARTY-DEMOCRATIC PARTY 411.5413 0.0000000
GREEN PARTY-DEMOCRATIC-FARMER-LABOR 847.0779 0.9995714
INDEPENDENT-DEMOCRATIC-FARMER-LABOR 692.0028 0.9999634
LIBERTARIAN PARTY-DEMOCRATIC-FARMER-LABOR 524.9730 0.9664120
REPUBLICAN PARTY-DEMOCRATIC-FARMER-LABOR 814.7758 0.7763029
INDEPENDENT-GREEN PARTY 972.8822 0.9999779
LIBERTARIAN PARTY-GREEN PARTY 807.7714 0.9995782
REPUBLICAN PARTY-GREEN PARTY 1138.4918 0.7640104
LIBERTARIAN PARTY-INDEPENDENT 526.9258 0.9867125
REPUBLICAN PARTY-INDEPENDENT 799.8700 0.4781781
REPUBLICAN PARTY-LIBERTARIAN PARTY 941.7451 0.0614002
Results and Interpretation
From the POST HOC output (looking at “diff” and “p adj” columns), it can be seen which committee affiliations have significant differences in contribution amount from others. For example that:
There is a significant difference in contribution amount between REPUBLICAN PARTY-DEMOCRATIC PARTY. A little close to be significant between REPUBLICAN PARTY-LIBERTARIAN PARTY. The other comparisons are not statistically significant different.
6.2.3 Committee Type + Affiliation
Linear Model
The idea here is to understand if having more than one IV can explain better changes in contribution amount.
Now, let’s see a model using Committee Type and Committee Party Affiliation to predict Contribution Amount. Then, we are going to compare if the model combined explains better than the two simple model seen previously.
Call:
lm(formula = contribution_receipt_amount ~ committee_party_affiliation +
committee_type, data = df)
Residuals:
Min 1Q Median 3Q Max
-31395 -338 -241 1 65759
Coefficients:
Estimate Std. Error
(Intercept) 410.424 15.887
committee_party_affiliationDEMOCRATIC-FARMER-LABOR 6.345 194.663
committee_party_affiliationGREEN PARTY 18.811 270.544
committee_party_affiliationINDEPENDENT 20.654 171.868
committee_party_affiliationLIBERTARIAN PARTY -28.445 166.504
committee_party_affiliationREPUBLICAN PARTY 354.046 13.969
committee_typePAC -205.979 45.106
committee_typeParty -69.138 18.742
committee_typePresidential -386.026 19.131
committee_typeSenate -144.986 20.805
t value
(Intercept) 25.834
committee_party_affiliationDEMOCRATIC-FARMER-LABOR 0.033
committee_party_affiliationGREEN PARTY 0.070
committee_party_affiliationINDEPENDENT 0.120
committee_party_affiliationLIBERTARIAN PARTY -0.171
committee_party_affiliationREPUBLICAN PARTY 25.345
committee_typePAC -4.567
committee_typeParty -3.689
committee_typePresidential -20.179
committee_typeSenate -6.969
Pr(>|t|)
(Intercept) < 0.0000000000000002
committee_party_affiliationDEMOCRATIC-FARMER-LABOR 0.973996
committee_party_affiliationGREEN PARTY 0.944568
committee_party_affiliationINDEPENDENT 0.904347
committee_party_affiliationLIBERTARIAN PARTY 0.864353
committee_party_affiliationREPUBLICAN PARTY < 0.0000000000000002
committee_typePAC 0.00000496845602
committee_typeParty 0.000225
committee_typePresidential < 0.0000000000000002
committee_typeSenate 0.00000000000323
(Intercept) ***
committee_party_affiliationDEMOCRATIC-FARMER-LABOR
committee_party_affiliationGREEN PARTY
committee_party_affiliationINDEPENDENT
committee_party_affiliationLIBERTARIAN PARTY
committee_party_affiliationREPUBLICAN PARTY ***
committee_typePAC ***
committee_typeParty ***
committee_typePresidential ***
committee_typeSenate ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1505 on 57341 degrees of freedom
Multiple R-squared: 0.0222, Adjusted R-squared: 0.02205
F-statistic: 144.6 on 9 and 57341 DF, p-value: < 0.00000000000000022
Anova
Df Sum Sq Mean Sq F value
committee_party_affiliation 5 1707214648 341442930 150.7
committee_type 4 1242816217 310704054 137.1
Residuals 57341 129943148361 2266147
Pr(>F)
committee_party_affiliation <0.0000000000000002 ***
committee_type <0.0000000000000002 ***
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparing against simple model of committee type:
Analysis of Variance Table
Model 1: contribution_receipt_amount ~ committee_type
Model 2: contribution_receipt_amount ~ committee_party_affiliation + committee_type
Res.Df RSS Df Sum of Sq F Pr(>F)
1 57346 131401601816
2 57341 129943148361 5 1458453456 128.72 < 0.00000000000000022 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparing against simple model of committee party affiliation:
Analysis of Variance Table
Model 1: contribution_receipt_amount ~ committee_party_affiliation
Model 2: contribution_receipt_amount ~ committee_party_affiliation + committee_type
Res.Df RSS Df Sum of Sq F Pr(>F)
1 57345 131185964577
2 57341 129943148361 4 1242816217 137.11 < 0.00000000000000022 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results shows that model with two IVs did indeed provide a significantly better fit to the data compared to model
Results and Interpretation
Model of committee type + committee party affiliation fits better than the simple linear models to explain changes in contribution amount.
From the perspective of committee affiliation, Republican Party is significant and all committee types are significant.