For this analysis, Federal Election Commission data on the shifting political climate of the city of Cincinnati, from 2015 to the present day, will be examined. The following is a brief description of each column in the data set and its practical meaning. These column descriptions were found at https://canvas.xavier.edu/courses/56050/assignments/443782.
contributor_last_name - Contributor Last Name
contributor_first_name - Contributor First Name
contributor_street_1 - Contributor Street Address
contributor_employer - Contributor Employer
contributor_occupation - Contributor Occupation
contribution_receipt_date - Date of Contribution
contribution_receipt_amount - Contribution Amount (Positive numbers reflect a donation, negative values reflect a refund of a previous donation)
contributor_aggregate_ytd - Total amount of contributions made year-to-date by this individual contributor
committee_name - Name of the political committee receiving the contribution
committee_type - Presidential, senate, house, party or political action committee
committee_party_affiliation - Party affiliation of the committee
By analyzing the types and amounts of contributions in the city of Cincinnati, we can make inferences regarding this aspect of the political climate in Cincinnati, which can help to inform political decisions such as creating voting districts.
The following are the packages intended for use for this analysis:
dplyr: supports drawing conclusions from the data by providing SQL-like dataset manipulation
tidyverse: provides ggplot2, which can be used to create visualizations and graphs
rmarkdown and knitr: allow for reporting
DT and data.table: allow for interactive data table visualization
The csv file with the data has been imported from the web host source OneDrive and given the name ‘cinci_politics’.
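As a minimal sketch of the import step, the snippet below reads a CSV into a data frame with text columns stored as factors. It uses base R only so that it runs without extra packages. The temporary file and its three columns are a hypothetical stand-in for the real download, whose location is not reproduced here.

```r
# Hypothetical stand-in for the downloaded file: a tiny CSV written to a
# temporary path so that this sketch runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "contributor_last_name,contribution_receipt_date,contribution_receipt_amount",
  "LIRA,14/10/2016,10.00",
  "WILMES,27/09/2016,25.00"
), csv_path)

# stringsAsFactors = TRUE stores text columns (including the date) as factors;
# numeric columns are read as doubles.
cinci_politics <- read.csv(csv_path, stringsAsFactors = TRUE)

# Base R overview of column names and types
str(cinci_politics)
```

With the real file, `csv_path` would simply point at the downloaded CSV.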
After importing the csv file, the first step before any further analysis can be done is to perform any data cleaning/wrangling tasks that apply to the entire set.
## Rows: 57,375
## Columns: 11
## $ contributor_last_name <fct> LIRA, WILMES, JAPIKSE, HILL, DAVIS, KUE...
## $ contributor_first_name <fct> MONICA, JOHN, CORNELIS, THOMAS, MICHAEL...
## $ contributor_street_1 <fct> 3230 LONGMEADOW LN, 4215 DELANEY ST, 25...
## $ contributor_employer <fct> N/A, SELF-EMPLOYED, RETIRED, ACOSTA SAL...
## $ contributor_occupation <fct> RETIRED, PAINTER, RETIRED, CHEF, ATTY, ...
## $ contribution_receipt_date <fct> 14/10/2016, 27/09/2016, 03/11/2016, 13/...
## $ contribution_receipt_amount <dbl> 10.00, 25.00, 40.00, 10.00, 500.00, 50....
## $ contributor_aggregate_ytd <dbl> 626.45, 265.00, 1055.00, 673.44, 1500.0...
## $ committee_name <fct> "HILLARY FOR AMERICA", "HILLARY FOR AME...
## $ committee_type <fct> Presidential, Presidential, Party - Qua...
## $ committee_party_affiliation <fct> DEMOCRATIC PARTY, DEMOCRATIC PARTY, REP...
This gave a list of column names, as well as the data type of each variable. ‘Contribution Receipt Amount’ and ‘Contributor Aggregate YTD’ are doubles containing real numbers, while all other columns are factors, which represent categorical variables.
## [1] 32
According to our code, only 32 out of 57,375 rows contain missing (NA) values, which is well under 1% of the total data.
## contributor_last_name contributor_first_name
## 0 0
## contributor_street_1 contributor_employer
## 0 31
## contributor_occupation contribution_receipt_date
## 1 0
## contribution_receipt_amount contributor_aggregate_ytd
## 0 0
## committee_name committee_type
## 0 0
## committee_party_affiliation
## 0
The table above shows which columns are missing data. All but one of the missing values are in the Contributor Employer column, while the remaining one is in Contributor Occupation.
Because there are so few missing values, and the values that are missing are all categorical data (we can’t replace missing values with a median value), I have decided to delete the rows with missing values. Any other issues found while conducting the analysis will be solved below.
The following is a table of summary statistics for the two continuous variables in the data set: the Contribution Receipt Amount and the Contributor Aggregate YTD (year-to-date total of contributions).
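The missing-value check and removal described above can be sketched in base R as follows. The `toy` data frame is a made-up illustration, not the actual data set, and the functions shown are one reasonable way to do this rather than necessarily the code behind the report.

```r
# Toy data (hypothetical values) with one NA in each categorical column
toy <- data.frame(
  contributor_employer        = c("RETIRED", NA, "P&G", "SELF-EMPLOYED"),
  contributor_occupation      = c("RETIRED", "CHEF", NA, "PAINTER"),
  contribution_receipt_amount = c(10, 25, 40, 500),
  stringsAsFactors = TRUE
)

# Number of rows that contain at least one NA
rows_with_na <- sum(!complete.cases(toy))

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))

# Keep only the rows with no missing values
toy_clean <- na.omit(toy)

rows_with_na
na_per_column
nrow(toy_clean)
```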
## contribution_receipt_amount_mean contribution_receipt_amount_sd
## 1 354.855 1522.395
## contribution_receipt_amount_max contribution_receipt_amount_min
## 1 66100 -30700
## contributor_aggregate_ytd_mean contributor_aggregate_ytd_sd
## 1 908.6691 2093.281
## contributor_aggregate_ytd_max contributor_aggregate_ytd_min n
## 1 1e+05 -2700 57343
The mean of the Contribution Receipt Amount ($354.86) shows the average size of a single contribution, while the mean Contributor Aggregate YTD of $908.67, which is more than twice as large, suggests that the average contributor has given before during the same year. However, the standard deviation of the Contributor Aggregate YTD is high ($2,093), which indicates that there may be some outliers whose high levels of past donations have led to a high Contributor Aggregate YTD. The min and max of both variables point to some of these outliers, with very high maximums ($66,100 and $100,000) and negative minimums (-$30,700 and -$2,700).
The table below provides a way to interact with the data further.
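For reference, the same four statistics (mean, standard deviation, max, min) plus the row count can be computed for both numeric columns with a small base R sketch. The `toy` values and the helper name `summarise_column` are hypothetical, chosen only for illustration.

```r
# Toy data (hypothetical values); a negative amount represents a refund
toy <- data.frame(
  contribution_receipt_amount = c(10, 25, -25, 500),
  contributor_aggregate_ytd   = c(10, 265, 240, 1500)
)

# Hypothetical helper: the four statistics for one numeric vector
summarise_column <- function(x) {
  c(mean = mean(x), sd = sd(x), max = max(x), min = min(x))
}

# One column of results per variable, one row per statistic
stats <- sapply(toy, summarise_column)
stats

# Number of observations
n <- nrow(toy)
n
```

Comparing `sd` with `mean`, and `max` with `min`, in this layout makes a heavily skewed column easy to spot.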
“Which employers are most heavily represented in the data?”
The first thing we want to do is get a list of the most popular employers by count. Based on the results given below, ‘retired’ is the most common employment status/employer.
##
## RETIRED NOT EMPLOYED
## 9189 8155
## N/A SELF-EMPLOYED
## 6671 3414
## NONE
## 2753 1399
## UNIVERSITY OF CINCINNATI SELF EMPLOYED
## 1375 1017
## INFORMATION REQUESTED SELF
## 834 686
## HOMEMAKER CINCINNATI CHILDREN'S HOSPITAL
## 482 400
## INFORMATION REQUESTED PER BEST EFFORTS PROCTER & GAMBLE
## 256 202
## XAVIER UNIVERSITY P&G
## 150 141
## PROCTOR & GAMBLE TRIHEALTH
## 134 134
## TSC THE SALVATION ARMY
## 128 117
## GE AVIATION PORTER WRIGHT MORRIS & ARTHUR LLP
## 115 106
## HEBREW UNION COLLEGE STATE OF OHIO
## 105 105
## TOTC FROST BROWN TODD LLC
## 93 92
## GE CINCINNATI SYMPHONY ORCHESTRA
## 89 88
## UC HEALTH AMERICAN FINANCIAL GROUP
## 88 87
## CINCINNATI STATE COLLEGE CITY OF CINCINNATI
## 84 84
## CINCINNATI CHILDRENS HOSPITAL FERNO
## 80 80
## EHMLI INC HEBREW UNION COLLEGE-JEWISH INSTITUTE
## 79 78
## WESTERN & SOUTHERN FINANCIAL GROUP DEPT OF VETERANS AFFAIRS
## 74 73
## CINCINNATI CHILDREN'S HOSPITAL MEDICAL GREAT OAKS INSTITUTE OF TECHNOLOGY AND
## 72 72
## MILFORD SCHOOL DISTRICT MIAMI UNIVERSITY
## 71 70
## THE IMORENO GROUP, PLC CINCINNATI PUBLIC SCHOOLS
## 69 68
## THE CHRIST HOSPITAL FROST BROWN TODD
## 68 65
## MASTERCARD NULL
## 64 64
## UPTOWN ARTS EMPLOYED
## 61 60
## IRS CINCINNATI CHILDRENS
## 60 59
## CINCINNATI PUBLIC LIBRARY DUKE ENERGY
## 59 58
## GENERAL ELECTRIC UCMC
## 58 56
## PUBLIC LIBRARY OF CINCINNATI AND HAMIL MERCY HEALTH
## 55 54
## TOWNE PROPERTIES CINCINNATI CHILDREN'S
## 54 53
## PE SYSTEMS UNIVERSITY OF CINCINNATI COLLEGE OF ME
## 53 53
## PAR EXCELLENCE SYSTEMS NORTHERN KENTUCKY UNIVERSITY
## 52 51
## CHAVEZ PROPERTIES EBERLY MCMAHON COPETAS, LLC
## 50 50
## HAMILTON COUNTY ERNST AND YOUNG LLP
## 49 47
## SUN CHEMICAL GOVERNMENT STRATEGIES GROUP
## 47 46
## U.S. ENVIRONMENTAL PROTECTION AGENCY CCHMC
## 46 45
## EUGENE J VANLEEUWEN, MD, INC. LANCE S COX CO LPA
## 45 44
## THE KROGER CO. AFG
## 44 43
## KEATING MUETHING & KLEKAMP LINDHORST & DREIDAME
## 43 43
## ROBERT HALF UC
## 43 43
## NORTH AMERICAN PROPERTIES NOT-EMPLOYED
## 42 42
## MURDOCK ORTHODONTICS VILLAGE OF WOODLAWN
## 41 40
## SMP ARTSKONNECT
## 39 37
## EXL SERVICES INC FELTON WILLIS, LLC
## 37 37
## LIBRARY OF CONGRESS PORTER WRIGHT
## 37 37
## RIVER TRADING COMPANY WORKFLEX SOLUTIONS
## 37 37
## JOHNSON & JOHNSON KRELLER
## 36 36
## SURVEY ANALYTICS GATEWAY DISTRIBUTION INC.
## 36 35
## I-WIRELESS LLC LIGHTHOUSE YOUTH SERVICES
## 35 35
## BLANK ROME LLP CINCINNATI WALDORF SCHOOL
## 34 34
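A frequency listing like the one above can be produced by tabulating the employer column and sorting the counts in descending order. The sketch below uses a short made-up vector rather than the real column, and is not necessarily the exact call used for the output above.

```r
# Toy employer column (hypothetical values)
employer <- factor(c("RETIRED", "P&G", "RETIRED", "SELF",
                     "SELF-EMPLOYED", "RETIRED", "P&G"))

# Count each distinct value, most frequent first
employer_counts <- sort(table(employer), decreasing = TRUE)

# Show at most the 100 most common values
head(employer_counts, 100)
```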
Next, we want to consolidate any values that are written differently but mean the same thing (such as ‘Self’ and ‘Self Employed’). We can do this by adding a new column that combines any repeats among the most common responses, so that the graph can show us a clearer trend in the data. For example, all ‘Self Employed’ and ‘Self’ values will both be listed as ‘Self Employed’ in that column.
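One way to sketch this consolidation in base R is a lookup vector that maps each variant spelling to a single label. The mapping below is a hypothetical, partial example built from a few variants visible in the counts; a full cleanup would need many more entries.

```r
# Toy employer values (variants taken from the listing above)
employer <- c("SELF", "SELF EMPLOYED", "SELF-EMPLOYED", "P&G",
              "PROCTER & GAMBLE", "PROCTOR & GAMBLE", "RETIRED", "TRIHEALTH")

# Hypothetical lookup: variant spelling -> consolidated label
canonical <- c(
  "SELF"             = "SELF EMPLOYED",
  "SELF-EMPLOYED"    = "SELF EMPLOYED",
  "P&G"              = "PROCTER & GAMBLE",
  "PROCTOR & GAMBLE" = "PROCTER & GAMBLE"
)

# Normalise case and surrounding whitespace before looking up
key <- toupper(trimws(employer))

# Values without an entry in the lookup come back as NA and keep their own label
mapped <- unname(canonical[key])
employer_clean <- ifelse(is.na(mapped), key, mapped)

table(employer_clean)
```

Keeping the result in a new column, rather than overwriting the original, leaves the raw values available for checking the mapping.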
Finally, based on the new column I just made, I put together a bar graph of the most common employers.
According to the bar graph above, all the other employers outside of the top ten, combined, form the largest group. However, what is more interesting for our analysis is the next three categories: Retired, Not Employed, and Self Employed. This suggests that a large share of contributions to political committees come from people who do not have a traditional employer, like a company they work for. This is interesting because it could indicate a connection between people who value their independence (not wanting a boss) and people who contribute to political committees. Among traditional employers, P&G, Cincinnati Children’s, and UC are the most heavily represented in the data.
“What percent of money contributed in Cincinnati benefits Democrats? Republicans? All others?”
First, we want to create a new variable that combines all political parties other than Democrats and Republicans into a single ‘Other’ party.
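A minimal base R sketch of this recoding is shown below. Only the label "DEMOCRATIC PARTY" is visible in the data preview; the other affiliation strings used here are assumptions for illustration and would need to be checked against the actual factor levels.

```r
# Toy affiliation values. "DEMOCRATIC PARTY" appears in the data preview;
# "REPUBLICAN PARTY" and "LIBERTARIAN PARTY" are assumed labels.
affiliation <- c("DEMOCRATIC PARTY", "REPUBLICAN PARTY",
                 "LIBERTARIAN PARTY", "DEMOCRATIC PARTY")

# Collapse everything that is neither Democratic nor Republican into "Other"
party <- ifelse(affiliation == "DEMOCRATIC PARTY", "Democrat",
         ifelse(affiliation == "REPUBLICAN PARTY", "Republican", "Other"))

table(party)
```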
Then, we want to calculate the total amount of contributions made to all political parties recorded in the data.
## [1] 20348450
Based on my code, the total amount of contributions was $20,348,450.
Next, we want to determine the total amount of contributions for the Democratic, Republican, and Other parties.
## # A tibble: 3 x 2
## party sum_party
## <chr> <dbl>
## 1 Democrat 9368516.
## 2 Other 64927.
## 3 Republican 10915006.
My code shows that the total sum of contributions is $9,368,516 for Democrats, $10,915,006 for Republicans, and $64,927 for Other parties.
Next, we calculate the percent of the total amount of contributions for each party.
## [1] 46.04372
## [1] 53.63731
## [1] 0.3189628
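The total, the per-party sums, and the percentages can all be derived from the data in one pass, which avoids retyping intermediate figures by hand. The sketch below uses made-up amounts and base R; the column names `party` and `amount` are placeholders.

```r
# Toy data (hypothetical amounts)
toy <- data.frame(
  party  = c("Democrat", "Republican", "Other", "Democrat", "Republican"),
  amount = c(100, 250, 10, 50, 90)
)

# Total of all contributions
total <- sum(toy$amount)

# Sum of contributions for each party
sum_party <- tapply(toy$amount, toy$party, sum)

# Each party's share of the total, in percent
pct_party <- 100 * sum_party / total

round(pct_party, 2)
```

Because the shares are computed from the same sums as the total, they always add up to 100.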
Finally, we create a graph that shows the percentage of money given to each political party using the calculated values. As you can see from the graph below, the largest share of the money contributed in the Cincinnati area (about 53.6%) goes to the Republican party, followed closely by the Democratic party (about 46.0%). All other parties combined trail behind by a considerable amount, at about 0.3%.
For my Self-Directed Analysis, I decided that I wanted to see what professional occupation donates the most money to political campaigns on average. I thought this would be an interesting thing to pursue since it would help to direct political parties on where they should focus their marketing.
To accomplish this, I will be graphing the occupation on the x-axis and the amount donated on the y-axis using a geom_bar.
While my actual graph isn’t cooperating at the moment, this analysis could still be helpful: understanding which careers donate the most can inform the marketing decisions of political parties, as mentioned above. For example, if you know that retired people tend to donate more often, you would know to advertise your political ideas in places that older people visit more frequently, such as retirement centers.
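As a rough sketch of this idea, the mean contribution per occupation can be computed first and then plotted. The example uses base R's `barplot()` and made-up values. In ggplot2, precomputed heights like these are normally drawn with `geom_col()` or `geom_bar(stat = "identity")`, because `geom_bar()` by default counts rows instead of using a y value.

```r
# Toy data (hypothetical amounts; occupation labels taken from the preview)
toy <- data.frame(
  contributor_occupation      = c("RETIRED", "ATTY", "RETIRED",
                                  "CHEF", "ATTY", "PAINTER"),
  contribution_receipt_amount = c(10, 500, 40, 10, 300, 25)
)

# Mean contribution for each occupation
mean_by_occupation <- tapply(toy$contribution_receipt_amount,
                             toy$contributor_occupation, mean)

# Keep the ten occupations with the highest mean so the x-axis stays readable
top_occupations <- head(sort(mean_by_occupation, decreasing = TRUE), 10)

# Occupation on the x-axis, mean amount on the y-axis
barplot(top_occupations, las = 2, ylab = "Mean contribution ($)")
```

Limiting the plot to the top few occupations matters with real data, where the occupation column has a very large number of distinct values.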