This document analyzes the political contributions made by Cincinnati voters in order to understand how those contributions have related to Cincinnati politics over the last few years.
The analysis uses campaign finance data from the Federal Election Commission website to show the impact of Cincinnati political contributions on elections.
It should be useful to anyone who wants to know which committee type (Senate, House, or Presidential) Cincinnati residents contributed to the most, which party received the most contributions in a particular year, and how that party performed politically in that year.
## SECTIONS
The document is structured with the following sections:
* Introduction to the data
* Data wrangling
* Data analysis
The packages required for this project are listed below
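The package list itself does not appear here. The following is a minimal setup sketch, assuming the project loaded the packages that provide the functions used later in this document (`read_csv`, `dmy`/`year`, `filter`/`select`, `str_replace`, `ggplot`, `datatable`). Which packages the original project actually attached is an assumption.

```r
# Packages inferred from the functions called in this document (an assumption;
# the original list is not shown).
library(readr)      # read_csv()
library(dplyr)      # filter(), select(), group_by(), summarise(), arrange()
library(stringr)    # str_replace()
library(lubridate)  # dmy(), year()
library(ggplot2)    # ggplot(), geom_bar(), geom_jitter()
library(DT)         # datatable()
```

Loading `tidyverse` instead would cover most of these in one call, but `DT` would still need to be attached separately.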
# Read the dataset from the web as a CSV file and name it "Donations"
Donations <- read_csv("http://asayanalytics.com/cinci_politics")
# Convert the date column to a Date so it is easy to work with, especially when plotting.
# dmy() assumes day-month-year text; use ymd() instead if the file stores ISO dates.
Donations$contribution_receipt_date <- dmy(Donations$contribution_receipt_date)
# Use dplyr's filter() to keep contributions from 2015 through 2019,
# so that only the data relevant to the analysis is used.
Donations2 <- filter(Donations, year(contribution_receipt_date) >= 2015)
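Filtering dates by year is easy to get subtly wrong: a comparison such as `date >= 2015` does not compare calendar years. The sketch below uses three made-up dates (hypothetical values, not taken from the dataset) to show the difference between comparing a `Date` with a bare number and comparing its year.

```r
library(lubridate)

# Hypothetical sample values in day-month-year form (the format dmy() expects).
dates <- dmy(c("15-03-2014", "01-07-2016", "30-11-2019"))

# A Date is stored as days since 1970-01-01, so 2015 here means "day 2015"
# (mid-1975) and every modern date passes the test.
naive <- dates >= 2015

# Extracting the calendar year first gives the intended filter.
by_year <- year(dates) >= 2015

print(naive)
print(by_year)
```

Only the second comparison drops the 2014 row, which is why the year is extracted before filtering.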
# Replace the different spellings of N/A in contributor_employer with a standard "NA".
# The dots are escaped so that the first pattern matches only a literal "N.A.".
Donations2$contributor_employer <- Donations2$contributor_employer %>%
  str_replace(pattern = '^N\\.A\\.$', replacement = 'NA') %>%
  str_replace(pattern = '^N/A$', replacement = 'NA') %>%
  str_replace(pattern = '^N A$', replacement = 'NA') %>%
  str_replace(pattern = '^N/A RETIRED$', replacement = 'NA')
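In a regular expression an unescaped `.` matches any single character, so a pattern like `^N.A.$` matches more than the literal text "N.A.". The sketch below runs on a made-up vector of employer names (hypothetical values) to show the difference, and then shows one alternation that covers all four spellings in a single call.

```r
library(stringr)

# Hypothetical employer values, including one real name that fits N?A? by accident.
employers <- c("N.A.", "N/A", "N A", "N/A RETIRED", "NCAA", "XAVIER UNIVERSITY")

# Unescaped dots: "NCAA" is rewritten to "NA" by mistake.
unescaped <- str_replace(employers, "^N.A.$", "NA")

# Escaped dots: only the literal "N.A." is rewritten.
escaped <- str_replace(employers, "^N\\.A\\.$", "NA")

# One alternation that standardizes all four spellings at once.
cleaned <- str_replace(employers, "^(N\\.A\\.|N/A|N A|N/A RETIRED)$", "NA")

print(unescaped)
print(escaped)
print(cleaned)
```

The single alternation is equivalent to the four chained calls and keeps the list of accepted spellings in one place.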
# Apply the same standardization to contributor_occupation
Donations2$contributor_occupation <- Donations2$contributor_occupation %>%
  str_replace(pattern = '^N\\.A\\.$', replacement = 'NA') %>%
  str_replace(pattern = '^N/A$', replacement = 'NA') %>%
  str_replace(pattern = '^N A$', replacement = 'NA') %>%
  str_replace(pattern = '^N/A RETIRED$', replacement = 'NA')
# Replace the missing values that R itself recognizes (NA) in contributor_employer with the string "NA"
Donations2[is.na(Donations2$contributor_employer), "contributor_employer"] <- 'NA'
# The chunk below checks that no missing values remain; it should return 0 rows
Donations2 %>%
  filter(is.na(contributor_employer))
## # A tibble: 0 x 11
## # … with 11 variables: contributor_last_name <chr>,
## # contributor_first_name <chr>, contributor_street_1 <chr>,
## # contributor_employer <chr>, contributor_occupation <chr>,
## # contribution_receipt_date <chr>, contribution_receipt_amount <dbl>,
## # contributor_aggregate_ytd <dbl>, committee_name <chr>,
## # committee_type <chr>, committee_party_affiliation <chr>
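The check returns zero rows because a missing value and the text "NA" are different things in R. A minimal base R sketch on a made-up vector (hypothetical values) illustrates this:

```r
# Hypothetical employer values: one truly missing entry and one literal "NA" string.
employer <- c("KROGER", NA, "N/A", "NA")

# Only the second element is a real missing value.
missing_before <- is.na(employer)

# Overwrite the missing entries with the string "NA", as done above.
employer[is.na(employer)] <- "NA"

# No missing values remain, so a filter on is.na() finds nothing.
remaining <- sum(is.na(employer))

print(missing_before)
print(remaining)
```

After this step the column holds the text "NA", so later code has to compare against that string rather than call `is.na()`.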
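# Replace the missing values that R itself recognizes (NA) in contributor_occupation with the string "NA"
Donations2[is.na(Donations2$contributor_occupation), "contributor_occupation"] <- 'NA'
# The chunk below checks that no missing values remain; it should return 0 rows
Donations2 %>%
  filter(is.na(contributor_occupation))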
## # A tibble: 0 x 11
## # … with 11 variables: contributor_last_name <chr>,
## # contributor_first_name <chr>, contributor_street_1 <chr>,
## # contributor_employer <chr>, contributor_occupation <chr>,
## # contribution_receipt_date <chr>, contribution_receipt_amount <dbl>,
## # contributor_aggregate_ytd <dbl>, committee_name <chr>,
## # committee_type <chr>, committee_party_affiliation <chr>
## SUMMARIZING DATA
# Preview a random 10% sample of the rows (without committee_type), sorted by occupation
Donations2 %>%
  select(-committee_type) %>%
  sample_frac(size = .1) %>%
  arrange(contributor_occupation) %>%
  datatable()
4.1 Trends in amount contributed from 2015 to present.
# Keep only the top 2% of contributions by amount, then total them by year
Donations2 %>%
  select(contribution_receipt_amount, contribution_receipt_date) %>%
  filter(percent_rank(contribution_receipt_amount) >= .98) %>%
  mutate(year = year(contribution_receipt_date)) %>%
  group_by(year) %>%
  summarise(Total_contribution = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = Total_contribution)) +
  geom_bar(stat = "identity")
# Among these large contributions, 2016 had the highest total. There is no clear
# pattern from year to year; each year has a different contribution amount.
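Because the pipeline filters on `percent_rank()` before summing, the yearly totals cover only the largest contributions rather than all of them. The sketch below uses five made-up rows (hypothetical dates and amounts) and a 0.5 cutoff instead of 0.98 so the effect is visible on a tiny table.

```r
library(dplyr)
library(lubridate)

# Hypothetical contributions; not taken from the dataset.
toy <- tibble(
  contribution_receipt_date = ymd(c("2015-02-01", "2016-05-10", "2016-09-30",
                                    "2017-01-15", "2018-03-03")),
  contribution_receipt_amount = c(100, 2700, 5000, 50, 1000)
)

# percent_rank() rescales the rank of each amount to the range 0..1, so a
# cutoff of 0.5 keeps the upper half of the amounts (1000, 2700, 5000).
yearly <- toy %>%
  filter(percent_rank(contribution_receipt_amount) >= 0.5) %>%
  mutate(year = year(contribution_receipt_date)) %>%
  group_by(year) %>%
  summarise(Total_contribution = sum(contribution_receipt_amount, na.rm = TRUE))

print(yearly)
```

In this toy table, 2015 and 2017 drop out entirely because their only contributions fall below the cutoff. Removing the `filter()` line would give totals over every contribution.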
#4.2 Visualization
# Compare the total amount of money raised by each committee type (top 2% of contributions by amount).
Donations2 %>%
  select(contribution_receipt_amount, committee_type, committee_party_affiliation) %>%
  filter(percent_rank(contribution_receipt_amount) >= .98) %>%
  group_by(committee_type) %>%
  summarise(Contribution_Total = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  ggplot(aes(x = committee_type, y = Contribution_Total)) +
  geom_bar(stat = "identity")
#4.3 Visualization
# Relationship between the amount contributed and the committee type,
# colored by party affiliation (top 10% of contributions by amount)
Donations2 %>%
  select(contribution_receipt_amount, committee_type, committee_party_affiliation) %>%
  filter(percent_rank(contribution_receipt_amount) >= .90) %>%
  group_by(committee_type) %>%
  ggplot(aes(x = contribution_receipt_amount, y = committee_type, color = committee_party_affiliation)) +
  geom_jitter(alpha = .45)
#5.1
# Build an identifier for each contributor from the last and first name
Donations2$Individual_ID <- paste(Donations2$contributor_last_name, Donations2$contributor_first_name)
# Total amount contributed by each individual, largest first
Donations2 %>%
  select(contributor_occupation, contribution_receipt_amount, committee_name, Individual_ID) %>%
  group_by(Individual_ID) %>%
  summarise(sum_contribution_receipt_amount = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  arrange(desc(sum_contribution_receipt_amount)) %>%
  datatable()
#5.2
# Who made the largest total individual contribution, and why do you think that is?
Donations2 %>%
  select(Individual_ID, contribution_receipt_amount) %>%
  group_by(Individual_ID) %>%
  summarise(sum_contribution_receipt_amount = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  arrange(desc(sum_contribution_receipt_amount)) %>%
  datatable()
#5.4 How many Xavier employees have made contributions?
# Keep only rows whose employer is Xavier University, then total contributions per individual
Donations2 %>%
  select(Individual_ID, contribution_receipt_amount, contributor_employer) %>%
  filter(contributor_employer == "XAVIER UNIVERSITY") %>%
  group_by(Individual_ID) %>%
  summarise(sum_contribution_receipt_amount = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  arrange(desc(sum_contribution_receipt_amount)) %>%
  datatable()
# A total of 23 Xavier employees made political contributions between 2015 and 2019.
# Tracey Ann contributed a total of $1,750, which makes her the highest contributor among Xavier employees.
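The number of contributing employees can also be computed directly rather than read off the length of the table. The sketch below is one way to do that with `n_distinct()`, shown on four made-up rows (hypothetical names and amounts); the column names follow the ones used above.

```r
library(dplyr)

# Hypothetical rows; not taken from the dataset.
toy <- tibble(
  Individual_ID = c("DOE JANE", "DOE JANE", "ROE RICHARD", "POE ALEX"),
  contributor_employer = c("XAVIER UNIVERSITY", "XAVIER UNIVERSITY",
                           "XAVIER UNIVERSITY", "KROGER"),
  contribution_receipt_amount = c(250, 100, 50, 500)
)

# Count distinct contributors and total their contributions for one employer.
xavier <- toy %>%
  filter(contributor_employer == "XAVIER UNIVERSITY") %>%
  summarise(
    contributors = n_distinct(Individual_ID),
    total_amount = sum(contribution_receipt_amount, na.rm = TRUE)
  )

print(xavier)
```

Since `Individual_ID` is built from last and first name only, two different people with the same name would be counted once, so the count is best read as an approximation.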
# 6.2 What occupation makes the most contributions? This analysis shows which type of job allows people to contribute the most.
# I use dplyr functions to select just the part of the data that I need. Since the data has an occupation column, I can create a data table that summarises the total contributions by occupation.
Donations2 %>%
  select(contributor_occupation, contribution_receipt_amount, committee_name) %>%
  group_by(contributor_occupation) %>%
  summarise(sum_contribution_receipt_amount = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  arrange(desc(sum_contribution_receipt_amount)) %>%
  datatable()
# From the table, we can tell that the occupation "retired" has the largest total contributions. This could be because retired people have money and therefore the ability to donate.
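"Most contributions" can mean the largest dollar total or the largest number of donations, and the two rankings can differ. As a hedged illustration on made-up rows (hypothetical occupations and amounts), the sketch below reports both measures side by side.

```r
library(dplyr)

# Hypothetical rows; not taken from the dataset.
toy <- tibble(
  contributor_occupation = c("RETIRED", "RETIRED", "RETIRED", "ATTORNEY", "PHYSICIAN"),
  contribution_receipt_amount = c(100, 50, 25, 2700, 500)
)

# Number of contributions and total amount for each occupation.
by_occupation <- toy %>%
  group_by(contributor_occupation) %>%
  summarise(
    n_contributions = n(),
    total_amount = sum(contribution_receipt_amount, na.rm = TRUE)
  ) %>%
  arrange(desc(total_amount))

print(by_occupation)
```

In this toy table "RETIRED" has the most donations but the smallest total, so it helps to state which measure a conclusion refers to.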