Mid-Term Exam Fall 2019

INTRODUCTION

The purpose of this document is to analyze the political contributions made by Cincinnati voters in order to understand how those contributions have affected Cincinnati politics in recent years.

The analysis uses campaign finance data from the Federal Election Commission (FEC) website to show the impact of Cincinnati political contributions on elections.

This analysis will be useful to someone who wants to understand which committee type (e.g. Senate, House, or Presidential) Cincinnati residents contributed to most. It will also help identify which party received the most contributions in a particular year and how that party performed politically in that year.

SECTIONS

The document is structured in the following sections:

* Introduction to the data
* Data wrangling
* Data analysis

REQUIRED PACKAGES

The packages required for this project are listed below
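The package list itself did not survive in this copy. The functions called below are read_csv, dmy, year, str_replace, the dplyr verbs, ggplot and datatable. From those, the project most likely needs readr, lubridate, stringr, dplyr, ggplot2 and DT. This list is an inference, not the author's original. Here is a minimal sketch that reports any missing packages and attaches the rest:

```r
# Packages inferred from the functions used in this document (an assumption):
# readr (read_csv), lubridate (dmy, year), stringr (str_replace),
# dplyr (filter, select, mutate, ...), ggplot2 (ggplot), DT (datatable)
required_packages <- c("readr", "lubridate", "stringr", "dplyr", "ggplot2", "DT")

# Check which packages are installed without attaching them
is_installed <- vapply(required_packages, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- required_packages[!is_installed]
if (length(missing_packages) > 0) {
  message("Install these first: ", paste(missing_packages, collapse = ", "))
}

# Attach the packages that are available
for (pkg in required_packages[is_installed]) {
  library(pkg, character.only = TRUE)
}
```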

THE DATA

The data is downloaded from the link below.

# Read the CSV file from the web and name the dataset "Donations"
Donations <- read_csv("http://asayanalytics.com/cinci_politics")

REMOVING BLATANT DATA ERRORS

Removing zeros in the data set
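The code for this step is not shown. This sketch assumes "zeros" means rows whose contribution_receipt_amount is exactly 0. It is written in base R and runs on made-up rows; the names donations_toy and donations_nonzero are hypothetical. subset() treats a missing amount as FALSE, so those rows are dropped too, as dplyr's filter(Donations, contribution_receipt_amount != 0) would also drop them.

```r
# Hypothetical toy rows standing in for the real Donations data
donations_toy <- data.frame(
  contributor_last_name = c("SMITH", "JONES", "LEE", "KIM"),
  contribution_receipt_amount = c(250, 0, 100, NA)
)

# Keep only non-zero amounts; subset() drops rows where the condition is NA
donations_nonzero <- subset(donations_toy, contribution_receipt_amount != 0)

donations_nonzero
```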

Using the lubridate package to change the date format

Donations1$contribution_receipt_date <- dmy(Donations1$contribution_receipt_date)

# This converts the date strings into Date objects, which are easier to work with, especially when plotting visualizations
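dmy() reads day-month-year strings. Without lubridate, the base-R equivalent for a single fixed layout is as.Date() with an explicit format. The example strings and the "%d-%m-%Y" layout below are assumptions, because the raw date format is not shown here. dmy() is more forgiving and accepts several separators.

```r
# Hypothetical raw strings in day-month-year order
raw_dates <- c("15-03-2016", "02-11-2018")

# Parse them into Date objects with an explicit day-month-year format
parsed_dates <- as.Date(raw_dates, format = "%d-%m-%Y")

class(parsed_dates)          # "Date"
format(parsed_dates, "%Y")   # "2016" "2018"
```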

Selecting data from January 2015 to date

Donations2 <- filter(Donations1, year(contribution_receipt_date) >= 2015)
# Using the dplyr function filter to keep contributions from 2015 through 2019. This keeps only the data relevant to the analysis.
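Comparing a Date column directly with the number 2015 does not compare years. R stores a Date as the number of days since 1970-01-01, so the number 2015 stands for a day in mid-1975 and every recent date passes. Here is a small base-R illustration with made-up dates:

```r
dates <- as.Date(c("2014-12-31", "2015-01-01", "2019-06-30"))

# Naive comparison: 2015 is treated as a day count, so every date passes
dates >= 2015                               # TRUE TRUE TRUE

# Compare against a real date instead...
dates >= as.Date("2015-01-01")              # FALSE TRUE TRUE

# ...or extract the year first (lubridate::year() does the same job)
as.integer(format(dates, "%Y")) >= 2015     # FALSE TRUE TRUE
```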

DATA CONFORMITY

Replacing the different variations of N/A in the “contributor_employer” column with “NA”

Donations2$contributor_employer <- Donations2$contributor_employer %>% 
  str_replace(pattern = '^N.A.$', replacement = 'NA') %>% 
  str_replace(pattern = '^N/A$', replacement = 'NA') %>% 
  str_replace(pattern = '^N A$', replacement = 'NA') %>% 
  str_replace(pattern = '^N/A RETIRED$', replacement = 'NA')
# The code above replaces the different spellings of N/A in the column with a standard "NA"
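The four str_replace() calls can be collapsed into one anchored pattern with alternation. Below is a base-R sketch using sub() on made-up values. In "N.A." each dot matches any character. Write "N\\.A\\." if only the literal spelling N.A. should match.

```r
employer <- c("N/A", "N.A.", "N A", "N/A RETIRED", "XAVIER UNIVERSITY", "NONE")

# One anchored pattern covering the same four spellings as the str_replace() chain
na_pattern <- "^(N.A.|N/A|N A|N/A RETIRED)$"
employer_clean <- sub(na_pattern, "NA", employer)

employer_clean   # "NA" "NA" "NA" "NA" "XAVIER UNIVERSITY" "NONE"
```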

Replacing the different variations of N/A in the “contributor_occupation” column with “NA”

Donations2$contributor_occupation <- Donations2$contributor_occupation %>% 
  str_replace(pattern = '^N.A.$', replacement = 'NA') %>% 
  str_replace(pattern = '^N/A$', replacement = 'NA') %>% 
  str_replace(pattern = '^N A$', replacement = 'NA') %>% 
  str_replace(pattern = '^N/A RETIRED$', replacement = 'NA')

Converting the values that R recognizes as missing (NA) in the “contributor_employer” column to the string “NA”

# Convert the values R recognizes as missing in the contributor_employer column to the string "NA"
Donations2[is.na(Donations2$contributor_employer), "contributor_employer"] <- "NA"
# The code below checks that no missing values remain
Donations2 %>% 
  filter(is.na(contributor_employer))

## # A tibble: 0 x 11
## # … with 11 variables: contributor_last_name <chr>,
## #   contributor_first_name <chr>, contributor_street_1 <chr>,
## #   contributor_employer <chr>, contributor_occupation <chr>,
## #   contribution_receipt_date <chr>, contribution_receipt_amount <dbl>,
## #   contributor_aggregate_ytd <dbl>, committee_name <chr>,
## #   committee_type <chr>, committee_party_affiliation <chr>

Converting the missing values in the “contributor_occupation” column to the string “NA”

Donations2[is.na(Donations2$contributor_occupation), "contributor_occupation"] <- "NA"
# The code below checks that no missing values remain
Donations2 %>% 
  filter(is.na(contributor_occupation))

## # A tibble: 0 x 11
## # … with 11 variables: contributor_last_name <chr>,
## #   contributor_first_name <chr>, contributor_street_1 <chr>,
## #   contributor_employer <chr>, contributor_occupation <chr>,
## #   contribution_receipt_date <chr>, contribution_receipt_amount <dbl>,
## #   contributor_aggregate_ytd <dbl>, committee_name <chr>,
## #   committee_type <chr>, committee_party_affiliation <chr>
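After this step, the placeholder is the two-letter string "NA", not a missing value. As a result, is.na() (which the checks above rely on) and na.rm = TRUE no longer see these entries. Here is a small base-R illustration on made-up values:

```r
occupation <- c("TEACHER", NA, "RETIRED", NA)

# Replace true missing values with the literal string "NA"
occupation[is.na(occupation)] <- "NA"

sum(is.na(occupation))    # 0: R no longer treats these entries as missing
sum(occupation == "NA")   # 2: they now have to be matched as text
```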


DUMMY VARIABLES

Reducing the committee type values to just “PAC, HOUSE, PRESIDENTIAL, PARTY, SENATE”
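The recoding code is not shown. This sketch assumes committee_type holds the FEC's single-letter committee type codes: H House, S Senate, P Presidential, N/Q PAC, O Super PAC, V/W hybrid PAC, and X/Y/Z party. Under that assumption, a lookup vector could do the recoding. The mapping and the helper name recode_committee_type are hypothetical, and any code outside the map becomes NA.

```r
# Assumed FEC single-letter committee type codes mapped to five groups
committee_type_map <- c(
  H = "HOUSE", S = "SENATE", P = "PRESIDENTIAL",
  N = "PAC", Q = "PAC", O = "PAC", V = "PAC", W = "PAC",
  X = "PARTY", Y = "PARTY", Z = "PARTY"
)

# Hypothetical helper: look each code up by name; unknown codes give NA
recode_committee_type <- function(codes) {
  unname(committee_type_map[codes])
}

recode_committee_type(c("H", "Q", "Y", "P", "S", "I"))
# "HOUSE" "PAC" "PARTY" "PRESIDENTIAL" "SENATE" NA
```

If the assumption holds, it would be applied as Donations2$committee_type <- recode_committee_type(Donations2$committee_type).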

SUMMARIZING DATA

The table below is arranged by occupation. Because the full data set was too big for my computer to display, I took a random 10% sample of the data and built the table from it.

Donations2 %>%
  select(-(committee_type)) %>%
  sample_frac(size = .1) %>%
  arrange(contributor_occupation) %>%
  datatable()
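sample_frac() draws rows at random, so this table changes every time the document is knitted. Calling set.seed() first makes the 10% sample reproducible. Below is a base-R sketch of the same idea on made-up data. The seed 2019 is arbitrary.

```r
set.seed(2019)  # arbitrary seed so the sample is the same on every run

# Hypothetical stand-in data with 200 rows
toy <- data.frame(
  id = 1:200,
  contributor_occupation = sample(c("RETIRED", "TEACHER", "ATTORNEY"), 200, replace = TRUE)
)

# Draw 10% of the rows without replacement, then sort by occupation
rows <- sample(nrow(toy), size = round(0.1 * nrow(toy)))
toy_sample <- toy[rows, ]
toy_sample <- toy_sample[order(toy_sample$contributor_occupation), ]

nrow(toy_sample)  # 20
```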

VISUALIZATIONS

4.1 Trends in the amount contributed from 2015 to present.

Donations2 %>% 
  select(contribution_receipt_amount, contribution_receipt_date) %>% 
  filter(percent_rank(contribution_receipt_amount) >= .98) %>%
  mutate(year = year(contribution_receipt_date)) %>% 
  group_by(year) %>%
  summarise(Total_contribution = sum(contribution_receipt_amount, na.rm = TRUE)) %>% 
  ggplot(aes(x = year, y = Total_contribution)) +
  geom_col()

# Among the largest 2% of contributions (the only ones kept by the filter above), 2016 had the highest total.
# There is no clear trend across years; each year has a different contribution total.
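The percent_rank() filter keeps only roughly the largest 2% of contributions. The yearly chart therefore describes large donations, not all donations. The base-R sketch below reproduces dplyr::percent_rank() and the yearly totals on made-up data. It uses a lower cutoff (0.7) so that the small example keeps more than one row.

```r
# Base-R version of dplyr::percent_rank(): (minimum rank - 1) / (non-missing count - 1)
percent_rank_base <- function(x) {
  (rank(x, ties.method = "min", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}

# Hypothetical contributions
contrib <- data.frame(
  amount = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
  date = as.Date(c("2015-03-01", "2016-05-10", "2016-07-04", "2017-01-20", "2018-09-15",
                   "2019-02-01", "2016-10-30", "2018-04-12", "2016-08-08", "2019-12-01"))
)

# Keep the top of the distribution (the analysis above uses 0.98 instead of 0.7)
top <- contrib[percent_rank_base(contrib$amount) >= 0.7, ]
top$year <- as.integer(format(top$date, "%Y"))

# Total of the remaining contributions per year
yearly <- aggregate(amount ~ year, data = top, FUN = sum)
yearly
```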

Amount raised by each party per year.

#4.2 Visualization
#Compare the amount of money each party raised per year.
Donations2 %>% 
  select(contribution_receipt_amount, contribution_receipt_date, committee_party_affiliation) %>% 
  filter(percent_rank(contribution_receipt_amount) >= .98) %>%
  mutate(year = year(contribution_receipt_date)) %>%
  group_by(year, committee_party_affiliation) %>%
  summarise(Contribution_Total = sum(contribution_receipt_amount, na.rm = TRUE)) %>% 
  ggplot(aes(x = year, y = Contribution_Total, fill = committee_party_affiliation)) +
  geom_col(position = "dodge")

Association between committee type and amount contributed.

#4.3 Visualization
# Relationship between the amount contributed and the committee type
Donations2 %>% 
  select(contribution_receipt_amount, committee_type, committee_party_affiliation) %>% 
  filter(percent_rank(contribution_receipt_amount) >= .90) %>%
  ggplot(aes(x = contribution_receipt_amount, y = committee_type, color = committee_party_affiliation)) +
  geom_jitter(alpha = .45)

DIRECTED ANALYSIS

#5.1
Donations2$Individual_ID<-paste(Donations2$contributor_last_name,Donations2$contributor_first_name)
Donations2 %>%
  select(contributor_occupation, contribution_receipt_amount, committee_name, Individual_ID) %>%
  group_by(Individual_ID) %>%
  summarise(sum_contribution_receipt_amount = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  arrange(desc(sum_contribution_receipt_amount)) %>%
  datatable()
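Building Individual_ID from last and first name alone merges different people who share a name. One possible refinement is to add contributor_street_1 to the key. This is an assumption, and it has the opposite weakness: it splits one donor who moved. The base-R sketch below uses made-up rows, and Individual_ID2 is a hypothetical column name.

```r
donors <- data.frame(
  contributor_last_name  = c("SMITH", "SMITH", "LEE", "LEE"),
  contributor_first_name = c("ANN", "ANN", "TOM", "TOM"),
  contributor_street_1   = c("1 MAIN ST", "1 MAIN ST", "5 OAK AVE", "9 ELM RD"),
  contribution_receipt_amount = c(100, 250, 50, 75)
)

# Name-only key, as in the code above: both LEE TOM rows are merged
donors$Individual_ID <- paste(donors$contributor_last_name, donors$contributor_first_name)
by_name <- aggregate(contribution_receipt_amount ~ Individual_ID, data = donors, FUN = sum)

# Name plus street address: the two LEE TOM addresses stay separate
donors$Individual_ID2 <- paste(donors$contributor_last_name, donors$contributor_first_name,
                               donors$contributor_street_1, sep = " | ")
by_name_address <- aggregate(contribution_receipt_amount ~ Individual_ID2, data = donors, FUN = sum)

# Largest totals first
by_name_address[order(-by_name_address$contribution_receipt_amount), ]
```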

#5.2
#Who made the largest total contribution, and why might that be?
Donations2 %>% 
  select(Individual_ID, contribution_receipt_amount) %>%
  group_by(Individual_ID) %>%
  summarise(sum_contribution_receipt_amount = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  arrange(desc(sum_contribution_receipt_amount)) %>%
  datatable()

#5.4 How many Xavier employees have made contributions?
Donations2 %>%
  group_by(Individual_ID) %>%
  select(Individual_ID, contribution_receipt_amount, contributor_employer) %>%
  filter(contributor_employer == "XAVIER UNIVERSITY") %>%
  summarise(sum_contribution_receipt_amount = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  arrange(desc(sum_contribution_receipt_amount)) %>%
  datatable()

#A total of 23 Xavier employees made political contributions between 2015 and 2019.
#Tracey Ann made contributions totaling $1,750, which makes her the highest contributor among Xavier employees.
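The exact match on "XAVIER UNIVERSITY" misses variant spellings of the employer, so the count of 23 could be an undercount. A case-insensitive prefix match is one way to check. The sketch below is in base R on made-up rows. The pattern "^XAVIER UNIV" is an assumption and may need adjusting after inspecting the real employer values.

```r
staff <- data.frame(
  Individual_ID = c("SMITH ANN", "DOE JOHN", "DOE JOHN", "ROE JANE", "POE ED"),
  contributor_employer = c("XAVIER UNIVERSITY", "XAVIER UNIV", "XAVIER UNIVERSITY",
                           "Xavier University", "UC HEALTH")
)

# Exact match, as in the dplyr code above
exact <- staff$contributor_employer == "XAVIER UNIVERSITY"
length(unique(staff$Individual_ID[exact]))   # 2

# Case-insensitive prefix match also catches "XAVIER UNIV" and "Xavier University"
loose <- grepl("^XAVIER UNIV", staff$contributor_employer, ignore.case = TRUE)
length(unique(staff$Individual_ID[loose]))   # 3
```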

# 6.2 Which occupation makes the most contributions? This analysis shows which occupations contribute the largest total amounts.
# I use dplyr functions to select just the part of the data I need. Since the data has an occupation column, I can create a data table that summarises the total contributions by occupation.
Donations2 %>%
  select(contributor_occupation, contribution_receipt_amount, committee_name) %>%
  group_by(contributor_occupation) %>%
  summarise(sum_contribution_receipt_amount = sum(contribution_receipt_amount, na.rm = TRUE)) %>%
  arrange(desc(sum_contribution_receipt_amount)) %>%
  datatable()

# From the table, we can tell that the occupation "retired" makes the largest total contributions. This could be because retired people often have accumulated savings and therefore the means to donate.
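"Most contributions" can mean the largest number of contributions or the largest dollar total, and the two can disagree. The code above measures the dollar total. The base-R sketch below computes both on made-up rows.

```r
occ <- data.frame(
  contributor_occupation = c("RETIRED", "RETIRED", "RETIRED", "ATTORNEY", "TEACHER"),
  contribution_receipt_amount = c(50, 25, 25, 500, 40)
)

# Total dollars per occupation (what the dplyr code above computes)
total_by_occ <- tapply(occ$contribution_receipt_amount, occ$contributor_occupation, sum)

# Number of contributions per occupation
count_by_occ <- table(occ$contributor_occupation)

names(which.max(total_by_occ))   # "ATTORNEY"
names(which.max(count_by_occ))   # "RETIRED"
```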

Author: Angoya, 10/22/2019
