Assignment 7 - Analysis

Author

Gavin Steele

Introduction

I am an aspiring Democratic political hack. I have a fond appreciation for fundraising (it pays my bills) and for the different candidates we run in each district. I found https://www.transparencyusa.org, which lists all of the state candidates running in Ohio. For each candidate it shows their name, party, the office they are running for, contributions (funds raised), and expenditures.

I want to know which party has the financial advantage, which candidates are the best at raising money (potential future bosses), which races had the biggest expenditures, and whether there are any other findings I can extract from this data.

Scraping

Using the tidyverse, httr, and rvest packages, I extracted the data from the webpage. I looped the scraping function 21 times to cover every page of Ohio candidates. I then exported the combined results as a CSV, which is where this analysis picks up.
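Here is a minimal sketch of that page loop. It assumes each results page holds a single HTML table. The actual Transparency USA URL pattern and CSS selectors are not recorded here. For that reason, `fake_page_html()` and `scrape_page()` are hypothetical helpers that parse an inline HTML string instead of fetching the live site.

```r
library(rvest)
library(purrr)
library(dplyr)

# HYPOTHETICAL stand-in for one page of results. The real site's markup,
# URL pattern and selectors are assumptions and would need to be checked.
fake_page_html <- function(page) {
  paste0(
    "<table>",
    "<tr><th>candidate</th><th>contributions</th></tr>",
    "<tr><td>Candidate ", page, "</td><td>$", page * 100, "</td></tr>",
    "</table>"
  )
}

# Parse one page into a tibble (first row of <th> cells becomes the header).
scrape_page <- function(page) {
  read_html(fake_page_html(page)) %>%
    html_element("table") %>%
    html_table()
}

# Loop over all 21 pages and stack the results into one data frame.
all_pages <- map_dfr(1:21, scrape_page)
```

Against the live site, `scrape_page()` would call `read_html()` on each page URL instead. A short `Sys.sleep()` between requests keeps the scraper polite.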

Libraries

library(tidyverse)

Loading Data

ohio_candidates <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/steeleg2_xavier_edu/IQABHokXReN1R7u7yLLrPZchAaYqhx5n8uhZ7flHwB7B9iM?download=1")

Cleaning the Data

Past Gavin did us no favors in how he formatted this. First I need to clean the name column so it holds just the candidate's name, with none of the extra text. Then I convert the financial columns to numbers that are easy to work with in R.

backup_ohio_candidates <- ohio_candidates # Gavin is dumb and will inevitably screw up, so make a backup.
# ohio_candidates <- backup_ohio_candidates # uncomment to revert to the backup

# Clean the names: squish whitespace, then cut everything from each
# party/office keyword onward.
ohio_candidates <- ohio_candidates %>% 
  mutate(candidate = str_squish(candidate),
         candidate = str_remove(candidate, "Ohio.*$"),
         candidate = str_remove(candidate, "Democrat.*$"),
         candidate = str_remove(candidate, "Republican.*$"),
         candidate = str_remove(candidate, "Attorney.*$"),
         candidate = str_remove(candidate, "Cuyahoga.*$"),
         candidate = str_remove(candidate, "Nonp.*$"),
         candidate = str_remove(candidate, "No Part.*$"),
         candidate = str_remove(candidate, "Independent.*$"),
         candidate = str_remove(candidate, "Libertarian.*$"),
         candidate = str_squish(candidate), # drop the trailing space left by the removals
         district_number = parse_number(office), # NA (with a warning) when there is no number
         state_senate = ifelse(str_detect(office, "Senate"), 1, 0) # NA when office is missing
         ) %>% 
  # Turn "$1,234"-style strings into numbers
  mutate(contributions = parse_number(contributions),
         expenditures = parse_number(expenditures))
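The sketch below shows what the cleaning steps do to a single row, using hypothetical strings. The exact format of the scraped `candidate` cell is an assumption. It also shows that the chain of `str_remove()` calls can be collapsed into one alternation pattern. That pattern cuts the string at the earliest keyword and behaves like the chain for inputs like these.

```r
library(stringr)
library(readr)

# HYPOTHETICAL raw values, assuming the name runs straight into party/office text.
raw_candidate <- "  Jane   Doe Democrat Ohio House District 12 "
raw_office    <- "Ohio House District 12"
raw_money     <- "$1,234.50"

# One alternation: remove from the first party/office keyword to the end.
keywords <- "(Ohio|Democrat|Republican|Attorney|Cuyahoga|Nonp|No Part|Independent|Libertarian).*$"

clean_name <- str_squish(str_remove(str_squish(raw_candidate), keywords))
district   <- parse_number(raw_office)  # first number found in the string
money      <- parse_number(raw_money)   # drops "$" and the "," grouping mark
```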

Actual Data Analytics

Which party has more money?

summary_df <- ohio_candidates %>% 
  group_by(party) %>% 
  summarize(`Number of Candidates` = n(),
            `Average Contributions` = mean(contributions, na.rm = TRUE),
            `Average Expenditures` = mean(expenditures, na.rm = TRUE),
            `Total Funds` = sum(contributions, na.rm = TRUE)) %>% 
  view()

The table works, but let's make it candidate-proof with a chart.

summary_df %>% 
  filter(party %in% c("D", "R")) %>%  # keep only the two major parties
  ggplot(aes(x = party, y = `Total Funds`)) +
  geom_col() +
  labs(
    title = "Total Contributions by Party",
    x = "Party",
    y = "Total Contributions"
  ) +
  scale_y_continuous(labels = scales::dollar)

Which races have the most competition?

This analysis is very flawed because 332 of the entries are missing the office. Filling those in would be a manual, time-intensive fix. The table shows, for each office type, the number of candidates, the average contributions and expenditures, and the total funds.

office <- ohio_candidates %>% 
  mutate(office = str_squish(str_remove_all(office, "\\d"))) %>% # strip district numbers to get the office type
  group_by(office) %>% 
  summarize(`Number of Candidates` = n(),
            `Average Contributions` = mean(contributions, na.rm = TRUE),
            `Average Expenditures` = mean(expenditures, na.rm = TRUE),
            `Total Funds` = sum(contributions, na.rm = TRUE)) %>% 
  view()
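Stripping the digits groups candidates by office type, so every House district lands in one bucket. To rank individual races by how many candidates are in them, one option is to count the full office string instead. Here is a sketch on a made-up tibble, assuming office strings look like `Ohio House District 12`:

```r
library(dplyr)
library(stringr)

# MADE-UP rows; real office strings may be formatted differently.
toy <- tibble(
  candidate = c("A", "B", "C", "D", "E"),
  office    = c("Ohio House District 12", "Ohio House District 12",
                "Ohio House District 12", "Ohio Senate District 3", NA)
)

# Candidates per individual race (district kept), most crowded first.
races <- toy %>%
  filter(!is.na(office)) %>%
  count(office, name = "n_candidates", sort = TRUE)

# Candidates per office type (digits stripped, leftover spaces squished).
office_types <- toy %>%
  mutate(office_type = str_squish(str_remove_all(office, "\\d"))) %>%
  count(office_type, name = "n_candidates")
```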

A graph of the ten office types with the highest average contributions.

office %>% 
  slice_max(`Average Contributions`, n = 10) %>% 
  ggplot(aes(x = reorder(office, `Average Contributions`),
             y = `Average Contributions`)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Average Contributions by Office Type",
    x = "Office",
    y = "Average Contributions") +
  scale_y_continuous(labels = scales::dollar)

My initial data set is missing the office for many candidates. I will improve on this in my final project.
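Here is one way to size and prioritize that fix, sketched on made-up rows. It assumes missing offices come through as `NA`, which is what `read_csv()` does with empty cells by default. Blank strings would need an `office == ""` check as well.

```r
library(dplyr)

# MADE-UP rows standing in for ohio_candidates.
toy <- tibble(
  candidate     = c("A", "B", "C", "D"),
  party         = c("D", "R", "R", "D"),
  office        = c("Ohio House District 12", NA, NA, "Ohio Senate District 3"),
  contributions = c(5000, 12000, 800, 3000)
)

# How many rows lack an office, per party.
missing_by_party <- toy %>%
  group_by(party) %>%
  summarize(missing_office = sum(is.na(office)), .groups = "drop")

# Worklist for the manual fix: missing-office rows, biggest fundraisers first.
to_fix <- toy %>%
  filter(is.na(office)) %>%
  arrange(desc(contributions))
```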

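What is the total cash on hand for Democrats vs. Republicans?

Here cash on hand is approximated as contributions minus expenditures. This ignores any money carried over from earlier reporting periods.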

pch <- ohio_candidates %>% 
  filter(party %in% c("D", "R")) %>% 
  mutate(cash_on_hand = contributions - expenditures) %>% # rough cash on hand
  group_by(party) %>%
  summarize(total_cash_on_hand = sum(cash_on_hand, na.rm = TRUE)) %>% 
  view()

I know pie charts are bad, but indulge me. It wouldn't be politics without some deceptive BS.

pch %>% 
  ggplot(aes(x = "", y = total_cash_on_hand, fill = party)) +
  geom_col() +
  coord_polar(theta = "y") +
  labs(title = "Cash on Hand by Party") +
  scale_fill_manual(values = c("D"= "blue", "R" = "red")) +
  theme_void()
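A pie chart encodes each party's share of the combined total, so it can help to print those shares next to it. This sketch uses made-up totals standing in for `pch`. Slices only make sense when every total is non-negative, and that is not guaranteed here because spending can exceed contributions.

```r
library(dplyr)

# MADE-UP totals standing in for pch; not the real figures.
toy_pch <- tibble(
  party              = c("D", "R"),
  total_cash_on_hand = c(250000, 750000)
)

# Pie slices are only meaningful for non-negative totals.
stopifnot(all(toy_pch$total_cash_on_hand >= 0))

# Each slice's angle is proportional to its share of the combined total.
shares <- toy_pch %>%
  mutate(share = total_cash_on_hand / sum(total_cash_on_hand),
         label = sprintf("%.0f%%", 100 * share))
```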

Which Democrats raised the most money?

ohio_candidates %>% 
  filter(party == "D") %>% 
  slice_max(contributions, n = 10) %>% 
  ggplot(aes(x = reorder(candidate, contributions),
             y = contributions)) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Top Democratic Fundraisers",
       x = "Candidate",
       y = "Money Raised") +
  scale_y_continuous(labels = scales::dollar)

I want in on that Amy Acton ticket!
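The same ranking can be run for both parties at once by grouping before `slice_max()`. This sketch uses made-up fundraising totals. Setting `with_ties = FALSE` caps each party at exactly `n` rows even when amounts tie.

```r
library(dplyr)

# MADE-UP fundraising totals; names and amounts are illustrative only.
toy <- tibble(
  candidate     = c("Ann", "Bo", "Cy", "Dee", "Eli", "Fay"),
  party         = c("D", "D", "D", "R", "R", "R"),
  contributions = c(900, 300, 600, 1200, 400, 1200)
)

# Top 2 fundraisers within each party, at most 2 rows per party.
top_by_party <- toy %>%
  group_by(party) %>%
  slice_max(contributions, n = 2, with_ties = FALSE) %>%
  ungroup()
```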

Conclusion

I should sell my soul to the Republican Party. By total contributions, they raise far more money to spend on people like me.