Ohio Political Finance Analysis

Author

Gavin Steele

Introduction

I am an aspiring Democratic political hack. I have a fond appreciation for fundraising (it pays my bills) and for the different candidates we run in each district. I found https://www.transparencyusa.org, which lists all of the state candidates running in Ohio. For each candidate it shows their name, party, the office they are running for, contributions (funds raised), and expenditures.

I want to know which party has the financial advantage, which candidates are the best at raising money (potentially future bosses), what races had the biggest expenditures, and to see if there are any other findings I can extract from this data.

Scraping

Using the tidyverse, httr, and rvest packages, I extracted the data from the web page. I looped the scraping function 21 times to cover every page of Ohio candidates, then saved the result as a CSV, which is the starting point for this analysis.
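The scraping script itself is not included in this report. A minimal sketch of the paginated loop described above might look like the following. It assumes the candidate list is rendered as an HTML table and that pages are selected with a `page` query parameter. The URL pattern and the helper names `page_url`, `parse_candidate_page`, and `scrape_all_pages` are hypothetical and not taken from the original script.

```r
library(rvest)
library(dplyr)

# Hypothetical URL pattern: the real site's path and paging parameter may differ.
page_url <- function(i) {
  paste0("https://www.transparencyusa.org/oh/candidates?page=", i)
}

# Assumption: the candidates are in the first <table> element on each page.
parse_candidate_page <- function(page) {
  html_table(html_element(page, "table"))
}

# Loop over the pages, pausing between requests to be polite to the server.
# `fetch` is a parameter so the loop can be tried offline with canned HTML.
scrape_all_pages <- function(n_pages = 21, fetch = read_html, delay = 1) {
  pages <- lapply(seq_len(n_pages), function(i) {
    Sys.sleep(delay)
    parse_candidate_page(fetch(page_url(i)))
  })
  bind_rows(pages)
}

# Real run (not executed here):
# ohio_raw <- scrape_all_pages()
# readr::write_csv(ohio_raw, "ohio_candidates.csv")
```

Passing the fetch function in as an argument keeps the parsing logic separate from the network call, which makes it easier to check the loop without hitting the site 21 times.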

Libraries

library(tidyverse)

Introduction to the Data

I scraped the contributions, expenditures, party, and office sought for all Ohio political candidates.

Variable Name	Variable Explanation	Variable Type
candidate	The candidate's name	chr
contributions	The amount of money the candidate has raised	num
expenditures	The amount of money the candidate has reported spending	num
district_number	For Statehouse and Senate seats, the district number	num
state_senate	Flag: 1 if the office is a State Senate seat, 0 otherwise	num
bio	The biography of the candidate scraped from Ballotpedia	chr
office	The office the candidate is seeking	chr
party	The first letter of the candidate's party: D - Democrat, R - Republican, N - nonpartisan, L - Libertarian, I - independent	chr

Loading Data

ohio_candidates <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/steeleg2_xavier_edu/IQABHokXReN1R7u7yLLrPZchAaYqhx5n8uhZ7flHwB7B9iM?download=1")

Cleaning the Data

Past Gavin did us no favors when it came to putting this in an appropriate format. I need to clean the name column so it holds just the candidate's name and none of the extra stuff, then format the financial data so it is easy to manipulate in R.

backup_ohio_candidates <- ohio_candidates # Gavin will inevitably screw up, so make a backup first.
# ohio_candidates <- backup_ohio_candidates # uncomment and run to revert to the backup

#cleaning the names. 
ohio_candidates <- ohio_candidates %>% 
  mutate(candidate = str_squish(candidate),
         candidate = str_remove(candidate, "Ohio.*$"),
         candidate = str_remove(candidate, "Democrat.*$"),
         candidate = str_remove(candidate, "Republican.*$"),
         candidate = str_remove(candidate, "Attorney.*$"),
         candidate = str_remove(candidate, "Cuyahoga.*$"),
         candidate = str_remove(candidate, "Nonp.*$"),
         candidate = str_remove(candidate, "No Part.*$"),
         candidate = str_remove(candidate, "Independent.*$"),
         candidate = str_remove(candidate, "Libertarian.*$"),
         district_number = parse_number(office),
         state_senate = ifelse(str_detect(office,("Senate")), 1, 0)
         ) %>% 
  mutate(contributions = parse_number(contributions),
         expenditures = parse_number(expenditures))
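To make the cleaning step concrete, here is a small sketch on made-up strings (not rows from the real data). It folds the nine `str_remove()` calls into a single pattern, which should behave the same way because each call strips everything from the first matching keyword to the end of the string. The names `raw_names` and `suffix_pattern` are illustrative only.

```r
library(stringr)
library(readr)

# Made-up examples of what the raw scraped name field might look like.
raw_names <- c(
  "Jane Doe Ohio House of Representatives District 1",
  "John Roe   Republican",
  "Pat Poe"
)

# One alternation instead of nine separate str_remove() calls.
suffix_pattern <- "(Ohio|Democrat|Republican|Attorney|Cuyahoga|Nonp|No Part|Independent|Libertarian).*$"

clean_names <- str_squish(raw_names)                    # collapse repeated whitespace
clean_names <- str_remove(clean_names, suffix_pattern)  # drop the office/party suffix
clean_names <- str_trim(clean_names)                    # drop the trailing space left behind

# parse_number() strips the dollar sign and thousands separators.
amounts <- parse_number(c("$1,234.56", "$0"))
```

One caveat of this approach, in either form: a candidate whose own name contains one of the keywords (for example a surname starting with "Attorney") would be truncated, so it is worth spot-checking the cleaned names.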

Shortcomings

Unfortunately, the data acquired from Transparency USA is missing a bunch of the offices the candidates are running for.

sum(is.na(ohio_candidates$office))

The office is missing for 332 candidates. Let’s fix that.

Secondary Data Source

I will use Ballotpedia to supplement the data from Transparency USA.

##extract the candidates we need more info on  
missing_candidates <- ohio_candidates %>%   
  filter(is.na(office))  
candidates <- missing_candidates$candidate  

##format the names to run through the Ballotpedia scrape script

candidates <-  gsub(" ","_",candidates) 
candidates <- gsub("_$", "",candidates)

I scraped Ballotpedia for the 332 names that were missing an office. The scrape returned this CSV:

ballotpedia <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/steeleg2_xavier_edu/IQAc0y0PxP3qQKEAAJo7Jgz1AaPyTbs_WDLiEwbFvhpWeCU?download=1")

Analyze the new data.

summary(ballotpedia)

  candidate             name               bio               party          
 Length:257         Length:257         Length:257         Length:257        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
    office         
 Length:257        
 Class :character  
 Mode  :character

sum(is.na(ballotpedia$office))

[1] 176

176/332

[1] 0.5301205

sum(is.na(ballotpedia$bio))

[1] 123

# 176 and 123 are counts of missing values: of the 257 rows returned, 81 have an office (about 24% of the 332 gaps) and 134 have a personal bio.
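Because `sum(is.na(...))` counts the values that are still missing, the coverage of the scrape is easier to read off after subtracting from the row count. A short sketch of that arithmetic, using only the numbers printed above and assuming one Ballotpedia row per candidate:

```r
n_missing_before <- 332  # candidates with no office in the Transparency USA data
n_returned       <- 257  # rows in the ballotpedia data frame
n_office_na      <- 176  # sum(is.na(ballotpedia$office))
n_bio_na         <- 123  # sum(is.na(ballotpedia$bio))

offices_filled <- n_returned - n_office_na           # rows that do have an office
bios_found     <- n_returned - n_bio_na              # rows that do have a bio
share_filled   <- offices_filled / n_missing_before  # share of the original gaps filled

offices_filled
bios_found
round(share_filled, 3)
```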

Match the secondary data to the Primary Data

ohio_candidates <- ohio_candidates %>%    
  mutate(candidate = str_trim(candidate))  
merged_data <- ohio_candidates %>%    
  left_join(ballotpedia, by = "candidate")  

#Clean Office and establish the standards from earlier
merged_data <- merged_data %>% 
  mutate(office.y = str_remove(office.y, "Candidate, ")) %>%
  mutate(office = coalesce(office.y,office.x)) %>% 
  mutate(district_number = parse_number(office),
         state_senate = ifelse(str_detect(office,("Senate")), 1, 0)
         )
#Add the party
merged_data <- merged_data %>% 
  mutate(party.y = substr(party.y,1,1)) %>%
  mutate(party = coalesce(party.y,party.x))
#Remove the redundant columns
merged_data <- merged_data %>% 
  select(-office.x, -office.y,-party.x,-party.y) 
merged_data <- merged_data %>% 
  select(-name)
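A join on names is only as good as the name cleaning, so it can help to check what failed to match before trusting the merged table. The sketch below uses made-up toy tables (`primary` and `secondary` are illustrative stand-ins, not the real data) to show two checks: `anti_join()` for secondary rows with no match, and a duplicate count, since duplicate names in the secondary table would multiply rows in a `left_join()`.

```r
library(dplyr)

# Toy stand-ins for ohio_candidates and ballotpedia (made-up rows).
primary <- tibble(
  candidate = c("Jane Doe", "John Roe", "Pat Poe"),
  office    = c(NA, "Ohio Senate District 3", NA)
)
secondary <- tibble(
  candidate = c("Jane Doe", "Sam Soe"),
  office    = c("Ohio House of Representatives District 12", NA)
)

# Secondary rows whose name does not appear in the primary table.
unmatched <- anti_join(secondary, primary, by = "candidate")

# Names that appear more than once in the secondary table.
dupes <- secondary %>% count(candidate) %>% filter(n > 1)

# Same pattern as above: prefer the secondary office, fall back to the primary.
merged <- primary %>%
  left_join(secondary, by = "candidate") %>%
  mutate(office = coalesce(office.y, office.x)) %>%
  select(-office.x, -office.y)
```

If `unmatched` is large on the real data, the likely culprit is formatting differences in the name column (underscores, trailing spaces, middle initials) rather than truly missing candidates.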

Actual Data Analytics

Which party has more money?

summary_df<- merged_data %>% 
  group_by(party) %>% 
  summarize(`Number of Candidates` = n(),
            `Average Contributions` = mean(contributions),
            `Average Expenditures` = mean(expenditures),
            `Total Funds` = sum(contributions)) %>% 
  view()

That is a table, but let’s make it candidate-proof with a chart.

summary_df %>% 
  filter(party == "D" | party == "R") %>%  # filter to only real parties
ggplot(aes(x= party, y = `Total Funds`))+
  geom_col()+
  labs(
    title = "Total Contributions by Party"
  )+
  scale_y_continuous(labels = scales::dollar)

Oh my, Republicans are pushing a 3:1 funding advantage.

Which races have a lot of competition?

This is very flawed because many entries are still missing the office (332 before the Ballotpedia merge), and closing the rest of the gap would take a manual, intensive fix. The table below shows the number of candidates, contributions, and expenditures for each office type.

office <- merged_data %>% 
  mutate(office = str_remove_all(office, "\\d")) %>% 
  group_by(office) %>% 
  summarize(`Number of Candidates` = n(),
            `Average Contributions` = mean(contributions),
            `Average Expenditures` = mean(expenditures),
            `Total Funds` = sum(contributions)) %>% 
  view()

A graph.

office %>% 
 slice_max(`Average Contributions`, n = 10) %>% 
  ggplot(aes(x= reorder(office,`Average Contributions`), y = `Average Contributions`))+
  geom_col()+
  coord_flip()+
  labs(
    title = "Average Contribution for Office Type",
    x = "Office Sought")+
  scale_y_continuous(labels = scales::dollar)

You can kind of see that the sexy offices raise the most, followed by the offices with the bigger constituencies.

What is the Dem vs. Rep total for cash on hand?

pch <- merged_data %>% 
  mutate(cash_on_hand = contributions-expenditures) %>% 
  group_by(party) %>%
  summarize(total_cash_on_hand = sum(cash_on_hand)) %>% 
  filter(party == "D"|party =="R") %>% 
  select(party, total_cash_on_hand) %>% 
  view()

I know pie charts suck, but indulge me. It wouldn’t be politics without some deceptive bs.

pch %>% 
ggplot(aes(x = "", y = total_cash_on_hand, fill = party)) +
  geom_col() +
  coord_polar(theta = "y") +
  labs(title = "Cash on Hand by Party") +
  scale_fill_manual(values = c("D"= "blue", "R" = "red")) +
  theme_void()

This tells the story of Ohio. If only I could say with certainty that Republicans have a 2:1 cash advantage over Democrats. And you wonder why Democrats lose.
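The pie chart hides the actual ratio, so it may be worth computing it directly. The sketch below uses made-up totals in place of the real `pch` table (`pch_toy` is illustrative only). Two caveats apply to the real data: contributions minus expenditures is net money raised, which only approximates cash on hand because it ignores any starting balance, and `sum()` returns `NA` if any candidate is missing a value unless `na.rm = TRUE` is set.

```r
library(dplyr)

# Made-up totals standing in for the real pch table.
pch_toy <- tibble(
  party              = c("D", "R"),
  total_cash_on_hand = c(1000000, 2100000)
)

# Republican-to-Democrat ratio of total cash on hand.
cash_ratio <- with(
  pch_toy,
  total_cash_on_hand[party == "R"] / total_cash_on_hand[party == "D"]
)

# A single missing value turns an unguarded sum into NA.
unguarded <- sum(c(500, NA))
guarded   <- sum(c(500, NA), na.rm = TRUE)
```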

Which Democrats raised the most money?

merged_data %>% 
  filter(party == "D") %>% 
  slice_max(contributions, n = 10) %>% 
  ggplot(aes(x = reorder (candidate,contributions),
             y = contributions
             ))+
  geom_col(fill = "blue")+
  coord_flip()+
  labs (title = "Top Democrat Fundraisers",
        x = "Candidate",
        y = "Money Raised")+
  scale_y_continuous(labels = scales::dollar)

Funnily enough, Bryan Hambley, the second-highest-raising Democrat, lost his primary for Secretary of State. Sean Brennan is a State Representative. There is a distinct lack of a Democratic State Auditor candidate. A business consultant for the Ohio Democratic Party might suggest that Bryan Hambley and Allison Russo should not have challenged each other in the primary, because they are both great at fundraising.

Which Republicans are the most dangerous?

If they have a lot of money, they can prove difficult to beat.

merged_data %>% 
  filter(party == "R") %>% 
  slice_max(contributions, n = 10) %>% 
  ggplot(aes(x = reorder (candidate,contributions),
             y = contributions
             ))+
  geom_col(fill = "red")+
  coord_flip()+
  labs (title = "Top Republican Fundraisers",
        x = "Candidate",
        y = "Money Raised")+
  scale_y_continuous(labels = scales::dollar)

Notice the difference in the scales. The Republican statewide candidates have massive stockpiles of cash, even if they are dwarfed by Vivek Ramaswamy. This graph shows why people were unwilling to challenge Vivek or advocate for an alternative candidate.

Which Statehouse and Senate Districts have the most financial activity?

district_data <- merged_data %>%
  filter(!is.na(district_number))

district_summary <- district_data %>%
  group_by(state_senate, district_number) %>%
  summarise(total_raised = sum(contributions, na.rm = TRUE),
            .groups = "drop")

district_summary <- district_summary %>%
  mutate(
    type = ifelse(state_senate == 1, "Senate", "House"),
    district = paste(type, district_number)
  )

district_summary %>%
  slice_max(total_raised, n = 10) %>%
  ggplot(aes(
    x = reorder(district, total_raised),
    y = total_raised,
    fill = type
  )) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top Statehouse and Senate Districts by Amount Raised",
    x = "District",
    y = "Total Contributions"
  ) +
  scale_y_continuous(labels = scales::dollar)

Do Senate Races cost more on average than House?

chamber_summary <- merged_data %>%
  filter(!is.na(district_number)) %>%
  mutate(type = ifelse(state_senate == 1, "Senate", "House")) %>%
  group_by(type) %>%
  summarise(
    avg_raised = mean(contributions, na.rm = TRUE),
    .groups = "drop"
  )



chamber_summary %>%
  ggplot(aes(
    x = reorder(type, avg_raised),
    y = avg_raised,
    fill = type
  )) +
  geom_col() +
    labs(
    title = "Do House or Senate Races Cost More?",
    x = "Chamber",
    y = "Average Contributions"
  ) +
  scale_y_continuous(labels = scales::dollar)

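This makes sense. Senate seats are more prestigious: there are only 33 of them vs. 99 House seats. The gap does not scale directly with the number of constituents, though. If it did, the Senate average would be about 3x the House average, since each Senate district covers three House districts.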
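The comparison in that last point is a simple ratio check. Here is a sketch with made-up averages (`chamber_toy` is illustrative and is not the output of `chamber_summary`):

```r
# Made-up averages standing in for chamber_summary.
chamber_toy <- data.frame(
  type       = c("House", "Senate"),
  avg_raised = c(100000, 180000)
)

senate_avg <- chamber_toy$avg_raised[chamber_toy$type == "Senate"]
house_avg  <- chamber_toy$avg_raised[chamber_toy$type == "House"]

observed_ratio    <- senate_avg / house_avg  # how much more a Senate candidate raises
constituent_ratio <- 99 / 33                 # House seats per Senate seat

# TRUE would mean fundraising grows more slowly than the constituency does.
observed_ratio < constituent_ratio
```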

Conclusion

For the most part, the data and analysis confirm conventional wisdom. The Ohio Republican Party has enjoyed a 25+ year super-majority, and that super-majority shows up in the fundraising numbers. It is fascinating to see how much money gets raised and spent on politics in the State of Ohio. The sheer scale of political spending makes you wonder what would happen if, instead of biennial candidate marketing campaigns (vanity projects), politicians invested these funds in their communities. Then again, we wouldn’t have the iconic road debris, airwave pollutants, or random high schoolers and college students interrupting your daily life.
