Ohio Political Finance Analysis
Introduction
I am an aspiring Democratic political hack. I have a fond appreciation for fundraising (it pays my bills) and for the different candidates we run in each district. I found https://www.transparencyusa.org, which lists all of the state candidates running in Ohio along with their name, party, office sought, contributions (funds raised), and expenditures.
I want to know which party has the financial advantage, which candidates are the best at raising money (potentially future bosses), which races had the biggest expenditures, and whether there are any other findings I can extract from this data.
Scraping
Using the tidyverse, httr, and rvest packages, I was able to extract the data from the webpage. I looped the scraping function 21 times to get all of the different pages of Ohio candidates. I exported the result as a csv, and that brings us to where we are now.
Libraries
library(tidyverse)
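The page loop described under Scraping could look roughly like the sketch below. It is not the script that produced the csv: the helper names, the "page" query parameter, and the assumption that the candidates sit in a plain HTML table are all hypothetical, and the real site may need a different approach.

```r
library(rvest)
library(dplyr)
library(purrr)

# Parse one listing page that has already been read into R.
# Assumption: the candidates sit in the first <table> on the page.
parse_candidate_page <- function(page) {
  page %>%
    html_element("table") %>%
    html_table()
}

# Hypothetical helper: one URL per listing page.
# The "page" query parameter is an assumed pattern, not the site's documented one.
build_page_urls <- function(base_url, n_pages = 21) {
  paste0(base_url, "?page=", seq_len(n_pages))
}

# Read and parse every page, pausing between requests, then stack the tables.
scrape_all_pages <- function(urls) {
  urls %>%
    map(function(u) {
      Sys.sleep(1)  # be polite to the server
      parse_candidate_page(read_html(u))
    }) %>%
    bind_rows()
}

# Offline check on a made-up page, so nothing here touches the network
demo_page <- minimal_html("
  <table>
    <tr><th>Candidate</th><th>Contributions</th></tr>
    <tr><td>Jane Doe</td><td>$1,000</td></tr>
  </table>")
demo_table <- parse_candidate_page(demo_page)
```

Passing the real listing address through build_page_urls() and then scrape_all_pages() would return one stacked table, which write_csv() can save for later.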
Introduction to the Data
I scraped the contributions, expenditures, party, and office sought for all Ohio political candidates.
| Variable Name | Variable Explanation | Variable Type |
|---|---|---|
| candidate | The candidate's name | chr |
| contributions | The amount of money the candidate has raised | num |
| expenditures | The amount of money the candidate has reported spending | num |
| district_number | For Statehouse and Senate seats the district number | num |
| state_senate | A flag if the numerical district is state senate or not. | num |
| bio | The biography of the candidate scraped from Ballotpedia | chr |
| office | The office the candidate is seeking | chr |
| party | The first letter of the candidate's party: D - Democrat, R - Republican, N - Nonpartisan, L - Libertarian, I - Independent | chr |
Loading Data
ohio_candidates <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/steeleg2_xavier_edu/IQABHokXReN1R7u7yLLrPZchAaYqhx5n8uhZ7flHwB7B9iM?download=1")
Cleaning the Data
Past Gavin did us no favors with the format he saved this in. I need to clean the name column so it holds just the candidate's name and none of the extra stuff, then convert the financial data to numbers that are easy to manipulate in R.
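To make the two cleaning steps concrete, here is a small sketch on made-up strings. The raw values are invented guesses at what the scraped fields look like, and the single alternation pattern is a compact stand-in for a chain of str_remove() calls.

```r
library(tidyverse)

# Made-up raw values, shaped like the scraped fields are assumed to look
raw_names <- c("Jane Doe  Democrat  Ohio House District 12",
               "John Roe Republican Ohio Senate District 3")
raw_money <- c("$12,345.67", "$1,000")

# Collapse repeated whitespace, cut everything from the party label onward,
# then trim the space left behind
clean_names <- raw_names %>%
  str_squish() %>%
  str_remove("(Democrat|Republican).*$") %>%
  str_trim()

# parse_number() drops the dollar sign and the thousands separators
clean_money <- parse_number(raw_money)
```

The same idea scales to more labels by adding them to the alternation, at the cost of also clipping any candidate whose actual name contains one of those words.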
backup_ohio_candidates <- ohio_candidates #gavin is dumb and will inevitably screw up so making a backup.
ohio_candidates <- backup_ohio_candidates #command to revert back
# Clean the names.
ohio_candidates <- ohio_candidates %>%
  mutate(candidate = str_squish(candidate),
         candidate = str_remove(candidate, "Ohio.*$"),
         candidate = str_remove(candidate, "Democrat.*$"),
         candidate = str_remove(candidate, "Republican.*$"),
         candidate = str_remove(candidate, "Attorney.*$"),
         candidate = str_remove(candidate, "Cuyahoga.*$"),
         candidate = str_remove(candidate, "Nonp.*$"),
         candidate = str_remove(candidate, "No Part.*$"),
         candidate = str_remove(candidate, "Independent.*$"),
         candidate = str_remove(candidate, "Libertarian.*$"),
         district_number = parse_number(office),
         state_senate = ifelse(str_detect(office, "Senate"), 1, 0)
  ) %>%
  mutate(contributions = parse_number(contributions),
         expenditures = parse_number(expenditures))
Shortcomings
Unfortunately, the data acquired from Transparency USA is missing a bunch of the offices the candidates are running for.
sum(is.na(ohio_candidates$office))
The office sought is missing for 332 candidates. Let's fix that.
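The same check can be run on every column at once, which shows whether office is the only field with gaps. This is a minimal sketch on an invented three-row table standing in for ohio_candidates.

```r
library(tidyverse)

# Toy stand-in for ohio_candidates; all values are invented
toy <- tibble(
  candidate = c("Jane Doe", "John Roe", "Ann Poe"),
  office    = c("Ohio House District 12", NA, NA),
  party     = c("D", "R", NA)
)

# Count of missing values in every column
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Share of rows with no office
share_missing_office <- mean(is.na(toy$office))
```

Swapping toy for the real table gives the missing count per variable plus the share of candidates with no office on record.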
Secondary Data Source
I will use Ballotpedia to supplement the data from Transparency USA.
## Extract the candidates we need more info on
missing_candidates <- ohio_candidates %>%
  filter(is.na(office))
candidates <- missing_candidates$candidate
## Format the names to run through the Ballotpedia scrape script
candidates <- gsub(" ", "_", candidates)
candidates <- gsub("_$", "", candidates)
I scraped the 332 names that were missing an office, which returned this csv:
ballotpedia <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/steeleg2_xavier_edu/IQAc0y0PxP3qQKEAAJo7Jgz1AaPyTbs_WDLiEwbFvhpWeCU?download=1")
Analyze the new data.
summary(ballotpedia)
  candidate             name               bio               party
 Length:257         Length:257         Length:257         Length:257
 Class :character   Class :character   Class :character   Class :character
 Mode  :character   Mode  :character   Mode  :character   Mode  :character
    office
 Length:257
 Class :character
 Mode  :character
sum(is.na(ballotpedia$office))
[1] 176
176/332
[1] 0.5301205
sum(is.na(ballotpedia$bio))
[1] 123
# is.na() counts missing values: 176 of the 257 scraped rows still have no office and 123 have no bio, so the scrape filled in 81 offices and 134 personal bios.
Match the Secondary Data to the Primary Data
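Before running the merge on the real tables, the join-then-coalesce pattern can be tried on a toy pair of tables. Everything in this sketch is invented, including the "Republican Party" label format. The uniqueness check is an added precaution, since a left join multiplies rows when a name repeats in the lookup table.

```r
library(tidyverse)

# Invented rows standing in for the two sources
primary <- tibble(
  candidate = c("Jane Doe", "John Roe", "Ann Poe"),
  office    = c("Ohio House District 12", NA, NA),
  party     = c("D", "R", "I")
)
lookup <- tibble(
  candidate = c("John Roe", "Ann Poe"),
  office    = c("Candidate, Ohio State Senate District 3", NA),
  party     = c("Republican Party", "Independent")
)

# Each candidate should appear only once in the lookup table
stopifnot(!any(duplicated(lookup$candidate)))

merged <- primary %>%
  left_join(lookup, by = "candidate") %>%
  mutate(
    office.y = str_remove(office.y, "Candidate, "),
    office   = coalesce(office.y, office.x),             # prefer the lookup value
    party    = coalesce(substr(party.y, 1, 1), party.x)  # first letter of the lookup party
  ) %>%
  select(candidate, office, party)
```

coalesce() takes the first non-missing value, so rows with no match in the lookup table keep whatever the primary source already had.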
ohio_candidates <- ohio_candidates %>%
  mutate(candidate = str_trim(candidate))
merged_data <- ohio_candidates %>%
  left_join(ballotpedia, by = "candidate")
# Clean office and re-apply the standards from earlier
merged_data <- merged_data %>%
  mutate(office.y = str_remove(office.y, "Candidate, ")) %>%
  mutate(office = coalesce(office.y, office.x)) %>%
  mutate(district_number = parse_number(office),
         state_senate = ifelse(str_detect(office, "Senate"), 1, 0)
  )
# Add the party
merged_data <- merged_data %>%
  mutate(party.y = substr(party.y, 1, 1)) %>%
  mutate(party = coalesce(party.y, party.x))
# Remove the redundant columns
merged_data <- merged_data %>%
  select(-office.x, -office.y, -party.x, -party.y)
merged_data <- merged_data %>%
  select(-name)
Actual Data Analytics
Which party has more money?
summary_df <- merged_data %>%
  group_by(party) %>%
  summarize(`Number of Candidates` = n(),
            `Average Contributions` = mean(contributions),
            `Average Expenditures` = mean(expenditures),
            `Total Funds` = sum(contributions)) %>%
  view()
That is the table, but let's make it candidate-proof with a chart.
summary_df %>%
  filter(party == "D" | party == "R") %>% # filter to only real parties
  ggplot(aes(x = party, y = `Total Funds`)) +
  geom_col() +
  labs(
    title = "Total Contributions by Party"
  ) +
  scale_y_continuous(labels = scales::dollar)
Oh my, Republicans are pushing a 3:1 funding advantage.
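Rather than reading the ratio off the bars, it can be computed directly from the party totals. The sketch below uses invented totals in a table shaped like summary_df, so the result only illustrates the arithmetic.

```r
library(tidyverse)

# Invented totals; substitute the real summary table
toy_summary <- tibble(
  party         = c("D", "R", "I"),
  `Total Funds` = c(1000000, 3000000, 50000)
)

# One row with a column per party, plus the Republican-to-Democrat ratio
funding_ratio <- toy_summary %>%
  filter(party %in% c("D", "R")) %>%
  select(party, `Total Funds`) %>%
  pivot_wider(names_from = party, values_from = `Total Funds`) %>%
  mutate(r_to_d = R / D)
```

With the real table in place of toy_summary, r_to_d gives the exact figure behind the "3:1" reading.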
Which races have a lot of competition?
This is very flawed because 332 of the entries were missing the office before the Ballotpedia merge and many still are - filling the rest would have to be a manual / intensive fix. The table below shows the number of candidates, contributions, and expenditures for each office type.
office <- merged_data %>%
  mutate(office = str_remove_all(office, "\\d")) %>%
  group_by(office) %>%
  summarize(`Number of Candidates` = n(),
            `Average Contributions` = mean(contributions),
            `Average Expenditures` = mean(expenditures),
            `Total Funds` = sum(contributions)) %>%
  view()
A graph.
office %>%
  slice_max(`Average Contributions`, n = 10) %>%
  ggplot(aes(x = reorder(office, `Average Contributions`), y = `Average Contributions`)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Average Contributions by Office Type",
    x = "Office Sought") +
  scale_y_continuous(labels = scales::dollar)
You can kind of see how the sexy offices raise the most, followed by the offices with the bigger constituencies.
What is the Dem vs. Rep total for cash on hand?
pch <- merged_data %>%
  mutate(cash_on_hand = contributions - expenditures) %>%
  group_by(party) %>%
  summarize(total_cash_on_hand = sum(cash_on_hand)) %>%
  filter(party == "D" | party == "R") %>%
  select(party, total_cash_on_hand) %>%
  view()
I know pie charts suck, but indulge me. It wouldn’t be politics without some deceptive bs.
pch %>%
  ggplot(aes(x = "", y = total_cash_on_hand, fill = party)) +
  geom_col() +
  coord_polar(theta = "y") +
  labs(title = "Cash on Hand by Party") +
  scale_fill_manual(values = c("D" = "blue", "R" = "red")) +
  theme_void()
This tells the story of Ohio. If only I could say with certainty that Republicans had a 2:1 cash advantage over Democrats. And you wonder why Democrats lose.
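For anyone who would rather compare bar heights than pie slices, the same two totals can be drawn as a bar chart with each party's share printed on top. This is a sketch with invented cash totals, not the real figures.

```r
library(tidyverse)

# Invented totals in the same shape as the cash-on-hand summary
toy_cash <- tibble(
  party = c("D", "R"),
  total_cash_on_hand = c(400000, 800000)
)

cash_bar <- toy_cash %>%
  mutate(share = total_cash_on_hand / sum(total_cash_on_hand)) %>%
  ggplot(aes(x = party, y = total_cash_on_hand, fill = party)) +
  geom_col() +
  # Print each party's share of the combined cash above its bar
  geom_text(aes(label = scales::percent(share)), vjust = -0.5) +
  scale_fill_manual(values = c("D" = "blue", "R" = "red")) +
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Cash on Hand by Party", x = "Party", y = "Cash on Hand")
```

Printing cash_bar draws the chart. The share labels make the ratio explicit without anyone having to judge angles.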
Which Democrats raised the most money?
merged_data %>%
  filter(party == "D") %>%
  slice_max(contributions, n = 10) %>%
  ggplot(aes(x = reorder(candidate, contributions),
             y = contributions
  )) +
  geom_col(fill = "blue") +
  coord_flip() +
  labs(title = "Top Democratic Fundraisers",
       x = "Candidate",
       y = "Money Raised") +
  scale_y_continuous(labels = scales::dollar)
Funnily enough, Bryan Hambley, the second-highest-raising Democrat, lost his primary for Secretary of State. Sean Brennan is a state representative. There is a distinct lack of a Democratic candidate for State Auditor. A business consultant to the Ohio Democratic Party might suggest that Bryan Hambley and Allison Russo should not have challenged each other in the primary, because they are both great at fundraising.
Which Republicans are the most dangerous?
Candidates with a lot of money can prove difficult to beat.
merged_data %>%
  filter(party == "R") %>%
  slice_max(contributions, n = 10) %>%
  ggplot(aes(x = reorder(candidate, contributions),
             y = contributions
  )) +
  geom_col(fill = "red") +
  coord_flip() +
  labs(title = "Top Republican Fundraisers",
       x = "Candidate",
       y = "Money Raised") +
  scale_y_continuous(labels = scales::dollar)
Notice the difference in the scales. The Republican statewide candidates have massive stockpiles of cash, even if they are dwarfed by Vivek Ramaswamy's. This graph shows why people were unwilling to challenge Vivek or advocate for an alternative candidate.
What Statehouse and Senate Districts have the most financial activity?
district_data <- merged_data %>%
  filter(!is.na(district_number))
district_summary <- district_data %>%
  group_by(state_senate, district_number) %>%
  summarise(total_raised = sum(contributions, na.rm = TRUE),
            .groups = "drop")
district_summary <- district_summary %>%
  mutate(
    type = ifelse(state_senate == 1, "Senate", "House"),
    district = paste(type, district_number)
  )
district_summary %>%
  slice_max(total_raised, n = 10) %>%
  ggplot(aes(
    x = reorder(district, total_raised),
    y = total_raised,
    fill = type
  )) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top Statehouse and Senate Districts by Amount Raised",
    x = "District",
    y = "Total Contributions"
  ) +
  scale_y_continuous(labels = scales::dollar)
Do Senate Races Cost More on Average Than House Races?
chamber_summary <- merged_data %>%
  filter(!is.na(district_number)) %>%
  mutate(type = ifelse(state_senate == 1, "Senate", "House")) %>%
  group_by(type) %>%
  summarise(
    avg_raised = mean(contributions, na.rm = TRUE),
    .groups = "drop"
  )
chamber_summary %>%
  ggplot(aes(
    x = reorder(type, avg_raised),
    y = avg_raised,
    fill = type
  )) +
  geom_col() +
  labs(
    title = "Do House or Senate Races Cost More?",
    x = "Chamber",
    y = "Average Contributions"
  ) +
  scale_y_continuous(labels = scales::dollar)
This makes sense. Senate seats are more prestigious: there are only 33 of them versus 99 House seats. The gap between the two does not scale directly with the number of constituents, though. If it did, the Senate average would be 3x the House average.
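The 3x benchmark comes straight from the seat counts: 99 House districts against 33 Senate districts means each Senate district covers three House districts' worth of constituents. A short sketch of the comparison, with invented averages standing in for the chamber summary:

```r
library(tidyverse)

# Invented averages; substitute the real chamber summary
toy_chamber <- tibble(
  type       = c("House", "Senate"),
  avg_raised = c(100000, 180000)
)

# Seat counts from the text: 99 House districts, 33 Senate districts
seats <- c(House = 99, Senate = 33)
constituent_ratio <- seats[["House"]] / seats[["Senate"]]

# Turn the two-column table into a named vector, then compare the averages
avgs <- deframe(toy_chamber)
observed_ratio <- avgs[["Senate"]] / avgs[["House"]]
```

If observed_ratio lands well below constituent_ratio, as it does with these made-up numbers, Senate candidates raise more per race but less per constituent than House candidates.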
Conclusion
The data and analysis for the most part confirm conventional wisdom. The Ohio Republican Party has enjoyed a 25+ year super-majority, and that super-majority shows up in the fundraising numbers. It is fascinating to see how much money gets raised and spent on politics in the State of Ohio. The sheer scale of political spending makes you wonder what would happen if, instead of biennial candidate marketing campaigns (vanity projects), politicians invested these funds in their communities. Then again, we wouldn’t have the iconic road debris, airwave pollution, or random high schoolers and college students interrupting your daily life.