Introduction

This project explores financial transactions from the 2020 election cycle across all candidates who ran for federal office. The database on opensecrets.com contains millions of records from this election cycle and past. For research purposes, I’ve chosen to analyze PAC contribution data to visualize which candidates had the largest financial backing from corporate entities.

Methodology & Process

Data

The data used for this research was collected from official fiscal reports from Federal Election Commission. Because the raw data from OpenSecrets is broken up into multiple datasets with records in .txt format, the first thing to do was clean the data to remove characters and assign null values in OpenRefine. The data below displays the important columns from the candidate dataset after cleaning:

## # A tibble: 6 × 5
## # Groups:   Party [4]
##   `Candidate ID` `Candidate Name`     Party Result `District Ran`
##   <chr>          <chr>                <chr> <chr>  <chr>         
## 1 N00039029      Thomas Lamb (I)      I     3N     AK01          
## 2 N00047445      Ray Sean Tugatuk (D) D     DL     AK01          
## 3 N00047447      Thomas Catalano (3)  3     3N     PRES          
## 4 N00044245      Jerry Carl (R)       R     RW     AL01          
## 5 N00044288      Wes Lambert (R)      R     RL     AL01          
## 6 N00025369      Chris Pringle (R)    R     RL     AL01

Separately, OpenSecrets includes a list of contributions from PACs with the donation type and recipient code. This dataset contains 890,578 records:

## # A tibble: 6 × 10
##   `Cycle 2` `FEC Record No.` `Pac ID`  `Candidate ID` Amount Date    Code  Type 
##       <dbl>            <dbl> <chr>     <chr>           <dbl> <chr>   <chr> <chr>
## 1      2020          1.01e18 C00429241 N00033395         250 08/19/… H5100 24K  
## 2      2020          1.01e18 C00429241 N00036275         500 09/13/… H5100 24K  
## 3      2020          1.01e18 C00682724 N00042308          39 12/02/… J1200 24Z  
## 4      2020          1.01e18 C00682724 N00044183          49 11/02/… J1200 24Z  
## 5      2020          1.01e18 C00682724 N00042308          26 10/31/… J1200 24Z  
## 6      2020          1.01e18 C00682724 N00044183          49 11/02/… J1200 24Z  
## # … with 2 more variables: DI <chr>, FECCandID <chr>

To challenge myself, I took on the task of combining the two datasets by applying a full join on the Candidate ID field, creating a dataset with all donations along with the recipients, their party, and the district they ran for in 2020.

## # A tibble: 6 × 14
## # Groups:   Party [4]
##   `Candidate ID` `Candidate Name`     Party Result `District Ran` `Cycle 2`
##   <chr>          <chr>                <chr> <chr>  <chr>              <dbl>
## 1 N00039029      Thomas Lamb (I)      I     3N     AK01                  NA
## 2 N00047445      Ray Sean Tugatuk (D) D     DL     AK01                  NA
## 3 N00047447      Thomas Catalano (3)  3     3N     PRES                  NA
## 4 N00044245      Jerry Carl (R)       R     RW     AL01                2020
## 5 N00044245      Jerry Carl (R)       R     RW     AL01                2020
## 6 N00044245      Jerry Carl (R)       R     RW     AL01                2020
## # … with 8 more variables: FEC Record No. <dbl>, Pac ID <chr>, Amount <dbl>,
## #   Date <chr>, Code <chr>, Type <chr>, DI <chr>, FECCandID <chr>

I noticed that after combining the data, there were a few duplicate values which could be eliminated. To do this, I transformed the data using combined <- distinct(combined) to remove any matching rows. I then filtered the dataset to remove any candidates without PAC contributions for that election cycle to match the original number of records in the PAC dataset:

combined <- distinct(combined)

combined <- combined %>%
  filter(!(is.na(`Amount`)))  %>%
  mutate(Date = as.Date(Date, tryFormats = c("%m/%d/%Y"))) %>%
  filter(Date >= as.Date('2018-03-20') & Date <= as.Date('2021-01-30')) %>%
  mutate(Amount = Amount/1000000) 

head(combined)
## # A tibble: 6 × 14
## # Groups:   Party [1]
##   `Candidate ID` `Candidate Name` Party Result `District Ran` `Cycle 2`
##   <chr>          <chr>            <chr> <chr>  <chr>              <dbl>
## 1 N00044245      Jerry Carl (R)   R     RW     AL01                2020
## 2 N00044245      Jerry Carl (R)   R     RW     AL01                2020
## 3 N00044245      Jerry Carl (R)   R     RW     AL01                2020
## 4 N00044245      Jerry Carl (R)   R     RW     AL01                2020
## 5 N00044245      Jerry Carl (R)   R     RW     AL01                2020
## 6 N00044245      Jerry Carl (R)   R     RW     AL01                2020
## # … with 8 more variables: FEC Record No. <dbl>, Pac ID <chr>, Amount <dbl>,
## #   Date <date>, Code <chr>, Type <chr>, DI <chr>, FECCandID <chr>

With a clean, workable dataset with all the information I was looking for, I was ready to experiment with visualizing results for any insights.

Breaking Up the Data

In order to familiarize myself with R tools, I started with a simple visualization showing the total amount of PAC donations accepted by each party, with no specific donation type in mind. To make things look a bit neater, the donations have been normalized by dividing Amount by 1000000:

party_viz <- ggplot(donations_by_party, aes(x = Party, y = total_donations, fill = Party)) +
  geom_col() +
  scale_fill_manual(values = c(R = 'red', D = 'blue', L = 'gold', 
                               I = 'green', `3` = 'purple', 
                               U='grey')) +
  labs(
    x = "Party Affiliation",
    y = "Total Donatinons (Mil.)",
    title = "PAC Contributions by Party"
  ) +
  theme_minimal()

party_viz

While this visualization is hardly surprising, it is interesting to see exactly how much more money the Republican and Democratic parties accept from PACs.

From there, I went ahead and created a few more dataframes from the master combined dataset.

x <- aggregate(Amount ~ Party + Date, sum, data=combined) %>%
  filter(Party == 'R' | Party == 'D') %>%
  group_by(`Party`) %>%
  mutate(Amount = cumsum(Amount))
  

donation_timeline <- ggplot(x, aes(x = Date, y = Amount, color=Party)) +
  geom_line() +
  scale_color_manual(values = c(R = 'red', D = 'blue')) +
  theme_minimal() + 
  labs(y = "Total Donations (Mil.)", title = "Major Party PAC Support")

The timeline below shows the running total of PAC contributions to the two major parties during the presidential election cycle, including any reimbursements. It’s clear by the end of the 2020 period that the Republican party was outspent by PACs supporting the Democratic party.

Filtering Candidates

With a few tricks under my belt, I felt more comfortable experimenting with more difficult filtering, such as queriying specific candidates based on their election type and averaging totals based on time. For tasks like these, I heavily relied on the tidyverse package.

pres_donations <- combined %>%
  filter(`District Ran` == 'PRES' & (Party == 'R' | Party == 'D') & Date <= as.Date('2020-04-01')) %>%
  group_by(`Candidate Name`)

pres_donations <- aggregate(Amount ~ `Candidate Name` + Date + Party, sum, data=pres_donations) %>%
  group_by(`Candidate Name`) %>%
  mutate(Amount = cumsum(Amount)) %>%
  mutate(Amount = Amount*1000000)


pres_timeline <- ggplot(pres_donations, aes(x = Date, y = Amount, color = `Candidate Name`)) +
  geom_line() +
  facet_wrap(~Party, ncol = 1) +
  theme_minimal() +
  labs(y = "Total Donations")
  

ggplotly(pres_timeline)

Average Quarterly Donations to Party

To visualize averages by quarter, I used the lubridate package to convert the dates into quarterly format, then aggregated the amounts by quarter as mean values. These values were averaged by Party, and inserted into a plotly object for more interaction.

combined$q <- quarter(combined$Date, with_year=TRUE)

combined_q <- aggregate(Amount ~ Party + q, mean, data=combined) %>%
  mutate(Amount = Amount*1000000) %>%
  filter(!(Party == 'U'))
combined_q$q <- as.factor(combined_q$q)

avg_donations <- ggplot(aes(x=q, y=Amount, group=Party), data=combined_q) + 
  geom_point(aes(color=Party)) + 
  geom_line(aes(color=Party)) + 
  scale_x_discrete(labels=c('Q1 2018', 'Q2 2018', 'Q3 2018', 'Q4 2018',
                            'Q1 2019', 'Q2 2019', 'Q3 2019', 'Q4 2019',
                            'Q1 2020', 'Q2 2020', 'Q3 2020', 'Q4 2020',
                            'Q1 2021')) +
  labs(
    title = "Average PAC Donations by Quarter",
    x = "Calendar Quarter",
    y = "Average Donation Amount"
  ) +
  theme_minimal() + 
  scale_color_manual(values = c(R = 'red', D = 'blue', L = 'gold', I = 'green', 
                                `3` = 'purple'), labels = c("Republican",
                                                                        "Democrat", 
                                                                        "Libertarian",
                                                                        "Independent",
                                                                        "3rd Party"))

Total Democratic Primary Candidate Donations

Working more with date ranges, I filtered out just the Democratic presidential candidates and set the timeframe to be the length of the Democratic primaries. This way, we can see which candidates had the largest backing during that period.

dem_primary <- combined %>%
  filter(`District Ran` == 'PRES' &
           Type == "24K" &
           Party == 'D') %>%
  group_by(`Candidate Name`)
  
dem_primary <-aggregate(Amount ~ `Candidate Name` + q + Party, sum, data=dem_primary) %>%
  filter(q >= 2019 & q<=2020.2) %>%
  group_by(`Candidate Name`) %>%
  mutate(Amount = cumsum(Amount)) %>%
  mutate(Amount = Amount*1000)

dem_primary$q <- as.factor(dem_primary$q)

primary_donations <- ggplot(dem_primary, aes(x = q, y = Amount, group=`Candidate Name`)) +
  geom_line(aes(color = `Candidate Name`)) +
  theme_minimal() +
  scale_color_viridis_d() +
  labs(x = 'Quarter', y = "Total Donations (Thous.)") +
  scale_x_discrete(labels =c('Q1 2019', 'Q2 2019', 'Q3 2019', 'Q4 2019',
                             'Q1 2020', 'Q2 2020'))

primary_donations

Visualizing Results

The last area I wanted to explore was the results of the election cycles, isolating donations to winning and losing candidates across all elections. The plot below displays the total amount of contributions taken by both winning and losing candidates in 2020.

wl <- aggregate(Amount ~ `Result` + Date, sum, data=combined) %>%
  group_by(`Result`) %>%
  filter(Result == 'DW' | 
           Result =='DL' |
           Result == 'RW'|
           Result == 'RL') %>%
  mutate(Amount = cumsum(Amount))

wl_bar <- ggplot(wl, aes(x = Result, y = Amount, fill = Result)) +
  geom_col() +
  labs(
    x = "Election Result",
    y = "Total Donatinons (Mil.)",
    title = "Results Vs. Contributions"
    ) +
  scale_x_discrete(labels =c('Losing Democrat', 'Winning Democrat',
                             'Losing Republican', 'Winning Republican')) + 
  theme_minimal() +
  scale_fill_manual(values = c(DL = 'purple', DW = 'blue',
                               RL = 'pink', RW = 'red'), 
                    labels = c('Losing Democrat', 'Winning Democrat',
                                'Losing Republican', 'Winning Republican'))

Lastly, to reduce any large outliers from the presidential election contributions, I isolated the results of every other election, which also shows that candidates for more local elections still largely benefit from taking PAC donations.

wl2 <- combined %>%
  filter(Result == 'DW' | 
           Result =='DL' |
           Result == 'RW'|
           Result == 'RL' &
           !(`District Ran` == 'PRES')) %>%
  group_by(`Result`)

wl2 <- aggregate(Amount ~ `Result` + Date, sum, data=wl2) %>%
  group_by(Result) %>%
  mutate(Amount = cumsum(Amount))

wl2_bar <- ggplot(wl2, aes(x = Date, y = Amount, color = Result)) +
  geom_line() +
  labs(
    x = "Date",
    y = "Total Donatinons (Mil.)",
    title = "PAC Contributions by Party"
  ) + 
  theme_minimal() + 
  scale_color_manual(values = c(DL = 'purple', DW = 'blue',
                                RL = 'pink', RW = 'red'), 
                     labels = c('Losing Democrat', 'Winning Democrat',
                                'Losing Republican', 'Winning Republican'))