Looking into FEC individual-level fundraising for the Democratic Party

The FEC requires candidates to file their contributions for the 2020 elections with the agency. The data is uploaded as forms, which the FEC cleans and adds to CSV files. The data is a treasure trove of information. Polls get attention from week to week, but the news generally lacks any deep analysis of the FEC filings. Reporting typically covers headline numbers: the total number of donations and the total amount raised. The FEC data is rich with self-reported donor information. People identify their position, industry and location. Money acts as a proxy for electoral success. In the 2020 primary almost all candidates have sworn off contributions from lobbyists. On the surface, this is a noble attempt by Democratic candidates to prove they aren’t influenced by corporations. However, digging below the surface, there is plenty of corporate influence in the FEC data.

Below, I look through the FEC data to find which candidates are receiving money from corporate-level executives. We can also see whether corporate executives are more likely to attempt to capture multiple candidates with donations, and which candidates are receiving money from the same executives. I mean it when I say this is a treasure trove of unexplored data. I will explain my process for analysis shortly, but first let me address some boring data integrity issues. Feel free to skip to the section titled “The Plan”; the next several paragraphs outline my assumptions in data collection for anyone who distrusts my analysis.

Data integrity concerns

The FEC mandates quarterly reports to track individual contributions into Democratic candidates’ accounts. As a disclaimer, it seems the engineers on the FEC website are slowly going through the candidates’ individual files and adding them to the big table located on the FEC pages here. Originally, I pulled the data from the above link, but the raw data is available on the website in full, and some of the candidates’ Q2 FEC filings hadn’t been added to the above database. Therefore, for this report I grabbed all the individual-level CSV files and did my own manipulation of the data. As a result, I keep a small number of individual contributors (less than 0.1%) for each candidate that were discarded, for whatever reason, by the FEC data engineers. The FEC filings aren’t representative of the whole population of donors, as reporting contributions under $200 isn’t mandatory. We know that all these candidates have reached 100k contributors, yet most candidates in the database have fewer than 100k donations listed.
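As a rough sketch of the "grab every per-candidate CSV and combine them" step, the snippet below writes two tiny made-up files to a temporary folder and stacks them into one data frame. The folder, file names, column names and values are all hypothetical stand-ins, not the real FEC layout; only the name final_df is borrowed from the clustering code later in this post.

```r
# Hypothetical stand-ins for per-candidate FEC downloads (made-up rows).
csv_dir <- file.path(tempdir(), "fec_sketch")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(contributor  = c("A SMITH", "B JONES"),
                     job_position = c("CEO", "Teacher"),
                     amount       = c(2800, 50)),
          file.path(csv_dir, "candidate_a.csv"), row.names = FALSE)
write.csv(data.frame(contributor  = "A SMITH",
                     job_position = "CEO",
                     amount       = 1000),
          file.path(csv_dir, "candidate_b.csv"), row.names = FALSE)

# Read every CSV in the folder and tag each row with the file it came from.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
read_one <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  df$candidate_file <- basename(path)
  df
}

# Stack the per-candidate tables into a single data frame.
final_df <- do.call(rbind, lapply(files, read_one))
print(final_df)
```

Tagging each row with its source file keeps the candidate attached to every contribution once the files are merged.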

Some other peripheral information: individual contributions are capped at $2,800 per election. Throughout the FEC reports there are some people who exceed this limit; they are known as conduits. Conduits are individuals who act on behalf of other people to give a donation to a candidate. They aren’t restricted by the $2,800 limit if they exercise no discretion over where the funds go.
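One way to spot records that go over the cap is to total each contributor's giving per candidate and flag anything above $2,800. This is only a sketch on made-up rows: the column names are illustrative rather than FEC field names, and a flagged total is a prompt to look closer, not proof that someone is a conduit.

```r
# Made-up contributions; column names are illustrative, not FEC field names.
donations <- data.frame(
  contributor = c("A SMITH", "A SMITH", "B JONES", "C LEE"),
  candidate   = c("CAND 1", "CAND 1", "CAND 1", "CAND 2"),
  amount      = c(2000, 1500, 250, 2800),
  stringsAsFactors = FALSE
)
limit <- 2800  # the individual cap discussed above

# Sum each contributor's giving per candidate, then flag totals over the cap.
totals <- aggregate(amount ~ contributor + candidate, data = donations, FUN = sum)
totals$over_limit <- totals$amount > limit

# Largest totals first.
print(totals[order(-totals$amount), ])
```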

Assumptions

Below I make several categorizing assumptions in my data manipulation. The regex approach I took to clustering these groups will absolutely include some titles that do not actually belong in the category I assigned. Regex is a brute-force attempt to categorize the occupations algorithmically. However, because I have kept the original job title, you can see that the overwhelming majority of these titles are correctly identified. I display tables for each new clustered position so that the reader can see that the approach is largely effective. My aggregated DF is at my website; every job title that has been changed is flagged with a binary field. Please feel free to look at the assumptions I made and the false captures in the clusters. If you would like to take this on in your own analysis, or develop your own clusters, please feel free to do so as well. The original unedited job titles are still present in the dataset.
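For anyone who wants to hunt for false captures, a minimal sketch of the audit looks like this: filter to one cluster, keep only rows where the flag says the title was changed, and count the original titles. The rows are toy data; the column names job_position, clustered_jobs and job_changed follow the clustering code in this post.

```r
# Toy data after clustering: original title and the cluster it was assigned.
audit_df <- data.frame(
  job_position   = c("CEO", "PRESIDENT", "PRESIDENT AND ARTIST", "TEACHER", "FOUNDER"),
  clustered_jobs = c("CEO", "CEO", "CEO", "aCADEMIA", "CEO"),
  stringsAsFactors = FALSE
)

# Binary flag: "1" when the clustered label differs from the original title.
audit_df$job_changed <- ifelse(audit_df$clustered_jobs == audit_df$job_position, "0", "1")

# For one cluster, list the original titles that were rewritten, most common first.
changed <- audit_df[audit_df$clustered_jobs == "CEO" & audit_df$job_changed == "1", ]
title_counts <- sort(table(changed$job_position), decreasing = TRUE)
print(title_counts)
```

A title like "PRESIDENT AND ARTIST" showing up in the CEO cluster is exactly the kind of borderline capture this view is meant to surface.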

The Plan

There are many different variations of the same job title. My solution is to create a new clustered occupation column, which tracks and categorizes occupations based upon corporate structures of power. For example, looking at the FEC data you would find the titles “C.E.O”, “Chief executive officer”, “chief officer” and “Chief executive”. I bundle all these roles into a single role of CEO. Other self-reported titles like “Founder”, “EXECUTIVE DIRECTOR”, and “President” also represent those in charge of their organization, so they are also labeled as CEO. From there I step down the corporate ladder, labeling all general C-level executives as C-level executives and placing the rest of the executive positions into a category of eXECS.
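To make the bundling idea concrete, here is a small sketch that upper-cases a handful of titles and tests them against a single pattern. The pattern is illustrative only and is not the exact rule set used in the analysis; the real rules appear in the clustering code section.

```r
titles <- c("C.E.O", "Chief executive officer", "Chief executive",
            "Founder", "EXECUTIVE DIRECTOR", "President", "Vice President",
            "Software Engineer")

# Upper-case first so one pattern covers every capitalization.
upper_titles <- toupper(titles)

# Illustrative pattern (an assumption, not the author's exact regex):
# optional dots in C.E.O, spelled-out variants, and leading PRESIDENT/FOUNDER.
ceo_pattern <- "C\\.?E\\.?O|CHIEF EXECUTIVE|EXECUTIVE DIRECTOR|^PRESIDENT\\b|^FOUNDER\\b"
is_ceo <- grepl(ceo_pattern, upper_titles, perl = TRUE)

print(data.frame(title = titles, labeled_ceo = is_ceo))
```

Anchoring PRESIDENT to the start of the string is what keeps "Vice President" out of the CEO bucket.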

This was all done in a careful order so that each position is captured by the right category. For instance, executive secretaries were captured first as secretaries so that they would not show up as executive-level positions. After I completed the corporate power structures, I moved on to clustering the rest of the data by occupation type. The main clusters formed were IT, academia, blue-collar workers, legal field, science & medicine, law enforcement, and artistic/freelance.
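Why the order matters can be shown with a toy rule list. In this sketch, apply_rules is a hypothetical helper (not part of the original code) that applies pattern/label pairs in sequence. Because matching is case-sensitive and the patterns are uppercase, a label containing lowercase letters cannot be picked up again by a later rule.

```r
# Apply (pattern, label) rules in order; later rules only see what earlier rules left.
apply_rules <- function(jobs, rules) {
  for (rule in rules) {
    hit <- grepl(rule$pattern, jobs)
    jobs[hit] <- rule$label
  }
  jobs
}

jobs <- c("EXECUTIVE ASSISTANT", "EXECUTIVE VICE PRESIDENT", "MARKETING EXECUTIVE")

rules <- list(
  list(pattern = "EXECUTIVE ASSISTANT",      label = "EX aSSISTANT"),  # exclusion first
  list(pattern = "EXECUTIVE VICE PRESIDENT", label = "ClEVEL_eXECS"),  # then the senior tier
  list(pattern = "EXEC",                     label = "eXECS")          # broad catch-all last
)

ordered  <- apply_rules(jobs, rules)       # specific rules win
reversed <- apply_rules(jobs, rev(rules))  # catch-all swallows everything
print(data.frame(jobs, ordered, reversed))
```

Run in the intended order, each title lands in its own category; run in reverse, the broad EXEC rule labels all three as eXECS before the specific rules get a chance.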



Job Clustering



The exact term CEO shows up 2688 times in the current FEC reports; however, the term CEO shows up over 3500 times throughout the dataset. Below I print the top 3200 uses of the term. We have already increased our capture of CEOs in the dataset by nearly 20% by labeling everyone in this table a CEO.



different_names counts
CEO 2688
President & CEO 152
Founder & CEO 49
CEO/Founder 35
Chairman & CEO 34
President And CEO 26
Environmental CEO/Author 20
President/CEO 17
Co-CEO 14
Nonprofit CEO 11
Pres/CEO 10
Founder and CEO 9
NON PROFIT CEO 8
CEO/Engineer 7
Founder/CEO 7
PRS/CEO 7
CEO & Founder 6
CFO/CEO 6
Founder And CEO 6
Physicist/CEO 6
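A table like the one above can be produced by filtering titles that contain CEO and counting each distinct spelling. The sketch below does this on a few made-up occupations; the column headers mirror the table, but the data and the exact filtering choice (case-insensitive match) are assumptions.

```r
# Made-up occupations standing in for the FEC job title column.
job_position <- c("CEO", "ceo", "President & CEO", "Founder & CEO",
                  "CEO", "Teacher", "President & CEO", "Nurse")

# Keep titles containing CEO (any capitalization) and count each distinct spelling.
has_ceo <- grepl("CEO", job_position, ignore.case = TRUE)
counts <- as.data.frame(table(different_names = job_position[has_ceo]),
                        responseName = "counts", stringsAsFactors = FALSE)

# Most frequent spellings first.
counts <- counts[order(-counts$counts), ]
print(counts)
```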



The cool thing is, we can expand from here. I add in similar terms like President, Executive Director, etc., and we have 4459 respondents that represent the actual role of CEO. I expanded this clustering to create some other fields as well. I bundled legal, health, academic and blue-collar jobs into their own aggregated categories. But for the purpose of this article, we will simply look at the CEO, execs, and C-level execs. To get an idea of what the roles look like in each of these categories, let’s print tables of each category by original job title.

CEO

We have increased our CEO captures from 2902 to 7522, and you can tell the regex has been highly effective. Below I print the top 25 captured terms, which make up 6858 of our total captures. You would be hard pressed to argue that any of these captures don’t belong labeled as CEO.

Nearly 7k C.E.O. with their original unedited job title

## [1] 7522
Different_CEO_Titles Count
CEO 2902
PRESIDENT 1812
EXECUTIVE DIRECTOR 847
FOUNDER 367
MANAGING PARTNER 224
CHIEF EXECUTIVE OFFICER 194
PRESIDENT & CEO 157
CO-FOUNDER 105
FOUNDER & CEO 52
CEO/FOUNDER 35
CHAIRMAN & CEO 34
PRESIDENT AND CEO 28
ENVIRONMENTAL CEO/AUTHOR 20
EXEC DIRECTOR 19
PRESIDENT AND ARTIST 18
FOUNDER AND CEO 17
PRESIDENT/CEO 17
FOUNDER & PRESIDENT 16
CO-CEO 15
NON-PROFIT EXEC. DIRECTOR 15
NON PROFIT CEO 14
EXEC. DIR. 11
FOUNDER/CEO 11
INTERIM EXECUTIVE DIRECTOR 11
NONPROFIT CEO 11

Executives

We have increased our EXECUTIVE captures from 2570 to 8552.

## [1] 8552
Different_ExECUTIVE_Titles Count
EXECUTIVE 2570
DIRECTOR 1671
PARTNER 565
DIRECTOR OF OPERATIONS 152
ACCOUNT EXECUTIVE 130
DIRECTOR OF SALES & MARKETING 72
BUSINESS EXECUTIVE 71
DIRECTOR AT AMCEA 69
EXECUTIVE COACH 59
BOARD MEMBER 56
NONPROFIT EXECUTIVE 53
EXECUTIVE PRODUCER 51
IT EXECUTIVE 48
MARKETING EXECUTIVE 47
DIRECTOR OF IT 45

C-Level_Execs

The last high-level clustering I performed was C-level executives. Arguably some of these positions (Managing Director, for example) could have been placed in the CEO cluster. Overall, 3680 people are now identified as C-level execs.

Top 3027 uses of C-level executive title with original unedited job title.

Different_C_level_Titles Count
FINANCIAL DIRECTOR 844
CFO 436
MANAGING DIRECTOR 322
COO 297
CTO 263
CHAIRMAN 226
SVP 100
CHIEF FINANCIAL OFFICER 75
EXECUTIVE VICE PRESIDENT 57
CHIEF OPERATING OFFICER 38
EVP 36
CIO 35
VICE CHAIRMAN 35
CMO 33
EXECUTIVE VP 30
CBO 29
CORPORATE EXECUTIVE 29
CHIEF MARKETING OFFICER 24
EXECUTIVE / COO 19
SENIOR MANAGING DIRECTOR 19
EXECUTIVE CHAIRMAN 17
CHAIRMAN OF THE BOARD 16
CHIEF CREATIVE OFFICER 16
CHIEF MEDICAL OFFICER 16
CHIEF INFORMATION OFFICER 15

Clustering code (Only Coders)

  • Feel free to skip this section if you don’t code in R
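## Packages used below: stringr for str_detect(), dplyr for mutate() and %>%.
library(stringr)
library(dplyr)

## Relabels every occupation in clustered_jobs that matches a regular expression.
## Also (re)builds job_changed: '1' if the clustered label differs from the
## original job_position, '0' otherwise. Returns the updated data frame.
cluster_jobs <- function(df, reg_express, replacement_val) {
    ## Overwrite matching occupations with the cluster label
    df$clustered_jobs[str_detect(df$clustered_jobs, reg_express)] <- replacement_val
    ## Flag rows whose clustered label no longer matches the original title
    df <- df %>%
            mutate(job_changed = ifelse(clustered_jobs == job_position, '0', '1'))
    return(df)
}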

 ## Build new_jobs_col
final_df <- final_df %>% 
  mutate(job_position= toupper(job_position))  %>% 
  mutate(clustered_jobs = job_position) 

## create Date column

## Job Clustering
clustered_df <- cluster_jobs(final_df,"CEO","CEO")
clustered_df$clustered_jobs <- as.character(clustered_df$clustered_jobs)
clustered_df$clustered_jobs[is.na(clustered_df$clustered_jobs)] <- "Not selected"

## GRAB SOME EXCLUSIONS
clustered_df <- cluster_jobs(clustered_df,"EXECUTIVE RECRUITER|EXECUTIVE SEARCH|RECRUITER","rECRUITER")


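## Executive level positions
## Cluster labels deliberately mix in lowercase letters: the patterns are uppercase
## and case-sensitive, so later rules cannot re-match a label already assigned.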

clustered_df <- cluster_jobs(clustered_df,'EXECUTIVE ASSISTANT|EXEC. A|EXECUTIVE ASSITANT|EXEC ASST|EXECUTIVE ADMINISTRATOR|EXECUTIVE COORDINATOR|EXECUTIVEASSISTANT','EX aSSISTANT')
clustered_df <- cluster_jobs(clustered_df,"CHIEF EXECUTIVE OFFICER|MANAGING PARTNER|EXECUTIVE DIRECTOR|^PRESIDENT\\b|^FOUNDER\\b|EXEC.*DIR","CEO")
#clustered_df <- cluster_jobs(clustered_df,'^PRESIDENT\\b|^FOUNDER\\b|EXEC.*DIR',"CEO")
clustered_df <- cluster_jobs(clustered_df,'CHIEF \\w* OFFICER|^C[A-DF-Z]O\\b|/C[A-DF-Z]O\\b|^EXECUTIVE OFFICER',"ClEVEL_eXECS")
clustered_df <- cluster_jobs(clustered_df,'EXECUTIVE VICE PRESIDENT|FINANCIAL DIRECTOR|CORPORATE EXECUTIVE|SVP|EXECUTIVE VP|EVP|MANAGING DIRECTOR|CHAIRMAN|CORPORATE OFFICER|COO$','ClEVEL_eXECS')
clustered_df <- cluster_jobs(clustered_df,'^EXECUTIVE$|EXEC|EXECUTIVE$|^PARTNER|^DIRECTOR|MANAGING|CHAIR OF| CHAIR|BOARD |CO-C',"eXECS")
clustered_df <- cluster_jobs(clustered_df,'^VP|VP\\b|VICE PRESIDENT','Vp roles')


## RANDOM EXCLUSIONS
clustered_df <- cluster_jobs(clustered_df,"CHRISTIAN|^MINISTER|RABBI|PRIEST|CHAPLAIN|PARISH|CLERGY|BISHOP|PASTOR",'rELIOGOUS')

## LEGAL
clustered_df <- cluster_jobs(clustered_df,'LAWYER|ATTORNEY|GENERAL COUNSEL|COUNSEL$|PROSECUTOR|ARBITRAT','aTTORNEY')
clustered_df <- cluster_jobs(clustered_df,'LEGAL|JUDGE|CRIMINAL|COURT','lEGALfIELD')

## IT JOBS
clustered_df <- cluster_jobs(clustered_df,"SOFTWARE|FULL STACK DESIGNER|PROGRAMMER|DATA|TECHNOLOGY|SYSTEMS|DBA|COMPUTER|CYBER|INFORMATION|NETWORK|TECH|UX|GRAPHIC|I\\.T\\.|^IT|^WEB",'ItFIELD')



## Academia

clustered_df <- cluster_jobs(clustered_df,'PROFESSOR|TEACHER|SCHOLAR|LECTURER|DOCTORAL|INSTRUCTOR|ECONOMIST|EDUC|DEAN|TUTOR|RESEARCH','aCADEMIA')
clustered_df <- cluster_jobs(clustered_df,'STUDENT','sTUDENT')
clustered_df <- cluster_jobs(clustered_df,'LIBRAR','lIBRARIAN')
clustered_df <- cluster_jobs(clustered_df,'ACADEMIC|COLLEGE|SCHOOL|HIGHER ED|MATH|^PRINCIPAL$','aCADEMIA') 
## Unemployed self employed
clustered_df <- cluster_jobs(clustered_df,'^SELF','sELF eMPLOYED')
clustered_df <- cluster_jobs(clustered_df,'^NOT-EMPLOYED$','nOT eMPLOYED')


## CREATIVE 
clustered_df <- cluster_jobs(clustered_df,'DESIGNER|ARTIST|MUSICIAN|FILM|CREATIVE|PHOTO|EDITOR|WRITER|PUBLISHER|AUTHOR|GRAPHICS|JOURNALIST|SCULPTOR|ART |CARTOON|^ART|PUBLIC RELATIONS|PUBLICIST|TELEVISION|DESIGN|COMPOSER|MUSIC|DESIGN|DIGITAL MEDIA|DANCE|SINGER|NOVEL|BLOG','mEDIA/cREATIVE')
clustered_df <- cluster_jobs(clustered_df,'^(AUTHOR|ACTOR|ACTRESS)','mEDIA/cREATIVE')
clustered_df <- cluster_jobs(clustered_df,"TALENT|PRODUCER|SOCIAL MEDIA ",'mEDIA/cREATIVE')
clustered_df <- cluster_jobs(clustered_df,"ACTIVIST|ENVIRONMEN",'aCTIVIST')


## FINANCIAL JOBS
clustered_df <- cluster_jobs(clustered_df,'^(FINANCIAL|INVESTOR|INVESTMENT|CAPITAL|ENTREPRENEUR|FINANCE|BANKING)','fINANCIAL sECTOR')
clustered_df <- cluster_jobs(clustered_df,'^VENTURE|EQUITY|HEDGE|QUANTITATIVE','fINANCIAL sECTOR')
clustered_df <- cluster_jobs(clustered_df,"ACCOUNT |ACCOUNTS |COMPLIANCE|AUDIT|ACTUARY|FRAUD|ESTIM|SURVEYOR",'aCCOUNT sPECIALISTS')


## Labor 
clustered_df <- cluster_jobs(clustered_df,'^(PIPE|STEAM|ELECTR|UNION|CARPENTER|MECHANIC|WELDER)','bLUECOLLAR')
clustered_df <- cluster_jobs(clustered_df,'LABOR|CONSTRUCTION|ELECTRICIAN|PLUMBER|PIPEFITTER|MACHINIST|WOOD|WELDER|FISHER|PAINT','bLUECOLLAR')
clustered_df <- cluster_jobs(clustered_df,'DRIVER|TRUCK|DELIVERY|UPS|FEDEX|RIDESHARE','bLUECOLLAR')
clustered_df <- cluster_jobs(clustered_df,'CONTRACTOR','cONTRACTOR')

## SCIENCE AND MEDICINE
clustered_df <- cluster_jobs(clustered_df,'PHYSICIAN|MD|M\\.D\\.|CHIROPRACTOR|DOCTOR|VETERI|DENTIST|SURGEON|PEDIATRICIAN|TRIST$|SCIENTIST|PHYSICIST|CHEMIST|LAB|LABORATORY|LABOROATORY|SCIENCE|BIOLOGIST|GEOLOGIST|STATISTICIAN|OLOGIST|ICIAN$','sCIENTIST/pHYSICIAN')
clustered_df <- cluster_jobs(clustered_df,'PSY|SOCIAL WORKER|SOCIAL WORK|FAMILY THERAPIST|CASEWORKER','mENTAL hEALTH')
clustered_df <- cluster_jobs(clustered_df,'^RN$|APRN|CRNA|NURSE|CAREGIVER|DAYCARE|HOMECARE|CHILDCARE|CHILD CARE|HOME CARE|CARETAKER|\\WCARE\\W|CARE PROVIDER','rN/nurse/cAREGIVER')
clustered_df <- cluster_jobs(clustered_df,'MASSAGE|OCCUPATIONAL|PHYSICAL THERAPY|PHYSICAL THERAPIST|ACUPUNCTURIST|LMT','iNJURY_RECOVERY')
clustered_df <- cluster_jobs(clustered_df,'PHARMA','pHARMA')
clustered_df <- cluster_jobs(clustered_df,'MEDICAL|PARAMEDICAL|MEDICINE|DENTAL|PARAMEDIC','mEDICAL')

## SKILLED LABOR
clustered_df <- cluster_jobs(clustered_df,'ANALYST|ENGINEER|ARCHITECT','wHITE_COLLAR')
clustered_df <- cluster_jobs(clustered_df,'CPA|ACCOUNTANT|ACCOUNTING|BOOKKEEPER|TAX','wHITE_COLLAR')
## Real estate
clustered_df <- cluster_jobs(clustered_df,'REALTOR|REAL ESTATE','rEAL ESTATE')

## Secretary
clustered_df <- cluster_jobs(clustered_df,'SECRETARY|RECEPTIONIST','sECRETARY')
clustered_df <- cluster_jobs(clustered_df,'\\w ASSISTANT|ASSISTANT$','sECRETARY')
clustered_df <- cluster_jobs(clustered_df,'RESOURCES|HR|CAREER|VOCATION','hr_cAREER_sERVICES')

## MILITARY AND LAW ENFORCEMENT SECURITY
clustered_df <- cluster_jobs(clustered_df,'MILITARY|ARMY|SOLDIER|LAW ENFORCEMENT|FIRE|SECURITY|POLICE','lAW eNFORCEMENT')


clustered_df <- cluster_jobs(clustered_df,"^BAKER|COOK|CHEF|DISH|WAITER|WAITRESS|BARTENDER|RESTAURANT|BARISTA|BAR\\W|BARBACK",'rESTURANT')
clustered_df <- cluster_jobs(clustered_df,"RETIRE",'rETIRED')
clustered_df <- cluster_jobs(clustered_df,"BANKER|MORTG",'bANKER')
clustered_df <- cluster_jobs(clustered_df,"CONSULTANT",'cONSULTANT')
clustered_df <- cluster_jobs(clustered_df,"SALES",'sALES')
clustered_df <- cluster_jobs(clustered_df,"OWNER",'bUSINESSoWNER')
clustered_df <- cluster_jobs(clustered_df,"MANAGER",'mANAGERS')
clustered_df <- cluster_jobs(clustered_df,"ADMINISTRATOR",'aDMINISTRATORS')
clustered_df <- cluster_jobs(clustered_df,"HEALTH",'hEALTHcARE')
clustered_df <- cluster_jobs(clustered_df,"FARM",'fARMER')
clustered_df <- cluster_jobs(clustered_df,"PILOT",'pILOTS')
clustered_df <- cluster_jobs(clustered_df,"MARKETING",'mARKETING')
clustered_df <- cluster_jobs(clustered_df,"PROFIT|FOUNDATION|CHARITY",'nONPROFIT')
clustered_df <- cluster_jobs(clustered_df,"MANAGEMENT",'mANAGEMENT')
clustered_df <- cluster_jobs(clustered_df,"NONE",'NOT EMPLOYED')
clustered_df <- cluster_jobs(clustered_df,"RETAIL|CASHIER",'rETAILER')
clustered_df <- cluster_jobs(clustered_df,"RESEARCHER",'aCADEMIA')
clustered_df <- cluster_jobs(clustered_df,"ADVERTISING|RAISING|FUNDRAISE",'aDVERTISING')
clustered_df <- cluster_jobs(clustered_df,"VOLUNTEER",'VOLUNTEER')
clustered_df <- cluster_jobs(clustered_df,"CUSTOMER SERVICE|CUSTOMER CARE",'CUSTOMER SERVICE')
clustered_df <- cluster_jobs(clustered_df,"BUSINESS|ENTREPRENEUR",'bUSINESS_sERVICES')
clustered_df <- cluster_jobs(clustered_df,"ENVIRONMEN|ORGANIZER",'aCTIVISTS/oRGANIZER')
clustered_df <- cluster_jobs(clustered_df,"POLICY|PUBLIC|GOVERNMENT|GOVT|FEDERAL EMPLOYEE|COUNCIL|LEGISLAT|DIPLO",'pUBLICpOLICY')
clustered_df <- cluster_jobs(clustered_df,'FOUNDER','CEO')
clustered_df <- cluster_jobs(clustered_df,'VEND','vENDOR')
clustered_df <- cluster_jobs(clustered_df,'DOG|ANIMAL|HORSE|PET CARE|PET-CARE','aNIMALS_sERVICES')
clustered_df <- cluster_jobs(clustered_df,'FITNESS|PERSONAL TRAINER|STRENGTH|ATHLETIC','fITNESS_tRAINER')
clustered_df <- cluster_jobs(clustered_df,'DISPATCH','dISPATCHER')
clustered_df <- cluster_jobs(clustered_df,'SUPERVISOR','sUPERVISOR')
clustered_df <- cluster_jobs(clustered_df,'INSURANCE','iNSURANCEfIELD')
clustered_df <- cluster_jobs(clustered_df,'DRESSER|^HAIR|COLORIST','sTYLIST')
clustered_df <- cluster_jobs(clustered_df,'COMMUNICATIONS','cOMMUNICATIONS')
clustered_df <- cluster_jobs(clustered_df,'CLERK','cLERKS')
clustered_df <- cluster_jobs(clustered_df,'LINGUIST|INTERPRET|TRANSLAT','lANGUAGE')
clustered_df <- cluster_jobs(clustered_df,'INVESTIGATOR','iNVESTIGATOR')
clustered_df <- cluster_jobs(clustered_df,'JANITOR|CUSTODIAN','jANITOR')
clustered_df <- cluster_jobs(clustered_df,'OFFICE','oFFICEjOBS')
clustered_df <- cluster_jobs(clustered_df,'[A-Z]*MAN$','bLUECOLLAR')