Looking into FEC individual-level fundraising for the Democratic Party

The FEC requires candidates to file their contributions for the 2020 elections with the agency. The data is uploaded as forms, which the FEC cleans and adds to CSV files. The data is a treasure trove of information. Polls get attention from week to week, but the news generally lacks any deep analysis of the FEC filings. Reporting typically covers headline numbers: the total number of donations and the total amount raised. Yet the FEC data is rich with self-identifying donor information. People identify their position, industry, and location. Money acts as a proxy for electoral success. In the 2020 primary, almost all candidates have sworn off contributions from lobbyists. On the surface, this is a noble attempt by Democratic candidates to prove they aren’t influenced by corporations. Digging below the surface, however, there is plenty of corporate influence in the FEC data.

Below, I look through the FEC data to find which candidates are receiving money from corporate-level executives. We can also see whether corporate executives are more likely to attempt to capture multiple candidates with donations, and which candidates are receiving money from the same executives. I mean it when I say this is a treasure trove of unexplored data. I will explain my process for analysis shortly, but first let me address some boring data integrity issues. Feel free to skip to the section titled “The Plan”; the next several paragraphs outline my assumptions in data collection for anyone who distrusts my analysis.

Data integrity concerns

The FEC mandates quarterly reports to track individual contributions into Democratic candidates’ accounts. As a disclaimer, it seems the engineers on the FEC website are slowly going through the individual files on the candidates and adding them to the big table located on the FEC pages here. Originally I pulled the data from the above link, but the raw data is available on the website in full, and some of the candidates’ Q2 FEC filings hadn’t been added to the above database. Therefore, for this report I grabbed all the individual-level CSV files and did my own manipulation on the data. As a result, I keep a small number of individual-level contributors (less than 0.1%) for each candidate that were discarded, for whatever reason, by the FEC data engineers. The FEC filings aren’t representative of the whole population of donors, as reporting contributions under $200 isn’t mandatory. We know that all these candidates have reached 100k contributors, yet most candidates in the database have fewer than 100k donations listed.

Some other peripheral information: individual contributions are capped at $2,800 per election. Throughout the FEC reports there are some entries that exceed this limit; these come from conduits. Conduits are individuals that act on behalf of other people to give a donation to a candidate. These conduits aren’t restricted by the $2,800 limit if they exercise no discretion over where the funds go.
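One way to spot likely conduit entries is to total contributions per donor and candidate and flag any total above the cap. The following is a minimal base R sketch on invented rows; the column names `donor`, `candidate`, and `amount` are assumptions for illustration, not the FEC field names.

```r
# Invented rows; donor, candidate and amount are assumed column names
contribs <- data.frame(
  donor     = c("A. Smith", "A. Smith", "B. Jones", "Conduit Org"),
  candidate = c("Cand X", "Cand X", "Cand X", "Cand X"),
  amount    = c(2000, 500, 2800, 50000),
  stringsAsFactors = FALSE
)

# Individual contribution cap discussed in the text
cap <- 2800

# Sum every donor's contributions to each candidate
totals <- aggregate(amount ~ donor + candidate, data = contribs, FUN = sum)

# Flag totals above the cap as candidates for a closer look
totals$over_cap <- totals$amount > cap
print(totals)
```

A flagged row is only a hint that the entry may be a conduit; it says nothing about whether the underlying donations were proper.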

Assumptions

Below I make several categorizing assumptions in my data manipulation operations. The regex approach I took to clustering these groups will absolutely include some examples that do not belong in the job-title category I assigned them. Regex is a brute-force attempt to categorize the occupations algorithmically. However, because I have kept the original job occupation, you can see that the overwhelming majority of these job titles are correctly identified. I display tables for each new clustered position so that the reader can see that the approach is largely effective. My aggregated DF is at my website; every job title that has been changed is flagged with a binary field to track it. Please feel free to look at the assumptions I made and the false captures in the clusters. If you would like to take this on in your own analysis, or perhaps develop your own clusters, please feel free to do so as well. The original unedited job titles are still present in the dataset.
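For readers who want to check a cluster for false captures, a sketch along these lines may help. The column names `job_position`, `clustered_jobs`, and `job_changed` follow the clustering code in this article; the rows themselves are invented for illustration.

```r
# Invented slice of the clustered data (column names as in the clustering code)
df <- data.frame(
  job_position   = c("CEO", "PRESIDENT", "PRESIDENT AND ARTIST", "CEO", "TEACHER"),
  clustered_jobs = c("CEO", "CEO", "CEO", "CEO", "aCADEMIA"),
  stringsAsFactors = FALSE
)

# Same flag logic as the article: '1' when the title was rewritten
df$job_changed <- ifelse(df$clustered_jobs == df$job_position, "0", "1")

# Original titles that were rewritten into the CEO cluster
relabeled <- df[df$clustered_jobs == "CEO" & df$job_changed == "1", ]

# Count them so questionable captures are easy to spot
audit <- sort(table(relabeled$job_position), decreasing = TRUE)
print(audit)
```

Scanning the resulting table by eye is how a debatable capture such as "PRESIDENT AND ARTIST" would surface.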

The Plan

There are many different combinations of the same job title. My solution is to create a new clustered occupation column, which tracks and categorizes occupations based on corporate structures of power. For example, looking at the FEC data you would find the words “C.E.O”, “Chief executive officer”, “chief officer”, and “Chief executive”. I bundle all these roles into the single role of CEO. Other self-reported words like “Founder”, “EXECUTIVE DIRECTOR”, and “President” also represent those in charge of their organization, so they are labeled as CEO as well. From there I step down the corporate ladder: I identify all general C-level executives as C-level executives and place the rest of the executive positions into a category of execs.
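The bundling described above can be sketched in a few lines of base R. The sample titles and the pattern here are illustrative assumptions, not the exact expression used in the analysis (that appears in the clustering code section).

```r
# Invented sample of self-reported titles (not real FEC rows)
titles <- c("C.E.O", "Chief executive officer", "chief executive",
            "Founder", "EXECUTIVE DIRECTOR", "President", "Software Engineer")

# Normalize case first so one pattern covers every capitalization
upper_titles <- toupper(titles)

# Illustrative pattern: an assumption for this sketch only
ceo_pattern <- "C\\.?E\\.?O|CHIEF EXECUTIVE|^FOUNDER\\b|EXECUTIVE DIRECTOR|^PRESIDENT\\b"

# Titles matching the pattern collapse to "CEO"; everything else is kept
clustered <- ifelse(grepl(ceo_pattern, upper_titles, perl = TRUE),
                    "CEO", upper_titles)
print(data.frame(original = titles, clustered = clustered))
```

Upper-casing before matching keeps the pattern short, at the cost of losing the original capitalization in the clustered column, which is why the original titles are kept separately.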

This was all done in careful order to preserve the hierarchy and capture positions correctly. For instance, executive secretaries were captured first as secretaries, so that they do not show up as executive-level positions. After I completed the corporate power structures, I moved on to clustering the rest of the data by occupation type. The main clusters formed were IT, academia, blue-collar workers, the legal field, science & medicine, law enforcement, and artistic/freelance.
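Why the order matters can be shown with a toy example. The `relabel` helper below is a hypothetical base R stand-in for the article's clustering function, and the two patterns are simplified assumptions.

```r
# Hypothetical stand-in for the clustering step: overwrite every title
# matching `pattern` with `label`
relabel <- function(x, pattern, label) {
  x[grepl(pattern, x)] <- label
  x
}

jobs <- c("EXECUTIVE SECRETARY", "EXECUTIVE", "SECRETARY")

# Narrow rule first: the executive secretary is caught before the broad rule runs
narrow_first <- relabel(jobs, "SECRETARY", "sECRETARY")
narrow_first <- relabel(narrow_first, "EXECUTIVE", "eXECS")

# Broad rule first: the executive secretary is misfiled as an executive
broad_first <- relabel(jobs, "EXECUTIVE", "eXECS")
broad_first <- relabel(broad_first, "SECRETARY", "sECRETARY")

print(narrow_first)
print(broad_first)
```

Because each rule overwrites the title it matches, whichever rule runs first wins, so the more specific rules have to come before the general ones.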

Job Clustering

The exact term CEO shows up 2688 times in the current FEC reports; however, the term CEO shows up over 3500 times throughout the dataset. Below I print the top 3200 uses of the term. We have already increased our capture of CEOs in the dataset by nearly 20% by labeling everyone in this table a CEO.

different_names	counts
CEO	2688
President & CEO	152
Founder & CEO	49
CEO/Founder	35
Chairman & CEO	34
President And CEO	26
Environmental CEO/Author	20
President/CEO	17
Co-CEO	14
Nonprofit CEO	11
Pres/CEO	10
Founder and CEO	9
NON PROFIT CEO	8
CEO/Engineer	7
Founder/CEO	7
PRS/CEO	7
CEO & Founder	6
CFO/CEO	6
Founder And CEO	6
Physicist/CEO	6
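A table like the one above can be produced by filtering on the literal string and tabulating the matches. This is a base R sketch on invented values; the vector `occupation` is a hypothetical stand-in for the FEC occupation column.

```r
# Invented occupation values standing in for the FEC occupation column
occupation <- c("CEO", "CEO", "President & CEO", "Founder & CEO",
                "CEO", "Teacher", "Co-CEO", "President & CEO")

# Keep only the titles containing the literal string "CEO"
ceo_titles <- occupation[grepl("CEO", occupation, fixed = TRUE)]

# Tabulate and sort so the most common variant comes first
counts <- sort(table(ceo_titles), decreasing = TRUE)
ceo_table <- data.frame(different_names = names(counts),
                        counts = as.integer(counts),
                        stringsAsFactors = FALSE)
print(ceo_table)

# Share of CEO mentions that the exact title "CEO" alone would miss
missed_share <- 1 - sum(occupation == "CEO") / length(ceo_titles)
print(missed_share)
```

The last figure is the kind of gap the clustering is meant to close: titles that mention CEO but would be missed by an exact match.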

The cool thing is, we can expand from here. I add in similar terms like President, Executive Director, etc., and we have 4459 respondents that represent the actual role of CEO. I expanded this clustering to create some other fields as well. I bundled legal, health, academic, and blue-collar jobs into their own aggregated categories. But for the purpose of this article, we will simply look at the CEO, execs, and C-level execs clusters. To get an idea of what the roles look like in each of these categories, let’s print tables of each category by original job title.

CEO

We have increased our CEO captures from 2902 to 7522, and you can tell the regex has been highly effective. Below I print the top 25 captured terms, which make up 6858 of our total captures. You would be hard pressed to argue that any of these captures don’t belong under the CEO label.

Nearly 7k CEOs with their original unedited job title

## [1] 7522

Different_CEO_Titles	Count
CEO	2902
PRESIDENT	1812
EXECUTIVE DIRECTOR	847
FOUNDER	367
MANAGING PARTNER	224
CHIEF EXECUTIVE OFFICER	194
PRESIDENT & CEO	157
CO-FOUNDER	105
FOUNDER & CEO	52
CEO/FOUNDER	35
CHAIRMAN & CEO	34
PRESIDENT AND CEO	28
ENVIRONMENTAL CEO/AUTHOR	20
EXEC DIRECTOR	19
PRESIDENT AND ARTIST	18
FOUNDER AND CEO	17
PRESIDENT/CEO	17
FOUNDER & PRESIDENT	16
CO-CEO	15
NON-PROFIT EXEC. DIRECTOR	15
NON PROFIT CEO	14
EXEC. DIR.	11
FOUNDER/CEO	11
INTERIM EXECUTIVE DIRECTOR	11
NONPROFIT CEO	11

Executives

We have increased our executive captures from 2570 to 8552.

## [1] 8552

Different_ExECUTIVE_Titles	Count
EXECUTIVE	2570
DIRECTOR	1671
PARTNER	565
DIRECTOR OF OPERATIONS	152
ACCOUNT EXECUTIVE	130
DIRECTOR OF SALES & MARKETING	72
BUSINESS EXECUTIVE	71
DIRECTOR AT AMCEA	69
EXECUTIVE COACH	59
BOARD MEMBER	56
NONPROFIT EXECUTIVE	53
EXECUTIVE PRODUCER	51
IT EXECUTIVE	48
MARKETING EXECUTIVE	47
DIRECTOR OF IT	45

C-Level_Execs

The last high-level clustering I performed was C-level executives. Arguably, some of these positions (Managing Director, for example) could have been placed in the CEO cluster. Overall, 3680 people are now identified as C-level execs.

Top 3027 uses of C-level executive titles with original unedited job title.

Different_C_level_Titles	Count
FINANCIAL DIRECTOR	844
CFO	436
MANAGING DIRECTOR	322
COO	297
CTO	263
CHAIRMAN	226
SVP	100
CHIEF FINANCIAL OFFICER	75
EXECUTIVE VICE PRESIDENT	57
CHIEF OPERATING OFFICER	38
EVP	36
CIO	35
VICE CHAIRMAN	35
CMO	33
EXECUTIVE VP	30
CBO	29
CORPORATE EXECUTIVE	29
CHIEF MARKETING OFFICER	24
EXECUTIVE / COO	19
SENIOR MANAGING DIRECTOR	19
EXECUTIVE CHAIRMAN	17
CHAIRMAN OF THE BOARD	16
CHIEF CREATIVE OFFICER	16
CHIEF MEDICAL OFFICER	16
CHIEF INFORMATION OFFICER	15
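Three-letter titles such as CFO, COO, and CTO can be separated from CEO with a character class that skips the letter E. The sketch below uses invented titles, and the pattern is a simplified stand-in for illustration, not the full expression from the clustering code.

```r
# Invented titles for illustration
titles <- c("CFO", "COO", "CTO", "CEO", "CMO/FOUNDER", "VP/CIO", "COOK")

# [A-DF-Z] is every capital letter except E, so "CEO" is skipped;
# \\b stops "COOK" from matching as "COO";
# (^|/) allows the title at the start or after a slash
c_level_pattern <- "(^|/)C[A-DF-Z]O\\b"

is_c_level <- grepl(c_level_pattern, titles, perl = TRUE)
print(data.frame(title = titles, c_level = is_c_level))
```

Excluding E in the class keeps the CEO cluster intact even if this rule runs after the CEO rule.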

Animations

#sort(table(clustered_df$clustered_jobs),decreasing=TRUE) [1:100]


# 
# clustered_df$clustered_jobs[str_detect(clustered_df$clustered_jobs,"ACCOUNT ")]
# 
animated_donation_by_job(job_positions="aTTORNEY",graph_title="Attorneys",filename="Attorneys")
animated_donation_by_job(job_positions="aCADEMIA",graph_title="ACADEMIA",filename="ACADEMIA")

# animated_donation_by_job(job_positions="ItFIELD",graph_title="ItFIELD",filename="ItFIELD")
# animated_donation_by_job(job_positions="rETIRED",graph_title="rETIRED",filename="rETIRED")
# animated_donation_by_job(job_positions="mEDIA/cREATIVE",graph_title="creative white collar",filename="creativewhitecollar")
# animated_donation_by_job(job_positions="sCIENTIST/pHYSICIAN",graph_title="medical and science",filename="medicalscience")

Clustering code (Only Coders)

Feel free to skip this section if you don’t code in R

## Packages used below: str_detect() comes from stringr, %>% and mutate() from dplyr
library(stringr)
library(dplyr)

## Relabels every occupation that matches a regular expression.
## Adds a job_changed flag ('1' if the title was rewritten, '0' otherwise);
## the original title stays in job_position. Returns the data frame.
cluster_jobs <- function(df,reg_express,replacement_val){    
   
    ## Overwrite matching rows of the clustered column with the cluster label
    df$clustered_jobs[str_detect(df$clustered_jobs,reg_express)] <- replacement_val
    ## Build column to track if change was made to occupation 
    df <- df %>% 
            mutate(job_changed = ifelse(clustered_jobs == job_position,'0','1'))
    return(df)
}

 ## Build new_jobs_col
final_df <- final_df %>% 
  mutate(job_position= toupper(job_position))  %>% 
  mutate(clustered_jobs = job_position) 

## create Date column

## Job Clustering
clustered_df <- cluster_jobs(final_df,"CEO","CEO")
clustered_df$clustered_jobs <- as.character(clustered_df$clustered_jobs)
clustered_df$clustered_jobs[is.na(clustered_df$clustered_jobs)] <- "Not selected"

## GRAB SOME EXCLUSIONS
clustered_df <- cluster_jobs(clustered_df,"EXECUTIVE RECRUITER|EXECUTIVE SEARCH|RECRUITER","rECRUITER")


## Executive level positions
## WORDS INTENTIONALLY ARE REPLACED WITH SOME LOWERCASE LETTERS

clustered_df <- cluster_jobs(clustered_df,'EXECUTIVE ASSISTANT|EXEC. A|EXECUTIVE ASSITANT|EXEC ASST|EXECUTIVE ADMINISTRATOR|EXECUTIVE COORDINATOR|EXECUTIVEASSISTANT','EX aSSISTANT')
clustered_df <- cluster_jobs(clustered_df,"CHIEF EXECUTIVE OFFICER|MANAGING PARTNER|EXECUTIVE DIRECTOR|^PRESIDENT\\b|^FOUNDER\\b|EXEC.*DIR","CEO")
#clustered_df <- cluster_jobs(clustered_df,'^PRESIDENT\\b|^FOUNDER\\b|EXEC.*DIR',"CEO")
## [A-DF-Z] skips E so CEO is left alone; a | inside [] would be a literal pipe, not an alternation
clustered_df <- cluster_jobs(clustered_df,'CHIEF \\w* OFFICER|^C[A-DF-Z]O\\b|/C[A-DF-Z]O\\b|^EXECUTIVE OFFICER',"ClEVEL_eXECS")
clustered_df <- cluster_jobs(clustered_df,'EXECUTIVE VICE PRESIDENT|FINANCIAL DIRECTOR|CORPORATE EXECUTIVE|SVP|EXECUTIVE VP|EVP|MANAGING DIRECTOR|CHAIRMAN|CORPORATE OFFICER|COO$','ClEVEL_eXECS')
clustered_df <- cluster_jobs(clustered_df,'^EXECUTIVE$|EXEC|EXECUTIVE$|^PARTNER|^DIRECTOR|MANAGING|CHAIR OF| CHAIR|BOARD |CO-C',"eXECS")
clustered_df <- cluster_jobs(clustered_df,'^VP|VP\\b|VICE PRESIDENT','Vp roles')


## RANDOM EXCLUSIONS
clustered_df <- cluster_jobs(clustered_df,"CHRISTIAN|^MINISTER|RABBI|PRIEST|CHAPLAIN|PARISH|CLERGY|BISHOP|PASTOR",'rELIOGOUS')

## LEGAL
clustered_df <- cluster_jobs(clustered_df,'LAWYER|ATTORNEY|GENERAL COUNSEL|COUNSEL$|PROSECUTOR|ARBITRAT','aTTORNEY')
clustered_df <- cluster_jobs(clustered_df,'LEGAL|JUDGE|CRIMINAL|COURT','lEGALfIELD')

## IT JOBS
clustered_df <- cluster_jobs(clustered_df,"SOFTWARE|FULL STACK DESIGNER|PROGRAMMER|DATA|TECHNOLOGY|SYSTEMS|DBA|COMPUTER|CYBER|INFORMATION|NETWORK|TECH|UX|GRAPHIC|I\\.T\\.|^IT|^WEB",'ItFIELD')



## Academia

clustered_df <- cluster_jobs(clustered_df,'PROFESSOR|TEACHER|SCHOLAR|LECTURER|DOCTORAL|INSTRUCTOR|ECONOMIST|EDUC|DEAN|TUTOR|RESEARCH','aCADEMIA')
clustered_df <- cluster_jobs(clustered_df,'STUDENT','sTUDENT')
clustered_df <- cluster_jobs(clustered_df,'LIBRAR','lIBRARIAN')
clustered_df <- cluster_jobs(clustered_df,'ACADEMIC|COLLEGE|SCHOOL|HIGHER ED|MATH|^PRINCIPAL$','aCADEMIA') 
## Unemployed self employed
clustered_df <- cluster_jobs(clustered_df,'^SELF','sELF eMPLOYED')
clustered_df <- cluster_jobs(clustered_df,'^NOT-EMPLOYED$','nOT eMPLOYED')


## CREATIVE 
clustered_df <- cluster_jobs(clustered_df,'DESIGNER|ARTIST|MUSICIAN|FILM|CREATIVE|PHOTO|EDITOR|WRITER|PUBLISHER|AUTHOR|GRAPHICS|JOURNALIST|SCULPTOR|ART |CARTOON|^ART|PUBLIC RELATIONS|PUBLICIST|TELEVISION|DESIGN|COMPOSER|MUSIC|DESIGN|DIGITAL MEDIA|DANCE|SINGER|NOVEL|BLOG','mEDIA/cREATIVE')
clustered_df <- cluster_jobs(clustered_df,'^(AUTHOR|ACTOR|ACTRESS)','mEDIA/cREATIVE')
clustered_df <- cluster_jobs(clustered_df,"TALENT|PRODUCER|SOCIAL MEDIA ",'mEDIA/cREATIVE')
clustered_df <- cluster_jobs(clustered_df,"ACTIVIST|ENVIRONMEN",'aCTIVIST')


## FINANCIAL JOBS
clustered_df <- cluster_jobs(clustered_df,'^(FINANCIAL|INVESTOR|INVESTMENT|CAPITAL|ENTREPRENEUR|FINANCE|INVESTOR$|FINANCIAL|INVESTOR|BANKING)','fINANCIAL sECTOR')
clustered_df <- cluster_jobs(clustered_df,'^VENTURE|EQUITY|HEDGE|QUANTITATIVE','fINANCIAL sECTOR')
clustered_df <- cluster_jobs(clustered_df,"ACCOUNT |ACCOUNTS |COMPLIANCE|AUDIT|ACTUARY|FRAUD|ESTIM|SURVEYOR",'aCCOUNT sPECIALISTS')


## Labor 
clustered_df <- cluster_jobs(clustered_df,'^(PIPE|STEAM|ELECTR|UNION|CARPENTER|MECHANIC|WELDER)','bLUECOLLAR')
clustered_df <- cluster_jobs(clustered_df,'LABOR|CONSTRUCTION|ELECTRICIAN|PLUMBER|PIPEFITTER|MACHINIST|WOOD|WELDER|FISHER|PAINT','bLUECOLLAR')
clustered_df <- cluster_jobs(clustered_df,'DRIVER|TRUCK|DELIVERY|UPS|FEDEX|RIDESHARE','bLUECOLLAR')
clustered_df <- cluster_jobs(clustered_df,'CONTRACTOR','cONTRACTOR')

## SCIENCE AND MEDICINE
clustered_df <- cluster_jobs(clustered_df,'PHYSICIAN|MD|M\\.D\\.|CHIROPRACTOR|DOCTOR|VETERI|DENTIST|SURGEON|PEDIATRICIAN|TRIST$|SCIENTIST|PHYSICIST|CHEMIST|LAB|LABORATORY|LABOROATORY|SCIENCE|BIOLOGIST|GEOLOGIST|STATISTICIAN|OLOGIST|ICIAN$','sCIENTIST/pHYSICIAN')
clustered_df <- cluster_jobs(clustered_df,'PSY|SOCIAL WORKER|SOCIAL WORK|FAMILY THERAPIST|CASEWORKER','mENTAL hEALTH')
clustered_df <- cluster_jobs(clustered_df,'^RN$|APRN|CRNA|NURSE|CAREGIVER|DAYCARE|HOMECARE|CHILDCARE|CHILD CARE|HOME CARE|CARETAKER|\\WCARE\\W|CARE PROVIDER','rN/nurse/cAREGIVER')
clustered_df <- cluster_jobs(clustered_df,'MASSAGE|OCCUPATIONAL|PHYSICAL THERAPY|PHYSICAL THERAPIST|ACUPUNCTURIST|LMT','iNJURY_RECOVERY')
clustered_df <- cluster_jobs(clustered_df,'PHARMA','pHARMA')
clustered_df <- cluster_jobs(clustered_df,'MEDICAL|PARAMEDICAL|MEDICINE|DENTAL|PARAMEDIC','mEDICAL')

## SKILLED LABOR
clustered_df <- cluster_jobs(clustered_df,'ANALYST|ENGINEER|ARCHITECT','wHITE_COLLAR')
clustered_df <- cluster_jobs(clustered_df,'CPA|ACCOUNTANT|ACCOUNTING|BOOKKEEPER|TAX','wHITE_COLLAR')
## Real estate
clustered_df <- cluster_jobs(clustered_df,'REALTOR|REAL ESTATE','rEAL ESTATE')

## Secretary
clustered_df <- cluster_jobs(clustered_df,'SECRETARY|RECEPTIONIST','sECRETARY')
clustered_df <- cluster_jobs(clustered_df,'\\w ASSISTANT|ASSISTANT$','sECRETARY')
clustered_df <- cluster_jobs(clustered_df,'RESOURCES|HR|CAREER|VOCATION','hr_cAREER_sERVICES')

## MILITARY AND LAW ENFORCEMENT SECURITY
clustered_df <- cluster_jobs(clustered_df,'MILITARY|ARMY|SOLDIER|LAW ENFORCEMENT|FIRE|SECURITY|POLICE','lAW eNFORCEMENT')


clustered_df <- cluster_jobs(clustered_df,"^BAKER|COOK|CHEF|DISH|WAITER|WAITRESS|BARTENDER|RESTAURANT|BARISTA|BAR\\W|BARBACK",'rESTURANT')
clustered_df <- cluster_jobs(clustered_df,"RETIRE",'rETIRED')
clustered_df <- cluster_jobs(clustered_df,"BANKER|MORTG",'bANKER')
clustered_df <- cluster_jobs(clustered_df,"CONSULTANT",'cONSULTANT')
clustered_df <- cluster_jobs(clustered_df,"SALES",'sALES')
clustered_df <- cluster_jobs(clustered_df,"OWNER",'bUSINESSoWNER')
clustered_df <- cluster_jobs(clustered_df,"MANAGER",'mANAGERS')
clustered_df <- cluster_jobs(clustered_df,"ADMINISTRATOR",'aDMINISTRATORS')
clustered_df <- cluster_jobs(clustered_df,"HEALTH",'hEALTHcARE')
clustered_df <- cluster_jobs(clustered_df,"FARM",'fARMER')
clustered_df <- cluster_jobs(clustered_df,"PILOT",'pILOTS')
clustered_df <- cluster_jobs(clustered_df,"MARKETING",'mARKETING')
clustered_df <- cluster_jobs(clustered_df,"PROFIT|FOUNDATION|CHARITY",'nONPROFIT')
clustered_df <- cluster_jobs(clustered_df,"MANAGEMENT",'mANAGEMENT')
clustered_df <- cluster_jobs(clustered_df,"NONE",'nOT eMPLOYED')
clustered_df <- cluster_jobs(clustered_df,"RETAIL|CASHIER",'rETAILER')
clustered_df <- cluster_jobs(clustered_df,"RESEARCHER",'aCADEMIA')
clustered_df <- cluster_jobs(clustered_df,"ADVERTISING|RAISING|FUNDRAISE",'aDVERTISING')
clustered_df <- cluster_jobs(clustered_df,"VOLUNTEER",'VOLUNTEER')
clustered_df <- cluster_jobs(clustered_df,"CUSTOMER SERVICE|CUSTOMER CARE",'CUSTOMER SERVICE')
clustered_df <- cluster_jobs(clustered_df,"BUSINESS|ENTREPRENEUR",'bUSINESS_sERVICES')
clustered_df <- cluster_jobs(clustered_df,"ENVIRONMEN|ORGANIZER",'aCTIVISTS/oRGANIZER')
clustered_df <- cluster_jobs(clustered_df,"POLICY|PUBLIC|GOVERNMENT|GOVT|FEDERAL EMPLOYEE|COUNCIL|LEGISLAT|DIPLO",'pUBLICpOLICY')
clustered_df <- cluster_jobs(clustered_df,'FOUNDER','CEO')
clustered_df <- cluster_jobs(clustered_df,'VEND','vENDOR')
clustered_df <- cluster_jobs(clustered_df,'DOG|ANIMAL|HORSE|PET CARE|PET-CARE','aNIMALS_sERVICES')
clustered_df <- cluster_jobs(clustered_df,'FITNESS|PERSONAL TRAINER|STRENGTH|ATHLETIC','fITNESS_tRAINER')
clustered_df <- cluster_jobs(clustered_df,'DISPATCH','dISPATCHER')
clustered_df <- cluster_jobs(clustered_df,'SUPERVISOR','sUPERVISOR')
clustered_df <- cluster_jobs(clustered_df,'INSURANCE','iNSURANCEfIELD')
clustered_df <- cluster_jobs(clustered_df,'DRESSER|^HAIR|COLORIST','sTYLIST')
clustered_df <- cluster_jobs(clustered_df,'COMMUNICATIONS','cOMMUNICATIONS')
clustered_df <- cluster_jobs(clustered_df,'CLERK','cLERKS')
clustered_df <- cluster_jobs(clustered_df,'LINGUIST|INTERPRET|TRANSLAT','lANGUAGE')
clustered_df <- cluster_jobs(clustered_df,'INVESTIGATOR','iNVESTIGATOR')
clustered_df <- cluster_jobs(clustered_df,'JANITOR|CUSTODIAN','jANITOR')
clustered_df <- cluster_jobs(clustered_df,'OFFICE','oFFICEjOBS')
clustered_df <- cluster_jobs(clustered_df,'[A-Z]*MAN$','bLUECOLLAR')
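A likely reason the cluster labels above mix lowercase letters into otherwise uppercase words is that the rules run one after another on the same column, so an all-caps label could be recaptured by a later pattern. The sketch below illustrates that reading; `relabel` is a hypothetical base R stand-in for `cluster_jobs`, and the all-caps label "EXEC ASSISTANT" is invented for contrast.

```r
# Hypothetical stand-in for cluster_jobs that works on a plain character vector
relabel <- function(x, pattern, label) {
  x[grepl(pattern, x)] <- label
  x
}

jobs <- c("EXECUTIVE ASSISTANT", "MARKETING EXECUTIVE")

# Mixed-case label: "EX aSSISTANT" no longer contains "EXEC"
shielded <- relabel(jobs, "EXECUTIVE ASSISTANT", "EX aSSISTANT")
shielded <- relabel(shielded, "EXEC", "eXECS")

# All-caps label (invented): "EXEC ASSISTANT" is recaptured by the broader rule
unshielded <- relabel(jobs, "EXECUTIVE ASSISTANT", "EXEC ASSISTANT")
unshielded <- relabel(unshielded, "EXEC", "eXECS")

print(shielded)
print(unshielded)
```

Since every title is upper-cased before clustering, any label containing a lowercase letter in the right place cannot match a later uppercase pattern.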

Justin Herman

8/26/2019
