Overview

This article from FiveThirtyEight highlights how people of color made up only 28% of general election candidates in 2022, even though they make up 41% of the general population. The article also shows that this disparity in candidate representation is starker within the Republican Party, where only 19% of candidates were people of color, compared to 46% of candidates running in the Democratic Party.

Link to Article: https://fivethirtyeight.com/features/2022-candidates-race-data/

Set initial libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Read in Data and change to Tibble

dem_candidates_raw = read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/primary-project-2022/dem_candidates.csv')

rep_candidates_raw = read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/primary-project-2022/rep_candidates.csv')

dem_candidates = as_tibble(dem_candidates_raw)
rep_candidates = as_tibble(rep_candidates_raw)
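
The dotted column names that appear in the output further down (Race.1, Primary.., EMILY.s.List) are produced by read.csv(), which by default passes headers through make.names(). A small sketch of that behavior on made-up CSV text; the raw header spellings used here ("Race 1", "Primary %") are assumptions for illustration and were not copied from the FiveThirtyEight files:

```r
# Made-up CSV text; header spellings are assumed, not taken from the real files
csv_text <- "Candidate,Race 1,Primary %\nJane Doe,White,51.2\n"

# Default behavior (check.names = TRUE): invalid characters become dots
default_names <- names(read.csv(text = csv_text))

# With check.names = FALSE the headers are kept exactly as written
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)
print(raw_names)
```

Keeping the default is convenient here because the dotted names can be used with `$` without backticks, and clean_names() can standardize them later.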

Clean up Democratic Candidates data

dem_candidates = dem_candidates %>% mutate(Party = 'Democrat') #Add party affiliation to the dataframe

colnames(dem_candidates)
##  [1] "Candidate"            "Gender"               "Race.1"              
##  [4] "Race.2"               "Race.3"               "Incumbent"           
##  [7] "Incumbent.Challenger" "State"                "Primary.Date"        
## [10] "Office"               "District"             "Primary.Votes"       
## [13] "Primary.."            "Primary.Outcome"      "Runoff.Votes"        
## [16] "Runoff.."             "Runoff.Outcome"       "EMILY.s.List"        
## [19] "Justice.Dems"         "Indivisible"          "PCCC"                
## [22] "Our.Revolution"       "Sunrise"              "Sanders"             
## [25] "AOC"                  "Party.Committee"      "Party"
dem_candidates = dem_candidates %>% select(c('Candidate','Party','Gender','Race.1', 'Race.2', 'Race.3','Incumbent', 'Incumbent.Challenger','Office', 'District','State','Primary.Outcome'))

Clean up Republican Candidates data

rep_candidates = rep_candidates %>% mutate(Party = 'Republican') #Add party affiliation to the dataframe

colnames(rep_candidates)
##  [1] "Candidate"            "Gender"               "Race.1"              
##  [4] "Race.2"               "Race.3"               "Incumbent"           
##  [7] "Incumbent.Challenger" "State"                "Primary.Date"        
## [10] "Office"               "District"             "Primary.Votes"       
## [13] "Primary.."            "Primary.Outcome"      "Runoff.Votes"        
## [16] "Runoff.."             "Runoff.Outcome"       "Trump"               
## [19] "Trump.Date"           "Club.for.Growth"      "Party.Committee"     
## [22] "Renew.America"        "E.PAC"                "VIEW.PAC"            
## [25] "Maggie.s.List"        "Winning.for.Women"    "Party"
rep_candidates = rep_candidates %>% select(c('Candidate','Party','Gender','Race.1', 'Race.2', 'Race.3','Incumbent', 'Incumbent.Challenger','Office', 'District','State','Primary.Outcome'))
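
The Democratic and Republican cleaning steps above are identical apart from the party label, so they could be wrapped in one function. The sketch below is only an illustration: `prep_candidates` and `keep_cols` are hypothetical names, and the input is a one-row toy data frame rather than the real files.

```r
library(dplyr)

# Columns shared by both source files that the analysis keeps
keep_cols <- c("Candidate", "Party", "Gender", "Race.1", "Race.2", "Race.3",
               "Incumbent", "Incumbent.Challenger", "Office", "District",
               "State", "Primary.Outcome")

# Hypothetical helper: add the party label, then keep only the shared columns
prep_candidates <- function(raw, party) {
  raw %>%
    as_tibble() %>%
    mutate(Party = party) %>%
    select(all_of(keep_cols))
}

# Toy input with one party-specific column (Trump) that should be dropped
toy_raw <- data.frame(
  Candidate = "Jane Doe", Gender = "Female",
  Race.1 = "White", Race.2 = "", Race.3 = "",
  Incumbent = "No", Incumbent.Challenger = "No",
  State = "Texas", Office = "Representative", District = "1",
  Primary.Outcome = "Won", Trump = "No"
)

toy_clean <- prep_candidates(toy_raw, "Republican")
print(names(toy_clean))
```

Using all_of() makes the call fail loudly if one of the expected columns is missing from a file, which is usually preferable to silently dropping it.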

Merge Dataframes

all_candidates = rbind(dem_candidates, rep_candidates)
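
rbind() works here because both dataframes were first reduced to the same columns. If the party-specific endorsement columns were kept instead, dplyr's bind_rows() is one alternative, since it matches columns by name and fills the gaps with NA. A minimal sketch on invented toy rows:

```r
library(dplyr)

# Toy frames with one party-specific column each (values are invented)
dem_toy <- tibble(Candidate = "A", Party = "Democrat", Sanders = "Yes")
rep_toy <- tibble(Candidate = "B", Party = "Republican", Trump = "No")

# bind_rows() keeps the union of the columns; missing cells become NA
combined <- bind_rows(dem_toy, rep_toy)
print(combined)
```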

Explore and clean the data

#Ensuring that there are no invalid responses in any of the fields

unique(all_candidates$Gender)
## [1] "Male"      "Female"    "Nonbinary"
unique(all_candidates$Race.1)
##  [1] "White"                              "Black"                             
##  [3] "Asian (Indian)"                     "Latino"                            
##  [5] "Latino (Mexican)"                   "Latino (Puerto Rican)"             
##  [7] "Unknown"                            "Black (Eritrean)"                  
##  [9] "Asian (Bangladeshi)"                "White "                            
## [11] "Latino (Peruvian)"                  "Middle Eastern (Saudi)"            
## [13] "Latino (Ecuadorian)"                "Black (Nigerian)"                  
## [15] "Native American (Saponi)"           "White (Azerbaijani)"               
## [17] "Middle Eastern (Palestinian)"       "Latino (Venezuelan)"               
## [19] "Native American (Comanche)"         "Asian"                             
## [21] "Native American"                    "Asian (Korean)"                    
## [23] "White (Polish)"                     "Latino (Cuban)"                    
## [25] "Latino (Colombian)"                 "Asian (Japanese)"                  
## [27] "Latino (Salvadoran)"                "Asian (Pakistani)"                 
## [29] "Middle Eastern (Armenian)"          "Asian (Chinese)"                   
## [31] "Middle Eastern (Iranian)"           "Latino (Guatemalan)"               
## [33] "Asian (Taiwanese)"                  "Latino (Puerto Rican / Ecuadorian)"
## [35] "Asian (Filipino)"                   "Native American (Lakota)"          
## [37] "Asian (Thai)"                       "Native American (Cherokee)"        
## [39] "Native American (Creek / Yuchi)"    "White (Albanian)"                  
## [41] "Latino (Dominican)"                 "White (Belgian)"                   
## [43] "Native American (Ho-Chunk)"         "Latino (Chilean)"                  
## [45] "Black (Ethiopian)"                  "Black (Somali)"                    
## [47] "Black (Jamaican)"                   "Black (Liberian)"                  
## [49] "Asian (Laotian)"                    "Pacific Islander (Native Hawaiian)"
## [51] "Native American (Yupik)"            "Middle Eastern (Iraqi)"            
## [53] "Native American (Arapaho)"          "Black (Haitian)"                   
## [55] "Asian "                             "Middle Eastern"                    
## [57] "Latino "                            "White (Kosovar)"                   
## [59] "White (Russian)"                    "Black (Sudanese)"                  
## [61] "Black (Gambian)"                    "Asian (Vietnamese)"                
## [63] "Middle Eastern (Kuwaiti)"           "Middle Eastern (Turkish)"          
## [65] "Latino (Brazilian)"                 "Latino (Uruguayan)"                
## [67] "Asian (Malaysian)"                  "Middle Eastern (Egyptian)"         
## [69] "Latino (Nicaraguan)"                "Middle Eastern (Afghani)"          
## [71] "Latino (Cuban / Mexican)"           "Middle Eastern (Lebanese)"         
## [73] "Middle Eastern (Israeli)"           "Latino (Salvadoran / Mexican)"     
## [75] "White (Serbian)"                    "White (Italian)"                   
## [77] "Middle Eastern "                    "Asian (Vietnamese / Hmong)"        
## [79] "Native American (Inupiaq)"          "Latino (Honduran)"                 
## [81] "Middle Eastern (Syrian)"
unique(all_candidates$Race.2)
##  [1] "Asian (Indian)"                     ""                                  
##  [3] "Black"                              "Latino (Mexican)"                  
##  [5] "Middle Eastern (Iraqi)"             "White"                             
##  [7] "Asian (Pakistani)"                  "Latino"                            
##  [9] "Native American (Pomo)"             "Pacific Islander (Native Hawaiian)"
## [11] "Native American (Yupik)"            "Latino (Cuban)"                    
## [13] "Unknown"                            "Asian (Japanese)"                  
## [15] "Asian"                              "Latino (Brazilian)"                
## [17] "Native American (Cherokee)"         "Asian (Filipino)"                  
## [19] "Native American (Chickasaw)"        "Middle Eastern (Iranian)"          
## [21] "Latino (Puerto Rican)"
unique(all_candidates$Race.3)
## [1] ""                 "Asian (Filipino)" "Asian"
unique(all_candidates$Incumbent)
## [1] "No"  "Yes"
unique(all_candidates$Incumbent.Challenger)
## [1] "No"  "Yes"
unique(all_candidates$State)
##  [1] "Texas"          "Indiana"        "Ohio"           "West Virginia" 
##  [5] "Nebraska"       "Kentucky"       "Oregon"         "Idaho"         
##  [9] "North Carolina" "Pennsylvania"   "Virginia"       "Alabama"       
## [13] "Arkansas"       "Georgia"        "New Mexico"     "Mississippi"   
## [17] "Montana"        "Iowa"           "South Dakota"   "New Jersey"    
## [21] "California"     "Maine"          "Nevada"         "South Carolina"
## [25] "North Dakota"   "Colorado"       "Illinois"       "New York"      
## [29] "Oklahoma"       "Utah"           "Maryland"       "Arizona"       
## [33] "Kansas"         "Michigan"       "Missouri"       "Washington"    
## [37] "Tennessee"      "Connecticut"    "Minnesota"      "Vermont"       
## [41] "Wisconsin"      "Hawaii"         "Alaska"         "Wyoming"       
## [45] "Florida"        "Massachusetts"  "Delaware"       "New Hampshire" 
## [49] "Rhode Island"
unique(all_candidates$Office)
## [1] "Representative"           "Governor"                
## [3] "Senator"                  "Senator (unexpired term)"
unique(all_candidates$District)
##  [1] "1"        "2"        "3"        "4"        "5"        "7"       
##  [7] "8"        "9"        "10"       "12"       "13"       "14"      
## [13] "15"       "16"       "17"       "18"       "20"       "21"      
## [19] "22"       "23"       "24"       "27"       "28"       "29"      
## [25] "30"       "32"       "33"       "34"       "35"       "36"      
## [31] "37"       "38"       "N/A"      "6"        "11"       "19"      
## [37] "25"       "26"       "31"       "39"       "40"       "41"      
## [43] "42"       "43"       "44"       "45"       "46"       "47"      
## [49] "48"       "49"       "50"       "51"       "52"       "At-Large"
unique(all_candidates$Primary.Outcome)
## [1] "Lost"        "Made runoff" "Won"
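
Scanning the unique() output by eye turns up values such as "White " and "Asian " with trailing spaces, and empty strings in Race.2 and Race.3. The same checks could be done programmatically; the sketch below uses a small invented vector rather than the real columns.

```r
library(dplyr)
library(stringr)

# Invented sample of raw values, including two with trailing spaces
race_toy <- c("White", "White ", "Asian ", "Latino (Mexican)", "Black")

# Values that change when trimmed are the ones carrying stray whitespace
needs_trim <- unique(race_toy[race_toy != str_trim(race_toy)])
print(needs_trim)

# Empty strings can be turned into explicit missing values
race2_toy <- na_if(c("", "Black", ""), "")
print(race2_toy)
```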

Clean Up Merged Dataframe

Clean up the candidates dataframe and expand its features by completing the following steps:

  1. Split Race.1 values to only include the primary race and not the ethnicity information
  2. Remove any trailing white space from the new Race field
  3. Remove the trailing parenthesis from the Ethnicity field
  4. Create a new field - Race.Simplified - that categorizes Race as White or Non-White

#Split Race.1 column to only include Race and split the Ethnicity into a separate column

all_candidates = all_candidates %>% separate(Race.1, into=c("Race","Ethnicity"), sep="\\(")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 2418 rows [1, 2,
## 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, ...].
all_candidates = all_candidates %>% mutate(Race = str_trim(Race))
unique(all_candidates$Race)
## [1] "White"            "Black"            "Asian"            "Latino"          
## [5] "Unknown"          "Middle Eastern"   "Native American"  "Pacific Islander"
#Remove the trailing paren from the Ethnicity string
all_candidates = all_candidates %>% mutate(Ethnicity = str_remove(Ethnicity,"\\)"))

unique(all_candidates$Ethnicity)
##  [1] NA                          "Indian"                   
##  [3] "Mexican"                   "Puerto Rican"             
##  [5] "Eritrean"                  "Bangladeshi"              
##  [7] "Peruvian"                  "Saudi"                    
##  [9] "Ecuadorian"                "Nigerian"                 
## [11] "Saponi"                    "Azerbaijani"              
## [13] "Palestinian"               "Venezuelan"               
## [15] "Comanche"                  "Korean"                   
## [17] "Polish"                    "Cuban"                    
## [19] "Colombian"                 "Japanese"                 
## [21] "Salvadoran"                "Pakistani"                
## [23] "Armenian"                  "Chinese"                  
## [25] "Iranian"                   "Guatemalan"               
## [27] "Taiwanese"                 "Puerto Rican / Ecuadorian"
## [29] "Filipino"                  "Lakota"                   
## [31] "Thai"                      "Cherokee"                 
## [33] "Creek / Yuchi"             "Albanian"                 
## [35] "Dominican"                 "Belgian"                  
## [37] "Ho-Chunk"                  "Chilean"                  
## [39] "Ethiopian"                 "Somali"                   
## [41] "Jamaican"                  "Liberian"                 
## [43] "Laotian"                   "Native Hawaiian"          
## [45] "Yupik"                     "Iraqi"                    
## [47] "Arapaho"                   "Haitian"                  
## [49] "Kosovar"                   "Russian"                  
## [51] "Sudanese"                  "Gambian"                  
## [53] "Vietnamese"                "Kuwaiti"                  
## [55] "Turkish"                   "Brazilian"                
## [57] "Uruguayan"                 "Malaysian"                
## [59] "Egyptian"                  "Nicaraguan"               
## [61] "Afghani"                   "Cuban / Mexican"          
## [63] "Lebanese"                  "Israeli"                  
## [65] "Salvadoran / Mexican"      "Serbian"                  
## [67] "Italian"                   "Vietnamese / Hmong"       
## [69] "Inupiaq"                   "Honduran"                 
## [71] "Syrian"
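
The split, trim, and parenthesis-removal steps above could also be written with stringr regular expressions, which avoids the "Expected 2 pieces" warning from separate(). This is an alternative sketch on a few sample values, not the approach used in this analysis:

```r
library(stringr)

# Sample values in the same shape as the Race.1 column
race_raw <- c("White", "White ", "Latino (Mexican)",
              "Native American (Creek / Yuchi)")

# Race: drop the parenthesized part, then trim leftover whitespace
race <- str_trim(str_remove(race_raw, "\\(.*\\)"))

# Ethnicity: capture the text inside the parentheses (NA when there is none)
ethnicity <- str_match(race_raw, "\\((.*)\\)")[, 2]

print(race)
print(ethnicity)
```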
#Create Simplified Race category
all_candidates = all_candidates %>% mutate(Race.Simplified = ifelse(Race == 'White','White','Non-White'))

all_candidates = all_candidates %>% clean_names()

colnames(all_candidates)
##  [1] "candidate"            "party"                "gender"              
##  [4] "race"                 "ethnicity"            "race_2"              
##  [7] "race_3"               "incumbent"            "incumbent_challenger"
## [10] "office"               "district"             "state"               
## [13] "primary_outcome"      "race_simplified"
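
One caveat with the simplified race field: the ifelse() call labels every value other than "White" as "Non-White", and that includes "Unknown". Whether that is appropriate depends on the analysis. If unknown values should be kept apart, a case_when() along these lines is one option (a sketch on toy data, assuming the cleaned `race` column name):

```r
library(dplyr)

# Toy data using the cleaned column name
toy <- tibble(race = c("White", "Black", "Unknown", "Latino"))

toy <- toy %>%
  mutate(race_simplified = case_when(
    race == "White"   ~ "White",
    race == "Unknown" ~ NA_character_,  # keep unknowns out of both groups
    TRUE              ~ "Non-White"
  ))

print(toy)
```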

Conclusion

With this initial cleaning and exploration complete, there are many additional ways to cut the data to better understand the demographic breakdown of the candidates and to identify other potential correlations, both in tabular form and through data visualizations. Some examples of this additional exploratory analysis include:

  1. Break down candidates by race and the office they ran for
  2. Break down candidates by race and the state they ran in
  3. Break down candidates by race and gender
  4. Break down candidates by race and the outcome of their election
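
As a starting point for breakdowns like those listed above, the share of each simplified race group within a party can be computed with count() and a grouped mutate(). The rows below are invented purely to show the mechanics and do not reflect the real data; swapping `party` for `office`, `state`, `gender`, or `primary_outcome` would give the other cuts.

```r
library(dplyr)

# Invented rows, using the cleaned column names
toy <- tibble(
  party = c(rep("Democrat", 4), rep("Republican", 4)),
  race_simplified = c("White", "Non-White", "Non-White", "White",
                      "White", "White", "White", "Non-White")
)

# Count candidates per party and group, then convert to within-party shares
shares <- toy %>%
  count(party, race_simplified) %>%
  group_by(party) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(shares)
```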

An understanding of this data can help guide outreach efforts to increase political engagement among people from underrepresented communities, expanding the bench of potential candidates in the future.