In today’s job market, finding the right candidates is tough. Companies need skilled professionals, and they need them fast. But here’s the problem: going through resumes one by one takes forever, and it’s not always effective. According to Glassdoor, the average corporate job posting attracts approximately 250 resumes.
To speed things up, many companies use something called an Applicant Tracking System (ATS). It’s like a robot that reads resumes and looks for specific words. It’s good at some things, but not everything.
ATS has a flaw. It only likes resumes that look a certain way. If your resume doesn’t fit the mold, it might get ignored. That’s not fair to talented people who don’t have traditional resumes.
My project is different. Instead of parsing resumes, I look at what’s inside. I check how well a resume matches what a company wants. It’s like finding a puzzle piece that fits, no matter its shape.
I want to make hiring smarter and faster. With my project, companies can find the best candidates, even if their resumes are a bit different. It’s a bridge between robots and real people, making hiring better for everyone. I believe this will help companies succeed in today’s competitive job market.
In the fast-paced world of job recruitment, finding the right candidates can be a tough challenge. Companies often use Applicant Tracking Systems (ATS) to help them sort through resumes. But here’s the problem: ATS systems use parsing techniques that have limitations. They are not adept at handling resumes that deviate from the standard mold, potentially causing them to overlook promising candidates.
At the heart of this problem, an ATS filters out resumes that don’t precisely match its predefined criteria. In doing so, it can inadvertently discard resumes with hidden potential: those belonging to talented individuals whose profiles don’t conform to the conventional format. It is like searching a haystack for a needle and throwing away other valuable finds along the way.
This problem of resume parsing and its consequences are further explored in the paper titled “The Dark Side of Applicant Tracking Systems: How Unfairness and Bias Undermine Diversity and Inclusion in Hiring,” authored by Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner (2016). The paper sheds light on how ATS can unwittingly introduce unfairness and bias into the hiring process, thus undermining efforts to promote diversity and inclusion in recruitment.
Our approach offers a distinctive solution. Instead of filtering out resumes outright, we propose ranking them based on their alignment with what companies are seeking. This approach ensures that we don’t miss the hidden gems among the applicants. What sets us apart is that we bypass parsing techniques like those employed by ATS. Instead, we employ text similarity analysis, avoiding the parsing-related pitfalls that ATS systems encounter. Our overarching goal is to enhance recruitment efficiency, accuracy, and flexibility. By prioritizing resumes based on their content, we offer companies a more intelligent and inclusive method for identifying the most suitable candidates for their needs.
The project aims to develop an innovative Resume Ranking Tool that leverages Natural Language Processing (NLP) techniques to revolutionize the recruitment process. Instead of relying on traditional Applicant Tracking Systems (ATS) with their limitations, this tool will rank all resumes based on the intersection between the desired candidate qualifications and the content of resumes. The primary goal is to make the recruitment process more efficient, fair, inclusive, and flexible.
Cosine Similarity-Based Ranking: Implement a cosine similarity algorithm to calculate the degree of similarity between each resume and the desired candidate qualifications. Resumes will be ranked based on this similarity score.
User-Friendly Interface: Create an intuitive and user-friendly web interface where recruiters can easily upload job criteria and resumes. The ranked results will be displayed in a clear and accessible format.
Customizable Criteria: Allow users to input specific job criteria, including skills, experience, education, and other qualifications. The tool will adapt the ranking accordingly.
No Parsing Limitations: Emphasize that the tool does not rely on parsing techniques, eliminating the common parsing problems associated with ATS.
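To make the cosine similarity idea above concrete, here is a minimal base R sketch. It is an illustration under stated assumptions, not the project's implementation: the helper names (tokenize, term_counts, cosine_similarity) and the toy texts are hypothetical, and each text is represented by simple term counts over a shared vocabulary. Note that the rank_resume() function in this write-up scores resumes by word overlap rather than by cosine similarity.

```r
# Hypothetical helper: split a text into lowercase alphanumeric tokens
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9]+"))
  tokens[nzchar(tokens)]
}

# Hypothetical helper: term-count vector of one text over a fixed vocabulary
term_counts <- function(text, vocab) {
  as.numeric(table(factor(tokenize(text), levels = vocab)))
}

# Cosine similarity between two numeric vectors (0 if either is all zeros)
cosine_similarity <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) return(0)
  sum(a * b) / denom
}

# Toy data (assumed for illustration only)
criteria <- "autocad sketchup graphic design"
resumes <- c(
  r1 = "graphic design with autocad and sketchup",
  r2 = "mechanical engineering and matlab"
)

# Shared vocabulary across the criteria and all resumes
vocab <- sort(unique(unlist(lapply(c(criteria, resumes), tokenize))))
criteria_vec <- term_counts(criteria, vocab)

# Score every resume against the criteria, then rank from best to worst
scores <- sapply(resumes, function(r) cosine_similarity(criteria_vec, term_counts(r, vocab)))
ranked <- sort(scores, decreasing = TRUE)
print(ranked)
```

In this toy case r1 shares all four criteria words and ranks first, while r2 shares none and scores 0. A fuller version would likely weight terms (for example with TF-IDF) before comparing them.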
Improved Recruitment Efficiency: Traditional ATS systems often struggle with parsing errors and rigid filtering, leading to missed opportunities. By ranking resumes based on content similarity, our tool helps recruiters quickly identify the most relevant candidates without losing high-potential candidates to parsing errors.
Inclusivity: ATS systems can inadvertently exclude candidates with non-standard resume formats or unique experiences. Our tool ensures that every candidate has a fair chance to be considered, regardless of resume style.
Flexibility: Recruiters can fine-tune their criteria to adapt to changing hiring needs, making the tool suitable for various job roles and industries. This flexibility eliminates the rigidity of ATS systems.
Parsing Problems: Traditional ATS systems often encounter difficulties in parsing resumes with non-standard structures or complex formatting. These parsing errors can result in the exclusion of potentially qualified candidates. Our tool avoids parsing altogether, focusing on content similarity, thus eliminating parsing-related issues.
Inflexible Filtering: ATS systems typically rely on rigid keyword filtering, which may inadvertently exclude candidates who use different terms or phrasing. Our tool, driven by similarity scores, provides a more comprehensive and flexible approach that captures relevant candidates regardless of specific keywords.
Lack of Inclusivity: ATS systems may unintentionally overlook candidates with diverse backgrounds and experiences due to their predefined criteria. Our tool is designed to be inclusive, ensuring that all candidates have a fair chance based on their qualifications and not their resume format.
The Resume Ranking Tool offers a transformative approach to recruitment, directly addressing the limitations of traditional ATS systems. It empowers businesses to efficiently identify and select the best-suited candidates based on resume content, ultimately enhancing the quality of hires and streamlining the recruitment process while overcoming the weaknesses of ATS.
The output of this project is a Shiny-based web app. The user submits a folder containing resumes in PDF format and enters the desired ideal criteria, and the app returns a table of ranked resumes.
The implementation of the Resume Ranking Tool will have a significant positive impact on businesses and their recruitment processes. Here are the key areas where this project can make a difference:
Traditional recruitment processes, especially when reliant on ATS, can be time-consuming and prone to missing potential candidates. ATS systems often encounter difficulties in parsing resumes with non-standard structures or complex formatting. These parsing errors can result in the exclusion of potentially qualified candidates, leading to inefficiencies in the recruitment process. By employing a content-based ranking approach, our tool eliminates parsing problems, streamlining the candidate selection process. This increased efficiency means that recruiters can quickly identify the most qualified candidates without being bogged down by parsing errors or rigid filtering.
One of the limitations of ATS is that it often excludes candidates with non-standard resume formats or unique experiences. This lack of inclusivity can hinder diversity and result in the unintentional exclusion of talented individuals. Our tool, by focusing on content and similarity rather than rigid filtering, ensures inclusivity in the hiring process. It accommodates candidates with diverse resume styles and experiences, enabling recruiters to discover talented individuals who might have been overlooked by traditional systems. This approach contributes to a more diverse and inclusive workforce, fostering innovation and representation.
Traditional ATS systems rely on predefined criteria and rigid keyword filtering, which may inadvertently exclude candidates who use different terms or phrasing. This lack of flexibility can be a hindrance, particularly in fast-paced industries where skill demands can change rapidly. In contrast, our tool provides businesses with the agility to adapt to evolving hiring needs. Recruiters can customize job criteria, including skills, experience, education, and other qualifications, making the tool suitable for various job roles and industries. This adaptability ensures that no qualified candidate is overlooked, even as recruitment requirements change over time.
Manual resume screening and the limitations of ATS can lead to increased recruitment costs. Traditional ATS systems often require substantial human intervention to address parsing errors and filter out candidates. By automating and improving the selection process through content-based ranking, our tool reduces the time and resources required for recruitment. It enables HR teams to focus their efforts on strategic aspects of the hiring process, such as interviews and candidate engagement, leading to significant cost savings in the long run.
The tool provides recruiters with data-driven insights into candidate rankings, empowering organizations to make more informed and objective hiring decisions. In contrast, traditional ATS systems may lack the data-driven capabilities necessary for precise candidate evaluation. This data-driven approach enhances recruitment strategies, leading to better hires and improved workforce performance. Recruiters can leverage analytics to gain valuable insights into candidate suitability and tailor their hiring processes accordingly.
In summary, the Resume Ranking Tool offers a holistic approach to the recruitment process, addressing the limitations of traditional ATS systems. It eliminates parsing problems, ensures inclusivity, provides adaptability, reduces costs, and supports data-driven decision-making. By adopting this tool, companies can elevate their recruitment procedures, securing a competitive advantage in the talent acquisition arena and ensuring they access the most qualified candidates.
The cleanText() function is used to clean text data.
The output is the cleaned text, returned either as a character vector or as a corpus depending on the as.corpus argument.
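library(tm)        # VCorpus, tm_map, removeWords, stopwords, stemDocument
library(SnowballC) # stemming backend needed by stemDocument
library(dplyr)     # %>% pipe

cleanText <- function(text, as.corpus = TRUE) {
  # Indonesian stopword list, read from a local file
  list_stop_words_indo <- readLines("stopwords_indo.txt", warn = FALSE, encoding = "UTF-8")
  text_corpus <- text %>% VectorSource() %>% VCorpus()
  # Lowercase first so that stopword removal matches regardless of case
  text_corpus <- tm_map(text_corpus, content_transformer(tolower))
  # Remove English and Indonesian stopwords
  text_corpus <- tm_map(text_corpus, removeWords, stopwords(kind = "en"))
  text_corpus <- tm_map(text_corpus, removeWords, list_stop_words_indo)
  # Remove punctuation, stem each word, and collapse repeated whitespace
  text_corpus <- tm_map(text_corpus, removePunctuation)
  text_corpus <- tm_map(text_corpus, stemDocument)
  text_corpus <- tm_map(text_corpus, stripWhitespace)
  if (as.corpus) {
    return(text_corpus)
  } else {
    # One cleaned string per document
    return(sapply(text_corpus, as.character))
  }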
}
The folder_to_table() function transforms a folder of PDF documents into a structured table, represented as a data frame. The resulting table has two columns: file_name, which contains the PDF file names, and text, which contains the raw text extracted from those PDFs. This simplifies working with PDF data and is especially useful for tasks like text analysis or data pre-processing.
library(pdftools)   # pdf_text
library(data.table) # data.table

folder_to_table <- function(folder_path) {
  # Extract the text of every page of a PDF and join the pages into one string
  convert_pdf_to_text <- function(pdf_path) {
    paste(pdf_text(pdf_path), collapse = "\n")
  }
  # Get PDF files from the specified folder; the pattern is a regular
  # expression, so the dot is escaped and anchored to the end of the name
  pdf_files <- list.files(folder_path, pattern = "\\.pdf$", full.names = TRUE)
  # Convert PDFs to text (one string per file, also safe for an empty folder)
  pdf_texts <- vapply(pdf_files, convert_pdf_to_text, character(1), USE.NAMES = FALSE)
  # Create a data table with file names and extracted text
  table_data <- data.table(
    file_name = basename(pdf_files),
    text = pdf_texts
  )
  return(table_data)
}
The rank_resume() function ranks resumes based on their similarity to the desired criteria. It takes two inputs: the ideal criteria and a data frame containing resumes. Similarity is measured as the share of unique criteria words that also appear in the resume, and resumes that match more criteria words receive higher ranks.
The output of the rank_resume() function is a data frame with the columns file_name, Matching_Words, and Matching_Percentage, with the rank stored as the row names.
library(dplyr)   # mutate, arrange, select, row_number, %>%
library(purrr)   # map_int, map_chr
library(stringr) # str_split
library(tibble)  # column_to_rownames

rank_resume <- function(ideal_criteria, resume_df) {
  # Stop early if there are no resumes to rank
  if (nrow(resume_df) == 0) {
    stop("Folder is empty.")
  }
  # Clean the ideal criteria and resume text
  clean_ideal_criteria <- cleanText(ideal_criteria, FALSE)
  resume_df$text <- cleanText(resume_df$text, FALSE)
  # Tokenize the cleaned ideal criteria into unique, non-empty words
  unlist_ideal_criteria <- unique(unlist(str_split(clean_ideal_criteria, "\\s+")))
  unlist_ideal_criteria <- unlist_ideal_criteria[unlist_ideal_criteria != ""]
  # Count matched ideal words, list them, and rank resumes
  rank_df <- resume_df %>%
    mutate(
      Word_Count = map_int(text, ~ length(intersect(unlist_ideal_criteria, unlist(str_split(.x, "\\W+"))))),
      Matching_Words = map_chr(text, ~ paste(intersect(unlist_ideal_criteria, unlist(str_split(.x, "\\W+"))), collapse = " ")),
      Matching_Percentage = round(Word_Count / length(unlist_ideal_criteria), 2)
    ) %>%
    arrange(desc(Word_Count)) %>%
    mutate(Rank = row_number()) %>%
    select(-text, -Word_Count) %>%
    column_to_rownames(var = "Rank") # Set "Rank" as row names
  return(rank_df)
}
The get_WordCloud() function creates a word cloud that highlights the words matched between a resume and the provided ideal criteria. It takes one input, the matching words, and is particularly useful for visualizing the alignment between a resume and the desired criteria.
The output is a word cloud plot of the input words.
library(wordcloud) # wordcloud
library(dplyr)     # count, %>%
library(stringr)   # str_split

get_WordCloud <- function(matching_words) {
  # Split the input string into individual words
  words_list <- unlist(str_split(matching_words, " "))
  # Create a data frame of matching words and count their occurrences
  words <- data.frame(word = words_list) %>%
    count(word, sort = TRUE)
  # Generate a word cloud with the specified settings
  words %>%
    with(
      wordcloud(
        words = word,         # Words to be included in the word cloud
        freq = n,             # Occurrence count of each word
        min.freq = 1,         # Keep words that occur only once
        colors = "black",     # Text color (can be customized)
        random.order = FALSE, # Plot words in decreasing order of frequency
        max.words = 150       # Maximum number of words to display
      )
    )
}
cleanText() Testing
raw_text <- "This is a sample text for testing the text cleaner function. It contains various elements such as numbers like 12345, punctuation marks (.,;:!?), and common English stopwords like 'the,' 'and,' 'is,' and 'in.' Additionally, it includes some mixed-case words like 'WordS,' 'CLEaner,' and 'FunctioN.' The text also has some special characters and symbols: @username, #hashtag, $price, %percentage, and &ampersand. We should test if the text cleaner can handle these different elements effectively and produce clean and normalized text output."
cleanText(raw_text, FALSE)
## 1
## "sampl text test text cleaner function contain various element number like 12345 punctuat mark common english stopword like addit includ mixedcas word like word cleaner function text also special charact symbol usernam hashtag price percentag ampersand test text cleaner can handl differ element effect produc clean normal text output"
folder_to_table() Testing
To test this function, I will use a collection of designer resumes that I obtained from Kaggle at this link: https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset
## [1] "PRODUCT DESIGNER\nProfessional Summary\n4-5 years engineering experience and 1-2 years working experience. Able to work independently and under pressure, detail oriented, excellent\nproblem solver, Innovator. Efficient Mechanical Engineer leveraging a strong technical background in bringing products from the laboratory to\nmass-manufacturing. Mechanical Engineer with [Number] + years of training in varied industries, including manufacturing and high-tech\nenvironments. Creative manufacturing engineer. Lead team member on process redesign for [Describe product] . Design engineer who has worked\non [Number] new products, including the [Product name] recognized for industry excellence.\nSkills\n CAD\n Complex problem solving\n Stress analysis training\n Component functions and\n testing requirements Engine components, pumps, and fuel systems knowledgeFEA toolsAutoCAD proficientTeam\n Technical direction and leadershipManufacturing systems integrationManufacturing systems integration\n product strategies\n Works well in diverse team\n environment\n Strong decision maker\n\nWork History\nProduct designer 10/2014 to Current\nCompany Name – City , State\n The team wants to develop a portable, easily shipped, cost effective hardware that can send and receive digital content directly from\n satellites.\n Personally involve with prototype designing and 3D modeling.\n Cooperating with a startup called Outernet (https://www.outernet.is/en/), a for-profit media company that already has two satellites covering\n North America, Europe, and the Middle East and has recently started broadcasting free Internet content.\n Assisting drafters in developing the structural design of products using drafting tools or computer-assisted design (CAD) or drafting\n equipment and software.\n Completing project mechanical design while providing technical solutions feedback.\nproduct design 09/2014 to Current\nCompany Name – City , State\n Two engineers and designers to collaborate 
together to create new innovative wearable pieces for a fashion show competition.\n Will access new Makerspce, which includes a 3D printer, will be given a $500 budget to create their wearable piece.\nRESEARCH EcoPRT Research Assistant 01/2014 to 05/2014\nCompany Name – City , State\n The goal is to develop an economical, automated transit system.\n It will focus on the hands on design and development of a small manned autonomous vehicle.\n www.ecoprt.com).\n The key in the design is to understand the impact weight has on the overall cost and performance, and the incorporation of automated\n control.\n Aspects of the development will possibly include\nproduct design 01/2014 to 05/2014\nCompany Name – City , State\n VOLUNTEER The purpose of this project is to design and fabricate a cable management system for a public-access electric\n EXPERIENCE vehicle charging station.\n This system will dispense and retract 20 feet of cable for operation and provide secured storage for the cable when not in use.\n The prototype will be subjected to the following constraints\nTeam member 10/2013 to 04/2014\nCompany Name – City , State\n Attending scheduled control and mechanical teams' training classes.\n EXPERIENCE · Learned shop safety, vehicle glider equations, drive cycle modeling, and Simulation.\n Learned the powertrain architecture and components of the 2013 Chevrolet Malibu.\n Learned vehicle dynamics.\n And practiced model simulation by using MATLAB Simulink.\n Mechanical Engineering Components design project (material design.\nmaterial design 10/2013 to 04/2014\n\nCompany Name – City , State\n Designed fillet welds connections and bolts for the plate girder, which holds the pipe with horizontal and vertical force loads.\n Calculated the related shear or bending stresses for the welds and bolts to determine the right materials and sizes of welds (thickness) and\n bolts.\nEddy Current DYNO Research Assistant 09/2013 to 05/2014\nCompany Name – City , State\n Built the 
engine stander for our engine and Eddy current dynamometer.\n Currently installing the Eddy current dynamometer with graduate students.\n Future possibility of experimenting with torque, horsepower, RPM, EGR (Exhaust Gas Recirculation) and temperature measurements of the\n Kubota Diesel Engine after installation.\n Possibility of learning the engine tuning.\nResearch Assistant 06/2013 to 08/2013\nCompany Name – City , State\n Graphed sketches and figures for professor's Thermodynamics eBook.\n Learned how to use Smartdraw.\n Performed literature reviews on ongoing research topics and eBook materials.\n Added video links and real-world images to the eBook.\nProgram Assistant 05/2013 to 06/2013\nCompany Name – City , State\n Assisting Dr.\n Eischen, the director of the Hangzhou Engineering Study Abroad Program at Zhejiang University, during his program this coming summer.\n Helping with tasks such as translating, program activities, running errands, classes, transportation, and culture immersion.\n2323 04/2013 to 10/2013\nCompany Name – City , State\n Designed Airplane Landing Gear by modeling with a mass-spring-damper SDOF system and designing the spring k and damper C that\n limits the given amplitude.\n Part 2\nwew 10/2012 to 04/2013\nCompany Name – City , State\n Utilized MATLAB for statistical analysis of an elastic band rocket.\n Learned how to make experimental designs, statistical processes, statistics simulations, and graphical displays of data on computer\n workstations.\n Used statistical methods including point and interval estimation of population parameters and curve and surface fitting (regression analysis).\n Graphic Communications Project (3D design.\nrer 10/2012 to 04/2013\nCompany Name – City , State\n Utilized SolidWorks to design a tape floss container.\n Developed the ability to use SolidWorks within the context of a concurrent design process to understand how everyday objects are\n designed and created.\n Emphasis placed on decision-making 
processes involving creating geometry and the development of modeling strategies that incorporate the\n intentions of the designer.\nre 02/2009 to 04/2009\nCompany Name – City , State\n Visited construction sites with senior engineers.\n Kept record of site investigations.\n Dealt with paperwork with senior engineers and answered phone calls.\n Helped install residential wiring in new construction sites.\n Investigated electrical problems and developed the ability to read electrical diagrams and wire electrical panels.\nEducation\nMaster of science : Mechanical engineering Robotic & Manufacture Current Columbia University in the City of New York - City , State\n Sep -2015 Dec Mechanical engineering Robotic & Manufacture\n\n Coursework in Advanced Mechanical Engineering\n Coursework in Drafting, Computer-Aided Design (CAD) and Computer-Aided Manufacturing (CAM)\nBachelor of science : Mechanical Engineering 1 2010 North Carolina State University, Raleigh (NCSU) - City , State\nGPA: Magna Cum Laude GPA: 3.5 GPA: 3.63/4.0 Mechanical Engineering Magna Cum Laude GPA: 3.5 GPA: 3.63/4.0\nNorth Carolina State University -\nGPA: Magna Cum Laude Magna Cum Laude\nAccomplishments\n Listed in the dean's list for three semesters during Junior and Senior Year · Chosen to be on the cover of NC State freshman admissions\n booklet · In the process of receiving the Professional Development Certificate · NCSU Chinese basketball team player.\n Math and physics club member · Control and Mechanical Team member of NCSU EcoCAR2 · Took the global training class at NC\n State University · CUSA member (Chinese undergraduate student association).\nSkills\n3D, 3D modeling, AutoCAD, broadcasting, budget, C, cable, Chinese, com, hardware, content, controller, data analysis, Dec, decision-making,\ndesigning, product design, English, fashion, focus, Fortran, frame, Graphic, Lathe, Linux, director, Maple, materials, MATLAB, mechanical,\nMechanical Engineering, access, Mill, modeling, navigation, 
printer, processes, profit, speaking, Python, Quantitative analysis, reading, read,\nresearch, safety, Simulation, sketching, SolidWorks, statistical analysis, Statistics, phone, translating, transportation, video, Welding, wiring, written\n"
Works Perfectly!
rank_resume() Testing
I will intentionally use the skills listed on resume 10751444.pdf. If resume 10751444.pdf is ranked 1st and has a 1.00 matching percentage, it means my function is working well!
ideal_designer <- "3D modeling, AutoCAD, broadcasting, budget, C, cable, Chinese, com, hardware, content, controller, data analysis, Dec, decision-making,\ndesigning, product design, English, fashion, focus, Fortran, frame, Graphic, Lathe, Linux, director, Maple, materials, MATLAB, mechanical,\nMechanical Engineering, access, Mill, modeling, navigation, printer, processes, profit, speaking, Python, Quantitative analysis, reading, read,\nresearch, safety, Simulation, sketching, SolidWorks, statistical analysis, Statistics, phone, translating, transportation, video, Welding, wiring, written\n"
rank_resume(ideal_designer, designers)
Nice. The rank_resume() function works well!
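For reference, the matching percentage that rank_resume() reports can be reproduced by hand. The sketch below uses base R only and made-up, already-cleaned word lists (the values are assumptions for illustration, not taken from the dataset): it intersects the unique criteria words with the resume words and divides the number of matches by the number of criteria words.

```r
# Hypothetical, already-cleaned (lowercased and stemmed) word lists
criteria_words <- c("autocad", "sketchup", "photoshop", "typographi")
resume_words <- c("graphic", "design", "autocad", "sketchup", "autocad")

# Criteria words that also occur in the resume (duplicates count once)
matched <- intersect(criteria_words, resume_words)

# Share of criteria words found in the resume, rounded as in rank_resume()
matching_percentage <- round(length(matched) / length(criteria_words), 2)

print(matched)
print(matching_percentage)
```

Here two of the four criteria words are found, so the matching percentage is 0.5. A resume reaches 1.00 only when every criteria word appears in it, which is why reusing a resume's own skill list as the criteria is a convenient sanity check.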
rank_resume() Testing (2)
This time, I will intentionally use the criteria possessed by resume 10748989.pdf. If resume 10748989.pdf ranks 1st and has a 1.00 matching percentage, it means my function is working well!
But first, let’s take a look at resume 10748989.pdf!
Now let’s rank the resumes. Resume 10748989.pdf should rank 1st and have a 1.00 matching percentage!
ideal_designer <- "Building codes knowledge Complex problem solving Strong analytical ability Excellent attention to detail Commercial interior design Working drawings and procedures Space planning methodology Sketching Rendering Digital drafting 3D rendering software Proficient in SketchUp"
rank_resume(ideal_designer, designers)
Perfect. The rank_resume() function works well!
get_WordCloud() Testing
ideal_designer <- "Photoshop, Adobe Illustrator, AutoCAD, SketchUp, InDesign, Graphic Design, UI/UX Design, Typography, Motion Graphics"
rank <- rank_resume(ideal_designer, designers)
rank[1, "Matching_Words"]
## [1] "adob autocad sketchup indesign graphic design"
Cool! The get_WordCloud() function works perfectly.
This is just a temporary overview. The web app is not yet complete and does not yet function as intended.