Resume Ranking Tool with NLP

Introduction

Background

In today’s job market, finding the right candidates is tough. Companies need skilled professionals, and they need them fast. But here’s the problem: going through resumes one by one takes forever, and it’s not always effective. According to Glassdoor, each corporate job posting attracts about 250 resumes on average.

To speed things up, many companies use something called an Applicant Tracking System (ATS). It’s like a robot that reads resumes and looks for specific words. It’s good at some things, but not everything.

ATS has a flaw. It only likes resumes that look a certain way. If your resume doesn’t fit the mold, it might get ignored. That’s not fair to talented people who don’t have traditional resumes.

My project is different. Instead of parsing resumes into fixed fields, I look at what’s inside them and check how well each resume matches what a company wants. It’s like finding a puzzle piece that fits, no matter its shape.

I want to make hiring smarter and faster. With my project, companies can find the best candidates, even if their resumes are a bit different. It’s a bridge between robots and real people, making hiring better for everyone. I believe this will help companies succeed in today’s competitive job market.

Problem Statement

In the fast-paced world of job recruitment, finding the right candidates can be a tough challenge. Companies often use Applicant Tracking Systems (ATS) to help them sort through resumes. But here’s the problem: ATS systems use parsing techniques that have limitations. They are not adept at handling resumes that deviate from the standard mold, potentially causing them to overlook promising candidates.

The issue at the heart of this problem is that ATS filters out resumes that don’t precisely match its predefined criteria. In doing so, it can inadvertently discard resumes with hidden potential: those belonging to talented individuals whose profiles don’t conform to the conventional format. It is like searching a haystack for a needle and throwing the needle away along with the straw.

This problem of resume parsing and its consequences are further explored in the paper titled “The Dark Side of Applicant Tracking Systems: How Unfairness and Bias Undermine Diversity and Inclusion in Hiring,” authored by Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner (2016). The paper sheds light on how ATS can unwittingly introduce unfairness and bias into the hiring process, thus undermining efforts to promote diversity and inclusion in recruitment.

Our approach offers a distinctive solution. Instead of filtering resumes out outright, we propose ranking them by how well they align with what companies are seeking, so that hidden gems among the applicants are not discarded. What sets us apart is that we bypass the parsing techniques employed by ATS and use text similarity analysis instead, avoiding the parsing-related pitfalls that ATS systems encounter. Our overarching goal is to enhance recruitment efficiency, accuracy, and flexibility. By prioritizing resumes based on their content, we offer companies a more intelligent and inclusive method for identifying the most suitable candidates for their needs.

Project Idea

The project aims to develop an innovative Resume Ranking Tool that leverages Natural Language Processing (NLP) techniques to revolutionize the recruitment process. Instead of relying on traditional Applicant Tracking Systems (ATS) with their limitations, this tool will rank all resumes based on the intersection between the desired candidate qualifications and the content of resumes. The primary goal is to make the recruitment process more efficient, fair, inclusive, and flexible.

Key Features:

Cosine Similarity-Based Ranking: Implement a cosine similarity algorithm to calculate the degree of similarity between each resume and the desired candidate qualifications. Resumes will be ranked based on this similarity score.
User-Friendly Interface: Create an intuitive and user-friendly web interface where recruiters can easily upload job criteria and resumes. The ranked results will be displayed in a clear and accessible format.
Customizable Criteria: Allow users to input specific job criteria, including skills, experience, education, and other qualifications. The tool will adapt the ranking accordingly.
No Parsing Limitations: The tool does not rely on resume parsing, so it avoids the common parsing problems associated with ATS.
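The rank_resume() function later in this document scores resumes by the number of matched criteria words rather than by cosine similarity. As a hedged sketch of the cosine-similarity ranking described above, the base R code below compares term-frequency vectors of texts that are assumed to be already cleaned (lowercased, stop-words removed, stemmed). The helpers tokenize() and cosine_sim() and the toy texts are hypothetical, not part of the project.

```r
# Hypothetical helper: split a cleaned text into non-empty, whitespace-separated tokens
tokenize <- function(text) {
  tokens <- strsplit(text, "\\s+")[[1]]
  tokens[nzchar(tokens)]
}

# Hypothetical helper: cosine similarity of the term-frequency vectors of two texts
cosine_sim <- function(a, b) {
  tok_a <- tokenize(a)
  tok_b <- tokenize(b)
  vocab <- union(tok_a, tok_b)
  # Count how often each vocabulary word occurs in each text (0 if absent)
  va <- vapply(vocab, function(w) sum(tok_a == w), numeric(1))
  vb <- vapply(vocab, function(w) sum(tok_b == w), numeric(1))
  # Dot product divided by the product of the vector lengths
  sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

# Toy, already-cleaned criteria and resumes
criteria <- "python sql machin learn"
resumes <- c(
  r1 = "python develop sql databas machin learn model",
  r2 = "graphic design photoshop illustr",
  r3 = "python script autom"
)

# Score every resume against the criteria and rank from most to least similar
scores <- vapply(resumes, cosine_sim, numeric(1), b = criteria)
sort(scores, decreasing = TRUE)
```

Here r1 scores 2/sqrt(7) (about 0.76), r3 about 0.29, and r2 scores 0 because it shares no words with the criteria. Unlike a plain matched-word count, cosine similarity also accounts for resume length, so extra unrelated words dilute the score. An empty text would make the denominator zero, so production code should guard against that case.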

Value Proposition:

Improved Recruitment Efficiency: Traditional ATS systems often struggle with parsing errors and rigid filtering, leading to missed opportunities. By ranking resumes based on content similarity, our tool helps recruiters quickly identify the most relevant candidates without losing high-potential candidates to parsing errors.
Inclusivity: ATS systems can inadvertently exclude candidates with non-standard resume formats or unique experiences. Our tool ensures that every candidate has a fair chance to be considered, regardless of resume style.
Flexibility: Recruiters can fine-tune their criteria to adapt to changing hiring needs, making the tool suitable for various job roles and industries. This flexibility eliminates the rigidity of ATS systems.

Comparison with ATS:

Parsing Problems: Traditional ATS systems often encounter difficulties in parsing resumes with non-standard structures or complex formatting. These parsing errors can result in the exclusion of potentially qualified candidates. Our tool avoids parsing altogether, focusing on content similarity, thus eliminating parsing-related issues.
Inflexible Filtering: ATS systems typically rely on rigid keyword filtering, which may inadvertently exclude candidates who use different terms or phrasing. Our tool scores every resume by how much of the criteria it covers instead of applying a pass/fail keyword filter, so a resume that misses some keywords is ranked lower rather than discarded. Stemming also lets different forms of the same word (such as design and designing) match.
Lack of Inclusivity: ATS systems may unintentionally overlook candidates with diverse backgrounds and experiences due to their predefined criteria. Our tool is designed to be inclusive, ensuring that all candidates have a fair chance based on their qualifications and not their resume format.
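To make the difference between rigid keyword filtering and ranking concrete, here is a simplified base R comparison on hypothetical, already-tokenized resumes: an ATS-style filter that keeps only resumes containing every criterion, versus ranking all resumes by how many criteria they cover. Real ATS products filter in more varied ways; this only illustrates the idea.

```r
# Hypothetical criteria and already-tokenized resumes (toy data)
criteria <- c("python", "sql", "tableau")
resumes <- list(
  r1 = c("python", "sql", "excel"),
  r2 = c("python", "sql", "tableau", "r"),
  r3 = c("tableau", "powerbi")
)

# ATS-style hard filter: a resume passes only if it contains every criterion
passes <- vapply(resumes, function(r) all(criteria %in% r), logical(1))
shortlist <- names(resumes)[passes]
shortlist

# Ranking: every resume is kept and ordered by how many criteria it covers
coverage <- vapply(resumes, function(r) sum(criteria %in% r), integer(1))
ranking <- sort(coverage, decreasing = TRUE)
ranking
```

Resume r1 misses only one criterion, so the hard filter drops it, while the ranking keeps it in second place behind r2.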

Conclusion:

The Resume Ranking Tool offers a transformative approach to recruitment, directly addressing the limitations of traditional ATS systems. It empowers businesses to efficiently identify and select the best-suited candidates based on resume content, ultimately enhancing the quality of hires and streamlining the recruitment process while overcoming the weaknesses of ATS.

Work Flow

Data Preprocessing:

Collecting Data: Gather applicants' resumes as PDF files in a single folder.
Converting PDFs to Raw Text: Convert all PDF files into raw text.
Creating a Dataframe: Create a dataframe with two columns, file_name (PDF file name) and text (raw resume text).

Text Preprocessing:

Text Corpus Creation: Transform the input text into a text corpus.
Lowercasing: Convert all text to lowercase.
Removing Stop-Words: Eliminate common stop-words from the text.
Punctuation Removal: Strip away punctuation marks.
Text Stemming: Apply text stemming to reduce words to their base forms.
Removing Extra Spaces: Cleanse the text by removing extra spaces.

Resume Ranking:

Tokenizing the Ideal Criteria: The cleaned ideal criteria are split into individual words.
Scoring Resumes: For each resume, the function counts how many of the ideal-criteria words appear in the resume.
Ranking Resumes: Resumes are ranked based on their word count scores, with resumes containing more matched words receiving higher rankings.
Generating Ranked Resume Dataframe: The function generates a data frame that includes the resume names, the matched words in each resume (as a string), and a ratio indicating the proportion of matched words relative to the total words in the ideal criteria.
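The four ranking steps above can be sketched in base R on toy, already-cleaned text. The inputs are hypothetical; the project's rank_resume() function additionally runs both the criteria and the resumes through cleanText() first.

```r
# Toy inputs, assumed to be already cleaned (hypothetical examples)
criteria_clean <- "python sql machin learn sql"
resume_texts <- c(
  r1 = "python develop sql databas",
  r2 = "graphic design photoshop",
  r3 = "machin learn python sql model"
)

# 1. Tokenize the criteria into unique, non-empty words
criteria_words <- unique(strsplit(criteria_clean, "\\s+")[[1]])
criteria_words <- criteria_words[nzchar(criteria_words)]

# 2. For each resume, keep the criteria words that also appear in the resume
matched <- lapply(resume_texts, function(txt) {
  intersect(criteria_words, strsplit(txt, "\\W+")[[1]])
})

# 3. Score = number of matched words; ratio = share of the criteria covered
ranked <- data.frame(
  resume = names(resume_texts),
  matched_words = vapply(matched, paste, character(1), collapse = " ", USE.NAMES = FALSE),
  word_count = unname(lengths(matched)),
  matching_percentage = round(unname(lengths(matched)) / length(criteria_words), 2)
)

# 4. Rank: resumes with more matched words come first
ranked <- ranked[order(ranked$word_count, decreasing = TRUE), ]
rownames(ranked) <- NULL
ranked
```

Resume r3 covers all four unique criteria words (ratio 1), r1 covers two (0.5) and r2 none (0). The duplicate "sql" in the criteria counts only once because the criteria words are deduplicated before matching.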

Output

The output of this project is a Shiny web app. The user submits a folder of PDF resumes and enters the desired ideal criteria, and the app returns a ranked resume table.

Business Impact

The implementation of the Resume Ranking Tool will have a significant positive impact on businesses and their recruitment processes. Here are the key areas where this project can make a difference:

1. Enhanced Recruitment Efficiency

Traditional recruitment processes, especially when reliant on ATS, can be time-consuming and prone to missing potential candidates. ATS systems often encounter difficulties in parsing resumes with non-standard structures or complex formatting. These parsing errors can result in the exclusion of potentially qualified candidates, leading to inefficiencies in the recruitment process. By employing a content-based ranking approach, our tool eliminates parsing problems, streamlining the candidate selection process. This increased efficiency means that recruiters can quickly identify the most qualified candidates without being bogged down by parsing errors or rigid filtering.

2. Diverse and Inclusive Hiring

One of the limitations of ATS is that it often excludes candidates with non-standard resume formats or unique experiences. This lack of inclusivity can hinder diversity and result in the unintentional exclusion of talented individuals. Our tool, by focusing on content and similarity rather than rigid filtering, ensures inclusivity in the hiring process. It accommodates candidates with diverse resume styles and experiences, enabling recruiters to discover talented individuals who might have been overlooked by traditional systems. This approach contributes to a more diverse and inclusive workforce, fostering innovation and representation.

3. Adaptability to Changing Needs

Traditional ATS systems rely on predefined criteria and rigid keyword filtering, which may inadvertently exclude candidates who use different terms or phrasing. This lack of flexibility can be a hindrance, particularly in fast-paced industries where skill demands can change rapidly. In contrast, our tool provides businesses with the agility to adapt to evolving hiring needs. Recruiters can customize job criteria, including skills, experience, education, and other qualifications, making the tool suitable for various job roles and industries. This adaptability ensures that no qualified candidate is overlooked, even as recruitment requirements change over time.

4. Cost Savings

Manual resume screening and the limitations of ATS can lead to increased recruitment costs. Traditional ATS systems often require substantial human intervention to address parsing errors and filter out candidates. By automating and improving the selection process through content-based ranking, our tool reduces the time and resources required for recruitment. It enables HR teams to focus their efforts on strategic aspects of the hiring process, such as interviews and candidate engagement, leading to significant cost savings in the long run.

5. Data-Driven Decision-Making

The tool gives recruiters a transparent, data-driven view of candidate rankings: for each resume it reports which criteria words were matched and what share of the criteria the resume covers. This helps organizations make more informed and objective hiring decisions. In contrast, traditional ATS systems may lack the data-driven capabilities necessary for precise candidate evaluation. This data-driven approach enhances recruitment strategies, leading to better hires and improved workforce performance. Recruiters can use these scores to gain insight into candidate suitability and tailor their hiring processes accordingly.

In summary, the Resume Ranking Tool offers a holistic approach to the recruitment process, addressing the limitations of traditional ATS systems. It eliminates parsing problems, ensures inclusivity, provides adaptability, reduces costs, and supports data-driven decision-making. By adopting this tool, companies can elevate their recruitment procedures, securing a competitive advantage in the talent acquisition arena and ensuring they access the most qualified candidates.

Code Preparation

Used Libraries

library(dplyr)
library(purrr)
library(tm)
library(NLP)
library(proxy)
library(pdftools)
library(data.table)
library(tools)
library(tidytext)
library(textclean)
library(tibble)
library(stringr)
library(wordcloud)

Text Cleaner Function

The cleanText() function will be used to clean text data.

Function Input:

text: A character vector of raw text; each element is treated as one document.
as.corpus: If TRUE (the default), the function returns a tm corpus; if FALSE, it returns a character vector.

Function WorkFlow:

Transforming the input text into a text corpus
Converting all text to lowercase
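Removing English and Indonesian stop-words (the Indonesian list is read from stopwords_indo.txt)
Removing punctuation marks
Applying text stemming to reduce words to their base forms
Eliminating extra spaces
Returning the cleaned text as a corpus or as a character vector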

Function Output:

The output is the cleaned text, either as a character vector or as a corpus, depending on the as.corpus argument.

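cleanText <- function(text, as.corpus = TRUE) {
  # Indonesian stop-word list (one word per line), expected in the working directory
  list_stop_words_indo <- readLines("stopwords_indo.txt", warn = FALSE, encoding = "UTF-8")

  # Build a corpus with one document per element of `text`
  text_corpus <- text %>% VectorSource() %>% VCorpus()

  # Convert to lowercase
  text_corpus <- tm_map(x = text_corpus,
                        FUN = content_transformer(tolower))

  # Remove English stop-words, then Indonesian stop-words
  text_corpus <- tm_map(x = text_corpus,
                        FUN = removeWords,
                        stopwords(kind = "en"))
  text_corpus <- tm_map(x = text_corpus,
                        FUN = removeWords,
                        list_stop_words_indo)

  # Remove punctuation marks
  text_corpus <- tm_map(x = text_corpus,
                        FUN = removePunctuation)

  # Reduce words to their stems (stemDocument needs the SnowballC package installed)
  text_corpus <- tm_map(x = text_corpus,
                        FUN = stemDocument)

  # Collapse runs of whitespace into a single space
  text_corpus <- tm_map(x = text_corpus,
                        FUN = stripWhitespace)

  if (as.corpus) {
    return(text_corpus)
  }
  # Otherwise return one cleaned character string per input element
  sapply(text_corpus, as.character)
}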

Folder to Table Converter Function

The folder_to_table() function transforms a folder of PDF documents into a structured table, represented as a data frame (a data.table). The resulting table has two columns: file_name, which contains the PDF file names, and text, which contains the raw text extracted from those PDFs. This simplifies working with PDF data and is especially useful for tasks like text analysis or data preprocessing.

Function Input:

folder_path: This parameter specifies the path to the folder containing the PDF files to be processed. It should be provided as a character string.

Function Workflow:

PDF to Text Conversion: The function uses pdftools::pdf_text() to extract the text of each page of every PDF in the folder and joins the pages, separated by newlines, into a single text block.
File Name Extraction: For each PDF file, the function takes the file name without its folder path, including the .pdf extension (for example, 10751444.pdf). This is used as the file_name value in the resulting table.
Data Table Creation: The extracted file names and their corresponding text contents are organized into a data table with two columns: file_name and text.

Function Output:

The output of the folder_to_table() function is a data frame structured as follows:

file_name: This column contains the PDF file names, representing each document.
text: This column contains the raw (not yet cleaned) text extracted from each PDF document.

folder_to_table <- function(folder_path) {
  # Extract the text of every page with pdftools::pdf_text()
  # and join the pages into a single string
  convert_pdf_to_text <- function(pdf_path) {
    paste(pdf_text(pdf_path), collapse = "\n")
  }

  # List only files ending in ".pdf" (the dot is escaped and the match is anchored,
  # so names like "cv_pdf.docx" are not picked up)
  pdf_files <- list.files(folder_path, pattern = "\\.pdf$", full.names = TRUE)

  # Convert each PDF into one text string
  pdf_texts <- vapply(pdf_files, convert_pdf_to_text, character(1), USE.NAMES = FALSE)

  # Create a data table with file names (including ".pdf") and the extracted raw text;
  # an empty folder yields a table with zero rows
  table_data <- data.table(
    file_name = basename(pdf_files),
    text = pdf_texts
  )

  return(table_data)
}

Resume Ranker Function

The rank_resume() function is designed to rank resumes based on their similarity to desired criteria. This function takes two inputs: the ideal criteria and a data frame containing resumes. It then produces a ranked list of resumes, with those most closely matching the criteria receiving higher ranks.

Function Input:

ideal_criteria: A character string containing the criteria or keywords that define the ideal candidate. It serves as the reference for ranking resumes.
resume_df: This data frame should contain the resumes to be ranked, with each resume represented as text in the text column.

Ranking Process Steps:

Cleaning the Ideal Criteria and Resumes: Both the ideal criteria and the resume texts are preprocessed to remove noise, such as stop words and punctuation.
Tokenizing Ideal Criteria: The cleaned ideal criteria are tokenized into individual words.
Calculating Word Count and Matched Ideal Words: For each resume, the function calculates the word count of matched words between the resume and the ideal criteria. Additionally, it identifies the specific words that match.
Scoring Resumes Based on Matched Word Count: Resumes are scored based on the count of matched words, with higher counts resulting in higher scores.
Sorting Resumes by Score: Resumes are sorted in descending order of their scores, with the most relevant resumes ranked at the top.
Returning Ranked Resumes as a Data Frame: The function returns the ranked resumes as a data frame, excluding the original text content of the resumes.

Function Output:

The output of the rank_resume() function is a data frame whose row names are the ranks (1 = best match) and which includes the following columns:

file_name: The file names of the resumes.
Matching_Words: A string containing the words that matched between the resume and the ideal criteria, separated by spaces.
Matching_Percentage: The proportion of unique ideal-criteria words found in the resume, rounded to two decimals.
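In symbols, let $C$ be the set of unique cleaned ideal-criteria words and $W_r$ the set of cleaned words in resume $r$. The two scores that rank_resume() computes are then:

$$
\text{Word\_Count}(r) = \lvert C \cap W_r \rvert,
\qquad
\text{Matching\_Percentage}(r) = \frac{\lvert C \cap W_r \rvert}{\lvert C \rvert}
$$

Resumes are sorted by Word_Count in descending order. Because $\lvert C \rvert$ is the same for every resume, this gives the same order as sorting by Matching_Percentage. For example, a resume that contains 6 of 10 criteria words gets Word_Count = 6 and Matching_Percentage = 0.6 (the code rounds to two decimals).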

rank_resume <- function(ideal_criteria, resume_df) {
  # Check if the resume data frame is empty before doing any work
  if (nrow(resume_df) == 0) {
    stop("Folder is empty.")
  }

  # Work on a plain data frame so the rank can be stored as row names
  resume_df <- as.data.frame(resume_df)

  # Clean the ideal criteria and resume text (as character, not corpus)
  clean_ideal_criteria <- cleanText(ideal_criteria, FALSE)
  resume_df$text <- cleanText(resume_df$text, FALSE)

  # Tokenize the cleaned ideal criteria into unique, non-empty words
  # (leading/trailing spaces would otherwise produce an empty "" token)
  unlist_ideal_criteria <- unlist(str_split(clean_ideal_criteria, "\\s+")) %>%
    unique() %>%
    setdiff("")

  # Find the matched ideal words once per resume, then derive the scores from them
  rank_df <- resume_df %>%
    mutate(
      Matched = map(text, ~ intersect(unlist_ideal_criteria, unlist(str_split(.x, "\\W+")))),
      Word_Count = map_int(Matched, length),
      Matching_Words = map_chr(Matched, ~ paste(.x, collapse = " ")),
      Matching_Percentage = round(Word_Count / length(unlist_ideal_criteria), 2)
    ) %>%
    arrange(desc(Word_Count)) %>%
    mutate(Rank = row_number()) %>%
    select(-text, -Matched, -Word_Count) %>%
    column_to_rownames(var = "Rank")  # Set "Rank" as row names

  return(rank_df)
}

WordCloud Function

The get_WordCloud() function creates a word cloud that highlights the words matched between a resume and the provided ideal criteria. It takes a single input, the matching words, and is useful for visualizing how well a resume aligns with the desired criteria.

Function Input:

matching_words: This is a string containing words that match between a resume and the ideal criteria.

Function WorkFlow:

Word List Creation: The input string is split into individual words, creating a list of words.
Counting Occurrences: The function counts the occurrences of each word in the list and creates a dataframe with the words and their respective counts.
Word Cloud Visualization: Using the word counts, the function generates a word cloud with specified settings, allowing for customization of text color, word scaling, order preservation, and the maximum number of words to display.

Function Output:

The output is a wordcloud plot of the input text/words

get_WordCloud <- function(matching_words) {

  # Split the input string into individual words
  words_list <- unlist(str_split(matching_words, " "))

  # Create a data frame of matching words and count their occurrences
  words <- data.frame(word = words_list) %>%
    count(word, sort = TRUE)

  # Generate a word cloud, sizing each word by its count
  words %>%
    with(
      wordcloud(
        words = word,           # Words to be included in the word cloud
        freq = n,               # Word counts computed above
        min.freq = 1,           # Keep words that occur only once
        colors = "black",       # Text color (can be customized)
        random.order = FALSE,   # Plot words in decreasing frequency order
        max.words = 150         # Maximum number of words to display
      )
    )
}

Function Testing

`cleanText()` Testing

raw_text <- "This is a sample text for testing the text cleaner function. It contains various elements such as numbers like 12345, punctuation marks (.,;:!?), and common English stopwords like 'the,' 'and,' 'is,' and 'in.' Additionally, it includes some mixed-case words like 'WordS,' 'CLEaner,' and 'FunctioN.' The text also has some special characters and symbols: @username, #hashtag, $price, %percentage, and &ampersand. We should test if the text cleaner can handle these different elements effectively and produce clean and normalized text output."

cleanText(raw_text, F)

##                                                                                                                                                                                                                                                                                                                                               1 
## "sampl text test text cleaner function contain various element number like 12345 punctuat mark common english stopword like addit includ mixedcas word like word cleaner function text also special charact symbol usernam hashtag price percentag ampersand test text cleaner can handl differ element effect produc clean normal text output"

`folder_to_table()` Testing

To test this function, I will use a collection of resumes for designer roles from the Kaggle Resume Dataset: https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset

designers <- folder_to_table("DESIGNER")

designers %>% head(2)

Resume vs Extracted Text Comparison

Resume

knitr::include_graphics("10751444.png")

Extracted Text

designers[designers$file_name == "10751444.pdf", ]$text

## [1] "PRODUCT DESIGNER\nProfessional Summary\n4-5 years engineering experience and 1-2 years working experience. Able to work independently and under pressure, detail oriented, excellent\nproblem solver, Innovator. Efficient Mechanical Engineer leveraging a strong technical background in bringing products from the laboratory to\nmass-manufacturing. Mechanical Engineer with [Number] + years of training in varied industries, including manufacturing and high-tech\nenvironments. Creative manufacturing engineer. Lead team member on process redesign for [Describe product] . Design engineer who has worked\non [Number] new products, including the [Product name] recognized for industry excellence.\nSkills\n       CAD\n       Complex problem solving\n       Stress analysis training\n       Component functions and\n       testing requirements         Engine components, pumps, and fuel systems knowledgeFEA toolsAutoCAD proficientTeam\n       Technical direction and leadershipManufacturing systems integrationManufacturing systems integration\n       product strategies\n       Works well in diverse team\n       environment\n       Strong decision maker\n\nWork History\nProduct designer 10/2014 to Current\nCompany Name â€“ City , State\n      The team wants to develop a portable, easily shipped, cost effective hardware that can send and receive digital content directly from\n      satellites.\n      Personally involve with prototype designing and 3D modeling.\n      Cooperating with a startup called Outernet (https://www.outernet.is/en/), a for-profit media company that already has two satellites covering\n      North America, Europe, and the Middle East and has recently started broadcasting free Internet content.\n      Assisting drafters in developing the structural design of products using drafting tools or computer-assisted design (CAD) or drafting\n      equipment and software.\n      Completing project mechanical design while providing technical solutions 
feedback.\nproduct design 09/2014 to Current\nCompany Name â€“ City , State\n      Two engineers and designers to collaborate together to create new innovative wearable pieces for a fashion show competition.\n      Will access new Makerspce, which includes a 3D printer, will be given a $500 budget to create their wearable piece.\nRESEARCH EcoPRT Research Assistant 01/2014 to 05/2014\nCompany Name â€“ City , State\n      The goal is to develop an economical, automated transit system.\n      It will focus on the hands on design and development of a small manned autonomous vehicle.\n      www.ecoprt.com).\n      The key in the design is to understand the impact weight has on the overall cost and performance, and the incorporation of automated\n      control.\n      Aspects of the development will possibly include\nproduct design 01/2014 to 05/2014\nCompany Name â€“ City , State\n      VOLUNTEER The purpose of this project is to design and fabricate a cable management system for a public-access electric\n      EXPERIENCE vehicle charging station.\n      This system will dispense and retract 20 feet of cable for operation and provide secured storage for the cable when not in use.\n      The prototype will be subjected to the following constraints\nTeam member 10/2013 to 04/2014\nCompany Name â€“ City , State\n      Attending scheduled control and mechanical teams' training classes.\n      EXPERIENCE Â· Learned shop safety, vehicle glider equations, drive cycle modeling, and Simulation.\n      Learned the powertrain architecture and components of the 2013 Chevrolet Malibu.\n      Learned vehicle dynamics.\n      And practiced model simulation by using MATLAB Simulink.\n      Mechanical Engineering Components design project (material design.\nmaterial design 10/2013 to 04/2014\n\nCompany Name â€“ City , State\n      Designed fillet welds connections and bolts for the plate girder, which holds the pipe with horizontal and vertical force loads.\n      Calculated the related 
shear or bending stresses for the welds and bolts to determine the right materials and sizes of welds (thickness) and\n      bolts.\nEddy Current DYNO Research Assistant 09/2013 to 05/2014\nCompany Name â€“ City , State\n      Built the engine stander for our engine and Eddy current dynamometer.\n      Currently installing the Eddy current dynamometer with graduate students.\n      Future possibility of experimenting with torque, horsepower, RPM, EGR (Exhaust Gas Recirculation) and temperature measurements of the\n      Kubota Diesel Engine after installation.\n      Possibility of learning the engine tuning.\nResearch Assistant 06/2013 to 08/2013\nCompany Name â€“ City , State\n      Graphed sketches and figures for professor's Thermodynamics eBook.\n      Learned how to use Smartdraw.\n      Performed literature reviews on ongoing research topics and eBook materials.\n      Added video links and real-world images to the eBook.\nProgram Assistant 05/2013 to 06/2013\nCompany Name â€“ City , State\n      Assisting Dr.\n      Eischen, the director of the Hangzhou Engineering Study Abroad Program at Zhejiang University, during his program this coming summer.\n      Helping with tasks such as translating, program activities, running errands, classes, transportation, and culture immersion.\n2323 04/2013 to 10/2013\nCompany Name â€“ City , State\n      Designed Airplane Landing Gear by modeling with a mass-spring-damper SDOF system and designing the spring k and damper C that\n      limits the given amplitude.\n      Part 2\nwew 10/2012 to 04/2013\nCompany Name â€“ City , State\n      Utilized MATLAB for statistical analysis of an elastic band rocket.\n      Learned how to make experimental designs, statistical processes, statistics simulations, and graphical displays of data on computer\n      workstations.\n      Used statistical methods including point and interval estimation of population parameters and curve and surface fitting (regression analysis).\n      Graphic 
Communications Project (3D design.\nrer 10/2012 to 04/2013\nCompany Name â€“ City , State\n      Utilized SolidWorks to design a tape floss container.\n      Developed the ability to use SolidWorks within the context of a concurrent design process to understand how everyday objects are\n      designed and created.\n      Emphasis placed on decision-making processes involving creating geometry and the development of modeling strategies that incorporate the\n      intentions of the designer.\nre 02/2009 to 04/2009\nCompany Name â€“ City , State\n      Visited construction sites with senior engineers.\n      Kept record of site investigations.\n      Dealt with paperwork with senior engineers and answered phone calls.\n      Helped install residential wiring in new construction sites.\n      Investigated electrical problems and developed the ability to read electrical diagrams and wire electrical panels.\nEducation\nMaster of science : Mechanical engineering Robotic & Manufacture Current Columbia University in the City of New York - City , State\n      Sep -2015 Dec Mechanical engineering Robotic & Manufacture\n\n       Coursework in Advanced Mechanical Engineering\n       Coursework in Drafting, Computer-Aided Design (CAD) and Computer-Aided Manufacturing (CAM)\nBachelor of science : Mechanical Engineering 1 2010 North Carolina State University, Raleigh (NCSU) - City , State\nGPA: Magna Cum Laude GPA: 3.5 GPA: 3.63/4.0 Mechanical Engineering Magna Cum Laude GPA: 3.5 GPA: 3.63/4.0\nNorth Carolina State University -\nGPA: Magna Cum Laude Magna Cum Laude\nAccomplishments\n       Listed in the dean's list for three semesters during Junior and Senior Year Â· Chosen to be on the cover of NC State freshman admissions\n       booklet Â· In the process of receiving the Professional Development Certificate Â· NCSU Chinese basketball team player.\n       Math and physics club member Â· Control and Mechanical Team member of NCSU EcoCAR2 Â· Took the global training class at NC\n  
     State University Â· CUSA member (Chinese undergraduate student association).\nSkills\n3D, 3D modeling, AutoCAD, broadcasting, budget, C, cable, Chinese, com, hardware, content, controller, data analysis, Dec, decision-making,\ndesigning, product design, English, fashion, focus, Fortran, frame, Graphic, Lathe, Linux, director, Maple, materials, MATLAB, mechanical,\nMechanical Engineering, access, Mill, modeling, navigation, printer, processes, profit, speaking, Python, Quantitative analysis, reading, read,\nresearch, safety, Simulation, sketching, SolidWorks, statistical analysis, Statistics, phone, translating, transportation, video, Welding, wiring, written\n"

Works Perfectly!

`rank_resume()` Testing

I will intentionally use the skills listed on resume 10751444.pdf as the ideal criteria. If 10751444.pdf ranks 1st with a matching percentage of 1.00, the function is working as intended!

ideal_designer <- "3D modeling, AutoCAD, broadcasting, budget, C, cable, Chinese, com, hardware, content, controller, data analysis, Dec, decision-making,\ndesigning, product design, English, fashion, focus, Fortran, frame, Graphic, Lathe, Linux, director, Maple, materials, MATLAB, mechanical,\nMechanical Engineering, access, Mill, modeling, navigation, printer, processes, profit, speaking, Python, Quantitative analysis, reading, read,\nresearch, safety, Simulation, sketching, SolidWorks, statistical analysis, Statistics, phone, translating, transportation, video, Welding, wiring, written\n"

rank_resume(ideal_designer, designers)

Nice. The rank_resume() function works well!
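The reasoning behind this test can be made concrete with a small sketch. It assumes the matching percentage is the share of unique criteria terms that also appear in a resume (the exact formula inside `rank_resume()` is not shown in this section). Under that assumption, feeding a resume's own skill list back in as the criteria must give 1.00, because every criteria term is by construction also a resume term. The helpers `tokenize_terms()` and `match_score()` are hypothetical and skip stemming and stop-word removal.

```r
# Hypothetical helper: lowercase, split on anything that is not a letter or digit,
# drop empty strings, and keep each term only once.
tokenize_terms <- function(text) {
  terms <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  unique(terms[terms != ""])
}

# Hypothetical scoring rule (an assumption, not rank_resume()'s confirmed formula):
# the fraction of unique criteria terms that also occur in the resume.
match_score <- function(criteria, resume_text) {
  criteria_terms <- tokenize_terms(criteria)
  resume_terms <- tokenize_terms(resume_text)
  mean(criteria_terms %in% resume_terms)
}

resume_text <- "Skills: AutoCAD, MATLAB, Python, SolidWorks, Welding"
match_score("AutoCAD, MATLAB, Python", resume_text)  # 1: every criteria term is found
match_score("AutoCAD, Photoshop", resume_text)       # 0.5: one of two terms is found
```

Any criteria string built only from a resume's own terms is a subset of that resume's terms, which is why this kind of self-match is a useful sanity check.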

`rank_resume()` Testing (2)

This time, I will intentionally use the criteria listed on resume 10748989.pdf. If resume 10748989.pdf ranks 1st with a matching percentage of 1.00, my function is working as intended!

But first, let’s take a look at resume 10748989.pdf!

knitr::include_graphics("10748989.png")

Now let’s rank the resumes. Resume 10748989.pdf should rank 1st with a matching percentage of 1.00!

ideal_designer <- "Building codes knowledge Complex problem solving Strong analytical ability Excellent attention to detail Commercial interior design Working drawings and procedures Space planning methodology Sketching Rendering Digital drafting 3D rendering software Proficient in SketchUp"

rank_resume(ideal_designer, designers)

Perfect. The rank_resume() function works well!
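Instead of checking the ranking table by eye, the expectation "this file should rank 1st with 1.00" can be written as an assertion. The sketch below builds a toy ranking over an in-memory set of resumes. The helper `toy_rank()`, the sample texts, and the column names `File` and `Matching_Percentage` are all hypothetical; they only mirror the idea behind `rank_resume()`, whose real output columns may differ (only `Matching_Words` appears in this section).

```r
# Hypothetical in-memory "folder": file name -> extracted text.
resumes <- c(
  "a.pdf" = "Space planning, sketching, rendering, SketchUp",
  "b.pdf" = "AutoCAD, MATLAB, welding",
  "c.pdf" = "Sketching, Photoshop, typography"
)

# Hypothetical, simplified cleaning: lowercased, de-duplicated terms.
terms_of <- function(text) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  unique(words[words != ""])
}

# Hypothetical ranker: score each resume by the share of criteria terms it
# contains, then sort from best to worst match.
toy_rank <- function(criteria, resumes) {
  wanted <- terms_of(criteria)
  scores <- vapply(resumes, function(txt) mean(wanted %in% terms_of(txt)), numeric(1))
  out <- data.frame(File = names(resumes), Matching_Percentage = unname(scores))
  out <- out[order(out$Matching_Percentage, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

ranking <- toy_rank("sketching rendering sketchup", resumes)
# The self-match test expressed as assertions rather than a visual check.
stopifnot(ranking$File[1] == "a.pdf", ranking$Matching_Percentage[1] == 1)
ranking
```

Encoding the expectation with `stopifnot()` means a regression in the ranking logic fails loudly when the document is re-knitted instead of relying on someone reading the table.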

`matching_criteria_cloud()` Testing

ideal_designer <- "Photoshop, Adobe Illustrator, AutoCAD, SketchUp, InDesign, Graphic Design, UI/UX Design, Typography, Motion Graphics"

rank <- rank_resume(ideal_designer, designers)

rank[1, "Matching_Words"]

## [1] "adob autocad sketchup indesign graphic design"

get_WordCloud(rank[1, "Matching_Words"])

Cool! get_WordCloud() works perfectly!
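The `Matching_Words` value above keeps the criteria terms in their original order and contains no duplicates ("design" and "graphic" each appear once even though the criteria mention them twice). A minimal base-R sketch of that behaviour, assuming the criteria and resume have already been cleaned and stemmed, is `intersect()`, which preserves the order of its first argument and drops repeats. The term vectors below are written by hand for illustration; the resume terms are hypothetical, and whether `rank_resume()` and `get_WordCloud()` work this way internally is an assumption.

```r
# Illustrative cleaned (stemmed) criteria terms, duplicates included.
criteria_terms <- c("photoshop", "adob", "illustr", "autocad", "sketchup",
                    "indesign", "graphic", "design", "design", "typographi",
                    "motion", "graphic")
# Hypothetical cleaned terms from one resume.
resume_terms <- c("adob", "autocad", "sketchup", "indesign", "graphic",
                  "design", "solidwork", "matlab")

# intersect() keeps the order of its first argument and removes duplicates.
matching <- intersect(criteria_terms, resume_terms)
matching_words <- paste(matching, collapse = " ")
matching_words  # "adob autocad sketchup indesign graphic design"

# Word/frequency table of the kind a word-cloud function consumes.
freq_table <- as.data.frame(table(strsplit(matching_words, " ")[[1]]),
                            stringsAsFactors = FALSE)
names(freq_table) <- c("word", "freq")
freq_table
```

Because duplicates are removed before this point, every term has a frequency of 1. If such a table were passed to `wordcloud::wordcloud(freq_table$word, freq_table$freq)`, its `min.freq` argument (default 3) would need to be lowered to 1 for any word to be drawn.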

Shiny Web-App

Overview

This is only a temporary overview: the web app is not yet complete and does not yet work as intended.

knitr::include_graphics("overview.png")

Input Folder Example

knitr::include_graphics("input-folder-example.png")

User Work Flow

The user uploads a folder containing a collection of resumes.
The user enters the desired criteria.
The web app generates a ranking in the form of a table.
The user can click on a resume in the table.
When a resume in the table is clicked, the web app generates a word cloud of the terms in that resume that match the ideal criteria.
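The first step of this workflow depends on turning the uploaded .zip file into a list of PDF files. A minimal base-R sketch could unzip the archive into a temporary folder and keep only `.pdf` files. It assumes the app receives the path of the uploaded archive; in Shiny, `fileInput()` exposes that path as the `datapath` column of the input value. The helpers `find_pdfs()` and `list_resume_pdfs()` are hypothetical, and the text extraction done by `folder_to_table()` in this project is not reproduced here.

```r
# Hypothetical helper: return every PDF found (recursively) under a folder.
# The search is recursive because a zipped folder usually unpacks into a subfolder.
find_pdfs <- function(dir) {
  list.files(dir, pattern = "\\.pdf$", ignore.case = TRUE,
             recursive = TRUE, full.names = TRUE)
}

# Hypothetical helper: unzip an uploaded archive and return the PDF paths inside it.
list_resume_pdfs <- function(zip_path, exdir = tempfile("resumes_")) {
  dir.create(exdir, showWarnings = FALSE)
  utils::unzip(zip_path, exdir = exdir)
  pdfs <- find_pdfs(exdir)
  if (length(pdfs) == 0) stop("The archive does not contain any .pdf files.")
  pdfs
}
```

In a Shiny server function this might be called as `list_resume_pdfs(input$resumes$datapath)`, where `resumes` is a hypothetical `fileInput()` id. Failing early when no PDFs are found gives the user a clear message instead of an empty ranking table.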

Limitations & Notes

The resume input must be a folder compressed into a .zip archive
The folder should contain only files with a .pdf extension
Images on the resume will be ignored
There are no restrictions on the type of text input. However, please note that stop words, such as the Indonesian “yang”, “adalah”, “tidak”, and “saya” and the English “is”, “am”, “no”, and “yes”, will be removed during text processing
There is no limit on the number of input words. However, please note that duplicate words will be removed during text processing
Please note that words will be reduced to their stems (for example, “Analytical” becomes “analyt”).
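The three text-processing notes above (stop-word removal, de-duplication, stemming) can be illustrated with a small sketch. It assumes a tiny hand-picked stop-word list mixing the Indonesian and English examples; the project's real stop-word lists and cleaning steps live in `cleanText()` and may differ. The helper `clean_terms()` is hypothetical, and stemming is delegated to `SnowballC::wordStem()` only when that package is installed, so the sketch still runs without it.

```r
# Hypothetical stop-word list built from the examples in the notes above.
stop_words <- c("yang", "adalah", "tidak", "saya", "is", "am", "no", "yes",
                "and", "the", "in")

clean_terms <- function(text) {
  # Lowercase and split on anything that is not a letter.
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Drop empty strings and stop words.
  words <- words[words != "" & !(words %in% stop_words)]
  # Reduce words to Porter stems when SnowballC is available.
  if (requireNamespace("SnowballC", quietly = TRUE)) {
    words <- SnowballC::wordStem(words, language = "porter")
  }
  # De-duplicate last, so inflected variants that share a stem collapse to one term.
  unique(words)
}

clean_terms("Saya is skilled in AutoCAD and AutoCAD drafting; analytical and analytics")
```

Removing duplicates after stemming (rather than before) is a design choice in this sketch: with SnowballC installed, “analytical” and “analytics” both become “analyt” and are counted once, which keeps a single skill from inflating a resume's score.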

Resume Ranking Tool with NLP

Ardian

September 04th, 2023
