Data Background

The data were a compilation of useful informations from the human resources department. The data were used to analyze the human resources performance, which consists of employees’ name, Date of birth, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, performance score, absences, most recent performance review date, and employee engangement score. Data source was created by Dr. Carla Patalono, a professional in HR and was used in HR Metrics and Analytics course, at New England College of Business.

Business questions that can be explore are: 1. Is there any relationship between employee’s employment status and their performance score? 2. Why do employees with fairly good performance choose to resign? 3. What is the overall diversity profile of the organization? 4. What are our best recruiting sources if we want to ensure an organization with the best human resources? 5. Can we predict who is going to terminate and who isn’t?

Install Library and Packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Data Input

HR <- read.csv("HRDataset_v14.csv")

Data Description

Data Inspection

head (HR)
##              Employee_Name EmpID MarriedID MaritalStatusID GenderID EmpStatusID
## 1      Adinolfi, Wilson  K 10026         0               0        1           1
## 2 Ait Sidi, Karthikeyan    10084         1               1        1           5
## 3        Akinkuolie, Sarah 10196         1               1        0           5
## 4             Alagbe,Trina 10088         1               1        0           1
## 5         Anderson, Carol  10069         0               2        0           5
## 6        Anderson, Linda   10002         0               0        0           1
##   DeptID PerfScoreID FromDiversityJobFairID Salary Termd PositionID
## 1      5           4                      0  62506     0         19
## 2      3           3                      0 104437     1         27
## 3      5           3                      0  64955     1         20
## 4      5           3                      0  64991     0         19
## 5      5           3                      0  50825     1         19
## 6      5           4                      0  57568     0         19
##                   Position State  Zip      DOB Sex MaritalDesc CitizenDesc
## 1  Production Technician I    MA 1960 07/10/83  M       Single  US Citizen
## 2                  Sr. DBA    MA 2148 05/05/75  M      Married  US Citizen
## 3 Production Technician II    MA 1810 09/19/88   F     Married  US Citizen
## 4  Production Technician I    MA 1886 09/27/88   F     Married  US Citizen
## 5  Production Technician I    MA 2169 09/08/89   F    Divorced  US Citizen
## 6  Production Technician I    MA 1844 05/22/77   F      Single  US Citizen
##   HispanicLatino RaceDesc DateofHire DateofTermination        TermReason
## 1             No    White   7/5/2011                   N/A-StillEmployed
## 2             No    White  3/30/2015         6/16/2016     career change
## 3             No    White   7/5/2011         9/24/2012             hours
## 4             No    White   1/7/2008                   N/A-StillEmployed
## 5             No    White  7/11/2011          9/6/2016  return to school
## 6             No    White   1/9/2012                   N/A-StillEmployed
##         EmploymentStatus        Department    ManagerName ManagerID
## 1                 Active Production        Michael Albert        22
## 2 Voluntarily Terminated             IT/IS     Simon Roup         4
## 3 Voluntarily Terminated Production        Kissy Sullivan        20
## 4                 Active Production          Elijiah Gray        16
## 5 Voluntarily Terminated Production        Webster Butler        39
## 6                 Active Production              Amy Dunn        11
##   RecruitmentSource PerformanceScore EngagementSurvey EmpSatisfaction
## 1          LinkedIn          Exceeds             4.60               5
## 2            Indeed      Fully Meets             4.96               3
## 3          LinkedIn      Fully Meets             3.02               3
## 4            Indeed      Fully Meets             4.84               5
## 5     Google Search      Fully Meets             5.00               4
## 6          LinkedIn          Exceeds             5.00               5
##   SpecialProjectsCount LastPerformanceReview_Date DaysLateLast30 Absences
## 1                    0                  1/17/2019              0        1
## 2                    6                  2/24/2016              0       17
## 3                    0                  5/15/2012              0        3
## 4                    0                   1/3/2019              0       15
## 5                    0                   2/1/2016              0        2
## 6                    0                   1/7/2019              0       15

Data Cleansing & Coertions

glimpse(HR)
## Rows: 311
## Columns: 36
## $ Employee_Name              <chr> "Adinolfi, Wilson  K", "Ait Sidi, Karthikey…
## $ EmpID                      <int> 10026, 10084, 10196, 10088, 10069, 10002, 1…
## $ MarriedID                  <int> 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0…
## $ MaritalStatusID            <int> 0, 1, 1, 1, 2, 0, 0, 4, 0, 2, 1, 1, 2, 0, 2…
## $ GenderID                   <int> 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1…
## $ EmpStatusID                <int> 1, 5, 5, 1, 5, 1, 1, 1, 3, 1, 5, 5, 1, 1, 5…
## $ DeptID                     <int> 5, 3, 5, 5, 5, 5, 4, 5, 5, 3, 5, 5, 3, 5, 5…
## $ PerfScoreID                <int> 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3…
## $ FromDiversityJobFairID     <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0…
## $ Salary                     <int> 62506, 104437, 64955, 64991, 50825, 57568, …
## $ Termd                      <int> 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1…
## $ PositionID                 <int> 19, 27, 20, 19, 19, 19, 24, 19, 19, 14, 19,…
## $ Position                   <chr> "Production Technician I", "Sr. DBA", "Prod…
## $ State                      <chr> "MA", "MA", "MA", "MA", "MA", "MA", "MA", "…
## $ Zip                        <int> 1960, 2148, 1810, 1886, 2169, 1844, 2110, 2…
## $ DOB                        <chr> "07/10/83", "05/05/75", "09/19/88", "09/27/…
## $ Sex                        <chr> "M ", "M ", "F", "F", "F", "F", "F", "M ", …
## $ MaritalDesc                <chr> "Single", "Married", "Married", "Married", …
## $ CitizenDesc                <chr> "US Citizen", "US Citizen", "US Citizen", "…
## $ HispanicLatino             <chr> "No", "No", "No", "No", "No", "No", "No", "…
## $ RaceDesc                   <chr> "White", "White", "White", "White", "White"…
## $ DateofHire                 <chr> "7/5/2011", "3/30/2015", "7/5/2011", "1/7/2…
## $ DateofTermination          <chr> "", "6/16/2016", "9/24/2012", "", "9/6/2016…
## $ TermReason                 <chr> "N/A-StillEmployed", "career change", "hour…
## $ EmploymentStatus           <chr> "Active", "Voluntarily Terminated", "Volunt…
## $ Department                 <chr> "Production       ", "IT/IS", "Production  …
## $ ManagerName                <chr> "Michael Albert", "Simon Roup", "Kissy Sull…
## $ ManagerID                  <int> 22, 4, 20, 16, 39, 11, 10, 19, 12, 7, 14, 2…
## $ RecruitmentSource          <chr> "LinkedIn", "Indeed", "LinkedIn", "Indeed",…
## $ PerformanceScore           <chr> "Exceeds", "Fully Meets", "Fully Meets", "F…
## $ EngagementSurvey           <dbl> 4.60, 4.96, 3.02, 4.84, 5.00, 5.00, 3.04, 5…
## $ EmpSatisfaction            <int> 5, 3, 3, 5, 4, 5, 3, 4, 3, 5, 4, 3, 4, 4, 5…
## $ SpecialProjectsCount       <int> 0, 6, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0, 5, 0, 0…
## $ LastPerformanceReview_Date <chr> "1/17/2019", "2/24/2016", "5/15/2012", "1/3…
## $ DaysLateLast30             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Absences                   <int> 1, 17, 3, 15, 2, 15, 19, 19, 4, 16, 12, 15,…

Filter the columns information with the same information, and erase the columns that isn’t needed: 1. Employee_Name = EmpID 2. MarriedID = MaritalStatusID = MaritalDesc 3. GenderID = Sex 4. DeptID = Department 5. PositionID = Position 6. State = zip 7. FromDiversityJobFairID = RecruitmentSource 8. EngagementSurvey = EmpSatisfaction 9. HispanicLatino = RaceDesc 10. TermD = EmploymentStatus 11. ManagerName = ManagerID 12. PerfScoreID = PerformanceScore

HR <- select(.data = HR,
       - EmpID, 
       - EmpStatusID,
       - Employee_Name,
       - MaritalStatusID, 
       - MaritalDesc,
       - Sex,
       - DeptID,
       - PositionID,
       - Zip,
       - FromDiversityJobFairID,
       - EmpSatisfaction,
       - HispanicLatino,
       - Termd,
       - ManagerID,
       - DaysLateLast30,
       - PerfScoreID)
head (HR)
##   MarriedID GenderID Salary                 Position State      DOB CitizenDesc
## 1         0        1  62506  Production Technician I    MA 07/10/83  US Citizen
## 2         1        1 104437                  Sr. DBA    MA 05/05/75  US Citizen
## 3         1        0  64955 Production Technician II    MA 09/19/88  US Citizen
## 4         1        0  64991  Production Technician I    MA 09/27/88  US Citizen
## 5         0        0  50825  Production Technician I    MA 09/08/89  US Citizen
## 6         0        0  57568  Production Technician I    MA 05/22/77  US Citizen
##   RaceDesc DateofHire DateofTermination        TermReason
## 1    White   7/5/2011                   N/A-StillEmployed
## 2    White  3/30/2015         6/16/2016     career change
## 3    White   7/5/2011         9/24/2012             hours
## 4    White   1/7/2008                   N/A-StillEmployed
## 5    White  7/11/2011          9/6/2016  return to school
## 6    White   1/9/2012                   N/A-StillEmployed
##         EmploymentStatus        Department    ManagerName RecruitmentSource
## 1                 Active Production        Michael Albert          LinkedIn
## 2 Voluntarily Terminated             IT/IS     Simon Roup            Indeed
## 3 Voluntarily Terminated Production        Kissy Sullivan          LinkedIn
## 4                 Active Production          Elijiah Gray            Indeed
## 5 Voluntarily Terminated Production        Webster Butler     Google Search
## 6                 Active Production              Amy Dunn          LinkedIn
##   PerformanceScore EngagementSurvey SpecialProjectsCount
## 1          Exceeds             4.60                    0
## 2      Fully Meets             4.96                    6
## 3      Fully Meets             3.02                    0
## 4      Fully Meets             4.84                    0
## 5      Fully Meets             5.00                    0
## 6          Exceeds             5.00                    0
##   LastPerformanceReview_Date Absences
## 1                  1/17/2019        1
## 2                  2/24/2016       17
## 3                  5/15/2012        3
## 4                   1/3/2019       15
## 5                   2/1/2016        2
## 6                   1/7/2019       15

Checking for missing values

anyNA (HR)
## [1] FALSE

Data Manipulation & Transformation

Transforming data type to the correct data type. From the data structure above, the data that needs to be transform are: 1. MarriedID = as.factor 2. GenderID = as.factor 3. DOB = as.date (MM/DD/YY) 4. Date of hire = as.date (MM/DD/YY) 5. Date of termination = as.date (MM/DD/YY) 6. Last Performance Review Date = as.date (MM/DD/YY)

glimpse(HR)
## Rows: 311
## Columns: 20
## $ MarriedID                  <int> 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0…
## $ GenderID                   <int> 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1…
## $ Salary                     <int> 62506, 104437, 64955, 64991, 50825, 57568, …
## $ Position                   <chr> "Production Technician I", "Sr. DBA", "Prod…
## $ State                      <chr> "MA", "MA", "MA", "MA", "MA", "MA", "MA", "…
## $ DOB                        <chr> "07/10/83", "05/05/75", "09/19/88", "09/27/…
## $ CitizenDesc                <chr> "US Citizen", "US Citizen", "US Citizen", "…
## $ RaceDesc                   <chr> "White", "White", "White", "White", "White"…
## $ DateofHire                 <chr> "7/5/2011", "3/30/2015", "7/5/2011", "1/7/2…
## $ DateofTermination          <chr> "", "6/16/2016", "9/24/2012", "", "9/6/2016…
## $ TermReason                 <chr> "N/A-StillEmployed", "career change", "hour…
## $ EmploymentStatus           <chr> "Active", "Voluntarily Terminated", "Volunt…
## $ Department                 <chr> "Production       ", "IT/IS", "Production  …
## $ ManagerName                <chr> "Michael Albert", "Simon Roup", "Kissy Sull…
## $ RecruitmentSource          <chr> "LinkedIn", "Indeed", "LinkedIn", "Indeed",…
## $ PerformanceScore           <chr> "Exceeds", "Fully Meets", "Fully Meets", "F…
## $ EngagementSurvey           <dbl> 4.60, 4.96, 3.02, 4.84, 5.00, 5.00, 3.04, 5…
## $ SpecialProjectsCount       <int> 0, 6, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0, 5, 0, 0…
## $ LastPerformanceReview_Date <chr> "1/17/2019", "2/24/2016", "5/15/2012", "1/3…
## $ Absences                   <int> 1, 17, 3, 15, 2, 15, 19, 19, 4, 16, 12, 15,…
HR <- mutate(.data = HR,
       MarriedID = as.factor(MarriedID),
       GenderID = as.factor(GenderID),
       DOB = as.Date(DOB, format = "%m%d%y"),
       DateofHire = as.Date(DateofHire, format = "%m%d%y"),
       DateofTermination = as.Date(DateofTermination, format = "%m%d%y"),
       LastPerformanceReview_Date = as.Date(LastPerformanceReview_Date, format = "%m%d%y"))

Answering Business Questions

Is there any relationship between employee’s employment status and their performance score?

table(HR$EmploymentStatus, HR$PerformanceScore)
##                         
##                          Exceeds Fully Meets Needs Improvement PIP
##   Active                      29         162                 8   8
##   Terminated for Cause         1           8                 5   2
##   Voluntarily Terminated       7          73                 5   3

Answer: Employee’s employment status is consist of active, terminated for cause, and voluntarily terminated. While the performance score consists of a range between 1 to 4. With 1 = exceeds expectation, 2 = fully meets expectation, 3 = need improvement, and 4 = Performance Improvement Plan (PIP). A PIP is a structured program or process that an employer may implement when an employee’s performance is not meeting the expected standards or when there are specific areas that need improvement. PIPs are typically used as a last resort to help employees improve their performance before more serious actions, such as termination, are considered.

Based on the table, people who exceeds expectation and full meets expectation dominately still active for work, and only a small percentage of people are terminated for cause (fired) and voluntarily terminated (resign). Based on the table, the difference between people who need improvement and PIP in the status of active employees, terminated for cause, and voluntarily terminated is not much different. This shows that the company is really trying to ensure that active employees really have good performance. Employees with good performance who resign may be due to personal reasons, that can be furtherly discuss.

Why do employees with fairly good performance choose to resign?

performance <- HR %>%
  select(EmploymentStatus, PerformanceScore, TermReason) %>%
  filter(PerformanceScore == "Fully Meets", 
         EmploymentStatus == "Voluntarily Terminated")
table(performance$EmploymentStatus, performance$TermReason)
##                         
##                          Another position attendance career change hours
##   Voluntarily Terminated               17          1             6     8
##                         
##                          maternity leave - did not return medical issues
##   Voluntarily Terminated                                2              3
##                         
##                          military more money performance relocation out of area
##   Voluntarily Terminated        3          9           1                      5
##                         
##                          retiring return to school unhappy
##   Voluntarily Terminated        3                5      10

Answer: Based on the data, we found out that employees with fairly good performance choose to resign, mostly because there are other job’s positioning (17), feeling unhappy (10), and there are other jobs with more money (9).

What is the overall diversity profile of the organization?

diversity <- HR %>%
  select(EmploymentStatus, RaceDesc, CitizenDesc) %>%
  filter(EmploymentStatus == "Active")
table(diversity$EmploymentStatus, diversity$RaceDesc)
##         
##          American Indian or Alaska Native Asian Black or African American
##   Active                                3    20                        51
##         
##          Hispanic Two or more races White
##   Active        1                 8   124
table(diversity$EmploymentStatus, diversity$CitizenDesc)
##         
##          Eligible NonCitizen Non-Citizen US Citizen
##   Active                   7           1        199

Answer: Based on the data, we found out that the diversity profile of the organization is quite diverse. Where the company is dominated with white people (124), followed by black or african american (51), and asian (20). While a small percentage of the employee’s organization are two or more races (8), american indian or alaska native (3), and hispanic (1). Based on the data, we also found out that the diversity profile of citizenship in the organization is dominated by the US citizen (199).

What are our best recruiting sources if we want to ensure an organization with the best human resources?

recruiting <- HR %>%
  select(EmploymentStatus, PerformanceScore, RecruitmentSource) %>%
  filter(EmploymentStatus == "Active",
         PerformanceScore %in% c("Exceeds", "Fully Meets"))
table(recruiting$PerformanceScore, recruiting$RecruitmentSource)
##              
##               CareerBuilder Diversity Job Fair Employee Referral Google Search
##   Exceeds                 1                  4                 5             1
##   Fully Meets             9                  6                21            18
##              
##               Indeed LinkedIn Other Website
##   Exceeds         11        6     0       1
##   Fully Meets     49       49     1       9

Answer: Based on the data, we found out that employees who exceeds the organization’s performance expectation are obtained in Indeed (11), followed by LinkedIn (6), Employee referral (5), and Diversity Job Fair (4). While a small percentage are obtained in CareerBuilder, Google Search, and other website. While the employees who fully meets the organization’s performance expectation are obtained in Indeed (49) and LinkedIn (49), followed by employee referral (21), google search (18). While a small percentage are obtained in CareerBuilder, Diversity Job Fair, and other website. Thus, it can be simplify that the best recruiting sources with the best human resources employed are Indeed and LinkedIn.

Can we predict who is going to terminate and who isn’t?

The reasons for termination are dominated by other job’s positioning with more money and employee’s happiness, which can be seen based on the employee’s engagement survey. Based on that, first, we try to see the range of salary where people choose to resign. To simplify to understand the relationship between employee’s salary, happiness, and employment status, the salary range are categorized.

predict_terminate <- HR %>%
  select(EmploymentStatus, EngagementSurvey, Salary) %>%
  filter(EmploymentStatus == "Voluntarily Terminated")
predict_terminate <- predict_terminate %>%
  mutate(Salary = case_when(
    between(Salary, 45000, 55000) ~ "$45,000 - $55,000",
    between(Salary, 55100, 65000) ~ "$55,000 - $65,000",
    between(Salary, 65100, 75000) ~ "$65,000 - $75,000",
    between(Salary, 75100, 85000) ~ "$75,000 - $85,000",
    between(Salary, 85100, 95000) ~ "$85,000 - $95,000",
    between(Salary, 95100, 105000) ~ "$95,000 - $105,000",
    between(Salary, 105100, 150000) ~ ">$105,000"
  ))
predict_terminate <- predict_terminate %>%
  mutate(EngagementSurvey = case_when(
    between(EngagementSurvey, 0, 1) ~ "1",
    between(EngagementSurvey, 1.1, 2) ~ "2",
    between(EngagementSurvey, 2.1, 3) ~ "3",
    between(EngagementSurvey, 3.1, 4) ~ "4",
    between(EngagementSurvey, 4.1, 5) ~ "5"
  ))
table(predict_terminate$EngagementSurvey, predict_terminate$Salary)
##    
##     $45,000 - $55,000 $55,000 - $65,000 $65,000 - $75,000 $75,000 - $85,000
##   3                 1                 2                 1                 1
##   4                12                 6                 4                 0
##   5                14                22                12                 3
##    
##     $85,000 - $95,000 $95,000 - $105,000 >$105,000
##   3                 0                  0         0
##   4                 0                  0         1
##   5                 2                  2         2

On employees with engagement survey results 5, employees who resigns are dominated with salary range between $55.000-$65.000, followed by employees with salary range between $45.000-$55.000, and $65.000-$75.000. On employees with engagement survey results 4, employees who resigns are dominated with salary range between $45.000-$55.000. On employees with engagement survey results 3, employees who resigns are distributed fairly in salary range between $45.000 to $85.000. Based on above, we can roughly predict that employees who will terminate are employees with small salary range, and engagement surveys varies between 3 to 5.

Data Source

Dr. Carla Patalano. (2020). Human Resources Data Set [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/1572001