Data Background
The data were a compilation of useful informations from the human resources department. The data were used to analyze the human resources performance, which consists of employees’ name, Date of birth, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, performance score, absences, most recent performance review date, and employee engangement score. Data source was created by Dr. Carla Patalono, a professional in HR and was used in HR Metrics and Analytics course, at New England College of Business.
Business questions that can be explore are: 1. Is there any relationship between employee’s employment status and their performance score? 2. Why do employees with fairly good performance choose to resign? 3. What is the overall diversity profile of the organization? 4. What are our best recruiting sources if we want to ensure an organization with the best human resources? 5. Can we predict who is going to terminate and who isn’t?
Install Library and Packages
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Data Description
- ‘Employee_Name’ = employee’s name
- ‘EmpID’ = employee’s ID
- ‘MarriedID’ = employee’s married status (0 = not married, 1 = married)
- ‘MaritalStatusID’ = employee’s marital status (0 = single, 1 = married, 2 = divorced, 3 = separated, 4 = widowed)
- ’GenderID = sex (0 = female, 1 = male)
- ‘EmpStatusID’ = employee’s status ID
- ‘DeptID’= employee’s department ID
- ‘PerfScoreID’ = employee’s performance score rating
- ‘FromDiversityJobFairID’ = Employee’s recruitment channel
- ‘Salary’= Employee’s salary
- ‘Termd’ = Whether the employee’s work has been terminated or not
- ‘Position’ = Employee’s job position
- ‘State’ = Employee’s work state
- ‘Zip’ = Employee’s country zip code
- ‘DOB’ = Employee’s Date of Birth
- ‘Sex’ = Employee’s sex (M = male, F = Female)
- ‘CitizenDesc’ = employee’s citizen description
- ‘HispanicLatino’ = Whether employee’s race a hispanic latino or no
- ‘RaceDesc’ = Employee’s race description
- ‘DateofHire’ = Date of employee’s contract hire
- ‘DateofTermination’ = Date of employee’s contract termination
- ‘TermReason’ = Reason for employee’s contract termination
- ‘EmploymentStatus’ = Employee’s employment status
- ‘Department’ = Employee’s working department
- ‘ManagerName’ = Employee’s manager name
- ‘ManagerID’ = Employee’s manager ID
- ‘RecruitmentSource’ = Employee’s recruitment channel
- ‘PerformanceScore’ = An assessment or evaluation of an individual’s job performance within an organization
- ‘EngangementSurvey’ = A tool to measure and assess the level of engagement and satisfaction among their employees
- ‘EmpSatisfaction’ = Employee’s satisfaction rating working in the company
- ‘SpecialProjectsCount’ = Number of special projects done by the employee
- ‘LastPerformanceReview_Date’
- ‘DaysLateLast30’
- ‘Absences’ = days of employee’s absent
Data Inspection
## Employee_Name EmpID MarriedID MaritalStatusID GenderID EmpStatusID
## 1 Adinolfi, Wilson K 10026 0 0 1 1
## 2 Ait Sidi, Karthikeyan 10084 1 1 1 5
## 3 Akinkuolie, Sarah 10196 1 1 0 5
## 4 Alagbe,Trina 10088 1 1 0 1
## 5 Anderson, Carol 10069 0 2 0 5
## 6 Anderson, Linda 10002 0 0 0 1
## DeptID PerfScoreID FromDiversityJobFairID Salary Termd PositionID
## 1 5 4 0 62506 0 19
## 2 3 3 0 104437 1 27
## 3 5 3 0 64955 1 20
## 4 5 3 0 64991 0 19
## 5 5 3 0 50825 1 19
## 6 5 4 0 57568 0 19
## Position State Zip DOB Sex MaritalDesc CitizenDesc
## 1 Production Technician I MA 1960 07/10/83 M Single US Citizen
## 2 Sr. DBA MA 2148 05/05/75 M Married US Citizen
## 3 Production Technician II MA 1810 09/19/88 F Married US Citizen
## 4 Production Technician I MA 1886 09/27/88 F Married US Citizen
## 5 Production Technician I MA 2169 09/08/89 F Divorced US Citizen
## 6 Production Technician I MA 1844 05/22/77 F Single US Citizen
## HispanicLatino RaceDesc DateofHire DateofTermination TermReason
## 1 No White 7/5/2011 N/A-StillEmployed
## 2 No White 3/30/2015 6/16/2016 career change
## 3 No White 7/5/2011 9/24/2012 hours
## 4 No White 1/7/2008 N/A-StillEmployed
## 5 No White 7/11/2011 9/6/2016 return to school
## 6 No White 1/9/2012 N/A-StillEmployed
## EmploymentStatus Department ManagerName ManagerID
## 1 Active Production Michael Albert 22
## 2 Voluntarily Terminated IT/IS Simon Roup 4
## 3 Voluntarily Terminated Production Kissy Sullivan 20
## 4 Active Production Elijiah Gray 16
## 5 Voluntarily Terminated Production Webster Butler 39
## 6 Active Production Amy Dunn 11
## RecruitmentSource PerformanceScore EngagementSurvey EmpSatisfaction
## 1 LinkedIn Exceeds 4.60 5
## 2 Indeed Fully Meets 4.96 3
## 3 LinkedIn Fully Meets 3.02 3
## 4 Indeed Fully Meets 4.84 5
## 5 Google Search Fully Meets 5.00 4
## 6 LinkedIn Exceeds 5.00 5
## SpecialProjectsCount LastPerformanceReview_Date DaysLateLast30 Absences
## 1 0 1/17/2019 0 1
## 2 6 2/24/2016 0 17
## 3 0 5/15/2012 0 3
## 4 0 1/3/2019 0 15
## 5 0 2/1/2016 0 2
## 6 0 1/7/2019 0 15
Data Cleansing & Coertions
## Rows: 311
## Columns: 36
## $ Employee_Name <chr> "Adinolfi, Wilson K", "Ait Sidi, Karthikey…
## $ EmpID <int> 10026, 10084, 10196, 10088, 10069, 10002, 1…
## $ MarriedID <int> 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0…
## $ MaritalStatusID <int> 0, 1, 1, 1, 2, 0, 0, 4, 0, 2, 1, 1, 2, 0, 2…
## $ GenderID <int> 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1…
## $ EmpStatusID <int> 1, 5, 5, 1, 5, 1, 1, 1, 3, 1, 5, 5, 1, 1, 5…
## $ DeptID <int> 5, 3, 5, 5, 5, 5, 4, 5, 5, 3, 5, 5, 3, 5, 5…
## $ PerfScoreID <int> 4, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3…
## $ FromDiversityJobFairID <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0…
## $ Salary <int> 62506, 104437, 64955, 64991, 50825, 57568, …
## $ Termd <int> 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1…
## $ PositionID <int> 19, 27, 20, 19, 19, 19, 24, 19, 19, 14, 19,…
## $ Position <chr> "Production Technician I", "Sr. DBA", "Prod…
## $ State <chr> "MA", "MA", "MA", "MA", "MA", "MA", "MA", "…
## $ Zip <int> 1960, 2148, 1810, 1886, 2169, 1844, 2110, 2…
## $ DOB <chr> "07/10/83", "05/05/75", "09/19/88", "09/27/…
## $ Sex <chr> "M ", "M ", "F", "F", "F", "F", "F", "M ", …
## $ MaritalDesc <chr> "Single", "Married", "Married", "Married", …
## $ CitizenDesc <chr> "US Citizen", "US Citizen", "US Citizen", "…
## $ HispanicLatino <chr> "No", "No", "No", "No", "No", "No", "No", "…
## $ RaceDesc <chr> "White", "White", "White", "White", "White"…
## $ DateofHire <chr> "7/5/2011", "3/30/2015", "7/5/2011", "1/7/2…
## $ DateofTermination <chr> "", "6/16/2016", "9/24/2012", "", "9/6/2016…
## $ TermReason <chr> "N/A-StillEmployed", "career change", "hour…
## $ EmploymentStatus <chr> "Active", "Voluntarily Terminated", "Volunt…
## $ Department <chr> "Production ", "IT/IS", "Production …
## $ ManagerName <chr> "Michael Albert", "Simon Roup", "Kissy Sull…
## $ ManagerID <int> 22, 4, 20, 16, 39, 11, 10, 19, 12, 7, 14, 2…
## $ RecruitmentSource <chr> "LinkedIn", "Indeed", "LinkedIn", "Indeed",…
## $ PerformanceScore <chr> "Exceeds", "Fully Meets", "Fully Meets", "F…
## $ EngagementSurvey <dbl> 4.60, 4.96, 3.02, 4.84, 5.00, 5.00, 3.04, 5…
## $ EmpSatisfaction <int> 5, 3, 3, 5, 4, 5, 3, 4, 3, 5, 4, 3, 4, 4, 5…
## $ SpecialProjectsCount <int> 0, 6, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0, 5, 0, 0…
## $ LastPerformanceReview_Date <chr> "1/17/2019", "2/24/2016", "5/15/2012", "1/3…
## $ DaysLateLast30 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ Absences <int> 1, 17, 3, 15, 2, 15, 19, 19, 4, 16, 12, 15,…
Filter the columns information with the same information, and erase the columns that isn’t needed: 1. Employee_Name = EmpID 2. MarriedID = MaritalStatusID = MaritalDesc 3. GenderID = Sex 4. DeptID = Department 5. PositionID = Position 6. State = zip 7. FromDiversityJobFairID = RecruitmentSource 8. EngagementSurvey = EmpSatisfaction 9. HispanicLatino = RaceDesc 10. TermD = EmploymentStatus 11. ManagerName = ManagerID 12. PerfScoreID = PerformanceScore
HR <- select(.data = HR,
- EmpID,
- EmpStatusID,
- Employee_Name,
- MaritalStatusID,
- MaritalDesc,
- Sex,
- DeptID,
- PositionID,
- Zip,
- FromDiversityJobFairID,
- EmpSatisfaction,
- HispanicLatino,
- Termd,
- ManagerID,
- DaysLateLast30,
- PerfScoreID)## MarriedID GenderID Salary Position State DOB CitizenDesc
## 1 0 1 62506 Production Technician I MA 07/10/83 US Citizen
## 2 1 1 104437 Sr. DBA MA 05/05/75 US Citizen
## 3 1 0 64955 Production Technician II MA 09/19/88 US Citizen
## 4 1 0 64991 Production Technician I MA 09/27/88 US Citizen
## 5 0 0 50825 Production Technician I MA 09/08/89 US Citizen
## 6 0 0 57568 Production Technician I MA 05/22/77 US Citizen
## RaceDesc DateofHire DateofTermination TermReason
## 1 White 7/5/2011 N/A-StillEmployed
## 2 White 3/30/2015 6/16/2016 career change
## 3 White 7/5/2011 9/24/2012 hours
## 4 White 1/7/2008 N/A-StillEmployed
## 5 White 7/11/2011 9/6/2016 return to school
## 6 White 1/9/2012 N/A-StillEmployed
## EmploymentStatus Department ManagerName RecruitmentSource
## 1 Active Production Michael Albert LinkedIn
## 2 Voluntarily Terminated IT/IS Simon Roup Indeed
## 3 Voluntarily Terminated Production Kissy Sullivan LinkedIn
## 4 Active Production Elijiah Gray Indeed
## 5 Voluntarily Terminated Production Webster Butler Google Search
## 6 Active Production Amy Dunn LinkedIn
## PerformanceScore EngagementSurvey SpecialProjectsCount
## 1 Exceeds 4.60 0
## 2 Fully Meets 4.96 6
## 3 Fully Meets 3.02 0
## 4 Fully Meets 4.84 0
## 5 Fully Meets 5.00 0
## 6 Exceeds 5.00 0
## LastPerformanceReview_Date Absences
## 1 1/17/2019 1
## 2 2/24/2016 17
## 3 5/15/2012 3
## 4 1/3/2019 15
## 5 2/1/2016 2
## 6 1/7/2019 15
Checking for missing values
## [1] FALSE
Data Manipulation & Transformation
Transforming data type to the correct data type. From the data structure above, the data that needs to be transform are: 1. MarriedID = as.factor 2. GenderID = as.factor 3. DOB = as.date (MM/DD/YY) 4. Date of hire = as.date (MM/DD/YY) 5. Date of termination = as.date (MM/DD/YY) 6. Last Performance Review Date = as.date (MM/DD/YY)
## Rows: 311
## Columns: 20
## $ MarriedID <int> 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0…
## $ GenderID <int> 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1…
## $ Salary <int> 62506, 104437, 64955, 64991, 50825, 57568, …
## $ Position <chr> "Production Technician I", "Sr. DBA", "Prod…
## $ State <chr> "MA", "MA", "MA", "MA", "MA", "MA", "MA", "…
## $ DOB <chr> "07/10/83", "05/05/75", "09/19/88", "09/27/…
## $ CitizenDesc <chr> "US Citizen", "US Citizen", "US Citizen", "…
## $ RaceDesc <chr> "White", "White", "White", "White", "White"…
## $ DateofHire <chr> "7/5/2011", "3/30/2015", "7/5/2011", "1/7/2…
## $ DateofTermination <chr> "", "6/16/2016", "9/24/2012", "", "9/6/2016…
## $ TermReason <chr> "N/A-StillEmployed", "career change", "hour…
## $ EmploymentStatus <chr> "Active", "Voluntarily Terminated", "Volunt…
## $ Department <chr> "Production ", "IT/IS", "Production …
## $ ManagerName <chr> "Michael Albert", "Simon Roup", "Kissy Sull…
## $ RecruitmentSource <chr> "LinkedIn", "Indeed", "LinkedIn", "Indeed",…
## $ PerformanceScore <chr> "Exceeds", "Fully Meets", "Fully Meets", "F…
## $ EngagementSurvey <dbl> 4.60, 4.96, 3.02, 4.84, 5.00, 5.00, 3.04, 5…
## $ SpecialProjectsCount <int> 0, 6, 0, 0, 0, 0, 4, 0, 0, 6, 0, 0, 5, 0, 0…
## $ LastPerformanceReview_Date <chr> "1/17/2019", "2/24/2016", "5/15/2012", "1/3…
## $ Absences <int> 1, 17, 3, 15, 2, 15, 19, 19, 4, 16, 12, 15,…
HR <- mutate(.data = HR,
MarriedID = as.factor(MarriedID),
GenderID = as.factor(GenderID),
DOB = as.Date(DOB, format = "%m%d%y"),
DateofHire = as.Date(DateofHire, format = "%m%d%y"),
DateofTermination = as.Date(DateofTermination, format = "%m%d%y"),
LastPerformanceReview_Date = as.Date(LastPerformanceReview_Date, format = "%m%d%y"))Answering Business Questions
Is there any relationship between employee’s employment status and their performance score?
##
## Exceeds Fully Meets Needs Improvement PIP
## Active 29 162 8 8
## Terminated for Cause 1 8 5 2
## Voluntarily Terminated 7 73 5 3
Answer: Employee’s employment status is consist of active, terminated for cause, and voluntarily terminated. While the performance score consists of a range between 1 to 4. With 1 = exceeds expectation, 2 = fully meets expectation, 3 = need improvement, and 4 = Performance Improvement Plan (PIP). A PIP is a structured program or process that an employer may implement when an employee’s performance is not meeting the expected standards or when there are specific areas that need improvement. PIPs are typically used as a last resort to help employees improve their performance before more serious actions, such as termination, are considered.
Based on the table, people who exceeds expectation and full meets expectation dominately still active for work, and only a small percentage of people are terminated for cause (fired) and voluntarily terminated (resign). Based on the table, the difference between people who need improvement and PIP in the status of active employees, terminated for cause, and voluntarily terminated is not much different. This shows that the company is really trying to ensure that active employees really have good performance. Employees with good performance who resign may be due to personal reasons, that can be furtherly discuss.
Why do employees with fairly good performance choose to resign?
performance <- HR %>%
select(EmploymentStatus, PerformanceScore, TermReason) %>%
filter(PerformanceScore == "Fully Meets",
EmploymentStatus == "Voluntarily Terminated")##
## Another position attendance career change hours
## Voluntarily Terminated 17 1 6 8
##
## maternity leave - did not return medical issues
## Voluntarily Terminated 2 3
##
## military more money performance relocation out of area
## Voluntarily Terminated 3 9 1 5
##
## retiring return to school unhappy
## Voluntarily Terminated 3 5 10
Answer: Based on the data, we found out that employees with fairly good performance choose to resign, mostly because there are other job’s positioning (17), feeling unhappy (10), and there are other jobs with more money (9).
What is the overall diversity profile of the organization?
diversity <- HR %>%
select(EmploymentStatus, RaceDesc, CitizenDesc) %>%
filter(EmploymentStatus == "Active")##
## American Indian or Alaska Native Asian Black or African American
## Active 3 20 51
##
## Hispanic Two or more races White
## Active 1 8 124
##
## Eligible NonCitizen Non-Citizen US Citizen
## Active 7 1 199
Answer: Based on the data, we found out that the diversity profile of the organization is quite diverse. Where the company is dominated with white people (124), followed by black or african american (51), and asian (20). While a small percentage of the employee’s organization are two or more races (8), american indian or alaska native (3), and hispanic (1). Based on the data, we also found out that the diversity profile of citizenship in the organization is dominated by the US citizen (199).
What are our best recruiting sources if we want to ensure an organization with the best human resources?
recruiting <- HR %>%
select(EmploymentStatus, PerformanceScore, RecruitmentSource) %>%
filter(EmploymentStatus == "Active",
PerformanceScore %in% c("Exceeds", "Fully Meets"))##
## CareerBuilder Diversity Job Fair Employee Referral Google Search
## Exceeds 1 4 5 1
## Fully Meets 9 6 21 18
##
## Indeed LinkedIn Other Website
## Exceeds 11 6 0 1
## Fully Meets 49 49 1 9
Answer: Based on the data, we found out that employees who exceeds the organization’s performance expectation are obtained in Indeed (11), followed by LinkedIn (6), Employee referral (5), and Diversity Job Fair (4). While a small percentage are obtained in CareerBuilder, Google Search, and other website. While the employees who fully meets the organization’s performance expectation are obtained in Indeed (49) and LinkedIn (49), followed by employee referral (21), google search (18). While a small percentage are obtained in CareerBuilder, Diversity Job Fair, and other website. Thus, it can be simplify that the best recruiting sources with the best human resources employed are Indeed and LinkedIn.
Can we predict who is going to terminate and who isn’t?
The reasons for termination are dominated by other job’s positioning with more money and employee’s happiness, which can be seen based on the employee’s engagement survey. Based on that, first, we try to see the range of salary where people choose to resign. To simplify to understand the relationship between employee’s salary, happiness, and employment status, the salary range are categorized.
predict_terminate <- HR %>%
select(EmploymentStatus, EngagementSurvey, Salary) %>%
filter(EmploymentStatus == "Voluntarily Terminated")predict_terminate <- predict_terminate %>%
mutate(Salary = case_when(
between(Salary, 45000, 55000) ~ "$45,000 - $55,000",
between(Salary, 55100, 65000) ~ "$55,000 - $65,000",
between(Salary, 65100, 75000) ~ "$65,000 - $75,000",
between(Salary, 75100, 85000) ~ "$75,000 - $85,000",
between(Salary, 85100, 95000) ~ "$85,000 - $95,000",
between(Salary, 95100, 105000) ~ "$95,000 - $105,000",
between(Salary, 105100, 150000) ~ ">$105,000"
))predict_terminate <- predict_terminate %>%
mutate(EngagementSurvey = case_when(
between(EngagementSurvey, 0, 1) ~ "1",
between(EngagementSurvey, 1.1, 2) ~ "2",
between(EngagementSurvey, 2.1, 3) ~ "3",
between(EngagementSurvey, 3.1, 4) ~ "4",
between(EngagementSurvey, 4.1, 5) ~ "5"
))##
## $45,000 - $55,000 $55,000 - $65,000 $65,000 - $75,000 $75,000 - $85,000
## 3 1 2 1 1
## 4 12 6 4 0
## 5 14 22 12 3
##
## $85,000 - $95,000 $95,000 - $105,000 >$105,000
## 3 0 0 0
## 4 0 0 1
## 5 2 2 2
On employees with engagement survey results 5, employees who resigns are dominated with salary range between $55.000-$65.000, followed by employees with salary range between $45.000-$55.000, and $65.000-$75.000. On employees with engagement survey results 4, employees who resigns are dominated with salary range between $45.000-$55.000. On employees with engagement survey results 3, employees who resigns are distributed fairly in salary range between $45.000 to $85.000. Based on above, we can roughly predict that employees who will terminate are employees with small salary range, and engagement surveys varies between 3 to 5.
Data Source
Dr. Carla Patalano. (2020). Human Resources Data Set [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DSV/1572001