Business Problem

IBM has recently faced a record number of employee departures from their company. This is deterimental to profit, productivity, and morale of the company. Worst of all, it is disruptive and makes the work of hiring even more challenging.

We want to predict which employees will leave this year due to voluntary resignation. If we do this successfully, IBM can try to prevent the departure of current employees or attempt to fill their spots before they leave.

Most importantly, we want to figure out the profiles of certain employees in their departure. With this information, the leadership of IBM can identify who is at risk of leaving and effectively motivate them to stay.

Three Step Process

We see three essential steps to performing the analysis we want.

Step 1: Format and clean the data

  • Re-format the data and clean empty variables
  • Perform dimensionality reduction to create richer, more meaningful fields

Step 2: Perform data analysis to predict which employees will leave

  • Use LASSO to further reduce the fields to our final set
  • Create a predictive model to determine which employees will leave

Step 3: Output final data

  • Run the model on the test data to output attrition predictions
  • Describe clusters of employees that may depart

Sample of Data

The challenge with our data is going to be to reduce the 32 variables to our final set. We sourced our data from this Kaggle dataset.

Important to note: Values in this project may be partially fictitious due to legal consequences for HR data leakage.

Data Dictionary

Per the online documentation for the dataset, the definitions are as follows:

Name Description
Attrition Employment status at IBM (Possible Values: Current Employee, Voluntary Resignation, Termination)
Age Current age of employee (Possible Values: 18+)
BusinessTravel How often does the employee travel for business (Possible Values: Non-Travel, Travel_Rarely, Travel_Frequently)
DailyRate How much the employee can earn in a given day (in USD)
Department Current department of employee (Possible Values: Human Resources, Research & Development, Sales)
DistanceFromHome Miles from employee’s home
Education Most recent degree achieved (Possible Values: 1 ‘Below College’ 2 ‘College’ 3 ‘Bachelor’ 4 ‘Master’ 5 ‘Doctor’)
EducationField Field of most recent study (Possible Values: Human Resources, Life Sciences, Marketing, Medical, Technical Degree, Other)
EmployeeNumber Employee ID number
EnvironmentSatisfaction Employee satisfaction with their work environment (Possible Values: 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’)
Gender Gender (Male/Female)
HourlyRate Current hourly rate for job in USD
JobInvolvement Self-rated assessment describing how involved they must be at their job (Possible Values: 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’)
JobLevel Current job level at the organization (out of 5) (Possible Values: 1 - Intern, 2 - Junior, 3 - Mid-Level, 4 - Senior, 5 - Director)
JobRole Employee’s current role at the company (Possible Values: Healthcare Representative, Human Resources, Laboratory Technician, Manager, Manufacturing Director, Research Director, Research Scientist, Sales Executive, Sales Representative)
JobSatisfaction Employee rated satisfaction of job on most recent company survey (Possible Values: 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’)
MaritalStatus Marital status (Possible Values: Single, Married, Divorced)
MonthlyIncome Most recent earned income in USD
MonthlyRate Current monthly rate for job in USD
NumCompaniesWorked Number of companies worked at before current company
OverTime Whether the employee must work overtime for their job (Possible Values: Yes/No)
PercentSalaryHike Percent of salary raised last year
PerformanceRating Performance rating by manager last year (Possible Values: 1 ‘Low’ 2 ‘Good’ 3 ‘Excellent’ 4 ‘Outstanding’)
RelationshipSatisfaction Employee satisfaction in current relationship (Possible Values: 1 ‘Low’ 2 ‘Medium’ 3 ‘High’ 4 ‘Very High’)
StockOptionLevel Current stock option level, out of three (Possible Values: 0 - 3)
TotalWorkingYears Number of years working overall
TrainingTimesLastYear Number of trainings received last year
WorkLifeBalance Employee’s rating of their current work-life balance (Possible Values: 1 ‘Bad’ 2 ‘Good’ 3 ‘Better’ 4 ‘Best’)
YearsAtCompany Number of years at current company
YearsInCurrentRole Number of years at current role at current company
YearsSinceLastPromotion Number of years since last promoted
YearsWithCurrManager Number of years working with current manager