Employee Attrition Analysis
Introduction
Most of the work we do in the field of people analytics is oriented to helping organizations understand what is most important to their employees, with the goal of making improvements to increase employee engagement and productivity, and reduce unwanted attrition.
Attrition in business can mean the reduction in staff and employees in a company through normal means, such as retirement and resignation, or the loss of customers or clients to old age or to growing out of the company’s target demographic. Changes in management style, company structure, or other aspects of the company might cause employees to leave voluntarily, resulting in a higher attrition rate. Another possible cause of attrition is a company eliminating a job completely. Turnover rates differ across industries, with hospitality and retail having higher rates than others. A high turnover rate can be costly: when you invest in recruiting and training employees and they stay only for a short period of time, you do not get a return on that investment. Customer attrition likewise generally has a negative effect on the company’s profits and growth. In this paper, we investigate how general parameters such as Education, Department, Monthly Income, OverTime and others relate to employee attrition.
Read and Inspect the Data
In this case I use the attrition data from the Practical Statistics course material.
# Read the data
attrition <- read.csv("data_input/attrition.csv")
# View the data
head(attrition)
EmployeeNumber Age Attrition BusinessTravel Department
1 1 41 Yes Travel_Rarely Sales
2 2 49 No Travel_Frequently Research & Development
3 4 37 Yes Travel_Rarely Research & Development
4 5 33 No Travel_Frequently Research & Development
5 7 27 No Travel_Rarely Research & Development
6 8 32 No Travel_Frequently Research & Development
DistanceFromHome Education EducationField EnvironmentSatisfaction Gender
1 1 College Life Sciences 2 Female
2 8 Below College Life Sciences 3 Male
3 2 College Other 4 Male
4 3 Master Life Sciences 4 Female
5 2 Below College Medical 1 Male
6 2 College Life Sciences 4 Male
HourlyRate JobInvolvement JobLevel JobRole JobSatisfaction
1 94 3 2 Sales Executive 4
2 61 2 2 Research Scientist 2
3 92 2 1 Laboratory Technician 3
4 56 3 1 Research Scientist 3
5 40 3 1 Laboratory Technician 2
6 79 3 1 Laboratory Technician 4
MaritalStatus MonthlyIncome NumCompaniesWorked OverTime PercentSalaryHike
1 Single 5993 8 Yes 11
2 Married 5130 1 No 23
3 Single 2090 6 Yes 15
4 Married 2909 1 Yes 11
5 Married 3468 9 No 12
6 Single 3068 0 No 13
PerformanceRating RelationshipSatisfaction StockOptionLevel TotalWorkingYears
1 3 1 0 8
2 4 4 1 10
3 3 2 0 7
4 3 3 0 8
5 3 4 1 6
6 3 3 0 8
TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole
1 0 1 6 4
2 3 3 10 7
3 3 3 0 0
4 3 3 8 7
5 3 3 2 2
6 2 2 7 7
YearsSinceLastPromotion YearsWithCurrManager
1 0 5
2 1 7
3 0 0
4 3 0
5 2 2
6 3 6
# Investigate the dataset
str(attrition)
'data.frame': 1470 obs. of 30 variables:
$ EmployeeNumber : int 1 2 4 5 7 8 10 11 12 13 ...
$ Age : int 41 49 37 33 27 32 59 30 38 36 ...
$ Attrition : chr "Yes" "No" "Yes" "No" ...
$ BusinessTravel : chr "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" "Travel_Frequently" ...
$ Department : chr "Sales" "Research & Development" "Research & Development" "Research & Development" ...
$ DistanceFromHome : int 1 8 2 3 2 2 3 24 23 27 ...
$ Education : chr "College" "Below College" "College" "Master" ...
$ EducationField : chr "Life Sciences" "Life Sciences" "Other" "Life Sciences" ...
$ EnvironmentSatisfaction : int 2 3 4 4 1 4 3 4 4 3 ...
$ Gender : chr "Female" "Male" "Male" "Female" ...
$ HourlyRate : int 94 61 92 56 40 79 81 67 44 94 ...
$ JobInvolvement : int 3 2 2 3 3 3 4 3 2 3 ...
$ JobLevel : int 2 2 1 1 1 1 1 1 3 2 ...
$ JobRole : chr "Sales Executive" "Research Scientist" "Laboratory Technician" "Research Scientist" ...
$ JobSatisfaction : int 4 2 3 3 2 4 1 3 3 3 ...
$ MaritalStatus : chr "Single" "Married" "Single" "Married" ...
$ MonthlyIncome : int 5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
$ NumCompaniesWorked : int 8 1 6 1 9 0 4 1 0 6 ...
$ OverTime : chr "Yes" "No" "Yes" "Yes" ...
$ PercentSalaryHike : int 11 23 15 11 12 13 20 22 21 13 ...
$ PerformanceRating : int 3 4 3 3 3 3 4 4 4 3 ...
$ RelationshipSatisfaction: int 1 4 2 3 4 3 1 2 2 2 ...
$ StockOptionLevel : int 0 1 0 0 1 0 3 1 0 2 ...
$ TotalWorkingYears : int 8 10 7 8 6 8 12 1 10 17 ...
$ TrainingTimesLastYear : int 0 3 3 3 3 2 3 2 2 3 ...
$ WorkLifeBalance : int 1 3 3 3 3 2 2 3 3 2 ...
$ YearsAtCompany : int 6 10 0 8 2 7 1 1 9 7 ...
$ YearsInCurrentRole : int 4 7 0 7 2 7 0 0 7 7 ...
$ YearsSinceLastPromotion : int 0 1 0 3 2 3 0 0 1 7 ...
$ YearsWithCurrManager : int 5 7 0 0 2 6 0 0 8 7 ...
# Check the number of rows and columns in the dataset
dim(attrition)
[1] 1470 30
From our inspection we can conclude:
- The attrition data contain 1470 rows and 30 columns
- The column names are listed in the str() output above
Some columns do not have the correct data type: all character columns should be converted to categories (factors), namely:
- Attrition
- BusinessTravel
- Department
- Education
- EducationField
- Gender
- JobRole
- MaritalStatus
- OverTime
We could have done this conversion directly when reading the data in the previous step, but here I wanted to show how to inspect the data first.
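As a minimal sketch of that read-time alternative: `read.csv()` accepts `stringsAsFactors = TRUE`, which converts every character column to a factor while reading (since R 4.0.0 the default is `FALSE`, which is why the columns above arrive as `chr`). The three-row inline sample below is only a stand-in for `data_input/attrition.csv`, trimmed to four columns for illustration.

```r
# Inline CSV text standing in for data_input/attrition.csv (illustration only)
csv_text <- "EmployeeNumber,Age,Attrition,Department
1,41,Yes,Sales
2,49,No,Research & Development
4,37,Yes,Research & Development"

# stringsAsFactors = TRUE turns character columns into factors at read time
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(toy)
```

With the real file, the equivalent call would be `read.csv("data_input/attrition.csv", stringsAsFactors = TRUE)`.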
# Convert the character columns into categories (factors)
cat_cols <- c("Attrition","BusinessTravel","Department","Education","EducationField","Gender","JobRole","MaritalStatus","OverTime")
attrition[, cat_cols] <- lapply(attrition[, cat_cols], as.factor)
# Check the data again
str(attrition)
'data.frame': 1470 obs. of 30 variables:
$ EmployeeNumber : int 1 2 4 5 7 8 10 11 12 13 ...
$ Age : int 41 49 37 33 27 32 59 30 38 36 ...
$ Attrition : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
$ BusinessTravel : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3 ...
$ Department : Factor w/ 3 levels "Human Resources",..: 3 2 2 2 2 2 2 2 2 2 ...
$ DistanceFromHome : int 1 8 2 3 2 2 3 24 23 27 ...
$ Education : Factor w/ 5 levels "Bachelor","Below College",..: 3 2 3 5 2 3 1 2 1 1 ...
$ EducationField : Factor w/ 6 levels "Human Resources",..: 2 2 5 2 4 2 4 2 2 4 ...
$ EnvironmentSatisfaction : int 2 3 4 4 1 4 3 4 4 3 ...
$ Gender : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
$ HourlyRate : int 94 61 92 56 40 79 81 67 44 94 ...
$ JobInvolvement : int 3 2 2 3 3 3 4 3 2 3 ...
$ JobLevel : int 2 2 1 1 1 1 1 1 3 2 ...
$ JobRole : Factor w/ 9 levels "Healthcare Representative",..: 8 7 3 7 3 3 3 3 5 1 ...
$ JobSatisfaction : int 4 2 3 3 2 4 1 3 3 3 ...
$ MaritalStatus : Factor w/ 3 levels "Divorced","Married",..: 3 2 3 2 2 3 2 1 3 2 ...
$ MonthlyIncome : int 5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
$ NumCompaniesWorked : int 8 1 6 1 9 0 4 1 0 6 ...
$ OverTime : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
$ PercentSalaryHike : int 11 23 15 11 12 13 20 22 21 13 ...
$ PerformanceRating : int 3 4 3 3 3 3 4 4 4 3 ...
$ RelationshipSatisfaction: int 1 4 2 3 4 3 1 2 2 2 ...
$ StockOptionLevel : int 0 1 0 0 1 0 3 1 0 2 ...
$ TotalWorkingYears : int 8 10 7 8 6 8 12 1 10 17 ...
$ TrainingTimesLastYear : int 0 3 3 3 3 2 3 2 2 3 ...
$ WorkLifeBalance : int 1 3 3 3 3 2 2 3 3 2 ...
$ YearsAtCompany : int 6 10 0 8 2 7 1 1 9 7 ...
$ YearsInCurrentRole : int 4 7 0 7 2 7 0 0 7 7 ...
$ YearsSinceLastPromotion : int 0 1 0 3 2 3 0 0 1 7 ...
$ YearsWithCurrManager : int 5 7 0 0 2 6 0 0 8 7 ...
The second str() output shows that the character columns are now factors. Next, we check whether the dataset contains any missing values.
# Check missing values in each column
colSums(is.na(attrition))
EmployeeNumber Age Attrition
0 0 0
BusinessTravel Department DistanceFromHome
0 0 0
Education EducationField EnvironmentSatisfaction
0 0 0
Gender HourlyRate JobInvolvement
0 0 0
JobLevel JobRole JobSatisfaction
0 0 0
MaritalStatus MonthlyIncome NumCompaniesWorked
0 0 0
OverTime PercentSalaryHike PerformanceRating
0 0 0
RelationshipSatisfaction StockOptionLevel TotalWorkingYears
0 0 0
TrainingTimesLastYear WorkLifeBalance YearsAtCompany
0 0 0
YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
0 0 0
# Check for missing values across the whole dataset; returns a single TRUE or FALSE
anyNA(attrition)
[1] FALSE
The output shows there are no missing values, which is good.
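Because the attrition data has no missing values, the checks above only ever show zeros. The sketch below runs the same two checks on a small made-up data frame with two deliberately missing cells, to show what a non-clean result looks like. The `toy` frame and its values are assumptions for illustration, not part of the dataset.

```r
# Toy data frame with two deliberately missing values (illustration only)
toy <- data.frame(
  Age = c(41, NA, 37),
  MonthlyIncome = c(5993, 5130, NA),
  OverTime = c("Yes", "No", "Yes")
)

# Number of missing values per column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# TRUE as soon as any cell in the data frame is missing
print(anyNA(toy))

# Number of rows that are complete in every column
print(sum(complete.cases(toy)))
```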
The Who, When & Why Of Employee Turnover
In this step we process and analyze the data.
summary(attrition)
EmployeeNumber Age Attrition BusinessTravel
Min. : 1.0 Min. :18.00 No :1233 Non-Travel : 150
1st Qu.: 491.2 1st Qu.:30.00 Yes: 237 Travel_Frequently: 277
Median :1020.5 Median :36.00 Travel_Rarely :1043
Mean :1024.9 Mean :36.92
3rd Qu.:1555.8 3rd Qu.:43.00
Max. :2068.0 Max. :60.00
Department DistanceFromHome Education
Human Resources : 63 Min. : 1.000 Bachelor :572
Research & Development:961 1st Qu.: 2.000 Below College:170
Sales :446 Median : 7.000 College :282
Mean : 9.193 Doctor : 48
3rd Qu.:14.000 Master :398
Max. :29.000
EducationField EnvironmentSatisfaction Gender HourlyRate
Human Resources : 27 Min. :1.000 Female:588 Min. : 30.00
Life Sciences :606 1st Qu.:2.000 Male :882 1st Qu.: 48.00
Marketing :159 Median :3.000 Median : 66.00
Medical :464 Mean :2.722 Mean : 65.89
Other : 82 3rd Qu.:4.000 3rd Qu.: 83.75
Technical Degree:132 Max. :4.000 Max. :100.00
JobInvolvement JobLevel JobRole JobSatisfaction
Min. :1.00 Min. :1.000 Sales Executive :326 Min. :1.000
1st Qu.:2.00 1st Qu.:1.000 Research Scientist :292 1st Qu.:2.000
Median :3.00 Median :2.000 Laboratory Technician :259 Median :3.000
Mean :2.73 Mean :2.064 Manufacturing Director :145 Mean :2.729
3rd Qu.:3.00 3rd Qu.:3.000 Healthcare Representative:131 3rd Qu.:4.000
Max. :4.00 Max. :5.000 Manager :102 Max. :4.000
(Other) :215
MaritalStatus MonthlyIncome NumCompaniesWorked OverTime PercentSalaryHike
Divorced:327 Min. : 1009 Min. :0.000 No :1054 Min. :11.00
Married :673 1st Qu.: 2911 1st Qu.:1.000 Yes: 416 1st Qu.:12.00
Single :470 Median : 4919 Median :2.000 Median :14.00
Mean : 6503 Mean :2.693 Mean :15.21
3rd Qu.: 8379 3rd Qu.:4.000 3rd Qu.:18.00
Max. :19999 Max. :9.000 Max. :25.00
PerformanceRating RelationshipSatisfaction StockOptionLevel TotalWorkingYears
Min. :3.000 Min. :1.000 Min. :0.0000 Min. : 0.00
1st Qu.:3.000 1st Qu.:2.000 1st Qu.:0.0000 1st Qu.: 6.00
Median :3.000 Median :3.000 Median :1.0000 Median :10.00
Mean :3.154 Mean :2.712 Mean :0.7939 Mean :11.28
3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:1.0000 3rd Qu.:15.00
Max. :4.000 Max. :4.000 Max. :3.0000 Max. :40.00
TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole
Min. :0.000 Min. :1.000 Min. : 0.000 Min. : 0.000
1st Qu.:2.000 1st Qu.:2.000 1st Qu.: 3.000 1st Qu.: 2.000
Median :3.000 Median :3.000 Median : 5.000 Median : 3.000
Mean :2.799 Mean :2.761 Mean : 7.008 Mean : 4.229
3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.: 9.000 3rd Qu.: 7.000
Max. :6.000 Max. :4.000 Max. :40.000 Max. :18.000
YearsSinceLastPromotion YearsWithCurrManager
Min. : 0.000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 2.000
Median : 1.000 Median : 3.000
Mean : 2.188 Mean : 4.123
3rd Qu.: 3.000 3rd Qu.: 7.000
Max. :15.000 Max. :17.000
The dataset has:
- 1470 employee observations and 30 features
- A mean of about 11 total working years per employee (median 10)
- A mean of about 7 years at the company
- A mean of about 4 years with the current manager
- Similar average environment satisfaction and job satisfaction, both around 2.7
- Job levels on a scale of 1 to 5, where a higher level means more challenging job responsibilities
- A mean monthly income of 6503 USD
It is important to see which variables contribute the most to attrition. Before that, we need to know whether the variables are correlated with each other. There are many continuous variables, so we could look at their distributions and create a grid of pair plots, but with this many variables that would be too much.
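One lighter alternative to a full pair-plot grid is a correlation matrix over the numeric columns only. The sketch below is an assumption about how that could be done, not a step from the original analysis. It uses the six rows printed by head() earlier, with `Department` shortened, so the resulting coefficients are illustrative and say nothing about the full dataset.

```r
# First six rows of three numeric columns, as printed by head() above,
# plus one non-numeric column to show that it gets excluded
toy <- data.frame(
  Age               = c(41, 49, 37, 33, 27, 32),
  TotalWorkingYears = c(8, 10, 7, 8, 6, 8),
  YearsAtCompany    = c(6, 10, 0, 8, 2, 7),
  Department        = c("Sales", "R&D", "R&D", "R&D", "R&D", "R&D")
)

# Keep only the numeric columns, then compute pairwise Pearson correlations
num_cols <- sapply(toy, is.numeric)
cor_matrix <- round(cor(toy[, num_cols]), 2)
print(cor_matrix)
```

On the real data the same pattern would be `cor(attrition[, sapply(attrition, is.numeric)])`.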
Describe Each Column
Attrition
# Let's cross-check the attrition count and percentage
table(attrition$Attrition)
No Yes
1233 237
round((prop.table(table(attrition$Attrition)))*100,2)
No Yes
83.88 16.12
plot(attrition$Attrition, main="Attrition Rate")
Attrition is the target variable, with the values "Yes" or "No". In the data there are 237 "Yes" and 1233 "No" values, so about 16% of employees left while 84% stayed. Employees who stay with the company far outnumber those who leave.
Gender & Age
# Check the gender and age distribution of employee
table(attrition$Gender)
Female Male
588 882
# check age of the employee
hist(attrition$Age,main="Distribution of Age",xlab="Age", ylab="Count")
Most employees are male (882 of 1470), and the average age is about 37 years
Business Travel
with(attrition, table(BusinessTravel))
BusinessTravel
Non-Travel Travel_Frequently Travel_Rarely
150 277 1043
Most of the employees rarely travel
Department
with(attrition, table(Department))
Department
Human Resources Research & Development Sales
63 961 446
Most employees are in the Research & Development department
Distance from Home
hist(attrition$DistanceFromHome, main="Distance from Home Distribution", xlab="Distance from Home",ylab="Count")
Most of the employees live near the office
Education Level
with(attrition, table(Education))
Education
Bachelor Below College College Doctor Master
572 170 282 48 398
The largest group of employees (572 of 1470) holds a bachelor’s degree
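Note that `as.factor()` orders the Education levels alphabetically, so "Bachelor" comes before "Below College" in every table. A minimal sketch of re-declaring the levels from lowest to highest qualification follows. The ordering itself is an assumption about what the labels mean, and the vector is rebuilt from the counts in the table above rather than read from the file.

```r
# Rebuild the Education column from the counts in the table above
education <- factor(rep(
  c("Bachelor", "Below College", "College", "Doctor", "Master"),
  times = c(572, 170, 282, 48, 398)
))

# Default levels are alphabetical
print(levels(education))

# Re-declare the levels from lowest to highest qualification (assumed ordering)
education_ordered <- factor(
  education,
  levels = c("Below College", "College", "Bachelor", "Master", "Doctor"),
  ordered = TRUE
)

print(table(education_ordered))
```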
Environment Satisfaction
with(attrition, table(EnvironmentSatisfaction))
EnvironmentSatisfaction
1 2 3 4
284 287 453 446
Based on the survey behind the dataset, the satisfaction levels are:
- 1: Low
- 2: Medium
- 3: High
- 4: Very High
From the output, most of the employees report high or very high environment satisfaction.
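The integer codes can carry their labels directly, so that tables read "Low" to "Very High" instead of 1 to 4. The sketch below rebuilds the column from the four counts reported above and attaches the labels with `factor()`. The variable names are illustrative and not part of the original analysis.

```r
# Integer codes rebuilt from the counts reported above (284, 287, 453, 446)
env_sat <- rep(1:4, times = c(284, 287, 453, 446))

# Attach the survey labels and keep the natural ordering
env_sat_labelled <- factor(
  env_sat,
  levels = 1:4,
  labels = c("Low", "Medium", "High", "Very High"),
  ordered = TRUE
)

sat_counts <- table(env_sat_labelled)
print(sat_counts)

# Share of employees at "High" or above, in percent
high_share <- mean(env_sat_labelled >= "High")
print(round(100 * high_share, 1))
```

The same pattern would apply to the other 1-4 survey columns, each with its own label set.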
Job Involvement
Job involvement levels:
- 1: Low
- 2: Medium
- 3: High
- 4: Very High
with(attrition, table(JobInvolvement))
JobInvolvement
1 2 3 4
83 375 868 144
Most of the employees have high job involvement
Job Level
with(attrition, table(JobLevel))
JobLevel
1 2 3 4 5
543 534 218 106 69
Most of the employees are at a low job level (1 or 2)
Job Role
with(attrition, table(JobRole))
JobRole
Healthcare Representative Human Resources Laboratory Technician
131 52 259
Manager Manufacturing Director Research Director
102 145 80
Research Scientist Sales Executive Sales Representative
292 326 83
The most common job role is Sales Executive
Job Satisfaction
with(attrition, table(JobSatisfaction))
JobSatisfaction
1 2 3 4
289 280 442 459
Most of the employees report high (level 3) or very high (level 4) job satisfaction
Marital Status
with(attrition, table(MaritalStatus))
MaritalStatus
Divorced Married Single
327 673 470
The largest group of employees is married
Monthly Income
plot(density(attrition$MonthlyIncome), main="Monthly Income Distribution" )
The distribution is right-skewed: most employees earn a relatively low monthly income
Number of Companies Worked
hist(attrition$NumCompaniesWorked, main = "Number of Companies Worked with", xlab = "Number of companies", ylab = "Count")
Most employees have worked at 2 or fewer companies
Overtime
with(attrition, table(OverTime))
OverTime
No Yes
1054 416
Relatively few employees work overtime (416 of 1470)
Percent Salary Hike
hist(attrition$PercentSalaryHike, main = "Percent salary Hike Distribution", xlab = "Percent salary Hike", ylab = "Count" )
Most salary hikes are at the lower end of the range, around 12-14%
Performance Rating
with(attrition, table(PerformanceRating))
PerformanceRating
3 4
1244 226
Performance rating levels:
- 1: Low
- 2: Good
- 3: Excellent
- 4: Outstanding
Most of the employees are rated Excellent (level 3), and no one in the data is rated below that
Relationship Satisfaction
with(attrition, table(RelationshipSatisfaction))
RelationshipSatisfaction
1 2 3 4
276 303 459 432
Relationship satisfaction levels:
- 1: Low
- 2: Medium
- 3: High
- 4: Very High
Most of the employees report high or very high relationship satisfaction
Work Life Balance
with(attrition, table(WorkLifeBalance))
WorkLifeBalance
1 2 3 4
80 344 893 153
Work life balance levels:
- 1: Bad
- 2: Good
- 3: Better
- 4: Best
Most of the employees have a "Better" (level 3) work life balance
Number of Trainings
hist(attrition$TrainingTimesLastYear, main = "Number of Employees Training", xlab = "Training Count",ylab = "Count")
Most employees received training 2-3 times in the last year.
Years at Company, Years in Current Role, Years Since Last Promotion, and Years With Current Manager
I will make density plots to show the distributions of these columns
par(mfrow = c(2,2))
plot(density(attrition$YearsAtCompany), main = "Years at Company")
plot(density(attrition$YearsInCurrentRole), main = "Years in Current Role")
plot(density(attrition$YearsSinceLastPromotion), main = "Years since last Promotion")
plot(density(attrition$YearsWithCurrManager), main = "Years with Current Manager")
- Most of the employees have served the company for less than 10 years
- Most of the employees have been in their current role for 7 years or less (median 3 years); staying longer in the same role may feel monotonous for some people
- Most of the employees were promoted within the last 5 years
- Most of the employees have worked with the same manager for less than 5 years
Investigate Relationship of Attrition with other Variables
Department vs Job Role
resign <- attrition[attrition$Attrition == "Yes",]
resign$Attrition <- droplevels(resign$Attrition)
xtabs(~ Department + JobRole , data = resign)
JobRole
Department Healthcare Representative Human Resources
Human Resources 0 12
Research & Development 9 0
Sales 0 0
JobRole
Department Laboratory Technician Manager Manufacturing Director
Human Resources 0 0 0
Research & Development 62 3 10
Sales 0 2 0
JobRole
Department Research Director Research Scientist Sales Executive
Human Resources 0 0 0
Research & Development 2 47 0
Sales 0 0 57
JobRole
Department Sales Representative
Human Resources 0
Research & Development 0
Sales 33
- Most employees who resigned were Laboratory Technicians in Research & Development (62 employees)
- Followed by Sales Executives in the Sales department (57 employees)
- Third are Research Scientists, also in Research & Development (47 employees)
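Raw counts favour large groups, so it can help to divide the number of leavers by each group's headcount. The sketch below does this with numbers already shown in this article: headcounts from the Department and Job Role tables, and leavers from the cross-tab above (row sums for departments). It is an additional view under those assumptions, not part of the original analysis.

```r
# Headcount per department (Department table) and leavers per department
# (row sums of the Department x JobRole cross-tab above)
headcount <- c("Human Resources" = 63, "Research & Development" = 961, "Sales" = 446)
leavers   <- c("Human Resources" = 12, "Research & Development" = 133, "Sales" = 92)

# Attrition rate = leavers / headcount, in percent
dept_rate <- round(100 * leavers / headcount, 2)
print(sort(dept_rate, decreasing = TRUE))

# Same calculation for the four job roles with the most leavers
role_headcount <- c("Laboratory Technician" = 259, "Sales Executive" = 326,
                    "Research Scientist" = 292, "Sales Representative" = 83)
role_leavers   <- c("Laboratory Technician" = 62, "Sales Executive" = 57,
                    "Research Scientist" = 47, "Sales Representative" = 33)
role_rate <- round(100 * role_leavers / role_headcount, 2)
print(sort(role_rate, decreasing = TRUE))
```

Seen this way, Sales Representatives have the highest rate of the four roles, although they rank fourth by count.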
Job Role vs Distance From Home
xtabs(~ JobRole + DistanceFromHome, data=resign)
DistanceFromHome
JobRole 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Healthcare Representative 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0
Human Resources 1 1 0 0 0 1 0 1 1 0 0 0 1 0 0 0 1
Laboratory Technician 4 11 3 3 2 3 6 4 4 3 0 1 0 2 2 2 2
Manager 0 3 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Manufacturing Director 1 2 1 0 0 0 1 1 0 2 0 0 0 0 0 0 0
Research Director 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Research Scientist 7 4 5 2 3 1 1 2 2 4 1 0 0 1 0 2 2
Sales Executive 6 2 3 4 2 1 0 2 5 2 2 2 5 0 1 3 0
Sales Representative 7 3 2 0 3 1 3 0 6 0 0 2 0 0 0 0 0
DistanceFromHome
JobRole 18 19 20 21 22 23 24 25 26 27 28 29
Healthcare Representative 0 0 1 1 0 1 1 0 0 0 0 1
Human Resources 1 0 1 0 2 1 0 0 0 0 0 0
Laboratory Technician 0 0 0 1 0 0 5 2 0 0 1 1
Manager 0 0 0 0 0 0 0 0 0 0 0 1
Manufacturing Director 0 0 0 0 1 1 0 0 0 0 0 0
Research Director 0 0 0 0 0 0 0 0 0 0 0 0
Research Scientist 2 1 0 0 1 1 1 3 0 0 0 1
Sales Executive 1 1 1 0 1 1 3 1 3 3 1 1
Sales Representative 0 1 1 1 1 0 2 0 0 0 0 0
- Resignations occur among employees living both near and far from the office. The largest counts are at short distances: Laboratory Technician (11 at distance 2), Research Scientist (7 at distance 1), Sales Representative (7 at distance 1), and Sales Executive (6 at distance 1). At the far end, the largest single count is 5 Laboratory Technicians at distance 24
Monthly Income vs Age vs Gender
aggregate(cbind(Age, MonthlyIncome) ~ Gender, data = resign, mean)
Gender Age MonthlyIncome
1 Female 32.57471 4769.736
2 Male 34.20667 4797.160
par(mfrow = c(1,2))
par(cex.main=1)
boxplot(MonthlyIncome ~ Attrition, data = resign, main = "Attrition based on Monthly Income", ylab ="monthly income",xlab ="Attrition")
boxplot(Age ~ Attrition ,data=resign, main ="Attrition based on age",ylab ="Age",xlab ="Attrition")
- Most employees who resigned are male, around 34 years old, with a mean monthly income of about 4,800 USD, well below the overall mean of 6,503 USD; I think it’s a younger age group
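Because `resign` contains only the "Yes" level, the boxplots above show a single box each. To compare leavers with stayers, the same calls would need a data frame that still holds both levels. The sketch below illustrates this using the six rows printed by head() earlier as a stand-in for the full `attrition` data. Six rows are far too few to draw conclusions from, so treat the numbers as illustrative only.

```r
# First six rows as printed by head() above, standing in for the full data
toy <- data.frame(
  Attrition     = factor(c("Yes", "No", "Yes", "No", "No", "No")),
  MonthlyIncome = c(5993, 5130, 2090, 2909, 3468, 3068),
  Age           = c(41, 49, 37, 33, 27, 32)
)

# Mean income and age per attrition group
group_means <- aggregate(cbind(MonthlyIncome, Age) ~ Attrition, data = toy, FUN = mean)
print(group_means)

# One box per group, because both levels are present in the data
boxplot(MonthlyIncome ~ Attrition, data = toy,
        main = "Monthly income by attrition",
        xlab = "Attrition", ylab = "Monthly income")
```

On the real data, replacing `toy` with `attrition` would give the two-group comparison.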
Number of Companies Worked vs Gender
xtabs(~ NumCompaniesWorked + Attrition, data=resign)
Attrition
NumCompaniesWorked Yes
0 23
1 98
2 16
3 16
4 17
5 16
6 16
7 17
8 6
9 12
plot(NumCompaniesWorked~Gender,data = resign,
xlab="Gender", ylab="Number of companies worked in",
main="Number of companies worked in by gender (employees who left)")
- Most employees who left are male, and about half had worked at fewer than 2 companies (121 of 237)
Percent Salary Hike vs Training Time Last Year
xtabs(~ PercentSalaryHike + TrainingTimesLastYear, data=resign)
TrainingTimesLastYear
PercentSalaryHike 0 1 2 3 4 5 6
11 3 3 15 6 8 5 1
12 3 1 20 6 1 1 1
13 0 2 16 12 2 1 1
14 1 2 6 9 5 0 1
15 2 0 6 8 1 1 0
16 0 1 4 5 4 0 0
17 1 0 4 7 1 0 1
18 1 0 4 5 0 2 1
19 0 0 4 5 0 0 0
20 2 0 3 1 0 1 0
21 0 0 4 0 0 1 0
22 2 0 4 3 1 2 0
23 0 0 4 0 2 0 0
24 0 0 3 2 1 0 0
25 0 0 1 0 0 0 0
- Among employees who left, salary hikes of 11-13% are the most common (108 of 237), and most had trained only 2 or 3 times in the last year (98 and 69 employees)
Impact of Fixed Variable on Attrition
Over Time vs Business Travel vs Performance Rating
xtabs(~ OverTime + BusinessTravel + PerformanceRating, data=resign)
, , PerformanceRating = 3
BusinessTravel
OverTime Non-Travel Travel_Frequently Travel_Rarely
No 5 28 63
Yes 3 30 71
, , PerformanceRating = 4
BusinessTravel
OverTime Non-Travel Travel_Frequently Travel_Rarely
No 0 5 9
Yes 4 6 13
- Most employees who left had an excellent (level 3) performance rating; the largest group among them worked overtime and rarely travelled (71 employees)
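The counts above can be put next to the overall headcounts to get the share of each group that left. The sketch below uses leaver totals summed over both performance ratings in the cross-tab above, and headcounts from the Overtime and Business Travel tables. It is an additional view, with `prop.table()` used to normalise each column.

```r
# Leavers summed over both performance ratings in the cross-tab above;
# stayers obtained as overall headcount minus leavers
overtime <- rbind(
  Left   = c(No = 110, Yes = 127),
  Stayed = c(No = 1054 - 110, Yes = 416 - 127)
)

# margin = 2 normalises each column, giving the percentage who left per group
overtime_share <- round(100 * prop.table(overtime, margin = 2), 2)
print(overtime_share)

travel <- rbind(
  Left   = c("Non-Travel" = 12, "Travel_Frequently" = 69, "Travel_Rarely" = 156),
  Stayed = c("Non-Travel" = 150 - 12, "Travel_Frequently" = 277 - 69,
             "Travel_Rarely" = 1043 - 156)
)
travel_share <- round(100 * prop.table(travel, margin = 2), 2)
print(travel_share)
```

Relative to group size, employees who work overtime and employees who travel frequently leave at the highest rates in these two tables.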
Work Life Balance vs Relationship satisfaction
xtabs(~ WorkLifeBalance + RelationshipSatisfaction, data=resign)
RelationshipSatisfaction
WorkLifeBalance 1 2 3 4
1 7 4 9 5
2 10 8 23 17
3 32 26 30 39
4 8 7 9 3
- Among employees who left, the most common combination is better (level 3) work life balance with very high (level 4) relationship satisfaction (39 employees)
Marital Status
xtabs(~ Attrition +MaritalStatus, data=resign)
MaritalStatus
Attrition Divorced Married Single
Yes 33 84 120
- As might be expected, single employees account for the most departures (120 of 237)
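A small table makes the difference between "share of all leavers" and "attrition rate within the group" explicit. The sketch below combines the headcounts from the Marital Status table with the leaver counts above. The column names are illustrative.

```r
marital <- data.frame(
  MaritalStatus = c("Divorced", "Married", "Single"),
  Headcount     = c(327, 673, 470),  # from the Marital Status table
  Leavers       = c(33, 84, 120)     # from the cross-tab above
)

# Share of all leavers vs. attrition rate within each status, in percent
marital$ShareOfLeavers <- round(100 * marital$Leavers / sum(marital$Leavers), 2)
marital$AttritionRate  <- round(100 * marital$Leavers / marital$Headcount, 2)

# Highest attrition rate first
print(marital[order(-marital$AttritionRate), ])
```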
Environment Satisfaction
xtabs(~ Attrition + EnvironmentSatisfaction, data=resign)
EnvironmentSatisfaction
Attrition 1 2 3 4
Yes 72 43 62 60
- The largest group of leavers (72) reported low (level 1) environment satisfaction, which suggests that low environment satisfaction contributes to attrition
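One way to check whether this pattern is more than chance is a chi-square test of independence between satisfaction level and attrition. The sketch below is an added check rather than part of the original analysis. It builds the 2 x 4 table from the leaver counts above and the overall counts in the Environment Satisfaction section, then calls `chisq.test()` from base R.

```r
# Rows: outcome; columns: EnvironmentSatisfaction levels 1-4
env <- rbind(
  Left   = c(72, 43, 62, 60),
  Stayed = c(284, 287, 453, 446) - c(72, 43, 62, 60)
)
colnames(env) <- c("Low", "Medium", "High", "Very High")

# Attrition rate per satisfaction level, in percent
env_rate <- round(100 * env["Left", ] / colSums(env), 2)
print(env_rate)

# Chi-square test of independence between satisfaction level and attrition
env_test <- chisq.test(env)
print(env_test)
```

A small p-value here would indicate association only. It does not by itself show that low satisfaction causes people to leave.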
Education vs Education Field
xtabs(~ Education + EducationField, data=resign)
EducationField
Education Human Resources Life Sciences Marketing Medical Other
Bachelor 4 37 15 25 2
Below College 1 8 4 10 2
College 0 18 6 15 1
Doctor 1 1 1 0 0
Master 1 25 9 13 6
EducationField
Education Technical Degree
Bachelor 16
Below College 6
College 4
Doctor 2
Master 4
- Most attrition is among employees with a bachelor’s degree and in the Life Sciences education field
Job Level vs Job Involvement
xtabs(~ JobInvolvement + JobLevel, data=resign)
JobLevel
JobInvolvement 1 2 3 4 5
1 15 8 2 1 2
2 46 13 12 0 0
3 75 28 17 2 3
4 7 3 1 2 0
- Most attrition is at the lowest job level (level 1) and among employees with high (level 3) job involvement
Job Satisfaction
xtabs(~ Attrition + JobSatisfaction, data=resign)
JobSatisfaction
Attrition 1 2 3 4
Yes 66 46 73 52
- By count, most leavers reported high (level 3) job satisfaction (73 employees), closely followed by low (level 1) job satisfaction (66 employees)
Years Since Last Promotion
xtabs(~ Attrition + YearsSinceLastPromotion, data=resign)
YearsSinceLastPromotion
Attrition 0 1 2 3 4 5 6 7 9 10 11 13 14 15
Yes 110 49 27 9 5 2 6 16 4 1 2 2 1 3
- Most attrition occurs among employees with 0 years since their last promotion (110 of 237), which I read as employees who have not been promoted
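To see how quickly the leavers accumulate over this variable, the counts above can be turned into a cumulative percentage. This is a small sketch using only the numbers in the table. The `years` and `leavers` vectors are typed in by hand from that output.

```r
# Years since last promotion and number of leavers, copied from the table above
years   <- c(0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 15)
leavers <- c(110, 49, 27, 9, 5, 2, 6, 16, 4, 1, 2, 2, 1, 3)

# Cumulative percentage of leavers with at most this many years since promotion
cum_share <- round(100 * cumsum(leavers) / sum(leavers), 2)
names(cum_share) <- years

print(head(cum_share, 4))
```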
Conclusion
As this data analysis shows, we can answer the question of the who, when and why of employee turnover. Below are some conclusions that answer those questions:
- The largest groups of employees who left are Laboratory Technicians in the Research & Development department, Sales Executives in the Sales department, and then Research Scientists, also in Research & Development; most leavers are single.
- The biggest factor in employee attrition appears to be monthly income. Employees who left are mostly male, around 34 years old, with a low monthly income; I think this is a younger age group. Most received a salary hike of 11-13% and had few training sessions in the last year. Low environment satisfaction also has a big impact on attrition.
- Most attrition occurs among employees who have not been promoted (0 years since their last promotion), followed by employees whose last promotion was 1 to 2 years earlier.
- Most employees who left were at the lowest job level, yet many of them worked overtime; they are mostly male and about half had worked at fewer than 2 companies.
- Employees who rarely travel for work also account for a large share of attrition; I think they may be bored with their routine and environment.
- From the education point of view, most attrition is among employees with a bachelor’s degree and in the Life Sciences education field.
Organizations can target employees based on the factors above and determine organizational changes that improve the working environment and hence minimize the attrition rate.