Employee Attrition Analysis

Introduction

Most of the work we do in the field of people analytics is oriented toward helping organizations understand what matters most to their employees, with the goal of making improvements that increase employee engagement and productivity and reduce unwanted attrition.

In business, attrition can mean the reduction in staff through normal means, such as retirement and resignation, or the loss of customers or clients as they age or grow out of the company’s target demographic. Changes in management style, company structure, or other aspects of the company might cause employees to leave voluntarily, resulting in a higher attrition rate. Another possible cause of attrition is a company eliminating a job completely. Turnover rates differ across industries, with hospitality and retail having higher rates than most others. A high turnover rate can be costly: when a company invests in recruiting and training employees who then stay only a short time, it does not get a return on that investment. Customer attrition, likewise, generally has a negative effect on a company’s profits and growth. This paper looks at employee attrition with respect to several parameters. Specifically, we investigate how general parameters such as Education, Department, Monthly Income, OverTime and others relate to whether an employee leaves.

Reading and Inspecting the Data

For this analysis I use the attrition dataset from the Practical Statistics course material.

# Read the data
attrition <- read.csv("data_input/attrition.csv")
#View the data
head(attrition)
  EmployeeNumber Age Attrition    BusinessTravel             Department
1              1  41       Yes     Travel_Rarely                  Sales
2              2  49        No Travel_Frequently Research & Development
3              4  37       Yes     Travel_Rarely Research & Development
4              5  33        No Travel_Frequently Research & Development
5              7  27        No     Travel_Rarely Research & Development
6              8  32        No Travel_Frequently Research & Development
  DistanceFromHome     Education EducationField EnvironmentSatisfaction Gender
1                1       College  Life Sciences                       2 Female
2                8 Below College  Life Sciences                       3   Male
3                2       College          Other                       4   Male
4                3        Master  Life Sciences                       4 Female
5                2 Below College        Medical                       1   Male
6                2       College  Life Sciences                       4   Male
  HourlyRate JobInvolvement JobLevel               JobRole JobSatisfaction
1         94              3        2       Sales Executive               4
2         61              2        2    Research Scientist               2
3         92              2        1 Laboratory Technician               3
4         56              3        1    Research Scientist               3
5         40              3        1 Laboratory Technician               2
6         79              3        1 Laboratory Technician               4
  MaritalStatus MonthlyIncome NumCompaniesWorked OverTime PercentSalaryHike
1        Single          5993                  8      Yes                11
2       Married          5130                  1       No                23
3        Single          2090                  6      Yes                15
4       Married          2909                  1      Yes                11
5       Married          3468                  9       No                12
6        Single          3068                  0       No                13
  PerformanceRating RelationshipSatisfaction StockOptionLevel TotalWorkingYears
1                 3                        1                0                 8
2                 4                        4                1                10
3                 3                        2                0                 7
4                 3                        3                0                 8
5                 3                        4                1                 6
6                 3                        3                0                 8
  TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole
1                     0               1              6                  4
2                     3               3             10                  7
3                     3               3              0                  0
4                     3               3              8                  7
5                     3               3              2                  2
6                     2               2              7                  7
  YearsSinceLastPromotion YearsWithCurrManager
1                       0                    5
2                       1                    7
3                       0                    0
4                       3                    0
5                       2                    2
6                       3                    6
#Investigate the dataset
str(attrition)
'data.frame':   1470 obs. of  30 variables:
 $ EmployeeNumber          : int  1 2 4 5 7 8 10 11 12 13 ...
 $ Age                     : int  41 49 37 33 27 32 59 30 38 36 ...
 $ Attrition               : chr  "Yes" "No" "Yes" "No" ...
 $ BusinessTravel          : chr  "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" "Travel_Frequently" ...
 $ Department              : chr  "Sales" "Research & Development" "Research & Development" "Research & Development" ...
 $ DistanceFromHome        : int  1 8 2 3 2 2 3 24 23 27 ...
 $ Education               : chr  "College" "Below College" "College" "Master" ...
 $ EducationField          : chr  "Life Sciences" "Life Sciences" "Other" "Life Sciences" ...
 $ EnvironmentSatisfaction : int  2 3 4 4 1 4 3 4 4 3 ...
 $ Gender                  : chr  "Female" "Male" "Male" "Female" ...
 $ HourlyRate              : int  94 61 92 56 40 79 81 67 44 94 ...
 $ JobInvolvement          : int  3 2 2 3 3 3 4 3 2 3 ...
 $ JobLevel                : int  2 2 1 1 1 1 1 1 3 2 ...
 $ JobRole                 : chr  "Sales Executive" "Research Scientist" "Laboratory Technician" "Research Scientist" ...
 $ JobSatisfaction         : int  4 2 3 3 2 4 1 3 3 3 ...
 $ MaritalStatus           : chr  "Single" "Married" "Single" "Married" ...
 $ MonthlyIncome           : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
 $ NumCompaniesWorked      : int  8 1 6 1 9 0 4 1 0 6 ...
 $ OverTime                : chr  "Yes" "No" "Yes" "Yes" ...
 $ PercentSalaryHike       : int  11 23 15 11 12 13 20 22 21 13 ...
 $ PerformanceRating       : int  3 4 3 3 3 3 4 4 4 3 ...
 $ RelationshipSatisfaction: int  1 4 2 3 4 3 1 2 2 2 ...
 $ StockOptionLevel        : int  0 1 0 0 1 0 3 1 0 2 ...
 $ TotalWorkingYears       : int  8 10 7 8 6 8 12 1 10 17 ...
 $ TrainingTimesLastYear   : int  0 3 3 3 3 2 3 2 2 3 ...
 $ WorkLifeBalance         : int  1 3 3 3 3 2 2 3 3 2 ...
 $ YearsAtCompany          : int  6 10 0 8 2 7 1 1 9 7 ...
 $ YearsInCurrentRole      : int  4 7 0 7 2 7 0 0 7 7 ...
 $ YearsSinceLastPromotion : int  0 1 0 3 2 3 0 0 1 7 ...
 $ YearsWithCurrManager    : int  5 7 0 0 2 6 0 0 8 7 ...
#Checking the rows and columns in the dataset
dim(attrition)
[1] 1470   30

From our inspection we can conclude:

  • The attrition data contain 1,470 rows and 30 columns
  • The column names and types are listed in the str() output above

We also find that some columns do not have the right data type. All character columns should be converted to categories (factors). These columns are:

  • Attrition
  • BusinessTravel
  • Department
  • Education
  • EducationField
  • Gender
  • JobRole
  • MaritalStatus
  • OverTime

We could have converted these columns while reading the data in the previous step, but here I want to show the inspection steps first.
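Here is a minimal sketch of doing the conversion at read time instead. Setting `stringsAsFactors = TRUE` tells `read.csv()` to turn every character column into a factor. The three-row CSV text below is a hypothetical stand-in for `data_input/attrition.csv`, so the snippet runs on its own.

```r
# Hypothetical three-row extract with the same layout as data_input/attrition.csv
csv_text <- "EmployeeNumber,Age,Attrition,OverTime
1,41,Yes,Yes
2,49,No,No
4,37,Yes,Yes"

# stringsAsFactors = TRUE converts every character column to a factor while
# reading, so no separate lapply(..., as.factor) step is needed afterwards
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(toy)
```

On the real file this would be `read.csv("data_input/attrition.csv", stringsAsFactors = TRUE)`. It converts every character column, so an explicit column list is still useful if some text column should stay as character.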

# Convert the character columns into categories (factors)
factor_cols <- c("Attrition", "BusinessTravel", "Department", "Education",
                 "EducationField", "Gender", "JobRole", "MaritalStatus", "OverTime")
attrition[factor_cols] <- lapply(attrition[factor_cols], as.factor)
# Check data again 
str(attrition)
'data.frame':   1470 obs. of  30 variables:
 $ EmployeeNumber          : int  1 2 4 5 7 8 10 11 12 13 ...
 $ Age                     : int  41 49 37 33 27 32 59 30 38 36 ...
 $ Attrition               : Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
 $ BusinessTravel          : Factor w/ 3 levels "Non-Travel","Travel_Frequently",..: 3 2 3 2 3 2 3 3 2 3 ...
 $ Department              : Factor w/ 3 levels "Human Resources",..: 3 2 2 2 2 2 2 2 2 2 ...
 $ DistanceFromHome        : int  1 8 2 3 2 2 3 24 23 27 ...
 $ Education               : Factor w/ 5 levels "Bachelor","Below College",..: 3 2 3 5 2 3 1 2 1 1 ...
 $ EducationField          : Factor w/ 6 levels "Human Resources",..: 2 2 5 2 4 2 4 2 2 4 ...
 $ EnvironmentSatisfaction : int  2 3 4 4 1 4 3 4 4 3 ...
 $ Gender                  : Factor w/ 2 levels "Female","Male": 1 2 2 1 2 2 1 2 2 2 ...
 $ HourlyRate              : int  94 61 92 56 40 79 81 67 44 94 ...
 $ JobInvolvement          : int  3 2 2 3 3 3 4 3 2 3 ...
 $ JobLevel                : int  2 2 1 1 1 1 1 1 3 2 ...
 $ JobRole                 : Factor w/ 9 levels "Healthcare Representative",..: 8 7 3 7 3 3 3 3 5 1 ...
 $ JobSatisfaction         : int  4 2 3 3 2 4 1 3 3 3 ...
 $ MaritalStatus           : Factor w/ 3 levels "Divorced","Married",..: 3 2 3 2 2 3 2 1 3 2 ...
 $ MonthlyIncome           : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
 $ NumCompaniesWorked      : int  8 1 6 1 9 0 4 1 0 6 ...
 $ OverTime                : Factor w/ 2 levels "No","Yes": 2 1 2 2 1 1 2 1 1 1 ...
 $ PercentSalaryHike       : int  11 23 15 11 12 13 20 22 21 13 ...
 $ PerformanceRating       : int  3 4 3 3 3 3 4 4 4 3 ...
 $ RelationshipSatisfaction: int  1 4 2 3 4 3 1 2 2 2 ...
 $ StockOptionLevel        : int  0 1 0 0 1 0 3 1 0 2 ...
 $ TotalWorkingYears       : int  8 10 7 8 6 8 12 1 10 17 ...
 $ TrainingTimesLastYear   : int  0 3 3 3 3 2 3 2 2 3 ...
 $ WorkLifeBalance         : int  1 3 3 3 3 2 2 3 3 2 ...
 $ YearsAtCompany          : int  6 10 0 8 2 7 1 1 9 7 ...
 $ YearsInCurrentRole      : int  4 7 0 7 2 7 0 0 7 7 ...
 $ YearsSinceLastPromotion : int  0 1 0 3 2 3 0 0 1 7 ...
 $ YearsWithCurrManager    : int  5 7 0 0 2 6 0 0 8 7 ...

The output shows that the character columns are now factors. Next, we check whether the data contain any missing values.

# Check missing value in each column
colSums(is.na(attrition))
          EmployeeNumber                      Age                Attrition 
                       0                        0                        0 
          BusinessTravel               Department         DistanceFromHome 
                       0                        0                        0 
               Education           EducationField  EnvironmentSatisfaction 
                       0                        0                        0 
                  Gender               HourlyRate           JobInvolvement 
                       0                        0                        0 
                JobLevel                  JobRole          JobSatisfaction 
                       0                        0                        0 
           MaritalStatus            MonthlyIncome       NumCompaniesWorked 
                       0                        0                        0 
                OverTime        PercentSalaryHike        PerformanceRating 
                       0                        0                        0 
RelationshipSatisfaction         StockOptionLevel        TotalWorkingYears 
                       0                        0                        0 
   TrainingTimesLastYear          WorkLifeBalance           YearsAtCompany 
                       0                        0                        0 
      YearsInCurrentRole  YearsSinceLastPromotion     YearsWithCurrManager 
                       0                        0                        0 
# anyNA() returns a single TRUE/FALSE for the whole data frame
anyNA(attrition)
[1] FALSE

The output shows that there are no missing values, which is good.

The Who, When & Why Of Employee Turnover

In this step we summarize the data and start the analysis.

summary(attrition)
 EmployeeNumber        Age        Attrition            BusinessTravel
 Min.   :   1.0   Min.   :18.00   No :1233   Non-Travel       : 150  
 1st Qu.: 491.2   1st Qu.:30.00   Yes: 237   Travel_Frequently: 277  
 Median :1020.5   Median :36.00              Travel_Rarely    :1043  
 Mean   :1024.9   Mean   :36.92                                      
 3rd Qu.:1555.8   3rd Qu.:43.00                                      
 Max.   :2068.0   Max.   :60.00                                      
                                                                     
                  Department  DistanceFromHome         Education  
 Human Resources       : 63   Min.   : 1.000   Bachelor     :572  
 Research & Development:961   1st Qu.: 2.000   Below College:170  
 Sales                 :446   Median : 7.000   College      :282  
                              Mean   : 9.193   Doctor       : 48  
                              3rd Qu.:14.000   Master       :398  
                              Max.   :29.000                      
                                                                  
          EducationField EnvironmentSatisfaction    Gender      HourlyRate    
 Human Resources : 27    Min.   :1.000           Female:588   Min.   : 30.00  
 Life Sciences   :606    1st Qu.:2.000           Male  :882   1st Qu.: 48.00  
 Marketing       :159    Median :3.000                        Median : 66.00  
 Medical         :464    Mean   :2.722                        Mean   : 65.89  
 Other           : 82    3rd Qu.:4.000                        3rd Qu.: 83.75  
 Technical Degree:132    Max.   :4.000                        Max.   :100.00  
                                                                              
 JobInvolvement    JobLevel                          JobRole    JobSatisfaction
 Min.   :1.00   Min.   :1.000   Sales Executive          :326   Min.   :1.000  
 1st Qu.:2.00   1st Qu.:1.000   Research Scientist       :292   1st Qu.:2.000  
 Median :3.00   Median :2.000   Laboratory Technician    :259   Median :3.000  
 Mean   :2.73   Mean   :2.064   Manufacturing Director   :145   Mean   :2.729  
 3rd Qu.:3.00   3rd Qu.:3.000   Healthcare Representative:131   3rd Qu.:4.000  
 Max.   :4.00   Max.   :5.000   Manager                  :102   Max.   :4.000  
                                (Other)                  :215                  
  MaritalStatus MonthlyIncome   NumCompaniesWorked OverTime   PercentSalaryHike
 Divorced:327   Min.   : 1009   Min.   :0.000      No :1054   Min.   :11.00    
 Married :673   1st Qu.: 2911   1st Qu.:1.000      Yes: 416   1st Qu.:12.00    
 Single  :470   Median : 4919   Median :2.000                 Median :14.00    
                Mean   : 6503   Mean   :2.693                 Mean   :15.21    
                3rd Qu.: 8379   3rd Qu.:4.000                 3rd Qu.:18.00    
                Max.   :19999   Max.   :9.000                 Max.   :25.00    
                                                                               
 PerformanceRating RelationshipSatisfaction StockOptionLevel TotalWorkingYears
 Min.   :3.000     Min.   :1.000            Min.   :0.0000   Min.   : 0.00    
 1st Qu.:3.000     1st Qu.:2.000            1st Qu.:0.0000   1st Qu.: 6.00    
 Median :3.000     Median :3.000            Median :1.0000   Median :10.00    
 Mean   :3.154     Mean   :2.712            Mean   :0.7939   Mean   :11.28    
 3rd Qu.:3.000     3rd Qu.:4.000            3rd Qu.:1.0000   3rd Qu.:15.00    
 Max.   :4.000     Max.   :4.000            Max.   :3.0000   Max.   :40.00    
                                                                              
 TrainingTimesLastYear WorkLifeBalance YearsAtCompany   YearsInCurrentRole
 Min.   :0.000         Min.   :1.000   Min.   : 0.000   Min.   : 0.000    
 1st Qu.:2.000         1st Qu.:2.000   1st Qu.: 3.000   1st Qu.: 2.000    
 Median :3.000         Median :3.000   Median : 5.000   Median : 3.000    
 Mean   :2.799         Mean   :2.761   Mean   : 7.008   Mean   : 4.229    
 3rd Qu.:3.000         3rd Qu.:3.000   3rd Qu.: 9.000   3rd Qu.: 7.000    
 Max.   :6.000         Max.   :4.000   Max.   :40.000   Max.   :18.000    
                                                                          
 YearsSinceLastPromotion YearsWithCurrManager
 Min.   : 0.000          Min.   : 0.000      
 1st Qu.: 0.000          1st Qu.: 2.000      
 Median : 1.000          Median : 3.000      
 Mean   : 2.188          Mean   : 4.123      
 3rd Qu.: 3.000          3rd Qu.: 7.000      
 Max.   :15.000          Max.   :17.000      
                                             

The dataset has:

  • 1,470 employee observations and 30 columns
  • A mean total working experience of about 11 years (median 10)
  • A mean tenure at the company of about 7 years
  • A mean of about 4 years with the current manager
  • Mean environment satisfaction and mean job satisfaction at about the same level, around 2.7 on a 1-4 scale
  • Job levels on a scale of 1 to 5, where a higher level means more challenging responsibilities
  • A mean monthly income of 6,503 USD

It is important to see which variables contribute most to attrition. Before that, we need to know whether the variables are correlated with one another. There are many continuous variables. We could look at their distributions in a grid of pair plots, but with this many variables that would be too much to read.
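Instead of a full pair plot, a correlation matrix of the numeric columns is a compact first check. The sketch below uses only the first six rows printed by `head()` above, so the coefficients themselves mean nothing. The helper `high_pairs()` is a hypothetical name introduced here. It lists the variable pairs whose absolute correlation reaches a chosen threshold.

```r
# First six rows from head(attrition) above (illustration only; far too few
# rows for the correlations to be meaningful)
toy <- data.frame(
  Age               = c(41, 49, 37, 33, 27, 32),
  MonthlyIncome     = c(5993, 5130, 2090, 2909, 3468, 3068),
  TotalWorkingYears = c(8, 10, 7, 8, 6, 8),
  YearsAtCompany    = c(6, 10, 0, 8, 2, 7),
  Gender            = factor(c("Female", "Male", "Male", "Female", "Male", "Male"))
)

# Keep numeric columns only; cor() cannot use factors
num <- toy[sapply(toy, is.numeric)]
cm <- cor(num)

# Hypothetical helper: list each variable pair (upper triangle only, so no
# duplicates or self-pairs) whose |r| is at least `threshold`
high_pairs <- function(cm, threshold = 0.7) {
  idx <- which(abs(cm) >= threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[idx[, "row"]],
             var2 = colnames(cm)[idx[, "col"]],
             r    = round(cm[idx], 2))
}

print(round(cm, 2))
print(high_pairs(cm))
```

On the full data, `attrition[sapply(attrition, is.numeric)]` would also pick up `EmployeeNumber`. That column is an identifier and is better dropped before calling `cor()`.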

Describe Each Column

Attrition

# Count and percentage of employees who stayed (No) and left (Yes)
table(attrition$Attrition)

  No  Yes 
1233  237 
round((prop.table(table(attrition$Attrition)))*100,2)

   No   Yes 
83.88 16.12 
# plot() on a factor draws a bar chart of the counts in each level
plot(attrition$Attrition, main = "Attrition Count")

Attrition is the target variable, with the values “Yes” or “No”. In the data, 237 employees have “Yes” and 1,233 have “No”. That is about 16% turnover, while 84% stayed. Employees who stayed therefore far outnumber those who left.

Gender & Age

# Check the gender and age distribution of employee
table(attrition$Gender)

Female   Male 
   588    882 
# check age of the employee
hist(attrition$Age,main=" Distribution of Age",xlab="Age", ylab="Count")

Most employees are male (882 of 1,470). Most employees are between 30 and 43 years old (the interquartile range from the summary), with a mean age of about 37.

Business Travel

with(attrition, table(BusinessTravel))
BusinessTravel
       Non-Travel Travel_Frequently     Travel_Rarely 
              150               277              1043 

Most employees travel rarely (1,043 of 1,470).

Department

with(attrition, table(Department))
Department
       Human Resources Research & Development                  Sales 
                    63                    961                    446 

Most employees work in the Research & Development department (961 of 1,470).

Distance from Home

hist(attrition$DistanceFromHome, main="Distance from Home Distribution", xlab="Distance from Home",ylab="Count")

Most employees live fairly close to the office. The median distance is 7, and three quarters live within 14 (the data do not state the unit).

Education Level

with(attrition, table(Education))
Education
     Bachelor Below College       College        Doctor        Master 
          572           170           282            48           398 

The largest group of employees holds a bachelor degree (572 of 1,470), followed by a master degree (398).

Environment Satisfaction

with(attrition, table(EnvironmentSatisfaction))
EnvironmentSatisfaction
  1   2   3   4 
284 287 453 446 

Based on the survey coding in this dataset, the satisfaction levels are:

  1. Low
  2. Medium
  3. High
  4. Very High

Most employees rate their environment satisfaction as high or very high (899 of 1,470).

Job Involvement

Job involvement levels:

  1. Low
  2. Medium
  3. High
  4. Very High
with(attrition, table(JobInvolvement))
JobInvolvement
  1   2   3   4 
 83 375 868 144 

Most employees have high job involvement (868 of 1,470).

Job Level

with(attrition, table(JobLevel))
JobLevel
  1   2   3   4   5 
543 534 218 106  69 

Most employees are at the two lowest job levels (1,077 of 1,470 at levels 1 and 2).

Job Role

with(attrition, table(JobRole))
JobRole
Healthcare Representative           Human Resources     Laboratory Technician 
                      131                        52                       259 
                  Manager    Manufacturing Director         Research Director 
                      102                       145                        80 
       Research Scientist           Sales Executive      Sales Representative 
                      292                       326                        83 

Sales Executive is the largest job role (326 employees), followed by Research Scientist (292) and Laboratory Technician (259).

Job Satisfaction

with(attrition, table(JobSatisfaction))
JobSatisfaction
  1   2   3   4 
289 280 442 459 

Most employees report high or very high job satisfaction (901 of 1,470).

Marital Status

with(attrition, table(MaritalStatus))
MaritalStatus
Divorced  Married   Single 
     327      673      470 

Married is the largest marital-status group (673 of 1,470), though it is less than half of all employees.

Monthly Income

plot(density(attrition$MonthlyIncome), main="Monthly Income Distribution" )

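The income distribution is right-skewed. Most employees earn relatively little (median 4,919, below the mean of 6,503), while a few earn up to 19,999.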

Number of Companies Worked

hist(attrition$NumCompaniesWorked, main = "Number of Companies Worked with", xlab = "Number of companies", ylab = "Count")

Half of the employees have worked for two or fewer companies (median 2).

Overtime

with(attrition, table(OverTime))
OverTime
  No  Yes 
1054  416 

Fewer than a third of the employees work overtime (416 of 1,470).

Percent Salary Hike

hist(attrition$PercentSalaryHike, main = "Percent salary Hike Distribution", xlab = "Percent salary Hike", ylab = "Count" )

Salary hikes cluster at the low end, around 12-14%. The median is 14% and the interquartile range is 12-18%.

Performance Rating

with(attrition, table(PerformanceRating))
PerformanceRating
   3    4 
1244  226 

Performance rating levels:

  1. Low
  2. Good
  3. Excellent
  4. Outstanding

Most employees are rated Excellent (1,244). The remaining 226 are Outstanding, and no employee is rated Low or Good.

Relationship Satisfaction

with(attrition, table(RelationshipSatisfaction))
RelationshipSatisfaction
  1   2   3   4 
276 303 459 432 

Relationship satisfaction levels:

  1. Low
  2. Medium
  3. High
  4. Very High

Most employees report high or very high relationship satisfaction (891 of 1,470).

Work Life Balance

with(attrition, table(WorkLifeBalance))
WorkLifeBalance
  1   2   3   4 
 80 344 893 153 

Work-life balance levels:

  1. Bad
  2. Good
  3. Better
  4. Best

Most employees rate their work-life balance as Better (893 of 1,470).

Number of Trainings

hist(attrition$TrainingTimesLastYear, main = "Number of Employees Training", xlab = "Training Count",ylab = "Count")

Most employees were trained 2 to 3 times in the last year (median 3, interquartile range 2-3).

Years at Company, Years in Current Role, Years Since Last Promotion, and Years with Current Manager

I use a density plot to show the distribution of each of these columns:

par(mfrow = c(2,2))

plot(density(attrition$YearsAtCompany), main = "Years at Company")
plot(density(attrition$YearsInCurrentRole), main = "Years in Current Role")
plot(density(attrition$YearsSinceLastPromotion), main = "Years since last Promotion")
plot(density(attrition$YearsWithCurrManager), main = "Years with Current Manager")

  • At least three quarters of the employees have been with the company for fewer than 10 years (3rd quartile 9)
  • Most employees have been in their current role for a relatively short time: the median is 3 years, and three quarters have 7 years or fewer
  • Three quarters of the employees have 3 years or fewer since their last promotion
  • Half of the employees have been with their current manager for 3 years or less

Investigate Relationship of Attrition with other Variables

Department vs Job Role

# Keep only employees who left, and drop the now-unused "No" level
resign <- attrition[attrition$Attrition == "Yes", ]
resign$Attrition <- droplevels(resign$Attrition)
xtabs(~ Department + JobRole , data = resign)
                        JobRole
Department               Healthcare Representative Human Resources
  Human Resources                                0              12
  Research & Development                         9               0
  Sales                                          0               0
                        JobRole
Department               Laboratory Technician Manager Manufacturing Director
  Human Resources                            0       0                      0
  Research & Development                    62       3                     10
  Sales                                      0       2                      0
                        JobRole
Department               Research Director Research Scientist Sales Executive
  Human Resources                        0                  0               0
  Research & Development                 2                 47               0
  Sales                                  0                  0              57
                        JobRole
Department               Sales Representative
  Human Resources                           0
  Research & Development                    0
  Sales                                    33
  • Most leavers come from Research & Development in the Laboratory Technician role (62 employees)
  • They are followed by Sales in the Sales Executive role (57 employees)
  • Third is Research & Development again, in the Research Scientist role (47 employees)
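Raw counts of leavers largely mirror how big each job role is. A short sketch can divide the leaver counts in the cross-tab above by the role totals from the Job Role table earlier. This gives the attrition rate within each role. All numbers are copied from the outputs above.

```r
# Leavers per job role (column sums of the Department x JobRole table above)
left <- c("Healthcare Representative" = 9,  "Human Resources"   = 12,
          "Laboratory Technician"     = 62, "Manager"           = 5,
          "Manufacturing Director"    = 10, "Research Director" = 2,
          "Research Scientist"        = 47, "Sales Executive"   = 57,
          "Sales Representative"      = 33)

# All employees per job role (table(JobRole) output above)
total <- c("Healthcare Representative" = 131, "Human Resources"   = 52,
           "Laboratory Technician"     = 259, "Manager"           = 102,
           "Manufacturing Director"    = 145, "Research Director" = 80,
           "Research Scientist"        = 292, "Sales Executive"   = 326,
           "Sales Representative"      = 83)

# Attrition rate within each role, in percent, highest first
rate <- sort(round(100 * left / total[names(left)], 1), decreasing = TRUE)
rate
```

By this measure, Sales Representatives have the highest attrition rate (about 40%), followed by Laboratory Technicians (about 24%). Laboratory Technician still has the most leavers in absolute terms.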

Job Role vs Distance from Home

xtabs(~ JobRole + DistanceFromHome, data=resign)
                           DistanceFromHome
JobRole                      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  Healthcare Representative  0  1  0  0  0  0  0  0  0  0  1  0  0  1  1  0  0
  Human Resources            1  1  0  0  0  1  0  1  1  0  0  0  1  0  0  0  1
  Laboratory Technician      4 11  3  3  2  3  6  4  4  3  0  1  0  2  2  2  2
  Manager                    0  3  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0
  Manufacturing Director     1  2  1  0  0  0  1  1  0  2  0  0  0  0  0  0  0
  Research Director          0  1  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0
  Research Scientist         7  4  5  2  3  1  1  2  2  4  1  0  0  1  0  2  2
  Sales Executive            6  2  3  4  2  1  0  2  5  2  2  2  5  0  1  3  0
  Sales Representative       7  3  2  0  3  1  3  0  6  0  0  2  0  0  0  0  0
                           DistanceFromHome
JobRole                     18 19 20 21 22 23 24 25 26 27 28 29
  Healthcare Representative  0  0  1  1  0  1  1  0  0  0  0  1
  Human Resources            1  0  1  0  2  1  0  0  0  0  0  0
  Laboratory Technician      0  0  0  1  0  0  5  2  0  0  1  1
  Manager                    0  0  0  0  0  0  0  0  0  0  0  1
  Manufacturing Director     0  0  0  0  1  1  0  0  0  0  0  0
  Research Director          0  0  0  0  0  0  0  0  0  0  0  0
  Research Scientist         2  1  0  0  1  1  1  3  0  0  0  1
  Sales Executive            1  1  1  0  1  1  3  1  3  3  1  1
  Sales Representative       0  1  1  1  1  0  2  0  0  0  0  0
  • Leavers are spread across all distances, but the largest single counts are at short distances. These are Laboratory Technicians at distance 2 (11), and Research Scientists (7), Sales Representatives (7) and Sales Executives (6) at distance 1. At the far end, the largest count is 5 Laboratory Technicians at distance 24.

Monthly Income vs Age vs Gender

aggregate(cbind(Age, MonthlyIncome) ~ Gender, data = resign, mean)
  Gender      Age MonthlyIncome
1 Female 32.57471      4769.736
2   Male 34.20667      4797.160
par(mfrow = c(1, 2))
par(cex.main = 1)
# Use the full data so that leavers (Yes) can be compared with stayers (No);
# resign contains only the Yes group
boxplot(MonthlyIncome ~ Attrition, data = attrition, main = "Attrition based on Monthly Income", ylab = "Monthly income", xlab = "Attrition")
boxplot(Age ~ Attrition, data = attrition, main = "Attrition based on Age", ylab = "Age", xlab = "Attrition")

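  • Among leavers, the mean age is about 34 for men and 33 for women. The mean monthly income is about 4,800 for both, well below the overall mean of 6,503 in the summary. Most leavers are male, and they are somewhat younger than the overall mean age of about 37.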
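Because `resign` holds only leavers, it cannot show by itself whether leavers earn less than stayers. That comparison needs both groups. Below is a minimal sketch of the grouped comparison on the six rows printed by `head()` above. Six rows are far too few for any conclusion; they are used only so the snippet runs on its own.

```r
# First six rows from head(attrition) above (illustration only)
toy <- data.frame(
  Attrition     = c("Yes", "No", "Yes", "No", "No", "No"),
  Age           = c(41, 49, 37, 33, 27, 32),
  MonthlyIncome = c(5993, 5130, 2090, 2909, 3468, 3068)
)

# Median age and income for leavers vs stayers; the median is less affected
# than the mean by the long right tail of MonthlyIncome
by_group <- aggregate(cbind(Age, MonthlyIncome) ~ Attrition, data = toy, FUN = median)
by_group
```

On the real data, the same call with `data = attrition` gives the comparison. `wilcox.test(MonthlyIncome ~ Attrition, data = attrition)` is one option for checking whether the difference is larger than chance.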

Number of Companies Worked vs Gender

xtabs(~ NumCompaniesWorked + Attrition, data=resign)
                  Attrition
NumCompaniesWorked Yes
                 0  23
                 1  98
                 2  16
                 3  16
                 4  17
                 5  16
                 6  16
                 7  17
                 8   6
                 9  12
# Boxplot of NumCompaniesWorked for each gender, among leavers only
plot(NumCompaniesWorked ~ Gender, data = resign,
     xlab = "Gender", ylab = "Number of companies worked in",
     main = "Number of companies worked in, by gender (leavers)")

  • Most leavers are male, and most had worked for fewer than two companies (121 of 237 have a value of 0 or 1)

Percent Salary Hike vs Training Time Last Year

xtabs(~ PercentSalaryHike + TrainingTimesLastYear, data=resign)
                 TrainingTimesLastYear
PercentSalaryHike  0  1  2  3  4  5  6
               11  3  3 15  6  8  5  1
               12  3  1 20  6  1  1  1
               13  0  2 16 12  2  1  1
               14  1  2  6  9  5  0  1
               15  2  0  6  8  1  1  0
               16  0  1  4  5  4  0  0
               17  1  0  4  7  1  0  1
               18  1  0  4  5  0  2  1
               19  0  0  4  5  0  0  0
               20  2  0  3  1  0  1  0
               21  0  0  4  0  0  1  0
               22  2  0  4  3  1  2  0
               23  0  0  4  0  2  0  0
               24  0  0  3  2  1  0  0
               25  0  0  1  0  0  0  0
  • Most leavers received a salary hike of 11-13% (108 of 237), and most of those were trained 2 or 3 times in the last year
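To check that reading, the three rows for hikes of 11-13% can be summed directly. The counts below are copied from the table above.

```r
# Rows 11, 12 and 13 of the PercentSalaryHike x TrainingTimesLastYear table above
hike_11_13 <- rbind("11" = c(3, 3, 15,  6, 8, 5, 1),
                    "12" = c(3, 1, 20,  6, 1, 1, 1),
                    "13" = c(0, 2, 16, 12, 2, 1, 1))
colnames(hike_11_13) <- 0:6   # TrainingTimesLastYear

sum(hike_11_13)       # leavers with an 11-13% hike: 108 of 237
colSums(hike_11_13)   # broken down by trainings last year: 6 6 51 24 11 7 3
```

Of those 108 leavers, 75 were trained 2 or 3 times.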

Impact of Fixed Variable on Attrition

Over Time vs Business Travel vs Performance Rating

xtabs(~ OverTime + BusinessTravel + PerformanceRating, data=resign)
, , PerformanceRating = 3

        BusinessTravel
OverTime Non-Travel Travel_Frequently Travel_Rarely
     No           5                28            63
     Yes          3                30            71

, , PerformanceRating = 4

        BusinessTravel
OverTime Non-Travel Travel_Frequently Travel_Rarely
     No           0                 5             9
     Yes          4                 6            13
  • The largest group of leavers (71) has an Excellent (3) performance rating, works overtime, and travels rarely
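The three-way table counts only leavers. It can be combined with the OverTime totals from earlier (416 Yes, 1,054 No) to give a 2x2 table of overtime against attrition. Summing the Yes rows above gives 127 leavers who worked overtime, and the No rows give 110. A chi-squared test of independence is one standard way to check whether the gap in rates is larger than chance would explain. This is a sketch added for illustration and is not part of the original analysis.

```r
# Leavers by OverTime: sums of the OverTime rows in both PerformanceRating slices above
left  <- c(Yes = 3 + 30 + 71 + 4 + 6 + 13, No = 5 + 28 + 63 + 0 + 5 + 9)
# All employees by OverTime (table(OverTime) output above)
total <- c(Yes = 416, No = 1054)

ot <- rbind(Left = left, Stayed = total - left)
round(100 * left / total, 1)   # attrition rate (%) with and without overtime

# Pearson's chi-squared test of independence (Yates' correction is applied to 2x2 tables)
chisq.test(ot)
```

About 31% of employees who work overtime left, against about 10% of those who do not. The test's p-value is well below 0.001 here.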

Work Life Balance vs Relationship satisfaction

xtabs(~ WorkLifeBalance + RelationshipSatisfaction, data=resign)
               RelationshipSatisfaction
WorkLifeBalance  1  2  3  4
              1  7  4  9  5
              2 10  8 23 17
              3 32 26 30 39
              4  8  7  9  3
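  • The largest cell is Better (3) work-life balance with Very High (4) relationship satisfaction (39 leavers). However, level 3 work-life balance is also the most common level overall, and its attrition rate (127 of 893, about 14%) is below the overall 16%.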

Marital Status

xtabs(~ Attrition + MaritalStatus, data = resign)
         MaritalStatus
Attrition Divorced Married Single
      Yes       33      84    120
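  • As one might expect, single employees are the largest group of leavers (120 of 237), even though married employees are the largest group overall (673 of 1,470)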
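The same idea applies to marital status. Base R's `prop.test()` can compare the share of leavers across the three groups. The sketch uses the leaver counts above and the group totals from the Marital Status table earlier.

```r
left  <- c(Divorced = 33,  Married = 84,  Single = 120)  # leavers (table above)
total <- c(Divorced = 327, Married = 673, Single = 470)  # all employees

# Null hypothesis: the attrition rate is the same in all three groups
res <- prop.test(x = left, n = total)
round(100 * left / total, 1)   # attrition rate (%) per group
res$p.value
```

Single employees leave at about 25.5%, against roughly 10% for divorced and 12.5% for married employees. The p-value is far below 0.05.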

Environment Satisfaction

xtabs(~ Attrition + EnvironmentSatisfaction, data=resign)
         EnvironmentSatisfaction
Attrition  1  2  3  4
      Yes 72 43 62 60
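  • Level 1 (Low) environment satisfaction has the most leavers (72), even though it is the smallest satisfaction group overall (284 employees). This suggests that low environment satisfaction is associated with attrition, though it does not prove it is the cause.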
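Satisfaction is ordered (1 = Low to 4 = Very High), so `prop.trend.test()` from base R's stats package can test for a trend in the attrition rate across the levels. The group totals come from the Environment Satisfaction table earlier. This is a sketch added for illustration.

```r
left  <- c(72, 43, 62, 60)      # leavers at satisfaction levels 1-4 (table above)
total <- c(284, 287, 453, 446)  # all employees at each level

round(100 * left / total, 1)    # attrition rate (%) per level

# Chi-squared test for trend in proportions, scoring the levels 1, 2, 3, 4
trend <- prop.trend.test(x = left, n = total, score = 1:4)
trend
```

The rate falls from about 25% at level 1 to about 13-15% at levels 2-4. The same call works for the Job Satisfaction counts (leavers 66, 46, 73, 52 out of 289, 280, 442, 459).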

Education vs Education Field

xtabs(~ Education + EducationField, data=resign)
               EducationField
Education       Human Resources Life Sciences Marketing Medical Other
  Bachelor                    4            37        15      25     2
  Below College               1             8         4      10     2
  College                     0            18         6      15     1
  Doctor                      1             1         1       0     0
  Master                      1            25         9      13     6
               EducationField
Education       Technical Degree
  Bachelor                    16
  Below College                6
  College                      4
  Doctor                       2
  Master                       4
  • Most leavers hold a bachelor degree (99 of 237), and the most common education field among leavers is Life Sciences (89)

Job Level vs Job Involvement

xtabs(~ JobInvolvement + JobLevel, data=resign)
              JobLevel
JobInvolvement  1  2  3  4  5
             1 15  8  2  1  2
             2 46 13 12  0  0
             3 75 28 17  2  3
             4  7  3  1  2  0
  • Most leavers are at the lowest job level (143 of 237), and the largest involvement group among them is level 3, High (125)

Job Satisfaction

xtabs(~ Attrition + JobSatisfaction, data=resign)
         JobSatisfaction
Attrition  1  2  3  4
      Yes 66 46 73 52
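  • In raw counts, level 3 (High) job satisfaction has the most leavers (73). Level 1 (Low) is close behind (66), even though fewer employees are at level 1 overall (289 vs 442).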

Years Since Last Promotion

xtabs(~ Attrition + YearsSinceLastPromotion, data=resign)
         YearsSinceLastPromotion
Attrition   0   1   2   3   4   5   6   7   9  10  11  13  14  15
      Yes 110  49  27   9   5   2   6  16   4   1   2   2   1   3
  • Most attrition occurs among employees with 0 years since their last promotion (110 of 237)

Conclusion

This analysis helps answer the question of who leaves, when, and why. Below are the main conclusions:

  • The largest groups of leavers come from Research & Development (Laboratory Technician), Sales (Sales Executive), and again Research & Development (Research Scientist), and most of them are single. Relative to the size of each role, however, Sales Representatives have the highest attrition rate (33 of 83).
  • Monthly income appears to be an important factor. Leavers, mostly male and around 33-34 years old on average, earn a mean of about 4,800 per month, well below the overall mean of 6,503. Most leavers received a salary hike of 11-13% and were trained 2-3 times in the last year. Low environment satisfaction is also associated with higher attrition.
  • Most attrition occurs among employees with 0 years since their last promotion, followed by those last promoted 1 or 2 years ago.
  • Most leavers are at the lowest job level and many worked overtime. Most of them are male and had worked for fewer than two companies.
  • Employees who rarely travel account for the most leavers in absolute terms (156 of 237), partly because Travel_Rarely is by far the most common category. I think some of them may be bored with their routine and environment. Relative to group size, frequent travellers leave more often (69 of 277).
  • From the education point of view, most attrition is among employees with a bachelor degree and in the Life Sciences field.

The organization can target employees based on the factors above and decide on organizational changes that improve the working environment and thereby reduce the attrition rate.