ACKNOWLEDGEMENT
* The dashboard is for educational purposes only and should be used to understand survival analysis techniques. The outputs and analysis displayed in the flexdashboard should not be treated as medical advice or judgement.
DISCLAIMER
* > The authors bear no responsibility for any consequences emanating from the content covered in the dashboard, the web URL, or its links.
Employee attrition occurs when the workforce of an organization diminishes gradually over time, for reasons such as retirement or resignation on professional or personal grounds.
LinkedIn’s 2018 talent turnover report describes attrition across various sectors; interested readers can find it on the LinkedIn site. According to the 2019 Retention Report by Work Institute, the attrition rate could be as high as 35% by 2023. The report also points out that job satisfaction alone cannot keep employees loyal: career development, work-life balance and supervisor behavior are the key reasons for leaving an organization.
The challenge in HR retention is not to create one umbrella HR plan but to offer distinct HR models that are highly customizable at the micro or individual level. An HR manager needs a predictive model that estimates the propensity of attrition for each individual, so that they can plan well before an employee actually walks out. But how can this be done when HR managers are generally not prediction experts and have no background in mathematical modeling?
This flexdashboard is an attempt to address this problem. To begin, click the next item in the “MENU” tab.
[1] "Age" "Attrition"
[3] "BusinessTravel" "DailyRate"
[5] "Department" "DistanceFromHome"
[7] "Education" "EducationField"
[9] "EmployeeCount" "EmployeeNumber"
[11] "EnvironmentSatisfaction" "Gender"
[13] "HourlyRate" "JobInvolvement"
[15] "JobLevel" "JobRole"
[17] "JobSatisfaction" "MaritalStatus"
[19] "MonthlyIncome" "MonthlyRate"
[21] "NumCompaniesWorked" "Over18"
[23] "OverTime" "PercentSalaryHike"
[25] "PerformanceRating" "RelationshipSatisfaction"
[27] "StandardHours" "StockOptionLevel"
[29] "TotalWorkingYears" "TrainingTimesLastYear"
[31] "WorkLifeBalance" "YearsAtCompany"
[33] "YearsInCurrentRole" "YearsSinceLastPromotion"
[35] "YearsWithCurrManager"
'data.frame': 1470 obs. of 35 variables:
$ Age : int 41 49 37 33 27 32 59 30 38 36 ...
$ Attrition : chr "Yes" "No" "Yes" "No" ...
$ BusinessTravel : chr "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" "Travel_Frequently" ...
$ DailyRate : int 1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
$ Department : chr "Sales" "Research & Development" "Research & Development" "Research & Development" ...
$ DistanceFromHome : int 1 8 2 3 2 2 3 24 23 27 ...
$ Education : int 2 1 2 4 1 2 3 1 3 3 ...
$ EducationField : chr "Life Sciences" "Life Sciences" "Other" "Life Sciences" ...
$ EmployeeCount : int 1 1 1 1 1 1 1 1 1 1 ...
$ EmployeeNumber : int 1 2 4 5 7 8 10 11 12 13 ...
$ EnvironmentSatisfaction : int 2 3 4 4 1 4 3 4 4 3 ...
$ Gender : chr "Female" "Male" "Male" "Female" ...
$ HourlyRate : int 94 61 92 56 40 79 81 67 44 94 ...
$ JobInvolvement : int 3 2 2 3 3 3 4 3 2 3 ...
$ JobLevel : int 2 2 1 1 1 1 1 1 3 2 ...
$ JobRole : chr "Sales Executive" "Research Scientist" "Laboratory Technician" "Research Scientist" ...
$ JobSatisfaction : int 4 2 3 3 2 4 1 3 3 3 ...
$ MaritalStatus : chr "Single" "Married" "Single" "Married" ...
$ MonthlyIncome : int 5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
$ MonthlyRate : int 19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
$ NumCompaniesWorked : int 8 1 6 1 9 0 4 1 0 6 ...
$ Over18 : chr "Y" "Y" "Y" "Y" ...
$ OverTime : chr "Yes" "No" "Yes" "Yes" ...
$ PercentSalaryHike : int 11 23 15 11 12 13 20 22 21 13 ...
$ PerformanceRating : int 3 4 3 3 3 3 4 4 4 3 ...
$ RelationshipSatisfaction: int 1 4 2 3 4 3 1 2 2 2 ...
$ StandardHours : int 80 80 80 80 80 80 80 80 80 80 ...
$ StockOptionLevel : int 0 1 0 0 1 0 3 1 0 2 ...
$ TotalWorkingYears : int 8 10 7 8 6 8 12 1 10 17 ...
$ TrainingTimesLastYear : int 0 3 3 3 3 2 3 2 2 3 ...
$ WorkLifeBalance : int 1 3 3 3 3 2 2 3 3 2 ...
$ YearsAtCompany : int 6 10 0 8 2 7 1 1 9 7 ...
$ YearsInCurrentRole : int 4 7 0 7 2 7 0 0 7 7 ...
$ YearsSinceLastPromotion : int 0 1 0 3 2 3 0 0 1 7 ...
$ YearsWithCurrManager : int 5 7 0 0 2 6 0 0 8 7 ...
NULL
Numeric variables:

| Variable | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| Age | 18.00 | 30.00 | 36.00 | 36.92 | 43.00 | 60.00 |
| DailyRate | 102.0 | 465.0 | 802.0 | 802.5 | 1157.0 | 1499.0 |
| DistanceFromHome | 1.000 | 2.000 | 7.000 | 9.193 | 14.000 | 29.000 |
| Education | 1.000 | 2.000 | 3.000 | 2.913 | 4.000 | 5.000 |
| EmployeeCount | 1 | 1 | 1 | 1 | 1 | 1 |
| EmployeeNumber | 1.0 | 491.2 | 1020.5 | 1024.9 | 1555.8 | 2068.0 |
| EnvironmentSatisfaction | 1.000 | 2.000 | 3.000 | 2.722 | 4.000 | 4.000 |
| HourlyRate | 30.00 | 48.00 | 66.00 | 65.89 | 83.75 | 100.00 |
| JobInvolvement | 1.00 | 2.00 | 3.00 | 2.73 | 3.00 | 4.00 |
| JobLevel | 1.000 | 1.000 | 2.000 | 2.064 | 3.000 | 5.000 |
| JobSatisfaction | 1.000 | 2.000 | 3.000 | 2.729 | 4.000 | 4.000 |
| MonthlyIncome | 1009 | 2911 | 4919 | 6503 | 8379 | 19999 |
| MonthlyRate | 2094 | 8047 | 14236 | 14313 | 20462 | 26999 |
| NumCompaniesWorked | 0.000 | 1.000 | 2.000 | 2.693 | 4.000 | 9.000 |
| PercentSalaryHike | 11.00 | 12.00 | 14.00 | 15.21 | 18.00 | 25.00 |
| PerformanceRating | 3.000 | 3.000 | 3.000 | 3.154 | 3.000 | 4.000 |
| RelationshipSatisfaction | 1.000 | 2.000 | 3.000 | 2.712 | 4.000 | 4.000 |
| StandardHours | 80 | 80 | 80 | 80 | 80 | 80 |
| StockOptionLevel | 0.0000 | 0.0000 | 1.0000 | 0.7939 | 1.0000 | 3.0000 |
| TotalWorkingYears | 0.00 | 6.00 | 10.00 | 11.28 | 15.00 | 40.00 |
| TrainingTimesLastYear | 0.000 | 2.000 | 3.000 | 2.799 | 3.000 | 6.000 |
| WorkLifeBalance | 1.000 | 2.000 | 3.000 | 2.761 | 3.000 | 4.000 |
| YearsAtCompany | 0.000 | 3.000 | 5.000 | 7.008 | 9.000 | 40.000 |
| YearsInCurrentRole | 0.000 | 2.000 | 3.000 | 4.229 | 7.000 | 18.000 |
| YearsSinceLastPromotion | 0.000 | 0.000 | 1.000 | 2.188 | 3.000 | 15.000 |
| YearsWithCurrManager | 0.000 | 2.000 | 3.000 | 4.123 | 7.000 | 17.000 |

Character variables (Length: 1470, Class: character, Mode: character): Attrition, BusinessTravel, Department, EducationField, Gender, JobRole, MaritalStatus, Over18, OverTime
attr$EmployeeCount=NULL # Removing variable because of constant value
attr$EmployeeNumber=NULL # Removing variable because it is simply a serial number
attr$StandardHours=NULL # Removing variable because of constant value
# Converting Attrition character into factor
attr$Attrition=as.factor(attr$Attrition)
# Attrition=data.frame(Attrition)
# Converting BusinessTravel into factor variable
attr$BusinessTravel=as.factor(attr$BusinessTravel)
# Creating HighRate variable based on DailyRate
attr$HighRate=as.factor(ifelse(attr$DailyRate>750,"HighRate","LowRate"))
# Converting Department variable into factor variable
attr$Department=as.factor(attr$Department)
# Converting Education into factor variable
attr$Education=as.factor(attr$Education)
# Converting EducationField variable into factor variable
attr$EducationField=as.factor(attr$EducationField)
# Converting Gender into factor variable
attr$Gender=as.factor(attr$Gender)
# Changing JobRole into factor variable
attr$JobRole=as.factor(attr$JobRole)
# Changing Marital Status into Factor Variable
attr$MaritalStatus=as.factor(attr$MaritalStatus)
# Removing Over18 Variable
attr$Over18=NULL
# Changing OverTime variable into factor variable
attr$OverTime=as.factor(attr$OverTime)
# Changing Attrition variable into numeric variable
Attrition=ifelse(attr$Attrition=="Yes",1,0)
attr$status=Attrition
attr$Attrition=NULL
time=attr$YearsAtCompany
attr$time=time
attr$YearsAtCompany=NULL
attr$BusinessTravel=as.factor(attr$BusinessTravel)
attr$Department=as.factor(attr$Department)
attr$EducationField=as.factor(attr$EducationField)
attr$Gender=as.factor(attr$Gender)
attr$JobRole=as.factor(attr$JobRole)
attr$MaritalStatus=as.factor(attr$MaritalStatus)
attr$OverTime=as.factor(attr$OverTime)
attr$Over18=NULL
WHAT IS SURVIVAL ANALYSIS
Survival Analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems.
This topic is called reliability theory or reliability analysis in engineering, duration analysis or duration modelling in economics, and event history analysis in sociology. Survival analysis attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?
More generally, survival analysis involves the modelling of time-to-event data; in this context, death or failure is considered an “event” in the survival analysis literature. Traditionally, only a single event occurs for each subject, after which the organism or mechanism is dead or broken. Recurring-event or repeated-event models relax that assumption. The study of recurring events is relevant in systems reliability and in many areas of social science and medical research. Survival analysis can also be used in non-traditional fields, such as predicting employee attrition, predicting customer churn, predictive maintenance and earthquake prediction.
The case study in focus is estimating employee attrition with survival analysis techniques. In this flexdashboard, we are going to use the Kaplan-Meier model, the Cox proportional hazards model, a random forest model and the Weibull model.
Source: https://en.wikipedia.org/wiki/Survival_analysis
The dataset used in this flexdashboard can be accessed from the following link:
https://www.kaggle.com/patelprashant/employee-attrition
Call: survfit(formula = Surv(time, status) ~ 1, data = attr)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
0 1470 16 0.989 0.00271 0.984 0.994
1 1426 59 0.948 0.00583 0.937 0.960
2 1255 27 0.928 0.00690 0.914 0.941
3 1128 20 0.911 0.00769 0.896 0.927
4 1000 19 0.894 0.00851 0.877 0.911
5 890 21 0.873 0.00947 0.855 0.892
6 694 9 0.862 0.01007 0.842 0.882
7 618 11 0.846 0.01091 0.825 0.868
8 528 9 0.832 0.01173 0.809 0.855
9 448 8 0.817 0.01264 0.793 0.842
10 366 18 0.777 0.01516 0.748 0.807
11 246 2 0.770 0.01568 0.740 0.802
13 200 2 0.763 0.01644 0.731 0.796
14 176 2 0.754 0.01736 0.721 0.789
15 158 1 0.749 0.01789 0.715 0.785
16 138 1 0.744 0.01857 0.708 0.781
17 126 1 0.738 0.01934 0.701 0.777
18 117 1 0.732 0.02018 0.693 0.772
19 104 1 0.725 0.02117 0.684 0.767
20 93 1 0.717 0.02233 0.674 0.762
21 66 1 0.706 0.02449 0.660 0.756
22 52 1 0.692 0.02753 0.641 0.749
23 37 1 0.674 0.03253 0.613 0.741
24 35 1 0.654 0.03686 0.586 0.731
31 16 1 0.614 0.05256 0.519 0.726
32 13 1 0.566 0.06641 0.450 0.713
33 10 1 0.510 0.08037 0.374 0.694
40 1 1 0.000 NaN NA NA
Call: survfit(formula = Surv(time, status) ~ Gender, data = attr)
n events median 0.95LCL 0.95UCL
Gender=Female 588 87 32 32 NA
Gender=Male 882 150 NA 31 NA
Call: survfit(formula = Surv(time, status) ~ PerformanceRating, data = attr)
n events median 0.95LCL 0.95UCL
PerformanceRating=3 1244 200 33 32 NA
PerformanceRating=4 226 37 NA NA NA
Call: survfit(formula = Surv(time, status) ~ OverTime, data = attr)
n events median 0.95LCL 0.95UCL
OverTime=No 1054 110 40 32 NA
OverTime=Yes 416 127 24 16 NA
Call: survfit(formula = Surv(time, status) ~ MaritalStatus, data = attr)
n events median 0.95LCL 0.95UCL
MaritalStatus=Divorced 327 33 NA NA NA
MaritalStatus=Married 673 84 33 32 NA
MaritalStatus=Single 470 120 24 19 NA
Call: survfit(formula = Surv(time, status) ~ JobRole, data = attr)
n events median 0.95LCL 0.95UCL
JobRole=Healthcare Representative 131 9 40 NA NA
JobRole=Human Resources 52 12 20 NA NA
JobRole=Laboratory Technician 259 62 NA 17 NA
JobRole=Manager 102 5 NA NA NA
JobRole=Manufacturing Director 145 10 33 NA NA
JobRole=Research Director 80 2 NA 31 NA
JobRole=Research Scientist 292 47 NA NA NA
JobRole=Sales Executive 326 57 23 19 NA
JobRole=Sales Representative 83 33 5 3 NA
Call: survfit(formula = Surv(time, status) ~ WorkLifeBalance, data = attr)
n events median 0.95LCL 0.95UCL
WorkLifeBalance=1 80 25 14 10 NA
WorkLifeBalance=2 344 58 40 NA NA
WorkLifeBalance=3 893 127 33 32 NA
WorkLifeBalance=4 153 27 NA NA NA
Call: survfit(formula = Surv(time, status) ~ BusinessTravel, data = attr)
n events median 0.95LCL 0.95UCL
BusinessTravel=Non-Travel 150 12 NA NA NA
BusinessTravel=Travel_Frequently 277 69 NA 21 NA
BusinessTravel=Travel_Rarely 1043 156 33 31 NA
Call: survfit(formula = Surv(time, status) ~ JobInvolvement, data = attr)
n events median 0.95LCL 0.95UCL
JobInvolvement=1 83 28 23 10 NA
JobInvolvement=2 375 71 NA 33 NA
JobInvolvement=3 868 125 40 31 NA
JobInvolvement=4 144 13 NA 19 NA
Call: survfit(formula = Surv(time, status) ~ JobSatisfaction, data = attr)
n events median 0.95LCL 0.95UCL
JobSatisfaction=1 289 66 NA 20 NA
JobSatisfaction=2 280 46 31 31 NA
JobSatisfaction=3 442 73 NA 23 NA
JobSatisfaction=4 459 52 33 32 NA
Call:
coxph(formula = Surv(time, status) ~ ., data = attr)
coef exp(coef) se(coef) z
Age -1.847e-02 9.817e-01 1.116e-02 -1.655
BusinessTravelTravel_Frequently 1.466e+00 4.332e+00 3.480e-01 4.212
BusinessTravelTravel_Rarely 9.173e-01 2.503e+00 3.277e-01 2.799
DailyRate 3.353e-04 1.000e+00 3.451e-04 0.972
DepartmentResearch & Development 1.569e+01 6.486e+06 1.648e+03 0.010
DepartmentSales 1.585e+01 7.662e+06 1.648e+03 0.010
DistanceFromHome 2.357e-02 1.024e+00 8.307e-03 2.837
Education2 4.350e-01 1.545e+00 2.580e-01 1.686
Education3 3.934e-01 1.482e+00 2.267e-01 1.736
Education4 1.772e-01 1.194e+00 2.538e-01 0.698
Education5 -2.248e-01 7.987e-01 5.625e-01 -0.400
EducationFieldLife Sciences -5.309e-01 5.881e-01 6.282e-01 -0.845
EducationFieldMarketing -3.597e-02 9.647e-01 6.638e-01 -0.054
EducationFieldMedical -5.633e-01 5.693e-01 6.249e-01 -0.901
EducationFieldOther -4.182e-01 6.582e-01 6.823e-01 -0.613
EducationFieldTechnical Degree -6.625e-02 9.359e-01 6.385e-01 -0.104
EnvironmentSatisfaction -3.022e-01 7.392e-01 6.252e-02 -4.834
GenderMale 3.783e-01 1.460e+00 1.465e-01 2.582
HourlyRate 3.078e-03 1.003e+00 3.515e-03 0.876
JobInvolvement -2.948e-01 7.447e-01 9.185e-02 -3.209
JobLevel -1.070e-01 8.985e-01 2.716e-01 -0.394
JobRoleHuman Resources 1.667e+01 1.744e+07 1.648e+03 0.010
JobRoleLaboratory Technician 1.511e+00 4.530e+00 4.496e-01 3.360
JobRoleManager 7.133e-01 2.041e+00 7.914e-01 0.901
JobRoleManufacturing Director 6.167e-01 1.853e+00 5.055e-01 1.220
JobRoleResearch Director -5.654e-01 5.681e-01 9.647e-01 -0.586
JobRoleResearch Scientist 5.766e-01 1.780e+00 4.651e-01 1.240
JobRoleSales Executive 7.812e-01 2.184e+00 1.068e+00 0.732
JobRoleSales Representative 1.687e+00 5.405e+00 1.096e+00 1.540
JobSatisfaction -3.635e-01 6.952e-01 6.316e-02 -5.756
MaritalStatusMarried 1.682e-01 1.183e+00 2.247e-01 0.749
MaritalStatusSingle 7.689e-01 2.157e+00 2.799e-01 2.747
MonthlyIncome -2.678e-05 1.000e+00 6.661e-05 -0.402
MonthlyRate 1.474e-05 1.000e+00 9.880e-06 1.492
NumCompaniesWorked 2.484e-01 1.282e+00 2.958e-02 8.397
OverTimeYes 1.478e+00 4.385e+00 1.457e-01 10.148
PercentSalaryHike 1.354e-03 1.001e+00 3.072e-02 0.044
PerformanceRating 4.181e-03 1.004e+00 3.138e-01 0.013
RelationshipSatisfaction -2.035e-01 8.158e-01 6.612e-02 -3.078
StockOptionLevel -1.994e-01 8.192e-01 1.293e-01 -1.542
TotalWorkingYears -1.490e-01 8.615e-01 3.063e-02 -4.866
TrainingTimesLastYear -2.022e-01 8.169e-01 6.074e-02 -3.329
WorkLifeBalance -1.326e-01 8.758e-01 9.355e-02 -1.418
YearsInCurrentRole -3.429e-01 7.097e-01 3.520e-02 -9.743
YearsSinceLastPromotion 5.580e-02 1.057e+00 3.294e-02 1.694
YearsWithCurrManager -3.343e-01 7.158e-01 3.567e-02 -9.374
HighRateLowRate 3.898e-01 1.477e+00 2.857e-01 1.364
p
Age 0.097877
BusinessTravelTravel_Frequently 2.53e-05
BusinessTravelTravel_Rarely 0.005127
DailyRate 0.331244
DepartmentResearch & Development 0.992405
DepartmentSales 0.992324
DistanceFromHome 0.004555
Education2 0.091746
Education3 0.082626
Education4 0.485103
Education5 0.689376
EducationFieldLife Sciences 0.397995
EducationFieldMarketing 0.956789
EducationFieldMedical 0.367387
EducationFieldOther 0.539932
EducationFieldTechnical Degree 0.917352
EnvironmentSatisfaction 1.34e-06
GenderMale 0.009820
HourlyRate 0.381277
JobInvolvement 0.001331
JobLevel 0.693507
JobRoleHuman Resources 0.991926
JobRoleLaboratory Technician 0.000780
JobRoleManager 0.367421
JobRoleManufacturing Director 0.222527
JobRoleResearch Director 0.557789
JobRoleResearch Scientist 0.215075
JobRoleSales Executive 0.464374
JobRoleSales Representative 0.123522
JobSatisfaction 8.63e-09
MaritalStatusMarried 0.454110
MaritalStatusSingle 0.006014
MonthlyIncome 0.687687
MonthlyRate 0.135778
NumCompaniesWorked < 2e-16
OverTimeYes < 2e-16
PercentSalaryHike 0.964844
PerformanceRating 0.989370
RelationshipSatisfaction 0.002082
StockOptionLevel 0.123135
TotalWorkingYears 1.14e-06
TrainingTimesLastYear 0.000871
WorkLifeBalance 0.156242
YearsInCurrentRole < 2e-16
YearsSinceLastPromotion 0.090279
YearsWithCurrManager < 2e-16
HighRateLowRate 0.172430
Likelihood ratio test=843.6 on 47 df, p=< 2.2e-16
n= 1470, number of events= 237
* Dr AMITA SHARMA: Post Doc from Erasmus University, Rotterdam, the Netherlands; Assistant Professor, Institute of Agri Business Management, Swami Keshwanand Rajasthan Agricultural University, Bikaner (Raj), India; Blog: www.thinkingai.in
* ARUN KUMAR SHARMA: Machine Learning Enthusiast; 13 Years of Financial Services Marketing Experience; Blogger, Writer and Machine Learning Consultant; Certified Business Analytics Professional; Certified in Predictive Analytics, Indian Institute of Management, IIMx Bangalore; Certified in Macroeconomic Forecasting, International Monetary Fund (IMFx); Certified in Text Analytics, openSAP; Email: aks10000@gmail.com; Tel: 9468567418
---
title: "Understanding Employee Attrition with Survival Analysis"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    social: ["facebook", "twitter", "menu"]
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(survival)
library(ggplot2)
library(ggfortify)
library(ranger)
library(DT)
```
ACKNOWLEDGEMENT & DISCLAIMER
====================================================
ACKNOWLEDGEMENT
* The dashboard is for educational purposes only and should be used to understand survival analysis techniques. The outputs and analysis displayed in the flexdashboard should not be treated as medical advice or judgement.
DISCLAIMER
* > The authors bear no responsibility for any consequences emanating from the content covered in the dashboard, the web URL, or its links.
What is Employee Attrition? {data-navmenu="MENU"}
===================================================
Employee attrition occurs when the workforce of an organization diminishes gradually over time, for reasons such as retirement or resignation on professional or personal grounds.
LinkedIn's 2018 talent turnover report describes attrition across various sectors; interested readers can find it on the LinkedIn site. According to the 2019 Retention Report by Work Institute, the attrition rate could be as high as 35% by 2023. The report also points out that job satisfaction alone cannot keep employees loyal: career development, work-life balance and supervisor behavior are the key reasons for leaving an organization.
The challenge in HR retention is not to create one umbrella HR plan but to offer distinct HR models that are highly customizable at the micro or individual level. An HR manager needs a predictive model that estimates the propensity of attrition for each individual, so that they can plan well before an employee actually walks out. But how can this be done when HR managers are generally not prediction experts and have no background in mathematical modeling?
This flexdashboard is an attempt to address this problem. To begin, click the next item in the "MENU" tab.
```{r}
setwd("C:/Users/arunkumar/Desktop/R/flexdashboard/HR Employee Attrition")
attr=read.csv("attrition.csv", header=TRUE, stringsAsFactors = FALSE)
```
Overview of Dataset {data-navmenu="MENU"}
=========================================================
Row {.tabset}
---------------------------------------------------------
### Dataset Table
```{r}
DT::datatable(attr, filter="top")
```
### Variables in the Dataset
```{r}
print(colnames(attr))
```
### Structure of the Dataset
```{r}
print(str(attr))
```
### Summary Statistics of the Dataset
```{r}
print(summary(attr))
```
### Corrplot of Numeric Variables
```{r}
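# Keep only the numeric columns for the correlation matrix
cor_data=dplyr::select_if(attr, is.numeric)
# Drop the constant columns (positions 5 and 18); zero variance would make cor() return NA
cor_data=cor_data[, !(names(cor_data) %in% c("EmployeeCount","StandardHours"))]
cormat=cor(cor_data)
corrplot::corrplot(cormat, title = "Correlation Plot of All Numeric Variables")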
```
Data Manipulation {data-navmenu="MENU"}
========================================
Row {.tabset}
------------------------------------------
### Data Manipulation
```{r echo=TRUE}
attr$EmployeeCount=NULL # Removing variable because of constant value
attr$EmployeeNumber=NULL # Removing variable because it is simply a serial number
attr$StandardHours=NULL # Removing variable because of constant value
# Converting Attrition character into factor
attr$Attrition=as.factor(attr$Attrition)
# Attrition=data.frame(Attrition)
# Converting BusinessTravel into factor variable
attr$BusinessTravel=as.factor(attr$BusinessTravel)
# Creating HighRate variable based on DailyRate
attr$HighRate=as.factor(ifelse(attr$DailyRate>750,"HighRate","LowRate"))
# Converting Department variable into factor variable
attr$Department=as.factor(attr$Department)
# Converting Education into factor variable
attr$Education=as.factor(attr$Education)
# Converting EducationField variable into factor variable
attr$EducationField=as.factor(attr$EducationField)
# Converting Gender into factor variable
attr$Gender=as.factor(attr$Gender)
# Changing JobRole into factor variable
attr$JobRole=as.factor(attr$JobRole)
# Changing Marital Status into Factor Variable
attr$MaritalStatus=as.factor(attr$MaritalStatus)
# Removing Over18 Variable
attr$Over18=NULL
# Changing OverTime variable into factor variable
attr$OverTime=as.factor(attr$OverTime)
# Changing Attrition variable into numeric variable
Attrition=ifelse(attr$Attrition=="Yes",1,0)
attr$status=Attrition
attr$Attrition=NULL
time=attr$YearsAtCompany
attr$time=time
attr$YearsAtCompany=NULL
attr$BusinessTravel=as.factor(attr$BusinessTravel)
attr$Department=as.factor(attr$Department)
attr$EducationField=as.factor(attr$EducationField)
attr$Gender=as.factor(attr$Gender)
attr$JobRole=as.factor(attr$JobRole)
attr$MaritalStatus=as.factor(attr$MaritalStatus)
attr$OverTime=as.factor(attr$OverTime)
attr$Over18=NULL
```
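The chunk above converts each character column to a factor one line at a time. As a minimal sketch, the same conversion can be done in a single pass; the `toy` data frame below is a hypothetical three-row stand-in for `attr`, not the real dataset.

```r
# Hypothetical stand-in for attr (three made-up rows, illustrative only)
toy <- data.frame(
  Age      = c(41L, 49L, 37L),
  Gender   = c("Female", "Male", "Male"),
  OverTime = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert all of them to factors at once
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], factor)

str(toy)
```

Integer-coded columns such as `Education` would still need an explicit `as.factor()` call, as in the chunk above, because this sketch only touches character columns.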
### Data Table after Data Manipulation
```{r}
datatable(attr,
filter="top")
```
Survival Analysis: Explainer {data-navmenu="MENU"}
===================================================
Row
----------------------------------------
WHAT IS SURVIVAL ANALYSIS
Survival Analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems.
This topic is called reliability theory or reliability analysis in engineering, duration analysis or duration modelling in economics, and event history analysis in sociology. Survival analysis attempts to answer questions such as: what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?
More generally, survival analysis involves the modelling of time-to-event data; in this context, death or failure is considered an "event" in the survival analysis literature. Traditionally, only a single event occurs for each subject, after which the organism or mechanism is dead or broken. Recurring-event or repeated-event models relax that assumption. The study of recurring events is relevant in systems reliability and in many areas of social science and medical research. Survival analysis can also be used in non-traditional fields, such as predicting employee attrition, predicting customer churn, predictive maintenance and earthquake prediction.
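In R, time-to-event data of this kind is usually wrapped in a `Surv` object from the `survival` package, which pairs each duration with an event indicator. The sketch below uses four made-up employees (the `years` and `left` vectors are illustrative, not taken from the dataset) to show how subjects who have not yet experienced the event are carried as censored observations.

```r
library(survival)

# Hypothetical tenures in years and attrition flags (1 = left, 0 = still employed)
years <- c(6, 10, 0, 8)
left  <- c(1, 0, 1, 0)

# Surv() pairs each time with its event indicator
s <- Surv(time = years, event = left)

# Printing marks censored observations (event = 0) with a trailing "+"
print(s)
```

This is the same encoding the dashboard relies on when it calls `Surv(time, status)`.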
The case study in focus is estimating employee attrition with survival analysis techniques. In this flexdashboard, we are going to use the Kaplan-Meier model, the Cox proportional hazards model, a random forest model and the Weibull model.
Source: https://en.wikipedia.org/wiki/Survival_analysis
The dataset used in this flexdashboard can be accessed from the following link:
https://www.kaggle.com/patelprashant/employee-attrition
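The Weibull model is the one parametric technique named in this explainer. As a rough illustration of what such a fit looks like, the sketch below uses `survreg()` from the `survival` package on that package's bundled `lung` dataset. The dataset is only a stand-in chosen so the example runs without the Kaggle file, and the covariate `sex` is arbitrary.

```r
library(survival)

# Stand-in data (assumption): lung ships with the survival package
wb <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")

# Coefficients are on the log-time (accelerated failure time) scale:
# a positive coefficient means a longer expected time to event
print(coef(wb))

# survreg reports a scale parameter; the Weibull shape is its reciprocal
print(1 / wb$scale)
```

With the dashboard's data the analogous call would presumably be `survreg(Surv(time, status) ~ OverTime, data = attr, dist = "weibull")`, after deciding how to handle employees whose `time` is 0, since a Weibull fit requires strictly positive times.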
Survival Analysis: Employee Attrition {data-navmenu="MENU"}
======================================================
Row {.tabset}
------------------------------------------------------
### Kaplan Meier Model
```{r}
fit1=survfit(Surv(time, status)~1, data=attr)
summary(fit1)
```
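The `survival` column in the output of `summary(fit1)` is the Kaplan-Meier product-limit estimate: at each event time, the running survival probability is multiplied by the fraction of at-risk employees who did not leave. As a quick check, the sketch below recomputes the first three rows by hand from the `n.risk` and `n.event` values reported for years 0, 1 and 2 (1470 and 16, 1426 and 59, 1255 and 27).

```r
# n.risk and n.event for time = 0, 1, 2, copied from the summary(fit1) output
n_risk  <- c(1470, 1426, 1255)
n_event <- c(16, 59, 27)

# Product-limit estimate: S(t) = product over event times of (1 - d_i / n_i)
km <- cumprod(1 - n_event / n_risk)
print(round(km, 3))
```

The rounded values, 0.989, 0.948 and 0.928, agree with the `survival` column reported for those three years.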
### KM Model Plot
```{r}
plot(fit1, xlab="Survival Time of Employee in Years",ylab="Survival Probabilities", main="Plot of Kaplan Meier Model on Attrition", col="blue", lwd=2, ylim=c(0.4,1))
```
### KM Model With Gender Comparison
```{r}
fit2=survfit(Surv(time, status)~Gender,data=attr)
print(fit2)
```
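The printed medians alone do not say whether the two curves differ by more than chance. One common follow-up, which is not part of the original dashboard, is a log-rank test via `survdiff()`. The sketch below runs it on the `lung` dataset bundled with the `survival` package, purely as a stand-in so the example runs without the Kaggle file.

```r
library(survival)

# Stand-in data (assumption): lung ships with the survival package
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
print(lr)

# p-value from the chi-square statistic, with (number of groups - 1) degrees of freedom
p_value <- 1 - pchisq(lr$chisq, df = length(lr$n) - 1)
print(p_value)
```

For the dashboard's data, the analogous call would be `survdiff(Surv(time, status) ~ Gender, data = attr)`.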
### KM Model Plot with Gender Comparison
```{r}
plot(fit2, col=1:2, lwd=2, mark.time=TRUE,
xlab="Survival Time of Employee in Years", ylab="Survival Probabilities", main="KM Model Plot with Gender Comparison")
legend(32, 1, c("Female", "Male"),
col=1:2, lwd=2, bty='n')
```
### KM Model With Performance Rating
```{r}
fit3=survfit(Surv(time, status)~PerformanceRating, data=attr)
fit3
```
### KM Model Plot with Performance Rating
```{r}
# fun='event' plots 1 - S(t), the cumulative probability of attrition
plot(fit3,fun='event',col=1:2,lty=1:2,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with Performance Rating")
legend("topleft", paste("PerformanceRating", 3:4, sep=' = '), col=1:2,
lty=1:2, lwd=2, bty='n')
```
### KM Model With OverTime
```{r}
fit4=survfit(Surv(time, status)~OverTime, data=attr)
fit4
```
### KM Model Plot with OverTime
```{r}
plot(fit4,fun='event',col=3:4,lty=1:2,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with OverTime")
# Strata follow the factor levels: "No" first, then "Yes"
legend("topleft",c("No OverTime","OverTime"),
lty=1:2,col=3:4, lwd=2, bty='n')
```
### KM Model With Marital Status
```{r}
fit5=survfit(Surv(time, status)~MaritalStatus, data=attr)
fit5
```
### KM Model Plot with Marital Status
```{r}
plot(fit5,fun='event',col=5:7,lty=1:3,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with Marital Status")
legend("topleft",c("Divorced","Married","Single"),
lty=1:3,col=5:7, lwd=2, bty='n')
```
### KM Model With Job Role
```{r}
fit6=survfit(Surv(time, status)~JobRole, data=attr)
fit6
```
### KM Model Plot with Job Role
```{r}
plot(fit6,fun='event',col=1:9,lty=1:9,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with Job Role")
legend("topleft",c("Healthcare Representative","Human Resources","Laboratory Technician", "Manager","Manufacturing Director","Research Director","Research Scientist","Sales Executive","Sales Representative"),
lty=1:9,col=1:9, lwd=2, bty='n', cex=0.5)
```
### KM Model With WorkLifeBalance
```{r}
fit7=survfit(Surv(time, status)~WorkLifeBalance, data=attr)
fit7
```
### KM Model Plot with WorkLifeBalance
```{r}
plot(fit7,fun='event',col=11:14,lty=1:4,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with Work Life Balance")
legend("topleft",c("WLB=1","WLB=2","WLB=3","WLB=4"),
lty=1:4,col=11:14, lwd=2, bty='n', cex=0.5)
```
### KM Model With BusinessTravel
```{r}
fit8=survfit(Surv(time, status)~BusinessTravel, data=attr)
fit8
```
### KM Model Plot with BusinessTravel
```{r}
plot(fit8,fun='event',col=21:23,lty=1:3,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with Business Travel")
legend("topleft",c("Non-Travel","Travel-Frequently","Travel-Rarely"),
lty=1:3,col=21:23, lwd=2, bty='n', cex=0.5)
```
### KM Model With Job Involvement
```{r}
fit9=survfit(Surv(time, status)~JobInvolvement, data=attr)
fit9
```
### KM Model Plot with Job Involvement
```{r}
plot(fit9,fun='event',col=21:24,lty=1:4,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with Job Involvement")
legend("topleft",c("JI=1","JI=2","JI=3","JI=4"),
lty=1:4,col=21:24, lwd=2, bty='n', cex=0.5)
```
### KM Model With Job Level
```{r}
# Stratify by JobLevel (levels 1 to 5), matching the heading and the legend below
fit10=survfit(Surv(time, status)~JobLevel, data=attr)
fit10
```
### KM Model Plot with Job Level
```{r}
plot(fit10,fun='event',col=21:25,lty=1:5,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with Job Level")
legend("topleft",c("JL=1","JL=2","JL=3","JL=4","JL=5"),
lty=1:5,col=21:25, lwd=2, bty='n', cex=0.5)
```
### KM Model With Job Satisfaction
```{r}
fit11=survfit(Surv(time, status)~JobSatisfaction, data=attr)
fit11
```
### KM Model Plot with Job Satisfaction
```{r}
plot(fit11,fun='event',col=1:4,lty=1:4,lwd=2,
xlab="Years at Company", ylab="Cumulative Probability of Attrition", main="KM Model Plot with Job Satisfaction")
legend("topleft",c("JS=1","JS=2","JS=3","JS=4"),
lty=1:4,col=1:4, lwd=2, bty='n', cex=0.5)
```
### COX PH Model Multivariate Analysis
```{r}
fit12=coxph(Surv(time, status)~., data=attr)
fit12
```
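In the Cox output, `exp(coef)` is the hazard ratio: the multiplicative change in the attrition hazard for a one-unit increase in a covariate (or relative to the reference level of a factor), holding the other covariates fixed. The sketch below reproduces the hazard ratio for `OverTimeYes` from its reported `coef` (1.478) and `se(coef)` (0.1457) and adds an approximate 95% Wald interval, which the printed output does not show.

```r
# Values copied from the coxph output for OverTimeYes
coef_ot <- 1.478
se_ot   <- 0.1457

# Hazard ratio and approximate 95% Wald confidence interval
hr <- exp(coef_ot)
ci <- exp(coef_ot + c(-1, 1) * qnorm(0.975) * se_ot)
print(round(c(HR = hr, lower = ci[1], upper = ci[2]), 2))
```

On the fitted object itself, `summary(fit12)$conf.int` or `exp(confint(fit12))` should give the corresponding intervals for every term. The very large coefficients and standard errors reported for the `Department` terms and `JobRoleHuman Resources` (standard errors around 1.6e+03) suggest those terms are not well identified in this fit, so their hazard ratios are probably best not read literally.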
About Us {data-navmenu="MENU"}
==============================================
### DASHBOARD PREPARED BY (CONTACT FOR MACHINE LEARNING TRAINING, COACHING & CONSULTING)
* Dr AMITA SHARMA
Post Doc from Erasmus University, Rotterdam, the Netherlands
Assistant Professor
Institute of Agri Business Management,
Swami Keshwanand Rajasthan Agricultural University,
Bikaner (Raj),India
Blog: www.thinkingai.in
* ARUN KUMAR SHARMA
Machine Learning Enthusiast
13 Years of Financial Services Marketing Experience
Blogger, Writer and Machine Learning Consultant
Certified Business Analytics Professional
Certified in Predictive Analytics, Indian Institute of Management, IIMx Bangalore
Certified in Macroeconomic Forecasting, International Monetary Fund(IMFx)
Certified in Text Analytics, openSAP
Email: aks10000@gmail.com
Tel:9468567418