IBM HR Analytics Employee Attrition and Performance

1. OVERVIEW

Employee attrition has puzzled the Human Resource managers of many companies for a long time. In this project, we analyse which factors are associated with employee retention, and which of them matter most. We use a dataset that is published by the Human Resource department of IBM.

2. ANALYSIS

2(a). Basic Analyses

The data set is loaded as follows:

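# Set the working directory to the folder that holds the CSV file
setwd("C:\\Users\\Tejajay\\Desktop\\Internship\\3. Data Analytics")
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0), which the plot() calls later in the analysis rely on
ibmhr <- read.csv("IBMHRRetentionData.csv", stringsAsFactors = TRUE)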

We take the attrition history of 1470 employees, out of which we check how many have been retained by the company.

length(ibmhr$Attrition)
## [1] 1470
table(ibmhr$Attrition)
## 
##   No  Yes 
## 1233  237

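We can see that 237 employees have been retained, whereas 1233 employees have been let go or have left. This reading treats Attrition = 'Yes' as "retained"; in the widely circulated version of this data set, 'Yes' instead marks employees who left the company, in which case the two group labels used in this analysis would be swapped. Let us create a subset of employees that have been retained, and see how different factors have played a role in their retention.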

retained <- ibmhr[ which(ibmhr$Attrition=='Yes'), ]

Let us look at some summary statistics of these employees.

Average age:

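# The Age column is read as "ï..Age" because the CSV starts with a UTF-8 byte-order mark; read.csv(..., fileEncoding = "UTF-8-BOM") would give plain "Age"
mean(retained$ï..Age)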
## [1] 33.60759

So, the average age of a retained employee is about 33.6 years, which is fairly early in a career. Age alone therefore does not explain retention, and other factors are likely to matter more.

Average educational qualification:

mean(retained$Education)
## [1] 2.839662

On a scale of 1 to 5 (1: below college, 2: college, 3: bachelor's, 4: master's, 5: doctorate), the average is about 2.8, i.e. between college and a bachelor's degree. A mean alone does not show how many employees hold each qualification.

Educational Field data:

table(retained$EducationField)
## 
##  Human Resources    Life Sciences        Marketing          Medical 
##                7               89               35               63 
##            Other Technical Degree 
##               11               32

Clearly, the largest number of retained employees come from a Life Sciences background (89), followed by employees with an educational background in medical sciences (63).

We also look at factors like gender and salary to analyse the difference between retained employees and employees who were fired or moved on to other jobs.

mean(retained$MonthlyIncome)
## [1] 4787.093
notretained <- ibmhr[ which(ibmhr$Attrition=='No'), ]
mean(notretained$MonthlyIncome)
## [1] 6832.74
table(retained$Gender)
## 
## Female   Male 
##     87    150
table(notretained$Gender)
## 
## Female   Male 
##    501    732

Interestingly, employees who have not been retained have a higher average monthly income (about 6833) than those who have been retained (about 4787). In both groups, there are more males than females.
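The same kind of comparison can be made without building two separate subsets. The sketch below is only an illustration: it uses a tiny made-up data frame (`toy`, which is not the IBM data) with the same column names, and base R's `tapply()` and `table()` to summarise by `Attrition` level.

```r
# Toy data for illustration only; these values are made up and are NOT from the IBM file.
toy <- data.frame(
  Attrition     = factor(c("Yes", "No", "No", "Yes", "No", "No")),
  MonthlyIncome = c(3000, 6500, 7200, 4100, 5800, 9000),
  Gender        = factor(c("Male", "Female", "Male", "Female", "Male", "Male"))
)

# Mean monthly income within each Attrition level
income_by_group <- tapply(toy$MonthlyIncome, toy$Attrition, mean)
print(income_by_group)

# Gender counts within each Attrition level (rows: Attrition, columns: Gender)
gender_by_group <- table(toy$Attrition, toy$Gender)
print(gender_by_group)
```

Applied to `ibmhr`, the same two calls would reproduce the four results above in one pass, assuming the column names shown in this report.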

Let us look at the data, to see what kind of job roles have retained more people, and what roles have let go of more people (this could be an indicator of the demand associated with that particular job role).

table(retained$JobRole)
## 
## Healthcare Representative           Human Resources 
##                         9                        12 
##     Laboratory Technician                   Manager 
##                        62                         5 
##    Manufacturing Director         Research Director 
##                        10                         2 
##        Research Scientist           Sales Executive 
##                        47                        57 
##      Sales Representative 
##                        33
table(notretained$JobRole)
## 
## Healthcare Representative           Human Resources 
##                       122                        40 
##     Laboratory Technician                   Manager 
##                       197                        97 
##    Manufacturing Director         Research Director 
##                       135                        78 
##        Research Scientist           Sales Executive 
##                       245                       269 
##      Sales Representative 
##                        50

As per the tables above, by raw count, laboratory technicians are the largest group among retained employees, whereas sales executives are the largest group among employees who were let go.
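Raw counts depend on how many people hold each role, so it can help to look at shares within each role as well. The sketch below re-enters the counts printed in the two tables above (no new data) and uses `prop.table()` to compute, for each role, the share of employees in the Attrition == 'Yes' group, which this report calls retained. The object names (`counts`, `share_yes`) are chosen here for illustration.

```r
# Counts copied from the two tables above
roles <- c("Healthcare Representative", "Human Resources", "Laboratory Technician",
           "Manager", "Manufacturing Director", "Research Director",
           "Research Scientist", "Sales Executive", "Sales Representative")
yes_counts <- c(9, 12, 62, 5, 10, 2, 47, 57, 33)          # table(retained$JobRole)
no_counts  <- c(122, 40, 197, 97, 135, 78, 245, 269, 50)   # table(notretained$JobRole)

counts <- rbind(Yes = yes_counts, No = no_counts)
colnames(counts) <- roles

# Share of each role's employees that fall in the Attrition == "Yes" group
share_yes <- prop.table(counts, margin = 2)["Yes", ]
print(round(sort(share_yes, decreasing = TRUE), 3))
```

By share rather than raw count, Sales Representative has the highest proportion in the 'Yes' group (33 of 83) and Research Director the lowest (2 of 80).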

Now, let us look at benefits offered to the employees (work-life balance, business travel, salary hike, regularity of promotion, stock options etc.) as well as factors such as performance rating and job and relationship satisfaction, which influence attrition.

mean(retained$WorkLifeBalance)
## [1] 2.658228
mean(notretained$WorkLifeBalance)
## [1] 2.781022
table(retained$BusinessTravel)
## 
##        Non-Travel Travel_Frequently     Travel_Rarely 
##                12                69               156
table(notretained$BusinessTravel)
## 
##        Non-Travel Travel_Frequently     Travel_Rarely 
##               138               208               887
mean(retained$PercentSalaryHike)
## [1] 15.09705
mean(notretained$PercentSalaryHike)
## [1] 15.23114
mean(retained$YearsSinceLastPromotion)
## [1] 1.945148
mean(notretained$YearsSinceLastPromotion)
## [1] 2.234388
mean(retained$StockOptionLevel)
## [1] 0.5274262
mean(notretained$StockOptionLevel)
## [1] 0.8450933
mean(retained$StandardHours)
## [1] 80
mean(notretained$StandardHours)
## [1] 80
mean(retained$PerformanceRating)
## [1] 3.156118
mean(notretained$PerformanceRating)
## [1] 3.153285
mean(retained$RelationshipSatisfaction)
## [1] 2.599156
mean(notretained$RelationshipSatisfaction)
## [1] 2.733982

As seen in the above analysis, retained and non-retained employees have the same standard working hours and nearly the same performance rating, relationship satisfaction, work-life balance and percent salary hike. A few factors show a visible difference. Years since last promotion is lower for retained employees (about 1.9) than for non-retained employees (about 2.2), i.e. retained employees have, on average, been promoted more recently. Stock option level is also lower for retained employees (about 0.53 against 0.85), and the share who travel frequently is higher among retained employees (69 of 237 against 208 of 1233).
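To put the group means on a common footing, one option is to express each gap relative to the non-retained mean. The sketch below re-enters the means printed above (no new data); the object names `means` and `rel_gap` are chosen here for illustration.

```r
# Group means copied from the output above (first value: retained, second: notretained)
means <- rbind(
  WorkLifeBalance          = c(2.658228, 2.781022),
  PercentSalaryHike        = c(15.09705, 15.23114),
  YearsSinceLastPromotion  = c(1.945148, 2.234388),
  StockOptionLevel         = c(0.5274262, 0.8450933),
  PerformanceRating        = c(3.156118, 3.153285),
  RelationshipSatisfaction = c(2.599156, 2.733982)
)
colnames(means) <- c("retained", "notretained")

# Relative gap: (retained - notretained) / notretained, in percent
rel_gap <- 100 * (means[, "retained"] - means[, "notretained"]) / means[, "notretained"]

# Print the gaps ordered from largest to smallest in absolute size
print(round(rel_gap[order(abs(rel_gap), decreasing = TRUE)], 1))
```

On these reported means, the largest relative gap is in stock option level (about 38% lower for the retained group), followed by years since last promotion (about 13% lower). A gap in means says nothing by itself about statistical significance.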

2(b). Graphical Analysis and Testing

So far, years since last promotion stands out as a variable that differs between retained and non-retained employees. To examine it further, we compare it with other factors such as performance rating, job satisfaction, relationship satisfaction and standard hours. For each pair, the null hypothesis is that the two variables have equal means, and we run a Welch two-sample t-test.

t.test(ibmhr$PerformanceRating, ibmhr$YearsSinceLastPromotion)
## 
##  Welch Two Sample t-test
## 
## data:  ibmhr$PerformanceRating and ibmhr$YearsSinceLastPromotion
## t = 11.422, df = 1505.8, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.8000935 1.1318793
## sample estimates:
## mean of x mean of y 
##  3.153741  2.187755
t.test(ibmhr$JobSatisfaction, ibmhr$YearsSinceLastPromotion)
## 
##  Welch Two Sample t-test
## 
## data:  ibmhr$JobSatisfaction and ibmhr$YearsSinceLastPromotion
## t = 6.088, df = 1808.5, p-value = 1.393e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3665894 0.7150433
## sample estimates:
## mean of x mean of y 
##  2.728571  2.187755
t.test(ibmhr$RelationshipSatisfaction, ibmhr$YearsSinceLastPromotion)
## 
##  Welch Two Sample t-test
## 
## data:  ibmhr$RelationshipSatisfaction and ibmhr$YearsSinceLastPromotion
## t = 5.9163, df = 1795.6, p-value = 3.935e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3506173 0.6983623
## sample estimates:
## mean of x mean of y 
##  2.712245  2.187755
t.test(ibmhr$StandardHours, ibmhr$YearsSinceLastPromotion)
## 
##  Welch Two Sample t-test
## 
## data:  ibmhr$StandardHours and ibmhr$YearsSinceLastPromotion
## t = 925.81, df = 1469, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  77.64738 77.97711
## sample estimates:
## mean of x mean of y 
## 80.000000  2.187755

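In all the above tests, the p-value is far below 0.05, so we reject the null hypothesis that the two means are equal in favour of the alternative that the true difference in means is not equal to 0. This tells us only that these variables have different average values, which is expected since they are measured on different scales (ratings, hours, years). A two-sample t-test does not measure association, so these results neither establish nor rule out a relation between years since last promotion and the other factors.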
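A two-sample t-test between two different columns compares their means, not their association. Two alternatives that address the question more directly are (a) a Welch t-test of `YearsSinceLastPromotion` across the two `Attrition` groups and (b) a correlation test between two columns. The sketch below shows both on a small made-up data frame (`toy`, which is not the IBM data); the choice of Spearman correlation is an assumption made here because the satisfaction ratings are ordinal.

```r
# Toy data for illustration only; values are made up and NOT taken from the IBM file.
toy <- data.frame(
  Attrition               = factor(rep(c("Yes", "No"), each = 6)),
  YearsSinceLastPromotion = c(0, 1, 1, 2, 0, 3,  2, 4, 1, 5, 3, 7),
  JobSatisfaction         = c(2, 3, 1, 4, 2, 3,  3, 4, 2, 4, 3, 1)
)

# (a) Do the two Attrition groups differ in mean years since last promotion?
#     The formula interface splits the numeric column by the grouping factor.
by_group <- t.test(YearsSinceLastPromotion ~ Attrition, data = toy)
print(by_group$estimate)
print(by_group$p.value)

# (b) Is years since last promotion associated with job satisfaction?
#     Spearman rank correlation is used for the ordinal rating; exact = FALSE
#     requests the asymptotic p-value, which avoids the warning about ties.
assoc <- cor.test(toy$YearsSinceLastPromotion, toy$JobSatisfaction,
                  method = "spearman", exact = FALSE)
print(assoc$estimate)
print(assoc$p.value)
```

On the real data, the same calls with `data = ibmhr` would test the group difference and the association directly, assuming the column names used in this report.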

Now let us consider a so-far unconsidered factor: the employee’s experience at the company itself. We shall look at the average experience of a retained employee vs non-retained employee.

mean(retained$YearsAtCompany)
## [1] 5.130802
mean(notretained$YearsAtCompany)
## [1] 7.369019
mean(retained$YearsInCurrentRole)
## [1] 2.902954
mean(notretained$YearsInCurrentRole)
## [1] 4.484185

We can see that, on average, an employee who moves to another company has about 2.2 years more experience at the company (7.4 years) than a retained employee (5.1 years). An employee who moves has also, on average, spent about 1.6 years longer in the current role (4.5 years against 2.9 years). These are group averages, so they suggest, but do not prove, that employees become more likely to move after several years in the same position.

We shall now try to see how different (hitherto-unconsidered) variables are related to attrition (for both retained and non-retained employees), by drawing plots. This will help us develop new insights regarding the data.

plot(ibmhr$Attrition, ibmhr$EnvironmentSatisfaction, main = "Environment Satisfaction vs Attrition", ylab = "Environment Satisfaction", xlab = "Attrition")

plot(ibmhr$Attrition, ibmhr$Gender, main = "Gender vs Attrition", ylab = "Gender", xlab = "Attrition")

plot(ibmhr$Attrition, ibmhr$JobInvolvement, main = "Job Involvement vs Attrition", ylab = "Job Involvement", xlab = "Attrition")

plot(ibmhr$Attrition, ibmhr$MaritalStatus, main = "Marital Status vs Attrition", ylab = "Marital Status", xlab = "Attrition")

plot(ibmhr$Attrition, ibmhr$YearsAtCompany, main = "Experience at company vs Attrition", ylab = "Years at Company", xlab = "Attrition")

plot(ibmhr$Attrition, ibmhr$MonthlyIncome, main = "Income vs Attrition", ylab = "Income", xlab = "Attrition")

plot(ibmhr$Attrition, ibmhr$NumCompaniesWorked, main = "Number of Companies Worked At vs Attrition", ylab = "Number of Companies Worked At", xlab = "Attrition")

plot(ibmhr$Attrition, ibmhr$TrainingTimesLastYear, main = "Frequency of Training and Skill Development vs Attrition", ylab = "Training times last year", xlab = "Attrition")

As we see in the above plots, employees who have received less training tend to stay, whereas those who have received more training are more often among those who left. The plots show an association only and do not establish that training caused the departure.

3. CONCLUSIONS

From what we have seen so far, we have come to the following conclusions:
(1) Retained employees have, on average, worked for a shorter period at the company (5.1 years against 7.4) and in their current position (2.9 years against 4.5) than non-retained employees.
(2) Retained employees have, on average, been promoted more recently: their mean years since last promotion (1.9) is lower than that of employees who leave the company (2.2).
(3) Employees who stay have a lower average monthly income (about 4787) than their counterparts who leave the company (about 6833).
(4) Employees who stay have, on average, received less training in the past year than employees who have left the company.