Employee attrition has long puzzled the human resource managers of many companies. In this project, we analyse which factors are associated with employee retention, and which of them matter most. We use a dataset published by the Human Resources department of IBM.

The data set is loaded as follows:

```
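# Set the working directory to the folder that holds the CSV file
setwd("C:\\Users\\Tejajay\\Desktop\\Internship\\3. Data Analytics")
# stringsAsFactors = TRUE keeps text columns as factors, which the plot() calls later rely on
ibmhr <- read.csv("IBMHRRetentionData.csv", stringsAsFactors = TRUE)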
```

The data covers the attrition history of 1470 employees; we first check how many of them have been retained by the company.

`length(ibmhr$Attrition)`

`## [1] 1470`

`table(ibmhr$Attrition)`

```
##
## No Yes
## 1233 237
```
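The counts are easier to compare as shares of the workforce. A minimal sketch, using only the two counts printed above (with the data frame loaded, `prop.table(table(ibmhr$Attrition))` should give the same result):

```r
# Counts copied from the table(ibmhr$Attrition) output above
attrition_counts <- c(No = 1233, Yes = 237)

# Convert counts to shares of all 1470 employees
attrition_share <- prop.table(attrition_counts)

# Express as percentages, rounded to one decimal place
round(100 * attrition_share, 1)
```

The `Yes` group makes up roughly 16% of the 1470 employees, so the two groups compared in the rest of the analysis are quite unequal in size.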

Treating `Attrition == 'Yes'` as the retained group, as this analysis does throughout, 237 employees have been retained whereas 1233 have left or been let go. Let us create a subset of the retained employees and see how different factors have played a role in their retention.

`retained <- ibmhr[ which(ibmhr$Attrition=='Yes'), ]`

Let us look at some summary statistics of these employees.

Average age:

`mean(retained$ï..Age)`

`## [1] 33.60759`

So, the average retained employee is around 33.6 years old, which is fairly early in a career. Age alone is therefore unlikely to explain retention, and we go on to look at other factors.
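The `ï..` prefix in `ï..Age` is what R typically produces when a CSV file begins with a UTF-8 byte-order mark and is read without declaring the encoding. The sketch below assumes that is the cause here; it builds a small hypothetical file (not the IBM data) with such a mark and reads it with `fileEncoding = "UTF-8-BOM"`, so the column comes out as plain `Age`.

```r
# Hypothetical two-row file that starts with a UTF-8 byte-order mark (EF BB BF)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Age,Attrition\n41,Yes\n49,No\n")), tmp)

# "UTF-8-BOM" tells R to drop the byte-order mark while reading
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")

names(demo)      # the first column is named "Age", with no prefix
mean(demo$Age)   # the column can now be referred to as demo$Age

unlink(tmp)      # remove the temporary file
```

If the same option were passed to the `read.csv()` call that loads the real file, the age column would be referred to as `Age` rather than `ï..Age` in the rest of the code.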

Average educational qualification:

`mean(retained$Education)`

`## [1] 2.839662`

Education is coded on a scale of 1 to 5, where 1 = below college, 2 = college, 3 = bachelor's, 4 = master's and 5 = doctorate. The mean of about 2.8 lies just below the bachelor's level, which suggests that most retained employees have attended college and many hold a bachelor's degree.

Educational Field data:

`table(retained$EducationField)`

```
##
##  Human Resources    Life Sciences        Marketing          Medical
##                7               89               35               63
##            Other Technical Degree
##               11               32
```

The largest group of retained employees comes from a Life Sciences background (89), followed by employees with an educational background in medical sciences (63).

We also look at gender and monthly income to compare retained employees with employees who were fired or moved on to other jobs.

`mean(retained$MonthlyIncome)`

`## [1] 4787.093`

```
notretained <- ibmhr[ which(ibmhr$Attrition=='No'), ]
mean(notretained$MonthlyIncome)
```

`## [1] 6832.74`

`table(retained$Gender)`

```
##
## Female Male
## 87 150
```

`table(notretained$Gender)`

```
##
## Female Male
## 501 732
```

Interestingly, employees who have not been retained have a higher average monthly income (about 6833) than those who have been retained (about 4787). In both groups, men outnumber women.
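Because the two groups differ greatly in size, the gender counts are easier to compare as within-group shares. The sketch below uses only the four counts printed above; the chi-squared test is one possible way to check whether the gender split differs between the groups, and it is an addition to the original analysis.

```r
# Counts copied from the two gender tables above
gender <- matrix(c(87, 150,
                   501, 732),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group  = c("retained", "notretained"),
                                 gender = c("Female", "Male")))

# Row-wise shares: the fraction of each group that is female or male
gender_share <- prop.table(gender, margin = 1)
round(100 * gender_share, 1)

# Chi-squared test of independence between group and gender
gender_test <- chisq.test(gender)
gender_test$p.value
```

Women make up about 37% of the retained group and about 41% of the non-retained group, so the gender mix is similar in both.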

Let us look at the data, to see what kind of job roles have retained more people, and what roles have let go of more people (this could be an indicator of the demand associated with that particular job role).

`table(retained$JobRole)`

```
##
## Healthcare Representative           Human Resources
##                         9                        12
##     Laboratory Technician                   Manager
##                        62                         5
##    Manufacturing Director         Research Director
##                        10                         2
##        Research Scientist           Sales Executive
##                        47                        57
##      Sales Representative
##                        33
```

`table(notretained$JobRole)`

```
##
## Healthcare Representative           Human Resources
##                       122                        40
##     Laboratory Technician                   Manager
##                       197                        97
##    Manufacturing Director         Research Director
##                       135                        78
##        Research Scientist           Sales Executive
##                       245                       269
##      Sales Representative
##                        50
```

As per the tables above, laboratory technicians form the largest group among retained employees (62), whereas sales executives form the largest group among employees who were not retained (269).
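Raw counts favour job roles that simply have many employees. A complementary view, sketched below with the counts copied from the two tables above, is the share of each role that falls in the `retained` subset.

```r
# Job roles in the order they appear in the tables above
roles <- c("Healthcare Representative", "Human Resources",
           "Laboratory Technician", "Manager",
           "Manufacturing Director", "Research Director",
           "Research Scientist", "Sales Executive",
           "Sales Representative")

# Counts copied from table(retained$JobRole) and table(notretained$JobRole)
retained_n    <- c(9, 12, 62, 5, 10, 2, 47, 57, 33)
notretained_n <- c(122, 40, 197, 97, 135, 78, 245, 269, 50)

# Share of each role's employees that fall in the retained subset
share_retained <- setNames(retained_n / (retained_n + notretained_n), roles)

# Percentages, largest share first
round(100 * sort(share_retained, decreasing = TRUE), 1)
```

On these counts the largest share belongs to sales representatives (33 of 83), ahead of laboratory technicians (62 of 259), so ranking roles by share and by raw count need not give the same answer.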

Now, let us look at benefits offered to the employees (work-life balance, business travel, salary hike, regularity of promotion, stock options etc.) as well as factors such as performance rating and job and relationship satisfaction, which influence attrition.

`mean(retained$WorkLifeBalance)`

`## [1] 2.658228`

`mean(notretained$WorkLifeBalance)`

`## [1] 2.781022`

`table(retained$BusinessTravel)`

```
##
## Non-Travel Travel_Frequently Travel_Rarely
## 12 69 156
```

`table(notretained$BusinessTravel)`

```
##
## Non-Travel Travel_Frequently Travel_Rarely
## 138 208 887
```

`mean(retained$PercentSalaryHike)`

`## [1] 15.09705`

`mean(notretained$PercentSalaryHike)`

`## [1] 15.23114`

`mean(retained$YearsSinceLastPromotion)`

`## [1] 1.945148`

`mean(notretained$YearsSinceLastPromotion)`

`## [1] 2.234388`

`mean(retained$StockOptionLevel)`

`## [1] 0.5274262`

`mean(notretained$StockOptionLevel)`

`## [1] 0.8450933`

`mean(retained$StandardHours)`

`## [1] 80`

`mean(notretained$StandardHours)`

`## [1] 80`

`mean(retained$PerformanceRating)`

`## [1] 3.156118`

`mean(notretained$PerformanceRating)`

`## [1] 3.153285`

`mean(retained$RelationshipSatisfaction)`

`## [1] 2.599156`

`mean(notretained$RelationshipSatisfaction)`

`## [1] 2.733982`
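The pairwise `mean()` calls above can be condensed into a single call that returns one row of means per attrition group. This is a minimal sketch on a small hypothetical data frame named `toy` (the values are invented and stand in for `ibmhr`); `aggregate()` is base R.

```r
# Hypothetical stand-in for ibmhr; the real data frame comes from read.csv()
toy <- data.frame(
  Attrition               = c("Yes", "No", "No", "Yes", "No", "No"),
  WorkLifeBalance         = c(2, 3, 3, 3, 2, 4),
  PercentSalaryHike       = c(11, 15, 14, 18, 12, 20),
  YearsSinceLastPromotion = c(0, 1, 3, 2, 5, 0)
)

# Columns to summarise
vars <- c("WorkLifeBalance", "PercentSalaryHike", "YearsSinceLastPromotion")

# One row per attrition group, one column of means per variable
group_means <- aggregate(toy[vars], by = list(Attrition = toy$Attrition), FUN = mean)
group_means
```

Applied to `ibmhr` with the full list of columns, the same call would put the two groups side by side in one table instead of many separate outputs.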

As the figures above show, retained and non-retained employees have identical standard working hours (80) and nearly identical average performance rating, percent salary hike, relationship satisfaction and work-life balance. A few factors do differ. Retained employees travel frequently more often (69 of 237, about 29%, versus 208 of 1233, about 17%), and their average stock option level is lower (0.53 versus 0.85). Years since last promotion is also lower for retained employees (1.9 versus 2.2), i.e. retained employees have, on average, been promoted more recently than non-retained ones.

Of these differences, we look more closely at years since last promotion. We take as the null hypothesis that “years since last promotion” is not related to factors such as performance rating, job satisfaction, relationship satisfaction and standard hours, and run a t-test against each of them.

`t.test(ibmhr$PerformanceRating, ibmhr$YearsSinceLastPromotion)`

```
##
## Welch Two Sample t-test
##
## data: ibmhr$PerformanceRating and ibmhr$YearsSinceLastPromotion
## t = 11.422, df = 1505.8, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.8000935 1.1318793
## sample estimates:
## mean of x mean of y
## 3.153741 2.187755
```

`t.test(ibmhr$JobSatisfaction, ibmhr$YearsSinceLastPromotion)`

```
##
## Welch Two Sample t-test
##
## data: ibmhr$JobSatisfaction and ibmhr$YearsSinceLastPromotion
## t = 6.088, df = 1808.5, p-value = 1.393e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.3665894 0.7150433
## sample estimates:
## mean of x mean of y
## 2.728571 2.187755
```

`t.test(ibmhr$RelationshipSatisfaction, ibmhr$YearsSinceLastPromotion)`

```
##
## Welch Two Sample t-test
##
## data: ibmhr$RelationshipSatisfaction and ibmhr$YearsSinceLastPromotion
## t = 5.9163, df = 1795.6, p-value = 3.935e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.3506173 0.6983623
## sample estimates:
## mean of x mean of y
## 2.712245 2.187755
```

`t.test(ibmhr$StandardHours, ibmhr$YearsSinceLastPromotion)`

```
##
## Welch Two Sample t-test
##
## data: ibmhr$StandardHours and ibmhr$YearsSinceLastPromotion
## t = 925.81, df = 1469, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 77.64738 77.97711
## sample estimates:
## mean of x mean of y
## 80.000000 2.187755
```

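In all the above tests, the p-value is far below 0.05, so the null hypothesis of the test (that the two columns have equal means) is rejected, and the reported alternative hypothesis, “true difference in means is not equal to 0”, is the one the data support. However, a two-sample t-test only compares the means of two columns, which here are measured on different scales (for example, 80 standard hours against roughly 2 years since last promotion). It does not measure association, so these results neither confirm nor rule out a relation between years since last promotion and the other factors considered.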
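Two questions are mixed together in this step: whether years since last promotion differs between the attrition groups, and whether it is associated with another variable. A minimal sketch of one test for each follows. The data are simulated (the `rpois()` rates and group sizes are arbitrary assumptions, not the IBM records), and the Spearman correlation test is a swapped-in technique, not the method used above.

```r
set.seed(1)  # reproducible hypothetical data, not the IBM records

toy <- data.frame(
  Attrition               = rep(c("No", "Yes"), times = c(60, 40)),
  YearsSinceLastPromotion = c(rpois(60, 2.2), rpois(40, 1.9)),
  PerformanceRating       = sample(3:4, 100, replace = TRUE)
)

# Question 1: does mean years since last promotion differ between groups?
# The formula interface splits one numeric column by a two-level grouping column.
by_group <- t.test(YearsSinceLastPromotion ~ Attrition, data = toy)
by_group$p.value

# Question 2: is years since last promotion associated with performance rating?
# A rank correlation suits ordinal ratings; exact = FALSE because of tied values.
assoc <- cor.test(toy$YearsSinceLastPromotion, toy$PerformanceRating,
                  method = "spearman", exact = FALSE)
assoc$estimate
```

With the real data, the first call would become `t.test(YearsSinceLastPromotion ~ Attrition, data = ibmhr)`, which compares the same variable across the two groups rather than two different variables across all employees.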

Now let us consider a so-far unconsidered factor: the employee’s experience at the company itself. We shall look at the average experience of a retained employee vs non-retained employee.

`mean(retained$YearsAtCompany)`

`## [1] 5.130802`

`mean(notretained$YearsAtCompany)`

`## [1] 7.369019`

`mean(retained$YearsInCurrentRole)`

`## [1] 2.902954`

`mean(notretained$YearsInCurrentRole)`

`## [1] 4.484185`

We can see that, on average, a candidate who moves to another company has about 2.2 years more experience at the company (7.4 years) than a retained employee (5.1 years). An employee who moves has also, on average, spent about 1.6 years longer in the same position (4.5 versus 2.9 years) than one who stays. That is, candidates are likely to move once they have stayed in the same position, without any changes, for around 4.5 years.

We shall now draw plots to see how other variables, not considered so far, are related to attrition for both retained and non-retained employees. This will help us develop new insights into the data.

`plot(ibmhr$Attrition, ibmhr$EnvironmentSatisfaction, main = "Environment Satisfaction vs Attrition", ylab = "Environment Satisfaction", xlab = "Attrition")`

`plot(ibmhr$Attrition, ibmhr$Gender, main = "Gender vs Attrition", ylab = "Gender", xlab = "Attrition")`

`plot(ibmhr$Attrition, ibmhr$JobInvolvement, main = "Job Involvement vs Attrition", ylab = "Job Involvement", xlab = "Attrition")`

`plot(ibmhr$Attrition, ibmhr$MaritalStatus, main = "Marital Status vs Attrition", ylab = "Marital Status", xlab = "Attrition")`

`plot(ibmhr$Attrition, ibmhr$YearsAtCompany, main = "Experience at company vs Attrition", ylab = "Years at Company", xlab = "Attrition")`

`plot(ibmhr$Attrition, ibmhr$MonthlyIncome, main = "Income vs Attrition", ylab = "Income", xlab = "Attrition")`

`plot(ibmhr$Attrition, ibmhr$NumCompaniesWorked, main = "Number of Companies Worked At vs Attrition", ylab = "Number of Companies Worked At", xlab = "Attrition")`

`plot(ibmhr$Attrition, ibmhr$TrainingTimesLastYear, main = "Frequency of Training and Skill Development vs Attrition", ylab = "Training times last year", xlab = "Attrition")`

As the above plots show, employees who have received less training tend to stay, whereas those who have received more training have tended to leave.
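When `plot()` is given a factor and a numeric column, it draws one box plot per group, so a reading taken from the plot can be checked against the numbers behind it. A minimal sketch on a small hypothetical data frame (the values are invented and stand in for `ibmhr`):

```r
# Hypothetical stand-in for ibmhr
toy <- data.frame(
  Attrition             = factor(c("No", "No", "No", "No",
                                   "Yes", "Yes", "Yes", "Yes")),
  TrainingTimesLastYear = c(2, 3, 3, 4, 1, 2, 2, 3)
)

# Minimum, quartiles, median, mean and maximum for each group
group_summary <- tapply(toy$TrainingTimesLastYear, toy$Attrition, summary)
group_summary

# The statistics a box plot would draw, computed without opening a plot
box_stats <- boxplot(TrainingTimesLastYear ~ Attrition, data = toy, plot = FALSE)
box_stats$stats[3, ]   # third row holds the group medians
```

Comparing the group medians and means for `ibmhr$TrainingTimesLastYear` in this way would show how large the difference in training actually is.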

From what we have seen so far, we come to the following conclusions: (1) Retained employees have, on average, worked for a shorter period at the company, and in their current position, than non-retained employees. (2) Retained employees have been promoted more recently: their average number of years since the last promotion is lower than that of employees who leave the company. (3) Employees who stay have, on average, a lower monthly income than their counterparts who leave the company. (4) Employees who stay have, on average, received less training in the past year than employees who have left the company.