Introduction

Human Resource Analytics

Human resource analytics (HR analytics) is an area in the field of analytics that refers to applying analytic processes to the human resource department of an organization in the hope of improving employee performance and therefore getting a better return on investment. HR analytics does not just deal with gathering data on employee efficiency. Instead, it aims to provide insight into each process by gathering data and then using it to make relevant decisions about how to improve these processes.

HR analytics correlates business data with people data, which can help establish important connections later on. Its key aspect is measuring the impact the HR department has on the organization as a whole. Establishing a relationship between what HR does and business outcomes, and then creating strategies based on that information, is what HR analytics is all about. HR has core functions that can be enhanced by analytics: acquiring, optimizing, paying and developing the organization's workforce. HR analytics can help dig into the problems surrounding these functions and, through an analytical workflow, guide managers to answer questions, gain insights from the information at hand, make relevant decisions and take appropriate actions.

Human Resource Management

Human resource management (HRM) is the strategic and coherent approach to the management of an organization’s most valued assets: the people working there, who individually and collectively contribute to achieving the objectives of the business. The terms “human resource management” and “human resources” (HR) have largely replaced the term “personnel management” as a description of the processes involved in managing people in organizations.

The Problem Of HR Attrition

Human resources are critical to any organization. Organizations spend a huge amount of time and money to hire and nurture their employees, so it is a major loss when employees leave, especially key ones. If HR can predict whether employees are at risk of leaving, it can identify attrition risks early, understand and provide the support needed to retain those employees, or do preventive hiring to minimize the impact on the organization.

Reasons Of Attrition

In an intensely competitive environment, where HR managers are poaching from each other, organisations can either hold on tight to their employees or lose them to the competition. Gone are the days when employees would stick with one employer for years for want of a better choice; now, opportunities abound. Retention of key employees is critical to the long-term health and success of any organisation. Employee performance is often linked directly to work quality, customer satisfaction, increased product sales and even the company's image, and indirectly to satisfied colleagues and reporting staff, effective succession planning, and deeply embedded organisational knowledge and learning.

Concept Of Employee Retention

Effective employee retention is a systematic effort by employers to create and foster an environment that encourages current employees to stay, by having policies and practices in place that address their diverse needs. A strong retention strategy also becomes a powerful recruitment tool. Retaining your best employees supports customer satisfaction, increased product sales, satisfied colleagues and reporting staff, effective succession planning, and deeply embedded organizational knowledge and learning. Retention matters because losing an employee brings costs such as training time and investment, lost knowledge, insecure colleagues and an expensive candidate search. Failing to retain a key employee is therefore a costly proposition: various estimates suggest that losing a middle manager in most organizations costs up to five times his or her salary. Intelligent employers realise the importance of retaining the best talent. Retention was not always a pressing concern in India, but things have changed in recent years. In the major Indian metros at least, there is no dearth of opportunities for the best in the business, or even for the second or third best. Retaining key employees and tackling attrition have never been more important to companies.

Why Is Employee Retention Important?

High turnover often leaves customers and employees in the lurch; departing employees take a great deal of knowledge with them. This lack of continuity makes it hard to meet your organization’s goals and serve customers well.

Replacing employees costs money. The cost of replacing an employee is estimated at up to twice the individual’s annual salary (or higher for some positions, such as middle management), and this doesn’t even include the cost of lost knowledge.

Recruiting employees consumes a great deal of time and effort, much of it futile. You’re not the only one out there vying for qualified employees, and job seekers make decisions based on more than the sum of salary and benefits.

Bringing employees up to speed takes even more time. And when you’re short-staffed, you often need to put in extra time to get the work done.

Fields in the dataset include:

- sat_l: Level of satisfaction (0-1)
- last_evaluation: Score of the last performance evaluation (0-1 scale)
- number_project: Number of projects completed while at work
- average_montly_hours: Average monthly hours at the workplace
- time_spend_company: Number of years spent in the company
- Work_accident: Whether the employee had a workplace accident (1 or 0)
- left: Whether the employee left the company (1) or not (0)
- promotion_last_5years: Whether the employee was promoted in the last five years (1 or 0)
- sales: Department in which the employee works
- salary: Relative level of salary (low, medium or high)

Reading the dataset

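# stringsAsFactors = TRUE keeps sales and salary as factors (the default changed in R 4.0)
hr <- read.csv("hr.csv", stringsAsFactors = TRUE)
View(hr)      # opens the spreadsheet-style viewer in interactive sessions
summary(hr)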
##      sat_l        last_evaluation  number_project  average_montly_hours
##  Min.   :0.0900   Min.   :0.3600   Min.   :2.000   Min.   : 96.0       
##  1st Qu.:0.4400   1st Qu.:0.5600   1st Qu.:3.000   1st Qu.:156.0       
##  Median :0.6400   Median :0.7200   Median :4.000   Median :200.0       
##  Mean   :0.6128   Mean   :0.7161   Mean   :3.803   Mean   :201.1       
##  3rd Qu.:0.8200   3rd Qu.:0.8700   3rd Qu.:5.000   3rd Qu.:245.0       
##  Max.   :1.0000   Max.   :1.0000   Max.   :7.000   Max.   :310.0       
##                                                                        
##  time_spend_company Work_accident         left       
##  Min.   : 2.000     Min.   :0.0000   Min.   :0.0000  
##  1st Qu.: 3.000     1st Qu.:0.0000   1st Qu.:0.0000  
##  Median : 3.000     Median :0.0000   Median :0.0000  
##  Mean   : 3.498     Mean   :0.1446   Mean   :0.2381  
##  3rd Qu.: 4.000     3rd Qu.:0.0000   3rd Qu.:0.0000  
##  Max.   :10.000     Max.   :1.0000   Max.   :1.0000  
##                                                      
##  promotion_last_5years         sales         salary    
##  Min.   :0.00000       sales      :4140   high  :1237  
##  1st Qu.:0.00000       technical  :2720   low   :7316  
##  Median :0.00000       support    :2229   medium:6446  
##  Mean   :0.02127       IT         :1227                
##  3rd Qu.:0.00000       product_mng: 902                
##  Max.   :1.00000       marketing  : 858                
##                        (Other)    :2923

Loading Libraries

library(psych)
## Warning: package 'psych' was built under R version 3.3.3
library(corrgram)
## Warning: package 'corrgram' was built under R version 3.3.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.3
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
library(lattice)
## Warning: package 'lattice' was built under R version 3.3.3
library(car)
## Warning: package 'car' was built under R version 3.3.3
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
library(corrplot)
## Warning: package 'corrplot' was built under R version 3.3.3
## corrplot 0.84 loaded

Basic information about columns, data types and missing data

The dataset has dimensions 14999 x 10, that is, 14,999 rows and 10 columns, all of which we will use in our analysis.

dim(hr)
## [1] 14999    10

This gives us the various data types used in the dataset.

str(hr)
## 'data.frame':    14999 obs. of  10 variables:
##  $ sat_l                : num  0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
##  $ last_evaluation      : num  0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
##  $ number_project       : int  2 5 7 5 2 2 6 5 5 2 ...
##  $ average_montly_hours : int  157 262 272 223 159 153 247 259 224 142 ...
##  $ time_spend_company   : int  3 6 4 5 3 3 4 5 5 3 ...
##  $ Work_accident        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ left                 : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ promotion_last_5years: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...

An alternative way of viewing the dataset summary, using describe() from the psych package.

describe(hr)
##                       vars     n   mean    sd median trimmed   mad   min
## sat_l                    1 14999   0.61  0.25   0.64    0.63  0.28  0.09
## last_evaluation          2 14999   0.72  0.17   0.72    0.72  0.22  0.36
## number_project           3 14999   3.80  1.23   4.00    3.74  1.48  2.00
## average_montly_hours     4 14999 201.05 49.94 200.00  200.64 65.23 96.00
## time_spend_company       5 14999   3.50  1.46   3.00    3.28  1.48  2.00
## Work_accident            6 14999   0.14  0.35   0.00    0.06  0.00  0.00
## left                     7 14999   0.24  0.43   0.00    0.17  0.00  0.00
## promotion_last_5years    8 14999   0.02  0.14   0.00    0.00  0.00  0.00
## sales*                   9 14999   6.94  2.75   8.00    7.23  2.97  1.00
## salary*                 10 14999   2.35  0.63   2.00    2.41  1.48  1.00
##                       max  range  skew kurtosis   se
## sat_l                   1   0.91 -0.48    -0.67 0.00
## last_evaluation         1   0.64 -0.03    -1.24 0.00
## number_project          7   5.00  0.34    -0.50 0.01
## average_montly_hours  310 214.00  0.05    -1.14 0.41
## time_spend_company     10   8.00  1.85     4.77 0.01
## Work_accident           1   1.00  2.02     2.08 0.00
## left                    1   1.00  1.23    -0.49 0.00
## promotion_last_5years   1   1.00  6.64    42.03 0.00
## sales*                 10   9.00 -0.79    -0.62 0.02
## salary*                 3   2.00 -0.42    -0.67 0.01

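Most of these summary statistics are not meaningful for Work_accident, left and promotion_last_5years, because they are binary categorical variables coded 0/1; only the mean is informative, as it gives the proportion of 1s (for example, about 24% of employees left). The same caution applies to sales* and salary*, which describe() marks with an asterisk because it has converted these factors to integer codes.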
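One way to make summary() treat the 0/1 columns as categories is to convert them to factors, so that it reports counts per level instead of means and quartiles. A minimal sketch, assuming the column names of hr; the small data frame `toy` is hypothetical and only stands in for the real data:

```r
# Hypothetical toy rows standing in for hr (same column names, invented values)
toy <- data.frame(
  Work_accident         = c(0, 1, 0, 0),
  left                  = c(1, 0, 0, 1),
  promotion_last_5years = c(0, 0, 1, 0)
)

# The binary 0/1 columns, listed explicitly
binary_cols <- c("Work_accident", "left", "promotion_last_5years")

# Convert each one to a factor so summary() counts levels instead of averaging
toy[binary_cols] <- lapply(toy[binary_cols], factor, levels = c(0, 1))

summary(toy)
```

Fixing `levels = c(0, 1)` keeps both levels even if one of them happens to be absent in a subset of the data.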

Checking Missing Values

sum(is.na(hr))
## [1] 0

There are no missing values, so the dataset can be used as-is, without imputation or row removal.
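A single total is enough to confirm that nothing is missing. When a dataset does have gaps, a per-column count shows where they are. A minimal sketch on a hypothetical toy data frame with two deliberately missing cells:

```r
# Hypothetical toy data with two deliberately missing values
toy <- data.frame(
  sat_l  = c(0.38, NA, 0.11),
  salary = c("low", "medium", NA)
)

# Total number of missing cells in the whole data frame
total_na <- sum(is.na(toy))

# Missing values per column: shows where the gaps are, not just how many
na_by_col <- colSums(is.na(toy))
print(na_by_col)
```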

Finding how many employees left the company and how many stayed.

left<-with(hr,table(left))
View(left)
left
## left
##     0     1 
## 11428  3571
barplot(left,main="No.of Employees who left", ylab="No of employees",xlab = "left",col = c("lightblue", "mistyrose"))

The number of employees who stayed (11,428) is about 3.2 times the number who left (3,571), so roughly 24% of employees left the company.
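Expressing these counts as shares makes the class imbalance explicit. A minimal sketch using the counts printed by table(left) above:

```r
# Counts taken from the table(left) output above (0 = stayed, 1 = left)
left_counts <- c(stayed = 11428, left = 3571)

# Share of all employees in each group
shares <- prop.table(left_counts)
round(shares, 4)    # stayed 0.7619, left 0.2381

# How many employees stayed for every one who left
ratio <- left_counts[["stayed"]] / left_counts[["left"]]
round(ratio, 2)     # 3.2
```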

Employees who had work accidents

wa<-with(hr,table(Work_accident))
wa
## Work_accident
##     0     1 
## 12830  2169
barplot(wa,main="Work Accidents",xlab="Work Accident",ylab="Number(s)",col = c("lightblue", "mistyrose"))

Effect of work accidents on employee attrition

m<-xtabs(~left+Work_accident, data=hr)
barplot(m,beside = TRUE)

Effect Of Satisfaction Level on Attrition

n<-xtabs(~left+sat_l, data=hr)

barplot(n,beside = TRUE)

boxplot(hr$sat_l~hr$left)

A large number of employees with low satisfaction levels (the left-hand part of the bar chart) have left the company, especially those with a satisfaction level below 0.5. This is expected. However, there is also a surge of leavers at higher satisfaction levels (roughly 0.72 to 0.92). These employees need to be understood and handled with a different strategy.
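Besides the bar chart, grouping satisfaction into bands and computing the share of leavers per band summarises the same pattern in a few numbers. A minimal sketch, assuming band edges of 0.5 and 0.7 (chosen only for illustration) and a hypothetical toy data frame in place of hr:

```r
# Hypothetical toy data: sat_l and left as in hr, values invented for illustration
toy <- data.frame(
  sat_l = c(0.10, 0.11, 0.40, 0.45, 0.55, 0.62, 0.68, 0.80, 0.85, 0.95),
  left  = c(1,    1,    1,    0,    0,    0,    0,    1,    1,    0)
)

# Assumed band edges; cut() builds right-closed intervals such as (0.5,0.7]
toy$sat_band <- cut(toy$sat_l, breaks = c(0, 0.5, 0.7, 1),
                    labels = c("low", "medium", "high"))

# The mean of a 0/1 variable is the share of employees in each band who left
rate_by_band <- tapply(toy$left, toy$sat_band, mean)
round(rate_by_band, 2)   # low 0.75, medium 0.00, high 0.67
```

With the real data, the same two steps applied to hr would show whether the high-satisfaction leavers form a distinct group worth a separate retention strategy.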

How do last evaluation scores influence whether employees stay or leave?

o<-xtabs(~left+last_evaluation, data=hr)
barplot(o,beside = TRUE)

boxplot(hr$last_evaluation~hr$left)

The plots show that employees with low and with very high evaluation scores are leaving, whereas those with average evaluation scores are staying.
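Because attrition is high at both ends of the evaluation scale, a single comparison of group means can hide the pattern. A minimal sketch that splits a hypothetical toy evaluation score into terciles with quantile() and cut(), then computes the share of leavers per tercile; a U-shape appears as high, low, high:

```r
# Hypothetical toy data: last_evaluation and left as in hr, values invented
toy <- data.frame(
  last_evaluation = c(0.45, 0.48, 0.50, 0.65, 0.70, 0.72, 0.90, 0.95, 0.98),
  left            = c(1,    1,    0,    0,    0,    0,    1,    1,    1)
)

# Tercile edges taken from the data; include.lowest keeps the minimum in bin 1
edges <- quantile(toy$last_evaluation, probs = c(0, 1/3, 2/3, 1))
toy$eval_band <- cut(toy$last_evaluation, breaks = edges,
                     include.lowest = TRUE,
                     labels = c("low", "middle", "high"))

# Share of leavers per tercile
rate_by_band <- tapply(toy$left, toy$eval_band, mean)
round(rate_by_band, 2)   # low 0.67, middle 0.00, high 1.00
```

Quantile-based edges keep the bins roughly equal in size, which avoids rates computed from only a handful of employees.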

How does time spent in the company influence attrition?

r<-table(hr$left,hr$time_spend_company)
barplot(r, beside=TRUE, legend = rownames(r))

Employees who have spent 2 years in the company are mostly not leaving. As tenure grows, people start leaving, and the share of leavers is highest at 5 years. Once employees reach the 7-year mark, however, they are no longer leaving.

Which department has the highest attrition?

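depa <- table(hr$left, hr$sales)
# margin = 2 divides each column by its department total, giving within-department rates
depee <- prop.table(depa, margin = 2)
barplot(depee, beside = TRUE, legend = rownames(depee), las = 2)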

The attrition rate is broadly similar across departments. Surprisingly, it is highest in HR itself and lowest in management.
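Whether prop.table() is called with or without a margin matters for this comparison. Without a margin, every cell is divided by the grand total, so bar heights mostly reflect how large each department is; with margin = 2, each column is divided by its own total, which gives the within-department attrition rate. A minimal sketch on a hypothetical two-department table:

```r
# Hypothetical counts: rows = left (0/1), columns = department
depa_toy <- as.table(matrix(c(900, 100,   # big department: 10% leave
                              60,  40),   # small department: 40% leave
                            nrow = 2,
                            dimnames = list(left = c("0", "1"),
                                            dept = c("big", "small"))))

# Share of ALL employees: dominated by department size
overall <- prop.table(depa_toy)

# Share WITHIN each department (column): the attrition rate
within_dept <- prop.table(depa_toy, margin = 2)

overall["1", ]      # big looks worse only because it is bigger
within_dept["1", ]  # big 0.1, small 0.4
```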

Effect of Promotion On Attrition

prom<-table(hr$left,hr$promotion_last_5years)
prom
##    
##         0     1
##   0 11128   300
##   1  3552    19
barplot(prom,beside = TRUE)

Very few employees who were promoted in the last 5 years left the company: 19 of 319 (about 6%), compared with 3,552 of 14,680 (about 24%) among those who were not promoted.
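The attrition rate per promotion group, and a test of whether promotion and leaving are independent, can be computed directly from the counts printed above. A minimal sketch that re-enters those counts (the name `prom_counts` is hypothetical, chosen to avoid clashing with `prom`):

```r
# Counts copied from the prom table above: rows = left (0/1), columns = promoted (0/1)
prom_counts <- as.table(matrix(c(11128, 3552,   # not promoted: stayed, left
                                 300,   19),    # promoted: stayed, left
                               nrow = 2,
                               dimnames = list(left = c("0", "1"),
                                               promoted = c("0", "1"))))

# Attrition rate within each promotion group (column proportions)
rates <- prop.table(prom_counts, margin = 2)["1", ]
round(rates, 3)   # about 0.242 (not promoted) vs 0.060 (promoted)

# Pearson chi-squared test of independence (Yates correction applies to 2x2)
chisq.test(prom_counts)
```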

Effect Of Salary On Attrition

sal<-table(hr$left,hr$salary)
sal
##    
##     high  low medium
##   0 1155 5144   5129
##   1   82 2172   1317
barplot(sal, beside = TRUE)

Employees with low salaries leave most often (2,172 of 7,316, about 30%), followed by those with medium salaries (1,317 of 6,446, about 20%), while only 82 of 1,237 high earners (about 7%) leave. This is expected, as employees naturally seek higher pay.
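Because table() orders the salary levels alphabetically (high, low, medium), the bar chart above does not read from low to high pay. A minimal sketch that re-enters the printed counts, reorders the columns into their natural order and computes the attrition rate per salary band (the name `sal_counts` is hypothetical):

```r
# Counts copied from the sal table above: rows = left (0/1), columns = salary band
sal_counts <- as.table(matrix(c(1155, 82,     # high: stayed, left
                                5144, 2172,   # low: stayed, left
                                5129, 1317),  # medium: stayed, left
                              nrow = 2,
                              dimnames = list(left = c("0", "1"),
                                              salary = c("high", "low", "medium"))))

# Reorder columns from alphabetical to the natural low < medium < high order
sal_counts <- sal_counts[, c("low", "medium", "high")]

# Attrition rate within each salary band (column proportions)
sal_rates <- prop.table(sal_counts, margin = 2)["1", ]
round(sal_rates, 3)   # low 0.297, medium 0.204, high 0.066
```

With the real data, the same effect can be obtained once by releveling the factor, for example hr$salary <- factor(hr$salary, levels = c("low", "medium", "high")), so every later table and plot follows that order.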

Relation Between Satisfaction Level and Salary

sat<-table(hr$sat_l,hr$salary)
sat
##       
##        high low medium
##   0.09    4 113     78
##   0.1     9 205    144
##   0.11    2 210    123
##   0.12    1  11     18
##   0.13    4  27     23
##   0.14   11  35     27
##   0.15   14  36     26
##   0.16    9  29     41
##   0.17    8  31     33
##   0.18    6  34     23
##   0.19    5  39     30
##   0.2     7  38     24
##   0.21   12  25     30
##   0.22   10  27     23
##   0.23    3  24     27
##   0.24    8  37     35
##   0.25    2  16     16
##   0.26    3  16     11
##   0.27    1  18     11
##   0.28    5  16     10
##   0.29    2  16     20
##   0.3     2  20     17
##   0.31    2  36     21
##   0.32    4  26     20
##   0.33    8  14     14
##   0.34    2  25     21
##   0.35    2  18     17
##   0.36   10  74     55
##   0.37   12 141     88
##   0.38    7 115     67
##   0.39   11  98     66
##   0.4     6 124     79
##   0.41    3 119     49
##   0.42    9  80     66
##   0.43   11 117     96
##   0.44   12 108     91
##   0.45   18 112     73
##   0.46    4  64     27
##   0.47    3  23     16
##   0.48    9  70     70
##   0.49   29  79    101
##   0.5    32  94    103
##   0.51   11  90     86
##   0.52   16  86     94
##   0.53   27  86     66
##   0.54   18  81     86
##   0.55   20  94     65
##   0.56   20  73     94
##   0.57   19  88    103
##   0.58   22  78     82
##   0.59   17  91    111
##   0.6    15  94     84
##   0.61   21  92     95
##   0.62   19  79     90
##   0.63   26  90     93
##   0.64   26  92     69
##   0.65   19 106     74
##   0.66   18 103    107
##   0.67   11  76     90
##   0.68   20  71     71
##   0.69   24  86     99
##   0.7    15  97     93
##   0.71   15  74     82
##   0.72   20 112     98
##   0.73   20 119    107
##   0.74   28 118    111
##   0.75   19 111     96
##   0.76   21 113    100
##   0.77   30 102    120
##   0.78   24 117    100
##   0.79   14 109     94
##   0.8    20 103     99
##   0.81   14 103    103
##   0.82   28 113    100
##   0.83   11 127     96
##   0.84   13 132    102
##   0.85   13  96     98
##   0.86   15  96     89
##   0.87   19 120     86
##   0.88   16  84     87
##   0.89   24 110    103
##   0.9    14  96    110
##   0.91   10 120     94
##   0.92   14 102     82
##   0.93   21  76     72
##   0.94   13  82     72
##   0.95   13  86     82
##   0.96   26  88     89
##   0.97   22  90     64
##   0.98   16  74     93
##   0.99   15  75     82
##   1       3  55     53
satt<-prop.table(sat)
barplot(satt, beside = TRUE)

scatterplot(hr$salary,hr$sat_l)

Correlation Between Variables

corrgram(hr,order=FALSE,lower.panel=panel.shade,upper.panel=panel.pie)

Insights From Corrgram

Satisfaction level tends to decrease as people spend more time in the company and, interestingly, as they work on more projects.

Evaluation score is positively correlated with average monthly hours and the number of projects.

Lower satisfaction is associated with a higher tendency to leave the company.
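The corrgram readings above can be checked numerically with cor(), which accepts only numeric columns, so the factors sales and salary have to be left out. A minimal sketch on a hypothetical toy data frame with column names borrowed from hr; with the real data, the same column selection would be applied to hr itself:

```r
# Hypothetical toy data with column names borrowed from hr, values invented
toy <- data.frame(
  sat_l              = c(0.90, 0.80, 0.60, 0.40, 0.20),
  number_project     = c(2, 3, 4, 5, 6),
  time_spend_company = c(2, 3, 3, 4, 6),
  left               = c(0, 0, 0, 1, 1),
  salary             = factor(c("low", "high", "medium", "low", "low"))
)

# cor() needs numeric input, so keep only the numeric columns
num_cols <- sapply(toy, is.numeric)
cm <- cor(toy[, num_cols])   # Pearson correlation by default
round(cm, 2)

# corrplot (loaded earlier) can draw the same matrix:
# corrplot::corrplot(cm, method = "circle")
```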

Hypothesis Testing Of Various Variables On Attrition

Satisfaction Level And Attrition
x<- xtabs(~sat_l+left,hr)
x
##       left
## sat_l    0   1
##   0.09   0 195
##   0.1    0 358
##   0.11   0 335
##   0.12  26   4
##   0.13  51   3
##   0.14  63  10
##   0.15  73   3
##   0.16  78   1
##   0.17  67   5
##   0.18  63   0
##   0.19  68   6
##   0.2   65   4
##   0.21  66   1
##   0.22  59   1
##   0.23  52   2
##   0.24  77   3
##   0.25  31   3
##   0.26  29   1
##   0.27  24   6
##   0.28  28   3
##   0.29  37   1
##   0.3   37   2
##   0.31  42  17
##   0.32  42   8
##   0.33  33   3
##   0.34  44   4
##   0.35  34   3
##   0.36  43  96
##   0.37  47 194
##   0.38  35 154
##   0.39  35 140
##   0.4   40 169
##   0.41  39 132
##   0.42  47 108
##   0.43  46 178
##   0.44  58 153
##   0.45  46 157
##   0.46  27  68
##   0.47  38   4
##   0.48 139  10
##   0.49 207   2
##   0.5  226   3
##   0.51 182   5
##   0.52 196   0
##   0.53 171   8
##   0.54 179   6
##   0.55 175   4
##   0.56 182   5
##   0.57 202   8
##   0.58 179   3
##   0.59 212   7
##   0.6  189   4
##   0.61 202   6
##   0.62 186   2
##   0.63 205   4
##   0.64 185   2
##   0.65 198   1
##   0.66 217  11
##   0.67 176   1
##   0.68 161   1
##   0.69 209   0
##   0.7  195  10
##   0.71 167   4
##   0.72 200  30
##   0.73 204  42
##   0.74 206  51
##   0.75 188  38
##   0.76 189  45
##   0.77 201  51
##   0.78 191  50
##   0.79 172  45
##   0.8  194  28
##   0.81 169  51
##   0.82 183  58
##   0.83 187  47
##   0.84 185  62
##   0.85 171  36
##   0.86 159  41
##   0.87 167  58
##   0.88 162  25
##   0.89 181  56
##   0.9  168  52
##   0.91 181  43
##   0.92 178  20
##   0.93 169   0
##   0.94 167   0
##   0.95 181   0
##   0.96 203   0
##   0.97 176   0
##   0.98 183   0
##   0.99 172   0
##   1    111   0
chisq.test(x)
## 
##  Pearson's Chi-squared test
## 
## data:  x
## X-squared = 7937.7, df = 91, p-value < 2.2e-16
t.test(sat_l~left,data=hr)
## 
##  Welch Two Sample t-test
## 
## data:  sat_l by left
## t = 46.636, df = 5167, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2171815 0.2362417
## sample estimates:
## mean in group 0 mean in group 1 
##       0.6668096       0.4400980
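Both tests reject the null hypothesis of no association (p < 2.2e-16): employees who left had a mean satisfaction of about 0.44, against about 0.67 for those who stayed. By default, t.test() runs the Welch test, which does not assume equal variances in the two groups. A minimal sketch that rebuilds the Welch t statistic and the Welch-Satterthwaite degrees of freedom by hand on hypothetical toy scores, and checks them against t.test():

```r
# Hypothetical toy satisfaction scores for employees who stayed and who left
sat_stayed <- c(0.70, 0.65, 0.80, 0.55, 0.75, 0.60)
sat_left   <- c(0.40, 0.45, 0.30, 0.50, 0.42)

# Squared standard error of each group mean (unpooled variances)
v1 <- var(sat_stayed) / length(sat_stayed)
v2 <- var(sat_left) / length(sat_left)

# Welch's t statistic: difference in means over the combined standard error
t_stat <- (mean(sat_stayed) - mean(sat_left)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v1 + v2)^2 /
  (v1^2 / (length(sat_stayed) - 1) + v2^2 / (length(sat_left) - 1))

# Compare with R's built-in test (var.equal = FALSE is the default)
res <- t.test(sat_stayed, sat_left)
c(manual_t = t_stat, builtin_t = unname(res$statistic),
  manual_df = welch_df, builtin_df = unname(res$parameter))
```

The non-integer degrees of freedom in the output above (df = 5167 for about 15,000 rows) come from this approximation rather than from n1 + n2 - 2.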

Analysing All Fields Together

 scatterplotMatrix(~sat_l+last_evaluation+number_project+average_montly_hours+time_spend_company+Work_accident+left+promotion_last_5years,data=hr,main="Overall Analysis")
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
