1) Introduction

Human resource analytics (HR analytics) is an area in the field of analytics that refers to applying analytic processes to the human resource department of an organization in the hope of improving employee performance and therefore getting a better return on investment. HR analytics does not just deal with gathering data on employee efficiency. Instead, it aims to provide insight into each process by gathering data and then using it to make relevant decisions about how to improve these processes.

2) What does HR analytics do?

HR analytics correlates business data with people data, which can help establish important connections later on. Its key aspect is to provide data on the impact the HR department has on the organization as a whole: establishing a relationship between what HR does and business outcomes, and then creating strategies based on that information.

HR has core functions that can be enhanced by applying analytics: acquiring, optimizing, paying, and developing the organization's workforce. HR analytics can help dig into the problems surrounding these functions and, through an analytical workflow, guide managers to answer questions and gain insights from the information at hand, then make relevant decisions and take appropriate actions.

Our example concerns a large company that wants to understand why some of its best and most experienced employees are leaving prematurely. The company also wishes to predict which valuable employees will leave next.

3) Data

For this study, I obtained the dataset from https://www.kaggle.com/datasets.

4) An empirical field study of Human resource analytics in one particular company

4.1) Fields in the dataset include:

Satisfaction Level
Last evaluation
Number of projects
Average monthly hours
Time spent at the company
Whether they have had a work accident
Whether they have had a promotion in the last 5 years
Departments (column sales)
Salary
Whether the employee has left

4.2) Analyzing Solution

We have two goals: first, we want to understand why valuable employees leave, and second, we want to predict who will leave next.

Therefore, we propose to work with the HR department to gather relevant data about the employees and to communicate the significant effects that could explain and predict employees' departure.
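One common way to approach the second goal (predicting who will leave) is logistic regression. The sketch below is only an illustration of that technique, not the method used in this study. The data frame `toy` is simulated, and the rule used to generate `left` (lower satisfaction and longer hours raise the odds of leaving) is an assumption made purely so the example runs on its own. Column names follow the dataset fields.

```r
# Simulated data, invented for illustration only (not the Kaggle dataset).
set.seed(42)
n <- 500
toy <- data.frame(
  satisfaction_level   = runif(n, 0.1, 1),
  average_montly_hours = round(runif(n, 96, 310))
)

# Assumed generating rule: low satisfaction and long hours raise the odds of leaving.
log_odds <- 1 - 4 * toy$satisfaction_level +
  0.01 * (toy$average_montly_hours - 200)
toy$left <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: models the probability that left == 1.
fit <- glm(left ~ satisfaction_level + average_montly_hours,
           data = toy, family = binomial)

# Predicted probability of leaving for each employee.
toy$p_leave <- predict(fit, type = "response")

# The five employees with the highest predicted risk of leaving.
head(toy[order(-toy$p_leave), ], 5)
```

On the real data, the same call could be pointed at `hresource` with more predictors, and the predicted probabilities should be checked on held-out rows before anyone acts on them.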

4.3) Result

On average, people who leave have a low satisfaction level, work more hours, and were not promoted within the past five years.

On average, valuable employees who leave are not satisfied, work on many projects, spend many hours at the company each month, and have not been promoted.

4.4) Conclusion

This paper was motivated by the need to understand how HR analytics can inform a company's response to employees who leave or are likely to leave. Its contribution is an examination of satisfaction level, promotion in the last five years, salary, and average monthly working hours, both for employees who left and for valuable employees who left or may leave. We observed that employees who leave tend to have had no promotion in the last five years, a low salary, and long working hours. From the company's point of view, the clearest signal is that their satisfaction level is low compared with that of other employees, so the company can plan accordingly.

5) Model

Appendix 1

Analytical Base Table:

Reading the data into a dataframe

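# stringsAsFactors = TRUE keeps sales and salary as factors (the default before R 4.0), as in the str() output below
hresource <- read.csv("human.csv", stringsAsFactors = TRUE)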
dim(hresource)
## [1] 14999    10
library(psych)
describe(hresource)
##                       vars     n   mean    sd median trimmed   mad   min
## satisfaction_level       1 14999   0.61  0.25   0.64    0.63  0.28  0.09
## last_evaluation          2 14999   0.72  0.17   0.72    0.72  0.22  0.36
## number_project           3 14999   3.80  1.23   4.00    3.74  1.48  2.00
## average_montly_hours     4 14999 201.05 49.94 200.00  200.64 65.23 96.00
## time_spend_company       5 14999   3.50  1.46   3.00    3.28  1.48  2.00
## Work_accident            6 14999   0.14  0.35   0.00    0.06  0.00  0.00
## left                     7 14999   0.24  0.43   0.00    0.17  0.00  0.00
## promotion_last_5years    8 14999   0.02  0.14   0.00    0.00  0.00  0.00
## sales*                   9 14999   6.94  2.75   8.00    7.23  2.97  1.00
## salary*                 10 14999   2.35  0.63   2.00    2.41  1.48  1.00
##                       max  range  skew kurtosis   se
## satisfaction_level      1   0.91 -0.48    -0.67 0.00
## last_evaluation         1   0.64 -0.03    -1.24 0.00
## number_project          7   5.00  0.34    -0.50 0.01
## average_montly_hours  310 214.00  0.05    -1.14 0.41
## time_spend_company     10   8.00  1.85     4.77 0.01
## Work_accident           1   1.00  2.02     2.08 0.00
## left                    1   1.00  1.23    -0.49 0.00
## promotion_last_5years   1   1.00  6.64    42.03 0.00
## sales*                 10   9.00 -0.79    -0.62 0.02
## salary*                 3   2.00 -0.42    -0.67 0.01
View(hresource)
str(hresource)
## 'data.frame':    14999 obs. of  10 variables:
##  $ satisfaction_level   : num  0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
##  $ last_evaluation      : num  0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
##  $ number_project       : int  2 5 7 5 2 2 6 5 5 2 ...
##  $ average_montly_hours : int  157 262 272 223 159 153 247 259 224 142 ...
##  $ time_spend_company   : int  3 6 4 5 3 3 4 5 5 3 ...
##  $ Work_accident        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ left                 : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ promotion_last_5years: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ sales                : Factor w/ 10 levels "accounting","hr",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ salary               : Factor w/ 3 levels "high","low","medium": 2 3 3 2 2 2 2 2 2 2 ...

Visualization

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
first <- hresource %>% select(satisfaction_level:promotion_last_5years)
M <- cor(first)
library(corrplot)
## corrplot 0.84 loaded
corrplot(M, method="circle")

On average, people who leave have a low satisfaction level, work more hours, and were not promoted within the past five years.
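A correlation plot shows the direction of each relationship but not its size in the original units. One way to make the comparison concrete is to compute the mean of each variable separately for stayers and leavers. The sketch below does this with `aggregate()` on a small data frame `toy` whose values are invented for illustration; only the column names are taken from the dataset.

```r
# Toy values, invented for illustration only (not the Kaggle dataset).
toy <- data.frame(
  left                  = c(0, 0, 0, 1, 1, 1),
  satisfaction_level    = c(0.80, 0.70, 0.60, 0.40, 0.10, 0.40),
  average_montly_hours  = c(180, 200, 190, 250, 270, 140),
  promotion_last_5years = c(1, 0, 0, 0, 0, 0)
)

# Mean of each variable within each value of `left` (0 = stayed, 1 = left).
group_means <- aggregate(
  cbind(satisfaction_level, average_montly_hours, promotion_last_5years) ~ left,
  data = toy, FUN = mean
)
group_means
```

Replacing `toy` with `hresource` would give the corresponding group means for the real data.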

Appendix 2

Who is leaving?

library(lattice)
# Keep only the employees who have left (left == 1)
leaving <- hresource[hresource$left == 1, ]
# par(mfrow = ...) has no effect on lattice graphics, so each histogram is drawn as its own plot
histogram(leaving$satisfaction_level, col="green", xlab="Degree of satisfaction", main="Satisfaction distribution of leavers")

histogram(leaving$promotion_last_5years, col="blue", xlab="Promotion in last 5 years", main="Promotion in last 5 years of leavers")

histogram(leaving$last_evaluation, col="yellow", xlab="Last evaluation", main="Last evaluation of leavers")

histogram(leaving$Work_accident, col="lightblue", main="Work accident", xlab="Work accident")

histogram(leaving$salary, col="lightblue", main="Salary", xlab="Salary")

Of the 14,999 employees in our dataset, this many have left:

nrow(leaving)
## [1] 3571
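
As a quick consistency check, the share of leavers can be computed from the two row counts reported in the output above. The variable names below are hypothetical; only the two counts come from the output.

```r
n_total <- 14999  # rows reported by dim(hresource)
n_left  <- 3571   # rows reported by nrow(leaving)

# Share of employees who left.
turnover_rate <- n_left / n_total
round(turnover_rate, 3)
```

This gives roughly 0.238, which agrees with the mean of 0.24 that describe() reports for the left column.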

Appendix 3

Why do good people leave?

# "Good" leavers: leavers with a high evaluation, long tenure, or many projects
goodleaving <- leaving %>% filter(last_evaluation >= 0.70 | time_spend_company >= 4 | number_project > 5)
nrow(goodleaving)
## [1] 2014
# Apply the same filter to all employees (stayers and leavers) so that correlations with left can be computed
goodleaving2 <- hresource %>% filter(last_evaluation >= 0.70 | time_spend_company >= 4 | number_project > 5)
goodleavingselected <- goodleaving2 %>% select(satisfaction_level:promotion_last_5years)
M <- cor(goodleavingselected)
corrplot(M, method="circle")

Here it is much clearer. On average, valuable employees who leave are not satisfied, work on many projects, spend many hours at the company each month, and have not been promoted.

Appendix 4

Relations between the variables

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
# left is converted to a factor so the two groups get discrete colors instead of a continuous gradient
ggplot(hresource, aes(satisfaction_level, average_montly_hours)) + geom_point(aes(color = factor(left))) +
  scale_x_continuous("satisfaction level") + scale_y_continuous("average monthly hours") +
  labs(title = "satisfaction of people leaving company", color = "left")

ggplot(hresource, aes(satisfaction_level, average_montly_hours)) + geom_point(aes(color = factor(left))) +
  scale_x_continuous("satisfaction level") + scale_y_continuous("average monthly hours") +
  labs(title = "satisfaction of people leaving company with salary", color = "left") + facet_wrap(~ salary)

Corrgram

library(corrgram)
corrgram(hresource, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="human_resource analysis")

Appendix 5

Relation between number of projects and satisfaction level

relation3 <- lm(hresource$satisfaction_level ~ hresource$number_project)
plot(hresource$number_project, hresource$satisfaction_level,
     ylab = "Satisfaction Level",
     xlab = "number of projects",
     main = "relation between number of project and satisfaction level ",
     col = "lightblue")
abline (relation3, col = "black")

summary(relation3)
## 
## Call:
## lm(formula = hresource$satisfaction_level ~ hresource$number_project)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56483 -0.21483  0.03169  0.20401  0.45052 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               0.722509   0.006517  110.86   <2e-16 ***
## hresource$number_project -0.028839   0.001630  -17.69   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2461 on 14997 degrees of freedom
## Multiple R-squared:  0.02044,    Adjusted R-squared:  0.02037 
## F-statistic: 312.9 on 1 and 14997 DF,  p-value: < 2.2e-16

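Here the p-value is less than 0.05, so the negative relationship between number of projects and satisfaction level is statistically significant. The effect is small, however: each additional project is associated with a satisfaction level about 0.029 lower, and the model explains only about 2% of the variance (R-squared = 0.020).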

Relation between satisfaction level and average monthly hours

relation4 <- lm(hresource$average_montly_hours ~ hresource$satisfaction_level)
plot(hresource$satisfaction_level, hresource$average_montly_hours,
     ylab = "average monthly hours",
     xlab = "Satisfaction Level",
     main = "relation between Satisfaction Level and average monthly hours",
     col = "lightblue")
abline (relation4, col = "black")

summary(relation4)
## 
## Call:
## lm(formula = hresource$average_montly_hours ~ hresource$satisfaction_level)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -106.914  -45.176   -0.619   43.985  109.301 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   203.518      1.085 187.648   <2e-16 ***
## hresource$satisfaction_level   -4.027      1.640  -2.456   0.0141 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 49.93 on 14997 degrees of freedom
## Multiple R-squared:  0.0004019,  Adjusted R-squared:  0.0003353 
## F-statistic:  6.03 on 1 and 14997 DF,  p-value: 0.01408

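Here the p-value (0.014) is less than 0.05, so the relationship between average monthly hours and satisfaction level is statistically significant. With nearly 15,000 observations, though, even a negligible effect reaches significance: R-squared is about 0.0004, so satisfaction level explains almost none of the variation in monthly hours.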
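
To judge the size of this effect rather than only its significance, the figures printed by summary(relation4) can be converted into a correlation coefficient and a change in hours. The sketch below uses only numbers reported in the output above and in the describe() table; the variable names are hypothetical.

```r
r_squared <- 0.0004019  # Multiple R-squared from summary(relation4)
slope     <- -4.027     # coefficient on satisfaction_level

# In a simple regression, the correlation is the signed square root of R-squared.
r <- sign(slope) * sqrt(r_squared)
r

# Change in predicted monthly hours across the observed satisfaction
# range (0.09 to 1, from the describe() output).
hours_change <- slope * (1 - 0.09)
hours_change
```

The correlation is about -0.02, and moving from the least to the most satisfied employee changes the predicted monthly hours by less than 4, against a standard deviation of about 50 hours.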

Relation between left people and satisfaction level

relation1 <- lm(hresource$satisfaction_level ~ hresource$left)
plot(hresource$left, hresource$satisfaction_level,
     ylab = "Satisfaction Level",
     xlab = "left",
     main = "relation between left(0 or 1) and satisfaction level of people",
     col = "lightblue")
abline (relation1, col = "black")

summary(relation1)
## 
## Call:
## lm(formula = hresource$satisfaction_level ~ hresource$left)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5468 -0.1368 -0.0001  0.1732  0.4799 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.666810   0.002143  311.12   <2e-16 ***
## hresource$left -0.226712   0.004393  -51.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2291 on 14997 degrees of freedom
## Multiple R-squared:  0.1508, Adjusted R-squared:  0.1508 
## F-statistic:  2664 on 1 and 14997 DF,  p-value: < 2.2e-16

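Here the p-value is less than 0.05, so leaving the company and satisfaction level are significantly related. Employees who left have a mean satisfaction level about 0.23 lower than those who stayed, and this single variable explains about 15% of the variance in satisfaction level.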
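
When the only predictor is a 0/1 variable such as left, the regression coefficients have a direct reading: the intercept is the mean for the group coded 0, and the slope is the difference between the two group means. Applied to the output above, stayers average about 0.67 and leavers about 0.67 - 0.23 = 0.44. The sketch below shows the equivalence on a small data frame `toy` whose values are invented for illustration.

```r
# Toy values, invented for illustration only (not the Kaggle dataset).
toy <- data.frame(
  left               = c(0, 0, 0, 1, 1),
  satisfaction_level = c(0.8, 0.7, 0.6, 0.4, 0.2)
)

fit <- lm(satisfaction_level ~ left, data = toy)

# Group means computed directly.
mean_stayed <- mean(toy$satisfaction_level[toy$left == 0])
mean_left   <- mean(toy$satisfaction_level[toy$left == 1])

# The intercept equals mean_stayed; the slope equals mean_left - mean_stayed.
coef(fit)
c(mean_stayed, mean_left - mean_stayed)
```

The same comparison could also be run as a two-sample t-test with t.test(satisfaction_level ~ left, data = toy).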

Relation between promotion in last 5 years and satisfaction level

relation2 <- lm(hresource$satisfaction_level ~ hresource$promotion_last_5years)
plot(hresource$promotion_last_5years, hresource$satisfaction_level,
     ylab = "Satisfaction Level",
     xlab = "promotion",
     main = "relation between promotion(0 or 1) and satisfaction level ",
     col = "lightblue")
abline (relation2, col = "black")

summary(relation2)
## 
## Call:
## lm(formula = hresource$satisfaction_level ~ hresource$promotion_last_5years)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.5460 -0.1719  0.0281  0.2081  0.3881 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     0.611895   0.002051 298.273  < 2e-16 ***
## hresource$promotion_last_5years 0.044124   0.014067   3.137  0.00171 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2486 on 14997 degrees of freedom
## Multiple R-squared:  0.0006556,  Adjusted R-squared:  0.000589 
## F-statistic: 9.839 on 1 and 14997 DF,  p-value: 0.001712

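Here the p-value (0.0017) is less than 0.05, so promotion in the last 5 years and satisfaction level are significantly related. Promoted employees have a mean satisfaction level about 0.044 higher, but R-squared is below 0.001, so the effect is very small.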