————————————————————————-

1.1. Import libraries

## Loading required package: ggplot2

————————————————————————-

1.2. Loading data

# Read the cleaned HR dataset; text columns stay as character for now
hr_data <- read.csv("data/HR_data_cleaned.csv", stringsAsFactors = FALSE)

————————————————————————-

1.3. Quick review of the dataset

str(hr_data)
## 'data.frame':    11991 obs. of  10 variables:
##  $ satisfaction_level   : num  0.38 0.8 0.11 0.72 0.37 0.41 0.1 0.92 0.89 0.42 ...
##  $ last_evaluation      : num  0.53 0.86 0.88 0.87 0.52 0.5 0.77 0.85 1 0.53 ...
##  $ number_project       : int  2 5 7 5 2 2 6 5 5 2 ...
##  $ average_montly_hours : int  157 262 272 223 159 153 247 259 224 142 ...
##  $ time_spend_company   : int  3 6 4 5 3 3 4 5 5 3 ...
##  $ work_accident        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ left                 : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ promotion_last_5years: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ department           : chr  "sales" "sales" "sales" "sales" ...
##  $ salary               : chr  "low" "medium" "medium" "low" ...
head(hr_data)
##   satisfaction_level last_evaluation number_project average_montly_hours
## 1               0.38            0.53              2                  157
## 2               0.80            0.86              5                  262
## 3               0.11            0.88              7                  272
## 4               0.72            0.87              5                  223
## 5               0.37            0.52              2                  159
## 6               0.41            0.50              2                  153
##   time_spend_company work_accident left promotion_last_5years department
## 1                  3             0    1                     0      sales
## 2                  6             0    1                     0      sales
## 3                  4             0    1                     0      sales
## 4                  5             0    1                     0      sales
## 5                  3             0    1                     0      sales
## 6                  3             0    1                     0      sales
##   salary
## 1    low
## 2 medium
## 3 medium
## 4    low
## 5    low
## 6    low
summary(hr_data)
##  satisfaction_level last_evaluation  number_project  average_montly_hours
##  Min.   :0.0900     Min.   :0.3600   Min.   :2.000   Min.   : 96.0       
##  1st Qu.:0.4800     1st Qu.:0.5700   1st Qu.:3.000   1st Qu.:157.0       
##  Median :0.6600     Median :0.7200   Median :4.000   Median :200.0       
##  Mean   :0.6297     Mean   :0.7167   Mean   :3.803   Mean   :200.5       
##  3rd Qu.:0.8200     3rd Qu.:0.8600   3rd Qu.:5.000   3rd Qu.:243.0       
##  Max.   :1.0000     Max.   :1.0000   Max.   :7.000   Max.   :310.0       
##  time_spend_company work_accident         left       promotion_last_5years
##  Min.   : 2.000     Min.   :0.0000   Min.   :0.000   Min.   :0.00000      
##  1st Qu.: 3.000     1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.00000      
##  Median : 3.000     Median :0.0000   Median :0.000   Median :0.00000      
##  Mean   : 3.365     Mean   :0.1543   Mean   :0.166   Mean   :0.01693      
##  3rd Qu.: 4.000     3rd Qu.:0.0000   3rd Qu.:0.000   3rd Qu.:0.00000      
##  Max.   :10.000     Max.   :1.0000   Max.   :1.000   Max.   :1.00000      
##   department           salary         
##  Length:11991       Length:11991      
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
## 

————————————————————————-

1.4. Variables treatment

# Recode the 0/1 indicator columns as labelled factors
hr_data$left <- factor(hr_data$left, levels = c(0, 1),
                       labels = c("People who stay", "People who left"))

hr_data$promotion_last_5years <- factor(hr_data$promotion_last_5years, levels = c(0, 1),
                                        labels = c("Not Promoted", "Promoted"))

hr_data$work_accident <- factor(hr_data$work_accident, levels = c(0, 1),
                                labels = c("No Accident", "Accident"))
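
One caveat of `factor()` with explicit `levels` is that any value outside those levels silently becomes `NA`. The sketch below shows one way to guard against that when recoding the raw 0/1 columns. The helper name `recode_binary` is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: recode a raw 0/1 vector as a labelled factor,
# stopping with an error if any other value is present.
recode_binary <- function(x, labels) {
  stopifnot(length(labels) == 2)
  # Values other than 0, 1 or NA would silently turn into NA inside factor()
  unexpected <- setdiff(unique(x[!is.na(x)]), c(0, 1))
  if (length(unexpected) > 0) {
    stop("Unexpected values: ", paste(unexpected, collapse = ", "))
  }
  factor(x, levels = c(0, 1), labels = labels)
}
```

Applied to the raw column (before the treatment above), a call such as `recode_binary(hr_data$work_accident, c("No Accident", "Accident"))` should give the same factor as the direct `factor()` call, provided the column only contains 0 and 1.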

————————————————————————-

1.5. Analysis through some visualizations

————————————————————————-

1.5.1.- Salary by department

Conclusions:

  • The majority of employees who left had either a low or a medium salary.
  • Sales, Support and Technical were the departments with the highest number of people who left.
  • Barely any employees with high salaries left the company.
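
The conclusions above can also be checked numerically. The following is a minimal sketch, assuming `left` has been recoded as in section 1.4; the helper name `leavers_by_group` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: number and share of leavers per level of a grouping column.
# Assumes `left` is a factor whose leaver level is "People who left" (section 1.4).
leavers_by_group <- function(data, group_col, left_label = "People who left") {
  # Rows: groups (e.g. salary band or department); columns: stay / left
  counts <- table(data[[group_col]], data$left)
  leavers <- as.vector(counts[, left_label, drop = FALSE])
  totals <- as.vector(rowSums(counts))
  out <- data.frame(
    group = rownames(counts),
    employees = totals,
    leavers = leavers,
    leave_rate = leavers / totals,
    stringsAsFactors = FALSE
  )
  # Groups with the largest number of leavers come first
  out[order(-out$leavers), ]
}
```

Calling `leavers_by_group(hr_data, "salary")` and `leavers_by_group(hr_data, "department")` would then give one row per salary band or department, which makes it easy to compare absolute counts with leave rates.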

————————————————————————-

1.5.2.- Distribution of level of satisfaction

Conclusions:

  • Among employees who left the company, three distinct groups appear:
    • close to 0 -> staff who were totally disappointed
    • around 0.4 -> a group with a satisfaction level below the average
    • 0.7-0.9 -> a group with high satisfaction, above the average
  • It seems clear that employees who were disappointed in the company (those with a satisfaction level below the mean) left. But what about those who were happy in the company: why did they leave?
  • There are no material differences among departments.
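
To see the three groups of leavers as numbers rather than as a plot, one option is to count leavers per satisfaction bin. This is only a sketch: the 0.1 bin width and the helper name `leaver_satisfaction_counts` are assumptions, not taken from the original analysis.

```r
# Hypothetical helper: count leavers in fixed-width satisfaction bins.
# Assumes `left` is recoded as in section 1.4 and satisfaction_level lies in [0, 1].
leaver_satisfaction_counts <- function(data, bin_width = 0.1,
                                       left_label = "People who left") {
  leavers <- data$satisfaction_level[data$left == left_label]
  # hist() with plot = FALSE only computes the bin counts, nothing is drawn
  h <- hist(leavers, breaks = seq(0, 1, by = bin_width), plot = FALSE)
  data.frame(
    from = head(h$breaks, -1),
    to = h$breaks[-1],
    leavers = h$counts
  )
}
```

With `leaver_satisfaction_counts(hr_data)`, bins with high counts should line up with the groups described above if the pattern holds.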

————————————————————————-

1.5.3.- Time spent at the company

Conclusions:

  • Most people who left the company did so after working there for 3 to 5 years.
  • Most of the employees who stay in the company have been working there for three years.
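
Both statements describe how tenure is distributed within each group (stayers and leavers), so they can be checked with row-wise proportions. The function names below are hypothetical, and the sketch assumes `left` has been recoded as in section 1.4.

```r
# Hypothetical helpers, assuming `left` is a factor and
# time_spend_company holds whole years.

# Share of each tenure value within stayers and within leavers (rows sum to 1)
tenure_share_by_status <- function(data) {
  counts <- table(data$left, data$time_spend_company)
  prop.table(counts, margin = 1)
}

# Most frequent tenure (in years) for each status
modal_tenure <- function(data) {
  counts <- table(data$left, data$time_spend_company)
  years <- as.integer(colnames(counts))
  setNames(years[apply(counts, 1, which.max)], rownames(counts))
}
```

For example, `modal_tenure(hr_data)` returns one value per status, and `tenure_share_by_status(hr_data)` shows how concentrated each group is around that value.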

————————————————————————-

1.5.4.- Average monthly hours worked

Conclusions:

  • Employees who worked up to about 150 hours per month, and those who worked more than 250 hours, left the company more often.
  • Regarding time at the company, employees with more than 6 years of tenure did not leave.
  • The maximum tenure is ten years, which could indicate that this is a young company.
  • During their first two years, employees barely leave the company. After that, the proportion of employees leaving keeps growing. Once employees pass the six-year mark, they stop leaving.
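
The thresholds mentioned above (150 and 250 hours) can be turned into bands to compare leave rates directly. This is a sketch under the assumption that `left` is recoded as in section 1.4; the helper name `leave_rate_in_bands` is hypothetical.

```r
# Hypothetical helper: leave rate within bands of a numeric column.
# `breaks` are the inner cut points; the outer bands are open-ended.
leave_rate_in_bands <- function(data, column, breaks,
                                left_label = "People who left") {
  # Intervals are right-closed, e.g. (-Inf,150], (150,250], (250, Inf]
  band <- cut(data[[column]], breaks = c(-Inf, breaks, Inf))
  has_left <- data$left == left_label
  data.frame(
    band = levels(band),
    employees = as.vector(table(band)),
    # Empty bands get NA as their leave rate
    leave_rate = as.vector(tapply(has_left, band, mean)),
    stringsAsFactors = FALSE
  )
}
```

A call such as `leave_rate_in_bands(hr_data, "average_montly_hours", c(150, 250))` covers the hours claim. The same helper could be reused for the performance bands in section 1.5.6 with `leave_rate_in_bands(hr_data, "last_evaluation", c(0.6, 0.8))`, or for tenure with a cut point at 6 years.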

————————————————————————-

1.5.5.- Number of Projects done by employee

Conclusions:

  • More than 50% of employees with 2 projects left the company; they probably worked hard but were not well paid or not highly valued.
  • After the first two years at the company, it seems that from 3 projects onward employees start to be overworked and turnover starts to grow. Overwork therefore also seems to be a strong reason to leave the company.
  • Most employees who remain at the company worked on 3 to 5 projects.
  • Every employee with 7 projects left the company, probably because they were overworked.
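
The "more than 50%" and "every employee" statements are easy to verify per project count. The following is a minimal sketch, again assuming the recoding from section 1.4; `project_turnover` is a hypothetical name.

```r
# Hypothetical helper: leave rate per number of projects, with two flags
# matching the claims "more than 50% left" and "everyone left".
project_turnover <- function(data, left_label = "People who left") {
  has_left <- data$left == left_label
  # Mean of a logical vector = share of TRUE values in each group
  rate <- tapply(has_left, data$number_project, mean)
  data.frame(
    number_project = as.integer(names(rate)),
    employees = as.vector(table(data$number_project)),
    leave_rate = as.vector(rate),
    majority_left = as.vector(rate) > 0.5,
    everyone_left = as.vector(rate) == 1
  )
}
```

Running `project_turnover(hr_data)` gives one row per project count, so the rows for 2 and 7 projects can be compared with the conclusions above.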

————————————————————————-

1.5.6.- Performance distribution by employee

Conclusions:

  • Employees with a performance level between 0.6 and 0.8 remained in the company. However, both employees with a performance level above 0.8 and those with a performance level below 0.6 tended to leave more often.

————————————————————————-

1.5.7.- Performance distribution by employee & department

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
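
The message above is what ggplot2 prints when a histogram is drawn without an explicit `binwidth` or `bins`. The original plotting code is not shown, so the following is only a sketch of how such a plot might be built with an explicit bin width. The function name, the 0.05 bin width, and the choice to facet by department and fill by `left` are all assumptions.

```r
# Hypothetical plotting helper (requires the ggplot2 package to be installed).
# Setting binwidth explicitly avoids the "Pick better value" message.
evaluation_histogram <- function(data, binwidth = 0.05) {
  ggplot2::ggplot(data, ggplot2::aes(x = last_evaluation, fill = left)) +
    # Overlay stayers and leavers in the same panel, semi-transparent
    ggplot2::geom_histogram(binwidth = binwidth, position = "identity",
                            alpha = 0.6) +
    # One panel per department
    ggplot2::facet_wrap(~ department) +
    ggplot2::labs(x = "Last evaluation", y = "Employees", fill = NULL)
}
```

`print(evaluation_histogram(hr_data))` would then draw the plot; a different `binwidth` can be passed if 0.05 turns out to be too coarse or too fine.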

————————————————————————-

1.5.8.- Work accidents

Conclusions:

  • As the previous section suggested, there does not seem to be a direct relationship between work accidents and leaving the company.
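
A visual impression like this can be complemented with a contingency table and a chi-squared test of independence. The sketch below assumes `work_accident` and `left` have been recoded as in section 1.4; the helper name `accident_vs_leaving` is hypothetical, and the test is offered as one possible check rather than as part of the original analysis.

```r
# Hypothetical helper: cross-tabulate work accidents against leaving and
# run a chi-squared test of independence (stats::chisq.test).
accident_vs_leaving <- function(data) {
  counts <- table(data$work_accident, data$left)
  test <- chisq.test(counts)
  list(
    counts = counts,
    # Share of stayers and leavers within each accident group (rows sum to 1)
    row_shares = prop.table(counts, margin = 1),
    p_value = test$p.value
  )
}
```

With `accident_vs_leaving(hr_data)`, similar row shares in both accident groups and a large p-value would be consistent with the conclusion above, while a small p-value would suggest that the two variables are associated after all.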