Set working directory

Loading necessary libraries

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: ggpubr
## 
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
## 
##     myeloma
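
The startup messages above show that dplyr, ggpubr, survival, and survminer were attached, but the `library()` calls themselves are not shown. The following is a minimal sketch of a loading step that reports missing packages instead of stopping with an error. The package list is inferred from those messages and may be incomplete.

```r
# Packages inferred from the startup messages above (an assumption;
# the original library() calls are not shown). ggpubr is pulled in by survminer.
pkgs <- c("dplyr", "survival", "survminer")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}

# Attach whatever is available, hiding the masking notices.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```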

Load CSV file

people_analytics <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")

View the structure of the dataset

str(people_analytics)
## 'data.frame':    1470 obs. of  35 variables:
##  $ Age                     : int  41 49 37 33 27 32 59 30 38 36 ...
##  $ Attrition               : chr  "Yes" "No" "Yes" "No" ...
##  $ BusinessTravel          : chr  "Travel_Rarely" "Travel_Frequently" "Travel_Rarely" "Travel_Frequently" ...
##  $ DailyRate               : int  1102 279 1373 1392 591 1005 1324 1358 216 1299 ...
##  $ Department              : chr  "Sales" "Research & Development" "Research & Development" "Research & Development" ...
##  $ DistanceFromHome        : int  1 8 2 3 2 2 3 24 23 27 ...
##  $ Education               : int  2 1 2 4 1 2 3 1 3 3 ...
##  $ EducationField          : chr  "Life Sciences" "Life Sciences" "Other" "Life Sciences" ...
##  $ EmployeeCount           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ EmployeeNumber          : int  1 2 4 5 7 8 10 11 12 13 ...
##  $ EnvironmentSatisfaction : int  2 3 4 4 1 4 3 4 4 3 ...
##  $ Gender                  : chr  "Female" "Male" "Male" "Female" ...
##  $ HourlyRate              : int  94 61 92 56 40 79 81 67 44 94 ...
##  $ JobInvolvement          : int  3 2 2 3 3 3 4 3 2 3 ...
##  $ JobLevel                : int  2 2 1 1 1 1 1 1 3 2 ...
##  $ JobRole                 : chr  "Sales Executive" "Research Scientist" "Laboratory Technician" "Research Scientist" ...
##  $ JobSatisfaction         : int  4 2 3 3 2 4 1 3 3 3 ...
##  $ MaritalStatus           : chr  "Single" "Married" "Single" "Married" ...
##  $ MonthlyIncome           : int  5993 5130 2090 2909 3468 3068 2670 2693 9526 5237 ...
##  $ MonthlyRate             : int  19479 24907 2396 23159 16632 11864 9964 13335 8787 16577 ...
##  $ NumCompaniesWorked      : int  8 1 6 1 9 0 4 1 0 6 ...
##  $ Over18                  : chr  "Y" "Y" "Y" "Y" ...
##  $ OverTime                : chr  "Yes" "No" "Yes" "Yes" ...
##  $ PercentSalaryHike       : int  11 23 15 11 12 13 20 22 21 13 ...
##  $ PerformanceRating       : int  3 4 3 3 3 3 4 4 4 3 ...
##  $ RelationshipSatisfaction: int  1 4 2 3 4 3 1 2 2 2 ...
##  $ StandardHours           : int  80 80 80 80 80 80 80 80 80 80 ...
##  $ StockOptionLevel        : int  0 1 0 0 1 0 3 1 0 2 ...
##  $ TotalWorkingYears       : int  8 10 7 8 6 8 12 1 10 17 ...
##  $ TrainingTimesLastYear   : int  0 3 3 3 3 2 3 2 2 3 ...
##  $ WorkLifeBalance         : int  1 3 3 3 3 2 2 3 3 2 ...
##  $ YearsAtCompany          : int  6 10 0 8 2 7 1 1 9 7 ...
##  $ YearsInCurrentRole      : int  4 7 0 7 2 7 0 0 7 7 ...
##  $ YearsSinceLastPromotion : int  0 1 0 3 2 3 0 0 1 7 ...
##  $ YearsWithCurrManager    : int  5 7 0 0 2 6 0 0 8 7 ...

Introduction

Summary statistics

summary(people_analytics)
##       Age         Attrition         BusinessTravel       DailyRate     
##  Min.   :18.00   Length:1470        Length:1470        Min.   : 102.0  
##  1st Qu.:30.00   Class :character   Class :character   1st Qu.: 465.0  
##  Median :36.00   Mode  :character   Mode  :character   Median : 802.0  
##  Mean   :36.92                                         Mean   : 802.5  
##  3rd Qu.:43.00                                         3rd Qu.:1157.0  
##  Max.   :60.00                                         Max.   :1499.0  
##   Department        DistanceFromHome   Education     EducationField    
##  Length:1470        Min.   : 1.000   Min.   :1.000   Length:1470       
##  Class :character   1st Qu.: 2.000   1st Qu.:2.000   Class :character  
##  Mode  :character   Median : 7.000   Median :3.000   Mode  :character  
##                     Mean   : 9.193   Mean   :2.913                     
##                     3rd Qu.:14.000   3rd Qu.:4.000                     
##                     Max.   :29.000   Max.   :5.000                     
##  EmployeeCount EmployeeNumber   EnvironmentSatisfaction    Gender         
##  Min.   :1     Min.   :   1.0   Min.   :1.000           Length:1470       
##  1st Qu.:1     1st Qu.: 491.2   1st Qu.:2.000           Class :character  
##  Median :1     Median :1020.5   Median :3.000           Mode  :character  
##  Mean   :1     Mean   :1024.9   Mean   :2.722                             
##  3rd Qu.:1     3rd Qu.:1555.8   3rd Qu.:4.000                             
##  Max.   :1     Max.   :2068.0   Max.   :4.000                             
##    HourlyRate     JobInvolvement    JobLevel       JobRole         
##  Min.   : 30.00   Min.   :1.00   Min.   :1.000   Length:1470       
##  1st Qu.: 48.00   1st Qu.:2.00   1st Qu.:1.000   Class :character  
##  Median : 66.00   Median :3.00   Median :2.000   Mode  :character  
##  Mean   : 65.89   Mean   :2.73   Mean   :2.064                     
##  3rd Qu.: 83.75   3rd Qu.:3.00   3rd Qu.:3.000                     
##  Max.   :100.00   Max.   :4.00   Max.   :5.000                     
##  JobSatisfaction MaritalStatus      MonthlyIncome    MonthlyRate   
##  Min.   :1.000   Length:1470        Min.   : 1009   Min.   : 2094  
##  1st Qu.:2.000   Class :character   1st Qu.: 2911   1st Qu.: 8047  
##  Median :3.000   Mode  :character   Median : 4919   Median :14236  
##  Mean   :2.729                      Mean   : 6503   Mean   :14313  
##  3rd Qu.:4.000                      3rd Qu.: 8379   3rd Qu.:20462  
##  Max.   :4.000                      Max.   :19999   Max.   :26999  
##  NumCompaniesWorked    Over18            OverTime         PercentSalaryHike
##  Min.   :0.000      Length:1470        Length:1470        Min.   :11.00    
##  1st Qu.:1.000      Class :character   Class :character   1st Qu.:12.00    
##  Median :2.000      Mode  :character   Mode  :character   Median :14.00    
##  Mean   :2.693                                            Mean   :15.21    
##  3rd Qu.:4.000                                            3rd Qu.:18.00    
##  Max.   :9.000                                            Max.   :25.00    
##  PerformanceRating RelationshipSatisfaction StandardHours StockOptionLevel
##  Min.   :3.000     Min.   :1.000            Min.   :80    Min.   :0.0000  
##  1st Qu.:3.000     1st Qu.:2.000            1st Qu.:80    1st Qu.:0.0000  
##  Median :3.000     Median :3.000            Median :80    Median :1.0000  
##  Mean   :3.154     Mean   :2.712            Mean   :80    Mean   :0.7939  
##  3rd Qu.:3.000     3rd Qu.:4.000            3rd Qu.:80    3rd Qu.:1.0000  
##  Max.   :4.000     Max.   :4.000            Max.   :80    Max.   :3.0000  
##  TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany  
##  Min.   : 0.00     Min.   :0.000         Min.   :1.000   Min.   : 0.000  
##  1st Qu.: 6.00     1st Qu.:2.000         1st Qu.:2.000   1st Qu.: 3.000  
##  Median :10.00     Median :3.000         Median :3.000   Median : 5.000  
##  Mean   :11.28     Mean   :2.799         Mean   :2.761   Mean   : 7.008  
##  3rd Qu.:15.00     3rd Qu.:3.000         3rd Qu.:3.000   3rd Qu.: 9.000  
##  Max.   :40.00     Max.   :6.000         Max.   :4.000   Max.   :40.000  
##  YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
##  Min.   : 0.000     Min.   : 0.000          Min.   : 0.000      
##  1st Qu.: 2.000     1st Qu.: 0.000          1st Qu.: 2.000      
##  Median : 3.000     Median : 1.000          Median : 3.000      
##  Mean   : 4.229     Mean   : 2.188          Mean   : 4.123      
##  3rd Qu.: 7.000     3rd Qu.: 3.000          3rd Qu.: 7.000      
##  Max.   :18.000     Max.   :15.000          Max.   :17.000
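
In the summary above, `EmployeeCount` and `StandardHours` have identical minimum and maximum values, so they cannot help explain attrition. The `str()` preview also shows only "Y" for `Over18`. The following is a minimal sketch of how such constant columns could be flagged. The four-row `preview` data frame is copied from the `str()` preview and stands in for `people_analytics`. The helper name `is_constant` is an assumption, not part of the original analysis.

```r
# First four rows of four columns, copied from the str() preview above.
preview <- data.frame(
  Age           = c(41L, 49L, 37L, 33L),
  EmployeeCount = c(1L, 1L, 1L, 1L),
  Over18        = c("Y", "Y", "Y", "Y"),
  StandardHours = c(80L, 80L, 80L, 80L)
)

# A column is constant when it holds a single distinct value.
is_constant <- vapply(preview, function(x) length(unique(x)) == 1L, logical(1))
constant_cols <- names(preview)[is_constant]
constant_cols
```

On the full data the same two lines would be applied to `people_analytics` instead of `preview`.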

Checking for missing values

sapply(people_analytics, function(x) sum(is.na(x)))
##                      Age                Attrition           BusinessTravel 
##                        0                        0                        0 
##                DailyRate               Department         DistanceFromHome 
##                        0                        0                        0 
##                Education           EducationField            EmployeeCount 
##                        0                        0                        0 
##           EmployeeNumber  EnvironmentSatisfaction                   Gender 
##                        0                        0                        0 
##               HourlyRate           JobInvolvement                 JobLevel 
##                        0                        0                        0 
##                  JobRole          JobSatisfaction            MaritalStatus 
##                        0                        0                        0 
##            MonthlyIncome              MonthlyRate       NumCompaniesWorked 
##                        0                        0                        0 
##                   Over18                 OverTime        PercentSalaryHike 
##                        0                        0                        0 
##        PerformanceRating RelationshipSatisfaction            StandardHours 
##                        0                        0                        0 
##         StockOptionLevel        TotalWorkingYears    TrainingTimesLastYear 
##                        0                        0                        0 
##          WorkLifeBalance           YearsAtCompany       YearsInCurrentRole 
##                        0                        0                        0 
##  YearsSinceLastPromotion     YearsWithCurrManager 
##                        0                        0

Age distribution
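
The following is a minimal sketch of how the age distribution could be tabulated in ten-year bands. The ten ages are the values shown for `Age` in the `str()` preview. The band width and the names `age_breaks` and `age_bands` are assumptions.

```r
# First ten Age values, copied from the str() preview above.
ages <- c(41, 49, 37, 33, 27, 32, 59, 30, 38, 36)

# Ten-year bands covering the observed range (min 18, max 60 in the summary).
age_breaks <- seq(10, 60, by = 10)
age_bands  <- cut(ages, breaks = age_breaks)

# Count of employees per band.
age_counts <- table(age_bands)
age_counts
```

Passing the same `breaks` to `hist(people_analytics$Age, breaks = age_breaks)` would draw the corresponding histogram for the full dataset.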

Converting categorical variables to factors

# Convert every categorical (character) column to a factor in one pass
categorical_cols <- c("Attrition", "BusinessTravel", "Department",
                      "EducationField", "Gender", "JobRole",
                      "MaritalStatus", "Over18", "OverTime")
people_analytics[categorical_cols] <- lapply(people_analytics[categorical_cols], factor)

Attrition by Gender
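
The following is a minimal sketch of one way to cross-tabulate attrition by gender and convert the counts to within-gender shares. The four rows are copied from the `str()` preview, so the resulting shares only illustrate the mechanics and say nothing about the full dataset.

```r
# First four rows of Gender and Attrition, copied from the str() preview above.
preview <- data.frame(
  Gender    = c("Female", "Male", "Male", "Female"),
  Attrition = c("Yes", "No", "Yes", "No")
)

# Counts of leavers and stayers within each gender.
counts <- table(Gender = preview$Gender, Attrition = preview$Attrition)

# margin = 1 normalises each row, giving the share of each outcome per gender.
row_share <- prop.table(counts, margin = 1)

counts
round(row_share, 2)
```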

Attrition by Job Satisfaction

This implies that higher job satisfaction is linked to lower attrition, emphasising the need to address the factors that drive job satisfaction in order to reduce employee turnover.

Attrition by Distance From Home

Attrition by Department

Average Attrition by Department

This breakdown identifies departments whose attrition rate is above the company-wide average, which are candidates for further investigation or action. It also identifies departments with lower attrition, which may serve as models for best practice or show where retention measures are working.
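
The comparison against the company-wide average can be sketched as follows. This assumes attrition is coded "Yes"/"No" as in the `str()` preview. The four rows are copied from that preview, so the rates are illustrative only. The names `left`, `dept_rate`, and `above_average` are assumptions.

```r
# First four rows of Department and Attrition, copied from the str() preview.
preview <- data.frame(
  Department = c("Sales", "Research & Development",
                 "Research & Development", "Research & Development"),
  Attrition  = c("Yes", "No", "Yes", "No")
)

# TRUE for employees who left; the mean of a logical vector is a proportion.
left <- preview$Attrition == "Yes"

overall_rate <- mean(left)                          # company-wide rate
dept_rate    <- tapply(left, preview$Department, mean)  # rate per department

# Departments whose rate exceeds the company-wide rate.
above_average <- names(dept_rate)[dept_rate > overall_rate]
above_average
```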

Creating a survival object and fitting a Cox proportional hazards regression model

## Call:
## coxph(formula = SurvivalData ~ Age + Gender + Department + JobSatisfaction + 
##     Education + DistanceFromHome + EnvironmentSatisfaction, data = people_analytics)
## 
##   n= 1470, number of events= 237 
## 
##                                       coef exp(coef)  se(coef)      z Pr(>|z|)
## Age                              -0.092092  0.912021  0.010354 -8.894  < 2e-16
## GenderMale                        0.136691  1.146474  0.136033  1.005 0.314977
## DepartmentResearch & Development -0.331178  0.718077  0.303638 -1.091 0.275404
## DepartmentSales                   0.037113  1.037810  0.308860  0.120 0.904355
## JobSatisfaction                  -0.237065  0.788940  0.057515 -4.122 3.76e-05
## Education                        -0.003332  0.996674  0.064957 -0.051 0.959094
## DistanceFromHome                  0.020144  1.020348  0.007564  2.663 0.007741
## EnvironmentSatisfaction          -0.224178  0.799173  0.060131 -3.728 0.000193
##                                     
## Age                              ***
## GenderMale                          
## DepartmentResearch & Development    
## DepartmentSales                     
## JobSatisfaction                  ***
## Education                           
## DistanceFromHome                 ** 
## EnvironmentSatisfaction          ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                                  exp(coef) exp(-coef) lower .95 upper .95
## Age                                 0.9120     1.0965    0.8937    0.9307
## GenderMale                          1.1465     0.8722    0.8782    1.4968
## DepartmentResearch & Development    0.7181     1.3926    0.3960    1.3021
## DepartmentSales                     1.0378     0.9636    0.5665    1.9012
## JobSatisfaction                     0.7889     1.2675    0.7048    0.8831
## Education                           0.9967     1.0033    0.8775    1.1320
## DistanceFromHome                    1.0203     0.9801    1.0053    1.0356
## EnvironmentSatisfaction             0.7992     1.2513    0.7103    0.8991
## 
## Concordance= 0.737  (se = 0.018 )
## Likelihood ratio test= 147.8  on 8 df,   p=<2e-16
## Wald test            = 129.9  on 8 df,   p=<2e-16
## Score (logrank) test = 134.9  on 8 df,   p=<2e-16

The Cox model is useful because it handles censored observations, that is, employees who have not experienced the event under consideration (leaving the job) by the end of the observation period, and because it can assess several predictors at once. It also produces interpretable output in the form of hazard ratios, which lets us identify which characteristics matter most in predicting why people leave their positions.
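
The code that built `SurvivalData` is not shown in the output. The following is a minimal sketch of how such a model could be set up. It assumes the survival object uses `YearsAtCompany` as the time variable and `Attrition == "Yes"` as the event indicator. The `toy` data frame is simulated at random purely so the example runs, so its coefficients carry no meaning. The name `cox_model` matches the object passed to `ggforest()` in the forest plot section.

```r
library(survival)

# Simulated stand-in for people_analytics (random values, illustration only).
set.seed(42)
n <- 300
toy <- data.frame(
  Age             = sample(18:60, n, replace = TRUE),
  JobSatisfaction = sample(1:4, n, replace = TRUE),
  YearsAtCompany  = sample(1:40, n, replace = TRUE),
  Attrition       = factor(sample(c("No", "Yes"), n, replace = TRUE,
                                  prob = c(0.84, 0.16)))
)

# Assumed definition: time = tenure, event = employee left.
# Employees with Attrition == "No" are treated as right-censored.
SurvivalData <- Surv(time = toy$YearsAtCompany, event = toy$Attrition == "Yes")

# Fit the Cox model on two of the covariates used in the reported model.
cox_model <- coxph(SurvivalData ~ Age + JobSatisfaction, data = toy)

# Hazard ratios with 95% confidence limits.
summary(cox_model)$conf.int
```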

This analysis, which uses the Cox proportional hazards regression model, provides insight into the important predictors of attrition within the organisation.

Here’s what it means for businesses.

Overall, the model’s concordance statistic (c-index) of 0.737 (se = 0.018) indicates reasonably good ability to discriminate between employees who leave sooner and those who stay longer; a value of 0.5 would correspond to chance. The likelihood ratio, Wald, and score (logrank) tests all reject the hypothesis that every coefficient is zero (p < 2e-16 on 8 degrees of freedom), so the predictors jointly carry information about attrition.
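
To make the coefficient table concrete, the hazard ratios and their 95% intervals can be recomputed from the `coef` and `se(coef)` columns reported above. The following is a minimal sketch for the four statistically significant predictors. The numbers are copied from the model output. The variable names are assumptions.

```r
# coef and se(coef) for the significant predictors, copied from the output above.
coefs <- c(Age = -0.092092, JobSatisfaction = -0.237065,
           DistanceFromHome = 0.020144, EnvironmentSatisfaction = -0.224178)
ses   <- c(Age = 0.010354, JobSatisfaction = 0.057515,
           DistanceFromHome = 0.007564, EnvironmentSatisfaction = 0.060131)

# Hazard ratio = exp(coef); percentage change in hazard per one-unit increase.
hr         <- exp(coefs)
pct_change <- 100 * (hr - 1)

# Wald 95% confidence interval: exp(coef +/- z * se).
z  <- qnorm(0.975)
ci <- cbind(lower = exp(coefs - z * ses), upper = exp(coefs + z * ses))

round(cbind(hr, pct_change, ci), 4)
```

For `Age`, the hazard ratio of about 0.912 means each additional year of age is associated with roughly an 8.8% lower hazard of leaving, holding the other covariates fixed. For `DistanceFromHome`, the ratio of about 1.020 means roughly a 2% higher hazard per additional unit of distance.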

Forest plot of the Cox proportional hazards regression model

ggforest(cox_model, main = "Cox Proportional Hazards Regression")

Identifying high-risk roles with high attrition rates

The key takeaway here is that different job roles have varying attrition rates inside the firm. For example:

Understanding attrition rates by job role can help businesses in a variety of ways:

Overall, this study sheds light on workforce dynamics and can help guide strategic decisions targeted at increasing staff retention and organisational stability.
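
The following is a minimal sketch of how job roles could be ranked by attrition rate. It keeps headcount next to the rate because a rate based on a handful of employees is noisy. The four rows are copied from the `str()` preview, so the ranking is illustrative only. The column names `Left`, `headcount`, `leavers`, and `rate` are assumptions.

```r
# First four rows of JobRole and Attrition, copied from the str() preview.
preview <- data.frame(
  JobRole   = c("Sales Executive", "Research Scientist",
                "Laboratory Technician", "Research Scientist"),
  Attrition = c("Yes", "No", "Yes", "No")
)
preview$Left <- as.integer(preview$Attrition == "Yes")

# Leavers and headcount per role (tapply orders roles alphabetically).
leavers   <- tapply(preview$Left, preview$JobRole, sum)
headcount <- tapply(preview$Left, preview$JobRole, length)

role_summary <- data.frame(
  JobRole   = names(headcount),
  headcount = as.vector(headcount),
  leavers   = as.vector(leavers)
)
role_summary$rate <- role_summary$leavers / role_summary$headcount

# Highest attrition rate first; larger roles first among ties.
role_summary <- role_summary[order(-role_summary$rate, -role_summary$headcount), ]
role_summary
```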

Identifying high-risk departments with high attrition rates

This data reveals that different departments within the organisation experience varying turnover rates:

Understanding the departmental attrition rates provides useful insights for business management.

Understanding and addressing departmental differences in attrition rates allows firms to improve employee satisfaction, reduce turnover costs, and maintain a stable and productive workforce.