Libraries used

library(readxl)
## Warning: package 'readxl' was built under R version 3.5.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(arules)
## Warning: package 'arules' was built under R version 3.5.3
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
## Warning: package 'arulesViz' was built under R version 3.5.3
## Loading required package: grid

Sheet = “Revised_HTM_EA_Data” is read from the Excel Workbook.

data <- na.omit(read_excel("C:/Users/Sharathchandra/Desktop/Skill Assessment/Diamond_Resorts/Case-Fictional_HR-Sharathchandra.xlsx",sheet = "Revised_HTM_EA_Data"))

Data is cleansed and modified majorily in MS Excel as part of Excel Analytics - Pivot Tables & Analysis. The numerical values in columns except “EmployeeNumber” column has been put into bins and thus are characterized in Excel and then factrozied in R. Please see the formula applied for each column in Excel Sheet = “Revised_HTM_EA_Data”. This is done as a prerequiste to apply Apriori algorithm on the dataset. And, more importantly with words and bins it is easier to decode reasons for Attrition value and at the same time, to give a clear picture to the executives.

Attrition = No, 1233 observations Attrition = Yes, 237 observations Total = 1470 observations

data <- data %>% mutate_if(is.character, funs(as.factor))
## Warning: funs() is soft deprecated as of dplyr 0.8.0
## please use list() instead
## 
## # Before:
## funs(name = f(.)
## 
## # After: 
## list(name = ~f(.))
## This warning is displayed once per session.
data <- data[,-c(1)]

First, I wanted to see the main factors affecting employees to stay and continue their employment in the company. Thus, I have checked for rules only with RHS equal to “Attrition=No”. I have tuned the algorithm according to the number of observations in the dataset.

I have made sure to remove the redundant rules and only take unique rules under consideration. These rules gives us a story of what factors are leading to employees to stay employed in the company. The visualization of the rules are sorted by confidence (With 90% confidence, these factors are important) and colored by lift (How many times these factors have occurred in rules?).

rulesHR1 <- apriori(data, parameter = list(supp = 0.1, conf = 0.9, maxlen = 5), appearance = list(default = "lhs", rhs = "Attrition=No"),control = list(verbose=F))
subsetRules1 <- which(colSums(is.subset(rulesHR1, rulesHR1)) > 1)
length(subsetRules1) 
## [1] 29
rulesHRAttritionNo  <- rulesHR1[-subsetRules1]
length(rulesHRAttritionNo)
## [1] 16
summary(rulesHRAttritionNo)
## set of 16 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3  4  5 
##  1 11  3  1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    3.00    3.00    3.25    3.25    5.00 
## 
## summary of quality measures:
##     support         confidence          lift           count      
##  Min.   :0.1143   Min.   :0.9000   Min.   :1.073   Min.   :168.0  
##  1st Qu.:0.1274   1st Qu.:0.9031   1st Qu.:1.077   1st Qu.:187.2  
##  Median :0.1418   Median :0.9088   Median :1.084   Median :208.5  
##  Mean   :0.1573   Mean   :0.9132   Mean   :1.089   Mean   :231.2  
##  3rd Qu.:0.1755   3rd Qu.:0.9209   3rd Qu.:1.098   3rd Qu.:258.0  
##  Max.   :0.3279   Max.   :0.9447   Max.   :1.126   Max.   :482.0  
## 
## mining info:
##  data ntransactions support confidence
##  data          1470     0.1        0.9
rulesConf1 <- sort(rulesHRAttritionNo, by = "confidence", decreasing = T)
inspect(head(rulesConf1))
##     lhs                                         rhs              support confidence     lift count
## [1] {Department=Research & Development,                                                           
##      MonthlyIncome=Income less than 10,000$} => {Attrition=No} 0.1394558  0.9447005 1.126285   205
## [2] {Age=Between 40 and 50 years,                                                                 
##      Department=Research & Development}      => {Attrition=No} 0.1333333  0.9333333 1.112733   196
## [3] {Age=Between 30 and 40 years,                                                                 
##      JobSatisfaction=Rating 4}               => {Attrition=No} 0.1258503  0.9250000 1.102798   185
## [4] {DistanceFromHome=Less than 5 Miles,                                                          
##      MonthlyIncome=Income less than 10,000$} => {Attrition=No} 0.1149660  0.9234973 1.101006   169
## [5] {Age=Between 40 and 50 years,                                                                 
##      BusinessTravel=Travel_Rarely}           => {Attrition=No} 0.1408163  0.9200000 1.096837   207
## [6] {Age=Between 30 and 40 years,                                                                 
##      YearsAtCompany=Between 6 and 10 years}  => {Attrition=No} 0.1278912  0.9126214 1.088040   188
rulesHRAttritionNoViz <- plot(rulesConf1, method = "graph")

Then, Iwanted to check for main factors affecting employees to quit their employment and leave the company. Thus, I have checked for rules only with RHS equal to “Attrition=Yes”. I have tuned the algorithm according to the number of observations in the dataset.

Same principle as explained before.

rulesHR2 <- apriori(data, parameter = list(supp = 0.0025, conf = 0.9, maxlen = 5), appearance = list(default = "lhs", rhs = "Attrition=Yes"),control = list(verbose=F))
length(rulesHR2)
## [1] 39
subsetRules2 <- which(colSums(is.subset(rulesHR2, rulesHR2)) > 1)
length(subsetRules2) 
## [1] 30
rulesHRAttritionYes  <- rulesHR2[-subsetRules2]
length(rulesHRAttritionYes)
## [1] 9
summary(rulesHRAttritionYes)
## set of 9 rules
## 
## rule length distribution (lhs + rhs):sizes
## 3 4 5 
## 1 2 6 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   5.000   4.556   5.000   5.000 
## 
## summary of quality measures:
##     support           confidence      lift           count      
##  Min.   :0.002721   Min.   :1    Min.   :6.203   Min.   :4.000  
##  1st Qu.:0.002721   1st Qu.:1    1st Qu.:6.203   1st Qu.:4.000  
##  Median :0.002721   Median :1    Median :6.203   Median :4.000  
##  Mean   :0.002797   Mean   :1    Mean   :6.203   Mean   :4.111  
##  3rd Qu.:0.002721   3rd Qu.:1    3rd Qu.:6.203   3rd Qu.:4.000  
##  Max.   :0.003401   Max.   :1    Max.   :6.203   Max.   :5.000  
## 
## mining info:
##  data ntransactions support confidence
##  data          1470  0.0025        0.9
rulesConf2 <- sort(rulesHRAttritionYes, by = "confidence", decreasing = T)
inspect(head(rulesConf2))
##     lhs                                           rhs                 support confidence     lift count
## [1] {Age=Less than 20 years,                                                                           
##      BusinessTravel=Travel_Frequently}         => {Attrition=Yes} 0.003401361          1 6.202532     5
## [2] {BusinessTravel=Travel_Frequently,                                                                 
##      JobRole=Sales Representative,                                                                     
##      JobSatisfaction=Rating 1}                 => {Attrition=Yes} 0.002721088          1 6.202532     4
## [3] {BusinessTravel=Travel_Frequently,                                                                 
##      DistanceFromHome=Between 6 and 10 miles,                                                          
##      JobRole=Sales Representative}             => {Attrition=Yes} 0.002721088          1 6.202532     4
## [4] {BusinessTravel=Travel_Frequently,                                                                 
##      DistanceFromHome=Greater than 26 miles,                                                           
##      JobLevel=Level 2,                                                                                 
##      JobRole=Sales Executive}                  => {Attrition=Yes} 0.002721088          1 6.202532     4
## [5] {BusinessTravel=Travel_Frequently,                                                                 
##      Department=Sales,                                                                                 
##      DistanceFromHome=Greater than 26 miles,                                                           
##      JobLevel=Level 2}                         => {Attrition=Yes} 0.002721088          1 6.202532     4
## [6] {Age=Between 30 and 40 years,                                                                      
##      DistanceFromHome=Between 16 and 20 miles,                                                         
##      JobLevel=Level 1,                                                                                 
##      JobSatisfaction=Rating 1}                 => {Attrition=Yes} 0.002721088          1 6.202532     4
rulesHRAttritionYesViz <- plot(rulesConf2, method = "graph")

Apart from this R Notebook, please do see the attached Excel for Analytics, Visualizations & Pivot Table and Word Document for Executive summary and Visualizations.