Original

https://www.r-bloggers.com/hr-analytics-using-machine-learning-to-predict-employee-turnover/


Objective The current Assignment dataset is taken from GitHub which is on the Prediction of Employee Attrition and its adverse effects on few charecteristics.the particular image is the reference for the visualisation considered: https://i0.wp.com/www.business-science.io/figure/source/2017-9-18-hr_employee_attrition/unnamed-chunk-28-1.png?zoom=1.25&w=456

Deconstruct
The visualisation chosen had the following three main issues: * The visualisation used in the current reference does not explain the purpose of the decimal intervals on y-axis and does not satisfoctorily keep the viewers attention as it complicates the visualisation. Using of a constant color without much of a shae difference makes it an uneven visualtion for the viewer to intake such amount of information as it would stress the person looking for the first time and not so easy to understand. The visualisation shows the difference in attrition for different job roles but doesnt explain how it differs to a respective role with or without attrition in that specific role. It mixes up the analogy and many thoughts pop up before analysing the current visualisation

Reconstruct * Making sure that each specific job role is explained with and without the attrition on them as that sends out a clear message on who is affected by attrition. In an attempt to picture the visualisation we have used different plotting methods in order to have a neat presentation compared to the one already in use. In plot 4, the plot shows that employess of median monthly income who left is less than who are currently present, for the job roles Sales Respresentative,Research scientist,Labaratory Techinician. These are the low level job levels. Similarly Attrition by performance rating and Attrition rate by Working overtime is one of the way to neatly depict the factors influenced by attrition.

Reference * The current dataset has been taken from github. The link to the data is : https://github.com/varunagarwal97/Predict-Employee-Attrition/blob/master/HR.csv * . **. #### Code

The following code was used to fix the issues identified in the original.

library(tidyverse)
library(ggthemes)
library(ggrepel)
library(RColorBrewer)
library(scales)
library(gridExtra)
library(DT)
library(Information)
library(caret)
library(lattice)
data_emp <- read_csv("C:/Users/amand/OneDrive/Desktop/Predict-Employee-Attrition-master/HR.csv")
glimpse(data_emp)
## Observations: 1,470
## Variables: 35
## $ Age                      <dbl> 41, 49, 37, 33, 27, 32, 59, 30, 38, 3...
## $ Attrition                <chr> "Yes", "No", "Yes", "No", "No", "No",...
## $ BusinessTravel           <chr> "Travel_Rarely", "Travel_Frequently",...
## $ DailyRate                <dbl> 1102, 279, 1373, 1392, 591, 1005, 132...
## $ Department               <chr> "Sales", "Research & Development", "R...
## $ DistanceFromHome         <dbl> 1, 8, 2, 3, 2, 2, 3, 24, 23, 27, 16, ...
## $ Education                <dbl> 2, 1, 2, 4, 1, 2, 3, 1, 3, 3, 3, 2, 1...
## $ EducationField           <chr> "Life Sciences", "Life Sciences", "Ot...
## $ EmployeeCount            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ EmployeeNumber           <dbl> 1, 2, 4, 5, 7, 8, 10, 11, 12, 13, 14,...
## $ EnvironmentSatisfaction  <dbl> 2, 3, 4, 4, 1, 4, 3, 4, 4, 3, 1, 4, 1...
## $ Gender                   <chr> "Female", "Male", "Male", "Female", "...
## $ HourlyRate               <dbl> 94, 61, 92, 56, 40, 79, 81, 67, 44, 9...
## $ JobInvolvement           <dbl> 3, 2, 2, 3, 3, 3, 4, 3, 2, 3, 4, 2, 3...
## $ JobLevel                 <dbl> 2, 2, 1, 1, 1, 1, 1, 1, 3, 2, 1, 2, 1...
## $ JobRole                  <chr> "Sales Executive", "Research Scientis...
## $ JobSatisfaction          <dbl> 4, 2, 3, 3, 2, 4, 1, 3, 3, 3, 2, 3, 3...
## $ MaritalStatus            <chr> "Single", "Married", "Single", "Marri...
## $ MonthlyIncome            <dbl> 5993, 5130, 2090, 2909, 3468, 3068, 2...
## $ MonthlyRate              <dbl> 19479, 24907, 2396, 23159, 16632, 118...
## $ NumCompaniesWorked       <dbl> 8, 1, 6, 1, 9, 0, 4, 1, 0, 6, 0, 0, 1...
## $ Over18                   <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y...
## $ OverTime                 <chr> "Yes", "No", "Yes", "Yes", "No", "No"...
## $ PercentSalaryHike        <dbl> 11, 23, 15, 11, 12, 13, 20, 22, 21, 1...
## $ PerformanceRating        <dbl> 3, 4, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3...
## $ RelationshipSatisfaction <dbl> 1, 4, 2, 3, 4, 3, 1, 2, 2, 2, 3, 4, 4...
## $ StandardHours            <dbl> 80, 80, 80, 80, 80, 80, 80, 80, 80, 8...
## $ StockOptionLevel         <dbl> 0, 1, 0, 0, 1, 0, 3, 1, 0, 2, 1, 0, 1...
## $ TotalWorkingYears        <dbl> 8, 10, 7, 8, 6, 8, 12, 1, 10, 17, 6, ...
## $ TrainingTimesLastYear    <dbl> 0, 3, 3, 3, 3, 2, 3, 2, 2, 3, 5, 3, 1...
## $ WorkLifeBalance          <dbl> 1, 3, 3, 3, 3, 2, 2, 3, 3, 2, 3, 3, 2...
## $ YearsAtCompany           <dbl> 6, 10, 0, 8, 2, 7, 1, 1, 9, 7, 5, 9, ...
## $ YearsInCurrentRole       <dbl> 4, 7, 0, 7, 2, 7, 0, 0, 7, 7, 4, 5, 2...
## $ YearsSinceLastPromotion  <dbl> 0, 1, 0, 3, 2, 3, 0, 0, 1, 7, 0, 0, 4...
## $ YearsWithCurrManager     <dbl> 5, 7, 0, 0, 2, 6, 0, 0, 8, 7, 3, 8, 3...
#Employee Ecperience vs Job Level
ggplot(data_emp, aes(x = factor(JobLevel), y = TotalWorkingYears)) + geom_boxplot()

#The above plot clearly shows that the job level and the experience is directly related. The employee with job level 5 has highest experience compared to other employees.

#Attrition rates plots
#Attrition Rates
plot1 <- data_emp %>%group_by(Attrition)%>%summarize(attrition_count=n())%>%ungroup()%>%
  mutate(attrition_rate=round(attrition_count/sum(attrition_count),2))%>%ggplot(aes(x=reorder(Attrition,attrition_rate),y=attrition_rate,fill=Attrition)) + geom_bar(stat='identity',alpha=0.5) + theme_fivethirtyeight()+
  theme(axis.text.x=element_text(angle=0,vjust=0.5),legend.position='none',plot.title = element_text(size=10)) +labs(title="Attrition Rate")+scale_y_continuous(labels=percent_format())+geom_text(aes(label=attrition_rate),vjust=0.1)

plot1

#Attrition Rate over time
plot2 <- data_emp %>%group_by(OverTime)%>%
  summarize(attrition_rate=mean(Attrition=="Yes"))%>%ggplot(aes(x=reorder(OverTime,attrition_rate),y=attrition_rate,fill=OverTime)) + geom_bar(stat='identity',alpha=0.5) +  theme_fivethirtyeight()+
  theme(axis.text.x=element_text(angle=0,vjust=0.5),legend.position='none',plot.title = element_text(size=10)) +labs(title="Attrition Rate by Working OverTime")+scale_y_continuous(labels=percent_format())

plot2

# Attrition vs Performance Rating
plot3 <- ggplot(data_emp,aes(PerformanceRating,fill=Attrition))+geom_bar(alpha=0.5,position = "dodge")+scale_x_continuous(limits=c(1,5))+  theme_fivethirtyeight()+
  theme(axis.text.x=element_text(angle=0,vjust=0.5),legend.position='bottom',plot.title = element_text(size=14)) +labs(title="Attrition vs Performance Rating")+scale_y_continuous(labels=comma_format())

plot3

#Attrition vs Job Role and monthly salary
plot4 <- data_emp %>% ggplot(aes(JobRole,MonthlyIncome,fill=Attrition))+geom_boxplot(alpha=0.5)+coord_flip()+  theme_fivethirtyeight()+
  theme(axis.text.x=element_text(angle=0,vjust=0.5),legend.position='bottom',plot.title = element_text(size=10)) +scale_fill_manual(values=c("blue","green","yellow"))+
  labs(title="Job Level and Monthly Salary impact over Attrition")+scale_y_continuous(labels=comma_format())

plot4

#### Reconstruction