As an employer, we are always interested in two questions:
What factors are significantly associated with employees’ performance?
How can we improve employees’ competence in their jobs?
Using the data we collected and an evaluation of its features, we explore answers to these questions below.
library(tidyverse)
library(lattice)
hrs <- read_csv('hrs.csv')
head(hrs)
sapply(hrs,class)
Employee_Name EmpID MarriedID
"character" "numeric" "numeric"
MaritalStatusID GenderID EmpStatusID
"numeric" "numeric" "numeric"
DeptID PerfScoreID FromDiversityJobFairID
"numeric" "numeric" "numeric"
Salary Termd PositionID
"numeric" "numeric" "numeric"
Position State Zip
"character" "character" "character"
DOB Sex MaritalDesc
"Date" "character" "character"
CitizenDesc HispanicLatino RaceDesc
"character" "character" "character"
DateofHire DateofTermination TermReason
"Date" "Date" "character"
EmploymentStatus Department ManagerName
"character" "character" "character"
ManagerID RecruitmentSource PerformanceScore
"numeric" "character" "character"
EngagementSurvey EmpSatisfaction SpecialProjectsCount
"numeric" "numeric" "numeric"
LastPerformanceReview_Date DaysLateLast30 Absences
"Date" "numeric" "numeric"
# Parse the date columns; DOB uses a two-digit year (%y), the others a four-digit year (%Y)
hrs$DOB <- as.Date(hrs$DOB,'%m/%d/%y')
hrs$DateofHire <- as.Date(hrs$DateofHire,'%m/%d/%Y')
hrs$LastPerformanceReview_Date <- as.Date(hrs$LastPerformanceReview_Date,'%m/%d/%Y')
hrs$DateofTermination <- as.Date(hrs$DateofTermination,'%m/%d/%Y')
# Convert the categorical text columns to factors
hrs <- hrs %>% mutate_at(c('Position','State','Zip','Sex','MaritalDesc','CitizenDesc',
                           'RaceDesc','EmploymentStatus','Department','ManagerName',
                           'RecruitmentSource','PerformanceScore'),as.factor)
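One caveat on the DOB conversion above: it parses with `%y`, which suggests the file stores birth years with two digits. On input, R maps `%y` values 00 to 68 onto 20xx and 69 to 99 onto 19xx, so a birth year such as 1968 would be read as 2068. The sketch below shows the effect on three made-up strings (not taken from hrs.csv) and one possible correction. The helper `fix_century` and its cutoff date are hypothetical, and whether the real column is affected is an assumption to check against the data.

```r
# Made-up DOB strings in a two-digit-year layout (NOT taken from hrs.csv)
dob_raw <- c("07/10/83", "05/05/68", "11/24/54")
dob <- as.Date(dob_raw, "%m/%d/%y")   # %y: 00-68 -> 20xx, 69-99 -> 19xx

# Hypothetical helper: shift any date later than `cutoff` back by 100 years
fix_century <- function(d, cutoff = as.Date("2020-01-01")) {
  lt <- as.POSIXlt(d)
  too_late <- !is.na(d) & d > cutoff
  lt$year[too_late] <- lt$year[too_late] - 100
  as.Date(lt)
}

dob_fixed <- fix_century(dob)
print(data.frame(dob_raw, parsed = dob, fixed = dob_fixed))
```

A quick check such as `sum(hrs$DOB > Sys.Date(), na.rm = TRUE)` would show whether any parsed birth dates fall in the future.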
sapply(hrs,class)
Employee_Name EmpID MarriedID
"character" "numeric" "numeric"
MaritalStatusID GenderID EmpStatusID
"numeric" "numeric" "numeric"
DeptID PerfScoreID FromDiversityJobFairID
"numeric" "numeric" "numeric"
Salary Termd PositionID
"numeric" "numeric" "numeric"
Position State Zip
"factor" "factor" "factor"
DOB Sex MaritalDesc
"Date" "factor" "factor"
CitizenDesc HispanicLatino RaceDesc
"factor" "character" "factor"
DateofHire DateofTermination TermReason
"Date" "Date" "character"
EmploymentStatus Department ManagerName
"factor" "factor" "factor"
ManagerID RecruitmentSource PerformanceScore
"numeric" "factor" "factor"
EngagementSurvey EmpSatisfaction SpecialProjectsCount
"numeric" "numeric" "numeric"
LastPerformanceReview_Date DaysLateLast30 Absences
"Date" "numeric" "numeric"
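# Correlate every numeric column (except EmpID) with PerfScoreID, strongest positive first
hrs %>% select_if(is.numeric) %>% select(-EmpID) %>% cor() %>% data.frame() %>%
  select(PerfScoreID) %>% arrange(desc(PerfScoreID))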
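The ranking step above can also be sketched in base R. This version picks the column by name rather than by position, and it uses `use = "pairwise.complete.obs"` so that a numeric column with missing values does not produce `NA` correlations. The data frame `toy` is synthetic and only stands in for a few numeric columns of `hrs`; its values are invented, and whether `hrs` actually contains missing values is an assumption.

```r
# Synthetic stand-in for a few numeric columns of hrs (all values are invented)
set.seed(1)
n <- 50
toy <- data.frame(
  PerfScoreID      = sample(1:4, n, replace = TRUE),
  EngagementSurvey = round(runif(n, 1, 5), 2),
  DaysLateLast30   = sample(0:6, n, replace = TRUE),
  ManagerID        = sample(1:10, n, replace = TRUE)
)
toy$ManagerID[c(3, 17)] <- NA   # simulate a column with missing values

# Correlation of every column with PerfScoreID, skipping incomplete pairs
r <- cor(toy, use = "pairwise.complete.obs")[, "PerfScoreID"]

# Drop the trivial self-correlation and sort from most positive to most negative
ranked <- sort(r[names(r) != "PerfScoreID"], decreasing = TRUE)
print(round(ranked, 3))
```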
bwplot(PerfScoreID~EmpSatisfaction+EngagementSurvey | Department,data=hrs,
       main='Employee performance, satisfaction and engagement by department',
       sub='Coef: satisfaction 0.3035; engagement 0.5449')
cor(hrs[c('EmpSatisfaction','EngagementSurvey')],hrs$PerfScoreID)
bwplot(PerfScoreID~EmpSatisfaction | Department,data=hrs,
       main='Employee performance and satisfaction by department',
       sub='Coef: satisfaction 0.3035')
bwplot(PerfScoreID~EngagementSurvey | Department,data=hrs,
       main='Employee performance and engagement by department',
       sub='Coef: engagement 0.5449')
bwplot(PerfScoreID~DaysLateLast30 | Department, data=hrs,
       main='Employee performance and DaysLateLast30 by department',
       sub='Coef: DaysLateLast30 -0.7347')
Based on the exploration above, four major factors are related to employee performance: 1. employee engagement, 2. employee satisfaction, 3. attendance (measured here by DaysLateLast30), and 4. salary.
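The opening question asks which factors are significantly associated with performance, while the exploration above reports correlation coefficients only. A minimal sketch of how one coefficient could be tested with `cor.test` from base R follows. The vectors `engagement` and `perf` are synthetic stand-ins, not the real `hrs` columns, so the printed numbers say nothing about the actual data.

```r
# Synthetic stand-ins for EngagementSurvey and PerfScoreID (values are invented)
set.seed(42)
n <- 200
engagement <- round(runif(n, 1, 5), 2)
# Performance code on a 1-4 scale that loosely rises with engagement, plus noise
perf <- pmin(4, pmax(1, round(1 + 0.5 * engagement + rnorm(n, sd = 0.8))))

# Pearson correlation with a confidence interval and a p-value
res <- cor.test(engagement, perf)
cat(sprintf("r = %.3f, 95%% CI [%.3f, %.3f], p = %.3g\n",
            res$estimate, res$conf.int[1], res$conf.int[2], res$p.value))
```

On the real data the equivalent call would be `cor.test(hrs$EngagementSurvey, hrs$PerfScoreID)`. Because PerfScoreID is an ordinal code, a rank-based variant such as `method = "spearman"` may be a better fit.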
We also find something unusual in Production and IT/IS. In Production, some employees were more engaged but performed worse, which is an issue that needs to be addressed. In IT/IS, some employees were late more often but performed better, a contradiction that should also be investigated and resolved.
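One way to follow up on these department-level anomalies is to compute the correlations separately for each department instead of across the whole company. The sketch below does this in base R on a synthetic data frame `toy`. The values are invented, and the "Sales" label is just a placeholder third department, so the output only illustrates the shape of the result.

```r
# Synthetic stand-in for hrs (all values are invented)
set.seed(7)
toy <- data.frame(
  Department       = rep(c("Production", "IT/IS", "Sales"), each = 30),
  EngagementSurvey = round(runif(90, 1, 5), 2),
  DaysLateLast30   = sample(0:6, 90, replace = TRUE),
  PerfScoreID      = sample(1:4, 90, replace = TRUE)
)

# Split by department and compute each correlation with PerfScoreID per group
by_dept <- do.call(rbind, lapply(split(toy, toy$Department), function(d) {
  data.frame(
    Department   = d$Department[1],
    n            = nrow(d),
    r_engagement = cor(d$EngagementSurvey, d$PerfScoreID),
    r_days_late  = cor(d$DaysLateLast30, d$PerfScoreID)
  )
}))
rownames(by_dept) <- NULL
print(by_dept)
```

A department whose sign differs from the company-wide coefficient would be a natural place to start a closer review.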