1 Human Resource Analytics

Out of curiosity, I took this dataset from a Medium article. This is my small effort to understand the HR side of analytics. Thank you to Nafee Afshin for such a beautiful blog, and to the R community, Kagglers, and coders who broke down the various elements of human resources and made them easy to understand.

We all know that every company has job openings every now and then; how often positions open up depends on the size of its workforce. It is not easy for either employer or employee to find the right working conditions, satisfaction level, and the many other things that make a perfect match. Beyond finding that match, both the employer and the employee have to invest time and money in training on the company's policies and working conditions, and if there is a new tool to learn, they have to invest the time to learn it. CRISP-DM stands for Cross-Industry Standard Process for Data Mining.

When we apply for a job or start working for a company, we should be familiar with the organisation's Human Resources department. In some big organisations, Human Resources (HR) departments are increasingly turning into analytics departments, with Business Intelligence people working for them so that hiring and workforce management run more smoothly and within budget. Like every other corner of the company (e.g. marketing, finance), HR departments need analytics solutions, which can be very specific to their domain and to the relational databases shared between departments.

We might think that HR departments usually have ‘small’ datasets compared with other departments. That might not be true, because HR is a crucial department for the growth of the company. HR analyses are driven more by interpretation than by predictive models.

General skills can be transferred from organisation to organisation, but how to apply them depends on the goal of the company and the projects everyone is involved in. It all depends on the motive behind the work. The goal of most companies is to point their manpower in the right direction so that the ship sails smoothly. There are always bumps along the way: someone with the right skills and talent leaves for a better future, for family, or for a better offer. It is definitely not favourable for the company to lose a hardworking employee with good ethics. Wouldn’t it be ideal for the company to invest time in finding out what makes employees leave? It could be the current manager, job satisfaction, the commuting distance, the working environment, and so on. Many factors need to be considered to understand what makes employees and employers happy, and it is the balance between them that plays the critical role. I am trying to explore some of this with this small dataset. We all love revenue turnover, whether we are employees or employers, but no one likes employee churn, or attrition. Let's dive in and see what we can find in this dataset.

Attrition: It is basically the turnover rate of employees inside an organization.

The goal of this project is to find out whether there are relationships between the variables and whether we can gain insight into employee churn over time. If possible, I would love to do some machine learning to predict churn.

Just to make things a little clearer: human analytics is not only about gathering data on employees and increasing efficiency. Its aim is also to return a better ROI for the company and to keep employees happy by fulfilling their needs and providing a happier workplace environment. If we can better understand the patterns and take the right steps, we will all benefit from each other.

So let's start diving into our dataset by importing the necessary libraries.

Library

2 DATA INSIGHTS:

Let's have a glimpse of our dataset.

## Observations: 1,200
## Variables: 28
## $ EmpNumber                    <fct> E1001000, E1001006, E1001007, E1001…
## $ Age                          <int> 32, 47, 40, 41, 60, 27, 50, 28, 36,…
## $ Gender                       <fct> Male, Male, Male, Male, Male, Male,…
## $ EducationBackground          <fct> Marketing, Marketing, Life Sciences…
## $ MaritalStatus                <fct> Single, Single, Married, Divorced, …
## $ EmpDepartment                <fct> Sales, Sales, Sales, Human Resource…
## $ EmpJobRole                   <fct> Sales Executive, Sales Executive, S…
## $ BusinessTravelFrequency      <fct> Travel_Rarely, Travel_Rarely, Trave…
## $ DistanceFromHome             <int> 10, 14, 5, 10, 16, 10, 8, 1, 8, 1, …
## $ EmpEducationLevel            <int> 3, 4, 4, 4, 4, 2, 4, 2, 3, 3, 3, 3,…
## $ EmpEnvironmentSatisfaction   <int> 4, 4, 4, 2, 1, 4, 4, 1, 1, 3, 1, 4,…
## $ EmpHourlyRate                <int> 55, 42, 48, 73, 84, 32, 54, 67, 63,…
## $ EmpJobInvolvement            <int> 3, 3, 2, 2, 3, 3, 3, 1, 4, 3, 1, 3,…
## $ EmpJobLevel                  <int> 2, 2, 3, 5, 2, 3, 1, 1, 3, 3, 1, 4,…
## $ EmpJobSatisfaction           <int> 4, 1, 1, 4, 1, 1, 2, 2, 1, 3, 3, 3,…
## $ NumCompaniesWorked           <int> 1, 2, 5, 3, 8, 1, 7, 7, 9, 4, 2, 9,…
## $ OverTime                     <fct> No, No, Yes, No, No, No, No, Yes, N…
## $ EmpLastSalaryHikePercent     <int> 12, 12, 21, 15, 14, 21, 15, 13, 14,…
## $ EmpRelationshipSatisfaction  <int> 4, 4, 3, 2, 4, 3, 4, 4, 1, 4, 3, 4,…
## $ TotalWorkExperienceInYears   <int> 10, 20, 20, 23, 10, 9, 4, 10, 10, 1…
## $ TrainingTimesLastYear        <int> 2, 2, 2, 2, 1, 4, 2, 4, 2, 4, 5, 2,…
## $ EmpWorkLifeBalance           <int> 2, 3, 3, 2, 3, 2, 3, 3, 3, 4, 3, 2,…
## $ ExperienceYearsAtThisCompany <int> 10, 7, 18, 21, 2, 9, 2, 7, 8, 1, 5,…
## $ ExperienceYearsInCurrentRole <int> 7, 7, 13, 6, 2, 7, 2, 7, 7, 0, 2, 2…
## $ YearsSinceLastPromotion      <int> 0, 1, 1, 12, 2, 1, 2, 3, 0, 0, 1, 1…
## $ YearsWithCurrManager         <int> 8, 7, 12, 6, 2, 7, 2, 7, 5, 0, 4, 1…
## $ Attrition                    <fct> No, No, No, No, No, No, No, Yes, No…
## $ PerformanceRating            <int> 3, 3, 4, 3, 3, 4, 3, 3, 3, 3, 3, 3,…

3 Exploratory Data Analysis

## [1] "Missing Data"

## $data.frame
##    name   size
## 1 hr_df 0.2 Mb
## 
## $dimensions
##   rows columns
## 1 1200      28
## 
## $column.details
##                          column   class unique.values missing.count
## 1                     EmpNumber  factor          1200             0
## 2                 EmpHourlyRate integer            71             0
## 3                           Age integer            43             0
## 4    TotalWorkExperienceInYears integer            40             0
## 5  ExperienceYearsAtThisCompany integer            37             0
## 6              DistanceFromHome integer            29             0
## 7                    EmpJobRole  factor            19             0
## 8  ExperienceYearsInCurrentRole integer            19             0
## 9          YearsWithCurrManager integer            18             0
## 10      YearsSinceLastPromotion integer            16             0
## 11     EmpLastSalaryHikePercent integer            15             0
## 12           NumCompaniesWorked integer            10             0
## 13        TrainingTimesLastYear integer             7             0
## 14          EducationBackground  factor             6             0
## 15                EmpDepartment  factor             6             0
## 16            EmpEducationLevel integer             5             0
## 17                  EmpJobLevel integer             5             0
## 18   EmpEnvironmentSatisfaction integer             4             0
## 19            EmpJobInvolvement integer             4             0
## 20           EmpJobSatisfaction integer             4             0
## 21  EmpRelationshipSatisfaction integer             4             0
## 22           EmpWorkLifeBalance integer             4             0
## 23                MaritalStatus  factor             3             0
## 24      BusinessTravelFrequency  factor             3             0
## 25            PerformanceRating integer             3             0
## 26                       Gender  factor             2             0
## 27                     OverTime  factor             2             0
## 28                    Attrition  factor             2             0
##    missing.pct
## 1            0
## 2            0
## 3            0
## 4            0
## 5            0
## 6            0
## 7            0
## 8            0
## 9            0
## 10           0
## 11           0
## 12           0
## 13           0
## 14           0
## 15           0
## 16           0
## 17           0
## 18           0
## 19           0
## 20           0
## 21           0
## 22           0
## 23           0
## 24           0
## 25           0
## 26           0
## 27           0
## 28           0

We have exactly 1,200 rows of data with 28 variables. The total size of our dataset is 0.2 MB.

Sanity Check:

Look for any missing rows: the graph shows none. Aww, that's so sweet, we don’t have any missing data at all. It seems we have a clean dataset, which rarely happens in data analytics and especially in people analytics, where a lot of the information is personal and, for privacy and confidentiality reasons, much of the data is not made public.
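The per-column missing counts and percentages in the summary above can be reproduced with a few lines of base R. This is only a minimal sketch: `toy_df` and its values are made up for illustration (one `NA` is planted on purpose), and it is not necessarily how the summary above was produced.

```r
# Toy stand-in for hr_df; the values are invented for illustration.
toy_df <- data.frame(
  EmpNumber = c("E1", "E2", "E3"),
  Age       = c(32L, NA, 40L),
  Attrition = c("No", "Yes", "No"),
  stringsAsFactors = TRUE
)

# Count and percentage of missing values in every column.
missing_count <- colSums(is.na(toy_df))
missing_pct   <- round(100 * missing_count / nrow(toy_df), 1)
print(data.frame(missing_count, missing_pct))

# TRUE only when no cell in the data frame is NA.
no_missing <- !any(is.na(toy_df))
print(no_missing)
```

On the real `hr_df`, every count would be 0 and `no_missing` would be `TRUE`, matching the table above.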

3.1 Head

We will first analyse most of the individual variables, and then try to link dependent and independent variables for both machine learning and exploratory data analysis.

Let's start with exploratory data analysis by looking at the top of the table to get a sense of the dataset.

EmpNumber	Age	Gender	EducationBackground	MaritalStatus	EmpDepartment	EmpJobRole	BusinessTravelFrequency	DistanceFromHome	EmpEducationLevel	EmpEnvironmentSatisfaction	EmpHourlyRate	EmpJobInvolvement	EmpJobLevel	EmpJobSatisfaction	NumCompaniesWorked	OverTime	EmpLastSalaryHikePercent	EmpRelationshipSatisfaction	TotalWorkExperienceInYears	TrainingTimesLastYear	EmpWorkLifeBalance	ExperienceYearsAtThisCompany	ExperienceYearsInCurrentRole	YearsSinceLastPromotion	YearsWithCurrManager	Attrition	PerformanceRating
E1001000	32	Male	Marketing	Single	Sales	Sales Executive	Travel_Rarely	10	3	4	55	3	2	4	1	No	12	4	10	2	2	10	7	0	8	No	3
E1001006	47	Male	Marketing	Single	Sales	Sales Executive	Travel_Rarely	14	4	4	42	3	2	1	2	No	12	4	20	2	3	7	7	1	7	No	3
E1001007	40	Male	Life Sciences	Married	Sales	Sales Executive	Travel_Frequently	5	4	4	48	2	3	1	5	Yes	21	3	20	2	3	18	13	1	12	No	4
E1001009	41	Male	Human Resources	Divorced	Human Resources	Manager	Travel_Rarely	10	4	2	73	2	5	4	3	No	15	2	23	2	2	21	6	12	6	No	3
E1001010	60	Male	Marketing	Single	Sales	Sales Executive	Travel_Rarely	16	4	1	84	3	2	1	8	No	14	4	10	1	3	2	2	2	2	No	3

3.2 Sample

Looking at the top of the table usually gives a sense of what kind of dataset we are dealing with and what the columns and values represent. Going one step further, I usually like to sample the dataset randomly to find something new. It doesn't take long, and it saves a lot of time if I catch something fishy at the beginning of the analysis.

If I have a small set of data, I usually use sample_n to pull a few rows and get a sense of them. We could also use sample_frac to see a random fraction of the data.
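As a rough sketch of the random sampling just described, the base-R lines below draw a fixed number of rows and a fraction of rows. `toy_df` and its values are assumptions for illustration; the comments name the dplyr calls mentioned in the text that do the same job.

```r
set.seed(42)  # fix the seed so the draw is reproducible

# Toy stand-in for hr_df; the values are invented for illustration.
toy_df <- data.frame(
  EmpNumber = paste0("E", 1:10),
  Age       = c(32, 47, 40, 41, 60, 27, 50, 28, 36, 38)
)

# Two random rows; dplyr equivalent: dplyr::sample_n(toy_df, 2)
two_rows <- toy_df[sample(nrow(toy_df), 2), ]
print(two_rows)

# A random 30% of the rows; dplyr equivalent: dplyr::sample_frac(toy_df, 0.3)
n_frac    <- round(0.3 * nrow(toy_df))
frac_rows <- toy_df[sample(nrow(toy_df), n_frac), ]
print(frac_rows)
```

The row names in the printed result are the original row positions, which is why the sampled table in this section shows numbers such as 154 and 408 in front of each row.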

	EmpNumber	Age	Gender	EducationBackground	MaritalStatus	EmpDepartment	EmpJobRole	BusinessTravelFrequency	DistanceFromHome	EmpEducationLevel	EmpEnvironmentSatisfaction	EmpHourlyRate	EmpJobInvolvement	EmpJobLevel	EmpJobSatisfaction	NumCompaniesWorked	OverTime	EmpLastSalaryHikePercent	EmpRelationshipSatisfaction	TotalWorkExperienceInYears	TrainingTimesLastYear	EmpWorkLifeBalance	ExperienceYearsAtThisCompany	ExperienceYearsInCurrentRole	YearsSinceLastPromotion	YearsWithCurrManager	Attrition	PerformanceRating
154	E1001281	47	Female	Life Sciences	Single	Sales	Sales Executive	Travel_Rarely	4	2	4	83	3	2	4	1	Yes	17	3	9	0	3	9	0	0	7	No	3
408	E1001737	31	Female	Life Sciences	Single	Research & Development	Manager R&D	Travel_Frequently	1	5	3	100	4	3	2	1	No	11	1	10	2	3	10	8	4	7	Yes	3

3.3 Data Lookup

At first glance, after running a random sample, the whole dataset looks normal. Let's see how many unique values each column has so that we can get more of the gist of it.

To get a better sense of the data, I usually run unique on each column to learn which fields are interesting enough to investigate. If a column has only a handful of unique values, it is easy to look into without wasting a lot of time.

## [1] Male   Female
## Levels: Female Male

## [1] Single   Married  Divorced
## Levels: Divorced Married Single
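A minimal sketch of that lookup in base R, assuming a toy data frame (`toy_df`, with invented values) in place of `hr_df`:

```r
# Toy stand-in for hr_df; the values are invented for illustration.
toy_df <- data.frame(
  Gender        = factor(c("Male", "Male", "Female", "Male")),
  MaritalStatus = factor(c("Single", "Married", "Divorced", "Single")),
  Age           = c(32L, 47L, 40L, 41L)
)

# Number of distinct values in every column.
n_unique <- sapply(toy_df, function(col) length(unique(col)))
print(n_unique)

# Distinct values of one column, in order of first appearance.
print(unique(toy_df$Gender))
```

Columns with only a handful of distinct values (here `Gender` and `MaritalStatus`) are the ones worth listing in full.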

3.4 Duplicate data

## No duplicate combinations found of: EmpNumber

## [1] 1200   28
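The same kind of duplicate check can be sketched in base R. `toy_df` is a made-up stand-in for `hr_df`, and this is not necessarily the call that produced the message above.

```r
# Toy stand-in for hr_df; Age values are invented for illustration.
toy_df <- data.frame(
  EmpNumber = c("E1001000", "E1001006", "E1001007"),
  Age       = c(32L, 47L, 40L)
)

# How many employee IDs repeat an earlier ID?
n_dup_ids <- sum(duplicated(toy_df$EmpNumber))

# How many rows are exact copies of an earlier row?
n_dup_rows <- sum(duplicated(toy_df))

# Keep the first occurrence of each ID and confirm the shape is unchanged.
deduped <- toy_df[!duplicated(toy_df$EmpNumber), ]
print(c(n_dup_ids = n_dup_ids, n_dup_rows = n_dup_rows))
print(dim(deduped))
```

If the dimensions after de-duplication equal the original dimensions, as with the 1200 x 28 result above, there was nothing to remove.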

3.5 Range

To drill down into each column, we can look at its minimum and maximum values, which give a sense of how spread out the data is. Sometimes this catches outliers easily without having to draw a boxplot every time.

EmpNumber	E1001000	E100998
Age	18	60
Gender	Female	Male
EducationBackground	Human Resources	Technical Degree
MaritalStatus	Divorced	Single
EmpDepartment	Data Science	Sales
EmpJobRole	Business Analyst	Technical Lead
BusinessTravelFrequency	Non-Travel	Travel_Rarely
DistanceFromHome	1	29
EmpEducationLevel	1	5
EmpEnvironmentSatisfaction	1	4
EmpHourlyRate	30	100
EmpJobInvolvement	1	4
EmpJobLevel	1	5
EmpJobSatisfaction	1	4
NumCompaniesWorked	0	9
OverTime	No	Yes
EmpLastSalaryHikePercent	11	25
EmpRelationshipSatisfaction	1	4
TotalWorkExperienceInYears	0	40
TrainingTimesLastYear	0	6
EmpWorkLifeBalance	1	4
ExperienceYearsAtThisCompany	0	40
ExperienceYearsInCurrentRole	0	18
YearsSinceLastPromotion	0	15
YearsWithCurrManager	0	17
Attrition	No	Yes
PerformanceRating	2	4
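A table like the one above can be sketched in base R as follows. `toy_df` and its values are assumptions for illustration; factor columns are converted to character first, so their minimum and maximum are simply the alphabetically first and last levels.

```r
# Toy stand-in for hr_df; the values are invented for illustration.
toy_df <- data.frame(
  Age           = c(32L, 47L, 18L, 60L),
  Gender        = factor(c("Male", "Male", "Female", "Male")),
  EmpHourlyRate = c(55L, 42L, 30L, 100L)
)

# One row per column, holding its minimum and maximum as text.
col_range <- t(sapply(toy_df, function(col) {
  values <- if (is.factor(col)) as.character(col) else col
  c(min = as.character(min(values)), max = as.character(max(values)))
}))
print(col_range)
```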

3.6 Recode

Let's recode some of our attributes for quicker analysis and a better fit in our graphical representations.

EmpDepartment
Education Level
Age_bin
Generation
Hike Percentage
Years With manager
Emp HourlyRate
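Binning of this kind can be sketched with base R's `cut()`. The bin labels are the ones that appear in the tables of this section, but the break points are an assumption: the age bins are taken as right-closed (so age 20 falls in `0_20`), which the source does not confirm, while the hourly bins are left-closed, which matches rows shown later (for example a rate of 80 labelled `80_89`). `toy_df` is a made-up stand-in for `hr_df`.

```r
# Toy stand-in for hr_df; the values are invented for illustration.
toy_df <- data.frame(
  Age           = c(18, 32, 47, 60),
  EmpHourlyRate = c(80, 55, 41, 100)
)

# Assumed right-closed decade bins: (0,20], (20,30], ..., (50,60].
age_breaks <- c(0, 20, 30, 40, 50, 60)
age_labels <- c("0_20", "20_30", "30_40", "40_50", "50_60")
toy_df$Age_bin <- cut(toy_df$Age, breaks = age_breaks,
                      labels = age_labels, right = TRUE)

# Left-closed hourly bins: [30,40), [40,50), ..., [90,101).
rate_breaks <- c(seq(30, 90, by = 10), 101)
rate_labels <- c("30_39", "40_49", "50_59", "60_69",
                 "70_79", "80_89", "90_100")
toy_df$hourly_bin <- cut(toy_df$EmpHourlyRate, breaks = rate_breaks,
                         labels = rate_labels, right = FALSE)

print(toy_df)

# Share of rows per bin, comparable to the Freq columns in this section.
print(prop.table(table(toy_df$Age_bin)))
```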

This is the Employee Department
Var1	Freq
DS	20
Development	361
Finance	49
HR	54
R&D	343
Sales	373

This is the Education Level
Var1	Freq
Bachelors Degree	0.3741667
College Degree	0.1991667
Masters Degree	0.2683333
No College Degree	0.1233333
Phd	0.0350000

This is the Age_bin distribution
Var1	Freq
0_20	0.0183333
20_30	0.2391667
30_40	0.4250000
40_50	0.2241667
50_60	0.0933333

This is the Generation distribution
Var1	Freq
Baby Boomers	52
Gen Xers	406
Millenials Gen Y.1	216
Millenials Gen Y.2	526

This is the Salary Hike Percentage distribution
Var1	Freq
11-13	324
14-16	422
17-20	271
21-23	131
24-26	52

This is the relationship with Manager in years distribution
Var1	Freq
Pro-Veteran	57
1-2 years Old	348
2-5 Years hired	214
5-10 Years Vet	366
Recently Hired Newbie	215

This is the Hourly Rate distribution
Var1	Freq
30_39	135
40_49	183
50_59	167
60_69	166
70_79	184
80_89	168
90_100	197

4 Gender

Let's start digging in with the gender distribution.

Employee Distribution Count by Gender
How are men and women distributed across the departments?

EmpDepartment	m_count	pct_m	f_count	pct_f
Development	219	30	142	30
Sales	216	30	157	33
R&D	214	30	129	27
HR	37	5	17	4
Finance	27	4	22	5
DS	12	2	8	2

A

The workforce is 60% male and 40% female.

B

The chart above shows how the female and male populations are distributed among the departments. Men are spread equally across the top three departments of the organisation, whereas the female distribution deviates slightly. Of all the departments, Sales has the largest share of women.
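The percentages in the table above appear to be each department's share of all men and of all women respectively. A short sketch that recomputes them from the counts in that table (the variable names are my own):

```r
# Counts copied from the department-by-gender table above.
gender_counts <- data.frame(
  EmpDepartment = c("Development", "Sales", "R&D", "HR", "Finance", "DS"),
  m_count       = c(219, 216, 214, 37, 27, 12),
  f_count       = c(142, 157, 129, 17, 22, 8)
)

# Each department's share of all men and of all women, in percent.
gender_counts$pct_m <- round(100 * gender_counts$m_count / sum(gender_counts$m_count))
gender_counts$pct_f <- round(100 * gender_counts$f_count / sum(gender_counts$f_count))
print(gender_counts)

# Overall share of men in the workforce, in percent.
total_m <- sum(gender_counts$m_count)
total_f <- sum(gender_counts$f_count)
overall_male_pct <- round(100 * total_m / (total_m + total_f), 1)
print(overall_male_pct)
```

The recomputed columns agree with `pct_m` and `pct_f` in the table, and the overall split comes out at roughly 60% male and 40% female.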

5 EmpDepartment

1. Waterfall Chart
2. Pie Chart
3. Donut Chart

Employees distributed across the organisation

The three biggest departments are Sales, Development, and R&D. The distribution is:

Sales : 31.1%
Development : 30.1%
R&D : 28.6%
HR : 4.5%
Finance : 4.08%
DS : 1.67%

5.1 EmpDepartment & Gender

A. Workforce breakdown among different departments
B. Workforce breakdown among different departments by gender
C. EmpDepartment and Educational_Background
D. EmpDepartment and Educational_Levels

A

Employees in the Sales department tend to leave more than those in the other departments.
HR comes second.

B

Most departments are split roughly 60% male and 40% female, except the Human Resources department, which is about 70% male and 30% female.

C

Life Sciences is the dominant educational background in every department except Human Resources, where the dominant background is Human Resources, and Sales, where it is Marketing.

D

In every department the most common education level is a Bachelor's degree,
except Data Science, where it is a College degree (35%).
Finance: compared with all other departments, Finance stands out with 10% of employees holding a PhD.
R&D: R&D has more employees with a Master's degree than the other departments.
HR and Finance: 43% of employees in HR and in Finance have a Bachelor's degree.
HR: HR has fewer employees with no college degree than the other departments.

6 Employee Job Role

A

We can see how the hourly rate and job role affect the attrition rate.

B,C,D

Sales Executives and Developers are the two pillars of the whole organisation. All three charts make it clear that there are more men than women in both roles.
The charts also show attrition by job role and hourly rate. We will dig into that analysis further down.

E

Except for Technical Architect and Senior Manager R&D, employees in every job role can make $90-$100 an hour (pink label).
22% of Manager R&D employees make $90-$100 an hour.
Technical Architect has the fewest hourly-rate bands (only 4).
Technical Architects are distributed as follows:
$30-$39 : 29%
$40-$49 : 29%
$50-$59 : 14%
$70-$79 : 29%
The Senior Manager R&D pay scale lies mostly in $60-$69, at about 33%.
Most Business Analysts, about 31%, make $50-$59 an hour.
The Research Director hourly rate starts at $40-$49.
Only 10% of Data Scientists make $90-$100 an hour.

F & G

The Business Analyst and Healthcare Representative job roles tend to go to younger female candidates.
Research Director roles before age 40 go mostly to women rather than men.
Technical Lead, Senior Manager R&D, Senior Developer, and Manufacturing Director have more senior women than senior men.
Delivery Manager roles before age 40 go to men rather than women.

H

Regardless of job role, most attrition happens at an average age of around 30-35.
Technical Lead, Sales Executive, and Healthcare Representative have the same average age of attrition.
Manufacturing Director, Manager, and Business Analyst also share the same average age of attrition.

I

We can spot the difference in average pay between employees who tend to leave the company and those who stay.
For some jobs the averages are pretty close, whereas for others there is a huge difference.

J

PhD: Not every job role requires a PhD, so the education background is limited according to job role.

Master's Degree: Senior Manager R&D has more Master's degrees.

Bachelor's Degree: Almost 53% of Research Directors have a Bachelor's degree.

College Degree: Most Data Scientists, almost 35%, have only a College degree.

No College Degree: 28% of Sales Representatives have no college degree.

K

In some job roles women tend to make more money than men.
In some roles the difference is minimal and in others it is huge, which could be due to education level or to years of experience.

7 Education level

Education level by Gender
Education level & Attrition.
Education level & Gender.
Education level, Attrition and Department

A

Most of our workforce has a Bachelor's degree (37%), followed by a Master's degree (27%). As we would expect, PhDs are rare, at about 4%.

B

When we break it down by gender, the PhD holders are evenly split: 42 employees have a PhD, with an equal number of men and women.

C

Employees with no college degree seem to leave the company more than those at other education levels, either to pursue a higher degree or because of a lack of growth inside the organisation.

D

Education level doesn’t play any role in how much money you make; pay is independent of education level.

E, F & G

PhD:
7% of PhD holders are aged 20-30.
45% are aged 30-40.
52% are married.
Master's Degree:
10% of Master's degree holders are aged 20-30.
56% are aged 30-40.
45% are married.
Bachelor's Degree:
3% of Bachelor's degree holders are under age 20.
38% are aged 30-40.
45% are married.
College Degree:
2% of College degree holders are under age 20.
40% are aged 30-40.
26% are married.
No College Degree:
4% of employees with no college degree are under age 20.
4% are aged 50-60.
48% are aged 20-30.
49% are married.
73% are Millennials.
3% are Baby Boomers.
1% are Silent Gen.

H

PhD:
33% attrition in the Human Resources department.
0% attrition in Finance.
Master's Degree:
25% attrition in Data Science.
9% attrition in Finance.
Bachelor's Degree:
19% attrition in the Sales department.
5% attrition in the Finance department.
College Degree:
22% attrition in the Sales department.
0% attrition in Data Science.
No College Degree:
50% attrition in the Data Science department.
0% attrition in Human Resources.

8 Age

Let's look at the age distribution in our dataset.

Density Age Distribution
Barplot Age Distribution
Mean Age Distribution
Percentage of Male & Female Distribution

The highest age of an employee in the organisation is 60.

This is the Maximum Age_bin distribution
	EmpNumber	Age	Gender	EducationBackground	MaritalStatus	EmpDepartment	EmpJobRole	BusinessTravelFrequency	DistanceFromHome	EmpEducationLevel	EmpEnvironmentSatisfaction	EmpHourlyRate	EmpJobInvolvement	EmpJobLevel	EmpJobSatisfaction	NumCompaniesWorked	OverTime	EmpLastSalaryHikePercent	EmpRelationshipSatisfaction	TotalWorkExperienceInYears	TrainingTimesLastYear	EmpWorkLifeBalance	ExperienceYearsAtThisCompany	ExperienceYearsInCurrentRole	YearsSinceLastPromotion	YearsWithCurrManager	Attrition	PerformanceRating	Educational_Levels	Age_bin	Generation	Hike_Pct	CatYearsManager	hourly_bin
5	E1001010	60	Male	Marketing	Single	Sales	Sales Executive	Travel_Rarely	16	4	1	84	3	2	1	8	No	14	4	10	1	3	2	2	2	2	No	3	Masters Degree	50_60	Baby Boomers	14-16	1-2 years Old	80_89
549	E1001975	60	Male	Medical	Divorced	R&D	Healthcare Representative	Travel_Rarely	1	4	3	92	1	3	4	3	No	20	3	19	2	4	1	0	0	0	No	4	Masters Degree	50_60	Baby Boomers	21-23	Recently Hired Newbie	90_100
1105	E100827	60	Female	Life Sciences	Married	Development	Developer	Travel_Rarely	7	3	1	41	3	5	1	5	No	11	4	33	5	1	29	8	11	10	No	3	Bachelors Degree	50_60	Baby Boomers	11-13	5-10 Years Vet	40_49

This is the Minimum Age_bin distribution
	EmpNumber	Age	Gender	EducationBackground	MaritalStatus	EmpDepartment	EmpJobRole	BusinessTravelFrequency	DistanceFromHome	EmpEducationLevel	EmpEnvironmentSatisfaction	EmpHourlyRate	EmpJobInvolvement	EmpJobLevel	EmpJobSatisfaction	NumCompaniesWorked	OverTime	EmpLastSalaryHikePercent	EmpRelationshipSatisfaction	TrainingTimesLastYear	EmpWorkLifeBalance	Attrition	PerformanceRating	Educational_Levels	Age_bin	Generation	Hike_Pct	CatYearsManager	hourly_bin
160	E1001290	18	Male	Life Sciences	Single	R&D	Research Scientist	Non-Travel	5	2	2	73	3	1	4	1	No	15	4	2	3	No	3	College Degree	0_20	Millenials Gen Y.1	14-16	Recently Hired Newbie	70_79
243	E1001434	18	Male	Medical	Single	R&D	Laboratory Technician	Non-Travel	8	1	3	80	3	1	3	1	No	12	4	0	3	Yes	3	No College Degree	0_20	Millenials Gen Y.1	11-13	Recently Hired Newbie	80_89
358	E1001646	18	Female	Life Sciences	Single	R&D	Manager R&D	Non-Travel	1	3	4	97	3	1	4	1	No	15	3	5	4	No	4	Bachelors Degree	0_20	Millenials Gen Y.1	14-16	Recently Hired Newbie	90_100
505	E1001902	18	Female	Medical	Single	Sales	Sales Representative	Travel_Frequently	3	2	2	70	3	1	4	1	Yes	12	3	2	4	Yes	3	College Degree	0_20	Millenials Gen Y.1	11-13	Recently Hired Newbie	70_79
625	E1002117	18	Female	Medical	Single	R&D	Manager R&D	Non-Travel	14	3	2	33	3	1	3	1	No	16	3	4	1	No	3	Bachelors Degree	0_20	Millenials Gen Y.1	17-20	Recently Hired Newbie	30_39
1012	E100683	18	Male	Life Sciences	Single	Development	Developer	Travel_Rarely	3	3	3	54	3	1	3	1	No	13	3	2	3	Yes	3	Bachelors Degree	0_20	Millenials Gen Y.1	14-16	Recently Hired Newbie	50_59
1017	E100689	18	Female	Medical	Single	Sales	Sales Representative	Travel_Rarely	10	3	4	69	2	1	3	1	No	12	1	2	3	No	3	Bachelors Degree	0_20	Millenials Gen Y.1	11-13	Recently Hired Newbie	60_69
1145	E100892	18	Male	Marketing	Single	Sales	Sales Representative	Travel_Frequently	5	3	2	69	3	1	2	1	Yes	14	4	3	3	Yes	3	Bachelors Degree	0_20	Millenials Gen Y.1	14-16	Recently Hired Newbie	60_69

Oldest Employee

3 Employees
We have a Sales Executive who is 60 (retirement age), still single, and makes $84 an hour.

Youngest Employee

8 Newbies
We have an 18-year-old woman in the R&D department who is a manager and makes $97 an hour.

Age is approximately normally distributed.
Our dataset is 39.6% female and 60.4% male.
The overall average age in our dataset is 36.92.
The average age of the male population is 36.62.
The average age of the female population is 37.36.

9 Age_bin

The Age column spreads from a minimum of 18 to a maximum of 60. Let's bin the ages into decades. Doing so reduces noise and the influence of outliers, and lets us focus on insights without being biased.

Age distribution by Gender
Age distribution by Department

A, B & C

When we break the dataset down by age bin, it looks balanced between men and women across all age groups.

D

In Finance we can see a difference between the male and female age groups.
We can see male outliers in Sales and female outliers in the Development department.
In the Data Science department we can also see some difference between the male and female age groups.
There is less difference between male and female ages in the R&D and HR departments.

E

The peaks of a density plot show where values are concentrated over the interval.
A density plot can also illustrate how a distribution such as age changes over time.
In Sales we can see a lot of young people below the mean age, followed by Development and then R&D.

F

We can see the age and attrition of the employees. Age looks normally distributed both for those who leave the company and for those who stay.

G

Under age 20, the Sales department is 60% female compared with just 40% male.
Under age 20, the Development department is 80% male compared with 20% female.
Every other age group is split roughly 60-40.

H

Under age 20, attrition is 80% in Development and 70% in Sales.
Between ages 40 and 50, attrition is 4% in Development and 5% in R&D.

10 Age | Department | Gender | Job Role

A

0-20:
In Finance and Data Science there are no employees under age 20.
In Human Resources, 8% are male employees.

20-30:
34% female in Sales and 32% male in R&D.

30-40:
33% female in Development and 33% male in Sales.

40-50:
40% female in Sales and 31% male in Development.

50-60:
41% female in R&D and 32% male in Sales.

11 Generation

We can also classify all the employees by age into generations. The ages below are as of 2019.

Baby Boomers: Baby Boomers were born between 1944 and 1964. They are currently between 55 and 75 years old (76 million in the U.S.).
Gen X: Gen X was born between 1965 and 1979 and is currently between 40 and 54 years old (82 million people in the U.S.).
Gen Y: Gen Y, or Millennials, were born between 1980 and 1994. They are currently between 25 and 39 years old.
Gen Y.1 = 25-29 years old (31 million people in the U.S.)
Gen Y.2 = 29-39 years old (42 million people in the U.S.)
Gen Z: Gen Z is the newest generation to be named and was born between 1995 and 2015. They are currently between 4 and 24 years old (nearly 74 million in the U.S.).

As we don’t have many Gen Z employees in our dataset, we will combine Gen Z with Gen Y.1 (the 18-year-olds in the tables above are labelled Millenials Gen Y.1). Otherwise we would have barplots with only a couple of dots.

A

We don’t have much data for Millenials Gen Y.1.
We can see some outliers in Millenials Gen Y.2 among both men and women.
We can also spot outliers among female Baby Boomers.
Male Baby Boomers have more experience than female Baby Boomers.

B

There are more outliers in Millenials Gen Y.1, both among those leaving the company and among those staying.
As the number of companies previously worked at increases, employees become more likely to leave than to stay.

C

We can see the proportion of male and female employees by generation.
There are more men than women among Millenials Gen Y.1 and Baby Boomers.

D

Millenials Gen Y.1 (age 25-29) tend to leave the company more than other generations.

12 Marital Status

A, B, C

Single employees tend to leave more than married or divorced employees, at roughly 24%.
The ratio of men to women within each marital status is almost 60:40 (M:F).
Among employees who leave, the ratio is 60:40 (M:F) for single and married employees, whereas for divorced employees it is almost 80:20 (M:F).
Among employees who stay, the ratio is almost 60:40 (M:F) regardless of marital status.

D

There is no attrition in the Finance department among divorced employees.
There is no attrition in the Data Science department among married employees.
Attrition is high among single employees in the Sales and Data Science departments.

13 Experience Years In CurrentRole & Attrition

A

As you gain more experience in your current role, you are less likely to leave.
With 10 years of experience you are unlikely to leave unless there is a major change. There is one outlier at 14 years of experience in the current role.
The percentage of attrition keeps decreasing as employees gain more experience in their current role.

B

Beyond 10 years of experience in the current role there are fewer men than women, with one exception at 18 years of experience.
Women tend to be more stable as we move up in experience, except at years 9 and 10, where the ratio changes from 60:40 (M:F) to 70:30 (M:F); further up, the ratio is about 30:70 (M:F).

14 Overtime

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  1200 
## 
##  
##                 | hr_df$OverTime 
## hr_df$Attrition |        No |       Yes | Row Total | 
## ----------------|-----------|-----------|-----------|
##              No |       765 |       257 |      1022 | 
##                 |     2.640 |     6.334 |           | 
##                 |     0.749 |     0.251 |     0.852 | 
##                 |     0.903 |     0.728 |           | 
##                 |     0.637 |     0.214 |           | 
## ----------------|-----------|-----------|-----------|
##             Yes |        82 |        96 |       178 | 
##                 |    15.157 |    36.368 |           | 
##                 |     0.461 |     0.539 |     0.148 | 
##                 |     0.097 |     0.272 |           | 
##                 |     0.068 |     0.080 |           | 
## ----------------|-----------|-----------|-----------|
##    Column Total |       847 |       353 |      1200 | 
##                 |     0.706 |     0.294 |           | 
## ----------------|-----------|-----------|-----------|
## 
##

A

Overtime is distributed almost equally by gender.
28% of men and 32% of women work overtime, so women do slightly more overtime than men.

B

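54% of the people who left the company had been working overtime: of the 178 employees who left, 96 worked overtime and 82 did not. Looked at the other way round, 27% of employees who work overtime left (96 of 353), compared with about 10% of those who do not (82 of 847).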
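The cross-table above can be read two ways: as the share of leavers who worked overtime, or as the share of overtime workers who left. Both can be recomputed from the four cell counts printed in that table. The sketch below uses only base R; the variable names are my own, and it is not necessarily the call that produced the output above.

```r
# Cell counts copied from the Attrition x OverTime cross-table above.
overtime_tab <- matrix(
  c(765, 82, 257, 96), nrow = 2,
  dimnames = list(Attrition = c("No", "Yes"), OverTime = c("No", "Yes"))
)

# Row shares: of those who stayed or left, how many worked overtime?
row_share <- prop.table(overtime_tab, margin = 1)

# Column shares: of those with or without overtime, how many left?
col_share <- prop.table(overtime_tab, margin = 2)

print(round(row_share, 3))
print(round(col_share, 3))

# Pearson chi-square statistic without continuity correction; it equals
# the sum of the four "Chi-square contribution" cells in the table above.
chi_sq <- unname(chisq.test(overtime_tab, correct = FALSE)$statistic)
print(chi_sq)
```

The row share for leavers who worked overtime is the 0.539 in the table, while the column share of overtime workers who left is the 0.272, against 0.097 for those without overtime.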

C

32% of female employees work overtime compared with 28% of male employees.

D

When we group by OverTime, Gender, and then Attrition, we see that:
among employees who work overtime, 31% of men leave the company compared with 23% of women.

E

Employees in Data Science seem to do more overtime than those in the other departments.

F

The Sales and Data Science departments have higher attrition among employees who work overtime than the other departments.

G

Female Laboratory Technicians who work overtime leave 100% of the time, whereas for male Laboratory Technicians the picture is completely the opposite.
Female Sales Representatives tend to leave 67% of the time.
No female Data Scientist or female Technical Architect has worked overtime.
No male Business Analyst has worked overtime.
67% of male Finance Managers seem to leave the company if they work overtime.

H

Male Technical Architects are the only group that seems to leave the company even without working overtime.

The rest are split more or less evenly.

15 Employee Environment Satisfaction

Employee Environment Satisfaction vs. Attrition.
Employee Environment Satisfaction vs. Attrition vs. EmpDepartment.
Employee Environment Satisfaction vs. Job Role vs. Attrition.
Age vs. Job Role vs. Attrition.
Hourly Rate vs. Job Role vs. Attrition.

A

Roughly 20% of employees in the top 3 departments have a low satisfaction score.
In HR, 41% of employees have a satisfaction score of 3, and only 20% have the full score of 4.

B

Employees who are less satisfied with their work environment are more prone to leave the company than to stay.

C

In Finance, employees seem to leave even when their Environment Satisfaction is high, so satisfaction level appears to play no role in whether Finance employees stay or leave.
In Sales, Development, R&D, and HR, we can see an effect of Environment Satisfaction on attrition.

D

Employees with a satisfaction score of around 2.5 or less have all left the company.
The one exception is a Finance Manager with a satisfaction score of 4 who left the company.

E

25% of employees whose satisfaction score is 1 have left the company.

16 Employee Relationship Satisfaction

A

Only the Data Science department drifts below an average Employee Relationship Satisfaction score of 2, compared with all other departments.

B

When we break it down by Gender and Attrition, employees in the Data Science department have not left the company even though they score lower on Employee Relationship Satisfaction. Males show lower Employee Relationship Satisfaction scores than females.
In Human Resources, males with a high Employee Relationship Satisfaction score seem to leave more than females.
In Finance, males with low Employee Relationship Satisfaction definitely leave the company.

17 Years with Manager

##        CatYearsManager  Pro-Veteran 1-2 years Old 2-5 Years hired 5-10 Years Vet Recently Hired Newbie
## Gender                                                                                                
## Female                         2.25         10.92            7.08          12.42                  6.92
## Male                           2.50         18.08           10.75          18.08                 11.00

##                              Attrition    No   Yes
## Gender CatYearsManager                            
## Female  Pro-Veteran                     2.17  0.08
##        1-2 years Old                    9.58  1.33
##        2-5 Years hired                  6.58  0.50
##        5-10 Years Vet                  10.92  1.50
##        Recently Hired Newbie            5.08  1.83
## Male    Pro-Veteran                     2.42  0.08
##        1-2 years Old                   15.25  2.83
##        2-5 Years hired                  9.42  1.33
##        5-10 Years Vet                  16.25  1.83
##        Recently Hired Newbie            7.50  3.50

##                  EmpDepartment    DS Development Finance    HR   R&D Sales
## Gender Attrition                                                          
## Female No                       0.58       10.83    1.83  1.08  9.42 10.58
##        Yes                      0.08        1.00    0.00  0.33  1.33  2.50
## Male   No                       0.92       15.00    2.00  2.75 15.50 14.67
##        Yes                      0.08        3.25    0.25  0.33  2.33  3.33

A

Employees who have worked with their current manager for a long time tend to stay. As the number of years with the current manager increases, attrition decreases.
On average, employees working with their current managers have a satisfaction score of 2.5.
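
The link between years with the current manager and attrition can be checked against the Gender by CatYearsManager by Attrition table in this section. The sketch below is illustrative only and is not the notebook's own code. It assumes the table values are percentages of all employees (they sum to roughly 100) and that the attrition rate of a group is left / (stayed + left). The names `shares`, `attrition_rate` and `combined` are hypothetical.

```python
# Percent-of-workforce values copied from the table above.
# Each tuple is (Attrition = No, Attrition = Yes); labels are the notebook's own.
shares = {
    "Female": {
        "Pro-Veteran": (2.17, 0.08),
        "1-2 years Old": (9.58, 1.33),
        "2-5 Years hired": (6.58, 0.50),
        "5-10 Years Vet": (10.92, 1.50),
        "Recently Hired Newbie": (5.08, 1.83),
    },
    "Male": {
        "Pro-Veteran": (2.42, 0.08),
        "1-2 years Old": (15.25, 2.83),
        "2-5 Years hired": (9.42, 1.33),
        "5-10 Years Vet": (16.25, 1.83),
        "Recently Hired Newbie": (7.50, 3.50),
    },
}


def attrition_rate(stayed, left):
    """Share of a group that left, given its stayed and left shares."""
    return left / (stayed + left)


# Pool both genders within each years-with-manager category.
combined = {}
for category in shares["Female"]:
    stayed = sum(shares[gender][category][0] for gender in shares)
    left = sum(shares[gender][category][1] for gender in shares)
    combined[category] = attrition_rate(stayed, left)

# Print categories from highest to lowest attrition rate.
for category, rate in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<22} {rate:6.1%}")
```

Because the inputs are rounded to two decimals, the rates are approximate, especially for the small Pro-Veteran group. Under these assumptions, the Recently Hired Newbie group has the highest pooled attrition rate (roughly 30%) and the Pro-Veteran group the lowest (roughly 3%).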

B

The density plot compares the average satisfaction score of employees who left with that of employees who stayed. The peak values show the difference in their average satisfaction scores.

18 Employee Job Satisfaction

A

Across Employee Job Satisfaction scores, OverTime is split roughly 70:30 (No:Yes).

B

Employees with a low Job Satisfaction score tend to leave the company more than those with a high score.

C

The same top 3 departments have low Employee Job Satisfaction scores.

D

Marital Status seems independent of Employee Job Satisfaction; the scores are roughly equally distributed.

19 Worklife Balance

WorkLifeBalance, from the data dictionary: 1: ‘Bad’, 2: ‘Good’, 3: ‘Better’, 4: ‘Best’.

A

15% of employees in the Data Science department have the worst Work Life Balance.
15% of employees in the Human Resources department have the best Work Life Balance.
Most departments have a Work Life Balance score of 3.

B

Most employees across the different age groups have a ‘Better’ Work Life Balance.
27% of employees aged 0-20 have a Work Life Balance score of 4.

C

Male and female employees are split about equally across all Work Life Balance scores.

D

Employees with a Work Life Balance score of 1 are more likely to leave; attrition in this group is 27%.

20 Distance from Home

A

Most employees live fairly close to the workplace.
Both male and female employees mostly live within a 10-mile radius.

B

The median distance for all employees is 7.
The mean distance for all employees is 9.16.

C

The mean distance for all female employees is 9.181.
The mean distance for all male employees is 9.155.

D

Distance from home is an unlikely factor for attrition, as we see very little attrition based on distance from home alone.

E

Many Millennials (Gen Y.2) live within a 10-mile radius.
30% of employees live within about a 2-mile radius.

F

Distance from home does not seem to be an attrition factor across the different generations.

G

When we break it down by department, we see no pattern suggesting that attrition is driven by distance from home.
The three big departments have more attrition than any other department.

H

Except for Data Science and Finance, employees in all departments show the same pattern in how far they live from the workplace.

I

Male and female employees are distributed equally in how far they live from the workplace.

J

Most of them fall right in the middle.

K

Distance from home seems to be independent of attrition.
There are some outliers depending on position, but not all of the outliers left the company just because of commuting distance.

L & M

Many interpretations are possible based on which position employees hold and how far they live from the workplace.
The same applies to whether the employee is male or female and what the trend looks like by gender.

21 Business Travel

A

In each department, most employees travel rarely.

B

When we break it down by male and female, we see the same proportions in all 3 groups.

C

Employees who travel frequently tend to leave the company more than those who travel rarely or not at all.

D

The Travel Rarely group consists mostly of married employees and least of divorced employees; the same holds for Travel Frequently.

Please continue to Notebook_part_2 for a summary of the results found in this notebook and the Attrition Analysis.

Notebook_Part_2

Human_Resource_Part_1

Pankaj Shah

5/29/2019