Buckle Up, It's an Exploratory Data Analysis
Out of curiosity, I took this dataset from a Medium article. This is a small step in my effort to understand the HR analytics side of things. Thank you, Nafee Afshin, for such a beautiful blog, and thanks to the R community Kagglers and coders for breaking it down and making the various elements of Human Resources easy to understand.
We all know that every company has job openings every now and then, and how often positions open up depends on the size of its workforce. It's not easy for either the employer or the employee to find the right working conditions, satisfaction level and the many other things that make a perfect match. Beyond finding that match, both employer and employee need to invest time and money in training on the company's policies and working conditions, and if there is a new tool to learn, they have to invest time in learning it. CRISP-DM stands for CRoss Industry Standard Process for Data Mining.
When we apply for a job or start working at a company, we should get familiar with the organization's Human Resources department. In some big organizations, Human Resources (HR) departments seem to be increasingly turning into analytics departments, with Business Intelligence people working for them to run hiring and workforce management more smoothly and within budget. As in every nook and corner of the company (e.g. marketing, finance), HR departments also require analytics solutions, which can be quite specific to their domain and to the relational databases shared between departments.
We might think that HR departments usually have ‘small’ datasets compared with other departments. That might not be true, because HR is a crucial department for the growth of the company. HR analytics tends to be more interpretation-driven than driven by predictive models.
General skills can be transferred from organization to organization, but how those tools are applied depends on the goals of the company and the projects everyone is involved in. It all depends on the motive behind the work. The goal of most companies is to use their workforce in the right direction so the ship sails smoothly. There are always bumps along the way: someone leaves for a better future, for family reasons, or for a better offer when they have the skills and talent. It is definitely not favorable for a company to lose hardworking employees with a good work ethic. So wouldn't it be ideal for the company to invest time in finding out what makes employees leave? Is it the current manager, job satisfaction, commute distance, the working environment, and so on? There are many factors to consider in what makes employees and employers happy, and it is the balance of many things that plays the critical role. I am trying to explore some of this with this small dataset. We all love revenue turnover, whether we are employees or employers, but no one likes employee churn, or attrition. Let's dive in and see what we can find in this dataset.
Attrition: the turnover rate of employees within an organization, i.e. how many employees leave it.
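As a worked example of this definition, a simple attrition rate is the number of employees who left divided by the headcount. The minimal Python sketch below (the article's own code is R and is not shown) assumes that simple headcount denominator; HR teams often divide by the average headcount over the period instead. It plugs in the totals from the Attrition by OverTime cross-tabulation further down this article: 178 leavers out of 1,200 employees. The function name `attrition_rate` is hypothetical.

```python
def attrition_rate(leavers: int, headcount: int) -> float:
    """Simple attrition rate: employees who left divided by total headcount."""
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    if not 0 <= leavers <= headcount:
        raise ValueError("leavers must be between 0 and headcount")
    return leavers / headcount


# 178 employees with Attrition == "Yes" out of 1,200 rows in this dataset
rate = attrition_rate(leavers=178, headcount=1200)
print(f"Attrition rate: {rate:.1%}")  # Attrition rate: 14.8%
```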
The goal of this project is to find out whether there are relationships between the variables and whether we can find insights into employee churn over time. If possible, I would love to apply some machine learning to predict the churn rate over time.
Just to make things a little clearer: HR analytics is not only about gathering data on employees and increasing efficiency; its aim is also to deliver a better ROI for the company and to keep employees happy by meeting their needs and creating a happier workplace environment. If we can better understand the patterns and take the right steps, we will all benefit from each other.
So let's start diving into our dataset by importing the necessary libraries.
Library
Let's have a glimpse of our dataset.
## Observations: 1,200
## Variables: 28
## $ EmpNumber <fct> E1001000, E1001006, E1001007, E1001…
## $ Age <int> 32, 47, 40, 41, 60, 27, 50, 28, 36,…
## $ Gender <fct> Male, Male, Male, Male, Male, Male,…
## $ EducationBackground <fct> Marketing, Marketing, Life Sciences…
## $ MaritalStatus <fct> Single, Single, Married, Divorced, …
## $ EmpDepartment <fct> Sales, Sales, Sales, Human Resource…
## $ EmpJobRole <fct> Sales Executive, Sales Executive, S…
## $ BusinessTravelFrequency <fct> Travel_Rarely, Travel_Rarely, Trave…
## $ DistanceFromHome <int> 10, 14, 5, 10, 16, 10, 8, 1, 8, 1, …
## $ EmpEducationLevel <int> 3, 4, 4, 4, 4, 2, 4, 2, 3, 3, 3, 3,…
## $ EmpEnvironmentSatisfaction <int> 4, 4, 4, 2, 1, 4, 4, 1, 1, 3, 1, 4,…
## $ EmpHourlyRate <int> 55, 42, 48, 73, 84, 32, 54, 67, 63,…
## $ EmpJobInvolvement <int> 3, 3, 2, 2, 3, 3, 3, 1, 4, 3, 1, 3,…
## $ EmpJobLevel <int> 2, 2, 3, 5, 2, 3, 1, 1, 3, 3, 1, 4,…
## $ EmpJobSatisfaction <int> 4, 1, 1, 4, 1, 1, 2, 2, 1, 3, 3, 3,…
## $ NumCompaniesWorked <int> 1, 2, 5, 3, 8, 1, 7, 7, 9, 4, 2, 9,…
## $ OverTime <fct> No, No, Yes, No, No, No, No, Yes, N…
## $ EmpLastSalaryHikePercent <int> 12, 12, 21, 15, 14, 21, 15, 13, 14,…
## $ EmpRelationshipSatisfaction <int> 4, 4, 3, 2, 4, 3, 4, 4, 1, 4, 3, 4,…
## $ TotalWorkExperienceInYears <int> 10, 20, 20, 23, 10, 9, 4, 10, 10, 1…
## $ TrainingTimesLastYear <int> 2, 2, 2, 2, 1, 4, 2, 4, 2, 4, 5, 2,…
## $ EmpWorkLifeBalance <int> 2, 3, 3, 2, 3, 2, 3, 3, 3, 4, 3, 2,…
## $ ExperienceYearsAtThisCompany <int> 10, 7, 18, 21, 2, 9, 2, 7, 8, 1, 5,…
## $ ExperienceYearsInCurrentRole <int> 7, 7, 13, 6, 2, 7, 2, 7, 7, 0, 2, 2…
## $ YearsSinceLastPromotion <int> 0, 1, 1, 12, 2, 1, 2, 3, 0, 0, 1, 1…
## $ YearsWithCurrManager <int> 8, 7, 12, 6, 2, 7, 2, 7, 5, 0, 4, 1…
## $ Attrition <fct> No, No, No, No, No, No, No, Yes, No…
## $ PerformanceRating <int> 3, 3, 4, 3, 3, 4, 3, 3, 3, 3, 3, 3,…
## [1] "Missing Data"
## $data.frame
## name size
## 1 hr_df 0.2 Mb
##
## $dimensions
## rows columns
## 1 1200 28
##
## $column.details
## column class unique.values missing.count missing.pct
## 1 EmpNumber factor 1200 0 0
## 2 EmpHourlyRate integer 71 0 0
## 3 Age integer 43 0 0
## 4 TotalWorkExperienceInYears integer 40 0 0
## 5 ExperienceYearsAtThisCompany integer 37 0 0
## 6 DistanceFromHome integer 29 0 0
## 7 EmpJobRole factor 19 0 0
## 8 ExperienceYearsInCurrentRole integer 19 0 0
## 9 YearsWithCurrManager integer 18 0 0
## 10 YearsSinceLastPromotion integer 16 0 0
## 11 EmpLastSalaryHikePercent integer 15 0 0
## 12 NumCompaniesWorked integer 10 0 0
## 13 TrainingTimesLastYear integer 7 0 0
## 14 EducationBackground factor 6 0 0
## 15 EmpDepartment factor 6 0 0
## 16 EmpEducationLevel integer 5 0 0
## 17 EmpJobLevel integer 5 0 0
## 18 EmpEnvironmentSatisfaction integer 4 0 0
## 19 EmpJobInvolvement integer 4 0 0
## 20 EmpJobSatisfaction integer 4 0 0
## 21 EmpRelationshipSatisfaction integer 4 0 0
## 22 EmpWorkLifeBalance integer 4 0 0
## 23 MaritalStatus factor 3 0 0
## 24 BusinessTravelFrequency factor 3 0 0
## 25 PerformanceRating integer 3 0 0
## 26 Gender factor 2 0 0
## 27 OverTime factor 2 0 0
## 28 Attrition factor 2 0 0
We have exactly 1,200 rows of data with 28 variables. The total size of our dataset is 0.2 MB.
Sanity Check:
We looked for missing rows, and the graph shows there are none. Aww, that's so sweet: we don't have any missing data at all. It seems we have a clean dataset, which rarely happens in data analytics, and especially in people analytics, where a lot of the information is personal and, for privacy and confidentiality reasons, much of the data is not made public.
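The missing-data summary above came from an R helper whose code isn't shown. As a sketch of the same idea in Python, the function below counts missing cells per column, assuming rows are read as dictionaries (for example with `csv.DictReader`) and that missing values appear as empty strings or tokens such as `NA`. The token list and the third row are hypothetical; the first two rows are copied from the dataset.

```python
# Assumption: these strings are how missing values would be encoded in the raw file
MISSING_TOKENS = {"", "NA", "NaN", "NULL"}


def missing_counts(rows):
    """Count missing cells per column; rows are dicts mapping column -> raw string."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is None or value.strip() in MISSING_TOKENS:
                counts[column] += 1
    return counts


rows = [
    {"EmpNumber": "E1001000", "Age": "32", "Attrition": "No"},
    {"EmpNumber": "E1001006", "Age": "47", "Attrition": "No"},
    {"EmpNumber": "E9999999", "Age": "NA", "Attrition": "No"},  # hypothetical row with a gap
]
print(missing_counts(rows))  # {'EmpNumber': 0, 'Age': 1, 'Attrition': 0}
```

On the real data every count would be 0, matching the `missing.count` column above.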
We will first analyze most of the individual variables and then try to link the dependent and independent variables, both for machine learning and for exploratory data analysis.
Let's start with exploratory data analysis. Let's look at the top of the table to get a sense of the dataset.
EmpNumber | Age | Gender | EducationBackground | MaritalStatus | EmpDepartment | EmpJobRole | BusinessTravelFrequency | DistanceFromHome | EmpEducationLevel | EmpEnvironmentSatisfaction | EmpHourlyRate | EmpJobInvolvement | EmpJobLevel | EmpJobSatisfaction | NumCompaniesWorked | OverTime | EmpLastSalaryHikePercent | EmpRelationshipSatisfaction | TotalWorkExperienceInYears | TrainingTimesLastYear | EmpWorkLifeBalance | ExperienceYearsAtThisCompany | ExperienceYearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | Attrition | PerformanceRating |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
E1001000 | 32 | Male | Marketing | Single | Sales | Sales Executive | Travel_Rarely | 10 | 3 | 4 | 55 | 3 | 2 | 4 | 1 | No | 12 | 4 | 10 | 2 | 2 | 10 | 7 | 0 | 8 | No | 3 |
E1001006 | 47 | Male | Marketing | Single | Sales | Sales Executive | Travel_Rarely | 14 | 4 | 4 | 42 | 3 | 2 | 1 | 2 | No | 12 | 4 | 20 | 2 | 3 | 7 | 7 | 1 | 7 | No | 3 |
E1001007 | 40 | Male | Life Sciences | Married | Sales | Sales Executive | Travel_Frequently | 5 | 4 | 4 | 48 | 2 | 3 | 1 | 5 | Yes | 21 | 3 | 20 | 2 | 3 | 18 | 13 | 1 | 12 | No | 4 |
E1001009 | 41 | Male | Human Resources | Divorced | Human Resources | Manager | Travel_Rarely | 10 | 4 | 2 | 73 | 2 | 5 | 4 | 3 | No | 15 | 2 | 23 | 2 | 2 | 21 | 6 | 12 | 6 | No | 3 |
E1001010 | 60 | Male | Marketing | Single | Sales | Sales Executive | Travel_Rarely | 16 | 4 | 1 | 84 | 3 | 2 | 1 | 8 | No | 14 | 4 | 10 | 1 | 3 | 2 | 2 | 2 | 2 | No | 3 |
Looking at the top of the table usually gives us a sense of what kind of dataset we are dealing with and what the columns and values represent. Going one step further, I love to randomly sample my dataset to find something new. It doesn't take long, but it saves a lot of time if I catch something fishy at the beginning of the analysis.
If I have a small dataset, I usually use sample_n to pull a few random rows and get a sense of it. We could also use sample_frac to look at a random fraction of the data.
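For readers outside R, here is a minimal Python sketch of what `sample_n` and `sample_frac` do: draw rows at random without replacement, either a fixed number or a fraction of the total. The Python function names mirror the dplyr ones only for readability and are hypothetical; in dplyr 1.0 and later both functions are superseded by `slice_sample()`. Passing a seed makes the draw reproducible.

```python
import random


def sample_n(rows, n, seed=None):
    """Draw n rows without replacement (analogous to dplyr::sample_n)."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


def sample_frac(rows, frac, seed=None):
    """Draw a fraction of the rows without replacement (analogous to dplyr::sample_frac)."""
    if not 0 <= frac <= 1:
        raise ValueError("frac must be between 0 and 1")
    return sample_n(rows, round(len(rows) * frac), seed)


employee_ids = [f"E{i}" for i in range(1200)]  # stand-in for the 1,200 rows
print(len(sample_n(employee_ids, 2, seed=42)))        # 2
print(len(sample_frac(employee_ids, 0.01, seed=42)))  # 12
```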
Row | EmpNumber | Age | Gender | EducationBackground | MaritalStatus | EmpDepartment | EmpJobRole | BusinessTravelFrequency | DistanceFromHome | EmpEducationLevel | EmpEnvironmentSatisfaction | EmpHourlyRate | EmpJobInvolvement | EmpJobLevel | EmpJobSatisfaction | NumCompaniesWorked | OverTime | EmpLastSalaryHikePercent | EmpRelationshipSatisfaction | TotalWorkExperienceInYears | TrainingTimesLastYear | EmpWorkLifeBalance | ExperienceYearsAtThisCompany | ExperienceYearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | Attrition | PerformanceRating |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
154 | E1001281 | 47 | Female | Life Sciences | Single | Sales | Sales Executive | Travel_Rarely | 4 | 2 | 4 | 83 | 3 | 2 | 4 | 1 | Yes | 17 | 3 | 9 | 0 | 3 | 9 | 0 | 0 | 7 | No | 3 |
408 | E1001737 | 31 | Female | Life Sciences | Single | Research & Development | Manager R&D | Travel_Frequently | 1 | 5 | 3 | 100 | 4 | 3 | 2 | 1 | No | 11 | 1 | 10 | 2 | 3 | 10 | 8 | 4 | 7 | Yes | 3 |
At first glance, the random sample looks normal. Let's see how many unique values each column has, so that we can get more of a gist of the data.
To get a better sense of the data, I usually check the unique values in each column to find which fields are interesting enough to examine. If a column has only a handful of unique values, it is easy to look into without wasting a lot of time.
## [1] Male Female
## Levels: Female Male
## [1] Single Married Divorced
## Levels: Divorced Married Single
## No duplicate combinations found of: EmpNumber
## [1] 1200 28
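The output above lists the distinct Gender and MaritalStatus levels, reports no duplicate EmpNumber values, and confirms the 1200 x 28 shape. A minimal Python sketch of the two checks, using four rows copied from the tables in this article; the helper names `unique_values` and `duplicate_keys` are hypothetical.

```python
from collections import Counter


def unique_values(rows, column):
    """Distinct values of one column, in first-seen order."""
    return list(dict.fromkeys(row[column] for row in rows))


def duplicate_keys(rows, key):
    """Key values that occur more than once (an empty list means no duplicates)."""
    counts = Counter(row[key] for row in rows)
    return [value for value, n in counts.items() if n > 1]


rows = [
    {"EmpNumber": "E1001000", "Gender": "Male", "MaritalStatus": "Single"},
    {"EmpNumber": "E1001007", "Gender": "Male", "MaritalStatus": "Married"},
    {"EmpNumber": "E1001009", "Gender": "Male", "MaritalStatus": "Divorced"},
    {"EmpNumber": "E1001281", "Gender": "Female", "MaritalStatus": "Single"},
]
print(unique_values(rows, "Gender"))         # ['Male', 'Female']
print(unique_values(rows, "MaritalStatus"))  # ['Single', 'Married', 'Divorced']
print(duplicate_keys(rows, "EmpNumber"))     # []
```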
To drill down into each column, we can look at its minimum and maximum values, which gives a sense of how spread out the data is. Sometimes you can catch outliers this way without plotting a boxplot every time.
Variable | Min | Max |
---|---|---|
EmpNumber | E1001000 | E100998 |
Age | 18 | 60 |
Gender | Female | Male |
EducationBackground | Human Resources | Technical Degree |
MaritalStatus | Divorced | Single |
EmpDepartment | Data Science | Sales |
EmpJobRole | Business Analyst | Technical Lead |
BusinessTravelFrequency | Non-Travel | Travel_Rarely |
DistanceFromHome | 1 | 29 |
EmpEducationLevel | 1 | 5 |
EmpEnvironmentSatisfaction | 1 | 4 |
EmpHourlyRate | 30 | 100 |
EmpJobInvolvement | 1 | 4 |
EmpJobLevel | 1 | 5 |
EmpJobSatisfaction | 1 | 4 |
NumCompaniesWorked | 0 | 9 |
OverTime | No | Yes |
EmpLastSalaryHikePercent | 11 | 25 |
EmpRelationshipSatisfaction | 1 | 4 |
TotalWorkExperienceInYears | 0 | 40 |
TrainingTimesLastYear | 0 | 6 |
EmpWorkLifeBalance | 1 | 4 |
ExperienceYearsAtThisCompany | 0 | 40 |
ExperienceYearsInCurrentRole | 0 | 18 |
YearsSinceLastPromotion | 0 | 15 |
YearsWithCurrManager | 0 | 17 |
Attrition | No | Yes |
PerformanceRating | 2 | 4 |
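A minimal Python sketch of the min/max check, assuming rows are dictionaries of raw strings: columns that parse as integers are compared numerically, and everything else falls back to string (lexical) ordering. That fallback explains a quirk in the table: compared as text, "E100998" sorts after "E1001000" because its fifth character is '8'... more precisely '9' versus '1', so min/max on an ID column reflects string order rather than anything numeric. The rows below are copied from this article; the function name is hypothetical.

```python
def column_ranges(rows):
    """Return {column: (min, max)}.

    Columns whose values all parse as integers are compared numerically;
    any other column falls back to plain string (lexical) ordering.
    """
    if not rows:
        return {}
    ranges = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        try:
            values = [int(v) for v in values]
        except ValueError:
            pass  # not numeric: keep the strings and use lexical ordering
        ranges[column] = (min(values), max(values))
    return ranges


rows = [
    {"EmpNumber": "E1001000", "Age": "32"},
    {"EmpNumber": "E1001010", "Age": "60"},
    {"EmpNumber": "E100827", "Age": "60"},
    {"EmpNumber": "E1001290", "Age": "18"},
]
print(column_ranges(rows))
# {'EmpNumber': ('E1001000', 'E100827'), 'Age': (18, 60)}
```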
Let's recode some of our attributes for quicker analysis and to fit them better into our graphs.
Var1 | Freq |
---|---|
DS | 20 |
Development | 361 |
Finance | 49 |
HR | 54 |
R&D | 343 |
Sales | 373 |
Var1 | Freq |
---|---|
Bachelors Degree | 0.3741667 |
College Degree | 0.1991667 |
Masters Degree | 0.2683333 |
No College Degree | 0.1233333 |
Phd | 0.0350000 |
Var1 | Freq |
---|---|
0_20 | 0.0183333 |
20_30 | 0.2391667 |
30_40 | 0.4250000 |
40_50 | 0.2241667 |
50_60 | 0.0933333 |
Var1 | Freq |
---|---|
Baby Boomers | 52 |
Gen Xers | 406 |
Millenials Gen Y.1 | 216 |
Millenials Gen Y.2 | 526 |
Var1 | Freq |
---|---|
11-13 | 324 |
14-16 | 422 |
17-20 | 271 |
21-23 | 131 |
24-26 | 52 |
Var1 | Freq |
---|---|
Pro-Veteran | 57 |
1-2 years Old | 348 |
2-5 Years hired | 214 |
5-10 Years Vet | 366 |
Recently Hired Newbie | 215 |
Var1 | Freq |
---|---|
30_39 | 135 |
40_49 | 183 |
50_59 | 167 |
60_69 | 166 |
70_79 | 184 |
80_89 | 168 |
90_100 | 197 |
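The recodes above were done in R, and that code isn't shown. For the hourly-rate bins, here is a minimal Python sketch that reproduces the `hourly_bin` labels seen in the sample rows later in this article (for example, a rate of 84 maps to `80_89` and 92 maps to `90_100`). The function name is hypothetical, and folding a rate of exactly 100 into the last bin is an assumption based on the `90_100` label.

```python
def hourly_bin(rate):
    """Map EmpHourlyRate (30-100) to labels '30_39', '40_49', ..., '90_100'."""
    if not 30 <= rate <= 100:
        raise ValueError(f"hourly rate {rate} outside 30-100")
    if rate >= 90:
        return "90_100"  # the last bin also absorbs a rate of exactly 100
    low = rate // 10 * 10
    return f"{low}_{low + 9}"


# EmpHourlyRate values from the oldest/youngest employee rows shown in this article
rates = [84, 92, 41, 73, 80, 97, 70, 33, 54, 69, 69]
print([hourly_bin(r) for r in rates])
# ['80_89', '90_100', '40_49', '70_79', '80_89', '90_100', '70_79', '30_39', '50_59', '60_69', '60_69']
```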
Let's start digging in with the gender distribution.
EmpDepartment | m_count | pct_m | f_count | pct_f |
---|---|---|---|---|
Development | 219 | 30 | 142 | 30 |
Sales | 216 | 30 | 157 | 33 |
R&D | 214 | 30 | 129 | 27 |
HR | 37 | 5 | 17 | 4 |
Finance | 27 | 4 | 22 | 5 |
DS | 12 | 2 | 8 | 2 |
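The pct_m and pct_f columns are shares within each gender: each department's count divided by the total number of male (725) or female (475) employees. A short Python sketch that reproduces them from the counts in the table; the helper name is hypothetical, and rounding to whole percentages is an assumption that happens to match the printed values.

```python
# Counts per department, copied from the table above
male = {"Development": 219, "Sales": 216, "R&D": 214, "HR": 37, "Finance": 27, "DS": 12}
female = {"Development": 142, "Sales": 157, "R&D": 129, "HR": 17, "Finance": 22, "DS": 8}


def pct_within_group(counts):
    """Each department's share of the group's total, as a whole-number percentage."""
    total = sum(counts.values())
    return {dept: round(100 * n / total) for dept, n in counts.items()}


print(sum(male.values()), sum(female.values()))  # 725 475
print(pct_within_group(male))
# {'Development': 30, 'Sales': 30, 'R&D': 30, 'HR': 5, 'Finance': 4, 'DS': 2}
print(pct_within_group(female))
# {'Development': 30, 'Sales': 33, 'R&D': 27, 'HR': 4, 'Finance': 5, 'DS': 2}
```

Because the denominators differ, an equal percentage does not mean equal headcount: 30% of males in Development is 219 people, while 30% of females is 142.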
The charts above show how the female and male populations are distributed across departments. Males are split equally (30% each) among the top three departments, whereas females show a slight deviation: they seem to favor Sales (33%) over all other departments.
Employee Distribution Across the Organization
The three biggest departments are Sales (373 employees), Development (361) and R&D (343).
$70-$79: 29%
Only 10% of Data Scientists make $90-$100 an hour.
PhD: Not every job role requires a PhD, so educational background is limited by job role.
Master's Degree: Senior Managers in R&D have more Master's degrees.
Bachelor's Degree: Almost 53% of Research Directors have a Bachelor's degree.
College Degree: Most Data Scientists, almost 35%, have only a college degree.
No College Degree: 28% of Sales Representatives have no college degree.
A - As we can see, most of our workforce has a Bachelor's degree (37%), followed by a Master's degree (27%). As expected, PhDs are rare, at only 3.5%.
B - When we break it down by gender, PhD holders appear to be split equally: 42 employees have a PhD, with equal numbers of male and female employees.
C - Employees with no college degree seem to leave the company more often than employees in the other education categories, possibly to pursue a higher degree or because of a lack of growth inside the organization.
52% of employees with a PhD are married.
45% of employees with a Master's degree are married.
45% of employees with a Bachelor's degree are married.
1% are Silent Generation employees with no college degree.
0% attrition in Finance.
9% attrition in Finance.
5% attrition in the Finance department.
0% attrition in Data Science.
0% attrition in Human Resources.
Let's look at the age distribution in our dataset.
Row | EmpNumber | Age | Gender | EducationBackground | MaritalStatus | EmpDepartment | EmpJobRole | BusinessTravelFrequency | DistanceFromHome | EmpEducationLevel | EmpEnvironmentSatisfaction | EmpHourlyRate | EmpJobInvolvement | EmpJobLevel | EmpJobSatisfaction | NumCompaniesWorked | OverTime | EmpLastSalaryHikePercent | EmpRelationshipSatisfaction | TotalWorkExperienceInYears | TrainingTimesLastYear | EmpWorkLifeBalance | ExperienceYearsAtThisCompany | ExperienceYearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | Attrition | PerformanceRating | Educational_Levels | Age_bin | Generation | Hike_Pct | CatYearsManager | hourly_bin |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | E1001010 | 60 | Male | Marketing | Single | Sales | Sales Executive | Travel_Rarely | 16 | 4 | 1 | 84 | 3 | 2 | 1 | 8 | No | 14 | 4 | 10 | 1 | 3 | 2 | 2 | 2 | 2 | No | 3 | Masters Degree | 50_60 | Baby Boomers | 14-16 | 1-2 years Old | 80_89 |
549 | E1001975 | 60 | Male | Medical | Divorced | R&D | Healthcare Representative | Travel_Rarely | 1 | 4 | 3 | 92 | 1 | 3 | 4 | 3 | No | 20 | 3 | 19 | 2 | 4 | 1 | 0 | 0 | 0 | No | 4 | Masters Degree | 50_60 | Baby Boomers | 21-23 | Recently Hired Newbie | 90_100 |
1105 | E100827 | 60 | Female | Life Sciences | Married | Development | Developer | Travel_Rarely | 7 | 3 | 1 | 41 | 3 | 5 | 1 | 5 | No | 11 | 4 | 33 | 5 | 1 | 29 | 8 | 11 | 10 | No | 3 | Bachelors Degree | 50_60 | Baby Boomers | 11-13 | 5-10 Years Vet | 40_49 |
Row | EmpNumber | Age | Gender | EducationBackground | MaritalStatus | EmpDepartment | EmpJobRole | BusinessTravelFrequency | DistanceFromHome | EmpEducationLevel | EmpEnvironmentSatisfaction | EmpHourlyRate | EmpJobInvolvement | EmpJobLevel | EmpJobSatisfaction | NumCompaniesWorked | OverTime | EmpLastSalaryHikePercent | EmpRelationshipSatisfaction | TotalWorkExperienceInYears | TrainingTimesLastYear | EmpWorkLifeBalance | ExperienceYearsAtThisCompany | ExperienceYearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager | Attrition | PerformanceRating | Educational_Levels | Age_bin | Generation | Hike_Pct | CatYearsManager | hourly_bin |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
160 | E1001290 | 18 | Male | Life Sciences | Single | R&D | Research Scientist | Non-Travel | 5 | 2 | 2 | 73 | 3 | 1 | 4 | 1 | No | 15 | 4 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | No | 3 | College Degree | 0_20 | Millenials Gen Y.1 | 14-16 | Recently Hired Newbie | 70_79 |
243 | E1001434 | 18 | Male | Medical | Single | R&D | Laboratory Technician | Non-Travel | 8 | 1 | 3 | 80 | 3 | 1 | 3 | 1 | No | 12 | 4 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | Yes | 3 | No College Degree | 0_20 | Millenials Gen Y.1 | 11-13 | Recently Hired Newbie | 80_89 |
358 | E1001646 | 18 | Female | Life Sciences | Single | R&D | Manager R&D | Non-Travel | 1 | 3 | 4 | 97 | 3 | 1 | 4 | 1 | No | 15 | 3 | 0 | 5 | 4 | 0 | 0 | 0 | 0 | No | 4 | Bachelors Degree | 0_20 | Millenials Gen Y.1 | 14-16 | Recently Hired Newbie | 90_100 |
505 | E1001902 | 18 | Female | Medical | Single | Sales | Sales Representative | Travel_Frequently | 3 | 2 | 2 | 70 | 3 | 1 | 4 | 1 | Yes | 12 | 3 | 0 | 2 | 4 | 0 | 0 | 0 | 0 | Yes | 3 | College Degree | 0_20 | Millenials Gen Y.1 | 11-13 | Recently Hired Newbie | 70_79 |
625 | E1002117 | 18 | Female | Medical | Single | R&D | Manager R&D | Non-Travel | 14 | 3 | 2 | 33 | 3 | 1 | 3 | 1 | No | 16 | 3 | 0 | 4 | 1 | 0 | 0 | 0 | 0 | No | 3 | Bachelors Degree | 0_20 | Millenials Gen Y.1 | 17-20 | Recently Hired Newbie | 30_39 |
1012 | E100683 | 18 | Male | Life Sciences | Single | Development | Developer | Travel_Rarely | 3 | 3 | 3 | 54 | 3 | 1 | 3 | 1 | No | 13 | 3 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | Yes | 3 | Bachelors Degree | 0_20 | Millenials Gen Y.1 | 14-16 | Recently Hired Newbie | 50_59 |
1017 | E100689 | 18 | Female | Medical | Single | Sales | Sales Representative | Travel_Rarely | 10 | 3 | 4 | 69 | 2 | 1 | 3 | 1 | No | 12 | 1 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | No | 3 | Bachelors Degree | 0_20 | Millenials Gen Y.1 | 11-13 | Recently Hired Newbie | 60_69 |
1145 | E100892 | 18 | Male | Marketing | Single | Sales | Sales Representative | Travel_Frequently | 5 | 3 | 2 | 69 | 3 | 1 | 2 | 1 | Yes | 14 | 4 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | Yes | 3 | Bachelors Degree | 0_20 | Millenials Gen Y.1 | 14-16 | Recently Hired Newbie | 60_69 |
Oldest Employee
Youngest Employee
The Age column spans from a minimum of 18 to a maximum of 60. Let's bin the ages into decades. Doing so reduces noise and the influence of outliers, so we can focus on the insights without being biased by individual values.
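The exact breaks used for `Age_bin` aren't shown. A minimal Python sketch, assuming right-closed intervals (low, high] like the default of R's `cut()` (`right = TRUE`); that assumption is consistent with the rows above, where 60-year-olds land in `50_60` and 18-year-olds in `0_20`. The constant and function names are hypothetical.

```python
import bisect

AGE_BREAKS = [0, 20, 30, 40, 50, 60]
AGE_LABELS = ["0_20", "20_30", "30_40", "40_50", "50_60"]


def age_bin(age):
    """Assign an age to a right-closed decade bin (low, high]."""
    if not AGE_BREAKS[0] < age <= AGE_BREAKS[-1]:
        raise ValueError(f"age {age} outside (0, 60]")
    # bisect_left returns the index of the first break >= age,
    # so an age equal to a break lands in the bin that ends at that break.
    return AGE_LABELS[bisect.bisect_left(AGE_BREAKS, age) - 1]


print([age_bin(a) for a in (18, 20, 21, 32, 60)])
# ['0_20', '0_20', '20_30', '30_40', '50_60']
```

If the original used left-closed bins instead, only boundary ages (20, 30, 40, 50) would move to the next bin, and age 60 would need special handling.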
A, B & C - When we break our dataset down by age bin, the split between male and female employees looks similar across all age groups.
0-20: - No employees under age 20 work in Finance or Data Science. - 8% of male employees in this age group work in Human Resources.
20-30: - 34% of females work in Sales and 32% of males in R&D.
30-40: - 33% of females work in Development and 33% of males in Sales.
40-50: - 40% of females work in Sales and 31% of males in Development.
50-60: - 41% of females work in R&D and 32% of males in Sales.
We can also classify all employees into four generational categories based on their age, as of 2019.
As we don't have many Gen Z employees in our dataset, we will combine Gen Z with Gen Y.2; otherwise we would have bar plots with just a couple of dots.
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 1200
##
##
## | hr_df$OverTime
## hr_df$Attrition | No | Yes | Row Total |
## ----------------|-----------|-----------|-----------|
## No | 765 | 257 | 1022 |
## | 2.640 | 6.334 | |
## | 0.749 | 0.251 | 0.852 |
## | 0.903 | 0.728 | |
## | 0.637 | 0.214 | |
## ----------------|-----------|-----------|-----------|
## Yes | 82 | 96 | 178 |
## | 15.157 | 36.368 | |
## | 0.461 | 0.539 | 0.148 |
## | 0.097 | 0.272 | |
## | 0.068 | 0.080 | |
## ----------------|-----------|-----------|-----------|
## Column Total | 847 | 353 | 1200 |
## | 0.706 | 0.294 | |
## ----------------|-----------|-----------|-----------|
##
##
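The "Chi-square contribution" row of the cross-tabulation is each cell's share of Pearson's chi-square statistic, (O - E)^2 / E, where the expected count E is row total × column total / grand total. The Python sketch below recomputes those contributions from the observed counts in the table above. It is a sketch of the arithmetic without a continuity correction, not the R function's own implementation, and the helper name is hypothetical. The largest contributions are in the Attrition = Yes row, and the total of about 60.5 on 1 degree of freedom is far above the 3.84 critical value at the 5% level, which suggests attrition and overtime are not independent in this dataset.

```python
# Observed counts from the cross-tabulation above, keyed by (Attrition, OverTime)
observed = {
    ("No", "No"): 765, ("No", "Yes"): 257,
    ("Yes", "No"): 82, ("Yes", "Yes"): 96,
}


def chi_square_contributions(observed):
    """Pearson (O - E)^2 / E per cell, with E = row total * column total / grand total."""
    row_totals, col_totals = {}, {}
    for (row, col), n in observed.items():
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(observed.values())
    contributions = {}
    for (row, col), n in observed.items():
        expected = row_totals[row] * col_totals[col] / grand_total
        contributions[(row, col)] = (n - expected) ** 2 / expected
    return contributions


contributions = chi_square_contributions(observed)
for cell, value in contributions.items():
    print(cell, f"{value:.3f}")
# ('No', 'No') 2.640
# ('No', 'Yes') 6.334
# ('Yes', 'No') 15.157
# ('Yes', 'Yes') 36.368
print(f"Pearson chi-square: {sum(contributions.values()):.2f}")  # Pearson chi-square: 60.50
```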
The rest are split more or less evenly.
A - In the top three departments, roughly 20% of employees have a low satisfaction score. In HR, 41% of employees have an environment satisfaction score of 3, and only 20% have the full score of 4.
B - As we can see, employees who are less satisfied with their work environment are more likely to leave the company than to stay.
C - In Finance, employees seem to leave even when their environment satisfaction is high, so in the Finance department satisfaction level appears to play no role in whether employees stay or leave. In Sales, Development, R&D and HR, we can see an effect of environment satisfaction on attrition.
D - All employees with a satisfaction score of around 2.5 or less have left the company. One exception is the Finance Manager who had a satisfaction score of 4 and still left the company.
E - 25% of employees with a satisfaction score of 1 have left the company.
A - Only the Data Science department averages below 2 on employee relationship satisfaction, compared with all other departments.
## CatYearsManager Pro-Veteran 1-2 years Old 2-5 Years hired 5-10 Years Vet Recently Hired Newbie
## Gender
## Female 2.25 10.92 7.08 12.42 6.92
## Male 2.50 18.08 10.75 18.08 11.00
## Attrition No Yes
## Gender CatYearsManager
## Female Pro-Veteran 2.17 0.08
## 1-2 years Old 9.58 1.33
## 2-5 Years hired 6.58 0.50
## 5-10 Years Vet 10.92 1.50
## Recently Hired Newbie 5.08 1.83
## Male Pro-Veteran 2.42 0.08
## 1-2 years Old 15.25 2.83
## 2-5 Years hired 9.42 1.33
## 5-10 Years Vet 16.25 1.83
## Recently Hired Newbie 7.50 3.50
## EmpDepartment DS Development Finance HR R&D Sales
## Gender Attrition
## Female No 0.58 10.83 1.83 1.08 9.42 10.58
## Yes 0.08 1.00 0.00 0.33 1.33 2.50
## Male No 0.92 15.00 2.00 2.75 15.50 14.67
## Yes 0.08 3.25 0.25 0.33 2.33 3.33
B - The density plot compares the average satisfaction scores of employees who leave with those of employees who stay; the gap between the two peaks shows the difference in their average satisfaction scores.
A - Employee job satisfaction scores split 70:30 by overtime (No:Yes).
B - Employees with low job satisfaction scores tend to leave the company more than those with high job satisfaction.
C - The same top three departments have low employee job satisfaction scores.
D - Marital status seems independent of employee job satisfaction; it is roughly equally distributed.
WorkLifeBalance: from the data dictionary, 1 = ‘Bad’, 2 = ‘Good’, 3 = ‘Better’, 4 = ‘Best’.
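Applying the data dictionary is a simple lookup. A minimal Python sketch, using the EmpWorkLifeBalance values of the first five employees from the glimpse output at the top of this article; the names are hypothetical. An unknown code raises a `KeyError` rather than silently becoming missing, which is a deliberate choice so that out-of-range scores surface early.

```python
# Data dictionary mapping for EmpWorkLifeBalance, as listed above
WORK_LIFE_BALANCE = {1: "Bad", 2: "Good", 3: "Better", 4: "Best"}


def label_work_life_balance(scores):
    """Translate numeric EmpWorkLifeBalance scores into their dictionary labels."""
    return [WORK_LIFE_BALANCE[score] for score in scores]


# EmpWorkLifeBalance for the first five employees in the glimpse() output
print(label_work_life_balance([2, 3, 3, 2, 3]))
# ['Good', 'Better', 'Better', 'Good', 'Better']
```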
E - Many employees in the Millenials Gen Y.2 group live within a 10-mile radius, and 30% of employees live within a 2-mile radius.
F - Distance from home doesn't seem to be an attrition factor across the different generations.
L & M - One could draw many interpretations from which positions employees leave and how far from the workplace they live, and from how that trend looks for male versus female employees.
A - In each department, most employees travel rarely.
B - When we break it down by gender, we see the same proportion in all three groups.
C - Employees who travel frequently tend to leave the company more than those who travel rarely or not at all.
D - The Travel_Rarely group consists mostly of married employees and least of divorced employees; the same is true for Travel_Frequently.
Continued in Part_2.