Age Attrition BusinessTravel DailyRate
Min. :18.00 No :1233 Non-Travel : 150 Min. : 102.0
1st Qu.:30.00 Yes: 237 Travel_Frequently: 277 1st Qu.: 465.0
Median :36.00 Travel_Rarely :1043 Median : 802.0
Mean :36.92 Mean : 802.5
3rd Qu.:43.00 3rd Qu.:1157.0
Max. :60.00 Max. :1499.0
Department DistanceFromHome Education
Human Resources : 63 Min. : 1.000 Length:1470
Research & Development:961 1st Qu.: 2.000 Class :character
Sales :446 Median : 7.000 Mode :character
Mean : 9.193
3rd Qu.:14.000
Max. :29.000
EducationField EmployeeCount EmployeeNumber
Human Resources : 27 Min. :1 Min. : 1.0
Life Sciences :606 1st Qu.:1 1st Qu.: 491.2
Marketing :159 Median :1 Median :1020.5
Medical :464 Mean :1 Mean :1024.9
Other : 82 3rd Qu.:1 3rd Qu.:1555.8
Technical Degree:132 Max. :1 Max. :2068.0
EnvironmentSatisfaction Gender HourlyRate JobInvolvement
Length:1470 Female:588 Min. : 30.00 Length:1470
Class :character Male :882 1st Qu.: 48.00 Class :character
Mode :character Median : 66.00 Mode :character
Mean : 65.89
3rd Qu.: 83.75
Max. :100.00
JobLevel JobRole JobSatisfaction
Min. :1.000 Sales Executive :326 Length:1470
1st Qu.:1.000 Research Scientist :292 Class :character
Median :2.000 Laboratory Technician :259 Mode :character
Mean :2.064 Manufacturing Director :145
3rd Qu.:3.000 Healthcare Representative:131
Max. :5.000 Manager :102
(Other) :215
MaritalStatus MonthlyIncome MonthlyRate NumCompaniesWorked
Divorced:327 Min. : 1009 Min. : 2094 Min. :0.000
Married :673 1st Qu.: 2911 1st Qu.: 8047 1st Qu.:1.000
Single :470 Median : 4919 Median :14236 Median :2.000
Mean : 6503 Mean :14313 Mean :2.693
3rd Qu.: 8379 3rd Qu.:20462 3rd Qu.:4.000
Max. :19999 Max. :26999 Max. :9.000
Over18 OverTime PercentSalaryHike PerformanceRating
Y:1470 No :1054 Min. :11.00 Length:1470
Yes: 416 1st Qu.:12.00 Class :character
Median :14.00 Mode :character
Mean :15.21
3rd Qu.:18.00
Max. :25.00
RelationshipSatisfaction StandardHours StockOptionLevel TotalWorkingYears
Length:1470 Min. :80 Min. :0.0000 Min. : 0.00
Class :character 1st Qu.:80 1st Qu.:0.0000 1st Qu.: 6.00
Mode :character Median :80 Median :1.0000 Median :10.00
Mean :80 Mean :0.7939 Mean :11.28
3rd Qu.:80 3rd Qu.:1.0000 3rd Qu.:15.00
Max. :80 Max. :3.0000 Max. :40.00
TrainingTimesLastYear WorkLifeBalance YearsAtCompany
Min. :0.000 Length:1470 Min. : 0.000
1st Qu.:2.000 Class :character 1st Qu.: 3.000
Median :3.000 Mode :character Median : 5.000
Mean :2.799 Mean : 7.008
3rd Qu.:3.000 3rd Qu.: 9.000
Max. :6.000 Max. :40.000
YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
Min. : 0.000 Min. : 0.000 Min. : 0.000
1st Qu.: 2.000 1st Qu.: 0.000 1st Qu.: 2.000
Median : 3.000 Median : 1.000 Median : 3.000
Mean : 4.229 Mean : 2.188 Mean : 4.123
3rd Qu.: 7.000 3rd Qu.: 3.000 3rd Qu.: 7.000
Max. :18.000 Max. :15.000 Max. :17.000
| Status | Counts |
|---|---|
| Yes | 237 |
| No | 1233 |

| Department | Count |
|---|---|
| Human Resources | 63 |
| Research & Development | 961 |
| Sales | 446 |
---
title: "PallaviSaitu_Project"
author: "Pallavi Saitu"
date: "4/23/2019"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    source_code: embed
---
### The HR Dashboard {data-commentary-width=400}
```{r}
library(tidyverse)
library(knitr)
library(gridExtra)
library(ggpubr)
knitr::include_graphics("/Users/pallavisaitu/Downloads/HR.jpg")
```
***
Data description:

The data set is an HR Employee Performance data set. It includes 35 variables covering employees' demographics, their evaluations of the company, and the attrition outcome. It has 1,470 records.

Variables:

Age, Attrition, BusinessTravel, DailyRate, Department, DistanceFromHome, Education, EducationField, EmployeeCount, EmployeeNumber, EnvironmentSatisfaction, Gender, HourlyRate, JobInvolvement, JobLevel, JobRole, JobSatisfaction, MaritalStatus, MonthlyIncome, MonthlyRate, NumCompaniesWorked, Over18, OverTime, PercentSalaryHike, PerformanceRating, RelationshipSatisfaction, StandardHours, StockOptionLevel, TotalWorkingYears, TrainingTimesLastYear, WorkLifeBalance, YearsAtCompany, YearsInCurrentRole, YearsSinceLastPromotion, YearsWithCurrManager

The four major objectives are:

1. Provide summary statistics about employees
2. Understand how the company's employees feel about their work
3. Explore relationships between employee attributes and monthly income
4. Investigate possible factors that affect attrition
### Summary Statistics
```{r}
HR <- read.csv("/Users/pallavisaitu/Downloads/WA_Fn-UseC_-HR-Employee-Attrition.csv")
names(HR)[1] <- "Age" # Rename the first column to "Age" for consistency.
# Employees' education
HR$Education[HR$Education=="1"] <- "Below College"
HR$Education[HR$Education=="2"] <- "College"
HR$Education[HR$Education=="3"] <- "Bachelor"
HR$Education[HR$Education=="4"] <- "Master"
HR$Education[HR$Education=="5"] <- "Doctor"
# Employees' environment satisfaction
HR$EnvironmentSatisfaction[HR$EnvironmentSatisfaction=="1"] <- "Low"
HR$EnvironmentSatisfaction[HR$EnvironmentSatisfaction=="2"] <- "Medium"
HR$EnvironmentSatisfaction[HR$EnvironmentSatisfaction=="3"] <- "High"
HR$EnvironmentSatisfaction[HR$EnvironmentSatisfaction=="4"] <- "Very High"
# Employees' job involvement
HR$JobInvolvement[HR$JobInvolvement=="1"] <- "Low"
HR$JobInvolvement[HR$JobInvolvement=="2"] <- "Medium"
HR$JobInvolvement[HR$JobInvolvement=="3"] <- "High"
HR$JobInvolvement[HR$JobInvolvement=="4"] <- "Very High"
# Employees' job satisfaction
HR$JobSatisfaction[HR$JobSatisfaction=="1"] <- "Low"
HR$JobSatisfaction[HR$JobSatisfaction=="2"] <- "Medium"
HR$JobSatisfaction[HR$JobSatisfaction=="3"] <- "High"
HR$JobSatisfaction[HR$JobSatisfaction=="4"] <- "Very High"
# Employees' performance rating
HR$PerformanceRating[HR$PerformanceRating=="1"] <- "Low"
HR$PerformanceRating[HR$PerformanceRating=="2"] <- "Good"
HR$PerformanceRating[HR$PerformanceRating=="3"] <- "Excellent"
HR$PerformanceRating[HR$PerformanceRating=="4"] <- "Outstanding"
# Employees' relationship satisfaction
HR$RelationshipSatisfaction[HR$RelationshipSatisfaction=="1"] <- "Low"
HR$RelationshipSatisfaction[HR$RelationshipSatisfaction=="2"] <- "Medium"
HR$RelationshipSatisfaction[HR$RelationshipSatisfaction=="3"] <- "High"
HR$RelationshipSatisfaction[HR$RelationshipSatisfaction=="4"] <- "Very High"
# Employees' life balance
HR$WorkLifeBalance[HR$WorkLifeBalance=="1"] <- "Bad"
HR$WorkLifeBalance[HR$WorkLifeBalance=="2"] <- "Good"
HR$WorkLifeBalance[HR$WorkLifeBalance=="3"] <- "Better"
HR$WorkLifeBalance[HR$WorkLifeBalance=="4"] <- "Best"
summary(HR)
HR$Attrition <- factor(HR$Attrition,levels=c("Yes","No"))
attrition <- data.frame(table(HR$Attrition))
names(attrition)[1] <- "Status"
names(attrition)[2] <- "Counts"
kable(attrition)
department <-data.frame(table(HR$Department))
kable(department,col.names = c("Department","Count"))
```
***
In the original data set, the variables "Education", "EnvironmentSatisfaction", "JobInvolvement", "JobSatisfaction", "PerformanceRating", "RelationshipSatisfaction", and "WorkLifeBalance" are stored as integer codes on a 1-5 or 1-4 scale. To make the output and plots easier to read, these codes are mapped back to their category labels, following the explanations given by the data provider.
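The same mapping can also be written with `factor()`, which keeps the categories in their natural order for tables and plots. The block below is a minimal sketch on a toy vector, not the code used for this dashboard; `codes` and `satisfaction_labels` are hypothetical names introduced for illustration.

```r
# Toy vector of integer codes; the real column would be HR$JobSatisfaction.
codes <- c(1, 4, 2, 3, 3, 1)

# Hypothetical lookup reusing the four labels from the recoding chunk above.
satisfaction_labels <- c("Low", "Medium", "High", "Very High")

# factor() maps each code to its label and keeps the levels ordered.
satisfaction <- factor(codes, levels = 1:4,
                       labels = satisfaction_labels, ordered = TRUE)

print(satisfaction)
print(table(satisfaction))
```

Because the result is an ordered factor, comparisons such as `satisfaction[2] > satisfaction[1]` work, and plots list the levels from "Low" to "Very High" instead of alphabetically.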
To begin the analysis, it is useful to get some summary statistics about the 1,470 employees.
Of the 1,470 employee records, 237 belong to people who left the company, which is 16.1% of the total.
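The attrition share can be checked directly from the two counts in the attrition table. This is a small sketch that types the counts in by hand; `attrition_counts` and `attrition_share` are names introduced here for illustration.

```r
# Counts taken from the attrition table in this section.
attrition_counts <- c(Yes = 237, No = 1233)

# prop.table() divides each count by the total; convert to percent.
attrition_share <- round(100 * prop.table(attrition_counts), 1)

print(attrition_share)
```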
### Employees by Department
```{r}
# bins = 30 is the ggplot2 default, stated explicitly to avoid the stat_bin() message.
inc_1 <- ggplot(HR, aes(x = MonthlyIncome, fill = Attrition)) +
  geom_histogram(position = "dodge", bins = 30) + labs(x="Monthly Income", y="Number of employees")
inc_2 <- ggplot(HR, aes(x = HourlyRate, fill = Attrition)) +
  geom_histogram(position = "dodge", bins = 30) + labs(x="Hourly Rate", y="Number of employees")
inc_3 <- ggplot(HR, aes(x = DailyRate, fill = Attrition)) +
  geom_histogram(position = "dodge", bins = 30) + labs(x="Daily Rate", y="Number of employees")
grid.arrange(inc_1,inc_2,inc_3, ncol = 2, nrow = 2, top = "Income Distribution in company", bottom = "IBM HR Analytics")
```
***
Q1. Which departments do these employees come from?

961 employees are from the Research & Development department, which accounts for the majority of the data set. The other two departments in the data set are Sales and Human Resources.
### Employee Income Distribution
```{r}
ggplot(HR) +
geom_histogram(mapping=(aes(TotalWorkingYears)),fill="skyblue",col="white",binwidth = 1) +
labs(x="Total Working Years", y="Number of employees",caption="IBM HR Analytics", title="Total Working Years") + theme(legend.position="none")
ggplot(HR, aes(x= Department, y=TotalWorkingYears, group = Department, fill = Department)) +
geom_violin() + theme(legend.position="none") +
coord_flip() +
labs(x="Department",y="Total Working Years",caption="IBM HR Analytics", title="Total Working Years by Attrition") +
facet_wrap(~ Attrition)
ggplot(HR) +
geom_histogram(mapping=(aes(YearsAtCompany)),fill="skyblue",col="white",binwidth = 1) +
labs(x="Working Years at the company", y="Number of employees",caption="IBM HR Analytics", title="Working Years at Company") + theme(legend.position="none")
ggplot(HR, aes(x= Department, y=YearsAtCompany, group = Department, fill = Department)) +
geom_violin() + theme(legend.position="none") +
coord_flip() +
labs(x="Department",y="Working Years at the company",caption="IBM HR Analytics", title="Working Years at Company by Attrition") +
facet_wrap(~ Attrition)
```
***
Q2. How are employees' incomes distributed?

Since the data set does not explain how "rate" and "income" relate, the analysis assumes that "hourly rate" and "daily rate" are equal to "hourly income" and "daily income". Most employees who left had a relatively low monthly income. Monthly income is also positively skewed: most employees earn less than $10,000 per month.
The distributions of hourly rate and daily rate do not provide much insight. For employees who chose to leave, the hourly and daily rates are not noticeably different from those of employees who stayed.
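The positive skew is also visible in the summary statistics, where the mean monthly income (6503) is well above the median (4919). As a rough sketch of how the skew could be quantified, the block below defines a moment-based skewness helper and applies it to simulated data. `sample_skewness` and `toy_income` are hypothetical names, and the simulated incomes are not the HR data.

```r
# Moment-based sample skewness; values above 0 indicate a longer right tail.
sample_skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Simulated right-skewed incomes (log-normal); NOT the HR data.
set.seed(42)
toy_income <- rlnorm(1000, meanlog = 8.5, sdlog = 0.6)

print(sample_skewness(toy_income))
# A mean above the median is a quick sign of right skew.
print(mean(toy_income) > median(toy_income))
```

On the real data the helper would be called as `sample_skewness(HR$MonthlyIncome)`, assuming the column has no missing values.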
### Work Life Balance
```{r}
# Filter by department
worklife_sales <-data.frame(table(filter(HR,Department=="Sales")$WorkLifeBalance))
names(worklife_sales)[1] <- "Status"
names(worklife_sales)[2] <- "Counts"
worklife_RD <-data.frame(table(filter(HR,Department=="Research & Development")$WorkLifeBalance))
names(worklife_RD)[1] <- "Status"
names(worklife_RD)[2] <- "Counts"
worklife_HR <-data.frame(table(filter(HR,Department=="Human Resources")$WorkLifeBalance))
names(worklife_HR)[1] <- "Status"
names(worklife_HR)[2] <- "Counts"
# One pie chart per department; descriptive names avoid masking base::c().
pie_hr <- ggpie(worklife_HR,"Counts",fill="Status",color="white", label="Counts",lab.pos = "out",lab.font = "white") +
  ggtitle("Human Resources") + theme(legend.position = "right")
pie_rd <- ggpie(worklife_RD,"Counts",fill="Status",color="white", label="Counts",lab.pos = "out",lab.font = "white") +
  ggtitle("Research & Development") + theme(legend.position = "right")
pie_sales <- ggpie(worklife_sales,"Counts",fill="Status",color="white", label="Counts",lab.pos = "out",lab.font = "white") +
  ggtitle("Sales") + theme(legend.position = "right")
grid.arrange(pie_hr,pie_rd,pie_sales,ncol=2,nrow=2,newpage = FALSE)
```
***
Q3. How many years have employees worked?

The majority of employees have 0-12 years of total work experience. Looking at the time spent at this company, 0-10 years is the most common range in all three departments. Most employees who left did so within their first 0-5 years at the company, or with 0-10 years of total work experience. Few employees have been with the company for more than 20 years, whether they left or stayed.
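Ranges such as "0-5 years" can be counted explicitly by binning the tenure column with `cut()`. The sketch below uses a toy vector and hypothetical band edges chosen for illustration; neither comes from the analysis itself.

```r
# Toy tenure values in years; the real column would be HR$YearsAtCompany.
years_at_company <- c(0, 1, 3, 5, 5, 7, 10, 12, 21, 40)

# Hypothetical band edges; right = FALSE makes each band [lower, upper).
tenure_band <- cut(years_at_company,
                   breaks = c(0, 6, 11, 21, Inf),
                   labels = c("0-5", "6-10", "11-20", "21+"),
                   right = FALSE)

# Number of employees in each band.
print(table(tenure_band))
```

Tabulating the bands separately for leavers and stayers, for example with `table(tenure_band, HR$Attrition)` on the real data, would put numbers behind the pattern read off the plots.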
### Job Satisfaction
```{r}
HR$JobSatisfaction <- factor(HR$JobSatisfaction,
levels = c("Low", "Medium","High","Very High"))
ggplot(HR,aes(Age,MonthlyIncome)) + geom_point(aes(color=Department)) +
geom_smooth(col="black", se=FALSE,method="loess")+ facet_grid(.~Department) +
labs(title="Age and Monthly Income", x="Age", y="Monthly Income")
```
***
Q4. What are the job satisfaction levels?

Job satisfaction increases with the employee's age and experience. There is a decline in satisfaction after age 50 and above a monthly income of $14,000. Satisfaction levels are also mixed for employees in the Sales department.
### Possible Factors Affecting Attrition
```{r}
ggplot(data=HR)+
geom_bar(position="dodge",mapping=aes(JobSatisfaction,fill=Attrition)) + labs(title="Job Satisfaction and Attrition", x="Job Satisfaction", y="Number of employees", caption="IBM HR Analytics")
```
***
Q5. What are the possible factors that affect attrition?

Interestingly, the two largest groups of employees who left the company are those with "Low" and "High" job satisfaction. For employees who were highly satisfied with their job, it is especially important for HR to find out what caused them to leave.
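The bar chart compares raw counts, and the satisfaction groups are not the same size, so it can help to look at the attrition rate within each group as well. The sketch below uses a small made-up data frame (`toy`, not the HR data) to show how equal counts of leavers can correspond to different rates.

```r
# Toy records; NOT the HR data. Each row is one employee.
toy <- data.frame(
  JobSatisfaction = factor(rep(c("Low", "High"), times = c(4, 8)),
                           levels = c("Low", "High")),
  Attrition = factor(c("Yes", "Yes", "No", "No",
                       "Yes", "Yes", rep("No", 6)),
                     levels = c("Yes", "No"))
)

# Raw counts: both groups have the same number of leavers.
counts <- table(toy$JobSatisfaction, toy$Attrition)
print(counts)

# Row-wise proportions: the attrition rate within each satisfaction level.
rates <- prop.table(counts, margin = 1)
print(round(rates, 2))
```

On the real data the same two calls would be `table(HR$JobSatisfaction, HR$Attrition)` followed by `prop.table(..., margin = 1)`.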