My internship at Willis Towers Watson, an HR consulting company, drew my attention to the topic of human resources. You might think this is only a personal interest. In fact, human resources is closely related to everyone: people who want a better life or want to fulfill personal goals cannot help but take part in the labor market. Human resources is a field that can gradually improve the efficiency of how people work, and thus create more value for society and help push global development forward.
What do you think is the reason employees quit? Is it salary? How much does salary affect people’s motivation and their career path in general?
Read this article from The Balance before we take a deeper look at the topic.
LinkedIn: Kathy Sun
Handshake: Kathy Sun
Instagram: Supermoooe
twitter: BizAnalyticsKat
In this final project I would like to discuss the core of the field, rewards and benefits. To keep things simple, I will mainly talk about salary management.
I collected most of my data from Kaggle. The following are some details about the datasets:
I will use a lot of data visualization to illustrate and compare the different datasets, looking for differences in employees’ wages and how they correlate with promotion, satisfaction and, most importantly, employees’ termination decisions. On the technical side, I will use packages including ggplot2, tidyverse, dplyr and more to explore the data.
I want to use the visualizations described above to compare all the datasets I can find about employee information, and to find the correlation between salary and termination decisions. More importantly, I want to look carefully for confounders among the variables in the datasets. Does salary affect employees’ satisfaction level, their chance of promotion, or even their production outcomes? After this research, I hope to settle on a hypothesis and return to the article we read at the beginning. I also want to use other datasets to highlight the extra cost of hiring new people to replace those who leave, and to stress the importance of reasonable salary management: only with it can firms reach their highest level of efficiency and contribute as much as possible to the development of humanity and of our home, the Earth.
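To make the idea of a confounder concrete, here is a minimal sketch on simulated data (not the Kaggle data). It assumes a hypothetical setup in which performance score drives both salary and the chance of termination, so a crude salary comparison can overstate the link between pay and leaving. The names sim, crude and stratified are made up for this illustration.

```r
library(dplyr)

set.seed(1)  # simulated data, for illustration only
n <- 200
sim <- data.frame(
  Score = sample(c("Low", "High"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
# Assumption: high performers earn more and are less likely to leave
sim$Salary <- ifelse(sim$Score == "High", 30, 20) + rnorm(n, sd = 3)
sim$Terminated <- rbinom(n, 1, ifelse(sim$Score == "High", 0.1, 0.4))

# Crude comparison: mean salary by termination status, ignoring Score
crude <- sim %>%
  group_by(Terminated) %>%
  summarise(mean_salary = mean(Salary))

# Stratified comparison: the same summary within each Score level
stratified <- sim %>%
  group_by(Score, Terminated) %>%
  summarise(mean_salary = mean(Salary), .groups = "drop")

print(crude)
print(stratified)
```

If the salary gap between the two groups shrinks once the data are split by Score, that suggests Score is doing part of the work that the crude comparison attributes to salary.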
Here is a list of the packages used, with a description of what each does:
tidyverse: for cleaning data and putting it into a tidy format
tidyr: comes with tidyverse; also for tidying data
dplyr: for transforming data
knitr: for displaying tables in the report
ggplot2: for data visualization
magrittr: provides the pipe operator, which avoids repeated code
DT: for interactive HTML table output
library(tidyverse)
library(tidyr)
library(dplyr)
library(knitr)
library(ggplot2)
library(magrittr)
library(DT)
The original dataset is called “Human Resources Data Set” and was collected by Dr. Rich, a principal data architect at New England Quality Care Alliance. It can be found on Kaggle.
Click here to download the original dataset.
1. HR Core Dataset: This dataset contains 21 variables on 300+ employees. The variables in the core HR dataset include:
library(DT)
v_core <- read_csv("C:/Users/WFU/Desktop/temporary/R/Final Project Prep/v_core.csv")
## cols(
## Variable = col_character(),
## Description = col_character()
## )
datatable(v_core)
2. Production Staff Dataset: This dataset only includes people working in the production department. It contains 15 variables on 200+ employees. The following is a list of the name and description of each variable used.
library(DT)
v_prod <- read_csv("C:/Users/WFU/Desktop/temporary/R/Final Project Prep/v_prod.csv")
datatable(v_prod)
These two datasets include some of the same employees but provide different variables, so I decided to clean them and merge them to give a more detailed and explicit overview of the employees’ behavior and status.
Here are the steps I took to prepare the data:
Step One: Clean the core and prod datasets and keep the necessary variables
# Read the raw files, treating empty strings as missing values
core <- read.csv("core_dataset.csv", na.strings = c("", "NA"))
prod <- read.csv("production_staff.csv", na.strings = c("", "NA"))
# Keep two columns of the salary grid and give them readable names
Salary_Scale <- read.csv("salary_grid.csv") %>%
  select(X, X.3) %>%
  rename(Position = X, Median_Salary = X.3)
# Drop the first row
Salary_Scale <- Salary_Scale[-1, ]
Step Two: Merge the two datasets on employee name and rename the redundant variables
# Sort by name and keep only the columns needed from each dataset
core_clean <- core %>%
  arrange(Employee.Name) %>%
  select(Employee.Name, Reason.For.Term, Employment.Status,
         Pay.Rate, Performance.Score, Position)
prod_clean <- prod %>%
  arrange(Employee.Name) %>%
  select(Employee.Name, Daily.Error.Rate, X90.day.Complaints)
Step Three: Split the currently active and the terminated employees into two subsets to compare
# Join the production variables onto the core data by employee name
join <- core_clean %>%
  left_join(prod_clean, by = "Employee.Name") %>%
  select(-Employee.Name) %>%
  rename(
    Status = Employment.Status,
    Salary = Pay.Rate,
    Score = Performance.Score,
    complaints = X90.day.Complaints
  )
# Label each employee as Active or Terminated
join_Active <- join %>%
  filter(Status == "Active") %>%
  mutate(category = "Active")
join_Terminated <- join %>%
  filter(Status %in% c("Voluntarily Terminated", "Terminated for Cause")) %>%
  mutate(category = "Terminated")
# Stack the two subsets into one data frame
Clean <- rbind(join_Active, join_Terminated)
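Because the production dataset covers only part of the staff, a left join leaves missing values for everyone else. The sketch below illustrates this on invented toy data; core_toy, prod_toy and the employee names are hypothetical stand-ins, not rows from the Kaggle files.

```r
library(dplyr)

# Toy stand-ins for core_clean and prod_clean (all values are invented)
core_toy <- data.frame(
  Employee.Name = c("Adams, Ann", "Baker, Bob", "Cruz, Cara"),
  Employment.Status = c("Active", "Voluntarily Terminated", "Active"),
  Pay.Rate = c(22.5, 18.0, 55.0),
  stringsAsFactors = FALSE
)
prod_toy <- data.frame(
  Employee.Name = c("Adams, Ann", "Baker, Bob"),
  Daily.Error.Rate = c(1, 3),
  stringsAsFactors = FALSE
)

# left_join keeps every row of core_toy; employees without a production
# record get NA in the production columns
joined_toy <- left_join(core_toy, prod_toy, by = "Employee.Name")

# anti_join lists the core employees that found no match
unmatched_toy <- anti_join(core_toy, prod_toy, by = "Employee.Name")

print(joined_toy)
print(unmatched_toy$Employee.Name)
```

Running the same anti_join on the real core_clean and prod_clean would show how many employees have no production variables, which is worth knowing before plotting Daily.Error.Rate or complaints.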
3.4 Data Preview
datatable(Clean)
Through observation, we can see that several reasons lead to employees’ terminations.
Those reasons include:
1. “another position”: the person may have been promoted or moved to a different department.
2. “attendance”: the employee’s attendance did not meet the company’s requirements.
3. “career change”: the employee changed the field they work in.
4. “gross misconduct”: misconduct that led to the employee’s termination.
5. “hours”: long working hours pushed the employee away.
6. “medical issue”
7. “military”
8. “more money”
9. “no-call no-show”
10. “performance”
11. “relocation out of area”
12. “retiring”
13. “returning to school”
14. “unhappy”
First, I want to take a look at how these reasons for termination are distributed.
Clean %>%
  filter(category == "Terminated") %>%
  ggplot(aes(x = Reason.For.Term, fill = Reason.For.Term)) +
  geom_bar() +
  coord_polar()
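Bar lengths in a polar chart can be hard to compare by eye, so it may help to tabulate the same counts. This is a small sketch on invented reasons; reasons_toy is hypothetical, and on the real data the same pipeline would start from the terminated rows of Clean.

```r
library(dplyr)

# Invented termination reasons, for illustration only
reasons_toy <- data.frame(
  Reason.For.Term = c("more money", "unhappy", "more money",
                      "hours", "more money", "unhappy"),
  stringsAsFactors = FALSE
)

# Count each reason, most frequent first, and add its share of the total
reason_counts <- reasons_toy %>%
  count(Reason.For.Term, sort = TRUE) %>%
  mutate(share = n / sum(n))

print(reason_counts)
```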
From the polar bar chart we can see that salary (“more money”) plays a major role in employees’ decisions to leave.
Thus, I decided to draw a boxplot of hourly salary for the people who are still employed and those who have been terminated, to capture the differences.
Clean %>%
  ggplot(aes(x = category, y = Salary, color = category)) +
  geom_boxplot(fill = "white")
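The boxplot can be backed up with the numbers it draws. The following is a sketch on invented salaries; salary_toy is hypothetical, and on the real data Clean would take its place.

```r
library(dplyr)

# Invented hourly salaries, for illustration only
salary_toy <- data.frame(
  category = c("Active", "Active", "Active",
               "Terminated", "Terminated", "Terminated"),
  Salary = c(20, 30, 55, 16, 22, 24),
  stringsAsFactors = FALSE
)

# Group size, median and interquartile range of salary per category
salary_summary <- salary_toy %>%
  group_by(category) %>%
  summarise(
    n = n(),
    median_salary = median(Salary, na.rm = TRUE),
    iqr_salary = IQR(Salary, na.rm = TRUE)
  )

print(salary_summary)
```

The median is the line inside each box and the IQR is the height of the box, so this table and the plot should tell the same story.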