My experience interning at Willis Towers Watson, an HR consulting company, drew my attention to the topic of Human Resources. You might think that this is just a personal interest. In fact, Human Resources is a topic that is closely related to everyone. People, whether they aim to have a better life or to fulfill personal goals, can't help but be involved in the labor market. Human Resources is a field that can gradually improve the efficiency of how people work, and thus create more value for society and push global development forward.

What do you think is the reason employees quit? Is it salary? How much can salary affect people's motivation and their career path in general?
Read this article from The Balance before we take a deeper look at the topic.

Objective of the Project

In this final project I would like to discuss the core of the sector, rewards and benefits. To keep it simple, I will mainly talk about salary management.


Details of the Datasets

I collected most of my data from Kaggle. Here are some details about the datasets:

  • Kaggle
    • Human Resources Core Data Set
      (This is the core HR data set.)
      • Year: 2017
      • Variables: Employee Name, Employee Number, State, Zip, DOB, Age, Sex, MaritalDesc, CitizenDesc, Hispanic/Latino, RaceDesc, Date of Hire, Date of Termination, Reason For Term, Employment Status, Department, Position, Pay Rate, Manager Name, Employee Source, Performance Score.
    • Human Resources Production Staff Data Set
      (This is the production staff data set, complete with information about productivity, performance score, etc. It would be interesting to see the relationships between productivity and their performance score. You would think higher productivity would indicate a better overall performance. Is this necessarily the case?)
      • Year: 2017
      • Variables: Employee Name, Race Desc, Date of Hire, TermDate, Reason for Term, Employment Status, Department, Position, Pay, Manager Name, Performance Score, Abutments/Hour Wk 1, Abutments/Hour Wk 2, Daily Error Rate, 90-day Complaints.
    • Human Resources Analytics
      (Why are our best and most experienced employees leaving prematurely?)
      • Year: 2015
      • Variables: Satisfaction Level, Last Evaluation, Number of Projects, Average Monthly Hours, Time Spent at the Company, Whether they have had a work accident, Whether they have had a promotion in the last 5 years, Department, Salary, Whether the employee has left.

Methodology of Data Analysis

I will use a lot of data visualization to illustrate and compare the different data sets. The goal is to find differences in employees' wages and see how they correlate with promotion, satisfaction and, most importantly, employees' termination decisions. On the technical side, I will use packages including ggplot2, tidyverse, dplyr and more to explore the data.
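
As a rough illustration of the kind of comparison described here, the sketch below computes the share of employees who left and the average satisfaction within each salary band. It is a minimal sketch under stated assumptions, not the project's final analysis: the rows are invented purely for demonstration, and the column names `salary`, `left` and `satisfaction_level` are assumptions about how the Human Resources Analytics file is laid out. It uses base R only so that it runs without installing anything; the same grouping could also be written with dplyr.

```r
# Toy rows invented for illustration only; they are NOT taken from the Kaggle data.
# Column names (salary, left, satisfaction_level) are assumed, not confirmed.
hr <- data.frame(
  salary = c("low", "low", "low", "medium", "medium", "high"),
  left = c(1, 1, 0, 1, 0, 0),  # 1 = employee left, 0 = employee stayed
  satisfaction_level = c(0.30, 0.45, 0.70, 0.40, 0.80, 0.90)
)

# Order the salary bands so summaries print as low, medium, high.
hr$salary <- factor(hr$salary, levels = c("low", "medium", "high"))

# Share of employees who left within each salary band.
attrition_by_salary <- tapply(hr$left, hr$salary, mean)

# Mean satisfaction within each salary band, a first look at a possible confounder.
satisfaction_by_salary <- tapply(hr$satisfaction_level, hr$salary, mean)

print(attrition_by_salary)
print(satisfaction_by_salary)
```

With the real data loaded in place of the toy rows, these two summaries could feed directly into a bar chart per salary band.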


Projections of the Project

I want to use the visualizations described above to compare all the data sets that I can find about employee information. I want to find the correlation between salary and termination decisions. More importantly, I want to look carefully for confounders among the variables in the datasets. Could it be that salary affects employees' satisfaction level, their chance of promotion, or even their production outcomes? After this research, I hope to settle on a hypothesis and return to the article we read at the beginning. I also want to use other data sets to underscore the extra cost of hiring new people to replace the old ones, and to stress the importance of reasonable salary management. Only with that can firms operate at their highest efficiency and help develop humanity and our home, Earth, in the best way.







Contact me via:

LinkedIn: Kathy Sun

Handshake: Kathy Sun

Instagram: Supermoooe

Twitter: BizAnalyticsKat