Student Information

Name: Jing-Ting, Huang G Number: G01226692

Introduction

This semester we will be working with a data set from the field of Human Resources Analytics.

Broadly speaking, this field is concerned with using employee data within a company to optimize objectives such as employee satisfaction, productivity, project management, and most commonly, avoiding employee attrition.

Ideally, companies would like to keep attrition rates (the proportion of employees leaving a company for other opportunities) as low as possible due to the variable costs and business disruptions that come with having to replace productive employees on short notice.

The objective of this project is to perform an exploratory data analysis on the employee_data data set to uncover potential solutions for minimizing employee attrition rates.

Employee Attrition Data

The employee_data data frame is loaded below and consists of 1,470 employee records for a U.S. based product company. The rows in this data frame represent the attributes of an employee at this company across the variables listed in the table below.

Note: If you have not installed the tidyverse package, please do so by going to the Packages tab in the lower right section of RStudio, select the Install button and type tidyverse into the prompt.

library(tidyverse)

employee_data <- read_rds(url('https://gmubusinessanalytics.netlify.app/data/employee_data.rds'))

Raw Data

employee_data
# A tibble: 1,470 x 13
   left_company department       job_level   salary weekly_hours business_travel
   <fct>        <fct>            <fct>        <dbl>        <dbl> <fct>          
 1 Yes          Sales            Director    1.19e5           56 Rarely         
 2 No           Sales            Senior Man~ 8.56e4           42 Frequently     
 3 Yes          Product Develop~ Associate   4.62e4           56 Rarely         
 4 No           IT and Analytics Director    1.17e5           50 Frequently     
 5 No           Sales            Associate   3.66e4           46 Rarely         
 6 No           Marketing        Senior Man~ 8.35e4           48 Frequently     
 7 No           Marketing        Senior Man~ 8.86e4           44 Rarely         
 8 No           Sales            Director    1.22e5           47 Rarely         
 9 No           Finance and Ope~ Senior Man~ 9.46e4           50 Frequently     
10 No           Product Develop~ Director    1.25e5           51 Rarely         
# ... with 1,460 more rows, and 7 more variables: yrs_at_company <int>,
#   yrs_since_promotion <int>, previous_companies <dbl>,
#   job_satisfaction <fct>, performance_rating <fct>, marital_status <fct>,
#   miles_from_home <int>



Exploratory Data Analysis

Executives at this company have hired you as a data science consultant to identify the factors that lead to employees leaving their company.

They would like for you to explore why employees are leaving their company and make recommendations on how to minimize this behavior.

You must think of at least 8 relevant questions that explore the relationship between left_company and the other variables in the employee_data data frame.

The goal of your analysis should be discovering which variables drive the differences between employees who do and do not leave the company.

You must answer each question and provide supporting data summaries with either a summary data frame (using dplyr/tidyr) or a plot (using ggplot) or both.

In total, you must have a minimum of 5 plots and 4 summary data frames for the exploratory data analysis section. Among the plots you produce, you must have at least 4 different types (ex. box plot, bar chart, histogram, heat map, etc…)

Each question must be answered with supporting evidence from your tables and plots. See the example question below.

Sample Question

Is there a relationship between employees leaving the company and their current salary?

Answer: Yes, the data indicates that employees who leave the company tend to have lower salaries when compared to employees who do not. Among the 237 employees that left the company, the average salary was $76,625. This is over $20,000 less than the average salary of employees who did not leave the company.

Among the employees who did not leave the company, only 10% have a salary that is less than or equal to $60,000. Among employees who did leave the company, this increases to 34%.

Summary Table

employee_data %>% group_by(left_company) %>% 
                  summarise(n_employees = n(),
                            min_salary = min(salary),
                            avg_salary = mean(salary),
                            max_salary = max(salary),
                            sd_salary = sd(salary),
                            pct_less_60k = mean(salary <= 60000))
# A tibble: 2 x 7
  left_company n_employees min_salary avg_salary max_salary sd_salary
* <fct>              <int>      <dbl>      <dbl>      <dbl>     <dbl>
1 No                  1233     29849.     97431.    212135.    36470.
2 Yes                  237     30488.     76626.    211621.    38567.
# ... with 1 more variable: pct_less_60k <dbl>

Data Visualization

ggplot(data = employee_data, aes(x = salary, fill = left_company)) + 
   # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0)
   geom_histogram(aes(y = after_stat(density)), color = "white", bins = 20) +
   facet_wrap(~ left_company, nrow = 2) +
   labs(title = "Employee Salary Distribution by Status (Left the Company - Yes/No)",
           x = "Salary (US Dollars)", y = "Proportion of Employees")

Question 1

Question: Is there a relationship between employees leaving the company and their job level?

Answer: Yes, there is a relationship between employees leaving the company and their job level. The summary table shows that employees in lower-level positions are far more likely to leave: about 39% of associates (72 of 185) and 27% of managers (93 of 344) left the company. Employees in higher positions are much more likely to stay, with only about 6% of senior managers, 8% of directors, and 11% of vice presidents leaving. The gap between manager and senior manager is especially large.

Summary Table

To add additional R code chunks for your work, select Insert then R from the top of this notebook file.

Q1 <- employee_data %>% 
      group_by(left_company, job_level) %>% 
      summarise(total = n())
Q1
# A tibble: 10 x 3
# Groups:   left_company [2]
   left_company job_level      total
   <fct>        <fct>          <int>
 1 No           Associate        113
 2 No           Manager          251
 3 No           Senior Manager   447
 4 No           Director         303
 5 No           Vice President   119
 6 Yes          Associate         72
 7 Yes          Manager           93
 8 Yes          Senior Manager    29
 9 Yes          Director          28
10 Yes          Vice President    15

Data Visualization

# Q1 already holds one count per group, so geom_col() draws the bars directly
ggplot(data = Q1, aes(x = job_level, y = total, fill = left_company)) +  
    geom_col(color = "blue") +
    facet_wrap(~ left_company, nrow = 2) +
    labs(title = "Employee Count by Job Level (Left the Company - Yes/No)",
           x = "Job Level", y = "Number of Employees")
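
The raw counts in the Q1 table are easier to compare once they are converted to an attrition rate per job level. The sketch below is one possible way to do that. It rebuilds the counts from the table output above in a stand-alone data frame (the name job_level_counts is hypothetical, not part of the original notebook) so it can run without downloading the data; with the real data, Q1 could be piped in instead.

```r
library(dplyr)

# Counts copied from the Q1 summary table; job_level_counts is a hypothetical name
job_level_counts <- data.frame(
  left_company = rep(c("No", "Yes"), each = 5),
  job_level = rep(c("Associate", "Manager", "Senior Manager",
                    "Director", "Vice President"), times = 2),
  total = c(113, 251, 447, 303, 119, 72, 93, 29, 28, 15)
)

# Share of employees at each job level who left the company
attrition_by_level <- job_level_counts %>%
  group_by(job_level) %>%
  summarise(n_employees = sum(total),
            n_left = sum(total[left_company == "Yes"]),
            pct_left = n_left / n_employees) %>%
  arrange(desc(pct_left))

attrition_by_level
```

Sorting by pct_left puts the job levels with the highest attrition first, which makes the drop between Manager and Senior Manager easy to spot.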

Question 2

Question: Is there a relationship between employees leaving the company and their weekly hours?

Answer: Yes, there is a relationship between employees leaving the company and their weekly hours. The table shows a clear gap in average weekly hours between employees who left the company and those who stayed. Employees who stayed worked 10.24 fewer hours per week on average than employees who left. In addition, every employee who left worked at least 51 hours per week, while employees who stayed worked as few as 40.

Summary Table

Q2 <- employee_data %>% group_by(left_company) %>% 
                  summarise(n_employees = n(),
                            min_weekly_hours = min(weekly_hours),
                            avg_weekly_hours = mean(weekly_hours),
                            max_weekly_hours = max(weekly_hours),
                            sd_weekly_hours = sd(weekly_hours)) 


Q2
# A tibble: 2 x 6
  left_company n_employees min_weekly_hours avg_weekly_hours max_weekly_hours
* <fct>              <int>            <dbl>            <dbl>            <dbl>
1 No                  1233               40             48.4               58
2 Yes                  237               51             58.6               66
# ... with 1 more variable: sd_weekly_hours <dbl>

Data Visualization
   
ggplot(data = employee_data, aes(x = weekly_hours, fill = left_company)) +
  geom_histogram(color = "blue", bins = 30) +
  facet_wrap(~ left_company, nrow = 2) +
   labs(title = "Employee Weekly Hours Distribution by Status (Left the Company - Yes/No)",
           x = "Weekly Hours", y = "Number of Employees")
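
The gap in average weekly hours quoted in the answer can be computed directly rather than read off the rounded table. The following is a minimal sketch on a small made-up data frame (toy_hours and its values are hypothetical and only share column names with employee_data); replacing toy_hours with employee_data should give the figure for the real data.

```r
library(dplyr)

# Hypothetical toy data with the same column names as employee_data
toy_hours <- data.frame(
  left_company = c("No", "No", "No", "Yes", "Yes", "Yes"),
  weekly_hours = c(40, 48, 56, 52, 58, 64)
)

# Average weekly hours per group, then the difference (left minus stayed)
hours_gap <- toy_hours %>%
  group_by(left_company) %>%
  summarise(avg_weekly_hours = mean(weekly_hours)) %>%
  summarise(gap = avg_weekly_hours[left_company == "Yes"] -
                  avg_weekly_hours[left_company == "No"])

hours_gap
```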

Question 3

Question: Is there a relationship between employees leaving the company and the distance between their home and the company (miles_from_home)? Does it differ by department?

Answer: Commuting distance is related to the attrition rate in most departments, but not strongly. Employees in Sales and Product Development are affected the most by the distance between the company and home: in both departments, the third quartile of miles_from_home is clearly higher for employees who left. For Research employees, the distances of those who left are more concentrated, and they are noticeably longer than the distances of Research employees who stayed, which suggests that distance contributes to their decision to leave.

Summary Table

Q3 <- employee_data %>% 
      group_by(left_company, department) %>% 
      summarise(quantile1_miles_from_home = quantile(miles_from_home, 1 / 4),
                mean_miles_from_home = mean(miles_from_home),
                quantile3_miles_from_home = quantile(miles_from_home, 3 / 4)) %>%
      arrange(department, desc(left_company))
Q3

Data Visualization

# bins is not a geom_boxplot() parameter, so it is dropped here
ggplot(data = employee_data, aes(x = miles_from_home, fill = left_company)) +
  geom_boxplot(aes(y = department), color = "blue") +
   labs(title = "Employee Distance from Home by Department (Left the Company - Yes/No)",
           x = "Distance from Home (miles)", y = "Department") +
            theme(plot.title = element_text(size=9))

Question 4

Question: Are marital status and salary factors in employees leaving the company?

Answer: From the graph shown below, marital status alone does not appear to drive resignations, but salary clearly matters. For single and married employees, the salary distributions of those who left are similar, and both sit below the salaries of those who stayed. However, divorced employees may still leave the company even when they have a higher salary. This suggests that, for them, salary is not the main reason for the decision to leave.

Q4 <- employee_data %>% 
      group_by(left_company, marital_status) %>% 
      summarise( quantile1_salary = quantile(salary, 1 / 4),
                 mean_salary = mean(salary),
                 quantile3_salary = quantile(salary, 3 / 4)) %>%
                      arrange(marital_status,desc(left_company))

# bins is not a geom_boxplot() parameter, so it is dropped here
ggplot(data = employee_data, aes(x = salary, fill = left_company)) +
  geom_boxplot(aes(y = marital_status), color = "blue") +
   labs(title = "Employee Salary by Marital Status (Left the Company - Yes/No)",
           x = "Salary (US Dollars)", y = "Marital Status") +
            theme(plot.title = element_text(size=9))

Question 5

Question: Is there a relationship between department and salary? Does this relationship affect the employee’s willingness to leave their company?

Answer: Salary is related to the willingness to leave: the higher the salary, the lower the willingness to leave. However, departments differ in how sensitive they are to salary. Most Research employees do not leave because of their salary level, especially at higher salaries. In contrast, Sales employees attach great importance to salary: compared with other departments, low-salary Sales employees have the highest turnover.

Q5 <- employee_data %>% 
      group_by(left_company, department) %>% 
      summarise( quantile1_salary = quantile(salary, 1 / 4),
                 mean_salary = mean(salary),
                 quantile3_salary = quantile(salary, 3 / 4)) %>%
                      arrange(department,desc(left_company))

# bins is not a geom_point() parameter, so it is dropped here
ggplot(data = employee_data, aes(x = department, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "Employee Salary by Department (Left the Company - Yes/No)",
           x = "Department", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )
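
A jittered scatter plot shows the pattern visually, but the difference in salary sensitivity between departments can also be put into numbers by computing the attrition rate within salary bands. The sketch below is an illustration only: toy_employees and its values are made up, and the $60,000 cut-off is borrowed from the sample question. With the real data, employee_data would take the place of toy_employees.

```r
library(dplyr)

# Hypothetical toy data with the same column names as employee_data
toy_employees <- data.frame(
  department = rep(c("Sales", "Research"), each = 4),
  salary = c(35000, 45000, 90000, 110000, 40000, 50000, 95000, 120000),
  left_company = c("Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No")
)

# Attrition rate per department within each salary band
attrition_by_band <- toy_employees %>%
  mutate(salary_band = if_else(salary <= 60000, "60k or less", "Above 60k")) %>%
  group_by(department, salary_band) %>%
  summarise(n_employees = n(),
            pct_left = mean(left_company == "Yes"),
            .groups = "drop")

attrition_by_band
```

Comparing pct_left between the two bands within each department gives a rough measure of how strongly salary is associated with leaving in that department.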

Question 6

Question: Is there a relationship between job_satisfaction and salary? Does the relationship affect the employee’s willingness to leave their company?

Answer: This chart shows that employees who are satisfied with their work are less willing to leave, and that the higher the salary and the satisfaction, the less willing employees are to leave. Interestingly, not all employees in low-paying, unsatisfying jobs want to change jobs; most of them still stay. Finding these employees and understanding why they are unwilling to leave could provide valuable information to the HR department.

# bins is not a geom_point() parameter, so it is dropped here
ggplot(data = employee_data, aes(x = job_satisfaction, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "Employee Salary by Job Satisfaction (Left the Company - Yes/No)",
           x = "Job Satisfaction", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )

Question 7

Question: Is there a relationship between yrs_at_company and salary? Does the relationship affect the employee’s willingness to leave their company?

Answer: This chart shows no direct relationship between salary and years at the company, perhaps because of differences between occupations. However, in the lower-left corner of the chart, employees with low salaries and few years at the company have a higher chance of leaving. As salary increases, even newcomers become more likely to stay. Therefore, I think the HR department should identify talented newcomers early and raise their salaries as soon as possible, especially when there are many new hires.

# bins is not a geom_point() parameter, so it is dropped here
ggplot(data = employee_data, aes(x = yrs_at_company, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "Employee Salary by Years at Company (Left the Company - Yes/No)",
           x = "Years at company", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )

Question 8

Question: The previous question suggested that years at the company are not directly related to turnover. In which departments are salary and yrs_at_company related to employees’ willingness to leave?

Answer: For Sales and Marketing employees, willingness to quit changes with salary and years at the company: the higher the salary and the longer the tenure, the lower their willingness to quit. Research and IT employees show the opposite pattern: they seldom leave, regardless of tenure and salary. I think Research and IT employees tend to be more stable, so they are less willing to take risks to find new jobs. For Product Development employees, willingness to quit has no clear relationship with salary or years at the company; they tend to quit regardless of tenure and salary. The turnover rate of Finance and Operations is higher than that of other departments, but willingness to leave drops sharply after ten years of work. Therefore, if Finance and Operations talent can be retained for the first 10 years, there is a high chance that they will continue to work for the company.

# Inside filter(), columns are referenced directly; bins is not a geom_point() parameter
employee_data_Sales <- employee_data %>% filter(department == 'Sales')

ggplot(data = employee_data_Sales, aes(x = yrs_at_company, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "Sales: Salary by Years at Company (Left the Company - Yes/No)",
           x = "Years at company", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )

employee_data_Researchs <- employee_data %>% filter(department == 'Research')

ggplot(data = employee_data_Researchs, aes(x = yrs_at_company, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "Research: Salary by Years at Company (Left the Company - Yes/No)",
           x = "Years at company", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )

employee_data_Marketing <- employee_data %>% filter(department == 'Marketing')

ggplot(data = employee_data_Marketing, aes(x = yrs_at_company, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "Marketing: Salary by Years at Company (Left the Company - Yes/No)",
           x = "Years at company", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )

employee_data_Product_Development <- employee_data %>% filter(department == 'Product Development')

ggplot(data = employee_data_Product_Development, aes(x = yrs_at_company, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "Product Development: Salary by Years at Company (Left the Company - Yes/No)",
           x = "Years at company", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )

employee_data_IT_and_Analytics <- employee_data %>% filter(department == 'IT and Analytics')

ggplot(data = employee_data_IT_and_Analytics, aes(x = yrs_at_company, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "IT and Analytics: Salary by Years at Company (Left the Company - Yes/No)",
           x = "Years at company", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )

employee_data_Finance_and_Operations <- employee_data %>% filter(department == 'Finance and Operations')

ggplot(data = employee_data_Finance_and_Operations, aes(x = yrs_at_company, y = salary, color = left_company)) +
  geom_point(position = position_jitterdodge(), alpha = .6) +
   labs(title = "Finance and Operations: Salary by Years at Company (Left the Company - Yes/No)",
           x = "Years at company", y = "Salary") +
            theme(plot.title = element_text(size=9),
                   axis.text.x = element_text(size=6) )
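
The six per-department plots repeat the same code with a different filter each time. One possible alternative, sketched below, is a single plot with one panel per department via facet_wrap(), which also puts all departments on shared axes for easier comparison. The toy_employees data frame and its values are hypothetical and only mimic the column names of employee_data; with the real data, employee_data would be passed to ggplot() instead.

```r
library(ggplot2)

# Hypothetical toy data with the same column names as employee_data
toy_employees <- data.frame(
  department = rep(c("Sales", "Research", "Marketing"), each = 4),
  yrs_at_company = c(1, 3, 8, 12, 2, 5, 9, 15, 1, 4, 7, 11),
  salary = c(40000, 55000, 90000, 120000, 60000, 75000,
             95000, 130000, 38000, 52000, 88000, 110000),
  left_company = rep(c("Yes", "Yes", "No", "No"), times = 3)
)

# One panel per department instead of one filtered plot per department
faceted_plot <- ggplot(toy_employees,
                       aes(x = yrs_at_company, y = salary, color = left_company)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ department) +
  labs(title = "Salary by Years at Company, by Department (Left the Company - Yes/No)",
       x = "Years at company", y = "Salary")

faceted_plot
```

Separate filtered plots remain useful when one department needs its own axis range or annotations; the faceted version is mainly a way to cut repetition.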

Summary of Results

Write an executive summary of your overall findings and recommendations to the executives at this company. Think of this section as your closing remarks of a presentation, where you summarize your key findings and make recommendations to improve HR processes at the company.

Your executive summary must be written in a professional tone, with minimal grammatical errors, and should include the following sections:

  1. An introduction where you explain the business problem and goals of your data analysis

    • What problem(s) is this company trying to solve? Why are they important to their future success?

    • What was the goal of your analysis? What questions were you trying to answer and why do they matter?

  2. Highlights and key findings from your Exploratory Data Analysis section

    • What were the interesting findings from your analysis and why are they important for the business?

    • This section is meant to establish the need for your recommendations in the following section

  3. Your recommendations to the company on how to reduce employee attrition rates

    • Each recommendation must be supported by your data analysis results

    • You must clearly explain why you are making each recommendation and which results from your data analysis support this recommendation

    • You must also describe the potential business impact of your recommendation:

      • Why is this a good recommendation?

      • What benefits will the business achieve?

Executive Summary

Please write your executive summary below. If you prefer, you can type your summary in a text editor, such as Microsoft Word, and paste your final text here.

  1. An introduction where you explain the business problem and goals of your data analysis

Many companies face talent retention problems and spend considerable money and time on frequent recruiting. Solving the employee turnover problem can reduce training and recruitment costs. Therefore, my goal is to examine employee attrition from several angles in order to pinpoint why employees leave and how the company can respond.

  2. Highlights and key findings from your Exploratory Data Analysis section

During the exploratory data analysis, I found that examining one factor at a time is not enough; the data must be observed across several factors at the same time. For instance, raising salaries across the board after finding a relationship between salary and intention to leave would be costly, and it would not persuade every employee who wants to leave to stay. I also discovered several unexpected patterns during the investigation. For instance, divorced employees who want to quit do not appear to be affected by salary. These outcomes cannot be found if we use only one factor to predict whether an employee will leave.

  3. Your recommendations to the company on how to reduce employee attrition rates

(1). Different job levels have different willingness to quit, with associates and managers leaving far more often than senior employees. Years at the company show a similar pattern: the longer an employee has been at the company, the lower the willingness to quit, regardless of occupation. After 10 years of work, almost no employee wants to leave.

(2). Working hours per week is an important factor affecting employee turnover. According to the data set, the average weekly working hours of employees who stayed is 48.366 hours, about 10 hours less than that of employees who left. Thus, I suggest the company keep weekly working hours below 51 hours, the minimum weekly hours recorded among employees who left the company.

(3). When hiring, the recruiting area should be tailored to each department. Distance from home appears to matter most for Sales and Product Development employees, so candidates for these departments should preferably live closer to the company.

(4). When employees want to leave, we should adjust our retention strategy to the type of employee.

(a). All employees become less willing to leave after 10 years of work, so the first 10 years are the key period for retention.

(b). When Marketing and Sales staff want to leave, raising their salary is the most direct and effective response.

(c). Research and IT employees have a low willingness to quit. They are not strongly affected by salary, but their willingness to leave decreases over time, so the sooner they settle in, the better for the company.

(d). Finance and Operations and Product Development employees do not become less willing to leave before they have worked for ten years, and they are not strongly affected by time or money. Thus, I suggest creating a working environment that builds employee satisfaction.