Advanced Data Analysis 1

Author

Shane Penrose

library(tidyverse)
library(knitr)
employee <- read_csv("employee.csv")

Introduction

In recent times, employee attrition has consistently presented challenges for us as a company. As discussed in our last meeting, I was given the task of analyzing our in-house data set on employee attrition. Over the past three weeks, I’ve spent a considerable amount of time searching for the reasons why our past employees potentially left our company. I believe the following insights will give us the tools to put a plan in place and lower our attrition rates:

  • Relationship Between Satisfaction Levels and Attrition.
  • Impact of Career Growth and Tenure on Attrition.
  • Salary and Promotion’s Role in Retention.
  • Job Role and Departmental Differences in Attrition Rate.
  • Effect of Commute Distance and Business Travel on Attrition.
  • Impact of Job Involvement and Performance Ratings on Attrition.
  • Effects of Training Opportunities on Employee Retention.
  • Attrition by Education and Field of Study.

Relationship Between Satisfaction Levels and Attrition.

employee_long <- employee %>%
  pivot_longer(
    cols = c(JobSatisfaction, EnvironmentSatisfaction, WorkLifeBalance),
    names_to = "SatisfactionType",
    values_to = "SatisfactionLevel"
  )

combined_plot <- employee_long %>%
  ggplot(aes(x = Attrition, y = as.numeric(SatisfactionLevel), fill = Attrition)) +
  geom_boxplot() +
  facet_wrap(~ SatisfactionType, scales = "free_y") +
  labs(
    title = "Relationship Between Satisfaction Levels and Attrition",
    x = "Attrition",
    y = "Satisfaction Level"
  ) +
  theme_light()


print(combined_plot)

Explanation

When analyzing the relationship between attrition and satisfaction levels, we chose to refer to three elements of satisfaction: “environment satisfaction”, “job satisfaction” and “work-life balance”.

As we can see from our combined box plots, the main culprits for attrition fall under our employees’ work environment and their job satisfaction. This potentially means our employees don’t feel comfortable in the environment we have created, whether that stems from management, from some of the staff we currently have working for us, or from the physical environment of the offices.

Attrition related to job satisfaction could mean that the workload is far too stressful or boring. It could also indicate a lack of interest in the line of work. From the box plot, we can see that the satisfaction rating wasn’t given a 4. This indicates that even employees still working for the company believe the workload is only mediocre.

If we put both of these metrics beside each other, we can potentially draw the insight that attrition is caused by an undesirable workload being given to employees who don’t feel comfortable in the office.
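Because the satisfaction scores appear to be ordinal ratings, a table of attrition rates at each rating can complement the box plots. The sketch below is one way to do that; the helper name `attrition_by_level()` and the small `toy` data frame are illustrative assumptions, not part of the analysis, and in practice the helper would be applied to `employee`.

```r
library(tidyverse)

# Hypothetical helper: attrition rate (%) at each level of each satisfaction column.
attrition_by_level <- function(data) {
  data %>%
    pivot_longer(
      cols = c(JobSatisfaction, EnvironmentSatisfaction, WorkLifeBalance),
      names_to = "SatisfactionType",
      values_to = "SatisfactionLevel"
    ) %>%
    group_by(SatisfactionType, SatisfactionLevel) %>%
    summarize(
      Employees = n(),
      AttritionRate = mean(Attrition == "Yes") * 100,
      .groups = "drop"
    )
}

# Made-up rows purely to show the output shape; replace `toy` with `employee`.
toy <- tibble(
  Attrition = c("Yes", "No", "No", "Yes"),
  JobSatisfaction = c(1, 1, 4, 4),
  EnvironmentSatisfaction = c(1, 2, 2, 3),
  WorkLifeBalance = c(3, 3, 3, 3)
)

satisfaction_rates <- attrition_by_level(toy)
print(satisfaction_rates)
```

Reading the rates row by row makes it easier to see whether attrition actually falls as each satisfaction rating rises.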

Impact of Career Growth and Tenure on Attrition

employee_long <- employee %>%
  pivot_longer(
    cols = c(YearsAtCompany, YearsSinceLastPromotion, YearsInCurrentRole),
    names_to = "CareerMetric",
    values_to = "Years"
  )

combined_plot <- employee_long %>%
  ggplot(aes(x = Attrition, y = Years, fill = Attrition)) +
  geom_boxplot() +
  facet_wrap(~ CareerMetric, scales = "free_y") +
  labs(
    title = "Impact of Career Growth and Tenure on Attrition",
    x = "Attrition",
    y = "Years"
  ) +
  theme_light()

print(combined_plot)

Explanation

Career growth is something employees need in order to feel valued within a company. Again, we have used a combined box plot to show these outputs next to each other. The metrics we decided to use are “years at the company”, “years in their current role”, and “years since their last promotion”.

The first metric we can see is that employees typically leave after 2-3 years with PRL.

The second metric is that employees typically leave PRL after 2 years when they are continuously working in the same role.

Lastly, we see that employees typically leave us if they don’t get a promotion before the two-year mark.

This means we have to start looking at ways of offering incentives to our employees before the two-year mark arrives in order to decrease our staff turnover. These could include better training for our staff, and giving more responsibility to our employees to lift their morale in PRL. Doing this should allow us to give our newer employees a path along which they can grow, and some element of control or responsibility within our company.
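The year thresholds above are read from the box plots. As a rough cross-check, the medians could be computed directly, as in the sketch below. The helper name `tenure_medians()` and the `toy` rows are illustrative assumptions; the column names are the ones already used in the plot code.

```r
library(tidyverse)

# Hypothetical helper: median of each tenure metric, split by attrition status.
tenure_medians <- function(data) {
  data %>%
    group_by(Attrition) %>%
    summarize(
      across(
        c(YearsAtCompany, YearsInCurrentRole, YearsSinceLastPromotion),
        ~ median(.x, na.rm = TRUE),
        .names = "median_{.col}"
      ),
      .groups = "drop"
    )
}

# Made-up rows purely to show the output shape; replace `toy` with `employee`.
toy <- tibble(
  Attrition = c("Yes", "Yes", "No", "No"),
  YearsAtCompany = c(1, 3, 6, 10),
  YearsInCurrentRole = c(1, 2, 4, 7),
  YearsSinceLastPromotion = c(0, 2, 1, 3)
)

medians <- tenure_medians(toy)
print(medians)
```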

Salary and Promotion’s Role in Retention.

promotion_salary_summary <- employee %>%
  group_by(Attrition) %>%
  summarize(
    avg_salary_hike = mean(PercentSalaryHike, na.rm = TRUE),
    avg_years_since_promotion = mean(YearsSinceLastPromotion, na.rm = TRUE)
  ) %>%
  pivot_longer(
    cols = c(avg_salary_hike, avg_years_since_promotion),
    names_to = "Metric",
    values_to = "AverageValue"
  )

promotion_salary_plot <- promotion_salary_summary %>%
  ggplot(aes(x = Attrition, y = AverageValue, fill = Attrition)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ Metric, scales = "free_y") +
  labs(
    title = "Impact of Salary and Promotion on Employee Retention",
    x = "Attrition",
    y = "Average Value"
  ) +
  theme_light()

print(promotion_salary_plot)

Explanation

This bar chart outlines how salary increases and the timing of promotions within PRL impact attrition and staff retention.

Employees that left us appear to have received lower average salary increases, while also having a longer time between promotions.

Yet again, this could potentially indicate a lack of career growth for our staff inside and outside the workplace. With the cost of living continuously increasing, our staff members need greater wages, but this only comes with increasing their value to the company.

This reinforces that we need to work harder as a company to create a pathway for our staff members to grow and take on more responsibility within the workplace.

Job Role and Departmental Differences in Attrition Rate.

attrition_table <- employee %>%
  group_by(Department, JobRole) %>%
  summarize(
    TotalEmployees = n(),
    AttritionCount = sum(Attrition == "Yes"),
    AttritionRate = (AttritionCount / TotalEmployees) * 100,
    .groups = "drop"  # drop grouping explicitly so no grouping message is printed
  ) %>%
  arrange(desc(AttritionRate))
attrition_table %>%
  kable(
    col.names = c("Department", "Job Role", "Total Employees", "Attrition Count", "Attrition Rate (%)"),
    caption = "Attrition Rate by Department and Job Role",
    digits = 2
  )
Attrition Rate by Department and Job Role

| Department             | Job Role                  | Total Employees | Attrition Count | Attrition Rate (%) |
|:-----------------------|:--------------------------|----------------:|----------------:|-------------------:|
| Sales                  | Sales Representative      |              83 |              33 |              39.76 |
| Research & Development | Laboratory Technician     |             259 |              62 |              23.94 |
| Human Resources        | Human Resources           |              52 |              12 |              23.08 |
| Sales                  | Sales Executive           |             326 |              57 |              17.48 |
| Research & Development | Research Scientist        |             292 |              47 |              16.10 |
| Research & Development | Manufacturing Director    |             145 |              10 |               6.90 |
| Research & Development | Healthcare Representative |             131 |               9 |               6.87 |
| Research & Development | Manager                   |              54 |               3 |               5.56 |
| Sales                  | Manager                   |              37 |               2 |               5.41 |
| Research & Development | Research Director         |              80 |               2 |               2.50 |
| Human Resources        | Manager                   |              11 |               0 |               0.00 |

Explanation

While it’s hard to estimate what a good level of attrition is, it’s said that a company should be aiming for around 10% (Long, 2024).

From the table we can see that our attrition rate exceeds this in nearly half of the roles in PRL (5 of 11). An interesting point is that the department doesn’t really affect a person’s likelihood of leaving; the employee’s job role within the department matters more.

For example, within the sales department, almost 40% of sales representatives left the company, compared with about 17% of sales executives and about 5% of sales managers.

This pattern holds across all departments and roles, which fundamentally makes sense as there are fewer employees in these higher positions, but the levels of attrition for the entry-level job roles are disproportionate and present a big issue for us.

Yet again, we’re seeing the trend that newer employees joining our team aren’t satisfied with either the lack of promotions and career advancement in PRL, or the conditions of these entry-level roles.
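The comparison against the 10% benchmark can be reproduced from the counts in the table. The sketch below re-enters those counts by hand and computes the company-wide rate and the number of roles above the benchmark; the object names (`role_counts`, `benchmark`, `overall_rate`, `roles_above`) are illustrative, and in the report itself `attrition_table` could be used instead.

```r
library(tidyverse)

# Counts copied from the "Attrition Rate by Department and Job Role" table.
role_counts <- tribble(
  ~Department,              ~JobRole,                    ~TotalEmployees, ~AttritionCount,
  "Sales",                  "Sales Representative",       83,              33,
  "Research & Development", "Laboratory Technician",      259,             62,
  "Human Resources",        "Human Resources",            52,              12,
  "Sales",                  "Sales Executive",            326,             57,
  "Research & Development", "Research Scientist",         292,             47,
  "Research & Development", "Manufacturing Director",     145,             10,
  "Research & Development", "Healthcare Representative",  131,             9,
  "Research & Development", "Manager",                    54,              3,
  "Sales",                  "Manager",                    37,              2,
  "Research & Development", "Research Director",          80,              2,
  "Human Resources",        "Manager",                    11,              0
)

benchmark <- 10  # approximate target attrition rate (%) cited above

role_rates <- role_counts %>%
  mutate(
    AttritionRate = AttritionCount / TotalEmployees * 100,
    AboveBenchmark = AttritionRate > benchmark
  )

# Company-wide rate: 237 leavers out of 1470 employees, roughly 16.12%.
overall_rate <- sum(role_counts$AttritionCount) / sum(role_counts$TotalEmployees) * 100

# Number of roles whose rate exceeds the benchmark (5 of the 11 roles).
roles_above <- sum(role_rates$AboveBenchmark)

print(round(overall_rate, 2))
print(roles_above)
```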

Effect of Commute Distance and Business Travel on Attrition.

commute_travel_summary <- employee %>%
  group_by(Attrition, BusinessTravel) %>%
  summarize(
    avg_commute_distance = mean(DistanceFromHome, na.rm = TRUE),
    count = n(),
    .groups = "drop"
  ) %>%
  # Regroup by travel frequency so each percentage is a share of that travel
  # category; the rows with Attrition == "Yes" then hold the attrition rate.
  group_by(BusinessTravel) %>%
  mutate(
    attrition_rate = (count / sum(count)) * 100
  ) %>%
  ungroup()
commute_travel_summary %>%
  ggplot(aes(x = BusinessTravel, y = avg_commute_distance, fill = Attrition)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Effect of Commute Distance and Business Travel on Attrition",
    x = "Business Travel Frequency",
    y = "Average Commute Distance (km)"
  ) +
  theme_minimal()

Explanation

This chart displays commute distance and business travel frequency simultaneously. The findings from this comparison are the following.

Employees that commute longer distances (around 15km or more) are more likely to leave.

Employees that don’t travel for business are more likely to leave us.

Employees that travel for business to some degree have lower levels of attrition.

The difference in attrition between employees that travel often and those that travel rarely is minimal.

Knowing this gives us the capacity to make an informed decision about how we can make commuting easier and make business travel available to more employees. Moving from a traditional office to a hybrid office is something that would limit the amount of time our employees spend commuting to work.

Delegating business travel to a greater mix of employees is potentially a good incentive we could offer our employees in order to reduce attrition levels.
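The chart plots average commute distance, so attrition rates per travel category are easier to compare in a small table. The sketch below shows one way to compute them; the helper name `travel_attrition()`, the `toy` rows, and the travel labels used in `toy` are illustrative assumptions rather than values taken from the data set.

```r
library(tidyverse)

# Hypothetical helper: attrition rate (%) and average commute per travel category.
travel_attrition <- function(data) {
  data %>%
    group_by(BusinessTravel) %>%
    summarize(
      Employees = n(),
      AttritionRate = mean(Attrition == "Yes") * 100,
      AvgCommute = mean(DistanceFromHome, na.rm = TRUE),
      .groups = "drop"
    )
}

# Made-up rows purely to show the output shape; replace `toy` with `employee`.
toy <- tibble(
  Attrition = c("Yes", "No", "No", "Yes", "No", "No"),
  BusinessTravel = c(
    "Non-Travel", "Non-Travel",
    "Travel_Rarely", "Travel_Rarely", "Travel_Rarely",
    "Travel_Frequently"
  ),
  DistanceFromHome = c(20, 4, 6, 16, 2, 10)
)

travel_rates <- travel_attrition(toy)
print(travel_rates)
```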

Impact of Job Involvement and Performance Ratings on Attrition.

heatmap_data <- employee %>%
  group_by(JobInvolvement, PerformanceRating, Attrition) %>%
  summarize(
    count = n(),
    .groups = "drop"
  ) %>%
  # Share of leavers within each involvement/performance combination
  group_by(JobInvolvement, PerformanceRating) %>%
  mutate(
    total = sum(count),
    attrition_rate = (count / total) * 100
  ) %>%
  ungroup() %>%
  filter(Attrition == "Yes")
heatmap_data %>%
  ggplot(aes(x = JobInvolvement, y = PerformanceRating, fill = attrition_rate)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#FF7766", high = "#00BFC4", name = "Attrition Rate (%)") +
  labs(
    title = "Attrition Rate by Job Involvement and Performance Rating",
    x = "Job Involvement",
    y = "Performance Rating"
  ) +
  theme_light()

Explanation

Job involvement and performance rating are displayed using a heat map.

Employees with lower job involvement are more likely to leave the company. Again, this might be linked to a lack of responsibility or control within certain projects, which decreases their morale as they feel their abilities aren’t being seen by managers or team members.

Employees with higher performance ratings aren’t as likely to leave the company. This could be a result of feeling valued by their managers and peers, as they have consistently performed well over the last year.

We see the greatest level of attrition where employees have poor performance ratings paired with a lack of job involvement. This isn’t surprising, as managers can’t be expected to give greater responsibility to employees that aren’t performing well.

Something we can take from this finding is that employees with lower performance scores need to be assessed on whether they’re capable of doing the job we need. If they have potential to improve, we need to help them figure out why they’re performing poorly and support them where we can. This might come in the form of better training or more tailored workshops for our employees.

Effects of Training Opportunities on Employee Retention.

training_summary <- employee %>%
  group_by(Attrition) %>%
  summarize(
    avg_training_sessions = mean(TrainingTimesLastYear, na.rm = TRUE),
    median_training_sessions = median(TrainingTimesLastYear, na.rm = TRUE),
    .groups = 'drop'
  )
training_summary
# A tibble: 2 × 3
  Attrition avg_training_sessions median_training_sessions
  <chr>                     <dbl>                    <dbl>
1 No                         2.83                        3
2 Yes                        2.62                        2
employee %>%
  ggplot(aes(x = Attrition, y = TrainingTimesLastYear, fill = Attrition)) +
  geom_boxplot() +
  labs(
    title = "Training Sessions by Attrition Status",
    x = "Attrition",
    y = "Training Sessions in the Last Year"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("No" = "#FF7766", "Yes" = "#00BFC4"), labels = c("Stay", "Leave"))

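Explanation

Employees who left averaged 2.62 training sessions in the last year (median 2), compared with 2.83 (median 3) for those who stayed.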

Attrition by Education and Field of Study.

employee %>%
  ggplot(mapping = aes(x = Education, y = EducationField, fill = Attrition)) +
  geom_tile(color = "white") +
  labs(
    title = "Attrition Rate by Education Level and Field of Study",
    x = "Education Level",
    y = "Field of Study"
  ) +
  theme_light()

Explanation

This output shows the level of attrition by employees’ level of education and their chosen field of study.

The graph displays three fields of study that were chosen by all our employees that left, as well as the four levels of education they came from. What’s interesting is that the education level with the most attrition is actually employees with doctorates. This could possibly indicate a lack of challenge for these more highly educated employees, or a feeling that their knowledge isn’t being utilized correctly by us in PRL.

Another interesting insight is that employees with a bachelor’s degree in marketing had higher levels of attrition. This is consistent with our finding that the most attrition is found in sales, as marketing and sales are closely related.
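A tile plot that maps `fill` to the raw `Attrition` column draws one tile per employee, so tiles for the same education level and field overlap. One possible alternative, sketched below, is to compute the attrition rate for each combination first and fill by that rate. The helper name `education_attrition()` and the `toy` rows (including the education codes and field labels) are illustrative assumptions.

```r
library(tidyverse)

# Hypothetical helper: attrition rate (%) for each education level and field of study.
education_attrition <- function(data) {
  data %>%
    group_by(Education, EducationField) %>%
    summarize(
      Employees = n(),
      AttritionRate = mean(Attrition == "Yes") * 100,
      .groups = "drop"
    )
}

# Made-up rows purely to show the output shape; replace `toy` with `employee`.
toy <- tibble(
  Attrition = c("Yes", "No", "No", "Yes", "No"),
  Education = c(3, 3, 4, 5, 5),
  EducationField = c("Marketing", "Marketing", "Medical", "Medical", "Medical")
)

education_rates <- education_attrition(toy)

# One tile per education level and field, coloured by the computed rate.
education_plot <- education_rates %>%
  ggplot(aes(x = factor(Education), y = EducationField, fill = AttritionRate)) +
  geom_tile(color = "white") +
  labs(
    title = "Attrition Rate by Education Level and Field of Study",
    x = "Education Level",
    y = "Field of Study",
    fill = "Attrition Rate (%)"
  ) +
  theme_light()

print(education_plot)
```

Keeping the `Employees` count alongside the rate helps flag cells where a high rate rests on only a handful of people.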

Recommendations

  • We need to understand why our employees struggle with the work environment. We can do this by having a meeting with past and current employees.
  • We need to develop a clear pathway for our employees within the first two years, or at least manage expectations better.
  • We should consider adopting a hybrid working environment for our staff.
  • Try to delegate business travel better among our staff and offer more opportunities to do so.
  • Try to uncover the reasons employees are performing poorly, and give them the support they need so that they can eventually take on more responsibility and increase their job involvement.

Bibliography

Long, B. (2024) What is attrition rate, and why does it matter?, Insight Global. Available at: https://insightglobal.com/blog/employee-attrition-rate-how-to-calculate-improve/ (Accessed: 14 November 2024).