library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Histogram: Distribution of Employee Satisfaction
Create a histogram of the satisfaction_level variable. The title should
reflect a key takeaway from the distribution.
plot_ly(hr, x = ~satisfaction_level, type = "histogram") %>%
  layout(title = "Significant Portion of Employees Show Extremely Low Satisfaction",
         xaxis = list(title = "Satisfaction Level"),
         yaxis = list(title = "Count"))
The distribution appears to be roughly bimodal, with two distinct
peaks: one at very low satisfaction (near 0) and another in the
mid-to-high satisfaction range (around 0.5 to 0.9).
There is a large concentration of highly dissatisfied employees,
with a spike in the count near 0 satisfaction.
Box Plot: Last Evaluation Scores
Create a box plot of the last_evaluation variable. The title should
highlight an important insight about the evaluation scores.
plot_ly(hr, y = ~last_evaluation, type = "box") %>%
  layout(title = "Most Employees Receive Mid-to-High Evaluation Scores",
         yaxis = list(title = "Last Evaluation Scores"))
The median evaluation score is around 0.72, so half of the employees
score above this value and half score below it.
The IQR spans from 0.56 to 0.87, so the middle 50% of employees receive
evaluations in this mid-to-high range.
The lowest-scoring quarter of employees falls below 0.56.
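The quartiles read off the box plot can be cross-checked numerically. The sketch below is only illustrative: `box_stats` is a hypothetical helper (not part of the original notebook) and `toy_scores` is made-up data. In the notebook itself, the call would be `box_stats(hr$last_evaluation)`.

```r
# Hypothetical helper: returns the quartiles and IQR that a box plot summarises.
box_stats <- function(x) {
  # quantile() uses R's default method (type 7); names = FALSE drops the "25%" labels.
  q <- quantile(x, probs = c(0.25, 0.50, 0.75), names = FALSE, na.rm = TRUE)
  c(q1 = q[1], median = q[2], q3 = q[3], iqr = q[3] - q[1])
}

# Made-up scores used only to show the output shape (not taken from the HR data).
toy_scores <- c(0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00)
toy_stats <- box_stats(toy_scores)
print(toy_stats)
```

The numbers may differ slightly from the values plotly shows on hover, since plotly computes quartiles with its own method.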
Comparative Box Plot: Monthly Hours by Department
Create a comparative box plot of average_montly_hours grouped by
department. The title should emphasize a significant difference or
pattern among departments.
plot_ly(hr, x = ~as.factor(Department), y = ~average_montly_hours, type = "box") %>%
  layout(title = "Monthly Hours Worked are Consistent Across Departments",
         xaxis = list(title = "Department"),
         yaxis = list(title = "Average Monthly Hours"))
Median monthly hours are fairly similar across departments, with
most departments showing a median around 200 hours per month.
The IQRs are quite broad in every department, spanning from about 150
to 250 hours.
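To back up the visual comparison with numbers, the per-department medians and quartiles can be tabulated. This is a minimal sketch on made-up data: `toy` and `hours_by_department` are hypothetical names, and only the column names are borrowed from `hr`. Replacing `toy` with `hr` would give the real figures.

```r
library(dplyr)

# Made-up data with the same column names as `hr` (values are not from the dataset).
toy <- data.frame(
  Department = c("sales", "sales", "sales", "IT", "IT", "IT"),
  average_montly_hours = c(150, 200, 250, 160, 210, 260)
)

# One row per department: median, first and third quartile, then the IQR.
hours_by_department <- toy %>%
  group_by(Department) %>%
  summarise(
    median_hours = median(average_montly_hours),
    q1 = quantile(average_montly_hours, 0.25, names = FALSE),
    q3 = quantile(average_montly_hours, 0.75, names = FALSE),
    .groups = "drop"
  ) %>%
  mutate(iqr = q3 - q1)

print(hours_by_department)
```

If the medians and IQRs in the resulting table are close to one another, that supports the "consistent across departments" reading of the plot.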
Pie Chart of Frequencies: Attrition by Salary Level
Create a pie chart showing the frequency of employee attrition (left)
for each salary category. The title should point out the relationship
between salary and attrition.
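# Count only the employees who left, broken down by salary level
left_by_salary <- hr %>%
  filter(left == 1) %>%
  count(salary)
# The pie shows each salary level's share of leavers, not its attrition rate
plot_ly(left_by_salary, labels = ~salary, values = ~n, type = 'pie') %>%
  layout(title = 'Most Departing Employees Are in the Low Salary Category')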
60.8% of the employees who left were in the low salary category, so
most departures come from lower-paid employees.
36.9% of the employees who left were in the medium salary category, so
mid-range earners also contribute substantially to overall attrition.
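The percentages above are shares of all leavers, which depend on how many employees each salary level has in the first place. A per-level attrition rate (leavers divided by headcount) answers a different question. The sketch below shows both side by side on made-up data: `toy` and `attrition_by_salary` are hypothetical names, and only the `salary` and `left` column names come from `hr`.

```r
library(dplyr)

# Made-up data with the same column names as `hr` (values are not from the dataset).
toy <- data.frame(
  salary = c("low", "low", "low", "low", "medium", "medium", "high", "high"),
  left   = c(1, 1, 0, 0, 1, 0, 0, 0)
)

attrition_by_salary <- toy %>%
  group_by(salary) %>%
  summarise(
    n_employees = n(),                       # headcount in this salary level
    n_left = sum(left == 1),                 # how many of them left
    attrition_rate = n_left / n_employees,   # rate within the salary level
    .groups = "drop"
  ) %>%
  # Share of all leavers: this is the quantity the pie chart displays.
  mutate(share_of_leavers = n_left / sum(n_left))

print(attrition_by_salary)
```

In this toy example the low level accounts for two thirds of the leavers, yet its attrition rate equals that of the medium level, because it simply has more employees. Running the same pipeline on `hr` would show whether the real data behaves differently.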
Bar Plot of Averages: Average Satisfaction by Department
Create a bar plot displaying the average satisfaction_level for each
department. The title should highlight a key observation about
departmental satisfaction.
department_satisfaction <- hr %>%
  group_by(Department) %>%
  summarise(avg_satisfaction = mean(satisfaction_level))
plot_ly(department_satisfaction, x = ~factor(Department), y = ~avg_satisfaction, type = 'bar') %>%
  layout(title = 'Consistent Satisfaction Across Departments',
         xaxis = list(title = 'Department'),
         yaxis = list(title = 'Average Satisfaction'))
Average satisfaction levels across all departments are consistently
close to 0.6, showing little variation between different
departments.
The accounting department has the lowest satisfaction, though still
close to 0.6, suggesting no department is significantly below others in
employee satisfaction.