library(readr)
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Histogram: Distribution of Employee Satisfaction
Create a histogram of the satisfaction_level variable. The title should
reflect a key takeaway from the distribution.
plot_ly(hr, x = ~satisfaction_level, type = "histogram") %>%
  layout(title = "Most employees are satisfied",
         xaxis = list(title = "Satisfaction Level"),
         yaxis = list(title = "Frequency"))
About 6 percent of employees are extremely dissatisfied (satisfaction
level <= 0.1).
Most employees are satisfied (satisfaction level > 0.5).
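The two shares quoted above can be checked directly rather than read off the histogram. The following is a minimal sketch in base R; the `satisfaction` vector holds made-up toy values, not the HR data, and the names `share_very_low` and `share_above_half` are illustrative only.

```r
# Toy satisfaction scores (made up for illustration; NOT the HR data).
satisfaction <- c(0.09, 0.10, 0.38, 0.45, 0.57, 0.62, 0.74, 0.81, 0.89, 0.92)

# The mean of a logical vector is the proportion of TRUE values.
share_very_low   <- mean(satisfaction <= 0.1)  # share at or below 0.1
share_above_half <- mean(satisfaction > 0.5)   # share above 0.5

print(c(very_low = share_very_low, above_half = share_above_half))
```

To apply this to the report's data, `hr$satisfaction_level` would take the place of the toy vector.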
Box Plot: Last Evaluation Scores
Create a box plot of the last_evaluation variable. The title should
highlight an important insight about the evaluation scores.
# Plot last_evaluation (not satisfaction_level) so the data matches the title and axis label
plot_ly(hr, y = ~last_evaluation, type = "box") %>%
  layout(title = "Box Plot of Last Evaluation Scores: Most Employees Cluster Around High Scores",
         yaxis = list(title = "Last Evaluation Score"))
A large portion of employees received evaluation scores between
approximately 0.55 and 0.85, the range covered by the box. This indicates
that most employees were evaluated with moderate to high scores.
Any points outside the whiskers are considered outliers. These
represent employees who received exceptionally low or exceptionally high
scores.
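The box and the outlier rule described above can be reproduced numerically. This is a sketch under two assumptions: it uses made-up toy scores rather than the HR data, and it applies the common 1.5 × IQR fence rule with R's default quantile method, which may differ slightly from the quartiles plotly draws.

```r
# Toy evaluation scores (made up for illustration; NOT the HR data).
scores <- c(0.20, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90)

# First and third quartiles mark the bottom and top of the box.
q   <- quantile(scores, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Points beyond 1.5 * IQR from the box are flagged as outliers.
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr
outliers <- scores[scores < lower_fence | scores > upper_fence]

print(outliers)
```

If `outliers` comes back empty for the real `hr$last_evaluation` column, the box plot has no points beyond the whiskers and the outlier remark does not apply.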
Comparative Box Plot: Monthly Hours by Department
Create a comparative box plot of average_montly_hours grouped by
department. The title should emphasize a significant difference or
pattern among departments.
plot_ly(hr, x = ~Department, y = ~average_montly_hours, type = "box") %>%
  layout(title = "Comparative Box Plot of Monthly Hours: Noticeable Variation Across Departments",
         xaxis = list(title = "Department"),
         yaxis = list(title = "Average Monthly Hours"))
The plot reveals whether certain departments, such as IT or sales,
have consistently higher or lower monthly hours, which would indicate
potential workload imbalances.
A department with a higher median (the central line in the box) is one
whose employees typically work more hours than employees in other
departments.
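Comparing medians by eye across many boxes is error-prone, so it can help to tabulate them. The sketch below uses base R on a made-up toy data frame (not the HR data) that borrows the report's column names `Department` and `average_montly_hours`.

```r
# Toy data (made up for illustration; NOT the HR data).
toy <- data.frame(
  Department = c("IT", "IT", "IT", "sales", "sales", "sales"),
  average_montly_hours = c(150, 200, 250, 160, 180, 260)
)

# Median hours per department: the value drawn as the central line of each box.
median_hours <- tapply(toy$average_montly_hours, toy$Department, median)

# Sort so the departments with the longest typical hours come first.
print(sort(median_hours, decreasing = TRUE))
```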
Pie Chart of Frequencies: Attrition by Salary Level
Create a pie chart showing the frequency of employee attrition (left)
for each salary category. The title should point out the relationship
between salary and attrition.
attrition_data <- hr[hr$left == 1, ]
attrition_by_salary <- as.data.frame(table(attrition_data$salary))
colnames(attrition_by_salary) <- c("salary", "count")
plot_ly(attrition_by_salary, labels = ~salary, values = ~count, type = 'pie') %>%
layout(title = 'Employee Attrition by Salary: Higher Attrition in Lower Salary Categories')
There is a clear relationship between salary level and attrition:
employees in lower salary categories make up most of the turnover, while
those in higher salary categories tend to stay with the company longer.
A large slice for the “low” salary category means that most of the
employees who left were in that category. Because the pie chart counts
leavers and does not account for how many employees each category has in
total, it suggests, rather than proves, that employees with lower salaries
are more likely to leave the company.
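A pie of leaver counts depends on how many employees each salary category contains, so a rate per category is a useful companion figure. The sketch below uses a made-up toy data frame (not the HR data) with the report's column names `salary` and `left`, and contrasts the two quantities in base R.

```r
# Toy data (made up for illustration; NOT the HR data).
toy <- data.frame(
  salary = c("low", "low", "low", "low",
             "medium", "medium", "medium",
             "high", "high", "high"),
  left = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Counts of leavers per category: what the pie chart displays.
leavers_by_salary <- table(toy$salary[toy$left == 1])

# Attrition rate per category: leavers divided by all employees in the category.
# Since left is coded 0/1, the group mean is exactly that rate.
attrition_rate <- tapply(toy$left, toy$salary, mean)

print(leavers_by_salary)
print(round(attrition_rate, 2))
```

With the real data, `tapply(hr$left, hr$salary, mean)` would give the rate that the "more likely to leave" reading relies on.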
Bar Plot of Averages: Average Satisfaction by Department
Create a bar plot displaying the average satisfaction_level for each
department. The title should highlight a key observation about
departmental satisfaction.
# Mean satisfaction per department
avg_satisfaction <- hr %>%
  group_by(Department) %>%
  summarize(avg_satisfaction = mean(satisfaction_level))
plot_ly(avg_satisfaction, x = ~Department, y = ~avg_satisfaction, type = 'bar') %>%
  layout(title = 'Average Satisfaction by Department: Key Differences in Satisfaction Levels Across Teams',
         xaxis = list(title = 'Department'),
         yaxis = list(title = 'Average Satisfaction Level'))
Management and IT have the highest average satisfaction levels,
suggesting their employees like their jobs the most.
HR and accounting have lower average satisfaction levels.
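Rankings like "highest" and "lower" are easier to verify when the departments are sorted by their average. The sketch below uses base R on a made-up toy data frame (not the HR data); converting `Department` to a factor with levels in sorted order is a common way to control bar order, on the assumption that the plotting function follows factor level order.

```r
# Toy data (made up for illustration; NOT the HR data).
toy <- data.frame(
  Department = c("hr", "hr", "IT", "IT", "management", "management"),
  satisfaction_level = c(0.50, 0.60, 0.60, 0.64, 0.60, 0.66)
)

# Mean satisfaction per department.
avg <- aggregate(satisfaction_level ~ Department, data = toy, FUN = mean)

# Sort from highest to lowest average.
avg <- avg[order(-avg$satisfaction_level), ]

# Fix the category order so a bar chart can follow the ranking.
avg$Department <- factor(avg$Department, levels = avg$Department)

print(avg)
```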