library(readr)
library(ggplot2) #i'm most familiar with this package!
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(hr) +
  aes(x = satisfaction_level) +
  geom_histogram(bins = 30L, fill = "#748FC1") +
  labs(
    x = "Satisfaction Level",
    y = "Count",
    title = "Most Employees Are Satisfied",
    subtitle = "Satisfied = 0.5 or above"
  ) +
  theme_minimal()
The histogram shows that most employees have a satisfaction level above 0.5, which this analysis treats as satisfied. There is also a cluster of dissatisfied employees: moving left from about 0.25, the counts rise again and peak at more than 500 employees who are close to not satisfied at all.
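To put a number on "most", the share of employees at or above the cutoff can be computed directly. The sketch below is illustrative only: `share_at_or_above` is a hypothetical helper name, and `toy` is made-up data, not values from the HR dataset.

```r
# Hypothetical helper: fraction of values at or above a cutoff.
# A logical vector averages to the proportion of TRUE values.
share_at_or_above <- function(x, cutoff = 0.5) {
  mean(x >= cutoff, na.rm = TRUE)
}

# Made-up satisfaction scores for illustration (not the real data)
toy <- c(0.10, 0.45, 0.50, 0.72, 0.91)
share_at_or_above(toy)
```

With the data loaded above, the same call would be `share_at_or_above(hr$satisfaction_level)`.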
ggplot(hr) +
  aes(x = "", y = last_evaluation) +
  geom_boxplot(fill = "#C1B2F0") +
  labs(
    x = NULL, # a single boxplot has no meaningful x axis
    y = "Score of Last Evaluation",
    title = "Vast Majority of Employees Were Rated Positively",
    subtitle = "Positive = 0.5 or above"
  ) +
  theme_minimal()
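The boxplot of last evaluation scores puts the interquartile range at roughly 0.55 to 0.85, so the middle 50% of employees scored in that range. A quarter scored below it and a quarter above it. Because the lower quartile already sits above 0.5, at least three quarters of employees were rated positively.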
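The quartiles read off a boxplot can be checked numerically with base R's `quantile()` and `IQR()`. This is a minimal sketch on made-up scores (the vector `scores` is hypothetical, chosen so the quartiles land on round numbers); it is not output from the HR data.

```r
# Made-up evaluation scores for illustration (not the real data)
scores <- c(0.40, 0.55, 0.70, 0.85, 1.00)

# First and third quartiles: the edges of the box in a boxplot
quartiles <- quantile(scores, probs = c(0.25, 0.75))
quartiles

# Width of the box (third quartile minus first quartile)
box_width <- IQR(scores)

# Share of scores above the 0.5 "positive" cutoff
share_positive <- mean(scores > 0.5)
```

With the data loaded above, the same checks would be `quantile(hr$last_evaluation, probs = c(0.25, 0.75))` and `mean(hr$last_evaluation > 0.5)`.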
ggplot(hr) +
  aes(x = average_montly_hours, y = Department) +
  geom_boxplot(fill = "#CB4B99") +
  labs(
    x = "Average Monthly Hours",
    y = "Department",
    title = "Average Monthly Hours Don't Vary Much By Department"
  ) +
  theme_minimal()
The boxplots have a similar overall shape across departments, which suggests that average monthly hours are roughly the same everywhere; no department works much longer than the others.
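A per-department summary table is one way to back up a visual comparison of boxplots. The sketch below uses made-up rows (`toy_hours` is hypothetical) that only borrow the column names from the dataset; the summary column names `median_hours` and `iqr_hours` are also assumptions for illustration.

```r
library(dplyr)

# Made-up rows for illustration (not the real data)
toy_hours <- data.frame(
  Department = c("sales", "sales", "sales", "IT", "IT", "IT"),
  average_montly_hours = c(150, 200, 250, 160, 200, 240)
)

# Median and interquartile range of hours within each department
hours_by_dept <- toy_hours %>%
  group_by(Department) %>%
  summarize(
    median_hours = median(average_montly_hours),
    iqr_hours = IQR(average_montly_hours),
    .groups = "drop"
  )

hours_by_dept
```

Replacing `toy_hours` with `hr` would produce the same table for the real departments; similar medians and IQRs would support the reading of the plot.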
q4 <- hr %>%
  group_by(salary, left) %>%
  summarize(Count = n())
## `summarise()` has grouped output by 'salary'. You can override using the
## `.groups` argument.
ggplot(q4, aes(x = "", y = Count, fill = interaction(salary, left))) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y") +
  labs(y = NULL, x = NULL) +
  ggtitle("More than Half of Employees Stay Regardless of Salary") +
  theme_void() +
  theme(legend.position = "right") +
  scale_fill_discrete(name = "Salary and Employment Status")
This pie chart breaks employees down by salary level and employment status, where 0 means they stayed and 1 means they left. Only a sliver of the chart is high-salary employees who left, a small slice is medium-salary employees who left, and about an eighth is low-salary employees who left. The large majority of employees stayed.
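A pie slice shows a group's share of all employees, which is different from the share of a salary band that left. The sketch below computes both on made-up counts; `toy_counts` and the two share column names are hypothetical, and only `salary`, `left`, and `Count` mirror the summary table built above.

```r
library(dplyr)

# Made-up counts for illustration (not the real data)
toy_counts <- data.frame(
  salary = c("high", "high", "low", "low"),
  left = c(0, 1, 0, 1),
  Count = c(90, 10, 70, 30)
)

leave_rates <- toy_counts %>%
  group_by(salary) %>%
  # share within each salary band, e.g. what fraction of "low" left
  mutate(share_within_salary = Count / sum(Count)) %>%
  ungroup() %>%
  # share of all employees, which is what a pie slice represents
  mutate(share_of_all = Count / sum(Count))

leave_rates
```

Applying the same two `mutate()` steps to `q4` would give both views for the real data.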
# Step 1: Compute the average satisfaction level per department
# (ggplot2 and dplyr are already loaded above)
department_satisfaction <- hr %>%
  group_by(Department) %>%
  summarize(Average_Satisfaction = mean(satisfaction_level, na.rm = TRUE), .groups = 'drop')
# Step 2: Create the barplot, ordered from lowest to highest average
ggplot(department_satisfaction, aes(x = reorder(Department, Average_Satisfaction), y = Average_Satisfaction)) +
  geom_col(fill = "steelblue") +
  labs(title = "Accountants May Be Sadder than Coworkers",
       x = "Department",
       y = "Average Satisfaction Level") +
  theme_minimal()
This bar plot shows fairly similar average satisfaction levels across departments, but the accounting department has a lower average than the others, which may be a concern for employers.
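Whether a lower average is worth worrying about depends on how spread out the scores are and how many people are in each group. The sketch below adds a standard deviation and a head count next to each mean; `toy_sat` is made-up data and the summary column names are assumptions for illustration.

```r
library(dplyr)

# Made-up rows for illustration (not the real data)
toy_sat <- data.frame(
  Department = c("accounting", "accounting", "sales", "sales", "IT", "IT"),
  satisfaction_level = c(0.40, 0.60, 0.60, 0.80, 0.60, 0.70)
)

dept_summary <- toy_sat %>%
  group_by(Department) %>%
  summarize(
    mean_satisfaction = mean(satisfaction_level),
    sd_satisfaction = sd(satisfaction_level), # spread within the department
    n = n(),                                  # number of employees
    .groups = "drop"
  ) %>%
  arrange(mean_satisfaction) # lowest average first

dept_summary
```

Running the same pipeline on `hr` would show how large the gap for accounting is relative to the spread within each department.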