Assignment 5: Data Visualization

library(readr)
library(dplyr)
library(ggplot2)

hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Histogram: Distribution of Employee Satisfaction

ggplot(hr, aes(x = satisfaction_level)) +
  geom_histogram(fill = "#2C7BB6", color = "white", bins = 20) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(
    title = "Most Employees Report High Satisfaction, Though a Small Group Shows\nVery Low Satisfaction",
    x = "Satisfaction Level",
    y = "Number of Employees"
  )

The distribution is heavily concentrated at higher satisfaction levels, indicating that most employees report being satisfied. However, there is a noticeable cluster at very low satisfaction levels, which may represent employees at risk of leaving.
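The size of the low-satisfaction cluster can be checked numerically rather than read off the histogram. The sketch below is a minimal helper, assuming a cutoff of 0.2 for "very low" satisfaction; both the function name `low_satisfaction_share` and the cutoff are illustrative choices, not values taken from the dataset.

```r
library(dplyr)

# Count and share of employees at or below a satisfaction cutoff.
# `cutoff = 0.2` is an illustrative assumption, not a threshold from the data.
low_satisfaction_share <- function(data, cutoff = 0.2) {
  data %>%
    summarise(
      n_low = sum(satisfaction_level <= cutoff),
      share_low = mean(satisfaction_level <= cutoff)
    )
}
```

Calling `low_satisfaction_share(hr)` on the data loaded above would give the number and proportion of employees in that cluster, and the cutoff can be varied to see how sensitive the result is.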

Box Plot: Last Evaluation Scores

ggplot(hr, aes(y = last_evaluation)) +
  geom_boxplot(fill = "blue", color = "black") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(
    title = "Employee Evaluation Scores Are Evenly Distributed Around a\nModerately High Median",
    y = "Last Evaluation Score"
  )

Employee evaluation scores are fairly evenly distributed around the median, with most employees receiving moderately high evaluations. The distribution appears relatively symmetric, suggesting no strong skew in performance ratings.
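One rough way to back up the symmetry claim is to compare the distance from the median to each quartile and to compare the mean with the median. The helper below is a sketch of that check; the name `evaluation_symmetry` and the output column names are hypothetical.

```r
library(dplyr)

# Quartiles, median and mean of last_evaluation, plus the gap between
# the median and each quartile. Similar gaps and a mean close to the
# median are consistent with a roughly symmetric distribution.
evaluation_symmetry <- function(data) {
  data %>%
    summarise(
      q1  = quantile(last_evaluation, 0.25, names = FALSE),
      med = median(last_evaluation),
      q3  = quantile(last_evaluation, 0.75, names = FALSE),
      avg = mean(last_evaluation)
    ) %>%
    mutate(
      lower_gap = med - q1,
      upper_gap = q3 - med
    )
}
```

Running `evaluation_symmetry(hr)` would show whether `lower_gap` and `upper_gap` are close. Note that this only summarises the quartiles; like the box plot itself, it cannot reveal whether the distribution has more than one peak.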

Comparative Box Plot: Monthly Hours by Department

ggplot(hr, aes(x = Department, y = average_montly_hours, fill = Department)) +
  geom_boxplot() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  labs(
    title = "Average Monthly Work Hours Are Consistent Across Departments",
    x = "Department",
    y = "Average Monthly Hours"
  )

The median monthly hours are nearly the same across all departments, suggesting that workload is relatively evenly distributed. While there is some variation within departments, no single department stands out as working substantially more hours than others.
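The comparison of medians can also be tabulated directly. The following sketch, with the hypothetical helper name `median_hours_by_department`, computes the median and interquartile range of monthly hours for each department and sorts the result from highest to lowest median.

```r
library(dplyr)

# Median and interquartile range of monthly hours per department,
# sorted so the department with the highest median comes first.
# The column name `average_montly_hours` is spelled as in the dataset.
median_hours_by_department <- function(data) {
  data %>%
    group_by(Department) %>%
    summarise(
      median_hours = median(average_montly_hours),
      iqr_hours = IQR(average_montly_hours),
      .groups = "drop"
    ) %>%
    arrange(desc(median_hours))
}
```

With `median_hours_by_department(hr)`, the difference between the first and last rows gives a concrete measure of how far apart the department medians are.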

Pie Chart: Attrition by Salary Level

attrition_salary <- hr %>%
  group_by(salary, left) %>%
  summarise(count = n(), .groups = "drop")

ggplot(attrition_salary, aes(x = "", y = count, fill = factor(left))) +
  # position = "fill" rescales each facet to proportions, so every pie is a full circle
  geom_col(position = "fill", width = 1) +
  coord_polar("y") +
  facet_wrap(~salary) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(
    title = "Attrition Decreases as Salary Increases",
    fill = "Attrition (1 = Left)"
  )

Employees in the low salary category have a visibly higher proportion of attrition, and attrition decreases as salary level increases. This pattern suggests that compensation is associated with retention, although the chart alone does not establish that pay is the cause.
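Because pie slices are hard to compare by eye, the attrition rate per salary level can be computed as a plain number. The sketch below assumes that `left` is coded 0/1 (as the legend label "1 = Left" implies) and that the salary levels are named "low", "medium" and "high"; the helper name `attrition_rate_by_salary` is hypothetical.

```r
library(dplyr)

# Attrition rate per salary level. Assumes `left` is 0/1, so its mean
# is the share of employees who left, and assumes the salary levels
# are "low", "medium" and "high" (used only to order the rows).
attrition_rate_by_salary <- function(data) {
  data %>%
    mutate(salary = factor(salary, levels = c("low", "medium", "high"))) %>%
    group_by(salary) %>%
    summarise(
      employees = n(),
      attrition_rate = mean(left),
      .groups = "drop"
    )
}
```

Calling `attrition_rate_by_salary(hr)` would return one row per salary level, making it easy to confirm whether the rate falls steadily from low to high.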

Bar Plot: Average Satisfaction by Department

avg_satisfaction <- hr %>%
  group_by(Department) %>%
  summarise(avg_sat = mean(satisfaction_level))

ggplot(avg_satisfaction, aes(x = reorder(Department, avg_sat), y = avg_sat, fill = Department)) +
  geom_col() +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none", plot.title = element_text(hjust = 0.5)) +
  labs(
    title = "All Departments Share Similar Average Satisfaction Levels",
    x = "Department",
    y = "Average Satisfaction Level"
  )

All departments have very similar average satisfaction levels, at around 0.6, with the accounting department slightly lower than the rest.
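How similar the department averages are can be expressed as a single spread value. The following sketch, using the hypothetical helper name `satisfaction_spread`, reports the lowest-scoring department along with the minimum, maximum and range of the department means.

```r
library(dplyr)

# Spread of department-level average satisfaction: which department is
# lowest, and how far apart the lowest and highest averages are.
satisfaction_spread <- function(data) {
  data %>%
    group_by(Department) %>%
    summarise(avg_sat = mean(satisfaction_level), .groups = "drop") %>%
    summarise(
      lowest = Department[which.min(avg_sat)],
      min_avg = min(avg_sat),
      max_avg = max(avg_sat),
      spread = max_avg - min_avg
    )
}
```

A small `spread` from `satisfaction_spread(hr)` would support the reading that departments differ little in average satisfaction.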