# Load necessary libraries
library(readr)
library(ggplot2)
library(dplyr)
# Read the dataset
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
## Rows: 14999 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Department, salary
## dbl (8): satisfaction_level, last_evaluation, number_project, average_montly...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Convert necessary variables to factors for categorical analysis
hr$left <- as.factor(hr$left)
hr$Department <- as.factor(hr$Department)
hr$salary <- as.factor(hr$salary)
hr$promotion_last_5years <- as.factor(hr$promotion_last_5years)
hr$Work_accident <- as.factor(hr$Work_accident)
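The five conversions above repeat the same call, so they could also be written as one step over a vector of column names. The following is a minimal base R sketch on a toy data frame; `toy` and `cat_cols` are illustrative names and the values are made up, not taken from the HR dataset.

```r
# Toy data frame standing in for `hr` (values are illustrative only)
toy <- data.frame(
  left = c(0, 1, 0, 1),
  salary = c("low", "high", "medium", "low"),
  satisfaction_level = c(0.38, 0.80, 0.11, 0.72)
)

# Columns to treat as categorical
cat_cols <- c("left", "salary")

# Convert every listed column to a factor; other columns are left untouched
toy[cat_cols] <- lapply(toy[cat_cols], as.factor)

str(toy)
```

Applied to `hr`, `cat_cols` would hold the five column names converted above.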
Chi-Square Test 1: Left vs Department
test1 <- chisq.test(table(hr$left, hr$Department))
cat("Test 1: Left vs Department\n")
## Test 1: Left vs Department
print(test1)
##
## Pearson's Chi-squared test
##
## data: table(hr$left, hr$Department)
## X-squared = 86.825, df = 9, p-value = 7.042e-15
Visualization for Test 1
ggplot(hr, aes(x = Department, fill = left)) +
  geom_bar(position = "fill") +
  labs(title = "Employee Departure by Department", y = "Proportion", x = "Department") +
  theme_minimal()
Technical Interpretation:
The test gives X-squared = 86.825 with df = 9 and a p-value of 7.042e-15, far below the 0.05 significance level. We therefore reject the null hypothesis that departure is independent of department: there is a statistically significant association between an employee's department and whether they left the company.
The test establishes an association only. It does not show that department causes departure, and it does not say which departments differ from the others.
Non-Technical Interpretation:
Whether an employee leaves is related to the department they work in (e.g., Sales, IT, Management). Departure rates are not the same across departments; the bar chart for this test shows which departments lose a larger share of their staff.
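The decision rule used in these interpretations (compare the p-value with 0.05) can be read directly off the object that `chisq.test()` returns. The sketch below wraps that rule in a hypothetical helper, `interpret_chisq()`, which is not part of the original analysis; the counts are illustrative and are not taken from the HR dataset.

```r
# Hypothetical helper (not part of the original analysis): run the test and
# state the decision at significance level `alpha`.
interpret_chisq <- function(tbl, alpha = 0.05) {
  res <- chisq.test(tbl)
  decision <- if (res$p.value < alpha) "reject H0" else "fail to reject H0"
  cat(sprintf("X-squared = %.3f, df = %g, p-value = %.3g: %s\n",
              res$statistic, res$parameter, res$p.value, decision))
  invisible(decision)
}

# Illustrative counts only, NOT taken from the HR dataset
counts <- matrix(c(30, 20, 25, 25, 10, 40), nrow = 2,
                 dimnames = list(left = c("0", "1"), group = c("A", "B", "C")))
decision <- interpret_chisq(counts)
```

With the report's data, the same helper could be called as `interpret_chisq(table(hr$left, hr$Department))`.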
Chi-Square Test 2: Left vs Salary
test2 <- chisq.test(table(hr$left, hr$salary))
cat("Test 2: Left vs Salary\n")
## Test 2: Left vs Salary
print(test2)
##
## Pearson's Chi-squared test
##
## data: table(hr$left, hr$salary)
## X-squared = 381.23, df = 2, p-value < 2.2e-16
Visualization for Test 2
ggplot(hr, aes(x = salary, fill = left)) +
  geom_bar(position = "fill") +
  labs(title = "Employee Departure by Salary", y = "Proportion", x = "Salary Level") +
  theme_minimal()
Technical Interpretation:
The chi-square test evaluates whether the proportion of employees leaving differs by salary level (low, medium, high). The result is X-squared = 381.23 with df = 2 and a p-value below 2.2e-16, which is far below 0.05.
We reject the null hypothesis of independence: departure rates differ significantly across salary levels. The test does not indicate which levels have the higher rates; that has to be read from the proportions in the bar chart.
Non-Technical Interpretation:
An employee's salary level is related to whether they leave the company. Employees with low, medium, and high salaries do not leave at the same rate.
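The chi-square statistic does not show the departure rate within each category; column proportions of the contingency table do. The sketch below uses `prop.table()` with `margin = 2` on illustrative counts (not the HR data), with rows as the `left` status and columns as the groups, matching the layout of `table(hr$left, hr$salary)`.

```r
# Illustrative counts only, NOT taken from the HR dataset.
# Rows: left (0 = stayed, 1 = left); columns: three hypothetical groups.
counts <- matrix(c(30, 20, 25, 25, 10, 40), nrow = 2,
                 dimnames = list(left = c("0", "1"), group = c("A", "B", "C")))

# margin = 2 divides each cell by its column total,
# so each column sums to 1
rates <- prop.table(counts, margin = 2)

# Share of each group that left
leave_rate <- rates["1", ]
print(round(leave_rate, 3))
```

For the report's data, `prop.table(table(hr$left, hr$salary), margin = 2)` would give the same proportions that the stacked bar chart displays.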
Chi-Square Test 3: Left vs Promotion in Last 5 Years
test3 <- chisq.test(table(hr$left, hr$promotion_last_5years))
cat("Test 3: Left vs Promotion in Last 5 Years\n")
## Test 3: Left vs Promotion in Last 5 Years
print(test3)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(hr$left, hr$promotion_last_5years)
## X-squared = 56.262, df = 1, p-value = 6.344e-14
Visualization for Test 3
ggplot(hr, aes(x = promotion_last_5years, fill = left)) +
  geom_bar(position = "fill") +
  labs(title = "Employee Departure by Promotion in Last 5 Years", y = "Proportion", x = "Promotion in Last 5 Years") +
  theme_minimal()
Technical Interpretation:
The test examines whether being promoted in the last five years is associated with leaving the company. The result (with Yates' continuity correction) is X-squared = 56.262 with df = 1 and a p-value of 6.344e-14, far below 0.05.
We reject the null hypothesis of independence: there is a statistically significant association between promotion status and employee departure.
Non-Technical Interpretation:
Whether an employee was promoted within the last five years is related to whether they leave the company. Promoted and non-promoted employees do not leave at the same rate; the bar chart for this test shows which group leaves more often.
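The output for this test reports Yates' continuity correction, which `chisq.test()` applies by default to 2 x 2 tables. Passing `correct = FALSE` gives the uncorrected Pearson statistic. The sketch below compares the two on an illustrative 2 x 2 table; the counts are made up and are not the HR data.

```r
# Illustrative 2 x 2 counts only, NOT taken from the HR dataset
tbl <- matrix(c(40, 10, 30, 20), nrow = 2,
              dimnames = list(left = c("0", "1"), promoted = c("0", "1")))

# Default for 2 x 2 tables: Yates' continuity correction is applied
corrected <- chisq.test(tbl)

# Uncorrected Pearson chi-squared statistic
uncorrected <- chisq.test(tbl, correct = FALSE)

print(c(corrected = unname(corrected$statistic),
        uncorrected = unname(uncorrected$statistic)))
```

The correction shrinks the statistic, so the corrected test is the more conservative of the two. With roughly 15,000 rows, as in this report, the difference is unlikely to change the conclusion.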
Chi-Square Test 4: Left vs Work Accident
test4 <- chisq.test(table(hr$left, hr$Work_accident))
cat("Test 4: Left vs Work Accident\n")
## Test 4: Left vs Work Accident
print(test4)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(hr$left, hr$Work_accident)
## X-squared = 357.56, df = 1, p-value < 2.2e-16
Visualization for Test 4
ggplot(hr, aes(x = Work_accident, fill = left)) +
  geom_bar(position = "fill") +
  labs(title = "Employee Departure by Work Accident", y = "Proportion", x = "Work Accident") +
  theme_minimal()
Technical Interpretation:
The chi-square test checks whether having experienced a work accident is associated with employee departure. The result (with Yates' continuity correction) is X-squared = 357.56 with df = 1 and a p-value below 2.2e-16, far below 0.05.
We reject the null hypothesis of independence: there is a statistically significant association between work accidents and departures.
Non-Technical Interpretation:
Whether an employee has had a work accident is related to whether they leave the company. Employees who have had accidents do not leave at the same rate as those who have not; the bar chart for this test shows which group leaves more often.
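With 14,999 rows, even a weak association can produce a very small p-value, so an effect size is a useful companion to the tests. One common choice, not used in the original analysis, is Cramér's V, computed as sqrt(X-squared / (n * min(rows - 1, cols - 1))). The sketch below applies it to the statistics printed for Tests 1 and 2. The table dimensions are inferred from the reported degrees of freedom (df = 9 implies 2 x 10, df = 2 implies 2 x 3), and `cramers_v()` is a hypothetical helper. Tests 3 and 4 are left out because their printed statistics include Yates' correction.

```r
# Hypothetical helper: Cramér's V from a chi-squared statistic,
# the sample size, and the contingency table dimensions
cramers_v <- function(stat, n, n_rows, n_cols) {
  sqrt(stat / (n * min(n_rows - 1, n_cols - 1)))
}

n <- 14999  # number of rows reported by read_csv()

# Statistics copied from the printed test output
v_department <- cramers_v(86.825, n, n_rows = 2, n_cols = 10)
v_salary     <- cramers_v(381.23, n, n_rows = 2, n_cols = 3)

print(round(c(Department = v_department, salary = v_salary), 3))
```

V ranges from 0 (no association) to 1 (perfect association). Which values count as small or large depends on the convention adopted.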