In this assignment, you will analyze employee attrition data using chi-square tests and visualization techniques in R. For each chi-square test, interpret the results, both in technical and in non-technical terms, and create the appropriate graph.
The dataset contains information about employees, including their satisfaction levels, last evaluation scores, number of projects, average monthly hours, time spent at the company, work accidents, promotion history, department, and salary.
Perform four (4) chi-square tests using any appropriate (categorical) variables by the variable left. Note that the variable left describes whether the employee left the company (left = 1) or not (left = 0).
For each of the four chi-square tests:
Perform the chi-square test (.5 point) Choose any two appropriate variables from the data and perform the chi-square test, displaying the results.
Interpret the results in technical terms (.5 point) For each chi-square test, explain what the test’s p-value means (significance).
Interpret the results in non-technical terms (1 point) For each chi-square test, explain what the results mean in non-technical terms.
Create a plot that helps visualize the chi-square test (.5 point) For each chi-square test, create a graph to help visualize the difference in proportions between groups, if any. The title must be the non-technical interpretation.
Submit your assignment by providing a link to your published RPubs document containing all the required visualizations and explanations.
Total: 10 points
Use this code to read the data. Note that you will need to load additional libraries for the analysis and the plots.
library(readr)
hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
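Before choosing variables, it can help to list which columns take only a few distinct values, since those are the natural candidates for a chi-square test. The following is a minimal sketch, not part of the assignment: the helper name low_cardinality_cols and the cutoff of 10 levels are assumptions made for illustration. It is demonstrated on the built-in mtcars data so that it runs without a download; the same call should work on hr.

```r
# Hypothetical helper (an assumption, not part of the assignment): returns the
# names of columns whose number of distinct values is at most `max_levels`.
low_cardinality_cols <- function(df, max_levels = 10) {
  n_distinct_values <- vapply(df, function(col) length(unique(col)), integer(1))
  names(df)[n_distinct_values <= max_levels]
}

# Demonstrated on the built-in mtcars data so the block runs offline.
candidates <- low_cardinality_cols(mtcars)
print(candidates)

# The same call on the assignment data would be: low_cardinality_cols(hr)
```

A numeric column with few distinct values is not automatically categorical, so treat the result as a shortlist to review rather than a final answer.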
chisq.test(mtcars$cyl , mtcars$am)
##
## Pearson's Chi-squared test
##
## data: mtcars$cyl and mtcars$am
## X-squared = 8.7407, df = 2, p-value = 0.01265
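The test above is computed from a table of counts. As a rough sketch of what sits behind the output, the code below builds that contingency table, prints the counts expected if the two variables were independent, and runs Fisher's exact test as a cross-check. The variable names tab and res are illustrative choices. For this small dataset some expected counts fall below 5, which is why R typically warns that the chi-squared approximation may be incorrect.

```r
# Cross-tabulate cylinders (rows) by transmission (columns: 0 = automatic, 1 = manual).
tab <- table(cyl = mtcars$cyl, am = mtcars$am)
print(tab)

# Run the test on the table. suppressWarnings() hides the small-sample warning
# raised here because some expected counts are below 5.
res <- suppressWarnings(chisq.test(tab))
print(res)

# Expected counts under independence: (row total * column total) / grand total.
print(round(res$expected, 2))

# Fisher's exact test is a common cross-check when expected counts are small.
print(fisher.test(tab)$p.value)
```

With a dataset as large as the HR data, small expected counts are less likely to be a concern, but checking res$expected is a quick way to confirm.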
p-value interpretation: The p-value (0.013) is below 0.05, so if the number of cylinders and the transmission type were independent, a result at least this extreme would be very unlikely to occur by chance.
chi-square test interpretation: There is a dependence between the number of cylinders and the transmission type.
non-technical interpretation: Four cylinder cars are most likely to have a manual transmission.
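One way to arrive at a non-technical statement like the one above is to look at row proportions, that is, the share of each transmission type within each cylinder group. The sketch below uses prop.table() for this; the names row_props, manual_share, and most_manual are illustrative.

```r
# Counts of transmission type (0 = automatic, 1 = manual) within each cylinder group.
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# margin = 1 divides each row by its row total, so every row sums to 1.
row_props <- prop.table(tab, margin = 1)
print(round(row_props, 2))

# Share of manual cars in each cylinder group, and the group with the largest share.
manual_share <- row_props[, "1"]
most_manual <- names(which.max(manual_share))
print(most_manual)
```

The same approach applies to the HR data: compare the share of employees with left = 1 across the levels of the other variable and describe the largest or smallest share in plain language.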
library(plotly)
library(dplyr)
# Calculate proportions
prop_data <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    automatic = sum(am == 0) / n(),
    manual = sum(am == 1) / n()
  )
# Create stacked bar chart
plot_ly(prop_data) %>%
  add_bars(x = ~cyl, y = ~automatic, name = "Automatic",
           marker = list(color = "#1f77b4")) %>%
  add_bars(x = ~cyl, y = ~manual, name = "Manual",
           marker = list(color = "#ff7f0e")) %>%
  layout(
    barmode = "stack",
    xaxis = list(title = "Number of Cylinders"),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "Four cylinder cars are most likely to have a manual transmission"
  )
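As a possible alternative to the stacked bar chart, a mosaic plot from base R shows the same proportions while also making the width of each bar reflect the size of the group. This is only a sketch of one option, not a required approach; the factor labels and colors are chosen here to match the chart above.

```r
# Label the transmission codes so the plot legend is readable.
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))
tab <- table(Cylinders = mtcars$cyl, Transmission = transmission)

# Tile widths reflect how many cars are in each cylinder group;
# tile heights reflect the transmission split within that group.
mosaicplot(tab,
           color = c("#1f77b4", "#ff7f0e"),
           main = "Four cylinder cars are most likely to have a manual transmission")
```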
Good luck!