Objective

In this assignment, you will analyze employee attrition data using chi-square tests and visualization techniques in R. For each chi-square test, interpret the results, both in technical and in non-technical terms, and create the appropriate graph.

Data

The dataset contains information about employees, including their satisfaction levels, last evaluation scores, number of projects, average monthly hours, time spent at the company, work accidents, promotion history, department, and salary.

Tasks

Perform four (4) chi-square tests, each comparing an appropriate categorical variable against the variable left. The variable left records whether the employee left the company (left = 1) or stayed (left = 0).

For each of the four chi-square tests:

  1. Perform the chi-square test (.5 point) Choose an appropriate categorical variable from the data, test it against left, and display the results.

  2. Interpret the results in technical terms (.5 point) For each chi-square test, explain what the test’s p-value means (significance).

  3. Interpret the results in non-technical terms (1 point) For each chi-square test, explain what the results mean in non-technical terms.

  4. Create a plot that helps visualize the chi-square test (.5 point) For each chi-square test, create a graph to help visualize the difference in proportions between groups, if any. The title must be the non-technical interpretation.
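
The first two steps above can be sketched on a small made-up data frame. The data frame `hr_toy`, its `salary` values, and all counts are invented for illustration and are not taken from the assignment dataset; only the coding of `left` (1 = left, 0 = stayed) comes from the task description.

```r
# Hypothetical stand-in for the HR data: 60 made-up employees.
hr_toy <- data.frame(
  salary = rep(c("low", "medium", "high"), each = 20),
  left   = c(rep(c(1, 0), times = c(12, 8)),   # low salary
             rep(c(1, 0), times = c(7, 13)),   # medium salary
             rep(c(1, 0), times = c(2, 18)))   # high salary
)

# Step 1: cross-tabulate the two categorical variables and run the test.
counts <- table(hr_toy$salary, hr_toy$left)
result <- chisq.test(counts)
print(counts)
print(result)

# Step 2: the p-value is stored in the result object for interpretation.
result$p.value
```

Passing a table of counts to `chisq.test()` is equivalent to passing the two columns directly; building the table first makes it easy to inspect the counts before interpreting the test.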

Submission

Submit your assignment by providing a link to your published RPubs document containing all the required visualizations and explanations.

Total: 10 points

Starter code

Use this code to read the data. Note that you will need additional libraries.

library(readr)

hr <- read_csv('https://raw.githubusercontent.com/aiplanethub/Datasets/refs/heads/master/HR_comma_sep.csv')
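
Because left is stored as 0 and 1, it may help to recode it as a labeled factor before tabulating or plotting. The following is a minimal sketch on a made-up stand-in data frame; `hr_toy`, `left_label`, and the labels "Stayed" and "Left" are names chosen here for illustration.

```r
# Hypothetical stand-in for the `left` column of `hr`.
hr_toy <- data.frame(left = c(0, 1, 0, 0, 1))

# Recode 0/1 as a labeled factor (labels are an illustrative choice).
hr_toy$left_label <- factor(
  hr_toy$left,
  levels = c(0, 1),
  labels = c("Stayed", "Left")
)

# Count employees in each category.
table(hr_toy$left_label)
```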

Example of a successful chi-square test

chisq.test(mtcars$cyl, mtcars$am)
## 
##  Pearson's Chi-squared test
## 
## data:  mtcars$cyl and mtcars$am
## X-squared = 8.7407, df = 2, p-value = 0.01265
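
The printed values can also be pulled out of the result object, which is handy when writing the interpretation. This is a sketch rather than a required step. It also shows the expected counts: for this mtcars table some of them fall below 5, so R warns that the chi-squared approximation may be incorrect, and `suppressWarnings()` is used here only to keep the output clean.

```r
# Store the test result instead of only printing it.
# suppressWarnings() hides R's small-sample warning for this table.
result <- suppressWarnings(chisq.test(mtcars$cyl, mtcars$am))

result$statistic  # the X-squared value
result$parameter  # degrees of freedom
result$p.value    # the p-value

# Expected counts under independence; cells below 5 weaken the approximation.
result$expected
```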

p-value interpretation: The p-value (0.01265) is below the conventional 0.05 threshold, so an association this strong would be unlikely if the number of cylinders and the transmission type were independent.

chi-square test interpretation: There is a dependence between the number of cylinders and the transmission type.

non-technical interpretation: Four cylinder cars are most likely to have a manual transmission.
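
One way to check a non-technical statement like this is to look at row proportions of the contingency table. The sketch below uses base R only; `counts` and `row_props` are names introduced here for illustration.

```r
# Rows are cylinder counts, columns are transmission type (0 = automatic, 1 = manual).
counts <- table(cyl = mtcars$cyl, am = mtcars$am)

# margin = 1 divides each cell by its row total, so every row sums to 1.
row_props <- prop.table(counts, margin = 1)
round(row_props, 2)
```

Comparing the manual column across rows shows which cylinder group has the highest share of manual transmissions.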

library(plotly)
library(dplyr)

# Calculate the proportion of each transmission type within each cylinder group
prop_data <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    automatic = sum(am == 0) / n(),
    manual = sum(am == 1) / n()
  )

# Create stacked bar chart
plot_ly(prop_data) %>%
  add_bars(x = ~cyl, y = ~automatic, name = "Automatic", 
           marker = list(color = "#1f77b4")) %>%
  add_bars(x = ~cyl, y = ~manual, name = "Manual", 
           marker = list(color = "#ff7f0e")) %>%
  layout(
    barmode = "stack",
    xaxis = list(title = "Number of Cylinders"),
    yaxis = list(title = "Proportion", tickformat = ",.0%"),
    title = "Four cylinder cars are most likely to have a manual transmission"
  )
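
If plotly is not available, a mosaic plot from base R is one possible alternative for visualizing a contingency table. This is a sketch of a different chart type, not a replacement for the stacked bar chart above: tile widths reflect how many cars fall in each cylinder group, and tile heights reflect the share of each transmission type. The dimension names `Cylinders` and `Transmission` are chosen here for illustration.

```r
# Cross-tabulate with readable labels for the transmission type.
counts <- table(
  Cylinders    = mtcars$cyl,
  Transmission = factor(mtcars$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))
)

# Draw the mosaic plot; the title is the non-technical interpretation.
mosaicplot(
  counts,
  main  = "Four cylinder cars are most likely to have a manual transmission",
  color = c("#1f77b4", "#ff7f0e")
)
```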

Good luck!