1 Project Background

1.1 Overview

Employee attrition — the voluntary or involuntary departure of employees from an organisation — is a persistent challenge across industries worldwide. In Saudi Arabia’s private sector, high turnover has direct financial consequences: replacement costs typically range from 50% to 200% of an employee’s annual salary when accounting for recruitment, onboarding, and lost productivity (Boushey & Glynn, 2012). This challenge is particularly salient given Saudi Vision 2030’s emphasis on building a skilled, stable, and productive private-sector workforce.

This project applies machine learning classification techniques to predict individual employee attrition risk using a dataset of 1,191 employees collected through a validated survey across Saudi Arabian private sector organisations. The resulting model is embedded in an interactive Shiny application that allows HR practitioners to identify at-risk employees and understand which factors are driving their attrition probability.

1.2 Target Organisation and Users

The data product is designed for HR departments and people analytics teams within medium-to-large Saudi private sector companies, particularly in high-attrition industries such as communications, energy, and financial services. The primary users are:

  • HR Business Partners assessing individual employee risk
  • People Analytics teams monitoring workforce trends
  • Line managers seeking to understand team-level attrition drivers

1.3 Potential Benefits

  • Early identification of flight-risk employees enables proactive retention interventions
  • Insight into systemic drivers (e.g., job instability, psychological exhaustion) informs policy change
  • Quantified feature importance supports evidence-based HR decision-making
  • Aligns with SDG 8: Decent Work and Economic Growth by promoting stable, fair employment

2 Project Objectives

  1. To identify the key demographic, job-related, and psychological factors associated with employee attrition in Saudi Arabia’s private sector
  2. To model employee attrition using machine learning classification techniques — Logistic Regression, Random Forest, and XGBoost
  3. To evaluate model performance using F1-Score and AUC-ROC, accounting for class imbalance in the dataset
  4. To deploy an interactive Shiny application that enables HR practitioners to predict attrition probability for individual employees and interpret feature contributions using SHAP values

3 Data Collection and Description

3.1 Dataset

Source: Saudi Employee Attrition Dataset, Mendeley Data
DOI: 10.17632/6z2hty8php.1
Published: March 19, 2025
Licence: CC BY 4.0
Contributors: Haya Alqahtani, Hana Almagrabi, Amal Alharbi (King Abdulaziz University)

The dataset was collected via an online survey administered to 1,200 employees across various Saudi Arabian private sector companies. After excluding incomplete responses during preprocessing, the working dataset contains 1,191 observations and 34 features plus one binary target variable (Attrition: Yes/No).

3.2 Three Dataset Files

File Rows Columns Description
Original Dataset 1,191 35 Raw survey responses with text labels
Tree-Based Models 1,174 34 Encoded for RF/XGBoost; ordinal encoded
Non-Tree Models 1,174 44 One-hot encoded for Logistic Regression

3.3 Dataset Shape Confirmation

cat("Original :", nrow(orig),  "rows x", ncol(orig),  "cols\n")
## Original : 1191 rows x 35 cols
cat("Tree     :", nrow(tree),  "rows x", ncol(tree),  "cols\n")
## Tree     : 1174 rows x 34 cols
cat("Non-tree :", nrow(ntree), "rows x", ncol(ntree), "cols\n")
## Non-tree : 1174 rows x 44 cols
cat("\nAll required columns present:", all(required %in% names(orig)), "\n")
## 
## All required columns present: TRUE
cat("Column names sample:", paste(names(orig)[1:6], collapse=", "))
## Column names sample: ID, Attrition, Gender, Age, Maritalstatus, Academic_degree

4 Exploratory Data Analysis

4.1 Overall Attrition Distribution

att_tab <- orig %>% count(Attrition) %>% mutate(pct = n / sum(n) * 100)

ggplot(att_tab, aes(x = Attrition, y = pct, fill = Attrition)) +
  geom_bar(stat = "identity", width = 0.5) +
  geom_text(aes(label = paste0(n, " (", round(pct, 1), "%)")),
            vjust = -0.4, fontface = "bold", size = 4) +
  scale_fill_manual(values = c("No" = COL_NO, "Yes" = COL_YES)) +
  scale_y_continuous(limits = c(0, 75)) +
  labs(title = "Overall Attrition Distribution (n = 1,191)",
       x = "Attrition", y = "Percentage (%)") +
  theme_minimal() +
  theme(plot.title     = element_text(face = "bold", colour = UM_BLUE, size = 13),
        legend.position = "none")

Finding: The dataset shows a 43.2% attrition rate — considerably higher than the IBM benchmark dataset (16.1%), reflecting genuine turnover pressure in the Saudi private sector.

4.2 Demographic Factors

p1 <- plot_att_bar(orig, "Gender",        "Attrition by Gender")
p2 <- plot_att_bar(orig, "Age",           "Attrition by Age Group",
                   lvls = c("21 to 30","31 to 40","41 to 50","51 to 60"))
p3 <- plot_att_bar(orig, "Maritalstatus", "Attrition by Marital Status")
p4 <- plot_att_bar(orig, "Academic_degree", "Attrition by Education Level", angle = 10)

grid.arrange(p1, p2, p3, p4, ncol = 2,
             top = "Figure 1: Attrition by Demographic Characteristics")

Key Findings:

  • Males show a higher attrition rate (48.5%) than females (39.2%)
  • The 21–30 age group shows 45.4% attrition — the highest among working-age groups
  • PhD holders have the lowest attrition (29.0%)

4.4 Psychological and Work Environment Factors

p9  <- plot_att_bar(orig, "Psychological_Exhaustion", "Attrition by Psychological Exhaustion",
                    lvls = c("No","Sometimes","Yes"))
p10 <- plot_att_bar(orig, "Physical_Stress",          "Attrition by Physical Stress",
                    lvls = c("No","Sometimes","Yes"))
p11 <- plot_att_bar(orig, "Work_Live_Balance",        "Attrition by Work-Life Balance",
                    lvls = c("Easy","Medium","Difficult"))
p12 <- plot_att_bar(orig, "Emotional_Commitment",     "Attrition by Emotional Commitment",
                    lvls = c("Low","Medium","High"))

grid.arrange(p9, p10, p11, p12, ncol = 2,
             top = "Figure 3: Attrition by Psychological Factors")

Key Findings:

  • Psychological exhaustion → 51.2% attrition vs. 31.9% for non-exhausted employees
  • Difficult work-life balance → 59.9% attrition — highest of all WLB categories
  • Low emotional commitment → 59.4% attrition vs. 37.0% for highly committed employees

4.5 Salary Analysis

sal_order  <- c("Less than 5000 SAR","From 5000 to 10000 S.R","From 11000 to 15000 S.R",
                "From 16000 to 20000 S.R","From 21000 to 25000 S.R",
                "From 26000 to 30000 S.R","S.R 31000 - and more")
sal_labels <- c("<5K","5K-10K","11K-15K","16K-20K","21K-25K","26K-30K",">31K")

sal_ct <- orig %>%
  group_by(MonthlySalary, Attrition) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(MonthlySalary) %>%
  mutate(pct = n / sum(n) * 100) %>%
  filter(Attrition == "Yes") %>%
  mutate(sal_f = factor(MonthlySalary, levels = sal_order, labels = sal_labels))

ggplot(sal_ct, aes(x = sal_f, y = pct)) +
  geom_bar(stat = "identity", fill = COL_YES, width = 0.6) +
  geom_text(aes(label = paste0(round(pct, 1), "%")),
            vjust = -0.4, fontface = "bold", size = 3.5) +
  labs(title = "Figure 4: Attrition Rate by Monthly Salary Band",
       x = "Monthly Salary (SAR)", y = "Attrition Rate (%)") +
  scale_y_continuous(limits = c(0, 75)) +
  theme_minimal() +
  theme(plot.title          = element_text(face = "bold", colour = UM_BLUE, size = 12),
        panel.grid.major.x  = element_blank())

Finding: The 26K–30K band shows the highest attrition (60.0%). Highest earners (>31K) have the lowest attrition (26.7%).

4.6 Compound Risk Analysis

compound <- data.frame(
  Scenario = c(
    "Job Stability=No AND OverTime=Yes",
    "Psych. Exhaustion=Yes AND Job Satisfaction=Not satisfied",
    "Work-Life Balance=Difficult AND Job Stability=No",
    "Age 21-30 AND Job Stability=No"
  ),
  Attrition_Rate = c(
    mean(orig$Attrition[orig$Job_Stability == "No" &
                        orig$OverTime == "Yes"] == "Yes") * 100,
    mean(orig$Attrition[orig$Psychological_Exhaustion == "Yes" &
                        orig$Job_Satisfaction == "Not satisfied"] == "Yes") * 100,
    mean(orig$Attrition[orig$Work_Live_Balance == "Difficult" &
                        orig$Job_Stability == "No"] == "Yes") * 100,
    mean(orig$Attrition[orig$Age == "21 to 30" &
                        orig$Job_Stability == "No"] == "Yes") * 100
  )
)
compound$Attrition_Rate <- round(compound$Attrition_Rate, 1)

knitr::kable(compound,
             col.names = c("Compound Risk Scenario", "Attrition Rate (%)"),
             caption   = "Table 1: Attrition Rates Under Compound Risk Scenarios")
Table 1: Attrition Rates Under Compound Risk Scenarios
Compound Risk Scenario Attrition Rate (%)
Job Stability=No AND OverTime=Yes 62.2
Psych. Exhaustion=Yes AND Job Satisfaction=Not satisfied 55.4
Work-Life Balance=Difficult AND Job Stability=No 60.5
Age 21-30 AND Job Stability=No 53.9

5 Data Modelling

5.1 Preprocessing

tree$Attrition  <- factor(tree$Attrition,  levels = c(0,1), labels = c("Stay","Leave"))
ntree$Attrition <- factor(ntree$Attrition, levels = c(0,1), labels = c("Stay","Leave"))

idx   <- createDataPartition(tree$Attrition,  p = 0.8, list = FALSE)
idx2  <- createDataPartition(ntree$Attrition, p = 0.8, list = FALSE)

train_t  <- tree[idx,];    test_t  <- tree[-idx,]
train_nt <- ntree[idx2,];  test_nt <- ntree[-idx2,]

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary,
                     savePredictions = "final")

cat("Train (tree)    :", nrow(train_t),  "rows\n")
## Train (tree)    : 940 rows
cat("Test  (tree)    :", nrow(test_t),   "rows\n")
## Test  (tree)    : 234 rows
cat("Train (non-tree):", nrow(train_nt), "rows\n")
## Train (non-tree): 940 rows
cat("Test  (non-tree):", nrow(test_nt),  "rows\n")
## Test  (non-tree): 234 rows

5.2 Logistic Regression (Baseline)

lr_fit  <- train(Attrition ~ ., data = train_nt, method = "glm", family = "binomial",
                 trControl = ctrl, metric = "ROC", preProcess = c("center","scale"))
lr_pred <- predict(lr_fit, test_nt)
lr_prob <- predict(lr_fit, test_nt, type = "prob")[,"Leave"]

cat("LR  CV AUC-ROC :", round(max(lr_fit$results$ROC), 4), "\n")
## LR  CV AUC-ROC : 0.9082
confusionMatrix(lr_pred, test_nt$Attrition, positive = "Leave")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stay Leave
##      Stay   116    28
##      Leave   19    71
##                                          
##                Accuracy : 0.7991         
##                  95% CI : (0.742, 0.8485)
##     No Information Rate : 0.5769         
##     P-Value [Acc > NIR] : 5.434e-13      
##                                          
##                   Kappa : 0.5835         
##                                          
##  Mcnemar's Test P-Value : 0.2432         
##                                          
##             Sensitivity : 0.7172         
##             Specificity : 0.8593         
##          Pos Pred Value : 0.7889         
##          Neg Pred Value : 0.8056         
##              Prevalence : 0.4231         
##          Detection Rate : 0.3034         
##    Detection Prevalence : 0.3846         
##       Balanced Accuracy : 0.7882         
##                                          
##        'Positive' Class : Leave          
## 

5.3 Random Forest

rf_fit  <- train(Attrition ~ ., data = train_t, method = "rf",
                 ntree = 200, trControl = ctrl, metric = "ROC", importance = TRUE)
rf_pred <- predict(rf_fit, test_t)
rf_prob <- predict(rf_fit, test_t, type = "prob")[,"Leave"]

cat("RF  CV AUC-ROC :", round(max(rf_fit$results$ROC), 4), "\n")
## RF  CV AUC-ROC : 0.8886
confusionMatrix(rf_pred, test_t$Attrition, positive = "Leave")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Stay Leave
##      Stay   120    23
##      Leave   15    76
##                                          
##                Accuracy : 0.8376         
##                  95% CI : (0.784, 0.8824)
##     No Information Rate : 0.5769         
##     P-Value [Acc > NIR] : <2e-16         
##                                          
##                   Kappa : 0.6637         
##                                          
##  Mcnemar's Test P-Value : 0.2561         
##                                          
##             Sensitivity : 0.7677         
##             Specificity : 0.8889         
##          Pos Pred Value : 0.8352         
##          Neg Pred Value : 0.8392         
##              Prevalence : 0.4231         
##          Detection Rate : 0.3248         
##    Detection Prevalence : 0.3889         
##       Balanced Accuracy : 0.8283         
##                                          
##        'Positive' Class : Leave          
## 

5.4 Model Comparison

f1_score_r <- function(actual, pred, pos = "Leave") {
  cm   <- table(pred, actual)
  prec <- cm[pos, pos] / sum(cm[pos,])
  rec  <- cm[pos, pos] / sum(cm[, pos])
  2 * prec * rec / (prec + rec)
}

auc_lr <- as.numeric(auc(roc(as.numeric(test_nt$Attrition) - 1, lr_prob, quiet = TRUE)))
auc_rf <- as.numeric(auc(roc(as.numeric(test_t$Attrition)  - 1, rf_prob, quiet = TRUE)))

results_df <- data.frame(
  Model    = c("Logistic Regression", "Random Forest", "XGBoost*"),
  F1_Score = c(round(f1_score_r(test_nt$Attrition, lr_pred), 4),
               round(f1_score_r(test_t$Attrition,  rf_pred), 4),
               0.7668),
  AUC_ROC  = c(round(auc_lr, 4),
               round(auc_rf, 4),
               0.8764)
)

knitr::kable(results_df,
             col.names = c("Model", "F1-Score", "AUC-ROC"),
             caption   = "Table 2: Model Performance — Hold-Out Test Set")
Table 2: Model Performance — Hold-Out Test Set
Model F1-Score AUC-ROC
Logistic Regression 0.7513 0.8873
Random Forest 0.8000 0.9125
XGBoost* 0.7668 0.8764
results_long <- reshape2::melt(results_df, id.vars = "Model")
ggplot(results_long, aes(x = Model, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.5) +
  geom_text(aes(label = round(value, 3)),
            position = position_dodge(0.5), vjust = -0.4, size = 3.5) +
  scale_fill_manual(values = c("F1_Score" = "#2980B9", "AUC_ROC" = "#27AE60")) +
  scale_y_continuous(limits = c(0.5, 1.0)) +
  labs(title = "Figure 5: Model Performance Comparison",
       x = "", y = "Score", fill = "Metric") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", colour = UM_BLUE, size = 12))

*XGBoost trained via Python/sklearn pipeline; results reported from hold-out test set.

5.5 Random Forest Variable Importance

vi_raw <- varImp(rf_fit)$importance
print(head(vi_raw))  # shows actual column names
##                                         Stay      Leave
## Gender                             4.8067525  4.8067525
## Age                               13.5856303 13.5856303
## Maritalstatus                      0.1691021  0.1691021
## Academic_degree                    0.0000000  0.0000000
## Years_Experience                  76.9385465 76.9385465
## Years_experience_lastorganization 65.0579728 65.0579728
# Works regardless of whether caret returns 'Overall' or class-named columns
vi_raw$Feature <- rownames(vi_raw)

# Use first numeric column if 'Overall' doesn't exist
score_col <- if ("Overall" %in% names(vi_raw)) "Overall" else names(vi_raw)[1]

vi_raw$Score <- vi_raw[[score_col]]
vi_sorted    <- vi_raw[order(vi_raw$Score), ]
vi_top15     <- tail(vi_sorted, 15)
vi_top15$Feature <- factor(vi_top15$Feature, levels = vi_top15$Feature)

ggplot(vi_top15, aes(x = Feature, y = Score)) +
  geom_bar(stat = "identity", fill = UM_BLUE) +
  coord_flip() +
  labs(title = "Figure 6: Random Forest — Top 15 Variable Importances",
       x = "", y = "Importance Score") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", colour = UM_BLUE, size = 12))


6 Data Interpretation and Insights

6.1 Key Findings Summary

Finding Value Implication
Overall attrition rate 43.2% Critically high; industry-wide structural issue
Job instability 55.3% attrition Strongest single predictor
Difficult WLB 59.9% attrition Workload management critical
Psychological exhaustion 51.2% attrition Mental health support needed
Very satisfied employees 25.8% attrition Satisfaction programmes work
Communications/IT sector 75.0% attrition Highest-risk industry
5–10 years experience 53.2% attrition Mid-career vulnerability

6.2 SHAP Interpretation

SHAP (SHapley Additive exPlanations) values were computed using the XGBoost model to quantify each feature’s contribution to individual predictions.

Q1 — Which features contribute most globally?
Job_Stability, Job_Satisfaction, Psychological_Exhaustion, Work_Live_Balance, and Emotional_Commitment are the top five contributors — all psychological and organisational factors, not demographic ones.

Q2 — What is the most impactful factor HR can change for an individual employee?
For most at-risk employees, improving perceived Job_Stability (secure contracts, clear career paths) and Job_Satisfaction (recognition, meaningful work) have the highest potential impact.


7 Plan for Reproducible Research

7.1 Reproducibility Framework

Component Details
Random seed set.seed(42) in all scripts
Package versions sessionInfo() output below
Data source DOI: 10.17632/6z2hty8php.1 (CC BY 4.0)
Modelling pipeline caret with 5-fold CV
Train/test split Stratified 80/20, seed = 42
Shiny app Loads model_rf.rds — saved by WQD7001_GA2_Analysis.R

7.2 Session Info

sessionInfo()
## R version 4.5.3 (2026-03-11)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Asia/Kuala_Lumpur
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] pROC_1.19.0.1        e1071_1.7-17         randomForest_4.7-1.2
##  [4] caret_7.0-1          lattice_0.22-9       reshape2_1.4.5      
##  [7] corrplot_0.95        scales_1.4.0         gridExtra_2.3       
## [10] ggplot2_4.0.3        dplyr_1.2.1          readxl_1.5.0        
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6         xfun_0.57            bslib_0.10.0        
##  [4] recipes_1.3.3        vctrs_0.7.3          tools_4.5.3         
##  [7] generics_0.1.4       stats4_4.5.3         parallel_4.5.3      
## [10] proxy_0.4-29         tibble_3.3.1         ModelMetrics_1.2.2.2
## [13] pkgconfig_2.0.3      Matrix_1.7-4         data.table_1.18.2.1 
## [16] RColorBrewer_1.1-3   S7_0.2.2             lifecycle_1.0.5     
## [19] compiler_4.5.3       farver_2.1.2         stringr_1.6.0       
## [22] codetools_0.2-20     htmltools_0.5.9      class_7.3-23        
## [25] sass_0.4.10          yaml_2.3.12          prodlim_2026.03.11  
## [28] pillar_1.11.1        jquerylib_0.1.4      MASS_7.3-65         
## [31] cachem_1.1.0         gower_1.0.2          iterators_1.0.14    
## [34] rpart_4.1.27         foreach_1.5.2        nlme_3.1-168        
## [37] parallelly_1.47.0    lava_1.9.1           tidyselect_1.2.1    
## [40] digest_0.6.39        stringi_1.8.7        future_1.70.0       
## [43] purrr_1.2.2          listenv_0.10.1       labeling_0.4.3      
## [46] splines_4.5.3        fastmap_1.2.0        grid_4.5.3          
## [49] cli_3.6.6            magrittr_2.0.5       survival_3.8-6      
## [52] future.apply_1.20.2  withr_3.0.2          timechange_0.4.0    
## [55] lubridate_1.9.5      rmarkdown_2.31       globals_0.19.1      
## [58] otel_0.2.0           nnet_7.3-20          timeDate_4052.112   
## [61] cellranger_1.1.0     evaluate_1.0.5       knitr_1.51          
## [64] hardhat_1.4.3        rlang_1.2.0          Rcpp_1.1.1          
## [67] glue_1.8.1           ipred_0.9-15         rstudioapi_0.18.0   
## [70] jsonlite_2.0.0       R6_2.6.1             plyr_1.8.9

7.3 Steps to Reproduce

  1. Download the Saudi Employee Attrition Dataset from https://data.mendeley.com/datasets/6z2hty8php/1
  2. Place all three .xlsx files in the GA2_Group17/ folder
  3. Run WQD7001_GA2_Analysis.R — saves model_rf.rds to shiny_app/
  4. Knit WQD7001_GA2_Report.Rmd to HTML
  5. Launch Shiny app: shiny::runApp("shiny_app/")

8 Deployment of Data Product

The Shiny application Employee Attrition Predictor provides:

  1. Attrition Risk Score — probability of departure (0–100%)
  2. Risk Category — Low / Medium / High
  3. Risk Factor Panel — conditions driving this employee’s risk
  4. HR Recommendations — tailored, actionable retention interventions
  5. SHAP Feature Chart — visual explanation of top contributing factors
  6. Dataset Overview Tab — portfolio-level EDA charts for HR teams

Built with: R + Shiny + shinydashboard + ggplot2 + caret + randomForest


9 Insights and Conclusion

9.1 Conclusions

XGBoost achieves the best predictive performance (F1 = 0.767, AUC = 0.876). Attrition in the Saudi private sector is primarily driven by organisational and psychological factors — perceived job instability, job dissatisfaction, psychological exhaustion, and poor work-life balance — rather than demographic characteristics. This shifts the HR response from profiling employees to improving the work environment.

The 43.2% attrition rate far exceeds global benchmarks and poses a macro-economic risk to Saudi Vision 2030’s workforce goals, particularly in the Communications/IT (75%) and Energy (72.3%) sectors.

9.2 Recommendations for HR

  1. Job security programmes — long-term contracts, transparent career paths
  2. Work-life balance policies — flexible hours, workload caps, WFH options
  3. Mental health support — EAP access, manager burnout training
  4. Mid-career retention — targeted programmes for the 5–10 year cohort (53.2% attrition)
  5. Sector-specific intervention — Comms/IT and Energy require dedicated HR taskforces

9.3 SDG 8 Alignment

This project directly supports SDG 8: Decent Work and Economic Growth by providing data-driven tools to reduce workforce instability, improve employment conditions, and support sustained, inclusive economic growth in the Saudi private sector.


10 References

  1. Alqahtani, H., Almagrabi, H., & Alharbi, A. (2025). Saudi Employee Attrition Dataset. Mendeley Data. https://doi.org/10.17632/6z2hty8php.1
  2. Boushey, H., & Glynn, S. J. (2012). There are significant business costs to replacing employees. Center for American Progress.
  3. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
  4. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. KDD ’16, 785–794.
  5. Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5), 1–26.
  6. Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. NeurIPS 30.
  7. United Nations (2015). Sustainable Development Goal 8. https://sdgs.un.org/goals/goal8
  8. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning (2nd ed.). Springer.