Pendahuluan

Dataset: https://www.kaggle.com/datasets/zkyfauzi/work-wellbeing-dataset

Variabel Dependen (Y): Burnout_Risk — Ordinal dengan tiga kategori: Low < Medium < High

Variabel Independen (X):

Variabel Tipe Keterangan
Work_Location Kategorik (Home / Office / Coworking) Lokasi kerja utama: Home, Office, Coworking
Seniority_Level Kategorik (Junior / Mid / Senior) Level jabatan: Junior, Mid, Senior
Avg_Working_Hours Numerik Rata-rata jam kerja per hari (rentang dibatasi 6.0-13.0)
Meeting_Intensity Numerik Rata-rata jam meeting/call per hari (0-10)
Sentiment_Score Numerik Skor sentimen pada rentang -1.0 sampai 1.0

Tujuan: Memodelkan faktor-faktor yang memengaruhi tingkat risiko burnout karyawan menggunakan Ordinal Logistic Regression.


Install & Load Package

# install.packages(c("MASS", "brant", "car", "dplyr", "tidyr", "ggplot2"), quiet = TRUE)

library(MASS)
library(brant)
library(car)
library(dplyr)
library(tidyr)
library(ggplot2)

Load Data

df <- read.csv("work_wellbeing_dataset.csv")
head(df, 10)
Employee_ID Work_Location Avg_Working_Hours Meeting_Intensity Internet_Reliability Seniority_Level Work_Life_Balance Daily_Mood_Note Sentiment_Score Burnout_Risk
1 Home 8.14 2 Fair Junior 4 Fokus bagus hari ini, meeting berjalan efisien dan tugas utama tuntas. 0.685 Low
2 Office 9.66 4 Good Mid 2 Hari cukup padat, beberapa meeting memecah fokus tapi masih terkendali. Fokus tambahan: Organized leadingedge info-mediaries. -0.463 Medium
3 Home 8.21 1 Excellent Senior 2 Progress ada, walau ritme kerja naik turun sepanjang hari. -0.226 Low
4 Home 9.71 4 Good Mid 5 Hari produktif, pekerjaan selesai tepat waktu dan energi masih stabil. 0.897 Low
5 Home 11.26 5 Excellent Mid 4 Progress ada, walau ritme kerja naik turun sepanjang hari. Fokus tambahan: Persistent high-level Graphical User Interface. 0.365 Medium
6 Office 9.44 3 Excellent Mid 4 Hari produktif, pekerjaan selesai tepat waktu dan energi masih stabil. 0.797 Low
7 Coworking 10.68 4 Good Senior 1 Koneksi dan tekanan deadline membuat hari ini cukup berat. -1.000 High
8 Home 8.12 1 Fair Junior 4 Ritme kerja nyaman, bisa selesai tanpa lembur berlebihan. 0.675 Low
9 Office 11.72 3 Fair Mid 4 Tugas selesai sebagian, perlu atur ulang prioritas untuk besok. 0.304 Low
10 Home 10.46 5 Good Senior 5 Hari produktif, pekerjaan selesai tepat waktu dan energi masih stabil. 0.612 Medium

Info data

str(df)
## 'data.frame':    30000 obs. of  10 variables:
##  $ Employee_ID         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Work_Location       : chr  "Home" "Office" "Home" "Home" ...
##  $ Avg_Working_Hours   : num  8.14 9.66 8.21 9.71 11.26 ...
##  $ Meeting_Intensity   : int  2 4 1 4 5 3 4 1 3 5 ...
##  $ Internet_Reliability: chr  "Fair" "Good" "Excellent" "Good" ...
##  $ Seniority_Level     : chr  "Junior" "Mid" "Senior" "Mid" ...
##  $ Work_Life_Balance   : int  4 2 2 5 4 4 1 4 4 5 ...
##  $ Daily_Mood_Note     : chr  "Fokus bagus hari ini, meeting berjalan efisien dan tugas utama tuntas." "Hari cukup padat, beberapa meeting memecah fokus tapi masih terkendali. Fokus tambahan: Organized leadingedge info-mediaries." "Progress ada, walau ritme kerja naik turun sepanjang hari." "Hari produktif, pekerjaan selesai tepat waktu dan energi masih stabil." ...
##  $ Sentiment_Score     : num  0.685 -0.463 -0.226 0.897 0.365 0.797 -1 0.675 0.304 0.612 ...
##  $ Burnout_Risk        : chr  "Low" "Medium" "Low" "Low" ...

Statistika Deskriptif

summary(df)
##   Employee_ID    Work_Location      Avg_Working_Hours Meeting_Intensity
##  Min.   :    1   Length:30000       Min.   : 6.000    Min.   : 1.000   
##  1st Qu.: 7501   Class :character   1st Qu.: 8.680    1st Qu.: 2.000   
##  Median :15000   Mode  :character   Median : 9.570    Median : 3.000   
##  Mean   :15000                      Mean   : 9.586    Mean   : 3.631   
##  3rd Qu.:22500                      3rd Qu.:10.470    3rd Qu.: 5.000   
##  Max.   :30000                      Max.   :13.000    Max.   :10.000   
##  Internet_Reliability Seniority_Level    Work_Life_Balance Daily_Mood_Note   
##  Length:30000         Length:30000       Min.   :1.00      Length:30000      
##  Class :character     Class :character   1st Qu.:3.00      Class :character  
##  Mode  :character     Mode  :character   Median :4.00      Mode  :character  
##                                          Mean   :3.61                        
##                                          3rd Qu.:4.00                        
##                                          Max.   :5.00                        
##  Sentiment_Score   Burnout_Risk      
##  Min.   :-1.0000   Length:30000      
##  1st Qu.:-0.2040   Class :character  
##  Median : 0.5125   Mode  :character  
##  Mean   : 0.3042                     
##  3rd Qu.: 0.8410                     
##  Max.   : 1.0000

Cek Missing Values

colSums(is.na(df))
##          Employee_ID        Work_Location    Avg_Working_Hours 
##                    0                    0                    0 
##    Meeting_Intensity Internet_Reliability      Seniority_Level 
##                    0                    0                    0 
##    Work_Life_Balance      Daily_Mood_Note      Sentiment_Score 
##                    0                    0                    0 
##         Burnout_Risk 
##                    0

Eksplorasi Data

Distribusi Variabel Numerik

df %>%
  select(Avg_Working_Hours, Meeting_Intensity, Sentiment_Score) %>%
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = value)) +
  geom_histogram(bins = 20, fill = "steelblue", color = "white") +
  facet_wrap(~name, scales = "free") +
  theme_minimal(base_size = 12) +
  labs(title = "Distribusi Variabel Numerik", x = NULL, y = "Frekuensi")

Distribusi Variabel Kategorik

df %>%
  select(Work_Location, Seniority_Level, Burnout_Risk) %>%
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = value)) +
  geom_bar(fill = "orange", color = "white") +
  facet_wrap(~name, scales = "free") +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Distribusi Variabel Kategorik", x = NULL, y = "Frekuensi")


Persiapan Data

Pilih Variabel yang digunakan

ordinaldf <- df[, c("Burnout_Risk", "Work_Location", "Seniority_Level",
                    "Avg_Working_Hours", "Meeting_Intensity", "Sentiment_Score")]

Konversi Y ke ordered factor

ordinaldf$Burnout_Risk <- factor(
  ordinaldf$Burnout_Risk,
  levels  = c("Low", "Medium", "High"),
  ordered = TRUE
)

Konversi variabel kategorik ke faktor

ordinaldf$Work_Location <- factor(
  ordinaldf$Work_Location,
  levels = c("Home", "Office", "Coworking")
)

ordinaldf$Seniority_Level <- factor(
  ordinaldf$Seniority_Level,
  levels = c("Junior", "Mid", "Senior")
)

cat("Dimensi data:", dim(ordinaldf), "\n")
## Dimensi data: 30000 6
head(ordinaldf)
Burnout_Risk Work_Location Seniority_Level Avg_Working_Hours Meeting_Intensity Sentiment_Score
Low Home Junior 8.14 2 0.685
Medium Office Mid 9.66 4 -0.463
Low Home Senior 8.21 1 -0.226
Low Home Mid 9.71 4 0.897
Medium Home Mid 11.26 5 0.365
Low Office Mid 9.44 3 0.797

Uji Asumsi

Asumsi 1: Variabel Dependen Bersifat Ordinal

cat("Class:", class(ordinaldf$Burnout_Risk), "\n")
## Class: ordered factor
cat("Is ordered:", is.ordered(ordinaldf$Burnout_Risk), "\n")
## Is ordered: TRUE
cat("Levels:", paste(levels(ordinaldf$Burnout_Risk), collapse = " < "), "\n\n")
## Levels: Low < Medium < High
tbl <- table(ordinaldf$Burnout_Risk)
data.frame(
  Kategori  = names(tbl),
  Frekuensi = as.integer(tbl),
  Proporsi  = paste0(round(prop.table(tbl) * 100, 2), "%")
)
Kategori Frekuensi Proporsi
Low 15584 51.95%
Medium 7581 25.27%
High 6835 22.78%

Terpenuhi Burnout_Risk adalah ordered factor dengan urutan Low < Medium < High.

Asumsi 2: Independensi Observasi

cat("Jumlah baris     :", nrow(ordinaldf), "\n")
## Jumlah baris     : 30000
cat("Jumlah ID unik   :", length(unique(df$Employee_ID)), "\n")
## Jumlah ID unik   : 30000
cat("Independen?      :", nrow(ordinaldf) == length(unique(df$Employee_ID)), "\n")
## Independen?      : TRUE

Terpenuhi — Setiap baris merepresentasikan satu karyawan unik.

Asumsi 3: Tidak Ada Multikolinearitas

Matriks Korelasi variabel numerik

round(cor(ordinaldf[, c("Avg_Working_Hours", "Meeting_Intensity", "Sentiment_Score")]), 4)
##                   Avg_Working_Hours Meeting_Intensity Sentiment_Score
## Avg_Working_Hours            1.0000             0.520         -0.3115
## Meeting_Intensity            0.5200             1.000         -0.1740
## Sentiment_Score             -0.3115            -0.174          1.0000

VIF

ordinaldf$Y_num <- as.numeric(ordinaldf$Burnout_Risk)

model_ols <- lm(
  Y_num ~ Work_Location + Seniority_Level +
          Avg_Working_Hours + Meeting_Intensity +
          Sentiment_Score,
  data = ordinaldf
)

round(vif(model_ols), 3)
##                    GVIF Df GVIF^(1/(2*Df))
## Work_Location     1.031  2           1.008
## Seniority_Level   1.324  2           1.073
## Avg_Working_Hours 1.596  1           1.263
## Meeting_Intensity 1.494  1           1.222
## Sentiment_Score   1.115  1           1.056
ordinaldf$Y_num <- NULL

Terpenuhi — Semua |r| < 0.8 dan VIF < 10.

Asumsi 4: Tidak Ada Outlier Ekstrem

detect_outlier_iqr <- function(x, varname) {
  Q1    <- quantile(x, 0.25)
  Q3    <- quantile(x, 0.75)
  IQR_v <- Q3 - Q1
  lo    <- Q1 - 1.5 * IQR_v
  hi    <- Q3 + 1.5 * IQR_v
  idx   <- x < lo | x > hi
  data.frame(
    Variabel      = varname,
    Q1            = round(Q1, 3),
    Q3            = round(Q3, 3),
    Batas_Bawah   = round(lo, 3),
    Batas_Atas    = round(hi, 3),
    N_Outlier     = sum(idx),
    Pct_Outlier   = paste0(round(mean(idx) * 100, 2), "%")
  )
}

do.call(rbind, list(
  detect_outlier_iqr(ordinaldf$Avg_Working_Hours, "Avg_Working_Hours"),
  detect_outlier_iqr(ordinaldf$Meeting_Intensity, "Meeting_Intensity"),
  detect_outlier_iqr(ordinaldf$Sentiment_Score,   "Sentiment_Score")
))
Variabel Q1 Q3 Batas_Bawah Batas_Atas N_Outlier Pct_Outlier
25% Avg_Working_Hours 8.680 10.470 5.995 13.155 0 0%
25%1 Meeting_Intensity 2.000 5.000 -2.500 9.500 75 0.25%
25%2 Sentiment_Score -0.204 0.841 -1.771 2.409 0 0%
par(mfrow = c(1, 3))
boxplot(ordinaldf$Avg_Working_Hours, main = "Avg Working Hours",
        col = "lightblue",   ylab = "Jam")
boxplot(ordinaldf$Meeting_Intensity, main = "Meeting Intensity",
        col = "lightyellow", ylab = "Skor")
boxplot(ordinaldf$Sentiment_Score,   main = "Sentiment Score",
        col = "lightgreen",  ylab = "Skor")

par(mfrow = c(1, 1))
ordinaldf$Y_num <- as.numeric(ordinaldf$Burnout_Risk)

model_ols2 <- lm(
  Y_num ~ Work_Location + Seniority_Level +
          Avg_Working_Hours + Meeting_Intensity +
          Sentiment_Score,
  data = ordinaldf
)
n         <- nrow(ordinaldf)
cooksd    <- cooks.distance(model_ols2)
threshold <- 4 / n

data.frame(
  Threshold       = round(threshold, 6),
  N_Berpengaruh   = sum(cooksd > threshold, na.rm = TRUE),
  Pct_Berpengaruh = paste0(round(mean(cooksd > threshold, na.rm = TRUE) * 100, 2), "%")
)
Threshold N_Berpengaruh Pct_Berpengaruh
0.000133 1567 5.22%
ordinaldf$Y_num <- NULL

Terpenuhi — Outlier IQR wajar.


Estimasi Model

Fit Model Ordinal Logistic Regression

model_polr <- polr(
  Burnout_Risk ~ Work_Location + Seniority_Level +
                 Avg_Working_Hours + Meeting_Intensity +
                 Sentiment_Score,
  data   = ordinaldf,
  method = "logistic",
  Hess   = TRUE
)

Summary Model

summary(model_polr)
## Call:
## polr(formula = Burnout_Risk ~ Work_Location + Seniority_Level + 
##     Avg_Working_Hours + Meeting_Intensity + Sentiment_Score, 
##     data = ordinaldf, Hess = TRUE, method = "logistic")
## 
## Coefficients:
##                            Value Std. Error  t value
## Work_LocationOffice    -0.006156   0.030520  -0.2017
## Work_LocationCoworking -0.006986   0.034416  -0.2030
## Seniority_LevelMid      0.032234   0.030816   1.0460
## Seniority_LevelSenior   0.071556   0.038569   1.8553
## Avg_Working_Hours       0.630927   0.013819  45.6564
## Meeting_Intensity       0.160553   0.007213  22.2595
## Sentiment_Score        -2.298132   0.024485 -93.8588
## 
## Intercepts:
##             Value    Std. Error t value 
## Low|Medium    5.8553   0.1258    46.5411
## Medium|High   7.9673   0.1309    60.8603
## 
## Residual Deviance: 41910.35 
## AIC: 41928.35
model_null <- polr(
  Burnout_Risk ~ 1,
  data   = ordinaldf,
  method = "logistic",
  Hess   = TRUE
)

Uji Serentak — Likelihood Ratio Test

H₀: Semua koefisien prediktor = 0
H₁: Minimal satu koefisien ≠ 0
Tolak H₀ jika p-value < 0.05

LL_full <- as.numeric(logLik(model_polr))
LL_null <- as.numeric(logLik(model_null))
G2      <- -2 * (LL_null - LL_full)
df_lrt  <- length(coef(model_polr))
p_lrt   <- pchisq(G2, df = df_lrt, lower.tail = FALSE)

data.frame(
  Statistik      = round(G2, 4),
  df             = df_lrt,
  p_value        = format(p_lrt, scientific = TRUE, digits = 4),
  Keputusan      = ifelse(p_lrt < 0.05, "TOLAK H0", "GAGAL TOLAK H0"),
  Kesimpulan     = ifelse(p_lrt < 0.05,
    "Model signifikan secara serentak",
    "Model tidak signifikan secara serentak")
)
Statistik df p_value Keputusan Kesimpulan
19579.19 7 0e+00 TOLAK H0 Model signifikan secara serentak

Uji Parsial — Wald Test

H₀: Koefisien variabel ke-j = 0
H₁: Koefisien variabel ke-j ≠ 0
Tolak H₀ jika p-value < 0.05

coef_tbl <- coef(summary(model_polr))
z_vals   <- coef_tbl[, "t value"]
p_vals   <- 2 * pnorm(abs(z_vals), lower.tail = FALSE)

wald_result <- data.frame(
  Variabel  = rownames(coef_tbl),
  Koefisien = round(coef_tbl[, "Value"],      4),
  Std_Error = round(coef_tbl[, "Std. Error"], 4),
  z_value   = round(z_vals, 4),
  p_value   = round(p_vals, 6),
  Sig       = case_when(
    p_vals < 0.001 ~ "***",
    p_vals < 0.01  ~ "**",
    p_vals < 0.05  ~ "*",
    p_vals < 0.1   ~ ".",
    TRUE           ~ "ns"
  )
)
wald_result
Variabel Koefisien Std_Error z_value p_value Sig
Work_LocationOffice Work_LocationOffice -0.0062 0.0305 -0.2017 0.840135 ns
Work_LocationCoworking Work_LocationCoworking -0.0070 0.0344 -0.2030 0.839158 ns
Seniority_LevelMid Seniority_LevelMid 0.0322 0.0308 1.0460 0.295561 ns
Seniority_LevelSenior Seniority_LevelSenior 0.0716 0.0386 1.8553 0.063555 .
Avg_Working_Hours Avg_Working_Hours 0.6309 0.0138 45.6564 0.000000 ***
Meeting_Intensity Meeting_Intensity 0.1606 0.0072 22.2595 0.000000 ***
Sentiment_Score Sentiment_Score -2.2981 0.0245 -93.8588 0.000000 ***
Low|Medium Low|Medium 5.8553 0.1258 46.5411 0.000000 ***
Medium|High Medium|High 7.9673 0.1309 60.8603 0.000000 ***

Goodness of Fit

McFadden R² Interpretasi
0.00 – 0.10 Lemah
0.10 – 0.20 Cukup
0.20 – 0.40 Baik
> 0.40 Sangat Baik
n           <- nrow(ordinaldf)
r2_mcfadden <- 1 - (LL_full / LL_null)
r2_cox      <- 1 - exp((2 / n) * (LL_null - LL_full))
r2_nag      <- r2_cox / (1 - exp((2 / n) * LL_null))

data.frame(
  Metrik = c("McFadden Pseudo R²", "Cox & Snell R²", "Nagelkerke R²",
             "AIC Full Model", "AIC Null Model", "Residual Deviance"),
  Nilai  = round(c(r2_mcfadden, r2_cox, r2_nag,
                   AIC(model_polr), AIC(model_null),
                   model_polr$deviance), 4)
)
Metrik Nilai
McFadden Pseudo R² 0.3184
Cox & Snell R² 0.4793
Nagelkerke R² 0.5502
AIC Full Model 41928.3535
AIC Null Model 61493.5434
Residual Deviance 41910.3535

Prediksi

Prediksi Kelas & Probabilitas

ordinaldf$pred_class <- predict(model_polr, newdata = ordinaldf, type = "class")

head(data.frame(
  Aktual   = ordinaldf$Burnout_Risk,
  Prediksi = ordinaldf$pred_class
), 10)
Aktual Prediksi
Low Low
Medium High
Low Low
Low Low
Medium Medium
Low Low
High High
Low Low
Low Medium
Medium Low
prob_pred <- predict(model_polr, newdata = ordinaldf, type = "probs")
head(round(prob_pred, 4), 10)
##       Low Medium   High
## 1  0.8779 0.1056 0.0166
## 2  0.1222 0.4128 0.4650
## 3  0.4810 0.4035 0.1155
## 4  0.7532 0.2086 0.0381
## 5  0.2235 0.4806 0.2959
## 6  0.7726 0.1930 0.0344
## 7  0.0201 0.1247 0.8552
## 8  0.8931 0.0926 0.0143
## 9  0.2061 0.4760 0.3178
## 10 0.4472 0.4227 0.1301

Evaluasi Model

Confusion Matrix

cm <- table(Aktual = ordinaldf$Burnout_Risk, Prediksi = ordinaldf$pred_class)
print(cm)
##         Prediksi
## Aktual     Low Medium  High
##   Low    13532   1807   245
##   Medium  4078   2134  1369
##   High     662   1123  5050

Akurasi & Metrik Per-Kelas

akurasi <- sum(diag(cm)) / sum(cm)
cat(sprintf("Akurasi Model: %.2f%%\n\n", akurasi * 100))
## Akurasi Model: 69.05%
metrics <- do.call(rbind, lapply(rownames(cm), function(kelas) {
  tp   <- cm[kelas, kelas]
  fp   <- sum(cm[, kelas]) - tp
  fn   <- sum(cm[kelas, ]) - tp
  prec <- ifelse((tp + fp) > 0, tp / (tp + fp), NA)
  rec  <- ifelse((tp + fn) > 0, tp / (tp + fn), NA)
  data.frame(Kelas = kelas, Precision = round(prec, 3), Recall = round(rec, 3))
}))
print(metrics)
##    Kelas Precision Recall
## 1    Low     0.741  0.868
## 2 Medium     0.421  0.281
## 3   High     0.758  0.739

Visualisasi Confusion Matrix

cm_df <- as.data.frame(cm)
ggplot(cm_df, aes(x = Prediksi, y = Aktual, fill = Freq)) +
  geom_tile(color = "white") +
  geom_text(aes(label = Freq), size = 7, fontface = "bold") +
  scale_fill_gradient(low = "#f0f7ff", high = "#2171b5") +
  theme_minimal(base_size = 13) +
  labs(title = "Confusion Matrix — Ordinal Logistic Regression",
       x = "Prediksi", y = "Aktual", fill = "Jumlah")


Interpretasi

Odds Ratio

or_vals <- exp(coef(model_polr))
se_vals <- sqrt(diag(vcov(model_polr))[names(coef(model_polr))])

or_tbl <- data.frame(
  Variabel   = names(or_vals),
  OR         = round(or_vals, 4),
  CI_Lower   = round(exp(coef(model_polr) - 1.96 * se_vals), 4),
  CI_Upper   = round(exp(coef(model_polr) + 1.96 * se_vals), 4)
)
or_tbl
Variabel OR CI_Lower CI_Upper
Work_LocationOffice Work_LocationOffice 0.9939 0.9362 1.0551
Work_LocationCoworking Work_LocationCoworking 0.9930 0.9283 1.0623
Seniority_LevelMid Seniority_LevelMid 1.0328 0.9722 1.0971
Seniority_LevelSenior Seniority_LevelSenior 1.0742 0.9960 1.1585
Avg_Working_Hours Avg_Working_Hours 1.8794 1.8291 1.9309
Meeting_Intensity Meeting_Intensity 1.1742 1.1577 1.1909
Sentiment_Score Sentiment_Score 0.1004 0.0957 0.1054

Forest Plot Odds Ratio

ggplot(or_tbl, aes(x = OR, y = reorder(Variabel, OR))) +
  geom_point(size = 3.5, color = "steelblue") +
  geom_errorbarh(aes(xmin = CI_Lower, xmax = CI_Upper), height = 0.25, color = "steelblue") +
  geom_vline(xintercept = 1, linetype = "dashed", color = "red", linewidth = 0.8) +
  theme_minimal(base_size = 12) +
  labs(title = "Odds Ratio dengan 95% Confidence Interval",
       subtitle = "Garis merah = OR 1 (tidak ada efek)",
       x = "Odds Ratio", y = "Variabel")

Pengaruh Avg Working Hours terhadap Probabilitas Burnout Risk

# Buat grid nilai Avg_Working_Hours
grid <- data.frame(
  Avg_Working_Hours = seq(
    min(ordinaldf$Avg_Working_Hours),
    max(ordinaldf$Avg_Working_Hours),
    length.out = 100
  ),
  Meeting_Intensity = mean(ordinaldf$Meeting_Intensity),
  Sentiment_Score   = mean(ordinaldf$Sentiment_Score),
  Work_Location     = factor("Office", levels = c("Home", "Office", "Coworking")),
  Seniority_Level   = factor("Mid", levels = c("Junior", "Mid", "Senior"))
)

# Prediksi probabilitas
prob <- predict(model_polr, newdata = grid, type = "probs")

# Gabungkan ke data
prob_df <- cbind(grid, prob)

# Ubah ke long format
prob_long <- prob_df %>%
  pivot_longer(cols = c("Low", "Medium", "High"),
               names_to = "Kategori",
               values_to = "Probabilitas")

# Plot
ggplot(prob_long, aes(x = Avg_Working_Hours, y = Probabilitas, color = Kategori)) +
  geom_line(size = 1.2) +
  theme_minimal(base_size = 13) +
  labs(
    title = "Pengaruh Jam Kerja terhadap Probabilitas Burnout",
    x = "Avg Working Hours",
    y = "Probabilitas",
    color = "Kategori"
  )