Introduction

Background

Student success is a key indicator of the quality of higher education. Understanding the factors that determine whether a student graduates (Graduate), is still enrolled (Enrolled), or leaves before completing the program (Dropout) helps institutions design well-targeted interventions.

Multinomial Logistic Regression (MLR) is used because the dependent variable (academic status) has more than two nominal categories. Unlike binary logistic regression, MLR models the probabilities of all categories simultaneously, with one category set as the reference (base category), here Graduate.
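
The model can be sketched in symbols as follows. The notation is introduced here for illustration and is not taken from the source: x_i is the predictor vector of student i, and each non-reference category k has its own intercept and coefficient vector.

```latex
\log\frac{P(Y_i = k \mid \mathbf{x}_i)}{P(Y_i = \text{Graduate} \mid \mathbf{x}_i)}
  = \beta_{k0} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_k,
  \qquad k \in \{\text{Enrolled}, \text{Dropout}\}

P(Y_i = k \mid \mathbf{x}_i)
  = \frac{\exp(\beta_{k0} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_k)}
         {1 + \sum_{j \in \{\text{Enrolled}, \text{Dropout}\}} \exp(\beta_{j0} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_j)},
  \qquad
P(Y_i = \text{Graduate} \mid \mathbf{x}_i)
  = \frac{1}{1 + \sum_{j} \exp(\beta_{j0} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_j)}
```

The reference category has no coefficients of its own, which is why the fitted model reports two coefficient rows rather than three.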

Dataset Description

The dataset is Student Academic Performance from the Polytechnic Institute of Portalegre, Portugal. It covers students' demographic, academic, and economic information, recorded at enrollment and during their studies.

Dependent variable: Target (Graduate / Enrolled / Dropout)

Predictor variables include:

  • Demographic: marital status, age at enrollment, gender, nationality, international status
  • Academic: previous qualification grade, admission grade, curricular units in semesters 1 and 2 (credited, enrolled, evaluations, approved, grade)
  • Socioeconomic: debtor status, scholarship, whether tuition fees are up to date, unemployment rate, inflation rate, GDP
  • Family background: father's and mother's qualification and occupation

Analysis Objectives

  1. Explore the data to understand the distribution of each variable and the relationships among variables.
  2. Check the assumptions required before fitting the MLR model.
  3. Build and evaluate a Multinomial Logistic Regression model.
  4. Interpret the coefficients, odds ratios, and average marginal effects (AME).
  5. Evaluate model performance on the test data using a confusion matrix and evaluation metrics.

Preparation

Installing and Loading Packages

# Extend the download timeout
options(timeout = 300)

# Install a package automatically if it is not yet available
install_if_missing <- function(pkg) {
  if (!require(pkg, character.only = TRUE, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org", dependencies = TRUE)
    library(pkg, character.only = TRUE)
  }
}

packages_needed <- c(
  "nnet", "car", "caret", "ggplot2", "dplyr", "tidyr",
  "knitr", "kableExtra", "gridExtra", "RColorBrewer",
  "marginaleffects", "broom", "scales", "ggcorrplot", "reshape2"
)

invisible(sapply(packages_needed, install_if_missing))

suppressPackageStartupMessages({
  library(nnet)             # multinom() - Multinomial Logistic Regression
  library(car)              # vif() - Variance Inflation Factor
  library(caret)            # confusionMatrix()
  library(ggplot2)          # visualization
  library(dplyr)            # data manipulation
  library(tidyr)            # data reshaping
  library(knitr)            # kable()
  library(kableExtra)       # kable styling
  library(gridExtra)        # grid.arrange()
  library(RColorBrewer)     # color palettes
  library(marginaleffects)  # avg_slopes() - Average Marginal Effects
  library(broom)            # tidy() - tidy model output
  library(scales)           # percent_format()
  library(ggcorrplot)       # correlation heatmap
  library(reshape2)         # melt()
})

Loading and Preparing the Data

# Read the data (make sure data.csv is in the working directory)
df_raw <- read.csv("data.csv", sep = ";", header = TRUE,
                   stringsAsFactors = FALSE, check.names = FALSE)

# Strip extra whitespace from column names
colnames(df_raw) <- trimws(colnames(df_raw))

# Show the dimensions and the initial distribution
cat("Dimensi Data:", nrow(df_raw), "baris x", ncol(df_raw), "kolom\n")
## Dimensi Data: 4424 baris x 37 kolom
cat("\nDistribusi Target (sebelum preprocessing):\n")
## 
## Distribusi Target (sebelum preprocessing):
print(table(df_raw$Target))
## 
##  Dropout Enrolled Graduate 
##     1421      794     2209
# Convert Target to a factor whose first level, "Graduate", is the reference
df_raw$Target <- factor(df_raw$Target,
                        levels = c("Graduate", "Enrolled", "Dropout"))

# Convert all predictor columns to numeric
# (coded categorical columns such as Course are thereby treated as numeric)
predictor_cols <- setdiff(colnames(df_raw), "Target")
df <- df_raw
for (col in predictor_cols) {
  df[[col]] <- as.numeric(df[[col]])
}

# Check for missing values
cat("\nJumlah Missing Value per kolom:\n")
## 
## Jumlah Missing Value per kolom:
mv <- colSums(is.na(df))
print(mv[mv > 0])
## named numeric(0)
cat("Total missing value:", sum(is.na(df)), "\n")
## Total missing value: 0
# Drop rows with missing values
df <- na.omit(df)
cat("Jumlah data setelah hapus NA:", nrow(df), "\n")
## Jumlah data setelah hapus NA: 4424

Interpretation: The dataset contains 4424 students and 37 columns (36 predictors plus Target). The initial distribution across the three target classes is 2209 Graduate, 1421 Dropout, and 794 Enrolled. No missing values were found, so na.omit() leaves all 4424 rows in place and the data are ready for analysis. The reference category is set to "Graduate", so every model coefficient expresses the log-odds of Enrolled or Dropout relative to Graduate.
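
How the level order determines the reference category can be seen in a minimal sketch. The toy vector `status` and the object names are hypothetical and used only for illustration; the real data are not involved.

```r
# Toy status vector (hypothetical values, not the real data)
status <- c("Dropout", "Graduate", "Enrolled", "Graduate", "Dropout")

# Default: levels are sorted alphabetically, so "Dropout" would be the reference
f_default <- factor(status)
print(levels(f_default))

# Explicit level order, as in the report: "Graduate" becomes the reference
f_ref <- factor(status, levels = c("Graduate", "Enrolled", "Dropout"))
print(levels(f_ref))

# relevel() moves one level to the front of an existing factor
f_releveled <- relevel(f_default, ref = "Graduate")
print(levels(f_releveled))
```

Setting the levels explicitly, as the report does, also fixes the order in which the classes appear in tables and plots.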


Exploratory Data Analysis (EDA)

Descriptive Statistics

# Numeric variables: grades, age, curricular-unit counts, and macroeconomic rates
numeric_vars <- c("Previous qualification (grade)", "Admission grade",
                  "Age at enrollment",
                  "Curricular units 1st sem (credited)",
                  "Curricular units 1st sem (enrolled)",
                  "Curricular units 1st sem (evaluations)",
                  "Curricular units 1st sem (approved)",
                  "Curricular units 1st sem (grade)",
                  "Curricular units 1st sem (without evaluations)",
                  "Curricular units 2nd sem (credited)",
                  "Curricular units 2nd sem (enrolled)",
                  "Curricular units 2nd sem (evaluations)",
                  "Curricular units 2nd sem (approved)",
                  "Curricular units 2nd sem (grade)",
                  "Curricular units 2nd sem (without evaluations)",
                  "Unemployment rate", "Inflation rate", "GDP")

desc_stats <- data.frame(
  Variabel = numeric_vars,
  Min      = round(sapply(df[, numeric_vars], min,    na.rm = TRUE), 2),
  Q1       = round(sapply(df[, numeric_vars], quantile, 0.25, na.rm = TRUE), 2),
  Median   = round(sapply(df[, numeric_vars], median,  na.rm = TRUE), 2),
  Mean     = round(sapply(df[, numeric_vars], mean,    na.rm = TRUE), 2),
  Q3       = round(sapply(df[, numeric_vars], quantile, 0.75, na.rm = TRUE), 2),
  Max      = round(sapply(df[, numeric_vars], max,     na.rm = TRUE), 2),
  SD       = round(sapply(df[, numeric_vars], sd,      na.rm = TRUE), 3)
)
rownames(desc_stats) <- NULL

kable(desc_stats,
      caption = "Tabel 1. Statistika Deskriptif Variabel Numerik",
      align   = c("l", rep("r", 7))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 12) %>%
  column_spec(1, bold = TRUE)
Tabel 1. Statistika Deskriptif Variabel Numerik
Variabel Min Q1 Median Mean Q3 Max SD
Previous qualification (grade) 95.00 125.00 133.10 132.61 140.00 190.00 13.188
Admission grade 95.00 117.90 126.10 126.98 134.80 190.00 14.482
Age at enrollment 17.00 19.00 20.00 23.27 25.00 70.00 7.588
Curricular units 1st sem (credited) 0.00 0.00 0.00 0.71 0.00 20.00 2.361
Curricular units 1st sem (enrolled) 0.00 5.00 6.00 6.27 7.00 26.00 2.480
Curricular units 1st sem (evaluations) 0.00 6.00 8.00 8.30 10.00 45.00 4.179
Curricular units 1st sem (approved) 0.00 3.00 5.00 4.71 6.00 26.00 3.094
Curricular units 1st sem (grade) 0.00 11.00 12.29 10.64 13.40 18.88 4.844
Curricular units 1st sem (without evaluations) 0.00 0.00 0.00 0.14 0.00 12.00 0.691
Curricular units 2nd sem (credited) 0.00 0.00 0.00 0.54 0.00 19.00 1.919
Curricular units 2nd sem (enrolled) 0.00 5.00 6.00 6.23 7.00 23.00 2.196
Curricular units 2nd sem (evaluations) 0.00 6.00 8.00 8.06 10.00 33.00 3.948
Curricular units 2nd sem (approved) 0.00 2.00 5.00 4.44 6.00 20.00 3.015
Curricular units 2nd sem (grade) 0.00 10.75 12.20 10.23 13.33 18.57 5.211
Curricular units 2nd sem (without evaluations) 0.00 0.00 0.00 0.15 0.00 12.00 0.754
Unemployment rate 7.60 9.40 11.10 11.57 13.90 16.20 2.664
Inflation rate -0.80 0.30 1.40 1.23 2.60 3.70 1.383
GDP -4.06 -1.70 0.32 0.00 1.79 3.51 2.270

Interpretation: Table 1 summarizes the distribution of the eighteen numeric variables. Key observations:

  • Previous qualification (grade) and Admission grade both range from 95 to 190, indicating substantial variation in students' academic preparation at entry.
  • Age at enrollment has a median of 20 and a mean of 23.27, with a maximum of 70, which points to a right-skewed distribution with a tail of older students.
  • The approved curricular units in semesters 1 and 2 vary widely (from 0 to 26 and from 0 to 20, respectively). Because passing units relates directly to study progress, they are plausible candidates for the strongest predictors, which the later sections examine.
  • The macroeconomic variables (Unemployment rate, Inflation rate, GDP) reflect external conditions that may influence a student's decision to continue studying.
  • For several variables the SD is large relative to the mean (for example, Curricular units 1st sem (credited) has a mean of 0.71 and an SD of 2.361), indicating high heterogeneity.
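
The remarks about skewness and heterogeneity can be quantified with two simple ratios. The sketch below uses values copied from Table 1 for three variables; the column names `cv` and `skew_hint` are hypothetical, and the mean-minus-median ratio is only a rough indicator, not a formal skewness statistic.

```r
# Values copied from Table 1 for three variables
tab <- data.frame(
  variable = c("Age at enrollment",
               "Curricular units 1st sem (credited)",
               "Admission grade"),
  median = c(20.00, 0.00, 126.10),
  mean   = c(23.27, 0.71, 126.98),
  sd     = c(7.588, 2.361, 14.482)
)

# Coefficient of variation: SD relative to the mean
tab$cv <- tab$sd / tab$mean

# Mean minus median in SD units: positive values hint at a right-skewed distribution
tab$skew_hint <- (tab$mean - tab$median) / tab$sd

print(tab)
```

A coefficient of variation above 1, as for the credited units, means the spread exceeds the average level, which is typical of count variables with many zeros.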

Distribution of Academic Status (Target)

target_dist <- as.data.frame(table(Status = df$Target)) %>%
  mutate(Persentase = round(Freq / sum(Freq) * 100, 2),
         Label = paste0(Status, "\n(", Freq, " | ", Persentase, "%)"))

p_target <- ggplot(target_dist, aes(x = "", y = Freq, fill = Status)) +
  geom_bar(stat = "identity", width = 1, color = "white", linewidth = 0.5) +
  coord_polar("y", start = 0) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  geom_text(aes(label = Label),
            position = position_stack(vjust = 0.5),
            size = 3.5, fontface = "bold", color = "white") +
  labs(title    = "Gambar 1. Distribusi Status Akademik Mahasiswa",
       subtitle = paste("Total:", nrow(df), "mahasiswa"),
       fill     = "Status") +
  theme_void(base_size = 12) +
  theme(legend.position  = "bottom",
        plot.title       = element_text(face = "bold", hjust = 0.5),
        plot.subtitle    = element_text(hjust = 0.5, color = "gray50"))
print(p_target)
Figure 1. Distribution of Student Academic Status

Interpretation: Figure 1 shows a pie chart of the three academic status categories. Graduate is the largest group (2209 students, 49.93%), followed by Dropout (1421, 32.12%) and Enrolled (794, 17.95%). This class imbalance is common in educational data and must be taken into account when evaluating model performance, especially for the minority category, Enrolled. Graduate was chosen as the reference category because it represents the "ideal" outcome against which the other statuses are compared.
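
One practical consequence of the imbalance is that accuracy has a non-trivial floor. The sketch below uses the class counts reported in the Target distribution; the object names are hypothetical.

```r
# Class counts as reported in the Target distribution
counts <- c(Graduate = 2209, Enrolled = 794, Dropout = 1421)

# Class shares in percent
shares <- round(100 * counts / sum(counts), 2)
print(shares)

# Accuracy of a trivial classifier that always predicts the majority class
baseline_acc <- max(counts) / sum(counts)
print(baseline_acc)
```

Any fitted model should be judged against this baseline, and per-class metrics are needed to see how well the smallest class is recovered.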

Age Distribution by Academic Status

p_age <- ggplot(df, aes(x = Target, y = `Age at enrollment`, fill = Target)) +
  geom_boxplot(alpha = 0.75, outlier.shape = 21, outlier.size = 1.5) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  labs(title = "Gambar 2. Distribusi Usia Saat Mendaftar per Status Akademik",
       x = "Status Akademik", y = "Usia Saat Mendaftar") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))
print(p_age)
Figure 2. Distribution of Age at Enrollment by Academic Status

Interpretation: Figure 2 shows age at enrollment by final academic status. Students who drop out tend to have a higher median age than graduates, which suggests that students who enroll later may face more challenges (for example, family or work responsibilities) that affect whether they continue. Graduate and Enrolled students have younger and broadly similar age distributions. Outliers in every category (points beyond the whiskers) reflect students who enrolled at ages far above the typical range.

Distribution of Approved Curricular Units

df_long_sem <- df %>%
  select(Target,
         `Sem1 Approved` = `Curricular units 1st sem (approved)`,
         `Sem2 Approved` = `Curricular units 2nd sem (approved)`) %>%
  pivot_longer(cols = c(`Sem1 Approved`, `Sem2 Approved`),
               names_to = "Semester", values_to = "Approved")

p_approved <- ggplot(df_long_sem, aes(x = Target, y = Approved, fill = Target)) +
  geom_boxplot(alpha = 0.75, outlier.shape = 21, outlier.size = 1) +
  facet_wrap(~ Semester) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  labs(title = "Gambar 3. Unit Kurikuler Lulus per Status Akademik",
       x = "Status Akademik", y = "Jumlah Unit Kurikuler Disetujui") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))
print(p_approved)
Figure 3. Distribution of Approved Curricular Units by Academic Status

Interpretation: Figure 3 reveals a clear difference in the number of approved (passed) curricular units among the three groups. Graduates consistently have more approved units in both semesters, whereas dropouts have far fewer, in many cases close to zero. This suggests that academic performance in the first two semesters is a very strong predictor of final status. Enrolled students fall between the two, consistent with students who are still progressing with a moderate pass rate.
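
The numbers behind a grouped boxplot can be tabulated directly. The sketch below uses a small hypothetical data frame `toy` (not the real dataset) to show the pattern; with the real data, the same calls would be applied to the approved-unit columns.

```r
# Hypothetical toy data (not the real dataset)
toy <- data.frame(
  Target   = factor(rep(c("Graduate", "Enrolled", "Dropout"), each = 3),
                    levels = c("Graduate", "Enrolled", "Dropout")),
  Approved = c(6, 7, 5,  4, 5, 3,  0, 1, 2)
)

# Median and interquartile range of approved units per status
med <- tapply(toy$Approved, toy$Target, median)
iqr <- tapply(toy$Approved, toy$Target, IQR)

print(rbind(median = med, IQR = iqr))
```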

Correlation Heatmap of Numeric Variables

cor_matrix <- cor(df[, numeric_vars], use = "complete.obs")

p_cor <- ggcorrplot(cor_matrix,
                    method     = "square",
                    type       = "lower",
                    lab        = FALSE,
                    colors     = c("#F44336", "white", "#2196F3"),
                    title      = "Gambar 4. Heatmap Korelasi Antar Variabel Numerik",
                    ggtheme    = theme_minimal(base_size = 9)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
        axis.text.y = element_text(size = 7),
        plot.title  = element_text(face = "bold", hjust = 0.5))
print(p_cor)
Figure 4. Correlation Heatmap of Numeric Variables

Interpretation: Figure 4 shows the correlation matrix of the numeric variables. Dark blue indicates strong positive correlation, dark red strong negative correlation, and white a correlation near zero. Notable patterns:

  • High correlations appear between the semester 1 and semester 2 curricular-unit variables (e.g., enrolled, approved, evaluations), which is expected because academic performance tends to be consistent across semesters.
  • The macroeconomic variables (Unemployment rate, Inflation rate, GDP) are related to one another because they describe the same economic conditions, although their low VIF values in Table 2 (all below 1.3) indicate that these correlations are modest.
  • The high correlations among the curricular-unit predictors motivate the multicollinearity check (VIF) in the next step, which assesses the stability of the model estimates.
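
Because the heatmap is drawn without labels, it can help to list the strongly correlated pairs explicitly. The sketch below uses simulated data; the variables `x1`, `x2`, `x3` and the 0.8 cutoff are assumptions chosen for illustration, not values from the source.

```r
set.seed(1)
# Hypothetical toy data: x2 is built to track x1 closely, x3 is independent
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.3)
x3 <- rnorm(200)
toy <- data.frame(x1, x2, x3)

cor_mat <- cor(toy)

# Look only at the upper triangle so that each pair is reported once
idx <- which(upper.tri(cor_mat) & abs(cor_mat) > 0.8, arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```

Applied to the report's correlation matrix, the same indexing would replace `cor_mat` with the matrix computed from the numeric variables.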

Assumption Checks

Multicollinearity Test (VIF)

# Compute VIF using a linear model as a proxy
# (VIF depends only on the predictors, so the numeric-coded response is a placeholder)
formula_all <- as.formula(
  paste("as.numeric(Target) ~",
        paste0("`", predictor_cols, "`", collapse = " + "))
)

lm_temp  <- lm(formula_all, data = df)
vif_vals <- vif(lm_temp)

vif_df <- data.frame(
  Variabel   = names(vif_vals),
  VIF        = round(vif_vals, 3),
  Keterangan = ifelse(vif_vals >= 10, "⚠ Multikolinear Tinggi",
                      ifelse(vif_vals >= 5, "⚡ Perlu Perhatian", "✓ OK"))
)
rownames(vif_df) <- NULL

kable(vif_df,
      caption = "Tabel 2. Nilai VIF Seluruh Variabel Prediktor",
      align   = c("l", "r", "l")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 12) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(which(vif_df$VIF >= 10), background = "#ffe0e0") %>%
  row_spec(which(vif_df$VIF >= 5 & vif_df$VIF < 10), background = "#fff3cd")
Tabel 2. Nilai VIF Seluruh Variabel Prediktor
Variabel VIF Keterangan
Marital status 1.426 ✓ OK
Application mode 1.816 ✓ OK
Application order 1.252 ✓ OK
Course 2.240 ✓ OK
Daytime/evening attendance 1.376 ✓ OK
Previous qualification 1.350 ✓ OK
Previous qualification (grade) 1.571 ✓ OK
Nacionality 2.695 ✓ OK
Mother's qualification 1.539 ✓ OK
Father's qualification 1.454 ✓ OK
Mother's occupation 5.979 ⚡ Perlu Perhatian
Father's occupation 5.971 ⚡ Perlu Perhatian
Admission grade 1.628 ✓ OK
Displaced 1.315 ✓ OK
Educational special needs 1.009 ✓ OK
Debtor 1.257 ✓ OK
Tuition fees up to date 1.351 ✓ OK
Gender 1.155 ✓ OK
Scholarship holder 1.175 ✓ OK
Age at enrollment 2.302 ✓ OK
International 2.714 ✓ OK
Curricular units 1st sem (credited) 16.224 ⚠ Multikolinear Tinggi
Curricular units 1st sem (enrolled) 24.483 ⚠ Multikolinear Tinggi
Curricular units 1st sem (evaluations) 4.014 ✓ OK
Curricular units 1st sem (approved) 13.104 ⚠ Multikolinear Tinggi
Curricular units 1st sem (grade) 5.197 ⚡ Perlu Perhatian
Curricular units 1st sem (without evaluations) 1.713 ✓ OK
Curricular units 2nd sem (credited) 12.592 ⚠ Multikolinear Tinggi
Curricular units 2nd sem (enrolled) 17.207 ⚠ Multikolinear Tinggi
Curricular units 2nd sem (evaluations) 3.386 ✓ OK
Curricular units 2nd sem (approved) 10.737 ⚠ Multikolinear Tinggi
Curricular units 2nd sem (grade) 5.778 ⚡ Perlu Perhatian
Curricular units 2nd sem (without evaluations) 1.587 ✓ OK
Unemployment rate 1.292 ✓ OK
Inflation rate 1.044 ✓ OK
GDP 1.292 ✓ OK
# Identify problematic variables
high_vif <- vif_df$Variabel[vif_df$VIF >= 10]
cat("\nVariabel dengan VIF >= 10:",
    if (length(high_vif) == 0) "Tidak ada" else paste(high_vif, collapse = ", "), "\n")
## 
## Variabel dengan VIF >= 10: `Curricular units 1st sem (credited)`, `Curricular units 1st sem (enrolled)`, `Curricular units 1st sem (approved)`, `Curricular units 2nd sem (credited)`, `Curricular units 2nd sem (enrolled)`, `Curricular units 2nd sem (approved)`
# Plot VIF
vif_plot_df <- vif_df %>%
  arrange(VIF) %>%
  mutate(Variabel = factor(Variabel, levels = Variabel),
         Warna    = case_when(VIF >= 10 ~ "Tinggi",
                              VIF >=  5 ~ "Sedang",
                              TRUE      ~ "Rendah"))

p_vif <- ggplot(vif_plot_df, aes(x = Variabel, y = VIF, fill = Warna)) +
  geom_bar(stat = "identity", alpha = 0.85) +
  geom_hline(yintercept = 5,  linetype = "dashed", color = "orange", linewidth = 0.8) +
  geom_hline(yintercept = 10, linetype = "dashed", color = "red",    linewidth = 0.8) +
  coord_flip() +
  scale_fill_manual(values = c("Rendah" = "#4CAF50",
                               "Sedang" = "#FF9800",
                               "Tinggi" = "#F44336")) +
  annotate("text", x = 1, y = 5.5,  label = "VIF = 5",  color = "orange", size = 3) +
  annotate("text", x = 1, y = 10.5, label = "VIF = 10", color = "red",    size = 3) +
  labs(title    = "Gambar 5. Nilai VIF Seluruh Variabel Prediktor",
       subtitle = "Garis oranye = ambang 5; Garis merah = ambang 10",
       x = "Variabel", y = "VIF", fill = "Kategori") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"))
print(p_vif)
Figure 5. VIF Values of All Predictor Variables

Interpretation: Table 2 and Figure 5 report the Variance Inflation Factor (VIF) for each predictor.

  • VIF < 5 (green): no multicollinearity concern; the variable is not strongly explained by the other predictors.
  • VIF 5–10 (orange): needs attention; correlation with other predictors is fairly high but is still commonly tolerated.
  • VIF ≥ 10 (red): indicates serious multicollinearity that can make coefficient estimates unstable.

Six variables exceed the threshold of 10: the credited, enrolled, and approved curricular units in both semesters, with VIF values from 10.737 to 24.483. This matches the high correlations seen in the heatmap. Four further variables fall in the 5–10 band: Mother's occupation, Father's occupation, and the semester 1 and semester 2 grades. In practice, when VIF is very high for several variables, the options are to (1) drop one of the redundant variables, (2) combine variables into a composite index, or (3) use regularization. In this module all variables are retained so that the full analysis can be demonstrated, which means the coefficients of the curricular-unit variables should be interpreted with caution.
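
The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others, so the response variable plays no role. A minimal sketch in base R, using simulated predictors (`x1`, `x2`, `x3` are hypothetical and not from the dataset):

```r
set.seed(42)
n  <- 300
# Hypothetical predictors: x2 is nearly a copy of x1, x3 is unrelated
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.25)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other predictors
vif_manual <- sapply(names(X), function(v) {
  fit <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})
print(round(vif_manual, 3))
```

The two near-duplicate predictors receive large VIF values while the unrelated one stays close to 1, which mirrors the pattern of the curricular-unit variables in Table 2.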

Outlier Identification

df_num_long <- df %>%
  select(all_of(c("Target", numeric_vars[1:9]))) %>%
  pivot_longer(cols = -Target, names_to = "Variabel", values_to = "Nilai")

p_outlier <- ggplot(df_num_long, aes(x = Target, y = Nilai, fill = Target)) +
  geom_boxplot(alpha = 0.6, outlier.shape = 21, outlier.size = 0.8,
               outlier.color = "red") +
  facet_wrap(~ Variabel, scales = "free_y", ncol = 3) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  labs(title = "Gambar 6. Deteksi Outlier per Status Akademik (9 Variabel Pertama)",
       x = "Status", y = "Nilai") +
  theme_minimal(base_size = 9) +
  theme(legend.position = "none",
        axis.text.x     = element_text(angle = 30, hjust = 1),
        plot.title      = element_text(face = "bold"))
print(p_outlier)
Figure 6. Outlier Detection by Academic Status

Interpretation: Figure 6 shows boxplots by academic status for the first nine numeric variables, with red points marking outliers (values more than 1.5 × IQR below the first quartile or above the third quartile). Outliers are fairly numerous, especially for discrete count variables such as credited curricular units. In educational data an outlier is not necessarily an error; for example, a student with a very high number of curricular units may be a transfer student. Because the response in multinomial logistic regression is categorical, there are no outliers in the response in the linear-regression sense, but extreme predictor values can still act as high-leverage points and influence the coefficients. The outliers are therefore not removed here, but very extreme values deserve a closer look.
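
The 1.5 × IQR rule used by the boxplots can be reproduced by hand. The vector `x` below is hypothetical (not the real data) and stands in for a count variable with many zeros.

```r
# Hypothetical counts of credited units (not the real data)
x <- c(0, 0, 0, 1, 0, 2, 0, 0, 3, 0, 14, 0, 1, 20)

# Fences at 1.5 * IQR beyond the first and third quartiles
q     <- quantile(x, c(0.25, 0.75))
iqr   <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

outliers_manual <- x[x < lower | x > upper]
print(outliers_manual)

# boxplot.stats() applies the same rule but uses hinges instead of quantile(),
# so its fences can differ slightly on small samples
outliers_bp <- boxplot.stats(x)$out
print(outliers_bp)
```

For variables that are zero for most students, any non-trivial count is flagged, which explains why the count variables show so many red points.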


Multinomial Logistic Regression Modeling

Data Split: Training and Testing

set.seed(2024)
train_idx <- createDataPartition(df$Target, p = 0.8, list = FALSE)
df_train  <- df[ train_idx, ]
df_test   <- df[-train_idx, ]

cat("Jumlah data training:", nrow(df_train))
## Jumlah data training: 3541
cat("\nJumlah data testing :", nrow(df_test))
## 
## Jumlah data testing : 883
cat("\nDistribusi Target pada data Training (%):\n")
## 
## Distribusi Target pada data Training (%):
print(round(prop.table(table(df_train$Target)) * 100, 2))
## 
## Graduate Enrolled  Dropout 
##    49.93    17.96    32.11
cat("\nDistribusi Target pada data Testing (%):\n")
## 
## Distribusi Target pada data Testing (%):
print(round(prop.table(table(df_test$Target)) * 100, 2))
## 
## Graduate Enrolled  Dropout 
##    49.94    17.89    32.16

Interpretation: The data are split 80:20 with createDataPartition from the caret package, which samples within each class of the target so that class proportions are preserved (stratified splitting). The output confirms this: Graduate, Enrolled, and Dropout make up 49.93%, 17.96%, and 32.11% of the training set and 49.94%, 17.89%, and 32.16% of the test set. A purely random split could by chance leave the smallest class, Enrolled, under- or over-represented in the test set and distort the evaluation. The training set (3541 rows) is used to fit the model, and the test set (883 rows) is held out to assess how well the model generalizes.
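
The idea of stratified splitting can be sketched in base R. This is a conceptual illustration under stated assumptions, not caret's exact algorithm: the label vector `y` is synthetic (built from the reported class counts), and the resulting row counts need not match those of createDataPartition.

```r
set.seed(2024)
# Synthetic labels with the same class counts as the report
y <- factor(rep(c("Graduate", "Enrolled", "Dropout"), times = c(2209, 794, 1421)),
            levels = c("Graduate", "Enrolled", "Dropout"))

# Sample 80% of the row indices within each class, then pool them
idx_by_class <- split(seq_along(y), y)
train_idx <- unlist(lapply(idx_by_class,
                           function(i) sample(i, size = round(0.8 * length(i)))),
                    use.names = FALSE)

y_train <- y[train_idx]
y_test  <- y[-train_idx]

# Class shares (%) are nearly identical in both parts
print(round(prop.table(table(y_train)) * 100, 2))
print(round(prop.table(table(y_test)) * 100, 2))
```

Sampling inside each class is what keeps the shares aligned; a single call to sample() over all rows would only match them on average.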

Fitting the MLR Model

# Formula with all predictors
formula_mlr <- as.formula(
  paste("Target ~",
        paste0("`", predictor_cols, "`", collapse = " + "))
)

cat("Fitting model Multinomial Logistic Regression...\n")
## Fitting model Multinomial Logistic Regression...
model_mlr <- multinom(formula_mlr,
                      data  = df_train,
                      maxit = 500,
                      trace = FALSE)

cat("Model berhasil difit!\n")
## Model berhasil difit!
cat("\nRingkasan Model (koefisien dan standard error):\n")
## 
## Ringkasan Model (koefisien dan standard error):
print(summary(model_mlr))
## Call:
## multinom(formula = formula_mlr, data = df_train, maxit = 500, 
##     trace = FALSE)
## 
## Coefficients:
##          (Intercept) `Marital status` `Application mode` `Application order`
## Enrolled    2.591354       0.02149497        0.007974044         -0.02631623
## Dropout     1.394192      -0.09995486        0.002695014          0.11416646
##                Course `Daytime/evening attendance` `Previous qualification`
## Enrolled 9.137398e-05                  -0.02128973              -0.01209524
## Dropout  1.991490e-04                   0.02935562              -0.01499905
##          `Previous qualification (grade)` Nacionality `Mother's qualification`
## Enrolled                     -0.002529813  0.01740900             -0.008740620
## Dropout                       0.006261044  0.03687635              0.006109717
##          `Father's qualification` `Mother's occupation` `Father's occupation`
## Enrolled             -0.006140603           0.001391896         -0.0003408843
## Dropout              -0.011360295          -0.008112361         -0.0004440111
##          `Admission grade`  Displaced `Educational special needs`    Debtor
## Enrolled      -0.008492794 -0.1511190                 -0.01882188 0.9074932
## Dropout       -0.014162321  0.3277647                  0.14295713 0.8708463
##          `Tuition fees up to date`    Gender `Scholarship holder`
## Enrolled                -0.9769278 0.3021360           -0.5580360
## Dropout                 -3.0268614 0.4960701           -0.7988546
##          `Age at enrollment` International
## Enrolled        -0.009921602    -0.8433316
## Dropout          0.054733204    -1.7040670
##          `Curricular units 1st sem (credited)`
## Enrolled                             0.1842744
## Dropout                              0.2610108
##          `Curricular units 1st sem (enrolled)`
## Enrolled                             0.1422616
## Dropout                              0.2140077
##          `Curricular units 1st sem (evaluations)`
## Enrolled                              0.053857407
## Dropout                               0.009602908
##          `Curricular units 1st sem (approved)`
## Enrolled                            -0.5715946
## Dropout                             -0.7523165
##          `Curricular units 1st sem (grade)`
## Enrolled                         0.04440662
## Dropout                          0.10058945
##          `Curricular units 1st sem (without evaluations)`
## Enrolled                                       0.02206242
## Dropout                                       -0.11345276
##          `Curricular units 2nd sem (credited)`
## Enrolled                            -0.1602848
## Dropout                              0.1108328
##          `Curricular units 2nd sem (enrolled)`
## Enrolled                             0.6884902
## Dropout                              0.9520021
##          `Curricular units 2nd sem (evaluations)`
## Enrolled                               0.12392850
## Dropout                                0.02628074
##          `Curricular units 2nd sem (approved)`
## Enrolled                            -0.7656494
## Dropout                             -1.0977382
##          `Curricular units 2nd sem (grade)`
## Enrolled                         -0.1056671
## Dropout                          -0.1940451
##          `Curricular units 2nd sem (without evaluations)` `Unemployment rate`
## Enrolled                                       -0.1022767         -0.02926524
## Dropout                                        -0.1830535          0.08464214
##          `Inflation rate`         GDP
## Enrolled      -0.03878659 0.011641935
## Dropout       -0.01339119 0.006324895
## 
## Std. Errors:
##           (Intercept) `Marital status` `Application mode` `Application order`
## Enrolled 0.0005718884     0.0013015059        0.004278238          0.02764564
## Dropout  0.0004943607     0.0009963952        0.004766024          0.02096289
##                Course `Daytime/evening attendance` `Previous qualification`
## Enrolled 4.158236e-05                  0.001089257              0.006677090
## Dropout  4.298738e-05                  0.001047016              0.007299817
##          `Previous qualification (grade)` Nacionality `Mother's qualification`
## Enrolled                      0.004923807 0.007797112              0.004479135
## Dropout                       0.005281657 0.007783585              0.005134783
##          `Father's qualification` `Mother's occupation` `Father's occupation`
## Enrolled              0.004397935           0.005779038           0.005966553
## Dropout               0.005045480           0.007123604           0.007416296
##          `Admission grade`   Displaced `Educational special needs`       Debtor
## Enrolled       0.004958115 0.003431994                0.0001118759 0.0006353506
## Dropout        0.005347784 0.002391512                0.0001342721 0.0006138388
##          `Tuition fees up to date`      Gender `Scholarship holder`
## Enrolled              0.0009279103 0.001822231         0.0011518782
## Dropout               0.0010040968 0.001544084         0.0008833714
##          `Age at enrollment` International
## Enrolled          0.01019912  0.0001610243
## Dropout           0.01017572  0.0001189408
##          `Curricular units 1st sem (credited)`
## Enrolled                            0.01879097
## Dropout                             0.01629546
##          `Curricular units 1st sem (enrolled)`
## Enrolled                           0.008855274
## Dropout                            0.006208763
##          `Curricular units 1st sem (evaluations)`
## Enrolled                               0.01666750
## Dropout                                0.01659722
##          `Curricular units 1st sem (approved)`
## Enrolled                           0.009881157
## Dropout                            0.010867053
##          `Curricular units 1st sem (grade)`
## Enrolled                         0.01575668
## Dropout                          0.01537292
##          `Curricular units 1st sem (without evaluations)`
## Enrolled                                      0.002486609
## Dropout                                       0.002064936
##          `Curricular units 2nd sem (credited)`
## Enrolled                            0.01241906
## Dropout                             0.01156136
##          `Curricular units 2nd sem (enrolled)`
## Enrolled                           0.007778634
## Dropout                            0.005299014
##          `Curricular units 2nd sem (evaluations)`
## Enrolled                               0.01912784
## Dropout                                0.01916321
##          `Curricular units 2nd sem (approved)`
## Enrolled                             0.0194288
## Dropout                              0.0183802
##          `Curricular units 2nd sem (grade)`
## Enrolled                         0.01546695
## Dropout                          0.01491198
##          `Curricular units 2nd sem (without evaluations)` `Unemployment rate`
## Enrolled                                      0.001814669          0.02052893
## Dropout                                       0.001654826          0.02245053
##          `Inflation rate`        GDP
## Enrolled       0.02372426 0.02061236
## Dropout        0.01935775 0.02126891
## 
## Residual Deviance: 3904.075 
## AIC: 4052.075

Interpretation: The MLR model was fitted with multinom() from the nnet package, which estimates the multinomial logit as a neural network without hidden layers and optimizes it with the BFGS method. The summary(model_mlr) output contains two rows of coefficients: one for Enrolled vs Graduate and one for Dropout vs Graduate. Each coefficient is the change in the log-odds of that category relative to the reference category (Graduate) for a one-unit increase in the predictor, holding the other variables constant. Setting maxit = 500 raises the iteration limit so that the optimizer has room to converge; it does not by itself guarantee convergence, so the convergence message of the fit should still be checked. The Residual Deviance (3904.075) equals −2 × the log-likelihood on the training data, and the AIC (4052.075) adds a penalty of 2 per estimated parameter. Both are most useful for comparing this model with alternatives fitted to the same data.
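The relationship between the two fit statistics printed above can be checked by hand. The sketch below only re-uses the numbers from the summary output; the parameter count (36 predictors plus one intercept in each of the two equations) is an assumption read off the coefficient tables, not something the output states directly.

```r
# Values copied from the summary(model_mlr) output above
residual_deviance <- 3904.075
aic_reported      <- 4052.075

# Assumption: 36 predictors + 1 intercept per equation, 2 equations
n_params <- 2 * (36 + 1)

# Log-likelihood implied by the deviance (deviance = -2 * logLik)
log_lik <- -residual_deviance / 2

# AIC = deviance + 2 * number of estimated parameters
aic <- residual_deviance + 2 * n_params

cat("Implied log-likelihood:", log_lik, "\n")
cat("AIC from deviance     :", aic, "\n")
```

Under that assumption the recomputed AIC agrees with the reported one, which is a quick way to confirm how many parameters the model actually estimated.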


Parameter Significance Tests

Simultaneous Test (Likelihood Ratio Test / G²)

cat("==========================================\n")
## ==========================================
cat("     Uji Serentak (G² / LRT)\n")
##      Uji Serentak (G² / LRT)
cat("==========================================\n")
## ==========================================
# Model null (intercept only)
model_null <- multinom(Target ~ 1, data = df_train, maxit = 500, trace = FALSE)

# Hitung G²
ll_null  <- as.numeric(logLik(model_null))
ll_model <- as.numeric(logLik(model_mlr))
G2       <- -2 * (ll_null - ll_model)
df_g2    <- (length(levels(df_train$Target)) - 1) * length(predictor_cols)
p_g2     <- 1 - pchisq(G2, df = df_g2)

uji_serentak <- data.frame(
  Statistik = c("Log-Likelihood (Model Null)", "Log-Likelihood (Model Full)",
                "G² (Chi-Square)", "Derajat Bebas (df)",
                "P-value", "Keputusan (α = 0.05)"),
  Nilai     = c(round(ll_null,  4),
                round(ll_model, 4),
                round(G2, 4),
                df_g2,
                format(p_g2, scientific = TRUE, digits = 4),
                ifelse(p_g2 < 0.05,
                       "Tolak H0 — model signifikan",
                       "Gagal Tolak H0 — model tidak signifikan"))
)

kable(uji_serentak,
      col.names = c("Statistik", "Nilai"),
      caption   = "Tabel 3. Hasil Uji Serentak (Likelihood Ratio Test)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE, position = "left", font_size = 13) %>%
  column_spec(1, bold = TRUE)
Tabel 3. Hasil Uji Serentak (Likelihood Ratio Test)
Statistik Nilai
Log-Likelihood (Model Null) -3611.6229
Log-Likelihood (Model Full) -1952.0376
G² (Chi-Square) 3319.1707
Derajat Bebas (df) 72
P-value 0e+00
Keputusan (α = 0.05) Tolak H0 — model signifikan

Interpretation: Table 3 reports the simultaneous test (G²), i.e. the Likelihood Ratio Test (LRT), for the hypotheses:

  • H₀: all predictor coefficients = 0 (the model is no better than the intercept-only model)
  • H₁: at least one coefficient ≠ 0 (the model is better than the intercept-only model)

The statistic is G² = −2 × (ln L₀ − ln L₁), where L₀ is the likelihood of the null model and L₁ is the likelihood of the full model. Here G² = 3319.1707 with df = 72. The p-value is printed as 0e+00 because the tail probability is too small to be represented by 1 − pchisq() in double precision. Since p < 0.05, H₀ is rejected: jointly, at least one predictor has a significant effect on students' academic status. The degrees of freedom are (number of categories − 1) × number of predictors = 2 × 36 = 72, which is the number of additional parameters the full model estimates relative to the null model.
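As a cross-check, the G² statistic can be reproduced from the two log-likelihoods in Table 3. This is a minimal sketch that hard-codes the reported values. Computing the tail probability on the log scale (the `log.p = TRUE` argument of `pchisq()`) is an alternative suggested here, not the approach used in the original code, and it avoids the printed p-value of exactly zero.

```r
# Values copied from Tabel 3
ll_null  <- -3611.6229
ll_model <- -1952.0376
df_g2    <- 72

# Likelihood ratio statistic
G2 <- -2 * (ll_null - ll_model)

# Critical value at alpha = 0.05 for comparison
crit <- qchisq(0.95, df = df_g2)

# Upper-tail probability on the natural-log scale, which stays finite
# even when the probability itself is too small for a double
log_p <- pchisq(G2, df = df_g2, lower.tail = FALSE, log.p = TRUE)

cat("G2             :", round(G2, 4), "\n")
cat("Critical value :", round(crit, 4), "\n")
cat("log(p-value)   :", log_p, "\n")
```

G² exceeds the critical value by a wide margin, so the decision in Table 3 does not depend on how the tiny p-value is displayed.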

Partial Test (Wald Test)

cat("==========================================\n")
## ==========================================
cat("        Uji Parsial (Wald Test)\n")
##         Uji Parsial (Wald Test)
cat("==========================================\n")
## ==========================================
smry    <- summary(model_mlr)
coefs   <- smry$coefficients
std_err <- smry$standard.errors

# Hitung statistik Wald dan p-value
W_stat <- (coefs / std_err)^2
p_wald <- 2 * (1 - pnorm(abs(coefs / std_err)))

# --- Tabel Enrolled vs Graduate ---
wald_enrolled <- data.frame(
  Variabel = colnames(coefs),
  Beta     = round(coefs["Enrolled", ], 4),
  SE       = round(std_err["Enrolled", ], 4),
  W2       = round(W_stat["Enrolled", ], 4),
  P_value  = round(p_wald["Enrolled", ], 4),
  Sig      = ifelse(p_wald["Enrolled", ] < 0.001, "***",
                    ifelse(p_wald["Enrolled", ] < 0.01,  "**",
                           ifelse(p_wald["Enrolled", ] < 0.05, "*",
                                  ifelse(p_wald["Enrolled", ] < 0.1, ".", " "))))
)
rownames(wald_enrolled) <- NULL

# --- Tabel Dropout vs Graduate ---
wald_dropout <- data.frame(
  Variabel = colnames(coefs),
  Beta     = round(coefs["Dropout", ], 4),
  SE       = round(std_err["Dropout", ], 4),
  W2       = round(W_stat["Dropout", ], 4),
  P_value  = round(p_wald["Dropout", ], 4),
  Sig      = ifelse(p_wald["Dropout", ] < 0.001, "***",
                    ifelse(p_wald["Dropout", ] < 0.01,  "**",
                           ifelse(p_wald["Dropout", ] < 0.05, "*",
                                  ifelse(p_wald["Dropout", ] < 0.1, ".", " "))))
)
rownames(wald_dropout) <- NULL

kable(wald_enrolled,
      col.names = c("Variabel", "β", "SE", "W²", "P-value", "Sig"),
      caption   = "Tabel 4a. Uji Parsial Wald: Enrolled vs Graduate") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(which(wald_enrolled$P_value < 0.05), background = "#e8f5e9")
Tabel 4a. Uji Parsial Wald: Enrolled vs Graduate
Variabel β SE W² P-value Sig
(Intercept) 2.5914 0.0006 2.053198e+07 0.0000 ***
Marital status 0.0215 0.0013 2.727603e+02 0.0000 ***
Application mode 0.0080 0.0043 3.474000e+00 0.0623 .
Application order -0.0263 0.0276 9.061000e-01 0.3411
Course 0.0001 0.0000 4.828700e+00 0.0280 *
Daytime/evening attendance -0.0213 0.0011 3.820144e+02 0.0000 ***
Previous qualification -0.0121 0.0067 3.281400e+00 0.0701 .
Previous qualification (grade) -0.0025 0.0049 2.640000e-01 0.6074
Nacionality 0.0174 0.0078 4.985200e+00 0.0256 *
Mother's qualification -0.0087 0.0045 3.808000e+00 0.0510 .
Father's qualification -0.0061 0.0044 1.949500e+00 0.1626
Mother's occupation 0.0014 0.0058 5.800000e-02 0.8097
Father's occupation -0.0003 0.0060 3.300000e-03 0.9544
Admission grade -0.0085 0.0050 2.934100e+00 0.0867 .
Displaced -0.1511 0.0034 1.938854e+03 0.0000 ***
Educational special needs -0.0188 0.0001 2.830437e+04 0.0000 ***
Debtor 0.9075 0.0006 2.040139e+06 0.0000 ***
Tuition fees up to date -0.9769 0.0009 1.108442e+06 0.0000 ***
Gender 0.3021 0.0018 2.749148e+04 0.0000 ***
Scholarship holder -0.5580 0.0012 2.346991e+05 0.0000 ***
Age at enrollment -0.0099 0.0102 9.463000e-01 0.3307
International -0.8433 0.0002 2.742924e+07 0.0000 ***
Curricular units 1st sem (credited) 0.1843 0.0188 9.616830e+01 0.0000 ***
Curricular units 1st sem (enrolled) 0.1423 0.0089 2.580902e+02 0.0000 ***
Curricular units 1st sem (evaluations) 0.0539 0.0167 1.044120e+01 0.0012 **
Curricular units 1st sem (approved) -0.5716 0.0099 3.346267e+03 0.0000 ***
Curricular units 1st sem (grade) 0.0444 0.0158 7.942700e+00 0.0048 **
Curricular units 1st sem (without evaluations) 0.0221 0.0025 7.872110e+01 0.0000 ***
Curricular units 2nd sem (credited) -0.1603 0.0124 1.665739e+02 0.0000 ***
Curricular units 2nd sem (enrolled) 0.6885 0.0078 7.834096e+03 0.0000 ***
Curricular units 2nd sem (evaluations) 0.1239 0.0191 4.197690e+01 0.0000 ***
Curricular units 2nd sem (approved) -0.7656 0.0194 1.552987e+03 0.0000 ***
Curricular units 2nd sem (grade) -0.1057 0.0155 4.667350e+01 0.0000 ***
Curricular units 2nd sem (without evaluations) -0.1023 0.0018 3.176572e+03 0.0000 ***
Unemployment rate -0.0293 0.0205 2.032200e+00 0.1540
Inflation rate -0.0388 0.0237 2.672900e+00 0.1021
GDP 0.0116 0.0206 3.190000e-01 0.5722
kable(wald_dropout,
      col.names = c("Variabel", "β", "SE", "W²", "P-value", "Sig"),
      caption   = "Tabel 4b. Uji Parsial Wald: Dropout vs Graduate") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(which(wald_dropout$P_value < 0.05), background = "#fce4ec")
Tabel 4b. Uji Parsial Wald: Dropout vs Graduate
Variabel β SE W² P-value Sig
(Intercept) 1.3942 0.0005 7.953481e+06 0.0000 ***
Marital status -0.1000 0.0010 1.006340e+04 0.0000 ***
Application mode 0.0027 0.0048 3.197000e-01 0.5718
Application order 0.1142 0.0210 2.966030e+01 0.0000 ***
Course 0.0002 0.0000 2.146220e+01 0.0000 ***
Daytime/evening attendance 0.0294 0.0010 7.860968e+02 0.0000 ***
Previous qualification -0.0150 0.0073 4.221900e+00 0.0399 *
Previous qualification (grade) 0.0063 0.0053 1.405200e+00 0.2358
Nacionality 0.0369 0.0078 2.244590e+01 0.0000 ***
Mother's qualification 0.0061 0.0051 1.415800e+00 0.2341
Father's qualification -0.0114 0.0050 5.069600e+00 0.0243 *
Mother's occupation -0.0081 0.0071 1.296900e+00 0.2548
Father's occupation -0.0004 0.0074 3.600000e-03 0.9523
Admission grade -0.0142 0.0053 7.013300e+00 0.0081 **
Displaced 0.3278 0.0024 1.878362e+04 0.0000 ***
Educational special needs 0.1430 0.0001 1.133548e+06 0.0000 ***
Debtor 0.8708 0.0006 2.012678e+06 0.0000 ***
Tuition fees up to date -3.0269 0.0010 9.087280e+06 0.0000 ***
Gender 0.4961 0.0015 1.032154e+05 0.0000 ***
Scholarship holder -0.7989 0.0009 8.178032e+05 0.0000 ***
Age at enrollment 0.0547 0.0102 2.893150e+01 0.0000 ***
International -1.7041 0.0001 2.052636e+08 0.0000 ***
Curricular units 1st sem (credited) 0.2610 0.0163 2.565569e+02 0.0000 ***
Curricular units 1st sem (enrolled) 0.2140 0.0062 1.188088e+03 0.0000 ***
Curricular units 1st sem (evaluations) 0.0096 0.0166 3.348000e-01 0.5629
Curricular units 1st sem (approved) -0.7523 0.0109 4.792671e+03 0.0000 ***
Curricular units 1st sem (grade) 0.1006 0.0154 4.281460e+01 0.0000 ***
Curricular units 1st sem (without evaluations) -0.1135 0.0021 3.018678e+03 0.0000 ***
Curricular units 2nd sem (credited) 0.1108 0.0116 9.190060e+01 0.0000 ***
Curricular units 2nd sem (enrolled) 0.9520 0.0053 3.227645e+04 0.0000 ***
Curricular units 2nd sem (evaluations) 0.0263 0.0192 1.880800e+00 0.1702
Curricular units 2nd sem (approved) -1.0977 0.0184 3.566952e+03 0.0000 ***
Curricular units 2nd sem (grade) -0.1940 0.0149 1.693303e+02 0.0000 ***
Curricular units 2nd sem (without evaluations) -0.1831 0.0017 1.223634e+04 0.0000 ***
Unemployment rate 0.0846 0.0225 1.421410e+01 0.0002 ***
Inflation rate -0.0134 0.0194 4.786000e-01 0.4891
GDP 0.0063 0.0213 8.840000e-02 0.7662
cat("\nKeterangan: *** p<0.001 | ** p<0.01 | * p<0.05 | . p<0.1\n")
## 
## Keterangan: *** p<0.001 | ** p<0.01 | * p<0.05 | . p<0.1

Interpretation: Tables 4a and 4b report the partial Wald test for each predictor individually, with the hypotheses:

  • H₀: βⱼ = 0 (predictor j has no significant effect on the log-odds)
  • H₁: βⱼ ≠ 0 (predictor j is significant)

The Wald statistic is W² = (β/SE)², which follows a chi-square distribution with df = 1 under H₀. Rows highlighted in green (Table 4a) or pink (Table 4b) mark variables that are statistically significant (p < 0.05). The asterisks (***, **, *) indicate the significance level. Note that the variables that are significant for Enrolled vs Graduate are not necessarily the same as those that are significant for Dropout vs Graduate, because MLR estimates two separate logit equations simultaneously.
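To make the calculation concrete, the sketch below recomputes the Wald test for a single row, "Age at enrollment" in Table 4b, from the rounded β and SE shown there. Because the inputs are rounded to four decimals, the resulting W² differs slightly from the tabulated 28.93, which was computed from unrounded estimates. The example also shows that the chi-square p-value with df = 1 and the two-sided normal p-value used in the code above are the same quantity.

```r
# Rounded values from Tabel 4b, row "Age at enrollment" (Dropout vs Graduate)
beta <- 0.0547
se   <- 0.0102

z <- beta / se   # Wald z statistic
W <- z^2         # Wald chi-square statistic (df = 1)

# Two equivalent ways of obtaining the p-value
p_chisq <- pchisq(W, df = 1, lower.tail = FALSE)
p_norm  <- 2 * pnorm(-abs(z))

cat("z       :", round(z, 4), "\n")
cat("W^2     :", round(W, 4), "\n")
cat("p (chi2):", format(p_chisq, digits = 4), "\n")
cat("p (norm):", format(p_norm,  digits = 4), "\n")
```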


Model Interpretation

Relative Log-Odds (β Coefficients)

log_odds_df <- rbind(
  wald_enrolled %>% mutate(Kategori = "Enrolled vs Graduate"),
  wald_dropout  %>% mutate(Kategori = "Dropout vs Graduate")
)

sig_logodds <- log_odds_df %>%
  filter(P_value < 0.05, Variabel != "(Intercept)") %>%
  mutate(Arah     = ifelse(Beta > 0, "Positif", "Negatif"),
         Variabel = reorder(Variabel, Beta))

p_logodds <- ggplot(sig_logodds, aes(x = Variabel, y = Beta, fill = Arah)) +
  geom_bar(stat = "identity", alpha = 0.85, color = "white") +
  geom_errorbar(aes(ymin = Beta - 1.96 * SE, ymax = Beta + 1.96 * SE),
                width = 0.25, color = "black", linewidth = 0.5) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black", linewidth = 0.5) +
  coord_flip() +
  facet_wrap(~ Kategori, scales = "free_x") +
  scale_fill_manual(values = c("Positif" = "#2196F3", "Negatif" = "#F44336")) +
  labs(title    = "Gambar 7. Koefisien Log-Odds Signifikan (p < 0.05)",
       subtitle = "Error bar = 95% Confidence Interval",
       x = "Variabel", y = "Log-Odds (β)", fill = "Arah Pengaruh") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))
print(p_logodds)
Gambar 7. Koefisien Log-Odds Signifikan


Interpretation: Figure 7 shows the log-odds coefficients (β) of every significant variable. A coefficient β is the change in the log-odds of the target category relative to Graduate for each one-unit increase in the predictor.

  • Blue bars (positive): an increase in the variable raises the log-odds of being in that category (Enrolled/Dropout) relative to Graduate.
  • Red bars (negative): an increase in the variable lowers the log-odds of being in that category, meaning the student is more likely to be Graduate.
  • Error bars show the 95% confidence interval; an error bar that does not cross the zero line is consistent with statistical significance at the 5% level.
  • Variables with long bars (large |β|) have a larger effect per unit of the predictor than variables with short bars. Because the predictors are measured on different scales, bar lengths are directly comparable only between variables with the same unit.
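For reference, the two logit equations behind these coefficients can be written out explicitly. The notation is introduced here for illustration and does not appear in the original output: x_k is predictor k, p = 36 is the number of predictors, and β_{jk} is the coefficient of predictor k in the equation for category j.

```latex
\ln\frac{P(Y = j \mid \mathbf{x})}{P(Y = \text{Graduate} \mid \mathbf{x})}
  = \beta_{j0} + \sum_{k=1}^{p} \beta_{jk}\, x_k ,
  \qquad j \in \{\text{Enrolled},\ \text{Dropout}\}
```

A one-unit increase in x_k, with the other predictors fixed, shifts the left-hand side by β_{jk}, which is the quantity each bar represents.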

Relative Risk Ratios / Odds Ratio (exp(β))

rrr_enrolled <- data.frame(
  Variabel = colnames(coefs),
  Beta     = round(coefs["Enrolled", ], 4),
  RRR      = round(exp(coefs["Enrolled", ]), 4),
  CI_Lower = round(exp(coefs["Enrolled", ] - 1.96 * std_err["Enrolled", ]), 4),
  CI_Upper = round(exp(coefs["Enrolled", ] + 1.96 * std_err["Enrolled", ]), 4),
  P_value  = round(p_wald["Enrolled", ], 4),
  Sig      = ifelse(p_wald["Enrolled", ] < 0.001, "***",
                    ifelse(p_wald["Enrolled", ] < 0.01,  "**",
                           ifelse(p_wald["Enrolled", ] < 0.05, "*", " ")))
)
rownames(rrr_enrolled) <- NULL

rrr_dropout <- data.frame(
  Variabel = colnames(coefs),
  Beta     = round(coefs["Dropout", ], 4),
  RRR      = round(exp(coefs["Dropout", ]), 4),
  CI_Lower = round(exp(coefs["Dropout", ] - 1.96 * std_err["Dropout", ]), 4),
  CI_Upper = round(exp(coefs["Dropout", ] + 1.96 * std_err["Dropout", ]), 4),
  P_value  = round(p_wald["Dropout", ], 4),
  Sig      = ifelse(p_wald["Dropout", ] < 0.001, "***",
                    ifelse(p_wald["Dropout", ] < 0.01,  "**",
                           ifelse(p_wald["Dropout", ] < 0.05, "*", " ")))
)
rownames(rrr_dropout) <- NULL

kable(rrr_enrolled,
      col.names = c("Variabel", "β", "RRR", "CI Lower (95%)", "CI Upper (95%)", "P-value", "Sig"),
      caption   = "Tabel 5a. Relative Risk Ratio: Enrolled vs Graduate") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE)
Tabel 5a. Relative Risk Ratio: Enrolled vs Graduate
Variabel β RRR CI Lower (95%) CI Upper (95%) P-value Sig
(Intercept) 2.5914 13.3478 13.3329 13.3628 0.0000 ***
Marital status 0.0215 1.0217 1.0191 1.0243 0.0000 ***
Application mode 0.0080 1.0080 0.9996 1.0165 0.0623
Application order -0.0263 0.9740 0.9227 1.0283 0.3411
Course 0.0001 1.0001 1.0000 1.0002 0.0280 *
Daytime/evening attendance -0.0213 0.9789 0.9768 0.9810 0.0000 ***
Previous qualification -0.0121 0.9880 0.9751 1.0010 0.0701
Previous qualification (grade) -0.0025 0.9975 0.9879 1.0071 0.6074
Nacionality 0.0174 1.0176 1.0021 1.0332 0.0256 *
Mother's qualification -0.0087 0.9913 0.9826 1.0000 0.0510
Father's qualification -0.0061 0.9939 0.9853 1.0025 0.1626
Mother's occupation 0.0014 1.0014 0.9901 1.0128 0.8097
Father's occupation -0.0003 0.9997 0.9880 1.0114 0.9544
Admission grade -0.0085 0.9915 0.9820 1.0012 0.0867
Displaced -0.1511 0.8597 0.8540 0.8655 0.0000 ***
Educational special needs -0.0188 0.9814 0.9811 0.9816 0.0000 ***
Debtor 0.9075 2.4781 2.4750 2.4812 0.0000 ***
Tuition fees up to date -0.9769 0.3765 0.3758 0.3772 0.0000 ***
Gender 0.3021 1.3527 1.3479 1.3576 0.0000 ***
Scholarship holder -0.5580 0.5723 0.5710 0.5736 0.0000 ***
Age at enrollment -0.0099 0.9901 0.9705 1.0101 0.3307
International -0.8433 0.4303 0.4301 0.4304 0.0000 ***
Curricular units 1st sem (credited) 0.1843 1.2023 1.1589 1.2475 0.0000 ***
Curricular units 1st sem (enrolled) 0.1423 1.1529 1.1330 1.1731 0.0000 ***
Curricular units 1st sem (evaluations) 0.0539 1.0553 1.0214 1.0904 0.0012 **
Curricular units 1st sem (approved) -0.5716 0.5646 0.5538 0.5757 0.0000 ***
Curricular units 1st sem (grade) 0.0444 1.0454 1.0136 1.0782 0.0048 **
Curricular units 1st sem (without evaluations) 0.0221 1.0223 1.0173 1.0273 0.0000 ***
Curricular units 2nd sem (credited) -0.1603 0.8519 0.8314 0.8729 0.0000 ***
Curricular units 2nd sem (enrolled) 0.6885 1.9907 1.9606 2.0213 0.0000 ***
Curricular units 2nd sem (evaluations) 0.1239 1.1319 1.0903 1.1752 0.0000 ***
Curricular units 2nd sem (approved) -0.7656 0.4650 0.4477 0.4831 0.0000 ***
Curricular units 2nd sem (grade) -0.1057 0.8997 0.8729 0.9274 0.0000 ***
Curricular units 2nd sem (without evaluations) -0.1023 0.9028 0.8996 0.9060 0.0000 ***
Unemployment rate -0.0293 0.9712 0.9329 1.0110 0.1540
Inflation rate -0.0388 0.9620 0.9182 1.0077 0.1021
GDP 0.0116 1.0117 0.9717 1.0534 0.5722
kable(rrr_dropout,
      col.names = c("Variabel", "β", "RRR", "CI Lower (95%)", "CI Upper (95%)", "P-value", "Sig"),
      caption   = "Tabel 5b. Relative Risk Ratio: Dropout vs Graduate") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE)
Tabel 5b. Relative Risk Ratio: Dropout vs Graduate
Variabel β RRR CI Lower (95%) CI Upper (95%) P-value Sig
(Intercept) 1.3942 4.0317 4.0278 4.0356 0.0000 ***
Marital status -0.1000 0.9049 0.9031 0.9066 0.0000 ***
Application mode 0.0027 1.0027 0.9934 1.0121 0.5718
Application order 0.1142 1.1209 1.0758 1.1680 0.0000 ***
Course 0.0002 1.0002 1.0001 1.0003 0.0000 ***
Daytime/evening attendance 0.0294 1.0298 1.0277 1.0319 0.0000 ***
Previous qualification -0.0150 0.9851 0.9711 0.9993 0.0399 *
Previous qualification (grade) 0.0063 1.0063 0.9959 1.0168 0.2358
Nacionality 0.0369 1.0376 1.0219 1.0535 0.0000 ***
Mother's qualification 0.0061 1.0061 0.9961 1.0163 0.2341
Father's qualification -0.0114 0.9887 0.9790 0.9985 0.0243 *
Mother's occupation -0.0081 0.9919 0.9782 1.0059 0.2548
Father's occupation -0.0004 0.9996 0.9851 1.0142 0.9523
Admission grade -0.0142 0.9859 0.9757 0.9963 0.0081 **
Displaced 0.3278 1.3879 1.3814 1.3944 0.0000 ***
Educational special needs 0.1430 1.1537 1.1534 1.1540 0.0000 ***
Debtor 0.8708 2.3889 2.3861 2.3918 0.0000 ***
Tuition fees up to date -3.0269 0.0485 0.0484 0.0486 0.0000 ***
Gender 0.4961 1.6423 1.6373 1.6472 0.0000 ***
Scholarship holder -0.7989 0.4498 0.4491 0.4506 0.0000 ***
Age at enrollment 0.0547 1.0563 1.0354 1.0775 0.0000 ***
International -1.7041 0.1819 0.1819 0.1820 0.0000 ***
Curricular units 1st sem (credited) 0.2610 1.2982 1.2574 1.3404 0.0000 ***
Curricular units 1st sem (enrolled) 0.2140 1.2386 1.2237 1.2538 0.0000 ***
Curricular units 1st sem (evaluations) 0.0096 1.0096 0.9773 1.0430 0.5629
Curricular units 1st sem (approved) -0.7523 0.4713 0.4613 0.4814 0.0000 ***
Curricular units 1st sem (grade) 0.1006 1.1058 1.0730 1.1396 0.0000 ***
Curricular units 1st sem (without evaluations) -0.1135 0.8927 0.8891 0.8964 0.0000 ***
Curricular units 2nd sem (credited) 0.1108 1.1172 1.0922 1.1428 0.0000 ***
Curricular units 2nd sem (enrolled) 0.9520 2.5909 2.5641 2.6179 0.0000 ***
Curricular units 2nd sem (evaluations) 0.0263 1.0266 0.9888 1.0659 0.1702
Curricular units 2nd sem (approved) -1.0977 0.3336 0.3218 0.3459 0.0000 ***
Curricular units 2nd sem (grade) -0.1940 0.8236 0.7999 0.8480 0.0000 ***
Curricular units 2nd sem (without evaluations) -0.1831 0.8327 0.8300 0.8354 0.0000 ***
Unemployment rate 0.0846 1.0883 1.0415 1.1373 0.0002 ***
Inflation rate -0.0134 0.9867 0.9500 1.0249 0.4891
GDP 0.0063 1.0063 0.9653 1.0492 0.7662
# Plot RRR
rrr_combined <- rbind(
  rrr_enrolled %>% filter(P_value < 0.05, Variabel != "(Intercept)") %>%
    mutate(Kategori = "Enrolled vs Graduate"),
  rrr_dropout  %>% filter(P_value < 0.05, Variabel != "(Intercept)") %>%
    mutate(Kategori = "Dropout vs Graduate")
) %>%
  mutate(Variabel = reorder(Variabel, RRR),
         Arah     = ifelse(RRR > 1, "RRR > 1 (risiko naik)", "RRR < 1 (risiko turun)"))

p_rrr <- ggplot(rrr_combined, aes(x = Variabel, y = RRR, color = Arah)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper), width = 0.3, linewidth = 0.7) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "gray40", linewidth = 0.8) +
  coord_flip() +
  facet_wrap(~ Kategori, scales = "free") +
  scale_color_manual(values = c("RRR > 1 (risiko naik)"  = "#F44336",
                                "RRR < 1 (risiko turun)" = "#2196F3")) +
  labs(title    = "Gambar 8. Relative Risk Ratio (RRR) dengan 95% CI",
       subtitle = "Garis putus-putus = RRR 1 (tidak ada perubahan risiko)",
       x = "Variabel", y = "RRR = exp(β)", color = "Arah RRR") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))
print(p_rrr)
Gambar 8. Relative Risk Ratio dengan 95% CI


Interpretation: Tables 5a and 5b and Figure 8 report the Relative Risk Ratio (RRR) = exp(β), the exponentiated log-odds coefficient. The RRR is easier to interpret substantively:

  • RRR > 1 (red): a one-unit increase in the predictor raises the odds of being in that category (Enrolled/Dropout) relative to Graduate. Example: RRR = 1.5 means the odds rise by 50%.
  • RRR < 1 (blue): a one-unit increase in the predictor lowers the odds of being in that category, meaning the student is more likely to be Graduate. Example: RRR = 0.8 means the odds fall by 20%.
  • RRR = 1 (dashed line): no effect.
  • A 95% confidence interval that excludes 1 confirms statistical significance.

Concrete example: in Table 5b, "Curricular units 1st sem (approved)" has RRR = 0.4713 for Dropout vs Graduate, so each additional curricular unit passed in semester 1 lowers the odds of Dropout (relative to Graduate) by about 53%, holding the other variables constant.
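The conversion from a coefficient to an RRR, its confidence interval, and a percentage change in odds can be sketched for one row. The values below are the rounded β and SE for "Curricular units 1st sem (approved)" in the Dropout vs Graduate equation (Tables 4b and 5b). Results may differ from the tables in the last digit because the tables were built from unrounded estimates.

```r
# Rounded values for "Curricular units 1st sem (approved)", Dropout vs Graduate
beta <- -0.7523
se   <-  0.0109

# RRR and its 95% interval: exponentiate the interval on the log-odds scale
rrr <- exp(beta)
ci  <- exp(beta + c(-1, 1) * 1.96 * se)

# Percentage change in the odds per one-unit increase
pct_change <- (rrr - 1) * 100

cat("RRR        :", round(rrr, 4), "\n")
cat("95% CI     :", round(ci[1], 4), "to", round(ci[2], 4), "\n")
cat("Odds change:", round(pct_change, 1), "%\n")
```

The interval is computed on the log-odds scale and then exponentiated, which is why it is not symmetric around the RRR.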

Average Marginal Effects (AME)

cat("Menghitung Average Marginal Effects (AME)...\n")
## Menghitung Average Marginal Effects (AME)...
ame     <- avg_slopes(model_mlr, newdata = df_train)
ame_df  <- as.data.frame(ame) %>%
  select(term, group, estimate, std.error, statistic, p.value, conf.low, conf.high) %>%
  rename(
    Variabel = term,
    Kategori = group,
    AME      = estimate,
    SE       = std.error,
    Z_stat   = statistic,
    P_value  = p.value,
    CI_Low   = conf.low,
    CI_High  = conf.high
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x, 4)),
         Sig = ifelse(P_value < 0.001, "***",
                      ifelse(P_value < 0.01,  "**",
                             ifelse(P_value < 0.05, "*",
                                    ifelse(P_value < 0.1, ".", " ")))))

kable(ame_df,
      col.names = c("Variabel", "Kategori", "AME", "SE", "Z",
                    "P-value", "CI Low", "CI High", "Sig"),
      caption   = "Tabel 6. Average Marginal Effects (AME)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE)
Tabel 6. Average Marginal Effects (AME)
Variabel Kategori AME SE Z P-value CI Low CI High Sig
Admission grade Graduate 0.0011 0.0005 2.3544 0.0186 0.0002 0.0020 *
Admission grade Enrolled -0.0003 0.0005 -0.5139 0.6073 -0.0012 0.0007
Admission grade Dropout -0.0008 0.0004 -2.0471 0.0406 -0.0017 0.0000 *
Age at enrollment Graduate -0.0015 0.0009 -1.5688 0.1167 -0.0033 0.0004
Age at enrollment Enrolled -0.0041 0.0010 -3.9519 0.0001 -0.0061 -0.0021 ***
Age at enrollment Dropout 0.0055 0.0008 6.9336 0.0000 0.0040 0.0071 ***
Application mode Graduate -0.0006 0.0004 -1.5447 0.1224 -0.0014 0.0002
Application mode Enrolled 0.0008 0.0004 1.8490 0.0645 0.0000 0.0017 .
Application mode Dropout -0.0002 0.0004 -0.4745 0.6351 -0.0009 0.0006
Application order Graduate -0.0027 0.0010 -2.5347 0.0113 -0.0047 -0.0006 *
Application order Enrolled -0.0092 0.0044 -2.1029 0.0355 -0.0178 -0.0006 *
Application order Dropout 0.0119 0.0034 3.5046 0.0005 0.0052 0.0185 ***
Course Graduate 0.0000 0.0000 -3.3703 0.0008 0.0000 0.0000 ***
Course Enrolled 0.0000 0.0000 0.0735 0.9414 0.0000 0.0000
Course Dropout 0.0000 0.0000 4.8102 0.0000 0.0000 0.0000 ***
Curricular units 1st sem (approved) Graduate 0.0664 0.0009 70.7025 0.0000 0.0646 0.0682 ***
Curricular units 1st sem (approved) Enrolled -0.0279 0.0013 -21.1249 0.0000 -0.0305 -0.0253 ***
Curricular units 1st sem (approved) Dropout -0.0385 0.0012 -31.8867 0.0000 -0.0409 -0.0361 ***
Curricular units 1st sem (credited) Graduate -0.0221 0.0013 -16.9490 0.0000 -0.0247 -0.0196 ***
Curricular units 1st sem (credited) Enrolled 0.0080 0.0026 3.0727 0.0021 0.0029 0.0131 **
Curricular units 1st sem (credited) Dropout 0.0141 0.0020 6.9799 0.0000 0.0101 0.0181 ***
Curricular units 1st sem (enrolled) Graduate -0.0176 0.0009 -18.9923 0.0000 -0.0194 -0.0157 ***
Curricular units 1st sem (enrolled) Enrolled 0.0055 0.0012 4.7987 0.0000 0.0033 0.0078 ***
Curricular units 1st sem (enrolled) Dropout 0.0120 0.0007 16.9373 0.0000 0.0106 0.0134 ***
Curricular units 1st sem (evaluations) Graduate -0.0039 0.0012 -3.2392 0.0012 -0.0063 -0.0015 **
Curricular units 1st sem (evaluations) Enrolled 0.0059 0.0022 2.7127 0.0067 0.0016 0.0101 **
Curricular units 1st sem (evaluations) Dropout -0.0020 0.0018 -1.1216 0.2620 -0.0055 0.0015
Curricular units 1st sem (grade) Graduate -0.0068 0.0012 -5.4728 0.0000 -0.0092 -0.0044 ***
Curricular units 1st sem (grade) Enrolled -0.0001 0.0021 -0.0351 0.9720 -0.0042 0.0040
Curricular units 1st sem (grade) Dropout 0.0069 0.0017 4.0697 0.0000 0.0036 0.0101 ***
Curricular units 1st sem (without evaluations) Graduate 0.0029 0.0003 10.2347 0.0000 0.0024 0.0035 ***
Curricular units 1st sem (without evaluations) Enrolled 0.0087 0.0004 20.6677 0.0000 0.0078 0.0095 ***
Curricular units 1st sem (without evaluations) Dropout -0.0116 0.0004 -28.4168 0.0000 -0.0124 -0.0108 ***
Curricular units 2nd sem (approved) Graduate 0.0924 0.0015 61.6803 0.0000 0.0895 0.0953 ***
Curricular units 2nd sem (approved) Enrolled -0.0326 0.0027 -11.9965 0.0000 -0.0379 -0.0273 ***
Curricular units 2nd sem (approved) Dropout -0.0598 0.0023 -25.8189 0.0000 -0.0644 -0.0553 ***
Curricular units 2nd sem (credited) Graduate 0.0063 0.0010 6.2804 0.0000 0.0043 0.0082 ***
Curricular units 2nd sem (credited) Enrolled -0.0249 0.0017 -14.2944 0.0000 -0.0284 -0.0215 ***
Curricular units 2nd sem (credited) Dropout 0.0187 0.0013 14.8365 0.0000 0.0162 0.0211 ***
Curricular units 2nd sem (enrolled) Graduate -0.0817 0.0021 -39.2503 0.0000 -0.0858 -0.0777 ***
Curricular units 2nd sem (enrolled) Enrolled 0.0312 0.0023 13.4867 0.0000 0.0266 0.0357 ***
Curricular units 2nd sem (enrolled) Dropout 0.0506 0.0019 27.1274 0.0000 0.0469 0.0542 ***
Curricular units 2nd sem (evaluations) Graduate -0.0091 0.0015 -5.9564 0.0000 -0.0122 -0.0061 ***
Curricular units 2nd sem (evaluations) Enrolled 0.0133 0.0022 6.1201 0.0000 0.0091 0.0176 ***
Curricular units 2nd sem (evaluations) Dropout -0.0042 0.0018 -2.3771 0.0174 -0.0076 -0.0007 *
Curricular units 2nd sem (grade) Graduate 0.0144 0.0010 14.4995 0.0000 0.0124 0.0163 ***
Curricular units 2nd sem (grade) Enrolled -0.0022 0.0021 -1.0820 0.2793 -0.0063 0.0018
Curricular units 2nd sem (grade) Dropout -0.0122 0.0017 -7.2538 0.0000 -0.0154 -0.0089 ***
Curricular units 2nd sem (without evaluations) Graduate 0.0137 0.0003 39.7422 0.0000 0.0131 0.0144 ***
Curricular units 2nd sem (without evaluations) Enrolled -0.0024 0.0004 -5.4489 0.0000 -0.0033 -0.0015 ***
Curricular units 2nd sem (without evaluations) Dropout -0.0113 0.0005 -24.9076 0.0000 -0.0122 -0.0104 ***
Daytime/evening attendance Graduate 0.0003 0.0001 2.6813 0.0073 0.0001 0.0005 **
Daytime/evening attendance Enrolled -0.0041 0.0002 -21.8512 0.0000 -0.0045 -0.0037 ***
Daytime/evening attendance Dropout 0.0038 0.0002 24.0109 0.0000 0.0035 0.0041 ***
Debtor Graduate -0.1008 0.0021 -47.9323 0.0000 -0.1049 -0.0967 ***
Debtor Enrolled 0.0685 0.0025 27.8863 0.0000 0.0637 0.0734 ***
Debtor Dropout 0.0323 0.0016 19.9532 0.0000 0.0291 0.0354 ***
Displaced Graduate -0.0022 0.0008 -2.7753 0.0055 -0.0038 -0.0007 **
Displaced Enrolled -0.0352 0.0010 -34.9624 0.0000 -0.0372 -0.0332 ***
Displaced Dropout 0.0374 0.0010 35.8958 0.0000 0.0354 0.0395 ***
Educational special needs Graduate -0.0045 0.0003 -16.5738 0.0000 -0.0050 -0.0040 ***
Educational special needs Enrolled -0.0098 0.0003 -35.5261 0.0000 -0.0103 -0.0092 ***
Educational special needs Dropout 0.0143 0.0004 38.2623 0.0000 0.0135 0.0150 ***
Father’s occupation Graduate 0.0000 0.0006 0.0644 0.9487 -0.0012 0.0012
Father’s occupation Enrolled 0.0000 0.0006 -0.0306 0.9756 -0.0011 0.0011
Father’s occupation Dropout 0.0000 0.0005 -0.0419 0.9666 -0.0011 0.0010
Father’s qualification Graduate 0.0008 0.0004 1.9866 0.0470 0.0000 0.0017 *
Father’s qualification Enrolled -0.0001 0.0004 -0.2817 0.7782 -0.0010 0.0007
Father’s qualification Dropout -0.0007 0.0004 -1.8123 0.0699 -0.0015 0.0001 .
GDP Graduate -0.0010 0.0016 -0.6252 0.5318 -0.0042 0.0022
GDP Enrolled 0.0010 0.0027 0.3942 0.6934 -0.0042 0.0063
GDP Dropout 0.0000 0.0022 -0.0182 0.9855 -0.0043 0.0043
Gender Graduate -0.0400 0.0008 -48.4002 0.0000 -0.0416 -0.0383 ***
Gender Enrolled 0.0098 0.0007 13.0706 0.0000 0.0083 0.0113 ***
Gender Dropout 0.0302 0.0009 33.9473 0.0000 0.0284 0.0319 ***
Inflation rate Graduate 0.0031 0.0009 3.5378 0.0004 0.0014 0.0048 ***
Inflation rate Enrolled -0.0039 0.0038 -1.0191 0.3082 -0.0114 0.0036
Inflation rate Dropout 0.0008 0.0030 0.2780 0.7810 -0.0051 0.0067
International Graduate 0.1045 0.0021 49.9945 0.0000 0.1004 0.1086 ***
International Enrolled -0.0129 0.0023 -5.7146 0.0000 -0.0173 -0.0085 ***
International Dropout -0.0916 0.0025 -36.1662 0.0000 -0.0966 -0.0866 ***
Marital status Graduate 0.0024 0.0002 10.7222 0.0000 0.0020 0.0029 ***
Marital status Enrolled 0.0079 0.0003 29.5026 0.0000 0.0073 0.0084 ***
Marital status Dropout -0.0103 0.0003 -39.6433 0.0000 -0.0108 -0.0098 ***
Mother’s occupation Graduate 0.0002 0.0006 0.3705 0.7110 -0.0009 0.0014
Mother’s occupation Enrolled 0.0006 0.0005 1.1326 0.2574 -0.0004 0.0016
Mother’s occupation Dropout -0.0008 0.0005 -1.6025 0.1090 -0.0018 0.0002
Mother’s qualification Graduate 0.0003 0.0004 0.7909 0.4290 -0.0005 0.0012
Mother’s qualification Enrolled -0.0014 0.0005 -2.9833 0.0029 -0.0023 -0.0005 **
Mother’s qualification Dropout 0.0010 0.0004 2.5356 0.0112 0.0002 0.0018 *
Nacionality Graduate -0.0026 0.0007 -3.5762 0.0003 -0.0040 -0.0012 ***
Nacionality Enrolled 0.0001 0.0008 0.1354 0.8923 -0.0014 0.0017
Nacionality Dropout 0.0025 0.0006 4.0236 0.0001 0.0013 0.0036 ***
Previous qualification Graduate 0.0014 0.0006 2.1527 0.0313 0.0001 0.0026 *
Previous qualification Enrolled -0.0006 0.0007 -0.9638 0.3351 -0.0019 0.0007
Previous qualification Dropout -0.0007 0.0006 -1.3064 0.1914 -0.0018 0.0004
Previous qualification (grade) Graduate -0.0001 0.0005 -0.1609 0.8722 -0.0010 0.0008
Previous qualification (grade) Enrolled -0.0006 0.0005 -1.2681 0.2048 -0.0016 0.0003
Previous qualification (grade) Dropout 0.0007 0.0004 1.7119 0.0869 -0.0001 0.0015 .
Scholarship holder Graduate 0.0671 0.0014 49.5394 0.0000 0.0644 0.0697 ***
Scholarship holder Enrolled -0.0247 0.0013 -18.5165 0.0000 -0.0274 -0.0221 ***
Scholarship holder Dropout -0.0423 0.0013 -31.7345 0.0000 -0.0450 -0.0397 ***
Tuition fees up to date Graduate 0.2435 0.0062 39.2992 0.0000 0.2313 0.2556 ***
Tuition fees up to date Enrolled 0.0788 0.0043 18.4887 0.0000 0.0704 0.0871 ***
Tuition fees up to date Dropout -0.3222 0.0086 -37.3936 0.0000 -0.3391 -0.3053 ***
Unemployment rate Graduate -0.0013 0.0019 -0.6991 0.4845 -0.0051 0.0024
Unemployment rate Enrolled -0.0080 0.0022 -3.6434 0.0003 -0.0123 -0.0037 ***
Unemployment rate Dropout 0.0093 0.0019 4.9907 0.0000 0.0057 0.0130 ***
# Plot the significant AMEs
ame_sig <- ame_df %>%
  filter(P_value < 0.05, Variabel != "(Intercept)") %>%
  mutate(Variabel = reorder(Variabel, AME))

p_ame <- ggplot(ame_sig, aes(x = Variabel, y = AME, fill = Kategori)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.85) +
  geom_errorbar(aes(ymin = CI_Low, ymax = CI_High),
                position = position_dodge(0.9),
                width = 0.3, linewidth = 0.5) +
  geom_hline(yintercept = 0, linetype = "solid", linewidth = 0.5) +
  coord_flip() +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  labs(title    = "Figure 9. Average Marginal Effects (AME): Significant Variables",
       subtitle = "AME = average change in probability per 1-unit change in X",
       x = "Variable", y = "AME (Change in Probability)",
       fill = "Status") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"))
print(p_ame)
Figure 9. Significant Average Marginal Effects

Interpretation: Table 6 and Figure 9 present the Average Marginal Effects (AME), the effect measure that is easiest to interpret in practice. An AME answers the question: “On average, by how much does the probability of a given academic status change when variable X increases by 1 unit?”

  • Positive AME for Dropout: a 1-unit increase in X raises the average probability that a student falls in the Dropout category.
  • Negative AME for Dropout: a 1-unit increase in X lowers the average probability of Dropout. Because the three probabilities sum to 1, the AMEs of one variable across Graduate, Enrolled, and Dropout sum to zero, so the decrease is offset by a higher probability of Graduate, Enrolled, or both. In Table 6, for example, Tuition fees up to date has AMEs of +0.2435 (Graduate), +0.0788 (Enrolled), and -0.3222 (Dropout).
  • The AME is computed as the average of the marginal effects over all observations in the training data, so it reflects the effect across the sample more realistically than an effect evaluated only at the average values of the predictors (marginal effect at the mean).
  • Example: if the AME of “Curricular units 2nd sem (approved)” on Graduate were +0.05, each additional curricular unit passed in semester 2 would raise the probability of graduating by 5 percentage points on average.

AMEs complement the RRRs because they are expressed directly on the probability scale, as changes in percentage points, which is more intuitive when communicating results to policymakers.
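To make the definition concrete, the sketch below approximates an AME by finite differences: shift one predictor by a small step `h`, recompute every predicted probability, and average the resulting slopes over all rows. It is only an illustration under stated assumptions: the data are simulated, and `toy`, `fit`, and `ame_numeric` are hypothetical names, not the objects or the method used to build `ame_df` in this report.

```r
library(nnet)  # provides multinom(); ships with standard R distributions

set.seed(42)
n  <- 600
x1 <- rnorm(n)
x2 <- rnorm(n)

# Simulated three-class outcome (stand-in data, NOT the student dataset)
eta    <- cbind(Graduate = 0, Enrolled = -0.5 + 0.4 * x2, Dropout = -0.3 - 1.2 * x1)
true_p <- exp(eta) / rowSums(exp(eta))
status <- apply(true_p, 1, function(p) sample(colnames(true_p), 1, prob = p))
toy <- data.frame(
  status = factor(status, levels = c("Graduate", "Enrolled", "Dropout")),
  x1 = x1, x2 = x2
)

fit <- multinom(status ~ x1 + x2, data = toy, trace = FALSE)

# AME of a numeric predictor: average, over all rows, of the
# finite-difference slope of each predicted probability
ame_numeric <- function(model, data, var, h = 1e-4) {
  shifted <- data
  shifted[[var]] <- shifted[[var]] + h
  p0 <- predict(model, newdata = data,    type = "probs")
  p1 <- predict(model, newdata = shifted, type = "probs")
  colMeans((p1 - p0) / h)
}

ame_x1 <- ame_numeric(fit, toy, "x1")
print(round(ame_x1, 4))
print(sum(ame_x1))  # numerically zero: the probabilities always sum to 1
```

Because the three probabilities always sum to 1, the three AMEs of a variable cancel out, which is the same pattern visible across the Graduate, Enrolled, and Dropout rows of each variable in Table 6.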


Model Evaluation

Predictions and Confusion Matrix

pred_test <- predict(model_mlr, newdata = df_test, type = "class")
prob_test <- predict(model_mlr, newdata = df_test, type = "probs")

cm <- confusionMatrix(pred_test, df_test$Target)

cat("Confusion Matrix:\n")
## Confusion Matrix:
print(cm$table)
##           Reference
## Prediction Graduate Enrolled Dropout
##   Graduate      390       63      40
##   Enrolled       35       47      25
##   Dropout        16       48     219
cat("\nOverall Evaluation Statistics:\n")
## 
## Overall Evaluation Statistics:
print(cm$overall)
##       Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull 
##   7.429219e-01   5.689422e-01   7.127498e-01   7.714630e-01   4.994337e-01 
## AccuracyPValue  McnemarPValue 
##   1.682322e-49   1.194884e-05
# Visualize the confusion matrix
cm_df <- as.data.frame(cm$table)
colnames(cm_df) <- c("Prediksi", "Aktual", "Frekuensi")

p_cm <- ggplot(cm_df, aes(x = Aktual, y = Prediksi, fill = Frekuensi)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = Frekuensi), size = 5, fontface = "bold") +
  scale_fill_gradient(low = "#E3F2FD", high = "#1565C0") +
  labs(title    = "Figure 10. Confusion Matrix: Testing Data",
       subtitle = sprintf("Overall Accuracy: %.2f%%",
                          cm$overall["Accuracy"] * 100),
       x = "Actual Status", y = "Predicted Status", fill = "Frequency") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        axis.text  = element_text(size = 11))
print(p_cm)
Figure 10. Confusion Matrix: Testing Data

Interpretation: Figure 10 shows the confusion matrix, which compares the model's predictions with the actual status in the testing data. Each cell gives the number of observations:

  • Main diagonal: correct predictions (Graduate → Graduate, Enrolled → Enrolled, Dropout → Dropout), 390 + 47 + 219 = 656 observations. In the printed table the diagonal runs from top left to bottom right; in Figure 10 it runs from bottom left to top right, because the y-axis is drawn from the bottom up. Darker blue marks larger counts.
  • Off-diagonal cells: misclassifications. Most errors involve the Enrolled class: of the 158 students who are actually Enrolled, 63 are predicted as Graduate and 48 as Dropout. In addition, 40 students who actually dropped out are predicted as Graduate.

Overall accuracy = correct predictions / total observations = 656 / 883 = 74.29%. Because the class distribution is imbalanced (a model that always predicts Graduate would already reach 49.94%, the AccuracyNull value above), accuracy alone can be misleading; the per-class metrics (Sensitivity, Specificity, F1-Score) in the next table need to be examined as well.
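As a quick check of the accuracy formula, the sketch below recomputes it by hand from the counts printed in the confusion matrix above, together with the share of the largest class (what caret reports as AccuracyNull). The object names `classes` and `cm_tab` are introduced here for illustration only.

```r
# Counts copied from the printed confusion matrix
# (rows = prediction, columns = reference)
classes <- c("Graduate", "Enrolled", "Dropout")
cm_tab <- matrix(c(390, 63,  40,
                    35, 47,  25,
                    16, 48, 219),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Prediction = classes, Reference = classes))

# Accuracy: correct predictions (diagonal) over all observations
accuracy <- sum(diag(cm_tab)) / sum(cm_tab)

# Baseline: always predict the most frequent actual class
no_info_rate <- max(colSums(cm_tab)) / sum(cm_tab)

print(round(c(accuracy = accuracy, no_info_rate = no_info_rate), 4))
```

The gap between the two numbers, rather than the accuracy alone, shows how much the model adds over always guessing the majority class.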

Per-Class Evaluation

eval_per_class <- data.frame(
  Kelas       = rownames(cm$byClass),
  Sensitivity = round(cm$byClass[, "Sensitivity"],   4),
  Specificity = round(cm$byClass[, "Specificity"],   4),
  Precision   = round(cm$byClass[, "Pos Pred Value"], 4),
  F1_Score    = round(cm$byClass[, "F1"],             4)
)
rownames(eval_per_class) <- NULL
eval_per_class$Kelas <- gsub("Class: ", "", eval_per_class$Kelas)

kable(eval_per_class,
      col.names = c("Class", "Sensitivity", "Specificity", "Precision", "F1-Score"),
      caption   = "Table 7. Per-Class Performance of the MLR Model") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE, position = "left", font_size = 13) %>%
  column_spec(1, bold = TRUE)
Table 7. Per-Class Performance of the MLR Model
Class     Sensitivity  Specificity  Precision  F1-Score
Graduate  0.8844       0.7670       0.7911     0.8351
Enrolled  0.2975       0.9172       0.4393     0.3547
Dropout   0.7711       0.8932       0.7739     0.7725

Interpretation: Table 7 reports the evaluation metrics for each target class:

  • Sensitivity (Recall): the proportion of observations that actually belong to class X and are predicted correctly. High sensitivity means the model rarely misses cases from that class.
  • Specificity: the proportion of observations that do not belong to class X and are correctly identified as not X. High specificity means the model seldom assigns other classes to class X.
  • Precision: the proportion of class-X predictions that truly belong to class X. High precision for Dropout, for example, means that most students the model flags as Dropout really do drop out.
  • F1-Score: the harmonic mean of Precision and Recall; a balanced metric that is useful when the classes are imbalanced.

Performance differs markedly across classes. Graduate, the largest class, has the highest sensitivity (0.8844) and F1-Score (0.8351) but the lowest specificity (0.7670), because many Enrolled and Dropout students are predicted as Graduate. Dropout is predicted reasonably well (F1-Score 0.7725). Enrolled, the smallest class, is the weak point: the model detects only 29.75% of the students who are actually Enrolled (F1-Score 0.3547), which makes it the main target for any model improvement.
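The four definitions above can be reproduced by hand in a one-vs-rest fashion. The sketch below does so from the counts in the printed confusion matrix; `cm_tab`, `tp`, `fp`, `fn`, and `tn` are names introduced here for illustration, and the result should agree with Table 7 up to rounding.

```r
# Counts copied from the printed confusion matrix
# (rows = prediction, columns = reference)
classes <- c("Graduate", "Enrolled", "Dropout")
cm_tab <- matrix(c(390, 63,  40,
                    35, 47,  25,
                    16, 48, 219),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Prediction = classes, Reference = classes))

tp <- diag(cm_tab)             # predicted X and actually X
fp <- rowSums(cm_tab) - tp     # predicted X but actually another class
fn <- colSums(cm_tab) - tp     # actually X but predicted as another class
tn <- sum(cm_tab) - tp - fp - fn

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)
f1          <- 2 * precision * sensitivity / (precision + sensitivity)

print(round(data.frame(sensitivity, specificity, precision, f1), 4))
```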

Model Performance Summary

model_summary_tbl <- data.frame(
  Metrik = c("Accuracy", "Kappa",
             "Log-Likelihood (Full)", "Log-Likelihood (Null)",
             "G² (Chi-Square)", "P-value G²",
             "Training Observations", "Testing Observations",
             "Number of Predictors", "Reference Category"),
  Nilai  = c(
    sprintf("%.2f%%", cm$overall["Accuracy"] * 100),
    sprintf("%.4f",   cm$overall["Kappa"]),
    sprintf("%.4f",   ll_model),
    sprintf("%.4f",   ll_null),
    sprintf("%.4f",   G2),
    format(p_g2, scientific = TRUE, digits = 4),
    nrow(df_train),
    nrow(df_test),
    length(predictor_cols),
    "Graduate"
  )
)

kable(model_summary_tbl,
      col.names = c("Metric", "Value"),
      caption   = "Table 8. Performance Summary of the Multinomial Logistic Regression Model") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE, position = "left", font_size = 13) %>%
  column_spec(1, bold = TRUE)
Table 8. Performance Summary of the Multinomial Logistic Regression Model
Metric                  Value
Accuracy                74.29%
Kappa                   0.5689
Log-Likelihood (Full)   -1952.0376
Log-Likelihood (Null)   -3611.6229
G² (Chi-Square)         3319.1707
P-value G²              0e+00
Training Observations   3541
Testing Observations    883
Number of Predictors    36
Reference Category      Graduate

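Interpretation: Table 8 gathers the model's performance metrics in a single table:

  • Accuracy: the overall percentage of correct predictions on the testing data. Higher values (closer to 100%) indicate a better model.
  • Kappa (Cohen’s Kappa): accuracy adjusted for the agreement expected by chance. Kappa > 0.6 is generally considered good and Kappa > 0.8 very good; the value here, 0.5689, falls just below the 0.6 threshold and indicates moderate agreement. Kappa is a fairer measure than accuracy when the classes are imbalanced.
  • Log-Likelihood: measures model fit; the closer to zero (the less negative), the better the fit. Here the full model (-1952.04) is far closer to zero than the null model (-3611.62).
  • G² and p-value: G² = 2 × (log-likelihood of the full model - log-likelihood of the null model) ≈ 3319.17. Its p-value confirms that the model as a whole fits significantly better than the null model; the reported 0e+00 means the p-value is too small for R to represent, not that it is exactly zero.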
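The Kappa and G² figures in Table 8 can be retraced from numbers already shown in this report. The sketch below is a hand calculation, not the code that produced `cm`, `G2`, or `p_g2`; in particular, the degrees of freedom are an assumption (36 predictors, each entering as a single numeric column, times 2 non-reference equations), and `kappa_hat`, `g2_manual`, and `df_lrt` are hypothetical names.

```r
# Counts copied from the printed confusion matrix
# (rows = prediction, columns = reference)
classes <- c("Graduate", "Enrolled", "Dropout")
cm_tab <- matrix(c(390, 63,  40,
                    35, 47,  25,
                    16, 48, 219),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Prediction = classes, Reference = classes))

# Cohen's Kappa: observed agreement corrected for chance agreement
n_test    <- sum(cm_tab)
p_obs     <- sum(diag(cm_tab)) / n_test
p_chance  <- sum(rowSums(cm_tab) * colSums(cm_tab)) / n_test^2
kappa_hat <- (p_obs - p_chance) / (1 - p_chance)

# Likelihood-ratio statistic from the log-likelihoods in Table 8
ll_full   <- -1952.0376
ll_null   <- -3611.6229
g2_manual <- 2 * (ll_full - ll_null)

# ASSUMPTION: 36 numeric predictors x 2 non-reference categories = 72 df
df_lrt <- 36 * 2
p_val  <- pchisq(g2_manual, df = df_lrt, lower.tail = FALSE)
log_p  <- pchisq(g2_manual, df = df_lrt, lower.tail = FALSE, log.p = TRUE)

print(round(c(kappa = kappa_hat, G2 = g2_manual), 4))
print(c(p_value = p_val, log_p_value = log_p))
```

Asking `pchisq` for the log of the p-value (`log.p = TRUE`) is one way to see how small the p-value is when the ordinary value is printed as 0e+00.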

Additional Visualizations

Distribution of Predicted Probabilities

prob_df <- as.data.frame(prob_test)
prob_df$Aktual   <- df_test$Target
prob_df$Prediksi <- pred_test
prob_df$Benar    <- prob_df$Aktual == prob_df$Prediksi

prob_long <- prob_df %>%
  pivot_longer(cols = c(Graduate, Enrolled, Dropout),
               names_to = "Kategori", values_to = "Probabilitas") %>%
  mutate(Kategori = factor(Kategori, levels = c("Graduate", "Enrolled", "Dropout")))

p_prob <- ggplot(prob_long, aes(x = Aktual, y = Probabilitas, fill = Kategori)) +
  geom_boxplot(alpha = 0.75, outlier.size = 0.5) +
  facet_wrap(~ Kategori) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  labs(title    = "Figure 11. Distribution of Predicted Probabilities by Actual Status",
       subtitle = "A good model assigns the highest probability to the correct category",
       x = "Actual Status", y = "Predicted Probability",
       fill = "Predicted Category") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none",
        plot.title      = element_text(face = "bold"))
print(p_prob)
Figure 11. Distribution of Predicted Probabilities by Actual Status

Interpretation: Figure 11 shows the distribution of the model's predicted probabilities, split by predicted category (one panel each) and by the student's actual status (x-axis). A good model shows the following pattern:

  • Graduate panel: the predicted probability of Graduate is highest when the actual status is Graduate (the box in the “Graduate” column sits highest).
  • Enrolled panel: the predicted probability of Enrolled is highest when the actual status is Enrolled.
  • Dropout panel: the predicted probability of Dropout is highest when the actual status is Dropout.

If, within each panel, the box for the matching actual status is clearly higher than the boxes for the other statuses, the model discriminates well. Overlap between the distributions of different actual-status columns marks the regions where the model still struggles to tell the categories apart.
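The pattern described above can also be summarized numerically as the mean predicted probability of each category within each actual status. The sketch below illustrates this on simulated data, since the report's `prob_test` is not reproduced here; `toy`, `fit`, `prob_toy`, and `mean_prob` are hypothetical names, and only the column name `Aktual` mirrors the report.

```r
library(nnet)  # provides multinom()

set.seed(7)
n  <- 600
x1 <- rnorm(n)
x2 <- rnorm(n)

# Simulated three-class outcome (stand-in data, NOT the student dataset)
eta    <- cbind(Graduate = 0, Enrolled = -0.5 + 1.0 * x2, Dropout = -0.3 - 1.5 * x1)
true_p <- exp(eta) / rowSums(exp(eta))
drawn  <- apply(true_p, 1, function(p) sample(colnames(true_p), 1, prob = p))
toy <- data.frame(
  Aktual = factor(drawn, levels = c("Graduate", "Enrolled", "Dropout")),
  x1 = x1, x2 = x2
)

fit      <- multinom(Aktual ~ x1 + x2, data = toy, trace = FALSE)
prob_toy <- predict(fit, newdata = toy, type = "probs")

# Rows: actual status; columns: mean predicted probability of each category
mean_prob <- aggregate(as.data.frame(prob_toy),
                       by = list(Aktual = toy$Aktual), FUN = mean)
print(mean_prob)
```

In a model that discriminates well, the largest value in each probability column appears in the row of the matching actual status, which is the numeric counterpart of the highest box in each panel.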

Prediction Accuracy by Academic Status

acc_by_class <- prob_df %>%
  group_by(Aktual) %>%
  summarise(
    Total   = n(),
    Benar   = sum(Benar),
    Akurasi = round(Benar / Total * 100, 2)
  )

p_acc <- ggplot(acc_by_class, aes(x = Aktual, y = Akurasi, fill = Aktual)) +
  geom_bar(stat = "identity", alpha = 0.85, width = 0.6) +
  geom_text(aes(label = paste0(Akurasi, "%\n(", Benar, "/", Total, ")")),
            vjust = -0.3, size = 4, fontface = "bold") +
  geom_hline(yintercept = cm$overall["Accuracy"] * 100,
             linetype = "dashed", color = "gray40", linewidth = 0.8) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  scale_y_continuous(limits = c(0, 115)) +
  annotate("text", x = 0.6,
           y = cm$overall["Accuracy"] * 100 + 3,
           label = sprintf("Overall accuracy: %.2f%%", cm$overall["Accuracy"] * 100),
           color = "gray40", size = 3.5) +
  labs(title = "Figure 12. Prediction Accuracy by Academic Status",
       x = "Actual Status", y = "Accuracy (%)") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        plot.title      = element_text(face = "bold"))
print(p_acc)
Figure 12. Prediction Accuracy by Academic Status

Interpretation: Figure 12 shows the model's prediction accuracy for each actual class, together with the number of correct predictions out of the total observations in that class. Because the data are grouped by actual status, these per-class accuracies are the same quantities as the sensitivities in Table 7. The dashed line marks the overall accuracy (74.29%), which is not the simple average of the three per-class values.

  • Graduate has the highest accuracy, 88.44% (390/441). It is the largest class, so the model has the most examples from which to learn the Graduate pattern.
  • Enrolled has the lowest accuracy, 29.75% (47/158). Its characteristics lie between those of Dropout and Graduate, which makes it harder to separate.
  • Dropout sits in between at 77.11% (219/284). Poor academic patterns (low grades, few approved units) give a relatively clear signal.

These differences show that the model works unevenly across student segments. For practical applications such as a dropout early-warning system, sensitivity for the Dropout class is the top priority, even if improving it lowers overall accuracy.
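One common way to favor Dropout sensitivity in an early-warning setting is to flag a student whenever the predicted Dropout probability exceeds a chosen threshold, instead of taking the most probable class. The sketch below illustrates that trade-off on simulated probabilities; the values, the thresholds, and the names `p_dropout` and `flag_metrics` are all hypothetical and are not derived from the model in this report.

```r
set.seed(3)
n_students <- 500

# Simulated ground truth: TRUE means the student actually drops out
actual_dropout <- rbinom(n_students, 1, 0.3) == 1

# Hypothetical predicted P(Dropout): higher on average for true dropouts
p_dropout <- ifelse(actual_dropout,
                    rbeta(n_students, 4, 2),
                    rbeta(n_students, 2, 4))

# Sensitivity and precision of the rule "flag if P(Dropout) >= threshold"
flag_metrics <- function(threshold) {
  flagged <- p_dropout >= threshold
  hits    <- sum(flagged & actual_dropout)
  c(threshold   = threshold,
    sensitivity = hits / sum(actual_dropout),
    precision   = hits / sum(flagged))
}

res <- t(sapply(c(0.5, 0.4, 0.3), flag_metrics))
print(round(res, 3))
```

Lowering the threshold can only keep or raise sensitivity, typically at the cost of precision, so the threshold should reflect how costly a missed dropout is relative to an unnecessary intervention.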


Conclusion

model_summary_final <- data.frame(
  Metrik  = c("Accuracy", "Kappa", "G² Statistic", "P-value G²",
              "Training Data", "Testing Data", "Number of Predictors",
              "Reference Category"),
  Nilai   = c(
    sprintf("%.2f%%", cm$overall["Accuracy"] * 100),
    sprintf("%.4f",   cm$overall["Kappa"]),
    sprintf("%.4f",   G2),
    format(p_g2, scientific = TRUE, digits = 4),
    nrow(df_train),
    nrow(df_test),
    length(predictor_cols),
    "Graduate"
  )
)

kable(model_summary_final,
      col.names = c("Metric", "Value"),
      caption   = "Table 9. Final Summary of the Multinomial Logistic Regression Model") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE, position = "left", font_size = 13) %>%
  column_spec(1, bold = TRUE)
Table 9. Final Summary of the Multinomial Logistic Regression Model
Metric                Value
Accuracy              74.29%
Kappa                 0.5689
G² Statistic          3319.1707
P-value G²            0e+00
Training Data         3541
Testing Data          883
Number of Predictors  36
Reference Category    Graduate

Based on the Multinomial Logistic Regression analysis of the student academic status data (Polytechnic Institute of Portalegre), the conclusions are as follows:

1. The overall model is significant. The simultaneous test (G²/LRT) yields a very small p-value (< 0.05), so H₀ is rejected. Taken together, the predictors significantly improve the explanation of students' academic status compared with a model without predictors.

2. Semester academic variables dominate. The variables most consistently significant in the partial Wald tests are the curricular units passed (approved) in semesters 1 and 2. Students who complete more courses in the early semesters have much lower odds of Dropout and higher odds of Graduate, which points to early-semester performance as the strongest predictor of study success.

3. Model performance. The model reaches an accuracy of 74.29% on the testing data with a Kappa of 0.5689, which indicates moderate agreement beyond chance, just below the 0.6 level usually regarded as good. Performance is best for the Graduate class (sensitivity 0.8844) and weakest for the Enrolled class (sensitivity 0.2975), whose position between Dropout and Graduate makes it hard to predict.

4. Practical implications.
  • Institutions can use this model as an early-warning system to identify students at risk of Dropout from the first semester onward.
  • Targeted interventions (for example academic tutoring or financial counseling) can be focused on students whose predictor profile indicates high risk.
  • Socio-economic variables such as debtor status and whether tuition fees are up to date are also significant, which underlines the importance of financial support for study success.

5. Limitations.
  • The model uses all predictors without feature selection, so multicollinearity may affect the stability of the individual coefficient estimates.
  • Class imbalance can bias the model toward the majority class; techniques such as SMOTE or class weighting could be applied to improve performance on the minority classes.
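As an illustration of the class-weighting idea mentioned in the limitations, the sketch below refits a multinomial model with inverse-frequency case weights so that each class carries the same total weight. It runs on simulated data and is not part of the original analysis; `toy`, `w`, `fit_plain`, and `fit_weighted` are hypothetical names, and SMOTE is not shown.

```r
library(nnet)  # provides multinom(), which accepts case weights

set.seed(11)
n  <- 900
x1 <- rnorm(n)

# Simulated imbalanced outcome (stand-in data, NOT the student dataset)
eta    <- cbind(Graduate = 1.0 + 0.8 * x1, Enrolled = -0.8, Dropout = 0.2 - 1.0 * x1)
true_p <- exp(eta) / rowSums(exp(eta))
drawn  <- apply(true_p, 1, function(p) sample(colnames(true_p), 1, prob = p))
toy <- data.frame(
  status = factor(drawn, levels = c("Graduate", "Enrolled", "Dropout")),
  x1 = x1
)

# Inverse-frequency weights: every class ends up with total weight n / 3
class_n <- table(toy$status)
toy$w   <- as.numeric(n / (length(class_n) * class_n[as.character(toy$status)]))

fit_plain    <- multinom(status ~ x1, data = toy, trace = FALSE)
fit_weighted <- multinom(status ~ x1, data = toy, weights = w, trace = FALSE)

# Compare how often each class is predicted with and without weights
print(table(Predicted = predict(fit_plain, toy),    Actual = toy$status))
print(table(Predicted = predict(fit_weighted, toy), Actual = toy$status))
```

Weighting usually trades some overall accuracy for better recall on the smaller classes, so the weighted and unweighted fits should be compared on held-out data before one is preferred.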


Analysis complete. This document was produced with R Markdown, with methodological references from Module 3 (Clustering Analysis) and Module 4 Part 2 (Multinomial Logistic Regression).