Student academic success is a crucial indicator of the quality of higher education. Institutions need to understand which factors determine whether a student graduates (Graduate), is still studying (Enrolled), or leaves before finishing (Dropout), so that they can design well-targeted interventions.
Multinomial Logistic Regression (MLR) is used because the dependent variable (Academic Status) has more than two nominal categories. Unlike binary logistic regression, MLR models the probabilities of all categories simultaneously. It does this by fixing one category as the reference (base category), here Graduate, and estimating one set of coefficients for each remaining category relative to that reference.
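For reference, the model can be written in the standard baseline-category form below. Graduate is the base category, $\mathbf{x}$ is the predictor vector, and $k$ indexes the two non-reference categories (this notation is introduced here for illustration):

$$
\ln\frac{P(Y = k \mid \mathbf{x})}{P(Y = \text{Graduate} \mid \mathbf{x})} = \beta_{k0} + \boldsymbol{\beta}_k^{\top}\mathbf{x}, \qquad k \in \{\text{Enrolled}, \text{Dropout}\}
$$

$$
P(Y = k \mid \mathbf{x}) = \frac{\exp(\beta_{k0} + \boldsymbol{\beta}_k^{\top}\mathbf{x})}{1 + \sum_{j \in \{\text{Enrolled}, \text{Dropout}\}} \exp(\beta_{j0} + \boldsymbol{\beta}_j^{\top}\mathbf{x})}, \qquad
P(Y = \text{Graduate} \mid \mathbf{x}) = \frac{1}{1 + \sum_{j \in \{\text{Enrolled}, \text{Dropout}\}} \exp(\beta_{j0} + \boldsymbol{\beta}_j^{\top}\mathbf{x})}
$$

The three probabilities sum to 1. Each of the two coefficient vectors $\boldsymbol{\beta}_{\text{Enrolled}}$ and $\boldsymbol{\beta}_{\text{Dropout}}$ corresponds to one row of the coefficient output produced by the fitted model.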
The dataset is Student Academic Performance from the Polytechnic Institute of Portalegre, Portugal. It contains demographic, academic, and economic information about students, recorded both at enrollment and during their studies.
Dependent variable: Target (Graduate / Enrolled / Dropout)
Predictor variables: the remaining 36 columns of the dataset, which cover the demographic, academic, and economic information described above.
# Extend the download timeout (in seconds)
options(timeout = 300)
# Install a package automatically if it is missing, then load it
install_if_missing <- function(pkg) {
  if (!require(pkg, character.only = TRUE, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org", dependencies = TRUE)
    library(pkg, character.only = TRUE)
  }
}
packages_needed <- c(
  "nnet", "car", "caret", "ggplot2", "dplyr", "tidyr",
  "knitr", "kableExtra", "gridExtra", "RColorBrewer",
  "marginaleffects", "broom", "scales", "ggcorrplot", "reshape2"
)
invisible(sapply(packages_needed, install_if_missing))
suppressPackageStartupMessages({
  library(nnet)            # multinom() - Multinomial Logistic Regression
  library(car)             # vif() - Variance Inflation Factor
  library(caret)           # createDataPartition(), confusionMatrix()
  library(ggplot2)         # visualization
  library(dplyr)           # data manipulation
  library(tidyr)           # data reshaping
  library(knitr)           # kable()
  library(kableExtra)      # kable styling
  library(gridExtra)       # grid.arrange()
  library(RColorBrewer)    # color palettes
  library(marginaleffects) # avg_slopes() - Average Marginal Effects
  library(broom)           # tidy() - tidy model output
  library(scales)          # percent_format()
  library(ggcorrplot)      # correlation heatmap
  library(reshape2)        # melt()
})
# Read the data (make sure data.csv is in the working directory)
df_raw <- read.csv("data.csv", sep = ";", header = TRUE,
                   stringsAsFactors = FALSE, check.names = FALSE)
# Strip stray whitespace from column names
colnames(df_raw) <- trimws(colnames(df_raw))
# Show the dimensions and the initial class distribution
cat("Data dimensions:", nrow(df_raw), "rows x", ncol(df_raw), "columns\n")
cat("\nTarget distribution (before preprocessing):\n")
print(table(df_raw$Target))
## Data dimensions: 4424 rows x 37 columns
##
## Target distribution (before preprocessing):
##
## Dropout Enrolled Graduate
## 1421 794 2209
# Convert Target to a factor; listing "Graduate" first makes it the reference level
df_raw$Target <- factor(df_raw$Target,
                        levels = c("Graduate", "Enrolled", "Dropout"))
# Convert every predictor column to numeric
predictor_cols <- setdiff(colnames(df_raw), "Target")
df <- df_raw
for (col in predictor_cols) {
  df[[col]] <- as.numeric(df[[col]])
}
# Check for missing values
cat("\nMissing values per column:\n")
na_count <- colSums(is.na(df))
print(na_count[na_count > 0])   # lists only the columns that contain NAs
cat("Total missing values:", sum(na_count), "\n")
##
## Missing values per column:
## named numeric(0)
## Total missing values: 0
# Drop rows with missing values (a no-op here, kept as a safeguard)
df <- na.omit(df)
cat("Rows after removing NAs:", nrow(df), "\n")
## Rows after removing NAs: 4424
Interpretation: The dataset contains 4,424 students and 37 columns (36 predictors plus Target). The initial distribution covers the three target classes: 2,209 Graduate, 1,421 Dropout, and 794 Enrolled. The missing-value check found none, so all 4,424 rows are kept for analysis. The reference category is set to “Graduate”. Every model coefficient therefore describes the log-odds of Enrolled or Dropout relative to Graduate.
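`multinom()` from `nnet` treats the first level of the response factor as the baseline, which is why “Graduate” is listed first above. The base-R sketch below uses a toy vector with illustrative values only. It shows that the default alphabetical order would make “Dropout” the baseline, and that either an explicit level order or `relevel()` fixes this.

```r
# Toy status vector (illustrative values, not the real data)
status <- c("Dropout", "Graduate", "Enrolled", "Graduate", "Dropout")

# Default: levels are sorted alphabetically, so "Dropout" would be the baseline
f_default <- factor(status)
levels(f_default)

# Explicit level order: the first level ("Graduate") becomes the reference
f_ref <- factor(status, levels = c("Graduate", "Enrolled", "Dropout"))
levels(f_ref)[1]

# relevel() moves one chosen level to the front and keeps the others in order
f_relevel <- relevel(f_default, ref = "Graduate")
levels(f_relevel)
```

Both approaches give the same baseline. The explicit `levels =` form also fixes the order of the remaining categories, which keeps the coefficient rows in a predictable order.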
# Numeric variables (continuous measures and counts)
numeric_vars <- c("Previous qualification (grade)", "Admission grade",
                  "Age at enrollment",
                  "Curricular units 1st sem (credited)",
                  "Curricular units 1st sem (enrolled)",
                  "Curricular units 1st sem (evaluations)",
                  "Curricular units 1st sem (approved)",
                  "Curricular units 1st sem (grade)",
                  "Curricular units 1st sem (without evaluations)",
                  "Curricular units 2nd sem (credited)",
                  "Curricular units 2nd sem (enrolled)",
                  "Curricular units 2nd sem (evaluations)",
                  "Curricular units 2nd sem (approved)",
                  "Curricular units 2nd sem (grade)",
                  "Curricular units 2nd sem (without evaluations)",
                  "Unemployment rate", "Inflation rate", "GDP")
desc_stats <- data.frame(
  Variabel = numeric_vars,
  Min      = round(sapply(df[, numeric_vars], min, na.rm = TRUE), 2),
  Q1       = round(sapply(df[, numeric_vars], quantile, 0.25, na.rm = TRUE), 2),
  Median   = round(sapply(df[, numeric_vars], median, na.rm = TRUE), 2),
  Mean     = round(sapply(df[, numeric_vars], mean, na.rm = TRUE), 2),
  Q3       = round(sapply(df[, numeric_vars], quantile, 0.75, na.rm = TRUE), 2),
  Max      = round(sapply(df[, numeric_vars], max, na.rm = TRUE), 2),
  SD       = round(sapply(df[, numeric_vars], sd, na.rm = TRUE), 3)
)
rownames(desc_stats) <- NULL
kable(desc_stats,
      col.names = c("Variable", "Min", "Q1", "Median", "Mean", "Q3", "Max", "SD"),
      caption = "Table 1. Descriptive Statistics of Numeric Variables",
      align = c("l", rep("r", 7))) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 12) %>%
  column_spec(1, bold = TRUE)
| Variable | Min | Q1 | Median | Mean | Q3 | Max | SD |
|---|---|---|---|---|---|---|---|
| Previous qualification (grade) | 95.00 | 125.00 | 133.10 | 132.61 | 140.00 | 190.00 | 13.188 |
| Admission grade | 95.00 | 117.90 | 126.10 | 126.98 | 134.80 | 190.00 | 14.482 |
| Age at enrollment | 17.00 | 19.00 | 20.00 | 23.27 | 25.00 | 70.00 | 7.588 |
| Curricular units 1st sem (credited) | 0.00 | 0.00 | 0.00 | 0.71 | 0.00 | 20.00 | 2.361 |
| Curricular units 1st sem (enrolled) | 0.00 | 5.00 | 6.00 | 6.27 | 7.00 | 26.00 | 2.480 |
| Curricular units 1st sem (evaluations) | 0.00 | 6.00 | 8.00 | 8.30 | 10.00 | 45.00 | 4.179 |
| Curricular units 1st sem (approved) | 0.00 | 3.00 | 5.00 | 4.71 | 6.00 | 26.00 | 3.094 |
| Curricular units 1st sem (grade) | 0.00 | 11.00 | 12.29 | 10.64 | 13.40 | 18.88 | 4.844 |
| Curricular units 1st sem (without evaluations) | 0.00 | 0.00 | 0.00 | 0.14 | 0.00 | 12.00 | 0.691 |
| Curricular units 2nd sem (credited) | 0.00 | 0.00 | 0.00 | 0.54 | 0.00 | 19.00 | 1.919 |
| Curricular units 2nd sem (enrolled) | 0.00 | 5.00 | 6.00 | 6.23 | 7.00 | 23.00 | 2.196 |
| Curricular units 2nd sem (evaluations) | 0.00 | 6.00 | 8.00 | 8.06 | 10.00 | 33.00 | 3.948 |
| Curricular units 2nd sem (approved) | 0.00 | 2.00 | 5.00 | 4.44 | 6.00 | 20.00 | 3.015 |
| Curricular units 2nd sem (grade) | 0.00 | 10.75 | 12.20 | 10.23 | 13.33 | 18.57 | 5.211 |
| Curricular units 2nd sem (without evaluations) | 0.00 | 0.00 | 0.00 | 0.15 | 0.00 | 12.00 | 0.754 |
| Unemployment rate | 7.60 | 9.40 | 11.10 | 11.57 | 13.90 | 16.20 | 2.664 |
| Inflation rate | -0.80 | 0.30 | 1.40 | 1.23 | 2.60 | 3.70 | 1.383 |
| GDP | -4.06 | -1.70 | 0.32 | 0.00 | 1.79 | 3.51 | 2.270 |
Interpretation: Table 1 summarizes the distribution of the eighteen numeric variables. Key findings:
- Previous qualification (grade) and Admission grade span a wide range (from about 95 to 190), indicating large differences in students' prior academic performance.
- Age at enrollment has a young median (20 years) but a much higher maximum (70), and its mean (23.27) lies above the median. Together these indicate a right-skewed distribution with a minority of older students.
- The number of approved curricular units in semesters 1 and 2 is likely to be among the strongest predictors, because passing courses is directly tied to completing a degree. Figure 3 examines this by status.
- For the curricular-unit grades, the mean (10.64 and 10.23) lies well below the median (12.29 and 12.20). The minimum of 0 suggests that some students are recorded with a grade of 0, which pulls the mean down.
- The macroeconomic variables (Unemployment rate, Inflation rate, GDP) reflect external conditions that may influence students' decisions to continue their studies.
- Several columns have a standard deviation that is large relative to the mean, most notably credited units and units without evaluations (e.g., SD 2.361 vs mean 0.71 for 1st-semester credited units). These variables are zero for most students, which points to highly heterogeneous, zero-inflated data.
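The right-skew reading above rests on comparing the mean with the median. The moment-based sample skewness is one way to quantify it. The helper `sample_skewness()` below is a hypothetical base-R sketch using the population-SD convention, so its values can differ slightly from packages that apply a small-sample correction. It is shown on a toy vector rather than the real data.

```r
# Moment-based sample skewness: mean((x - m)^3) / s^3, with s the population SD
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Toy ages with a long right tail (illustrative values only)
ages <- c(18, 19, 19, 20, 20, 21, 22, 25, 40, 60)
sample_skewness(ages)      # positive value => right-skewed
mean(ages) > median(ages)  # consistent with the sign of the skewness
```

On the real data, `sample_skewness(df[["Age at enrollment"]])` should be clearly positive if the reading of Table 1 is correct.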
target_dist <- as.data.frame(table(Status = df$Target)) %>%
  mutate(Persentase = round(Freq / sum(Freq) * 100, 2),
         Label = paste0(Status, "\n(", Freq, " | ", Persentase, "%)"))
p_target <- ggplot(target_dist, aes(x = "", y = Freq, fill = Status)) +
  geom_bar(stat = "identity", width = 1, color = "white", linewidth = 0.5) +
  coord_polar("y", start = 0) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  geom_text(aes(label = Label),
            position = position_stack(vjust = 0.5),
            size = 3.5, fontface = "bold", color = "white") +
  labs(title = "Figure 1. Distribution of Students' Academic Status",
       subtitle = paste("Total:", nrow(df), "students"),
       fill = "Status") +
  theme_void(base_size = 12) +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5, color = "gray50"))
print(p_target)
Figure 1. Distribution of Students' Academic Status
Interpretation: Figure 1 shows a pie chart of the three academic-status categories. Graduate is the largest group (2,209 students, 49.93%), followed by Dropout (1,421; 32.12%) and Enrolled (794; 17.95%). This kind of class imbalance is common in educational data and must be kept in mind when evaluating model performance, especially for the minority Enrolled category: overall accuracy can look good even when that class is predicted poorly. Graduate is chosen as the reference category because it represents the “ideal” outcome against which the other outcomes are compared.
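Because of this imbalance, a useful point of comparison is the no-information rate: the accuracy obtained by always predicting the largest class. `caret::confusionMatrix()` reports this figure as "No Information Rate". The base-R sketch below computes it from the class counts printed earlier. It also computes inverse-frequency class weights, one common remedy for imbalance. These weights are hypothetical here and are not used in this module; they could be expanded to per-row case weights for the `weights` argument of `multinom()`.

```r
# Class counts printed earlier (before the train/test split)
counts <- c(Dropout = 1421, Enrolled = 794, Graduate = 2209)
props  <- counts / sum(counts)
round(100 * props, 2)      # class shares in percent

# No-information rate: accuracy of always predicting the majority class
nir <- max(props)
round(nir, 4)              # a useful model should beat this on the test set

# Inverse-frequency weights n / (K * n_k): the rarest class gets the largest weight
class_w <- sum(counts) / (length(counts) * counts)
round(class_w, 3)
```

To give each training row its class weight, one could index by name, e.g. `class_w[as.character(df_train$Target)]`.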
p_age <- ggplot(df, aes(x = Target, y = `Age at enrollment`, fill = Target)) +
  geom_boxplot(alpha = 0.75, outlier.shape = 21, outlier.size = 1.5) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  labs(title = "Figure 2. Age at Enrollment by Academic Status",
       x = "Academic status", y = "Age at enrollment") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))
print(p_age)
Figure 2. Age at Enrollment by Academic Status
Interpretation: Figure 2 shows the distribution of age at enrollment by final academic status. Students who dropped out tend to have a higher median age than graduates. This suggests that students who enroll at an older age may face more challenges (e.g., family or work responsibilities) that affect whether they continue their studies. Graduate and Enrolled students, by contrast, have younger and broadly similar age distributions. Outliers in every category (points beyond the whiskers) show that some students are far older than average.
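The median comparison in Figure 2 can also be checked numerically. The hypothetical helper `group_summary()` below uses base R's `tapply()` to compute the median and IQR of a variable within each status group. It is shown on toy data.

```r
# Median and IQR of a numeric variable within each level of a grouping factor
group_summary <- function(x, g) {
  data.frame(
    group  = levels(g),
    median = as.numeric(tapply(x, g, median)),
    iqr    = as.numeric(tapply(x, g, IQR))
  )
}

# Toy data (illustrative values only)
g   <- factor(c("Graduate", "Graduate", "Enrolled", "Enrolled", "Dropout", "Dropout"),
              levels = c("Graduate", "Enrolled", "Dropout"))
age <- c(19, 20, 19, 21, 24, 30)
group_summary(age, g)
```

On the real data, the call would be `group_summary(df[["Age at enrollment"]], df$Target)`.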
df_long_sem <- df %>%
  select(Target,
         `Sem1 Approved` = `Curricular units 1st sem (approved)`,
         `Sem2 Approved` = `Curricular units 2nd sem (approved)`) %>%
  pivot_longer(cols = c(`Sem1 Approved`, `Sem2 Approved`),
               names_to = "Semester", values_to = "Approved")
p_approved <- ggplot(df_long_sem, aes(x = Target, y = Approved, fill = Target)) +
  geom_boxplot(alpha = 0.75, outlier.shape = 21, outlier.size = 1) +
  facet_wrap(~ Semester) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  labs(title = "Figure 3. Approved Curricular Units by Academic Status",
       x = "Academic status", y = "Number of approved curricular units") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))
print(p_approved)
Figure 3. Distribution of Approved Curricular Units by Academic Status
Interpretation: Figure 3 shows a very clear difference in the number of approved (passed) curricular units across the three groups. Graduates consistently have more approved units in both semesters, whereas students who dropped out have far fewer, often close to zero. This indicates that academic performance in the early semesters is a very strong predictor of a student's final status. Enrolled students fall between the two groups, consistent with students who are still progressing through their studies at a moderate course pass rate.
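To put a number on “often close to zero”, one can compute the share of students with no approved units in each group. Below is a minimal base-R sketch on toy data; `zero_share()` is a hypothetical helper.

```r
# Proportion of observations equal to zero within each group
zero_share <- function(x, g) tapply(x == 0, g, mean)

# Toy data (illustrative values only)
g <- factor(c("Graduate", "Graduate", "Enrolled", "Dropout", "Dropout", "Dropout"),
            levels = c("Graduate", "Enrolled", "Dropout"))
approved <- c(6, 5, 3, 0, 0, 2)
round(zero_share(approved, g), 3)
```

With the real data, `zero_share(df[["Curricular units 1st sem (approved)"]], df$Target)` gives the per-status share for semester 1.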
cor_matrix <- cor(df[, numeric_vars], use = "complete.obs")
p_cor <- ggcorrplot(cor_matrix,
                    method = "square",
                    type = "lower",
                    lab = FALSE,
                    colors = c("#F44336", "white", "#2196F3"),  # negative, zero, positive
                    title = "Figure 4. Correlation Heatmap of Numeric Variables",
                    ggtheme = theme_minimal(base_size = 9)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
        axis.text.y = element_text(size = 7),
        plot.title = element_text(face = "bold", hjust = 0.5))
print(p_cor)
Figure 4. Correlation Heatmap of Numeric Variables
Interpretation: Figure 4 shows the correlation matrix of the numeric variables. Dark blue indicates a strong positive correlation, dark red a strong negative correlation, and white a correlation close to zero. Notable patterns:
- High correlations appear between the semester-1 and semester-2 curricular-unit variables (e.g., `enrolled`, `approved`, `evaluations`). This is expected, because academic performance tends to be consistent from one semester to the next.
- The macroeconomic variables (Unemployment rate, Inflation rate, GDP) are also somewhat correlated with one another, reflecting economic conditions that move together.
- These correlations among predictors motivate the multicollinearity check (VIF) in the next step, which verifies that the model's coefficient estimates will be stable.
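A heatmap is easy to misread when there are many variables, so it can help to list the strongly correlated pairs explicitly. The hypothetical helper `high_cor_pairs()` below scans the upper triangle of a correlation matrix using base R only. The 0.8 cut-off is an illustrative choice, not a rule from this module.

```r
# List variable pairs whose absolute correlation exceeds a threshold
high_cor_pairs <- function(cm, threshold = 0.8) {
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(cm)[idx[, 1]],
                    var2 = colnames(cm)[idx[, 2]],
                    r    = cm[idx])
  out[order(-abs(out$r)), , drop = FALSE]   # strongest pairs first
}

# Toy data: x2 is almost a copy of x1, x3 is unrelated (illustrative only)
set.seed(1)
x1  <- rnorm(100)
toy <- data.frame(x1 = x1, x2 = x1 + rnorm(100, sd = 0.1), x3 = rnorm(100))
high_cor_pairs(cor(toy))
```

For the data in this module, the call would be `high_cor_pairs(cor_matrix)`.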
# Compute VIF from a linear model used as a proxy. VIF depends only on the
# predictors, so the numeric-coded Target is just a placeholder response.
formula_all <- as.formula(
  paste("as.numeric(Target) ~",
        paste0("`", predictor_cols, "`", collapse = " + "))
)
lm_temp <- lm(formula_all, data = df)
vif_vals <- vif(lm_temp)
vif_df <- data.frame(
  Variabel   = names(vif_vals),
  VIF        = round(vif_vals, 3),
  Keterangan = ifelse(vif_vals >= 10, "⚠ High multicollinearity",
                      ifelse(vif_vals >= 5, "⚡ Needs attention", "✓ OK"))
)
rownames(vif_df) <- NULL
kable(vif_df,
      col.names = c("Variable", "VIF", "Note"),
      caption = "Table 2. VIF Values for All Predictor Variables",
      align = c("l", "r", "l")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 12) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(which(vif_df$VIF >= 10), background = "#ffe0e0") %>%
  row_spec(which(vif_df$VIF >= 5 & vif_df$VIF < 10), background = "#fff3cd")
| Variable | VIF | Note |
|---|---|---|
| `Marital status` | 1.426 | ✓ OK |
| `Application mode` | 1.816 | ✓ OK |
| `Application order` | 1.252 | ✓ OK |
| Course | 2.240 | ✓ OK |
| `Daytime/evening attendance` | 1.376 | ✓ OK |
| `Previous qualification` | 1.350 | ✓ OK |
| `Previous qualification (grade)` | 1.571 | ✓ OK |
| Nacionality | 2.695 | ✓ OK |
| `Mother's qualification` | 1.539 | ✓ OK |
| `Father's qualification` | 1.454 | ✓ OK |
| `Mother's occupation` | 5.979 | ⚡ Needs attention |
| `Father's occupation` | 5.971 | ⚡ Needs attention |
| `Admission grade` | 1.628 | ✓ OK |
| Displaced | 1.315 | ✓ OK |
| `Educational special needs` | 1.009 | ✓ OK |
| Debtor | 1.257 | ✓ OK |
| `Tuition fees up to date` | 1.351 | ✓ OK |
| Gender | 1.155 | ✓ OK |
| `Scholarship holder` | 1.175 | ✓ OK |
| `Age at enrollment` | 2.302 | ✓ OK |
| International | 2.714 | ✓ OK |
| `Curricular units 1st sem (credited)` | 16.224 | ⚠ High multicollinearity |
| `Curricular units 1st sem (enrolled)` | 24.483 | ⚠ High multicollinearity |
| `Curricular units 1st sem (evaluations)` | 4.014 | ✓ OK |
| `Curricular units 1st sem (approved)` | 13.104 | ⚠ High multicollinearity |
| `Curricular units 1st sem (grade)` | 5.197 | ⚡ Needs attention |
| `Curricular units 1st sem (without evaluations)` | 1.713 | ✓ OK |
| `Curricular units 2nd sem (credited)` | 12.592 | ⚠ High multicollinearity |
| `Curricular units 2nd sem (enrolled)` | 17.207 | ⚠ High multicollinearity |
| `Curricular units 2nd sem (evaluations)` | 3.386 | ✓ OK |
| `Curricular units 2nd sem (approved)` | 10.737 | ⚠ High multicollinearity |
| `Curricular units 2nd sem (grade)` | 5.778 | ⚡ Needs attention |
| `Curricular units 2nd sem (without evaluations)` | 1.587 | ✓ OK |
| `Unemployment rate` | 1.292 | ✓ OK |
| `Inflation rate` | 1.044 | ✓ OK |
| GDP | 1.292 | ✓ OK |
# Identify problematic variables
high_vif <- vif_df$Variabel[vif_df$VIF >= 10]
cat("\nVariables with VIF >= 10:",
    if (length(high_vif) == 0) "None" else paste(high_vif, collapse = ", "), "\n")
##
## Variables with VIF >= 10: `Curricular units 1st sem (credited)`, `Curricular units 1st sem (enrolled)`, `Curricular units 1st sem (approved)`, `Curricular units 2nd sem (credited)`, `Curricular units 2nd sem (enrolled)`, `Curricular units 2nd sem (approved)`
# Plot VIF
vif_plot_df <- vif_df %>%
  arrange(VIF) %>%
  mutate(Variabel = factor(Variabel, levels = Variabel),
         Warna = case_when(VIF >= 10 ~ "High",
                           VIF >= 5  ~ "Moderate",
                           TRUE      ~ "Low"))
p_vif <- ggplot(vif_plot_df, aes(x = Variabel, y = VIF, fill = Warna)) +
  geom_bar(stat = "identity", alpha = 0.85) +
  geom_hline(yintercept = 5, linetype = "dashed", color = "orange", linewidth = 0.8) +
  geom_hline(yintercept = 10, linetype = "dashed", color = "red", linewidth = 0.8) +
  coord_flip() +
  scale_fill_manual(values = c("Low"      = "#4CAF50",
                               "Moderate" = "#FF9800",
                               "High"     = "#F44336")) +
  annotate("text", x = 1, y = 5.5, label = "VIF = 5", color = "orange", size = 3) +
  annotate("text", x = 1, y = 10.5, label = "VIF = 10", color = "red", size = 3) +
  labs(title = "Figure 5. VIF Values for All Predictor Variables",
       subtitle = "Orange line = threshold 5; red line = threshold 10",
       x = "Variable", y = "VIF", fill = "Category") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"))
print(p_vif)
Figure 5. VIF Values for All Predictor Variables
Interpretation: Table 2 and Figure 5 show the Variance Inflation Factor (VIF) of each predictor.
- VIF < 5 (green): no multicollinearity problem; the variable is not strongly explained by the other predictors.
- VIF 5–10 (orange): needs attention; the variable shares substantial variance with other predictors, but this is usually still tolerable.
- VIF ≥ 10 (red): a sign of serious multicollinearity that can make coefficient estimates unstable.
As the heatmap suggested, the problems are concentrated in the curricular-unit variables. Six of them (credited, enrolled, and approved units in both semesters) have VIF ≥ 10, ranging from 10.737 to 24.483. The grades of both semesters (5.197 and 5.778) and the parents' occupation codes (about 5.97 each) fall in the 5–10 band. In practice, when several variables have very high VIF, the options are: (1) drop one of the redundant variables, (2) combine them into a composite index, or (3) use regularization. In this module all variables are kept for the sake of a comprehensive analytical demonstration. The coefficients of the high-VIF variables should therefore be interpreted with caution.
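For intuition, the VIF of predictor $j$ is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that predictor on all the other predictors. The response plays no role, which is why using `as.numeric(Target)` as a stand-in response in `lm()` is harmless. The base-R sketch below (`manual_vif()` is a hypothetical helper, shown on simulated predictors) computes VIF from first principles. For numeric, single-coefficient terms like the ones here, it should agree with `car::vif()`.

```r
# VIF_j = 1 / (1 - R_j^2); R_j^2 is from regressing predictor j on the others
manual_vif <- function(X) {
  X <- as.matrix(X)
  v <- vapply(seq_len(ncol(X)), function(j) {
    y   <- X[, j]
    Z   <- cbind(1, X[, -j, drop = FALSE])          # intercept + other predictors
    res <- lm.fit(Z, y)$residuals
    r2  <- 1 - sum(res^2) / sum((y - mean(y))^2)
    1 / (1 - r2)
  }, numeric(1))
  setNames(v, colnames(X))
}

# Simulated predictors: x2 largely duplicates x1, x3 is unrelated (illustrative only)
set.seed(42)
x1 <- rnorm(200)
x2 <- 0.9 * x1 + rnorm(200, sd = 0.3)
x3 <- rnorm(200)
round(manual_vif(data.frame(x1, x2, x3)), 3)
```

Here x1 and x2 receive VIFs near 10 and x3 stays close to 1, mirroring the pattern of the semester-1 and semester-2 curricular-unit variables in Table 2.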
df_num_long <- df %>%
  select(all_of(c("Target", numeric_vars[1:9]))) %>%
  pivot_longer(cols = -Target, names_to = "Variabel", values_to = "Nilai")
p_outlier <- ggplot(df_num_long, aes(x = Target, y = Nilai, fill = Target)) +
  geom_boxplot(alpha = 0.6, outlier.shape = 21, outlier.size = 0.8,
               outlier.color = "red") +
  facet_wrap(~ Variabel, scales = "free_y", ncol = 3) +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout"  = "#F44336")) +
  labs(title = "Figure 6. Outlier Detection by Academic Status (First 9 Variables)",
       x = "Status", y = "Value") +
  theme_minimal(base_size = 9) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 30, hjust = 1),
        plot.title = element_text(face = "bold"))
print(p_outlier)
Figure 6. Outlier Detection by Academic Status
Interpretation: Figure 6 shows boxplots by academic status for the first nine numeric variables. Red points mark outliers, i.e., values more than 1.5 × IQR below the first quartile or above the third quartile. Outliers are especially frequent in discrete count variables such as credited curricular units. Table 1 shows that the first and third quartiles of these variables are both 0, so the IQR is 0 and practically every nonzero value is flagged. In educational data, outliers are not necessarily errors; for example, a student with an unusually high number of credited units may be a transfer student. Because the outcome is categorical, there are no outliers in the response in the sense used for linear regression, and removing outliers is not mandatory. Extreme predictor values can still have high leverage on logistic regression coefficients, however, so very extreme cases deserve a closer look.
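The 1.5 × IQR rule behind the red points can be reproduced directly. The hypothetical helper `count_iqr_outliers()` below uses `quantile()`'s default type-7 quartiles. To our knowledge, ggplot2's boxplot statistics also use these quartiles for unweighted data. The second example shows why a mostly-zero count variable flags every nonzero value.

```r
# Count values outside the fences Q1 - k*IQR and Q3 + k*IQR
count_iqr_outliers <- function(x, k = 1.5) {
  x   <- x[!is.na(x)]
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)   # type-7 quartiles
  iqr <- q[2] - q[1]
  sum(x < q[1] - k * iqr | x > q[2] + k * iqr)
}

count_iqr_outliers(c(1:10, 100))          # 1: only 100 lies beyond Q3 + 1.5 * IQR
count_iqr_outliers(c(rep(0, 9), 1, 3))    # 2: Q1 = Q3 = 0, so every nonzero value is flagged
```

To count outliers per status group, one could use, for example, `tapply(df[["Curricular units 1st sem (credited)"]], df$Target, count_iqr_outliers)`.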
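set.seed(2024)
train_idx <- createDataPartition(df$Target, p = 0.8, list = FALSE)
df_train <- df[ train_idx, ]
df_test  <- df[-train_idx, ]
cat("Training set size:", nrow(df_train))
cat("\n\nTest set size    :", nrow(df_test), "\n")
cat("\nTarget distribution in the training set (%):\n")
print(round(prop.table(table(df_train$Target)) * 100, 2))
cat("\nTarget distribution in the test set (%):\n")
print(round(prop.table(table(df_test$Target)) * 100, 2))
## Training set size: 3541
##
## Test set size    : 883
##
## Target distribution in the training set (%):
##
## Graduate Enrolled Dropout
## 49.93 17.96 32.11
##
## Target distribution in the test set (%):
##
## Graduate Enrolled Dropout
## 49.94 17.89 32.16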
Interpretation: The data are split 80:20 with `createDataPartition()` from the `caret` package. This function samples within each target class (stratified splitting), so class proportions are preserved: Graduate, Enrolled, and Dropout make up 49.93%, 17.96%, and 32.11% of the training set, and 49.94%, 17.89%, and 32.16% of the test set. Stratification keeps both sets representative and avoids the evaluation bias that a purely random split could introduce, particularly for the smaller Enrolled class. The training set is used to build the model, while the held-out test set is used independently to evaluate how well the model generalizes.
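The idea behind a stratified split can be sketched in base R: draw the same fraction from every class separately. The hypothetical `stratified_split()` below is not caret's implementation and will not pick the same rows. However, rounding each class's share up with `ceiling()` reproduces the 3,541 / 883 split sizes reported above.

```r
# Stratified split: sample a fraction p of the row indices within each class
stratified_split <- function(y, p = 0.8, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  idx_by_class <- split(seq_along(y), y)       # row indices grouped by class
  train <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  }), use.names = FALSE)
  sort(train)
}

# A factor with the same class counts as this dataset
y <- factor(rep(c("Graduate", "Enrolled", "Dropout"), times = c(2209, 794, 1421)),
            levels = c("Graduate", "Enrolled", "Dropout"))
train_idx <- stratified_split(y, p = 0.8, seed = 2024)
length(train_idx)                                  # training rows; the rest form the test set
round(100 * prop.table(table(y[train_idx])), 2)    # class shares match the training output above
```

Indexing through `sample.int()` avoids the well-known pitfall of `sample()` when a class contains a single row.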
# Formula with all predictors (backticks protect names containing spaces)
formula_mlr <- as.formula(
  paste("Target ~",
        paste0("`", predictor_cols, "`", collapse = " + "))
)
cat("Fitting the Multinomial Logistic Regression model...\n")
## Fitting the Multinomial Logistic Regression model...
model_mlr <- multinom(formula_mlr,
                      data  = df_train,
                      maxit = 500,    # raise the iteration limit (the default is 100)
                      trace = FALSE)
cat("Model fitted successfully!\n")
cat("\nModel summary (coefficients and standard errors):\n")
print(summary(model_mlr))
## Model fitted successfully!
##
## Model summary (coefficients and standard errors):
## Call:
## multinom(formula = formula_mlr, data = df_train, maxit = 500,
## trace = FALSE)
##
## Coefficients:
## (Intercept) `Marital status` `Application mode` `Application order`
## Enrolled 2.591354 0.02149497 0.007974044 -0.02631623
## Dropout 1.394192 -0.09995486 0.002695014 0.11416646
## Course `Daytime/evening attendance` `Previous qualification`
## Enrolled 9.137398e-05 -0.02128973 -0.01209524
## Dropout 1.991490e-04 0.02935562 -0.01499905
## `Previous qualification (grade)` Nacionality `Mother's qualification`
## Enrolled -0.002529813 0.01740900 -0.008740620
## Dropout 0.006261044 0.03687635 0.006109717
## `Father's qualification` `Mother's occupation` `Father's occupation`
## Enrolled -0.006140603 0.001391896 -0.0003408843
## Dropout -0.011360295 -0.008112361 -0.0004440111
## `Admission grade` Displaced `Educational special needs` Debtor
## Enrolled -0.008492794 -0.1511190 -0.01882188 0.9074932
## Dropout -0.014162321 0.3277647 0.14295713 0.8708463
## `Tuition fees up to date` Gender `Scholarship holder`
## Enrolled -0.9769278 0.3021360 -0.5580360
## Dropout -3.0268614 0.4960701 -0.7988546
## `Age at enrollment` International
## Enrolled -0.009921602 -0.8433316
## Dropout 0.054733204 -1.7040670
## `Curricular units 1st sem (credited)`
## Enrolled 0.1842744
## Dropout 0.2610108
## `Curricular units 1st sem (enrolled)`
## Enrolled 0.1422616
## Dropout 0.2140077
## `Curricular units 1st sem (evaluations)`
## Enrolled 0.053857407
## Dropout 0.009602908
## `Curricular units 1st sem (approved)`
## Enrolled -0.5715946
## Dropout -0.7523165
## `Curricular units 1st sem (grade)`
## Enrolled 0.04440662
## Dropout 0.10058945
## `Curricular units 1st sem (without evaluations)`
## Enrolled 0.02206242
## Dropout -0.11345276
## `Curricular units 2nd sem (credited)`
## Enrolled -0.1602848
## Dropout 0.1108328
## `Curricular units 2nd sem (enrolled)`
## Enrolled 0.6884902
## Dropout 0.9520021
## `Curricular units 2nd sem (evaluations)`
## Enrolled 0.12392850
## Dropout 0.02628074
## `Curricular units 2nd sem (approved)`
## Enrolled -0.7656494
## Dropout -1.0977382
## `Curricular units 2nd sem (grade)`
## Enrolled -0.1056671
## Dropout -0.1940451
## `Curricular units 2nd sem (without evaluations)` `Unemployment rate`
## Enrolled -0.1022767 -0.02926524
## Dropout -0.1830535 0.08464214
## `Inflation rate` GDP
## Enrolled -0.03878659 0.011641935
## Dropout -0.01339119 0.006324895
##
## Std. Errors:
## (Intercept) `Marital status` `Application mode` `Application order`
## Enrolled 0.0005718884 0.0013015059 0.004278238 0.02764564
## Dropout 0.0004943607 0.0009963952 0.004766024 0.02096289
## Course `Daytime/evening attendance` `Previous qualification`
## Enrolled 4.158236e-05 0.001089257 0.006677090
## Dropout 4.298738e-05 0.001047016 0.007299817
## `Previous qualification (grade)` Nacionality `Mother's qualification`
## Enrolled 0.004923807 0.007797112 0.004479135
## Dropout 0.005281657 0.007783585 0.005134783
## `Father's qualification` `Mother's occupation` `Father's occupation`
## Enrolled 0.004397935 0.005779038 0.005966553
## Dropout 0.005045480 0.007123604 0.007416296
## `Admission grade` Displaced `Educational special needs` Debtor
## Enrolled 0.004958115 0.003431994 0.0001118759 0.0006353506
## Dropout 0.005347784 0.002391512 0.0001342721 0.0006138388
## `Tuition fees up to date` Gender `Scholarship holder`
## Enrolled 0.0009279103 0.001822231 0.0011518782
## Dropout 0.0010040968 0.001544084 0.0008833714
## `Age at enrollment` International
## Enrolled 0.01019912 0.0001610243
## Dropout 0.01017572 0.0001189408
## `Curricular units 1st sem (credited)`
## Enrolled 0.01879097
## Dropout 0.01629546
## `Curricular units 1st sem (enrolled)`
## Enrolled 0.008855274
## Dropout 0.006208763
## `Curricular units 1st sem (evaluations)`
## Enrolled 0.01666750
## Dropout 0.01659722
## `Curricular units 1st sem (approved)`
## Enrolled 0.009881157
## Dropout 0.010867053
## `Curricular units 1st sem (grade)`
## Enrolled 0.01575668
## Dropout 0.01537292
## `Curricular units 1st sem (without evaluations)`
## Enrolled 0.002486609
## Dropout 0.002064936
## `Curricular units 2nd sem (credited)`
## Enrolled 0.01241906
## Dropout 0.01156136
## `Curricular units 2nd sem (enrolled)`
## Enrolled 0.007778634
## Dropout 0.005299014
## `Curricular units 2nd sem (evaluations)`
## Enrolled 0.01912784
## Dropout 0.01916321
## `Curricular units 2nd sem (approved)`
## Enrolled 0.0194288
## Dropout 0.0183802
## `Curricular units 2nd sem (grade)`
## Enrolled 0.01546695
## Dropout 0.01491198
## `Curricular units 2nd sem (without evaluations)` `Unemployment rate`
## Enrolled 0.001814669 0.02052893
## Dropout 0.001654826 0.02245053
## `Inflation rate` GDP
## Enrolled 0.02372426 0.02061236
## Dropout 0.01935775 0.02126891
##
## Residual Deviance: 3904.075
## AIC: 4052.075
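Interpretation: The MLR model is fitted with `multinom()` from the `nnet` package. This function represents the model as a neural network with no hidden layer and a softmax output, and it estimates the coefficients by maximum likelihood using the BFGS quasi-Newton optimizer. The `summary(model_mlr)` output shows two rows of coefficients: one for Enrolled vs Graduate and one for Dropout vs Graduate. Each coefficient is the change in the log-odds of that category relative to the reference category (Graduate) for a one-unit increase in the predictor, holding the other variables constant.

Setting `maxit = 500` raises the iteration limit (the default is 100) so that the optimizer has room to converge. It does not by itself guarantee convergence; this can be checked with `model_mlr$convergence`, which is 0 when the limit was not reached. The residual deviance (−2 × log-likelihood = 3904.075) measures how well the model fits the training data. The AIC adds a penalty of 2 per parameter: 3904.075 + 2 × 74 = 4052.075, where 74 = 2 × (36 predictors + 1 intercept).

Two caveats apply to this output. First, several predictors (e.g., Course, Nacionality, Application mode, the parents' occupations) are integer codes of nominal categories entered as numbers, so their coefficients describe a linear effect of the code value rather than a category effect. Second, some standard errors are implausibly small (about 0.0005 to 0.0006 for the intercepts and 0.0001 to 0.0002 for the International indicator). This typically points to an ill-conditioned Hessian when predictors are on very different scales, so Wald statistics built on these standard errors should be read with caution.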
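The quantity that `multinom()` maximizes is the multinomial log-likelihood. The sketch below is not how `nnet` is implemented internally. It is a stand-in that writes the negative log-likelihood of the baseline-category model directly in base R and minimizes it with `optim(method = "BFGS")` on simulated data, to show what the coefficients, the convergence code, and the residual deviance mean. `mlr_negloglik()` and the simulated coefficients are hypothetical.

```r
# Negative log-likelihood of a baseline-category multinomial logit model.
# The first factor level is the baseline; its linear predictor is fixed at 0.
mlr_negloglik <- function(par, X, y) {
  K   <- nlevels(y)
  B   <- matrix(par, nrow = ncol(X), ncol = K - 1)  # one column per non-baseline class
  eta <- cbind(0, X %*% B)                          # n x K linear predictors
  m   <- apply(eta, 1, max)                         # log-sum-exp trick for stability
  log_den <- m + log(rowSums(exp(eta - m)))
  -sum(eta[cbind(seq_len(nrow(X)), as.integer(y))] - log_den)
}

# Simulated data with one predictor (hypothetical coefficients, not the real data)
set.seed(123)
n <- 2000
X <- cbind("(Intercept)" = 1, x = rnorm(n))
true_B <- cbind(Enrolled = c(-0.5, 1.0), Dropout = c(0.2, -1.5))
eta_true <- cbind(0, X %*% true_B)
prob <- exp(eta_true) / rowSums(exp(eta_true))
y <- factor(apply(prob, 1, function(pr) sample.int(3, 1, prob = pr)),
            levels = 1:3, labels = c("Graduate", "Enrolled", "Dropout"))

# Minimize the negative log-likelihood with the BFGS quasi-Newton method
fit <- optim(rep(0, 4), mlr_negloglik, X = X, y = y, method = "BFGS")
B_hat <- matrix(fit$par, nrow = 2, dimnames = list(colnames(X), levels(y)[-1]))
round(t(B_hat), 3)   # rows Enrolled and Dropout, the same layout as coef(model_mlr)
fit$convergence      # 0 means optim() reports successful convergence
2 * fit$value        # residual deviance = -2 x maximized log-likelihood
```

With the real data, the equivalent checks are `model_mlr$convergence` and `model_mlr$deviance`.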
## ==========================================
## Simultaneous test (G² / LRT)
## ==========================================
# Null model (intercept only)
model_null <- multinom(Target ~ 1, data = df_train, maxit = 500, trace = FALSE)
# Compute G²
ll_null  <- as.numeric(logLik(model_null))
ll_model <- as.numeric(logLik(model_mlr))
G2 <- -2 * (ll_null - ll_model)
df_g2 <- (length(levels(df_train$Target)) - 1) * length(predictor_cols)
# Upper-tail probability computed directly (more accurate than 1 - pchisq() for tiny p-values)
p_g2 <- pchisq(G2, df = df_g2, lower.tail = FALSE)
uji_serentak <- data.frame(
  Statistik = c("Log-likelihood (null model)", "Log-likelihood (full model)",
                "G² (chi-square)", "Degrees of freedom (df)",
                "P-value", "Decision (α = 0.05)"),
  Nilai = c(round(ll_null, 4),
            round(ll_model, 4),
            round(G2, 4),
            df_g2,
            format(p_g2, scientific = TRUE, digits = 4),
            ifelse(p_g2 < 0.05,
                   "Reject H0: the model is significant",
                   "Fail to reject H0: the model is not significant"))
)
kable(uji_serentak,
      col.names = c("Statistic", "Value"),
      caption = "Table 3. Simultaneous Test Results (Likelihood Ratio Test)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE, position = "left", font_size = 13) %>%
  column_spec(1, bold = TRUE)
| Statistic | Value |
|---|---|
| Log-likelihood (null model) | -3611.6229 |
| Log-likelihood (full model) | -1952.0376 |
| G² (chi-square) | 3319.1707 |
| Degrees of freedom (df) | 72 |
| P-value | 0e+00 |
| Decision (α = 0.05) | Reject H0: the model is significant |
Interpretation: Table 3 reports the simultaneous test (G²), i.e. the Likelihood Ratio Test (LRT), of the hypotheses:
- H₀: all predictor coefficients are 0 (the model is no better than the intercept-only model)
- H₁: at least one coefficient is ≠ 0 (the model is better than the intercept-only model)
The statistic is G² = −2 × (ln L₀ − ln L₁), where L₀ is the likelihood of the null model and L₁ that of the full model; under H₀ it approximately follows a chi-square distribution. The degrees of freedom equal (number of categories − 1) × number of predictors = 2 × 36 = 72, which is the number of extra parameters the full model estimates compared with the null model. Here G² = 3319.17 with df = 72. The p-value printed as 0e+00 is not literally zero; it is smaller than the smallest representable double-precision number and underflows, so it should be reported as p < 0.001. H₀ is therefore rejected: at least one predictor has a significant effect on students' academic status.
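The test can be reproduced from the log-likelihoods in Table 3. Because those values are rounded to four decimals, the result differs from the table in the last digit. The sketch also shows why the p-value prints as 0: the chi-square upper tail is far below the smallest positive double, while `log.p = TRUE` keeps it finite.

```r
# Log-likelihoods as reported (rounded) in Table 3
ll_null  <- -3611.6229
ll_model <- -1952.0376

G2    <- -2 * (ll_null - ll_model)   # likelihood-ratio statistic
df_g2 <- (3 - 1) * 36                # (categories - 1) x number of predictors
round(G2, 4)

# Upper-tail probability: underflows to exactly 0 in double precision
p_plain <- pchisq(G2, df = df_g2, lower.tail = FALSE)
p_plain

# On the log scale the tail probability stays finite
log10_p <- pchisq(G2, df = df_g2, lower.tail = FALSE, log.p = TRUE) / log(10)
log10_p                              # hundreds of orders of magnitude below 1e-300
```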
## ==========================================
## Partial test (Wald test)
## ==========================================
smry <- summary(model_mlr)
coefs <- smry$coefficients
std_err <- smry$standard.errors
# Wald statistic (z²) and two-sided p-value
W_stat <- (coefs / std_err)^2
p_wald <- 2 * pnorm(-abs(coefs / std_err))
# Build the Wald table for one non-reference category (Enrolled or Dropout)
wald_table <- function(cls) {
  p <- p_wald[cls, ]
  out <- data.frame(
    Variabel = colnames(coefs),
    Beta     = round(coefs[cls, ], 4),
    SE       = round(std_err[cls, ], 4),
    W2       = round(W_stat[cls, ], 4),
    P_value  = round(p, 4),
    Sig      = ifelse(p < 0.001, "***",
               ifelse(p < 0.01,  "**",
               ifelse(p < 0.05,  "*",
               ifelse(p < 0.1,   ".", " "))))
  )
  rownames(out) <- NULL
  out
}
# --- Table: Enrolled vs Graduate ---
wald_enrolled <- wald_table("Enrolled")
# --- Table: Dropout vs Graduate ---
wald_dropout <- wald_table("Dropout")
kable(wald_enrolled,
      col.names = c("Variable", "β", "SE", "W²", "P-value", "Sig"),
      caption = "Table 4a. Partial Wald Test: Enrolled vs Graduate") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(which(wald_enrolled$P_value < 0.05), background = "#e8f5e9")
| Variable | β | SE | W² | P-value | Sig |
|---|---|---|---|---|---|
| (Intercept) | 2.5914 | 0.0006 | 2.053198e+07 | 0.0000 | *** |
| `Marital status` | 0.0215 | 0.0013 | 2.727603e+02 | 0.0000 | *** |
| `Application mode` | 0.0080 | 0.0043 | 3.474000e+00 | 0.0623 | . |
| `Application order` | -0.0263 | 0.0276 | 9.061000e-01 | 0.3411 | |
| Course | 0.0001 | 0.0000 | 4.828700e+00 | 0.0280 | * |
| `Daytime/evening attendance` | -0.0213 | 0.0011 | 3.820144e+02 | 0.0000 | *** |
| `Previous qualification` | -0.0121 | 0.0067 | 3.281400e+00 | 0.0701 | . |
| `Previous qualification (grade)` | -0.0025 | 0.0049 | 2.640000e-01 | 0.6074 | |
| Nacionality | 0.0174 | 0.0078 | 4.985200e+00 | 0.0256 | * |
| `Mother's qualification` | -0.0087 | 0.0045 | 3.808000e+00 | 0.0510 | . |
| `Father's qualification` | -0.0061 | 0.0044 | 1.949500e+00 | 0.1626 | |
| `Mother's occupation` | 0.0014 | 0.0058 | 5.800000e-02 | 0.8097 | |
| `Father's occupation` | -0.0003 | 0.0060 | 3.300000e-03 | 0.9544 | |
| `Admission grade` | -0.0085 | 0.0050 | 2.934100e+00 | 0.0867 | . |
| Displaced | -0.1511 | 0.0034 | 1.938854e+03 | 0.0000 | *** |
| `Educational special needs` | -0.0188 | 0.0001 | 2.830437e+04 | 0.0000 | *** |
| Debtor | 0.9075 | 0.0006 | 2.040139e+06 | 0.0000 | *** |
| `Tuition fees up to date` | -0.9769 | 0.0009 | 1.108442e+06 | 0.0000 | *** |
| Gender | 0.3021 | 0.0018 | 2.749148e+04 | 0.0000 | *** |
| `Scholarship holder` | -0.5580 | 0.0012 | 2.346991e+05 | 0.0000 | *** |
| `Age at enrollment` | -0.0099 | 0.0102 | 9.463000e-01 | 0.3307 | |
| International | -0.8433 | 0.0002 | 2.742924e+07 | 0.0000 | *** |
| `Curricular units 1st sem (credited)` | 0.1843 | 0.0188 | 9.616830e+01 | 0.0000 | *** |
| `Curricular units 1st sem (enrolled)` | 0.1423 | 0.0089 | 2.580902e+02 | 0.0000 | *** |
| `Curricular units 1st sem (evaluations)` | 0.0539 | 0.0167 | 1.044120e+01 | 0.0012 | ** |
| `Curricular units 1st sem (approved)` | -0.5716 | 0.0099 | 3.346267e+03 | 0.0000 | *** |
| `Curricular units 1st sem (grade)` | 0.0444 | 0.0158 | 7.942700e+00 | 0.0048 | ** |
| `Curricular units 1st sem (without evaluations)` | 0.0221 | 0.0025 | 7.872110e+01 | 0.0000 | *** |
| `Curricular units 2nd sem (credited)` | -0.1603 | 0.0124 | 1.665739e+02 | 0.0000 | *** |
| `Curricular units 2nd sem (enrolled)` | 0.6885 | 0.0078 | 7.834096e+03 | 0.0000 | *** |
| `Curricular units 2nd sem (evaluations)` | 0.1239 | 0.0191 | 4.197690e+01 | 0.0000 | *** |
| `Curricular units 2nd sem (approved)` | -0.7656 | 0.0194 | 1.552987e+03 | 0.0000 | *** |
| `Curricular units 2nd sem (grade)` | -0.1057 | 0.0155 | 4.667350e+01 | 0.0000 | *** |
| `Curricular units 2nd sem (without evaluations)` | -0.1023 | 0.0018 | 3.176572e+03 | 0.0000 | *** |
| `Unemployment rate` | -0.0293 | 0.0205 | 2.032200e+00 | 0.1540 | |
| `Inflation rate` | -0.0388 | 0.0237 | 2.672900e+00 | 0.1021 | |
| GDP | 0.0116 | 0.0206 | 3.190000e-01 | 0.5722 | |

kable(wald_dropout,
      col.names = c("Variabel", "β", "SE", "W²", "P-value", "Sig"),
      caption = "Tabel 4b. Uji Parsial Wald: Dropout vs Graduate") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE) %>%
  row_spec(which(wald_dropout$P_value < 0.05), background = "#fce4ec")

| Variabel | β | SE | W² | P-value | Sig |
|---|---|---|---|---|---|
| (Intercept) | 1.3942 | 0.0005 | 7.953481e+06 | 0.0000 | *** |
| Marital status | -0.1000 | 0.0010 | 1.006340e+04 | 0.0000 | *** |
| Application mode | 0.0027 | 0.0048 | 3.197000e-01 | 0.5718 | |
| Application order | 0.1142 | 0.0210 | 2.966030e+01 | 0.0000 | *** |
| Course | 0.0002 | 0.0000 | 2.146220e+01 | 0.0000 | *** |
| Daytime/evening attendance | 0.0294 | 0.0010 | 7.860968e+02 | 0.0000 | *** |
| Previous qualification | -0.0150 | 0.0073 | 4.221900e+00 | 0.0399 | * |
| Previous qualification (grade) | 0.0063 | 0.0053 | 1.405200e+00 | 0.2358 | |
| Nacionality | 0.0369 | 0.0078 | 2.244590e+01 | 0.0000 | *** |
| Mother's qualification | 0.0061 | 0.0051 | 1.415800e+00 | 0.2341 | |
| Father's qualification | -0.0114 | 0.0050 | 5.069600e+00 | 0.0243 | * |
| Mother's occupation | -0.0081 | 0.0071 | 1.296900e+00 | 0.2548 | |
| Father's occupation | -0.0004 | 0.0074 | 3.600000e-03 | 0.9523 | |
| Admission grade | -0.0142 | 0.0053 | 7.013300e+00 | 0.0081 | ** |
| Displaced | 0.3278 | 0.0024 | 1.878362e+04 | 0.0000 | *** |
| Educational special needs | 0.1430 | 0.0001 | 1.133548e+06 | 0.0000 | *** |
| Debtor | 0.8708 | 0.0006 | 2.012678e+06 | 0.0000 | *** |
| Tuition fees up to date | -3.0269 | 0.0010 | 9.087280e+06 | 0.0000 | *** |
| Gender | 0.4961 | 0.0015 | 1.032154e+05 | 0.0000 | *** |
| Scholarship holder | -0.7989 | 0.0009 | 8.178032e+05 | 0.0000 | *** |
| Age at enrollment | 0.0547 | 0.0102 | 2.893150e+01 | 0.0000 | *** |
| International | -1.7041 | 0.0001 | 2.052636e+08 | 0.0000 | *** |
| Curricular units 1st sem (credited) | 0.2610 | 0.0163 | 2.565569e+02 | 0.0000 | *** |
| Curricular units 1st sem (enrolled) | 0.2140 | 0.0062 | 1.188088e+03 | 0.0000 | *** |
| Curricular units 1st sem (evaluations) | 0.0096 | 0.0166 | 3.348000e-01 | 0.5629 | |
| Curricular units 1st sem (approved) | -0.7523 | 0.0109 | 4.792671e+03 | 0.0000 | *** |
| Curricular units 1st sem (grade) | 0.1006 | 0.0154 | 4.281460e+01 | 0.0000 | *** |
| Curricular units 1st sem (without evaluations) | -0.1135 | 0.0021 | 3.018678e+03 | 0.0000 | *** |
| Curricular units 2nd sem (credited) | 0.1108 | 0.0116 | 9.190060e+01 | 0.0000 | *** |
| Curricular units 2nd sem (enrolled) | 0.9520 | 0.0053 | 3.227645e+04 | 0.0000 | *** |
| Curricular units 2nd sem (evaluations) | 0.0263 | 0.0192 | 1.880800e+00 | 0.1702 | |
| Curricular units 2nd sem (approved) | -1.0977 | 0.0184 | 3.566952e+03 | 0.0000 | *** |
| Curricular units 2nd sem (grade) | -0.1940 | 0.0149 | 1.693303e+02 | 0.0000 | *** |
| Curricular units 2nd sem (without evaluations) | -0.1831 | 0.0017 | 1.223634e+04 | 0.0000 | *** |
| Unemployment rate | 0.0846 | 0.0225 | 1.421410e+01 | 0.0002 | *** |
| Inflation rate | -0.0134 | 0.0194 | 4.786000e-01 | 0.4891 | |
| GDP | 0.0063 | 0.0213 | 8.840000e-02 | 0.7662 | |

##
## Legend: *** p<0.001 | ** p<0.01 | * p<0.05 | . p<0.1
Interpretation: Tables 4a and 4b report the partial Wald test for each predictor individually, with the hypotheses:
- H₀: βⱼ = 0 (predictor j has no effect on the log-odds of the category relative to Graduate)
- H₁: βⱼ ≠ 0 (predictor j has a significant effect)
The Wald statistic is computed as W² = (β/SE)², which under H₀ follows a chi-square distribution with df = 1. Rows highlighted in green (Table 4a) or pink (Table 4b) are statistically significant (p < 0.05), and the number of asterisks (one to three) indicates the significance level according to the legend above, with a dot marking p < 0.1. Note that the variables that are significant for Enrolled vs Graduate are not necessarily the same as those for Dropout vs Graduate, because MLR estimates two separate logit equations simultaneously. For example, Age at enrollment is not significant for Enrolled vs Graduate (p = 0.3307) but is highly significant for Dropout vs Graduate (p < 0.001).
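To make the test concrete, the sketch below recomputes W² and its p-value from a coefficient and its standard error, using the rounded Application mode row of Table 4a (β = 0.0080, SE = 0.0043) as an illustration. Because the table's W² was computed from unrounded estimates, the p-value is taken from the reported W² = 3.474. The functions `wald_test()` and `sig_stars()` are hypothetical helpers written for this illustration; they are not part of the original analysis code.

```r
# Partial Wald test for one coefficient: W^2 = (beta / SE)^2 ~ chi-square(df = 1) under H0
wald_test <- function(beta, se) {
  w2 <- (beta / se)^2
  c(W2 = w2, p_value = pchisq(w2, df = 1, lower.tail = FALSE))
}

# Hypothetical helper that mirrors the legend: *** p<0.001, ** p<0.01, * p<0.05, . p<0.1
sig_stars <- function(p) {
  ifelse(p < 0.001, "***",
         ifelse(p < 0.01, "**",
                ifelse(p < 0.05, "*",
                       ifelse(p < 0.1, ".", ""))))
}

# Rounded values from Table 4a (Application mode, Enrolled vs Graduate)
res <- wald_test(beta = 0.0080, se = 0.0043)
print(round(res, 4))

# The table's W^2 (3.474) comes from unrounded estimates, so use it for the p-value
p_chisq <- pchisq(3.474, df = 1, lower.tail = FALSE)
# Equivalent two-sided z-test, since z^2 = W^2 and z ~ N(0, 1) under H0
p_normal <- 2 * pnorm(-sqrt(3.474))
cat(sprintf("p (chi-square) = %.4f, p (normal) = %.4f, significance: '%s'\n",
            p_chisq, p_normal, sig_stars(p_chisq)))
```

The chi-square(1) p-value and the two-sided normal p-value are the same number, which is why either form of the Wald test appears in software output.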
log_odds_df <- rbind(
  wald_enrolled %>% mutate(Kategori = "Enrolled vs Graduate"),
  wald_dropout %>% mutate(Kategori = "Dropout vs Graduate")
)
sig_logodds <- log_odds_df %>%
  filter(P_value < 0.05, Variabel != "(Intercept)") %>%
  mutate(Arah = ifelse(Beta > 0, "Positif", "Negatif"),
         Variabel = reorder(Variabel, Beta))
p_logodds <- ggplot(sig_logodds, aes(x = Variabel, y = Beta, fill = Arah)) +
  geom_bar(stat = "identity", alpha = 0.85, color = "white") +
  geom_errorbar(aes(ymin = Beta - 1.96 * SE, ymax = Beta + 1.96 * SE),
                width = 0.25, color = "black", linewidth = 0.5) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black", linewidth = 0.5) +
  coord_flip() +
  facet_wrap(~ Kategori, scales = "free_x") +
  scale_fill_manual(values = c("Positif" = "#2196F3", "Negatif" = "#F44336")) +
  labs(title = "Gambar 7. Koefisien Log-Odds Signifikan (p < 0.05)",
       subtitle = "Error bar = 95% Confidence Interval",
       x = "Variabel", y = "Log-Odds (β)", fill = "Arah Pengaruh") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))
print(p_logodds)

Figure 7. Significant Log-Odds Coefficients
Interpretation: Figure 7 shows the log-odds coefficients (β) of the significant variables. Each β is the change in the log-odds of being in the target category (Enrolled or Dropout) rather than Graduate for a one-unit increase in the predictor, holding the other predictors constant.
- Blue bars (positive): higher values of the variable increase the log-odds of being in that category (Enrolled/Dropout) relative to Graduate.
- Red bars (negative): higher values of the variable decrease the log-odds of being in that category, i.e., the student is relatively more likely to be a Graduate.
- Error bars show the 95% confidence interval (β ± 1.96·SE); an interval that does not cross zero corresponds to p < 0.05 in the Wald test.
- A longer bar (larger |β|) means a larger change in log-odds per unit of the predictor. Because the predictors are measured on different scales (e.g., counts of curricular units versus grades or binary indicators), bar lengths are not directly comparable as a measure of importance across variables.
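To see what a single β means on the probability scale, the sketch below computes multinomial-logit probabilities with Graduate as the base category. The linear predictors and the two coefficients are hypothetical values chosen only for illustration. The point is that a one-unit increase in the predictor shifts log(P(Enrolled)/P(Graduate)) by exactly β for that equation, while the resulting change in each probability depends on where the student starts.

```r
# Multinomial logit with Graduate as the base category (its linear predictor is 0):
#   P(Graduate) = 1 / (1 + exp(eta_E) + exp(eta_D))
#   P(k)        = exp(eta_k) / (1 + exp(eta_E) + exp(eta_D)),  k = Enrolled, Dropout
mlr_probs <- function(eta_E, eta_D) {
  denom <- 1 + exp(eta_E) + exp(eta_D)
  c(Graduate = 1 / denom,
    Enrolled = exp(eta_E) / denom,
    Dropout  = exp(eta_D) / denom)
}

# Hypothetical linear predictors for one student and hypothetical coefficients
# of a predictor x in the Enrolled and Dropout equations
eta_E  <- -0.5
eta_D  <- -1.0
beta_E <-  0.30
beta_D <- -0.20

p0 <- mlr_probs(eta_E, eta_D)                   # predictor at x
p1 <- mlr_probs(eta_E + beta_E, eta_D + beta_D) # predictor at x + 1

# A one-unit increase shifts each log-odds (vs Graduate) by exactly its beta ...
delta_E <- log(p1[["Enrolled"]] / p1[["Graduate"]]) - log(p0[["Enrolled"]] / p0[["Graduate"]])
delta_D <- log(p1[["Dropout"]] / p1[["Graduate"]]) - log(p0[["Dropout"]] / p0[["Graduate"]])

# ... but the probability changes depend on the starting point
print(round(rbind(at_x = p0, at_x_plus_1 = p1, change = p1 - p0), 4))
cat(sprintf("Change in log-odds: Enrolled %.2f, Dropout %.2f\n", delta_E, delta_D))
```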
rrr_enrolled <- data.frame(
  Variabel = colnames(coefs),
  Beta = round(coefs["Enrolled", ], 4),
  RRR = round(exp(coefs["Enrolled", ]), 4),
  CI_Lower = round(exp(coefs["Enrolled", ] - 1.96 * std_err["Enrolled", ]), 4),
  CI_Upper = round(exp(coefs["Enrolled", ] + 1.96 * std_err["Enrolled", ]), 4),
  P_value = round(p_wald["Enrolled", ], 4),
  Sig = ifelse(p_wald["Enrolled", ] < 0.001, "***",
        ifelse(p_wald["Enrolled", ] < 0.01, "**",
        ifelse(p_wald["Enrolled", ] < 0.05, "*", " ")))
)
rownames(rrr_enrolled) <- NULL
rrr_dropout <- data.frame(
  Variabel = colnames(coefs),
  Beta = round(coefs["Dropout", ], 4),
  RRR = round(exp(coefs["Dropout", ]), 4),
  CI_Lower = round(exp(coefs["Dropout", ] - 1.96 * std_err["Dropout", ]), 4),
  CI_Upper = round(exp(coefs["Dropout", ] + 1.96 * std_err["Dropout", ]), 4),
  P_value = round(p_wald["Dropout", ], 4),
  Sig = ifelse(p_wald["Dropout", ] < 0.001, "***",
        ifelse(p_wald["Dropout", ] < 0.01, "**",
        ifelse(p_wald["Dropout", ] < 0.05, "*", " ")))
)
rownames(rrr_dropout) <- NULL
kable(rrr_enrolled,
      col.names = c("Variabel", "β", "RRR", "CI Lower (95%)", "CI Upper (95%)", "P-value", "Sig"),
      caption = "Tabel 5a. Relative Risk Ratio: Enrolled vs Graduate") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE)

| Variabel | β | RRR | CI Lower (95%) | CI Upper (95%) | P-value | Sig |
|---|---|---|---|---|---|---|
| (Intercept) | 2.5914 | 13.3478 | 13.3329 | 13.3628 | 0.0000 | *** |
| Marital status | 0.0215 | 1.0217 | 1.0191 | 1.0243 | 0.0000 | *** |
| Application mode | 0.0080 | 1.0080 | 0.9996 | 1.0165 | 0.0623 | |
| Application order | -0.0263 | 0.9740 | 0.9227 | 1.0283 | 0.3411 | |
| Course | 0.0001 | 1.0001 | 1.0000 | 1.0002 | 0.0280 | * |
| Daytime/evening attendance | -0.0213 | 0.9789 | 0.9768 | 0.9810 | 0.0000 | *** |
| Previous qualification | -0.0121 | 0.9880 | 0.9751 | 1.0010 | 0.0701 | |
| Previous qualification (grade) | -0.0025 | 0.9975 | 0.9879 | 1.0071 | 0.6074 | |
| Nacionality | 0.0174 | 1.0176 | 1.0021 | 1.0332 | 0.0256 | * |
| Mother's qualification | -0.0087 | 0.9913 | 0.9826 | 1.0000 | 0.0510 | |
| Father's qualification | -0.0061 | 0.9939 | 0.9853 | 1.0025 | 0.1626 | |
| Mother's occupation | 0.0014 | 1.0014 | 0.9901 | 1.0128 | 0.8097 | |
| Father's occupation | -0.0003 | 0.9997 | 0.9880 | 1.0114 | 0.9544 | |
| Admission grade | -0.0085 | 0.9915 | 0.9820 | 1.0012 | 0.0867 | |
| Displaced | -0.1511 | 0.8597 | 0.8540 | 0.8655 | 0.0000 | *** |
| Educational special needs | -0.0188 | 0.9814 | 0.9811 | 0.9816 | 0.0000 | *** |
| Debtor | 0.9075 | 2.4781 | 2.4750 | 2.4812 | 0.0000 | *** |
| Tuition fees up to date | -0.9769 | 0.3765 | 0.3758 | 0.3772 | 0.0000 | *** |
| Gender | 0.3021 | 1.3527 | 1.3479 | 1.3576 | 0.0000 | *** |
| Scholarship holder | -0.5580 | 0.5723 | 0.5710 | 0.5736 | 0.0000 | *** |
| Age at enrollment | -0.0099 | 0.9901 | 0.9705 | 1.0101 | 0.3307 | |
| International | -0.8433 | 0.4303 | 0.4301 | 0.4304 | 0.0000 | *** |
| Curricular units 1st sem (credited) | 0.1843 | 1.2023 | 1.1589 | 1.2475 | 0.0000 | *** |
| Curricular units 1st sem (enrolled) | 0.1423 | 1.1529 | 1.1330 | 1.1731 | 0.0000 | *** |
| Curricular units 1st sem (evaluations) | 0.0539 | 1.0553 | 1.0214 | 1.0904 | 0.0012 | ** |
| Curricular units 1st sem (approved) | -0.5716 | 0.5646 | 0.5538 | 0.5757 | 0.0000 | *** |
| Curricular units 1st sem (grade) | 0.0444 | 1.0454 | 1.0136 | 1.0782 | 0.0048 | ** |
| Curricular units 1st sem (without evaluations) | 0.0221 | 1.0223 | 1.0173 | 1.0273 | 0.0000 | *** |
| Curricular units 2nd sem (credited) | -0.1603 | 0.8519 | 0.8314 | 0.8729 | 0.0000 | *** |
| Curricular units 2nd sem (enrolled) | 0.6885 | 1.9907 | 1.9606 | 2.0213 | 0.0000 | *** |
| Curricular units 2nd sem (evaluations) | 0.1239 | 1.1319 | 1.0903 | 1.1752 | 0.0000 | *** |
| Curricular units 2nd sem (approved) | -0.7656 | 0.4650 | 0.4477 | 0.4831 | 0.0000 | *** |
| Curricular units 2nd sem (grade) | -0.1057 | 0.8997 | 0.8729 | 0.9274 | 0.0000 | *** |
| Curricular units 2nd sem (without evaluations) | -0.1023 | 0.9028 | 0.8996 | 0.9060 | 0.0000 | *** |
| Unemployment rate | -0.0293 | 0.9712 | 0.9329 | 1.0110 | 0.1540 | |
| Inflation rate | -0.0388 | 0.9620 | 0.9182 | 1.0077 | 0.1021 | |
| GDP | 0.0116 | 1.0117 | 0.9717 | 1.0534 | 0.5722 | |


kable(rrr_dropout,
      col.names = c("Variabel", "β", "RRR", "CI Lower (95%)", "CI Upper (95%)", "P-value", "Sig"),
      caption = "Tabel 5b. Relative Risk Ratio: Dropout vs Graduate") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE)

| Variabel | β | RRR | CI Lower (95%) | CI Upper (95%) | P-value | Sig |
|---|---|---|---|---|---|---|
| (Intercept) | 1.3942 | 4.0317 | 4.0278 | 4.0356 | 0.0000 | *** |
| Marital status | -0.1000 | 0.9049 | 0.9031 | 0.9066 | 0.0000 | *** |
| Application mode | 0.0027 | 1.0027 | 0.9934 | 1.0121 | 0.5718 | |
| Application order | 0.1142 | 1.1209 | 1.0758 | 1.1680 | 0.0000 | *** |
| Course | 0.0002 | 1.0002 | 1.0001 | 1.0003 | 0.0000 | *** |
| Daytime/evening attendance | 0.0294 | 1.0298 | 1.0277 | 1.0319 | 0.0000 | *** |
| Previous qualification | -0.0150 | 0.9851 | 0.9711 | 0.9993 | 0.0399 | * |
| Previous qualification (grade) | 0.0063 | 1.0063 | 0.9959 | 1.0168 | 0.2358 | |
| Nacionality | 0.0369 | 1.0376 | 1.0219 | 1.0535 | 0.0000 | *** |
| Mother's qualification | 0.0061 | 1.0061 | 0.9961 | 1.0163 | 0.2341 | |
| Father's qualification | -0.0114 | 0.9887 | 0.9790 | 0.9985 | 0.0243 | * |
| Mother's occupation | -0.0081 | 0.9919 | 0.9782 | 1.0059 | 0.2548 | |
| Father's occupation | -0.0004 | 0.9996 | 0.9851 | 1.0142 | 0.9523 | |
| Admission grade | -0.0142 | 0.9859 | 0.9757 | 0.9963 | 0.0081 | ** |
| Displaced | 0.3278 | 1.3879 | 1.3814 | 1.3944 | 0.0000 | *** |
| Educational special needs | 0.1430 | 1.1537 | 1.1534 | 1.1540 | 0.0000 | *** |
| Debtor | 0.8708 | 2.3889 | 2.3861 | 2.3918 | 0.0000 | *** |
| Tuition fees up to date | -3.0269 | 0.0485 | 0.0484 | 0.0486 | 0.0000 | *** |
| Gender | 0.4961 | 1.6423 | 1.6373 | 1.6472 | 0.0000 | *** |
| Scholarship holder | -0.7989 | 0.4498 | 0.4491 | 0.4506 | 0.0000 | *** |
| Age at enrollment | 0.0547 | 1.0563 | 1.0354 | 1.0775 | 0.0000 | *** |
| International | -1.7041 | 0.1819 | 0.1819 | 0.1820 | 0.0000 | *** |
| Curricular units 1st sem (credited) | 0.2610 | 1.2982 | 1.2574 | 1.3404 | 0.0000 | *** |
| Curricular units 1st sem (enrolled) | 0.2140 | 1.2386 | 1.2237 | 1.2538 | 0.0000 | *** |
| Curricular units 1st sem (evaluations) | 0.0096 | 1.0096 | 0.9773 | 1.0430 | 0.5629 | |
| Curricular units 1st sem (approved) | -0.7523 | 0.4713 | 0.4613 | 0.4814 | 0.0000 | *** |
| Curricular units 1st sem (grade) | 0.1006 | 1.1058 | 1.0730 | 1.1396 | 0.0000 | *** |
| Curricular units 1st sem (without evaluations) | -0.1135 | 0.8927 | 0.8891 | 0.8964 | 0.0000 | *** |
| Curricular units 2nd sem (credited) | 0.1108 | 1.1172 | 1.0922 | 1.1428 | 0.0000 | *** |
| Curricular units 2nd sem (enrolled) | 0.9520 | 2.5909 | 2.5641 | 2.6179 | 0.0000 | *** |
| Curricular units 2nd sem (evaluations) | 0.0263 | 1.0266 | 0.9888 | 1.0659 | 0.1702 | |
| Curricular units 2nd sem (approved) | -1.0977 | 0.3336 | 0.3218 | 0.3459 | 0.0000 | *** |
| Curricular units 2nd sem (grade) | -0.1940 | 0.8236 | 0.7999 | 0.8480 | 0.0000 | *** |
| Curricular units 2nd sem (without evaluations) | -0.1831 | 0.8327 | 0.8300 | 0.8354 | 0.0000 | *** |
| Unemployment rate | 0.0846 | 1.0883 | 1.0415 | 1.1373 | 0.0002 | *** |
| Inflation rate | -0.0134 | 0.9867 | 0.9500 | 1.0249 | 0.4891 | |
| GDP | 0.0063 | 1.0063 | 0.9653 | 1.0492 | 0.7662 | |

# Plot RRR
rrr_combined <- rbind(
  rrr_enrolled %>% filter(P_value < 0.05, Variabel != "(Intercept)") %>%
    mutate(Kategori = "Enrolled vs Graduate"),
  rrr_dropout %>% filter(P_value < 0.05, Variabel != "(Intercept)") %>%
    mutate(Kategori = "Dropout vs Graduate")
) %>%
  mutate(Variabel = reorder(Variabel, RRR),
         Arah = ifelse(RRR > 1, "RRR > 1 (risiko naik)", "RRR < 1 (risiko turun)"))
p_rrr <- ggplot(rrr_combined, aes(x = Variabel, y = RRR, color = Arah)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = CI_Lower, ymax = CI_Upper), width = 0.3, linewidth = 0.7) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "gray40", linewidth = 0.8) +
  coord_flip() +
  facet_wrap(~ Kategori, scales = "free") +
  scale_color_manual(values = c("RRR > 1 (risiko naik)" = "#F44336",
                                "RRR < 1 (risiko turun)" = "#2196F3")) +
  labs(title = "Gambar 8. Relative Risk Ratio (RRR) dengan 95% CI",
       subtitle = "Garis putus-putus = RRR 1 (tidak ada perubahan risiko)",
       x = "Variabel", y = "RRR = exp(β)", color = "Arah RRR") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"),
        strip.text = element_text(face = "bold"))
print(p_rrr)

Figure 8. Relative Risk Ratios with 95% CI
Interpretation: Tables 5a and 5b and Figure 8 present the Relative Risk Ratio (RRR) = exp(β), the exponentiated log-odds coefficient. RRRs are easier to interpret substantively:
- RRR > 1 (red): a one-unit increase in the predictor raises the odds of being in that category (Enrolled/Dropout) rather than Graduate. For example, RRR = 1.5 means the odds increase by 50%.
- RRR < 1 (blue): a one-unit increase in the predictor lowers those odds, i.e., the student is relatively more likely to be a Graduate. For example, RRR = 0.8 means the odds decrease by 20%.
- RRR = 1 (dashed line): no effect.
- A 95% confidence interval that excludes 1 confirms statistical significance.
Concrete example: in Table 5b, “Curricular units 1st sem (approved)” has RRR = 0.4713 for Dropout vs Graduate. Each additional curricular unit approved in the first semester therefore multiplies the odds of Dropout (relative to Graduate) by about 0.47, a decrease of roughly 53%, holding the other variables constant.
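The conversion from β to RRR, its confidence interval, and the percentage change can be checked by hand. The sketch below uses the rounded coefficient and standard error reported for “Curricular units 1st sem (approved)” (Dropout vs Graduate, β = -0.7523, SE = 0.0109). Because the tables were computed from unrounded estimates, the last digit of the interval may differ slightly. The function `rrr_from_beta()` is a hypothetical helper, not part of the original analysis; it uses 1.96 as the document's own code does.

```r
# Relative risk ratio (RRR) and Wald 95% CI from a log-odds coefficient
rrr_from_beta <- function(beta, se, z = 1.96) {
  c(RRR        = exp(beta),
    CI_Lower   = exp(beta - z * se),
    CI_Upper   = exp(beta + z * se),
    pct_change = (exp(beta) - 1) * 100)   # % change in odds per one-unit increase
}

# Curricular units 1st sem (approved), Dropout vs Graduate (rounded values from Tables 4b/5b)
beta <- -0.7523
r <- rrr_from_beta(beta, se = 0.0109)
print(round(r, 4))

# Effects compound multiplicatively: a k-unit increase multiplies the odds by exp(k * beta) = RRR^k
k <- 2
rrr_k <- exp(k * beta)
cat(sprintf("RRR for +%d units: %.4f (= %.4f^%d)\n", k, rrr_k, r[["RRR"]], k))
```

The multiplicative form is why RRRs should not be added across units: two extra approved units lower the odds to about 0.47² ≈ 0.22 of their original value, not by 2 × 53%.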
## Computing Average Marginal Effects (AME)...
ame <- avg_slopes(model_mlr, newdata = df_train)
ame_df <- as.data.frame(ame) %>%
  select(term, group, estimate, std.error, statistic, p.value, conf.low, conf.high) %>%
  rename(
    Variabel = term,
    Kategori = group,
    AME = estimate,
    SE = std.error,
    Z_stat = statistic,
    P_value = p.value,
    CI_Low = conf.low,
    CI_High = conf.high
  ) %>%
  mutate(across(where(is.numeric), ~ round(.x, 4)),
         Sig = ifelse(P_value < 0.001, "***",
               ifelse(P_value < 0.01, "**",
               ifelse(P_value < 0.05, "*",
               ifelse(P_value < 0.1, ".", " ")))))
kable(ame_df,
      col.names = c("Variabel", "Kategori", "AME", "SE", "Z",
                    "P-value", "CI Low", "CI High", "Sig"),
      caption = "Tabel 6. Average Marginal Effects (AME)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = TRUE, font_size = 11) %>%
  column_spec(1, bold = TRUE)

| Variabel | Kategori | AME | SE | Z | P-value | CI Low | CI High | Sig |
|---|---|---|---|---|---|---|---|---|
| Admission grade | Graduate | 0.0011 | 0.0005 | 2.3544 | 0.0186 | 0.0002 | 0.0020 | * |
| Admission grade | Enrolled | -0.0003 | 0.0005 | -0.5139 | 0.6073 | -0.0012 | 0.0007 | |
| Admission grade | Dropout | -0.0008 | 0.0004 | -2.0471 | 0.0406 | -0.0017 | 0.0000 | * |
| Age at enrollment | Graduate | -0.0015 | 0.0009 | -1.5688 | 0.1167 | -0.0033 | 0.0004 | |
| Age at enrollment | Enrolled | -0.0041 | 0.0010 | -3.9519 | 0.0001 | -0.0061 | -0.0021 | *** |
| Age at enrollment | Dropout | 0.0055 | 0.0008 | 6.9336 | 0.0000 | 0.0040 | 0.0071 | *** |
| Application mode | Graduate | -0.0006 | 0.0004 | -1.5447 | 0.1224 | -0.0014 | 0.0002 | |
| Application mode | Enrolled | 0.0008 | 0.0004 | 1.8490 | 0.0645 | 0.0000 | 0.0017 | . |
| Application mode | Dropout | -0.0002 | 0.0004 | -0.4745 | 0.6351 | -0.0009 | 0.0006 | |
| Application order | Graduate | -0.0027 | 0.0010 | -2.5347 | 0.0113 | -0.0047 | -0.0006 | * |
| Application order | Enrolled | -0.0092 | 0.0044 | -2.1029 | 0.0355 | -0.0178 | -0.0006 | * |
| Application order | Dropout | 0.0119 | 0.0034 | 3.5046 | 0.0005 | 0.0052 | 0.0185 | *** |
| Course | Graduate | 0.0000 | 0.0000 | -3.3703 | 0.0008 | 0.0000 | 0.0000 | *** |
| Course | Enrolled | 0.0000 | 0.0000 | 0.0735 | 0.9414 | 0.0000 | 0.0000 | |
| Course | Dropout | 0.0000 | 0.0000 | 4.8102 | 0.0000 | 0.0000 | 0.0000 | *** |
| Curricular units 1st sem (approved) | Graduate | 0.0664 | 0.0009 | 70.7025 | 0.0000 | 0.0646 | 0.0682 | *** |
| Curricular units 1st sem (approved) | Enrolled | -0.0279 | 0.0013 | -21.1249 | 0.0000 | -0.0305 | -0.0253 | *** |
| Curricular units 1st sem (approved) | Dropout | -0.0385 | 0.0012 | -31.8867 | 0.0000 | -0.0409 | -0.0361 | *** |
| Curricular units 1st sem (credited) | Graduate | -0.0221 | 0.0013 | -16.9490 | 0.0000 | -0.0247 | -0.0196 | *** |
| Curricular units 1st sem (credited) | Enrolled | 0.0080 | 0.0026 | 3.0727 | 0.0021 | 0.0029 | 0.0131 | ** |
| Curricular units 1st sem (credited) | Dropout | 0.0141 | 0.0020 | 6.9799 | 0.0000 | 0.0101 | 0.0181 | *** |
| Curricular units 1st sem (enrolled) | Graduate | -0.0176 | 0.0009 | -18.9923 | 0.0000 | -0.0194 | -0.0157 | *** |
| Curricular units 1st sem (enrolled) | Enrolled | 0.0055 | 0.0012 | 4.7987 | 0.0000 | 0.0033 | 0.0078 | *** |
| Curricular units 1st sem (enrolled) | Dropout | 0.0120 | 0.0007 | 16.9373 | 0.0000 | 0.0106 | 0.0134 | *** |
| Curricular units 1st sem (evaluations) | Graduate | -0.0039 | 0.0012 | -3.2392 | 0.0012 | -0.0063 | -0.0015 | ** |
| Curricular units 1st sem (evaluations) | Enrolled | 0.0059 | 0.0022 | 2.7127 | 0.0067 | 0.0016 | 0.0101 | ** |
| Curricular units 1st sem (evaluations) | Dropout | -0.0020 | 0.0018 | -1.1216 | 0.2620 | -0.0055 | 0.0015 | |
| Curricular units 1st sem (grade) | Graduate | -0.0068 | 0.0012 | -5.4728 | 0.0000 | -0.0092 | -0.0044 | *** |
| Curricular units 1st sem (grade) | Enrolled | -0.0001 | 0.0021 | -0.0351 | 0.9720 | -0.0042 | 0.0040 | |
| Curricular units 1st sem (grade) | Dropout | 0.0069 | 0.0017 | 4.0697 | 0.0000 | 0.0036 | 0.0101 | *** |
| Curricular units 1st sem (without evaluations) | Graduate | 0.0029 | 0.0003 | 10.2347 | 0.0000 | 0.0024 | 0.0035 | *** |
| Curricular units 1st sem (without evaluations) | Enrolled | 0.0087 | 0.0004 | 20.6677 | 0.0000 | 0.0078 | 0.0095 | *** |
| Curricular units 1st sem (without evaluations) | Dropout | -0.0116 | 0.0004 | -28.4168 | 0.0000 | -0.0124 | -0.0108 | *** |
| Curricular units 2nd sem (approved) | Graduate | 0.0924 | 0.0015 | 61.6803 | 0.0000 | 0.0895 | 0.0953 | *** |
| Curricular units 2nd sem (approved) | Enrolled | -0.0326 | 0.0027 | -11.9965 | 0.0000 | -0.0379 | -0.0273 | *** |
| Curricular units 2nd sem (approved) | Dropout | -0.0598 | 0.0023 | -25.8189 | 0.0000 | -0.0644 | -0.0553 | *** |
| Curricular units 2nd sem (credited) | Graduate | 0.0063 | 0.0010 | 6.2804 | 0.0000 | 0.0043 | 0.0082 | *** |
| Curricular units 2nd sem (credited) | Enrolled | -0.0249 | 0.0017 | -14.2944 | 0.0000 | -0.0284 | -0.0215 | *** |
| Curricular units 2nd sem (credited) | Dropout | 0.0187 | 0.0013 | 14.8365 | 0.0000 | 0.0162 | 0.0211 | *** |
| Curricular units 2nd sem (enrolled) | Graduate | -0.0817 | 0.0021 | -39.2503 | 0.0000 | -0.0858 | -0.0777 | *** |
| Curricular units 2nd sem (enrolled) | Enrolled | 0.0312 | 0.0023 | 13.4867 | 0.0000 | 0.0266 | 0.0357 | *** |
| Curricular units 2nd sem (enrolled) | Dropout | 0.0506 | 0.0019 | 27.1274 | 0.0000 | 0.0469 | 0.0542 | *** |
| Curricular units 2nd sem (evaluations) | Graduate | -0.0091 | 0.0015 | -5.9564 | 0.0000 | -0.0122 | -0.0061 | *** |
| Curricular units 2nd sem (evaluations) | Enrolled | 0.0133 | 0.0022 | 6.1201 | 0.0000 | 0.0091 | 0.0176 | *** |
| Curricular units 2nd sem (evaluations) | Dropout | -0.0042 | 0.0018 | -2.3771 | 0.0174 | -0.0076 | -0.0007 | * |
| Curricular units 2nd sem (grade) | Graduate | 0.0144 | 0.0010 | 14.4995 | 0.0000 | 0.0124 | 0.0163 | *** |
| Curricular units 2nd sem (grade) | Enrolled | -0.0022 | 0.0021 | -1.0820 | 0.2793 | -0.0063 | 0.0018 | |
| Curricular units 2nd sem (grade) | Dropout | -0.0122 | 0.0017 | -7.2538 | 0.0000 | -0.0154 | -0.0089 | *** |
| Curricular units 2nd sem (without evaluations) | Graduate | 0.0137 | 0.0003 | 39.7422 | 0.0000 | 0.0131 | 0.0144 | *** |
| Curricular units 2nd sem (without evaluations) | Enrolled | -0.0024 | 0.0004 | -5.4489 | 0.0000 | -0.0033 | -0.0015 | *** |
| Curricular units 2nd sem (without evaluations) | Dropout | -0.0113 | 0.0005 | -24.9076 | 0.0000 | -0.0122 | -0.0104 | *** |
| Daytime/evening attendance | Graduate | 0.0003 | 0.0001 | 2.6813 | 0.0073 | 0.0001 | 0.0005 | ** |
| Daytime/evening attendance | Enrolled | -0.0041 | 0.0002 | -21.8512 | 0.0000 | -0.0045 | -0.0037 | *** |
| Daytime/evening attendance | Dropout | 0.0038 | 0.0002 | 24.0109 | 0.0000 | 0.0035 | 0.0041 | *** |
| Debtor | Graduate | -0.1008 | 0.0021 | -47.9323 | 0.0000 | -0.1049 | -0.0967 | *** |
| Debtor | Enrolled | 0.0685 | 0.0025 | 27.8863 | 0.0000 | 0.0637 | 0.0734 | *** |
| Debtor | Dropout | 0.0323 | 0.0016 | 19.9532 | 0.0000 | 0.0291 | 0.0354 | *** |
| Displaced | Graduate | -0.0022 | 0.0008 | -2.7753 | 0.0055 | -0.0038 | -0.0007 | ** |
| Displaced | Enrolled | -0.0352 | 0.0010 | -34.9624 | 0.0000 | -0.0372 | -0.0332 | *** |
| Displaced | Dropout | 0.0374 | 0.0010 | 35.8958 | 0.0000 | 0.0354 | 0.0395 | *** |
| Educational special needs | Graduate | -0.0045 | 0.0003 | -16.5738 | 0.0000 | -0.0050 | -0.0040 | *** |
| Educational special needs | Enrolled | -0.0098 | 0.0003 | -35.5261 | 0.0000 | -0.0103 | -0.0092 | *** |
| Educational special needs | Dropout | 0.0143 | 0.0004 | 38.2623 | 0.0000 | 0.0135 | 0.0150 | *** |
| Father's occupation | Graduate | 0.0000 | 0.0006 | 0.0644 | 0.9487 | -0.0012 | 0.0012 | |
| Father's occupation | Enrolled | 0.0000 | 0.0006 | -0.0306 | 0.9756 | -0.0011 | 0.0011 | |
| Father's occupation | Dropout | 0.0000 | 0.0005 | -0.0419 | 0.9666 | -0.0011 | 0.0010 | |
| Father's qualification | Graduate | 0.0008 | 0.0004 | 1.9866 | 0.0470 | 0.0000 | 0.0017 | * |
| Father's qualification | Enrolled | -0.0001 | 0.0004 | -0.2817 | 0.7782 | -0.0010 | 0.0007 | |
| Father's qualification | Dropout | -0.0007 | 0.0004 | -1.8123 | 0.0699 | -0.0015 | 0.0001 | . |
| GDP | Graduate | -0.0010 | 0.0016 | -0.6252 | 0.5318 | -0.0042 | 0.0022 | |
| GDP | Enrolled | 0.0010 | 0.0027 | 0.3942 | 0.6934 | -0.0042 | 0.0063 | |
| GDP | Dropout | 0.0000 | 0.0022 | -0.0182 | 0.9855 | -0.0043 | 0.0043 | |
| Gender | Graduate | -0.0400 | 0.0008 | -48.4002 | 0.0000 | -0.0416 | -0.0383 | *** |
| Gender | Enrolled | 0.0098 | 0.0007 | 13.0706 | 0.0000 | 0.0083 | 0.0113 | *** |
| Gender | Dropout | 0.0302 | 0.0009 | 33.9473 | 0.0000 | 0.0284 | 0.0319 | *** |
| Inflation rate | Graduate | 0.0031 | 0.0009 | 3.5378 | 0.0004 | 0.0014 | 0.0048 | *** |
| Inflation rate | Enrolled | -0.0039 | 0.0038 | -1.0191 | 0.3082 | -0.0114 | 0.0036 | |
| Inflation rate | Dropout | 0.0008 | 0.0030 | 0.2780 | 0.7810 | -0.0051 | 0.0067 | |
| International | Graduate | 0.1045 | 0.0021 | 49.9945 | 0.0000 | 0.1004 | 0.1086 | *** |
| International | Enrolled | -0.0129 | 0.0023 | -5.7146 | 0.0000 | -0.0173 | -0.0085 | *** |
| International | Dropout | -0.0916 | 0.0025 | -36.1662 | 0.0000 | -0.0966 | -0.0866 | *** |
| Marital status | Graduate | 0.0024 | 0.0002 | 10.7222 | 0.0000 | 0.0020 | 0.0029 | *** |
| Marital status | Enrolled | 0.0079 | 0.0003 | 29.5026 | 0.0000 | 0.0073 | 0.0084 | *** |
| Marital status | Dropout | -0.0103 | 0.0003 | -39.6433 | 0.0000 | -0.0108 | -0.0098 | *** |
| Mother's occupation | Graduate | 0.0002 | 0.0006 | 0.3705 | 0.7110 | -0.0009 | 0.0014 | |
| Mother's occupation | Enrolled | 0.0006 | 0.0005 | 1.1326 | 0.2574 | -0.0004 | 0.0016 | |
| Mother's occupation | Dropout | -0.0008 | 0.0005 | -1.6025 | 0.1090 | -0.0018 | 0.0002 | |
| Mother's qualification | Graduate | 0.0003 | 0.0004 | 0.7909 | 0.4290 | -0.0005 | 0.0012 | |
| Mother's qualification | Enrolled | -0.0014 | 0.0005 | -2.9833 | 0.0029 | -0.0023 | -0.0005 | ** |
| Mother's qualification | Dropout | 0.0010 | 0.0004 | 2.5356 | 0.0112 | 0.0002 | 0.0018 | * |
| Nacionality | Graduate | -0.0026 | 0.0007 | -3.5762 | 0.0003 | -0.0040 | -0.0012 | *** |
| Nacionality | Enrolled | 0.0001 | 0.0008 | 0.1354 | 0.8923 | -0.0014 | 0.0017 | |
| Nacionality | Dropout | 0.0025 | 0.0006 | 4.0236 | 0.0001 | 0.0013 | 0.0036 | *** |
| Previous qualification | Graduate | 0.0014 | 0.0006 | 2.1527 | 0.0313 | 0.0001 | 0.0026 | * |
| Previous qualification | Enrolled | -0.0006 | 0.0007 | -0.9638 | 0.3351 | -0.0019 | 0.0007 | |
| Previous qualification | Dropout | -0.0007 | 0.0006 | -1.3064 | 0.1914 | -0.0018 | 0.0004 | |
| Previous qualification (grade) | Graduate | -0.0001 | 0.0005 | -0.1609 | 0.8722 | -0.0010 | 0.0008 | |
| Previous qualification (grade) | Enrolled | -0.0006 | 0.0005 | -1.2681 | 0.2048 | -0.0016 | 0.0003 | |
| Previous qualification (grade) | Dropout | 0.0007 | 0.0004 | 1.7119 | 0.0869 | -0.0001 | 0.0015 | . |
| Scholarship holder | Graduate | 0.0671 | 0.0014 | 49.5394 | 0.0000 | 0.0644 | 0.0697 | *** |
| Scholarship holder | Enrolled | -0.0247 | 0.0013 | -18.5165 | 0.0000 | -0.0274 | -0.0221 | *** |
| Scholarship holder | Dropout | -0.0423 | 0.0013 | -31.7345 | 0.0000 | -0.0450 | -0.0397 | *** |
| Tuition fees up to date | Graduate | 0.2435 | 0.0062 | 39.2992 | 0.0000 | 0.2313 | 0.2556 | *** |
| Tuition fees up to date | Enrolled | 0.0788 | 0.0043 | 18.4887 | 0.0000 | 0.0704 | 0.0871 | *** |
| Tuition fees up to date | Dropout | -0.3222 | 0.0086 | -37.3936 | 0.0000 | -0.3391 | -0.3053 | *** |
| Unemployment rate | Graduate | -0.0013 | 0.0019 | -0.6991 | 0.4845 | -0.0051 | 0.0024 | |
| Unemployment rate | Enrolled | -0.0080 | 0.0022 | -3.6434 | 0.0003 | -0.0123 | -0.0037 | *** |
| Unemployment rate | Dropout | 0.0093 | 0.0019 | 4.9907 | 0.0000 | 0.0057 | 0.0130 | *** |

# Plot significant AMEs
ame_sig <- ame_df %>%
  filter(P_value < 0.05, Variabel != "(Intercept)") %>%
  mutate(Variabel = reorder(Variabel, AME))
p_ame <- ggplot(ame_sig, aes(x = Variabel, y = AME, fill = Kategori)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.85) +
  geom_errorbar(aes(ymin = CI_Low, ymax = CI_High),
                position = position_dodge(0.9),
                width = 0.3, linewidth = 0.5) +
  geom_hline(yintercept = 0, linetype = "solid", linewidth = 0.5) +
  coord_flip() +
  scale_fill_manual(values = c("Graduate" = "#2196F3",
                               "Enrolled" = "#4CAF50",
                               "Dropout" = "#F44336")) +
  labs(title = "Gambar 9. Average Marginal Effects (AME) — Variabel Signifikan",
       subtitle = "AME = perubahan probabilitas rata-rata per 1 unit perubahan X",
       x = "Variabel", y = "AME (Perubahan Probabilitas)",
       fill = "Status") +
  theme_minimal(base_size = 10) +
  theme(plot.title = element_text(face = "bold"))
print(p_ame)

Figure 9. Significant Average Marginal Effects
Interpretation: Table 6 and Figure 9 present the Average Marginal Effects (AME), the most practically interpretable measure of effect. An AME answers the question: “On average, how much does the probability of a given academic status change when variable X increases by one unit?”
- Positive AME for Dropout: a one-unit increase in X raises the average probability that a student is in the Dropout category.
- Negative AME for Dropout: a one-unit increase in X lowers the average probability of Dropout. Because the three probabilities always sum to 1, the AMEs of each variable across Graduate, Enrolled, and Dropout sum to zero (e.g., Debtor: -0.1008 + 0.0685 + 0.0323 = 0), so the probability lost by Dropout shifts to Graduate and/or Enrolled.
- The AME averages the observation-level marginal effects over the training data, so it reflects the effect across the observed population more realistically than the marginal effect at the mean (MEM), which evaluates the effect only at the average values of the predictors.
- Example: the AME of “Curricular units 2nd sem (approved)” on Graduate is +0.0924, meaning that each additional curricular unit approved in the second semester raises the probability of graduating by about 9.2 percentage points on average.
AMEs complement RRRs because they are expressed directly on the probability scale (0 to 1), which is more intuitive for communicating policy implications. Some rows (e.g., Course) show an AME of 0.0000 together with a significant p-value: the per-unit effect is nonzero but smaller than the four-decimal rounding used in the table.
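`avg_slopes()` handles this for the fitted model. As a from-scratch illustration of what an AME is, the sketch below uses one hypothetical predictor with hypothetical coefficients and base R only. It averages numerical derivatives of the predicted probabilities over all observations, checks them against the analytic derivative of the multinomial-logit probabilities, and contrasts the result with the marginal effect at the mean (MEM). It is a conceptual sketch, not a re-implementation of marginaleffects, whose numerical-differentiation settings may differ.

```r
set.seed(42)

# Hypothetical coefficients (Graduate = base category): c(intercept, slope of x)
b_E <- c(-1.0,  0.40)   # Enrolled equation
b_D <- c(-0.5, -0.60)   # Dropout equation

# Predicted probabilities for a vector of x values (one row per observation)
mlr_probs <- function(x) {
  eta_E <- b_E[1] + b_E[2] * x
  eta_D <- b_D[1] + b_D[2] * x
  denom <- 1 + exp(eta_E) + exp(eta_D)
  cbind(Graduate = 1 / denom,
        Enrolled = exp(eta_E) / denom,
        Dropout  = exp(eta_D) / denom)
}

x <- rnorm(1000)   # hypothetical predictor values for 1000 observations

# AME: average over observations of the numerical derivative dP/dx (central difference)
h <- 1e-5
ame_numeric <- colMeans((mlr_probs(x + h) - mlr_probs(x - h)) / (2 * h))

# Analytic check: dP_k/dx = P_k * (beta_k - sum_j P_j * beta_j), with beta_Graduate = 0
P <- mlr_probs(x)
betas <- c(Graduate = 0, Enrolled = b_E[2], Dropout = b_D[2])
beta_bar <- as.vector(P %*% betas)   # probability-weighted average slope per observation
slope_mat <- matrix(betas, nrow = nrow(P), ncol = 3, byrow = TRUE)
ame_analytic <- colMeans(P * (slope_mat - beta_bar))

# Marginal effect at the mean (MEM): a single derivative evaluated at mean(x)
p_bar <- mlr_probs(mean(x))[1, ]
mem <- p_bar * (betas - sum(p_bar * betas))

print(round(rbind(AME_numeric = ame_numeric, AME_analytic = ame_analytic, MEM = mem), 4))
cat(sprintf("Sum of AMEs across the three categories: %.1e\n", sum(ame_numeric)))
```

The analytic formula also explains why the three AMEs of a variable sum to zero: the weighted slopes cancel because the probabilities sum to 1.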
pred_test <- predict(model_mlr, newdata = df_test, type = "class")
prob_test <- predict(model_mlr, newdata = df_test, type = "probs")
cm <- confusionMatrix(pred_test, df_test$Target)
cat("Confusion Matrix:\n")

## Confusion Matrix:
## Reference
## Prediction Graduate Enrolled Dropout
## Graduate 390 63 40
## Enrolled 35 47 25
## Dropout 16 48 219
##
## Overall Evaluation Statistics:
## Accuracy Kappa AccuracyLower AccuracyUpper AccuracyNull
## 7.429219e-01 5.689422e-01 7.127498e-01 7.714630e-01 4.994337e-01
## AccuracyPValue McnemarPValue
## 1.682322e-49 1.194884e-05
# Visualize the confusion matrix
cm_df <- as.data.frame(cm$table)
colnames(cm_df) <- c("Prediksi", "Aktual", "Frekuensi")
p_cm <- ggplot(cm_df, aes(x = Aktual, y = Prediksi, fill = Frekuensi)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = Frekuensi), size = 5, fontface = "bold") +
  scale_fill_gradient(low = "#E3F2FD", high = "#1565C0") +
  labs(title = "Gambar 10. Confusion Matrix — Data Testing",
       subtitle = sprintf("Akurasi Keseluruhan: %.2f%%",
                          cm$overall["Accuracy"] * 100),
       x = "Status Aktual", y = "Status Prediksi", fill = "Frekuensi") +
  theme_minimal(base_size = 12) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        axis.text = element_text(size = 11))
print(p_cm)

Figure 10. Confusion Matrix on the Test Data
Interpretation: Figure 10 shows the confusion matrix, which compares the model's predictions with the actual status in the testing data. Each cell contains a count of observations:
- The main diagonal holds the correct predictions: Graduate predicted as Graduate, Enrolled as Enrolled, and Dropout as Dropout. In this plot the diagonal runs from bottom-left to top-right, because ggplot2 draws discrete y-axis levels from the bottom up. Darker blue indicates larger counts.
- Off-diagonal cells are misclassifications. The largest errors involve the Enrolled class. Of the 158 students who were actually Enrolled, 63 were predicted as Graduate and 48 as Dropout. Of the 284 actual Dropouts, 40 were predicted as Graduate and 25 as Enrolled.
Overall accuracy = total correct predictions / total observations = (390 + 47 + 219) / 883 = 656 / 883 ≈ 74.29%. The testing data are imbalanced (441 Graduate, 158 Enrolled, 284 Dropout), so accuracy alone can be misleading. It should be compared with the no-information rate (AccuracyNull = 0.4994, the share of the largest class), and the per-class metrics in Table 7 (Sensitivity, Specificity, F1-Score) deserve closer attention.
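The next sketch reproduces the headline numbers from the printed confusion matrix using base R only. It computes accuracy, the no-information rate (NIR), and a one-sided binomial test of accuracy against the NIR. A test of this kind is what appears as `AccuracyPValue` in the caret output. The matrix `cm_tab` is typed in by hand from the output above and is assumed to have predictions in rows and actual status in columns, as caret prints it.

```r
# Confusion matrix copied from the output above
# (rows = prediction, columns = actual/reference status)
lv <- c("Graduate", "Enrolled", "Dropout")
cm_tab <- matrix(c(390, 35, 16,    # actual Graduate
                   63, 47, 48,     # actual Enrolled
                   40, 25, 219),   # actual Dropout
                 nrow = 3,
                 dimnames = list(Prediction = lv, Reference = lv))

n_total   <- sum(cm_tab)
n_correct <- sum(diag(cm_tab))
accuracy  <- n_correct / n_total

# No-information rate: accuracy of always predicting the largest actual class
nir <- max(colSums(cm_tab)) / n_total

# One-sided binomial test: is accuracy greater than the NIR?
p_vs_nir <- binom.test(n_correct, n_total, p = nir,
                       alternative = "greater")$p.value

cat(sprintf("Accuracy = %.0f/%.0f = %.4f, NIR = %.4f, p = %.3g\n",
            n_correct, n_total, accuracy, nir, p_vs_nir))
```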
eval_per_class <- data.frame(
Kelas = rownames(cm$byClass),
Sensitivity = round(cm$byClass[, "Sensitivity"], 4),
Specificity = round(cm$byClass[, "Specificity"], 4),
Precision = round(cm$byClass[, "Pos Pred Value"], 4),
F1_Score = round(cm$byClass[, "F1"], 4)
)
rownames(eval_per_class) <- NULL
eval_per_class$Kelas <- gsub("Class: ", "", eval_per_class$Kelas)
kable(eval_per_class,
col.names = c("Kelas", "Sensitivity", "Specificity", "Precision", "F1-Score"),
caption = "Tabel 7. Performa Model MLR per Kelas") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = FALSE, position = "left", font_size = 13) %>%
column_spec(1, bold = TRUE)
| Kelas | Sensitivity | Specificity | Precision | F1-Score |
|---|---|---|---|---|
| Graduate | 0.8844 | 0.7670 | 0.7911 | 0.8351 |
| Enrolled | 0.2975 | 0.9172 | 0.4393 | 0.3547 |
| Dropout | 0.7711 | 0.8932 | 0.7739 | 0.7725 |
Interpretation: Table 7 presents the evaluation metrics for each target class:
- Sensitivity (Recall): the proportion of observations that are actually class X and are correctly predicted as X. High sensitivity means the model rarely "misses" cases of that class.
- Specificity: the proportion of observations that are not class X and are correctly identified as not X. High specificity means the model seldom misclassifies other classes as X.
- Precision: the proportion of class-X predictions that are truly class X. High precision means that when the model predicts "Dropout", that prediction is usually correct.
- F1-Score: the harmonic mean of Precision and Recall. It balances the two and is useful when classes are imbalanced.
Performance differs markedly across classes. Graduate, the largest class in the testing data (441 observations), has the highest F1-Score (0.8351), and Dropout (284 observations) follows closely (0.7725). Enrolled, the smallest class (158 observations), performs far worse (Sensitivity 0.2975, F1-Score 0.3547). Fewer than a third of the students who are actually Enrolled are recognized, so this class is the main area to focus on if the model needs improvement.
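Each metric in Table 7 is a one-vs-rest ratio of cells in the confusion matrix. The sketch below derives all of them directly from the counts. The helper `per_class_metrics()` is hypothetical and is not part of caret. The matrix is copied from the output above and assumed to have predictions in rows and actual status in columns.

```r
lv <- c("Graduate", "Enrolled", "Dropout")
cm_tab <- matrix(c(390, 35, 16, 63, 47, 48, 40, 25, 219), nrow = 3,
                 dimnames = list(Prediction = lv, Reference = lv))

# One-vs-rest metrics for every class of a K x K confusion matrix
# (rows = prediction, columns = actual)
per_class_metrics <- function(tab) {
  n        <- sum(tab)
  tp       <- diag(tab)              # correct predictions per class
  pred_tot <- rowSums(tab)           # how often each class was predicted
  act_tot  <- colSums(tab)           # how often each class actually occurs
  tn       <- n - pred_tot - act_tot + tp
  sens <- tp / act_tot               # recall
  spec <- tn / (n - act_tot)
  prec <- tp / pred_tot
  f1   <- 2 * prec * sens / (prec + sens)
  out <- data.frame(Sensitivity = sens, Specificity = spec,
                    Precision = prec, F1 = f1,
                    row.names = colnames(tab))
  round(out, 4)
}

res <- per_class_metrics(cm_tab)
print(res)
```

Take Enrolled as an example. F1 = 2·47 / (2·47 + 60 + 111) = 94 / 265 ≈ 0.3547. The 60 false positives and 111 false negatives both weigh on the score.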
model_summary_tbl <- data.frame(
Metrik = c("Akurasi", "Kappa",
"Log-Likelihood (Full)", "Log-Likelihood (Null)",
"G² (Chi-Square)", "P-value G²",
"Jumlah Observasi Training", "Jumlah Observasi Testing",
"Jumlah Prediktor", "Kategori Referensi"),
Nilai = c(
sprintf("%.2f%%", cm$overall["Accuracy"] * 100),
sprintf("%.4f", cm$overall["Kappa"]),
sprintf("%.4f", ll_model),
sprintf("%.4f", ll_null),
sprintf("%.4f", G2),
format(p_g2, scientific = TRUE, digits = 4),
nrow(df_train),
nrow(df_test),
length(predictor_cols),
"Graduate"
)
)
kable(model_summary_tbl,
col.names = c("Metrik", "Nilai"),
caption = "Tabel 8. Ringkasan Performa Model Multinomial Logistic Regression") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = FALSE, position = "left", font_size = 13) %>%
column_spec(1, bold = TRUE)
| Metrik | Nilai |
|---|---|
| Akurasi | 74.29% |
| Kappa | 0.5689 |
| Log-Likelihood (Full) | -1952.0376 |
| Log-Likelihood (Null) | -3611.6229 |
| G² (Chi-Square) | 3319.1707 |
| P-value G² | 0e+00 |
| Jumlah Observasi Training | 3541 |
| Jumlah Observasi Testing | 883 |
| Jumlah Prediktor | 36 |
| Kategori Referensi | Graduate |
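Interpretation: Table 8 brings all model performance metrics together in a single table:
- Accuracy: the percentage of correct predictions on the testing data. Higher values (closer to 100%) indicate a better model.
- Kappa (Cohen's Kappa): accuracy adjusted for the agreement expected by chance. Kappa > 0.6 is commonly regarded as good and Kappa > 0.8 as very good, so the value of 0.5689 obtained here indicates moderate agreement. Kappa is fairer than accuracy when the data are imbalanced.
- Log-Likelihood: measures model fit. Among models fitted to the same data, values closer to zero (less negative) indicate better fit. The full model (-1952.04) is much closer to zero than the null model (-3611.62).
- G² and its p-value: G² = 2 × (LL_full - LL_null) = 2 × (-1952.0376 + 3611.6229) ≈ 3319.17. This confirms that the model as a whole is significantly better than the null model. The p-value shown as 0e+00 is not exactly zero. It is below the smallest representable double-precision number and underflows, so it is best reported as p < 0.001.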
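The sketch below recomputes Kappa from the confusion matrix and G² from the two log-likelihoods in Table 8. It also shows why the p-value prints as 0. The degrees of freedom are an assumption. Two non-reference equations times 36 single-column predictors gives df = 72. Factor predictors with more levels would add columns, but the p-value would underflow either way. On the log scale, `pchisq(..., log.p = TRUE)` still returns a finite value.

```r
lv <- c("Graduate", "Enrolled", "Dropout")
cm_tab <- matrix(c(390, 35, 16, 63, 47, 48, 40, 25, 219), nrow = 3,
                 dimnames = list(Prediction = lv, Reference = lv))

# Cohen's Kappa: observed agreement corrected for chance agreement
n         <- sum(cm_tab)
p_obs     <- sum(diag(cm_tab)) / n                         # observed agreement
p_exp     <- sum(rowSums(cm_tab) * colSums(cm_tab)) / n^2  # chance agreement
kappa_hat <- (p_obs - p_exp) / (1 - p_exp)

# Likelihood-ratio (G^2) test from the log-likelihoods reported in Table 8
ll_full <- -1952.0376
ll_null <- -3611.6229
G2 <- 2 * (ll_full - ll_null)

# ASSUMPTION: df = 2 non-reference equations x 36 single-column predictors
df_g2   <- 2 * 36
p_g2    <- pchisq(G2, df = df_g2, lower.tail = FALSE)        # underflows to 0
log10_p <- pchisq(G2, df = df_g2, lower.tail = FALSE,
                  log.p = TRUE) / log(10)                      # finite

cat(sprintf("Kappa = %.4f | G2 = %.4f | p = %g | log10(p) = %.1f\n",
            kappa_hat, G2, p_g2, log10_p))
```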
prob_df <- as.data.frame(prob_test)
prob_df$Aktual <- df_test$Target
prob_df$Prediksi <- pred_test
prob_df$Benar <- prob_df$Aktual == prob_df$Prediksi
prob_long <- prob_df %>%
pivot_longer(cols = c(Graduate, Enrolled, Dropout),
names_to = "Kategori", values_to = "Probabilitas") %>%
mutate(Kategori = factor(Kategori, levels = c("Graduate", "Enrolled", "Dropout")))
p_prob <- ggplot(prob_long, aes(x = Aktual, y = Probabilitas, fill = Kategori)) +
geom_boxplot(alpha = 0.75, outlier.size = 0.5) +
facet_wrap(~ Kategori) +
scale_fill_manual(values = c("Graduate" = "#2196F3",
"Enrolled" = "#4CAF50",
"Dropout" = "#F44336")) +
labs(title = "Gambar 11. Distribusi Probabilitas Prediksi per Status Aktual",
subtitle = "Model yang baik: probabilitas tertinggi pada kategori yang benar",
x = "Status Aktual", y = "Probabilitas Prediksi",
fill = "Kategori Prediksi") +
theme_minimal(base_size = 11) +
theme(legend.position = "none",
plot.title = element_text(face = "bold"))
print(p_prob)
Figure 11. Distribution of Predicted Probabilities by Actual Status
Interpretation: Figure 11 shows the distribution of the model's predicted probabilities. Each panel is one predicted category, and the x-axis is the students' actual status. A good model shows the following pattern:
- Graduate panel: the predicted probability of Graduate is highest when the actual status is Graduate, so the boxplot in the "Graduate" column sits high.
- Enrolled panel: the predicted probability of Enrolled is highest when the actual status is Enrolled.
- Dropout panel: the predicted probability of Dropout is highest when the actual status is Dropout.
The model discriminates well if, in each panel, the boxplot for the matching actual status sits higher than those of the other statuses. Where the distributions overlap across actual-status columns, the model still struggles to separate the categories.
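The boxplots summarise the same probability matrix that drives the class predictions. For `nnet::multinom`, `predict(..., type = "class")` picks the category with the highest predicted probability. The sketch below illustrates both ideas on a small hypothetical probability matrix `prob_toy`. These four rows are invented and are not rows of `prob_test`. The sketch computes the argmax class and the mean predicted probability per actual status, which is the numeric counterpart of the panels in Figure 11.

```r
# Hypothetical predicted probabilities for four students (each row sums to 1);
# these are NOT rows of prob_test.
cats <- c("Graduate", "Enrolled", "Dropout")
prob_toy <- matrix(c(0.80, 0.15, 0.05,
                     0.30, 0.40, 0.30,
                     0.10, 0.20, 0.70,
                     0.60, 0.10, 0.30),
                   ncol = 3, byrow = TRUE, dimnames = list(NULL, cats))
actual_toy <- factor(c("Graduate", "Enrolled", "Dropout", "Dropout"),
                     levels = cats)

# Predicted class = category with the highest probability in each row
pred_toy <- factor(cats[max.col(prob_toy, ties.method = "first")],
                   levels = cats)

# Numeric counterpart of the boxplot panels: mean predicted probability of
# each category (columns) within each actual status (rows)
mean_prob <- aggregate(as.data.frame(prob_toy),
                       by = list(Aktual = actual_toy), FUN = mean)
print(mean_prob)
```

The fourth toy student is an actual Dropout whose highest probability is Graduate. This is the kind of overlap that shows up as spread in the Dropout column of the Graduate panel.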
acc_by_class <- prob_df %>%
group_by(Aktual) %>%
summarise(
Total = n(),
Benar = sum(Benar),
Akurasi = round(Benar / Total * 100, 2)
)
p_acc <- ggplot(acc_by_class, aes(x = Aktual, y = Akurasi, fill = Aktual)) +
geom_bar(stat = "identity", alpha = 0.85, width = 0.6) +
geom_text(aes(label = paste0(Akurasi, "%\n(", Benar, "/", Total, ")")),
vjust = -0.3, size = 4, fontface = "bold") +
geom_hline(yintercept = cm$overall["Accuracy"] * 100,
linetype = "dashed", color = "gray40", linewidth = 0.8) +
scale_fill_manual(values = c("Graduate" = "#2196F3",
"Enrolled" = "#4CAF50",
"Dropout" = "#F44336")) +
scale_y_continuous(limits = c(0, 115)) +
annotate("text", x = 0.6,
y = cm$overall["Accuracy"] * 100 + 3,
label = sprintf("Akurasi Rata-rata: %.2f%%", cm$overall["Accuracy"] * 100),
color = "gray40", size = 3.5) +
labs(title = "Gambar 12. Akurasi Prediksi per Status Akademik",
x = "Status Aktual", y = "Akurasi (%)") +
theme_minimal(base_size = 12) +
theme(legend.position = "none",
plot.title = element_text(face = "bold"))
print(p_acc)
Figure 12. Prediction Accuracy by Academic Status
Interpretation: Figure 12 shows the prediction accuracy for each class, together with the number of correct predictions out of all observations in that class. The dashed line marks overall accuracy. Per-class accuracy computed this way is the same quantity as the class Sensitivity in Table 7.
- Graduate has the highest accuracy (390/441 ≈ 88.44%). This is consistent with it being the largest class, which gives the model the most examples from which to learn the Graduate pattern.
- Enrolled has the lowest accuracy (47/158 ≈ 29.75%). This is likely because its characteristics lie between those of Dropout and Graduate, making it harder to distinguish.
- Dropout reaches 219/284 ≈ 77.11%, well above Enrolled. Poor academic patterns (low grades, few units approved) give a relatively clear signal.
These differences show that the model performs unevenly across student segments. For practical applications such as a dropout early-warning system, sensitivity to the Dropout class should be the top priority, even at the cost of lower overall accuracy.
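One common way to put Dropout sensitivity first is to flag a student whenever P(Dropout) exceeds a chosen threshold τ, rather than using the argmax class. This rule is not part of the original analysis. The sketch below is a hypothetical illustration: the helper `dropout_flag_metrics()` and the four-row matrix `prob_toy` are invented. It shows the trade-off, where lowering τ raises sensitivity at the cost of precision.

```r
# Hypothetical rule (not part of the original analysis): flag a student as
# at risk when P(Dropout) >= tau, instead of relying on the argmax class.
cats <- c("Graduate", "Enrolled", "Dropout")
prob_toy <- matrix(c(0.80, 0.15, 0.05,
                     0.30, 0.40, 0.30,
                     0.10, 0.20, 0.70,
                     0.60, 0.10, 0.30),
                   ncol = 3, byrow = TRUE, dimnames = list(NULL, cats))
actual_toy <- factor(c("Graduate", "Enrolled", "Dropout", "Dropout"),
                     levels = cats)

dropout_flag_metrics <- function(prob, actual, tau) {
  flagged    <- prob[, "Dropout"] >= tau
  is_dropout <- actual == "Dropout"
  c(tau         = tau,
    sensitivity = sum(flagged & is_dropout) / sum(is_dropout),
    precision   = sum(flagged & is_dropout) / sum(flagged))
}

res_05 <- dropout_flag_metrics(prob_toy, actual_toy, tau = 0.5)
res_03 <- dropout_flag_metrics(prob_toy, actual_toy, tau = 0.3)
print(rbind(res_05, res_03))
```

With the real model, the same function could be applied to `prob_test` and `df_test$Target`. τ would then be chosen by how many false alarms the institution can afford to follow up.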
model_summary_final <- data.frame(
Metrik = c("Akurasi", "Kappa", "G² Statistik", "P-value G²",
"Data Training", "Data Testing", "Jumlah Prediktor",
"Kategori Referensi"),
Nilai = c(
sprintf("%.2f%%", cm$overall["Accuracy"] * 100),
sprintf("%.4f", cm$overall["Kappa"]),
sprintf("%.4f", G2),
format(p_g2, scientific = TRUE, digits = 4),
nrow(df_train),
nrow(df_test),
length(predictor_cols),
"Graduate"
)
)
kable(model_summary_final,
col.names = c("Metrik", "Nilai"),
caption = "Tabel 9. Ringkasan Akhir Model Multinomial Logistic Regression") %>%
kable_styling(bootstrap_options = c("striped", "hover"),
full_width = FALSE, position = "left", font_size = 13) %>%
column_spec(1, bold = TRUE)
| Metrik | Nilai |
|---|---|
| Akurasi | 74.29% |
| Kappa | 0.5689 |
| G² Statistik | 3319.1707 |
| P-value G² | 0e+00 |
| Data Training | 3541 |
| Data Testing | 883 |
| Jumlah Prediktor | 36 |
| Kategori Referensi | Graduate |
Based on the Multinomial Logistic Regression analysis of student academic status data (Polytechnic Institute of Portalegre), the following conclusions can be drawn:
1. The overall model is significant. The simultaneous test (G²/LRT) gives G² ≈ 3319.17 with a very small p-value (< 0.05), so H₀ is rejected. Taken together, the predictors significantly affect students' academic status compared with a model without predictors.
2. Early-semester academic variables dominate. The variables most consistently significant in the partial Wald tests are the numbers of curricular units approved in semesters 1 and 2. Students who pass more courses in the early semesters have much lower odds of Dropout relative to Graduate (the reference category). This confirms that early-semester performance is the strongest predictor of study success.
3. Model performance. The model achieves 74.29% accuracy on the testing data, against a no-information rate of 49.94%, with Kappa = 0.5689, which indicates moderate agreement beyond chance. Performance is best for Graduate (F1-Score 0.8351). Enrolled is much harder to predict (F1-Score 0.3547), likely because it lies between Dropout and Graduate.
4. Practical implications.
   - Institutions can use this model as an early-warning system to identify students at risk of Dropout from the first semester onward.
   - Targeted interventions (e.g., academic tutoring, financial counseling) can be focused on students whose predictor profile indicates high risk.
   - Socio-economic variables such as debtor status and whether tuition fees are up to date are also significant. This underlines the importance of financial support for study success.
5. Limitations.
   - The model uses all predictors without feature selection, so potential multicollinearity may affect the stability of individual coefficient estimates.
   - Class imbalance can bias the model toward the majority class. Techniques such as SMOTE or class weighting could be applied to improve performance on the minority classes.
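As a starting point for the class-weighting option, the sketch below computes inverse-frequency case weights. Each class then receives the same total weight, and the weights sum to the number of observations. The helper `class_weights()` and the toy target `y_toy` are hypothetical. `nnet::multinom()` does accept case weights through its `weights` argument. The formula `Target ~ .` in the commented usage is an assumption, because the original model formula is not shown in this section.

```r
# Inverse-frequency case weights: every class receives the same total weight,
# and the weights sum to the number of observations.
class_weights <- function(y) {
  y <- droplevels(as.factor(y))
  counts <- table(y)
  w_class <- length(y) / (nlevels(y) * counts)
  as.numeric(w_class[as.character(y)])
}

# Hypothetical toy target (3 Graduate, 1 Dropout), not df_train$Target
y_toy <- factor(c("Graduate", "Graduate", "Graduate", "Dropout"))
w_toy <- class_weights(y_toy)
print(tapply(w_toy, y_toy, sum))

# Sketch of use with the actual model (not run here; formula is assumed):
# w_train <- class_weights(df_train$Target)
# model_w <- nnet::multinom(Target ~ ., data = df_train, weights = w_train)
```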
Analysis complete. This document was produced with R Markdown, following the methodology of Module 3 (Clustering Analysis) and Module 4 Part 2 (Multinomial Logistic Regression).