Klasifikasi Peringkat Akreditasi Sekolah Menggunakan Regresi Ordinal

Author

Windi Pangesti

Published

October 2, 2025

Analisis ini bertujuan untuk memprediksi peringkat akreditasi sekolah di Sumatera Selatan dan Sumatera Barat menggunakan Regresi Logistik Ordinal , serta menilai akurasi skor literasi dan numerasi 2023–2024 sebagai prediktor akreditasi.

1 Prediksi : Regresi Logistik

1.1 Library dan Load data

library(openxlsx)
Warning: package 'openxlsx' was built under R version 4.5.1
library(writexl)
Warning: package 'writexl' was built under R version 4.5.1
library(readxl)
library(dplyr)
Warning: package 'dplyr' was built under R version 4.5.1

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
# panggil data keseluruhan
tugas2_a <- read_excel("Data Tugas-22.xlsx", sheet="Data-1")
tugas2_b <- read_excel("Data Tugas-22.xlsx", sheet="Data-2")

# Pilih data
# SKEMA 1: Data orisinil
data1 <- tugas2_a[tugas2_a$Kelompok == 8, c("Lit_2023", "Lit_2024", "Num_2023", "Num_2024", "Akreditasi")]

# SKEMA 2: 4 Prediktor (setelah hapus anomali)
data2 <- tugas2_b[tugas2_b$Kelompok == 8, c("Lit_2023", "Lit_2024", "Num_2023", "Num_2024", "Akreditasi")]

# SKEMA 3: Semua prediktor + FE
data3 <- tugas2_b[tugas2_b$Kelompok == 8, c("Lit_2023", "Lit_2024", "Num_2023", "Num_2024", "del_lit", "del_num", "del_litnum", "avg_litnum", "avg_2023", "avg_2024", "Akreditasi")]

# SKEMA 4: Hanya FE delta dan rataan
data4 <- tugas2_b[tugas2_b$Kelompok == 8, c("del_lit", "del_num", "del_litnum", "avg_litnum", "avg_2023", "avg_2024", "Akreditasi")]

# SKEMA 5: Hanya FE bagian delta
data5 <- tugas2_b[tugas2_b$Kelompok == 8, c("del_lit", "del_num", "del_litnum", "Akreditasi")]

# SKEMA 6: Hanya FE bagian rataan
data6 <- tugas2_b[tugas2_b$Kelompok == 8, c("avg_litnum", "avg_2023", "avg_2024", "Akreditasi")]

# SKEMA 7: Prediktor asli + FE rataan
data7 <- tugas2_b[tugas2_b$Kelompok == 8, c("Lit_2023", "Lit_2024", "Num_2023", "Num_2024", "avg_litnum", "avg_2023", "avg_2024", "Akreditasi")]

# ubah jadi kategorik
data1 <- data1 %>% 
      mutate(across(where(is.character),as.factor))

data2 <- data2 %>% 
      mutate(across(where(is.character),as.factor))

data3 <- data3 %>% 
      mutate(across(where(is.character),as.factor))

data4 <- data4 %>% 
      mutate(across(where(is.character),as.factor))

data5 <- data5 %>% 
      mutate(across(where(is.character),as.factor))

data6 <- data6 %>% 
      mutate(across(where(is.character),as.factor))

data7 <- data7 %>% 
      mutate(across(where(is.character),as.factor))

1.2 Cek multikolinearitas

library(car)
Warning: package 'car' was built under R version 4.5.1
Loading required package: carData

Attaching package: 'car'
The following object is masked from 'package:dplyr':

    recode
# gabungkan semua dataset ke dalam list
data_list <- list(
  data1 = data1,
  data2 = data2,
  data3 = data3,
  data4 = data4,
  data5 = data5,
  data6 = data6,
  data7 = data7
)

# loop cek multikol
for (nm in names(data_list)) {
  cat("\n====================\n")
  cat("Dataset:", nm, "\n")
  
  df <- data_list[[nm]]
  
  # ambil semua variabel numerik
  pred_vars <- df |> dplyr::select(where(is.numeric)) |> names()
  
  # pastikan minimal ada >1 prediktor
  if (length(pred_vars) < 2) {
    cat("⚠️ Variabel numerik kurang untuk hitung VIF\n")
    next
  }
  
  # buat formula (pakai variabel pertama sebagai respon dummy)
  form <- as.formula(
    paste(pred_vars[1], "~", paste(pred_vars[-1], collapse = " + "))
  )
  
  # cek VIF dengan penanganan error
  out <- tryCatch({
    model <- lm(form, data = df)
    vif(model)
  }, error = function(e) {
    message("⚠️ Aliased coefficients terdeteksi (multikol kuat).")
    return(NULL)
  })
  
  print(out)
}

====================
Dataset: data1 
Lit_2024 Num_2023 Num_2024 
4.146151 2.356746 3.487893 

====================
Dataset: data2 
Lit_2024 Num_2023 Num_2024 
4.081232 2.244891 3.560247 

====================
Dataset: data3 
⚠️ Aliased coefficients terdeteksi (multikol kuat).
NULL

====================
Dataset: data4 
⚠️ Aliased coefficients terdeteksi (multikol kuat).
NULL

====================
Dataset: data5 
Warning in summary.lm(object, ...): essentially perfect fit: summary may be
unreliable
   del_num del_litnum 
  2.807796   2.807796 

====================
Dataset: data6 
Warning in summary.lm(object, ...): essentially perfect fit: summary may be
unreliable
avg_2023 avg_2024 
2.604528 2.604528 

====================
Dataset: data7 
⚠️ Aliased coefficients terdeteksi (multikol kuat).
NULL

Karena adanya pelanggaran asumsi multikolinearitas, maka skema yang diterapkan dalam membandingkan KNN dan regresi logistik adalah skema 1 dan skema 2 pada dua kondisi yaitu tanpa penanganan imbalance dan dengan penanganan imbalance

1.3 Fungsi Cross Validation

run_ordinal_final <- function(data, skema_name, seed = 123) {
  library(caret)
  library(MASS)
  library(dplyr)
  
  set.seed(seed)
  
  # 1. Split data
  data$Akreditasi <- ordered(data$Akreditasi, levels = c("C", "B", "A"))
  train_index <- createDataPartition(data$Akreditasi, p = 0.8, list = FALSE)
  train_data <- data[train_index, ]
  test_data  <- data[-train_index, ]
  
  cat("=== ORDINAL REGRESSION -", skema_name, "===\n")
  
  # 2. Model
  final_model <- polr(Akreditasi ~ ., data = train_data, Hess = TRUE)
  
  # 3. Prediksi + Probabilities
  pred_test <- predict(final_model, newdata = test_data, type = "class")
  pred_probs <- predict(final_model, newdata = test_data, type = "prob")  # ← TAMBAH INI
  
  # 4. Metrics dengan CARET
  cm <- confusionMatrix(pred_test, test_data$Akreditasi)
  
  # Ambil metrics dari caret
  accuracy <- cm$overall["Accuracy"]
  kappa <- cm$overall["Kappa"]
  by_class <- cm$byClass
  
  # Hitung averages
  precision <- mean(by_class[,"Precision"], na.rm = TRUE)
  recall <- mean(by_class[,"Recall"], na.rm = TRUE)
  f_meas <- mean(by_class[,"F1"], na.rm = TRUE)
  specificity <- mean(by_class[,"Specificity"], na.rm = TRUE)
  bal_accuracy <- mean(by_class[,"Balanced Accuracy"], na.rm = TRUE)
  
  # Buat metrics dataframe
  metrics_result <- data.frame(
    .metric = c("accuracy", "bal_accuracy", "kap", "precision", "recall", "f_meas", "specificity"),
    .estimator = "macro",
    .estimate = c(accuracy, bal_accuracy, kappa, precision, recall, f_meas, specificity)
  )
  
  # 5. Return results (DENGAN PROBABILITIES)
  return(list(
    train_data = train_data,
    test_data = test_data,
    final_model = final_model,
    predictions = data.frame(
      .pred_class = pred_test,
      .pred_C = pred_probs[,1],  # Probabilitas kelas C
      .pred_B = pred_probs[,2],  # Probabilitas kelas B
      .pred_A = pred_probs[,3],  # Probabilitas kelas A
      Akreditasi = test_data$Akreditasi
    ),
    metrics_overall = metrics_result,
    confusion_matrix = cm$table,
    confusion_result = cm
  ))
}

1.4 Fungsi untuk melihat hasil pemodelan

lihat_hasil_lengkap <- function(result, nama_skema) {
  cat("HASIL LENGKAP REGRESI ORDINAL -", nama_skema, "\n")
  
  # 1. Informasi Model
  cat("🎯 INFORMASI MODEL:\n")
  cat("Model: Regresi Logistik Ordinal\n")
  cat("Family: Cumulative (Proportional Odds)\n") 
  cat("Training data:", nrow(result$train_data), "observasi\n")
  cat("Test data:", nrow(result$test_data), "observasi\n")
  cat("\n")
  
  # 2. Metrics Evaluasi
  cat("📊 METRICS EVALUASI (Test Set):\n")
  print(result$metrics_overall)
  cat("\n")
  
  # 3. CONTOH PREDIKSI dengan PROBABILITIES
  cat("🔮 CONTOH PREDIKSI + PROBABILITAS (5 observasi pertama):\n")
  contoh_prediksi <- result$predictions %>%
    dplyr::select(.pred_class, .pred_C, .pred_B, .pred_A, Akreditasi) %>%
    head(5)
  print(contoh_prediksi)
  cat("\n")
  
  # 4. Coefficients
  cat("🔍 KOEFISIEN MODEL:\n")
  print(coef(result$final_model))
  cat("\n")
  
  # 5. Odds Ratio
  cat("📈 ODDS RATIO:\n")
  print(exp(coef(result$final_model)))
  cat("\n")
  
  # 6. Confusion Matrix
  cat("📋 CONFUSION MATRIX:\n")
  print(result$confusion_matrix)
  cat("\n")
}

1.5 Hasil Pemodelan

results_final <- list()
results_final[[1]] <- run_ordinal_final(data1, "SKEMA 1 - DATA ORISINIL")
Warning: package 'caret' was built under R version 4.5.1
Loading required package: ggplot2
Warning: package 'ggplot2' was built under R version 4.5.1
Loading required package: lattice
Warning: package 'MASS' was built under R version 4.5.1

Attaching package: 'MASS'
The following object is masked from 'package:dplyr':

    select
=== ORDINAL REGRESSION - SKEMA 1 - DATA ORISINIL ===
results_final[[2]] <- run_ordinal_final(data2, "SKEMA 2 - DATA HAPUS ANOMALI")
=== ORDINAL REGRESSION - SKEMA 2 - DATA HAPUS ANOMALI ===
# Lihat hasil dengan probabilities
lihat_hasil_lengkap(results_final[[1]], "SKEMA 1 - DATA ORISINIL")
HASIL LENGKAP REGRESI ORDINAL - SKEMA 1 - DATA ORISINIL 
🎯 INFORMASI MODEL:
Model: Regresi Logistik Ordinal
Family: Cumulative (Proportional Odds)
Training data: 300 observasi
Test data: 72 observasi

📊 METRICS EVALUASI (Test Set):
       .metric .estimator .estimate
1     accuracy      macro 0.5555556
2 bal_accuracy      macro 0.6343344
3          kap      macro 0.2941176
4    precision      macro 0.5060458
5       recall      macro 0.5029762
6       f_meas      macro 0.4915647
7  specificity      macro 0.7656926

🔮 CONTOH PREDIKSI + PROBABILITAS (5 observasi pertama):
  .pred_class    .pred_C   .pred_B   .pred_A Akreditasi
1           B 0.10543749 0.5116873 0.3828752          A
2           A 0.03144673 0.2760327 0.6925205          A
3           A 0.07557490 0.4522790 0.4721461          A
4           A 0.01172271 0.1278484 0.8604289          A
5           A 0.01928076 0.1926044 0.7881149          A

🔍 KOEFISIEN MODEL:
  Lit_2023   Lit_2024   Num_2023   Num_2024 
0.02958486 0.01476253 0.01346368 0.01258815 

📈 ODDS RATIO:
Lit_2023 Lit_2024 Num_2023 Num_2024 
1.030027 1.014872 1.013555 1.012668 

📋 CONFUSION MATRIX:
          Reference
Prediction  C  B  A
         C  3  5  0
         B 11 14  5
         A  2  9 23
lihat_hasil_lengkap(results_final[[2]], "SKEMA 1 - DATA HAPUS ANOMALI")
HASIL LENGKAP REGRESI ORDINAL - SKEMA 1 - DATA HAPUS ANOMALI 
🎯 INFORMASI MODEL:
Model: Regresi Logistik Ordinal
Family: Cumulative (Proportional Odds)
Training data: 293 observasi
Test data: 72 observasi

📊 METRICS EVALUASI (Test Set):
       .metric .estimator .estimate
1     accuracy      macro 0.5833333
2 bal_accuracy      macro 0.6728896
3          kap      macro 0.3406593
4    precision      macro 0.6538721
5       recall      macro 0.5714286
6       f_meas      macro 0.5928321
7  specificity      macro 0.7743506

🔮 CONTOH PREDIKSI + PROBABILITAS (5 observasi pertama):
  .pred_class    .pred_C   .pred_B   .pred_A Akreditasi
1           B 0.10496310 0.5203496 0.3746873          A
2           A 0.02839396 0.2653316 0.7062744          A
3           B 0.07581287 0.4628024 0.4613847          A
4           B 0.15701954 0.5690641 0.2739164          A
5           A 0.01411942 0.1551845 0.8306960          A

🔍 KOEFISIEN MODEL:
  Lit_2023   Lit_2024   Num_2023   Num_2024 
0.02625930 0.01019023 0.02451442 0.01673332 

📈 ODDS RATIO:
Lit_2023 Lit_2024 Num_2023 Num_2024 
1.026607 1.010242 1.024817 1.016874 

📋 CONFUSION MATRIX:
          Reference
Prediction  C  B  A
         C  8  1  0
         B  8 14  8
         A  0 13 20

1.6 Fungsi Heatmap

buat_heatmap <- function(result, nama_skema) {
  library(ggplot2)
  
  # Convert confusion matrix to dataframe
  cm_df <- as.data.frame(result$confusion_matrix)
  
  # Buat heatmap
  heatmap_plot <- ggplot(cm_df, aes(x = Reference, y = Prediction, fill = Freq)) +
    geom_tile(color = "white", linewidth = 1) +
    geom_text(aes(label = Freq), color = "black", size = 10, fontface = "bold") +
    scale_fill_gradient(low = "skyblue", high = "blue") +
    labs(
      title = paste("Confusion Matrix -", nama_skema),
      x = "Actual Akreditasi",
      y = "Predicted Akreditasi"
    ) +
    theme_minimal()
  
  return(heatmap_plot)
}

1.7 Visualisasi heatmap

# Buat heatmap untuk skema 1
heatmap_skema1 <- buat_heatmap(results_final[[1]], "SKEMA 1 - DATA ORISINIL")
print(heatmap_skema1)

# Buat heatmap untuk skema 2  
heatmap_skema2 <- buat_heatmap(results_final[[2]], "SKEMA 2 - DATA HAPUS ANOMALI")
print(heatmap_skema2)

1.8 Melakukan Prediksi pada Data Testing

lihat_prediksi <- function(result, nama_skema, n = 5) {
  cat("PREDIKSI -", nama_skema, "\n")
  cat("Contoh", n, "observasi dari test set:\n\n")
  
  pred_df <- result$predictions %>%
    dplyr::select(.pred_class, Akreditasi) %>%
    dplyr::rename(Predicted = .pred_class,
                  Actual = Akreditasi) %>%
    head(n)
  
  print(pred_df)
}

# Lihat prediksi skema 1
lihat_prediksi(results_final[[1]], "SKEMA 1", 15)
PREDIKSI - SKEMA 1 
Contoh 15 observasi dari test set:

   Predicted Actual
1          B      A
2          A      A
3          A      A
4          A      A
5          A      A
6          A      A
7          A      A
8          A      A
9          A      A
10         A      A
11         A      A
12         A      A
13         A      A
14         A      A
15         A      A
# Lihat prediksi skema 2
lihat_prediksi(results_final[[2]], "SKEMA 2", 15)
PREDIKSI - SKEMA 2 
Contoh 15 observasi dari test set:

   Predicted Actual
1          B      A
2          A      A
3          B      A
4          B      A
5          A      A
6          A      A
7          A      A
8          A      A
9          A      A
10         A      A
11         A      A
12         A      A
13         A      A
14         A      A
15         A      A

2 Inferensi: Model regresi logistik ordinal

data2$Akreditasi <- ordered(data2$Akreditasi, levels = c("C","B","A"))

2.1 Estimasi Parameter

# Packages
library(MASS)
library(car)
library(brant)
Warning: package 'brant' was built under R version 4.5.1
# skema2
model1 <- polr(Akreditasi ~ Lit_2023 + Num_2023 + Lit_2024 + Num_2024, 
                   data = data2, Hess = TRUE)

2.2 Uji Asumsi

# UJI PROPORTIONAL ODDS dengan Brant Test
brant::brant(model1)
-------------------------------------------- 
Test for    X2  df  probability 
-------------------------------------------- 
Omnibus     8.04    4   0.09
Lit_2023    0.19    1   0.67
Num_2023    1.16    1   0.28
Lit_2024    0.84    1   0.36
Num_2024    0.38    1   0.54
-------------------------------------------- 

H0: Parallel Regression Assumption holds

Hasil uji Brant menunjukkan nilai p -value omnibus = 0,09 > 0.05. Maka tidak cukup bukti untuk menolak hipotesis nol, sehingga asumsi proportional odds terpenuhi

# Uji Multikolinearitas
vif(model1)
Lit_2023 Num_2023 Lit_2024 Num_2024 
2.108742 2.114468 2.554486 2.380723 

Nilai VIF seluruh prediktor memiliki nilai < 5, sehingga dapat disimpulkan tidak terdapat masalah multikolinearitas dalam model.

2.3 Uji Signifikansi - Simultan

# GOODNESS OF FIT
null_model <- polr(Akreditasi ~ 1, data = data2, Hess = TRUE)
lr_test <- anova(null_model, model1)
print(lr_test)
Likelihood ratio tests of ordinal regression models

Response: Akreditasi
                                      Model Resid. df Resid. Dev   Test    Df
1                                         1       363   780.0582             
2 Lit_2023 + Num_2023 + Lit_2024 + Num_2024       359   569.0644 1 vs 2     4
  LR stat. Pr(Chi)
1                 
2 210.9939       0

Hasil Uji Likelihood Ratio menunjukkan pvalue < 0.05, sehingga model regresi logistik ordinal signifikan secara simultan, yang berarti keempat prediktor secara bersama-sama berpengaruh terhadap akreditasi.

2.4 Uji Signifikansi - Partial

cat("\n=== COEFFICIENTS WITH P-VALUES ===\n")

=== COEFFICIENTS WITH P-VALUES ===
coef_table <- coef(summary(model1))
p_values <- pnorm(abs(coef_table[, "t value"]), lower.tail = FALSE) * 2
final_table <- cbind(coef_table, "p value" = round(p_values, 4))
print(final_table)
              Value  Std. Error   t value p value
Lit_2023 0.03129987 0.007500649  4.172955  0.0000
Num_2023 0.01631026 0.008803754  1.852648  0.0639
Lit_2024 0.01655386 0.008499779  1.947564  0.0515
Num_2024 0.01343469 0.008753281  1.534817  0.1248
C|B      2.90229937 0.380948216  7.618619  0.0000
B|A      5.61509915 0.478353788 11.738381  0.0000

Interpretasi Uji Parsial

  • Lit_2023: p-value = 0.0000 (< 0.05) → signifikan

  • Num_2023: p-value = 0.0639 (> 0.05) → tidak signifikan.

  • Lit_2024: p-value = 0.0515 (> 0.05) → tidak signifikan

  • Num_2024: p-value = 0.1248 (> 0.05) → tidak signifikan.

2.5 Odds Ratio

# ODDS RATIO
odds_ratio <- exp(cbind(OR = coef(model1), confint(model1)))
Waiting for profiling to be done...
print(odds_ratio)
               OR     2.5 %   97.5 %
Lit_2023 1.031795 1.0167854 1.047203
Num_2023 1.016444 0.9991380 1.034313
Lit_2024 1.016692 0.9998890 1.033848
Num_2024 1.013525 0.9963227 1.031200

2.6 Persamaan Logit

summary(model1)
Call:
polr(formula = Akreditasi ~ Lit_2023 + Num_2023 + Lit_2024 + 
    Num_2024, data = data2, Hess = TRUE)

Coefficients:
           Value Std. Error t value
Lit_2023 0.03130   0.007501   4.173
Num_2023 0.01631   0.008804   1.853
Lit_2024 0.01655   0.008500   1.948
Num_2024 0.01343   0.008753   1.535

Intercepts:
    Value   Std. Error t value
C|B  2.9023  0.3809     7.6186
B|A  5.6151  0.4784    11.7384

Residual Deviance: 569.0644 
AIC: 581.0644