Tujuan Analisis: Menerapkan metode Small Area Estimation (SAE) untuk meningkatkan presisi estimasi statistik tingkat kabupaten/kota di Pulau Sulawesi. Dua pendekatan utama yang digunakan:
Pemodelan dilakukan dalam dua skenario: Global (seluruh Sulawesi) dan Per Provinsi.
Alur Analisis:
📂 Data Input
├── Estimasi Direct (KabKota)
├── Data Podes KabKota 2024 (Kovariat)
└── Data Penduduk
↓
🔧 Preprocessing & Transformasi Variabel
↓
📊 Seleksi Variabel (Korelasi + Stepwise + VIF)
↓
┌────────────────────────────────────┐
│ SAE Fay-Herriot │ SAE HB Beta │
│ - Global │ - Global │
│ - Per Provinsi │ - Per Prov │
└────────────────────────────────────┘
↓
📈 Perbandingan RSE Direct vs SAE
Paket yang digunakan mencakup: manajemen data (dplyr,
tidyr), pemilihan model (MASS,
car), visualisasi (ggplot2), dan estimasi area
kecil (sae, saeHB).
# ── Manajemen data ──────────────────────────────────────────
library(readxl)
library(dplyr)
library(tidyr)
# ── Pemilihan model & diagnostik ───────────────────────────
library(MASS) # stepAIC
library(car) # VIF
library(corrplot)
# ── Visualisasi ─────────────────────────────────────────────
library(ggplot2)
# ── Small Area Estimation ───────────────────────────────────
library(sae) # Fay-Herriot (EBLUP)
library(saeHB) # Hierarchical Bayes BetaTiga sumber data yang digunakan dalam analisis ini:
| Dataset | Deskripsi |
|---|---|
Estimasi_531 |
Estimasi direct (y) dan RSE per KabKota |
Podes_kab2024 |
Variabel kovariat dari Podes 2024 |
Data_Penduduk |
Jumlah penduduk untuk normalisasi |
# ── Dataset utama ─────────────────────────────────────────
Estimasi_531 <- read_excel("C:/Users/User/Downloads/Hasil_sulawesi_KabKota.xlsx")
Podes_kab2024 <- read_excel("C:/Users/User/Downloads/Podes_kab_lengkap.xlsx")
Data_Penduduk <- read_excel("C:/Users/User/Downloads/Data_Penduduk.xlsx")
# ── Cek struktur data ─────────────────────────────────────
str(Estimasi_531)## tibble [81 × 8] (S3: tbl_df/tbl/data.frame)
## $ Kako : num [1:81] 7101 7102 7103 7104 7105 ...
## $ Estimasi: num [1:81] 12.11 5.42 11.4 6.18 12.58 ...
## $ SE : num [1:81] 7.5 2.37 5.55 3.92 5.23 5.1 7.39 5.48 7.25 7.94 ...
## $ VAR : num [1:81] 56.25 5.62 30.8 15.37 27.35 ...
## $ CI LOWER: num [1:81] 0 0.77 0.53 0 2.33 0.52 6.44 1.83 4.24 1.02 ...
## $ CI UPPER: num [1:81] 26.8 10.1 22.3 13.9 22.8 ...
## $ RSE : num [1:81] 62 43.7 48.7 63.4 41.6 ...
## $ DEFF : num [1:81] 3.03 0.74 1.63 0.91 0.88 1.59 1.89 1.46 1.74 2.66 ...
## tibble [514 × 108] (S3: tbl_df/tbl/data.frame)
## $ kode_kab : chr [1:514] "1101" "1102" "1103" "1104" ...
## $ n_desa : num [1:514] 138 119 260 385 515 295 321 603 731 609 ...
## $ X1 : num [1:514] 37 31.9 31.5 14.3 23.3 ...
## $ X2 : num [1:514] 92.8 89.9 92.3 93 82.7 ...
## $ X8 : num [1:514] 100 94.1 100 100 99.6 ...
## $ X9 : num [1:514] 65.2 46.2 32.3 29.4 67.2 ...
## $ X13 : num [1:514] 90.6 85.7 83.1 92.7 92.8 ...
## $ X14 : num [1:514] 81.9 86.6 66.9 65.2 82.3 ...
## $ X15 : num [1:514] 79.7 79.8 65.8 63.9 79 ...
## $ X16 : num [1:514] 50 69.7 58.8 40 75.3 ...
## $ X17 : num [1:514] 53.6 43.7 30.4 27.8 63.1 ...
## $ X18 : num [1:514] 51.4 47.9 27.7 13.8 58.3 ...
## $ X19 : num [1:514] 0 0.84 0 0 0.194 ...
## $ X21 : num [1:514] 0 1.681 0.385 3.636 0.194 ...
## $ X22 : num [1:514] 100 100 100 100 100 100 100 100 100 100 ...
## $ X23 : num [1:514] 2.899 4.202 0.385 0 0.194 ...
## $ X24 : num [1:514] 2.17 13.45 4.62 1.82 3.5 ...
## $ X25 : num [1:514] 4.35 21.01 5.77 3.12 1.36 ...
## $ X26 : num [1:514] 0 5.882 1.538 2.597 0.388 ...
## $ X32 : num [1:514] 0 0 0 0 0.388 ...
## $ X37 : num [1:514] 0.725 1.681 3.077 2.078 2.718 ...
## $ X38 : num [1:514] 100 99.2 100 100 99.8 ...
## $ X39 : num [1:514] 0 4.202 0 0 0.194 ...
## $ X40 : num [1:514] 89.1 72.3 96.5 93.8 54.2 ...
## $ X41 : num [1:514] 19.6 20.2 28.1 42.6 21.4 ...
## $ X43 : num [1:514] 0 8.403 0.769 0.26 2.524 ...
## $ X44 : num [1:514] 2.17 3.36 4.62 1.82 1.75 ...
## $ X45 : num [1:514] 6.52 15.97 13.85 3.64 7.77 ...
## $ X1_jumlah : num [1:514] 51 38 82 55 120 72 71 223 116 159 ...
## $ X1_mean : num [1:514] 0.37 0.319 0.315 0.143 0.233 ...
## $ X2_jumlah : num [1:514] 128 107 240 358 426 247 261 585 716 552 ...
## $ X2_mean : num [1:514] 0.928 0.899 0.923 0.93 0.827 ...
## $ X8_jumlah : num [1:514] 138 112 260 385 513 295 320 603 730 609 ...
## $ X8_mean : num [1:514] 1 0.941 1 1 0.996 ...
## $ X9_jumlah : num [1:514] 90 55 84 113 346 224 195 177 247 326 ...
## $ X9_mean : num [1:514] 0.652 0.462 0.323 0.294 0.672 ...
## $ X13_jumlah: num [1:514] 125 102 216 357 478 288 224 466 677 552 ...
## $ X13_mean : num [1:514] 0.906 0.857 0.831 0.927 0.928 ...
## $ X14_jumlah: num [1:514] 113 103 174 251 424 262 224 473 555 578 ...
## $ X14_mean : num [1:514] 0.819 0.866 0.669 0.652 0.823 ...
## $ X15_jumlah: num [1:514] 110 95 171 246 407 253 203 470 544 571 ...
## $ X15_mean : num [1:514] 0.797 0.798 0.658 0.639 0.79 ...
## $ X16_jumlah: num [1:514] 69 83 153 154 388 236 149 368 448 440 ...
## $ X16_mean : num [1:514] 0.5 0.697 0.588 0.4 0.753 ...
## $ X17_jumlah: num [1:514] 74 52 79 107 325 215 181 163 225 317 ...
## $ X17_mean : num [1:514] 0.536 0.437 0.304 0.278 0.631 ...
## $ X18_jumlah: num [1:514] 71 57 72 53 300 207 211 219 178 300 ...
## $ X18_mean : num [1:514] 0.514 0.479 0.277 0.138 0.583 ...
## $ X19_jumlah: num [1:514] 0 1 0 0 1 1 0 0 0 0 ...
## $ X19_mean : num [1:514] 0 0.0084 0 0 0.00194 ...
## $ X21_jumlah: num [1:514] 0 2 1 14 1 3 1 2 0 4 ...
## $ X21_mean : num [1:514] 0 0.01681 0.00385 0.03636 0.00194 ...
## $ X22_jumlah: num [1:514] 138 119 260 385 515 295 321 603 731 609 ...
## $ X22_mean : num [1:514] 1 1 1 1 1 1 1 1 1 1 ...
## $ X23_jumlah: num [1:514] 4 5 1 0 1 1 4 3 1 3 ...
## $ X23_mean : num [1:514] 0.02899 0.04202 0.00385 0 0.00194 ...
## $ X24_jumlah: num [1:514] 3 16 12 7 18 12 6 20 2 10 ...
## $ X24_mean : num [1:514] 0.0217 0.1345 0.0462 0.0182 0.035 ...
## $ X25_jumlah: num [1:514] 6 25 15 12 7 55 8 60 5 24 ...
## $ X25_mean : num [1:514] 0.0435 0.2101 0.0577 0.0312 0.0136 ...
## $ X26_jumlah: num [1:514] 0 7 4 10 2 1 10 1 0 3 ...
## $ X26_mean : num [1:514] 0 0.05882 0.01538 0.02597 0.00388 ...
## $ X32_jumlah: num [1:514] 0 0 0 0 2 1 0 0 0 0 ...
## $ X32_mean : num [1:514] 0 0 0 0 0.00388 ...
## $ X34_jumlah: num [1:514] 34 48 83 88 51 50 140 20 108 131 ...
## $ X34_mean : num [1:514] 0.246 0.403 0.319 0.229 0.099 ...
## $ X37_jumlah: num [1:514] 1 2 8 8 14 1 0 2 6 12 ...
## $ X37_mean : num [1:514] 0.00725 0.01681 0.03077 0.02078 0.02718 ...
## $ X38_jumlah: num [1:514] 138 118 260 385 514 295 321 603 731 609 ...
## $ X38_mean : num [1:514] 1 0.992 1 1 0.998 ...
## $ X39_jumlah: num [1:514] 0 5 0 0 1 1 1 0 0 1 ...
## $ X39_mean : num [1:514] 0 0.04202 0 0 0.00194 ...
## $ X40_jumlah: num [1:514] 123 86 251 361 279 271 265 594 709 543 ...
## $ X40_mean : num [1:514] 0.891 0.723 0.965 0.938 0.542 ...
## $ X41_jumlah: num [1:514] 27 24 73 164 110 116 60 162 164 101 ...
## $ X41_mean : num [1:514] 0.196 0.202 0.281 0.426 0.214 ...
## $ X43_jumlah: num [1:514] 0 10 2 1 13 3 3 29 17 16 ...
## $ X43_mean : num [1:514] 0 0.08403 0.00769 0.0026 0.02524 ...
## $ X44_jumlah: num [1:514] 3 4 12 7 9 13 10 27 19 9 ...
## $ X44_mean : num [1:514] 0.0217 0.0336 0.0462 0.0182 0.0175 ...
## $ X45_jumlah: num [1:514] 9 19 36 14 40 54 26 99 50 64 ...
## $ X45_mean : num [1:514] 0.0652 0.1597 0.1385 0.0364 0.0777 ...
## $ X6_jumlah : num [1:514] 186 294 1 0 276 232 95 81 136 182 ...
## $ X6_mean : num [1:514] 1.34783 2.47059 0.00385 0 0.53592 ...
## $ X7_jumlah : num [1:514] 0 5 0 0 0 0 0 0 0 0 ...
## $ X7_mean : num [1:514] 0 0.042 0 0 0 ...
## $ X10_jumlah: num [1:514] 13155 9614 14021 19948 21183 ...
## $ X10_mean : num [1:514] 95.3 80.8 53.9 51.8 41.1 ...
## $ X11_jumlah: num [1:514] 168 186 310 383 747 306 382 688 782 811 ...
## $ X11_mean : num [1:514] 1.217 1.563 1.192 0.995 1.45 ...
## $ X12_jumlah: num [1:514] 707 958 1348 2175 2650 ...
## $ X12_mean : num [1:514] 5.12 8.05 5.18 5.65 5.15 ...
## $ X27_jumlah: num [1:514] 130 124 242 213 333 235 186 273 338 296 ...
## $ X27_mean : num [1:514] 0.942 1.042 0.931 0.553 0.647 ...
## $ X28_jumlah: num [1:514] 58 51 86 89 114 76 72 116 110 134 ...
## $ X28_mean : num [1:514] 0.42 0.429 0.331 0.231 0.221 ...
## $ X29_jumlah: num [1:514] 39 29 59 63 68 50 42 84 69 79 ...
## $ X29_mean : num [1:514] 0.283 0.244 0.227 0.164 0.132 ...
## $ X30_jumlah: num [1:514] 1001 976 1562 1310 1999 ...
## [list output truncated]
## tibble [81 × 4] (S3: tbl_df/tbl/data.frame)
## $ Kako : num [1:81] 7101 7102 7103 7104 7105 ...
## $ Jumlah_L : num [1:81] 133940 168503 69187 50500 126394 ...
## $ Jumlah_P : num [1:81] 123438 162340 66673 47800 117905 ...
## $ Jumlah_Penduduk: num [1:81] 257378 330843 135860 98300 244299 ...
Kode kabupaten/kota (Kako) disamakan ke format 4
digit character sebelum dilakukan left_join agar
tidak ada mismatch saat penggabungan.
# ── Standarisasi format kode ──────────────────────────────
Estimasi_531 <- Estimasi_531 %>%
mutate(Kako = sprintf("%04d", Kako)) # 4 digit numeric → character
Podes_kab2024 <- Podes_kab2024 %>%
mutate(kode_kab = sprintf("%04s", kode_kab))
# ── Gabungkan Estimasi + Podes (hanya KabKota di Estimasi) ─
df_gabungan <- Estimasi_531 %>%
left_join(Podes_kab2024, by = c("Kako" = "kode_kab")) %>%
dplyr::select(
Kako,
Estimasi,
RSE,
starts_with("X") # seluruh variabel kovariat
)
# ── Tambahkan data penduduk ────────────────────────────────
df_gabungan$Kako <- as.numeric(df_gabungan$Kako)
df_final <- df_gabungan %>%
left_join(Data_Penduduk, by = "Kako")
cat("✅ Data berhasil digabungkan:", nrow(df_final), "baris |",
ncol(df_final), "kolom\n")## ✅ Data berhasil digabungkan: 81 baris | 112 kolom
Variabel yang berbasis jumlah absolut perlu dinormalisasi agar sebanding antar kabupaten yang berbeda ukuran populasinya. Normalisasi dilakukan per 1.000 penduduk atau per 1.000 perempuan tergantung konteks variabel.
df_final <- df_final %>%
mutate(
# ── Berbasis jumlah TOTAL penduduk (/1.000 penduduk) ──
X6 = X6_jumlah / Jumlah_Penduduk * 1000,
X7 = X7_jumlah / Jumlah_Penduduk * 1000,
X10 = X10_jumlah / Jumlah_Penduduk * 1000,
X33 = X33_jumlah / Jumlah_Penduduk * 1000,
# ── Berbasis jumlah PEREMPUAN (/1.000 perempuan) ──────
X11 = X11_jumlah / Jumlah_P * 1000,
X12 = X12_jumlah / Jumlah_P * 1000,
X27 = X27_jumlah / Jumlah_P * 1000,
X28 = X28_jumlah / Jumlah_P * 1000,
X29 = X29_jumlah / Jumlah_P * 1000,
X30 = X30_jumlah / Jumlah_P * 1000,
X31 = X31_jumlah / Jumlah_P * 1000,
X35 = X35_jumlah / Jumlah_P * 1000,
X36 = X36_jumlah / Jumlah_P * 1000
)Setiap kelompok variabel (X1, X2, …) memiliki dua varian:
_jumlah dan _mean. Varian yang dipilih adalah
yang memiliki korelasi absolut tertinggi terhadap
estimasi direct (Estimasi).
df <- df_final
# Variabel yang dikecualikan dari prediktor
exclude_vars <- c("Kako", "Estimasi", "RSE",
"y_eblup", "mse", "RMSE_direct", "RMSE_eblup",
"RSE_direct", "RSE_eblup",
"y_hb", "sd_hb", "RSE_hb", "y_hb_percent", "vardir",
"Jumlah_L", "Jumlah_P", "Jumlah_Penduduk")
all_vars <- setdiff(names(df), exclude_vars)
# Fungsi ambil nama dasar variabel (X1, X2, dst.)
get_base <- function(x) sub("(_jumlah|_mean)$", "", x)
base_vars <- unique(sapply(all_vars, get_base))
hasil_vars <- c()
for (b in base_vars) {
group_vars <- all_vars[get_base(all_vars) == b]
cor_vals <- sapply(group_vars, function(v) {
cor(df[[v]], df$Estimasi, use = "complete.obs")
})
cor_vals <- sort(cor_vals, decreasing = TRUE)
cat("\n── Kelompok:", b, "──\n")
print(round(cor_vals, 3))
best_var <- names(which.max(abs(cor_vals)))
cat(" → Dipilih:", best_var, "\n")
hasil_vars <- c(hasil_vars, best_var)
}##
## ── Kelompok: X1 ──
## X1_jumlah X1 X1_mean
## -0.084 -0.374 -0.374
## → Dipilih: X1_mean
##
## ── Kelompok: X2 ──
## X2_jumlah X2 X2_mean
## 0.096 -0.150 -0.150
## → Dipilih: X2
##
## ── Kelompok: X8 ──
## X8 X8_mean X8_jumlah
## 0.475 0.475 0.218
## → Dipilih: X8
##
## ── Kelompok: X9 ──
## X9_jumlah X9_mean X9
## 0.054 -0.165 -0.165
## → Dipilih: X9
##
## ── Kelompok: X13 ──
## X13_jumlah X13_mean X13
## 0.111 0.053 0.053
## → Dipilih: X13_jumlah
##
## ── Kelompok: X14 ──
## X14_jumlah X14 X14_mean
## 0.089 -0.063 -0.063
## → Dipilih: X14_jumlah
##
## ── Kelompok: X15 ──
## X15_jumlah X15 X15_mean
## 0.089 -0.092 -0.092
## → Dipilih: X15_mean
##
## ── Kelompok: X16 ──
## X16_jumlah X16_mean X16
## 0.053 -0.101 -0.101
## → Dipilih: X16
##
## ── Kelompok: X17 ──
## X17_jumlah X17 X17_mean
## 0.057 -0.153 -0.153
## → Dipilih: X17
##
## ── Kelompok: X18 ──
## X18_jumlah X18 X18_mean
## 0.062 -0.107 -0.107
## → Dipilih: X18_mean
##
## ── Kelompok: X19 ──
## X19_jumlah X19 X19_mean
## -0.195 -0.241 -0.241
## → Dipilih: X19
##
## ── Kelompok: X21 ──
## X21_jumlah X21 X21_mean
## -0.147 -0.225 -0.225
## → Dipilih: X21_mean
##
## ── Kelompok: X22 ──
## X22_jumlah
## 0.105
## → Dipilih: X22_jumlah
##
## ── Kelompok: X23 ──
## X23_jumlah X23 X23_mean
## -0.130 -0.181 -0.181
## → Dipilih: X23_mean
##
## ── Kelompok: X24 ──
## X24_jumlah X24 X24_mean
## -0.004 -0.112 -0.112
## → Dipilih: X24_mean
##
## ── Kelompok: X25 ──
## X25_jumlah X25 X25_mean
## -0.284 -0.309 -0.309
## → Dipilih: X25_mean
##
## ── Kelompok: X26 ──
## X26_jumlah X26 X26_mean
## -0.229 -0.334 -0.334
## → Dipilih: X26
##
## ── Kelompok: X32 ──
## X32_jumlah X32 X32_mean
## -0.062 -0.160 -0.160
## → Dipilih: X32
##
## ── Kelompok: X37 ──
## X37_jumlah X37 X37_mean
## -0.031 -0.053 -0.053
## → Dipilih: X37_mean
##
## ── Kelompok: X38 ──
## X38_jumlah X38 X38_mean
## 0.105 -0.154 -0.154
## → Dipilih: X38_mean
##
## ── Kelompok: X39 ──
## X39_jumlah X39 X39_mean
## -0.219 -0.364 -0.364
## → Dipilih: X39_mean
##
## ── Kelompok: X40 ──
## X40_jumlah X40 X40_mean
## 0.087 -0.145 -0.145
## → Dipilih: X40_mean
##
## ── Kelompok: X41 ──
## X41_jumlah X41 X41_mean
## 0.033 -0.114 -0.114
## → Dipilih: X41_mean
##
## ── Kelompok: X43 ──
## X43_jumlah X43 X43_mean
## -0.108 -0.200 -0.200
## → Dipilih: X43
##
## ── Kelompok: X44 ──
## X44_jumlah X44 X44_mean
## -0.265 -0.411 -0.411
## → Dipilih: X44
##
## ── Kelompok: X45 ──
## X45_jumlah X45_mean X45
## -0.276 -0.478 -0.478
## → Dipilih: X45
##
## ── Kelompok: X34 ──
## X34_jumlah X34_mean
## 0.103 0.071
## → Dipilih: X34_jumlah
##
## ── Kelompok: X6 ──
## X6 X6_jumlah X6_mean
## 0.019 -0.137 -0.152
## → Dipilih: X6_mean
##
## ── Kelompok: X7 ──
## X7_jumlah X7_mean X7
## 0.118 0.118 0.118
## → Dipilih: X7_jumlah
##
## ── Kelompok: X10 ──
## X10 X10_jumlah X10_mean
## 0.088 -0.215 -0.314
## → Dipilih: X10_mean
##
## ── Kelompok: X11 ──
## X11 X11_jumlah X11_mean
## 0.199 -0.177 -0.368
## → Dipilih: X11_mean
##
## ── Kelompok: X12 ──
## X12 X12_jumlah X12_mean
## 0.359 -0.048 -0.247
## → Dipilih: X12
##
## ── Kelompok: X27 ──
## X27 X27_jumlah X27_mean
## 0.324 -0.163 -0.377
## → Dipilih: X27_mean
##
## ── Kelompok: X28 ──
## X28 X28_jumlah X28_mean
## 0.419 -0.149 -0.336
## → Dipilih: X28
##
## ── Kelompok: X29 ──
## X29 X29_jumlah X29_mean
## 0.156 -0.228 -0.383
## → Dipilih: X29_mean
##
## ── Kelompok: X30 ──
## X30 X30_jumlah X30_mean
## 0.295 -0.172 -0.383
## → Dipilih: X30_mean
##
## ── Kelompok: X31 ──
## X31 X31_jumlah X31_mean
## 0.026 -0.274 -0.382
## → Dipilih: X31_mean
##
## ── Kelompok: X33 ──
## X33 X33_jumlah X33_mean
## -0.001 -0.154 -0.247
## → Dipilih: X33_mean
##
## ── Kelompok: X35 ──
## X35 X35_jumlah X35_mean
## 0.006 -0.106 -0.251
## → Dipilih: X35_mean
##
## ── Kelompok: X36 ──
## X36 X36_jumlah X36_mean
## 0.047 -0.282 -0.439
## → Dipilih: X36_mean
# Data dengan variabel terpilih
df_gabungan <- df %>%
dplyr::select(Kako, Estimasi, RSE, all_of(hasil_vars))
cat("\n✅ Total variabel terpilih:", length(hasil_vars), "\n")##
## ✅ Total variabel terpilih: 40
## X1_mean, X2, X8, X9, X13_jumlah, X14_jumlah, X15_mean, X16, X17, X18_mean, X19, X21_mean, X22_jumlah, X23_mean, X24_mean, X25_mean, X26, X32, X37_mean, X38_mean, X39_mean, X40_mean, X41_mean, X43, X44, X45, X34_jumlah, X6_mean, X7_jumlah, X10_mean, X11_mean, X12, X27_mean, X28, X29_mean, X30_mean, X31_mean, X33_mean, X35_mean, X36_mean
Dua tahap pemilihan model:
library(MASS)
library(car)
library(corrplot)
df_model <- df_gabungan %>%
dplyr::select(-Kako, -RSE) %>%
as.data.frame()
# ── Langkah 1: Full model ─────────────────────────────────
full_model <- lm(Estimasi ~ ., data = df_model)
# ── Langkah 2: Stepwise backward ─────────────────────────
step_model <- stepAIC(full_model, direction = "backward", trace = FALSE)
cat("=== Model Parsimonious (Stepwise AIC) ===\n")## === Model Parsimonious (Stepwise AIC) ===
##
## Call:
## lm(formula = Estimasi ~ X8 + X9 + X13_jumlah + X15_mean + X17 +
## X19 + X21_mean + X22_jumlah + X40_mean + X43 + X44 + X45 +
## X33_mean, data = df_model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.2084 -2.8638 0.4369 2.7798 11.2816
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.15540 6.48236 2.492 0.01518 *
## X8 0.08189 0.03941 2.078 0.04158 *
## X9 -0.31345 0.15145 -2.070 0.04235 *
## X13_jumlah 0.04572 0.02266 2.018 0.04762 *
## X15_mean -10.75526 6.18582 -1.739 0.08668 .
## X17 0.27877 0.14955 1.864 0.06670 .
## X19 -1.88648 0.72369 -2.607 0.01126 *
## X21_mean 31.20487 22.45506 1.390 0.16923
## X22_jumlah -0.04641 0.01973 -2.352 0.02160 *
## X40_mean 7.14191 4.35970 1.638 0.10608
## X43 -0.34448 0.17031 -2.023 0.04710 *
## X44 0.21503 0.12348 1.741 0.08620 .
## X45 -0.25014 0.08440 -2.964 0.00421 **
## X33_mean -0.20716 0.08436 -2.456 0.01667 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.57 on 67 degrees of freedom
## Multiple R-squared: 0.4425, Adjusted R-squared: 0.3344
## F-statistic: 4.091 on 13 and 67 DF, p-value: 6.184e-05
# ── Langkah 3: Ambil variabel hasil stepwise ───────────────
vars <- names(coef(step_model))[-1]
# ── Langkah 4: Iterasi VIF (hapus variabel VIF > 10) ──────
cat("\n--- Iterasi VIF ---\n")##
## --- Iterasi VIF ---
repeat {
form_tmp <- as.formula(paste("Estimasi ~", paste(vars, collapse = " + ")))
model_tmp <- lm(form_tmp, data = df_model)
vif_values <- vif(model_tmp)
cat("\nVIF saat ini:\n")
print(round(vif_values, 3))
if (all(vif_values <= 10)) {
cat("✅ Semua VIF ≤ 10, iterasi selesai.\n")
break
}
var_remove <- names(which.max(vif_values))
cat("🗑️ Hapus:", var_remove, "| VIF =", round(max(vif_values), 3), "\n")
vars <- vars[vars != var_remove]
}##
## VIF saat ini:
## X8 X9 X13_jumlah X15_mean X17 X19 X21_mean
## 5.541 19.711 9.286 2.143 21.137 1.947 2.128
## X22_jumlah X40_mean X43 X44 X45 X33_mean
## 7.769 1.554 1.194 6.318 3.510 1.278
## 🗑️ Hapus: X17 | VIF = 21.137
##
## VIF saat ini:
## X8 X9 X13_jumlah X15_mean X19 X21_mean X22_jumlah
## 5.447 1.518 9.083 2.071 1.947 2.121 7.760
## X40_mean X43 X44 X45 X33_mean
## 1.552 1.185 6.250 3.485 1.265
## ✅ Semua VIF ≤ 10, iterasi selesai.
# ── Model akhir ────────────────────────────────────────────
form_final <- as.formula(paste("Estimasi ~", paste(vars, collapse = " + ")))
model_final <- lm(form_final, data = df_model)
cat("\n=== Model Final ===\n")##
## === Model Final ===
##
## Call:
## lm(formula = form_final, data = df_model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.6712 -3.0341 -0.0208 2.8043 11.9060
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.29842 6.25403 1.966 0.05333 .
## X8 0.09145 0.03978 2.299 0.02461 *
## X9 -0.04223 0.04279 -0.987 0.32717
## X13_jumlah 0.03947 0.02281 1.730 0.08817 .
## X15_mean -8.63805 6.19030 -1.395 0.16743
## X19 -1.90208 0.73669 -2.582 0.01198 *
## X21_mean 33.72414 22.81854 1.478 0.14404
## X22_jumlah -0.04518 0.02007 -2.251 0.02765 *
## X40_mean 7.41696 4.43577 1.672 0.09910 .
## X43 -0.31713 0.17273 -1.836 0.07074 .
## X44 0.23890 0.12503 1.911 0.06026 .
## X45 -0.23676 0.08561 -2.765 0.00731 **
## X33_mean -0.19169 0.08547 -2.243 0.02817 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.653 on 68 degrees of freedom
## Multiple R-squared: 0.4136, Adjusted R-squared: 0.3101
## F-statistic: 3.997 on 12 and 68 DF, p-value: 0.0001133
# ── Matriks korelasi ───────────────────────────────────────
df_selected <- df_model %>% dplyr::select(all_of(vars))
cor_matrix <- cor(df_selected, use = "complete.obs")
cat("\n=== Matriks Korelasi Variabel Terpilih ===\n")##
## === Matriks Korelasi Variabel Terpilih ===
## X8 X9 X13_jumlah X15_mean X19 X21_mean X22_jumlah
## X8 1.000 -0.186 0.438 0.144 -0.281 -0.457 0.398
## X9 -0.186 1.000 -0.008 0.414 -0.124 0.087 -0.123
## X13_jumlah 0.438 -0.008 1.000 0.306 -0.193 -0.270 0.874
## X15_mean 0.144 0.414 0.306 1.000 -0.119 -0.104 0.007
## X19 -0.281 -0.124 -0.193 -0.119 1.000 0.628 -0.159
## X21_mean -0.457 0.087 -0.270 -0.104 0.628 1.000 -0.189
## X22_jumlah 0.398 -0.123 0.874 0.007 -0.159 -0.189 1.000
## X40_mean -0.345 0.223 -0.352 0.015 0.205 0.288 -0.244
## X43 -0.145 0.095 -0.180 0.143 -0.003 0.116 -0.203
## X44 -0.881 0.255 -0.460 -0.098 0.293 0.400 -0.408
## X45 -0.763 0.294 -0.419 0.080 0.218 0.373 -0.485
## X33_mean -0.275 0.028 -0.001 -0.048 -0.029 0.037 0.015
## X40_mean X43 X44 X45 X33_mean
## X8 -0.345 -0.145 -0.881 -0.763 -0.275
## X9 0.223 0.095 0.255 0.294 0.028
## X13_jumlah -0.352 -0.180 -0.460 -0.419 -0.001
## X15_mean 0.015 0.143 -0.098 0.080 -0.048
## X19 0.205 -0.003 0.293 0.218 -0.029
## X21_mean 0.288 0.116 0.400 0.373 0.037
## X22_jumlah -0.244 -0.203 -0.408 -0.485 0.015
## X40_mean 1.000 0.310 0.357 0.368 0.103
## X43 0.310 1.000 0.127 0.180 -0.042
## X44 0.357 0.127 1.000 0.771 0.333
## X45 0.368 0.180 0.771 1.000 0.134
## X33_mean 0.103 -0.042 0.333 0.134 1.000
corrplot(cor_matrix,
method = "color",
type = "upper",
tl.cex = 0.85,
addCoef.col = "black",
number.cex = 0.7,
col = colorRampPalette(c("#d32f2f", "white", "#1565c0"))(200),
title = "Matriks Korelasi Prediktor Terpilih",
mar = c(0, 0, 2, 0))Matriks Korelasi Antar Variabel Prediktor Terpilih
Hasil: Variabel final setelah stepwise dan VIF cleaning → X8, X9, X13_jumlah, X15_mean, X19, X21_mean, X22_jumlah, X40_mean, X43, X44, X45, X33_mean
Model Fay-Herriot adalah model area level yang menggabungkan estimasi direct dengan model regresi menggunakan kovariat. Estimasi yang dihasilkan disebut EBLUP (Empirical Best Linear Unbiased Prediction).
\[\hat{\theta}_i^{EBLUP} = \hat{B}_i \, \hat{\theta}_i^{direct} + (1 - \hat{B}_i) \, \mathbf{x}_i^T \hat{\boldsymbol{\beta}}\]
di mana \(\hat{B}_i = \sigma_e^2 / (\sigma_e^2 + \sigma_v^2)\) adalah shrinkage factor.
library(sae)
# ── Siapkan data ──────────────────────────────────────────
df_fh <- df_gabungan %>%
mutate(vardir = (RSE / 100 * Estimasi)^2) %>%
as.data.frame()
# ── Seleksi ulang menggunakan data fh ─────────────────────
full_model_fh <- lm(Estimasi ~ . - Kako - RSE - vardir, data = df_fh)
step_model_fh <- stepAIC(full_model_fh, direction = "backward", trace = FALSE)
vars_fh <- names(coef(step_model_fh))[-1]
# Iterasi VIF
repeat {
form_tmp <- as.formula(paste("Estimasi ~", paste(vars_fh, collapse = " + ")))
model_tmp <- lm(form_tmp, data = df_fh)
vif_values <- vif(model_tmp)
if (all(vif_values <= 10)) break
var_remove <- names(which.max(vif_values))
cat("Hapus VIF:", var_remove, "=", round(max(vif_values), 3), "\n")
vars_fh <- vars_fh[vars_fh != var_remove]
}## Hapus VIF: X17 = 21.137
form_sae <- as.formula(paste("Estimasi ~", paste(vars_fh, collapse = " + ")))
cat("Formula SAE Global:", deparse(form_sae), "\n")## Formula SAE Global: Estimasi ~ X8 + X9 + X13_jumlah + X15_mean + X19 + X21_mean + X22_jumlah + X40_mean + X43 + X44 + X45 + X33_mean
# ── Model Fay-Herriot ─────────────────────────────────────
model_sae <- mseFH(formula = form_sae,
vardir = vardir,
data = df_fh)
# ── Simpan hasil ──────────────────────────────────────────
df_fh$y_eblup <- model_sae$est$eblup
df_fh$mse <- model_sae$mse
df_fh$RMSE_direct <- sqrt(df_fh$vardir)
df_fh$RMSE_eblup <- sqrt(df_fh$mse)
df_fh$RSE_direct <- df_fh$RMSE_direct / df_fh$Estimasi * 100
df_fh$RSE_eblup <- df_fh$RMSE_eblup / df_fh$y_eblup * 100
# ── Ringkasan ─────────────────────────────────────────────
cat("\n=== Ringkasan Hasil SAE Global ===\n")##
## === Ringkasan Hasil SAE Global ===
summary_df <- data.frame(
Direct = df_fh$Estimasi,
EBLUP = df_fh$y_eblup,
RSE_Direct = df_fh$RSE_direct,
RSE_EBLUP = df_fh$RSE_eblup
)
print(summary(summary_df))## Direct EBLUP RSE_Direct RSE_EBLUP
## Min. : 0.08 Min. : 0.08192 Min. : 22.94 Min. :19.02
## 1st Qu.: 6.18 1st Qu.: 6.43940 1st Qu.: 38.29 1st Qu.:24.69
## Median : 9.32 Median : 9.04868 Median : 44.36 Median :28.33
## Mean :10.08 Mean : 8.57958 Mean : 47.56 Mean :33.15
## 3rd Qu.:13.93 3rd Qu.:10.97432 3rd Qu.: 53.52 3rd Qu.:35.80
## Max. :24.05 Max. :14.58553 Max. :101.66 Max. :99.27
df_plot_fh_global <- df_eblub_global %>%
dplyr::select(Kako, RSE_direct, RSE_eblup) %>%
mutate(Kako = as.character(Kako)) %>%
pivot_longer(cols = c(RSE_direct, RSE_eblup),
names_to = "Tipe",
values_to = "RSE") %>%
mutate(Tipe = case_match(Tipe,
"RSE_direct" ~ "Direct",
"RSE_eblup" ~ "EBLUP Global"))
ggplot(df_plot_fh_global,
aes(x = Kako, y = RSE, group = Tipe, color = Tipe)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
geom_hline(yintercept = 25,
linetype = "dashed", color = "red", linewidth = 1) +
scale_color_manual(values = c("Direct" = "#546e7a",
"EBLUP Global" = "#1565c0")) +
annotate("text", x = 3, y = 27, label = "Batas RSE 25%",
color = "red", size = 3.5) +
labs(title = "Perbandingan RSE: Direct vs EBLUP Fay-Herriot Global",
subtitle = "Seluruh Kabupaten/Kota di Pulau Sulawesi",
x = "Kode Kabupaten/Kota",
y = "RSE (%)",
color = "Metode") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
legend.position = "top")Perbandingan RSE: Direct vs EBLUP Global
Pemodelan per provinsi memungkinkan kovariat bervariasi antar provinsi, karena kondisi ekonomi dan sosial tiap provinsi berbeda. Algoritma robust diterapkan untuk menghindari kegagalan konvergensi.
# ── Setup ─────────────────────────────────────────────────
df_all <- df_gabungan %>%
mutate(
kdprov = substr(sprintf("%04d", as.numeric(Kako)), 1, 2),
vardir = (RSE / 100 * Estimasi)^2
) %>%
as.data.frame()
list_prov <- unique(df_all$kdprov)
hasil_list <- list()
for (p in list_prov) {
cat("\n===== PROVINSI", p, "=====\n")
df <- df_all %>% filter(kdprov == p) %>% as.data.frame()
if (nrow(df) < 5) { cat("Skip: data terlalu kecil\n"); next }
# Buang variabel konstan
keep_cols <- sapply(df, function(x) length(unique(x)) > 1)
df <- df[, keep_cols]
kandidat <- setdiff(names(df), c("Estimasi","Kako","RSE","vardir","kdprov"))
if (length(kandidat) == 0) next
# Korelasi → top 3
cor_vals <- sapply(df[kandidat], function(x)
abs(cor(x, df$Estimasi, use = "complete.obs")))
cor_vals <- sort(cor_vals, decreasing = TRUE)
vars_awal <- names(cor_vals)[1:min(3, length(cor_vals))]
cat("Top korelasi:\n"); print(round(cor_vals[vars_awal], 3))
# Stepwise
vars <- vars_awal
step_model <- try(
stepAIC(lm(as.formula(paste("Estimasi ~",
paste(vars_awal, collapse = " + "))), data = df),
direction = "backward", trace = FALSE),
silent = TRUE)
if (!inherits(step_model, "try-error")) {
vars <- names(coef(step_model))[-1]
cat("Setelah stepwise:", vars, "\n")
}
# Seleksi signifikansi p < 0.1
if (length(vars) > 1) {
model_tmp <- lm(as.formula(paste("Estimasi ~", paste(vars, collapse=" + "))), data=df)
pvals <- summary(model_tmp)$coefficients[-1, 4]
vars_sig <- names(pvals[pvals < 0.1])
if (length(vars_sig) >= 1) { vars <- vars_sig; cat("Variabel signifikan:", vars, "\n") }
}
# VIF cleaning
repeat {
if (length(vars) <= 1) break
model_tmp <- lm(as.formula(paste("Estimasi ~", paste(vars, collapse=" + "))), data=df)
vif_values <- try(vif(model_tmp), silent = TRUE)
if (inherits(vif_values, "try-error")) break
if (all(vif_values <= 10)) break
var_remove <- names(which.max(vif_values))
vars <- vars[vars != var_remove]
cat("Hapus VIF:", var_remove, "\n")
}
# Anti-overfit: jika R² > 0.65 → pakai 1 variabel
if (length(vars) > 1) {
model_tmp <- lm(as.formula(paste("Estimasi ~", paste(vars, collapse=" + "))), data=df)
if (summary(model_tmp)$r.squared > 0.65) {
cat("R² tinggi → gunakan 1 variabel terbaik\n")
vars <- vars[1]
}
}
# Scaling + stabilisasi vardir
df[vars] <- scale(df[vars])
df$vardir <- df$vardir + 1e-6
cat("✅ Final vars:", vars, "\n")
form_sae <- as.formula(paste("Estimasi ~", paste(vars, collapse = " + ")))
model_sae <- try(mseFH(form_sae, vardir = vardir, data = df), silent = TRUE)
if (inherits(model_sae, "try-error") || is.null(model_sae$mse)) {
cat("⚠️ SAE gagal → fallback ke Direct\n")
df$y_eblup <- df$Estimasi
df$mse <- df$vardir
} else {
df$y_eblup <- model_sae$est$eblup
df$mse <- model_sae$mse
}
df$RMSE_direct <- sqrt(df$vardir)
df$RMSE_eblup <- sqrt(df$mse)
df$RSE_direct <- df$RMSE_direct / df$Estimasi * 100
df$RSE_eblup <- df$RMSE_eblup / df$y_eblup * 100
hasil_list[[p]] <- df
}##
## ===== PROVINSI 71 =====
## Top korelasi:
## X30_mean X1_mean X8
## 0.711 0.648 0.645
## Setelah stepwise: X30_mean
## ✅ Final vars: X30_mean
##
## ===== PROVINSI 72 =====
## Top korelasi:
## X28 X33_mean X35_mean
## 0.791 0.595 0.530
## Setelah stepwise: X28 X35_mean
## Variabel signifikan: X28
## ✅ Final vars: X28
##
## ===== PROVINSI 73 =====
## Top korelasi:
## X26 X38_mean X36_mean
## 0.392 0.381 0.302
## Setelah stepwise: X26 X38_mean
## Variabel signifikan: X26
## ✅ Final vars: X26
##
## ===== PROVINSI 74 =====
## Top korelasi:
## X29_mean X27_mean X11_mean
## 0.738 0.736 0.685
## Setelah stepwise: X29_mean
## ✅ Final vars: X29_mean
##
## ===== PROVINSI 75 =====
## Top korelasi:
## X23_mean X24_mean X28
## 0.869 0.839 0.803
## Setelah stepwise: X23_mean
## ✅ Final vars: X23_mean
##
## ===== PROVINSI 76 =====
## Top korelasi:
## X24_mean X17 X9
## 0.944 0.856 0.855
## Setelah stepwise: X24_mean
## ✅ Final vars: X24_mean
##
## === Ringkasan EBLUP Per Provinsi ===
summary_df <- data.frame(
Direct = df_eblup_prov$Estimasi,
EBLUP = df_eblup_prov$y_eblup,
RSE_Direct = df_eblup_prov$RSE_direct,
RSE_EBLUP = df_eblup_prov$RSE_eblup
)
print(summary(summary_df))## Direct EBLUP RSE_Direct RSE_EBLUP
## Min. : 0.08 Min. : 0.08172 Min. : 22.94 Min. : 18.61
## 1st Qu.: 6.18 1st Qu.: 6.91838 1st Qu.: 38.29 1st Qu.: 20.95
## Median : 9.32 Median : 8.77315 Median : 44.36 Median : 25.90
## Mean :10.08 Mean : 8.67423 Mean : 47.56 Mean : 35.16
## 3rd Qu.:13.93 3rd Qu.:11.36307 3rd Qu.: 53.52 3rd Qu.: 35.05
## Max. :24.05 Max. :17.82463 Max. :101.67 Max. :231.13
df_plot_fh_prov <- df_eblup_prov %>%
dplyr::select(Kako, RSE_direct, RSE_eblup) %>%
mutate(Kako = as.character(Kako)) %>%
pivot_longer(cols = c(RSE_direct, RSE_eblup),
names_to = "Tipe",
values_to = "RSE") %>%
mutate(Tipe = case_match(Tipe,
"RSE_direct" ~ "Direct",
"RSE_eblup" ~ "EBLUP Per Provinsi"))
ggplot(df_plot_fh_prov,
aes(x = Kako, y = RSE, group = Tipe, color = Tipe)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
geom_hline(yintercept = 25,
linetype = "dashed", color = "red", linewidth = 1) +
scale_color_manual(values = c("Direct" = "#546e7a",
"EBLUP Per Provinsi" = "#6a1b9a")) +
annotate("text", x = 3, y = 27, label = "Batas RSE 25%",
color = "red", size = 3.5) +
labs(title = "Perbandingan RSE: Direct vs EBLUP Fay-Herriot Per Provinsi",
subtitle = "Model SAE difit terpisah untuk tiap provinsi di Sulawesi",
x = "Kode Kabupaten/Kota",
y = "RSE (%)",
color = "Metode") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
legend.position = "top")Perbandingan RSE: Direct vs EBLUP Per Provinsi
Model HB Beta cocok digunakan ketika variabel respons berupa proporsi (berada di antara 0 dan 1). Model ini menggunakan distribusi Beta sebagai likelihood dan menginkorporasikan ketidakpastian dalam parameter model secara penuh melalui pendekatan Bayesian.
Langkah transformasi: 1. Estimasi (%) → proporsi:
\(y_i = \text{Estimasi}_i / 100\) 2.
Batas: \(\varepsilon = 10^{-4}\) agar
\(y_i \in (0, 1)\)
3. Transformasi logit untuk seleksi variabel: \(\text{logit}(y_i) =
\log\left(\frac{y_i}{1-y_i}\right)\)
library(saeHB)
# ── Filter provinsi Sulawesi ──────────────────────────────
df_hb <- df_gabungan %>%
filter(substr(sprintf("%04d", as.numeric(Kako)), 1, 2) %in%
c("71","72","73","74","75","76")) %>%
as.data.frame()
# ── Transformasi ke proporsi ──────────────────────────────
eps <- 1e-4
df_hb$y_prop <- df_hb$Estimasi / 100
df_hb$y_prop <- pmax(pmin(df_hb$y_prop, 1 - eps), eps)
df_hb$y_logit <- log(df_hb$y_prop / (1 - df_hb$y_prop))
# ── Korelasi awal ─────────────────────────────────────────
kandidat_hb <- setdiff(names(df_hb), c("Kako","Estimasi","RSE","y_prop","y_logit"))
cor_vals_hb <- sapply(df_hb[kandidat_hb], function(x)
cor(x, df_hb$y_logit, use = "complete.obs"))
cat("=== Korelasi dengan y_logit ===\n")## === Korelasi dengan y_logit ===
## X8 X28 X12 X13_jumlah X22_jumlah X14_jumlah
## 0.55631485 0.40834467 0.29171249 0.23392134 0.23296287 0.21385676
## X34_jumlah X7_jumlah X37_mean X16 X6_mean X38_mean
## 0.20071963 0.09167068 0.02706726 -0.05033586 -0.06867471 -0.11109459
## X19 X15_mean X23_mean X32 X21_mean X41_mean
## -0.11889344 -0.12552748 -0.12733602 -0.14661001 -0.15495958 -0.16693474
## X33_mean X18_mean X9 X40_mean X17 X2
## -0.16994882 -0.17910787 -0.19302088 -0.19606421 -0.19663321 -0.20236735
## X43 X35_mean X24_mean X11_mean X10_mean X26
## -0.20617901 -0.21906019 -0.23230134 -0.27253914 -0.29225113 -0.30051845
## X27_mean X29_mean X25_mean X31_mean X1_mean X30_mean
## -0.30968188 -0.31068057 -0.32972741 -0.35289180 -0.39697723 -0.41149203
## X36_mean X39_mean X44 X45
## -0.42534591 -0.42807355 -0.45851667 -0.53596093
vars_awal_hb <- names(sort(abs(cor_vals_hb), decreasing = TRUE))[
1:min(8, length(cor_vals_hb))]
# ── Stepwise ─────────────────────────────────────────────
form_full_hb <- as.formula(paste("y_logit ~", paste(vars_awal_hb, collapse = " + ")))
full_model_hb <- lm(form_full_hb, data = df_hb)
step_model_hb <- stepAIC(full_model_hb, direction = "backward", trace = FALSE)
cat("\n=== Stepwise HB Global ===\n")##
## === Stepwise HB Global ===
##
## Call:
## lm(formula = y_logit ~ X8 + X45 + X44, data = df_hb)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4112 -0.3371 0.0671 0.5005 1.5674
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.486019 0.695277 -5.014 3.33e-06 ***
## X8 0.017137 0.006201 2.764 0.00715 **
## X45 -0.027838 0.012395 -2.246 0.02758 *
## X44 0.026258 0.018482 1.421 0.15943
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7692 on 77 degrees of freedom
## Multiple R-squared: 0.3561, Adjusted R-squared: 0.3311
## F-statistic: 14.2 on 3 and 77 DF, p-value: 1.875e-07
vars_hb <- names(coef(step_model_hb))[-1]
# ── VIF cleaning ──────────────────────────────────────────
repeat {
if (length(vars_hb) == 0) break
form_tmp <- as.formula(paste("y_logit ~", paste(vars_hb, collapse = " + ")))
model_tmp <- lm(form_tmp, data = df_hb)
vif_values <- try(vif(model_tmp), silent = TRUE)
if (inherits(vif_values, "try-error")) break
cat("\nVIF:\n"); print(round(vif_values, 3))
if (all(vif_values <= 10)) { cat("✅ VIF aman\n"); break }
var_remove <- names(which.max(vif_values))
cat("Hapus:", var_remove, "\n")
vars_hb <- vars_hb[vars_hb != var_remove]
}##
## VIF:
## X8 X45 X44
## 4.842 2.673 4.997
## ✅ VIF aman
form_hb_final <- as.formula(paste("y_prop ~", paste(vars_hb, collapse = " + ")))
cat("\n✅ Formula HB Beta Global:", deparse(form_hb_final), "\n")##
## ✅ Formula HB Beta Global: y_prop ~ X8 + X45 + X44
# ── Load hasil HB Beta Global (dari file tersimpan) ───────
df_HBbeta_global <- read_excel("C:/Users/User/Downloads/df_HBbeta_global.xlsx")
cat("=== Hasil Akhir HB Beta Global ===\n")## === Hasil Akhir HB Beta Global ===
## # A tibble: 81 × 5
## Kako Estimasi y_hb_percent RSE RSE_hb
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 7101 12.1 12.4 62.0 20.2
## 2 7102 5.42 8.77 43.7 27.8
## 3 7103 11.4 12.0 48.7 20.5
## 4 7104 6.18 9.05 63.4 23.1
## 5 7105 12.6 12.5 41.6 21.8
## 6 7106 10.5 11.0 48.5 25.0
## 7 7107 20.9 17.5 35.3 19.9
## 8 7108 12.6 12.9 43.6 21.3
## 9 7109 18.4 15.6 39.3 22.0
## 10 7110 16.6 14.1 47.9 21.7
## # ℹ 71 more rows
df_plot_hb_global <- df_HBbeta_global %>%
dplyr::select(Kako, RSE, RSE_hb) %>%
mutate(Kako = as.character(Kako)) %>%
pivot_longer(cols = c(RSE, RSE_hb),
names_to = "Metode",
values_to = "RSE_val") %>%
mutate(Metode = case_match(Metode,
"RSE" ~ "Direct",
"RSE_hb" ~ "HB Beta Global"))
ggplot(df_plot_hb_global,
aes(x = Kako, y = RSE_val, group = Metode, color = Metode)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
geom_hline(yintercept = 25,
linetype = "dashed", color = "red", linewidth = 1) +
scale_color_manual(values = c("Direct" = "#546e7a",
"HB Beta Global" = "#e65100")) +
annotate("text", x = 3, y = 27, label = "Batas RSE 25%",
color = "red", size = 3.5) +
labs(title = "Perbandingan RSE: Direct vs HB Beta Global",
subtitle = "Model HB Beta untuk seluruh Sulawesi",
x = "Kode Kabupaten/Kota",
y = "RSE (%)",
color = "Metode") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
legend.position = "top")Perbandingan RSE: Direct vs HB Beta Global
df_all_hb <- df_gabungan %>%
mutate(kdprov = substr(sprintf("%04d", as.numeric(Kako)), 1, 2)) %>%
as.data.frame()
list_prov_hb <- unique(df_all_hb$kdprov)
hasil_list_hb <- list()
for (p in list_prov_hb) {
cat("\n===== PROVINSI", p, "=====\n")
df <- df_all_hb %>% filter(kdprov == p) %>% as.data.frame()
if (nrow(df) < 5) { cat("Skip\n"); next }
# Proporsi + boundary
eps <- 1e-4
df$y_prop <- pmax(pmin(df$Estimasi / 100, 1 - eps), eps)
df$y_logit <- log(df$y_prop / (1 - df$y_prop))
# Buang variabel konstan
keep_cols <- sapply(df, function(x) length(unique(x)) > 1)
df <- df[, keep_cols]
kandidat <- setdiff(names(df),
c("Estimasi","Kako","RSE","kdprov","y_prop","y_logit"))
if (length(kandidat) == 0) next
# Korelasi → top 5
cor_vals <- sapply(df[kandidat], function(x)
abs(cor(x, df$y_logit, use = "complete.obs")))
cor_vals <- sort(cor_vals, decreasing = TRUE)
vars_awal <- names(cor_vals)[1:min(5, length(cor_vals))]
cat("Top korelasi:\n"); print(round(cor_vals[vars_awal], 3))
# Stepwise
vars <- vars_awal
step_model <- try(
stepAIC(lm(as.formula(paste("y_logit ~",
paste(vars_awal, collapse = " + "))), data = df),
direction = "backward", trace = FALSE), silent = TRUE)
if (!inherits(step_model, "try-error")) {
vars <- names(coef(step_model))[-1]
cat("Setelah stepwise:", vars, "\n")
} else {
vars <- vars_awal[1]
}
vars <- vars[!is.na(vars)]
if (length(vars) == 0) vars <- vars_awal[1]
# VIF cleaning
repeat {
if (length(vars) <= 1) break
model_tmp <- try(lm(as.formula(paste("y_logit ~",
paste(vars, collapse=" + "))), data=df), silent=TRUE)
if (inherits(model_tmp, "try-error")) { vars <- vars[1]; break }
vif_values <- try(vif(model_tmp), silent = TRUE)
if (inherits(vif_values, "try-error")) { vars <- vars[1]; break }
if (all(vif_values <= 10)) break
var_remove <- names(which.max(vif_values))
vars <- vars[vars != var_remove]
cat("Hapus VIF:", var_remove, "\n")
}
# Anti-overfit R² > 0.8
if (length(vars) > 1) {
model_tmp <- lm(as.formula(paste("y_logit ~", paste(vars, collapse=" + "))), data=df)
if (summary(model_tmp)$r.squared > 0.8) {
cat("R² terlalu tinggi → pakai 1 variabel\n")
vars <- vars[1]
}
}
# Scaling
df[vars] <- scale(df[vars])
cat("✅ Final vars:", vars, "\n")
form_hb_p <- as.formula(paste("y_prop ~", paste(vars, collapse = " + ")))
cat("Formula:", deparse(form_hb_p), "\n")
hasil_list_hb[[p]] <- df
}##
## ===== PROVINSI 71 =====
## Top korelasi:
## X30_mean X45 X8 X44 X35_mean
## 0.750 0.651 0.651 0.587 0.529
## Setelah stepwise: X30_mean X8 X44
## ✅ Final vars: X30_mean X8 X44
## Formula: y_prop ~ X30_mean + X8 + X44
##
## ===== PROVINSI 72 =====
## Top korelasi:
## X28 X33_mean X45 X39_mean X35_mean
## 0.773 0.736 0.622 0.573 0.573
## Setelah stepwise: X28 X33_mean
## ✅ Final vars: X28 X33_mean
## Formula: y_prop ~ X28 + X33_mean
##
## ===== PROVINSI 73 =====
## Top korelasi:
## X26 X39_mean X25_mean X32 X11_mean
## 0.399 0.391 0.344 0.338 0.335
## Setelah stepwise: X26 X25_mean
## ✅ Final vars: X26 X25_mean
## Formula: y_prop ~ X26 + X25_mean
##
## ===== PROVINSI 74 =====
## Top korelasi:
## X11_mean X29_mean X45 X27_mean X21_mean
## 0.794 0.790 0.779 0.753 0.732
## Setelah stepwise: X29_mean X21_mean
## ✅ Final vars: X29_mean X21_mean
## Formula: y_prop ~ X29_mean + X21_mean
##
## ===== PROVINSI 75 =====
## Top korelasi:
## X24_mean X8 X23_mean X10_mean X31_mean
## 0.990 0.967 0.964 0.944 0.929
## ✅ Final vars: X24_mean
## Formula: y_prop ~ X24_mean
##
## ===== PROVINSI 76 =====
## Top korelasi:
## X24_mean X17 X9 X43 X19
## 0.949 0.886 0.849 0.805 0.764
## ✅ Final vars: X24_mean
## Formula: y_prop ~ X24_mean
# Load hasil HB Beta per provinsi
df_HBbeta_prov <- read_excel("C:/Users/User/Downloads/df_HBbeta_prov.xlsx")
cat("\n=== Ringkasan HB Beta Per Provinsi ===\n")##
## === Ringkasan HB Beta Per Provinsi ===
summary_df_hb <- data.frame(
Direct = df_HBbeta_prov$Estimasi,
HB = df_HBbeta_prov$y_hb_percent,
RSE_Direct = df_HBbeta_prov$RSE,
RSE_HB = df_HBbeta_prov$RSE_hb
)
print(summary(summary_df_hb))## Direct HB RSE_Direct RSE_HB
## Min. : 0.08 Min. : 0.3705 Min. : 22.94 Min. : 0.6677
## 1st Qu.: 6.18 1st Qu.: 6.5159 1st Qu.: 38.29 1st Qu.: 2.8839
## Median : 9.32 Median : 9.3156 Median : 44.36 Median :12.2737
## Mean :10.08 Mean :10.0637 Mean : 47.56 Mean :13.1275
## 3rd Qu.:13.93 3rd Qu.:13.9018 3rd Qu.: 53.52 3rd Qu.:21.8326
## Max. :24.05 Max. :24.0632 Max. :101.66 Max. :39.4415
df_HBbeta_prov$Kako <- as.character(df_HBbeta_prov$Kako)
df_plot_hb_prov <- df_HBbeta_prov %>%
dplyr::select(Kako, RSE, RSE_hb) %>%
pivot_longer(cols = c(RSE, RSE_hb),
names_to = "Metode",
values_to = "RSE_val") %>%
mutate(Metode = case_match(Metode,
"RSE" ~ "Direct",
"RSE_hb" ~ "HB Beta Per Provinsi"))
ggplot(df_plot_hb_prov,
aes(x = Kako, y = RSE_val, group = Metode, color = Metode)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
geom_hline(yintercept = 25,
linetype = "dashed", color = "red", linewidth = 1) +
scale_color_manual(values = c("Direct" = "#546e7a",
"HB Beta Per Provinsi" = "#2e7d32")) +
annotate("text", x = 3, y = 27, label = "Batas RSE 25%",
color = "red", size = 3.5) +
labs(title = "Perbandingan RSE: Direct vs HB Beta Per Provinsi",
subtitle = "Model HB Beta difit terpisah untuk tiap provinsi di Sulawesi",
x = "Kode Kabupaten/Kota",
y = "RSE (%)",
color = "Metode") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
legend.position = "top")Perbandingan RSE: Direct vs HB Beta Per Provinsi
Tabel perbandingan menghitung jumlah kabupaten/kota yang memiliki RSE ≥ 25% (tidak presisi) vs < 25% (presisi) untuk setiap metode. Metode terbaik adalah yang paling banyak menurunkan kabupaten dengan RSE ≥ 25%.
# ── Fungsi bantu ──────────────────────────────────────────
buat_tabel_rse <- function(data, var_rse, label_metode) {
data.frame(
Metode = label_metode,
Kategori = c("RSE ≥ 25% (Tidak Presisi)", "RSE < 25% (Presisi)"),
Jumlah = c(
sum(data[[var_rse]] >= 25, na.rm = TRUE),
sum(data[[var_rse]] < 25, na.rm = TRUE)
)
)
}
# ── Buat tabel per metode ─────────────────────────────────
tabel_direct <- buat_tabel_rse(df_HBbeta_global, "RSE", "Direct")
tabel_eblup_glob <- buat_tabel_rse(df_eblub_global, "RSE_eblup", "EBLUP FH Global")
tabel_eblup_prov <- buat_tabel_rse(df_eblup_prov, "RSE_eblup", "EBLUP FH Per Provinsi")
tabel_hb_global <- buat_tabel_rse(df_HBbeta_global, "RSE_hb", "HB Beta Global")
tabel_hb_prov <- buat_tabel_rse(df_HBbeta_prov, "RSE_hb", "HB Beta Per Provinsi")
# ── Gabung menjadi satu tabel ─────────────────────────────
tabel_semua <- bind_rows(
tabel_direct,
tabel_eblup_glob,
tabel_eblup_prov,
tabel_hb_global,
tabel_hb_prov
)
# ── Tampilkan ─────────────────────────────────────────────
knitr::kable(
tabel_semua,
caption = "Perbandingan Presisi Estimasi: RSE ≥ 25% vs < 25%",
align = c("l","l","c")
)| Metode | Kategori | Jumlah |
|---|---|---|
| Direct | RSE ≥ 25% (Tidak Presisi) | 80 |
| Direct | RSE < 25% (Presisi) | 1 |
| EBLUP FH Global | RSE ≥ 25% (Tidak Presisi) | 59 |
| EBLUP FH Global | RSE < 25% (Presisi) | 22 |
| EBLUP FH Per Provinsi | RSE ≥ 25% (Tidak Presisi) | 43 |
| EBLUP FH Per Provinsi | RSE < 25% (Presisi) | 38 |
| HB Beta Global | RSE ≥ 25% (Tidak Presisi) | 27 |
| HB Beta Global | RSE < 25% (Presisi) | 54 |
| HB Beta Per Provinsi | RSE ≥ 25% (Tidak Presisi) | 9 |
| HB Beta Per Provinsi | RSE < 25% (Presisi) | 72 |
tabel_presisi <- tabel_semua %>%
filter(Kategori == "RSE < 25% (Presisi)") %>%
mutate(Metode = factor(Metode, levels = c(
"Direct",
"EBLUP FH Global", "EBLUP FH Per Provinsi",
"HB Beta Global", "HB Beta Per Provinsi")))
ggplot(tabel_presisi, aes(x = Metode, y = Jumlah, fill = Metode)) +
geom_col(width = 0.6, show.legend = FALSE) +
geom_text(aes(label = Jumlah), vjust = -0.4, fontface = "bold", size = 5) +
scale_fill_manual(values = c(
"Direct" = "#90a4ae",
"EBLUP FH Global" = "#1565c0",
"EBLUP FH Per Provinsi" = "#6a1b9a",
"HB Beta Global" = "#e65100",
"HB Beta Per Provinsi" = "#2e7d32")) +
labs(title = "Jumlah KabKota dengan RSE < 25% per Metode",
subtitle = "Semakin tinggi = semakin banyak area estimasi yang presisi",
x = NULL,
y = "Jumlah Kabupaten/Kota") +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 20, hjust = 1),
plot.title = element_text(face = "bold"))Jumlah KabKota Presisi (RSE < 25%) per Metode
library(dplyr)
library(knitr)
library(kableExtra)
# ── Fungsi Kategori RSE ─────────────────────────────
buat_tabel_rse <- function(df, kolom_rse, nama_metode) {
df %>%
mutate(
kategori_rse = case_when(
.data[[kolom_rse]] < 25 ~ "< 25",
.data[[kolom_rse]] <= 50 ~ "25 - 50",
.data[[kolom_rse]] > 50 ~ "> 50"
)
) %>%
group_by(kategori_rse) %>%
summarise(
Jumlah = n(),
Kako = paste(Kako, collapse = ", "),
.groups = "drop"
) %>%
mutate(Metode = nama_metode) %>%
dplyr::select(Metode, kategori_rse, Jumlah, Kako)
}
# ── Generate Semua Tabel ────────────────────────────
tabel_semua <- bind_rows(
buat_tabel_rse(df_HBbeta_global, "RSE", "Direct"),
buat_tabel_rse(df_eblub_global, "RSE_eblup", "EBLUP FH Global"),
buat_tabel_rse(df_eblup_prov, "RSE_eblup", "EBLUP FH Per Provinsi"),
buat_tabel_rse(df_HBbeta_global, "RSE_hb", "HB Beta Global"),
buat_tabel_rse(df_HBbeta_prov, "RSE_hb", "HB Beta Per Provinsi")
) %>%
mutate(
kategori_rse = factor(kategori_rse, levels = c("< 25", "25 - 50", "> 50"))
) %>%
arrange(Metode, kategori_rse)
# ── Output Tabel Cantik ─────────────────────────────
kable(
tabel_semua,
caption = "Distribusi Kategori RSE per Metode Estimasi",
align = "c"
) %>%
kable_styling(full_width = FALSE, position = "center") %>%
row_spec(0, bold = TRUE)| Metode | kategori_rse | Jumlah | Kako |
|---|---|---|---|
| Direct | < 25 | 1 | 7325 |
| Direct | 25 - 50 | 54 | 7102, 7103, 7105, 7106, 7107, 7108, 7109, 7110, 7111, 7172, 7174, 7202, 7205, 7206, 7207, 7208, 7209, 7210, 7211, 7303, 7304, 7306, 7308, 7309, 7310, 7313, 7314, 7316, 7317, 7318, 7322, 7326, 7372, 7373, 7401, 7403, 7404, 7406, 7408, 7409, 7410, 7411, 7412, 7413, 7471, 7501, 7503, 7504, 7601, 7602, 7603, 7604, 7605, 7606 |
| Direct | > 50 | 26 | 7101, 7104, 7171, 7173, 7201, 7203, 7204, 7212, 7271, 7301, 7302, 7305, 7307, 7311, 7312, 7315, 7371, 7402, 7405, 7407, 7414, 7415, 7472, 7502, 7505, 7571 |
| EBLUP FH Global | < 25 | 22 | 7101, 7107, 7109, 7111, 7201, 7205, 7206, 7210, 7211, 7212, 7314, 7316, 7325, 7406, 7408, 7409, 7410, 7412, 7503, 7504, 7505, 7604 |
| EBLUP FH Global | 25 - 50 | 49 | 7102, 7103, 7104, 7105, 7106, 7108, 7110, 7172, 7202, 7204, 7207, 7208, 7209, 7271, 7301, 7302, 7303, 7304, 7305, 7306, 7308, 7309, 7310, 7311, 7313, 7315, 7317, 7318, 7322, 7326, 7371, 7401, 7402, 7403, 7404, 7405, 7407, 7411, 7413, 7414, 7415, 7471, 7501, 7502, 7601, 7602, 7603, 7605, 7606 |
| EBLUP FH Global | > 50 | 10 | 7171, 7173, 7174, 7203, 7307, 7312, 7372, 7373, 7472, 7571 |
| EBLUP FH Per Provinsi | < 25 | 38 | 7101, 7103, 7104, 7105, 7106, 7107, 7108, 7109, 7110, 7111, 7172, 7201, 7202, 7205, 7206, 7207, 7209, 7211, 7325, 7403, 7405, 7406, 7407, 7408, 7409, 7410, 7411, 7412, 7413, 7501, 7502, 7503, 7504, 7505, 7603, 7604, 7605, 7606 |
| EBLUP FH Per Provinsi | 25 - 50 | 33 | 7102, 7204, 7208, 7210, 7212, 7301, 7302, 7303, 7304, 7305, 7306, 7307, 7308, 7309, 7310, 7311, 7312, 7313, 7314, 7315, 7316, 7317, 7318, 7322, 7326, 7372, 7373, 7401, 7402, 7404, 7414, 7415, 7601 |
| EBLUP FH Per Provinsi | > 50 | 10 | 7171, 7173, 7174, 7203, 7271, 7371, 7471, 7472, 7571, 7602 |
| HB Beta Global | < 25 | 54 | 7101, 7103, 7104, 7105, 7107, 7108, 7109, 7110, 7111, 7174, 7201, 7202, 7205, 7206, 7207, 7208, 7209, 7210, 7211, 7212, 7304, 7305, 7306, 7308, 7310, 7311, 7313, 7314, 7315, 7316, 7317, 7318, 7322, 7326, 7401, 7403, 7404, 7405, 7406, 7407, 7408, 7410, 7411, 7412, 7413, 7414, 7501, 7503, 7504, 7505, 7601, 7603, 7604, 7606 |
| HB Beta Global | 25 - 50 | 27 | 7102, 7106, 7171, 7172, 7173, 7203, 7204, 7271, 7301, 7302, 7303, 7307, 7309, 7312, 7325, 7371, 7372, 7373, 7402, 7409, 7415, 7471, 7472, 7502, 7571, 7602, 7605 |
| HB Beta Per Provinsi | < 25 | 72 | 7101, 7102, 7103, 7104, 7105, 7106, 7107, 7108, 7109, 7110, 7111, 7171, 7172, 7173, 7174, 7201, 7202, 7203, 7204, 7205, 7206, 7207, 7208, 7209, 7210, 7211, 7212, 7271, 7303, 7304, 7305, 7306, 7309, 7310, 7311, 7313, 7314, 7315, 7317, 7318, 7322, 7325, 7326, 7401, 7402, 7403, 7404, 7405, 7406, 7407, 7408, 7409, 7410, 7411, 7412, 7413, 7414, 7415, 7471, 7472, 7501, 7502, 7503, 7504, 7505, 7571, 7601, 7602, 7603, 7604, 7605, 7606 |
| HB Beta Per Provinsi | 25 - 50 | 9 | 7301, 7302, 7307, 7308, 7312, 7316, 7371, 7372, 7373 |
Ringkasan Hasil Analisis:
Data yang digunakan: Estimasi direct dari survei, kovariat Podes 2024, dan data penduduk Sulawesi.
Seleksi variabel menggunakan kombinasi tiga tahap: korelasi, stepwise backward (AIC), dan pembersihan VIF untuk menghindari multikolinearitas.
SAE Fay-Herriot berhasil meningkatkan presisi estimasi dengan menurunkan RSE pada sebagian besar kabupaten/kota, baik secara global maupun per provinsi.
SAE HB Beta memberikan pendekatan alternatif yang lebih sesuai untuk data proporsi, dengan memanfaatkan distribusi prior Bayesian.
Pemodelan per provinsi menunjukkan fleksibilitas yang lebih tinggi karena mempertimbangkan heterogenitas antar wilayah di Sulawesi.
Catatan Teknis:
Kakovardir distabilkan dengan penambahan \(10^{-6}\) untuk menghindari pembagian oleh
nol