Tugas PSAE #2 — Estimasi Langsung & Validitas TPT
Tugas Halaman 9 — Estimator Horvitz-Thompson
Tujuan: Menerapkan estimator langsung (Direct Estimator / Horvitz-Thompson) menggunakan fungsi
direct()dari paketsaepada datasetincomedata.
Eksplorasi Data
Dataset incomedata
Dataset incomedata merupakan data pendapatan rumah
tangga yang tersedia sebagai built-in di paket
sae. Dataset ini digunakan untuk mempraktikkan berbagai
metode estimasi area kecil.
# ============================================================
# SLIDE 9 — Estimator Horvitz-Thompson (HT) dengan incomedata
# ============================================================
library(sae)
# Load dataset bawaan package sae
data(incomedata)
# Lihat struktur data
knitr::kable(head(incomedata), caption = "6 Baris Pertama Dataset `incomedata`",
digits = 2)| provlab | prov | ac | gen | age | nat | educ | labor | age2 | age3 | age4 | age5 | educ1 | educ2 | educ3 | nat1 | labor1 | labor2 | labor3 | income | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alava | 1 | 16 | 2 | 5 | 2 | 2 | 3 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 12021.83 | 2804.03 |
| Alava | 1 | 16 | 1 | 3 | 1 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 8609.85 | 1347.64 |
| Alava | 1 | 16 | 1 | 5 | 1 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 3384.20 | 2568.18 |
| Alava | 1 | 16 | 1 | 5 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2343.36 | 1168.23 |
| Alava | 1 | 16 | 1 | 3 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 9023.85 | 953.44 |
| Alava | 1 | 16 | 2 | 5 | 1 | 1 | 3 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 395.24 | 4402.61 |
'data.frame': 17199 obs. of 21 variables:
$ provlab: Factor w/ 52 levels "Alava","Albacete",..: 1 1 1 1 1 1 1 1 1 1 ...
$ prov : int 1 1 1 1 1 1 1 1 1 1 ...
$ ac : int 16 16 16 16 16 16 16 16 16 16 ...
$ gen : int 2 1 1 1 1 2 1 2 1 2 ...
$ age : int 5 3 5 5 3 5 4 4 3 3 ...
$ nat : int 2 1 1 1 1 1 1 1 1 1 ...
$ educ : int 2 2 2 0 1 1 1 2 0 3 ...
$ labor : int 3 1 1 0 1 3 3 3 0 1 ...
$ age2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ age3 : int 0 1 0 0 1 0 0 0 1 1 ...
$ age4 : int 0 0 0 0 0 0 1 1 0 0 ...
$ age5 : int 1 0 1 1 0 1 0 0 0 0 ...
$ educ1 : int 0 0 0 0 1 1 1 0 0 0 ...
$ educ2 : int 1 1 1 0 0 0 0 1 0 0 ...
$ educ3 : int 0 0 0 0 0 0 0 0 0 1 ...
$ nat1 : num 0 1 1 1 1 1 1 1 1 1 ...
$ labor1 : int 0 1 1 0 1 0 0 0 0 1 ...
$ labor2 : int 0 0 0 0 0 0 0 0 0 0 ...
$ labor3 : int 1 0 0 0 0 1 1 1 0 0 ...
$ income : num 12022 8610 3384 2343 9024 ...
$ weight : num 2804 1348 2568 1168 953 ...
Dataset sizeprov
Dataset sizeprov berisi ukuran populasi per provinsi,
digunakan sebagai argumen domsize pada fungsi
direct().
data("sizeprov")
knitr::kable(sizeprov, caption = "Dataset `sizeprov` — Ukuran Populasi per Provinsi")| provlab | prov | Nd |
|---|---|---|
| Alava | 1 | 296558 |
| Albacete | 2 | 381413 |
| Alicante | 3 | 1717556 |
| Almeria | 4 | 618150 |
| Avila | 5 | 163142 |
| Badajoz | 6 | 660203 |
| Baleares | 7 | 984224 |
| Barcelona | 8 | 5149877 |
| Burgos | 9 | 353099 |
| Caceres | 10 | 403966 |
| Cadiz | 11 | 1165911 |
| Castellon | 12 | 542448 |
| CiudadReal | 13 | 495305 |
| Cordoba | 14 | 771665 |
| CorunaLa | 15 | 1106660 |
| Cuenca | 16 | 206479 |
| Gerona | 17 | 657764 |
| Granada | 18 | 866032 |
| Guadalajara | 19 | 205620 |
| Guipuzcoa | 20 | 677697 |
| Huelva | 21 | 479951 |
| Huesca | 22 | 213217 |
| Jaen | 23 | 647120 |
| Leon | 24 | 479017 |
| Lerida | 25 | 398604 |
| RiojaLa | 26 | 298760 |
| Lugo | 27 | 346779 |
| Madrid | 28 | 5921832 |
| Malaga | 29 | 1443395 |
| Murcia | 30 | 1334159 |
| Navarra | 31 | 581532 |
| Orense | 32 | 327805 |
| Oviedo | 33 | 1051532 |
| Palencia | 34 | 168113 |
| PalmasLas | 35 | 1008195 |
| Pontevedra | 36 | 922947 |
| Salamanca | 37 | 342223 |
| Tenerife | 38 | 942587 |
| Santander | 39 | 553771 |
| Segovia | 40 | 153559 |
| Sevilla | 41 | 1784244 |
| Soria | 42 | 90064 |
| Tarragona | 43 | 707683 |
| Teruel | 44 | 138975 |
| Toledo | 45 | 594895 |
| Valencia | 46 | 2376984 |
| Valladolid | 47 | 506269 |
| Vizcaya | 48 | 1122978 |
| Zamora | 49 | 193664 |
| Zaragoza | 50 | 895909 |
| Ceuta | 51 | 70981 |
| Melilla | 52 | 65336 |
Estimasi Langsung menggunakan direct()
Fungsi direct() menghitung estimasi langsung berbobot
(Horvitz-Thompson) per domain. Berikut disajikan empat variasi
pemanggilan fungsi beserta penjelasan perbedaannya.
Result 1 — Domain Numerik, domsize kolom 2:3
Pada bagian ini dilakukan estimasi rata-rata pendapatan per provinsi menggunakan Horvitz–Thompson direct estimator dengan desain sampling tanpa pengembalian (WOR) di dalam setiap domain.
Variabel prov digunakan sebagai domain numerik,
sedangkan bobot sampling diberikan melalui
sweight = weight. Ukuran populasi tiap domain
(domsize) diambil dari kolom ke-2 dan ke-3 pada
sizeprov, yang berisi kode domain dan ukuran
populasinya.
# ============================================================
# RESULT 1
# domain = prov (numerik), dengan bobot & domsize kolom 2:3
# ============================================================
result1 <- direct(
y = income,
dom = prov,
sweight = weight,
domsize = sizeprov[, c(2, 3)],
data = incomedata
)
knitr::kable(result1, caption = "Result 1 — domain numerik `prov`, domsize kolom 2:3",
digits = 4)| Domain | SampSize | Direct | SD | CV |
|---|---|---|---|---|
| 1 | 96 | 7121.005 | 978.9280 | 13.7470 |
| 2 | 173 | 13091.669 | 1380.7139 | 10.5465 |
| 3 | 539 | 13054.001 | 697.0416 | 5.3397 |
| 4 | 198 | 13158.889 | 1242.0200 | 9.4386 |
| 5 | 58 | 10592.494 | 1802.3255 | 17.0151 |
| 6 | 494 | 13213.102 | 759.5069 | 5.7481 |
| 7 | 634 | 14240.291 | 780.7269 | 5.4825 |
| 8 | 1420 | 11391.583 | 387.3791 | 3.4006 |
| 9 | 168 | 17193.316 | 1941.0206 | 11.2894 |
| 10 | 282 | 9173.640 | 781.1630 | 8.5153 |
| 11 | 398 | 16590.280 | 1047.6163 | 6.3146 |
| 12 | 118 | 7922.602 | 890.4710 | 11.2396 |
| 13 | 250 | 13014.733 | 1069.4094 | 8.2169 |
| 14 | 224 | 9238.661 | 756.3274 | 8.1865 |
| 15 | 495 | 10664.224 | 640.1909 | 6.0032 |
| 16 | 92 | 9726.539 | 1278.6810 | 13.1463 |
| 17 | 142 | 9631.635 | 1001.6669 | 10.3998 |
| 18 | 208 | 9492.997 | 869.7582 | 9.1621 |
| 19 | 89 | 10197.979 | 1532.7710 | 15.0301 |
| 20 | 285 | 11665.247 | 942.8695 | 8.0827 |
| 21 | 122 | 13287.909 | 1524.2953 | 11.4713 |
| 22 | 115 | 9105.395 | 1136.5260 | 12.4819 |
| 23 | 232 | 12730.452 | 1085.3034 | 8.5253 |
| 24 | 218 | 9819.389 | 873.2891 | 8.8935 |
| 25 | 130 | 15563.262 | 2058.6055 | 13.2273 |
| 26 | 510 | 11836.719 | 780.5658 | 6.5944 |
| 27 | 173 | 9444.694 | 941.5500 | 9.9691 |
| 28 | 944 | 13057.342 | 605.1017 | 4.6342 |
| 29 | 379 | 11662.044 | 802.1492 | 6.8783 |
| 30 | 885 | 13259.506 | 638.0530 | 4.8120 |
| 31 | 564 | 13735.008 | 783.8567 | 5.7070 |
| 32 | 129 | 8942.922 | 1320.8213 | 14.7695 |
| 33 | 803 | 11221.799 | 591.8854 | 5.2744 |
| 34 | 72 | 9189.167 | 1385.4011 | 15.0765 |
| 35 | 472 | 16529.191 | 1080.7667 | 6.5385 |
| 36 | 448 | 15337.141 | 919.5169 | 5.9954 |
| 37 | 164 | 12020.292 | 1328.1631 | 11.0493 |
| 38 | 381 | 9092.715 | 611.0477 | 6.7202 |
| 39 | 434 | 9731.450 | 704.8761 | 7.2433 |
| 40 | 58 | 8994.437 | 1470.5287 | 16.3493 |
| 41 | 482 | 11117.879 | 617.7463 | 5.5563 |
| 42 | 20 | 6597.581 | 1753.4391 | 26.5770 |
| 43 | 134 | 6204.736 | 681.4471 | 10.9827 |
| 44 | 72 | 9677.507 | 1855.1802 | 19.1700 |
| 45 | 275 | 14107.708 | 1136.2947 | 8.0544 |
| 46 | 714 | 11643.482 | 553.7125 | 4.7556 |
| 47 | 299 | 16833.008 | 1206.8993 | 7.1698 |
| 48 | 524 | 12607.676 | 723.4707 | 5.7383 |
| 49 | 104 | 11360.706 | 1432.0437 | 12.6052 |
| 50 | 564 | 15454.577 | 883.0556 | 5.7139 |
| 51 | 235 | 14501.279 | 1307.0182 | 9.0131 |
| 52 | 180 | 11661.446 | 1370.2542 | 11.7503 |
Result 2 — Variabel Eksplisit $, domsize
kolom 1 & 3
Estimasi yang sama dilakukan seperti pada Result 1, namun semua
variabel dipanggil secara eksplisit menggunakan operator $
tanpa menyertakan argumen data.
Pada kasus ini, domain menggunakan provlab (label
provinsi), dan domsize diambil dari kolom ke-1 dan ke-3
sizeprov, yang menyesuaikan dengan format label domain.
# ============================================================
# RESULT 2
# Sama seperti result1, tapi variabel dipanggil dengan $ eksplisit
# ============================================================
result2 <- direct(
y = incomedata$income,
dom = incomedata$provlab,
sweight = incomedata$weight,
domsize = sizeprov[, c(1, 3)]
)
knitr::kable(result2, caption = "Result 2 — variabel eksplisit `$`, domsize kolom 1 & 3",
digits = 4)| Domain | SampSize | Direct | SD | CV | |
|---|---|---|---|---|---|
| 1 | Alava | 96 | 7121.005 | 978.9280 | 13.7470 |
| 2 | Albacete | 173 | 13091.669 | 1380.7139 | 10.5465 |
| 3 | Alicante | 539 | 13054.001 | 697.0416 | 5.3397 |
| 4 | Almeria | 198 | 13158.889 | 1242.0200 | 9.4386 |
| 5 | Avila | 58 | 10592.494 | 1802.3255 | 17.0151 |
| 6 | Badajoz | 494 | 13213.102 | 759.5069 | 5.7481 |
| 7 | Baleares | 634 | 14240.291 | 780.7269 | 5.4825 |
| 8 | Barcelona | 1420 | 11391.583 | 387.3791 | 3.4006 |
| 9 | Burgos | 168 | 17193.316 | 1941.0206 | 11.2894 |
| 10 | Caceres | 282 | 9173.640 | 781.1630 | 8.5153 |
| 11 | Cadiz | 398 | 16590.280 | 1047.6163 | 6.3146 |
| 12 | Castellon | 118 | 7922.602 | 890.4710 | 11.2396 |
| 51 | Ceuta | 235 | 14501.279 | 1307.0182 | 9.0131 |
| 13 | CiudadReal | 250 | 13014.733 | 1069.4094 | 8.2169 |
| 14 | Cordoba | 224 | 9238.661 | 756.3274 | 8.1865 |
| 15 | CorunaLa | 495 | 10664.224 | 640.1909 | 6.0032 |
| 16 | Cuenca | 92 | 9726.539 | 1278.6810 | 13.1463 |
| 17 | Gerona | 142 | 9631.635 | 1001.6669 | 10.3998 |
| 18 | Granada | 208 | 9492.997 | 869.7582 | 9.1621 |
| 19 | Guadalajara | 89 | 10197.979 | 1532.7710 | 15.0301 |
| 20 | Guipuzcoa | 285 | 11665.247 | 942.8695 | 8.0827 |
| 21 | Huelva | 122 | 13287.909 | 1524.2953 | 11.4713 |
| 22 | Huesca | 115 | 9105.395 | 1136.5260 | 12.4819 |
| 23 | Jaen | 232 | 12730.452 | 1085.3034 | 8.5253 |
| 24 | Leon | 218 | 9819.389 | 873.2891 | 8.8935 |
| 25 | Lerida | 130 | 15563.262 | 2058.6055 | 13.2273 |
| 27 | Lugo | 173 | 9444.694 | 941.5500 | 9.9691 |
| 28 | Madrid | 944 | 13057.342 | 605.1017 | 4.6342 |
| 29 | Malaga | 379 | 11662.044 | 802.1492 | 6.8783 |
| 52 | Melilla | 180 | 11661.446 | 1370.2542 | 11.7503 |
| 30 | Murcia | 885 | 13259.506 | 638.0530 | 4.8120 |
| 31 | Navarra | 564 | 13735.008 | 783.8567 | 5.7070 |
| 32 | Orense | 129 | 8942.922 | 1320.8213 | 14.7695 |
| 33 | Oviedo | 803 | 11221.799 | 591.8854 | 5.2744 |
| 34 | Palencia | 72 | 9189.167 | 1385.4011 | 15.0765 |
| 35 | PalmasLas | 472 | 16529.191 | 1080.7667 | 6.5385 |
| 36 | Pontevedra | 448 | 15337.141 | 919.5169 | 5.9954 |
| 26 | RiojaLa | 510 | 11836.719 | 780.5658 | 6.5944 |
| 37 | Salamanca | 164 | 12020.292 | 1328.1631 | 11.0493 |
| 39 | Santander | 434 | 9731.450 | 704.8761 | 7.2433 |
| 40 | Segovia | 58 | 8994.437 | 1470.5287 | 16.3493 |
| 41 | Sevilla | 482 | 11117.879 | 617.7463 | 5.5563 |
| 42 | Soria | 20 | 6597.581 | 1753.4391 | 26.5770 |
| 43 | Tarragona | 134 | 6204.736 | 681.4471 | 10.9827 |
| 38 | Tenerife | 381 | 9092.715 | 611.0477 | 6.7202 |
| 44 | Teruel | 72 | 9677.507 | 1855.1802 | 19.1700 |
| 45 | Toledo | 275 | 14107.708 | 1136.2947 | 8.0544 |
| 46 | Valencia | 714 | 11643.482 | 553.7125 | 4.7556 |
| 47 | Valladolid | 299 | 16833.008 | 1206.8993 | 7.1698 |
| 48 | Vizcaya | 524 | 12607.676 | 723.4707 | 5.7383 |
| 49 | Zamora | 104 | 11360.706 | 1432.0437 | 12.6052 |
| 50 | Zaragoza | 564 | 15454.577 | 883.0556 | 5.7139 |
Result 3 — Domain Label provlab, domsize
kolom 1 & 3
Pada bagian ini digunakan asumsi Simple Random Sampling Without Replacement (SRSWOR) dalam setiap domain.
Karena desain sampling dianggap SRS, maka bobot sampling
(sweight) tidak perlu disertakan. Variabel domain tetap
menggunakan provlab, dan ukuran domain
(domsize) tetap diperlukan untuk menghitung estimator dan
varians berbasis SRSWOR.
# ============================================================
# RESULT 3
# domain = provlab (nama), domsize kolom c(1,3) tanpa bobot eksplisit
# ============================================================
result3 <- direct(
y = income,
dom = provlab,
domsize = sizeprov[, c(1, 3)],
data = incomedata
)
knitr::kable(result3, caption = "Result 3 — domain label `provlab`, domsize kolom 1 & 3",
digits = 4)| Domain | SampSize | Direct | SD | CV | |
|---|---|---|---|---|---|
| 1 | Alava | 96 | 10105.287 | 702.3915 | 6.9507 |
| 2 | Albacete | 173 | 13265.970 | 539.9709 | 4.0703 |
| 3 | Alicante | 539 | 12179.521 | 313.9160 | 2.5774 |
| 4 | Almeria | 198 | 12059.517 | 529.6645 | 4.3921 |
| 5 | Avila | 58 | 14019.712 | 963.9689 | 6.8758 |
| 6 | Badajoz | 494 | 12511.970 | 321.8370 | 2.5722 |
| 7 | Baleares | 634 | 14882.657 | 305.4036 | 2.0521 |
| 8 | Barcelona | 1420 | 10807.535 | 177.9112 | 1.6462 |
| 9 | Burgos | 168 | 13186.984 | 608.5386 | 4.6147 |
| 10 | Caceres | 282 | 9896.033 | 354.1848 | 3.5791 |
| 11 | Cadiz | 398 | 14529.255 | 407.9764 | 2.8080 |
| 12 | Castellon | 118 | 10823.324 | 529.4779 | 4.8920 |
| 51 | Ceuta | 235 | 12969.666 | 492.9915 | 3.8011 |
| 13 | CiudadReal | 250 | 12674.666 | 476.8651 | 3.7623 |
| 14 | Cordoba | 224 | 10066.604 | 388.0481 | 3.8548 |
| 15 | CorunaLa | 495 | 11127.150 | 301.6030 | 2.7105 |
| 16 | Cuenca | 92 | 11617.417 | 726.9723 | 6.2576 |
| 17 | Gerona | 142 | 11499.240 | 490.8290 | 4.2684 |
| 18 | Granada | 208 | 10351.829 | 480.7398 | 4.6440 |
| 19 | Guadalajara | 89 | 11779.328 | 718.5038 | 6.0997 |
| 20 | Guipuzcoa | 285 | 11579.721 | 412.5300 | 3.5625 |
| 21 | Huelva | 122 | 14799.152 | 740.0982 | 5.0009 |
| 22 | Huesca | 115 | 10636.029 | 589.6462 | 5.5439 |
| 23 | Jaen | 232 | 11180.785 | 463.7766 | 4.1480 |
| 24 | Leon | 218 | 10760.151 | 387.5303 | 3.6015 |
| 25 | Lerida | 130 | 13651.307 | 702.5331 | 5.1463 |
| 27 | Lugo | 173 | 9711.226 | 408.5270 | 4.2067 |
| 28 | Madrid | 944 | 13082.632 | 250.4293 | 1.9142 |
| 29 | Malaga | 379 | 11864.375 | 364.1834 | 3.0696 |
| 52 | Melilla | 180 | 12336.983 | 546.3280 | 4.4284 |
| 30 | Murcia | 885 | 13051.420 | 251.9265 | 1.9303 |
| 31 | Navarra | 564 | 13747.753 | 353.4242 | 2.5708 |
| 32 | Orense | 129 | 10705.264 | 640.9692 | 5.9874 |
| 33 | Oviedo | 803 | 11022.379 | 240.5450 | 2.1823 |
| 34 | Palencia | 72 | 11074.678 | 713.4901 | 6.4425 |
| 35 | PalmasLas | 472 | 14373.607 | 382.8277 | 2.6634 |
| 36 | Pontevedra | 448 | 14104.526 | 383.2660 | 2.7173 |
| 26 | RiojaLa | 510 | 11501.371 | 322.5749 | 2.8047 |
| 37 | Salamanca | 164 | 13143.700 | 657.6108 | 5.0032 |
| 39 | Santander | 434 | 9587.014 | 310.8231 | 3.2421 |
| 40 | Segovia | 58 | 10528.493 | 817.9002 | 7.7684 |
| 41 | Sevilla | 482 | 11751.782 | 290.6764 | 2.4735 |
| 42 | Soria | 20 | 13250.332 | 1150.4417 | 8.6824 |
| 43 | Tarragona | 134 | 8500.724 | 462.8108 | 5.4444 |
| 38 | Tenerife | 381 | 11426.812 | 333.3318 | 2.9171 |
| 44 | Teruel | 72 | 10639.357 | 937.4045 | 8.8107 |
| 45 | Toledo | 275 | 13885.110 | 457.9127 | 3.2979 |
| 46 | Valencia | 714 | 11812.881 | 255.4532 | 2.1625 |
| 47 | Valladolid | 299 | 13652.277 | 431.5035 | 3.1607 |
| 48 | Vizcaya | 524 | 12098.812 | 321.0908 | 2.6539 |
| 49 | Zamora | 104 | 10996.844 | 649.9185 | 5.9100 |
| 50 | Zaragoza | 564 | 14940.941 | 336.9009 | 2.2549 |
Result 4 — Tanpa domsize, Sampling dengan
Pengembalian
Pada bagian ini digunakan asumsi Simple Random Sampling With Replacement (SRSWR).
Dalam kondisi ini, ukuran populasi (domsize) tidak
diperlukan karena varians dihitung berdasarkan asumsi sampling dengan
pengembalian. Argumen `replace = TRUE digunakan untuk mengaktifkan
pendekatan SRSWR.
# ============================================================
# RESULT 4
# Tanpa domsize, menggunakan replace = TRUE (SRSWR)
# ============================================================
result4 <- direct(
y = income,
dom = provlab,
data = incomedata,
replace = TRUE
)
knitr::kable(result4, caption = "Result 4 — tanpa `domsize`, `replace = TRUE` (SRSWR)",
digits = 4)| Domain | SampSize | Direct | SD | CV | |
|---|---|---|---|---|---|
| 1 | Alava | 96 | 10105.287 | 702.5052 | 6.9519 |
| 2 | Albacete | 173 | 13265.970 | 540.0934 | 4.0713 |
| 3 | Alicante | 539 | 12179.521 | 313.9653 | 2.5778 |
| 4 | Almeria | 198 | 12059.517 | 529.7494 | 4.3928 |
| 5 | Avila | 58 | 14019.712 | 964.1403 | 6.8770 |
| 6 | Badajoz | 494 | 12511.970 | 321.9575 | 2.5732 |
| 7 | Baleares | 634 | 14882.657 | 305.5020 | 2.0527 |
| 8 | Barcelona | 1420 | 10807.535 | 177.9358 | 1.6464 |
| 9 | Burgos | 168 | 13186.984 | 608.6834 | 4.6158 |
| 10 | Caceres | 282 | 9896.033 | 354.3085 | 3.5803 |
| 11 | Cadiz | 398 | 14529.255 | 408.0460 | 2.8084 |
| 12 | Castellon | 118 | 10823.324 | 529.5355 | 4.8925 |
| 51 | Ceuta | 235 | 12969.666 | 493.8096 | 3.8074 |
| 13 | CiudadReal | 250 | 12674.666 | 476.9855 | 3.7633 |
| 14 | Cordoba | 224 | 10066.604 | 388.1045 | 3.8554 |
| 15 | CorunaLa | 495 | 11127.150 | 301.6704 | 2.7111 |
| 16 | Cuenca | 92 | 11617.417 | 727.1344 | 6.2590 |
| 17 | Gerona | 142 | 11499.240 | 490.8820 | 4.2688 |
| 18 | Granada | 208 | 10351.829 | 480.7975 | 4.6446 |
| 19 | Guadalajara | 89 | 11779.328 | 718.6593 | 6.1010 |
| 20 | Guipuzcoa | 285 | 11579.721 | 412.6168 | 3.5633 |
| 21 | Huelva | 122 | 14799.152 | 740.1922 | 5.0016 |
| 22 | Huesca | 115 | 10636.029 | 589.8053 | 5.5454 |
| 23 | Jaen | 232 | 11180.785 | 463.8598 | 4.1487 |
| 24 | Leon | 218 | 10760.151 | 387.6185 | 3.6024 |
| 25 | Lerida | 130 | 13651.307 | 702.6477 | 5.1471 |
| 27 | Lugo | 173 | 9711.226 | 408.6289 | 4.2078 |
| 28 | Madrid | 944 | 13082.632 | 250.4493 | 1.9144 |
| 29 | Malaga | 379 | 11864.375 | 364.2313 | 3.0700 |
| 52 | Melilla | 180 | 12336.983 | 547.0822 | 4.4345 |
| 30 | Murcia | 885 | 13051.420 | 252.0101 | 1.9309 |
| 31 | Navarra | 564 | 13747.753 | 353.5958 | 2.5720 |
| 32 | Orense | 129 | 10705.264 | 641.0954 | 5.9886 |
| 33 | Oviedo | 803 | 11022.379 | 240.6369 | 2.1832 |
| 34 | Palencia | 72 | 11074.678 | 713.6430 | 6.4439 |
| 35 | PalmasLas | 472 | 14373.607 | 382.9173 | 2.6640 |
| 36 | Pontevedra | 448 | 14104.526 | 383.3590 | 2.7180 |
| 26 | RiojaLa | 510 | 11501.371 | 322.8506 | 2.8071 |
| 37 | Salamanca | 164 | 13143.700 | 657.7684 | 5.0044 |
| 39 | Santander | 434 | 9587.014 | 310.9450 | 3.2434 |
| 40 | Segovia | 58 | 10528.493 | 818.0547 | 7.7699 |
| 41 | Sevilla | 482 | 11751.782 | 290.7156 | 2.4738 |
| 42 | Soria | 20 | 13250.332 | 1150.5694 | 8.6833 |
| 43 | Tarragona | 134 | 8500.724 | 462.8547 | 5.4449 |
| 38 | Tenerife | 381 | 11426.812 | 333.3992 | 2.9177 |
| 44 | Teruel | 72 | 10639.357 | 937.6474 | 8.8130 |
| 45 | Toledo | 275 | 13885.110 | 458.0186 | 3.2986 |
| 46 | Valencia | 714 | 11812.881 | 255.4915 | 2.1628 |
| 47 | Valladolid | 299 | 13652.277 | 431.6310 | 3.1616 |
| 48 | Vizcaya | 524 | 12098.812 | 321.1658 | 2.6545 |
| 49 | Zamora | 104 | 10996.844 | 650.0931 | 5.9116 |
| 50 | Zaragoza | 564 | 14940.941 | 337.0070 | 2.2556 |
Tugas Slide 12 — Estimasi TPT Manual: 4 Metode
Tujuan: Menghitung Tingkat Pengangguran Terbuka (TPT) secara manual menggunakan 4 metode berbeda, serta membandingkan hasil dan implikasinya.
Data Input
Data berisi 15 unit sampel kecamatan dengan status angkatan kerja
(y_ij) dan bobot HT (w_ij).
# ============================================================
# SLIDE 12 — Latihan Estimasi TPT dengan HT Estimator
# Data: 15 unit sampel kecamatan i
# ============================================================
data_tpt <- data.frame(
j = 1:15,
y_ij = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0),
w_ij = c(665, 808, 1234, 605, 480, 586, 582, 507, 605,
684, 742, 605, 586, 808, 507)
)
data_tpt$w_x_y <- data_tpt$w_ij * data_tpt$y_ij
knitr::kable(
data_tpt,
caption = "Data Input: 15 Unit Sampel Kecamatan",
col.names = c("j", "y_ij (Status AK)", "w_ij (Bobot HT)", "(w_ij)(y_ij)"),
align = c("c","c","r","r")
)| j | y_ij (Status AK) | w_ij (Bobot HT) | (w_ij)(y_ij) |
|---|---|---|---|
| 1 | 0 | 665 | 0 |
| 2 | 0 | 808 | 0 |
| 3 | 0 | 1234 | 0 |
| 4 | 0 | 605 | 0 |
| 5 | 0 | 480 | 0 |
| 6 | 0 | 586 | 0 |
| 7 | 1 | 582 | 582 |
| 8 | 0 | 507 | 0 |
| 9 | 0 | 605 | 0 |
| 10 | 1 | 684 | 684 |
| 11 | 0 | 742 | 0 |
| 12 | 0 | 605 | 0 |
| 13 | 0 | 586 | 0 |
| 14 | 1 | 808 | 808 |
| 15 | 0 | 507 | 0 |
Perbandingan 4 Metode Estimasi TPT
Bagian ini membandingkan empat pendekatan: SRS versi PPT, SRS benar, HT Total, dan HT Delta Method — untuk data 15 unit sampel yang sama.
Persiapan Data dan Helper
# ================================================================
# SLIDE 12 — ESTIMASI TPT: 4 METODE LENGKAP
# ================================================================
data_tpt <- data.frame(
j = 1:15,
y_ij = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0),
w_ij = c(665, 808, 1234, 605, 480, 586, 582, 507, 605,
684, 742, 605, 586, 808, 507)
)
n <- nrow(data_tpt)
sum_w <- sum(data_tpt$w_ij)
sum_wy <- sum(data_tpt$w_ij * data_tpt$y_ij)
p_srs <- sum(data_tpt$y_ij) / n
p_hat <- sum_wy / sum_w
data_tpt$pi_i <- 1 / data_tpt$w_ij
LINE <- paste0(strrep("=", 68), "\n")
line <- paste0(strrep("-", 68), "\n")
sec <- function(x) cat(sprintf("\n [%s]\n", x))
cat(LINE); cat(" DATA INPUT\n"); cat(LINE)
cat(sprintf(" n (ukuran sampel) = %d\n", n))
cat(sprintf(" Pengangguran (y=1) = %d (unit 7, 10, 14)\n", sum(data_tpt$y_ij)))
cat(sprintf(" Σ wi (pop AK) = %d\n", sum_w))
cat(sprintf(" Σ wi×yi (pengangguran) = %d\n\n", sum_wy))
cat(sprintf(" %-4s %-8s %-6s %-10s\n", "j", "Status AK", "w_ij", "(wij)×(yij)"))
cat(line)
for (i in 1:n) {
status <- ifelse(data_tpt$y_ij[i] == 1, "Nganggur", "Bekerja")
cat(sprintf(" %-4d %-8s %-6d %-10d\n",
i, status, data_tpt$w_ij[i], data_tpt$w_ij[i] * data_tpt$y_ij[i]))
}
cat(line)
cat(sprintf(" %-4s %-8s %-6d %-10d\n\n", "TOT", "", sum_w, sum_wy))====================================================================
DATA INPUT
====================================================================
n (ukuran sampel) = 15
Pengangguran (y=1) = 3 (unit 7, 10, 14)
Σ wi (pop AK) = 10004
Σ wi×yi (pengangguran) = 2074
j Status AK w_ij (wij)×(yij)
--------------------------------------------------------------------
1 Bekerja 665 0
2 Bekerja 808 0
3 Bekerja 1234 0
4 Bekerja 605 0
5 Bekerja 480 0
6 Bekerja 586 0
7 Nganggur 582 582
8 Bekerja 507 0
9 Bekerja 605 0
10 Nganggur 684 684
11 Bekerja 742 0
12 Bekerja 605 0
13 Bekerja 586 0
14 Nganggur 808 808
15 Bekerja 507 0
--------------------------------------------------------------------
TOT 10004 2074
Metode 1 — SRS Versi PPT
cat(LINE); cat(" METODE 1: SRS — VERSI PPT\n"); cat(LINE)
sec("1A Estimasi TPT — persis seperti di slide")
cat(line)
cat(sprintf(" Populasi AK = %d\n\n", sum_w))
cat(" Rumus slide:\n")
cat(" Σ (wij × yij)\n")
cat(" TPT = ────────────── × 100%\n")
cat(" Σ (wij)\n\n")
cat(sprintf(" %d\n", sum_wy))
cat(" TPT = ─────── × 100%\n")
cat(sprintf(" %d\n\n", sum_w))
cat(sprintf(" = %.1f%%\n", p_hat * 100))
sec("1B Varians — persis seperti di slide")
cat(line)
vp_m1 <- p_hat * (1 - p_hat) / (n - 1)
cat(sprintf(" v(p) = p(1-p) / (n-1)\n"))
cat(sprintf(" = %.4f × (1 - %.4f) / (%d - 1)\n", p_hat, p_hat, n))
cat(sprintf(" = %.4f × %.4f / %d\n", p_hat, 1 - p_hat, n - 1))
cat(sprintf(" = %.6f\n", vp_m1))
sec("1C SE & RSE — persis seperti di slide")
cat(line)
SE_m1 <- 0.00587
RSE_m1 <- (SE_m1 / p_hat) * 100
cat(sprintf(" SE (slide) = 0.00587\n"))
cat(sprintf(" RSE (slide) = (0.00587 / %.4f) × 100 = %.2f%%\n\n", p_hat, RSE_m1))
cat(sprintf(" ► Metode 1: TPT = %.1f%% | SE = %.5f | RSE = %.2f%%\n\n",
p_hat * 100, SE_m1, RSE_m1))====================================================================
METODE 1: SRS — VERSI PPT
====================================================================
[1A Estimasi TPT — persis seperti di slide]
--------------------------------------------------------------------
Populasi AK = 10004
Rumus slide:
Σ (wij × yij)
TPT = ────────────── × 100%
Σ (wij)
2074
TPT = ─────── × 100%
10004
= 20.7%
[1B Varians — persis seperti di slide]
--------------------------------------------------------------------
v(p) = p(1-p) / (n-1)
= 0.2073 × (1 - 0.2073) / (15 - 1)
= 0.2073 × 0.7927 / 14
= 0.011738
[1C SE & RSE — persis seperti di slide]
--------------------------------------------------------------------
SE (slide) = 0.00587
RSE (slide) = (0.00587 / 0.2073) × 100 = 2.83%
► Metode 1: TPT = 20.7% | SE = 0.00587 | RSE = 2.83%
Metode 2 — SRS yang Benar
cat(LINE); cat(" METODE 2: SRS — SEHARUSNYA (HITUNG BENAR)\n"); cat(LINE)
sec("2A Estimasi TPT — rumus SRS murni tanpa bobot")
cat(line)
cat(sprintf(" TPT = Σyi / n = %d / %d × 100%%\n", sum(data_tpt$y_ij), n))
cat(sprintf(" = %.4f%%\n", p_srs * 100))
sec("2B Varians")
cat(line)
vp_m2 <- p_srs * (1 - p_srs) / (n - 1)
cat(sprintf(" v(p) = p(1-p) / (n-1)\n"))
cat(sprintf(" = %.4f × (1 - %.4f) / (%d - 1)\n", p_srs, p_srs, n))
cat(sprintf(" = %.4f × %.4f / %d\n", p_srs, 1 - p_srs, n - 1))
cat(sprintf(" = %.6f\n", vp_m2))
sec("2C SE — koreksi kesalahan slide")
cat(line)
SE_m2 <- sqrt(vp_m2)
RSE_m2 <- (SE_m2 / p_srs) * 100
cat(sprintf(" SE = √v(p) = √%.6f = %.6f\n\n", vp_m2, SE_m2))
cat(" ⚠ Kesalahan di slide:\n")
cat(sprintf(" √0.012 yang benar = %.6f\n", sqrt(vp_m2)))
cat(sprintf(" Slide tulis 0.00587, padahal 0.00587² = %.7f ≠ %.4f\n",
0.00587^2, vp_m2))
sec("2D RSE")
cat(line)
cat(sprintf(" RSE = (SE / p) × 100\n"))
cat(sprintf(" = (%.6f / %.4f) × 100\n", SE_m2, p_srs))
cat(sprintf(" = %.4f%%\n\n", RSE_m2))
cat(sprintf(" ► Metode 2: TPT = %.4f%% | SE = %.6f | RSE = %.4f%%\n\n",
p_srs * 100, SE_m2, RSE_m2))====================================================================
METODE 2: SRS — SEHARUSNYA (HITUNG BENAR)
====================================================================
[2A Estimasi TPT — rumus SRS murni tanpa bobot]
--------------------------------------------------------------------
TPT = Σyi / n = 3 / 15 × 100%
= 20.0000%
[2B Varians]
--------------------------------------------------------------------
v(p) = p(1-p) / (n-1)
= 0.2000 × (1 - 0.2000) / (15 - 1)
= 0.2000 × 0.8000 / 14
= 0.011429
[2C SE — koreksi kesalahan slide]
--------------------------------------------------------------------
SE = √v(p) = √0.011429 = 0.106904
⚠ Kesalahan di slide:
√0.012 yang benar = 0.106904
Slide tulis 0.00587, padahal 0.00587² = 0.0000345 ≠ 0.0114
[2D RSE]
--------------------------------------------------------------------
RSE = (SE / p) × 100
= (0.106904 / 0.2000) × 100
= 53.4522%
► Metode 2: TPT = 20.0000% | SE = 0.106904 | RSE = 53.4522%
Metode 3 — HT Estimasi Total (Dibedah Lengkap)
cat(LINE); cat(" METODE 3: HT — ESTIMASI TOTAL (DIBEDAH LENGKAP)\n"); cat(LINE)
sec("3A Estimasi TPT — Ratio HT Estimator")
cat(line)
cat(sprintf(" Ŷ_HT = Σ(wi × yi) = %d\n", sum_wy))
cat(sprintf(" X̂_HT = Σ(wi) = %d\n", sum_w))
cat(sprintf(" TPT = %d / %d × 100%% = %.4f%%\n\n", sum_wy, sum_w, p_hat * 100))
sec("3B Varians — SUKU-1: Σ(wi² - wi) × yi²")
cat(line)
data_tpt$koef1 <- data_tpt$w_ij^2 - data_tpt$w_ij
data_tpt$term1_i <- data_tpt$koef1 * data_tpt$y_ij^2
cat(sprintf(" %-4s %-5s %-6s %-12s %-12s %-5s %-12s\n",
"j","y_ij","wi","wi²","wi²-wi","yi²","kontribusi"))
cat(sprintf(" %s\n", strrep("-", 62)))
for (i in 1:n) {
cat(sprintf(" %-4d %-5d %-6d %-12s %-12s %-5d %-12.2f\n",
data_tpt$j[i], data_tpt$y_ij[i], data_tpt$w_ij[i],
format(data_tpt$w_ij[i]^2, big.mark=","),
format(data_tpt$koef1[i], big.mark=","),
data_tpt$y_ij[i]^2, data_tpt$term1_i[i]))
}
SUKU1 <- sum(data_tpt$term1_i)
cat(sprintf(" %s\n", strrep("-", 62)))
cat(sprintf(" %-38s %-12s\n", "SUKU-1 = Σ kontribusi",
format(round(SUKU1), big.mark=",")))
cat("\n Detail unit y=1 (satu-satunya yang berkontribusi):\n")
for (i in which(data_tpt$y_ij == 1)) {
w <- data_tpt$w_ij[i]
cat(sprintf(" Unit %-2d (w=%d): %s - %d = %s\n",
data_tpt$j[i], w,
format(w^2, big.mark=","), w,
format(w^2-w, big.mark=",")))
}
sec("3C Varians — SUKU-2: 2 × ΣΣ(wi×wj - 1/πij) × yi × yj")
cat(line)
cat(" Hanya pasangan (i,j) di mana yi=1 DAN yj=1 yang aktif\n")
cat(" → Pasangan aktif: (7,10), (7,14), (10,14)\n\n")
N_est <- sum_w
pi_ij <- (n * (n-1)) / (N_est * (N_est-1))
cat(sprintf(" πij (SRSWOR) = n(n-1) / N(N-1)\n"))
cat(sprintf(" = %d×%d / %d×%d\n", n, n-1, N_est, N_est-1))
cat(sprintf(" = %d / %s\n", n*(n-1), format(N_est*(N_est-1), big.mark=",")))
cat(sprintf(" = %.10f\n\n", pi_ij))
cat(sprintf(" %-12s %-6s %-6s %-12s %-14s %-14s %-10s\n",
"Pasangan","wi","wj","wi×wj","1/πij","wi×wj-1/πij","2×kontrib"))
cat(sprintf(" %s\n", strrep("-", 78)))
unit_y1 <- which(data_tpt$y_ij == 1)
SUKU2_raw <- 0
for (a in 1:(length(unit_y1)-1)) {
for (b in (a+1):length(unit_y1)) {
i <- unit_y1[a]; j <- unit_y1[b]
wi <- data_tpt$w_ij[i]; wj <- data_tpt$w_ij[j]
wiwj <- wi * wj
inv_pi_ij <- 1 / pi_ij
koef <- wiwj - inv_pi_ij
kontrib <- koef * data_tpt$y_ij[i] * data_tpt$y_ij[j]
SUKU2_raw <- SUKU2_raw + kontrib
cat(sprintf(" (%-2d, %-2d) %-6d %-6d %-12s %-14s %-14s %-10.2f\n",
data_tpt$j[i], data_tpt$j[j], wi, wj,
format(round(wiwj), big.mark=","),
format(round(inv_pi_ij), big.mark=","),
format(round(koef), big.mark=","),
2 * kontrib))
}
}
SUKU2 <- 2 * SUKU2_raw
cat(sprintf(" %s\n", strrep("-", 78)))
cat(sprintf(" SUKU-2 = 2 × %.2f = %.2f\n", SUKU2_raw, SUKU2))
sec("3D Total v(Ŷ_HT) → v(TPT) → SE → RSE")
cat(line)
v_HT_total <- SUKU1 + SUKU2
var_tpt_3 <- v_HT_total / sum_w^2
SE_m3 <- sqrt(var_tpt_3) * 100
RSE_m3 <- (SE_m3 / (p_hat * 100)) * 100
cat(sprintf(" v(Ŷ_HT) = SUKU-1 + SUKU-2\n"))
cat(sprintf(" = %s + (%s)\n",
format(round(SUKU1), big.mark=","),
format(round(SUKU2), big.mark=",")))
cat(sprintf(" = %s\n\n", format(round(v_HT_total), big.mark=",")))
cat(sprintf(" v(TPT) = v(Ŷ_HT) / (Σwi)²\n"))
cat(sprintf(" = %s / %s\n",
format(round(v_HT_total), big.mark=","),
format(sum_w^2, big.mark=",")))
cat(sprintf(" = %.6f\n\n", var_tpt_3))
cat(sprintf(" SE = √%.6f × 100 = %.4f%%\n", var_tpt_3, SE_m3))
cat(sprintf(" RSE = (%.4f / %.4f) × 100 = %.4f%%\n\n", SE_m3, p_hat*100, RSE_m3))
cat(sprintf(" ► Metode 3: TPT = %.4f%% | SE = %.4f%% | RSE = %.4f%%\n\n",
p_hat*100, SE_m3, RSE_m3))====================================================================
METODE 3: HT — ESTIMASI TOTAL (DIBEDAH LENGKAP)
====================================================================
[3A Estimasi TPT — Ratio HT Estimator]
--------------------------------------------------------------------
Ŷ_HT = Σ(wi × yi) = 2074
X̂_HT = Σ(wi) = 10004
TPT = 2074 / 10004 × 100% = 20.7317%
[3B Varians — SUKU-1: Σ(wi² - wi) × yi²]
--------------------------------------------------------------------
j y_ij wi wi² wi²-wi yi² kontribusi
--------------------------------------------------------------
1 0 665 442,225 441,560 0 0.00
2 0 808 652,864 652,056 0 0.00
3 0 1234 1,522,756 1,521,522 0 0.00
4 0 605 366,025 365,420 0 0.00
5 0 480 230,400 229,920 0 0.00
6 0 586 343,396 342,810 0 0.00
7 1 582 338,724 338,142 1 338142.00
8 0 507 257,049 256,542 0 0.00
9 0 605 366,025 365,420 0 0.00
10 1 684 467,856 467,172 1 467172.00
11 0 742 550,564 549,822 0 0.00
12 0 605 366,025 365,420 0 0.00
13 0 586 343,396 342,810 0 0.00
14 1 808 652,864 652,056 1 652056.00
15 0 507 257,049 256,542 0 0.00
--------------------------------------------------------------
SUKU-1 = Σ kontribusi 1,457,370
Detail unit y=1 (satu-satunya yang berkontribusi):
Unit 7 (w=582): 338,724 - 582 = 338,142
Unit 10 (w=684): 467,856 - 684 = 467,172
Unit 14 (w=808): 652,864 - 808 = 652,056
[3C Varians — SUKU-2: 2 × ΣΣ(wi×wj - 1/πij) × yi × yj]
--------------------------------------------------------------------
Hanya pasangan (i,j) di mana yi=1 DAN yj=1 yang aktif
→ Pasangan aktif: (7,10), (7,14), (10,14)
πij (SRSWOR) = n(n-1) / N(N-1)
= 15×14 / 10004×10003
= 210 / 100,070,012
= 0.0000020985
Pasangan wi wj wi×wj 1/πij wi×wj-1/πij 2×kontrib
------------------------------------------------------------------------------
(7 , 10) 582 684 398,088 476,524 -78,436 -156871.73
(7 , 14) 582 808 470,256 476,524 -6,268 -12535.73
(10, 14) 684 808 552,672 476,524 76,148 152296.27
------------------------------------------------------------------------------
SUKU-2 = 2 × -8555.60 = -17111.20
[3D Total v(Ŷ_HT) → v(TPT) → SE → RSE]
--------------------------------------------------------------------
v(Ŷ_HT) = SUKU-1 + SUKU-2
= 1,457,370 + (-17,111)
= 1,440,259
v(TPT) = v(Ŷ_HT) / (Σwi)²
= 1,440,259 / 100,080,016
= 0.014391
SE = √0.014391 × 100 = 11.9963%
RSE = (11.9963 / 20.7317) × 100 = 57.8644%
► Metode 3: TPT = 20.7317% | SE = 11.9963% | RSE = 57.8644%
Metode 4 — HT Delta Method (Ratio Estimator)
cat(LINE); cat(" METODE 4: HT — DELTA METHOD (RATIO ESTIMATOR)\n"); cat(LINE)
sec("4A Estimasi TPT")
cat(line)
cat(sprintf(" TPT = %.4f%% (sama dengan Metode 3)\n", p_hat * 100))
sec("4B Kenapa Metode 3 over-estimasi varians?")
cat(line)
cat(" Metode 3 → v(Ŷ_HT) untuk estimasi TOTAL → yi² pakai nilai asli\n")
cat(" TPT adalah RASIO: TPT = Ŷ/X̂ → harus pakai Delta Method\n\n")
cat(" Kunci: ganti yi → ei = yi - p_hat sebelum hitung varians\n")
cat(sprintf(" Karena semua unit adalah AK (xi=1): ei = yi - %.4f\n", p_hat))
sec("4C Varians SUKU-1: Σ(wi² - wi) × ei²")
cat(line)
data_tpt$e_i <- data_tpt$y_ij - p_hat
data_tpt$c_delt <- (data_tpt$w_ij^2 - data_tpt$w_ij) * data_tpt$e_i^2
cat(sprintf(" e_i jika y=1: 1 - %.4f = %.4f\n", p_hat, 1 - p_hat))
cat(sprintf(" e_i jika y=0: 0 - %.4f = %.4f\n\n", p_hat, -p_hat))
cat(sprintf(" %-4s %-5s %-6s %-8s %-12s %-8s %-12s\n",
"j","y_ij","wi","e_i","wi²-wi","ei²","kontribusi"))
cat(sprintf(" %s\n", strrep("-", 62)))
for (i in 1:n) {
cat(sprintf(" %-4d %-5d %-6d %-8.4f %-12s %-8.4f %-12.2f\n",
data_tpt$j[i], data_tpt$y_ij[i], data_tpt$w_ij[i],
data_tpt$e_i[i],
format(data_tpt$w_ij[i]^2 - data_tpt$w_ij[i], big.mark=","),
data_tpt$e_i[i]^2, data_tpt$c_delt[i]))
}
v_d1 <- sum(data_tpt$c_delt)
cat(sprintf(" %s\n", strrep("-", 62)))
cat(sprintf(" %-38s %-12s\n", "SUKU-1(delta) = Σ kontribusi",
format(round(v_d1), big.mark=",")))
sec("4D Varians SUKU-2: 2 × ΣΣ(wi×wj - 1/πij) × ei × ej")
cat(line)
cat(" Semua pasangan aktif karena ei ≠ 0 untuk semua unit\n\n")
cat(sprintf(" %-10s %-6s %-6s %-8s %-8s %-14s %-8s %-10s\n",
"Pasangan","wi","wj","ei","ej","wi×wj-1/πij","ei×ej","2×kontrib"))
cat(sprintf(" %s\n", strrep("-", 82)))
d2_raw <- 0; shown <- 0
for (i in 1:(n-1)) {
for (j in (i+1):n) {
wi <- data_tpt$w_ij[i]; wj <- data_tpt$w_ij[j]
ei <- data_tpt$e_i[i]; ej <- data_tpt$e_i[j]
koef <- wi*wj - 1/pi_ij
kontrib <- koef * ei * ej
d2_raw <- d2_raw + kontrib
shown <- shown + 1
if (shown <= 10 || data_tpt$y_ij[i]==1 || data_tpt$y_ij[j]==1) {
cat(sprintf(" (%-2d,%-2d) %-6d %-6d %-8.4f %-8.4f %-14s %-8.4f %-10.2f\n",
data_tpt$j[i], data_tpt$j[j], wi, wj, ei, ej,
format(round(koef), big.mark=","), ei*ej, 2*kontrib))
} else if (shown == 11) {
cat(sprintf(" ... (%d pasangan lainnya dihitung tapi tidak ditampilkan)\n",
n*(n-1)/2 - 10))
}
}
}
d2 <- 2 * d2_raw
cat(sprintf(" %s\n", strrep("-", 82)))
cat(sprintf(" Total %d pasangan | SUKU-2(delta) = 2 × %.2f = %.2f\n",
n*(n-1)/2, d2_raw, d2))
sec("4E Total v(TPT) → SE → RSE")
cat(line)
v_d_tot <- v_d1 + d2
var_tpt4 <- v_d_tot / sum_w^2
SE_m4 <- sqrt(var_tpt4) * 100
RSE_m4 <- (SE_m4 / (p_hat*100)) * 100
cat(sprintf(" v_HT(ê) = SUKU-1 + SUKU-2\n"))
cat(sprintf(" = %s + (%s)\n",
format(round(v_d1), big.mark=","),
format(round(d2), big.mark=",")))
cat(sprintf(" = %s\n\n", format(round(v_d_tot), big.mark=",")))
cat(sprintf(" v(TPT) = %s / %s\n",
format(round(v_d_tot), big.mark=","),
format(sum_w^2, big.mark=",")))
cat(sprintf(" = %.6f\n\n", var_tpt4))
cat(sprintf(" SE = √%.6f × 100 = %.4f%%\n", var_tpt4, SE_m4))
cat(sprintf(" RSE = (%.4f / %.4f) × 100 = %.4f%%\n\n", SE_m4, p_hat*100, RSE_m4))
cat(sprintf(" ► Metode 4: TPT = %.4f%% | SE = %.4f%% | RSE = %.4f%%\n\n",
p_hat*100, SE_m4, RSE_m4))====================================================================
METODE 4: HT — DELTA METHOD (RATIO ESTIMATOR)
====================================================================
[4A Estimasi TPT]
--------------------------------------------------------------------
TPT = 20.7317% (sama dengan Metode 3)
[4B Kenapa Metode 3 over-estimasi varians?]
--------------------------------------------------------------------
Metode 3 → v(Ŷ_HT) untuk estimasi TOTAL → yi² pakai nilai asli
TPT adalah RASIO: TPT = Ŷ/X̂ → harus pakai Delta Method
Kunci: ganti yi → ei = yi - p_hat sebelum hitung varians
Karena semua unit adalah AK (xi=1): ei = yi - 0.2073
[4C Varians SUKU-1: Σ(wi² - wi) × ei²]
--------------------------------------------------------------------
e_i jika y=1: 1 - 0.2073 = 0.7927
e_i jika y=0: 0 - 0.2073 = -0.2073
j y_ij wi e_i wi²-wi ei² kontribusi
--------------------------------------------------------------
1 0 665 -0.2073 441,560 0.0430 18978.41
2 0 808 -0.2073 652,056 0.0430 28025.61
3 0 1234 -0.2073 1,521,522 0.0430 65395.58
4 0 605 -0.2073 365,420 0.0430 15705.89
5 0 480 -0.2073 229,920 0.0430 9882.05
6 0 586 -0.2073 342,810 0.0430 14734.10
7 1 582 0.7927 338,142 0.6283 212470.25
8 0 507 -0.2073 256,542 0.0430 11026.27
9 0 605 -0.2073 365,420 0.0430 15705.89
10 1 684 0.7927 467,172 0.6283 293545.76
11 0 742 -0.2073 549,822 0.0430 23631.55
12 0 605 -0.2073 365,420 0.0430 15705.89
13 0 586 -0.2073 342,810 0.0430 14734.10
14 1 808 0.7927 652,056 0.6283 409716.92
15 0 507 -0.2073 256,542 0.0430 11026.27
--------------------------------------------------------------
SUKU-1(delta) = Σ kontribusi 1,160,285
[4D Varians SUKU-2: 2 × ΣΣ(wi×wj - 1/πij) × ei × ej]
--------------------------------------------------------------------
Semua pasangan aktif karena ei ≠ 0 untuk semua unit
Pasangan wi wj ei ej wi×wj-1/πij ei×ej 2×kontrib
----------------------------------------------------------------------------------
(1 ,2 ) 665 808 -0.2073 -0.2073 60,796 0.0430 5226.08
(1 ,3 ) 665 1234 -0.2073 -0.2073 344,086 0.0430 29577.90
(1 ,4 ) 665 605 -0.2073 -0.2073 -74,199 0.0430 -6378.19
(1 ,5 ) 665 480 -0.2073 -0.2073 -157,324 0.0430 -13523.68
(1 ,6 ) 665 586 -0.2073 -0.2073 -86,834 0.0430 -7464.30
(1 ,7 ) 665 582 -0.2073 0.7927 -89,494 -0.1643 29414.25
(1 ,8 ) 665 507 -0.2073 -0.2073 -139,369 0.0430 -11980.25
(1 ,9 ) 665 605 -0.2073 -0.2073 -74,199 0.0430 -6378.19
(1 ,10) 665 684 -0.2073 0.7927 -21,664 -0.1643 7120.34
(1 ,11) 665 742 -0.2073 -0.2073 16,906 0.0430 1453.26
... (95 pasangan lainnya dihitung tapi tidak ditampilkan)
(1 ,14) 665 808 -0.2073 0.7927 60,796 -0.1643 -19982.07
(2 ,7 ) 808 582 -0.2073 0.7927 -6,268 -0.1643 2060.08
(2 ,10) 808 684 -0.2073 0.7927 76,148 -0.1643 -25027.87
(2 ,14) 808 808 -0.2073 0.7927 176,340 -0.1643 -57958.31
(3 ,7 ) 1234 582 -0.2073 0.7927 241,664 -0.1643 -79428.57
(3 ,10) 1234 684 -0.2073 0.7927 367,532 -0.1643 -120798.04
(3 ,14) 1234 808 -0.2073 0.7927 520,548 -0.1643 -171090.33
(4 ,7 ) 605 582 -0.2073 0.7927 -124,414 -0.1643 40891.53
(4 ,10) 605 684 -0.2073 0.7927 -62,704 -0.1643 20609.09
(4 ,14) 605 808 -0.2073 0.7927 12,316 -0.1643 -4047.99
(5 ,7 ) 480 582 -0.2073 0.7927 -197,164 -0.1643 64802.52
(5 ,10) 480 684 -0.2073 0.7927 -148,204 -0.1643 48710.67
(5 ,14) 480 808 -0.2073 0.7927 -88,684 -0.1643 29148.03
(6 ,7 ) 586 582 -0.2073 0.7927 -135,472 -0.1643 44526.00
(6 ,10) 586 684 -0.2073 0.7927 -75,700 -0.1643 24880.53
(6 ,14) 586 808 -0.2073 0.7927 -3,036 -0.1643 997.81
(7 ,8 ) 582 507 0.7927 -0.2073 -181,450 -0.1643 59637.75
(7 ,9 ) 582 605 0.7927 -0.2073 -124,414 -0.1643 40891.53
(7 ,10) 582 684 0.7927 0.7927 -78,436 0.6283 -98569.76
(7 ,11) 582 742 0.7927 -0.2073 -44,680 -0.1643 14685.08
(7 ,12) 582 605 0.7927 -0.2073 -124,414 -0.1643 40891.53
(7 ,13) 582 586 0.7927 -0.2073 -135,472 -0.1643 44526.00
(7 ,14) 582 808 0.7927 0.7927 -6,268 0.6283 -7876.78
(7 ,15) 582 507 0.7927 -0.2073 -181,450 -0.1643 59637.75
(8 ,10) 507 684 -0.2073 0.7927 -129,736 -0.1643 42640.73
(8 ,14) 507 808 -0.2073 0.7927 -66,868 -0.1643 21977.69
(9 ,10) 605 684 -0.2073 0.7927 -62,704 -0.1643 20609.09
(9 ,14) 605 808 -0.2073 0.7927 12,316 -0.1643 -4047.99
(10,11) 684 742 0.7927 -0.2073 31,004 -0.1643 -10190.23
(10,12) 684 605 0.7927 -0.2073 -62,704 -0.1643 20609.09
(10,13) 684 586 0.7927 -0.2073 -75,700 -0.1643 24880.53
(10,14) 684 808 0.7927 0.7927 76,148 0.6283 95694.78
(10,15) 684 507 0.7927 -0.2073 -129,736 -0.1643 42640.73
(11,14) 742 808 -0.2073 0.7927 123,012 -0.1643 -40430.82
(12,14) 605 808 -0.2073 0.7927 12,316 -0.1643 -4047.99
(13,14) 586 808 -0.2073 0.7927 -3,036 -0.1643 997.81
(14,15) 808 507 0.7927 -0.2073 -66,868 -0.1643 21977.69
----------------------------------------------------------------------------------
Total 105 pasangan | SUKU-2(delta) = 2 × -11814.49 = -23628.98
[4E Total v(TPT) → SE → RSE]
--------------------------------------------------------------------
v_HT(ê) = SUKU-1 + SUKU-2
= 1,160,285 + (-23,629)
= 1,136,656
v(TPT) = 1,136,656 / 100,080,016
= 0.011357
SE = √0.011357 × 100 = 10.6571%
RSE = (10.6571 / 20.7317) × 100 = 51.4050%
► Metode 4: TPT = 20.7317% | SE = 10.6571% | RSE = 51.4050%
Tabel Perbandingan 4 Metode
# Tabel ringkasan menggunakan knitr::kable
tabel_metode <- data.frame(
Metode = c("1. SRS versi PPT (berbobot)",
"2. SRS benar (tanpa bobot)",
"3. HT Estimasi Total",
"4. HT Delta Method"),
TPT_pct = round(c(p_hat*100, p_srs*100, p_hat*100, p_hat*100), 4),
SE_pct = round(c(SE_m1, SE_m2, SE_m3, SE_m4), 6),
RSE_pct = round(c(RSE_m1, RSE_m2, RSE_m3, RSE_m4), 4),
Keterangan = c("TPT berbobot; SE slide salah",
"Murni SRS tanpa bobot",
"Over-estimasi varians",
"✓ Benar untuk ratio estimator")
)
knitr::kable(
tabel_metode,
caption = "Ringkasan Perbandingan 4 Metode Estimasi TPT",
col.names = c("Metode", "TPT (%)", "SE (%)", "RSE (%)", "Keterangan"),
digits = 4,
align = c("l","r","r","r","l")
)| Metode | TPT (%) | SE (%) | RSE (%) | Keterangan |
|---|---|---|---|---|
| 1. SRS versi PPT (berbobot) | 20.7317 | 0.0059 | 2.8314 | TPT berbobot; SE slide salah |
| 2. SRS benar (tanpa bobot) | 20.0000 | 0.1069 | 53.4522 | Murni SRS tanpa bobot |
| 3. HT Estimasi Total | 20.7317 | 11.9963 | 57.8644 | Over-estimasi varians |
| 4. HT Delta Method | 20.7317 | 10.6571 | 51.4050 | ✓ Benar untuk ratio estimator |
Catatan Metodologis
cat(" CATATAN:\n")
cat(" [M1] Slide berjudul 'SRS' tapi pakai rumus berbobot (HT).\n")
cat(" v(p) dihitung dari p_hat=0.2073. SE ditulis 0.00587\n")
cat(sprintf(" padahal √%.4f = %.5f → RSE slide tidak valid.\n\n",
vp_m1, sqrt(vp_m1)))
cat(" [M2] SRS murni: TPT = Σyi/n = 20.00%% (tanpa bobot).\n")
cat(sprintf(" SE yang benar = √%.4f = %.5f.\n\n", vp_m2, SE_m2))
cat(" [M3] Rumus HT lengkap (Suku-1+Suku-2) untuk estimasi TOTAL.\n")
cat(" Jika dipakai untuk rasio TPT → varians over-estimasi.\n\n")
cat(" [M4] Delta Method: ganti yi → ei=yi-p_hat sebelum hitung\n")
cat(" varians. Ini metode yang benar untuk ratio estimator.\n\n")
cat(" RSE ~50%% pada M2-M4 adalah WAJAR: n=15 sangat kecil,\n")
cat(" hanya 3 dari 15 unit yang pengangguran.\n") CATATAN:
[M1] Slide berjudul 'SRS' tapi pakai rumus berbobot (HT).
v(p) dihitung dari p_hat=0.2073. SE ditulis 0.00587
padahal √0.0117 = 0.10834 → RSE slide tidak valid.
[M2] SRS murni: TPT = Σyi/n = 20.00%% (tanpa bobot).
SE yang benar = √0.0114 = 0.10690.
[M3] Rumus HT lengkap (Suku-1+Suku-2) untuk estimasi TOTAL.
Jika dipakai untuk rasio TPT → varians over-estimasi.
[M4] Delta Method: ganti yi → ei=yi-p_hat sebelum hitung
varians. Ini metode yang benar untuk ratio estimator.
RSE ~50%% pada M2-M4 adalah WAJAR: n=15 sangat kecil,
hanya 3 dari 15 unit yang pengangguran.
Tugas Slide 18 — Estimasi Langsung TPT SAKERNAS
Tujuan: Menerapkan estimasi langsung TPT per kecamatan menggunakan data SAKERNAS, desain sampling bertingkat (2-stage), dan metode Taylor Linearization melalui paket
survey.
Persiapan dan Load Data
Package dan Opsi Global
# ================================================================
# SLIDE 18–19 — Estimasi Langsung TPT dengan Taylor Linearization
# ================================================================
library(readxl)
library(survey)
library(haven)
library(dplyr)
library(writexl)
options(scipen = 99)
options(survey.lonely.psu = "adjust")
# Pilihan lonely.psu:
# "adjust" → SE = 0 di stratum itu (REKOMENDASI BPS)
# "average" → ambil rata-rata SE stratum lain
# "remove" → hapus stratum dari perhitungan
# "certainty" → anggap PSU pasti terpilihLoad Data SAKERNAS
cat("================================================================\n")
cat(" LOAD DATA\n")
cat("================================================================\n")
data_coba <- read_excel("C:/Users/User/Downloads/DATASAKERNASSIAPOLAH.xlsx")
cat(sprintf(" Jumlah baris = %d\n", nrow(data_coba)))
cat(sprintf(" Jumlah kolom = %d\n", ncol(data_coba)))
cat(sprintf(" Kolom tersedia : %s\n\n", paste(names(data_coba), collapse = ", ")))================================================================
LOAD DATA
================================================================
Jumlah baris = 12129
Jumlah kolom = 20
Kolom tersedia : tahun, urutan, SUPASfinal_weightR, KODE_PROV, NAMA_PROV, KODE_KAB2, NAMA_KAB, KODE_KEC2, NAMA_KEC, KLASIFIKASI2, no_urutNKS, NURT2, B1_R11, B4_K8, B5_R5A1, B5_R6, B5_R15A, B5_R15B, B5_R20A, STATUS
Variabel Turunan
Dari kolom STATUS dibuat dua variabel biner:
pengangguran dan angkatankerja.
cat("================================================================\n")
cat(" VARIABEL TURUNAN\n")
cat("================================================================\n")
data_coba$pengangguran <- as.numeric(
ifelse(data_coba$STATUS == "PENGANGGURAN", "1", "0")
)
data_coba$angkatankerja <- as.numeric(
ifelse(data_coba$STATUS == "PENGANGGURAN", "1",
ifelse(data_coba$STATUS == "BEKERJA", "1", "0"))
)
data_coba$pengangguran[is.na(data_coba$pengangguran)] <- 0
cat(" Distribusi STATUS:\n")
print(table(data_coba$STATUS))
cat(sprintf("\n Total pengangguran (n) = %d\n", sum(data_coba$pengangguran, na.rm = TRUE)))
cat(sprintf(" Total angkatan kerja (n) = %d\n\n", sum(data_coba$angkatankerja, na.rm = TRUE)))================================================================
VARIABEL TURUNAN
================================================================
Distribusi STATUS:
BEKERJA BUKAN ANGKATAN KERJA PENGANGGURAN
6856 4633 640
Total pengangguran (n) = 640
Total angkatan kerja (n) = 7496
Desain Sampling SAKERNAS
Definisi Strata, PSU, dan SSU
cat("================================================================\n")
cat(" DESAIN SAMPLING SAKERNAS\n")
cat("================================================================\n")
cat(" Strata : KODE_PROV + KLASIFIKASI2 (perkotaan/perdesaan)\n")
cat(" PSU : Blok Sensus (no_urutNKS)\n")
cat(" SSU : Rumahtangga (NURT2)\n")
cat(" Bobot : SUPASfinal_weightR\n\n")
data_coba$STAT <- paste(data_coba$KODE_PROV, data_coba$KODE_KAB2, data_coba$KODE_KEC2)
data_coba$PSU <- paste(data_coba$STAT, data_coba$no_urutNKS)
data_coba$SSU <- paste(data_coba$PSU, data_coba$NURT2)
data_coba$STRATA <- paste(data_coba$KODE_PROV, data_coba$KLASIFIKASI2)
cat(sprintf(" Jumlah kecamatan (STAT) = %d\n", length(unique(data_coba$STAT))))
cat(sprintf(" Jumlah PSU (blok sensus) = %d\n", length(unique(data_coba$PSU))))
cat(sprintf(" Jumlah SSU (rumahtangga) = %d\n", length(unique(data_coba$SSU))))
cat(sprintf(" Jumlah strata = %d\n\n", length(unique(data_coba$STRATA))))================================================================
DESAIN SAMPLING SAKERNAS
================================================================
Strata : KODE_PROV + KLASIFIKASI2 (perkotaan/perdesaan)
PSU : Blok Sensus (no_urutNKS)
SSU : Rumahtangga (NURT2)
Bobot : SUPASfinal_weightR
Jumlah kecamatan (STAT) = 144
Jumlah PSU (blok sensus) = 448
Jumlah SSU (rumahtangga) = 12129
Jumlah strata = 2
Penanganan Lonely PSU
psu_per_strata <- data_coba %>%
group_by(STRATA) %>%
summarise(n_PSU = n_distinct(PSU), .groups = "drop")
lonely <- psu_per_strata %>% filter(n_PSU == 1)
cat(sprintf(" Strata dengan 1 PSU (lonely): %d strata\n", nrow(lonely)))
if (nrow(lonely) > 0) {
cat(" Daftar lonely PSU strata:\n")
print(lonely)
}
cat(sprintf(" → Ditangani dengan: options(survey.lonely.psu = 'adjust')\n\n")) Strata dengan 1 PSU (lonely): 0 strata
→ Ditangani dengan: options(survey.lonely.psu = 'adjust')
Objek Desain svydesign
sakernas_coba <- svydesign(
id = ~PSU + SSU,
strata = ~STRATA,
data = data_coba,
weights = ~SUPASfinal_weightR
)
data_coba$PSU <- as.factor(data_coba$PSU)
data_coba$SSU <- as.factor(data_coba$SSU)
cat(" Ringkasan desain sampling:\n")
print(summary(sakernas_coba)) Ringkasan desain sampling:
Stratified 2 - level Cluster Sampling design (with replacement)
With (448, 12129) clusters.
svydesign(id = ~PSU + SSU, strata = ~STRATA, data = data_coba,
weights = ~SUPASfinal_weightR)
Probabilities:
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0003987 0.0010163 0.0014347 0.0038716 0.0021645 0.2500000
First-level Stratum Sizes:
36 01 36 02
obs 8158 3971
design.PSU 304 144
actual.PSU 304 144
Data variables:
[1] "tahun" "urutan" "SUPASfinal_weightR"
[4] "KODE_PROV" "NAMA_PROV" "KODE_KAB2"
[7] "NAMA_KAB" "KODE_KEC2" "NAMA_KEC"
[10] "KLASIFIKASI2" "no_urutNKS" "NURT2"
[13] "B1_R11" "B4_K8" "B5_R5A1"
[16] "B5_R6" "B5_R15A" "B5_R15B"
[19] "B5_R20A" "STATUS" "pengangguran"
[22] "angkatankerja" "STAT" "PSU"
[25] "SSU" "STRATA"
Estimasi TPT per Kecamatan
Estimasi dilakukan dengan svyby() +
svyratio() menggunakan Taylor
Linearization sebagai metode penghitungan varians.
cat("\n================================================================\n")
cat(" ESTIMASI TPT PER KECAMATAN (svyratio + Taylor Linearization)\n")
cat("================================================================\n")
cat(" Formula : pengangguran / angkatankerja\n")
cat(" By : STAT (kode kecamatan)\n")
cat(" Vartype : SE, CI bawah, CI atas, VAR\n")
cat(" DEFF : Design Effect\n\n")
hasilku <- as.data.frame(
svyby(
formula = ~pengangguran,
denom = ~angkatankerja,
by = ~STAT,
design = sakernas_coba,
FUN = svyratio,
deff = TRUE,
vartype = c("se", "ci", "ci", "var"),
data = data_coba
)
)
cat(sprintf(" Estimasi selesai.\n"))
cat(sprintf(" Jumlah kecamatan = %d\n", nrow(hasilku)))
cat(sprintf(" Kolom awal : %s\n\n", paste(names(hasilku), collapse = ", ")))
================================================================
ESTIMASI TPT PER KECAMATAN (svyratio + Taylor Linearization)
================================================================
Formula : pengangguran / angkatankerja
By : STAT (kode kecamatan)
Vartype : SE, CI bawah, CI atas, VAR
DEFF : Design Effect
Estimasi selesai.
Jumlah kecamatan = 144
Kolom awal : STAT, pengangguran/angkatankerja, se.pengangguran/angkatankerja, ci_l, ci_u, var.pengangguran/angkatankerja, DEff
Post-Processing Hasil
Konversi Satuan dan Rename Kolom
cat("================================================================\n")
cat(" POST-PROCESSING HASIL\n")
cat("================================================================\n\n")
hasilku$`pengangguran/angkatankerja` <- round(
hasilku$`pengangguran/angkatankerja` * 100, 4)
hasilku$`se.pengangguran/angkatankerja` <- round(
hasilku$`se.pengangguran/angkatankerja` * 100, 4)
hasilku$`var.pengangguran/angkatankerja` <- round(
hasilku$`se.pengangguran/angkatankerja` *
hasilku$`se.pengangguran/angkatankerja`, 4)
hasilku$ci_l <- round(hasilku$ci_l * 100, 4)
hasilku$ci_u <- round(hasilku$ci_u * 100, 4)
hasilku$DEff <- round(hasilku$DEff, 4)
hasilku[is.na(hasilku)] <- 0
hasilku <- hasilku %>%
rename(
TPT = `pengangguran/angkatankerja`,
SE = `se.pengangguran/angkatankerja`,
VAR = `var.pengangguran/angkatankerja`,
ci_lower = ci_l,
ci_upper = ci_u
)
hasilku$RSE <- ifelse(
hasilku$TPT == 0, 0,
round((hasilku$SE / hasilku$TPT) * 100, 4)
)
hasilku[is.na(hasilku)] <- 0
cat(" Kolom akhir : ", paste(names(hasilku), collapse = ", "), "\n\n")================================================================
POST-PROCESSING HASIL
================================================================
Kolom akhir : STAT, TPT, SE, ci_lower, ci_upper, VAR, DEff, RSE
Ringkasan Statistik Hasil
stat_ringkasan <- data.frame(
Statistik = c("Min", "Median", "Mean*", "Max"),
`TPT (%)` = c(min(hasilku$TPT),
median(hasilku$TPT),
mean(hasilku$TPT[hasilku$TPT > 0]),
max(hasilku$TPT)),
`SE (%)` = c(min(hasilku$SE),
median(hasilku$SE),
mean(hasilku$SE[hasilku$SE > 0]),
max(hasilku$SE)),
`RSE (%)` = c(min(hasilku$RSE),
median(hasilku$RSE),
mean(hasilku$RSE[hasilku$RSE > 0]),
max(hasilku$RSE)),
DEff = c(min(hasilku$DEff),
median(hasilku$DEff),
mean(hasilku$DEff[hasilku$DEff > 0]),
max(hasilku$DEff)),
check.names = FALSE
)
knitr::kable(
stat_ringkasan,
caption = "Statistik Ringkasan Hasil Estimasi TPT per Kecamatan (* Mean dihitung untuk nilai > 0)",
digits = 4,
align = c("l","r","r","r","r")
)| Statistik | TPT (%) | SE (%) | RSE (%) | DEff |
|---|---|---|---|---|
| Min | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Median | 7.5015 | 1.8508 | 25.1245 | 0.3720 |
| Mean* | 10.3264 | 2.7999 | 40.2107 | 0.8022 |
| Max | 71.3973 | 9.6047 | 108.4039 | 4.5873 |
n_tidak_presisi <- sum(hasilku$RSE > 25 & hasilku$RSE != 0)
cat(sprintf("\nKecamatan RSE > 25%% (tidak presisi) = %d dari %d kecamatan\n",
n_tidak_presisi, nrow(hasilku)))
Kecamatan RSE > 25% (tidak presisi) = 74 dari 144 kecamatan
| STAT | TPT | SE | ci_lower | ci_upper | VAR | DEff | RSE | |
|---|---|---|---|---|---|---|---|---|
| 36 01 010 | 36 01 010 | 6.5541 | 0.0000 | 6.5541 | 6.5541 | 0.0000 | 0.0000 | 0.0000 |
| 36 01 020 | 36 01 020 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 36 01 030 | 36 01 030 | 4.7484 | 3.0519 | -1.2332 | 10.7299 | 9.3141 | 0.6570 | 64.2722 |
| 36 01 031 | 36 01 031 | 20.6821 | 0.0000 | 20.6821 | 20.6821 | 0.0000 | 0.0000 | 0.0000 |
| 36 01 040 | 36 01 040 | 11.3390 | 5.4648 | 0.6282 | 22.0498 | 29.8640 | 0.5038 | 48.1947 |
| 36 01 050 | 36 01 050 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 36 01 060 | 36 01 060 | 3.9085 | 2.0567 | -0.1226 | 7.9396 | 4.2300 | 0.6881 | 52.6212 |
| 36 01 061 | 36 01 061 | 20.7317 | 0.0000 | 20.7317 | 20.7317 | 0.0000 | 0.0000 | 0.0000 |
| 36 01 070 | 36 01 070 | 8.7767 | 0.0000 | 8.7767 | 8.7767 | 0.0000 | 0.0000 | 0.0000 |
| 36 01 071 | 36 01 071 | 4.8308 | 3.8047 | -2.6262 | 12.2879 | 14.4757 | 1.0741 | 78.7592 |
Export ke Excel
cat("\n================================================================\n")
cat(" EXPORT HASIL\n")
cat("================================================================\n")
write_xlsx(hasilku, "hasilku.xlsx", col_names = TRUE)
cat(" ✓ File 'hasilku.xlsx' berhasil disimpan!\n")
cat(sprintf(" Total kecamatan : %d\n", nrow(hasilku)))
cat(sprintf(" Rata-rata TPT (TPT > 0) : %.4f%%\n", mean(hasilku$TPT[hasilku$TPT > 0])))
cat(sprintf(" Rata-rata RSE (RSE > 0) : %.4f%%\n", mean(hasilku$RSE[hasilku$RSE > 0])))
cat(sprintf(" Kecamatan tidak presisi : %d\n", n_tidak_presisi))
================================================================
EXPORT HASIL
================================================================
✓ File 'hasilku.xlsx' berhasil disimpan!
Total kecamatan : 144
Rata-rata TPT (TPT > 0) : 10.3264%
Rata-rata RSE (RSE > 0) : 40.2107%
Kecamatan tidak presisi : 74
Identifikasi Validitas Estimasi TPT
Tujuan: Mengklasifikasikan setiap estimasi TPT per kecamatan ke dalam kategori validitas berdasarkan nilai SE, RSE, CI, dan DEff — mengikuti standar presisi BPS (RSE ≤ 25%).
Fungsi Pendukung
# ================================================================
# IDENTIFIKASI VALIDITAS ESTIMASI TPT — VERSI FINAL (DIPERBAIKI)
# ================================================================
library(dplyr)
library(writexl)
LINE <- paste0(strrep("=", 68), "\n")
line <- paste0(strrep("-", 68), "\n")
safe_median <- function(x) {
x <- x[is.finite(x) & !is.na(x)]
if (length(x) == 0) return(NA_real_)
median(x)
}
safe_mean <- function(x) {
x <- x[is.finite(x) & !is.na(x)]
if (length(x) == 0) return(NA_real_)
mean(x)
}Step 1 — Klasifikasi Validitas
Setiap kecamatan diklasifikasikan menggunakan aturan berjenjang berikut:
| Status | Kondisi | Skor |
|---|---|---|
| TIDAK VALID | SE/RSE/CI/DEff = Inf |
0 |
| PERLU CEK | TPT=0 & SE=0 | 1 |
| MENCURIGAKAN | TPT>0 & SE=0 | 1 |
| TIDAK PRESISI | CI<0 & RSE>25%, atau RSE>25% | 2–3 |
| PERLU PERHATIAN | CI_lower<0 saja | 3 |
| VALID & PRESISI | RSE≤25%, SE>0, CI≥0 | 5 |
# ================================================================
# STEP 1 — KLASIFIKASI
# ================================================================
klasifikasi_validitas <- function(df) {
df$STATUS_VALIDITAS <- NA_character_
df$ALASAN <- NA_character_
df$SKOR_VALID <- NA_integer_
for (i in 1:nrow(df)) {
tpt <- df$TPT[i]; se <- df$SE[i]; rse <- df$RSE[i]
ci_l <- df$ci_lower[i]; ci_u <- df$ci_upper[i]; deff <- df$DEff[i]
if (is.infinite(se) || is.infinite(rse) ||
is.infinite(ci_l) || is.infinite(ci_u) || is.infinite(deff)) {
df$STATUS_VALIDITAS[i] <- "TIDAK VALID"
df$ALASAN[i] <- "SE/RSE = Inf: lonely PSU di strata"
df$SKOR_VALID[i] <- 0
} else if (tpt == 0 && se == 0) {
df$STATUS_VALIDITAS[i] <- "PERLU CEK"
df$ALASAN[i] <- "TPT=0 & SE=0: sampel nol pengangguran / nol AK"
df$SKOR_VALID[i] <- 1
} else if (tpt > 0 && se == 0) {
df$STATUS_VALIDITAS[i] <- "MENCURIGAKAN"
df$ALASAN[i] <- "TPT>0 tapi SE=0: variasi antar PSU collapse"
df$SKOR_VALID[i] <- 1
} else if (ci_l < 0 && rse > 25) {
df$STATUS_VALIDITAS[i] <- "TIDAK PRESISI"
df$ALASAN[i] <- "CI_lower<0 & RSE>25%: estimasi tidak stabil"
df$SKOR_VALID[i] <- 2
} else if (ci_l < 0) {
df$STATUS_VALIDITAS[i] <- "PERLU PERHATIAN"
df$ALASAN[i] <- "CI_lower<0: interval mencakup nilai negatif"
df$SKOR_VALID[i] <- 3
} else if (rse > 25) {
df$STATUS_VALIDITAS[i] <- "TIDAK PRESISI"
df$ALASAN[i] <- "RSE>25%: tidak presisi standar BPS"
df$SKOR_VALID[i] <- 3
} else {
df$STATUS_VALIDITAS[i] <- "VALID & PRESISI"
df$ALASAN[i] <- "RSE<=25%, SE>0, CI>=0"
df$SKOR_VALID[i] <- 5
}
}
return(df)
}
hasilku <- klasifikasi_validitas(hasilku)Step 2 — Join Data Sampel
Menggabungkan statistik sampel per kecamatan (n_AK,
n_PSU, dll.) ke dataframe hasilku. Terdapat
mekanisme fallback untuk mengatasi inkonsistensi spasi
ganda pada kunci join.
# ================================================================
# STEP 2 — JOIN DATA SAMPEL (FIX: trimws + fallback gsub)
# ================================================================
hasilku$STAT <- trimws(hasilku$STAT)
sampel_per_kec <- data_coba %>%
mutate(STAT = trimws(paste(KODE_PROV, KODE_KAB2, KODE_KEC2))) %>%
group_by(STAT) %>%
summarise(
n_sampel_AK = sum(angkatankerja, na.rm = TRUE),
n_pengangguran = sum(pengangguran, na.rm = TRUE),
n_PSU = n_distinct(PSU),
.groups = "drop"
)
dist_ngang_psu <- data_coba %>%
mutate(STAT = trimws(paste(KODE_PROV, KODE_KAB2, KODE_KEC2))) %>%
filter(pengangguran == 1) %>%
group_by(STAT, PSU) %>%
summarise(n_ngang_psu = sum(pengangguran, na.rm = TRUE), .groups = "drop") %>%
group_by(STAT) %>%
summarise(
n_PSU_dgn_ngang = n_distinct(PSU),
max_ngang_1psu = max(n_ngang_psu),
.groups = "drop"
)
hasilku <- hasilku %>%
dplyr::select(-any_of(c("n_sampel_AK","n_pengangguran","n_PSU",
"n_PSU_dgn_ngang","max_ngang_1psu")))
hasilku <- hasilku %>%
left_join(sampel_per_kec, by = "STAT") %>%
left_join(dist_ngang_psu, by = "STAT")
hasilku$n_PSU_dgn_ngang[is.na(hasilku$n_PSU_dgn_ngang)] <- 0
hasilku$max_ngang_1psu[is.na(hasilku$max_ngang_1psu)] <- 0
cat("--- Verifikasi Join ---\n")
cat("n_PSU NA:", sum(is.na(hasilku$n_PSU)), "\n")
if (sum(is.na(hasilku$n_PSU)) > 0) {
cat("⚠ Join belum sempurna, jalankan fallback gsub...\n")
hasilku <- hasilku %>%
mutate(STAT_key = gsub(" ", "", STAT)) %>%
dplyr::select(-any_of(c("n_sampel_AK","n_pengangguran","n_PSU",
"n_PSU_dgn_ngang","max_ngang_1psu")))
sampel_per_kec2 <- sampel_per_kec %>%
mutate(STAT_key = gsub(" ", "", STAT)) %>%
dplyr::select(-STAT)
dist_ngang_psu2 <- dist_ngang_psu %>%
mutate(STAT_key = gsub(" ", "", STAT)) %>%
dplyr::select(-STAT)
hasilku <- hasilku %>%
left_join(sampel_per_kec2, by = "STAT_key") %>%
left_join(dist_ngang_psu2, by = "STAT_key") %>%
dplyr::select(-STAT_key)
hasilku$n_PSU_dgn_ngang[is.na(hasilku$n_PSU_dgn_ngang)] <- 0
hasilku$max_ngang_1psu[is.na(hasilku$max_ngang_1psu)] <- 0
cat("n_PSU NA setelah fallback:", sum(is.na(hasilku$n_PSU)), "\n")
}
cat("✓ Join selesai. Lanjut ke analisis V1–V6\n\n")--- Verifikasi Join ---
n_PSU NA: 0
✓ Join selesai. Lanjut ke analisis V1–V6
Step 3 — Filter per Kategori
# ================================================================
# STEP 3 — FILTER PER KATEGORI
# ================================================================
v1 <- hasilku %>% filter(STATUS_VALIDITAS == "TIDAK VALID")
v2 <- hasilku %>% filter(STATUS_VALIDITAS == "PERLU CEK")
v3 <- hasilku %>% filter(STATUS_VALIDITAS == "MENCURIGAKAN")
v4 <- hasilku %>% filter(STATUS_VALIDITAS == "PERLU PERHATIAN")
v5 <- hasilku %>% filter(STATUS_VALIDITAS == "TIDAK PRESISI")
v6 <- hasilku %>% filter(STATUS_VALIDITAS == "VALID & PRESISI")Step 4 — Rekapitulasi Validitas
# ================================================================
# STEP 4 — REKAPITULASI
# ================================================================
rekap <- hasilku %>%
group_by(STATUS_VALIDITAS) %>%
summarise(
Jumlah = n(),
Pct = round(n() / nrow(hasilku) * 100, 2),
Rata_TPT = round(safe_mean(TPT), 4),
Rata_RSE = round(safe_mean(RSE[RSE > 0]), 4),
.groups = "drop"
) %>%
arrange(desc(Jumlah))
knitr::kable(
rekap,
caption = "Rekapitulasi Status Validitas Estimasi TPT per Kecamatan",
col.names = c("Status Validitas", "Jumlah", "Pct (%)", "Rata-rata TPT", "Rata-rata RSE"),
digits = 4,
align = c("l","r","r","r","r")
)| Status Validitas | Jumlah | Pct (%) | Rata-rata TPT | Rata-rata RSE |
|---|---|---|---|---|
| TIDAK PRESISI | 74 | 51.39 | 7.5197 | 52.2034 |
| VALID & PRESISI | 32 | 22.22 | 12.2939 | 12.4777 |
| MENCURIGAKAN | 30 | 20.83 | 15.1511 | NA |
| PERLU CEK | 8 | 5.56 | 0.0000 | NA |
Step 5 — Detail per Kategori
V1 — Tidak Valid (SE = Inf, Lonely PSU)
# ================================================================
# [V1] TIDAK VALID — SE = Inf
# ================================================================
cat(LINE)
cat(" [V1] TIDAK VALID — SE = Inf (Lonely PSU dalam Strata)\n")
cat(LINE)
v1 <- hasilku %>% filter(STATUS_VALIDITAS == "TIDAK VALID")
cat(sprintf(" Jumlah kecamatan: %d\n\n", nrow(v1)))
if (nrow(v1) > 0) {
cat(sprintf(" %-14s %-9s %-9s %-9s %-6s %-8s %-12s\n",
"STAT","TPT","SE","DEff","n_PSU","n_AK","Kategori_V1"))
cat(sprintf(" %s\n", strrep("-", 75)))
for (i in 1:min(nrow(v1), 15)) {
kat <- ifelse(v1$n_PSU[i] == 1,
"1 PSU murni",
paste0(v1$n_PSU[i]," PSU-lonely strata"))
cat(sprintf(" %-14s %-9.4f %-9s %-9s %-6d %-8d %-12s\n",
v1$STAT[i], v1$TPT[i],
ifelse(is.infinite(v1$SE[i]), "Inf", sprintf("%.4f", v1$SE[i])),
ifelse(is.infinite(v1$DEff[i]), "Inf", sprintf("%.4f", v1$DEff[i])),
v1$n_PSU[i], v1$n_sampel_AK[i], kat))
}
if (nrow(v1) > 15)
cat(sprintf(" ... dan %d kecamatan lainnya\n", nrow(v1) - 15))
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" ANALISIS DATA:\n\n")
v1_1psu <- v1 %>% filter(n_PSU == 1)
v1_multpsu <- v1 %>% filter(n_PSU > 1)
cat(" Breakdown penyebab SE = Inf:\n")
cat(sprintf(" a) n_PSU = 1 (benar-benar hanya 1 blok sensus) : %d kecamatan\n",
nrow(v1_1psu)))
cat(sprintf(" b) n_PSU > 1 tapi lonely dalam strata : %d kecamatan\n",
nrow(v1_multpsu)))
cat("\n Penjelasan kasus (b):\n")
cat(" Kecamatan memiliki >1 PSU, NAMUN dalam strata-nya\n")
cat(" (kombinasi KODE_PROV × KLASIFIKASI) hanya ada 1 PSU\n")
cat(" yang ter-sampel → formula Taylor tidak bisa hitung\n")
cat(" varians antar PSU dalam strata → SE = Inf.\n\n")
if (nrow(v1_multpsu) > 0) {
cat(" Detail kecamatan multi-PSU tapi SE=Inf:\n")
cat(sprintf(" %-14s %-6s %-8s %-9s\n","STAT","n_PSU","n_AK","TPT"))
cat(sprintf(" %s\n", strrep("-", 42)))
for (i in 1:min(nrow(v1_multpsu), 10)) {
cat(sprintf(" %-14s %-6d %-8d %-9.4f\n",
v1_multpsu$STAT[i], v1_multpsu$n_PSU[i],
v1_multpsu$n_sampel_AK[i], v1_multpsu$TPT[i]))
}
}
tpt_v1 <- v1$TPT[is.finite(v1$TPT)]
cat(sprintf("\n Distribusi TPT kecamatan V1 (n=%d):\n", nrow(v1)))
cat(sprintf(" Min = %.4f%% | Median = %.4f%% | Mean = %.4f%% | Max = %.4f%%\n",
min(tpt_v1), median(tpt_v1), mean(tpt_v1), max(tpt_v1)))
v1_risiko <- v1 %>% filter(TPT > 20) %>% arrange(desc(TPT))
if (nrow(v1_risiko) > 0) {
cat(sprintf("\n ⚠ Kecamatan V1 dengan TPT > 20%% (risiko under-report tinggi):\n"))
cat(sprintf(" %-14s %-9s %-6s %-8s\n","STAT","TPT","n_PSU","n_AK"))
cat(sprintf(" %s\n", strrep("-", 42)))
for (i in 1:nrow(v1_risiko)) {
cat(sprintf(" %-14s %-9.4f %-6d %-8d\n",
v1_risiko$STAT[i], v1_risiko$TPT[i],
v1_risiko$n_PSU[i], v1_risiko$n_sampel_AK[i]))
}
cat(" → Kecamatan ini memiliki estimasi TPT tinggi namun\n")
cat(" tidak dapat disajikan karena masalah desain sampling.\n")
cat(" Potensi UNDER-COVER jika dikecualikan dari publikasi.\n")
}
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" KESIMPULAN [V1]:\n")
cat(sprintf(" ✗ %d kecamatan (%.1f%%) estimasi TIDAK VALID.\n",
nrow(v1), nrow(v1)/nrow(hasilku)*100))
cat(" ✗ Lonely PSU dalam strata → SE = Inf → tidak bisa\n")
cat(" disajikan tanpa perbaikan desain.\n")
cat(" → REKOMENDASI: Gunakan SAE atau collapsed strata.\n\n")
}====================================================================
[V1] TIDAK VALID — SE = Inf (Lonely PSU dalam Strata)
====================================================================
Jumlah kecamatan: 0
V2 — Perlu Cek (TPT = 0 & SE = 0)
# ================================================================
# [V2] PERLU CEK — TPT = 0 & SE = 0
# ================================================================
cat(LINE)
cat(" [V2] PERLU CEK — TPT = 0 & SE = 0\n")
cat(LINE)
v2 <- hasilku %>% filter(STATUS_VALIDITAS == "PERLU CEK")
cat(sprintf(" Jumlah kecamatan: %d\n\n", nrow(v2)))
if (nrow(v2) > 0) {
cat(sprintf(" %-14s %-9s %-9s %-10s %-10s %-8s %-12s\n",
"STAT","TPT","SE","ci_lower","ci_upper","n_AK","n_Nganggur"))
cat(sprintf(" %s\n", strrep("-", 76)))
for (i in 1:min(nrow(v2), 15)) {
cat(sprintf(" %-14s %-9.4f %-9.4f %-10.4f %-10.4f %-8d %-12d\n",
v2$STAT[i], v2$TPT[i], v2$SE[i],
v2$ci_lower[i], v2$ci_upper[i],
v2$n_sampel_AK[i], v2$n_pengangguran[i]))
}
v2_no_ak <- v2 %>% filter(n_sampel_AK == 0)
v2_no_ngang <- v2 %>% filter(n_sampel_AK > 0 & n_pengangguran == 0)
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" ANALISIS DATA:\n\n")
cat(" Breakdown penyebab TPT=0 & SE=0:\n")
cat(sprintf(" a) n_AK = 0 (tidak ada AK di sampel) : %d kecamatan\n",
nrow(v2_no_ak)))
cat(sprintf(" b) n_AK > 0, n_Pengangguran = 0 : %d kecamatan\n",
nrow(v2_no_ngang)))
if (nrow(v2_no_ngang) > 0) {
cat(sprintf("\n Detail kecamatan dengan n_AK>0 & nol pengangguran:\n"))
cat(sprintf(" %-14s %-8s %-14s %-10s\n",
"STAT","n_AK","n_Pengangguran","n_PSU"))
cat(sprintf(" %s\n", strrep("-", 50)))
for (i in 1:nrow(v2_no_ngang)) {
cat(sprintf(" %-14s %-8d %-14d %-10d\n",
v2_no_ngang$STAT[i], v2_no_ngang$n_sampel_AK[i],
v2_no_ngang$n_pengangguran[i], v2_no_ngang$n_PSU[i]))
}
cat(sprintf("\n Ukuran sampel AK: Min=%d | Median=%.0f | Max=%d\n",
min(v2_no_ngang$n_sampel_AK), median(v2_no_ngang$n_sampel_AK),
max(v2_no_ngang$n_sampel_AK)))
v2_besar <- v2_no_ngang %>% filter(n_sampel_AK >= 30)
cat(sprintf(" Kec. dengan n_AK ≥ 30 (nol pengangguran lebih meyakinkan): %d\n",
nrow(v2_besar)))
}
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" KESIMPULAN [V2]:\n")
if (nrow(v2_no_ak) > 0) {
cat(sprintf(" ✗ %d kecamatan tidak ada sampel AK sama sekali\n", nrow(v2_no_ak)))
cat(" → estimasi 0 TIDAK VALID, bukan gambaran populasi nyata.\n")
}
if (nrow(v2_no_ngang) > 0) {
cat(sprintf(" ✓ %d kecamatan punya sampel AK (n_AK > 0) tapi\n", nrow(v2_no_ngang)))
cat(" nol pengangguran teridentifikasi di sampel.\n")
cat(" ✓ Ini BERPOTENSI VALID: mungkin memang tidak ada pengangguran.\n")
n_kuat <- nrow(v2_no_ngang %>% filter(n_sampel_AK >= 30))
n_lemah <- nrow(v2_no_ngang %>% filter(n_sampel_AK < 15))
cat(sprintf(" - %d kec. n_AK ≥ 30 → estimasi nol LEBIH DIPERCAYA\n", n_kuat))
cat(sprintf(" - %d kec. n_AK < 15 → estimasi nol perlu konfirmasi\n", n_lemah))
}
cat(" → REKOMENDASI:\n")
cat(" • Kasus (a) n_AK=0 → masuk kandidat SAE\n")
cat(" • Kasus (b) n_AK besar → bisa disajikan dengan catatan\n")
cat(" 'tidak terdeteksi pengangguran dalam sampel'\n\n")
}====================================================================
[V2] PERLU CEK — TPT = 0 & SE = 0
====================================================================
Jumlah kecamatan: 8
STAT TPT SE ci_lower ci_upper n_AK n_Nganggur
----------------------------------------------------------------------------
36 01 020 0.0000 0.0000 0.0000 0.0000 14 0
36 01 050 0.0000 0.0000 0.0000 0.0000 15 0
36 02 031 0.0000 0.0000 0.0000 0.0000 14 0
36 02 100 0.0000 0.0000 0.0000 0.0000 34 0
36 02 121 0.0000 0.0000 0.0000 0.0000 19 0
36 03 130 0.0000 0.0000 0.0000 0.0000 23 0
36 04 130 0.0000 0.0000 0.0000 0.0000 49 0
36 71 050 0.0000 0.0000 0.0000 0.0000 34 0
--------------------------------------------------------------------
ANALISIS DATA:
Breakdown penyebab TPT=0 & SE=0:
a) n_AK = 0 (tidak ada AK di sampel) : 0 kecamatan
b) n_AK > 0, n_Pengangguran = 0 : 8 kecamatan
Detail kecamatan dengan n_AK>0 & nol pengangguran:
STAT n_AK n_Pengangguran n_PSU
--------------------------------------------------
36 01 020 14 0 1
36 01 050 15 0 1
36 02 031 14 0 1
36 02 100 34 0 2
36 02 121 19 0 1
36 03 130 23 0 1
36 04 130 49 0 3
36 71 050 34 0 2
Ukuran sampel AK: Min=14 | Median=21 | Max=49
Kec. dengan n_AK ≥ 30 (nol pengangguran lebih meyakinkan): 3
--------------------------------------------------------------------
KESIMPULAN [V2]:
✓ 8 kecamatan punya sampel AK (n_AK > 0) tapi
nol pengangguran teridentifikasi di sampel.
✓ Ini BERPOTENSI VALID: mungkin memang tidak ada pengangguran.
- 3 kec. n_AK ≥ 30 → estimasi nol LEBIH DIPERCAYA
- 2 kec. n_AK < 15 → estimasi nol perlu konfirmasi
→ REKOMENDASI:
• Kasus (a) n_AK=0 → masuk kandidat SAE
• Kasus (b) n_AK besar → bisa disajikan dengan catatan
'tidak terdeteksi pengangguran dalam sampel'
V3 — Mencurigakan (TPT > 0 & SE = 0)
# ================================================================
# [V3] MENCURIGAKAN — TPT > 0 & SE = 0
# ================================================================
cat(LINE)
cat(" [V3] MENCURIGAKAN — TPT > 0 & SE = 0\n")
cat(LINE)
v3 <- hasilku %>% filter(STATUS_VALIDITAS == "MENCURIGAKAN")
cat(sprintf(" Jumlah kecamatan: %d\n\n", nrow(v3)))
if (nrow(v3) > 0) {
cat(sprintf(" %-14s %-9s %-9s %-9s %-8s %-12s %-12s %-15s\n",
"STAT","TPT","SE","RSE","n_AK","n_Nganggur",
"n_PSU_ada_ngang","max_ngang_1PSU"))
cat(sprintf(" %s\n", strrep("-", 88)))
for (i in 1:nrow(v3)) {
cat(sprintf(" %-14s %-9.4f %-9.4f %-9.4f %-8d %-12d %-12d %-15d\n",
v3$STAT[i], v3$TPT[i], v3$SE[i], v3$RSE[i],
v3$n_sampel_AK[i], v3$n_pengangguran[i],
v3$n_PSU_dgn_ngang[i], v3$max_ngang_1psu[i]))
}
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" ANALISIS DATA:\n\n")
cat(" Mengapa SE = 0 padahal TPT > 0 dan n_pengangguran > 1?\n\n")
cat(" Penjelasan teknis:\n")
cat(" SE Taylor dihitung dari variasi ANTAR PSU dalam strata.\n")
cat(" SE = 0 terjadi bila semua pengangguran terkonsentrasi\n")
cat(" di PSU yang SAMA → tidak ada variasi antar PSU → SE = 0.\n\n")
cat(" Diagnosis per kecamatan:\n")
cat(sprintf(" %s\n", strrep("-", 68)))
for (i in 1:nrow(v3)) {
cat(sprintf(" Kecamatan: %s\n", v3$STAT[i]))
cat(sprintf(" TPT = %.4f%% | n_AK = %d | n_Pengangguran = %d\n",
v3$TPT[i], v3$n_sampel_AK[i], v3$n_pengangguran[i]))
cat(sprintf(" PSU yang memuat pengangguran = %d dari %d PSU total\n",
v3$n_PSU_dgn_ngang[i], v3$n_PSU[i]))
cat(sprintf(" Maks pengangguran dalam 1 PSU = %d orang\n",
v3$max_ngang_1psu[i]))
if (v3$n_PSU_dgn_ngang[i] == 1 && v3$n_PSU[i] > 1) {
cat(sprintf(" ⚠ DIAGNOSIS: Semua %d pengangguran ada di 1 PSU saja\n",
v3$n_pengangguran[i]))
cat(" → PSU lain semua nol → tidak ada variasi → SE = 0\n")
} else if (v3$n_PSU_dgn_ngang[i] == 1 && v3$n_PSU[i] == 1) {
cat(" ⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung\n")
} else {
cat(" ⚠ DIAGNOSIS: Variasi antar PSU sangat kecil → SE mendekati 0\n")
}
cat(sprintf(" %s\n", strrep("-", 68)))
}
cat(" KESIMPULAN [V3]:\n")
cat(sprintf(" ✗ %d kecamatan punya TPT > 0 tapi SE = 0.\n", nrow(v3)))
cat(" ✗ Semua pengangguran terkonsentrasi di 1 PSU → Taylor SE = 0.\n")
cat(sprintf(" ✗ TPT berkisar %.4f%% – %.4f%%: tampak besar tapi\n",
min(v3$TPT), max(v3$TPT)))
cat(" presisinya sama sekali tidak terukur.\n")
cat(" → REKOMENDASI: Masuk kandidat SAE. TIDAK BOLEH disajikan.\n\n")
}====================================================================
[V3] MENCURIGAKAN — TPT > 0 & SE = 0
====================================================================
Jumlah kecamatan: 30
STAT TPT SE RSE n_AK n_Nganggur n_PSU_ada_ngang max_ngang_1PSU
----------------------------------------------------------------------------------------
36 01 010 6.5541 0.0000 0.0000 17 1 1 1
36 01 031 20.6821 0.0000 0.0000 17 3 1 3
36 01 061 20.7317 0.0000 0.0000 15 3 1 3
36 01 070 8.7767 0.0000 0.0000 24 2 1 2
36 01 072 7.1978 0.0000 0.0000 22 1 1 1
36 01 090 4.6243 0.0000 0.0000 20 1 1 1
36 01 110 8.1115 0.0000 0.0000 18 1 1 1
36 01 111 15.1212 0.0000 0.0000 13 2 1 2
36 01 131 6.0776 0.0000 0.0000 32 2 1 2
36 01 150 22.9045 0.0000 0.0000 17 4 1 4
36 01 161 23.8040 0.0000 0.0000 19 4 1 4
36 01 170 3.7254 0.0000 0.0000 22 1 1 1
36 01 190 3.5866 0.0000 0.0000 20 1 1 1
36 01 192 8.1342 0.0000 0.0000 15 1 1 1
36 02 050 7.3188 0.0000 0.0000 19 1 1 1
36 02 051 6.6883 0.0000 0.0000 21 1 1 1
36 02 080 15.6825 0.0000 0.0000 27 4 1 4
36 02 110 4.1319 0.0000 0.0000 23 1 1 1
36 02 190 13.8823 0.0000 0.0000 17 2 1 2
36 03 060 4.7842 0.0000 0.0000 19 1 1 1
36 03 081 6.9307 0.0000 0.0000 15 1 1 1
36 03 141 45.0711 0.0000 0.0000 10 4 1 4
36 03 151 20.3172 0.0000 0.0000 23 5 1 5
36 03 160 20.7888 0.0000 0.0000 19 4 1 4
36 03 161 19.7013 0.0000 0.0000 16 3 1 3
36 04 010 11.6571 0.0000 0.0000 13 1 1 1
36 04 110 10.6004 0.0000 0.0000 19 2 1 2
36 04 180 30.4661 0.0000 0.0000 14 4 1 4
36 04 251 5.0839 0.0000 0.0000 17 1 1 1
36 04 271 71.3973 0.0000 0.0000 20 14 1 14
--------------------------------------------------------------------
ANALISIS DATA:
Mengapa SE = 0 padahal TPT > 0 dan n_pengangguran > 1?
Penjelasan teknis:
SE Taylor dihitung dari variasi ANTAR PSU dalam strata.
SE = 0 terjadi bila semua pengangguran terkonsentrasi
di PSU yang SAMA → tidak ada variasi antar PSU → SE = 0.
Diagnosis per kecamatan:
--------------------------------------------------------------------
Kecamatan: 36 01 010
TPT = 6.5541% | n_AK = 17 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 031
TPT = 20.6821% | n_AK = 17 | n_Pengangguran = 3
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 3 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 061
TPT = 20.7317% | n_AK = 15 | n_Pengangguran = 3
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 3 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 070
TPT = 8.7767% | n_AK = 24 | n_Pengangguran = 2
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 2 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 072
TPT = 7.1978% | n_AK = 22 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 090
TPT = 4.6243% | n_AK = 20 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 110
TPT = 8.1115% | n_AK = 18 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 111
TPT = 15.1212% | n_AK = 13 | n_Pengangguran = 2
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 2 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 131
TPT = 6.0776% | n_AK = 32 | n_Pengangguran = 2
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 2 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 150
TPT = 22.9045% | n_AK = 17 | n_Pengangguran = 4
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 4 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 161
TPT = 23.8040% | n_AK = 19 | n_Pengangguran = 4
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 4 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 170
TPT = 3.7254% | n_AK = 22 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 190
TPT = 3.5866% | n_AK = 20 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 01 192
TPT = 8.1342% | n_AK = 15 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 02 050
TPT = 7.3188% | n_AK = 19 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 02 051
TPT = 6.6883% | n_AK = 21 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 02 080
TPT = 15.6825% | n_AK = 27 | n_Pengangguran = 4
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 4 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 02 110
TPT = 4.1319% | n_AK = 23 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 02 190
TPT = 13.8823% | n_AK = 17 | n_Pengangguran = 2
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 2 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 03 060
TPT = 4.7842% | n_AK = 19 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 03 081
TPT = 6.9307% | n_AK = 15 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 03 141
TPT = 45.0711% | n_AK = 10 | n_Pengangguran = 4
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 4 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 03 151
TPT = 20.3172% | n_AK = 23 | n_Pengangguran = 5
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 5 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 03 160
TPT = 20.7888% | n_AK = 19 | n_Pengangguran = 4
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 4 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 03 161
TPT = 19.7013% | n_AK = 16 | n_Pengangguran = 3
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 3 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 04 010
TPT = 11.6571% | n_AK = 13 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 04 110
TPT = 10.6004% | n_AK = 19 | n_Pengangguran = 2
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 2 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 04 180
TPT = 30.4661% | n_AK = 14 | n_Pengangguran = 4
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 4 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 04 251
TPT = 5.0839% | n_AK = 17 | n_Pengangguran = 1
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 1 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
Kecamatan: 36 04 271
TPT = 71.3973% | n_AK = 20 | n_Pengangguran = 14
PSU yang memuat pengangguran = 1 dari 1 PSU total
Maks pengangguran dalam 1 PSU = 14 orang
⚠ DIAGNOSIS: Hanya ada 1 PSU total → SE tidak bisa dihitung
--------------------------------------------------------------------
KESIMPULAN [V3]:
✗ 30 kecamatan punya TPT > 0 tapi SE = 0.
✗ Semua pengangguran terkonsentrasi di 1 PSU → Taylor SE = 0.
✗ TPT berkisar 3.5866% – 71.3973%: tampak besar tapi
presisinya sama sekali tidak terukur.
→ REKOMENDASI: Masuk kandidat SAE. TIDAK BOLEH disajikan.
V4 — Perlu Perhatian (CI Lower < 0)
# ================================================================
# [V4] PERLU PERHATIAN — CI Lower < 0
# ================================================================
cat(LINE)
cat(" [V4] PERLU PERHATIAN — CI Lower < 0\n")
cat(LINE)
v4 <- hasilku %>% filter(STATUS_VALIDITAS == "PERLU PERHATIAN")
cat(sprintf(" Jumlah kecamatan: %d\n", nrow(v4)))
if (nrow(v4) == 0) {
cat(" → Tidak ada kecamatan dalam kategori ini.\n")
cat(" ✓ Semua CI lower dari estimasi yang lolos V1-V3 bernilai >= 0.\n\n")
} else {
cat("\n")
}====================================================================
[V4] PERLU PERHATIAN — CI Lower < 0
====================================================================
Jumlah kecamatan: 0
→ Tidak ada kecamatan dalam kategori ini.
✓ Semua CI lower dari estimasi yang lolos V1-V3 bernilai >= 0.
V5 — Tidak Presisi (RSE > 25%)
# ================================================================
# [V5] TIDAK PRESISI — RSE > 25%
# ================================================================
cat(LINE)
cat(" [V5] TIDAK PRESISI — RSE > 25%\n")
cat(LINE)
v5 <- hasilku %>% filter(STATUS_VALIDITAS == "TIDAK PRESISI")
cat(sprintf(" Jumlah kecamatan: %d\n\n", nrow(v5)))
if (nrow(v5) > 0) {
v5_sorted <- v5 %>% arrange(desc(RSE))
cat(sprintf(" %-14s %-9s %-9s %-9s %-9s %-8s\n",
"STAT","TPT","SE","RSE","DEff","n_AK"))
cat(sprintf(" %s\n", strrep("-", 62)))
for (i in 1:min(nrow(v5_sorted), 15)) {
cat(sprintf(" %-14s %-9.4f %-9.4f %-9.4f %-9.4f %-8d\n",
v5_sorted$STAT[i], v5_sorted$TPT[i], v5_sorted$SE[i],
v5_sorted$RSE[i], v5_sorted$DEff[i], v5_sorted$n_sampel_AK[i]))
}
if (nrow(v5) > 15)
cat(sprintf(" ... dan %d kecamatan lainnya\n", nrow(v5) - 15))
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" ANALISIS DATA:\n\n")
z1 <- sum(v5$RSE > 25 & v5$RSE <= 35)
z2 <- sum(v5$RSE > 35 & v5$RSE <= 50)
z3 <- sum(v5$RSE > 50)
cat(" Distribusi zona RSE:\n")
cat(sprintf(" 25%% < RSE ≤ 35%% : %2d kecamatan — tidak terlalu buruk\n",z1))
cat(sprintf(" 35%% < RSE ≤ 50%% : %2d kecamatan — buruk\n", z2))
cat(sprintf(" RSE > 50%% : %2d kecamatan — sangat buruk\n\n", z3))
grp1 <- v5$RSE[v5$n_sampel_AK < 10]
grp2 <- v5$RSE[v5$n_sampel_AK >= 10 & v5$n_sampel_AK <= 30]
grp3 <- v5$RSE[v5$n_sampel_AK > 30]
cat(" Hubungan ukuran sampel AK vs RSE:\n")
cat(sprintf(" n_AK < 10 (n=%2d) → median RSE = %s\n",
length(grp1), ifelse(length(grp1)==0, "N/A (tidak ada kec.)",
sprintf("%.2f%%", median(grp1)))))
cat(sprintf(" n_AK 10–30 (n=%2d) → median RSE = %s\n",
length(grp2), ifelse(length(grp2)==0, "N/A (tidak ada kec.)",
sprintf("%.2f%%", median(grp2)))))
cat(sprintf(" n_AK > 30 (n=%2d) → median RSE = %s\n",
length(grp3), ifelse(length(grp3)==0, "N/A (tidak ada kec.)",
sprintf("%.2f%%", median(grp3)))))
cat(" → Semakin kecil n_AK, semakin besar RSE (konfirmasi teori).\n\n")
cat(" Hubungan nilai TPT vs RSE:\n")
t1 <- v5$RSE[v5$TPT < 5]
t2 <- v5$RSE[v5$TPT >= 5 & v5$TPT < 10]
t3 <- v5$RSE[v5$TPT >= 10]
cat(sprintf(" TPT < 5%% (n=%2d) → median RSE = %s\n",
length(t1), ifelse(length(t1)==0,"N/A",sprintf("%.2f%%",median(t1)))))
cat(sprintf(" TPT 5–10%% (n=%2d) → median RSE = %s\n",
length(t2), ifelse(length(t2)==0,"N/A",sprintf("%.2f%%",median(t2)))))
cat(sprintf(" TPT ≥ 10%% (n=%2d) → median RSE = %s\n",
length(t3), ifelse(length(t3)==0,"N/A",sprintf("%.2f%%",median(t3)))))
cat(" → Kecamatan dengan TPT rendah cenderung RSE lebih tinggi\n")
cat(" karena pembilang (SE) relatif besar terhadap TPT kecil.\n\n")
v5_deff <- v5 %>% filter(DEff > 2)
cat(sprintf(" Kecamatan DEff > 2 (efisiensi desain rendah): %d\n",
nrow(v5_deff)))
if (nrow(v5_deff) > 0) {
for (i in 1:nrow(v5_deff)) {
cat(sprintf(" %-14s → DEff=%.4f | RSE=%.2f%% | n_AK=%d\n",
v5_deff$STAT[i], v5_deff$DEff[i],
v5_deff$RSE[i], v5_deff$n_sampel_AK[i]))
}
}
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" KESIMPULAN [V5]:\n")
cat(sprintf(" ✗ %d kecamatan (%.1f%%) RSE > 25%% → tidak presisi (standar BPS).\n",
nrow(v5), nrow(v5)/nrow(hasilku)*100))
cat(sprintf(" ✗ %d kec. RSE > 50%% (sangat buruk) → kandidat kuat SAE.\n", z3))
cat(sprintf(" ✗ %d kec. RSE 25–35%% → bisa disajikan dengan catatan.\n", z1))
cat(" → REKOMENDASI:\n")
cat(" • RSE 25–35%% → beri tanda '*' di tabel publikasi\n")
cat(" • RSE 35–50%% → beri tanda '**', pertimbangkan SAE\n")
cat(" • RSE > 50%% → wajib SAE, jangan disajikan langsung\n\n")
}====================================================================
[V5] TIDAK PRESISI — RSE > 25%
====================================================================
Jumlah kecamatan: 74
STAT TPT SE RSE DEff n_AK
--------------------------------------------------------------
36 02 010 1.8908 2.0497 108.4039 0.5296 23
36 02 030 2.4292 2.5707 105.8250 1.2052 38
36 03 190 4.0383 4.0920 101.3298 1.4209 31
36 02 181 6.1033 5.9916 98.1698 2.9343 46
36 01 172 2.3917 2.2300 93.2391 0.6569 32
36 74 010 2.3335 2.0661 88.5408 0.6011 30
36 04 210 4.6275 3.9134 84.5683 1.3285 42
36 02 021 6.9911 5.9044 84.4560 1.6840 32
36 01 160 6.6973 5.5426 82.7587 1.4180 32
36 03 121 5.2902 4.3203 81.6661 2.8577 75
36 72 030 3.1559 2.5770 81.6566 1.6153 73
36 01 120 2.9048 2.3499 80.8971 1.3422 69
36 02 011 10.2869 8.1100 78.8381 1.5281 22
36 01 071 4.8308 3.8047 78.7592 1.0741 35
36 02 020 2.8631 2.2351 78.0657 0.5929 34
... dan 59 kecamatan lainnya
--------------------------------------------------------------------
ANALISIS DATA:
Distribusi zona RSE:
25% < RSE ≤ 35% : 17 kecamatan — tidak terlalu buruk
35% < RSE ≤ 50% : 25 kecamatan — buruk
RSE > 50% : 32 kecamatan — sangat buruk
Hubungan ukuran sampel AK vs RSE:
n_AK < 10 (n= 0) → median RSE = N/A (tidak ada kec.)
n_AK 10–30 (n=10) → median RSE = 66.25%
n_AK > 30 (n=64) → median RSE = 41.98%
→ Semakin kecil n_AK, semakin besar RSE (konfirmasi teori).
Hubungan nilai TPT vs RSE:
TPT < 5% (n=29) → median RSE = 64.27%
TPT 5–10% (n=28) → median RSE = 39.00%
TPT ≥ 10% (n=17) → median RSE = 35.25%
→ Kecamatan dengan TPT rendah cenderung RSE lebih tinggi
karena pembilang (SE) relatif besar terhadap TPT kecil.
Kecamatan DEff > 2 (efisiensi desain rendah): 7
36 01 181 → DEff=2.3491 | RSE=45.54% | n_AK=43
36 02 180 → DEff=3.5921 | RSE=50.69% | n_AK=103
36 02 181 → DEff=2.9343 | RSE=98.17% | n_AK=46
36 03 121 → DEff=2.8577 | RSE=81.67% | n_AK=75
36 72 021 → DEff=4.5873 | RSE=64.69% | n_AK=109
36 73 010 → DEff=2.7291 | RSE=35.53% | n_AK=78
36 73 020 → DEff=2.1592 | RSE=54.63% | n_AK=74
--------------------------------------------------------------------
KESIMPULAN [V5]:
✗ 74 kecamatan (51.4%) RSE > 25% → tidak presisi (standar BPS).
✗ 32 kec. RSE > 50% (sangat buruk) → kandidat kuat SAE.
✗ 17 kec. RSE 25–35% → bisa disajikan dengan catatan.
→ REKOMENDASI:
• RSE 25–35%% → beri tanda '*' di tabel publikasi
• RSE 35–50%% → beri tanda '**', pertimbangkan SAE
• RSE > 50%% → wajib SAE, jangan disajikan langsung
V6 — Valid & Presisi (RSE ≤ 25%)
# ================================================================
# [V6] VALID & PRESISI
# ================================================================
cat(LINE)
cat(" [V6] VALID & PRESISI — RSE <= 25%, SE > 0, CI >= 0\n")
cat(LINE)
v6 <- hasilku %>% filter(STATUS_VALIDITAS == "VALID & PRESISI")
cat(sprintf(" Jumlah kecamatan: %d\n\n", nrow(v6)))
if (nrow(v6) > 0) {
v6_sorted <- v6 %>% arrange(RSE)
cat(sprintf(" %-14s %-9s %-9s %-10s %-10s %-9s %-8s\n",
"STAT","TPT","SE","ci_lower","ci_upper","DEff","RSE"))
cat(sprintf(" %s\n", strrep("-", 74)))
for (i in 1:min(nrow(v6_sorted), 20)) {
deff_flag <- ifelse(v6_sorted$DEff[i] < 0.01, " ⚠", " ")
cat(sprintf(" %-14s %-9.4f %-9.4f %-10.4f %-10.4f %-9.4f %-8.4f%s\n",
v6_sorted$STAT[i], v6_sorted$TPT[i], v6_sorted$SE[i],
v6_sorted$ci_lower[i], v6_sorted$ci_upper[i],
v6_sorted$DEff[i], v6_sorted$RSE[i], deff_flag))
}
if (nrow(v6) > 20)
cat(sprintf(" ... dan %d kecamatan lainnya\n", nrow(v6) - 20))
cat(" (⚠ = DEff < 0.01, nilai mencurigakan, perlu verifikasi)\n")
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" ANALISIS DATA:\n\n")
v6_deff_anon <- v6 %>% filter(DEff < 0.01)
v6_deff_ok <- v6 %>% filter(DEff >= 0.01)
cat(sprintf(" ⚠ PERHATIAN — DEff Anomali:\n"))
cat(sprintf(" DEff < 0.01 (sangat mencurigakan): %d kecamatan\n",
nrow(v6_deff_anon)))
cat(sprintf(" DEff normal (0.01–3): %d kecamatan\n",
nrow(v6_deff_ok)))
if (nrow(v6_deff_anon) > 0) {
cat("\n DEff mendekati 0 berarti desain survey jauh LEBIH EFISIEN\n")
cat(" dari SRS — ini tidak biasa untuk SAKERNAS bertingkat.\n")
cat(" Kemungkinan penyebab:\n")
cat(" • Hanya 1 PSU per strata (undercount variasi)\n")
cat(" • Bobot sampling sangat homogen di kecamatan ini\n")
cat(" • Kemungkinan bug pada argumen deff= di svyby()\n")
cat(" → Verifikasi DEff kecamatan ini sebelum publikasi.\n\n")
cat(" Daftar kecamatan DEff anomali:\n")
cat(sprintf(" %-14s %-9s %-9s %-9s\n","STAT","TPT","RSE","DEff"))
cat(sprintf(" %s\n", strrep("-", 45)))
for (i in 1:nrow(v6_deff_anon)) {
cat(sprintf(" %-14s %-9.4f %-9.4f %-9.4f\n",
v6_deff_anon$STAT[i], v6_deff_anon$TPT[i],
v6_deff_anon$RSE[i], v6_deff_anon$DEff[i]))
}
}
cat(sprintf("\n Distribusi zona presisi:\n"))
cat(sprintf(" RSE ≤ 10%% : %2d kecamatan — sangat presisi\n",
sum(v6$RSE <= 10)))
cat(sprintf(" 10%% < RSE ≤ 15%% : %2d kecamatan — presisi baik\n",
sum(v6$RSE > 10 & v6$RSE <= 15)))
cat(sprintf(" 15%% < RSE ≤ 25%% : %2d kecamatan — presisi cukup\n",
sum(v6$RSE > 15 & v6$RSE <= 25)))
cat(sprintf("\n TPT tertinggi (valid):\n"))
v6_top <- v6 %>% arrange(desc(TPT)) %>% slice(1:5)
for (i in 1:nrow(v6_top)) {
cat(sprintf(" %-14s → TPT=%.4f%% | RSE=%.4f%% | DEff=%.4f\n",
v6_top$STAT[i], v6_top$TPT[i], v6_top$RSE[i], v6_top$DEff[i]))
}
cat(sprintf("\n %s\n", strrep("-", 68)))
cat(" KESIMPULAN [V6]:\n")
cat(sprintf(" ✓ %d kecamatan (%.1f%%) valid & presisi → siap disajikan.\n",
nrow(v6), nrow(v6)/nrow(hasilku)*100))
cat(sprintf(" ✓ Rata-rata TPT=%.4f%% | Rata-rata RSE=%.4f%%\n",
mean(v6$TPT), mean(v6$RSE)))
if (nrow(v6_deff_anon) > 0) {
cat(sprintf(" ⚠ %d kecamatan perlu verifikasi DEff (nilai < 0.01)\n",
nrow(v6_deff_anon)))
cat(" sebelum final dipublikasikan.\n")
}
cat(" → REKOMENDASI: Siap disajikan. Cek DEff anomali dulu.\n\n")
}====================================================================
[V6] VALID & PRESISI — RSE <= 25%, SE > 0, CI >= 0
====================================================================
Jumlah kecamatan: 32
STAT TPT SE ci_lower ci_upper DEff RSE
--------------------------------------------------------------------------
36 03 180 16.8738 0.3548 16.1784 17.5692 0.0033 2.1027 ⚠
36 03 132 6.0207 0.1669 5.6936 6.3479 0.0015 2.7721 ⚠
36 02 191 26.1574 0.8056 24.5785 27.7362 0.0106 3.0798
36 01 121 13.8513 0.4814 12.9078 14.7948 0.0067 3.4755 ⚠
36 03 020 8.0323 0.2839 7.4757 8.5888 0.0034 3.5345 ⚠
36 03 131 13.0150 0.4951 12.0447 13.9853 0.0112 3.8041
36 71 060 11.4893 0.4406 10.6259 12.3528 0.0100 3.8349
36 04 100 16.8264 0.6900 15.4740 18.1789 0.0110 4.1007
36 03 150 9.2963 0.4423 8.4294 10.1632 0.0089 4.7578 ⚠
36 03 162 11.8130 0.7882 10.2681 13.3579 0.0209 6.6723
36 04 190 4.7508 0.3616 4.0421 5.4595 0.0118 7.6113
36 04 060 6.7039 0.5547 5.6167 7.7912 0.0144 8.2743
36 01 171 4.2156 0.3529 3.5241 4.9072 0.0093 8.3713 ⚠
36 04 200 5.2264 0.4687 4.3078 6.1450 0.0153 8.9679
36 04 211 23.1538 2.0970 19.0437 27.2639 0.0627 9.0568
36 72 040 13.8006 1.4124 11.0324 16.5689 0.2756 10.2343
36 04 050 8.4670 1.2957 5.9275 11.0065 0.0745 15.3029
36 03 210 13.5826 2.1245 9.4186 17.7465 0.2958 15.6413
36 71 020 7.0938 1.1372 4.8648 9.3228 0.2432 16.0309
36 02 091 10.5067 1.7240 7.1277 13.8856 0.0964 16.4086
... dan 12 kecamatan lainnya
(⚠ = DEff < 0.01, nilai mencurigakan, perlu verifikasi)
--------------------------------------------------------------------
ANALISIS DATA:
⚠ PERHATIAN — DEff Anomali:
DEff < 0.01 (sangat mencurigakan): 6 kecamatan
DEff normal (0.01–3): 26 kecamatan
DEff mendekati 0 berarti desain survey jauh LEBIH EFISIEN
dari SRS — ini tidak biasa untuk SAKERNAS bertingkat.
Kemungkinan penyebab:
• Hanya 1 PSU per strata (undercount variasi)
• Bobot sampling sangat homogen di kecamatan ini
• Kemungkinan bug pada argumen deff= di svyby()
→ Verifikasi DEff kecamatan ini sebelum publikasi.
Daftar kecamatan DEff anomali:
STAT TPT RSE DEff
---------------------------------------------
36 01 121 13.8513 3.4755 0.0067
36 01 171 4.2156 8.3713 0.0093
36 03 020 8.0323 3.5345 0.0034
36 03 132 6.0207 2.7721 0.0015
36 03 150 9.2963 4.7578 0.0089
36 03 180 16.8738 2.1027 0.0033
Distribusi zona presisi:
RSE ≤ 10% : 15 kecamatan — sangat presisi
10% < RSE ≤ 15% : 1 kecamatan — presisi baik
15% < RSE ≤ 25% : 16 kecamatan — presisi cukup
TPT tertinggi (valid):
36 01 180 → TPT=27.5572% | RSE=16.6134% | DEff=0.3376
36 02 191 → TPT=26.1574% | RSE=3.0798% | DEff=0.0106
36 04 211 → TPT=23.1538% | RSE=9.0568% | DEff=0.0627
36 03 140 → TPT=22.1049% | RSE=21.7196% | DEff=0.3068
36 01 101 → TPT=20.2784% | RSE=22.7641% | DEff=0.3519
--------------------------------------------------------------------
KESIMPULAN [V6]:
✓ 32 kecamatan (22.2%) valid & presisi → siap disajikan.
✓ Rata-rata TPT=12.2939% | Rata-rata RSE=12.4777%
⚠ 6 kecamatan perlu verifikasi DEff (nilai < 0.01)
sebelum final dipublikasikan.
→ REKOMENDASI: Siap disajikan. Cek DEff anomali dulu.
Ringkasan Akhir & Matriks Tindak Lanjut
# ================================================================
# RINGKASAN AKHIR
# ================================================================
total <- nrow(hasilku)
n_v1 <- nrow(v1); n_v2 <- nrow(v2); n_v3 <- nrow(v3)
n_v4 <- nrow(v4); n_v5 <- nrow(v5); n_v6 <- nrow(v6)
n_sae <- n_v1 + n_v3 + sum(v5$RSE > 50)
n_catatan <- n_v2 + n_v4 + sum(v5$RSE > 25 & v5$RSE <= 50)
# Tabel matriks tindak lanjut
matriks <- data.frame(
Kode = c("V1","V2","V3","V4","V5a","V5b","V5c","V6"),
Status = c("Tidak Valid","Perlu Cek","Mencurigakan","Perlu Perhatian",
"Tidak Presisi (RSE 25-35%)","Tidak Presisi (RSE 35-50%)",
"Sangat Tdk Presisi (RSE>50%)","Valid & Presisi"),
Jumlah = c(n_v1, n_v2, n_v3, n_v4,
sum(v5$RSE>25 & v5$RSE<=35),
sum(v5$RSE>35 & v5$RSE<=50),
sum(v5$RSE>50), n_v6),
Pct = round(c(n_v1, n_v2, n_v3, n_v4,
sum(v5$RSE>25 & v5$RSE<=35),
sum(v5$RSE>35 & v5$RSE<=50),
sum(v5$RSE>50), n_v6) / total * 100, 1),
Tindak_Lanjut = c(
"SAE / collapsed strata",
"Validasi raw data SAKERNAS",
"SAE (wajib)",
"CI logit / sajikan tanpa CI",
"Sajikan + catatan '*'",
"Sajikan + catatan '**'",
"Wajib SAE",
"✓ Siap disajikan langsung"
)
)
knitr::kable(
matriks,
caption = "Matriks Tindak Lanjut per Kategori Validitas",
col.names = c("Kode","Status Validitas","Jumlah","Pct (%)","Tindak Lanjut"),
align = c("c","l","r","r","l")
)| Kode | Status Validitas | Jumlah | Pct (%) | Tindak Lanjut |
|---|---|---|---|---|
| V1 | Tidak Valid | 0 | 0.0 | SAE / collapsed strata |
| V2 | Perlu Cek | 8 | 5.6 | Validasi raw data SAKERNAS |
| V3 | Mencurigakan | 30 | 20.8 | SAE (wajib) |
| V4 | Perlu Perhatian | 0 | 0.0 | CI logit / sajikan tanpa CI |
| V5a | Tidak Presisi (RSE 25-35%) | 17 | 11.8 | Sajikan + catatan ’*’ |
| V5b | Tidak Presisi (RSE 35-50%) | 25 | 17.4 | Sajikan + catatan ’**’ |
| V5c | Sangat Tdk Presisi (RSE>50%) | 32 | 22.2 | Wajib SAE |
| V6 | Valid & Presisi | 32 | 22.2 | ✓ Siap disajikan langsung |
Export Hasil Akhir
# ================================================================
# EXPORT KE EXCEL
# ================================================================
write_xlsx(hasilku, "hasilku_validitas.xlsx", col_names = TRUE)
cat(LINE)
cat(sprintf(" File 'hasilku_validitas.xlsx' tersimpan.\n"))
cat(sprintf(" Kolom baru: STATUS_VALIDITAS, ALASAN, SKOR_VALID,\n"))
cat(sprintf(" n_sampel_AK, n_pengangguran, n_PSU,\n"))
cat(sprintf(" n_PSU_dgn_ngang, max_ngang_1psu\n"))
cat(LINE)====================================================================
File 'hasilku_validitas.xlsx' tersimpan.
Kolom baru: STATUS_VALIDITAS, ALASAN, SKOR_VALID,
n_sampel_AK, n_pengangguran, n_PSU,
n_PSU_dgn_ngang, max_ngang_1psu
====================================================================