saePendugaan Area Kecil (Small Area Estimation / SAE) adalah metode statistik yang digunakan untuk menghasilkan estimasi yang andal pada level domain kecil (seperti kabupaten, kecamatan, atau subpopulasi tertentu) ketika ukuran sampel di domain tersebut terlalu kecil untuk menghasilkan estimasi langsung yang presisi.
Dua model utama yang dibahas dalam analisis ini adalah:
pkgs <- c("sae", "dplyr", "ggplot2", "knitr", "kableExtra", "rmdformats")
for (p in pkgs) {
if (!require(p, character.only = TRUE, quietly = TRUE))
install.packages(p)
}
library(sae)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)milk — Model Fay-Herriot (Area Level)milkDataset milk tersedia di dalam package sae
dan merupakan data area-level.
#> 'data.frame': 43 obs. of 6 variables:
#> $ SmallArea: int 1 2 3 4 5 6 7 8 9 10 ...
#> $ ni : int 191 633 597 221 195 191 183 188 204 188 ...
#> $ yi : num 1.099 1.075 1.105 0.628 0.753 ...
#> $ SD : num 0.163 0.08 0.083 0.109 0.119 0.141 0.202 0.127 0.168 0.178 ...
#> $ CV : num 0.148 0.074 0.075 0.174 0.158 0.144 0.161 0.116 0.12 0.131 ...
#> $ MajorArea: int 1 1 1 1 1 1 1 2 2 2 ...
#> SmallArea ni yi SD
#> Min. : 1.0 Min. : 95.0 Min. :0.4400 Min. :0.0670
#> 1st Qu.:11.5 1st Qu.:189.5 1st Qu.:0.7645 1st Qu.:0.1075
#> Median :22.0 Median :207.0 Median :0.9520 Median :0.1290
#> Mean :22.0 Mean :236.0 Mean :0.9695 Mean :0.1387
#> 3rd Qu.:32.5 3rd Qu.:234.0 3rd Qu.:1.1880 3rd Qu.:0.1630
#> Max. :43.0 Max. :633.0 Max. :1.4600 Max. :0.2590
#> CV MajorArea
#> Min. :0.0740 Min. :1.00
#> 1st Qu.:0.1225 1st Qu.:2.00
#> Median :0.1400 Median :3.00
#> Mean :0.1483 Mean :2.93
#> 3rd Qu.:0.1605 3rd Qu.:4.00
#> Max. :0.3410 Max. :4.00
kable(head(milk, 10),
caption = "10 Baris Pertama Dataset milk",
digits = 4,
align = "c") %>%
kb_simple()| SmallArea | ni | yi | SD | CV | MajorArea |
|---|---|---|---|---|---|
| 1 | 191 | 1.099 | 0.163 | 0.148 | 1 |
| 2 | 633 | 1.075 | 0.080 | 0.074 | 1 |
| 3 | 597 | 1.105 | 0.083 | 0.075 | 1 |
| 4 | 221 | 0.628 | 0.109 | 0.174 | 1 |
| 5 | 195 | 0.753 | 0.119 | 0.158 | 1 |
| 6 | 191 | 0.981 | 0.141 | 0.144 | 1 |
| 7 | 183 | 1.257 | 0.202 | 0.161 | 1 |
| 8 | 188 | 1.095 | 0.127 | 0.116 | 2 |
| 9 | 204 | 1.405 | 0.168 | 0.120 | 2 |
| 10 | 188 | 1.356 | 0.178 | 0.131 | 2 |
| Variabel | Keterangan |
|---|---|
SmallArea |
Kode area inferensi (area kecil) |
ni |
Ukuran sampel di setiap area |
yi |
Rata-rata pengeluaran susu segar tahun 1989 (estimasi langsung) |
SD |
Estimasi standar deviasi dari yi |
CV |
Koefisien variasi estimasi yi |
MajorArea |
Pengelompokan major area (You & Chapman, 2006) |
Catatan: Dataset ini terdiri dari 43 observasi (area kecil) dengan 6 variabel. Variabel
MajorAreamembagi 43 area ke dalam 4 major area yang memiliki estimasi langsung serupa dan menghasilkan pengurangan CV yang besar ketika menggunakan model FH.
ggplot(milk, aes(x = factor(MajorArea), y = yi, fill = factor(MajorArea))) +
geom_boxplot(alpha = 0.7, outlier.colour = "red", outlier.shape = 16) +
geom_jitter(width = 0.15, alpha = 0.5, size = 2) +
scale_fill_brewer(palette = "Set2", name = "Major Area") +
labs(
title = "Distribusi Estimasi Langsung (yi) per Major Area",
subtitle = "Dataset milk — 43 Small Areas",
x = "Major Area",
y = "Pengeluaran Susu Segar (yi)"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")ggplot(milk, aes(x = reorder(SmallArea, CV), y = CV, fill = factor(MajorArea))) +
geom_col(alpha = 0.85) +
scale_fill_brewer(palette = "Set1", name = "Major Area") +
labs(
title = "Koefisien Variasi (CV) Estimasi Langsung per Small Area",
subtitle = "CV tinggi menunjukkan estimasi langsung kurang presisi",
x = "Small Area",
y = "CV (%)"
) +
theme_minimal(base_size = 11) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
legend.position = "top")eblupFH)Model Fay-Herriot diasumsikan sebagai berikut:
\[y_i = \mathbf{x}_i^\top \boldsymbol{\beta} + v_i + e_i\]
di mana:
MajorArea)REML (Restricted Maximum Likelihood) adalah metode default untuk estimasi \(\sigma_v^2\).
#> $eblup
#> [,1]
#> 1 1.0219703
#> 2 1.0476018
#> 3 1.0679513
#> 4 0.7608170
#> 5 0.8461574
#> 6 0.9743727
#> 7 1.0584523
#> 8 1.0977762
#> 9 1.2215449
#> 10 1.1951455
#> 11 0.7852155
#> 12 1.2139456
#> 13 1.2096593
#> 14 0.9834967
#> 15 1.1864247
#> 16 1.1556982
#> 17 1.2263411
#> 18 1.2856486
#> 19 1.2363247
#> 20 1.2349600
#> 21 1.0903019
#> 22 1.1923057
#> 23 1.1216470
#> 24 1.2230296
#> 25 1.1938054
#> 26 0.7627195
#> 27 0.7649550
#> 28 0.7338443
#> 29 0.7699294
#> 30 0.6134418
#> 31 0.7695558
#> 32 0.7958250
#> 33 0.7723187
#> 34 0.6102302
#> 35 0.7001782
#> 36 0.7592787
#> 37 0.5298867
#> 38 0.7434466
#> 39 0.7548996
#> 40 0.7701918
#> 41 0.7481164
#> 42 0.8040773
#> 43 0.6810870
#>
#> $fit
#> $fit$method
#> [1] "REML"
#>
#> $fit$convergence
#> [1] TRUE
#>
#> $fit$iterations
#> [1] 4
#>
#> $fit$estcoef
#> beta std.error tvalue pvalue
#> (Intercept) 0.9681890 0.06936208 13.958476 2.793443e-44
#> as.factor(MajorArea)2 0.1327801 0.10300072 1.289119 1.973569e-01
#> as.factor(MajorArea)3 0.2269462 0.09232981 2.457995 1.397151e-02
#> as.factor(MajorArea)4 -0.2413011 0.08161707 -2.956503 3.111496e-03
#>
#> $fit$refvar
#> [1] 0.01855022
#>
#> $fit$goodness
#> loglike AIC BIC KIC
#> 12.677478 -15.354956 -6.548956 -10.354956
df_reml <- data.frame(
SmallArea = milk$SmallArea,
MajorArea = milk$MajorArea,
yi_direct = milk$yi,
eblup_REML = resultREML$eblup
)
kable(df_reml,
caption = "EBLUP Fay-Herriot — Metode REML",
digits = 4, align = "c") %>%
kb_html(height = "350px")| SmallArea | MajorArea | yi_direct | eblup_REML |
|---|---|---|---|
| 1 | 1 | 1.099 | 1.0220 |
| 2 | 1 | 1.075 | 1.0476 |
| 3 | 1 | 1.105 | 1.0680 |
| 4 | 1 | 0.628 | 0.7608 |
| 5 | 1 | 0.753 | 0.8462 |
| 6 | 1 | 0.981 | 0.9744 |
| 7 | 1 | 1.257 | 1.0585 |
| 8 | 2 | 1.095 | 1.0978 |
| 9 | 2 | 1.405 | 1.2215 |
| 10 | 2 | 1.356 | 1.1951 |
| 11 | 2 | 0.615 | 0.7852 |
| 12 | 2 | 1.460 | 1.2139 |
| 13 | 2 | 1.338 | 1.2097 |
| 14 | 2 | 0.854 | 0.9835 |
| 15 | 3 | 1.176 | 1.1864 |
| 16 | 3 | 1.111 | 1.1557 |
| 17 | 3 | 1.257 | 1.2263 |
| 18 | 3 | 1.430 | 1.2856 |
| 19 | 3 | 1.278 | 1.2363 |
| 20 | 3 | 1.292 | 1.2350 |
| 21 | 3 | 1.002 | 1.0903 |
| 22 | 3 | 1.183 | 1.1923 |
| 23 | 3 | 1.044 | 1.1216 |
| 24 | 3 | 1.267 | 1.2230 |
| 25 | 3 | 1.193 | 1.1938 |
| 26 | 4 | 0.791 | 0.7627 |
| 27 | 4 | 0.795 | 0.7650 |
| 28 | 4 | 0.759 | 0.7338 |
| 29 | 4 | 0.796 | 0.7699 |
| 30 | 4 | 0.565 | 0.6134 |
| 31 | 4 | 0.886 | 0.7696 |
| 32 | 4 | 0.952 | 0.7958 |
| 33 | 4 | 0.807 | 0.7723 |
| 34 | 4 | 0.582 | 0.6102 |
| 35 | 4 | 0.684 | 0.7002 |
| 36 | 4 | 0.787 | 0.7593 |
| 37 | 4 | 0.440 | 0.5299 |
| 38 | 4 | 0.759 | 0.7434 |
| 39 | 4 | 0.770 | 0.7549 |
| 40 | 4 | 0.800 | 0.7702 |
| 41 | 4 | 0.756 | 0.7481 |
| 42 | 4 | 0.865 | 0.8041 |
| 43 | 4 | 0.640 | 0.6811 |
#> $eblup
#> [,1]
#> 1 1.0179759
#> 2 1.0449639
#> 3 1.0644808
#> 4 0.7706920
#> 5 0.8525124
#> 6 0.9738262
#> 7 1.0508569
#> 8 1.0961652
#> 9 1.2105053
#> 10 1.1856404
#> 11 0.7975687
#> 12 1.2021499
#> 13 1.2004587
#> 14 0.9889713
#> 15 1.1867450
#> 16 1.1579920
#> 17 1.2242232
#> 18 1.2786804
#> 19 1.2335659
#> 20 1.2318601
#> 21 1.0959551
#> 22 1.1922126
#> 23 1.1259973
#> 24 1.2206948
#> 25 1.1936875
#> 26 0.7602435
#> 27 0.7623581
#> 28 0.7322880
#> 29 0.7674591
#> 30 0.6173102
#> 31 0.7649969
#> 32 0.7893148
#> 33 0.7693760
#> 34 0.6128615
#> 35 0.7009616
#> 36 0.7568908
#> 37 0.5371932
#> 38 0.7418816
#> 39 0.7532513
#> 40 0.7675187
#> 41 0.7470595
#> 42 0.7993630
#> 43 0.6831609
#>
#> $fit
#> $fit$method
#> [1] "FH"
#>
#> $fit$convergence
#> [1] TRUE
#>
#> $fit$iterations
#> [1] 3
#>
#> $fit$estcoef
#> beta std.error tvalue pvalue
#> (Intercept) 0.9679012 0.06695896 14.455139 2.326581e-47
#> as.factor(MajorArea)2 0.1294502 0.09980334 1.297053 1.946131e-01
#> as.factor(MajorArea)3 0.2267910 0.08941191 2.536474 1.119750e-02
#> as.factor(MajorArea)4 -0.2421518 0.07877987 -3.073778 2.113670e-03
#>
#> $fit$refvar
#> [1] 0.01642027
#>
#> $fit$goodness
#> loglike AIC BIC KIC
#> 12.76205 -15.52410 -6.71810 -10.52410
df_compare <- data.frame(
SmallArea = milk$SmallArea,
MajorArea = milk$MajorArea,
Direct = round(milk$yi, 4),
EBLUP_REML = round(resultREML$eblup, 4),
EBLUP_FH = round(resultFH$eblup, 4),
Shrinkage = round(1 - resultREML$eblup / milk$yi, 4)
)
kable(df_compare,
caption = "Perbandingan Estimasi Langsung vs EBLUP (REML & FH)",
digits = 4, align = "c") %>%
kb_html(height = "400px")| SmallArea | MajorArea | Direct | EBLUP_REML | EBLUP_FH | Shrinkage |
|---|---|---|---|---|---|
| 1 | 1 | 1.099 | 1.0220 | 1.0180 | 0.0701 |
| 2 | 1 | 1.075 | 1.0476 | 1.0450 | 0.0255 |
| 3 | 1 | 1.105 | 1.0680 | 1.0645 | 0.0335 |
| 4 | 1 | 0.628 | 0.7608 | 0.7707 | -0.2115 |
| 5 | 1 | 0.753 | 0.8462 | 0.8525 | -0.1237 |
| 6 | 1 | 0.981 | 0.9744 | 0.9738 | 0.0068 |
| 7 | 1 | 1.257 | 1.0585 | 1.0509 | 0.1580 |
| 8 | 2 | 1.095 | 1.0978 | 1.0962 | -0.0025 |
| 9 | 2 | 1.405 | 1.2215 | 1.2105 | 0.1306 |
| 10 | 2 | 1.356 | 1.1951 | 1.1856 | 0.1186 |
| 11 | 2 | 0.615 | 0.7852 | 0.7976 | -0.2768 |
| 12 | 2 | 1.460 | 1.2139 | 1.2021 | 0.1685 |
| 13 | 2 | 1.338 | 1.2097 | 1.2005 | 0.0959 |
| 14 | 2 | 0.854 | 0.9835 | 0.9890 | -0.1516 |
| 15 | 3 | 1.176 | 1.1864 | 1.1867 | -0.0089 |
| 16 | 3 | 1.111 | 1.1557 | 1.1580 | -0.0402 |
| 17 | 3 | 1.257 | 1.2263 | 1.2242 | 0.0244 |
| 18 | 3 | 1.430 | 1.2856 | 1.2787 | 0.1009 |
| 19 | 3 | 1.278 | 1.2363 | 1.2336 | 0.0326 |
| 20 | 3 | 1.292 | 1.2350 | 1.2319 | 0.0441 |
| 21 | 3 | 1.002 | 1.0903 | 1.0960 | -0.0881 |
| 22 | 3 | 1.183 | 1.1923 | 1.1922 | -0.0079 |
| 23 | 3 | 1.044 | 1.1216 | 1.1260 | -0.0744 |
| 24 | 3 | 1.267 | 1.2230 | 1.2207 | 0.0347 |
| 25 | 3 | 1.193 | 1.1938 | 1.1937 | -0.0007 |
| 26 | 4 | 0.791 | 0.7627 | 0.7602 | 0.0358 |
| 27 | 4 | 0.795 | 0.7650 | 0.7624 | 0.0378 |
| 28 | 4 | 0.759 | 0.7338 | 0.7323 | 0.0331 |
| 29 | 4 | 0.796 | 0.7699 | 0.7675 | 0.0328 |
| 30 | 4 | 0.565 | 0.6134 | 0.6173 | -0.0857 |
| 31 | 4 | 0.886 | 0.7696 | 0.7650 | 0.1314 |
| 32 | 4 | 0.952 | 0.7958 | 0.7893 | 0.1640 |
| 33 | 4 | 0.807 | 0.7723 | 0.7694 | 0.0430 |
| 34 | 4 | 0.582 | 0.6102 | 0.6129 | -0.0485 |
| 35 | 4 | 0.684 | 0.7002 | 0.7010 | -0.0237 |
| 36 | 4 | 0.787 | 0.7593 | 0.7569 | 0.0352 |
| 37 | 4 | 0.440 | 0.5299 | 0.5372 | -0.2043 |
| 38 | 4 | 0.759 | 0.7434 | 0.7419 | 0.0205 |
| 39 | 4 | 0.770 | 0.7549 | 0.7533 | 0.0196 |
| 40 | 4 | 0.800 | 0.7702 | 0.7675 | 0.0373 |
| 41 | 4 | 0.756 | 0.7481 | 0.7471 | 0.0104 |
| 42 | 4 | 0.865 | 0.8041 | 0.7994 | 0.0704 |
| 43 | 4 | 0.640 | 0.6811 | 0.6832 | -0.0642 |
df_long <- data.frame(
SmallArea = rep(milk$SmallArea, 3),
Estimasi = c(milk$yi, resultREML$eblup, resultFH$eblup),
Metode = rep(c("Direct", "EBLUP-REML", "EBLUP-FH"), each = nrow(milk))
)
ggplot(df_long, aes(x = SmallArea, y = Estimasi, color = Metode, shape = Metode)) +
geom_point(size = 2.2, alpha = 0.85) +
geom_line(aes(group = Metode), alpha = 0.5, linewidth = 0.7) +
scale_color_manual(values = c("Direct" = "grey50",
"EBLUP-REML" = "#E74C3C",
"EBLUP-FH" = "#2980B9")) +
labs(
title = "Perbandingan Estimasi: Direct vs EBLUP (REML & FH)",
subtitle = "Dataset milk — Model Fay-Herriot",
x = "Small Area",
y = "Estimasi yi",
color = "Metode", shape = "Metode"
) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
legend.position = "top")Interpretasi:
Estimator EBLUP merupakan kombinasi linier tertimbang antara estimasi langsung (\(y_i\)) dan estimasi regresi sintesis (\(\hat{y}_i^{syn} = \mathbf{x}_i^\top \hat{\boldsymbol{\beta}}\)).
Shrinkage effect: Pada area dengan sampel kecil (presisi rendah, CV tinggi), EBLUP lebih “ditarik” ke arah estimasi sintesis. Sebaliknya, pada area dengan sampel besar, EBLUP mendekati estimasi langsung.
Variansi komponen (\(\hat{\sigma}_v^2\)):
- REML: 0.0186
- FH: 0.0164
Nilai \(\hat{\sigma}_v^2\) yang lebih besar menandakan heterogenitas antar area yang lebih tinggi di luar pengaruh
MajorArea.Metode REML umumnya menghasilkan estimasi variansi komponen yang tidak bias dan lebih stabil dibanding ML untuk model campuran.
Pengelompokan
MajorAreaterbukti efektif sebagai kovariat karena mampu menjelaskan sebagian variasi antar area, sehingga meningkatkan presisi estimasi EBLUP.
mseFH)Fungsi mseFH menghitung estimasi MSE dari estimator EB
menggunakan pendekatan Prasad-Rao (analitik), yang
merupakan second-order approximation dari MSE sebenarnya.
#> $est
#> $est$eblup
#> [,1]
#> 1 1.0161733
#> 2 1.0436968
#> 3 1.0628168
#> 4 0.7753489
#> 5 0.8554903
#> 6 0.9735857
#> 7 1.0474786
#> 8 1.0953436
#> 9 1.2054092
#> 10 1.1812565
#> 11 0.8033701
#> 12 1.1967756
#> 13 1.1961592
#> 14 0.9914053
#> 15 1.1868829
#> 16 1.1590363
#> 17 1.2232369
#> 18 1.2755188
#> 19 1.2322850
#> 20 1.2304422
#> 21 1.0985768
#> 22 1.1921598
#> 23 1.1279920
#> 24 1.2196285
#> 25 1.1936256
#> 26 0.7590651
#> 27 0.7611232
#> 28 0.7315647
#> 29 0.7662730
#> 30 0.6191454
#> 31 0.7629389
#> 32 0.7863747
#> 33 0.7679782
#> 34 0.6141348
#> 35 0.7013109
#> 36 0.7557564
#> 37 0.5406643
#> 38 0.7411316
#> 39 0.7524506
#> 40 0.7662423
#> 41 0.7465360
#> 42 0.7971405
#> 43 0.6840976
#>
#> $est$fit
#> $est$fit$method
#> [1] "ML"
#>
#> $est$fit$convergence
#> [1] TRUE
#>
#> $est$fit$iterations
#> [1] 4
#>
#> $est$fit$estcoef
#> beta std.error tvalue pvalue
#> X(Intercept) 0.9677986 0.06590747 14.684203 8.138516e-49
#> Xas.factor(MajorArea)2 0.1278756 0.09840939 1.299425 1.937982e-01
#> Xas.factor(MajorArea)3 0.2266909 0.08813973 2.571949 1.011278e-02
#> Xas.factor(MajorArea)4 -0.2425804 0.07753875 -3.128505 1.756978e-03
#>
#> $est$fit$refvar
#> [1] 0.01551755
#>
#> $est$fit$goodness
#> loglike AIC BIC KIC
#> 12.771174 -15.542349 -6.736348 -10.542349
#>
#>
#>
#> $mse
#> [1] 0.013579953 0.005512868 0.005850584 0.008735454 0.009774528 0.011840745
#> [7] 0.015934511 0.010821811 0.014345961 0.015036089 0.007911095 0.016404523
#> [13] 0.012771021 0.012334611 0.012192499 0.011877067 0.011041285 0.013805155
#> [19] 0.011213853 0.013213712 0.010138289 0.017193729 0.011467631 0.013741841
#> [25] 0.008251436 0.009344874 0.009344874 0.016390149 0.007941642 0.006222263
#> [31] 0.015404327 0.014655741 0.009165440 0.003946977 0.007941642 0.009782385
#> [37] 0.006532467 0.010285994 0.007347143 0.008612538 0.005597690 0.009344874
#> [43] 0.010037140
#> $est
#> $est$eblup
#> [,1]
#> 1 1.0219703
#> 2 1.0476018
#> 3 1.0679513
#> 4 0.7608170
#> 5 0.8461574
#> 6 0.9743727
#> 7 1.0584523
#> 8 1.0977762
#> 9 1.2215449
#> 10 1.1951455
#> 11 0.7852155
#> 12 1.2139456
#> 13 1.2096593
#> 14 0.9834967
#> 15 1.1864247
#> 16 1.1556982
#> 17 1.2263411
#> 18 1.2856486
#> 19 1.2363247
#> 20 1.2349600
#> 21 1.0903019
#> 22 1.1923057
#> 23 1.1216470
#> 24 1.2230296
#> 25 1.1938054
#> 26 0.7627195
#> 27 0.7649550
#> 28 0.7338443
#> 29 0.7699294
#> 30 0.6134418
#> 31 0.7695558
#> 32 0.7958250
#> 33 0.7723187
#> 34 0.6102302
#> 35 0.7001782
#> 36 0.7592787
#> 37 0.5298867
#> 38 0.7434466
#> 39 0.7548996
#> 40 0.7701918
#> 41 0.7481164
#> 42 0.8040773
#> 43 0.6810870
#>
#> $est$fit
#> $est$fit$method
#> [1] "REML"
#>
#> $est$fit$convergence
#> [1] TRUE
#>
#> $est$fit$iterations
#> [1] 4
#>
#> $est$fit$estcoef
#> beta std.error tvalue pvalue
#> X(Intercept) 0.9681890 0.06936208 13.958476 2.793443e-44
#> Xas.factor(MajorArea)2 0.1327801 0.10300072 1.289119 1.973569e-01
#> Xas.factor(MajorArea)3 0.2269462 0.09232981 2.457995 1.397151e-02
#> Xas.factor(MajorArea)4 -0.2413011 0.08161707 -2.956503 3.111496e-03
#>
#> $est$fit$refvar
#> [1] 0.01855022
#>
#> $est$fit$goodness
#> loglike AIC BIC KIC
#> 12.677478 -15.354956 -6.548956 -10.354956
#>
#>
#>
#> $mse
#> [1] 0.013460220 0.005372876 0.005701990 0.008541740 0.009579594 0.011670632
#> [7] 0.015926137 0.010586518 0.014184043 0.014901472 0.007694262 0.016336469
#> [13] 0.012562726 0.012117378 0.012031229 0.011709147 0.010859780 0.013690860
#> [19] 0.011034674 0.013079686 0.009948636 0.017243977 0.011292325 0.013625297
#> [25] 0.008065787 0.009205133 0.009205133 0.016476912 0.007800626 0.006098668
#> [31] 0.015441564 0.014657866 0.009024699 0.003870786 0.007800626 0.009646139
#> [37] 0.006404335 0.010155645 0.007209937 0.008470277 0.005484860 0.009205133
#> [43] 0.009903626
#> $est
#> $est$eblup
#> [,1]
#> 1 1.0179759
#> 2 1.0449639
#> 3 1.0644808
#> 4 0.7706920
#> 5 0.8525124
#> 6 0.9738262
#> 7 1.0508569
#> 8 1.0961652
#> 9 1.2105053
#> 10 1.1856404
#> 11 0.7975687
#> 12 1.2021499
#> 13 1.2004587
#> 14 0.9889713
#> 15 1.1867450
#> 16 1.1579920
#> 17 1.2242232
#> 18 1.2786804
#> 19 1.2335659
#> 20 1.2318601
#> 21 1.0959551
#> 22 1.1922126
#> 23 1.1259973
#> 24 1.2206948
#> 25 1.1936875
#> 26 0.7602435
#> 27 0.7623581
#> 28 0.7322880
#> 29 0.7674591
#> 30 0.6173102
#> 31 0.7649969
#> 32 0.7893148
#> 33 0.7693760
#> 34 0.6128615
#> 35 0.7009616
#> 36 0.7568908
#> 37 0.5371932
#> 38 0.7418816
#> 39 0.7532513
#> 40 0.7675187
#> 41 0.7470595
#> 42 0.7993630
#> 43 0.6831609
#>
#> $est$fit
#> $est$fit$method
#> [1] "FH"
#>
#> $est$fit$convergence
#> [1] TRUE
#>
#> $est$fit$iterations
#> [1] 3
#>
#> $est$fit$estcoef
#> beta std.error tvalue pvalue
#> X(Intercept) 0.9679012 0.06695896 14.455139 2.326581e-47
#> Xas.factor(MajorArea)2 0.1294502 0.09980334 1.297053 1.946131e-01
#> Xas.factor(MajorArea)3 0.2267910 0.08941191 2.536474 1.119750e-02
#> Xas.factor(MajorArea)4 -0.2421518 0.07877987 -3.073778 2.113670e-03
#>
#> $est$fit$refvar
#> [1] 0.01642027
#>
#> $est$fit$goodness
#> loglike AIC BIC KIC
#> 12.76205 -15.52410 -6.71810 -10.52410
#>
#>
#>
#> $mse
#> [1] 0.012757016 0.005314467 0.005632201 0.008323471 0.009283520 0.011178151
#> [7] 0.014867662 0.010252709 0.013470878 0.014094867 0.007558331 0.015325274
#> [13] 0.012038944 0.011640323 0.011466958 0.011181888 0.010423673 0.012914353
#> [19] 0.010580561 0.012385544 0.009599980 0.015890240 0.010810966 0.012857861
#> [25] 0.007864537 0.008855177 0.008855177 0.015041526 0.007569376 0.005975211
#> [31] 0.014211800 0.013571949 0.008691547 0.003833361 0.007569376 0.009253153
#> [37] 0.006264329 0.009709450 0.007020470 0.008185874 0.005391055 0.008855177
#> [43] 0.009484220
df_mse <- data.frame(
SmallArea = milk$SmallArea,
MajorArea = milk$MajorArea,
Direct_yi = round(milk$yi, 3),
MSE_ML = round(resultML$mse, 5),
MSE_REML = round(resultREML_mse$mse, 5),
MSE_FH = round(resultFH_mse$mse, 5),
RRMSE_ML = round(sqrt(resultML$mse) / resultML$est$eblup * 100, 2),
RRMSE_REML = round(sqrt(resultREML_mse$mse) / resultREML_mse$est$eblup * 100, 2),
RRMSE_FH = round(sqrt(resultFH_mse$mse) / resultFH_mse$est$eblup * 100, 2)
)
kable(df_mse,
caption = "Estimasi MSE dan RRMSE (%) per Metode — Dataset milk",
digits = 5, align = "c") %>%
kb_html(height = "400px")| SmallArea | MajorArea | Direct_yi | MSE_ML | MSE_REML | MSE_FH | RRMSE_ML | RRMSE_REML | RRMSE_FH |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1.099 | 0.01358 | 0.01346 | 0.01276 | 11.47 | 11.35 | 11.10 |
| 2 | 1 | 1.075 | 0.00551 | 0.00537 | 0.00531 | 7.11 | 7.00 | 6.98 |
| 3 | 1 | 1.105 | 0.00585 | 0.00570 | 0.00563 | 7.20 | 7.07 | 7.05 |
| 4 | 1 | 0.628 | 0.00874 | 0.00854 | 0.00832 | 12.05 | 12.15 | 11.84 |
| 5 | 1 | 0.753 | 0.00977 | 0.00958 | 0.00928 | 11.56 | 11.57 | 11.30 |
| 6 | 1 | 0.981 | 0.01184 | 0.01167 | 0.01118 | 11.18 | 11.09 | 10.86 |
| 7 | 1 | 1.257 | 0.01593 | 0.01593 | 0.01487 | 12.05 | 11.92 | 11.60 |
| 8 | 2 | 1.095 | 0.01082 | 0.01059 | 0.01025 | 9.50 | 9.37 | 9.24 |
| 9 | 2 | 1.405 | 0.01435 | 0.01418 | 0.01347 | 9.94 | 9.75 | 9.59 |
| 10 | 2 | 1.356 | 0.01504 | 0.01490 | 0.01409 | 10.38 | 10.21 | 10.01 |
| 11 | 2 | 0.615 | 0.00791 | 0.00769 | 0.00756 | 11.07 | 11.17 | 10.90 |
| 12 | 2 | 1.460 | 0.01640 | 0.01634 | 0.01533 | 10.70 | 10.53 | 10.30 |
| 13 | 2 | 1.338 | 0.01277 | 0.01256 | 0.01204 | 9.45 | 9.27 | 9.14 |
| 14 | 2 | 0.854 | 0.01233 | 0.01212 | 0.01164 | 11.20 | 11.19 | 10.91 |
| 15 | 3 | 1.176 | 0.01219 | 0.01203 | 0.01147 | 9.30 | 9.25 | 9.02 |
| 16 | 3 | 1.111 | 0.01188 | 0.01171 | 0.01118 | 9.40 | 9.36 | 9.13 |
| 17 | 3 | 1.257 | 0.01104 | 0.01086 | 0.01042 | 8.59 | 8.50 | 8.34 |
| 18 | 3 | 1.430 | 0.01381 | 0.01369 | 0.01291 | 9.21 | 9.10 | 8.89 |
| 19 | 3 | 1.278 | 0.01121 | 0.01103 | 0.01058 | 8.59 | 8.50 | 8.34 |
| 20 | 3 | 1.292 | 0.01321 | 0.01308 | 0.01239 | 9.34 | 9.26 | 9.03 |
| 21 | 3 | 1.002 | 0.01014 | 0.00995 | 0.00960 | 9.17 | 9.15 | 8.94 |
| 22 | 3 | 1.183 | 0.01719 | 0.01724 | 0.01589 | 11.00 | 11.01 | 10.57 |
| 23 | 3 | 1.044 | 0.01147 | 0.01129 | 0.01081 | 9.49 | 9.47 | 9.23 |
| 24 | 3 | 1.267 | 0.01374 | 0.01363 | 0.01286 | 9.61 | 9.54 | 9.29 |
| 25 | 3 | 1.193 | 0.00825 | 0.00807 | 0.00786 | 7.61 | 7.52 | 7.43 |
| 26 | 4 | 0.791 | 0.00934 | 0.00921 | 0.00886 | 12.74 | 12.58 | 12.38 |
| 27 | 4 | 0.795 | 0.00934 | 0.00921 | 0.00886 | 12.70 | 12.54 | 12.34 |
| 28 | 4 | 0.759 | 0.01639 | 0.01648 | 0.01504 | 17.50 | 17.49 | 16.75 |
| 29 | 4 | 0.796 | 0.00794 | 0.00780 | 0.00757 | 11.63 | 11.47 | 11.34 |
| 30 | 4 | 0.565 | 0.00622 | 0.00610 | 0.00598 | 12.74 | 12.73 | 12.52 |
| 31 | 4 | 0.886 | 0.01540 | 0.01544 | 0.01421 | 16.27 | 16.15 | 15.58 |
| 32 | 4 | 0.952 | 0.01466 | 0.01466 | 0.01357 | 15.39 | 15.21 | 14.76 |
| 33 | 4 | 0.807 | 0.00917 | 0.00902 | 0.00869 | 12.47 | 12.30 | 12.12 |
| 34 | 4 | 0.582 | 0.00395 | 0.00387 | 0.00383 | 10.23 | 10.20 | 10.10 |
| 35 | 4 | 0.684 | 0.00794 | 0.00780 | 0.00757 | 12.71 | 12.61 | 12.41 |
| 36 | 4 | 0.787 | 0.00978 | 0.00965 | 0.00925 | 13.09 | 12.94 | 12.71 |
| 37 | 4 | 0.440 | 0.00653 | 0.00640 | 0.00626 | 14.95 | 15.10 | 14.73 |
| 38 | 4 | 0.759 | 0.01029 | 0.01016 | 0.00971 | 13.68 | 13.56 | 13.28 |
| 39 | 4 | 0.770 | 0.00735 | 0.00721 | 0.00702 | 11.39 | 11.25 | 11.12 |
| 40 | 4 | 0.800 | 0.00861 | 0.00847 | 0.00819 | 12.11 | 11.95 | 11.79 |
| 41 | 4 | 0.756 | 0.00560 | 0.00548 | 0.00539 | 10.02 | 9.90 | 9.83 |
| 42 | 4 | 0.865 | 0.00934 | 0.00921 | 0.00886 | 12.13 | 11.93 | 11.77 |
| 43 | 4 | 0.640 | 0.01004 | 0.00990 | 0.00948 | 14.64 | 14.61 | 14.26 |
df_mse_long <- data.frame(
SmallArea = rep(milk$SmallArea, 3),
MSE = c(resultML$mse, resultREML_mse$mse, resultFH_mse$mse),
Metode = rep(c("ML", "REML", "FH"), each = nrow(milk))
)
ggplot(df_mse_long, aes(x = SmallArea, y = MSE, color = Metode, group = Metode)) +
geom_line(linewidth = 0.8, alpha = 0.85) +
geom_point(size = 1.8) +
scale_color_manual(values = c("ML" = "#E67E22", "REML" = "#27AE60", "FH" = "#8E44AD")) +
labs(
title = "Perbandingan Estimasi MSE — ML vs REML vs FH",
subtitle = "Model Fay-Herriot, Dataset milk",
x = "Small Area",
y = "Estimasi MSE",
color = "Metode"
) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
legend.position = "top")df_rrmse_long <- data.frame(
RRMSE = c(df_mse$RRMSE_ML, df_mse$RRMSE_REML, df_mse$RRMSE_FH),
Metode = rep(c("ML", "REML", "FH"), each = nrow(milk))
)
ggplot(df_rrmse_long, aes(x = Metode, y = RRMSE, fill = Metode)) +
geom_boxplot(alpha = 0.7, outlier.colour = "red") +
geom_jitter(width = 0.12, alpha = 0.5, size = 2) +
scale_fill_manual(values = c("ML" = "#E67E22", "REML" = "#27AE60", "FH" = "#8E44AD")) +
labs(
title = "Distribusi RRMSE (%) per Metode Estimasi",
x = "Metode",
y = "RRMSE (%)"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "none")mse_summary <- data.frame(
Metode = c("ML", "REML", "FH"),
Sigma2v = c(resultML$est$fit$refvar,
resultREML_mse$est$fit$refvar,
resultFH_mse$est$fit$refvar),
Mean_MSE = c(mean(resultML$mse),
mean(resultREML_mse$mse),
mean(resultFH_mse$mse)),
Median_MSE = c(median(resultML$mse),
median(resultREML_mse$mse),
median(resultFH_mse$mse)),
Min_MSE = c(min(resultML$mse),
min(resultREML_mse$mse),
min(resultFH_mse$mse)),
Max_MSE = c(max(resultML$mse),
max(resultREML_mse$mse),
max(resultFH_mse$mse)),
Mean_RRMSE = c(mean(df_mse$RRMSE_ML),
mean(df_mse$RRMSE_REML),
mean(df_mse$RRMSE_FH))
)
kable(mse_summary,
caption = "Ringkasan Estimasi MSE per Metode",
digits = 5, align = "c") %>%
kb_simple()| Metode | Sigma2v | Mean_MSE | Median_MSE | Min_MSE | Max_MSE | Mean_RRMSE |
|---|---|---|---|---|---|---|
| ML | 0.01552 | 0.01076 | 0.01029 | 0.00395 | 0.01719 | 11.22233 |
| REML | 0.01855 | 0.01063 | 0.01016 | 0.00387 | 0.01724 | 11.13535 |
| FH | 0.01642 | 0.01014 | 0.00971 | 0.00383 | 0.01589 | 10.90279 |
Interpretasi:
Variansi komponen (\(\hat{\sigma}_v^2\)):
- Tiga metode (ML, REML, FH) menghasilkan estimasi \(\hat{\sigma}_v^2\) yang sedikit berbeda.
- REML cenderung menghasilkan estimasi \(\hat{\sigma}_v^2\) yang tidak bias (unbiased), karena mengoreksi bias yang terjadi pada estimasi ML.
- ML cenderung underestimate \(\hat{\sigma}_v^2\) terutama pada sampel kecil karena tidak memperhitungkan derajat kebebasan yang hilang akibat estimasi parameter tetap (\(\boldsymbol{\beta}\)).
- FH (Fay-Herriot) menggunakan pendekatan method of moments yang berbeda dari keduanya.
MSE Estimator EB:
Fungsi
mseFHmenggunakan dekomposisi MSE Prasad-Rao: \[\widehat{MSE}(\hat{\theta}_i^{EB}) = g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2g_{3i}(\hat{\sigma}_v^2)\] di mana \(g_1\) adalah komponen utama (shrinkage), \(g_2\) berkaitan dengan estimasi \(\boldsymbol{\beta}\), dan \(g_3\) adalah koreksi bias.Area dengan CV tinggi (presisi langsung rendah) umumnya memiliki MSE yang lebih kecil dari variansi estimasi langsungnya — menunjukkan keuntungan model FH.
RRMSE rata-rata di bawah 10% mengindikasikan bahwa EBLUP sudah cukup presisi untuk keperluan pelaporan statistik resmi.
Jika \(\hat{\sigma}_v^2 = 0\) (boundary solution), semua estimasi EBLUP menjadi estimasi sintesis murni. Ini perlu diwaspadai sebagai tanda model mungkin terlalu dihaluskan.
cornsoybean — Model BHF (Unit Level)#> 'data.frame': 37 obs. of 5 variables:
#> $ County : int 1 2 3 4 4 5 5 5 6 6 ...
#> $ CornHec : num 165.8 96.3 76.1 185.3 116.4 ...
#> $ SoyBeansHec: num 8.09 106.03 103.6 6.47 63.82 ...
#> $ CornPix : int 374 209 253 432 367 361 288 369 206 316 ...
#> $ SoyBeansPix: int 55 218 250 96 178 137 206 165 218 221 ...
#> 'data.frame': 12 obs. of 6 variables:
#> $ CountyIndex : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ CountyName : Factor w/ 12 levels "CerroGordo","Franklin",..: 1 3 11 6 2 8 10 12 9 4 ...
#> $ SampSegments : int 1 1 1 2 3 3 3 3 4 5 ...
#> $ PopnSegments : num 545 566 394 424 564 570 402 567 687 569 ...
#> $ MeanCornPixPerSeg : num 295 300 290 291 318 ...
#> $ MeanSoyBeansPixPerSeg: num 190 197 205 220 188 ...
kable(head(cornsoybean, 10),
caption = "10 Baris Pertama Dataset cornsoybean (Unit Level)",
digits = 2, align = "c") %>%
kb_simple()| County | CornHec | SoyBeansHec | CornPix | SoyBeansPix |
|---|---|---|---|---|
| 1 | 165.76 | 8.09 | 374 | 55 |
| 2 | 96.32 | 106.03 | 209 | 218 |
| 3 | 76.08 | 103.60 | 253 | 250 |
| 4 | 185.35 | 6.47 | 432 | 96 |
| 4 | 116.43 | 63.82 | 367 | 178 |
| 5 | 162.08 | 43.50 | 361 | 137 |
| 5 | 152.04 | 71.43 | 288 | 206 |
| 5 | 161.75 | 42.49 | 369 | 165 |
| 6 | 92.88 | 105.26 | 206 | 218 |
| 6 | 149.94 | 76.49 | 316 | 221 |
kable(head(cornsoybeanmeans, 12),
caption = "Dataset cornsoybeanmeans (County Level Means)",
digits = 2, align = "c") %>%
kb_simple()| CountyIndex | CountyName | SampSegments | PopnSegments | MeanCornPixPerSeg | MeanSoyBeansPixPerSeg |
|---|---|---|---|---|---|
| 1 | CerroGordo | 1 | 545 | 295.29 | 189.70 |
| 2 | Hamilton | 1 | 566 | 300.40 | 196.65 |
| 3 | Worth | 1 | 394 | 289.60 | 205.28 |
| 4 | Humboldt | 2 | 424 | 290.74 | 220.22 |
| 5 | Franklin | 3 | 564 | 318.21 | 188.06 |
| 6 | Pocahontas | 3 | 570 | 257.17 | 247.13 |
| 7 | Winnebago | 3 | 402 | 291.77 | 185.37 |
| 8 | Wright | 3 | 567 | 301.26 | 221.36 |
| 9 | Webster | 4 | 687 | 262.17 | 247.09 |
| 10 | Hancock | 5 | 569 | 314.28 | 198.66 |
| 11 | Kossuth | 5 | 965 | 298.65 | 204.61 |
| 12 | Hardin | 6 | 556 | 325.99 | 177.05 |
cornsoybean (unit/segmen level):
| Variabel | Keterangan |
|---|---|
County |
Kode county (domain) |
CornHec |
Hektar jagung per segmen |
SoyBeansHec |
Hektar kedelai per segmen |
CornPix |
Jumlah piksel jagung per segmen (data satelit) |
SoyBeansPix |
Jumlah piksel kedelai per segmen |
cornsoybeanmeans (county/domain
level):
| Variabel | Keterangan |
|---|---|
CountyIndex |
Kode county |
MeanCornPixPerSeg |
Rata-rata piksel jagung per segmen di county |
MeanSoyBeansPixPerSeg |
Rata-rata piksel kedelai per segmen di county |
PopnSegments |
Jumlah total segmen (ukuran populasi) di county |
ggplot(cornsoybean, aes(x = factor(County), y = CornHec, fill = factor(County))) +
geom_boxplot(alpha = 0.7, show.legend = FALSE) +
labs(
title = "Distribusi Hektar Jagung (CornHec) per County",
x = "County",
y = "CornHec (Hektar)"
) +
theme_minimal(base_size = 12)# Kolom pertama HARUS berupa kode domain/county
Xmean <- data.frame(CountyIndex, MeanCornPixPerSeg, MeanSoyBeansPixPerSeg)
Popn <- data.frame(CountyIndex, PopnSegments)
kable(Xmean,
caption = "Data Frame Xmean (Rata-rata Auxiliary per County)",
digits = 2, align = "c") %>%
kb_simple()| CountyIndex | MeanCornPixPerSeg | MeanSoyBeansPixPerSeg |
|---|---|---|
| 1 | 295.29 | 189.70 |
| 2 | 300.40 | 196.65 |
| 3 | 289.60 | 205.28 |
| 4 | 290.74 | 220.22 |
| 5 | 318.21 | 188.06 |
| 6 | 257.17 | 247.13 |
| 7 | 291.77 | 185.37 |
| 8 | 301.26 | 221.36 |
| 9 | 262.17 | 247.09 |
| 10 | 314.28 | 198.66 |
| 11 | 298.65 | 204.61 |
| 12 | 325.99 | 177.05 |
kable(Popn,
caption = "Data Frame Popn (Ukuran Populasi per County)",
digits = 0, align = "c") %>%
kb_simple()| CountyIndex | PopnSegments |
|---|---|
| 1 | 545 |
| 2 | 566 |
| 3 | 394 |
| 4 | 424 |
| 5 | 564 |
| 6 | 570 |
| 7 | 402 |
| 8 | 567 |
| 9 | 687 |
| 10 | 569 |
| 11 | 965 |
| 12 | 556 |
eblupBHF)Model BHF (Battese-Harter-Fuller, 1988) adalah model unit-level:
\[y_{ij} = \mathbf{x}_{ij}^\top \boldsymbol{\beta} + v_i + e_{ij}\]
di mana:
EBLUP untuk rata-rata domain \(i\) adalah:
\[\hat{\bar{Y}}_i^{EBLUP} = \hat{\bar{Y}}_i^{syn} + \hat{\gamma}_i(\bar{y}_i - \bar{\mathbf{x}}_i^\top \hat{\boldsymbol{\beta}} - \hat{v}_i^*)\]
resultCorn <- eblupBHF(
CornHec ~ CornPix + SoyBeansPix,
dom = County,
meanxpop = Xmean,
popnsize = Popn,
data = cornsoybean
)
cat("=== EBLUP Hektar Jagung (Semua County) ===\n")#> === EBLUP Hektar Jagung (Semua County) ===
#> domain eblup sampsize
#> 1 1 122.5825 1
#> 2 2 123.5274 1
#> 3 3 113.0343 1
#> 4 4 114.9901 2
#> 5 5 137.2660 3
#> 6 6 108.9807 3
#> 7 7 116.4839 3
#> 8 8 122.7711 3
#> 9 9 111.5648 4
#> 10 10 124.1565 5
#> 11 11 112.4626 5
#> 12 12 131.2515 6
kable(resultCorn$eblup,
caption = "EBLUP Rata-rata Hektar Jagung per County (BHF, REML)",
digits = 3, align = "c") %>%
kb_simple()| domain | eblup | sampsize |
|---|---|---|
| 1 | 122.583 | 1 |
| 2 | 123.527 | 1 |
| 3 | 113.034 | 1 |
| 4 | 114.990 | 2 |
| 5 | 137.266 | 3 |
| 6 | 108.981 | 3 |
| 7 | 116.484 | 3 |
| 8 | 122.771 | 3 |
| 9 | 111.565 | 4 |
| 10 | 124.157 | 5 |
| 11 | 112.463 | 5 |
| 12 | 131.252 | 6 |
#> === Ringkasan Model Fit (Jagung) ===
#> Variansi Random Effect (sigma_v^2): 63.3149
#> Variansi Error (sigma_e^2) : 297.7128
#> Koefisien Regresi:
#> Xs(Intercept) XsCornPix XsSoyBeansPix
#> 17.9639791 0.3663352 -0.0303638
domains <- c(10, 1, 5)
resultBean <- eblupBHF(
SoyBeansHec ~ CornPix + SoyBeansPix,
dom = County,
selectdom = domains,
meanxpop = Xmean,
popnsize = Popn,
method = "ML",
data = cornsoybean
)
cat("=== EBLUP Hektar Kedelai — County 10, 1, 5 (Metode ML) ===\n")#> === EBLUP Hektar Kedelai — County 10, 1, 5 (Metode ML) ===
#> domain eblup sampsize
#> 1 10 100.61124 5
#> 2 1 78.63690 1
#> 3 5 66.26077 3
#> === Ringkasan Model Fit (Kedelai, ML) ===
#> Variansi Random Effect (sigma_v^2): 219.322
#> Variansi Error (sigma_e^2) : 170.2856
#> Koefisien Regresi:
#> Xs(Intercept) XsCornPix XsSoyBeansPix
#> -16.34565637 0.02806513 0.49678511
# SE per county = sqrt(sigma_e^2 / ni) — aproksimasi dari variansi error model
df_corn_eblup <- resultCorn$eblup
df_corn_eblup$County <- as.numeric(rownames(df_corn_eblup))
ni_corn <- as.numeric(table(cornsoybean$County))
df_corn_eblup$se <- sqrt(resultCorn$fit$errorvar / ni_corn)
ggplot(df_corn_eblup, aes(x = factor(County), y = eblup)) +
geom_col(fill = "#F39C12", alpha = 0.8) +
geom_errorbar(aes(ymin = eblup - se, ymax = eblup + se),
width = 0.3, color = "black") +
labs(
title = "EBLUP Rata-rata Hektar Jagung per County +/- SE",
subtitle = "Model BHF — Metode REML (SE = sqrt(sigma_e^2 / ni))",
x = "County",
y = "EBLUP (Hektar)"
) +
theme_minimal(base_size = 12)Interpretasi:
Model dan Kovariat:
- Model BHF menggunakan data piksel satelit (
CornPix,SoyBeansPix) sebagai prediktor untuk menduga rata-rata areal jagung dan kedelai per county.- Korelasi antara data lapangan dan piksel satelit yang tinggi memungkinkan model menghasilkan estimasi yang jauh lebih presisi dibanding estimasi langsung.
EBLUP Jagung (Semua County):
- Koefisien
CornPixpositif menandakan bahwa semakin banyak piksel jagung di segmen, semakin besar area jagung yang diukur — konsisten secara agronomis.- Koefisien
SoyBeansPixbisa positif atau negatif bergantung pada struktur lahan yang saling berkorelasi di area tersebut.EBLUP Kedelai (Subset County 10, 1, 5 — Metode ML):
- Pemilihan
method = "ML"memberikan estimasi \(\hat{\sigma}_v^2\) yang sedikit underestimate dibanding REML, namun konsisten untuk sampel besar.- Pembatasan pada 3 county (
selectdom = c(10, 1, 5)) berguna ketika hanya domain tertentu yang menjadi fokus kebijakan, sehingga efisiensi komputasi meningkat.Interpretasi Variansi:
- Nilai \(\hat{\sigma}_v^2\) yang relatif kecil dibanding \(\hat{\sigma}_e^2\) menunjukkan bahwa variasi dominan terjadi antar unit dalam county, bukan antar county itu sendiri.
- Sebaliknya jika \(\hat{\sigma}_v^2 \gg \hat{\sigma}_e^2\), perbedaan antar county sangat signifikan.
pbmseBHF)Fungsi pbmseBHF menghitung estimasi MSE menggunakan
Parametric Bootstrap yang lebih robust — terutama
ketika asumsi normalitas tidak terpenuhi atau pendekatan analitik kurang
akurat.
Algoritma parametric bootstrap untuk model BHF:
set.seed(123)
BHF <- pbmseBHF(CornHec ~ CornPix + SoyBeansPix,
dom = County,
meanxpop = Xmean,
popnsize = Popn,
B = 200,
data = cornsoybean)#>
#> Bootstrap procedure with B = 200 iterations starts.
#> b = 1
#> b = 2
#> b = 3
#> b = 4
#> b = 5
#> b = 6
#> b = 7
#> b = 8
#> b = 9
#> b = 10
#> b = 11
#> b = 12
#> b = 13
#> b = 14
#> b = 15
#> b = 16
#> b = 17
#> b = 18
#> b = 19
#> b = 20
#> b = 21
#> b = 22
#> b = 23
#> b = 24
#> b = 25
#> b = 26
#> b = 27
#> b = 28
#> b = 29
#> b = 30
#> b = 31
#> b = 32
#> b = 33
#> b = 34
#> b = 35
#> b = 36
#> b = 37
#> b = 38
#> b = 39
#> b = 40
#> b = 41
#> b = 42
#> b = 43
#> b = 44
#> b = 45
#> b = 46
#> b = 47
#> b = 48
#> b = 49
#> b = 50
#> b = 51
#> b = 52
#> b = 53
#> b = 54
#> b = 55
#> b = 56
#> b = 57
#> b = 58
#> b = 59
#> b = 60
#> b = 61
#> b = 62
#> b = 63
#> b = 64
#> b = 65
#> b = 66
#> b = 67
#> b = 68
#> b = 69
#> b = 70
#> b = 71
#> b = 72
#> b = 73
#> b = 74
#> b = 75
#> b = 76
#> b = 77
#> b = 78
#> b = 79
#> b = 80
#> b = 81
#> b = 82
#> b = 83
#> b = 84
#> b = 85
#> b = 86
#> b = 87
#> b = 88
#> b = 89
#> b = 90
#> b = 91
#> b = 92
#> b = 93
#> b = 94
#> b = 95
#> b = 96
#> b = 97
#> b = 98
#> b = 99
#> b = 100
#> b = 101
#> b = 102
#> b = 103
#> b = 104
#> b = 105
#> b = 106
#> b = 107
#> b = 108
#> b = 109
#> b = 110
#> b = 111
#> b = 112
#> b = 113
#> b = 114
#> b = 115
#> b = 116
#> b = 117
#> b = 118
#> b = 119
#> b = 120
#> b = 121
#> b = 122
#> b = 123
#> b = 124
#> b = 125
#> b = 126
#> b = 127
#> b = 128
#> b = 129
#> b = 130
#> b = 131
#> b = 132
#> b = 133
#> b = 134
#> b = 135
#> b = 136
#> b = 137
#> b = 138
#> b = 139
#> b = 140
#> b = 141
#> b = 142
#> b = 143
#> b = 144
#> b = 145
#> b = 146
#> b = 147
#> b = 148
#> b = 149
#> b = 150
#> b = 151
#> b = 152
#> b = 153
#> b = 154
#> b = 155
#> b = 156
#> b = 157
#> b = 158
#> b = 159
#> b = 160
#> b = 161
#> b = 162
#> b = 163
#> b = 164
#> b = 165
#> b = 166
#> b = 167
#> b = 168
#> b = 169
#> b = 170
#> b = 171
#> b = 172
#> b = 173
#> b = 174
#> b = 175
#> b = 176
#> b = 177
#> b = 178
#> b = 179
#> b = 180
#> b = 181
#> b = 182
#> b = 183
#> b = 184
#> b = 185
#> b = 186
#> b = 187
#> b = 188
#> b = 189
#> b = 190
#> b = 191
#> b = 192
#> b = 193
#> b = 194
#> b = 195
#> b = 196
#> b = 197
#> b = 198
#> b = 199
#> b = 200
resid <- BHF$est$fit$residuals
sigmav2est <- BHF$est$fit$refvar
sigmae2est <- BHF$est$fit$errorvar
cat("sigma_v^2 (random effect variance) :", round(sigmav2est, 4), "\n")#> sigma_v^2 (random effect variance) : 63.3149
#> sigma_e^2 (error variance) : 297.7128
#>
#> Ringkasan Residual:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -50.534 -9.854 1.795 0.000 10.272 26.456
df_boot_mse <- data.frame(
County = BHF$est$eblup$domain,
EBLUP = round(BHF$est$eblup$eblup, 3),
MSE_Bootstrap = round(BHF$mse$mse, 5),
SE_Bootstrap = round(sqrt(BHF$mse$mse), 4),
RRMSE_Boot = round(sqrt(BHF$mse$mse) / BHF$est$eblup$eblup * 100, 2)
)
kable(df_boot_mse,
caption = "Estimasi MSE Bootstrap (B = 200) — Model BHF, Variabel Jagung",
digits = 5, align = "c") %>%
kb_simple()| County | EBLUP | MSE_Bootstrap | SE_Bootstrap | RRMSE_Boot |
|---|---|---|---|---|
| 1 | 122.583 | 85.00324 | 9.2197 | 7.52 |
| 2 | 123.527 | 74.65850 | 8.6405 | 6.99 |
| 3 | 113.034 | 74.88418 | 8.6536 | 7.66 |
| 4 | 114.990 | 64.32547 | 8.0203 | 6.97 |
| 5 | 137.266 | 69.21492 | 8.3196 | 6.06 |
| 6 | 108.981 | 50.86237 | 7.1318 | 6.54 |
| 7 | 116.484 | 46.95725 | 6.8525 | 5.88 |
| 8 | 122.771 | 49.40878 | 7.0291 | 5.73 |
| 9 | 111.565 | 46.06986 | 6.7875 | 6.08 |
| 10 | 124.157 | 39.00881 | 6.2457 | 5.03 |
| 11 | 112.463 | 37.91072 | 6.1572 | 5.47 |
| 12 | 131.252 | 35.47207 | 5.9558 | 4.54 |
# Plot MSE Bootstrap per County (hanya menggunakan kolom dari df_boot_mse)
ggplot(df_boot_mse, aes(x = factor(County), y = MSE_Bootstrap)) +
geom_col(fill = "#E74C3C", alpha = 0.8) +
geom_text(aes(label = round(MSE_Bootstrap, 1)),
vjust = -0.4, size = 3.2, color = "black") +
labs(
title = "Estimasi MSE Bootstrap per County",
subtitle = "Model BHF — Dataset cornsoybean (B = 200)",
x = "County",
y = "MSE Bootstrap"
) +
theme_minimal(base_size = 12)ggplot(df_boot_mse, aes(x = factor(County), y = RRMSE_Boot)) +
geom_col(fill = "#2ECC71", alpha = 0.8) +
geom_hline(yintercept = 10, linetype = "dashed",
color = "red", linewidth = 0.8) +
annotate("text", x = 1, y = 10.8,
label = "Batas RRMSE 10%",
hjust = 0, color = "red", size = 3.5) +
labs(
title = "RRMSE Bootstrap (%) per County",
subtitle = "Garis merah = batas 10% untuk pelaporan resmi",
x = "County",
y = "RRMSE (%)"
) +
theme_minimal(base_size = 12)par(mfrow = c(1, 2))
hist(resid, breaks = 15,
col = "#85C1E9",
border = "white",
main = "Distribusi Residual Model BHF",
xlab = "Residual",
ylab = "Frekuensi")
abline(v = 0, col = "red", lwd = 2, lty = 2)
qqnorm(resid, main = "Q-Q Plot Residual", pch = 16, col = "#2874A6")
qqline(resid, col = "red", lwd = 2)Interpretasi:
Variansi Komponen Model BHF:
- \(\hat{\sigma}_v^2\) = 63.3149 — mengukur variabilitas efek acak antar county (heterogenitas antar domain).
- \(\hat{\sigma}_e^2\) = 297.7128 — mengukur variabilitas unit dalam county (noise pengukuran per segmen).
MSE dan RRMSE Bootstrap:
- County dengan RRMSE di bawah 10% menunjukkan estimasi layak untuk pelaporan statistik resmi.
- County dengan sampel kecil cenderung memiliki MSE bootstrap lebih besar — mencerminkan ketidakpastian yang lebih tinggi.
- Umumnya pada \(B = 200\) bootstrap, hasil sudah cukup stabil; untuk akurasi lebih tinggi gunakan \(B \geq 500\).
Diagnostik Residual:
- Histogram residual yang mendekati simetris/normal dan Q-Q Plot yang mendekati garis diagonal mengkonfirmasi asumsi normalitas pada model BHF terpenuhi.
- Penyimpangan dari garis Q-Q di ekor distribusi (heavy tails) menunjukkan kemungkinan adanya outlier atau data yang membutuhkan transformasi.
Keunggulan Bootstrap MSE:
- Tidak bergantung pada asumsi normalitas yang ketat
- Lebih akurat untuk sampel kecil
- Cocok digunakan sebagai validasi silang terhadap estimasi MSE analitik
summary_table <- data.frame(
Aspek = c("Jenis Model", "Level Data", "Dataset R", "Fungsi Utama",
"Metode Estimasi Variansi", "Metode MSE",
"Jumlah Domain", "Kelebihan Utama"),
FH = c("Area-level", "Agregat per area", "milk (43 obs)",
"eblupFH, mseFH", "ML, REML, FH",
"Prasad-Rao (Analitik)", "43 small areas",
"Hemat data, cukup statistik ringkasan"),
BHF = c("Unit-level", "Individu/unit", "cornsoybean",
"eblupBHF, pbmseBHF", "REML, ML",
"Parametric Bootstrap", "12 counties",
"Lebih efisien, gunakan info unit")
)
kable(summary_table,
caption = "Perbandingan Model SAE: Fay-Herriot vs BHF",
col.names = c("Aspek", "Model Fay-Herriot", "Model BHF"),
align = "l") %>%
kb_simple()| Aspek | Model Fay-Herriot | Model BHF |
|---|---|---|
| Jenis Model | Area-level | Unit-level |
| Level Data | Agregat per area | Individu/unit |
| Dataset R | milk (43 obs) | cornsoybean |
| Fungsi Utama | eblupFH, mseFH | eblupBHF, pbmseBHF |
| Metode Estimasi Variansi | ML, REML, FH | REML, ML |
| Metode MSE | Prasad-Rao (Analitik) | Parametric Bootstrap |
| Jumlah Domain | 43 small areas | 12 counties |
| Kelebihan Utama | Hemat data, cukup statistik ringkasan | Lebih efisien, gunakan info unit |
| No | Poin | Penjelasan |
|---|---|---|
| 1 | EBLUP = Borrowing Strength | EBLUP memanfaatkan informasi dari area lain melalui model mixed untuk memperkuat estimasi di area dengan sampel kecil. |
| 2 | Pilihan Metode Variansi | REML direkomendasikan untuk estimasi variansi komponen yang tidak bias. ML cocok untuk perbandingan model. FH adalah alternatif non-iteratif. |
| 3 | MSE sebagai Ukuran Presisi | MSE yang kecil (RRMSE < 10%) menunjukkan estimasi layak untuk publikasi statistik resmi. |
| 4 | Bootstrap untuk Validasi | Parametric Bootstrap dengan B >= 200 memberikan validasi terhadap MSE analitik dan lebih robust terhadap pelanggaran asumsi normalitas. |
| 5 | MajorArea sebagai Kovariat | Pengelompokan MajorArea terbukti efektif mengurangi CV estimasi secara signifikan pada dataset milk. |
| 6 | Diagnostik Residual | Residual yang mendekati normal pada Q-Q Plot mengkonfirmasi keabsahan model BHF. Outlier perlu diinvestigasi lebih lanjut. |