1 Bukti Unit Kompetensi 1 — Menentukan Objektif Bisnis

Kode Unit: J.62DMI00.001.1 | Judul Unit: Menentukan Objektif Bisnis

Bukti ini berhubungan dengan pengetahuan, keterampilan, dan sikap kerja yang dibutuhkan dalam menentukan objektif bisnis dari proyek data science prediksi risiko stunting pada balita di 50 kabupaten/kota prioritas Indonesia, sebagai bagian dari Program Percepatan Penurunan Stunting (PPS) Nasional 2025–2026.


1.1 IK 01 — Menentukan Permasalahan Bisnis dan Objektif Bisnis

1.1.1 Latar Belakang dan Permasalahan Bisnis [KUK 1.1]

Stunting pada balita merupakan kondisi gagal tumbuh akibat kekurangan gizi kronis yang berdampak jangka panjang pada kualitas sumber daya manusia Indonesia. Berdasarkan data terkini, prevalensi stunting nasional masih berada di angka 21,5%, jauh dari target nasional sebesar <14% pada tahun 2026. Salah satu kendala utama adalah intervensi PMT (Pemberian Makanan Tambahan) yang belum tepat sasaran akibat tidak adanya sistem deteksi dini berbasis data untuk mengidentifikasi balita yang berisiko stunting secara lebih awal dan akurat. Kementerian Kesehatan bekerja sama dengan BPS membentuk Tim Data Science guna membangun model prediksi risiko stunting berbasis data survei gizi balita di 50 kabupaten/kota prioritas.

1.1.2 Objektif Bisnis dan Metrik Kesuksesan [KUK 1.2–1.3]

Table 1.1: Table 1.2: Objektif Bisnis dan Metrik Kesuksesan Proyek Prediksi Stunting
No Objektif Bisnis Metrik Kesuksesan (Kuantitatif)
1 Membangun model prediksi risiko stunting yang akurat untuk mendeteksi balita berisiko secara dini Recall/Sensitivity ≥ 80% pada data uji; AUC-ROC ≥ 0,80
2 Menurunkan proporsi balita stunting yang tidak terdeteksi (False Negative) agar intervensi PMT lebih tepat sasaran False Negative Rate < 20%; cakupan intervensi tepat sasaran meningkat ≥ 30%
3 Mengidentifikasi faktor-faktor risiko dominan stunting di 50 kab/kota prioritas sebagai dasar kebijakan Minimal 5 variabel prediktor signifikan (p < 0,05); model menjelaskan pola risiko per wilayah
4 Mendukung percepatan penurunan prevalensi stunting nasional dari 21,5% menuju target <14% pada 2026 Penurunan prevalensi ≥ 7,5 poin persentase dalam 2 tahun dengan dukungan sistem deteksi

1.1.3 Daftar Terminologi [KUK 2.1]

Table 1.3: Table 1.4: Daftar Terminologi Proyek Prediksi Stunting
Terminologi Definisi
Stunting Kondisi gagal tumbuh pada balita akibat kekurangan gizi kronis, ditandai Z-score TB/U < -2 SD (WHO)
Z-score TB/U Nilai standar tinggi badan menurut umur; indikator utama status stunting
PMT Pemberian Makanan Tambahan; program intervensi gizi bagi balita berisiko
Class Imbalance Kondisi distribusi kelas target yang tidak seimbang dalam dataset
CRISP-DM Cross Industry Standard Process for Data Mining; metodologi standar proyek data science
Recall/Sensitivity Metrik evaluasi kemampuan model mendeteksi kasus positif (Stunting) yang sebenarnya
False Negative Kondisi saat balita Stunting diprediksi sebagai Normal oleh model
Feature Engineering Proses pembuatan fitur baru dari data mentah untuk meningkatkan performa model

1.1.4 Asumsi Utama Proyek [KUK 2.1]

  1. Data survei gizi balita (data_stunting_raw.csv) dianggap representatif untuk mewakili populasi balita di 50 kab/kota prioritas
  2. Definisi dan pengukuran status stunting mengacu pada standar WHO: Z-score TB/U < −2 SD
  3. Tidak ada perubahan kebijakan gizi nasional yang signifikan selama periode proyek 2025–2026
  4. Data yang dikumpulkan petugas lapangan sudah melalui proses verifikasi awal
  5. Infrastruktur teknologi informasi di tingkat kab/kota tersedia untuk mendukung implementasi

1.1.5 Batasan Proyek (Scope) [KUK 2.2]

  • Cakupan wilayah terbatas pada 50 kab/kota prioritas PPS Nasional 2025–2026
  • Subjek analisis: balita usia 6–59 bulan (500 balita)
  • Model klasifikasi biner (Stunting vs Normal), bukan regresi Z-score kontinu
  • Variabel terbatas pada 17 variabel dalam dataset
  • Horizon proyek: 2025–2026; pembaruan model sesuai ketersediaan data baru

1.2 IK 02 — Risiko, Biaya, dan Keuntungan Proyek

1.2.1 Daftar Risiko dan Rencana Mitigasi [KUK 3.1–3.3]

Table 1.5: Table 1.6: Daftar Risiko dan Rencana Mitigasi Proyek
No Jenis Risiko Deskripsi Risiko Rencana Mitigasi
1 Data Missing values pada bb_lahir (20), pend_ibu (15), kunjungan_posyandu (10); outlier ekstrem pada zscore_tbu Imputasi median per kelompok usia (numerik) & modus (kategorik); penanganan outlier dengan batas WHO; dokumentasi lengkap
2 Kebijakan Perubahan kebijakan intervensi stunting selama proyek dapat membuat model tidak relevan Koordinasi rutin dengan Kemenkes & BPS; penyusunan klausul pembaruan model berkala (min. 1×/tahun)
3 Teknis Overfitting akibat data terbatas (500 obs) & class imbalance Normal:Stunting = 2,88:1 5-fold stratified cross-validation; teknik oversampling (ROSE/SMOTE); metrik utama: Recall & AUC-ROC
4 Etika Data balita bersifat sensitif; penggunaan tidak bertanggung jawab berpotensi menimbulkan stigmatisasi Anonimisasi data; izin penggunaan dari instansi berwenang; tidak publikasi data individual; mematuhi regulasi PDP
5 Operasional Keterbatasan kapasitas SDM kab/kota dalam mengoperasikan sistem prediksi berbasis ML Penyusunan user manual; pelatihan petugas Puskesmas & Dinkes; dashboard sederhana berbasis model

1.2.2 Tabel Komponen Biaya Proyek [KUK 4.1]

Table 1.7: Table 1.8: Komponen Biaya Pelaksanaan Proyek Data Science Stunting
No Komponen Estimasi Biaya
1 SDM
NA — Project Manager (12 bln) Rp 120.000.000
NA — 2 Data Scientist (12 bln) Rp 240.000.000
NA — 1 Data Engineer + 1 Domain Expert Rp 180.000.000
2 Infrastruktur
NA — Server cloud & storage Rp 60.000.000
NA — Laptop/workstation (5 unit) Rp 50.000.000
NA — Maintenance IT (12 bln) Rp 24.000.000
3 Data
NA — Pengumpulan data lapangan Rp 80.000.000
NA — Validasi & quality control Rp 20.000.000
4 Lisensi & Lain-lain
NA — Lisensi software analitik Rp 15.000.000
NA — Pelatihan & sosialisasi Rp 40.000.000
NA — Dokumentasi & pelaporan Rp 21.000.000
TOTAL ESTIMASI BIAYA Rp 850.000.000

2 Bukti Unit Kompetensi 2 — Menentukan Tujuan Teknis Data Science

Kode Unit: J.62DMI00.002.1


2.1 IK 03 — Eksplorasi Distribusi Variabel Target & Class Imbalance

library(readr); library(tidyverse); library(knitr); library(kableExtra)

data_stunting_raw <- read_csv("C:/Users/User/Downloads/data_stunting_raw.csv")
tabel_target <- data_stunting_raw %>%
  count(status_stunting) %>%
  mutate(persentase = round(n / sum(n) * 100, 2))

kable(tabel_target,
      caption = "Distribusi Frekuensi Variabel Target status_stunting",
      col.names = c("Status Stunting","Frekuensi (n)","Persentase (%)"),
      align = "lrr") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = FALSE, position = "center") %>%
  row_spec(which(tabel_target$status_stunting == "Stunting"),
           background = "#FFEBEE", color = "#B71C1C", bold = TRUE) %>%
  row_spec(which(tabel_target$status_stunting == "Normal"),
           background = "#E3F2FD", color = "#0D47A1") %>%
  row_spec(0, background = "#1A237E", color = "white", bold = TRUE)
Table 2.1: Table 2.2: Distribusi Frekuensi Variabel Target status_stunting
Status Stunting Frekuensi (n) Persentase (%)
Normal 371 74.2
Stunting 129 25.8
pct_min <- tabel_target %>% filter(status_stunting == "Stunting") %>% pull(persentase)
cat(sprintf(
  "Kelas minoritas (Stunting) : %s%%\nKelas mayoritas (Normal)   : %s%%\nRasio imbalance            : %.2f : 1\nKategori imbalance         : Mild Imbalance (20–40%%)\n",
  pct_min, 100 - pct_min, (100 - pct_min) / pct_min))
#> Kelas minoritas (Stunting) : 25.8%
#> Kelas mayoritas (Normal)   : 74.2%
#> Rasio imbalance            : 2.88 : 1
#> Kategori imbalance         : Mild Imbalance (20–40%)
ggplot(data_stunting_raw, aes(x = status_stunting, fill = status_stunting)) +
  geom_bar(width = 0.5, colour = "white", linewidth = 0.5) +
  geom_text(stat = "count",
            aes(label = paste0(..count.., "\n(",
                               round(..count.. / nrow(data_stunting_raw) * 100, 1), "%)")),
            vjust = -0.3, size = 4.5, fontface = "bold") +
  scale_fill_manual(values = c("Normal" = COL_NORMAL, "Stunting" = COL_STUNTING)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.18)), limits = c(0, NA)) +
  labs(title    = "Distribusi Variabel Target: Status Stunting",
       subtitle = "Dataset data_stunting_raw.csv — 500 Balita, 50 Kab/Kota Prioritas",
       x = "Status Stunting", y = "Jumlah Balita", fill = "Status",
       caption = "Sumber: data_stunting_raw.csv | Rasio Imbalance 2,88 : 1 (Mild Imbalance)") +
  annotate("label", x = 1.5, y = 420,
           label = "Rasio Imbalance\n2,88 : 1\n(Mild Imbalance)",
           fill = "#FFF9C4", colour = "#E65100", size = 3.2,
           label.size = 0.4, fontface = "italic")
Distribusi Variabel Target Status Stunting (n = 500 Balita)

Figure 2.1: Distribusi Variabel Target Status Stunting (n = 500 Balita)

2.1.1 Interpretasi: Class Imbalance

Dari 500 balita, 371 berstatus Normal (74,2%) dan 129 berstatus Stunting (25,8%). Rasio ketidakseimbangan adalah 2,88 : 1 yang tergolong Mild Imbalance (20–40%). Kondisi ini tetap perlu diperhatikan karena dapat membuat model bias ke kelas mayoritas dalam mendeteksi kelas Stunting yang paling penting dideteksi dalam konteks kesehatan publik.

2.1.2 Metrik Evaluasi dan Kriteria Kesuksesan Teknis [KUK 2.1–2.2]

Table 2.3: Table 2.4: Metrik Evaluasi dan Threshold Kriteria Kesuksesan Teknis
Metrik Rumus Threshold Alasan
Recall / Sensitivity TP / (TP + FN) ≥ 80% METRIK UTAMA — Mendeteksi semua balita Stunting yang sebenarnya
Precision TP / (TP + FP) ≥ 70% Ketepatan prediksi agar intervensi tidak terbuang sia-sia
F1-Score 2 × (P × R) / (P + R) ≥ 75% Keseimbangan antara Recall dan Precision
AUC-ROC Area under ROC curve ≥ 0,80 Kemampuan diskriminasi model secara keseluruhan
Specificity TN / (TN + FP) Sekunder Mendeteksi balita Normal yang sebenarnya

False Negative jauh lebih kritis dibanding False Positive. Balita Stunting yang tidak terdeteksi tidak mendapat intervensi PMT → gagal tumbuh permanen, target nasional tidak tercapai. Sebaliknya, False Positive hanya menyebabkan inefisiensi anggaran PMT tanpa bahaya medis.


3 Bukti Unit Kompetensi 3 — Menelaah Data

Kode Unit: J.62DMI00.005.1


3.1 IK 04 — Analisis Tipe dan Relasi Data [KUK 1.1–1.3]

str(data_stunting_raw)
#> spc_tbl_ [500 × 17] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
#>  $ id_balita         : chr [1:500] "BAL0001" "BAL0002" "BAL0003" "BAL0004" ...
#>  $ kab_kota          : chr [1:500] "KAB_02" "KAB_07" "KAB_27" "KAB_14" ...
#>  $ jenis_kelamin     : chr [1:500] "L" "L" "L" "L" ...
#>  $ usia_bulan        : num [1:500] 33 24 10 27 55 29 21 32 26 23 ...
#>  $ bb_lahir          : num [1:500] 2.16 2.95 3.03 3 3.07 2.76 3.11 3.08 2.52 3.71 ...
#>  $ tb_sekarang       : num [1:500] 78.4 65.2 61.1 74.6 90.3 73.1 79.7 70.8 65.2 72 ...
#>  $ bb_sekarang       : num [1:500] 9.39 10.61 8.45 11.99 12.78 ...
#>  $ pend_ibu          : chr [1:500] "SD" "SMP" "SD" "SMA" ...
#>  $ pendapatan        : num [1:500] 2828197 1981942 779264 7148622 7289472 ...
#>  $ akses_air         : chr [1:500] "Tidak" "Ya" "Ya" "Ya" ...
#>  $ sanitasi          : chr [1:500] "Baik" "Baik" "Kurang" "Kurang" ...
#>  $ asi_eksklusif     : chr [1:500] "Tidak" "Ya" "Ya" "Ya" ...
#>  $ imunisasi         : chr [1:500] "Lengkap" "Tidak Lengkap" "Lengkap" "Lengkap" ...
#>  $ diare_3bln        : num [1:500] 0 2 0 1 0 0 1 0 0 0 ...
#>  $ kunjungan_posyandu: num [1:500] 11 11 1 5 7 5 4 7 8 3 ...
#>  $ zscore_tbu        : num [1:500] 0.246 -2.549 -2.2 -0.189 1.257 ...
#>  $ status_stunting   : chr [1:500] "Normal" "Stunting" "Stunting" "Normal" ...
#>  - attr(*, "spec")=
#>   .. cols(
#>   ..   id_balita = col_character(),
#>   ..   kab_kota = col_character(),
#>   ..   jenis_kelamin = col_character(),
#>   ..   usia_bulan = col_double(),
#>   ..   bb_lahir = col_double(),
#>   ..   tb_sekarang = col_double(),
#>   ..   bb_sekarang = col_double(),
#>   ..   pend_ibu = col_character(),
#>   ..   pendapatan = col_double(),
#>   ..   akses_air = col_character(),
#>   ..   sanitasi = col_character(),
#>   ..   asi_eksklusif = col_character(),
#>   ..   imunisasi = col_character(),
#>   ..   diare_3bln = col_double(),
#>   ..   kunjungan_posyandu = col_double(),
#>   ..   zscore_tbu = col_double(),
#>   ..   status_stunting = col_character()
#>   .. )
#>  - attr(*, "problems")=<externalptr>
head(data_stunting_raw, 5)
#> # A tibble: 5 × 17
#>   id_balita kab_kota jenis_kelamin usia_bulan bb_lahir tb_sekarang bb_sekarang
#>   <chr>     <chr>    <chr>              <dbl>    <dbl>       <dbl>       <dbl>
#> 1 BAL0001   KAB_02   L                     33     2.16        78.4        9.39
#> 2 BAL0002   KAB_07   L                     24     2.95        65.2       10.6 
#> 3 BAL0003   KAB_27   L                     10     3.03        61.1        8.45
#> 4 BAL0004   KAB_14   L                     27     3           74.6       12.0 
#> 5 BAL0005   KAB_14   P                     55     3.07        90.3       12.8 
#> # ℹ 10 more variables: pend_ibu <chr>, pendapatan <dbl>, akses_air <chr>,
#> #   sanitasi <chr>, asi_eksklusif <chr>, imunisasi <chr>, diare_3bln <dbl>,
#> #   kunjungan_posyandu <dbl>, zscore_tbu <dbl>, status_stunting <chr>

3.1.1 Tabel Ringkasan Tipe Data

Table 3.1: Table 3.2: Tabel Ringkasan Tipe Data Setiap Variabel
No Variabel Tipe Skala Contoh Nilai Keterangan
1 id_balita Kategorik Nominal BAL0001 ID unik; tidak dipakai pemodelan
2 kab_kota Kategorik Nominal KAB_02 50 wilayah; tidak dipakai pemodelan
3 jenis_kelamin Kategorik Nominal L, P Perlu binary encoding
4 usia_bulan Numerik Rasio 33, 55 Usia balita dalam bulan
5 bb_lahir Numerik Rasio 2.16, 3.03 Berat lahir (kg); ada 20 NA
6 tb_sekarang Numerik Rasio 78.4, 65.2 Tinggi badan saat ini (cm)
7 bb_sekarang Numerik Rasio 9.39, 12.78 Berat badan saat ini (kg)
8 pend_ibu Kategorik Ordinal SD, SMP, SMA Bertingkat; perlu ordinal encoding; ada 15 NA
9 pendapatan Numerik Rasio 779264, 7148622 Pendapatan keluarga (Rp)
10 akses_air Kategorik Nominal Ya, Tidak Perlu binary encoding
11 sanitasi Kategorik Nominal Baik, Kurang Perlu binary encoding
12 asi_eksklusif Kategorik Nominal Ya, Tidak Perlu binary encoding
13 imunisasi Kategorik Nominal Lengkap, Tidak Lengkap Perlu binary encoding
14 diare_3bln Numerik Rasio 0, 1, 2 Episode diare 3 bulan terakhir
15 kunjungan_posyandu Numerik Rasio 1, 5, 11 Frekuensi kunjungan; ada 10 NA
16 zscore_tbu Numerik Interval -2.549, 0.246 Z-score TB/U WHO; ada outlier ekstrem
17 status_stunting Kategorik Nominal Normal, Stunting Variabel TARGET

3.1.2 Matriks Korelasi Variabel Numerik [KUK 1.3]

library(corrplot)

numerik_vars <- data_stunting_raw %>%
  select(usia_bulan, bb_lahir, tb_sekarang, bb_sekarang,
         pendapatan, diare_3bln, kunjungan_posyandu, zscore_tbu)

cor_matrix <- cor(numerik_vars, use = "complete.obs")

corrplot(cor_matrix,
         method      = "color",
         type        = "upper",
         order       = "hclust",
         addCoef.col = "#37474F",
         tl.col      = "#1A237E",
         tl.srt      = 40,
         tl.cex      = 0.85,
         number.cex  = 0.72,
         col         = colorRampPalette(c(COL_STUNTING, "white", COL_NORMAL))(200),
         cl.cex      = 0.75,
         title       = "Matriks Korelasi Variabel Numerik — Data Stunting",
         mar         = c(0, 0, 2.5, 0))
Matriks Korelasi Variabel Numerik — Dataset Stunting

Figure 3.1: Matriks Korelasi Variabel Numerik — Dataset Stunting

cor_df <- as.data.frame(as.table(cor_matrix)) %>%
  filter(Var1 != Var2) %>%
  mutate(abs_cor = abs(Freq)) %>%
  arrange(desc(abs_cor)) %>%
  filter(!duplicated(abs_cor)) %>%
  head(5)

kable(cor_df, digits = 3,
      caption = "Top 5 Pasangan Variabel dengan Korelasi Tertinggi",
      col.names = c("Variabel 1","Variabel 2","Nilai Korelasi","|Korelasi|")) %>%
  kable_styling(bootstrap_options = c("striped","hover"),
                full_width = FALSE, position = "center") %>%
  row_spec(1, background = "#FFEBEE", bold = TRUE, color = "#B71C1C") %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE)
Table 3.3: Table 3.4: Top 5 Pasangan Variabel dengan Korelasi Tertinggi
Variabel 1 Variabel 2 Nilai Korelasi &#124;Korelasi&#124;
zscore_tbu tb_sekarang 0.880 0.880
bb_sekarang usia_bulan 0.493 0.493
tb_sekarang usia_bulan 0.470 0.470
bb_sekarang tb_sekarang 0.261 0.261
bb_sekarang bb_lahir -0.091 0.091

Korelasi tertinggi: zscore_tbu dan tb_sekarang (r = 0,880) — sangat logis secara medis karena Z-score TB/U diturunkan dari nilai tinggi badan aktual. Perlu diwaspadai potensi multikolinearitas antara kedua variabel ini dalam pemodelan.


3.2 IK 05 — Statistik Deskriptif dan Visualisasi [KUK 2.1–2.3]

library(e1071)

stat_desk <- data_stunting_raw %>%
  select(usia_bulan, bb_lahir, tb_sekarang, bb_sekarang,
         pendapatan, diare_3bln, kunjungan_posyandu, zscore_tbu) %>%
  summarise(across(everything(), list(
    Min    = ~min(., na.rm = TRUE),
    Maks   = ~max(., na.rm = TRUE),
    Mean   = ~mean(., na.rm = TRUE),
    Median = ~median(., na.rm = TRUE),
    SD     = ~sd(., na.rm = TRUE),
    P25    = ~quantile(., 0.25, na.rm = TRUE),
    P75    = ~quantile(., 0.75, na.rm = TRUE),
    Skew   = ~skewness(., na.rm = TRUE)
  ), .names = "{.col}__{.fn}")) %>%
  pivot_longer(everything(), names_to = c("Variabel","Statistik"), names_sep = "__") %>%
  pivot_wider(names_from = Statistik, values_from = value)

kable(stat_desk, digits = 2,
      caption = "Statistik Deskriptif Variabel Numerik",
      align   = c("l", rep("r", 8))) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE) %>%
  column_spec(1, bold = TRUE, background = "#E8EAF6") %>%
  column_spec(9, background = "#FFF9C4",
              color = ifelse(abs(as.numeric(stat_desk$Skew)) >= 1, COL_STUNTING, "black"),
              bold  = ifelse(abs(as.numeric(stat_desk$Skew)) >= 1, TRUE, FALSE)) %>%
  add_header_above(c(" " = 1, "Nilai Ekstrem" = 2, "Pemusatan" = 2,
                     "Sebaran" = 3, "Bentuk" = 1)) %>%
  row_spec(0, background = "#1A237E", color = "white", bold = TRUE)
Table 3.5: Table 3.6: Statistik Deskriptif Variabel Numerik
Nilai Ekstrem
Pemusatan
Sebaran
Bentuk
Variabel Min Maks Mean Median SD P25 P75 Skew
usia_bulan 6.00 59.00 32.96 32.50 15.77 19.00 48.00 0.04
bb_lahir 1.54 4.43 3.10 3.10 0.51 2.76 3.45 -0.07
tb_sekarang 5.00 200.00 74.93 74.70 12.47 68.60 81.00 2.45
bb_sekarang 0.50 50.00 12.10 12.02 3.17 10.37 13.77 4.04
pendapatan 500000.00 14881706.00 3063535.35 2650597.50 1959533.69 1697066.25 3879258.00 1.94
diare_3bln 0.00 5.00 0.99 0.50 1.32 0.00 2.00 1.42
kunjungan_posyandu 1.00 12.00 6.62 7.00 3.56 3.00 10.00 -0.05
zscore_tbu -20.29 36.51 -0.74 -0.77 3.13 -2.02 0.61 3.69
ggplot(data_stunting_raw, aes(x = zscore_tbu, fill = status_stunting)) +
  geom_histogram(bins = 42, alpha = 0.75, position = "identity",
                 colour = "white", linewidth = 0.15) +
  geom_vline(xintercept = -2, linetype = "dashed", colour = COL_ORANGE, linewidth = 1) +
  annotate("rect", xmin = -Inf, xmax = -2, ymin = -Inf, ymax = Inf,
           fill = "#FFCCBC", alpha = 0.12) +
  annotate("label", x = -4.2, y = 25, label = "Zona Stunting\n(< −2 SD)",
           fill = "#FFF3E0", colour = COL_ORANGE, size = 3, label.size = 0.35) +
  scale_fill_manual(values = c("Normal" = COL_NORMAL, "Stunting" = COL_STUNTING)) +
  scale_x_continuous(breaks = seq(-6, 4, 1)) +
  coord_cartesian(xlim = c(-8, 6)) +
  labs(title    = "Distribusi Z-Score TB/U berdasarkan Status Stunting",
       subtitle = "Outlier ekstrem (di luar rentang ±6 SD) terpotong untuk keterbacaan",
       x = "Z-Score TB/U", y = "Frekuensi", fill = "Status Stunting",
       caption = "Garis putus-putus oranye: cut-off WHO −2 SD")
Distribusi Z-Score TB/U berdasarkan Status Stunting

Figure 3.2: Distribusi Z-Score TB/U berdasarkan Status Stunting

library(patchwork)

make_box <- function(var, label, unit) {
  ggplot(data_stunting_raw, aes(x = status_stunting, y = .data[[var]],
                                 fill = status_stunting)) +
    geom_violin(alpha = 0.25, trim = TRUE, colour = NA) +
    geom_boxplot(width = 0.3, outlier.colour = COL_ORANGE, outlier.shape = 21,
                 outlier.size = 1.8, outlier.fill = "#FFB300", linewidth = 0.45) +
    stat_summary(fun = mean, geom = "point", shape = 23,
                 size = 3, fill = "white", colour = COL_ACCENT) +
    scale_fill_manual(values = c("Normal" = COL_NORMAL, "Stunting" = COL_STUNTING)) +
    labs(title = label, x = "Status Stunting", y = unit) +
    theme(legend.position = "none")
}

p1 <- make_box("tb_sekarang", "Tinggi Badan (cm)", "Tinggi Badan (cm)")
p2 <- make_box("bb_sekarang", "Berat Badan (kg)",  "Berat Badan (kg)")
p3 <- make_box("zscore_tbu",  "Z-Score TB/U",       "Z-Score TB/U")

(p1 | p2 | p3) +
  plot_annotation(
    title    = "Distribusi Variabel Antropometri per Status Stunting",
    subtitle = "Violin + Boxplot | Berlian putih = rata-rata | Titik oranye = outlier",
    theme    = theme(plot.title    = element_text(face = "bold", colour = "#1A237E", size = 13),
                     plot.subtitle = element_text(colour = "#546E7A", size = 9.5))
  )
Distribusi Tinggi & Berat Badan per Status Stunting

Figure 3.3: Distribusi Tinggi & Berat Badan per Status Stunting

ggplot(data_stunting_raw %>% filter(zscore_tbu > -8, zscore_tbu < 6),
       aes(x = tb_sekarang, y = zscore_tbu, colour = status_stunting)) +
  geom_point(alpha = 0.5, size = 1.8, shape = 16) +
  geom_smooth(aes(group = status_stunting), method = "lm", se = TRUE,
              linewidth = 0.9) +
  geom_hline(yintercept = -2, linetype = "dashed", colour = COL_ORANGE, linewidth = 0.9) +
  annotate("label", x = 115, y = -1.55, label = "Cut-off −2 SD",
           fill = "#FFF3E0", colour = COL_ORANGE, size = 3, label.size = 0.3) +
  scale_colour_manual(values = c("Normal" = COL_NORMAL, "Stunting" = COL_STUNTING)) +
  labs(title    = "Scatter Plot: Tinggi Badan vs Z-Score TB/U",
       subtitle = "Korelasi positif sangat kuat (r = 0,880) | Outlier ekstrem dipotong",
       x = "Tinggi Badan Sekarang (cm)", y = "Z-Score TB/U",
       colour = "Status Stunting",
       caption = "Sumber: data_stunting_raw.csv")
Scatter Plot: Tinggi Badan vs Z-Score TB/U

Figure 3.4: Scatter Plot: Tinggi Badan vs Z-Score TB/U

skew_df <- data_stunting_raw %>%
  select(usia_bulan, bb_lahir, tb_sekarang, bb_sekarang,
         pendapatan, diare_3bln, kunjungan_posyandu, zscore_tbu) %>%
  summarise(across(everything(), ~round(skewness(., na.rm = TRUE), 3))) %>%
  pivot_longer(everything(), names_to = "Variabel", values_to = "Skewness") %>%
  mutate(Kategori = case_when(
    abs(Skewness) < 0.5  ~ "Simetris",
    abs(Skewness) < 1    ~ "Skewed Sedang",
    TRUE                 ~ "Skewed Berat"
  ))

kable(skew_df, caption = "Nilai Skewness Variabel Numerik",
      col.names = c("Variabel","Nilai Skewness","Kategori")) %>%
  kable_styling(bootstrap_options = c("striped","hover"),
                full_width = FALSE, position = "center") %>%
  row_spec(which(skew_df$Kategori == "Skewed Berat"),
           background = "#FFEBEE", color = COL_STUNTING, bold = TRUE) %>%
  row_spec(which(skew_df$Kategori == "Simetris"),
           background = "#E8F5E9") %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE)
Table 3.7: Table 3.8: Nilai Skewness Variabel Numerik
Variabel Nilai Skewness Kategori
usia_bulan 0.045 Simetris
bb_lahir -0.073 Simetris
tb_sekarang 2.450 Skewed Berat
bb_sekarang 4.039 Skewed Berat
pendapatan 1.939 Skewed Berat
diare_3bln 1.422 Skewed Berat
kunjungan_posyandu -0.049 Simetris
zscore_tbu 3.695 Skewed Berat

3.3 IK 06 — Analisis Hubungan Variabel Kategorik dengan Status Stunting [KUK 3.1–3.2]

buat_tabel_proporsi <- function(data, var) {
  data %>%
    group_by(across(all_of(c(var, "status_stunting")))) %>%
    summarise(n = n(), .groups = "drop") %>%
    group_by(across(all_of(var))) %>%
    mutate(proporsi = round(n / sum(n) * 100, 1)) %>%
    ungroup()
}

for (v in c("pend_ibu","akses_air","sanitasi","asi_eksklusif","imunisasi")) {
  tbl <- buat_tabel_proporsi(data_stunting_raw, v)
  cat("\n\n**Tabel Proporsi:", v, "vs status_stunting**\n\n")
  print(kable(tbl, caption = paste("Proporsi:", v, "vs Status Stunting")) %>%
    kable_styling(bootstrap_options = c("striped","hover","condensed"),
                  full_width = FALSE, position = "center"))
}

Tabel Proporsi: pend_ibu vs status_stunting

Table 3.9: Table 3.10: Proporsi: pend_ibu vs Status Stunting
pend_ibu status_stunting n proporsi
PT Normal 37 84.1
PT Stunting 7 15.9
SD Normal 75 68.2
SD Stunting 35 31.8
SMA Normal 107 69.9
SMA Stunting 46 30.1
SMP Normal 123 80.4
SMP Stunting 30 19.6
Tidak Sekolah Normal 19 76.0
Tidak Sekolah Stunting 6 24.0
NA Normal 10 66.7
NA Stunting 5 33.3

Tabel Proporsi: akses_air vs status_stunting

Table 3.9: Table 3.9: Proporsi: akses_air vs Status Stunting
akses_air status_stunting n proporsi
Tidak Normal 101 78.3
Tidak Stunting 28 21.7
Ya Normal 270 72.8
Ya Stunting 101 27.2

Tabel Proporsi: sanitasi vs status_stunting

Table 3.9: Table 3.9: Proporsi: sanitasi vs Status Stunting
sanitasi status_stunting n proporsi
Baik Normal 220 74.3
Baik Stunting 76 25.7
Kurang Normal 151 74.0
Kurang Stunting 53 26.0

Tabel Proporsi: asi_eksklusif vs status_stunting

Table 3.9: Table 3.9: Proporsi: asi_eksklusif vs Status Stunting
asi_eksklusif status_stunting n proporsi
Tidak Normal 166 73.1
Tidak Stunting 61 26.9
Ya Normal 205 75.1
Ya Stunting 68 24.9

Tabel Proporsi: imunisasi vs status_stunting

Table 3.9: Table 3.9: Proporsi: imunisasi vs Status Stunting
imunisasi status_stunting n proporsi
Lengkap Normal 254 72.8
Lengkap Stunting 95 27.2
Tidak Lengkap Normal 117 77.5
Tidak Lengkap Stunting 34 22.5
data_stunting_raw %>%
  filter(!is.na(pend_ibu)) %>%
  mutate(pend_ibu = factor(pend_ibu,
         levels = c("Tidak Sekolah","SD","SMP","SMA","PT"))) %>%
  count(pend_ibu, status_stunting) %>%
  group_by(pend_ibu) %>%
  mutate(pct = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = pend_ibu, y = pct, fill = status_stunting)) +
  geom_col(position = "fill", colour = "white", linewidth = 0.4, width = 0.65) +
  geom_text(aes(label = paste0(pct, "%")),
            position = position_fill(vjust = 0.5),
            colour = "white", size = 3.5, fontface = "bold") +
  scale_fill_manual(values = c("Normal" = COL_NORMAL, "Stunting" = COL_STUNTING)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(title    = "Proporsi Status Stunting per Tingkat Pendidikan Ibu",
       subtitle = "Semakin tinggi pendidikan ibu → prevalensi stunting cenderung menurun",
       x = "Tingkat Pendidikan Ibu", y = "Proporsi (%)", fill = "Status Stunting",
       caption = "Prevalensi stunting tertinggi: SD (31,8%) | terendah: PT (15,9%)")
Proporsi Status Stunting per Pendidikan Ibu

Figure 3.5: Proporsi Status Stunting per Pendidikan Ibu

data_stunting_raw %>%
  select(akses_air, sanitasi, asi_eksklusif, imunisasi, status_stunting) %>%
  pivot_longer(-status_stunting, names_to = "variabel", values_to = "kategori") %>%
  mutate(variabel = recode(variabel,
    akses_air     = "Akses Air Bersih",
    sanitasi      = "Sanitasi",
    asi_eksklusif = "ASI Eksklusif",
    imunisasi     = "Imunisasi")) %>%
  count(variabel, kategori, status_stunting) %>%
  group_by(variabel, kategori) %>%
  mutate(pct = round(n / sum(n) * 100, 1)) %>%
  ggplot(aes(x = kategori, y = pct, fill = status_stunting)) +
  geom_col(position = "fill", colour = "white", linewidth = 0.35, width = 0.6) +
  geom_text(aes(label = paste0(pct, "%")),
            position = position_fill(vjust = 0.5),
            colour = "white", size = 3.2, fontface = "bold") +
  facet_wrap(~variabel, scales = "free_x", nrow = 2) +
  scale_fill_manual(values = c("Normal" = COL_NORMAL, "Stunting" = COL_STUNTING)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(title    = "Proporsi Status Stunting per Variabel Kategorik",
       subtitle = "Faktor akses layanan kesehatan dan perilaku kesehatan",
       x = NULL, y = "Proporsi (%)", fill = "Status Stunting") +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))
Proporsi Status Stunting per Variabel Kategorik (Panel)

Figure 3.6: Proporsi Status Stunting per Variabel Kategorik (Panel)


4 Bukti Unit Kompetensi 4 — Memvalidasi Data

Kode Unit: J.62DMI00.006.1


4.1 IK 07 — Identifikasi Nilai Tidak Valid & Kecukupan Data [KUK 1.1–1.2]

Table 4.1: Table 4.2: Batas Wajar WHO untuk Validasi Data Antropometri
Variabel Batas Bawah (−3SD) Batas Atas (+3SD) Dasar Referensi
tb_sekarang 61,2 cm 124,0 cm WHO Height-for-Age 6–59 bln
bb_sekarang 5,5 kg 24,0 kg WHO Weight-for-Age 6–59 bln
bb_lahir 2,0 kg 4,5 kg WHO: BBLR <2500g; LGA >4500g
zscore_tbu −6,0 SD +6,0 SD WHO biologically implausible
outlier_tb <- data_stunting_raw %>%
  filter(tb_sekarang < 61.2 | tb_sekarang > 124.0) %>%
  select(id_balita, usia_bulan, tb_sekarang, status_stunting)

outlier_bb <- data_stunting_raw %>%
  filter(bb_sekarang < 5.5 | bb_sekarang > 24.0) %>%
  select(id_balita, usia_bulan, bb_sekarang, status_stunting)

outlier_bbl <- data_stunting_raw %>%
  filter(!is.na(bb_lahir), bb_lahir < 2.0 | bb_lahir > 4.5) %>%
  select(id_balita, usia_bulan, bb_lahir, status_stunting)

outlier_zscore <- data_stunting_raw %>%
  filter(zscore_tbu < -6 | zscore_tbu > 6) %>%
  select(id_balita, usia_bulan, zscore_tbu, status_stunting)

cat("=== Outlier tb_sekarang (WHO: 61.2–124.0 cm) ===\n")
#> === Outlier tb_sekarang (WHO: 61.2–124.0 cm) ===
cat("Jumlah outlier:", nrow(outlier_tb), "\n")
#> Jumlah outlier: 42
print(head(outlier_tb, 10))
#> # A tibble: 10 × 4
#>    id_balita usia_bulan tb_sekarang status_stunting
#>    <chr>          <dbl>       <dbl> <chr>          
#>  1 BAL0003           10        61.1 Stunting       
#>  2 BAL0023            6        61   Normal         
#>  3 BAL0040           18        60.8 Stunting       
#>  4 BAL0060           12        60.5 Stunting       
#>  5 BAL0071           24        56.2 Stunting       
#>  6 BAL0078           11        53.2 Stunting       
#>  7 BAL0079            7        52.1 Stunting       
#>  8 BAL0085           11        58.1 Stunting       
#>  9 BAL0102           16        58.9 Stunting       
#> 10 BAL0109           16        58.7 Stunting
cat("\n=== Outlier bb_sekarang (WHO: 5.5–24.0 kg) ===\n")
#> 
#> === Outlier bb_sekarang (WHO: 5.5–24.0 kg) ===
cat("Jumlah outlier:", nrow(outlier_bb), "\n")
#> Jumlah outlier: 3
print(outlier_bb)
#> # A tibble: 3 × 4
#>   id_balita usia_bulan bb_sekarang status_stunting
#>   <chr>          <dbl>       <dbl> <chr>          
#> 1 BAL0006           29        35   Normal         
#> 2 BAL0028           52         0.5 Normal         
#> 3 BAL0212           56        50   Normal
cat("\n=== Outlier bb_lahir (WHO: 2.0–4.5 kg) ===\n")
#> 
#> === Outlier bb_lahir (WHO: 2.0–4.5 kg) ===
cat("Jumlah outlier:", nrow(outlier_bbl), "\n")
#> Jumlah outlier: 9
print(outlier_bbl)
#> # A tibble: 9 × 4
#>   id_balita usia_bulan bb_lahir status_stunting
#>   <chr>          <dbl>    <dbl> <chr>          
#> 1 BAL0013           28     1.72 Normal         
#> 2 BAL0132            9     1.95 Normal         
#> 3 BAL0155           21     1.83 Normal         
#> 4 BAL0156           16     1.85 Normal         
#> 5 BAL0249           57     1.54 Normal         
#> 6 BAL0326           45     1.7  Stunting       
#> 7 BAL0358           42     1.98 Normal         
#> 8 BAL0370           37     1.66 Stunting       
#> 9 BAL0494            9     1.78 Normal
cat("\n=== Outlier zscore_tbu (WHO: -6 s/d +6) ===\n")
#> 
#> === Outlier zscore_tbu (WHO: -6 s/d +6) ===
cat("Jumlah outlier:", nrow(outlier_zscore), "\n")
#> Jumlah outlier: 5
print(outlier_zscore)
#> # A tibble: 5 × 4
#>   id_balita usia_bulan zscore_tbu status_stunting
#>   <chr>          <dbl>      <dbl> <chr>          
#> 1 BAL0129           51      -18.4 Stunting       
#> 2 BAL0301           29      -20.3 Stunting       
#> 3 BAL0341           11      -15.5 Stunting       
#> 4 BAL0390           54       27.0 Normal         
#> 5 BAL0467           19       36.5 Normal
rekap_outlier <- data.frame(
  Variabel    = c("tb_sekarang","bb_sekarang","bb_lahir","zscore_tbu","TOTAL"),
  Jumlah      = c(nrow(outlier_tb), nrow(outlier_bb), nrow(outlier_bbl),
                  nrow(outlier_zscore),
                  nrow(outlier_tb)+nrow(outlier_bb)+nrow(outlier_bbl)+nrow(outlier_zscore)),
  Persen      = round(c(nrow(outlier_tb), nrow(outlier_bb), nrow(outlier_bbl),
                         nrow(outlier_zscore),
                         nrow(outlier_tb)+nrow(outlier_bb)+nrow(outlier_bbl)+nrow(outlier_zscore)
                         ) / 500 * 100, 1),
  Batas_WHO   = c("61,2–124,0 cm","5,5–24,0 kg","2,0–4,5 kg","−6 s/d +6","—")
)

kable(rekap_outlier, caption = "Rekap Total Nilai Tidak Valid Berdasarkan Batas WHO",
      col.names = c("Variabel","Jumlah Tidak Valid","Persentase (%)","Batas WHO")) %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE,
                position = "center") %>%
  row_spec(nrow(rekap_outlier), background = "#B71C1C", color = "white", bold = TRUE) %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE)
Table 4.3: Table 4.4: Rekap Total Nilai Tidak Valid Berdasarkan Batas WHO
Variabel Jumlah Tidak Valid Persentase (%) Batas WHO
tb_sekarang 42 8.4 61,2–124,0 cm
bb_sekarang 3 0.6 5,5–24,0 kg
bb_lahir 9 1.8 2,0–4,5 kg
zscore_tbu 5 1.0 −6 s/d +6
TOTAL 59 11.8
cat("\n=== Missing Values per Variabel ===\n")
#> 
#> === Missing Values per Variabel ===
data_stunting_raw %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "Variabel", values_to = "NA_Count") %>%
  filter(NA_Count > 0) %>%
  mutate(Persen = round(NA_Count/500*100, 1)) %>%
  print()
#> # A tibble: 3 × 3
#>   Variabel           NA_Count Persen
#>   <chr>                 <int>  <dbl>
#> 1 bb_lahir                 20      4
#> 2 pend_ibu                 15      3
#> 3 kunjungan_posyandu       10      2
n_obs <- nrow(data_stunting_raw); n_pred <- 14; min_obs <- n_pred * 10
cat(sprintf(
  "\n=== Kecukupan Data (Rule of Thumb: 10 obs/prediktor) ===\nObservasi       : %d\nPrediktor aktif : %d\nMinimum (10x)   : %d\nRasio           : %.2f\nKesimpulan      : %s\n",
  n_obs, n_pred, min_obs, n_obs/n_pred,
  ifelse(n_obs >= min_obs, "CUKUP ✓ (rasio 35,71 ≥ 10)", "TIDAK CUKUP ✗")))
#> 
#> === Kecukupan Data (Rule of Thumb: 10 obs/prediktor) ===
#> Observasi       : 500
#> Prediktor aktif : 14
#> Minimum (10x)   : 140
#> Rasio           : 35.71
#> Kesimpulan      : CUKUP ✓ (rasio 35,71 ≥ 10)

5 Bukti Unit Kompetensi 5 — Menentukan Objek Data

Kode Unit: J.62DMI00.007.1


5.1 IK 08 — Pemilihan Atribut dan Stratified Sampling [KUK 1.1–2.2]

cat("=== Dimensi Dataset ===\n")
#> === Dimensi Dataset ===
cat("Jumlah baris (observasi):", nrow(data_stunting_raw), "\n")
#> Jumlah baris (observasi): 500
cat("Jumlah kolom (variabel) :", ncol(data_stunting_raw), "\n")
#> Jumlah kolom (variabel) : 17
cat("\n=== Distribusi Target: status_stunting ===\n")
#> 
#> === Distribusi Target: status_stunting ===
data_stunting_raw %>%
  count(status_stunting) %>%
  mutate(persen = round(n/sum(n)*100, 2)) %>% print()
#> # A tibble: 2 × 3
#>   status_stunting     n persen
#>   <chr>           <int>  <dbl>
#> 1 Normal            371   74.2
#> 2 Stunting          129   25.8
cat("\n=== Proporsi Stunting per Jenis Kelamin ===\n")
#> 
#> === Proporsi Stunting per Jenis Kelamin ===
data_stunting_raw %>%
  group_by(jenis_kelamin, status_stunting) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(jenis_kelamin) %>%
  mutate(proporsi = round(n/sum(n)*100, 2)) %>% print()
#> # A tibble: 4 × 4
#> # Groups:   jenis_kelamin [2]
#>   jenis_kelamin status_stunting     n proporsi
#>   <chr>         <chr>           <int>    <dbl>
#> 1 L             Normal            184     69.4
#> 2 L             Stunting           81     30.6
#> 3 P             Normal            187     79.6
#> 4 P             Stunting           48     20.4
cat("\n=== Proporsi Stunting per Kelompok Usia ===\n")
#> 
#> === Proporsi Stunting per Kelompok Usia ===
data_stunting_raw %>%
  mutate(kel_usia = case_when(
    usia_bulan <= 11 ~ "6-11 bln",
    usia_bulan <= 23 ~ "12-23 bln",
    usia_bulan <= 35 ~ "24-35 bln",
    usia_bulan <= 47 ~ "36-47 bln",
    TRUE             ~ "48-59 bln")) %>%
  group_by(kel_usia, status_stunting) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(kel_usia) %>%
  mutate(proporsi = round(n/sum(n)*100, 2)) %>%
  arrange(kel_usia) %>% print()
#> # A tibble: 10 × 4
#> # Groups:   kel_usia [5]
#>    kel_usia  status_stunting     n proporsi
#>    <chr>     <chr>           <int>    <dbl>
#>  1 12-23 bln Normal             90     77.6
#>  2 12-23 bln Stunting           26     22.4
#>  3 24-35 bln Normal             87     73.7
#>  4 24-35 bln Stunting           31     26.3
#>  5 36-47 bln Normal             66     72.5
#>  6 36-47 bln Stunting           25     27.5
#>  7 48-59 bln Normal             95     74.8
#>  8 48-59 bln Stunting           32     25.2
#>  9 6-11 bln  Normal             33     68.8
#> 10 6-11 bln  Stunting           15     31.2
cat("\n=== Simulasi Stratified Split 80:20 ===\n")
#> 
#> === Simulasi Stratified Split 80:20 ===
cat("Training (80%): Normal ≈", round(371*0.8), "| Stunting ≈", round(129*0.8), "\n")
#> Training (80%): Normal ≈ 297 | Stunting ≈ 103
cat("Test     (20%): Normal ≈", round(371*0.2), "| Stunting ≈", round(129*0.2), "\n")
#> Test     (20%): Normal ≈ 74 | Stunting ≈ 26
cat("\n=== Missing Values per Variabel Prediktor ===\n")
#> 
#> === Missing Values per Variabel Prediktor ===
data_stunting_raw %>%
  select(-id_balita, -kab_kota, -status_stunting) %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "Variabel", values_to = "Jumlah_NA") %>%
  print()
#> # A tibble: 14 × 2
#>    Variabel           Jumlah_NA
#>    <chr>                  <int>
#>  1 jenis_kelamin              0
#>  2 usia_bulan                 0
#>  3 bb_lahir                  20
#>  4 tb_sekarang                0
#>  5 bb_sekarang                0
#>  6 pend_ibu                  15
#>  7 pendapatan                 0
#>  8 akses_air                  0
#>  9 sanitasi                   0
#> 10 asi_eksklusif              0
#> 11 imunisasi                  0
#> 12 diare_3bln                 0
#> 13 kunjungan_posyandu        10
#> 14 zscore_tbu                 0

6 Bukti Unit Kompetensi 6 — Membersihkan Data

Kode Unit: J.62DMI00.008.1


6.1 IK 09 — Proses Pembersihan Data Lengkap [KUK 1.1–1.2]

# ── Kondisi sebelum cleaning ──────────────────────────────────────────────────
cat("=== KONDISI DATA SEBELUM CLEANING ===\n")
#> === KONDISI DATA SEBELUM CLEANING ===
cat("Jumlah baris:", nrow(data_stunting_raw), "\n")
#> Jumlah baris: 500
cat("\n--- Missing Values ---\n")
#> 
#> --- Missing Values ---
data_stunting_raw %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "Variabel", values_to = "NA_Count") %>%
  filter(NA_Count > 0) %>%
  mutate(Persen = round(NA_Count / 500 * 100, 1)) %>% print()
#> # A tibble: 3 × 3
#>   Variabel           NA_Count Persen
#>   <chr>                 <int>  <dbl>
#> 1 bb_lahir                 20      4
#> 2 pend_ibu                 15      3
#> 3 kunjungan_posyandu       10      2
n_tb  <- sum(data_stunting_raw$tb_sekarang < 30 | data_stunting_raw$tb_sekarang > 130, na.rm=TRUE)
n_bb  <- sum(data_stunting_raw$bb_sekarang < 2  | data_stunting_raw$bb_sekarang > 20,  na.rm=TRUE)
n_bbl <- sum(data_stunting_raw$bb_lahir    < 0.5| data_stunting_raw$bb_lahir    > 5.5,  na.rm=TRUE)
cat(sprintf("\n--- Outlier (Batas Soal) ---\ntb_sekarang  (30-130 cm): %d\nbb_sekarang  (2-20 kg)  : %d\nbb_lahir     (0.5-5.5 kg): %d\nDuplikat: %d\n",
    n_tb, n_bb, n_bbl, sum(duplicated(data_stunting_raw))))
#> 
#> --- Outlier (Batas Soal) ---
#> tb_sekarang  (30-130 cm): 5
#> bb_sekarang  (2-20 kg)  : 3
#> bb_lahir     (0.5-5.5 kg): 0
#> Duplikat: 0
# ── Langkah 1: Hapus outlier ──────────────────────────────────────────────────
data_step1 <- data_stunting_raw %>%
  filter(tb_sekarang >= 30 & tb_sekarang <= 130,
         bb_sekarang >= 2  & bb_sekarang <= 20,
         (bb_lahir >= 0.5  & bb_lahir <= 5.5) | is.na(bb_lahir))
cat("\nSetelah hapus outlier:", nrow(data_step1), "baris (dihapus:",
    nrow(data_stunting_raw) - nrow(data_step1), ")\n")
#> 
#> Setelah hapus outlier: 492 baris (dihapus: 8 )
# ── Langkah 2: Imputasi ───────────────────────────────────────────────────────
data_step2 <- data_step1 %>%
  mutate(kelompok_usia = case_when(
    usia_bulan <= 12 ~ "0-12",  usia_bulan <= 24 ~ "13-24",
    usia_bulan <= 36 ~ "25-36", usia_bulan <= 48 ~ "37-48", TRUE ~ "49-59"))

median_bb  <- data_step2 %>% group_by(kelompok_usia) %>%
              summarise(med_bb = median(bb_lahir, na.rm=TRUE))
median_kun <- data_step2 %>% group_by(kelompok_usia) %>%
              summarise(med_kun = median(kunjungan_posyandu, na.rm=TRUE))
modus_pend <- data_step2 %>% filter(!is.na(pend_ibu)) %>%
              count(pend_ibu, sort=TRUE) %>% slice(1) %>% pull(pend_ibu)

cat("\nModus pend_ibu:", modus_pend, "\n")
#> 
#> Modus pend_ibu: SMP
data_step2 <- data_step2 %>%
  left_join(median_bb,  by = "kelompok_usia") %>%
  left_join(median_kun, by = "kelompok_usia") %>%
  mutate(
    bb_lahir           = ifelse(is.na(bb_lahir),           med_bb,  bb_lahir),
    kunjungan_posyandu = ifelse(is.na(kunjungan_posyandu), med_kun, kunjungan_posyandu),
    pend_ibu           = ifelse(is.na(pend_ibu),           modus_pend, pend_ibu)
  ) %>% select(-med_bb, -med_kun)

cat("Missing setelah imputasi:\n")
#> Missing setelah imputasi:
data_step2 %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(everything(), names_to = "Variabel", values_to = "NA") %>%
  filter(NA > 0) %>% print()
#> # A tibble: 0 × 2
#> # ℹ 2 variables: Variabel <chr>, NA <int>
cat("(Tidak ada output = tidak ada missing tersisa)\n")
#> (Tidak ada output = tidak ada missing tersisa)
# ── Langkah 3: Hapus duplikat ─────────────────────────────────────────────────
data_step3 <- data_step2 %>% distinct()
cat("\nSetelah hapus duplikat:", nrow(data_step3), "baris\n")
#> 
#> Setelah hapus duplikat: 492 baris
cat("\n=== SUMMARY SEBELUM vs SESUDAH ===\n")
#> 
#> === SUMMARY SEBELUM vs SESUDAH ===
cat("--- Sebelum ---\n")
#> --- Sebelum ---
data_stunting_raw %>%
  select(tb_sekarang, bb_sekarang, bb_lahir, kunjungan_posyandu) %>%
  summary() %>% print()
#>   tb_sekarang      bb_sekarang       bb_lahir     kunjungan_posyandu
#>  Min.   :  5.00   Min.   : 0.50   Min.   :1.540   Min.   : 1.000    
#>  1st Qu.: 68.60   1st Qu.:10.37   1st Qu.:2.760   1st Qu.: 3.000    
#>  Median : 74.70   Median :12.02   Median :3.100   Median : 7.000    
#>  Mean   : 74.93   Mean   :12.10   Mean   :3.102   Mean   : 6.618    
#>  3rd Qu.: 81.00   3rd Qu.:13.77   3rd Qu.:3.450   3rd Qu.:10.000    
#>  Max.   :200.00   Max.   :50.00   Max.   :4.430   Max.   :12.000    
#>                                   NA's   :20      NA's   :10
cat("--- Sesudah ---\n")
#> --- Sesudah ---
data_step3 %>%
  select(tb_sekarang, bb_sekarang, bb_lahir, kunjungan_posyandu) %>%
  summary() %>% print()
#>   tb_sekarang      bb_sekarang       bb_lahir     kunjungan_posyandu
#>  Min.   : 50.20   Min.   : 5.88   Min.   :1.540   Min.   : 1.00     
#>  1st Qu.: 68.67   1st Qu.:10.37   1st Qu.:2.788   1st Qu.: 3.00     
#>  Median : 74.70   Median :12.01   Median :3.100   Median : 7.00     
#>  Mean   : 74.80   Mean   :12.00   Mean   :3.105   Mean   : 6.62     
#>  3rd Qu.: 80.90   3rd Qu.:13.66   3rd Qu.:3.440   3rd Qu.:10.00     
#>  Max.   :104.10   Max.   :19.34   Max.   :4.430   Max.   :12.00
# ── Simpan ────────────────────────────────────────────────────────────────────
data_stunting_clean <- data_step3 %>% select(-kelompok_usia)
write_csv(data_stunting_clean, "C:/Users/User/Downloads/data_stunting_clean.csv")
cat("\n✓ data_stunting_clean.csv tersimpan |",
    nrow(data_stunting_clean), "baris ×", ncol(data_stunting_clean), "kolom\n")
#> 
#> ✓ data_stunting_clean.csv tersimpan | 492 baris × 17 kolom
make_box_clean <- function(df, var, unit, label, col_fill) {
  ggplot(df, aes(y = .data[[var]])) +
    geom_boxplot(fill = col_fill, colour = darken_col(col_fill, 0.3),
                 outlier.colour = COL_ORANGE, outlier.shape = 21,
                 outlier.size = 1.5, width = 0.45) +
    labs(title = label, y = unit) +
    theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
}

darken_col <- function(col, factor = 0.3) {
  rgb_col <- col2rgb(col) / 255
  rgb(rgb_col[1]*factor, rgb_col[2]*factor, rgb_col[3]*factor)
}

p1a <- ggplot(data_stunting_raw, aes(y = tb_sekarang)) +
  geom_boxplot(fill = "#FFCDD2", colour = COL_STUNTING,
               outlier.colour = COL_ORANGE, outlier.shape = 21, width = 0.45) +
  labs(title = "Tinggi Badan\n(Sebelum)", y = "cm") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

p1b <- ggplot(data_step3, aes(y = tb_sekarang)) +
  geom_boxplot(fill = "#C8E6C9", colour = COL_GREEN,
               outlier.colour = COL_ORANGE, outlier.shape = 21, width = 0.45) +
  labs(title = "Tinggi Badan\n(Sesudah)", y = "cm") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

p2a <- ggplot(data_stunting_raw, aes(y = bb_sekarang)) +
  geom_boxplot(fill = "#FFCDD2", colour = COL_STUNTING,
               outlier.colour = COL_ORANGE, outlier.shape = 21, width = 0.45) +
  labs(title = "Berat Badan\n(Sebelum)", y = "kg") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

p2b <- ggplot(data_step3, aes(y = bb_sekarang)) +
  geom_boxplot(fill = "#C8E6C9", colour = COL_GREEN,
               outlier.colour = COL_ORANGE, outlier.shape = 21, width = 0.45) +
  labs(title = "Berat Badan\n(Sesudah)", y = "kg") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

(p1a | p1b | p2a | p2b) +
  plot_annotation(
    title    = "Perbandingan Distribusi Antropometri Sebelum dan Sesudah Cleaning",
    subtitle = "Merah = sebelum cleaning (outlier terlihat) | Hijau = sesudah cleaning (bersih)",
    theme    = theme(plot.title    = element_text(face = "bold", colour = "#1A237E", size = 12),
                     plot.subtitle = element_text(colour = "#546E7A", size = 9))
  )
Perbandingan Distribusi Sebelum dan Sesudah Cleaning

Figure 6.1: Perbandingan Distribusi Sebelum dan Sesudah Cleaning


6.2 IK 10 — Prevalensi Stunting & Evaluasi Bias [KUK 2.2–2.3]

prev_sblm <- data_stunting_raw %>%
  count(status_stunting) %>%
  mutate(persen = round(n/sum(n)*100, 2), dataset = "Sebelum Cleaning")

prev_ssdh <- data_stunting_clean %>%
  count(status_stunting) %>%
  mutate(persen = round(n/sum(n)*100, 2), dataset = "Sesudah Cleaning")

perbandingan <- bind_rows(prev_sblm, prev_ssdh) %>%
  select(dataset, status_stunting, n, persen)

kable(perbandingan, caption = "Perbandingan Prevalensi Stunting Sebelum dan Sesudah Cleaning",
      col.names = c("Dataset","Status","n","%"), align = "llrr") %>%
  kable_styling(bootstrap_options = c("striped","hover"),
                full_width = FALSE, position = "center") %>%
  collapse_rows(columns = 1, valign = "middle") %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE)
Table 6.1: Table 6.2: Perbandingan Prevalensi Stunting Sebelum dan Sesudah Cleaning
Dataset Status n %
Sebelum Cleaning Normal 371 74.20
Stunting 129 25.80
Sesudah Cleaning Normal 366 74.39
Stunting 126 25.61
pct_s <- prev_sblm %>% filter(status_stunting == "Stunting") %>% pull(persen)
pct_a <- prev_ssdh %>% filter(status_stunting == "Stunting") %>% pull(persen)
cat(sprintf("Prevalensi Stunting sebelum : %s%%\nPrevalensi Stunting sesudah : %s%%\nSelisih                     : %s poin persentase\n",
    pct_s, pct_a, round(pct_a - pct_s, 2)))
#> Prevalensi Stunting sebelum : 25.8%
#> Prevalensi Stunting sesudah : 25.61%
#> Selisih                     : -0.19 poin persentase
bind_rows(prev_sblm, prev_ssdh) %>%
  ggplot(aes(x = dataset, y = persen, fill = status_stunting)) +
  geom_col(position = "fill", colour = "white", linewidth = 0.5, width = 0.5) +
  geom_text(aes(label = paste0(persen, "%")),
            position = position_fill(vjust = 0.5),
            colour = "white", size = 4.2, fontface = "bold") +
  scale_fill_manual(values = c("Normal" = COL_NORMAL, "Stunting" = COL_STUNTING)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(title    = "Perbandingan Prevalensi Stunting",
       subtitle = "Sebelum vs Sesudah Pembersihan Data",
       x = NULL, y = "Proporsi (%)", fill = "Status Stunting",
       caption = "Sumber: data_stunting_raw.csv & data_stunting_clean.csv")
Perbandingan Prevalensi Stunting Sebelum dan Sesudah Cleaning

Figure 6.2: Perbandingan Prevalensi Stunting Sebelum dan Sesudah Cleaning


7 Bukti Unit Kompetensi 7 — Mengkonstruksi Data

Kode Unit: J.62DMI00.009.1


7.1 IK 11a — Encoding Variabel Kategorik [KUK 1.1–2.1]

data_stunting_clean <- read_csv("C:/Users/User/Downloads/data_stunting_clean.csv")

data_encoded <- data_stunting_clean %>%
  mutate(
    pend_ibu_enc = case_when(
      pend_ibu == "Tidak Sekolah" ~ 0, pend_ibu == "SD" ~ 1,
      pend_ibu == "SMP"           ~ 2, pend_ibu == "SMA" ~ 3,
      pend_ibu == "PT"            ~ 4, TRUE ~ NA_real_),
    jenis_kelamin_enc = ifelse(jenis_kelamin == "L",        1, 0),
    akses_air_enc     = ifelse(akses_air     == "Ya",       1, 0),
    sanitasi_enc      = ifelse(sanitasi      == "Baik",     1, 0),
    asi_eksklusif_enc = ifelse(asi_eksklusif == "Ya",       1, 0),
    imunisasi_enc     = ifelse(imunisasi     == "Lengkap",  1, 0)
  )

cat("=== Hasil Encoding pend_ibu (Ordinal) ===\n")
#> === Hasil Encoding pend_ibu (Ordinal) ===
data_encoded %>%
  count(pend_ibu, pend_ibu_enc) %>%
  arrange(pend_ibu_enc) %>% print()
#> # A tibble: 5 × 3
#>   pend_ibu      pend_ibu_enc     n
#>   <chr>                <dbl> <int>
#> 1 Tidak Sekolah            0    25
#> 2 SD                       1   109
#> 3 SMP                      2   165
#> 4 SMA                      3   149
#> 5 PT                       4    44
cat("\n=== Hasil Encoding Variabel Binary ===\n")
#> 
#> === Hasil Encoding Variabel Binary ===
data_encoded %>%
  select(jenis_kelamin, jenis_kelamin_enc, akses_air, akses_air_enc,
         sanitasi, sanitasi_enc, asi_eksklusif, asi_eksklusif_enc,
         imunisasi, imunisasi_enc) %>%
  distinct() %>% arrange(jenis_kelamin_enc) %>% print(n = 20)
#> # A tibble: 32 × 10
#>    jenis_kelamin jenis_kelamin_enc akses_air akses_air_enc sanitasi sanitasi_enc
#>    <chr>                     <dbl> <chr>             <dbl> <chr>           <dbl>
#>  1 P                             0 Tidak                 0 Baik                1
#>  2 P                             0 Ya                    1 Kurang              0
#>  3 P                             0 Ya                    1 Baik                1
#>  4 P                             0 Ya                    1 Baik                1
#>  5 P                             0 Tidak                 0 Kurang              0
#>  6 P                             0 Tidak                 0 Baik                1
#>  7 P                             0 Ya                    1 Kurang              0
#>  8 P                             0 Tidak                 0 Baik                1
#>  9 P                             0 Ya                    1 Kurang              0
#> 10 P                             0 Ya                    1 Baik                1
#> 11 P                             0 Tidak                 0 Kurang              0
#> 12 P                             0 Tidak                 0 Kurang              0
#> 13 P                             0 Tidak                 0 Baik                1
#> 14 P                             0 Tidak                 0 Kurang              0
#> 15 P                             0 Ya                    1 Baik                1
#> 16 P                             0 Ya                    1 Kurang              0
#> 17 L                             1 Tidak                 0 Baik                1
#> 18 L                             1 Ya                    1 Baik                1
#> 19 L                             1 Ya                    1 Kurang              0
#> 20 L                             1 Tidak                 0 Kurang              0
#> # ℹ 12 more rows
#> # ℹ 4 more variables: asi_eksklusif <chr>, asi_eksklusif_enc <dbl>,
#> #   imunisasi <chr>, imunisasi_enc <dbl>

7.2 IK 11b — Feature Engineering & Validasi Statistik [KUK 2.2]

data_encoded <- data_encoded %>%
  mutate(
    bmi      = bb_sekarang / (tb_sekarang / 100)^2,
    rasio_bb = bb_sekarang / bb_lahir
  )

cat("=== Ringkasan 2 Fitur Baru per Status Stunting ===\n")
#> === Ringkasan 2 Fitur Baru per Status Stunting ===
data_encoded %>%
  select(bmi, rasio_bb, status_stunting) %>%
  group_by(status_stunting) %>%
  summarise(
    n              = n(),
    BMI_mean       = round(mean(bmi,      na.rm=TRUE), 3),
    BMI_median     = round(median(bmi,    na.rm=TRUE), 3),
    BMI_sd         = round(sd(bmi,        na.rm=TRUE), 3),
    RasioBB_mean   = round(mean(rasio_bb, na.rm=TRUE), 3),
    RasioBB_median = round(median(rasio_bb, na.rm=TRUE), 3),
    RasioBB_sd     = round(sd(rasio_bb,  na.rm=TRUE), 3)
  ) %>% print()
#> # A tibble: 2 × 8
#>   status_stunting     n BMI_mean BMI_median BMI_sd RasioBB_mean RasioBB_median
#>   <chr>           <int>    <dbl>      <dbl>  <dbl>        <dbl>          <dbl>
#> 1 Normal            366     20.0       19.6   4.02         3.95           3.85
#> 2 Stunting          126     27.9       27.1   5.97         4.11           3.86
#> # ℹ 1 more variable: RasioBB_sd <dbl>
cat("\n=== Uji Mann-Whitney: BMI vs status_stunting ===\n")
#> 
#> === Uji Mann-Whitney: BMI vs status_stunting ===
wilcox.test(bmi ~ status_stunting, data = data_encoded) %>% print()
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  bmi by status_stunting
#> W = 5824, p-value < 2.2e-16
#> alternative hypothesis: true location shift is not equal to 0
cat("\n=== Uji Mann-Whitney: rasio_bb vs status_stunting ===\n")
#> 
#> === Uji Mann-Whitney: rasio_bb vs status_stunting ===
wilcox.test(rasio_bb ~ status_stunting, data = data_encoded) %>% print()
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  rasio_bb by status_stunting
#> W = 22114, p-value = 0.4928
#> alternative hypothesis: true location shift is not equal to 0
library(patchwork)

make_violin <- function(var, label, unit) {
  ggplot(data_encoded, aes(x = status_stunting, y = .data[[var]],
                            fill = status_stunting)) +
    geom_violin(alpha = 0.3, trim = FALSE, colour = NA) +
    geom_boxplot(width = 0.22, outlier.colour = COL_ORANGE, outlier.shape = 21,
                 outlier.size = 1.8, outlier.fill = "#FFB300", linewidth = 0.45) +
    stat_summary(fun = mean, geom = "point", shape = 23, size = 3.5,
                 fill = "white", colour = COL_ACCENT) +
    scale_fill_manual(values = c("Normal" = COL_NORMAL, "Stunting" = COL_STUNTING)) +
    labs(title = label, x = "Status Stunting", y = unit) +
    theme(legend.position = "none")
}

(make_violin("bmi",      "Indeks Massa Tubuh (BMI)",          "BMI (kg/m²)") |
 make_violin("rasio_bb", "Rasio BB Sekarang / BB Lahir", "Rasio BB")) +
  plot_annotation(
    title    = "Distribusi Fitur Rekayasa per Status Stunting",
    subtitle = "Violin = distribusi | Box = kuartil | Berlian putih = rata-rata",
    theme    = theme(plot.title    = element_text(face = "bold", colour = "#1A237E", size = 12),
                     plot.subtitle = element_text(colour = "#546E7A", size = 9))
  )
Distribusi Fitur Rekayasa Baru per Status Stunting (Violin + Boxplot)

Figure 7.1: Distribusi Fitur Rekayasa Baru per Status Stunting (Violin + Boxplot)


7.3 IK 11c — Normalisasi Min-Max [KUK 1.2]

minmax <- function(x) (x - min(x, na.rm=TRUE)) / (max(x, na.rm=TRUE) - min(x, na.rm=TRUE))

vars_norm <- c("usia_bulan","bb_lahir","tb_sekarang","bb_sekarang","pendapatan",
               "diare_3bln","kunjungan_posyandu","zscore_tbu","bmi","rasio_bb","pend_ibu_enc")

data_normalized <- data_encoded %>%
  mutate(across(all_of(vars_norm), minmax, .names = "{.col}_norm"))

cat("=== Verifikasi Range Setelah Min-Max Normalisasi ===\n")
#> === Verifikasi Range Setelah Min-Max Normalisasi ===
data_normalized %>%
  select(ends_with("_norm")) %>%
  summarise(across(everything(), list(
    Min = ~round(min(., na.rm=TRUE), 3),
    Max = ~round(max(., na.rm=TRUE), 3)))) %>%
  pivot_longer(everything(), names_to = c("Variabel","Stat"),
               names_sep = "_(?=[^_]+$)") %>%
  pivot_wider(names_from = Stat, values_from = value) %>%
  print(n = 15)
#> # A tibble: 11 × 3
#>    Variabel                  Min   Max
#>    <chr>                   <dbl> <dbl>
#>  1 usia_bulan_norm             0     1
#>  2 bb_lahir_norm               0     1
#>  3 tb_sekarang_norm            0     1
#>  4 bb_sekarang_norm            0     1
#>  5 pendapatan_norm             0     1
#>  6 diare_3bln_norm             0     1
#>  7 kunjungan_posyandu_norm     0     1
#>  8 zscore_tbu_norm             0     1
#>  9 bmi_norm                    0     1
#> 10 rasio_bb_norm               0     1
#> 11 pend_ibu_enc_norm           0     1

7.4 IK 12 — Multikolinearitas (VIF) & Simpan Dataset Final [KUK 2.3–2.4]

7.5 IK 12 — Deteksi Multikolinearitas (VIF) & Simpan Dataset Final

Dimensi: TRS + JRES | KUK: 2.3–2.4

# ── Library ───────────────────────────────────────────────────────────────────
library(car)
library(dplyr)
library(kableExtra)

# ── Persiapan data & helper ───────────────────────────────────────────────────
fitur_model <- data_encoded %>%
  select(usia_bulan, bb_lahir, tb_sekarang, bb_sekarang,
         pendapatan, diare_3bln, kunjungan_posyandu,
         bmi, rasio_bb, pend_ibu_enc,
         jenis_kelamin_enc, akses_air_enc, sanitasi_enc,
         asi_eksklusif_enc, imunisasi_enc,
         status_stunting) %>%
  mutate(status_stunting_num = ifelse(status_stunting == "Stunting", 1, 0)) %>%
  drop_na()

semua_prediktor <- c("usia_bulan", "bb_lahir", "tb_sekarang", "bb_sekarang",
                     "pendapatan", "diare_3bln", "kunjungan_posyandu",
                     "bmi", "rasio_bb", "pend_ibu_enc",
                     "jenis_kelamin_enc", "akses_air_enc", "sanitasi_enc",
                     "asi_eksklusif_enc", "imunisasi_enc")

kandidat_drop <- c("tb_sekarang", "bb_sekarang", "bmi", "rasio_bb")

hitung_vif <- function(data, prediktor) {
  formula_str <- paste("status_stunting_num ~", paste(prediktor, collapse = " + "))
  vif(lm(as.formula(formula_str), data = data))
}

render_vif_table <- function(vif_vals, caption_text) {
  df <- data.frame(Fitur = names(vif_vals), VIF = round(vif_vals, 3)) %>%
    arrange(desc(VIF))
  kable(df, caption = caption_text,
        col.names = c("Fitur", "VIF"), align = "lr", booktabs = TRUE) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                  full_width = FALSE, position = "center") %>%
    row_spec(0, background = "#37474F", color = "white", bold = TRUE) %>%
    row_spec(which(df$VIF > 10), background = "#FFEBEE",
             color = COL_STUNTING, bold = TRUE) %>%
    row_spec(which(df$VIF <= 5), background = "#E8F5E9")
}

# ══════════════════════════════════════════════════════════════════════════════
# [0] BASELINE — Semua variabel
# ══════════════════════════════════════════════════════════════════════════════
vif_base    <- hitung_vif(fitur_model, semua_prediktor)
vif_base_df <- data.frame(Fitur = names(vif_base), VIF = round(vif_base, 3)) %>%
  arrange(desc(VIF))

render_vif_table(vif_base, "Baseline — Nilai VIF Semua Prediktor")
Table 7.1: Table 7.2: Baseline — Nilai VIF Semua Prediktor
Fitur VIF
bb_sekarang bb_sekarang 21.610
rasio_bb rasio_bb 20.178
tb_sekarang tb_sekarang 18.240
bmi bmi 17.548
bb_lahir bb_lahir 8.717
usia_bulan usia_bulan 2.390
akses_air_enc akses_air_enc 1.042
asi_eksklusif_enc asi_eksklusif_enc 1.039
diare_3bln diare_3bln 1.038
pend_ibu_enc pend_ibu_enc 1.035
imunisasi_enc imunisasi_enc 1.029
kunjungan_posyandu kunjungan_posyandu 1.026
sanitasi_enc sanitasi_enc 1.026
jenis_kelamin_enc jenis_kelamin_enc 1.023
pendapatan pendapatan 1.021

Baseline: Max VIF = 21.61 | Jumlah VIF > 10 = 4 — ❌ Terdeteksi multikolinearitas berat.

# ══════════════════════════════════════════════════════════════════════════════
# [1] ITERASI 1 — Keluarkan 1 variabel
# ══════════════════════════════════════════════════════════════════════════════
hasil_iter1 <- lapply(kandidat_drop, function(var) {
  v <- hitung_vif(fitur_model, setdiff(semua_prediktor, var))
  data.frame(Variabel_Dikeluarkan = var,
             Max_VIF              = round(max(v), 3),
             N_VIF_gt10           = sum(v > 10))
}) %>% bind_rows() %>% arrange(N_VIF_gt10, Max_VIF)

kable(hasil_iter1,
      caption = "Iterasi 1 — Efek Pengeluaran 1 Variabel terhadap VIF",
      align = "lrr", booktabs = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, position = "center") %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE) %>%
  row_spec(1, background = "#E3F2FD", bold = TRUE)
Table 7.3: Table 7.4: Iterasi 1 — Efek Pengeluaran 1 Variabel terhadap VIF
Variabel_Dikeluarkan Max_VIF N_VIF_gt10
bb_sekarang 11.488 2
tb_sekarang 20.146 2
bmi 20.160 2
rasio_bb 18.210 3
# ══════════════════════════════════════════════════════════════════════════════
# [2] ITERASI 2 — Keluarkan 2 variabel
# ══════════════════════════════════════════════════════════════════════════════
hasil_iter2 <- combn(kandidat_drop, 2, simplify = FALSE) %>%
  lapply(function(combo) {
    v <- hitung_vif(fitur_model, setdiff(semua_prediktor, combo))
    data.frame(Variabel_Dikeluarkan = paste(combo, collapse = " + "),
               Max_VIF              = round(max(v), 3),
               N_VIF_gt10           = sum(v > 10))
  }) %>% bind_rows() %>% arrange(N_VIF_gt10, Max_VIF)

kable(hasil_iter2,
      caption = "Iterasi 2 — Efek Pengeluaran 2 Variabel terhadap VIF",
      align = "lrr", booktabs = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, position = "center") %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE) %>%
  row_spec(1, background = "#E3F2FD", bold = TRUE)
Table 7.3: Table 7.3: Iterasi 2 — Efek Pengeluaran 2 Variabel terhadap VIF
Variabel_Dikeluarkan Max_VIF N_VIF_gt10
bmi + rasio_bb 2.386 0
tb_sekarang + rasio_bb 2.684 0
bb_sekarang + bmi 2.836 0
bb_sekarang + rasio_bb 3.973 0
tb_sekarang + bb_sekarang 4.241 0
tb_sekarang + bmi 20.134 2
# ══════════════════════════════════════════════════════════════════════════════
# [3] ITERASI 3 — Keluarkan 3 variabel
# ══════════════════════════════════════════════════════════════════════════════
hasil_iter3 <- combn(kandidat_drop, 3, simplify = FALSE) %>%
  lapply(function(combo) {
    v <- hitung_vif(fitur_model, setdiff(semua_prediktor, combo))
    data.frame(Variabel_Dikeluarkan = paste(combo, collapse = " + "),
               Max_VIF              = round(max(v), 3),
               N_VIF_gt10           = sum(v > 10))
  }) %>% bind_rows() %>% arrange(N_VIF_gt10, Max_VIF)

kable(hasil_iter3,
      caption = "Iterasi 3 — Efek Pengeluaran 3 Variabel terhadap VIF",
      align = "lrr", booktabs = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE, position = "center") %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE) %>%
  row_spec(1, background = "#E3F2FD", bold = TRUE)
Table 7.3: Table 7.3: Iterasi 3 — Efek Pengeluaran 3 Variabel terhadap VIF
Variabel_Dikeluarkan Max_VIF N_VIF_gt10
tb_sekarang + bb_sekarang + rasio_bb 1.058 0
tb_sekarang + bmi + rasio_bb 1.623 0
bb_sekarang + bmi + rasio_bb 1.817 0
tb_sekarang + bb_sekarang + bmi 2.836 0
# ══════════════════════════════════════════════════════════════════════════════
# [4] RINGKASAN KOMPARATIF & KEPUTUSAN OPTIMAL
# ══════════════════════════════════════════════════════════════════════════════
ringkasan <- data.frame(
  Skenario      = c("Baseline (0 dikeluarkan)",
                    paste0("Iterasi 1: keluarkan ", hasil_iter1$Variabel_Dikeluarkan[1]),
                    paste0("Iterasi 2: keluarkan ", hasil_iter2$Variabel_Dikeluarkan[1]),
                    paste0("Iterasi 3: keluarkan ", hasil_iter3$Variabel_Dikeluarkan[1])),
  Fitur_Tersisa = c(length(semua_prediktor),
                    length(semua_prediktor) - 1,
                    length(semua_prediktor) - 2,
                    length(semua_prediktor) - 3),
  Max_VIF       = c(round(max(vif_base), 3),
                    hasil_iter1$Max_VIF[1],
                    hasil_iter2$Max_VIF[1],
                    hasil_iter3$Max_VIF[1]),
  N_VIF_gt10    = c(sum(vif_base > 10),
                    hasil_iter1$N_VIF_gt10[1],
                    hasil_iter2$N_VIF_gt10[1],
                    hasil_iter3$N_VIF_gt10[1]),
  Status        = c("❌ Multikolinear berat",
                    ifelse(hasil_iter1$N_VIF_gt10[1] == 0, "✅ Bersih", "⚠️ Masih ada"),
                    ifelse(hasil_iter2$N_VIF_gt10[1] == 0, "✅ Bersih", "⚠️ Masih ada"),
                    ifelse(hasil_iter3$N_VIF_gt10[1] == 0, "✅ Bersih", "⚠️ Masih ada")),
  stringsAsFactors = FALSE
)

optimal_idx <- which(ringkasan$N_VIF_gt10 == 0 &
                     ringkasan$Fitur_Tersisa ==
                       max(ringkasan$Fitur_Tersisa[ringkasan$N_VIF_gt10 == 0]))
if (length(optimal_idx) == 0) optimal_idx <- which.min(ringkasan$Max_VIF)

kable(ringkasan,
      caption = "Ringkasan Komparatif Skenario Pengeluaran Variabel",
      align = "lrrrc", booktabs = TRUE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered"),
                full_width = FALSE, position = "center") %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE) %>%
  row_spec(1, background = "#FFEBEE") %>%
  row_spec(optimal_idx, background = "#C8E6C9", bold = TRUE)
Table 7.5: Table 7.6: Ringkasan Komparatif Skenario Pengeluaran Variabel
Skenario Fitur_Tersisa Max_VIF N_VIF_gt10 Status
Baseline (0 dikeluarkan) 15 21.610 4 ❌ Multikolinear berat |
Iterasi 1: keluarkan bb_sekarang 14 11.488 2 ⚠️ Masih ada
Iterasi 2: keluarkan bmi + rasio_bb 13 2.386 0 ✅ Bersih |
Iterasi 3: keluarkan tb_sekarang + bb_sekarang + rasio_bb 12 1.058 0 ✅ Bersih |

Keputusan Optimal: Keluarkan Iterasi 2: keluarkan bmi + rasio_bb → Fitur tersisa: 13 | Max VIF: 2.386 | Semua VIF ≤ 10: YA ✅

# ══════════════════════════════════════════════════════════════════════════════
# [5] VIF FINAL & KORELASI
# ══════════════════════════════════════════════════════════════════════════════
n_drop_optimal <- length(semua_prediktor) - ringkasan$Fitur_Tersisa[optimal_idx]

vars_optimal_drop <- switch(as.character(n_drop_optimal),
  "1" = hasil_iter1$Variabel_Dikeluarkan[1],
  "2" = unlist(strsplit(hasil_iter2$Variabel_Dikeluarkan[1], " \\+ ")),
  "3" = unlist(strsplit(hasil_iter3$Variabel_Dikeluarkan[1], " \\+ "))
)

pred_final <- setdiff(semua_prediktor, vars_optimal_drop)
vif_final  <- hitung_vif(fitur_model, pred_final)

render_vif_table(vif_final,
  paste("VIF Final — Variabel dikeluarkan:",
        paste(vars_optimal_drop, collapse = ", ")))
Table 7.7: Table 7.8: VIF Final — Variabel dikeluarkan: bmi, rasio_bb
Fitur VIF
usia_bulan usia_bulan 2.386
tb_sekarang tb_sekarang 1.817
bb_sekarang bb_sekarang 1.623
akses_air_enc akses_air_enc 1.041
asi_eksklusif_enc asi_eksklusif_enc 1.038
bb_lahir bb_lahir 1.033
pend_ibu_enc pend_ibu_enc 1.033
diare_3bln diare_3bln 1.030
imunisasi_enc imunisasi_enc 1.025
kunjungan_posyandu kunjungan_posyandu 1.024
sanitasi_enc sanitasi_enc 1.022
pendapatan pendapatan 1.021
jenis_kelamin_enc jenis_kelamin_enc 1.013
# ── Matriks korelasi (pasangan r > 0.9) ──────────────────────────────────────
cor_mat  <- cor(fitur_model %>%
                  select(usia_bulan, bb_lahir, tb_sekarang, bb_sekarang,
                         pendapatan, diare_3bln, kunjungan_posyandu,
                         bmi, rasio_bb),
                use = "complete.obs")

cor_tinggi <- as.data.frame(as.table(cor_mat)) %>%
  filter(Var1 != Var2, abs(Freq) > 0.9) %>%
  arrange(desc(abs(Freq)))

if (nrow(cor_tinggi) == 0) {
  cat("\n✅ Tidak ada pasangan variabel dengan korelasi r > 0.9.",
      "\n   Multikolinearitas bersifat multivariat, bukan berpasangan.\n")
} else {
  kable(cor_tinggi,
        caption = "Pasangan Variabel dengan Korelasi r > 0.9",
        col.names = c("Var 1", "Var 2", "Korelasi (r)"),
        align = "llr", digits = 3, booktabs = TRUE) %>%
    kable_styling(bootstrap_options = c("striped", "hover"),
                  full_width = FALSE, position = "center") %>%
    row_spec(0, background = "#37474F", color = "white", bold = TRUE)
}
#> 
#> ✅ Tidak ada pasangan variabel dengan korelasi r > 0.9. 
#>    Multikolinearitas bersifat multivariat, bukan berpasangan.
data_stunting_model <- data_encoded %>%
  select(usia_bulan, jenis_kelamin_enc, bb_lahir, tb_sekarang, bb_sekarang,
         pend_ibu_enc, pendapatan, akses_air_enc, sanitasi_enc,
         kunjungan_posyandu, asi_eksklusif_enc, imunisasi_enc, diare_3bln,
         bmi, rasio_bb, status_stunting) %>%
  drop_na()

cat("=== Dataset Final untuk Modeling ===\n")
#> === Dataset Final untuk Modeling ===
cat("Dimensi  :", dim(data_stunting_model), "\n")
#> Dimensi  : 492 16
cat("Variabel :", names(data_stunting_model), "\n")
#> Variabel : usia_bulan jenis_kelamin_enc bb_lahir tb_sekarang bb_sekarang pend_ibu_enc pendapatan akses_air_enc sanitasi_enc kunjungan_posyandu asi_eksklusif_enc imunisasi_enc diare_3bln bmi rasio_bb status_stunting
cat("Missing  :", sum(is.na(data_stunting_model)), "\n")
#> Missing  : 0
write_csv(data_stunting_model, "C:/Users/User/Downloads/data_stunting_model.csv")
cat("\n✓ data_stunting_model.csv tersimpan!\n")
#> 
#> ✓ data_stunting_model.csv tersimpan!

8 Bukti Unit Kompetensi 8 — Membangun Skenario Model

Kode Unit: J.62DMI00.012.1


8.1 IK 13 — Identifikasi dan Pemilihan Algoritma [KUK 1.1–2.2]

Table 8.1: Table 8.2: Perbandingan Algoritma Klasifikasi untuk Proyek Prediksi Stunting
Algoritma Tipe Asumsi Keunggulan Kelemahan Dipilih
Logistic Regression Parametrik / Linear Linearitas log-odds; tidak ada multikolinearitas berat Interpretabel; efisien; output probabilitas; baseline yang baik Sensitif multikolinearitas; kurang baik untuk hubungan non-linear ✓ Ya (Baseline)
Decision Tree Non-parametrik / Tree Tidak ada asumsi distribusi; bisa menangani data campuran Mudah diinterpretasi secara visual; fast training; seleksi fitur otomatis Cenderung overfit; tidak stabil (sensitif terhadap perubahan data) ✓ Ya
Random Forest Ensemble / Bagging Tidak ada asumsi distribusi; robust terhadap outlier & imbalance Performa tinggi; robust terhadap noise & imbalance; feature importance Black-box; lambat untuk data besar; butuh tuning hyperparameter ✓ Ya (Terbaik)

8.2 IK 14 — Skenario Pengujian [KUK 3.1–3.2]

library(caret)

data_stunting_model <- read_csv("C:/Users/User/Downloads/data_stunting_model.csv") %>%
  mutate(status_stunting = factor(status_stunting, levels = c("Normal","Stunting")))

cat("Dimensi data:", dim(data_stunting_model), "\n")
#> Dimensi data: 492 16
cat("Distribusi target:\n")
#> Distribusi target:
print(prop.table(table(data_stunting_model$status_stunting)) %>% round(3))
#> 
#>   Normal Stunting 
#>    0.744    0.256
# ── Skenario 1: 80:20 Split ───────────────────────────────────────────────────
set.seed(42)
split_idx <- createDataPartition(data_stunting_model$status_stunting, p = 0.8, list = FALSE)
train_set <- data_stunting_model[split_idx,]
test_set  <- data_stunting_model[-split_idx,]

cat("\n=== SKENARIO 1: Percentage Split 80:20 ===\n")
#> 
#> === SKENARIO 1: Percentage Split 80:20 ===
cat("Training:", nrow(train_set), "| Test:", nrow(test_set), "\n")
#> Training: 394 | Test: 98
cat("Distribusi Training:\n")
#> Distribusi Training:
print(round(prop.table(table(train_set$status_stunting))*100, 1))
#> 
#>   Normal Stunting 
#>     74.4     25.6
cat("Distribusi Test:\n")
#> Distribusi Test:
print(round(prop.table(table(test_set$status_stunting))*100, 1))
#> 
#>   Normal Stunting 
#>     74.5     25.5
# ── Skenario 2: 5-Fold CV ────────────────────────────────────────────────────
set.seed(42)
cv_folds <- createFolds(data_stunting_model$status_stunting, k = 5,
                        list = TRUE, returnTrain = FALSE)
cat("\n=== SKENARIO 2: Stratified 5-Fold Cross Validation ===\n")
#> 
#> === SKENARIO 2: Stratified 5-Fold Cross Validation ===
for (i in 1:5) {
  ft <- data_stunting_model[cv_folds[[i]],]
  fr <- data_stunting_model[-cv_folds[[i]],]
  cat(sprintf("Fold %d | Train: %d | Test: %d | Stunting di test: %d (%.1f%%)\n",
              i, nrow(fr), nrow(ft),
              sum(ft$status_stunting == "Stunting"),
              mean(ft$status_stunting == "Stunting") * 100))
}
#> Fold 1 | Train: 392 | Test: 100 | Stunting di test: 26 (26.0%)
#> Fold 2 | Train: 394 | Test: 98 | Stunting di test: 25 (25.5%)
#> Fold 3 | Train: 394 | Test: 98 | Stunting di test: 25 (25.5%)
#> Fold 4 | Train: 394 | Test: 98 | Stunting di test: 25 (25.5%)
#> Fold 5 | Train: 394 | Test: 98 | Stunting di test: 25 (25.5%)
ctrl_cv <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                        summaryFunction = twoClassSummary,
                        savePredictions = "final", sampling = NULL)
saveRDS(train_set, "C:/Users/User/Downloads/train_set.rds")
saveRDS(test_set,  "C:/Users/User/Downloads/test_set.rds")
saveRDS(ctrl_cv,   "C:/Users/User/Downloads/ctrl_cv.rds")
cat("\n✓ train_set.rds, test_set.rds, ctrl_cv.rds tersimpan!\n")
#> 
#> ✓ train_set.rds, test_set.rds, ctrl_cv.rds tersimpan!

9 Bukti Unit Kompetensi 9 — Membangun Model

Kode Unit: J.62DMI00.013.1


9.1 IK 15a–b — Logistic Regression (Baseline) [KUK 1.1, 2.1–2.2]

# ── Muat data & set factor ────────────────────────────────────────────────────
data_stunting_model <- read_csv("C:/Users/User/Downloads/data_stunting_model.csv") %>%
  mutate(status_stunting = factor(status_stunting, levels = c("Normal", "Stunting")))

train_set <- readRDS("C:/Users/User/Downloads/train_set.rds") %>%
  mutate(status_stunting = factor(status_stunting, levels = c("Normal", "Stunting")))
test_set  <- readRDS("C:/Users/User/Downloads/test_set.rds") %>%
  mutate(status_stunting = factor(status_stunting, levels = c("Normal", "Stunting")))

cat("Train:", nrow(train_set), "| Test:", nrow(test_set), "\n")
#> Train: 394 | Test: 98
# ── Bangun Logistic Regression ────────────────────────────────────────────────
set.seed(42)
model_lr <- glm(status_stunting ~ . - bmi - rasio_bb,
                data   = train_set,
                family = binomial(link = "logit"))

cat("\n=== RINGKASAN MODEL LOGISTIC REGRESSION ===\n")
#> 
#> === RINGKASAN MODEL LOGISTIC REGRESSION ===
summary(model_lr)
#> 
#> Call:
#> glm(formula = status_stunting ~ . - bmi - rasio_bb, family = binomial(link = "logit"), 
#>     data = train_set)
#> 
#> Coefficients:
#>                      Estimate Std. Error z value Pr(>|z|)
#> (Intercept)         3.983e+03  3.382e+05   0.012    0.991
#> usia_bulan          2.557e+01  2.677e+03   0.010    0.992
#> jenis_kelamin_enc   1.160e+01  2.181e+04   0.001    1.000
#> bb_lahir           -9.238e+00  2.255e+04   0.000    1.000
#> tb_sekarang        -6.811e+01  5.504e+03  -0.012    0.990
#> bb_sekarang         5.446e+00  2.109e+04   0.000    1.000
#> pend_ibu_enc       -1.817e+01  1.066e+04  -0.002    0.999
#> pendapatan         -5.384e-06  4.766e-03  -0.001    0.999
#> akses_air_enc       1.096e+01  7.466e+04   0.000    1.000
#> sanitasi_enc       -1.467e+01  7.683e+04   0.000    1.000
#> kunjungan_posyandu  8.950e-01  3.924e+03   0.000    1.000
#> asi_eksklusif_enc   1.060e+01  2.307e+04   0.000    1.000
#> imunisasi_enc      -9.341e+00  1.395e+04  -0.001    0.999
#> diare_3bln          4.636e-01  5.128e+03   0.000    1.000
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 4.4853e+02  on 393  degrees of freedom
#> Residual deviance: 3.8287e-07  on 380  degrees of freedom
#> AIC: 28
#> 
#> Number of Fisher Scoring iterations: 25
# ── Cek Perfect Separation ────────────────────────────────────────────────────
cat("\n=== CEK PERFECT SEPARATION ===\n")
#> 
#> === CEK PERFECT SEPARATION ===
cat(sprintf("  Null deviance    : %.4f (df = %d)\n",
            model_lr$null.deviance, model_lr$df.null))
#>   Null deviance    : 448.5290 (df = 393)
cat(sprintf("  Residual deviance: %.6e (df = %d)\n",
            model_lr$deviance, model_lr$df.residual))
#>   Residual deviance: 3.828667e-07 (df = 380)
cat(sprintf("  Iterasi Fisher   : %d\n", model_lr$iter))
#>   Iterasi Fisher   : 25
if (model_lr$deviance < 1e-3) {
  cat("  ⚠️  Perfect/quasi-complete separation terdeteksi!\n")
  cat("     SE sangat besar → p-value tidak reliable.\n")
  cat("     CI dihitung via Wald (±1.96×SE) sebagai aproksimasi.\n")
}
#>   ⚠️  Perfect/quasi-complete separation terdeteksi!
#>      SE sangat besar → p-value tidak reliable.
#>      CI dihitung via Wald (±1.96×SE) sebagai aproksimasi.
# ── Koefisien + p-value, diurutkan dari Pr(>|z|) terkecil ────────────────────
cat("\n=== KOEFISIEN MODEL (Diurutkan: p-value terkecil → terbesar) ===\n")
#> 
#> === KOEFISIEN MODEL (Diurutkan: p-value terkecil → terbesar) ===
smry      <- summary(model_lr)$coefficients
coef_df   <- as.data.frame(smry) %>%
  rownames_to_column("Variabel") %>%
  rename(Estimate   = Estimate,
         Std_Error  = `Std. Error`,
         z_value    = `z value`,
         p_value    = `Pr(>|z|)`) %>%
  arrange(p_value) %>%                      # ← urutkan terkecil
  mutate(
    Signifikan = case_when(
      p_value < 0.001 ~ "***",
      p_value < 0.01  ~ "**",
      p_value < 0.05  ~ "*",
      p_value < 0.1   ~ ".",
      TRUE            ~ ""
    )
  )

# Tampilkan semua, highlight top-3
kable(coef_df %>%
        mutate(across(c(Estimate, Std_Error, z_value), ~round(., 4)),
               p_value = formatC(p_value, format = "e", digits = 4)),
      caption = "Koefisien LR — Diurutkan Berdasarkan p-value (Terkecil ke Terbesar)",
      align   = "lrrrrc") %>%
  kable_styling(bootstrap_options = c("striped","hover","bordered"),
                full_width = FALSE, position = "center") %>%
  row_spec(1:3, background = "#C8E6C9", bold = TRUE) %>%       # top-3 hijau
  row_spec(which(coef_df$p_value < 0.05 & !((1:nrow(coef_df)) %in% 1:3)),
           background = "#FFF9C4") %>%                         # sig lain kuning
  add_header_above(c(" " = 1, "Estimasi" = 3, "Uji" = 2))
Table 9.1: Table 9.2: Koefisien LR — Diurutkan Berdasarkan p-value (Terkecil ke Terbesar)
Estimasi
Uji
Variabel Estimate Std_Error z_value p_value Signifikan
tb_sekarang -68.1059 5503.7636 -0.0124 9.9013e-01
(Intercept) 3982.8677 338213.5118 0.0118 9.9060e-01
usia_bulan 25.5675 2677.4763 0.0095 9.9238e-01
pend_ibu_enc -18.1671 10657.0762 -0.0017 9.9864e-01
pendapatan 0.0000 0.0048 -0.0011 9.9910e-01
imunisasi_enc -9.3415 13953.9854 -0.0007 9.9947e-01
jenis_kelamin_enc 11.6017 21810.9579 0.0005 9.9958e-01
asi_eksklusif_enc 10.6046 23066.6935 0.0005 9.9963e-01
bb_lahir -9.2382 22549.4959 -0.0004 9.9967e-01
bb_sekarang 5.4457 21089.7194 0.0003 9.9979e-01
kunjungan_posyandu 0.8950 3924.2929 0.0002 9.9982e-01
sanitasi_enc -14.6695 76833.7371 -0.0002 9.9985e-01
akses_air_enc 10.9590 74658.0120 0.0001 9.9988e-01
diare_3bln 0.4636 5127.9207 0.0001 9.9993e-01
# ── Cetak Top-3 Paling Signifikan ─────────────────────────────────────────────
cat("\n=== TOP 3 VARIABEL PALING SIGNIFIKAN (p-value terkecil) ===\n")
#> 
#> === TOP 3 VARIABEL PALING SIGNIFIKAN (p-value terkecil) ===
top3 <- coef_df %>% slice(1:3)
for (i in 1:3) {
  cat(sprintf("  [%d] %-25s | p = %s | z = %.4f | Estimate = %.4f\n",
              i,
              top3$Variabel[i],
              formatC(top3$p_value[i], format = "e", digits = 4),
              top3$z_value[i],
              top3$Estimate[i]))
}
#>   [1] tb_sekarang               | p = 9.9013e-01 | z = -0.0124 | Estimate = -68.1059
#>   [2] (Intercept)               | p = 9.9060e-01 | z = 0.0118 | Estimate = 3982.8677
#>   [3] usia_bulan                | p = 9.9238e-01 | z = 0.0095 | Estimate = 25.5675
# ── Odds Ratio dengan CI Wald (robust terhadap separation) ───────────────────
cat("\n=== ODDS RATIO + 95% CI Wald ===\n")
#> 
#> === ODDS RATIO + 95% CI Wald ===
or_df <- data.frame(
  Variabel  = names(coef(model_lr)),
  Koefisien = round(coef(model_lr), 4),
  OddsRatio = round(exp(coef(model_lr)), 4),
  CI_Lower  = round(exp(coef(model_lr) - 1.96 * sqrt(diag(vcov(model_lr)))), 4),
  CI_Upper  = round(exp(coef(model_lr) + 1.96 * sqrt(diag(vcov(model_lr)))), 4)
) %>%
  left_join(coef_df %>% select(Variabel, p_value), by = "Variabel") %>%
  arrange(p_value)                                   # ← ikut urutan signifikansi

kable(or_df %>% select(-p_value),
      caption = "Odds Ratio + 95% CI Wald (diurutkan: paling signifikan)",
      align   = "lrrrr") %>%
  kable_styling(bootstrap_options = c("striped","hover"),
                full_width = FALSE, position = "center") %>%
  row_spec(1:3, background = "#C8E6C9", bold = TRUE)
Table 9.1: Table 9.1: Odds Ratio + 95% CI Wald (diurutkan: paling signifikan)
Variabel Koefisien OddsRatio CI_Lower CI_Upper
tb_sekarang -68.1059 0.000000e+00 0.0000 Inf
(Intercept) 3982.8677 Inf 0.0000 Inf
usia_bulan 25.5675 1.270041e+11 0.0000 Inf
pend_ibu_enc -18.1671 0.000000e+00 0.0000 Inf
pendapatan 0.0000 1.000000e+00 0.9907 1.0094
imunisasi_enc -9.3415 1.000000e-04 0.0000 Inf
jenis_kelamin_enc 11.6017 1.092788e+05 0.0000 Inf
asi_eksklusif_enc 10.6046 4.032096e+04 0.0000 Inf
bb_lahir -9.2382 1.000000e-04 0.0000 Inf
bb_sekarang 5.4457 2.317620e+02 0.0000 Inf
kunjungan_posyandu 0.8950 2.447400e+00 0.0000 Inf
sanitasi_enc -14.6695 0.000000e+00 0.0000 Inf
akses_air_enc 10.9590 5.746826e+04 0.0000 Inf
diare_3bln 0.4636 1.589800e+00 0.0000 Inf

9.2 IK 15c — Decision Tree dengan Tuning cp [KUK 1.1–2.3]

library(rpart); library(rpart.plot)

set.seed(42)
ctrl_dt   <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                           summaryFunction = twoClassSummary)
cp_grid   <- expand.grid(cp = seq(0.001, 0.05, by = 0.001))
model_dt_cv <- train(status_stunting ~ ., data = train_set, method = "rpart",
                     trControl = ctrl_dt, tuneGrid = cp_grid, metric = "ROC")

cat("=== HASIL TUNING cp ===\n")
#> === HASIL TUNING cp ===
print(model_dt_cv$results[, c("cp","ROC","Sens","Spec")])
#>       cp       ROC      Sens      Spec
#> 1  0.001 0.8495003 0.9147282 0.7119048
#> 2  0.002 0.8495003 0.9147282 0.7119048
#> 3  0.003 0.8495003 0.9147282 0.7119048
#> 4  0.004 0.8495003 0.9147282 0.7119048
#> 5  0.005 0.8495003 0.9147282 0.7119048
#> 6  0.006 0.8495003 0.9147282 0.7119048
#> 7  0.007 0.8495003 0.9147282 0.7119048
#> 8  0.008 0.8495003 0.9147282 0.7119048
#> 9  0.009 0.8495003 0.9147282 0.7119048
#> 10 0.010 0.8495003 0.9147282 0.7119048
#> 11 0.011 0.8495003 0.9147282 0.7119048
#> 12 0.012 0.8495003 0.9147282 0.7119048
#> 13 0.013 0.8439730 0.9216248 0.7023810
#> 14 0.014 0.8439730 0.9216248 0.7023810
#> 15 0.015 0.8439730 0.9216248 0.7023810
#> 16 0.016 0.8439730 0.9216248 0.7023810
#> 17 0.017 0.8439730 0.9216248 0.7023810
#> 18 0.018 0.8439730 0.9216248 0.7023810
#> 19 0.019 0.8439730 0.9216248 0.7023810
#> 20 0.020 0.8439730 0.9216248 0.7023810
#> 21 0.021 0.8439730 0.9216248 0.7023810
#> 22 0.022 0.8439730 0.9216248 0.7023810
#> 23 0.023 0.8439730 0.9216248 0.7023810
#> 24 0.024 0.8439730 0.9216248 0.7023810
#> 25 0.025 0.8310069 0.9419638 0.6823810
#> 26 0.026 0.8310069 0.9419638 0.6823810
#> 27 0.027 0.8310069 0.9419638 0.6823810
#> 28 0.028 0.8310069 0.9419638 0.6823810
#> 29 0.029 0.8310069 0.9419638 0.6823810
#> 30 0.030 0.8310069 0.9419638 0.6823810
#> 31 0.031 0.8310069 0.9419638 0.6823810
#> 32 0.032 0.8310069 0.9419638 0.6823810
#> 33 0.033 0.8310069 0.9419638 0.6823810
#> 34 0.034 0.8310069 0.9419638 0.6823810
#> 35 0.035 0.8310069 0.9419638 0.6823810
#> 36 0.036 0.8310069 0.9419638 0.6823810
#> 37 0.037 0.8310069 0.9419638 0.6823810
#> 38 0.038 0.8323629 0.9453536 0.6723810
#> 39 0.039 0.8323629 0.9453536 0.6723810
#> 40 0.040 0.8323629 0.9453536 0.6723810
#> 41 0.041 0.8323629 0.9453536 0.6723810
#> 42 0.042 0.8323629 0.9453536 0.6723810
#> 43 0.043 0.8323629 0.9453536 0.6723810
#> 44 0.044 0.8277866 0.9284044 0.6723810
#> 45 0.045 0.8277866 0.9284044 0.6723810
#> 46 0.046 0.8277866 0.9284044 0.6723810
#> 47 0.047 0.8277866 0.9284044 0.6723810
#> 48 0.048 0.8277866 0.9284044 0.6723810
#> 49 0.049 0.8277866 0.9284044 0.6723810
#> 50 0.050 0.8277866 0.9284044 0.6723810
cat("\ncp Optimal:", model_dt_cv$bestTune$cp, "\n")
#> 
#> cp Optimal: 0.012
res_dt    <- model_dt_cv$results
best_roc  <- max(res_dt$ROC)
se_roc    <- sd(res_dt$ROC) / sqrt(5)
cp_1se    <- res_dt$cp[res_dt$ROC >= (best_roc - se_roc)][1]
cat(sprintf("Best ROC: %.4f | 1-SE threshold: %.4f | cp (1-SE): %s\n",
            best_roc, best_roc - se_roc, cp_1se))
#> Best ROC: 0.8495 | 1-SE threshold: 0.8457 | cp (1-SE): 0.001
set.seed(42)
model_dt_final <- rpart(status_stunting ~ ., data = train_set, method = "class",
                         control = rpart.control(cp = cp_1se))
printcp(model_dt_final)
#> 
#> Classification tree:
#> rpart(formula = status_stunting ~ ., data = train_set, method = "class", 
#>     control = rpart.control(cp = cp_1se))
#> 
#> Variables actually used in tree construction:
#> [1] bmi         rasio_bb    tb_sekarang usia_bulan 
#> 
#> Root node error: 101/394 = 0.25635
#> 
#> n= 394 
#> 
#>         CP nsplit rel error  xerror     xstd
#> 1 0.356436      0   1.00000 1.00000 0.085807
#> 2 0.089109      1   0.64356 0.69307 0.075119
#> 3 0.054455      2   0.55446 0.58416 0.070126
#> 4 0.019802      5   0.37624 0.54455 0.068110
#> 5 0.009901      7   0.33663 0.53465 0.067588
#> 6 0.001000      8   0.32673 0.54455 0.068110
print(model_dt_final$variable.importance)
#>                bmi        tb_sekarang        bb_sekarang           rasio_bb 
#>         61.0341758         41.1727881         13.8814786         12.4942513 
#>         usia_bulan           bb_lahir         pendapatan       pend_ibu_enc 
#>         10.3824263          3.4376910          1.4257306          0.8554017 
#> kunjungan_posyandu 
#>          0.7136781
rpart.plot(model_dt_final, type = 4, extra = 104, under = TRUE,
           fallen.leaves = TRUE, branch.lty = 3, shadow.col = "gray75",
           col = "#1A237E", main = paste0("Decision Tree Prediksi Stunting  (cp = ", cp_1se, ")"),
           cex = 0.78, box.palette = c(COL_NORMAL, COL_STUNTING))
Decision Tree Prediksi Stunting (cp Optimal — 1-SE Rule)

Figure 9.1: Decision Tree Prediksi Stunting (cp Optimal — 1-SE Rule)


9.3 IK 16a — Random Forest dengan Tuning mtry [KUK 2.3]

library(randomForest)

set.seed(42)
ctrl_rf <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                         summaryFunction = twoClassSummary)
mtry_grid   <- expand.grid(mtry = 2:8)
model_rf_cv <- train(status_stunting ~ ., data = train_set, method = "rf",
                     trControl = ctrl_rf, tuneGrid = mtry_grid,
                     metric = "ROC", ntree = 200)

cat("=== HASIL TUNING mtry ===\n")
#> === HASIL TUNING mtry ===
print(model_rf_cv$results[, c("mtry","ROC","Sens","Spec")])
#>   mtry       ROC      Sens      Spec
#> 1    2 0.9215732 0.9794857 0.6028571
#> 2    3 0.9382779 0.9692577 0.6223810
#> 3    4 0.9401904 0.9657510 0.6623810
#> 4    5 0.9470196 0.9589714 0.7023810
#> 5    6 0.9543966 0.9590298 0.7219048
#> 6    7 0.9521226 0.9589129 0.7319048
#> 7    8 0.9527794 0.9657510 0.7409524
cat("\nmtry Optimal:", model_rf_cv$bestTune$mtry, "\n")
#> 
#> mtry Optimal: 6
plot(model_rf_cv, main = "Tuning mtry Random Forest (5-Fold CV)")
Tuning Hyperparameter mtry — Random Forest

Figure 9.2: Tuning Hyperparameter mtry — Random Forest

imp_df <- varImp(model_rf_cv)$importance %>%
  as.data.frame() %>%
  tibble::rownames_to_column("Fitur") %>%
  rename(Importance = Overall) %>%
  arrange(Importance)

ggplot(imp_df, aes(x = reorder(Fitur, Importance), y = Importance, fill = Importance)) +
  geom_col(colour = "white", width = 0.72) +
  geom_text(aes(label = round(Importance, 1)), hjust = -0.1,
            size = 3.2, colour = "#37474F") +
  scale_fill_gradient(low = "#BBDEFB", high = COL_NORMAL) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.18))) +
  labs(title    = "Feature Importance — Random Forest",
       subtitle = paste0("mtry optimal = ", model_rf_cv$bestTune$mtry, " | ntree = 200"),
       x = NULL, y = "Importance Score (ROC)",
       caption  = "Semakin tinggi skor → semakin penting fitur dalam prediksi") +
  theme(legend.position = "none")
Feature Importance — Random Forest

Figure 9.3: Feature Importance — Random Forest


9.4 IK 16c — Penanganan Class Imbalance dengan ROSE [KUK 2.4]

library(ROSE)

set.seed(42)
cat("Distribusi kelas training (Sebelum ROSE):\n")
#> Distribusi kelas training (Sebelum ROSE):
print(table(train_set$status_stunting))
#> 
#>   Normal Stunting 
#>      293      101
train_rose <- ROSE(status_stunting ~ ., data = train_set, seed = 42)$data
cat("\nDistribusi kelas training (Sesudah ROSE):\n")
#> 
#> Distribusi kelas training (Sesudah ROSE):
print(table(train_rose$status_stunting))
#> 
#>   Normal Stunting 
#>      203      191
pred_rf_no <- predict(model_rf_cv, newdata = test_set)
cm_no      <- confusionMatrix(pred_rf_no, test_set$status_stunting, positive = "Stunting")

set.seed(42)
model_rf_rose <- train(status_stunting ~ ., data = train_rose, method = "rf",
                       trControl = trainControl(method = "cv", number = 5,
                                                classProbs = TRUE,
                                                summaryFunction = twoClassSummary),
                       tuneGrid = expand.grid(mtry = model_rf_cv$bestTune$mtry),
                       metric = "ROC", ntree = 200)
pred_rf_rose <- predict(model_rf_rose, newdata = test_set)
cm_rose      <- confusionMatrix(pred_rf_rose, test_set$status_stunting, positive = "Stunting")

cat("\n=== RF TANPA ROSE ===\n"); print(cm_no)
#> 
#> === RF TANPA ROSE ===
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Normal Stunting
#>   Normal       72        5
#>   Stunting      1       20
#>                                           
#>                Accuracy : 0.9388          
#>                  95% CI : (0.8715, 0.9772)
#>     No Information Rate : 0.7449          
#>     P-Value [Acc > NIR] : 6.059e-07       
#>                                           
#>                   Kappa : 0.83            
#>                                           
#>  Mcnemar's Test P-Value : 0.2207          
#>                                           
#>             Sensitivity : 0.8000          
#>             Specificity : 0.9863          
#>          Pos Pred Value : 0.9524          
#>          Neg Pred Value : 0.9351          
#>              Prevalence : 0.2551          
#>          Detection Rate : 0.2041          
#>    Detection Prevalence : 0.2143          
#>       Balanced Accuracy : 0.8932          
#>                                           
#>        'Positive' Class : Stunting        
#> 
cat("\n=== RF DENGAN ROSE ===\n"); print(cm_rose)
#> 
#> === RF DENGAN ROSE ===
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Normal Stunting
#>   Normal       66        7
#>   Stunting      7       18
#>                                           
#>                Accuracy : 0.8571          
#>                  95% CI : (0.7719, 0.9196)
#>     No Information Rate : 0.7449          
#>     P-Value [Acc > NIR] : 0.005306        
#>                                           
#>                   Kappa : 0.6241          
#>                                           
#>  Mcnemar's Test P-Value : 1.000000        
#>                                           
#>             Sensitivity : 0.7200          
#>             Specificity : 0.9041          
#>          Pos Pred Value : 0.7200          
#>          Neg Pred Value : 0.9041          
#>              Prevalence : 0.2551          
#>          Detection Rate : 0.1837          
#>    Detection Prevalence : 0.2551          
#>       Balanced Accuracy : 0.8121          
#>                                           
#>        'Positive' Class : Stunting        
#> 
df_rose_cmp <- data.frame(
  Metrik      = c("Sensitivity","Specificity","Accuracy","F1-Score"),
  Tanpa_ROSE  = c(round(cm_no$byClass["Sensitivity"],   4),
                  round(cm_no$byClass["Specificity"],   4),
                  round(cm_no$overall["Accuracy"],      4),
                  round(cm_no$byClass["F1"],            4)),
  Dengan_ROSE = c(round(cm_rose$byClass["Sensitivity"], 4),
                  round(cm_rose$byClass["Specificity"], 4),
                  round(cm_rose$overall["Accuracy"],    4),
                  round(cm_rose$byClass["F1"],          4))
)

kable(df_rose_cmp, caption = "Perbandingan Performa RF: Tanpa vs Dengan ROSE",
      col.names = c("Metrik","Tanpa ROSE","Dengan ROSE")) %>%
  kable_styling(bootstrap_options = c("striped","hover"),
                full_width = FALSE, position = "center") %>%
  row_spec(1, background = "#FCE4EC", bold = TRUE) %>%
  row_spec(0, background = "#37474F", color = "white", bold = TRUE)
Table 9.3: Table 9.4: Perbandingan Performa RF: Tanpa vs Dengan ROSE
Metrik Tanpa ROSE Dengan ROSE
Sensitivity Sensitivity 0.8000 0.7200
Specificity Specificity 0.9863 0.9041
Accuracy Accuracy 0.9388 0.8571
F1 F1-Score 0.8696 0.7200

10 Bukti Unit Kompetensi 10 — Mengevaluasi Hasil Pemodelan

Kode Unit: J.62DMI00.014.1


10.1 IK 17 — Evaluasi Ketiga Model pada Test Set [KUK 1.2, 2.1–2.2]

library(pROC)

test_set  <- readRDS("C:/Users/User/Downloads/test_set.rds") %>%
  mutate(status_stunting = factor(status_stunting, levels = c("Normal","Stunting")))
train_set <- readRDS("C:/Users/User/Downloads/train_set.rds") %>%
  mutate(status_stunting = factor(status_stunting, levels = c("Normal","Stunting")))

# ── Model 1: Logistic Regression ─────────────────────────────────────────────
set.seed(42)
model_lr <- glm(status_stunting ~ . - bmi - rasio_bb,
                data   = train_set,
                family = binomial(link = "logit"))
pred_lr_prob <- predict(model_lr, newdata = test_set, type = "response")
pred_lr_lbl  <- factor(ifelse(pred_lr_prob >= 0.5, "Stunting","Normal"),
                       levels = c("Normal","Stunting"))
cm_lr  <- confusionMatrix(pred_lr_lbl, test_set$status_stunting, positive = "Stunting")
roc_lr <- roc(test_set$status_stunting, pred_lr_prob,
              levels = c("Normal","Stunting"), quiet = TRUE)
cat("=== Logistic Regression ===\n"); print(cm_lr)
#> === Logistic Regression ===
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Normal Stunting
#>   Normal       72        1
#>   Stunting      1       24
#>                                           
#>                Accuracy : 0.9796          
#>                  95% CI : (0.9282, 0.9975)
#>     No Information Rate : 0.7449          
#>     P-Value [Acc > NIR] : 1.729e-10       
#>                                           
#>                   Kappa : 0.9463          
#>                                           
#>  Mcnemar's Test P-Value : 1               
#>                                           
#>             Sensitivity : 0.9600          
#>             Specificity : 0.9863          
#>          Pos Pred Value : 0.9600          
#>          Neg Pred Value : 0.9863          
#>              Prevalence : 0.2551          
#>          Detection Rate : 0.2449          
#>    Detection Prevalence : 0.2551          
#>       Balanced Accuracy : 0.9732          
#>                                           
#>        'Positive' Class : Stunting        
#> 
cat("AUC-ROC:", round(auc(roc_lr), 4), "\n")
#> AUC-ROC: 0.9989
# ── Model 2: Decision Tree ────────────────────────────────────────────────────
set.seed(42)
model_dt_fin  <- rpart(status_stunting ~ ., data = train_set,
                       method = "class", control = rpart.control(cp = 0.001))
pred_dt_prob  <- predict(model_dt_fin, newdata = test_set, type = "prob")[,"Stunting"]
pred_dt_lbl   <- predict(model_dt_fin, newdata = test_set, type = "class")
cm_dt  <- confusionMatrix(pred_dt_lbl, test_set$status_stunting, positive = "Stunting")
roc_dt <- roc(test_set$status_stunting, pred_dt_prob,
              levels = c("Normal","Stunting"), quiet = TRUE)
cat("\n=== Decision Tree ===\n"); print(cm_dt)
#> 
#> === Decision Tree ===
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Normal Stunting
#>   Normal       68        7
#>   Stunting      5       18
#>                                           
#>                Accuracy : 0.8776          
#>                  95% CI : (0.7959, 0.9351)
#>     No Information Rate : 0.7449          
#>     P-Value [Acc > NIR] : 0.0009983       
#>                                           
#>                   Kappa : 0.6691          
#>                                           
#>  Mcnemar's Test P-Value : 0.7728300       
#>                                           
#>             Sensitivity : 0.7200          
#>             Specificity : 0.9315          
#>          Pos Pred Value : 0.7826          
#>          Neg Pred Value : 0.9067          
#>              Prevalence : 0.2551          
#>          Detection Rate : 0.1837          
#>    Detection Prevalence : 0.2347          
#>       Balanced Accuracy : 0.8258          
#>                                           
#>        'Positive' Class : Stunting        
#> 
cat("AUC-ROC:", round(auc(roc_dt), 4), "\n")
#> AUC-ROC: 0.9099
# ── Model 3: Random Forest ────────────────────────────────────────────────────
set.seed(42)
model_rf_fin  <- randomForest(status_stunting ~ ., data = train_set, mtry = 6, ntree = 200)
pred_rf_prob  <- predict(model_rf_fin, newdata = test_set, type = "prob")[,"Stunting"]
pred_rf_lbl   <- predict(model_rf_fin, newdata = test_set, type = "class")
cm_rf  <- confusionMatrix(pred_rf_lbl, test_set$status_stunting, positive = "Stunting")
roc_rf <- roc(test_set$status_stunting, pred_rf_prob,
              levels = c("Normal","Stunting"), quiet = TRUE)
cat("\n=== Random Forest ===\n"); print(cm_rf)
#> 
#> === Random Forest ===
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction Normal Stunting
#>   Normal       73        5
#>   Stunting      0       20
#>                                           
#>                Accuracy : 0.949           
#>                  95% CI : (0.8849, 0.9832)
#>     No Information Rate : 0.7449          
#>     P-Value [Acc > NIR] : 1.099e-07       
#>                                           
#>                   Kappa : 0.8563          
#>                                           
#>  Mcnemar's Test P-Value : 0.07364         
#>                                           
#>             Sensitivity : 0.8000          
#>             Specificity : 1.0000          
#>          Pos Pred Value : 1.0000          
#>          Neg Pred Value : 0.9359          
#>              Prevalence : 0.2551          
#>          Detection Rate : 0.2041          
#>    Detection Prevalence : 0.2041          
#>       Balanced Accuracy : 0.9000          
#>                                           
#>        'Positive' Class : Stunting        
#> 
cat("AUC-ROC:", round(auc(roc_rf), 4), "\n")
#> AUC-ROC: 0.9816
rekap <- data.frame(
  Model       = c("Logistic Regression","Decision Tree","Random Forest"),
  Accuracy    = c(round(cm_lr$overall["Accuracy"],   4),
                  round(cm_dt$overall["Accuracy"],   4),
                  round(cm_rf$overall["Accuracy"],   4)),
  Sensitivity = c(round(cm_lr$byClass["Sensitivity"], 4),
                  round(cm_dt$byClass["Sensitivity"], 4),
                  round(cm_rf$byClass["Sensitivity"], 4)),
  Specificity = c(round(cm_lr$byClass["Specificity"], 4),
                  round(cm_dt$byClass["Specificity"], 4),
                  round(cm_rf$byClass["Specificity"], 4)),
  Precision   = c(round(cm_lr$byClass["Precision"],   4),
                  round(cm_dt$byClass["Precision"],   4),
                  round(cm_rf$byClass["Precision"],   4)),
  F1_Score    = c(round(cm_lr$byClass["F1"],           4),
                  round(cm_dt$byClass["F1"],           4),
                  round(cm_rf$byClass["F1"],           4)),
  AUC_ROC     = c(round(auc(roc_lr), 4),
                  round(auc(roc_dt), 4),
                  round(auc(roc_rf), 4)),
  Kappa       = c(round(cm_lr$overall["Kappa"],        4),
                  round(cm_dt$overall["Kappa"],        4),
                  round(cm_rf$overall["Kappa"],        4))
)

kable(rekap, caption = "Rekap Perbandingan Metrik Evaluasi — Ketiga Model",
      align = c("l", rep("r",7))) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE) %>%
  column_spec(1, bold = TRUE, background = "#E8EAF6") %>%
  column_spec(3, background = "#E8F5E9",
              bold  = rekap$Sensitivity == max(rekap$Sensitivity),
              color = ifelse(rekap$Sensitivity == max(rekap$Sensitivity), COL_GREEN, "black")) %>%
  column_spec(7, background = "#E3F2FD",
              bold  = rekap$AUC_ROC == max(rekap$AUC_ROC),
              color = ifelse(rekap$AUC_ROC == max(rekap$AUC_ROC), COL_NORMAL, "black")) %>%
  add_header_above(c(" " = 1, "Performa Model" = 7)) %>%
  row_spec(0, background = "#1A237E", color = "white", bold = TRUE)
Table 10.1: Table 10.2: Rekap Perbandingan Metrik Evaluasi — Ketiga Model
Performa Model
Model Accuracy Sensitivity Specificity Precision F1_Score AUC_ROC Kappa
Logistic Regression 0.9796 0.96 0.9863 0.9600 0.9600 0.9989 0.9463
Decision Tree 0.8776 0.72 0.9315 0.7826 0.7500 0.9099 0.6691
Random Forest 0.9490 0.80 1.0000 1.0000 0.8889 0.9816 0.8563
roc_data <- bind_rows(
  data.frame(fpr = 1 - roc_lr$specificities, tpr = roc_lr$sensitivities,
             Model = paste0("Logistic Regression  (AUC = ", round(auc(roc_lr),3), ")")),
  data.frame(fpr = 1 - roc_dt$specificities, tpr = roc_dt$sensitivities,
             Model = paste0("Decision Tree  (AUC = ",       round(auc(roc_dt),3), ")")),
  data.frame(fpr = 1 - roc_rf$specificities, tpr = roc_rf$sensitivities,
             Model = paste0("Random Forest  (AUC = ",       round(auc(roc_rf),3), ")"))
)

col_roc <- setNames(
  c(COL_NORMAL, COL_GREEN, COL_STUNTING),
  c(paste0("Logistic Regression  (AUC = ", round(auc(roc_lr),3), ")"),
    paste0("Decision Tree  (AUC = ",       round(auc(roc_dt),3), ")"),
    paste0("Random Forest  (AUC = ",       round(auc(roc_rf),3), ")"))
)

ggplot(roc_data, aes(x = fpr, y = tpr, colour = Model, linetype = Model)) +
  geom_line(linewidth = 1.1) +
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed", colour = "gray55", linewidth = 0.65) +
  geom_point(data = roc_data %>% group_by(Model) %>%
               filter(abs(fpr - 0.2) == min(abs(fpr - 0.2))),
             size = 3, shape = 16) +
  annotate("text", x = 0.6, y = 0.35, label = "Garis Acak (AUC = 0.5)",
           colour = "gray50", size = 3, fontface = "italic") +
  scale_colour_manual(values = col_roc) +
  scale_linetype_manual(values = c("solid","longdash","dotdash")) +
  coord_equal(xlim = c(0,1), ylim = c(0,1)) +
  labs(title    = "ROC Curve — Perbandingan Ketiga Model",
       subtitle = "Prediksi Risiko Stunting Balita | Test Set (20%)",
       x = "1 − Specificity (False Positive Rate)",
       y = "Sensitivity (True Positive Rate)",
       colour = "Model", linetype = "Model",
       caption = "Sumber: data_stunting_model.csv (test set)") +
  theme(legend.position  = "bottom",
        legend.direction = "vertical",
        legend.key.width = unit(2.2, "cm"))
ROC Curve — Perbandingan Ketiga Model Klasifikasi Stunting

Figure 10.1: ROC Curve — Perbandingan Ketiga Model Klasifikasi Stunting


11 Bukti Unit Kompetensi 11 — Proses Review Pemodelan

Kode Unit: J.62DMI00.015.1


11.1 IK 18 — Checklist Review & Penilaian Kualitas Proses [KUK 1.1–2.2]

Table 11.1: Table 11.2: Checklist Review Kesesuaian Proses Pemodelan
No Tahapan Status Kesesuaian Catatan
1 Pengumpulan Data ✓ Selesai ✓ Sesuai Rencana Dataset 500 balita, 17 variabel tersedia; 3 variabel memiliki missing values
2 Telaah Data (EDA) ✓ Selesai ✓ Sesuai Rencana Korelasi, distribusi, visualisasi, hipotesis — semua tersedia
3 Validasi Data ✓ Selesai ✓ Sesuai Rencana 59 nilai tidak valid (batas WHO); 45 sel missing; kecukupan data terpenuhi (rasio 35,71)
4 Pembersihan Data ✓ Selesai ✓ Sesuai Rencana Outlier dihapus (batas soal); imputasi per kelompok usia; data bersih tersimpan
5 Konstruksi Data ✓ Selesai ✓ Sesuai Rencana Encoding, feature engineering (BMI, rasio BB), normalisasi, VIF — selesai
6 Skenario Model ✓ Selesai ✓ Sesuai Rencana 2 skenario (80:20 split & 5-fold CV); metrik evaluasi ditetapkan
7 Pembangunan Model ✓ Selesai ✓ Sesuai Rencana 3 model dibangun: LR, DT, RF; hyperparameter tuning dilakukan dengan CV
8 Evaluasi Model ✓ Selesai ✓ Sesuai Rencana Confusion matrix, Sensitivity, Specificity, F1, AUC-ROC dihitung untuk ketiga model

Laporan ini dibuat menggunakan R Markdown — diknit ke format Word dan Material HTML.
Tanggal: 30 April 2026 | LSP Politeknik Statistika STIS 2026