1 Pendahuluan

Tujuan analisis
Laporan ini menganalisis empat model regresi penting untuk respons non-kontinu:

  1. Regresi logistik biner — digunakan ketika kategori respons memiliki dua kategori.
  2. Regresi logistik multinomial — digunakan ketika kategori respons tidak memiliki urutan alami (nominal).
  3. Regresi logistik ordinal — digunakan ketika kategori respons memiliki urutan alami.
  4. Regresi Poisson — digunakan ketika respons berupa data hitung (count data).
Jenis Respons Contoh Model
Biner (0/1) Kunjungan tinggi vs rendah Regresi Logistik Biner
Nominal / tidak berurutan Pilihan moda transportasi Regresi Multinomial
Ordinal / berurutan Tingkat kepuasan pasien Regresi Ordinal
Hitungan / count Jumlah kunjungan dokter Regresi Poisson

2 Peta Konsep

Regresi logistik biner bertanya: > Bagaimana prediktor memengaruhi peluang terjadinya suatu kejadian (Ya/Tidak)?

Regresi logistik multinomial bertanya: > Dibandingkan kategori referensi, bagaimana peluang seseorang masuk kategori A, B, atau C berubah ketika prediktor meningkat?

Regresi logistik ordinal bertanya: > Bagaimana prediktor menggeser kecenderungan seseorang berada pada tingkat respons yang lebih tinggi atau lebih rendah?

Regresi Poisson bertanya: > Bagaimana prediktor memengaruhi rata-rata jumlah kejadian yang diamati?

Aspek Biner Multinomial Ordinal Poisson
Skala respons Biner (0/1) Nominal Ordinal Count (≥ 0)
Fungsi penghubung Logit Logit (baseline-cat.) Cumulative logit Log
Interpretasi exp(β) Odds Ratio (OR) Relative Risk Ratio (RRR) Odds Ratio (OR) Incidence Rate Ratio (IRR)
Paket R stats::glm nnet::multinom MASS::polr stats::glm

3 Data yang Digunakan

3.1 Data Biner dan Poisson — Kunjungan ke Dokter

Sumber data: Adaptasi dari Deb & Trivedi (1997), Demand for Medical Care by the Elderly, Journal of Applied Econometrics. Variabel dependen Poisson (Y): Jumlah kunjungan ke dokter dalam setahun. Variabel dependen Biner (Y): Kunjungan tinggi (≥ 5 kali) vs kunjungan rendah (< 5 kali). Variabel independen: Usia, Pendapatan (juta Rp/bulan), Skor kesehatan (1–10), Asuransi (1 = Ya, 0 = Tidak). n = 20 observasi

# FIX: asuransi dibiarkan numerik (0/1) agar glm Poisson tidak error
# FIX: data ini dipakai bersama untuk Bagian I (Biner) dan Bagian IV (Poisson)
data_pois <- tibble(
  usia       = c(25, 30, 35, 40, 45, 50, 55, 60, 28, 33,
                 38, 43, 48, 53, 58, 27, 32, 37, 42, 47),
  pendapatan = c( 3,  4,  5,  6,  7,  8,  9, 10,3.5,4.5,
                5.5,6.5,7.5,8.5,9.5,3.2,4.2,5.2,6.2,7.2),
  skor_kes   = c( 8,  7,  6,  5,  4,  3,  2,  1,  8,  7,
                  6,  5,  4,  3,  2,  8,  7,  6,  5,  4),
  asuransi   = c( 1,  1,  1,  1,  1,  1,  1,  1,  0,  0,
                  0,  0,  0,  0,  0,  1,  1,  0,  0,  1),
  kunjungan  = c( 1,  2,  2,  3,  4,  5,  6,  7,  1,  2,
                  3,  3,  4,  5,  6,  1,  2,  3,  4,  5)
)

knitr::kable(data_pois, caption = "Data Kunjungan ke Dokter (n = 20)")
Data Kunjungan ke Dokter (n = 20)
usia pendapatan skor_kes asuransi kunjungan
25 3.0 8 1 1
30 4.0 7 1 2
35 5.0 6 1 2
40 6.0 5 1 3
45 7.0 4 1 4
50 8.0 3 1 5
55 9.0 2 1 6
60 10.0 1 1 7
28 3.5 8 0 1
33 4.5 7 0 2
38 5.5 6 0 3
43 6.5 5 0 3
48 7.5 4 0 4
53 8.5 3 0 5
58 9.5 2 0 6
27 3.2 8 1 1
32 4.2 7 1 2
37 5.2 6 0 3
42 6.2 5 0 4
47 7.2 4 1 5

3.2 Data Multinomial — Pemilihan Moda Transportasi

Sumber data: Adaptasi dari Greene (2012), Econometric Analysis, 7th ed. Variabel dependen (Y): Moda transportasi yang dipilih — Bus, Kereta, atau Mobil Pribadi. n = 20 observasi

data_multi <- tibble(
  pendapatan = c(3.2, 7.5, 4.8, 2.1, 9.3, 5.6, 1.8,11.2, 6.4, 3.9,
                 8.7, 5.1, 2.7,13.5, 4.3, 6.9, 2.4, 7.8,10.1, 3.5),
  jarak      = c( 12,  35,  22,   8,  45,  28,   6,  52,  30,  15,
                  40,  25,  10,  60,  20,  33,   9,  38,  50,  14),
  usia       = c( 28,  42,  35,  23,  50,  38,  20,  55,  45,  31,
                  48,  36,  25,  58,  33,  41,  22,  46,  53,  29),
  gender     = c("L","L","P","P","L","P","L","L","P","L",
                 "P","L","P","L","P","L","L","P","L","P"),
  moda       = c("Bus","Mobil","Kereta","Bus","Mobil","Kereta",
                 "Bus","Mobil","Kereta","Bus","Mobil","Kereta",
                 "Bus","Mobil","Kereta","Mobil","Bus","Kereta",
                 "Mobil","Bus")
)

data_multi$moda <- relevel(factor(data_multi$moda), ref = "Bus")
knitr::kable(data_multi, caption = "Data Pemilihan Moda Transportasi (n = 20)")
Data Pemilihan Moda Transportasi (n = 20)
pendapatan jarak usia gender moda
3.2 12 28 L Bus
7.5 35 42 L Mobil
4.8 22 35 P Kereta
2.1 8 23 P Bus
9.3 45 50 L Mobil
5.6 28 38 P Kereta
1.8 6 20 L Bus
11.2 52 55 L Mobil
6.4 30 45 P Kereta
3.9 15 31 L Bus
8.7 40 48 P Mobil
5.1 25 36 L Kereta
2.7 10 25 P Bus
13.5 60 58 L Mobil
4.3 20 33 P Kereta
6.9 33 41 L Mobil
2.4 9 22 L Bus
7.8 38 46 P Kereta
10.1 50 53 L Mobil
3.5 14 29 P Bus

3.3 Data Ordinal — Kepuasan Pasien

Sumber data: Dataset Patient Satisfaction dari Kaggle (Dimitrievska, 2020). Dataset berisi 453 responden yang menilai kualitas pelayanan kesehatan di North Macedonia. Variabel dependen (Y): satisfaction in RM (1 = Tidak Puas, 2 = Sebagian Puas, 3 = Puas). Variabel independen: Check up appointment, Time waiting, Admin procedures, Hygiene and cleaning, Communication with dr, Exact diagnosis, Modern equipment. n = 453 observasi

# Sesuaikan path file CSV di bawah ini
data_ord_raw <- read_csv("C:/Users/Lenovo/OneDrive/Desktop/SEMESTER 4/ADK/adk tiba tiba reglog lengkap/datasetsatisfaction.csv")

data_ord <- data_ord_raw %>%
  rename(
    kepuasan      = `satisfaction in RM`,
    checkup       = `Check up appointment`,
    waiting       = `Time waiting`,
    admin         = `Admin procedures`,
    hygiene       = `Hygiene and cleaning`,
    communication = `Communication with dr`,
    diagnosis     = `Exact diagnosis`,
    equipment     = `Modern equipment`
  ) %>%
  # FIX: pastikan semua prediktor numerik (bukan karakter)
  mutate(across(c(checkup, waiting, admin, hygiene,
                  communication, diagnosis, equipment),
                as.numeric)) %>%
  na.omit() %>%
  mutate(
    kepuasan = ordered(kepuasan,
                       levels = 1:3,
                       labels = c("Tidak Puas", "Sebagian Puas", "Puas"))
  )

cat("Dimensi data ordinal:", nrow(data_ord), "x", ncol(data_ord), "\n")
## Dimensi data ordinal: 452 x 17
head(data_ord)
## # A tibble: 6 × 17
##   kepuasan      checkup waiting admin hygiene `Time of appointment`
##   <ord>           <dbl>   <dbl> <dbl>   <dbl>                 <dbl>
## 1 Tidak Puas          1       1     1       1                     1
## 2 Sebagian Puas       2       1     2       1                     2
## 3 Sebagian Puas       1       1     2       2                     2
## 4 Sebagian Puas       2       4     1       1                     2
## 5 Puas                4       1     1       2                     1
## 6 Sebagian Puas       2       3     4       1                     1
## # ℹ 11 more variables: `Quality/experience dr.` <dbl>,
## #   `Specialists avaliable` <dbl>, communication <dbl>, diagnosis <dbl>,
## #   equipment <dbl>, `friendly health care workers` <dbl>,
## #   `lab services` <dbl>, `avaliablity of drugs` <dbl>, `waiting rooms` <dbl>,
## #   `hospital rooms quality` <dbl>, `parking, playing rooms, caffes` <dbl>

4 Bagian I — Regresi Logistik Biner

4.1 Definisi

Regresi logistik biner digunakan ketika variabel respons hanya memiliki dua kategori:

\[Y_i \in \{0, 1\}\]

Model memodelkan probabilitas kejadian \(Y_i = 1\) melalui fungsi logit:

\[\text{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\]

sehingga peluang prediksi:

\[\hat{\pi}_i = \frac{1}{1 + \exp(-\hat{\eta}_i)}\]

4.2 Data yang Digunakan

Data menggunakan dataset yang sama dengan regresi Poisson. Variabel respons dibentuk menjadi:

  • 1 = Kunjungan Tinggi (≥ 5 kali/tahun)
  • 0 = Kunjungan Rendah (< 5 kali/tahun)
data_bin <- data_pois %>%
  mutate(kunjungan_biner = as.integer(kunjungan >= 5))

knitr::kable(data_bin, caption = "Data Regresi Logistik Biner")
Data Regresi Logistik Biner
usia pendapatan skor_kes asuransi kunjungan kunjungan_biner
25 3.0 8 1 1 0
30 4.0 7 1 2 0
35 5.0 6 1 2 0
40 6.0 5 1 3 0
45 7.0 4 1 4 0
50 8.0 3 1 5 1
55 9.0 2 1 6 1
60 10.0 1 1 7 1
28 3.5 8 0 1 0
33 4.5 7 0 2 0
38 5.5 6 0 3 0
43 6.5 5 0 3 0
48 7.5 4 0 4 0
53 8.5 3 0 5 1
58 9.5 2 0 6 1
27 3.2 8 1 1 0
32 4.2 7 1 2 0
37 5.2 6 0 3 0
42 6.2 5 0 4 0
47 7.2 4 1 5 1

4.3 Distribusi Respons Biner

data_bin %>%
  count(kunjungan_biner) %>%
  mutate(
    Label   = ifelse(kunjungan_biner == 1, "Tinggi (≥5)", "Rendah (<5)"),
    Persen  = round(n / sum(n) * 100, 1)
  ) %>%
  knitr::kable(
    col.names = c("Kategori (0/1)", "Label", "Frekuensi", "Persentase (%)"),
    caption   = "Distribusi Respons Biner"
  )
Distribusi Respons Biner
Kategori (0/1) Label Frekuensi Persentase (%)
0 14 Rendah (<5) 70
1 6 Tinggi (≥5) 30

4.4 Formulasi Model

\[\log\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = \beta_0 + \beta_1(\text{usia}) + \beta_2(\text{pendapatan}) + \beta_3(\text{skor\_kes}) + \beta_4(\text{asuransi})\]

dengan \(Y = 1\) = kunjungan tinggi, \(Y = 0\) = kunjungan rendah.

4.5 Fungsi Likelihood dan Estimasi

Model logistik biner diestimasi menggunakan Maximum Likelihood Estimation (MLE):

\[L(\beta) = \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i}\]

\[\ell(\beta) = \sum_{i=1}^{n}\left[y_i\log(\pi_i) + (1-y_i)\log(1-\pi_i)\right]\]

4.6 Pembagian Training dan Testing (Stratified 80:20)

safe_div <- function(num, den) ifelse(den == 0, NA_real_, num / den)

stratified_split <- function(y, prop = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  train_idx    <- lapply(
    idx_by_class,
    function(idx) sample(idx, size = floor(length(idx) * prop))
  )
  unlist(train_idx, use.names = FALSE)
}

set.seed(42)
train_id  <- stratified_split(data_bin$kunjungan_biner, prop = 0.8)
train_bin <- data_bin[train_id, ]
test_bin  <- data_bin[-train_id, ]

bind_rows(
  train_bin %>% count(kunjungan_biner) %>% mutate(Set = "Training"),
  test_bin  %>% count(kunjungan_biner) %>% mutate(Set = "Testing")
) %>%
  group_by(Set) %>%
  mutate(Proporsi = scales::percent(n / sum(n), accuracy = 0.1)) %>%
  ungroup() %>%
  rename(`Kelas Y` = kunjungan_biner, Jumlah = n) %>%
  dplyr::select(Set, `Kelas Y`, Jumlah, Proporsi) %>%
  knitr::kable(caption = "Distribusi Kelas pada Training dan Testing")
Distribusi Kelas pada Training dan Testing
Set Kelas Y Jumlah Proporsi
Training 0 11 73.3%
Training 1 4 26.7%
Testing 0 3 60.0%
Testing 1 2 40.0%

4.7 Estimasi Model dengan R

model_bin <- glm(
  kunjungan_biner ~ usia + pendapatan + skor_kes + asuransi,
  data   = train_bin,
  family = binomial(link = "logit")
)

summary(model_bin)
## 
## Call:
## glm(formula = kunjungan_biner ~ usia + pendapatan + skor_kes + 
##     asuransi, family = binomial(link = "logit"), data = train_bin)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.823e+03  1.393e+07       0        1
## usia         1.187e+02  4.476e+05       0        1
## pendapatan  -5.298e+02  2.344e+06       0        1
## skor_kes     1.913e+01  1.087e+06       0        1
## asuransi     8.857e+01  6.009e+05       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1.7397e+01  on 14  degrees of freedom
## Residual deviance: 8.8423e-10  on 10  degrees of freedom
## AIC: 10
## 
## Number of Fisher Scoring iterations: 25

4.8 Odds Ratio (OR) dan Confidence Interval

or_bin <- exp(coef(model_bin))
ci_bin <- exp(confint.default(model_bin))

cbind(
  OR         = round(or_bin, 3),
  `CI 2.5%`  = round(ci_bin[, 1], 3),
  `CI 97.5%` = round(ci_bin[, 2], 3)
) %>%
  knitr::kable(caption = "Odds Ratio (OR) dan Confidence Interval 95%")
Odds Ratio (OR) dan Confidence Interval 95%
OR CI 2.5% CI 97.5%
(Intercept) 0.000000e+00 0 Inf
usia 3.573659e+51 0 Inf
pendapatan 0.000000e+00 0 Inf
skor_kes 2.027883e+08 0 Inf
asuransi 2.909858e+38 0 Inf

4.9 Interpretasi Odds Ratio

\[OR = e^{\hat{\beta}}\]

Nilai OR Interpretasi
OR > 1 Meningkatkan peluang kunjungan tinggi
OR < 1 Menurunkan peluang kunjungan tinggi
OR = 1 Tidak berpengaruh

Contoh:

  • OR usia = 1.10 → setiap kenaikan 1 tahun usia meningkatkan odds kunjungan tinggi sebesar 10%.
  • OR skor_kes = 0.80 → setiap kenaikan 1 poin skor kesehatan menurunkan odds kunjungan tinggi sebesar 20%.

4.10 Prediksi dan Evaluasi Klasifikasi

4.10.1 Rumus Confusion Matrix dan Metrik

\[\text{Akurasi} = \frac{TP+TN}{TP+TN+FP+FN}, \quad \text{Sensitivity} = \frac{TP}{TP+FN}, \quad \text{Specificity} = \frac{TN}{TN+FP}\]

# Fungsi metrik — mengikuti gaya dosen (referensi kredit macet)
classification_metrics <- function(actual, prob, threshold = 0.5) {
  pred <- as.integer(prob >= threshold)
  tp   <- sum(pred == 1 & actual == 1)
  tn   <- sum(pred == 0 & actual == 0)
  fp   <- sum(pred == 1 & actual == 0)
  fn   <- sum(pred == 0 & actual == 1)
  data.frame(
    Threshold          = threshold,
    Akurasi            = safe_div(tp + tn, tp + tn + fp + fn),
    `Error rate`       = 1 - safe_div(tp + tn, tp + tn + fp + fn),
    Sensitivity        = safe_div(tp, tp + fn),
    Specificity        = safe_div(tn, tn + fp),
    Presisi            = safe_div(tp, tp + fp),
    NPV                = safe_div(tn, tn + fn),
    `F1-score`         = safe_div(2 * tp, 2 * tp + fp + fn),
    `Balanced accuracy`= (safe_div(tp, tp + fn) + safe_div(tn, tn + fp)) / 2,
    FPR                = 1 - safe_div(tn, tn + fp),
    FNR                = 1 - safe_div(tp, tp + fn),
    check.names = FALSE
  )
}

confusion_matrix_tbl <- function(actual, prob, threshold = 0.5) {
  pred  <- factor(ifelse(prob >= threshold, "Tinggi", "Rendah"),
                  levels = c("Rendah", "Tinggi"))
  act_f <- factor(ifelse(actual == 1, "Tinggi", "Rendah"),
                  levels = c("Rendah", "Tinggi"))
  as.data.frame.matrix(table(Prediksi = pred, Aktual = act_f))
}
p_train_b <- predict(model_bin, newdata = train_bin, type = "response")
p_test_b  <- predict(model_bin, newdata = test_bin,  type = "response")

# Contoh prediksi pada data testing
head(data.frame(
  `Y aktual`           = test_bin$kunjungan_biner,
  `Peluang (tinggi)`   = round(p_test_b, 4),
  `Prediksi c=0.50`    = ifelse(p_test_b >= 0.5, 1, 0),
  check.names = FALSE
), 8) %>%
  knitr::kable(caption = "Contoh Prediksi Probabilitas pada Data Testing")
Contoh Prediksi Probabilitas pada Data Testing
Y aktual Peluang (tinggi) Prediksi c=0.50
0 0 0
1 1 1
1 1 1
0 0 0
0 0 0
cm_50 <- confusion_matrix_tbl(test_bin$kunjungan_biner, p_test_b, 0.5)
knitr::kable(cm_50, caption = "Confusion Matrix Testing — Threshold 0.50")
Confusion Matrix Testing — Threshold 0.50
Rendah Tinggi
Rendah 3 0
Tinggi 0 2
met_50 <- classification_metrics(test_bin$kunjungan_biner, p_test_b, 0.5) %>%
  mutate(across(where(is.numeric), ~ round(.x, 3)))
knitr::kable(met_50, caption = "Metrik Evaluasi Testing — Threshold 0.50")
Metrik Evaluasi Testing — Threshold 0.50
Threshold Akurasi Error rate Sensitivity Specificity Presisi NPV F1-score Balanced accuracy FPR FNR
0.5 1 0 1 1 1 1 1 1 0 0

4.11 Kurva ROC dan Threshold Optimal

Indeks Youden: \(J = \text{Sensitivity} + \text{Specificity} - 1\)

Threshold optimal \(c^*\) adalah nilai yang memaksimalkan \(J\).

roc_points <- function(actual, prob) {
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE), -Inf)
  out <- lapply(thresholds, function(th) {
    pred <- as.integer(prob >= th)
    tp   <- sum(pred == 1 & actual == 1)
    tn   <- sum(pred == 0 & actual == 0)
    fp   <- sum(pred == 1 & actual == 0)
    fn   <- sum(pred == 0 & actual == 1)
    sens <- safe_div(tp, tp + fn)
    spec <- safe_div(tn, tn + fp)
    data.frame(threshold   = th,
               sensitivity = sens,
               specificity = spec,
               fpr         = 1 - spec,
               youden      = sens + spec - 1)
  })
  bind_rows(out)
}

auc_value <- function(roc_df) {
  roc_sorted <- roc_df %>% arrange(fpr, sensitivity)
  sum(diff(roc_sorted$fpr) *
        (head(roc_sorted$sensitivity, -1) + tail(roc_sorted$sensitivity, -1)) / 2)
}

roc_train_b <- roc_points(train_bin$kunjungan_biner, p_train_b) %>%
  mutate(data = "Training")
roc_test_b  <- roc_points(test_bin$kunjungan_biner,  p_test_b) %>%
  mutate(data = "Testing")

auc_train_b <- auc_value(roc_train_b)
auc_test_b  <- auc_value(roc_test_b)

optimal_train_b <- roc_train_b %>%
  filter(is.finite(threshold)) %>%
  arrange(desc(youden), desc(sensitivity)) %>%
  dplyr::slice(1)

threshold_opt_b <- optimal_train_b$threshold[1]

test_at_opt_b <- roc_test_b %>%
  filter(is.finite(threshold)) %>%
  dplyr::slice_min(abs(threshold - threshold_opt_b), n = 1, with_ties = FALSE) %>%
  mutate(data = "Testing pada threshold optimal")

data.frame(
  Keterangan = c("Threshold optimal (Youden)", "Sensitivity (training)",
                 "Specificity (training)", "Indeks Youden (training)",
                 "AUC training", "AUC testing"),
  Nilai = round(c(threshold_opt_b,
                  optimal_train_b$sensitivity,
                  optimal_train_b$specificity,
                  optimal_train_b$youden,
                  auc_train_b, auc_test_b), 4)
) %>%
  knitr::kable(caption = "Threshold Optimal Berdasarkan ROC Training")
Threshold Optimal Berdasarkan ROC Training
Keterangan Nilai
Threshold optimal (Youden) 1
Sensitivity (training) 1
Specificity (training) 1
Indeks Youden (training) 1
AUC training 1
AUC testing 1
ggplot(bind_rows(roc_train_b, roc_test_b),
       aes(x = fpr, y = sensitivity, color = data)) +
  geom_path(linewidth = 1.1) +
  geom_abline(intercept = 0, slope = 1,
              linetype = "dashed", color = "#6c757d") +
  geom_point(data = optimal_train_b,
             aes(x = fpr, y = sensitivity),
             inherit.aes = FALSE,
             color = "#ffb703", fill = "#fb8500",
             shape = 21, size = 4, stroke = 1.2) +
  geom_point(data = test_at_opt_b,
             aes(x = fpr, y = sensitivity),
             inherit.aes = FALSE,
             color = "#8338ec", fill = "#3a86ff",
             shape = 24, size = 4, stroke = 1.2) +
  coord_equal() +
  scale_color_manual(values = c("Training" = "#0077b6", "Testing" = "#e76f51")) +
  labs(
    title    = "Kurva ROC Model Regresi Logistik Biner",
    subtitle = paste0("AUC training = ", round(auc_train_b, 3),
                      ";  AUC testing = ", round(auc_test_b, 3),
                      ";  threshold optimal = ", round(threshold_opt_b, 3)),
    x     = "False positive rate (1 - specificity)",
    y     = "Sensitivity / true positive rate",
    color = "Data"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")

4.12 Evaluasi dengan Threshold Optimal

cm_opt_b <- confusion_matrix_tbl(test_bin$kunjungan_biner, p_test_b, threshold_opt_b)
knitr::kable(cm_opt_b,
             caption = paste0("Confusion Matrix Testing — Threshold Optimal (",
                              round(threshold_opt_b, 3), ")"))
Confusion Matrix Testing — Threshold Optimal (1)
Rendah Tinggi
Rendah 3 1
Tinggi 0 1
bind_rows(
  classification_metrics(test_bin$kunjungan_biner, p_test_b, 0.5) %>%
    mutate(Aturan = "Threshold 0.50"),
  classification_metrics(test_bin$kunjungan_biner, p_test_b, threshold_opt_b) %>%
    mutate(Aturan = paste0("Threshold optimal (", round(threshold_opt_b, 3), ")"))
) %>%
  mutate(across(where(is.numeric), ~ round(.x, 3))) %>%
  dplyr::select(Aturan, everything()) %>%
  knitr::kable(caption = "Perbandingan Metrik Testing: Threshold Default vs Threshold Optimal")
Perbandingan Metrik Testing: Threshold Default vs Threshold Optimal
Aturan Threshold Akurasi Error rate Sensitivity Specificity Presisi NPV F1-score Balanced accuracy FPR FNR
Threshold 0.50 0.5 1.0 0.0 1.0 1 1 1.00 1.000 1.00 0 0.0
Threshold optimal (1) 1.0 0.8 0.2 0.5 1 1 0.75 0.667 0.75 0 0.5
test_bin %>%
  mutate(
    peluang = p_test_b,
    Status  = ifelse(kunjungan_biner == 1, "Tinggi (≥5)", "Rendah (<5)")
  ) %>%
  ggplot(aes(x = peluang, fill = Status)) +
  geom_density(alpha = 0.55, color = "white", linewidth = 0.7) +
  geom_vline(xintercept = threshold_opt_b,
             color = "#fb8500", linewidth = 1.2, linetype = "dashed") +
  annotate("label",
           x = threshold_opt_b, y = Inf,
           label = paste0("threshold = ", round(threshold_opt_b, 3)),
           vjust = 1.4, fill = "#fff3b0", color = "#5f370e", label.size = 0) +
  scale_fill_manual(values = c("Rendah (<5)" = "#e76f51",
                               "Tinggi (≥5)" = "#2f7f73")) +
  labs(
    title    = "Distribusi Peluang Prediksi pada Data Testing",
    subtitle = "Garis putus-putus oranye = threshold optimal",
    x = "Peluang prediksi kunjungan tinggi",
    y = "Kepadatan", fill = "Status aktual"
  ) +
  theme_minimal(base_size = 12)

4.13 Pemeriksaan Goodness of Fit (Hosmer-Lemeshow)

# FIX: gunakan data training (bukan seluruh data) agar konsisten dengan model
hl_result <- hoslem.test(
  x = train_bin$kunjungan_biner,
  y = fitted(model_bin),
  g = min(10, floor(nrow(train_bin) / 2))
)
print(hl_result)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  train_bin$kunjungan_biner, fitted(model_bin)
## X-squared = 2.018e-11, df = 1, p-value = 1

Interpretasi: p-value > 0,05 → model sesuai dengan data (tidak ada lack of fit). p-value ≤ 0,05 → terdapat indikasi lack of fit, model perlu dievaluasi kembali.

4.14 Visualisasi Probabilitas Prediksi vs Usia

data_bin %>%
  mutate(prob_pred = predict(model_bin, newdata = data_bin, type = "response")) %>%
  ggplot(aes(x = usia, y = prob_pred)) +
  geom_point(aes(color = factor(kunjungan_biner)), size = 3) +
  geom_smooth(method = "loess", se = FALSE, color = "#243142", linewidth = 1) +
  scale_color_manual(values = c("0" = "#e76f51", "1" = "#2f7f73"),
                     labels = c("0" = "Rendah", "1" = "Tinggi")) +
  labs(
    title    = "Probabilitas Prediksi Kunjungan Tinggi vs Usia",
    subtitle = "Model Regresi Logistik Biner",
    x = "Usia", y = "Probabilitas prediksi", color = "Aktual"
  ) +
  theme_minimal(base_size = 12)

4.15 Asumsi Regresi Logistik Biner

Asumsi utama:

  1. Variabel respons bersifat biner.
  2. Observasi independen.
  3. Tidak terdapat multikolinearitas berat antar prediktor.
  4. Hubungan linear antara prediktor kontinu dan nilai logit.
  5. Ukuran sampel memadai (minimal 10 kejadian per prediktor).

4.16 Interpretasi Hasil Model Biner

Contoh interpretasi:

  • Jika OR usia = 1.08 → setiap kenaikan satu tahun usia meningkatkan odds seseorang masuk kelompok kunjungan tinggi sebesar 8%, dengan variabel lain konstan.
  • Jika OR skor_kes = 0.70 → setiap kenaikan satu poin skor kesehatan menurunkan odds kunjungan tinggi sebesar 30%.
  • AUC mengukur kemampuan diskriminasi model: nilai > 0.7 dianggap baik, > 0.8 sangat baik.

5 Bagian II — Regresi Logistik Multinomial

5.1 Definisi

Regresi logistik multinomial untuk respons kategorik nominal dengan \(K > 2\) kategori. Dengan kategori ke-\(K\) sebagai referensi, untuk \(k = 1, \ldots, K-1\):

\[\log\left(\frac{P(Y_i=k \mid \mathbf{x}_i)}{P(Y_i=K \mid \mathbf{x}_i)}\right) = \beta_{0k}+\beta_{1k}x_{i1}+\cdots+\beta_{pk}x_{ip}\]

Dalam kasus data transportasi (\(K = 3\), referensi = Bus):

\[\log\left(\frac{P(Y=\text{Kereta})}{P(Y=\text{Bus})}\right) = \beta_{0,K}+\beta_{1,K}\,\text{pendapatan}+\cdots\]

\[\log\left(\frac{P(Y=\text{Mobil})}{P(Y=\text{Bus})}\right) = \beta_{0,M}+\beta_{1,M}\,\text{pendapatan}+\cdots\]

5.2 Formulasi — Probabilitas dan Likelihood

Probabilitas untuk kategori \(k\):

\[P(Y_i=k \mid \mathbf{x}_i) = \frac{\exp(\eta_{ik})}{1+\displaystyle\sum_{h=1}^{K-1}\exp(\eta_{ih})}\]

Log-likelihood:

\[\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n}\sum_{k=1}^{K} \mathbf{1}(y_i=k)\,\log\,P(Y_i=k \mid \mathbf{x}_i)\]

Relative Risk Ratio (RRR) = exp(β): seberapa besar peluang relatif memilih kategori target dibanding referensi untuk setiap kenaikan satu unit prediktor.

5.3 Distribusi Kategori Respons

data_multi %>%
  count(moda) %>%
  mutate(persen = round(n / sum(n) * 100, 1)) %>%
  knitr::kable(col.names = c("Moda", "n", "%"),
               caption   = "Distribusi Frekuensi Moda Transportasi")
Distribusi Frekuensi Moda Transportasi
Moda n %
Bus 7 35
Kereta 6 30
Mobil 7 35

5.4 Estimasi Model Multinomial dengan R

model_multi <- multinom(
  moda ~ pendapatan + jarak + usia + gender,
  data  = data_multi,
  trace = FALSE
)

summary(model_multi)
## Call:
## multinom(formula = moda ~ pendapatan + jarak + usia + gender, 
##     data = data_multi, trace = FALSE)
## 
## Coefficients:
##        (Intercept) pendapatan    jarak      usia    genderP
## Kereta   -7.352533 -34.283072 10.03278 -1.164749   9.410764
## Mobil     0.165724   4.178115 10.82538 -8.177491 -15.982850
## 
## Std. Errors:
##        (Intercept) pendapatan    jarak     usia  genderP
## Kereta    32.00714   357.9181 680.9530 309.4647 280.3828
## Mobil     48.66424   407.0119 291.4715 146.4277 233.0251
## 
## Residual Deviance: 0.0001247089 
## AIC: 20.00012

5.5 Koefisien, z-statistik, dan p-value

z_stat <- summary(model_multi)$coefficients /
          summary(model_multi)$standard.errors
p_val  <- 2 * (1 - pnorm(abs(z_stat)))

cat("=== z-statistik ===\n");  print(round(z_stat, 3))
## === z-statistik ===
##        (Intercept) pendapatan jarak   usia genderP
## Kereta      -0.230     -0.096 0.015 -0.004   0.034
## Mobil        0.003      0.010 0.037 -0.056  -0.069
cat("\n=== p-value ===\n");    print(round(p_val, 4))
## 
## === p-value ===
##        (Intercept) pendapatan  jarak   usia genderP
## Kereta      0.8183     0.9237 0.9882 0.9970  0.9732
## Mobil       0.9973     0.9918 0.9704 0.9555  0.9453

5.6 Relative Risk Ratio (RRR)

exp(coef(model_multi)) %>%
  round(3) %>%
  knitr::kable(caption = "Relative Risk Ratio (RRR = exp(β)) Model Multinomial")
Relative Risk Ratio (RRR = exp(β)) Model Multinomial
(Intercept) pendapatan jarak usia genderP
Kereta 0.001 0.000 22760.54 0.312 12219.2
Mobil 1.180 65.243 50280.68 0.000 0.0

Cara membaca RRR (referensi = Bus):

  • RRR > 1: prediktor meningkatkan odds relatif memilih kategori target vs Bus.
  • RRR < 1: prediktor menurunkan odds relatif tersebut.

5.7 Prediksi Probabilitas

fitted(model_multi) %>%
  round(3) %>%
  knitr::kable(caption = "Prediksi Probabilitas per Observasi (Model Multinomial)")
Prediksi Probabilitas per Observasi (Model Multinomial)
Bus Kereta Mobil
1 0 0
0 0 1
0 1 0
1 0 0
0 0 1
0 1 0
1 0 0
0 0 1
0 1 0
1 0 0
0 0 1
0 1 0
1 0 0
0 0 1
0 1 0
0 0 1
1 0 0
0 1 0
0 0 1
1 0 0

5.8 Asumsi Model Multinomial

Asumsi utama:

  1. IIA (Independence of Irrelevant Alternatives): rasio peluang dua pilihan tidak bergantung pada kehadiran pilihan lain.
  2. Independensi observasi.
  3. Tidak ada multikolinearitas sempurna.
  4. Ukuran sampel cukup (minimal 10–20 obs per kategori per prediktor).

6 Bagian III — Regresi Logistik Ordinal

6.1 Definisi

Proportional odds model untuk respons kategorik ordinal:

\[\text{logit}[P(Y_i \leq j \mid \mathbf{x}_i)] = \alpha_j - \mathbf{x}_i^\top\boldsymbol{\beta}, \quad j = 1, \ldots, J-1\]

Konvensi tanda MASS::polr: koefisien positif berarti prediktor meningkatkan \(P(Y \leq j)\) → respons cenderung ke kategori lebih rendah.

6.2 Fungsi Likelihood

\[\ell(\boldsymbol{\alpha},\boldsymbol{\beta}) = \sum_{i=1}^{n}\sum_{j=1}^{J}\mathbf{1}(y_i=j)\log P(Y_i=j \mid \mathbf{x}_i)\]

6.3 Distribusi Respons Ordinal

data_ord %>%
  count(kepuasan) %>%
  mutate(persen = round(n / sum(n) * 100, 1)) %>%
  knitr::kable(col.names = c("Kepuasan", "n", "%"),
               caption   = "Distribusi Tingkat Kepuasan Pasien")
Distribusi Tingkat Kepuasan Pasien
Kepuasan n %
Tidak Puas 7 1.5
Sebagian Puas 241 53.3
Puas 204 45.1

6.4 Estimasi Model Ordinal dengan R

model_ord <- MASS::polr(
  kepuasan ~ checkup + waiting + admin + hygiene +
             communication + diagnosis + equipment,
  data   = data_ord,
  method = "logistic",
  Hess   = TRUE
)

summary(model_ord)
## Call:
## MASS::polr(formula = kepuasan ~ checkup + waiting + admin + hygiene + 
##     communication + diagnosis + equipment, data = data_ord, Hess = TRUE, 
##     method = "logistic")
## 
## Coefficients:
##                  Value Std. Error t value
## checkup        0.03528    0.08588  0.4108
## waiting       -0.06990    0.09735 -0.7181
## admin         -0.04947    0.09695 -0.5102
## hygiene       -0.09708    0.10338 -0.9391
## communication -0.04176    0.09945 -0.4199
## diagnosis      0.33403    0.09106  3.6684
## equipment     -0.20987    0.09564 -2.1944
## 
## Intercepts:
##                          Value   Std. Error t value
## Tidak Puas|Sebagian Puas -4.4305  0.4603    -9.6245
## Sebagian Puas|Puas        0.0186  0.2690     0.0692
## 
## Residual Deviance: 662.8506 
## AIC: 680.8506

6.5 Koefisien, p-value, dan Odds Ratio

ctable <- coef(summary(model_ord))
p      <- 2 * pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)
cbind(round(ctable, 4), `p-value` = round(p, 4)) %>%
  knitr::kable(caption = "Koefisien dan p-value Model Ordinal")
Koefisien dan p-value Model Ordinal
Value Std. Error t value p-value
checkup 0.0353 0.0859 0.4108 0.6812
waiting -0.0699 0.0974 -0.7181 0.4727
admin -0.0495 0.0970 -0.5102 0.6099
hygiene -0.0971 0.1034 -0.9391 0.3477
communication -0.0418 0.0994 -0.4199 0.6745
diagnosis 0.3340 0.0911 3.6684 0.0002
equipment -0.2099 0.0956 -2.1944 0.0282
Tidak Puas|Sebagian Puas -4.4305 0.4603 -9.6245 0.0000
Sebagian Puas|Puas 0.0186 0.2690 0.0692 0.9448
exp(coef(model_ord)) %>%
  round(3) %>%
  knitr::kable(col.names = "Odds Ratio",
               caption   = "Odds Ratio Model Regresi Logistik Ordinal")
Odds Ratio Model Regresi Logistik Ordinal
Odds Ratio
checkup 1.036
waiting 0.932
admin 0.952
hygiene 0.907
communication 0.959
diagnosis 1.397
equipment 0.811

6.6 Interpretasi Hasil

Berdasarkan hasil estimasi, variabel Exact Diagnosis dan Modern Equipment berpengaruh signifikan terhadap tingkat kepuasan pasien pada taraf signifikansi 5%.

  • Exact Diagnosis (OR = 1.397, p = 0.0002): ketepatan diagnosis secara signifikan meningkatkan kepuasan pasien.
  • Modern Equipment (OR = 0.811, p = 0.0282): ketersediaan peralatan modern berpengaruh signifikan terhadap kepuasan pasien.
  • Variabel lainnya (check up, waiting, admin, hygiene, communication) tidak berpengaruh signifikan (p > 0.05).

6.7 Prediksi Probabilitas per Observasi

predict(model_ord, type = "probs") %>%
  round(3) %>%
  head(10) %>%
  knitr::kable(caption = "Prediksi Probabilitas 10 Observasi Pertama (Model Ordinal)")
Prediksi Probabilitas 10 Observasi Pertama (Model Ordinal)
Tidak Puas Sebagian Puas Puas
0.013 0.516 0.471
0.017 0.578 0.405
0.006 0.345 0.649
0.006 0.324 0.670
0.005 0.294 0.701
0.014 0.537 0.449
0.017 0.581 0.402
0.005 0.313 0.681
0.032 0.709 0.259
0.016 0.562 0.422

6.8 Uji Proportional Odds

model_ord_clm <- ordinal::clm(
  kepuasan ~ checkup + waiting + admin + hygiene +
             communication + diagnosis + equipment,
  data = data_ord
)

nominal_test(model_ord_clm)
## Tests of nominal effects
## 
## formula: kepuasan ~ checkup + waiting + admin + hygiene + communication + diagnosis + equipment
##               Df  logLik    AIC    LRT Pr(>Chi)  
## <none>           -331.43 680.85                  
## checkup        1 -331.07 682.15 0.7029  0.40182  
## waiting        1 -330.56 681.12 1.7351  0.18776  
## admin          1 -331.41 682.83 0.0220  0.88208  
## hygiene        1 -329.72 679.45 3.4040  0.06504 .
## communication  1 -330.90 681.79 1.0566  0.30400  
## diagnosis      1 -330.75 681.50 1.3490  0.24546  
## equipment      1 -331.35 682.69 0.1578  0.69120  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretasi: p-value > 0,05 → asumsi proportional odds tidak ditolak, model valid. Jika dilanggar → pertimbangkan partial proportional odds atau multinomial.


7 Bagian IV — Regresi Poisson untuk Data Hitung

7.1 Definisi

Regresi Poisson untuk data hitung \(Y_i \in \{0,1,2,\ldots\}\). Fungsi penghubung log:

\[\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\]

Asumsi: \(E(Y_i) = \text{Var}(Y_i) = \mu_i\) (equidispersion).

7.2 Distribusi Respons Hitungan

summary(data_pois$kunjungan)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00    3.00    3.45    5.00    7.00
cat(sprintf("\nMean    : %.2f\nVarians : %.2f\n",
            mean(data_pois$kunjungan),
            var(data_pois$kunjungan)))
## 
## Mean    : 3.45
## Varians : 3.21

7.3 Estimasi Model Poisson dengan R

model_pois <- glm(
  kunjungan ~ usia + pendapatan + skor_kes + asuransi,
  data   = data_pois,
  family = poisson(link = "log")
)

summary(model_pois)
## 
## Call:
## glm(formula = kunjungan ~ usia + pendapatan + skor_kes + asuransi, 
##     family = poisson(link = "log"), data = data_pois)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   4.3713    13.3880   0.327    0.744
## usia          0.2747     0.4000   0.687    0.492
## pendapatan   -1.7980     2.5077  -0.717    0.473
## skor_kes     -0.6780     1.1882  -0.571    0.568
## asuransi     -0.1500     0.5176  -0.290    0.772
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 18.36058  on 19  degrees of freedom
## Residual deviance:  0.99382  on 15  degrees of freedom
## AIC: 70.777
## 
## Number of Fisher Scoring iterations: 4

7.4 Incidence Rate Ratio (IRR)

cbind(
  IRR        = round(exp(coef(model_pois)), 3),
  `CI 2.5%`  = round(exp(confint.default(model_pois)[, 1]), 3),
  `CI 97.5%` = round(exp(confint.default(model_pois)[, 2]), 3)
) %>%
  knitr::kable(caption = "Incidence Rate Ratio (IRR) dan Confidence Interval 95%")
Incidence Rate Ratio (IRR) dan Confidence Interval 95%
IRR CI 2.5% CI 97.5%
(Intercept) 79.143 0.000 1.969128e+13
usia 1.316 0.601 2.883000e+00
pendapatan 0.166 0.001 2.257900e+01
skor_kes 0.508 0.049 5.211000e+00
asuransi 0.861 0.312 2.374000e+00
  • IRR > 1: kenaikan 1 unit prediktor meningkatkan rata-rata kunjungan sebesar \((IRR-1)\times100\%\).
  • IRR < 1: kenaikan 1 unit prediktor menurunkan rata-rata kunjungan sebesar \((1-IRR)\times100\%\).

7.5 Visualisasi Aktual vs Terprediksi

data_pois %>%
  mutate(fitted = fitted(model_pois)) %>%
  ggplot(aes(x = fitted, y = kunjungan)) +
  geom_point(color = "#2f7f73", size = 3, alpha = 0.8) +
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed", color = "#d18b2f", linewidth = 1) +
  labs(
    title    = "Nilai Aktual vs Terprediksi",
    subtitle = "Model Regresi Poisson — Kunjungan ke Dokter",
    x = "Rata-rata Terprediksi (μ̂)", y = "Kunjungan Aktual (Y)"
  ) +
  theme_minimal(base_size = 13)

7.6 Pemeriksaan Asumsi — Equidispersion

cat(sprintf("Residual deviance : %.3f\n", deviance(model_pois)))
## Residual deviance : 0.994
cat(sprintf("Degrees of freedom: %d\n",   df.residual(model_pois)))
## Degrees of freedom: 15
cat(sprintf("Rasio dev/df      : %.3f\n", deviance(model_pois) / df.residual(model_pois)))
## Rasio dev/df      : 0.066
dispersiontest(model_pois)
## 
##  Overdispersion test
## 
## data:  model_pois
## z = -23.004, p-value = 1
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion 
## 0.05855324

Rasio dev/df ≈ 1 dan p-value dispersion test > 0,05 → tidak ada overdispersion → model Poisson sesuai. Jika overdispersion → gunakan Negative Binomial.

# Alternatif jika overdispersion terdeteksi
model_nb <- MASS::glm.nb(
  kunjungan ~ usia + pendapatan + skor_kes + asuransi,
  data = data_pois
)

data.frame(
  Model    = c("Poisson", "Negative Binomial"),
  AIC      = round(c(AIC(model_pois), AIC(model_nb)), 3),
  BIC      = round(c(BIC(model_pois), BIC(model_nb)), 3),
  Deviance = round(c(deviance(model_pois), deviance(model_nb)), 3)
) %>%
  knitr::kable(caption = "Perbandingan: Poisson vs Negative Binomial")
Perbandingan: Poisson vs Negative Binomial
Model AIC BIC Deviance
Poisson 70.777 75.756 0.994
Negative Binomial 72.778 78.752 0.994

8 Perbandingan Keempat Model

Aspek Biner Multinomial Ordinal Poisson
Tipe respons Biner (0/1) Nominal Ordinal Count
Distribusi Binomial Multinomial Logistik kumulatif Poisson
Fungsi link Logit Logit (baseline) Cumulative logit Log
Ukuran efek OR RRR OR IRR
Asumsi kritis Linearitas logit IIA Proportional odds Equidispersion
Alternatif Probit Nested logit Partial PO Negative Binomial
Paket R glm multinom polr glm

9 Kesalahan Umum dalam Interpretasi

Checklist verifikasi:

  1. Pada biner: \(\exp(\hat{\beta})\) adalah Odds Ratio, bukan probabilitas langsung.
  2. Pada multinomial: koefisien adalah log-odds relatif terhadap kategori referensi.
  3. Pada ordinal (polr): tanda koefisien terbalik — koefisien positif → respons ke kategori lebih rendah.
  4. Pada Poisson: IRR adalah perubahan rate, bukan probabilitas.
  5. Tidak memeriksa asumsi sebelum menyimpulkan hasil.
  6. Mengabaikan overdispersion → SE Poisson terlalu kecil → p-value terlalu signifikan.
  7. Menggunakan threshold 0.50 tanpa memeriksa apakah optimal untuk data tidak seimbang.

10 Template Interpretasi Praktis

10.1 Untuk Model Biner

Koefisien variabel [usia] sebesar [β] dengan OR = exp(β) = [nilai]. Setiap kenaikan satu tahun usia meningkatkan odds seseorang masuk kelompok kunjungan tinggi sebesar [(OR−1)×100%], dengan variabel lain konstan.

10.2 Untuk Model Multinomial

Dengan referensi Bus, RRR pendapatan untuk kategori Mobil = [nilai]. Setiap kenaikan Rp 1 juta pendapatan meningkatkan odds relatif memilih Mobil dibanding Bus sebesar [(RRR−1)×100%], variabel lain konstan.

10.3 Untuk Model Ordinal

OR variabel [waktu tunggu] = [nilai] > 1. Kenaikan satu menit waktu tunggu meningkatkan cumulative odds \(P(Y \leq j)/P(Y > j)\), sehingga respons cenderung ke kepuasan lebih rendah.

10.4 Untuk Model Poisson

IRR variabel [skor kesehatan] = [nilai] < 1. Setiap kenaikan satu poin skor kesehatan menurunkan rata-rata kunjungan sebesar [(1−IRR)×100%], variabel lain konstan.


11 Penutup

Ringkasan panduan:

  • Gunakan logistik biner ketika respons dua kategori; evaluasi dengan kurva ROC dan pilih threshold optimal.
  • Gunakan multinomial ketika respons nominal (3+ kategori tanpa urutan).
  • Gunakan ordinal ketika respons ordinal DAN proportional odds terpenuhi.
  • Gunakan Poisson ketika respons hitungan dan equidispersion terpenuhi.
  • Selalu periksa asumsi sebelum menyimpulkan hasil.

12 Referensi

  • Agresti, A. (2010). Analysis of Ordinal Categorical Data. Wiley.
  • Agresti, A. (2013). Categorical Data Analysis. Wiley.
  • Cameron, A. C., & Trivedi, P. K. (2013). Regression Analysis of Count Data. Cambridge University Press.
  • Deb, P., & Trivedi, P. K. (1997). Demand for medical care by the elderly. Journal of Applied Econometrics, 12(3), 313–336.
  • Dimitrievska, V. (2020). Patient Satisfaction Dataset. Kaggle. https://www.kaggle.com/datasets/vdimitrievska/patient-satisfaction-dataset
  • Greene, W. H. (2012). Econometric Analysis, 7th ed. Prentice Hall.
  • Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression. Wiley.
  • Long, J. S., & Freese, J. (2014). Regression Models for Categorical Dependent Variables Using Stata. Stata Press.
  • McCullagh, P. (1980). Regression models for ordinal data. JRSS-B, 42(2), 109–142.
  • Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press.