Laporan Analisis Statistika
Analisis Regresi Logistik
Multinomial Β Β·Β  Ordinal Β Β·Β  Poisson
<span class="cover-tag">🌸 Multinomial β€” Iris Species</span>
<span class="cover-tag">πŸ“ˆ Ordinal β€” Student Grade</span>
<span class="cover-tag">🚦 Poisson β€” Traffic Injuries</span>
<div class="cover-meta-item">
  <span class="cover-meta-label">Disusun oleh</span>
  <span class="cover-meta-value">Nama Kamu</span>
</div>
<div class="cover-meta-item">
  <span class="cover-meta-label">NIM</span>
  <span class="cover-meta-value">140XXXXXXX</span>
</div>
<div class="cover-meta-item">
  <span class="cover-meta-label">Mata Kuliah</span>
  <span class="cover-meta-value">Regresi β€” FMIPA UNPAD</span>
</div>
<div class="cover-meta-item">
  <span class="cover-meta-label">Tanggal</span>
  <span class="cover-meta-value"> 02 June 2026 </span>
</div>

🌸
<h2>Bagian I β€” Regresi Logistik Multinomial</h2>
<p>Dataset: Iris &nbsp;|&nbsp; Respons: Species (3 kelas nominal)</p>

πŸ“– Konsep Dasar

Regresi logistik multinomial digunakan saat variabel respons bersifat nominal dengan tiga atau lebih kategori tanpa urutan alami. Model menggunakan satu kategori sebagai baseline (referensi) dan memodelkan log-odds masing-masing kategori lainnya terhadap baseline:

log(P(Y=k) / P(Y=ref)) = Ξ²β‚€β‚– + β₁ₖX₁ + … + Ξ²β‚šβ‚–Xβ‚š

Koefisien diinterpretasikan sebagai perubahan log-odds memilih kategori k dibanding referensi, atau dalam bentuk Relative Risk Ratio (RRR) = exp(Ξ²β±Όβ‚–). Model difit dengan paket nnet::multinom().

0.1 Persiapan Data

0.1.1 Muat & Struktur Data

# ── SESUAIKAN: ganti nama file dan nama kolom jika berbeda ────────────────────
data_multi <- read.csv("data_multinomial.csv", sep = ";")

var_y_multi <- "Species"
var_x_multi <- setdiff(colnames(data_multi), c("Id", "Species"))

data_multi <- data_multi %>%
  mutate(
    Species = factor(Species,
      levels = c("Iris-setosa", "Iris-versicolor", "Iris-virginica"))
  )

dplyr::glimpse(data_multi)
## Rows: 150
## Columns: 6
## $ Id            <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ SepalLengthCm <int> 51, 49, 47, 46, 50, 54, 46, 50, 44, 49, 54, 48, 48, 43, …
## $ SepalWidthCm  <int> 35, 30, 32, 31, 36, 39, 34, 34, 29, 31, 37, 34, 30, 30, …
## $ PetalLengthCm <int> 14, 14, 13, 15, 14, 17, 14, 15, 14, 15, 15, 16, 14, 11, …
## $ PetalWidthCm  <int> 2, 2, 2, 2, 2, 4, 3, 2, 2, 1, 2, 2, 1, 1, 2, 4, 4, 3, 3,…
## $ Species       <fct> Iris-setosa, Iris-setosa, Iris-setosa, Iris-setosa, Iris…

0.1.2 Distribusi Respons

data_multi %>%
  count(Species) %>%
  mutate(Proporsi = scales::percent(n / sum(n), accuracy = 0.1)) %>%
  knitr::kable(caption = "Distribusi Spesies Iris") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)
Distribusi Spesies Iris
Species n Proporsi
Iris-setosa 50 33.3%
Iris-versicolor 50 33.3%
Iris-virginica 50 33.3%
data_multi %>%
  count(Species) %>%
  mutate(proporsi = n / sum(n)) %>%
  ggplot(aes(x = Species, y = proporsi, fill = Species)) +
  geom_col(width = 0.65, alpha = 0.94) +
  geom_text(aes(label = scales::percent(proporsi, accuracy = 0.1)),
            vjust = -0.4, fontface = "bold", color = "#243142") +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 0.65)) +
  scale_fill_manual(values = c("#2f7f73","#5568b8","#d18b2f")) +
  labs(title = "Distribusi Spesies Iris",
       subtitle = "Respons multinomial β€” 3 kategori tanpa urutan",
       x = NULL, y = "Proporsi") +
  theme_minimal() + theme(legend.position = "none")

0.2 Estimasi Model

0.2.1 Fitting & Summary

formula_multi <- as.formula(paste(var_y_multi, "~", paste(var_x_multi, collapse = " + ")))
cat("Formula:", deparse(formula_multi), "\n")
## Formula: Species ~ SepalLengthCm + SepalWidthCm + PetalLengthCm + PetalWidthCm
fit_multi <- nnet::multinom(formula_multi, data = data_multi, trace = FALSE)
summary(fit_multi)
## Call:
## nnet::multinom(formula = formula_multi, data = data_multi, trace = FALSE)
## 
## Coefficients:
##                 (Intercept) SepalLengthCm SepalWidthCm PetalLengthCm
## Iris-versicolor    22.24756    -0.8914206    -2.814642      4.180154
## Iris-virginica    -20.39006    -1.1379418    -3.482729      5.123089
##                 PetalWidthCm
## Iris-versicolor     1.351080
## Iris-virginica      3.179689
## 
## Std. Errors:
##                 (Intercept) SepalLengthCm SepalWidthCm PetalLengthCm
## Iris-versicolor    12.85375     0.1197144    0.2239780     0.2368581
## Iris-virginica     12.85375     0.1197150    0.2239769     0.2368591
##                 PetalWidthCm
## Iris-versicolor    0.4871285
## Iris-virginica     0.4871288
## 
## Residual Deviance: 11.89855 
## AIC: 31.89855

0.2.2 Koefisien, RRR & CI

multi_sum <- summary(fit_multi)

coef_long <- as.data.frame(multi_sum$coefficients) %>%
  tibble::rownames_to_column("kategori") %>%
  tidyr::pivot_longer(-kategori, names_to = "variabel", values_to = "estimate")

se_long <- as.data.frame(multi_sum$standard.errors) %>%
  tibble::rownames_to_column("kategori") %>%
  tidyr::pivot_longer(-kategori, names_to = "variabel", values_to = "std_error")

result_multi <- coef_long %>%
  left_join(se_long, by = c("kategori","variabel")) %>%
  mutate(
    z_value = estimate / std_error,
    p_value = 2 * (1 - pnorm(abs(z_value))),
    RRR     = exp(estimate),
    CI_low  = exp(estimate - 1.96 * std_error),
    CI_high = exp(estimate + 1.96 * std_error),
    Signifikansi = case_when(
      p_value < 0.001 ~ "***", p_value < 0.01 ~ "**",
      p_value < 0.05  ~ "*",   p_value < 0.1  ~ ".",
      TRUE ~ ""
    )
  ) %>%
  mutate(across(c(estimate,std_error,z_value,p_value,RRR,CI_low,CI_high), ~round(.x,4)))

result_multi %>%
  knitr::kable(
    caption = "Ringkasan Model Multinomial (Base: Iris-setosa)",
    col.names = c("Kategori","Variabel","Estimate","SE","z-value","p-value","RRR","CI 2.5%","CI 97.5%","Sig.")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE)
Ringkasan Model Multinomial (Base: Iris-setosa)
Kategori Variabel Estimate SE z-value p-value RRR CI 2.5% CI 97.5% Sig.
Iris-versicolor (Intercept) 22.2476 12.8537 1.7308 0.0835 4.591905e+09 0.0526 4.011665e+20 .
Iris-versicolor SepalLengthCm -0.8914 0.1197 -7.4462 0.0000 4.101000e-01 0.3243 5.185000e-01 ***
Iris-versicolor SepalWidthCm -2.8146 0.2240 -12.5666 0.0000 5.990000e-02 0.0386 9.300000e-02 ***
Iris-versicolor PetalLengthCm 4.1802 0.2369 17.6484 0.0000 6.537590e+01 41.0961 1.040005e+02 ***
Iris-versicolor PetalWidthCm 1.3511 0.4871 2.7736 0.0055 3.861600e+00 1.4863 1.003270e+01 **
Iris-virginica (Intercept) -20.3901 12.8537 -1.5863 0.1127 0.000000e+00 0.0000 1.219101e+02
Iris-virginica SepalLengthCm -1.1379 0.1197 -9.5054 0.0000 3.205000e-01 0.2535 4.052000e-01 ***
Iris-virginica SepalWidthCm -3.4827 0.2240 -15.5495 0.0000 3.070000e-02 0.0198 4.770000e-02 ***
Iris-virginica PetalLengthCm 5.1231 0.2369 21.6293 0.0000 1.678530e+02 105.5141 2.670224e+02 ***
Iris-virginica PetalWidthCm 3.1797 0.4871 6.5274 0.0000 2.403930e+01 9.2527 6.245590e+01 ***

0.3 Prediksi & Evaluasi

0.3.1 Confusion Matrix & Akurasi

data_multi_pred <- data_multi %>%
  bind_cols(as.data.frame(predict(fit_multi, type = "probs"))) %>%
  mutate(prediksi = predict(fit_multi, type = "class"))

conf_multi <- table(Aktual = data_multi_pred$Species, Prediksi = data_multi_pred$prediksi)
knitr::kable(conf_multi, caption = "Confusion Matrix β€” Model Multinomial") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Confusion Matrix β€” Model Multinomial
Iris-setosa Iris-versicolor Iris-virginica
Iris-setosa 50 0 0
Iris-versicolor 0 49 1
Iris-virginica 0 1 49
accuracy_multi <- sum(diag(conf_multi)) / sum(conf_multi)
cat("Akurasi Model:", scales::percent(accuracy_multi, accuracy = 0.01), "\n")
## Akurasi Model: 98.67%

0.3.2 Plot Probabilitas Prediksi

grid_multi <- expand.grid(
  SepalLengthCm  = seq(min(data_multi$SepalLengthCm), max(data_multi$SepalLengthCm), length.out = 100),
  SepalWidthCm   = mean(data_multi$SepalWidthCm),
  PetalLengthCm  = mean(data_multi$PetalLengthCm),
  PetalWidthCm   = mean(data_multi$PetalWidthCm)
)

grid_multi %>%
  bind_cols(as.data.frame(predict(fit_multi, newdata = grid_multi, type = "probs"))) %>%
  tidyr::pivot_longer(cols = c("Iris-setosa","Iris-versicolor","Iris-virginica"),
                      names_to = "Species", values_to = "probabilitas") %>%
  ggplot(aes(x = SepalLengthCm, y = probabilitas, color = Species)) +
  geom_line(linewidth = 1.35) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_color_manual(values = c("#2f7f73","#5568b8","#d18b2f")) +
  labs(title = "Prediksi Probabilitas Spesies Iris",
       subtitle = "Variabel lain ditahan pada nilai rata-rata",
       x = "Sepal Length (Cm)", y = "Probabilitas Prediksi", color = "Spesies") +
  theme_minimal()

0.4 Interpretasi Hasil

πŸ’‘ Interpretasi β€” Logistik Multinomial

Sesuaikan teks berikut dengan hasil RRR dari output kamu:

β€’ Kategori referensi: Iris-setosa. Semua interpretasi dibandingkan terhadap kategori ini.
β€’ RRR (Relative Risk Ratio): Jika RRR untuk variabel X pada kategori Iris-versicolor = 2.5, artinya kenaikan 1 satuan X meningkatkan risiko relatif menjadi versicolor (vs setosa) sebesar 2.5 kali.
β€’ Signifikansi (p-value < 0.05): Hanya variabel dengan tanda *** atau ** yang secara statistik signifikan membedakan antar kelas.
β€’ Grafik probabilitas: Menunjukkan bagaimana peluang masing-masing spesies berubah seiring perubahan Sepal Length, saat variabel lain dipegang tetap.
β€’ Akurasi: (lihat output confusion matrix) β€” akurasi tinggi menunjukkan model dapat membedakan ketiga spesies dengan baik.


πŸ“ˆ
<h2>Bagian II β€” Regresi Logistik Ordinal</h2>
<p>Dataset: Student Performance &nbsp;|&nbsp; Respons: Grade (Rendah &lt; Sedang &lt; Tinggi)</p>

πŸ“– Konsep Dasar

Regresi logistik ordinal (Proportional Odds Model) digunakan saat variabel respons bersifat ordinal β€” ada urutan alami antar kategori. Model memodelkan log-odds kumulatif:

logit(P(Y ≀ j)) = Ξ±β±Ό βˆ’ (β₁X₁ + … + Ξ²β‚šXβ‚š)

Asumsi utama: Proportional Odds β€” efek setiap prediktor dianggap sama untuk semua batas kumulatif (parallel lines). Koefisien diinterpretasikan sebagai log-odds kumulatif, dengan exp(Ξ²) sebagai odds ratio (polaritas tergantung konvensi software). Difit dengan MASS::polr().

0.5 Persiapan Data

0.5.1 Muat & Struktur Data

# ── SESUAIKAN: ganti nama file dan nama kolom jika berbeda ────────────────────
data_ord <- read.csv("data_ordinal.csv", sep = ";")

data_ord <- data_ord %>%
  mutate(
    Grade  = factor(Grade, levels = c("Rendah","Sedang","Tinggi"), ordered = TRUE),
    Gender = as.factor(Gender)
  )

dplyr::glimpse(data_ord)
## Rows: 6,607
## Columns: 21
## $ Hours_Studied              <int> 23, 19, 24, 29, 19, 19, 29, 25, 17, 23, 17,…
## $ Attendance                 <int> 84, 64, 98, 89, 92, 88, 84, 78, 94, 98, 80,…
## $ Parental_Involvement       <chr> "Low", "Low", "Medium", "Low", "Medium", "M…
## $ Access_to_Resources        <chr> "High", "Medium", "Medium", "Medium", "Medi…
## $ Extracurricular_Activities <chr> "No", "No", "Yes", "Yes", "Yes", "Yes", "Ye…
## $ Sleep_Hours                <int> 7, 8, 7, 8, 6, 8, 7, 6, 6, 8, 8, 6, 8, 8, 8…
## $ Previous_Scores            <int> 73, 59, 91, 98, 65, 89, 68, 50, 80, 71, 88,…
## $ Motivation_Level           <chr> "Low", "Low", "Medium", "Medium", "Medium",…
## $ Internet_Access            <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
## $ Tutoring_Sessions          <int> 0, 2, 2, 1, 3, 3, 1, 1, 0, 0, 4, 2, 2, 2, 1…
## $ Family_Income              <chr> "Low", "Medium", "Medium", "Medium", "Mediu…
## $ Teacher_Quality            <chr> "Medium", "Medium", "Medium", "Medium", "Hi…
## $ School_Type                <chr> "Public", "Public", "Public", "Public", "Pu…
## $ Peer_Influence             <chr> "Positive", "Negative", "Neutral", "Negativ…
## $ Physical_Activity          <int> 3, 4, 4, 4, 4, 3, 2, 2, 1, 5, 4, 2, 4, 3, 4…
## $ Learning_Disabilities      <chr> "No", "No", "No", "No", "No", "No", "No", "…
## $ Parental_Education_Level   <chr> "High School", "College", "Postgraduate", "…
## $ Distance_from_Home         <chr> "Near", "Moderate", "Near", "Moderate", "Ne…
## $ Gender                     <fct> Male, Female, Male, Male, Female, Male, Mal…
## $ Exam_Score                 <int> 67, 61, 74, 71, 70, 71, 67, 66, 69, 72, 68,…
## $ Grade                      <ord> Sedang, Rendah, Sedang, Sedang, Sedang, Sed…

0.5.2 Distribusi Respons

data_ord %>%
  count(Grade) %>%
  mutate(Proporsi = scales::percent(n / sum(n), accuracy = 0.1)) %>%
  knitr::kable(caption = "Distribusi Tingkat Grade Mahasiswa") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)
Distribusi Tingkat Grade Mahasiswa
Grade n Proporsi
Rendah 316 4.8%
Sedang 6167 93.3%
Tinggi 124 1.9%
data_ord %>%
  count(Grade) %>%
  mutate(proporsi = n / sum(n)) %>%
  ggplot(aes(x = Grade, y = proporsi, fill = Grade)) +
  geom_col(width = 0.65, alpha = 0.94) +
  geom_text(aes(label = scales::percent(proporsi, accuracy = 0.1)),
            vjust = -0.4, fontface = "bold", color = "#243142") +
  scale_y_continuous(labels = scales::percent_format(), expand = expansion(mult = c(0, 0.18))) +
  scale_fill_manual(values = c("#2f7f73","#d18b2f","#5568b8")) +
  labs(title = "Distribusi Tingkat Grade",
       subtitle = "Respons ordinal: Rendah < Sedang < Tinggi",
       x = "Grade", y = "Proporsi") +
  theme_minimal() + theme(legend.position = "none")

0.6 Estimasi Model

0.6.1 Fitting & Summary

fit_ord <- MASS::polr(
  Grade ~ Hours_Studied + Attendance + Previous_Scores + Sleep_Hours + Gender,
  data   = data_ord,
  method = "logistic",
  Hess   = TRUE
)
summary(fit_ord)
## Call:
## MASS::polr(formula = Grade ~ Hours_Studied + Attendance + Previous_Scores + 
##     Sleep_Hours + Gender, data = data_ord, Hess = TRUE, method = "logistic")
## 
## Coefficients:
##                    Value Std. Error t value
## Hours_Studied    0.26628   0.011780 22.6048
## Attendance       0.16474   0.008164 20.1781
## Previous_Scores  0.03508   0.004047  8.6675
## Sleep_Hours     -0.02866   0.038071 -0.7529
## GenderMale      -0.06097   0.114288 -0.5334
## 
## Intercepts:
##               Value   Std. Error t value
## Rendah|Sedang 15.7674  0.7880    20.0105
## Sedang|Tinggi 27.3238  1.0385    26.3097
## 
## Residual Deviance: 2328.836 
## AIC: 2342.836

0.6.2 Koefisien, OR & CI

ord_coef <- coef(summary(fit_ord))

result_ord <- as.data.frame(ord_coef) %>%
  tibble::rownames_to_column("parameter") %>%
  dplyr::rename(estimate = Value, std_error = `Std. Error`, t_value = `t value`) %>%
  mutate(
    p_value       = 2 * (1 - pnorm(abs(t_value))),
    jenis         = ifelse(grepl("\\|", parameter), "Cutpoint", "Koefisien"),
    OR_polr       = ifelse(jenis == "Koefisien", exp(estimate), NA_real_),
    CI_low_polr   = ifelse(jenis == "Koefisien", exp(estimate - 1.96 * std_error), NA_real_),
    CI_high_polr  = ifelse(jenis == "Koefisien", exp(estimate + 1.96 * std_error), NA_real_),
    Signifikansi  = case_when(
      p_value < 0.001 ~ "***", p_value < 0.01 ~ "**",
      p_value < 0.05  ~ "*",   p_value < 0.1  ~ ".",
      TRUE ~ ""
    )
  ) %>%
  mutate(across(c(estimate,std_error,t_value,p_value,OR_polr,CI_low_polr,CI_high_polr), ~round(.x,4)))

result_ord %>%
  knitr::kable(
    caption = "Ringkasan Model Ordinal (Proportional Odds)",
    col.names = c("Parameter","Estimate","SE","t-value","p-value","Jenis","OR polr","CI 2.5%","CI 97.5%","Sig.")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE) %>%
  row_spec(which(result_ord$jenis == "Cutpoint"), background = "#f0f9f8", bold = TRUE)
Ringkasan Model Ordinal (Proportional Odds)
Parameter Estimate SE t-value p-value Jenis OR polr CI 2.5% CI 97.5% Sig.
Hours_Studied 0.2663 0.0118 22.6048 0.0000 Koefisien 1.3051 1.2753 1.3356 ***
Attendance 0.1647 0.0082 20.1781 0.0000 Koefisien 1.1791 1.1604 1.1981 ***
Previous_Scores 0.0351 0.0040 8.6675 0.0000 Koefisien 1.0357 1.0275 1.0439 ***
Sleep_Hours -0.0287 0.0381 -0.7529 0.4515 Koefisien 0.9717 0.9019 1.0470
GenderMale -0.0610 0.1143 -0.5334 0.5937 Koefisien 0.9409 0.7520 1.1771
Rendah&#124;Sedang 15.7674 0.7880 20.0105 0.0000 Cutpoint NA NA NA ***
Sedang&#124;Tinggi 27.3238 1.0385 26.3097 0.0000 Cutpoint NA NA NA ***

0.7 Prediksi & Evaluasi

0.7.1 Confusion Matrix & Akurasi

data_ord_pred <- data_ord %>%
  bind_cols(as.data.frame(predict(fit_ord, type = "probs"))) %>%
  mutate(prediksi = predict(fit_ord, type = "class"))

conf_ord <- table(Aktual = data_ord_pred$Grade, Prediksi = data_ord_pred$prediksi)
knitr::kable(conf_ord, caption = "Confusion Matrix β€” Model Ordinal") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)
Confusion Matrix β€” Model Ordinal
Rendah Sedang Tinggi
Rendah 87 229 0
Sedang 23 6138 6
Tinggi 4 104 16
accuracy_ord <- sum(diag(conf_ord)) / sum(conf_ord)
cat("Akurasi Model:", scales::percent(accuracy_ord, accuracy = 0.01), "\n")
## Akurasi Model: 94.46%

0.7.2 Probabilitas Prediksi

grid_ord <- expand.grid(
  Hours_Studied   = seq(min(data_ord$Hours_Studied, na.rm=TRUE),
                        max(data_ord$Hours_Studied, na.rm=TRUE), length.out = 120),
  Attendance      = mean(data_ord$Attendance, na.rm=TRUE),
  Previous_Scores = mean(data_ord$Previous_Scores, na.rm=TRUE),
  Sleep_Hours     = mean(data_ord$Sleep_Hours, na.rm=TRUE),
  Gender          = "Female"
)

grid_ord %>%
  bind_cols(as.data.frame(predict(fit_ord, newdata = grid_ord, type = "probs"))) %>%
  tidyr::pivot_longer(cols = c("Rendah","Sedang","Tinggi"),
                      names_to = "Grade", values_to = "probabilitas") %>%
  mutate(Grade = factor(Grade, levels = c("Rendah","Sedang","Tinggi"))) %>%
  ggplot(aes(x = Hours_Studied, y = probabilitas, color = Grade)) +
  geom_line(linewidth = 1.25) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_color_manual(values = c("#2f7f73","#d18b2f","#5568b8")) +
  labs(title = "Prediksi Probabilitas Grade berdasarkan Jam Belajar",
       subtitle = "Variabel lain ditahan pada rata-rata; Gender = Female",
       x = "Jam Belajar (Hours Studied)", y = "Probabilitas Prediksi", color = "Grade") +
  theme_minimal()

0.7.3 Uji Asumsi Parallel Lines

fit_clm <- ordinal::clm(
  Grade ~ Hours_Studied + Attendance + Previous_Scores + Sleep_Hours + Gender,
  data = data_ord, link = "logit"
)
summary(fit_clm)
## formula: 
## Grade ~ Hours_Studied + Attendance + Previous_Scores + Sleep_Hours + Gender
## data:    data_ord
## 
##  link  threshold nobs logLik   AIC     niter max.grad cond.H 
##  logit flexible  6607 -1164.42 2342.84 10(0) 3.34e-11 6.3e+06
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## Hours_Studied    0.266278   0.011779  22.606   <2e-16 ***
## Attendance       0.164743   0.008172  20.161   <2e-16 ***
## Previous_Scores  0.035076   0.004049   8.664   <2e-16 ***
## Sleep_Hours     -0.028664   0.038071  -0.753    0.451    
## GenderMale      -0.060965   0.114288  -0.533    0.594    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##               Estimate Std. Error z value
## Rendah|Sedang  15.7674     0.7884    20.0
## Sedang|Tinggi  27.3238     1.0390    26.3
ordinal::nominal_test(fit_clm)

0.8 Interpretasi Hasil

πŸ’‘ Interpretasi β€” Logistik Ordinal

Sesuaikan teks berikut dengan output model kamu:

β€’ Koefisien positif (polr): Variabel X yang memiliki koefisien positif artinya peningkatan X menurunkan peluang berada di kategori lebih rendah β€” ekuivalen dengan meningkatkan peluang Grade lebih tinggi. Perhatikan: MASS::polr menggunakan konvensi tanda negatif untuk arah β€œnaik”.
β€’ Cutpoints (Ξ±): Threshold pemisah antar kategori kumulatif, misalnya batas antara Rendah|Sedang dan Sedang|Tinggi.
β€’ Asumsi Parallel Lines: Jika nominal_test menghasilkan p-value > 0.05, asumsi proportional odds tidak ditolak β€” model valid digunakan.
β€’ Akurasi: (lihat output confusion matrix) β€” model ordinal umumnya lebih akurat dari tebakan acak jika struktur ordinal dimanfaatkan dengan baik.


🚦
<h2>Bagian III β€” Regresi Poisson</h2>
<p>Dataset: Traffic Accidents &nbsp;|&nbsp; Respons: injuries (count)</p>

πŸ“– Konsep Dasar

Regresi Poisson digunakan untuk memodelkan variabel respons berupa hitungan (count) β€” bilangan bulat non-negatif (0, 1, 2, …). Model menggunakan fungsi link log:

log(E[Y|X]) = Ξ²β‚€ + β₁X₁ + … + Ξ²β‚šXβ‚š

Koefisien diinterpretasikan sebagai Incidence Rate Ratio (IRR) = exp(Ξ²): kenaikan 1 satuan Xβ±Ό mengalikan rata-rata hitungan Y sebesar exp(Ξ²β±Ό) kali. Asumsi penting: mean = variance. Jika variance > mean (overdispersion), pertimbangkan model Negative Binomial.

0.9 Persiapan Data

0.9.1 Muat & Struktur Data

# ── SESUAIKAN: ganti nama file jika berbeda ────────────────────────────────────
data_poi <- read.csv("data_poisson1.csv", sep = ";")

var_y_poi <- "injuries"
var_x_poi <- setdiff(colnames(data_poi), c("id","date","time","latitude","longitude","injuries"))

data_poi <- data_poi %>% mutate(across(where(is.character), as.factor))
dplyr::glimpse(data_poi)
## Rows: 99
## Columns: 13
## $ time              <fct> "0,398611111", "0,64375", "0,576388889", "0,00416666…
## $ latitude          <int> 102909, 365251, 264679, 285199, 222705, 174806, 2519…
## $ longitude         <int> 94733, 921448, 753151, 774946, 879632, 969596, 84997…
## $ severity          <fct> Critical, Low, Medium, Medium, Critical, Low, Low, M…
## $ road_condition    <fct> Potholed, Construction, Dry, Potholed, Muddy, Floodi…
## $ weather           <fct> Cloudy, Cloudy, Cloudy, Clear, Dust Storm, Heavy Rai…
## $ vehicles_involved <int> 1, 4, 4, 2, 1, 2, 2, 3, 2, 3, 3, 2, 1, 4, 4, 3, 4, 2…
## $ injuries          <int> 8, 0, 0, 2, 4, 0, 0, 2, 0, 5, 0, 0, 0, 2, 2, 3, 2, 1…
## $ fatalities        <int> 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 1…
## $ accident_cause    <fct> Poor Road, Poor Road, Weather, Poor Road, Weather, W…
## $ traffic_density   <fct> Moderate, Heavy, Light, Heavy, Light, Light, Moderat…
## $ lane_utilization  <fct> Overtaking, Congested Multi-Lane, Single Lane, Conge…
## $ nearby_accidents  <int> 5, 1, 8, 44, 1, 3, 22, 1, 19, 3, 6, 8, 5, 10, 5, 48,…

0.9.2 Distribusi Respons & Zero-Inflation

data_poi %>%
  count(.data[[var_y_poi]], name = "n") %>%
  mutate(Proporsi = scales::percent(n / sum(n), accuracy = 0.1)) %>%
  knitr::kable(caption = "Distribusi Jumlah Cedera (Injuries)") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)
Distribusi Jumlah Cedera (Injuries)
injuries n Proporsi
0 36 36.4%
1 11 11.1%
2 24 24.2%
3 8 8.1%
4 11 11.1%
5 2 2.0%
6 2 2.0%
7 3 3.0%
8 2 2.0%
data_poi %>%
  count(.data[[var_y_poi]], name = "n") %>%
  ggplot(aes(x = .data[[var_y_poi]], y = n)) +
  geom_col(width = 0.8, fill = "#2f7f73", alpha = 0.92) +
  scale_x_continuous(breaks = scales::pretty_breaks()) +
  labs(title = "Distribusi Jumlah Cedera",
       subtitle = "Respons Poisson: bilangan bulat non-negatif",
       x = "Jumlah Cedera (Injuries)", y = "Frekuensi") +
  theme_minimal()

# Cek zero-inflation
zero_share <- mean(data_poi[[var_y_poi]] == 0)
cat("Proporsi Y = 0:", scales::percent(zero_share, accuracy = 0.1), "\n")
## Proporsi Y = 0: 36.4%

0.10 Estimasi Model

0.10.1 Fitting & Summary

formula_poi <- as.formula(paste(var_y_poi, "~", paste(var_x_poi, collapse = " + ")))
cat("Formula:", deparse(formula_poi), "\n")
## Formula: injuries ~ severity + road_condition + weather + vehicles_involved +      fatalities + accident_cause + traffic_density + lane_utilization +      nearby_accidents
fit_pois <- glm(formula_poi, data = data_poi, family = poisson(link = "log"))
summary(fit_pois)
## 
## Call:
## glm(formula = formula_poi, family = poisson(link = "log"), data = data_poi)
## 
## Coefficients: (2 not defined because of singularities)
##                                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                       1.789e+00  1.084e+00   1.650   0.0989 .
## severityHigh                     -5.697e-01  2.531e-01  -2.251   0.0244 *
## severityLow                      -2.134e+01  2.785e+03  -0.008   0.9939  
## severityMedium                   -8.193e-01  4.183e-01  -1.959   0.0501 .
## road_conditionDry                 2.834e-01  5.420e-01   0.523   0.6011  
## road_conditionFlooding            1.385e-01  5.141e-01   0.269   0.7876  
## road_conditionMuddy               3.317e-01  5.954e-01   0.557   0.5774  
## road_conditionPotholed            7.888e-02  3.976e-01   0.198   0.8428  
## road_conditionWet                 8.822e-02  5.407e-01   0.163   0.8704  
## weatherCloudy                    -2.006e-02  2.725e-01  -0.074   0.9413  
## weatherDust Storm                 1.736e-01  8.193e-01   0.212   0.8322  
## weatherFog                        8.000e-01  7.516e-01   1.064   0.2872  
## weatherHeavy Rain                 4.969e-01  7.122e-01   0.698   0.4854  
## weatherRain                              NA         NA      NA       NA  
## vehicles_involved                -1.804e-02  7.637e-02  -0.236   0.8133  
## fatalities                        9.633e-02  1.239e-01   0.777   0.4370  
## accident_causeHuman Error        -6.921e-01  8.132e-01  -0.851   0.3947  
## accident_causeMechanical Failure -8.933e-01  8.756e-01  -1.020   0.3076  
## accident_causePoor Road          -8.269e-01  8.960e-01  -0.923   0.3561  
## accident_causeSignal Violation   -7.722e-01  8.510e-01  -0.907   0.3642  
## accident_causeWeather            -1.091e+00  9.897e-01  -1.102   0.2705  
## traffic_densityLight             -4.919e-01  5.360e-01  -0.918   0.3588  
## traffic_densityModerate          -3.719e-01  5.026e-01  -0.740   0.4593  
## lane_utilizationLane Change       6.845e-01  4.176e-01   1.639   0.1012  
## lane_utilizationOvertaking        8.180e-01  4.737e-01   1.727   0.0842 .
## lane_utilizationSingle Lane              NA         NA      NA       NA  
## nearby_accidents                  3.621e-03  1.015e-02   0.357   0.7212  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 230.14  on 98  degrees of freedom
## Residual deviance:  30.19  on 74  degrees of freedom
## AIC: 258.35
## 
## Number of Fisher Scoring iterations: 18

0.10.2 Koefisien, IRR & CI

pois_coef <- as.data.frame(coef(summary(fit_pois))) %>%
  tibble::rownames_to_column("parameter") %>%
  dplyr::rename(estimate = Estimate, std_error = `Std. Error`, z_value = `z value`, p_value = `Pr(>|z|)`) %>%
  mutate(
    IRR             = exp(estimate),
    CI_low          = exp(estimate - 1.96 * std_error),
    CI_high         = exp(estimate + 1.96 * std_error),
    perubahan_persen = 100 * (IRR - 1),
    Signifikansi    = case_when(
      p_value < 0.001 ~ "***", p_value < 0.01 ~ "**",
      p_value < 0.05  ~ "*",   p_value < 0.1  ~ ".",
      TRUE ~ ""
    )
  ) %>%
  mutate(across(c(estimate,std_error,z_value,p_value,IRR,CI_low,CI_high,perubahan_persen), ~round(.x,4)))

pois_coef %>%
  knitr::kable(
    caption = "Ringkasan Hasil Regresi Poisson",
    col.names = c("Parameter","Estimate","SE","z-value","p-value","IRR","CI 2.5%","CI 97.5%","Perubahan (%)","Sig.")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE)
Ringkasan Hasil Regresi Poisson
Parameter Estimate SE z-value p-value IRR CI 2.5% CI 97.5% Perubahan (%) Sig.
(Intercept) 1.7888 1.0839 1.6504 0.0989 5.9825 0.7149 50.0605 498.2490 .
severityHigh -0.5697 0.2531 -2.2507 0.0244 0.5657 0.3444 0.9291 -43.4308
severityLow -21.3401 2784.9698 -0.0077 0.9939 0.0000 0.0000 Inf -100.0000
severityMedium -0.8193 0.4183 -1.9587 0.0501 0.4407 0.1942 1.0005 -55.9258 .
road_conditionDry 0.2834 0.5420 0.5229 0.6011 1.3276 0.4589 3.8406 32.7615
road_conditionFlooding 0.1385 0.5141 0.2694 0.7876 1.1486 0.4193 3.1459 14.8559
road_conditionMuddy 0.3317 0.5954 0.5572 0.5774 1.3934 0.4338 4.4757 39.3396
road_conditionPotholed 0.0789 0.3976 0.1984 0.8428 1.0821 0.4964 2.3589 8.2073
road_conditionWet 0.0882 0.5407 0.1632 0.8704 1.0922 0.3785 3.1519 9.2232
weatherCloudy -0.0201 0.2725 -0.0736 0.9413 0.9801 0.5745 1.6721 -1.9862
weatherDust Storm 0.1736 0.8193 0.2119 0.8322 1.1896 0.2388 5.9261 18.9600
weatherFog 0.8000 0.7516 1.0643 0.2872 2.2256 0.5101 9.7108 122.5566
weatherHeavy Rain 0.4969 0.7122 0.6977 0.4854 1.6436 0.4070 6.6375 64.3563
vehicles_involved -0.0180 0.0764 -0.2362 0.8133 0.9821 0.8456 1.1407 -1.7879
fatalities 0.0963 0.1239 0.7773 0.4370 1.1011 0.8637 1.4039 10.1122
accident_causeHuman Error -0.6921 0.8132 -0.8511 0.3947 0.5005 0.1017 2.4640 -49.9463
accident_causeMechanical Failure -0.8933 0.8756 -1.0202 0.3076 0.4093 0.0736 2.2770 -59.0696
accident_causePoor Road -0.8269 0.8960 -0.9229 0.3561 0.4374 0.0755 2.5325 -56.2593
accident_causeSignal Violation -0.7722 0.8510 -0.9073 0.3642 0.4620 0.0871 2.4494 -53.7983
accident_causeWeather -1.0906 0.9897 -1.1020 0.2705 0.3360 0.0483 2.3375 -66.3992
traffic_densityLight -0.4919 0.5360 -0.9176 0.3588 0.6115 0.2139 1.7485 -38.8514
traffic_densityModerate -0.3719 0.5026 -0.7399 0.4593 0.6894 0.2574 1.8464 -31.0591
lane_utilizationLane Change 0.6845 0.4176 1.6393 0.1012 1.9829 0.8747 4.4953 98.2877
lane_utilizationOvertaking 0.8180 0.4737 1.7268 0.0842 2.2659 0.8954 5.7338 126.5885 .
nearby_accidents 0.0036 0.0101 0.3568 0.7212 1.0036 0.9839 1.0238 0.3627

0.11 Evaluasi Model

0.11.1 Overdispersion

dispersion_pois <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

tibble::tibble(
  `Dispersion Pearson` = round(dispersion_pois, 4),
  Interpretasi = dplyr::case_when(
    dispersion_pois < 1.5 ~ "βœ… Tidak ada indikasi overdispersion berat",
    dispersion_pois < 2.5 ~ "⚠️ Indikasi overdispersion sedang β€” pertimbangkan Negative Binomial",
    TRUE ~ "🚨 Overdispersion kuat β€” sangat disarankan model Negative Binomial"
  )
) %>%
  knitr::kable(caption = "Uji Overdispersion (Rasio Dispersion Pearson)") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)
Uji Overdispersion (Rasio Dispersion Pearson)
Dispersion Pearson Interpretasi
0.3242 βœ… Tidak ada indikasi overdispersion berat |

0.11.2 Plot Prediksi

get_typical <- function(x) {
  if (is.numeric(x)) return(mean(x, na.rm = TRUE))
  t <- table(x); return(names(t)[which.max(t)])
}

baseline_data <- as.data.frame(lapply(data_poi[var_x_poi], get_typical))
grid_pois <- baseline_data[rep(1, 100), ]
grid_pois$vehicles_involved <- seq(
  min(data_poi$vehicles_involved, na.rm = TRUE),
  max(data_poi$vehicles_involved, na.rm = TRUE),
  length.out = 100
)

pred_pois <- predict(fit_pois, newdata = grid_pois, type = "link", se.fit = TRUE)
grid_pois %>%
  mutate(
    rate      = exp(pred_pois$fit),
    rate_low  = exp(pred_pois$fit - 1.96 * pred_pois$se.fit),
    rate_high = exp(pred_pois$fit + 1.96 * pred_pois$se.fit)
  ) %>%
  ggplot(aes(x = vehicles_involved, y = rate)) +
  geom_ribbon(aes(ymin = rate_low, ymax = rate_high), fill = "#2f7f73", alpha = 0.16) +
  geom_line(color = "#2f7f73", linewidth = 1.35) +
  labs(title = "Prediksi Jumlah Cedera vs Jumlah Kendaraan",
       subtitle = "Variabel lain ditahan pada nilai rata-rata/modus",
       x = "Jumlah Kendaraan Terlibat", y = "Prediksi Cedera (Injuries)") +
  theme_minimal()

0.11.3 Perbandingan Poisson vs OLS pada log(Y)

# OLS hanya pada observasi Y > 0
data_poi_pos <- data_poi %>% filter(.data[[var_y_poi]] > 0)
formula_log_ols <- as.formula(paste("log(", var_y_poi, ") ~", paste(var_x_poi, collapse = " + ")))
fit_log_ols <- lm(formula_log_ols, data = data_poi_pos)

tibble::tibble(
  Aspek = c("Jenis respons","Nilai nol","Target model","Fungsi link","Interpretasi koef.","Prediksi","Kapan lebih tepat?"),
  `Regresi Poisson` = c(
    "Hitungan non-negatif","Dapat langsung menangani Y = 0",
    "E(Y|X)","log{E(Y|X)} = XΞ²","IRR = exp(Ξ²)",
    "Selalu non-negatif","Count data, banyak nol, rate kejadian"
  ),
  `OLS pada log(Y)` = c(
    "Kontinu positif atau hitungan besar tanpa nol","Tidak dapat memakai Y = 0 tanpa modifikasi",
    "E(log Y|X)","log(Y) = XΞ² + Ξ΅","Multiplier pada geometric mean",
    "Perlu retransformation","Y positif, log-scale residual baik"
  )
) %>%
  knitr::kable(caption = "Perbandingan Regresi Poisson vs OLS pada log(Y)") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE)
Perbandingan Regresi Poisson vs OLS pada log(Y)
Aspek Regresi Poisson OLS pada log(Y)
Jenis respons Hitungan non-negatif Kontinu positif atau hitungan besar tanpa nol
Nilai nol Dapat langsung menangani Y = 0 Tidak dapat memakai Y = 0 tanpa modifikasi
Target model E(Y&#124;X) E(log Y&#124;X)
Fungsi link log{E(Y&#124;X)} = XΞ² log(Y) = XΞ² + Ξ΅
Interpretasi koef. IRR = exp(Ξ²) Multiplier pada geometric mean
Prediksi Selalu non-negatif Perlu retransformation
Kapan lebih tepat? Count data, banyak nol, rate kejadian Y positif, log-scale residual baik

0.12 Interpretasi Hasil

πŸ’‘ Interpretasi β€” Regresi Poisson

Sesuaikan teks berikut dengan IRR dari output kamu:

β€’ IRR (Incidence Rate Ratio): Jika IRR untuk vehicles_involved = 1.35, artinya setiap penambahan 1 kendaraan yang terlibat meningkatkan rata-rata jumlah cedera sebesar 35%.
β€’ IRR < 1: Variabel tersebut berkaitan dengan penurunan rata-rata hitungan.
β€’ IRR = 1: Variabel tidak berpengaruh pada rata-rata hitungan.
β€’ Overdispersion: Jika Dispersion Pearson jauh di atas 1 (misal > 2.5), model Negative Binomial (MASS::glm.nb()) lebih tepat.
β€’ Zero-inflation: Jika proporsi Y = 0 sangat tinggi (> 30%), pertimbangkan model Zero-Inflated Poisson (ZIP).


1 Ringkasan Komparatif Ketiga Model

tibble::tibble(
  `Model`           = c("Logistik Multinomial","Logistik Ordinal","Poisson"),
  `Dataset`         = c("Iris","Student Grade","Traffic Accidents"),
  `Jenis Respons`   = c("Nominal β‰₯ 3 kat.","Ordinal β‰₯ 3 kat.","Count (hitungan)"),
  `Fungsi Link`     = c("logit (baseline)","logit (kumulatif)","log"),
  `Fungsi R`        = c("nnet::multinom()","MASS::polr()","glm(..., poisson)"),
  `Interpretasi Ξ²`  = c("Rel. Risk Ratio (RRR)","Cumul. Odds Ratio","Inc. Rate Ratio (IRR)"),
  `Asumsi Utama`    = c(
    "IIA (Independence of Irrelevant Alternatives)",
    "Proportional odds",
    "Mean = Variance (equidispersion)"
  )
) %>%
  knitr::kable(caption = "Perbandingan Ketiga Model Regresi Logistik") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE)
Perbandingan Ketiga Model Regresi Logistik
Model Dataset Jenis Respons Fungsi Link Fungsi R Interpretasi Ξ² Asumsi Utama
Logistik Multinomial Iris Nominal β‰₯ 3 kat. logit (baseline) nnet::multinom() Rel. Risk Ratio (RRR) IIA (Independence of Irrelevant Alternatives)
Logistik Ordinal Student Grade Ordinal β‰₯ 3 kat. logit (kumulatif) MASS::polr() Cumul. Odds Ratio Proportional odds
Poisson Traffic Accidents Count (hitungan) log glm(…, poisson) Inc.Β Rate Ratio (IRR) Mean = Variance (equidispersion)

πŸ“ Kesimpulan Umum

Pemilihan model regresi yang tepat bergantung pada skala pengukuran variabel respons:

β€’ Gunakan Logistik Multinomial saat Y memiliki 3+ kategori tanpa urutan alami.
β€’ Gunakan Logistik Ordinal saat Y memiliki 3+ kategori dengan urutan alami (Rendah < Sedang < Tinggi).
β€’ Gunakan Regresi Poisson saat Y berupa hitungan non-negatif (0, 1, 2, …).

Pelanggaran asumsi perlu dideteksi dan ditangani: overdispersion pada Poisson β†’ Negative Binomial; pelanggaran parallel lines pada Ordinal β†’ Partial Proportional Odds.