Laporan Analisis Statistika

Analisis Regresi Logistik

Multinomial · Ordinal · Poisson

<span class="cover-tag">🌸 Multinomial — Iris Species</span>
<span class="cover-tag">📈 Ordinal — Student Grade</span>
<span class="cover-tag">🚦 Poisson — Traffic Injuries</span>

<div class="cover-meta-item">
  <span class="cover-meta-label">Disusun oleh</span>
  <span class="cover-meta-value">Nama Kamu</span>
</div>
<div class="cover-meta-item">
  <span class="cover-meta-label">NIM</span>
  <span class="cover-meta-value">140XXXXXXX</span>
</div>
<div class="cover-meta-item">
  <span class="cover-meta-label">Mata Kuliah</span>
  <span class="cover-meta-value">Regresi — FMIPA UNPAD</span>
</div>
<div class="cover-meta-item">
  <span class="cover-meta-label">Tanggal</span>
  <span class="cover-meta-value"> 02 June 2026 </span>
</div>

🌸

<h2>Bagian I — Regresi Logistik Multinomial</h2>
<p>Dataset: Iris &nbsp;|&nbsp; Respons: Species (3 kelas nominal)</p>

📖 Konsep Dasar

Regresi logistik multinomial digunakan saat variabel respons bersifat nominal dengan tiga atau lebih kategori tanpa urutan alami. Model menggunakan satu kategori sebagai baseline (referensi) dan memodelkan log-odds masing-masing kategori lainnya terhadap baseline:

log(P(Y=k) / P(Y=ref)) = β₀ₖ + β₁ₖX₁ + … + βₚₖXₚ

Koefisien diinterpretasikan sebagai perubahan log-odds memilih kategori k dibanding referensi, atau dalam bentuk Relative Risk Ratio (RRR) = exp(βⱼₖ). Model difit dengan paket nnet::multinom().

0.1 Persiapan Data

0.1.1 Muat & Struktur Data

# ── SESUAIKAN: ganti nama file dan nama kolom jika berbeda ────────────────────
data_multi <- read.csv("data_multinomial.csv", sep = ";")

var_y_multi <- "Species"
var_x_multi <- setdiff(colnames(data_multi), c("Id", "Species"))

data_multi <- data_multi %>%
  mutate(
    Species = factor(Species,
      levels = c("Iris-setosa", "Iris-versicolor", "Iris-virginica"))
  )

dplyr::glimpse(data_multi)

## Rows: 150
## Columns: 6
## $ Id            <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ SepalLengthCm <int> 51, 49, 47, 46, 50, 54, 46, 50, 44, 49, 54, 48, 48, 43, …
## $ SepalWidthCm  <int> 35, 30, 32, 31, 36, 39, 34, 34, 29, 31, 37, 34, 30, 30, …
## $ PetalLengthCm <int> 14, 14, 13, 15, 14, 17, 14, 15, 14, 15, 15, 16, 14, 11, …
## $ PetalWidthCm  <int> 2, 2, 2, 2, 2, 4, 3, 2, 2, 1, 2, 2, 1, 1, 2, 4, 4, 3, 3,…
## $ Species       <fct> Iris-setosa, Iris-setosa, Iris-setosa, Iris-setosa, Iris…

0.1.2 Distribusi Respons

data_multi %>%
  count(Species) %>%
  mutate(Proporsi = scales::percent(n / sum(n), accuracy = 0.1)) %>%
  knitr::kable(caption = "Distribusi Spesies Iris") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)

Distribusi Spesies Iris
Species	n	Proporsi
Iris-setosa	50	33.3%
Iris-versicolor	50	33.3%
Iris-virginica	50	33.3%

data_multi %>%
  count(Species) %>%
  mutate(proporsi = n / sum(n)) %>%
  ggplot(aes(x = Species, y = proporsi, fill = Species)) +
  geom_col(width = 0.65, alpha = 0.94) +
  geom_text(aes(label = scales::percent(proporsi, accuracy = 0.1)),
            vjust = -0.4, fontface = "bold", color = "#243142") +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 0.65)) +
  scale_fill_manual(values = c("#2f7f73","#5568b8","#d18b2f")) +
  labs(title = "Distribusi Spesies Iris",
       subtitle = "Respons multinomial — 3 kategori tanpa urutan",
       x = NULL, y = "Proporsi") +
  theme_minimal() + theme(legend.position = "none")

0.2 Estimasi Model

0.2.1 Fitting & Summary

formula_multi <- as.formula(paste(var_y_multi, "~", paste(var_x_multi, collapse = " + ")))
cat("Formula:", deparse(formula_multi), "\n")

## Formula: Species ~ SepalLengthCm + SepalWidthCm + PetalLengthCm + PetalWidthCm

fit_multi <- nnet::multinom(formula_multi, data = data_multi, trace = FALSE)
summary(fit_multi)

## Call:
## nnet::multinom(formula = formula_multi, data = data_multi, trace = FALSE)
## 
## Coefficients:
##                 (Intercept) SepalLengthCm SepalWidthCm PetalLengthCm
## Iris-versicolor    22.24756    -0.8914206    -2.814642      4.180154
## Iris-virginica    -20.39006    -1.1379418    -3.482729      5.123089
##                 PetalWidthCm
## Iris-versicolor     1.351080
## Iris-virginica      3.179689
## 
## Std. Errors:
##                 (Intercept) SepalLengthCm SepalWidthCm PetalLengthCm
## Iris-versicolor    12.85375     0.1197144    0.2239780     0.2368581
## Iris-virginica     12.85375     0.1197150    0.2239769     0.2368591
##                 PetalWidthCm
## Iris-versicolor    0.4871285
## Iris-virginica     0.4871288
## 
## Residual Deviance: 11.89855 
## AIC: 31.89855

0.2.2 Koefisien, RRR & CI

multi_sum <- summary(fit_multi)

coef_long <- as.data.frame(multi_sum$coefficients) %>%
  tibble::rownames_to_column("kategori") %>%
  tidyr::pivot_longer(-kategori, names_to = "variabel", values_to = "estimate")

se_long <- as.data.frame(multi_sum$standard.errors) %>%
  tibble::rownames_to_column("kategori") %>%
  tidyr::pivot_longer(-kategori, names_to = "variabel", values_to = "std_error")

result_multi <- coef_long %>%
  left_join(se_long, by = c("kategori","variabel")) %>%
  mutate(
    z_value = estimate / std_error,
    p_value = 2 * (1 - pnorm(abs(z_value))),
    RRR     = exp(estimate),
    CI_low  = exp(estimate - 1.96 * std_error),
    CI_high = exp(estimate + 1.96 * std_error),
    Signifikansi = case_when(
      p_value < 0.001 ~ "***", p_value < 0.01 ~ "**",
      p_value < 0.05  ~ "*",   p_value < 0.1  ~ ".",
      TRUE ~ ""
    )
  ) %>%
  mutate(across(c(estimate,std_error,z_value,p_value,RRR,CI_low,CI_high), ~round(.x,4)))

result_multi %>%
  knitr::kable(
    caption = "Ringkasan Model Multinomial (Base: Iris-setosa)",
    col.names = c("Kategori","Variabel","Estimate","SE","z-value","p-value","RRR","CI 2.5%","CI 97.5%","Sig.")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE)

Ringkasan Model Multinomial (Base: Iris-setosa)
Kategori	Variabel	Estimate	SE	z-value	p-value	RRR	CI 2.5%	CI 97.5%	Sig.
Iris-versicolor	(Intercept)	22.2476	12.8537	1.7308	0.0835	4.591905e+09	0.0526	4.011665e+20	.
Iris-versicolor	SepalLengthCm	-0.8914	0.1197	-7.4462	0.0000	4.101000e-01	0.3243	5.185000e-01	***
Iris-versicolor	SepalWidthCm	-2.8146	0.2240	-12.5666	0.0000	5.990000e-02	0.0386	9.300000e-02	***
Iris-versicolor	PetalLengthCm	4.1802	0.2369	17.6484	0.0000	6.537590e+01	41.0961	1.040005e+02	***
Iris-versicolor	PetalWidthCm	1.3511	0.4871	2.7736	0.0055	3.861600e+00	1.4863	1.003270e+01	**
Iris-virginica	(Intercept)	-20.3901	12.8537	-1.5863	0.1127	0.000000e+00	0.0000	1.219101e+02
Iris-virginica	SepalLengthCm	-1.1379	0.1197	-9.5054	0.0000	3.205000e-01	0.2535	4.052000e-01	***
Iris-virginica	SepalWidthCm	-3.4827	0.2240	-15.5495	0.0000	3.070000e-02	0.0198	4.770000e-02	***
Iris-virginica	PetalLengthCm	5.1231	0.2369	21.6293	0.0000	1.678530e+02	105.5141	2.670224e+02	***
Iris-virginica	PetalWidthCm	3.1797	0.4871	6.5274	0.0000	2.403930e+01	9.2527	6.245590e+01	***

0.3 Prediksi & Evaluasi

0.3.1 Confusion Matrix & Akurasi

data_multi_pred <- data_multi %>%
  bind_cols(as.data.frame(predict(fit_multi, type = "probs"))) %>%
  mutate(prediksi = predict(fit_multi, type = "class"))

conf_multi <- table(Aktual = data_multi_pred$Species, Prediksi = data_multi_pred$prediksi)
knitr::kable(conf_multi, caption = "Confusion Matrix — Model Multinomial") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Confusion Matrix — Model Multinomial
	Iris-setosa	Iris-versicolor	Iris-virginica
Iris-setosa	50	0	0
Iris-versicolor	0	49	1
Iris-virginica	0	1	49

accuracy_multi <- sum(diag(conf_multi)) / sum(conf_multi)
cat("Akurasi Model:", scales::percent(accuracy_multi, accuracy = 0.01), "\n")

## Akurasi Model: 98.67%

0.3.2 Plot Probabilitas Prediksi

grid_multi <- expand.grid(
  SepalLengthCm  = seq(min(data_multi$SepalLengthCm), max(data_multi$SepalLengthCm), length.out = 100),
  SepalWidthCm   = mean(data_multi$SepalWidthCm),
  PetalLengthCm  = mean(data_multi$PetalLengthCm),
  PetalWidthCm   = mean(data_multi$PetalWidthCm)
)

grid_multi %>%
  bind_cols(as.data.frame(predict(fit_multi, newdata = grid_multi, type = "probs"))) %>%
  tidyr::pivot_longer(cols = c("Iris-setosa","Iris-versicolor","Iris-virginica"),
                      names_to = "Species", values_to = "probabilitas") %>%
  ggplot(aes(x = SepalLengthCm, y = probabilitas, color = Species)) +
  geom_line(linewidth = 1.35) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_color_manual(values = c("#2f7f73","#5568b8","#d18b2f")) +
  labs(title = "Prediksi Probabilitas Spesies Iris",
       subtitle = "Variabel lain ditahan pada nilai rata-rata",
       x = "Sepal Length (Cm)", y = "Probabilitas Prediksi", color = "Spesies") +
  theme_minimal()

0.4 Interpretasi Hasil

💡 Interpretasi — Logistik Multinomial

Sesuaikan teks berikut dengan hasil RRR dari output kamu:

• Kategori referensi: Iris-setosa. Semua interpretasi dibandingkan terhadap kategori ini.
• RRR (Relative Risk Ratio): Jika RRR untuk variabel X pada kategori Iris-versicolor = 2.5, artinya kenaikan 1 satuan X meningkatkan risiko relatif menjadi versicolor (vs setosa) sebesar 2.5 kali.
• Signifikansi (p-value < 0.05): Hanya variabel dengan tanda *** atau ** yang secara statistik signifikan membedakan antar kelas.
• Grafik probabilitas: Menunjukkan bagaimana peluang masing-masing spesies berubah seiring perubahan Sepal Length, saat variabel lain dipegang tetap.
• Akurasi: (lihat output confusion matrix) — akurasi tinggi menunjukkan model dapat membedakan ketiga spesies dengan baik.

📈

<h2>Bagian II — Regresi Logistik Ordinal</h2>
<p>Dataset: Student Performance &nbsp;|&nbsp; Respons: Grade (Rendah &lt; Sedang &lt; Tinggi)</p>

📖 Konsep Dasar

Regresi logistik ordinal (Proportional Odds Model) digunakan saat variabel respons bersifat ordinal — ada urutan alami antar kategori. Model memodelkan log-odds kumulatif:

logit(P(Y ≤ j)) = αⱼ − (β₁X₁ + … + βₚXₚ)

Asumsi utama: Proportional Odds — efek setiap prediktor dianggap sama untuk semua batas kumulatif (parallel lines). Koefisien diinterpretasikan sebagai log-odds kumulatif, dengan exp(β) sebagai odds ratio (polaritas tergantung konvensi software). Difit dengan MASS::polr().

0.5 Persiapan Data

0.5.1 Muat & Struktur Data

# ── SESUAIKAN: ganti nama file dan nama kolom jika berbeda ────────────────────
data_ord <- read.csv("data_ordinal.csv", sep = ";")

data_ord <- data_ord %>%
  mutate(
    Grade  = factor(Grade, levels = c("Rendah","Sedang","Tinggi"), ordered = TRUE),
    Gender = as.factor(Gender)
  )

dplyr::glimpse(data_ord)

## Rows: 6,607
## Columns: 21
## $ Hours_Studied              <int> 23, 19, 24, 29, 19, 19, 29, 25, 17, 23, 17,…
## $ Attendance                 <int> 84, 64, 98, 89, 92, 88, 84, 78, 94, 98, 80,…
## $ Parental_Involvement       <chr> "Low", "Low", "Medium", "Low", "Medium", "M…
## $ Access_to_Resources        <chr> "High", "Medium", "Medium", "Medium", "Medi…
## $ Extracurricular_Activities <chr> "No", "No", "Yes", "Yes", "Yes", "Yes", "Ye…
## $ Sleep_Hours                <int> 7, 8, 7, 8, 6, 8, 7, 6, 6, 8, 8, 6, 8, 8, 8…
## $ Previous_Scores            <int> 73, 59, 91, 98, 65, 89, 68, 50, 80, 71, 88,…
## $ Motivation_Level           <chr> "Low", "Low", "Medium", "Medium", "Medium",…
## $ Internet_Access            <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
## $ Tutoring_Sessions          <int> 0, 2, 2, 1, 3, 3, 1, 1, 0, 0, 4, 2, 2, 2, 1…
## $ Family_Income              <chr> "Low", "Medium", "Medium", "Medium", "Mediu…
## $ Teacher_Quality            <chr> "Medium", "Medium", "Medium", "Medium", "Hi…
## $ School_Type                <chr> "Public", "Public", "Public", "Public", "Pu…
## $ Peer_Influence             <chr> "Positive", "Negative", "Neutral", "Negativ…
## $ Physical_Activity          <int> 3, 4, 4, 4, 4, 3, 2, 2, 1, 5, 4, 2, 4, 3, 4…
## $ Learning_Disabilities      <chr> "No", "No", "No", "No", "No", "No", "No", "…
## $ Parental_Education_Level   <chr> "High School", "College", "Postgraduate", "…
## $ Distance_from_Home         <chr> "Near", "Moderate", "Near", "Moderate", "Ne…
## $ Gender                     <fct> Male, Female, Male, Male, Female, Male, Mal…
## $ Exam_Score                 <int> 67, 61, 74, 71, 70, 71, 67, 66, 69, 72, 68,…
## $ Grade                      <ord> Sedang, Rendah, Sedang, Sedang, Sedang, Sed…

0.5.2 Distribusi Respons

data_ord %>%
  count(Grade) %>%
  mutate(Proporsi = scales::percent(n / sum(n), accuracy = 0.1)) %>%
  knitr::kable(caption = "Distribusi Tingkat Grade Mahasiswa") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)

Distribusi Tingkat Grade Mahasiswa
Grade	n	Proporsi
Rendah	316	4.8%
Sedang	6167	93.3%
Tinggi	124	1.9%

data_ord %>%
  count(Grade) %>%
  mutate(proporsi = n / sum(n)) %>%
  ggplot(aes(x = Grade, y = proporsi, fill = Grade)) +
  geom_col(width = 0.65, alpha = 0.94) +
  geom_text(aes(label = scales::percent(proporsi, accuracy = 0.1)),
            vjust = -0.4, fontface = "bold", color = "#243142") +
  scale_y_continuous(labels = scales::percent_format(), expand = expansion(mult = c(0, 0.18))) +
  scale_fill_manual(values = c("#2f7f73","#d18b2f","#5568b8")) +
  labs(title = "Distribusi Tingkat Grade",
       subtitle = "Respons ordinal: Rendah < Sedang < Tinggi",
       x = "Grade", y = "Proporsi") +
  theme_minimal() + theme(legend.position = "none")

0.6 Estimasi Model

0.6.1 Fitting & Summary

fit_ord <- MASS::polr(
  Grade ~ Hours_Studied + Attendance + Previous_Scores + Sleep_Hours + Gender,
  data   = data_ord,
  method = "logistic",
  Hess   = TRUE
)
summary(fit_ord)

## Call:
## MASS::polr(formula = Grade ~ Hours_Studied + Attendance + Previous_Scores + 
##     Sleep_Hours + Gender, data = data_ord, Hess = TRUE, method = "logistic")
## 
## Coefficients:
##                    Value Std. Error t value
## Hours_Studied    0.26628   0.011780 22.6048
## Attendance       0.16474   0.008164 20.1781
## Previous_Scores  0.03508   0.004047  8.6675
## Sleep_Hours     -0.02866   0.038071 -0.7529
## GenderMale      -0.06097   0.114288 -0.5334
## 
## Intercepts:
##               Value   Std. Error t value
## Rendah|Sedang 15.7674  0.7880    20.0105
## Sedang|Tinggi 27.3238  1.0385    26.3097
## 
## Residual Deviance: 2328.836 
## AIC: 2342.836

0.6.2 Koefisien, OR & CI

ord_coef <- coef(summary(fit_ord))

result_ord <- as.data.frame(ord_coef) %>%
  tibble::rownames_to_column("parameter") %>%
  dplyr::rename(estimate = Value, std_error = `Std. Error`, t_value = `t value`) %>%
  mutate(
    p_value       = 2 * (1 - pnorm(abs(t_value))),
    jenis         = ifelse(grepl("\\|", parameter), "Cutpoint", "Koefisien"),
    OR_polr       = ifelse(jenis == "Koefisien", exp(estimate), NA_real_),
    CI_low_polr   = ifelse(jenis == "Koefisien", exp(estimate - 1.96 * std_error), NA_real_),
    CI_high_polr  = ifelse(jenis == "Koefisien", exp(estimate + 1.96 * std_error), NA_real_),
    Signifikansi  = case_when(
      p_value < 0.001 ~ "***", p_value < 0.01 ~ "**",
      p_value < 0.05  ~ "*",   p_value < 0.1  ~ ".",
      TRUE ~ ""
    )
  ) %>%
  mutate(across(c(estimate,std_error,t_value,p_value,OR_polr,CI_low_polr,CI_high_polr), ~round(.x,4)))

result_ord %>%
  knitr::kable(
    caption = "Ringkasan Model Ordinal (Proportional Odds)",
    col.names = c("Parameter","Estimate","SE","t-value","p-value","Jenis","OR polr","CI 2.5%","CI 97.5%","Sig.")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE) %>%
  row_spec(which(result_ord$jenis == "Cutpoint"), background = "#f0f9f8", bold = TRUE)

Ringkasan Model Ordinal (Proportional Odds)
Parameter	Estimate	SE	t-value	p-value	Jenis	OR polr	CI 2.5%	CI 97.5%	Sig.
Hours_Studied	0.2663	0.0118	22.6048	0.0000	Koefisien	1.3051	1.2753	1.3356	***
Attendance	0.1647	0.0082	20.1781	0.0000	Koefisien	1.1791	1.1604	1.1981	***
Previous_Scores	0.0351	0.0040	8.6675	0.0000	Koefisien	1.0357	1.0275	1.0439	***
Sleep_Hours	-0.0287	0.0381	-0.7529	0.4515	Koefisien	0.9717	0.9019	1.0470
GenderMale	-0.0610	0.1143	-0.5334	0.5937	Koefisien	0.9409	0.7520	1.1771
Rendah\|Sedang	15.7674	0.7880	20.0105	0.0000	Cutpoint	NA	NA	NA	***
Sedang\|Tinggi	27.3238	1.0385	26.3097	0.0000	Cutpoint	NA	NA	NA	***

0.7 Prediksi & Evaluasi

0.7.1 Confusion Matrix & Akurasi

data_ord_pred <- data_ord %>%
  bind_cols(as.data.frame(predict(fit_ord, type = "probs"))) %>%
  mutate(prediksi = predict(fit_ord, type = "class"))

conf_ord <- table(Aktual = data_ord_pred$Grade, Prediksi = data_ord_pred$prediksi)
knitr::kable(conf_ord, caption = "Confusion Matrix — Model Ordinal") %>%
  kable_styling(bootstrap_options = c("striped","hover"), full_width = FALSE)

Confusion Matrix — Model Ordinal
	Rendah	Sedang	Tinggi
Rendah	87	229	0
Sedang	23	6138	6
Tinggi	4	104	16

accuracy_ord <- sum(diag(conf_ord)) / sum(conf_ord)
cat("Akurasi Model:", scales::percent(accuracy_ord, accuracy = 0.01), "\n")

## Akurasi Model: 94.46%

0.7.2 Probabilitas Prediksi

grid_ord <- expand.grid(
  Hours_Studied   = seq(min(data_ord$Hours_Studied, na.rm=TRUE),
                        max(data_ord$Hours_Studied, na.rm=TRUE), length.out = 120),
  Attendance      = mean(data_ord$Attendance, na.rm=TRUE),
  Previous_Scores = mean(data_ord$Previous_Scores, na.rm=TRUE),
  Sleep_Hours     = mean(data_ord$Sleep_Hours, na.rm=TRUE),
  Gender          = "Female"
)

grid_ord %>%
  bind_cols(as.data.frame(predict(fit_ord, newdata = grid_ord, type = "probs"))) %>%
  tidyr::pivot_longer(cols = c("Rendah","Sedang","Tinggi"),
                      names_to = "Grade", values_to = "probabilitas") %>%
  mutate(Grade = factor(Grade, levels = c("Rendah","Sedang","Tinggi"))) %>%
  ggplot(aes(x = Hours_Studied, y = probabilitas, color = Grade)) +
  geom_line(linewidth = 1.25) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_color_manual(values = c("#2f7f73","#d18b2f","#5568b8")) +
  labs(title = "Prediksi Probabilitas Grade berdasarkan Jam Belajar",
       subtitle = "Variabel lain ditahan pada rata-rata; Gender = Female",
       x = "Jam Belajar (Hours Studied)", y = "Probabilitas Prediksi", color = "Grade") +
  theme_minimal()

0.7.3 Uji Asumsi Parallel Lines

fit_clm <- ordinal::clm(
  Grade ~ Hours_Studied + Attendance + Previous_Scores + Sleep_Hours + Gender,
  data = data_ord, link = "logit"
)
summary(fit_clm)

## formula: 
## Grade ~ Hours_Studied + Attendance + Previous_Scores + Sleep_Hours + Gender
## data:    data_ord
## 
##  link  threshold nobs logLik   AIC     niter max.grad cond.H 
##  logit flexible  6607 -1164.42 2342.84 10(0) 3.34e-11 6.3e+06
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## Hours_Studied    0.266278   0.011779  22.606   <2e-16 ***
## Attendance       0.164743   0.008172  20.161   <2e-16 ***
## Previous_Scores  0.035076   0.004049   8.664   <2e-16 ***
## Sleep_Hours     -0.028664   0.038071  -0.753    0.451    
## GenderMale      -0.060965   0.114288  -0.533    0.594    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Threshold coefficients:
##               Estimate Std. Error z value
## Rendah|Sedang  15.7674     0.7884    20.0
## Sedang|Tinggi  27.3238     1.0390    26.3

ordinal::nominal_test(fit_clm)

0.8 Interpretasi Hasil

💡 Interpretasi — Logistik Ordinal

Sesuaikan teks berikut dengan output model kamu:

• Koefisien positif (polr): Variabel X yang memiliki koefisien positif artinya peningkatan X menurunkan peluang berada di kategori lebih rendah — ekuivalen dengan meningkatkan peluang Grade lebih tinggi. Perhatikan: MASS::polr menggunakan konvensi tanda negatif untuk arah “naik”.
• Cutpoints (α): Threshold pemisah antar kategori kumulatif, misalnya batas antara Rendah|Sedang dan Sedang|Tinggi.
• Asumsi Parallel Lines: Jika nominal_test menghasilkan p-value > 0.05, asumsi proportional odds tidak ditolak — model valid digunakan.
• Akurasi: (lihat output confusion matrix) — model ordinal umumnya lebih akurat dari tebakan acak jika struktur ordinal dimanfaatkan dengan baik.

🚦

<h2>Bagian III — Regresi Poisson</h2>
<p>Dataset: Traffic Accidents &nbsp;|&nbsp; Respons: injuries (count)</p>

📖 Konsep Dasar

Regresi Poisson digunakan untuk memodelkan variabel respons berupa hitungan (count) — bilangan bulat non-negatif (0, 1, 2, …). Model menggunakan fungsi link log:

log(E[Y|X]) = β₀ + β₁X₁ + … + βₚXₚ

Koefisien diinterpretasikan sebagai Incidence Rate Ratio (IRR) = exp(β): kenaikan 1 satuan Xⱼ mengalikan rata-rata hitungan Y sebesar exp(βⱼ) kali. Asumsi penting: mean = variance. Jika variance > mean (overdispersion), pertimbangkan model Negative Binomial.

0.9 Persiapan Data

0.9.1 Muat & Struktur Data

# ── SESUAIKAN: ganti nama file jika berbeda ────────────────────────────────────
data_poi <- read.csv("data_poisson1.csv", sep = ";")

var_y_poi <- "injuries"
var_x_poi <- setdiff(colnames(data_poi), c("id","date","time","latitude","longitude","injuries"))

data_poi <- data_poi %>% mutate(across(where(is.character), as.factor))
dplyr::glimpse(data_poi)

## Rows: 99
## Columns: 13
## $ time              <fct> "0,398611111", "0,64375", "0,576388889", "0,00416666…
## $ latitude          <int> 102909, 365251, 264679, 285199, 222705, 174806, 2519…
## $ longitude         <int> 94733, 921448, 753151, 774946, 879632, 969596, 84997…
## $ severity          <fct> Critical, Low, Medium, Medium, Critical, Low, Low, M…
## $ road_condition    <fct> Potholed, Construction, Dry, Potholed, Muddy, Floodi…
## $ weather           <fct> Cloudy, Cloudy, Cloudy, Clear, Dust Storm, Heavy Rai…
## $ vehicles_involved <int> 1, 4, 4, 2, 1, 2, 2, 3, 2, 3, 3, 2, 1, 4, 4, 3, 4, 2…
## $ injuries          <int> 8, 0, 0, 2, 4, 0, 0, 2, 0, 5, 0, 0, 0, 2, 2, 3, 2, 1…
## $ fatalities        <int> 3, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 1, 1…
## $ accident_cause    <fct> Poor Road, Poor Road, Weather, Poor Road, Weather, W…
## $ traffic_density   <fct> Moderate, Heavy, Light, Heavy, Light, Light, Moderat…
## $ lane_utilization  <fct> Overtaking, Congested Multi-Lane, Single Lane, Conge…
## $ nearby_accidents  <int> 5, 1, 8, 44, 1, 3, 22, 1, 19, 3, 6, 8, 5, 10, 5, 48,…

0.9.2 Distribusi Respons & Zero-Inflation

data_poi %>%
  count(.data[[var_y_poi]], name = "n") %>%
  mutate(Proporsi = scales::percent(n / sum(n), accuracy = 0.1)) %>%
  knitr::kable(caption = "Distribusi Jumlah Cedera (Injuries)") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)

Distribusi Jumlah Cedera (Injuries)
injuries	n	Proporsi
0	36	36.4%
1	11	11.1%
2	24	24.2%
3	8	8.1%
4	11	11.1%
5	2	2.0%
6	2	2.0%
7	3	3.0%
8	2	2.0%

data_poi %>%
  count(.data[[var_y_poi]], name = "n") %>%
  ggplot(aes(x = .data[[var_y_poi]], y = n)) +
  geom_col(width = 0.8, fill = "#2f7f73", alpha = 0.92) +
  scale_x_continuous(breaks = scales::pretty_breaks()) +
  labs(title = "Distribusi Jumlah Cedera",
       subtitle = "Respons Poisson: bilangan bulat non-negatif",
       x = "Jumlah Cedera (Injuries)", y = "Frekuensi") +
  theme_minimal()

# Cek zero-inflation
zero_share <- mean(data_poi[[var_y_poi]] == 0)
cat("Proporsi Y = 0:", scales::percent(zero_share, accuracy = 0.1), "\n")

## Proporsi Y = 0: 36.4%

0.10 Estimasi Model

0.10.1 Fitting & Summary

formula_poi <- as.formula(paste(var_y_poi, "~", paste(var_x_poi, collapse = " + ")))
cat("Formula:", deparse(formula_poi), "\n")

## Formula: injuries ~ severity + road_condition + weather + vehicles_involved +      fatalities + accident_cause + traffic_density + lane_utilization +      nearby_accidents

fit_pois <- glm(formula_poi, data = data_poi, family = poisson(link = "log"))
summary(fit_pois)

## 
## Call:
## glm(formula = formula_poi, family = poisson(link = "log"), data = data_poi)
## 
## Coefficients: (2 not defined because of singularities)
##                                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                       1.789e+00  1.084e+00   1.650   0.0989 .
## severityHigh                     -5.697e-01  2.531e-01  -2.251   0.0244 *
## severityLow                      -2.134e+01  2.785e+03  -0.008   0.9939  
## severityMedium                   -8.193e-01  4.183e-01  -1.959   0.0501 .
## road_conditionDry                 2.834e-01  5.420e-01   0.523   0.6011  
## road_conditionFlooding            1.385e-01  5.141e-01   0.269   0.7876  
## road_conditionMuddy               3.317e-01  5.954e-01   0.557   0.5774  
## road_conditionPotholed            7.888e-02  3.976e-01   0.198   0.8428  
## road_conditionWet                 8.822e-02  5.407e-01   0.163   0.8704  
## weatherCloudy                    -2.006e-02  2.725e-01  -0.074   0.9413  
## weatherDust Storm                 1.736e-01  8.193e-01   0.212   0.8322  
## weatherFog                        8.000e-01  7.516e-01   1.064   0.2872  
## weatherHeavy Rain                 4.969e-01  7.122e-01   0.698   0.4854  
## weatherRain                              NA         NA      NA       NA  
## vehicles_involved                -1.804e-02  7.637e-02  -0.236   0.8133  
## fatalities                        9.633e-02  1.239e-01   0.777   0.4370  
## accident_causeHuman Error        -6.921e-01  8.132e-01  -0.851   0.3947  
## accident_causeMechanical Failure -8.933e-01  8.756e-01  -1.020   0.3076  
## accident_causePoor Road          -8.269e-01  8.960e-01  -0.923   0.3561  
## accident_causeSignal Violation   -7.722e-01  8.510e-01  -0.907   0.3642  
## accident_causeWeather            -1.091e+00  9.897e-01  -1.102   0.2705  
## traffic_densityLight             -4.919e-01  5.360e-01  -0.918   0.3588  
## traffic_densityModerate          -3.719e-01  5.026e-01  -0.740   0.4593  
## lane_utilizationLane Change       6.845e-01  4.176e-01   1.639   0.1012  
## lane_utilizationOvertaking        8.180e-01  4.737e-01   1.727   0.0842 .
## lane_utilizationSingle Lane              NA         NA      NA       NA  
## nearby_accidents                  3.621e-03  1.015e-02   0.357   0.7212  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 230.14  on 98  degrees of freedom
## Residual deviance:  30.19  on 74  degrees of freedom
## AIC: 258.35
## 
## Number of Fisher Scoring iterations: 18

0.10.2 Koefisien, IRR & CI

pois_coef <- as.data.frame(coef(summary(fit_pois))) %>%
  tibble::rownames_to_column("parameter") %>%
  dplyr::rename(estimate = Estimate, std_error = `Std. Error`, z_value = `z value`, p_value = `Pr(>|z|)`) %>%
  mutate(
    IRR             = exp(estimate),
    CI_low          = exp(estimate - 1.96 * std_error),
    CI_high         = exp(estimate + 1.96 * std_error),
    perubahan_persen = 100 * (IRR - 1),
    Signifikansi    = case_when(
      p_value < 0.001 ~ "***", p_value < 0.01 ~ "**",
      p_value < 0.05  ~ "*",   p_value < 0.1  ~ ".",
      TRUE ~ ""
    )
  ) %>%
  mutate(across(c(estimate,std_error,z_value,p_value,IRR,CI_low,CI_high,perubahan_persen), ~round(.x,4)))

pois_coef %>%
  knitr::kable(
    caption = "Ringkasan Hasil Regresi Poisson",
    col.names = c("Parameter","Estimate","SE","z-value","p-value","IRR","CI 2.5%","CI 97.5%","Perubahan (%)","Sig.")
  ) %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE)

Ringkasan Hasil Regresi Poisson
Parameter	Estimate	SE	z-value	p-value	IRR	CI 2.5%	CI 97.5%	Perubahan (%)	Sig.
(Intercept)	1.7888	1.0839	1.6504	0.0989	5.9825	0.7149	50.0605	498.2490	.
severityHigh	-0.5697	0.2531	-2.2507	0.0244	0.5657	0.3444	0.9291	-43.4308
severityLow	-21.3401	2784.9698	-0.0077	0.9939	0.0000	0.0000	Inf	-100.0000
severityMedium	-0.8193	0.4183	-1.9587	0.0501	0.4407	0.1942	1.0005	-55.9258	.
road_conditionDry	0.2834	0.5420	0.5229	0.6011	1.3276	0.4589	3.8406	32.7615
road_conditionFlooding	0.1385	0.5141	0.2694	0.7876	1.1486	0.4193	3.1459	14.8559
road_conditionMuddy	0.3317	0.5954	0.5572	0.5774	1.3934	0.4338	4.4757	39.3396
road_conditionPotholed	0.0789	0.3976	0.1984	0.8428	1.0821	0.4964	2.3589	8.2073
road_conditionWet	0.0882	0.5407	0.1632	0.8704	1.0922	0.3785	3.1519	9.2232
weatherCloudy	-0.0201	0.2725	-0.0736	0.9413	0.9801	0.5745	1.6721	-1.9862
weatherDust Storm	0.1736	0.8193	0.2119	0.8322	1.1896	0.2388	5.9261	18.9600
weatherFog	0.8000	0.7516	1.0643	0.2872	2.2256	0.5101	9.7108	122.5566
weatherHeavy Rain	0.4969	0.7122	0.6977	0.4854	1.6436	0.4070	6.6375	64.3563
vehicles_involved	-0.0180	0.0764	-0.2362	0.8133	0.9821	0.8456	1.1407	-1.7879
fatalities	0.0963	0.1239	0.7773	0.4370	1.1011	0.8637	1.4039	10.1122
accident_causeHuman Error	-0.6921	0.8132	-0.8511	0.3947	0.5005	0.1017	2.4640	-49.9463
accident_causeMechanical Failure	-0.8933	0.8756	-1.0202	0.3076	0.4093	0.0736	2.2770	-59.0696
accident_causePoor Road	-0.8269	0.8960	-0.9229	0.3561	0.4374	0.0755	2.5325	-56.2593
accident_causeSignal Violation	-0.7722	0.8510	-0.9073	0.3642	0.4620	0.0871	2.4494	-53.7983
accident_causeWeather	-1.0906	0.9897	-1.1020	0.2705	0.3360	0.0483	2.3375	-66.3992
traffic_densityLight	-0.4919	0.5360	-0.9176	0.3588	0.6115	0.2139	1.7485	-38.8514
traffic_densityModerate	-0.3719	0.5026	-0.7399	0.4593	0.6894	0.2574	1.8464	-31.0591
lane_utilizationLane Change	0.6845	0.4176	1.6393	0.1012	1.9829	0.8747	4.4953	98.2877
lane_utilizationOvertaking	0.8180	0.4737	1.7268	0.0842	2.2659	0.8954	5.7338	126.5885	.
nearby_accidents	0.0036	0.0101	0.3568	0.7212	1.0036	0.9839	1.0238	0.3627

0.11 Evaluasi Model

0.11.1 Overdispersion

dispersion_pois <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

tibble::tibble(
  `Dispersion Pearson` = round(dispersion_pois, 4),
  Interpretasi = dplyr::case_when(
    dispersion_pois < 1.5 ~ "✅ Tidak ada indikasi overdispersion berat",
    dispersion_pois < 2.5 ~ "⚠️ Indikasi overdispersion sedang — pertimbangkan Negative Binomial",
    TRUE ~ "🚨 Overdispersion kuat — sangat disarankan model Negative Binomial"
  )
) %>%
  knitr::kable(caption = "Uji Overdispersion (Rasio Dispersion Pearson)") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = FALSE)

Uji Overdispersion (Rasio Dispersion Pearson)
Dispersion Pearson	Interpretasi
0.3242	✅ Tidak ada indikasi overdispersion berat \|

0.11.2 Plot Prediksi

get_typical <- function(x) {
  if (is.numeric(x)) return(mean(x, na.rm = TRUE))
  t <- table(x); return(names(t)[which.max(t)])
}

baseline_data <- as.data.frame(lapply(data_poi[var_x_poi], get_typical))
grid_pois <- baseline_data[rep(1, 100), ]
grid_pois$vehicles_involved <- seq(
  min(data_poi$vehicles_involved, na.rm = TRUE),
  max(data_poi$vehicles_involved, na.rm = TRUE),
  length.out = 100
)

pred_pois <- predict(fit_pois, newdata = grid_pois, type = "link", se.fit = TRUE)
grid_pois %>%
  mutate(
    rate      = exp(pred_pois$fit),
    rate_low  = exp(pred_pois$fit - 1.96 * pred_pois$se.fit),
    rate_high = exp(pred_pois$fit + 1.96 * pred_pois$se.fit)
  ) %>%
  ggplot(aes(x = vehicles_involved, y = rate)) +
  geom_ribbon(aes(ymin = rate_low, ymax = rate_high), fill = "#2f7f73", alpha = 0.16) +
  geom_line(color = "#2f7f73", linewidth = 1.35) +
  labs(title = "Prediksi Jumlah Cedera vs Jumlah Kendaraan",
       subtitle = "Variabel lain ditahan pada nilai rata-rata/modus",
       x = "Jumlah Kendaraan Terlibat", y = "Prediksi Cedera (Injuries)") +
  theme_minimal()

0.11.3 Perbandingan Poisson vs OLS pada log(Y)

# OLS hanya pada observasi Y > 0
data_poi_pos <- data_poi %>% filter(.data[[var_y_poi]] > 0)
formula_log_ols <- as.formula(paste("log(", var_y_poi, ") ~", paste(var_x_poi, collapse = " + ")))
fit_log_ols <- lm(formula_log_ols, data = data_poi_pos)

tibble::tibble(
  Aspek = c("Jenis respons","Nilai nol","Target model","Fungsi link","Interpretasi koef.","Prediksi","Kapan lebih tepat?"),
  `Regresi Poisson` = c(
    "Hitungan non-negatif","Dapat langsung menangani Y = 0",
    "E(Y|X)","log{E(Y|X)} = Xβ","IRR = exp(β)",
    "Selalu non-negatif","Count data, banyak nol, rate kejadian"
  ),
  `OLS pada log(Y)` = c(
    "Kontinu positif atau hitungan besar tanpa nol","Tidak dapat memakai Y = 0 tanpa modifikasi",
    "E(log Y|X)","log(Y) = Xβ + ε","Multiplier pada geometric mean",
    "Perlu retransformation","Y positif, log-scale residual baik"
  )
) %>%
  knitr::kable(caption = "Perbandingan Regresi Poisson vs OLS pada log(Y)") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE)

Perbandingan Regresi Poisson vs OLS pada log(Y)
Aspek	Regresi Poisson	OLS pada log(Y)
Jenis respons	Hitungan non-negatif	Kontinu positif atau hitungan besar tanpa nol
Nilai nol	Dapat langsung menangani Y = 0	Tidak dapat memakai Y = 0 tanpa modifikasi
Target model	E(Y\|X)	E(log Y\|X)
Fungsi link	log{E(Y\|X)} = Xβ	log(Y) = Xβ + ε
Interpretasi koef.	IRR = exp(β)	Multiplier pada geometric mean
Prediksi	Selalu non-negatif	Perlu retransformation
Kapan lebih tepat?	Count data, banyak nol, rate kejadian	Y positif, log-scale residual baik

0.12 Interpretasi Hasil

💡 Interpretasi — Regresi Poisson

Sesuaikan teks berikut dengan IRR dari output kamu:

• IRR (Incidence Rate Ratio): Jika IRR untuk vehicles_involved = 1.35, artinya setiap penambahan 1 kendaraan yang terlibat meningkatkan rata-rata jumlah cedera sebesar 35%.
• IRR < 1: Variabel tersebut berkaitan dengan penurunan rata-rata hitungan.
• IRR = 1: Variabel tidak berpengaruh pada rata-rata hitungan.
• Overdispersion: Jika Dispersion Pearson jauh di atas 1 (misal > 2.5), model Negative Binomial (MASS::glm.nb()) lebih tepat.
• Zero-inflation: Jika proporsi Y = 0 sangat tinggi (> 30%), pertimbangkan model Zero-Inflated Poisson (ZIP).

1 Ringkasan Komparatif Ketiga Model

tibble::tibble(
  `Model`           = c("Logistik Multinomial","Logistik Ordinal","Poisson"),
  `Dataset`         = c("Iris","Student Grade","Traffic Accidents"),
  `Jenis Respons`   = c("Nominal ≥ 3 kat.","Ordinal ≥ 3 kat.","Count (hitungan)"),
  `Fungsi Link`     = c("logit (baseline)","logit (kumulatif)","log"),
  `Fungsi R`        = c("nnet::multinom()","MASS::polr()","glm(..., poisson)"),
  `Interpretasi β`  = c("Rel. Risk Ratio (RRR)","Cumul. Odds Ratio","Inc. Rate Ratio (IRR)"),
  `Asumsi Utama`    = c(
    "IIA (Independence of Irrelevant Alternatives)",
    "Proportional odds",
    "Mean = Variance (equidispersion)"
  )
) %>%
  knitr::kable(caption = "Perbandingan Ketiga Model Regresi Logistik") %>%
  kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"), full_width = TRUE)

Perbandingan Ketiga Model Regresi Logistik
Model	Dataset	Jenis Respons	Fungsi Link	Fungsi R	Interpretasi β	Asumsi Utama
Logistik Multinomial	Iris	Nominal ≥ 3 kat.	logit (baseline)	nnet::multinom()	Rel. Risk Ratio (RRR)	IIA (Independence of Irrelevant Alternatives)
Logistik Ordinal	Student Grade	Ordinal ≥ 3 kat.	logit (kumulatif)	MASS::polr()	Cumul. Odds Ratio	Proportional odds
Poisson	Traffic Accidents	Count (hitungan)	log	glm(…, poisson)	Inc. Rate Ratio (IRR)	Mean = Variance (equidispersion)

📝 Kesimpulan Umum

Pemilihan model regresi yang tepat bergantung pada skala pengukuran variabel respons:

• Gunakan Logistik Multinomial saat Y memiliki 3+ kategori tanpa urutan alami.
• Gunakan Logistik Ordinal saat Y memiliki 3+ kategori dengan urutan alami (Rendah < Sedang < Tinggi).
• Gunakan Regresi Poisson saat Y berupa hitungan non-negatif (0, 1, 2, …).

Pelanggaran asumsi perlu dideteksi dan ditangani: overdispersion pada Poisson → Negative Binomial; pelanggaran parallel lines pada Ordinal → Partial Proportional Odds.

Analisis Regresi Logistik

Multinomial · Ordinal · Poisson

Atthoriq Adrian Setiawan

0.1 Persiapan Data

0.1.1 Muat & Struktur Data

0.1.2 Distribusi Respons

0.2 Estimasi Model

0.2.1 Fitting & Summary

0.2.2 Koefisien, RRR & CI

0.3 Prediksi & Evaluasi

0.3.1 Confusion Matrix & Akurasi

0.3.2 Plot Probabilitas Prediksi

0.4 Interpretasi Hasil

0.5 Persiapan Data

0.5.1 Muat & Struktur Data

0.5.2 Distribusi Respons

0.6 Estimasi Model

0.6.1 Fitting & Summary

0.6.2 Koefisien, OR & CI

0.7 Prediksi & Evaluasi

0.7.1 Confusion Matrix & Akurasi

0.7.2 Probabilitas Prediksi

0.7.3 Uji Asumsi Parallel Lines

0.8 Interpretasi Hasil

0.9 Persiapan Data

0.9.1 Muat & Struktur Data

0.9.2 Distribusi Respons & Zero-Inflation

0.10 Estimasi Model

0.10.1 Fitting & Summary

0.10.2 Koefisien, IRR & CI

0.11 Evaluasi Model

0.11.1 Overdispersion

0.11.2 Plot Prediksi

0.11.3 Perbandingan Poisson vs OLS pada log(Y)

0.12 Interpretasi Hasil

1 Ringkasan Komparatif Ketiga Model