1 Multinomial Logistic Regression: Alligator Food Choice

  1. Data Input

Data dari Tabel 6.1 dimasukkan secara manual.

alligator_data <- data.frame(
  length = c(1.24,1.30,1.30,1.32,1.32,1.40,1.42,1.42,1.45,1.45,
             1.47,1.47,1.47,1.50,1.52,1.55,1.60,1.63,1.63,1.65,
             1.65,1.65,1.65,1.68,1.70,1.73,1.73,1.78,1.78,1.78,
             1.80,1.80,1.85,1.88,1.88,1.93,1.93,1.98,2.03,2.03,
             2.16,2.26,2.31,2.31,2.36,2.36,2.39,2.41,2.44,2.46,
             2.56,2.67,2.72,2.79,2.84,3.25,3.28,3.33,3.56,3.58,
             3.66,3.68,3.71,3.89),
  food = factor(c("I","I","I","F","F","F","I","F","I","O",
                  "I","F","F","I","I","I","I","I","I","O",
                  "I","F","F","F","I","O","O","I","I","O",
                  "I","F","F","I","I","I","I","I","F","F",
                  "F","F","F","F","F","F","F","F","F","F",
                  "O","F","I","F","F","O","O","F","F","F",
                  "F","O","F","F"),
                levels = c("O", "F", "I"))  # response order
)
  1. Eksplorasi Data
summary(alligator_data)
##      length      food  
##  Min.   :1.240   O: 9  
##  1st Qu.:1.587   F:32  
##  Median :1.825   I:23  
##  Mean   :2.099         
##  3rd Qu.:2.417         
##  Max.   :3.890
ggplot(alligator_data, aes(x = length, fill = food)) +
  geom_histogram(binwidth = 0.2, position = "stack", color = "black") +
  labs(title = "Distribusi Panjang Buaya berdasarkan Pilihan Makanan",
       x = "Panjang Buaya (meter)", y = "Jumlah") +
  theme_minimal()

  1. Multinomial Logistic Regression
# Model multinomial (default: baseline = first level, yaitu Invertebrates)
model_mlr <- multinom(food ~ length, data = alligator_data)
## # weights:  9 (4 variable)
## initial  value 70.311186 
## iter  10 value 55.228714
## final  value 55.228598 
## converged
summary(model_mlr)
## Call:
## multinom(formula = food ~ length, data = alligator_data)
## 
## Coefficients:
##   (Intercept)      length
## F    1.329763 -0.02619614
## I    5.223790 -2.22400214
## 
## Std. Errors:
##   (Intercept)    length
## F    1.227764 0.4988134
## I    1.654445 0.8318572
## 
## Residual Deviance: 110.4572 
## AIC: 118.4572
  1. Interpretasi Koefisien
z_values <- summary(model_mlr)$coefficients / summary(model_mlr)$standard.errors
p_values <- 2 * (1 - pnorm(abs(z_values)))

coef_table <- cbind(summary(model_mlr)$coefficients, 
                    "z value" = round(z_values, 2), 
                    "p value" = round(p_values, 4))

Perhatikan ada 3 kategori pada variabel respons food:

  • I = Invertebrata

  • F = Fish

  • O = Other (baseline / referensi)

Maka model membangun dua model logit terhadap baseline (O), yaitu:

  1. Logit untuk I terhadap O:

\[ \log\left( \frac{P(\text{I})}{P(\text{O})} \right) = \beta_0^{(I)} + \beta_1^{(I)} \cdot \text{length} \]

  1. Logit untuk F terhadap O:

\[ \log\left( \frac{P(\text{F})}{P(\text{O})} \right) = \beta_0^{(F)} + \beta_1^{(F)} \cdot \text{length} \]

Jadi untuk setiap kategori selain referensi (O), kita dapatkan:

  • Intercept

  • Koefisien untuk length

  • Z-value dan p-value untuk masing-masing koefisien

Berikut adalah hasil estimasi model multinom(food ~ length):

Koefisien Model Multinomial Logistic Regression
Kategori Intercept Length Z..Intercept. Z..Length. p..Intercept. p..Length.
I 5.22 -2.220 3.16 -2.67 0.0016 0.0075
F 1.33 -0.027 1.08 -0.05 0.2788 0.9581

Interpretasi

Untuk Kategori F: Koefisien length = -0.027, p-value = 0.9581 → tidak signifikan.

Tidak ada bukti kuat bahwa panjang buaya memengaruhi peluang memilih Fish dibanding Other.

Untuk Kategori I: Koefisien length = -2.22 → semakin panjang buaya, semakin kecil probabilitas memilih invertebrata dibanding other (O).

p-value = 0.0075 → signifikan, artinya panjang berpengaruh terhadap preferensi I vs O.

📊 Interpretasi dalam Odds Ratio

🔹 Kategori F (Fish vs Other)

  • Koefisien length = −0.027

\[ \text{OR} = e^{-0.027} \approx 0.973 \]

➡️ Interpretasi:
Setiap kenaikan 1 meter pada panjang buaya, peluang memilih fish dibanding other hanya turun sekitar 2.7%,
yang tidak signifikan secara statistik (p-value = 0.9581).

🔹 Kategori I (Invertebrata vs Other)

  • Koefisien length = −2.22

\[ \text{OR} = e^{-2.22} \approx 0.108 \]

➡️ Interpretasi:
Setiap kenaikan 1 meter pada panjang buaya, peluang (odds) buaya memilih invertebrata dibanding other menurun sekitar 89.2%
(karena \(1 - 0.108 \approx 0.892\)).

  • Karena p-value = 0.0075, efek ini signifikan secara statistik → panjang buaya memang memengaruhi peluang memilih invertebrata dibanding other.

✅ Ringkasan

Ringkasan Interpretasi Odds Ratio
Perbandingan Koef..Length Odds.Ratio Interpretasi Signifikansi
I vs O (Other) -2.220 0.109 Peluang memilih I turun drastis saat panjang meningkat Signifikan (p = 0.0075)
F vs O (Other) -0.027 0.973 Tidak ada perubahan berarti pada peluang memilih F Tidak signifikan (p = 0.9581)
  1. Prediksi dan Visualisasi
length_seq <- data.frame(length = seq(1.2, 4.0, by = 0.05))
predicted_probs <- predict(model_mlr, newdata = length_seq, type = "probs")

pred_df <- cbind(length_seq, predicted_probs)
pred_long <- tidyr::pivot_longer(pred_df, cols = -length, names_to = "food", values_to = "prob")

ggplot(pred_long, aes(x = length, y = prob, color = food)) +
  geom_line(size = 1.2) +
  labs(title = "Probabilitas Pilihan Makanan berdasarkan Panjang Buaya",
       x = "Panjang Buaya (meter)", y = "Probabilitas") +
  theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

🔍 Penghitungan G² (Likelihood Ratio Test)

📊 Definisi

  • (Deviance) dihitung sebagai:

\[ G^2 = -2 \left( \log L_{\text{null}} - \log L_{\text{full}} \right) \]

  • McFadden’s Pseudo R² dihitung dengan rumus:

\[ R^2 = 1 - \frac{\log L_{\text{full}}}{\log L_{\text{null}}} \]

Keterangan:

  • \(L_{\text{null}}\): log-likelihood dari model null (hanya intercept)

  • \(L_{\text{full}}\): log-likelihood dari model dengan prediktor

  • \(G^2\): mengukur peningkatan kecocokan model penuh dibanding model null

  • \(R^2\): memberikan ukuran proporsi informasi yang dijelaskan oleh model

Kita bandingkan model null vs model penuh:

  • Model null: hanya intercept, tanpa prediktor

  • Model penuh: menggunakan prediktor length

library(nnet)

# Data (dari sebelumnya)
alligator_data <- data.frame(
  length = c(1.24,1.30,1.30,1.32,1.32,1.40,1.42,1.42,1.45,1.45,
             1.47,1.47,1.47,1.50,1.52,1.55,1.60,1.63,1.63,1.65,
             1.65,1.65,1.65,1.68,1.70,1.73,1.73,1.78,1.78,1.78,
             1.80,1.80,1.85,1.88,1.88,1.93,1.93,1.98,2.03,2.03,
             2.16,2.26,2.31,2.31,2.36,2.36,2.39,2.41,2.44,2.46,
             2.56,2.67,2.72,2.79,2.84,3.25,3.28,3.33,3.56,3.58,
             3.66,3.68,3.71,3.89),
  food = factor(c("I","I","I","F","F","F","I","F","I","O",
                  "I","F","F","I","I","I","I","I","I","O",
                  "I","F","F","F","I","O","O","I","I","O",
                  "I","F","F","I","I","I","I","I","F","F",
                  "F","F","F","F","F","F","F","F","F","F",
                  "O","F","I","F","F","O","O","F","F","F",
                  "F","O","F","F"),
                levels = c("O", "F", "I"))
)

# Model null (tanpa prediktor)
model_null <- multinom(food ~ 1, data = alligator_data, trace = FALSE)

# Model penuh (dengan prediktor length)
model_full <- multinom(food ~ length, data = alligator_data, trace = FALSE)

# Log-likelihood masing-masing
LL_null <- logLik(model_null)
LL_full <- logLik(model_full)

# G² = -2 (LL_null - LL_full)
G2 <- -2 * (as.numeric(LL_null) - as.numeric(LL_full))

# Derajat kebebasan = jumlah tambahan parameter
df <- attr(LL_full, "df") - attr(LL_null, "df")

# p-value
p_value <- 1 - pchisq(G2, df)

# Output
cat("Nilai G² =", round(G2, 4), "\n")
## Nilai G² = 16.29
cat("Derajat kebebasan =", df, "\n")
## Derajat kebebasan = 2
cat("p-value =", round(p_value, 4), "\n")
## p-value = 3e-04

2 Multinomial Logistic Regression: Belief in Afterlife

  1. Data: Belief in Afterlife
# Buat tabel sesuai dengan Tabel 6.2 Agresti
belief_data <- tribble(
  ~Race, ~Gender, ~Belief,     ~Count,
  "White", "Female", "Yes",      371,
  "White", "Female", "Undecided",49,
  "White", "Female", "No",       74,
  "White", "Male",   "Yes",      250,
  "White", "Male",   "Undecided",45,
  "White", "Male",   "No",       71,
  "Black", "Female", "Yes",      64,
  "Black", "Female", "Undecided",9,
  "Black", "Female", "No",       15,
  "Black", "Male",   "Yes",      25,
  "Black", "Male",   "Undecided",5,
  "Black", "Male",   "No",       13
)

# Ubah ke format faktor
belief_data <- belief_data %>%
  mutate(
    Belief = factor(Belief, levels = c("Yes", "Undecided", "No")),
    Gender = factor(Gender),
    Race = factor(Race)
  )
  1. Ekspansi Data (untuk multinom())
# Perluas data sesuai Count
expanded_data <- belief_data %>%
  uncount(weights = Count)

head(expanded_data)
## # A tibble: 6 × 3
##   Race  Gender Belief
##   <fct> <fct>  <fct> 
## 1 White Female Yes   
## 2 White Female Yes   
## 3 White Female Yes   
## 4 White Female Yes   
## 5 White Female Yes   
## 6 White Female Yes
  1. Model Multinomial Logistic Regression
# Model multinomial: Belief sebagai response
model_mlr <- multinom(Belief ~ Race + Gender, data = expanded_data, trace = FALSE)

summary(model_mlr)
## Call:
## multinom(formula = Belief ~ Race + Gender, data = expanded_data, 
##     trace = FALSE)
## 
## Coefficients:
##           (Intercept)   RaceWhite GenderMale
## Undecided   -1.954553 -0.07078817  0.3134797
## No          -1.301607 -0.34176952  0.4185511
## 
## Std. Errors:
##           (Intercept) RaceWhite GenderMale
## Undecided   0.2974392 0.3091936  0.2083285
## No          0.2264955 0.2370377  0.1712550
## 
## Residual Deviance: 1547.453 
## AIC: 1559.453
  1. Interpretasi Koefisien dan p-value
# Z dan p-value
z_values <- summary(model_mlr)$coefficients / summary(model_mlr)$standard.errors
p_values <- 2 * (1 - pnorm(abs(z_values)))

coef_table <- cbind(summary(model_mlr)$coefficients,
                    "z value" = round(z_values, 2),
                    "p value" = round(p_values, 4))

knitr::kable(coef_table, caption = "Koefisien Model Multinomial Logistic Regression")
Koefisien Model Multinomial Logistic Regression
(Intercept) RaceWhite GenderMale (Intercept) RaceWhite GenderMale (Intercept) RaceWhite GenderMale
Undecided -1.954553 -0.0707882 0.3134797 -6.57 -0.23 1.50 0 0.8189 0.1324
No -1.301607 -0.3417695 0.4185511 -5.75 -1.44 2.44 0 0.1493 0.0145
  1. Interpretasi Umum
  • Referensi kategori untuk Belief adalah “Yes” (karena muncul pertama).

  • Koefisien pada model menunjukkan perubahan log-odds memilih “Undecided” atau “No” dibanding “Yes” sebagai fungsi dari Gender dan Race.

  1. Contoh Prediksi
# Prediksi probabilitas
new_data <- expand.grid(
  Race = levels(expanded_data$Race),
  Gender = levels(expanded_data$Gender)
)

new_data$prob <- predict(model_mlr, newdata = new_data, type = "probs")

# Gabungkan hasil
cbind(new_data, predict(model_mlr, newdata = new_data, type = "probs"))
##    Race Gender   prob.Yes prob.Undecided    prob.No       Yes  Undecided
## 1 Black Female 0.70735270     0.10018080 0.19246651 0.7073527 0.10018080
## 2 White Female 0.75456040     0.09956337 0.14587623 0.7545604 0.09956337
## 3 Black   Male 0.62216559     0.12055825 0.25727616 0.6221656 0.12055825
## 4 White   Male 0.67827036     0.12244777 0.19928187 0.6782704 0.12244777
##          No
## 1 0.1924665
## 2 0.1458762
## 3 0.2572762
## 4 0.1992819

📌 Catatan Jika ingin menambahkan interaction (Race * Gender), Anda bisa ubah formula menjadi:

model_mlr <- multinom(Belief ~ Race * Gender, data = expanded_data)
## # weights:  15 (8 variable)
## initial  value 1088.724778 
## iter  10 value 774.603183
## final  value 773.299583 
## converged

Penghitungan G² dan McFadden Pseudo R²

# Model null: hanya intercept
model_null <- multinom(Belief ~ 1, data = expanded_data, trace = FALSE)

# Log-likelihood
ll_null <- logLik(model_null)
ll_full <- logLik(model_mlr)

# G2 (deviance)
G2 <- -2 * (as.numeric(ll_null) - as.numeric(ll_full))

# Derajat kebebasan
df <- attr(ll_full, "df") - attr(ll_null, "df")

# p-value
pval <- 1 - pchisq(G2, df)

# McFadden's pseudo R2
pseudo_r2 <- 1 - (as.numeric(ll_full) / as.numeric(ll_null))

# Tampilkan hasil
cat("Nilai G² (Deviance):", round(G2, 4), "\n")
## Nilai G² (Deviance): 9.5975
cat("Derajat Kebebasan:", df, "\n")
## Derajat Kebebasan: 6
cat("p-value:", round(pval, 4), "\n")
## p-value: 0.1427
cat("McFadden's Pseudo R²:", round(pseudo_r2, 4), "\n")
## McFadden's Pseudo R²: 0.0062

3 Ordinal Logistic Regression: Ideologi Politik dan Partai

library(MASS)  # Untuk polr
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
library(dplyr)
library(tidyr)
library(ggplot2)
  1. Input Data
# Data dari Tabel 6.3 Agresti
ideology_data <- tribble(
  ~Gender, ~Party, ~Ideology, ~Count,
  "Female", "Democrat",    "Very Liberal",       25,
  "Female", "Democrat",    "Slightly Liberal",  105,
  "Female", "Democrat",    "Moderate",           86,
  "Female", "Democrat",    "Slightly Conservative", 28,
  "Female", "Democrat",    "Very Conservative",   4,
  "Female", "Republican",  "Very Liberal",        0,
  "Female", "Republican",  "Slightly Liberal",    5,
  "Female", "Republican",  "Moderate",           15,
  "Female", "Republican",  "Slightly Conservative", 83,
  "Female", "Republican",  "Very Conservative",  32,
  "Male",   "Democrat",    "Very Liberal",       20,
  "Male",   "Democrat",    "Slightly Liberal",   73,
  "Male",   "Democrat",    "Moderate",           43,
  "Male",   "Democrat",    "Slightly Conservative", 20,
  "Male",   "Democrat",    "Very Conservative",   3,
  "Male",   "Republican",  "Very Liberal",        0,
  "Male",   "Republican",  "Slightly Liberal",    1,
  "Male",   "Republican",  "Moderate",           14,
  "Male",   "Republican",  "Slightly Conservative", 72,
  "Male",   "Republican",  "Very Conservative",  32
)


# Pastikan urutan faktor ordinal sesuai spektrum ideologi
ideology_data <- ideology_data %>%
  mutate(
    Ideology = factor(Ideology, 
                      levels = c("Very Liberal", "Slightly Liberal", "Moderate", 
                                 "Slightly Conservative", "Very Conservative"),
                      ordered = TRUE),
    Gender = factor(Gender),
    Party = factor(Party)
  )
  1. Ekspansi Data
# Perluas berdasarkan Count
expanded_data <- uncount(ideology_data, weights = Count)
head(expanded_data)
## # A tibble: 6 × 3
##   Gender Party    Ideology    
##   <fct>  <fct>    <ord>       
## 1 Female Democrat Very Liberal
## 2 Female Democrat Very Liberal
## 3 Female Democrat Very Liberal
## 4 Female Democrat Very Liberal
## 5 Female Democrat Very Liberal
## 6 Female Democrat Very Liberal
  1. Fit Model: Ordinal Logistic Regression
# Model proportional odds menggunakan polr
model_ordinal <- polr(Ideology ~ Gender + Party, data = expanded_data, Hess = TRUE)

summary(model_ordinal)
## Call:
## polr(formula = Ideology ~ Gender + Party, data = expanded_data, 
##     Hess = TRUE)
## 
## Coefficients:
##                    Value Std. Error t value
## GenderMale      -0.04731     0.1499 -0.3157
## PartyRepublican  3.63365     0.2175 16.7037
## 
## Intercepts:
##                                         Value    Std. Error t value 
## Very Liberal|Slightly Liberal            -2.1223   0.1692   -12.5461
## Slightly Liberal|Moderate                 0.1689   0.1143     1.4778
## Moderate|Slightly Conservative            1.8572   0.1511    12.2891
## Slightly Conservative|Very Conservative   4.6500   0.2343    19.8456
## 
## Residual Deviance: 1555.786 
## AIC: 1567.786
  1. Uji Signifikansi dan p-value
# Hitung z dan p-value
coefs <- coef(summary(model_ordinal))
p_values <- pnorm(abs(coefs[, "t value"]), lower.tail = FALSE) * 2
cbind(coefs, "p value" = round(p_values, 4)) %>%
  knitr::kable(caption = "Koefisien dan p-value dari Model Regresi Logistik Ordinal")
Koefisien dan p-value dari Model Regresi Logistik Ordinal
Value Std. Error t value p value
GenderMale -0.0473126 0.1498822 -0.315665 0.7523
PartyRepublican 3.6336493 0.2175355 16.703705 0.0000
Very Liberal|Slightly Liberal -2.1223326 0.1691632 -12.546067 0.0000
Slightly Liberal|Moderate 0.1689125 0.1143027 1.477764 0.1395
Moderate|Slightly Conservative 1.8571511 0.1511213 12.289140 0.0000
Slightly Conservative|Very Conservative 4.6500442 0.2343111 19.845599 0.0000
  1. Interpretasi Awal

Model ini mengasumsikan efek linier kumulatif dari prediktor terhadap tingkat ideologi politik yang ordinal.

  • Koefisien positif → kecenderungan ke arah konservatif.

  • Koefisien negatif → kecenderungan ke arah liberal.

  1. Prediksi Probabilitas (Opsional)
new_data <- expand.grid(Gender = levels(expanded_data$Gender),
                        Party = levels(expanded_data$Party))

# Prediksi probabilitas
predict_probs <- predict(model_ordinal, newdata = new_data, type = "probs")
cbind(new_data, predict_probs)
##   Gender      Party Very Liberal Slightly Liberal  Moderate
## 1 Female   Democrat  0.106945088       0.43518292 0.3228365
## 2   Male   Democrat  0.111548559       0.44229808 0.3165493
## 3 Female Republican  0.003153821       0.02717858 0.1144037
## 4   Male Republican  0.003306117       0.02844921 0.1189364
##   Slightly Conservative Very Conservative
## 1             0.1255648       0.009470629
## 2             0.1205672       0.009036939
## 3             0.5895337       0.265730233
## 4             0.5927066       0.256601598