Data dari Tabel 6.1 dimasukkan secara manual.
alligator_data <- data.frame(
length = c(1.24,1.30,1.30,1.32,1.32,1.40,1.42,1.42,1.45,1.45,
1.47,1.47,1.47,1.50,1.52,1.55,1.60,1.63,1.63,1.65,
1.65,1.65,1.65,1.68,1.70,1.73,1.73,1.78,1.78,1.78,
1.80,1.80,1.85,1.88,1.88,1.93,1.93,1.98,2.03,2.03,
2.16,2.26,2.31,2.31,2.36,2.36,2.39,2.41,2.44,2.46,
2.56,2.67,2.72,2.79,2.84,3.25,3.28,3.33,3.56,3.58,
3.66,3.68,3.71,3.89),
food = factor(c("I","I","I","F","F","F","I","F","I","O",
"I","F","F","I","I","I","I","I","I","O",
"I","F","F","F","I","O","O","I","I","O",
"I","F","F","I","I","I","I","I","F","F",
"F","F","F","F","F","F","F","F","F","F",
"O","F","I","F","F","O","O","F","F","F",
"F","O","F","F"),
levels = c("O", "F", "I")) # response order
)
summary(alligator_data)
## length food
## Min. :1.240 O: 9
## 1st Qu.:1.587 F:32
## Median :1.825 I:23
## Mean :2.099
## 3rd Qu.:2.417
## Max. :3.890
ggplot(alligator_data, aes(x = length, fill = food)) +
geom_histogram(binwidth = 0.2, position = "stack", color = "black") +
labs(title = "Distribusi Panjang Buaya berdasarkan Pilihan Makanan",
x = "Panjang Buaya (meter)", y = "Jumlah") +
theme_minimal()
# Model multinomial (default: baseline = first level, yaitu Invertebrates)
model_mlr <- multinom(food ~ length, data = alligator_data)
## # weights: 9 (4 variable)
## initial value 70.311186
## iter 10 value 55.228714
## final value 55.228598
## converged
summary(model_mlr)
## Call:
## multinom(formula = food ~ length, data = alligator_data)
##
## Coefficients:
## (Intercept) length
## F 1.329763 -0.02619614
## I 5.223790 -2.22400214
##
## Std. Errors:
## (Intercept) length
## F 1.227764 0.4988134
## I 1.654445 0.8318572
##
## Residual Deviance: 110.4572
## AIC: 118.4572
z_values <- summary(model_mlr)$coefficients / summary(model_mlr)$standard.errors
p_values <- 2 * (1 - pnorm(abs(z_values)))
coef_table <- cbind(summary(model_mlr)$coefficients,
"z value" = round(z_values, 2),
"p value" = round(p_values, 4))
Perhatikan ada 3 kategori pada variabel respons
food
:
I = Invertebrata
F = Fish
O = Other (baseline / referensi)
Maka model membangun dua model logit terhadap baseline (O), yaitu:
\[ \log\left( \frac{P(\text{I})}{P(\text{O})} \right) = \beta_0^{(I)} + \beta_1^{(I)} \cdot \text{length} \]
\[ \log\left( \frac{P(\text{F})}{P(\text{O})} \right) = \beta_0^{(F)} + \beta_1^{(F)} \cdot \text{length} \]
Jadi untuk setiap kategori selain referensi (O), kita dapatkan:
Intercept
Koefisien untuk length
Z-value dan p-value untuk masing-masing koefisien
Berikut adalah hasil estimasi model
multinom(food ~ length)
:
Kategori | Intercept | Length | Z..Intercept. | Z..Length. | p..Intercept. | p..Length. |
---|---|---|---|---|---|---|
I | 5.22 | -2.220 | 3.16 | -2.67 | 0.0016 | 0.0075 |
F | 1.33 | -0.027 | 1.08 | -0.05 | 0.2788 | 0.9581 |
Interpretasi
Untuk Kategori F: Koefisien length = -0.027, p-value = 0.9581 → tidak signifikan.
Tidak ada bukti kuat bahwa panjang buaya memengaruhi peluang memilih Fish dibanding Other.
Untuk Kategori I: Koefisien length = -2.22 → semakin panjang buaya, semakin kecil probabilitas memilih invertebrata dibanding other (O).
p-value = 0.0075 → signifikan, artinya panjang berpengaruh terhadap preferensi I vs O.
📊 Interpretasi dalam Odds Ratio
🔹 Kategori F (Fish vs Other)
length
= −0.027\[ \text{OR} = e^{-0.027} \approx 0.973 \]
➡️ Interpretasi:
Setiap kenaikan 1 meter pada panjang buaya, peluang
memilih fish dibanding other hanya turun sekitar
2.7%,
yang tidak signifikan secara statistik (p-value =
0.9581).
🔹 Kategori I (Invertebrata vs Other)
length
= −2.22\[ \text{OR} = e^{-2.22} \approx 0.108 \]
➡️ Interpretasi:
Setiap kenaikan 1 meter pada panjang buaya, peluang
(odds) buaya memilih invertebrata dibanding other
menurun sekitar 89.2%
(karena \(1 - 0.108 \approx
0.892\)).
✅ Ringkasan
Perbandingan | Koef..Length | Odds.Ratio | Interpretasi | Signifikansi |
---|---|---|---|---|
I vs O (Other) | -2.220 | 0.109 | Peluang memilih I turun drastis saat panjang meningkat | Signifikan (p = 0.0075) |
F vs O (Other) | -0.027 | 0.973 | Tidak ada perubahan berarti pada peluang memilih F | Tidak signifikan (p = 0.9581) |
length_seq <- data.frame(length = seq(1.2, 4.0, by = 0.05))
predicted_probs <- predict(model_mlr, newdata = length_seq, type = "probs")
pred_df <- cbind(length_seq, predicted_probs)
pred_long <- tidyr::pivot_longer(pred_df, cols = -length, names_to = "food", values_to = "prob")
ggplot(pred_long, aes(x = length, y = prob, color = food)) +
geom_line(size = 1.2) +
labs(title = "Probabilitas Pilihan Makanan berdasarkan Panjang Buaya",
x = "Panjang Buaya (meter)", y = "Probabilitas") +
theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
🔍 Penghitungan G² (Likelihood Ratio Test)
📊 Definisi
\[ G^2 = -2 \left( \log L_{\text{null}} - \log L_{\text{full}} \right) \]
\[ R^2 = 1 - \frac{\log L_{\text{full}}}{\log L_{\text{null}}} \]
Keterangan:
\(L_{\text{null}}\): log-likelihood dari model null (hanya intercept)
\(L_{\text{full}}\): log-likelihood dari model dengan prediktor
\(G^2\): mengukur peningkatan kecocokan model penuh dibanding model null
\(R^2\): memberikan ukuran proporsi informasi yang dijelaskan oleh model
Kita bandingkan model null vs model penuh:
Model null: hanya intercept, tanpa prediktor
Model penuh: menggunakan prediktor
length
library(nnet)
# Data (dari sebelumnya)
alligator_data <- data.frame(
length = c(1.24,1.30,1.30,1.32,1.32,1.40,1.42,1.42,1.45,1.45,
1.47,1.47,1.47,1.50,1.52,1.55,1.60,1.63,1.63,1.65,
1.65,1.65,1.65,1.68,1.70,1.73,1.73,1.78,1.78,1.78,
1.80,1.80,1.85,1.88,1.88,1.93,1.93,1.98,2.03,2.03,
2.16,2.26,2.31,2.31,2.36,2.36,2.39,2.41,2.44,2.46,
2.56,2.67,2.72,2.79,2.84,3.25,3.28,3.33,3.56,3.58,
3.66,3.68,3.71,3.89),
food = factor(c("I","I","I","F","F","F","I","F","I","O",
"I","F","F","I","I","I","I","I","I","O",
"I","F","F","F","I","O","O","I","I","O",
"I","F","F","I","I","I","I","I","F","F",
"F","F","F","F","F","F","F","F","F","F",
"O","F","I","F","F","O","O","F","F","F",
"F","O","F","F"),
levels = c("O", "F", "I"))
)
# Model null (tanpa prediktor)
model_null <- multinom(food ~ 1, data = alligator_data, trace = FALSE)
# Model penuh (dengan prediktor length)
model_full <- multinom(food ~ length, data = alligator_data, trace = FALSE)
# Log-likelihood masing-masing
LL_null <- logLik(model_null)
LL_full <- logLik(model_full)
# G² = -2 (LL_null - LL_full)
G2 <- -2 * (as.numeric(LL_null) - as.numeric(LL_full))
# Derajat kebebasan = jumlah tambahan parameter
df <- attr(LL_full, "df") - attr(LL_null, "df")
# p-value
p_value <- 1 - pchisq(G2, df)
# Output
cat("Nilai G² =", round(G2, 4), "\n")
## Nilai G² = 16.29
cat("Derajat kebebasan =", df, "\n")
## Derajat kebebasan = 2
cat("p-value =", round(p_value, 4), "\n")
## p-value = 3e-04
# Buat tabel sesuai dengan Tabel 6.2 Agresti
belief_data <- tribble(
~Race, ~Gender, ~Belief, ~Count,
"White", "Female", "Yes", 371,
"White", "Female", "Undecided",49,
"White", "Female", "No", 74,
"White", "Male", "Yes", 250,
"White", "Male", "Undecided",45,
"White", "Male", "No", 71,
"Black", "Female", "Yes", 64,
"Black", "Female", "Undecided",9,
"Black", "Female", "No", 15,
"Black", "Male", "Yes", 25,
"Black", "Male", "Undecided",5,
"Black", "Male", "No", 13
)
# Ubah ke format faktor
belief_data <- belief_data %>%
mutate(
Belief = factor(Belief, levels = c("Yes", "Undecided", "No")),
Gender = factor(Gender),
Race = factor(Race)
)
# Perluas data sesuai Count
expanded_data <- belief_data %>%
uncount(weights = Count)
head(expanded_data)
## # A tibble: 6 × 3
## Race Gender Belief
## <fct> <fct> <fct>
## 1 White Female Yes
## 2 White Female Yes
## 3 White Female Yes
## 4 White Female Yes
## 5 White Female Yes
## 6 White Female Yes
# Model multinomial: Belief sebagai response
model_mlr <- multinom(Belief ~ Race + Gender, data = expanded_data, trace = FALSE)
summary(model_mlr)
## Call:
## multinom(formula = Belief ~ Race + Gender, data = expanded_data,
## trace = FALSE)
##
## Coefficients:
## (Intercept) RaceWhite GenderMale
## Undecided -1.954553 -0.07078817 0.3134797
## No -1.301607 -0.34176952 0.4185511
##
## Std. Errors:
## (Intercept) RaceWhite GenderMale
## Undecided 0.2974392 0.3091936 0.2083285
## No 0.2264955 0.2370377 0.1712550
##
## Residual Deviance: 1547.453
## AIC: 1559.453
# Z dan p-value
z_values <- summary(model_mlr)$coefficients / summary(model_mlr)$standard.errors
p_values <- 2 * (1 - pnorm(abs(z_values)))
coef_table <- cbind(summary(model_mlr)$coefficients,
"z value" = round(z_values, 2),
"p value" = round(p_values, 4))
knitr::kable(coef_table, caption = "Koefisien Model Multinomial Logistic Regression")
(Intercept) | RaceWhite | GenderMale | (Intercept) | RaceWhite | GenderMale | (Intercept) | RaceWhite | GenderMale | |
---|---|---|---|---|---|---|---|---|---|
Undecided | -1.954553 | -0.0707882 | 0.3134797 | -6.57 | -0.23 | 1.50 | 0 | 0.8189 | 0.1324 |
No | -1.301607 | -0.3417695 | 0.4185511 | -5.75 | -1.44 | 2.44 | 0 | 0.1493 | 0.0145 |
Referensi kategori untuk Belief adalah “Yes” (karena muncul pertama).
Koefisien pada model menunjukkan perubahan log-odds memilih “Undecided” atau “No” dibanding “Yes” sebagai fungsi dari Gender dan Race.
# Prediksi probabilitas
new_data <- expand.grid(
Race = levels(expanded_data$Race),
Gender = levels(expanded_data$Gender)
)
new_data$prob <- predict(model_mlr, newdata = new_data, type = "probs")
# Gabungkan hasil
cbind(new_data, predict(model_mlr, newdata = new_data, type = "probs"))
## Race Gender prob.Yes prob.Undecided prob.No Yes Undecided
## 1 Black Female 0.70735270 0.10018080 0.19246651 0.7073527 0.10018080
## 2 White Female 0.75456040 0.09956337 0.14587623 0.7545604 0.09956337
## 3 Black Male 0.62216559 0.12055825 0.25727616 0.6221656 0.12055825
## 4 White Male 0.67827036 0.12244777 0.19928187 0.6782704 0.12244777
## No
## 1 0.1924665
## 2 0.1458762
## 3 0.2572762
## 4 0.1992819
📌 Catatan Jika ingin menambahkan interaction (Race * Gender), Anda bisa ubah formula menjadi:
model_mlr <- multinom(Belief ~ Race * Gender, data = expanded_data)
## # weights: 15 (8 variable)
## initial value 1088.724778
## iter 10 value 774.603183
## final value 773.299583
## converged
Penghitungan G² dan McFadden Pseudo R²
# Model null: hanya intercept
model_null <- multinom(Belief ~ 1, data = expanded_data, trace = FALSE)
# Log-likelihood
ll_null <- logLik(model_null)
ll_full <- logLik(model_mlr)
# G2 (deviance)
G2 <- -2 * (as.numeric(ll_null) - as.numeric(ll_full))
# Derajat kebebasan
df <- attr(ll_full, "df") - attr(ll_null, "df")
# p-value
pval <- 1 - pchisq(G2, df)
# McFadden's pseudo R2
pseudo_r2 <- 1 - (as.numeric(ll_full) / as.numeric(ll_null))
# Tampilkan hasil
cat("Nilai G² (Deviance):", round(G2, 4), "\n")
## Nilai G² (Deviance): 9.5975
cat("Derajat Kebebasan:", df, "\n")
## Derajat Kebebasan: 6
cat("p-value:", round(pval, 4), "\n")
## p-value: 0.1427
cat("McFadden's Pseudo R²:", round(pseudo_r2, 4), "\n")
## McFadden's Pseudo R²: 0.0062
library(MASS) # Untuk polr
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
library(dplyr)
library(tidyr)
library(ggplot2)
# Data dari Tabel 6.3 Agresti
ideology_data <- tribble(
~Gender, ~Party, ~Ideology, ~Count,
"Female", "Democrat", "Very Liberal", 25,
"Female", "Democrat", "Slightly Liberal", 105,
"Female", "Democrat", "Moderate", 86,
"Female", "Democrat", "Slightly Conservative", 28,
"Female", "Democrat", "Very Conservative", 4,
"Female", "Republican", "Very Liberal", 0,
"Female", "Republican", "Slightly Liberal", 5,
"Female", "Republican", "Moderate", 15,
"Female", "Republican", "Slightly Conservative", 83,
"Female", "Republican", "Very Conservative", 32,
"Male", "Democrat", "Very Liberal", 20,
"Male", "Democrat", "Slightly Liberal", 73,
"Male", "Democrat", "Moderate", 43,
"Male", "Democrat", "Slightly Conservative", 20,
"Male", "Democrat", "Very Conservative", 3,
"Male", "Republican", "Very Liberal", 0,
"Male", "Republican", "Slightly Liberal", 1,
"Male", "Republican", "Moderate", 14,
"Male", "Republican", "Slightly Conservative", 72,
"Male", "Republican", "Very Conservative", 32
)
# Pastikan urutan faktor ordinal sesuai spektrum ideologi
ideology_data <- ideology_data %>%
mutate(
Ideology = factor(Ideology,
levels = c("Very Liberal", "Slightly Liberal", "Moderate",
"Slightly Conservative", "Very Conservative"),
ordered = TRUE),
Gender = factor(Gender),
Party = factor(Party)
)
# Perluas berdasarkan Count
expanded_data <- uncount(ideology_data, weights = Count)
head(expanded_data)
## # A tibble: 6 × 3
## Gender Party Ideology
## <fct> <fct> <ord>
## 1 Female Democrat Very Liberal
## 2 Female Democrat Very Liberal
## 3 Female Democrat Very Liberal
## 4 Female Democrat Very Liberal
## 5 Female Democrat Very Liberal
## 6 Female Democrat Very Liberal
# Model proportional odds menggunakan polr
model_ordinal <- polr(Ideology ~ Gender + Party, data = expanded_data, Hess = TRUE)
summary(model_ordinal)
## Call:
## polr(formula = Ideology ~ Gender + Party, data = expanded_data,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## GenderMale -0.04731 0.1499 -0.3157
## PartyRepublican 3.63365 0.2175 16.7037
##
## Intercepts:
## Value Std. Error t value
## Very Liberal|Slightly Liberal -2.1223 0.1692 -12.5461
## Slightly Liberal|Moderate 0.1689 0.1143 1.4778
## Moderate|Slightly Conservative 1.8572 0.1511 12.2891
## Slightly Conservative|Very Conservative 4.6500 0.2343 19.8456
##
## Residual Deviance: 1555.786
## AIC: 1567.786
# Hitung z dan p-value
coefs <- coef(summary(model_ordinal))
p_values <- pnorm(abs(coefs[, "t value"]), lower.tail = FALSE) * 2
cbind(coefs, "p value" = round(p_values, 4)) %>%
knitr::kable(caption = "Koefisien dan p-value dari Model Regresi Logistik Ordinal")
Value | Std. Error | t value | p value | |
---|---|---|---|---|
GenderMale | -0.0473126 | 0.1498822 | -0.315665 | 0.7523 |
PartyRepublican | 3.6336493 | 0.2175355 | 16.703705 | 0.0000 |
Very Liberal|Slightly Liberal | -2.1223326 | 0.1691632 | -12.546067 | 0.0000 |
Slightly Liberal|Moderate | 0.1689125 | 0.1143027 | 1.477764 | 0.1395 |
Moderate|Slightly Conservative | 1.8571511 | 0.1511213 | 12.289140 | 0.0000 |
Slightly Conservative|Very Conservative | 4.6500442 | 0.2343111 | 19.845599 | 0.0000 |
Model ini mengasumsikan efek linier kumulatif dari prediktor terhadap tingkat ideologi politik yang ordinal.
Koefisien positif → kecenderungan ke arah konservatif.
Koefisien negatif → kecenderungan ke arah liberal.
new_data <- expand.grid(Gender = levels(expanded_data$Gender),
Party = levels(expanded_data$Party))
# Prediksi probabilitas
predict_probs <- predict(model_ordinal, newdata = new_data, type = "probs")
cbind(new_data, predict_probs)
## Gender Party Very Liberal Slightly Liberal Moderate
## 1 Female Democrat 0.106945088 0.43518292 0.3228365
## 2 Male Democrat 0.111548559 0.44229808 0.3165493
## 3 Female Republican 0.003153821 0.02717858 0.1144037
## 4 Male Republican 0.003306117 0.02844921 0.1189364
## Slightly Conservative Very Conservative
## 1 0.1255648 0.009470629
## 2 0.1205672 0.009036939
## 3 0.5895337 0.265730233
## 4 0.5927066 0.256601598