Dokumen ini membahas dua jenis regresi logistik:
Multinomial Logistic Regression, digunakan untuk memprediksi variabel kategorikal dengan lebih dari dua kelas tanpa urutan.
Ordinal Logistic Regression, digunakan saat variabel kategorikal memiliki urutan tingkat (ordinal).
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'tibble' was built under R version 4.4.3
## Warning: package 'tidyr' was built under R version 4.4.3
## Warning: package 'readr' was built under R version 4.4.3
## Warning: package 'purrr' was built under R version 4.4.3
## Warning: package 'dplyr' was built under R version 4.4.3
## Warning: package 'stringr' was built under R version 4.4.3
## Warning: package 'forcats' was built under R version 4.4.3
## Warning: package 'lubridate' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
fish <- read.csv("C:/Users/HP 430 G5/OneDrive/Dokumen/Semester 4/Analisis Multivariat/Fish.csv")
fish$Species <- as.factor(fish$Species)
str(fish)
## 'data.frame': 159 obs. of 7 variables:
## $ Species: Factor w/ 7 levels "Bream","Parkki",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Weight : num 242 290 340 363 430 450 500 390 450 500 ...
## $ Length1: num 23.2 24 23.9 26.3 26.5 26.8 26.8 27.6 27.6 28.5 ...
## $ Length2: num 25.4 26.3 26.5 29 29 29.7 29.7 30 30 30.7 ...
## $ Length3: num 30 31.2 31.1 33.5 34 34.7 34.5 35 35.1 36.2 ...
## $ Height : num 11.5 12.5 12.4 12.7 12.4 ...
## $ Width : num 4.02 4.31 4.7 4.46 5.13 ...
ggplot(fish, aes(x = Height, y = Width, color = Species)) +
geom_point(size = 3, alpha = 0.7) +
theme_minimal()
library(nnet)
library(broom)
## Warning: package 'broom' was built under R version 4.4.3
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.4.3
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(marginaleffects)
## Warning: package 'marginaleffects' was built under R version 4.4.3
fit_basic <- multinom(Species ~ Height, data = fish)
## # weights: 21 (12 variable)
## initial value 309.399714
## iter 10 value 189.538120
## iter 20 value 157.384865
## iter 30 value 157.351984
## iter 40 value 157.328524
## final value 157.318458
## converged
summary(fit_basic)
## Call:
## multinom(formula = Species ~ Height, data = fish)
##
## Coefficients:
## (Intercept) Height
## Parkki 30.03563 -2.514788
## Perch 33.23740 -2.702132
## Pike 32.25839 -2.729501
## Roach 34.01845 -2.952103
## Smelt 44.03021 -5.904017
## Whitefish 27.66221 -2.328914
##
## Std. Errors:
## (Intercept) Height
## Parkki 11.07501 0.8920020
## Perch 11.05383 0.8889272
## Pike 11.07867 0.8936509
## Roach 11.08739 0.8970243
## Smelt 11.62484 1.5307495
## Whitefish 11.10177 0.8941862
##
## Residual Deviance: 314.6369
## AIC: 338.6369
tidy(fit_basic, conf.int = TRUE) %>%
kable() %>%
kable_styling("basic", full_width = FALSE)
y.level | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|---|
Parkki | (Intercept) | 30.035632 | 11.0750128 | 2.712018 | 0.0066875 | 8.329006 | 51.7422580 |
Parkki | Height | -2.514788 | 0.8920020 | -2.819263 | 0.0048134 | -4.263080 | -0.7664966 |
Perch | (Intercept) | 33.237402 | 11.0538258 | 3.006869 | 0.0026395 | 11.572301 | 54.9025022 |
Perch | Height | -2.702132 | 0.8889272 | -3.039767 | 0.0023676 | -4.444397 | -0.9598663 |
Pike | (Intercept) | 32.258395 | 11.0786710 | 2.911757 | 0.0035940 | 10.544599 | 53.9721910 |
Pike | Height | -2.729501 | 0.8936509 | -3.054325 | 0.0022557 | -4.481024 | -0.9779769 |
Roach | (Intercept) | 34.018450 | 11.0873905 | 3.068211 | 0.0021534 | 12.287563 | 55.7493354 |
Roach | Height | -2.952103 | 0.8970243 | -3.290995 | 0.0009983 | -4.710238 | -1.1939674 |
Smelt | (Intercept) | 44.030211 | 11.6248350 | 3.787599 | 0.0001521 | 21.245953 | 66.8144691 |
Smelt | Height | -5.904017 | 1.5307495 | -3.856945 | 0.0001148 | -8.904231 | -2.9038029 |
Whitefish | (Intercept) | 27.662212 | 11.1017698 | 2.491694 | 0.0127136 | 5.903143 | 49.4212812 |
Whitefish | Height | -2.328914 | 0.8941862 | -2.604507 | 0.0092007 | -4.081487 | -0.5763414 |
fit_full <- multinom(Species ~ Height + Width + Length1, data = fish)
## # weights: 35 (24 variable)
## initial value 309.399714
## iter 10 value 180.514736
## iter 20 value 66.572505
## iter 30 value 45.714570
## iter 40 value 38.913166
## iter 50 value 36.849215
## iter 60 value 36.268579
## iter 70 value 35.923987
## iter 80 value 35.569171
## iter 90 value 35.503361
## iter 100 value 35.427287
## final value 35.427287
## stopped after 100 iterations
summary(fit_full)
## Call:
## multinom(formula = Species ~ Height + Width + Length1, data = fish)
##
## Coefficients:
## (Intercept) Height Width Length1
## Parkki 75.51016 -13.81480 -0.6428685 3.607363
## Perch 72.10001 -46.52687 38.0678237 9.579827
## Pike 19.49055 -52.39277 -22.6208506 22.635028
## Roach 74.05456 -42.14156 33.9557434 8.808315
## Smelt 64.74945 -61.22605 16.4907516 17.033127
## Whitefish 66.81587 -37.53292 32.1322729 7.823939
##
## Std. Errors:
## (Intercept) Height Width Length1
## Parkki 53.661339 11.637192 13.650243 4.717493
## Perch 8.064019 6.117955 9.035182 1.226707
## Pike 4.992302 9.287552 4.819420 3.411137
## Roach 8.073732 6.130948 9.052691 1.228708
## Smelt 20.668181 21.187647 24.125128 3.007804
## Whitefish 8.297163 6.410701 9.132673 1.297334
##
## Residual Deviance: 70.85457
## AIC: 118.8546
# Relative Risk Ratio
tidy(fit_full, conf.int = TRUE, exponentiate = TRUE) %>%
kable() %>%
kable_styling("basic", full_width = FALSE)
y.level | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|---|
Parkki | (Intercept) | 6.217909e+32 | 53.661339 | 1.4071612 | 0.1593796 | 0.000000e+00 | 2.952970e+78 |
Parkki | Height | 1.000000e-06 | 11.637192 | -1.1871247 | 0.2351785 | 0.000000e+00 | 8.052030e+03 |
Parkki | Width | 5.257820e-01 | 13.650243 | -0.0470958 | 0.9624369 | 0.000000e+00 | 2.187327e+11 |
Parkki | Length1 | 3.686872e+01 | 4.717493 | 0.7646780 | 0.4444633 | 3.557300e-03 | 3.821163e+05 |
Perch | (Intercept) | 2.054170e+31 | 8.064019 | 8.9409522 | 0.0000000 | 2.808864e+24 | 1.502249e+38 |
Perch | Height | 0.000000e+00 | 6.117955 | -7.6049705 | 0.0000000 | 0.000000e+00 | 0.000000e+00 |
Perch | Width | 3.409147e+16 | 9.035182 | 4.2132878 | 0.0000252 | 6.948396e+08 | 1.672657e+24 |
Perch | Length1 | 1.446992e+04 | 1.226707 | 7.8093871 | 0.0000000 | 1.307048e+03 | 1.601920e+05 |
Pike | (Intercept) | 2.915007e+08 | 4.992303 | 3.9041210 | 0.0000946 | 1.641285e+04 | 5.177202e+12 |
Pike | Height | 0.000000e+00 | 9.287552 | -5.6411812 | 0.0000000 | 0.000000e+00 | 0.000000e+00 |
Pike | Width | 0.000000e+00 | 4.819420 | -4.6936876 | 0.0000027 | 0.000000e+00 | 1.900000e-06 |
Pike | Length1 | 6.764999e+09 | 3.411137 | 6.6356259 | 0.0000000 | 8.447006e+06 | 5.417921e+12 |
Roach | (Intercept) | 1.450390e+32 | 8.073732 | 9.1722833 | 0.0000000 | 1.945861e+25 | 1.081080e+39 |
Roach | Height | 0.000000e+00 | 6.130948 | -6.8735802 | 0.0000000 | 0.000000e+00 | 0.000000e+00 |
Roach | Width | 5.582028e+14 | 9.052691 | 3.7509003 | 0.0001762 | 1.099327e+07 | 2.834372e+22 |
Roach | Length1 | 6.689638e+03 | 1.228708 | 7.1687617 | 0.0000000 | 6.019001e+02 | 7.434996e+04 |
Smelt | (Intercept) | 1.319262e+28 | 20.668181 | 3.1328085 | 0.0017314 | 3.369327e+10 | 5.165575e+45 |
Smelt | Height | 0.000000e+00 | 21.187648 | -2.8897050 | 0.0038560 | 0.000000e+00 | 0.000000e+00 |
Smelt | Width | 1.451585e+07 | 24.125128 | 0.6835508 | 0.4942588 | 0.000000e+00 | 4.979533e+27 |
Smelt | Length1 | 2.496855e+07 | 3.007804 | 5.6629780 | 0.0000000 | 6.872993e+04 | 9.070696e+09 |
Whitefish | (Intercept) | 1.041746e+29 | 8.297163 | 8.0528566 | 0.0000000 | 9.019939e+21 | 1.203150e+36 |
Whitefish | Height | 0.000000e+00 | 6.410701 | -5.8547294 | 0.0000000 | 0.000000e+00 | 0.000000e+00 |
Whitefish | Width | 9.012989e+13 | 9.132673 | 3.5183864 | 0.0004342 | 1.517481e+06 | 5.353210e+21 |
Whitefish | Length1 | 2.499732e+03 | 1.297334 | 6.0307807 | 0.0000000 | 1.966077e+02 | 3.178238e+04 |
mfx_height <- avg_comparisons(fit_full, variables = "Height", type = "probs")
mfx_height
##
## Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## Bream 0.0305 0.00965 3.157 0.00159 9.3 0.0116 0.0494
## Parkki -0.0199 0.08680 -0.229 0.81855 0.3 -0.1900 0.1502
## Perch -0.2625 0.03955 -6.637 < 0.001 34.9 -0.3401 -0.1850
## Pike 0.0183 0.07232 0.253 0.80033 0.3 -0.1235 0.1600
## Roach 0.0893 0.22845 0.391 0.69600 0.5 -0.3585 0.5370
## Smelt -0.0667 0.29401 -0.227 0.82057 0.3 -0.6429 0.5096
## Whitefish 0.2111 0.05777 3.654 < 0.001 11.9 0.0979 0.3243
##
## Term: Height
## Type: probs
## Comparison: +1
mfx_width <- avg_comparisons(fit_full, variables = "Width", type = "probs")
mfx_width
##
## Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## Bream -0.00498 0.0261 -0.1910 0.848 0.2 -0.0560 0.0461
## Parkki -0.02136 0.0513 -0.4162 0.677 0.6 -0.1220 0.0792
## Perch 0.23129 0.0685 3.3781 <0.001 10.4 0.0971 0.3655
## Pike -0.01782 0.2036 -0.0875 0.930 0.1 -0.4168 0.3812
## Roach -0.11465 0.0205 -5.5937 <0.001 25.4 -0.1548 -0.0745
## Smelt -0.06358 0.2486 -0.2558 0.798 0.3 -0.5508 0.4236
## Whitefish -0.00891 0.0378 -0.2359 0.814 0.3 -0.0830 0.0651
##
## Term: Width
## Type: probs
## Comparison: +1
library(caret)
## Warning: package 'caret' was built under R version 4.4.3
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
pred <- predict(fit_full, newdata = fish)
confusionMatrix(pred, fish$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Bream Parkki Perch Pike Roach Smelt Whitefish
## Bream 35 0 0 0 0 0 0
## Parkki 0 11 0 0 0 0 0
## Perch 0 0 51 0 5 0 0
## Pike 0 0 0 17 0 0 0
## Roach 0 0 4 0 14 0 2
## Smelt 0 0 0 0 0 14 0
## Whitefish 0 0 1 0 1 0 4
##
## Overall Statistics
##
## Accuracy : 0.9182
## 95% CI : (0.8642, 0.9557)
## No Information Rate : 0.3522
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.896
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: Bream Class: Parkki Class: Perch Class: Pike
## Sensitivity 1.0000 1.00000 0.9107 1.0000
## Specificity 1.0000 1.00000 0.9515 1.0000
## Pos Pred Value 1.0000 1.00000 0.9107 1.0000
## Neg Pred Value 1.0000 1.00000 0.9515 1.0000
## Prevalence 0.2201 0.06918 0.3522 0.1069
## Detection Rate 0.2201 0.06918 0.3208 0.1069
## Detection Prevalence 0.2201 0.06918 0.3522 0.1069
## Balanced Accuracy 1.0000 1.00000 0.9311 1.0000
## Class: Roach Class: Smelt Class: Whitefish
## Sensitivity 0.70000 1.00000 0.66667
## Specificity 0.95683 1.00000 0.98693
## Pos Pred Value 0.70000 1.00000 0.66667
## Neg Pred Value 0.95683 1.00000 0.98693
## Prevalence 0.12579 0.08805 0.03774
## Detection Rate 0.08805 0.08805 0.02516
## Detection Prevalence 0.12579 0.08805 0.03774
## Balanced Accuracy 0.82842 1.00000 0.82680
#Analisis Kasus Dalam kasus ini, kami menganalisis spesies ikan berdasarkan ukuran fisik seperti Tinggi, Lebar, dan Panjang1, menggunakan Regresi Logistik Multinomial. Pemilihan metode ini didasarkan pada karakteristik variabel target, yaitu spesies ikan yang terdiri dari lebih dari dua kategori tanpa urutan yang jelas (non-ordinal).
Model awal yang hanya memanfaatkan Tinggi sudah mampu membentuk pola klasifikasi yang berarti. Namun, ketika prediktor Lebar dan Panjang1 ditambahkan, terjadi peningkatan performa model secara signifikan. Visualisasi sebaran data mengungkapkan adanya klaster yang cukup tegas antar spesies, khususnya jika dilihat berdasarkan kombinasi antara Tinggi dan Lebar, yang semakin menegaskan pentingnya kedua fitur ini dalam membedakan spesies.
Analisis Rasio Risiko Relatif (Relative Risk Ratio) dan Efek Marginal (Marginal Effects) memperkuat temuan ini, menunjukkan bahwa Tinggi dan Lebar adalah faktor paling dominan dalam membedakan spesies ikan.
Evaluasi performa model menunjukkan hasil yang sangat memuaskan: - Akurasi model tercatat sebesar 91.82% dengan interval kepercayaan 95% antara 86.42% hingga 95.57%. - No Information Rate (NIR) sebesar 35.22%, memperlihatkan bahwa model secara drastis lebih baik dibandingkan tebakan acak. - P-Value [Acc > NIR] kurang dari 2.2e-16, menunjukkan signifikansi yang sangat tinggi. - Nilai Kappa sebesar 0.896, mengindikasikan tingkat kesepakatan prediksi yang hampir sempurna. - Dari Confusion Matrix, dapat dilihat bahwa model memiliki performa luar biasa dalam mengklasifikasikan spesies Bream, Parkki, Pike, dan Smelt, dengan sensitivitas dan spesifisitas 100% untuk masing-masing kelas tersebut.
Spesies Perch juga menunjukkan performa tinggi, dengan sensitivitas sekitar 91%, meskipun terdapat sedikit kekeliruan dalam mengklasifikasikan sebagian ke dalam kelas Roach. Untuk spesies Roach dan Whitefish, sensitivitas masing-masing lebih rendah (70% untuk Roach dan 66.67% untuk Whitefish), namun spesifisitasnya tetap tinggi (>95%), menandakan model sangat jarang membuat prediksi positif palsu terhadap spesies lain.
Secara keseluruhan, model ini sangat andal dalam mengklasifikasikan sebagian besar spesies ikan berdasarkan ukuran fisik dasar, meskipun terdapat peluang untuk peningkatan klasifikasi spesies dengan distribusi data yang lebih sedikit seperti Whitefish.
##Import Dataset Wine Quality
wine <- read.csv("C:/Users/HP 430 G5/OneDrive/Dokumen/Semester 4/Analisis Multivariat/winequality-red.csv")
str(wine)
## 'data.frame': 1599 obs. of 12 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
wine$quality <- factor(wine$quality, ordered = TRUE)
summary(wine$quality)
## 3 4 5 6 7 8
## 10 53 681 638 199 18
##Visualisasi Awal Data Anggur
# Boxplot: alkohol vs kualitas
ggplot(wine, aes(x = quality, y = alcohol)) +
geom_boxplot(fill = "skyblue") +
theme_minimal() +
labs(title = "Kadar Alkohol terhadap Kualitas Anggur", x = "Kualitas", y = "Alkohol")
# Boxplot: volatile acidity vs kualitas
ggplot(wine, aes(x = quality, y = volatile.acidity)) +
geom_boxplot(fill = "salmon") +
theme_minimal() +
labs(title = "Keasaman Volatil terhadap Kualitas Anggur", x = "Kualitas", y = "Volatile Acidity")
##Model Regresi Logistik Ordinal
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
fit_ordinal <- polr(quality ~ alcohol + volatile.acidity + sulphates, data = wine, Hess = TRUE)
summary(fit_ordinal)
## Call:
## polr(formula = quality ~ alcohol + volatile.acidity + sulphates,
## data = wine, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## alcohol 0.9374 0.05441 17.229
## volatile.acidity -3.6191 0.31738 -11.403
## sulphates 2.1427 0.31732 6.753
##
## Intercepts:
## Value Std. Error t value
## 3|4 3.4313 0.7048 4.8682
## 4|5 5.3586 0.6397 8.3768
## 5|6 8.9841 0.6417 14.0013
## 6|7 11.7561 0.6820 17.2388
## 7|8 14.7458 0.7402 19.9211
##
## Residual Deviance: 3138.035
## AIC: 3154.035
# Tabel log odds
tidy(fit_ordinal, conf.int = TRUE) %>%
kable() %>%
kable_styling("basic", full_width = FALSE)
term | estimate | std.error | statistic | conf.low | conf.high | coef.type |
---|---|---|---|---|---|---|
alcohol | 0.9374265 | 0.0544100 | 17.228930 | 0.8317929 | 1.045137 | coefficient |
volatile.acidity | -3.6190824 | 0.3173787 | -11.403042 | -4.2442986 | -2.999938 | coefficient |
sulphates | 2.1427003 | 0.3173163 | 6.752568 | 1.5259271 | 2.771125 | coefficient |
3|4 | 3.4312799 | 0.7048384 | 4.868179 | NA | NA | scale |
4|5 | 5.3585665 | 0.6396909 | 8.376806 | NA | NA | scale |
5|6 | 8.9840834 | 0.6416628 | 14.001252 | NA | NA | scale |
6|7 | 11.7561145 | 0.6819560 | 17.238817 | NA | NA | scale |
7|8 | 14.7457955 | 0.7402110 | 19.921070 | NA | NA | scale |
##Eksponensiasi Koefisien (Odds Ratio)
tidy(fit_ordinal, conf.int = TRUE, exponentiate = TRUE) %>%
kable() %>%
kable_styling("basic", full_width = FALSE)
term | estimate | std.error | statistic | conf.low | conf.high | coef.type |
---|---|---|---|---|---|---|
alcohol | 2.553402e+00 | 0.0544100 | 17.228930 | 2.2974341 | 2.8437876 | coefficient |
volatile.acidity | 2.680730e-02 | 0.3173787 | -11.403042 | 0.0143458 | 0.0497902 | coefficient |
sulphates | 8.522420e+00 | 0.3173163 | 6.752568 | 4.5994058 | 15.9766039 | coefficient |
3|4 | 3.091619e+01 | 0.7048384 | 4.868179 | NA | NA | scale |
4|5 | 2.124202e+02 | 0.6396909 | 8.376806 | NA | NA | scale |
5|6 | 7.975131e+03 | 0.6416628 | 14.001252 | NA | NA | scale |
6|7 | 1.275310e+05 | 0.6819560 | 17.238817 | NA | NA | scale |
7|8 | 2.535231e+06 | 0.7402110 | 19.921070 | NA | NA | scale |
##Marginal Effects
mfx_alcohol <- avg_comparisons(fit_ordinal, variables = "alcohol", type = "probs")
mfx_alcohol
##
## Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## 3 -0.00384 0.00120 -3.19 0.00141 9.5 -0.00619 -0.00148
## 4 -0.01929 0.00257 -7.50 < 0.001 43.9 -0.02432 -0.01425
## 5 -0.13956 0.00694 -20.10 < 0.001 296.2 -0.15317 -0.12596
## 6 0.05759 0.00428 13.47 < 0.001 134.9 0.04921 0.06597
## 7 0.08930 0.00572 15.62 < 0.001 180.2 0.07810 0.10051
## 8 0.01579 0.00347 4.55 < 0.001 17.5 0.00900 0.02258
##
## Term: alcohol
## Type: probs
## Comparison: +1
mfx_volatile <- avg_comparisons(fit_ordinal, variables = "volatile.acidity", type = "probs")
mfx_volatile
##
## Group Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
## 3 0.1522 0.04810 3.16 0.00156 9.3 0.0579 0.24647
## 4 0.2664 0.03684 7.23 < 0.001 40.9 0.1942 0.33865
## 5 0.0473 0.04142 1.14 0.25364 2.0 -0.0339 0.12845
## 6 -0.3352 0.01729 -19.38 < 0.001 275.7 -0.3691 -0.30135
## 7 -0.1193 0.00720 -16.56 < 0.001 202.3 -0.1334 -0.10518
## 8 -0.0114 0.00257 -4.42 < 0.001 16.6 -0.0164 -0.00632
##
## Term: volatile.acidity
## Type: probs
## Comparison: +1
predicted_quality <- predict(fit_ordinal, newdata = wine)
confusionMatrix(predicted_quality, wine$quality)
## Confusion Matrix and Statistics
##
## Reference
## Prediction 3 4 5 6 7 8
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 9 41 502 227 13 0
## 6 1 12 176 382 145 10
## 7 0 0 3 29 41 8
## 8 0 0 0 0 0 0
##
## Overall Statistics
##
## Accuracy : 0.5785
## 95% CI : (0.5538, 0.6028)
## No Information Rate : 0.4259
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.2993
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 3 Class: 4 Class: 5 Class: 6 Class: 7 Class: 8
## Sensitivity 0.000000 0.00000 0.7372 0.5987 0.20603 0.00000
## Specificity 1.000000 1.00000 0.6841 0.6420 0.97143 1.00000
## Pos Pred Value NaN NaN 0.6338 0.5262 0.50617 NaN
## Neg Pred Value 0.993746 0.96685 0.7782 0.7068 0.89592 0.98874
## Prevalence 0.006254 0.03315 0.4259 0.3990 0.12445 0.01126
## Detection Rate 0.000000 0.00000 0.3139 0.2389 0.02564 0.00000
## Detection Prevalence 0.000000 0.00000 0.4953 0.4540 0.05066 0.00000
## Balanced Accuracy 0.500000 0.50000 0.7106 0.6204 0.58873 0.50000
#Analisis Kasus Dalam analisis ini, kami menggunakan Regresi Logistik Ordinal untuk memprediksi kualitas anggur berdasarkan kandungan alkohol, keasaman volatil, dan sulfat. Metode ini dipilih karena variabel target, yaitu kualitas anggur, bersifat ordinal dan memiliki tingkatan yang jelas, dari kualitas rendah hingga tinggi.
Visualisasi awal menggunakan boxplot mengungkapkan tren penting: - Kadar alkohol cenderung meningkat seiring dengan peningkatan kualitas anggur. - Keasaman volatil justru menunjukkan pola menurun pada anggur dengan kualitas lebih tinggi.
Temuan ini mendukung adanya hubungan potensial antara kedua variabel tersebut dengan skor kualitas anggur. Pada hasil pemodelan, diperoleh beberapa insight kunci: - Kadar alkohol memiliki hubungan positif dengan kualitas anggur, yang berarti semakin tinggi kandungan alkohol, semakin besar peluang untuk memperoleh kualitas yang lebih baik. - Sebaliknya, keasaman volatil menunjukkan hubungan negatif, di mana kadar keasaman volatil yang lebih tinggi menurunkan kemungkinan tercapainya kualitas anggur yang lebih tinggi. - Sulfat juga berkontribusi dalam model, namun diperlukan analisis lanjutan untuk memahami pengaruh spesifiknya terhadap kualitas anggur.
Hasil eksponensiasi koefisien (Odds Ratio) memperkuat interpretasi ini, di mana setiap peningkatan kecil pada kadar alkohol atau penurunan keasaman volatil memengaruhi peningkatan peluang terhadap kategori kualitas yang lebih baik.
Lebih lanjut, analisis Marginal Effects menunjukkan bahwa perubahan kecil pada variabel-variabel tersebut berdampak cukup signifikan terhadap distribusi probabilitas di setiap tingkatan kualitas.
Untuk mengevaluasi performa model, dilakukan pengujian terhadap dataset: - Akurasi model mencapai 57.85%, menunjukkan performa yang secara signifikan lebih baik daripada tebakan acak (No Information Rate sebesar 42.59%). - Nilai Kappa sebesar 0.2993 mengindikasikan tingkat kesepakatan prediksi model terhadap data aktual berada pada level “fair”. - Berdasarkan Confusion Matrix, model menunjukkan kemampuan yang cukup baik dalam memprediksi kualitas 5 dan 6, namun masih mengalami kesulitan dalam membedakan kategori ekstrem seperti kualitas 3, 4, dan 8, yang frekuensinya relatif rendah dalam data.
Meskipun performa model dapat dikatakan memadai sebagai baseline awal, terdapat ruang perbaikan, khususnya dengan mengatasi ketidakseimbangan jumlah sampel antar kategori kualitas.