Results and Discussion: ADK Journal
Library
## Warning: package 'car' was built under R version 4.2.3
## Loading required package: carData
## Warning: package 'pscl' was built under R version 4.2.3
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
## Warning: package 'ResourceSelection' was built under R version 4.2.3
## ResourceSelection 0.3-6 2023-06-27
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Data
breast.cancer <- read.csv("C:/Users/acer/Downloads/breast-cancer.csv", sep=";")
summary(breast.cancer)
## Class age menopause tumor.size
## Length:286 Length:286 Length:286 Length:286
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## inv.nodes node.caps deg.malig breast
## Length:286 Length:286 Min. :1.000 Length:286
## Class :character Class :character 1st Qu.:2.000 Class :character
## Mode :character Mode :character Median :2.000 Mode :character
## Mean :2.049
## 3rd Qu.:3.000
## Max. :3.000
## breast.quad irradiat
## Length:286 Length:286
## Class :character Class :character
## Mode :character Mode :character
##
##
##
Building the Data Frame
# Convert the response (Y) and the nine predictors (X1..X9) to factors
Y  <- as.factor(breast.cancer$Class)
X1 <- as.factor(breast.cancer$age)
X2 <- as.factor(breast.cancer$menopause)
X3 <- as.factor(breast.cancer$tumor.size)
X4 <- as.factor(breast.cancer$inv.nodes)
X5 <- as.factor(breast.cancer$node.caps)
X6 <- as.factor(breast.cancer$deg.malig)
X7 <- as.factor(breast.cancer$breast)
X8 <- as.factor(breast.cancer$breast.quad)
X9 <- as.factor(breast.cancer$irradiat)
# Collect everything in one data frame for modelling
df <- data.frame(Y, X1, X2, X3, X4, X5, X6, X7, X8, X9)
##
## no-recurrence-events recurrence-events
## 201 85
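The class counts above give a baseline against which any accuracy figure can be judged. A minimal sketch in base R, with the counts typed in by hand from the output (the variable names `counts`, `shares` and `baseline_acc` are illustrative and not part of the original analysis):

```r
# Class counts copied from the frequency table of Y
counts <- c("no-recurrence-events" = 201, "recurrence-events" = 85)

# Share of each class among the 286 observations, in percent
shares <- counts / sum(counts)
print(round(100 * shares, 2))

# Accuracy of a trivial rule that always predicts the majority class
baseline_acc <- max(shares)
print(round(100 * baseline_acc, 2))
```

A classifier is only informative to the extent that it improves on this majority-class baseline.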
Descriptive Analysis
Binary Logistic Regression Analysis
Parameter Estimation
##
## Call:
## glm(formula = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
## family = binomial, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7003 -0.7716 -0.4804 0.8639 2.3793
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.17462 3393.46915 0.000 1.000
## X130-39 15.47277 2399.54488 0.006 0.995
## X140-49 15.05563 2399.54486 0.006 0.995
## X150-59 14.94901 2399.54488 0.006 0.995
## X160-69 15.69932 2399.54492 0.007 0.995
## X170-79 15.07518 2399.54523 0.006 0.995
## X2lt40 0.75686 0.97364 0.777 0.437
## X2premeno 0.63348 0.50167 1.263 0.207
## X305-Sep -15.07161 1173.99205 -0.013 0.990
## X315-19 -0.04204 1.30979 -0.032 0.974
## X320-24 0.36430 1.25276 0.291 0.771
## X325-29 0.26147 1.26614 0.207 0.836
## X330-34 0.24074 1.25872 0.191 0.848
## X335-39 -0.07422 1.37265 -0.054 0.957
## X340-44 -0.27129 1.35584 -0.200 0.841
## X345-49 -0.44835 1.82169 -0.246 0.806
## X350-54 0.88358 1.46142 0.605 0.545
## X3Oct-14 -2.19523 1.66642 -1.317 0.188
## X403-May 0.74346 0.51769 1.436 0.151
## X406-Aug 1.06533 0.70069 1.520 0.128
## X409-Nov 1.18584 0.86413 1.372 0.170
## X415-17 0.55522 1.01226 0.548 0.583
## X424-26 15.65505 2399.54485 0.007 0.995
## X4Dec-14 0.91522 1.44742 0.632 0.527
## X5no 0.23078 0.98198 0.235 0.814
## X5yes 0.63527 0.99801 0.637 0.524
## X62 -0.37648 0.45381 -0.830 0.407
## X63 0.97091 0.45789 2.120 0.034 *
## X7right -0.39954 0.33946 -1.177 0.239
## X8central -17.71238 2399.54484 -0.007 0.994
## X8left_low -17.30972 2399.54476 -0.007 0.994
## X8left_up -17.46757 2399.54476 -0.007 0.994
## X8right_low -17.78898 2399.54483 -0.007 0.994
## X8right_up -16.75471 2399.54478 -0.007 0.994
## X9yes 0.37633 0.37924 0.992 0.321
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 348.05 on 285 degrees of freedom
## Residual deviance: 276.16 on 251 degrees of freedom
## AIC: 346.16
##
## Number of Fisher Scoring iterations: 15
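Coefficient names such as X305-Sep, X3Oct-14, X403-May and X4Dec-14 look like interval labels (for example 5-9 or 10-14) that a spreadsheet converted to dates before the CSV file was saved. The sketch below shows one possible way to restore such labels on a toy data frame. The lookup table `fix_labels` and the helper `repair_interval` are assumptions made for illustration; they were not part of the original analysis, and the mapping should be checked against the source data before use.

```r
# Assumed mapping: spreadsheet-mangled label -> presumed original interval
fix_labels <- c("05-Sep" = "5-9",  "Oct-14" = "10-14",
                "03-May" = "3-5",  "06-Aug" = "6-8",
                "09-Nov" = "9-11", "Dec-14" = "12-14")

# Hypothetical helper: replace known mangled labels, leave the rest untouched
repair_interval <- function(x) {
  x <- as.character(x)
  hit <- x %in% names(fix_labels)
  x[hit] <- fix_labels[x[hit]]
  unname(x)
}

# Toy data standing in for the tumor.size and inv.nodes columns
toy <- data.frame(tumor.size = c("05-Sep", "Oct-14", "20-24"),
                  inv.nodes  = c("0-2", "03-May", "Dec-14"))
toy$tumor.size <- repair_interval(toy$tumor.size)
toy$inv.nodes  <- repair_interval(toy$inv.nodes)
print(toy)
```

Repairing the labels does not change the fitted model, only the names of the factor levels and therefore their ordering and the reference category.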
Non-Multicollinearity Test
## GVIF Df GVIF^(1/(2*Df))
## X1 3.803662 5 1.142931
## X2 3.238805 2 1.341517
## X3 2.756842 10 1.052012
## X4 4.505902 6 1.133657
## X5 2.899823 2 1.304947
## X6 1.527815 2 1.111777
## X7 1.308460 1 1.143879
## X8 2.059782 5 1.074935
## X9 1.374079 1 1.172211
Based on the results, the generalized VIF (GVIF) of every predictor variable is below 10; the largest is 4.51, for X4. This indicates that there is no multicollinearity among the predictor variables, that is, they are not strongly correlated with one another.
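The third column of the output is GVIF^(1/(2*Df)), which adjusts the GVIF for the number of coefficients each factor contributes. As a rough check, it can be recomputed from the first two columns, and squaring it puts it on the scale of an ordinary VIF. The values below are typed in from the output; the threshold of 10 is the rule of thumb used in the text, not a value returned by the software.

```r
# GVIF and Df copied from the multicollinearity output
gvif <- c(X1 = 3.803662, X2 = 3.238805, X3 = 2.756842,
          X4 = 4.505902, X5 = 2.899823, X6 = 1.527815,
          X7 = 1.308460, X8 = 2.059782, X9 = 1.374079)
dfs  <- c(5, 2, 10, 6, 2, 2, 1, 5, 1)

# Adjusted GVIF, comparable across factors with different Df
adj_gvif <- gvif^(1 / (2 * dfs))
print(round(adj_gvif, 6))

# Squared adjusted GVIF is on the ordinary VIF scale
print(all(adj_gvif^2 < 10))
```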
Simultaneous Test
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML r2CU
## -138.0779297 -174.0240146 71.8921698 0.2065582 0.2222664 0.3157784
df = (number of variables - number of categories)
## [1] 15.50731
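G2 (71.89217) exceeds the chi-square table value (15.507), so H0 is rejected: at least one explanatory variable has a significant effect on the response variable. The table value 15.507 corresponds to 8 degrees of freedom. The full model estimates 34 coefficients beyond the intercept (285 - 251 = 34 degrees of freedom), for which the 5% critical value is about 48.60; G2 exceeds this value as well, so the conclusion is unchanged.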
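The likelihood-ratio statistic can also be recovered from the deviances printed in the model summary. A minimal sketch in base R, using numbers typed in from that output; the degrees of freedom are taken here as the difference between the null and residual degrees of freedom, which is the usual choice for this test and not necessarily the one used in the original computation.

```r
# Deviances and degrees of freedom copied from the full-model summary
null_dev  <- 348.05
resid_dev <- 276.16
df_null   <- 285
df_resid  <- 251

# Likelihood-ratio statistic and its degrees of freedom
G2    <- null_dev - resid_dev
df_lr <- df_null - df_resid

# 5% critical value and p-value from the chi-square distribution
crit    <- qchisq(0.95, df = df_lr)
p_value <- pchisq(G2, df = df_lr, lower.tail = FALSE)

print(c(G2 = G2, df = df_lr, critical = crit, p_value = p_value))
```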
Partial Test
##
## Call:
## glm(formula = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9,
## family = binomial, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7003 -0.7716 -0.4804 0.8639 2.3793
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.17462 3393.46915 0.000 1.000
## X130-39 15.47277 2399.54488 0.006 0.995
## X140-49 15.05563 2399.54486 0.006 0.995
## X150-59 14.94901 2399.54488 0.006 0.995
## X160-69 15.69932 2399.54492 0.007 0.995
## X170-79 15.07518 2399.54523 0.006 0.995
## X2lt40 0.75686 0.97364 0.777 0.437
## X2premeno 0.63348 0.50167 1.263 0.207
## X305-Sep -15.07161 1173.99205 -0.013 0.990
## X315-19 -0.04204 1.30979 -0.032 0.974
## X320-24 0.36430 1.25276 0.291 0.771
## X325-29 0.26147 1.26614 0.207 0.836
## X330-34 0.24074 1.25872 0.191 0.848
## X335-39 -0.07422 1.37265 -0.054 0.957
## X340-44 -0.27129 1.35584 -0.200 0.841
## X345-49 -0.44835 1.82169 -0.246 0.806
## X350-54 0.88358 1.46142 0.605 0.545
## X3Oct-14 -2.19523 1.66642 -1.317 0.188
## X403-May 0.74346 0.51769 1.436 0.151
## X406-Aug 1.06533 0.70069 1.520 0.128
## X409-Nov 1.18584 0.86413 1.372 0.170
## X415-17 0.55522 1.01226 0.548 0.583
## X424-26 15.65505 2399.54485 0.007 0.995
## X4Dec-14 0.91522 1.44742 0.632 0.527
## X5no 0.23078 0.98198 0.235 0.814
## X5yes 0.63527 0.99801 0.637 0.524
## X62 -0.37648 0.45381 -0.830 0.407
## X63 0.97091 0.45789 2.120 0.034 *
## X7right -0.39954 0.33946 -1.177 0.239
## X8central -17.71238 2399.54484 -0.007 0.994
## X8left_low -17.30972 2399.54476 -0.007 0.994
## X8left_up -17.46757 2399.54476 -0.007 0.994
## X8right_low -17.78898 2399.54483 -0.007 0.994
## X8right_up -16.75471 2399.54478 -0.007 0.994
## X9yes 0.37633 0.37924 0.992 0.321
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 348.05 on 285 degrees of freedom
## Residual deviance: 276.16 on 251 degrees of freedom
## AIC: 346.16
##
## Number of Fisher Scoring iterations: 15
The partial (Wald) tests show that only X6 has a p-value below α = 5%: the coefficient for level 3 (X63) has p = 0.034, while level 2 (X62) is not significant (p = 0.407). X6 (deg.malig) is therefore taken to affect the response variable Class.
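Each row of the coefficient table is a Wald test: the estimate divided by its standard error is compared with a standard normal distribution. A short sketch that reproduces the X63 row from the numbers printed in the output (the variable names are illustrative):

```r
# Estimate and standard error of X63, copied from the coefficient table
est <- 0.97091
se  <- 0.45789

# Wald z statistic and two-sided p-value
z <- est / se
p <- 2 * pnorm(-abs(z))

print(round(c(z = z, p = p), 3))
```

The same calculation applied to rows with standard errors near 2400 gives z close to 0, which is why those coefficients have p-values near 1 despite their large estimates.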
Final Binary Logistic Regression Model
##
## Call:
## glm(formula = Y ~ X6, family = binomial, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2278 -0.6965 -0.6085 1.1278 1.8856
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.5926 0.3167 -5.029 4.92e-07 ***
## X62 0.2999 0.3818 0.785 0.432
## X63 1.7104 0.3841 4.453 8.45e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 348.05 on 285 degrees of freedom
## Residual deviance: 317.52 on 283 degrees of freedom
## AIC: 323.52
##
## Number of Fisher Scoring iterations: 4
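Because the final model contains a single three-level factor, it produces only three distinct fitted probabilities, one per malignancy degree. A sketch using the coefficients typed in from the output; `plogis` is the inverse logit in base R, and the names `deg1` to `deg3` are illustrative labels.

```r
# Coefficients copied from the final-model summary
b0 <- -1.5926   # intercept (malignancy degree 1, the reference level)
b2 <-  0.2999   # X62: degree 2 versus degree 1
b3 <-  1.7104   # X63: degree 3 versus degree 1

# Linear predictor for each malignancy degree
eta <- c(deg1 = b0, deg2 = b0 + b2, deg3 = b0 + b3)

# Fitted probability of recurrence for each degree
p_recur <- plogis(eta)
print(round(p_recur, 3))
```

Only degree 3 has a fitted probability above 0.5, so a 0.5 cutoff classifies exactly the degree 3 cases as recurrences.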
Goodness-of-Fit Test
## Warning in Ops.factor(1, y): '-' not meaningful for factors
## Warning in hoslem.test(df$Y, fitted(model_X6)): The data did not allow for the
## requested number of bins.
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: df$Y, fitted(model_X6)
## X-squared = 286, df = 0, p-value < 2.2e-16
## [1] 15.50731
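The comparison intended here is between the Hosmer-Lemeshow chi-square statistic and the chi-square table value (15.50731): a statistic below the table value means that the model fits, that is, there is no difference between the observations and the model's predictions. The output above does not support that comparison, however. It reports X-squared = 286 with df = 0, together with warnings that arise because df$Y was passed as a factor rather than as numeric 0/1 values and because the requested number of bins could not be formed. The value 1.3027 quoted for the statistic does not appear in the output, so the test should be rerun with a numeric response before concluding that the model fits.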
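The warning from `Ops.factor` indicates that the test received a factor where it expects observed outcomes coded as 0 and 1. The sketch below shows the conversion on simulated data, since the original data file is not available here; the simulated predictor `x`, the seed and the coefficients are all assumptions for illustration. The call to `hoslem.test` from the ResourceSelection package runs only if that package is installed.

```r
# Simulated data (illustrative only): one continuous predictor, binary outcome
set.seed(1)
n <- 200
x <- rnorm(n)
y_factor <- factor(ifelse(runif(n) < plogis(-0.5 + x),
                          "recurrence-events", "no-recurrence-events"))

fit <- glm(y_factor ~ x, family = binomial)

# Code the outcome as 1 for the second factor level (the event glm models)
y_num <- as.numeric(y_factor == levels(y_factor)[2])

# Run the Hosmer-Lemeshow test only when ResourceSelection is available
if (requireNamespace("ResourceSelection", quietly = TRUE)) {
  print(ResourceSelection::hoslem.test(y_num, fitted(fit), g = 10))
}
```

With a model that has only a few distinct fitted values, such as one built on a single three-level factor, ten bins cannot be formed, so a smaller `g` or a different goodness-of-fit check may be needed.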
Coefficient of Determination
## [1] 0.08771728
Model quality can be assessed with the coefficient of determination. For the final model the value is 0.0877 (about 8.8%), which matches McFadden's pseudo-R² computed from the deviances, 1 - 317.52/348.05.
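For a binary response recorded one observation per row, the deviance equals minus twice the log-likelihood, so McFadden's pseudo-R² can be obtained directly from the two deviances in the model summary. A minimal sketch with the values typed in from the final-model output:

```r
# Deviances copied from the summary of the model Y ~ X6
null_dev  <- 348.05
resid_dev <- 317.52

# McFadden's pseudo-R^2: proportional reduction in deviance
r2_mcfadden <- 1 - resid_dev / null_dev
print(round(r2_mcfadden, 4))
```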
Odds Ratio
## beta OR_beta
## (Intercept) -1.5926308 0.2033898
## X62 0.2998625 1.3496732
## X63 1.7104138 5.5312500
Interpretation, with malignancy degree 1 as the reference category:
1. For malignancy degree 2 (moderate), the odds of recurrence are 1.35 times those of degree 1; this difference is not significant (p = 0.432).
2. For malignancy degree 3 (high), the odds of recurrence are 5.53 times those of degree 1.
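An odds ratio is the exponential of a logit coefficient, and an approximate 95% Wald interval follows from the standard error. The sketch below uses the estimates and standard errors typed in from the final-model output; the interval is an addition for illustration and was not reported in the original analysis.

```r
# Coefficients and standard errors copied from the final-model output
beta <- c(X62 = 0.2998625, X63 = 1.7104138)
se   <- c(X62 = 0.3818,    X63 = 0.3841)

# Odds ratios relative to malignancy degree 1
or <- exp(beta)

# Approximate 95% Wald confidence limits on the odds-ratio scale
z  <- qnorm(0.975)
ci <- cbind(OR = or,
            lower = exp(beta - z * se),
            upper = exp(beta + z * se))
print(round(ci, 3))
```

An interval that contains 1, as for X62, is consistent with the non-significant p-value of that coefficient.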
Classification Accuracy
probabilitas <- model_X6 %>% predict(df, type = "response")
# predict(type = "response") returns P(Y = "recurrence-events"), the second level of Y
prediksi <- ifelse(probabilitas>0.5, "recurrence-events", "no-recurrence-events")
tab1<- table(predicted=prediksi, actual=df$Y)
tab1
## actual
## predicted no-recurrence-events recurrence-events
## no-recurrence-events 161 40
## recurrence-events 40 45
## [1] 72.03
## [1] "72.027972027972 %"
The classification table is a two-way frequency table of the predicted class against the actual class. The model classifies (161 + 45)/286 = 72.03% of the observations correctly. Assigning probabilities above 0.5 to "no-recurrence-events" instead would invert the predictions and report the complement, 27.97%.
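When the response is a factor, `glm` with `family = binomial` models the probability of the second factor level, so a cutoff rule has to map high probabilities to that level. The sketch below illustrates this on a small made-up data set; the data, the variable `grade` and the 0.5 cutoff are assumptions for illustration only.

```r
# Toy data (illustrative only): a three-level grade and a binary outcome
grade <- factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
y <- factor(c("no-recurrence-events", "no-recurrence-events",
              "no-recurrence-events", "recurrence-events",
              "no-recurrence-events", "no-recurrence-events",
              "recurrence-events",
              "no-recurrence-events", "recurrence-events",
              "recurrence-events", "recurrence-events"))

fit  <- glm(y ~ grade, family = binomial)
prob <- predict(fit, type = "response")   # P(y == levels(y)[2])

# Map probabilities above the cutoff to the second level, the modelled event
pred <- factor(ifelse(prob > 0.5, levels(y)[2], levels(y)[1]),
               levels = levels(y))

# Confusion matrix and overall accuracy
tab <- table(predicted = pred, actual = y)
accuracy <- sum(diag(tab)) / sum(tab)
print(tab)
print(round(100 * accuracy, 2))
```

Taking the labels from `levels(y)` rather than typing them by hand keeps the mapping correct even if the level order changes.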
References
https://rstudio-pubs-static.s3.amazonaws.com/950229_353f993e74f747c98e5c23727afa4ae4.html#Odd_Ratio
https://rpubs.com/alfazrinb/regresi-dan-klasifikasi-logistik-biner
https://rpubs.com/annisads/regresi-logistik-biner-kanker
https://rpubs.com/RatriSintya/1048800
https://rpubs.com/nadiaindrswari/1048708