Results and Discussion: ADK Journal

Library

library(car)

## Warning: package 'car' was built under R version 4.2.3

## Loading required package: carData

library(pscl)

## Warning: package 'pscl' was built under R version 4.2.3

## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis

library(ResourceSelection)

## Warning: package 'ResourceSelection' was built under R version 4.2.3

## ResourceSelection 0.3-6   2023-06-27

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:car':
## 
##     recode

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Data

breast.cancer <- read.csv("C:/Users/acer/Downloads/breast-cancer.csv", sep=";")
summary(breast.cancer)

##     Class               age             menopause          tumor.size       
##  Length:286         Length:286         Length:286         Length:286        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##   inv.nodes          node.caps           deg.malig        breast         
##  Length:286         Length:286         Min.   :1.000   Length:286        
##  Class :character   Class :character   1st Qu.:2.000   Class :character  
##  Mode  :character   Mode  :character   Median :2.000   Mode  :character  
##                                        Mean   :2.049                     
##                                        3rd Qu.:3.000                     
##                                        Max.   :3.000                     
##  breast.quad          irradiat        
##  Length:286         Length:286        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##

Building the Data Frame

Y= as.factor(breast.cancer$Class)
X1=as.factor(breast.cancer$age)
X2=as.factor(breast.cancer$menopause)
X3=as.factor(breast.cancer$tumor.size)
X4=as.factor(breast.cancer$inv.nodes)
X5=as.factor(breast.cancer$node.caps)
X6=as.factor(breast.cancer$deg.malig)
X7=as.factor(breast.cancer$breast)
X8=as.factor(breast.cancer$breast.quad)
X9=as.factor(breast.cancer$irradiat)
df=data.frame(Y,X1,X2,X3,X4,X5,X6,X7,X8,X9)

table.Cancer = table(df$Y)
table.Cancer

## 
## no-recurrence-events    recurrence-events 
##                  201                   85

Descriptive Analysis

Pie Chart

kat=c("no-recurrence-events","recurrence-events")
persentase=round(table.Cancer/sum(table.Cancer)*100)
kat=paste(kat,persentase)
kat=paste(kat,'%',sep = ' ')
pie(table.Cancer, labels = kat, col = c('pink','black'), main = "Breast cancer recurrence percentage")

Bar Chart

counts=table(df$Y,df$X3)
barplot(counts, main = "Patient Status by Tumor Size", xlab = "Tumor size", col = c('brown','lightblue'), legend = rownames(counts), beside = TRUE)

Binary Logistic Regression Analysis

Parameter Estimation

model = glm(Y~X1+X2+X3+X4+X5+X6+X7+X8+X9, data = df, family = binomial)
summary(model)

## 
## Call:
## glm(formula = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, 
##     family = binomial, data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7003  -0.7716  -0.4804   0.8639   2.3793  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)    0.17462 3393.46915   0.000    1.000  
## X130-39       15.47277 2399.54488   0.006    0.995  
## X140-49       15.05563 2399.54486   0.006    0.995  
## X150-59       14.94901 2399.54488   0.006    0.995  
## X160-69       15.69932 2399.54492   0.007    0.995  
## X170-79       15.07518 2399.54523   0.006    0.995  
## X2lt40         0.75686    0.97364   0.777    0.437  
## X2premeno      0.63348    0.50167   1.263    0.207  
## X305-Sep     -15.07161 1173.99205  -0.013    0.990  
## X315-19       -0.04204    1.30979  -0.032    0.974  
## X320-24        0.36430    1.25276   0.291    0.771  
## X325-29        0.26147    1.26614   0.207    0.836  
## X330-34        0.24074    1.25872   0.191    0.848  
## X335-39       -0.07422    1.37265  -0.054    0.957  
## X340-44       -0.27129    1.35584  -0.200    0.841  
## X345-49       -0.44835    1.82169  -0.246    0.806  
## X350-54        0.88358    1.46142   0.605    0.545  
## X3Oct-14      -2.19523    1.66642  -1.317    0.188  
## X403-May       0.74346    0.51769   1.436    0.151  
## X406-Aug       1.06533    0.70069   1.520    0.128  
## X409-Nov       1.18584    0.86413   1.372    0.170  
## X415-17        0.55522    1.01226   0.548    0.583  
## X424-26       15.65505 2399.54485   0.007    0.995  
## X4Dec-14       0.91522    1.44742   0.632    0.527  
## X5no           0.23078    0.98198   0.235    0.814  
## X5yes          0.63527    0.99801   0.637    0.524  
## X62           -0.37648    0.45381  -0.830    0.407  
## X63            0.97091    0.45789   2.120    0.034 *
## X7right       -0.39954    0.33946  -1.177    0.239  
## X8central    -17.71238 2399.54484  -0.007    0.994  
## X8left_low   -17.30972 2399.54476  -0.007    0.994  
## X8left_up    -17.46757 2399.54476  -0.007    0.994  
## X8right_low  -17.78898 2399.54483  -0.007    0.994  
## X8right_up   -16.75471 2399.54478  -0.007    0.994  
## X9yes          0.37633    0.37924   0.992    0.321  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 348.05  on 285  degrees of freedom
## Residual deviance: 276.16  on 251  degrees of freedom
## AIC: 346.16
## 
## Number of Fisher Scoring iterations: 15
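Several level names in the coefficient table (X305-Sep, X3Oct-14, X403-May, X406-Aug, X409-Nov, X4Dec-14) look like interval labels (5-9, 10-14, 3-5, 6-8, 9-11, 12-14) that a spreadsheet program auto-converted to dates before the CSV was saved. That is an inference, not something stated in the source. A minimal sketch of a repair step, assuming that mapping; the helper fix_interval_labels() and the lookup excel_fix are hypothetical names introduced here:

```r
# Hypothetical lookup: Excel-mangled label -> original interval.
# ASSUMPTION: mapping inferred from the UCI breast-cancer coding.
excel_fix <- c(
  "05-Sep" = "5-9",  "Oct-14" = "10-14",                  # tumor.size
  "03-May" = "3-5",  "06-Aug" = "6-8",
  "09-Nov" = "9-11", "Dec-14" = "12-14"                   # inv.nodes
)

# Hypothetical helper: replace only the labels found in the lookup
fix_interval_labels <- function(x) {
  x <- as.character(x)
  hit <- x %in% names(excel_fix)
  x[hit] <- unname(excel_fix[x[hit]])
  x
}

tumor <- c("05-Sep", "Oct-14", "15-19", "0-4")
fixed <- fix_interval_labels(tumor)
print(fixed)
```

Applied to breast.cancer$tumor.size and breast.cancer$inv.nodes before as.factor(), this would give readable level names; passing an explicit levels = argument to factor() would also keep the intervals in numeric rather than alphabetical order.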

Multicollinearity Test

vif(model)

##        GVIF Df GVIF^(1/(2*Df))
## X1 3.803662  5        1.142931
## X2 3.238805  2        1.341517
## X3 2.756842 10        1.052012
## X4 4.505902  6        1.133657
## X5 2.899823  2        1.304947
## X6 1.527815  2        1.111777
## X7 1.308460  1        1.143879
## X8 2.059782  5        1.074935
## X9 1.374079  1        1.172211

All GVIF values are below 10, and the size-adjusted values GVIF^(1/(2*Df)) range only from 1.05 to 1.34. There is therefore no indication of serious multicollinearity among the predictor variables.
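For factors with more than one degree of freedom, car::vif() reports a generalized VIF, and the third column rescales it so terms with different Df can be compared. A minimal sketch of that rescaling, using the GVIF and Df values copied from the output above; comparing against sqrt(10) is an assumption that carries the usual "VIF < 10" rule of thumb over to the adjusted scale:

```r
# GVIF and Df copied from the vif(model) output above
gvif <- c(X1 = 3.803662, X2 = 3.238805, X3 = 2.756842, X4 = 4.505902,
          X5 = 2.899823, X6 = 1.527815, X7 = 1.308460, X8 = 2.059782,
          X9 = 1.374079)
dfs  <- c(X1 = 5, X2 = 2, X3 = 10, X4 = 6, X5 = 2, X6 = 2, X7 = 1,
          X8 = 5, X9 = 1)

# Size-adjusted GVIF, comparable across terms with different Df
adj <- gvif^(1 / (2 * dfs))

# adj^2 is on the usual VIF scale, so "VIF < 10" becomes adj < sqrt(10)
ok <- adj < sqrt(10)
print(round(adj, 6))
print(all(ok))
```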

Simultaneous Test

pR2(model)

## fitting null model for pseudo-r2

##          llh      llhNull           G2     McFadden         r2ML         r2CU 
## -138.0779297 -174.0240146   71.8921698    0.2065582    0.2222664    0.3157784

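# df = number of slope parameters = null df - residual df = 285 - 251 = 34
qchisq(0.95, df = 34)

## [1] 48.60237

The degrees of freedom of the test equal the number of slope parameters in the model, i.e. the difference between the null and residual degrees of freedom. Since G2 (71.89) exceeds the chi-square critical value (48.60 at df = 34, α = 5%), H0 is rejected: at least one predictor variable has a significant effect on the response variable.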
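The G2 statistic reported by pR2() is the likelihood-ratio statistic comparing the full model with the intercept-only model. A minimal sketch that recomputes it, its degrees of freedom and McFadden's pseudo-R² from the log-likelihoods and degrees of freedom shown above:

```r
# Log-likelihoods copied from the pR2(model) output above
llh      <- -138.0779297   # full model (X1..X9)
llh_null <- -174.0240146   # intercept-only model

# Likelihood-ratio statistic: G2 = -2 * (llNull - llFull)
G2 <- -2 * (llh_null - llh)

# df = null df - residual df, both taken from summary(model)
df_G2 <- 285 - 251

crit  <- qchisq(0.95, df = df_G2)
p_val <- pchisq(G2, df = df_G2, lower.tail = FALSE)

# McFadden pseudo-R2 from the same log-likelihoods
mcfadden <- 1 - llh / llh_null

cat("G2 =", round(G2, 4), " df =", df_G2, " critical =", round(crit, 3), "\n")
cat("p-value =", signif(p_val, 3), " reject H0:", G2 > crit, "\n")
cat("McFadden R2 =", round(mcfadden, 7), "\n")
```

Equivalently, G2 is the drop in deviance, 348.05 - 276.16, in the summary(model) output.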

Partial Test

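The partial (Wald) tests in the coefficient table of the full model show that only the dummy X63 (degree of malignancy 3 versus degree 1) has a p-value below α = 5% (p = 0.034). Degree of malignancy (X6) is therefore the only predictor with a significant effect on the response variable Class. The very large standard errors (roughly 1174 to 3393) for the intercept, X1, X8, X305-Sep and X424-26 most likely reflect sparse or empty category cells, so the Wald tests for those terms are not informative.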
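Each p-value in the coefficient table comes from a Wald z statistic, the estimate divided by its standard error, compared with a standard normal distribution. A minimal sketch using two rows copied from the summary(model) output; it shows why a huge standard error drives p toward 1 regardless of the size of the estimate:

```r
# Estimates and standard errors copied from summary(model)
est <- c(X63 = 0.97091, X8central = -17.71238)
se  <- c(X63 = 0.45789, X8central = 2399.54484)

z <- est / se                 # Wald z statistic
p <- 2 * pnorm(-abs(z))       # two-sided p-value

print(round(cbind(z, p), 3))
```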

Final Binary Logistic Regression Model

model_X6 = glm(Y~X6, data = df, family = binomial)
summary(model_X6)

## 
## Call:
## glm(formula = Y ~ X6, family = binomial, data = df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2278  -0.6965  -0.6085   1.1278   1.8856  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.5926     0.3167  -5.029 4.92e-07 ***
## X62           0.2999     0.3818   0.785    0.432    
## X63           1.7104     0.3841   4.453 8.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 348.05  on 285  degrees of freedom
## Residual deviance: 317.52  on 283  degrees of freedom
## AIC: 323.52
## 
## Number of Fisher Scoring iterations: 4
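With a factor response, glm() treats the first level (no-recurrence-events) as failure, so model_X6 models the probability of recurrence-events. A minimal sketch that turns the rounded coefficients above into a fitted probability for each degree of malignancy with the inverse logit:

```r
# Coefficients copied from summary(model_X6); reference level is X6 = 1
b0 <- -1.5926
b2 <-  0.2999
b3 <-  1.7104

# Linear predictor per degree of malignancy, then inverse logit
eta  <- c(deg1 = b0, deg2 = b0 + b2, deg3 = b0 + b3)
prob <- plogis(eta)   # P(recurrence-events | X6)
print(round(prob, 3))
```

Only degree 3 has a fitted probability above 0.5, which matters for the classification step later on.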

Goodness-of-Fit Test

hoslem.test(df$Y, fitted(model_X6))

## Warning in Ops.factor(1, y): '-' not meaningful for factors

## Warning in hoslem.test(df$Y, fitted(model_X6)): The data did not allow for the
## requested number of bins.

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  df$Y, fitted(model_X6)
## X-squared = 286, df = 0, p-value < 2.2e-16

qchisq(0.95, 8)

## [1] 15.50731

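This output does not support a conclusion about fit. The first warning shows that df$Y was passed as a factor, so hoslem.test() could not compute 1 - y; the function expects the response coded as 0/1 (for example as.numeric(df$Y == "recurrence-events")). The second warning appears because model_X6 has a single three-level predictor, so its fitted values take only three distinct values and the default 10 groups cannot be formed; the test ends up with df = 0. The printed X-squared = 286 is therefore not a valid statistic, and the critical value 15.50731 (df = 8, which assumes 10 groups) does not apply. More fundamentally, when the only predictor is one categorical variable, the fitted probability in each category equals the observed recurrence proportion, so observed and expected counts agree by construction and the Hosmer-Lemeshow test says nothing about the fit of this model.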
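To see why a Hosmer-Lemeshow-type comparison is uninformative for a model with one categorical predictor, here is a minimal sketch of the Pearson statistic over observed and expected counts, with one group per degree of malignancy. The counts are an assumption: they are reconstructed from the odds in the Odds Ratio output (12/59, 28/102 and 45/40 for degrees 1, 2 and 3), not read from the data file:

```r
# ASSUMPTION: counts reconstructed from exp(coef(model_X6)), not from the CSV
recur <- c(deg1 = 12, deg2 = 28,  deg3 = 45)   # recurrence-events
total <- c(deg1 = 71, deg2 = 130, deg3 = 85)   # patients per degree

# For a logistic model whose only predictor is this factor, the ML fitted
# probability in each level equals the observed proportion
p_hat <- recur / total

# Pearson statistic over the 2 x 3 table of observed vs expected counts
e1 <- total * p_hat          # expected recurrences
e0 <- total * (1 - p_hat)    # expected non-recurrences
hl <- sum((recur - e1)^2 / e1 + ((total - recur) - e0)^2 / e0)
print(hl)
```

The statistic is zero up to floating-point rounding, whatever the data, which is why the test cannot discriminate between good and poor fit here.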

Coefficient of Determination

Rsq <- 1-(317.52/348.05)
Rsq

## [1] 0.08771728

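The coefficient of determination measures how well the model explains the response. Here it is computed as the deviance-based pseudo-R², 1 - (residual deviance / null deviance), which for the final model is 0.0877, or about 8.8%: the degree of malignancy alone explains only a small share of the deviance.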
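For ungrouped binary data the deviance equals -2 times the log-likelihood, so 1 - D/D0 is the same quantity as McFadden's pseudo-R² reported by pR2(). A minimal sketch comparing the full and final models with the rounded deviances from the two summary() outputs:

```r
# Deviances copied from the summary() outputs (rounded to 2 decimals)
null_dev <- 348.05
dev_full <- 276.16   # model with X1..X9
dev_x6   <- 317.52   # model_X6

r2_full <- 1 - dev_full / null_dev
r2_x6   <- 1 - dev_x6 / null_dev
print(round(c(full = r2_full, final = r2_x6), 4))
```

The full-model value agrees with the McFadden value 0.2066 from pR2(model) up to the rounding of the deviances.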

Odds Ratio

beta <-(coef(model_X6))
OR_beta <- exp(beta)
cbind(beta, OR_beta)

##                   beta   OR_beta
## (Intercept) -1.5926308 0.2033898
## X62          0.2998625 1.3496732
## X63          1.7104138 5.5312500

Interpretation:
1. Patients with a moderate degree of malignancy (2) have odds of recurrence 1.35 times those of patients with degree 1, although this difference is not significant (p = 0.432).
2. Patients with a high degree of malignancy (3) have odds of recurrence 5.53 times those of patients with degree 1 (p < 0.001).
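The odds ratios compare each degree with the reference degree 1, while exp(intercept) is the odds of recurrence at degree 1 itself. A minimal sketch that recovers the odds for every degree from the coefficients printed above:

```r
# Coefficients copied from the Odds Ratio output above
beta <- c(intercept = -1.5926308, X62 = 0.2998625, X63 = 1.7104138)
or   <- exp(beta)

# Odds of recurrence at each degree = baseline odds * odds ratio
odds <- c(deg1 = or[["intercept"]],
          deg2 = or[["intercept"]] * or[["X62"]],
          deg3 = or[["intercept"]] * or[["X63"]])
print(round(or, 4))
print(round(odds, 4))
```

At degree 3 the odds are about 1.125, i.e. recurrence is slightly more likely than not, while at degrees 1 and 2 it is much less likely.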

Classification Accuracy

probabilitas <- model_X6 %>% predict(df, type = "response")
# type = "response" returns P(Y = "recurrence-events"), the second level of the factor Y
prediksi <- ifelse(probabilitas > 0.5, "recurrence-events", "no-recurrence-events")
tab1<- table(predicted=prediksi, actual=df$Y)
tab1

##                       actual
## predicted              no-recurrence-events recurrence-events
##   no-recurrence-events                  161                40
##   recurrence-events                      40                45

testAcc = (sum(diag(tab1))/sum(tab1))*100
round(testAcc,2)

## [1] 72.03

print(paste(testAcc, "%"))

## [1] "72.027972027972 %"

The classification table is a two-way frequency table of predicted versus actual class. With fitted probabilities above 0.5 assigned to recurrence-events, the model classifies 72.03% of the patients correctly.
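Because predict(..., type = "response") returns the probability of the second factor level (recurrence-events), probabilities above 0.5 must be labelled recurrence-events; swapping the two labels in ifelse() turns the accuracy into its complement (27.97% instead of 72.03%). A minimal sketch of the table under the correct mapping, assuming the per-degree counts reconstructed from the odds in the Odds Ratio output (these counts are not read from the data file):

```r
# Fitted P(recurrence-events) per degree of malignancy (rounded, model_X6)
prob <- c(deg1 = 0.169, deg2 = 0.215, deg3 = 0.529)

# ASSUMPTION: counts reconstructed from the odds 12/59, 28/102, 45/40
n_no  <- c(deg1 = 59, deg2 = 102, deg3 = 40)
n_rec <- c(deg1 = 12, deg2 = 28,  deg3 = 45)

# P > 0.5 means the model favours the modelled event, recurrence-events
pred_rec <- prob > 0.5

lv  <- c("no-recurrence-events", "recurrence-events")
tab <- matrix(c(sum(n_no[!pred_rec]), sum(n_rec[!pred_rec]),
                sum(n_no[pred_rec]),  sum(n_rec[pred_rec])),
              nrow = 2, byrow = TRUE,
              dimnames = list(predicted = lv, actual = lv))

acc_correct <- sum(diag(tab)) / sum(tab) * 100
acc_swapped <- 100 - acc_correct   # what swapped labels in ifelse() give
print(tab)
cat("accuracy:", round(acc_correct, 2), "%  swapped labels:",
    round(acc_swapped, 2), "%\n")
```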
