Data (1)

Data akan menggunakan data contoh kemarin

data <- read.csv("https://raw.githubusercontent.com/Nr5D/data/main/data_02.csv", sep=",", header = TRUE)
dplyr::glimpse(data)

## Rows: 30
## Columns: 4
## $ y  <int> 51, 57, 53, 64, 53, 45, 44, 52, 53, 50, 56, 51, 45, 38, 50, 55, ...
## $ x1 <int> 35, 32, 31, 36, 26, 26, 27, 32, 36, 33, 33, 30, 30, 26, 29, 29, ...
## $ x2 <int> 15, 10, 11, 10, 9, 10, 6, 13, 10, 14, 11, 9, 11, 8, 7, 11, 9, 10...
## $ x3 <int> 14, 18, 17, 19, 18, 15, 12, 15, 14, 14, 16, 15, 12, 12, 15, 19, ...

Namun, variabel y dari data, akan kita ubah, dari kontinu, menjadi biner. Dengan memisahkan data tersebut berdasar Mediannya

median(data$y)

## [1] 52.5

Maka dipergunakan fungsi berikut untuk mengubah y dari kontinu menjadi y biner, dimana data lebih kecil dari Median 52.5, diubah menjadi 0, selainnya diubah menjadi 1.

data$y <- ifelse(data$y <= 52.5, 0, 1)

Data (2)

Sehingga datanya menjadi

dplyr::glimpse(data)

## Rows: 30
## Columns: 4
## $ y  <dbl> 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0...
## $ x1 <int> 35, 32, 31, 36, 26, 26, 27, 32, 36, 33, 33, 30, 30, 26, 29, 29, ...
## $ x2 <int> 15, 10, 11, 10, 9, 10, 6, 13, 10, 14, 11, 9, 11, 8, 7, 11, 9, 10...
## $ x3 <int> 14, 18, 17, 19, 18, 15, 12, 15, 14, 14, 16, 15, 12, 12, 15, 19, ...

Dimana

y
0 : Berpenghasilan Rendah
1 : Berpenghasilan Tinggi
x1 : usia (dalam tahun)
x2 : lama pendidikan (dalam tahun)
x3 : lama bekerja (dalam bulan)

Explorasi Data (1)

summary(data)

##        y             x1              x2             x3      
##  Min.   :0.0   Min.   :17.00   Min.   : 6.0   Min.   : 9.0  
##  1st Qu.:0.0   1st Qu.:29.00   1st Qu.: 9.0   1st Qu.:14.0  
##  Median :0.5   Median :32.00   Median :10.0   Median :15.0  
##  Mean   :0.5   Mean   :31.17   Mean   :10.2   Mean   :15.5  
##  3rd Qu.:1.0   3rd Qu.:34.00   3rd Qu.:11.0   3rd Qu.:17.0  
##  Max.   :1.0   Max.   :39.00   Max.   :15.0   Max.   :21.0

par(mfrow=c(1,3))
boxplot(x1~y,data=data, main="Boxplot x1 terhadap y", xlab="y", ylab="x1", col="darkred")
boxplot(x2~y,data=data, main="Boxplot x2 terhadap y", xlab="y", ylab="x2", col="gold")
boxplot(x3~y,data=data, main="Boxplot x3 terhadap y", xlab="y", ylab="x3", col="lightgreen")

Model Regresi Logistik (1)

Untuk Pertemuan ini, hanya akan dibahas GLM dengan lebih dari satu Peubah Bebas, untuk memudahkan, langsung dipilih \(x_2\) dan \(x_3\)

reglog<-glm(y~x2+x3,data=data,family=binomial(link = "logit"))
summary(reglog)

## 
## Call:
## glm(formula = y ~ x2 + x3, family = binomial(link = "logit"), 
##     data = data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.47842  -0.39275   0.00285   0.35130   2.11604  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -21.3185    10.0469  -2.122   0.0338 *
## x2           -0.1607     0.3558  -0.452   0.6516  
## x3            1.4856     0.6076   2.445   0.0145 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 41.589  on 29  degrees of freedom
## Residual deviance: 18.292  on 27  degrees of freedom
## AIC: 24.292
## 
## Number of Fisher Scoring iterations: 7

Model Regresi Logistik (2)

library(tidyverse)
data %>%
  mutate(prob = ifelse(y == "1", 1, 0)) %>%
  ggplot(aes(x1+x2+x3, prob)) +
  geom_point(alpha = .15) +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  ggtitle("Logistic regression model fit") +
  xlab("x1+x2+x3") +
  ylab("Probability of Berpenghasilan Tinggi")

Interpreting Coefficient

reglog$coefficients

## (Intercept)          x2          x3 
##  -21.318486   -0.160656    1.485638

exp(reglog$coefficients)

##  (Intercept)           x2           x3 
## 5.514410e-10 8.515850e-01 4.417785e+00

confint(reglog)

## Waiting for profiling to be done...

##                   2.5 %    97.5 %
## (Intercept) -47.9199495 -6.920481
## x2           -0.9761594  0.510992
## x3            0.6303814  3.178987

Prediction

predict(reglog, newdata=data.frame(x2=15, x3=20), type="response")

##         1 
## 0.9974887

predict(reglog, newdata=data.frame(x2=15, x3=15), type="response")

##         1 
## 0.1909654

Jika diketahui \(x_2 = 15\) & \(x_3=20\), maka Peluang tergolong berpenghasilan tinggi sebesar 99.74%

Jika diketahui \(x_2 = 15\) & \(x_3=15\), maka Peluang tergolong berpenghasilan tinggi sebesar 19.09%

Uji pada Regresi Logistik

Wald Test

# Membuat Fungsi uji.wald
uji.wald<-function(x){
 koef<-x$coefficients
 se<-summary(x)$coefficients[,2]
 wald.stat<-(koef/se)^2
 pval<-pchisq(wald.stat,1,lower.tail = F)
 exp.koef<-exp(koef)
 print(round(data.frame(koef,se,wald.stat,pval,exp.koef),4))
}

uji.wald(reglog)

##                 koef      se wald.stat   pval exp.koef
## (Intercept) -21.3185 10.0469    4.5024 0.0338   0.0000
## x2           -0.1607  0.3558    0.2039 0.6516   0.8516
## x3            1.4856  0.6076    5.9789 0.0145   4.4178

Hoslem Test

#hosmer lemeshow test
ResourceSelection::hoslem.test(reglog$y,reglog$fitted.values, g=5)

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  reglog$y, reglog$fitted.values
## X-squared = 2.3095, df = 3, p-value = 0.5107

Ukuran Kebaikan Model (1)

Pseudo \(R^2\)

pscl::pR2(reglog)["McFadden"]

## fitting null model for pseudo-r2

##  McFadden 
## 0.5601757

car::vif(reglog)

##       x2       x3 
## 1.000034 1.000034

Materi 4 : Implementasi GLM (Y Biner dengan Beberapa Peubah Bebas) dalam R