Materi 3 : Implementasi GLM (Y Biner dengan satu Peubah Bebas) dalam R

Akbar Rizki, S.Stat. M.Si

6 Februari 2021 10.00-12.00 WIB

Data (1)

Data akan menggunakan data contoh kemarin

data <- read.csv("https://raw.githubusercontent.com/Nr5D/data/main/data_02.csv", sep=",", header = TRUE)
dplyr::glimpse(data)
## Rows: 30
## Columns: 4
## $ y  <int> 51, 57, 53, 64, 53, 45, 44, 52, 53, 50, 56, 51, 45, 38, 50, 55, ...
## $ x1 <int> 35, 32, 31, 36, 26, 26, 27, 32, 36, 33, 33, 30, 30, 26, 29, 29, ...
## $ x2 <int> 15, 10, 11, 10, 9, 10, 6, 13, 10, 14, 11, 9, 11, 8, 7, 11, 9, 10...
## $ x3 <int> 14, 18, 17, 19, 18, 15, 12, 15, 14, 14, 16, 15, 12, 12, 15, 19, ...

Namun, variabel y dari data, akan kita ubah, dari kontinu, menjadi biner. Dengan memisahkan data tersebut berdasar Mediannya

median(data$y)
## [1] 52.5

Maka dipergunakan fungsi berikut untuk mengubah y dari kontinu menjadi y biner, dimana data lebih kecil dari Median 52.5, diubah menjadi 0, selainnya diubah menjadi 1.

data$y <- ifelse(data$y <= 52.5, 0, 1)

Data (2)

Sehingga datanya menjadi

dplyr::glimpse(data)
## Rows: 30
## Columns: 4
## $ y  <dbl> 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0...
## $ x1 <int> 35, 32, 31, 36, 26, 26, 27, 32, 36, 33, 33, 30, 30, 26, 29, 29, ...
## $ x2 <int> 15, 10, 11, 10, 9, 10, 6, 13, 10, 14, 11, 9, 11, 8, 7, 11, 9, 10...
## $ x3 <int> 14, 18, 17, 19, 18, 15, 12, 15, 14, 14, 16, 15, 12, 12, 15, 19, ...

Dimana

y
0 : Berpenghasilan Rendah
1 : Berpenghasilan Tinggi
x1 :
x2 :
x3 :

Explorasi Data (1)

summary(data)
##        y             x1              x2             x3      
##  Min.   :0.0   Min.   :17.00   Min.   : 6.0   Min.   : 9.0  
##  1st Qu.:0.0   1st Qu.:29.00   1st Qu.: 9.0   1st Qu.:14.0  
##  Median :0.5   Median :32.00   Median :10.0   Median :15.0  
##  Mean   :0.5   Mean   :31.17   Mean   :10.2   Mean   :15.5  
##  3rd Qu.:1.0   3rd Qu.:34.00   3rd Qu.:11.0   3rd Qu.:17.0  
##  Max.   :1.0   Max.   :39.00   Max.   :15.0   Max.   :21.0
par(mfrow=c(1,3))
boxplot(x1~y,data=data, main="Boxplot x1 terhadap y", xlab="y", ylab="x1", col="darkred")
boxplot(x2~y,data=data, main="Boxplot x2 terhadap y", xlab="y", ylab="x2", col="gold")
boxplot(x3~y,data=data, main="Boxplot x3 terhadap y", xlab="y", ylab="x3", col="lightgreen")

Model Regresi Logistik (1)

Untuk Pertemuan ini, hanya akan dibahas GLM dengan satu Peubah Bebas, untuk memudahkan, langsung dipilih \(x_1\)

reglog<-glm(y~x1,data=data,family=binomial)
summary(reglog)
## 
## Call:
## glm(formula = y ~ x1, family = binomial, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8118  -0.7909   0.1147   0.7843   2.1537  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -12.7360     4.9519  -2.572  0.01011 * 
## x1            0.4046     0.1561   2.593  0.00952 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 41.589  on 29  degrees of freedom
## Residual deviance: 28.578  on 28  degrees of freedom
## AIC: 32.578
## 
## Number of Fisher Scoring iterations: 5

Model Regresi Logistik (2)

library(tidyverse)
data %>%
  mutate(prob = ifelse(y == "1", 1, 0)) %>%
  ggplot(aes(x1, prob)) +
  geom_point(alpha = .15) +
  geom_smooth(method = "glm", method.args = list(family = "binomial")) +
  ggtitle("Logistic regression model fit") +
  xlab("x1") +
  ylab("Probability of Berpenghasilan Tinggi")

Interpreting Coefficient

reglog$coefficients
## (Intercept)          x1 
## -12.7359728   0.4046261
exp(reglog$coefficients)
##  (Intercept)           x1 
## 2.943319e-06 1.498742e+00
confint(reglog)
## Waiting for profiling to be done...
##                   2.5 %     97.5 %
## (Intercept) -24.6435725 -4.7213428
## x1            0.1522637  0.7801682

Prediction

predict(reglog, newdata=data.frame(x1=50), type="response")
##         1 
## 0.9994446
predict(reglog, newdata=data.frame(x1=30), type="response")
##         1 
## 0.3549868

Jika diketahui \(x_1 = 50\), maka Peluang tergolong berpenghasilan tinggi sebesar 99.94%

Jika diketahui \(x_1 = 30\), maka Peluang tergolong berpenghasilan tinggi sebesar 35.49%

Uji pada Regresi Logistik

Wald Test

# Membuat Fungsi uji.wald
uji.wald<-function(x){
 koef<-x$coefficients
 se<-summary(x)$coefficients[,2]
 wald.stat<-(koef/se)^2
 pval<-pchisq(wald.stat,1,lower.tail = F)
 exp.koef<-exp(koef)
 print(round(data.frame(koef,se,wald.stat,pval,exp.koef),4))
}
uji.wald(reglog)
##                 koef     se wald.stat   pval exp.koef
## (Intercept) -12.7360 4.9519    6.6148 0.0101   0.0000
## x1            0.4046 0.1561    6.7231 0.0095   1.4987

Hoslem Test

#hosmer lemeshow test
ResourceSelection::hoslem.test(reglog$y,reglog$fitted.values, g=5)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  reglog$y, reglog$fitted.values
## X-squared = 2.247, df = 3, p-value = 0.5228

Ukuran Kebaikan Model (1)

Pseudo \(R^2\)

pscl::pR2(reglog)["McFadden"]
## fitting null model for pseudo-r2
##  McFadden 
## 0.3128397
# car::vif(reglog)

Ukuran Kebaikan Model (2)

Plot ROC

InformationValue::plotROC(data$y=="1", reglog$fitted.values)