1 Introduction

1.1 Generalized Linear Model (GLM)

Describe the GLM concept, the link function, and the exponential family of distributions.
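The three GLM components are stated below in standard textbook notation. Here θ is the natural parameter, φ is the dispersion parameter, and b(·), a(·), c(·) are known functions. These symbols are introduced only for this summary.

```latex
% Random component: Y_i belongs to the exponential family
f(y_i;\theta_i,\phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi) \right\},
\qquad \mathrm{E}(Y_i) = \mu_i = b'(\theta_i), \qquad \mathrm{Var}(Y_i) = b''(\theta_i)\,a(\phi)

% Systematic component and link function
\eta_i = \mathbf{x}_i^\top \boldsymbol\beta, \qquad g(\mu_i) = \eta_i

% Canonical links for the binomial (logistic) and Poisson models
\text{Binomial: } g(\pi_i) = \log\frac{\pi_i}{1-\pi_i}, \qquad
\text{Poisson: } g(\mu_i) = \log \mu_i
```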

1.2 Model Classification by Response Type

Model              Response
Binary logistic    2 categories
Multinomial        More than 2 nominal categories
Ordinal            More than 2 ordered categories
Poisson            Count data

2 Binary Logistic Regression

2.1 Heart Failure Dataset

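# Load the heart failure clinical records (the CSV is read from the working directory)
data <- read.csv("heart_failure_clinical_records_dataset_reglog biner.csv")
# Inspect the structure: all 13 columns are numeric, DEATH_EVENT is the 0/1 response
str(data)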
## 'data.frame':    299 obs. of  13 variables:
##  $ age                     : num  75 55 65 50 65 90 75 60 65 80 ...
##  $ anaemia                 : int  0 0 0 1 1 1 1 1 0 1 ...
##  $ creatinine_phosphokinase: int  582 7861 146 111 160 47 246 315 157 123 ...
##  $ diabetes                : int  0 0 0 0 1 0 0 1 0 0 ...
##  $ ejection_fraction       : int  20 38 20 20 20 40 15 60 65 35 ...
##  $ high_blood_pressure     : int  1 0 0 0 0 1 0 0 0 1 ...
##  $ platelets               : num  265000 263358 162000 210000 327000 ...
##  $ serum_creatinine        : num  1.9 1.1 1.3 1.9 2.7 2.1 1.2 1.1 1.5 9.4 ...
##  $ serum_sodium            : int  130 136 129 137 116 132 137 131 138 133 ...
##  $ sex                     : int  1 1 1 1 0 1 1 1 0 1 ...
##  $ smoking                 : int  0 0 1 0 0 1 0 1 0 1 ...
##  $ time                    : int  4 6 7 7 8 8 10 10 10 10 ...
##  $ DEATH_EVENT             : int  1 1 1 1 1 1 1 1 1 1 ...

2.2 Data Exploration

2.2.1 Missing Values

colSums(is.na(data))
##                      age                  anaemia creatinine_phosphokinase 
##                        0                        0                        0 
##                 diabetes        ejection_fraction      high_blood_pressure 
##                        0                        0                        0 
##                platelets         serum_creatinine             serum_sodium 
##                        0                        0                        0 
##                      sex                  smoking                     time 
##                        0                        0                        0 
##              DEATH_EVENT 
##                        0

2.2.2 Descriptive Statistics

summary(data)
##       age           anaemia       creatinine_phosphokinase    diabetes     
##  Min.   :40.00   Min.   :0.0000   Min.   :  23.0           Min.   :0.0000  
##  1st Qu.:51.00   1st Qu.:0.0000   1st Qu.: 116.5           1st Qu.:0.0000  
##  Median :60.00   Median :0.0000   Median : 250.0           Median :0.0000  
##  Mean   :60.83   Mean   :0.4314   Mean   : 581.8           Mean   :0.4181  
##  3rd Qu.:70.00   3rd Qu.:1.0000   3rd Qu.: 582.0           3rd Qu.:1.0000  
##  Max.   :95.00   Max.   :1.0000   Max.   :7861.0           Max.   :1.0000  
##  ejection_fraction high_blood_pressure   platelets      serum_creatinine
##  Min.   :14.00     Min.   :0.0000      Min.   : 25100   Min.   :0.500   
##  1st Qu.:30.00     1st Qu.:0.0000      1st Qu.:212500   1st Qu.:0.900   
##  Median :38.00     Median :0.0000      Median :262000   Median :1.100   
##  Mean   :38.08     Mean   :0.3512      Mean   :263358   Mean   :1.394   
##  3rd Qu.:45.00     3rd Qu.:1.0000      3rd Qu.:303500   3rd Qu.:1.400   
##  Max.   :80.00     Max.   :1.0000      Max.   :850000   Max.   :9.400   
##   serum_sodium        sex            smoking            time      
##  Min.   :113.0   Min.   :0.0000   Min.   :0.0000   Min.   :  4.0  
##  1st Qu.:134.0   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 73.0  
##  Median :137.0   Median :1.0000   Median :0.0000   Median :115.0  
##  Mean   :136.6   Mean   :0.6488   Mean   :0.3211   Mean   :130.3  
##  3rd Qu.:140.0   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:203.0  
##  Max.   :148.0   Max.   :1.0000   Max.   :1.0000   Max.   :285.0  
##   DEATH_EVENT    
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.3211  
##  3rd Qu.:1.0000  
##  Max.   :1.0000

2.3 Assumption Testing

2.3.1 Multicollinearity

# fill in after the initial model has been fitted
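Below is a minimal base-R sketch of the VIF check. The helper `vif_glm()` is hypothetical. It computes each VIF from the correlation matrix of the estimated coefficients, which for single-degree-of-freedom terms should agree with `car::vif()`. The heart failure CSV is not bundled with R, so the example uses `MASS::Pima.tr` (binary diabetes outcome `type`) as a stand-in. A common rule of thumb treats a VIF above 5 or 10 as a warning sign.

```r
library(MASS)  # only for the stand-in data MASS::Pima.tr

# Hypothetical helper: VIF from the correlation matrix of the coefficient estimates
vif_glm <- function(model) {
  v <- vcov(model)[-1, -1, drop = FALSE]  # drop the intercept row/column
  r <- cov2cor(v)                         # covariance -> correlation
  vif <- diag(solve(r))                   # VIF_j = j-th diagonal element of R^-1
  names(vif) <- colnames(r)
  vif
}

fit <- glm(type ~ ., data = Pima.tr, family = binomial)
vif <- vif_glm(fit)
round(vif, 2)
# With the heart failure data (hypothetical call):
# vif_glm(glm(DEATH_EVENT ~ ., data = data, family = binomial))
```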

2.3.2 Linearity of the Logit

# Box-Tidwell Test
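This sketch follows the usual Box-Tidwell formulation for logistic regression. For each strictly positive continuous predictor x, the model gains the term x·log(x). A significant coefficient on that term suggests the logit is not linear in x. `box_tidwell_logit()` is a hypothetical helper, and `MASS::Pima.tr` again stands in for the heart failure data.

```r
library(MASS)  # only for the stand-in data MASS::Pima.tr

# Hypothetical helper: add x * log(x) for each listed predictor and test those terms
box_tidwell_logit <- function(data, response, vars) {
  # log(x) requires strictly positive values
  stopifnot(all(vapply(data[vars], function(v) all(v > 0), logical(1))))
  bt_terms <- sprintf("I(%s * log(%s))", vars, vars)
  f <- reformulate(c(vars, bt_terms), response = response)
  fit <- glm(f, data = data, family = binomial)
  coef(summary(fit))[bt_terms, , drop = FALSE]  # rows for the x * log(x) terms only
}

bt <- box_tidwell_logit(Pima.tr, "type", c("glu", "bmi", "age"))
bt
# Heart failure data (these columns are strictly positive per the summary):
# box_tidwell_logit(data, "DEATH_EVENT", c("age", "ejection_fraction", "serum_creatinine"))
```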

2.3.3 Outliers and Influential Observations

# Cook's distance
# Leverage
# DFBETA
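This base-R sketch collects the three diagnostics listed above. The cutoffs (Cook's distance > 4/n, leverage > 2p/n, |DFBETAS| > 2/sqrt(n)) are common rules of thumb rather than fixed standards. `influence_table()` is a hypothetical helper, and `MASS::Pima.tr` stands in for the heart failure data.

```r
library(MASS)  # only for the stand-in data MASS::Pima.tr

# Hypothetical helper; rule-of-thumb cutoffs are assumptions and conventions vary
influence_table <- function(model) {
  n <- nobs(model)
  p <- length(coef(model))
  cook <- cooks.distance(model)
  lev  <- hatvalues(model)
  dfb  <- apply(abs(dfbetas(model)), 1, max)  # largest |DFBETAS| per observation
  data.frame(cook, lev, max_dfbetas = dfb,
             flagged = cook > 4 / n | lev > 2 * p / n | dfb > 2 / sqrt(n))
}

fit  <- glm(type ~ ., data = Pima.tr, family = binomial)
infl <- influence_table(fit)
head(infl[order(-infl$cook), ])  # most influential observations first
sum(infl$flagged)
```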

2.4 Initial Model

# glm(..., family = binomial)

2.5 Model Selection

2.5.1 Backward AIC

# stepAIC()

2.5.2 Forward AIC

# stepAIC()

2.5.3 Stepwise

# stepAIC()
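The three strategies in 2.5.1 to 2.5.3 differ only in the starting model and the `direction` argument of `MASS::stepAIC()`, so a single sketch covers all of them. It uses `MASS::Pima.tr` as stand-in data. Forward and stepwise selection start from the intercept-only model and can grow up to the full set of predictors.

```r
library(MASS)  # stepAIC(); Pima.tr is the stand-in data

full  <- glm(type ~ ., data = Pima.tr, family = binomial)
null  <- glm(type ~ 1, data = Pima.tr, family = binomial)
upper <- ~ npreg + glu + bp + skin + bmi + ped + age  # all candidate predictors

# 2.5.1 Backward: start from the full model, drop terms while AIC decreases
back <- stepAIC(full, direction = "backward", trace = FALSE)

# 2.5.2 Forward: start from the intercept-only model, add terms while AIC decreases
fwd <- stepAIC(null, scope = list(lower = ~ 1, upper = upper),
               direction = "forward", trace = FALSE)

# 2.5.3 Stepwise: terms may be added or dropped at every step
both <- stepAIC(null, scope = list(lower = ~ 1, upper = upper),
                direction = "both", trace = FALSE)

sapply(list(full = full, backward = back, forward = fwd, stepwise = both), AIC)
formula(back)
```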

2.6 Goodness of Fit

2.6.1 Likelihood Ratio Test

# anova(..., test="Chisq")
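This sketch runs the overall likelihood ratio test, comparing the fitted model with the intercept-only model through `anova(..., test = "Chisq")`. For transparency it also computes the same G statistic by hand. `MASS::Pima.tr` serves as stand-in data.

```r
library(MASS)  # only for the stand-in data MASS::Pima.tr

model <- glm(type ~ ., data = Pima.tr, family = binomial)
null  <- glm(type ~ 1, data = Pima.tr, family = binomial)

# LR test of the whole model against the intercept-only model
lrt <- anova(null, model, test = "Chisq")
lrt

# The same statistic by hand: G = D_null - D_model, df = number of slopes
G  <- model$null.deviance - model$deviance
df <- model$df.null - model$df.residual
p  <- pchisq(G, df = df, lower.tail = FALSE)
c(G = G, df = df, p.value = p)
```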

2.6.2 Hosmer-Lemeshow

# hoslem.test()
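This base-R sketch computes the Hosmer-Lemeshow statistic with g = 10 groups, formed from quantiles of the fitted probabilities, and g - 2 degrees of freedom. `hosmer_lemeshow()` is hypothetical. `ResourceSelection::hoslem.test()` is the packaged version and may form the groups slightly differently, so small numerical differences are possible. A large p-value indicates no evidence of lack of fit.

```r
library(MASS)  # only for the stand-in data MASS::Pima.tr

# Hypothetical helper: Hosmer-Lemeshow test with quantile-based groups
hosmer_lemeshow <- function(y, p, g = 10) {
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp  <- cut(p, breaks = breaks, include.lowest = TRUE)
  n_g  <- tapply(y, grp, length)
  obs1 <- tapply(y, grp, sum)  # observed events per group
  exp1 <- tapply(p, grp, sum)  # expected events per group
  # events and non-events both contribute (O - E)^2 / E
  stat <- sum((obs1 - exp1)^2 / exp1 + (obs1 - exp1)^2 / (n_g - exp1))
  df   <- nlevels(grp) - 2
  c(statistic = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

model <- glm(type ~ ., data = Pima.tr, family = binomial)
y  <- as.integer(Pima.tr$type == "Yes")
hl <- hosmer_lemeshow(y, fitted(model))
hl
```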

2.6.3 Pseudo R-Squared

# pR2()
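This sketch computes three common pseudo R² measures: McFadden, Cox-Snell and Nagelkerke. It assumes an ungrouped 0/1 response. In that case the log-likelihood equals minus half the deviance, so the null log-likelihood can be read from `null.deviance`. `pscl::pR2()` should report the same quantities (as `McFadden`, `r2ML` and `r2CU`). `pseudo_r2()` itself is a hypothetical helper.

```r
library(MASS)  # only for the stand-in data MASS::Pima.tr

# Hypothetical helper; assumes an ungrouped 0/1 response (logLik = -deviance / 2)
pseudo_r2 <- function(model) {
  n   <- nobs(model)
  ll1 <- as.numeric(logLik(model))
  ll0 <- -model$null.deviance / 2
  cox_snell <- 1 - exp(2 * (ll0 - ll1) / n)
  c(McFadden   = 1 - ll1 / ll0,
    CoxSnell   = cox_snell,
    Nagelkerke = cox_snell / (1 - exp(2 * ll0 / n)))  # rescaled to a max of 1
}

model <- glm(type ~ ., data = Pima.tr, family = binomial)
r2 <- pseudo_r2(model)
round(r2, 3)
```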

2.7 Classification Evaluation

2.7.1 Confusion Matrix

# table()
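This sketch builds the confusion matrix and the usual derived metrics. The 0.5 cutoff is an assumption and can be tuned, for example from the ROC curve. `confusion_metrics()` is a hypothetical helper, and the toy outcomes and probabilities are made up purely to show the mechanics.

```r
# Hypothetical helper: confusion matrix at a probability cutoff (0.5 assumed)
confusion_metrics <- function(y, p, cutoff = 0.5) {
  pred <- factor(as.integer(p >= cutoff), levels = c(0, 1))
  obs  <- factor(y, levels = c(0, 1))
  cm   <- table(Predicted = pred, Actual = obs)
  list(table       = cm,
       accuracy    = sum(diag(cm)) / sum(cm),
       sensitivity = cm["1", "1"] / sum(cm[, "1"]),  # true positive rate
       specificity = cm["0", "0"] / sum(cm[, "0"]))  # true negative rate
}

# Toy example with made-up probabilities
y <- c(1, 1, 0, 0, 1, 0)
p <- c(0.9, 0.4, 0.2, 0.6, 0.7, 0.1)
m <- confusion_metrics(y, p)
m$table
# With the fitted model: confusion_metrics(data$DEATH_EVENT, fitted(model))
```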

2.7.2 ROC and AUC

# roc()
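This base-R sketch produces the ROC curve and the AUC. The AUC uses the Mann-Whitney formulation: the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative one, with ties counting one half. This should match the trapezoidal AUC from `pROC::auc(pROC::roc(y, p))`. Both helpers are hypothetical, and the toy data are made up.

```r
# Hypothetical helper: AUC via the Mann-Whitney rank formula
auc_mw <- function(y, p) {
  r  <- rank(p)  # average ranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical helper: true and false positive rates over all observed cutoffs
roc_points <- function(y, p) {
  cuts <- c(Inf, sort(unique(p), decreasing = TRUE))
  data.frame(cutoff = cuts,
             tpr = vapply(cuts, function(k) mean(p[y == 1] >= k), numeric(1)),
             fpr = vapply(cuts, function(k) mean(p[y == 0] >= k), numeric(1)))
}

y <- c(1, 1, 0, 0, 1, 0)
p <- c(0.9, 0.4, 0.2, 0.6, 0.7, 0.1)
auc <- auc_mw(y, p)
roc <- roc_points(y, p)
# plot(roc$fpr, roc$tpr, type = "s")  # ROC curve
# With the fitted model: auc_mw(data$DEATH_EVENT, fitted(model))
```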

2.8 Odds Ratio Interpretation

# exp(coef(model))
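This sketch reports odds ratios with Wald 95% confidence intervals from `confint.default()`. Calling `confint()` on a glm would give profile-likelihood intervals instead. `MASS::Pima.tr` stands in for the heart failure data. Each exp(b_j) is the factor by which the odds of the event change for a one-unit increase in predictor j, holding the other predictors fixed.

```r
library(MASS)  # only for the stand-in data MASS::Pima.tr

model <- glm(type ~ ., data = Pima.tr, family = binomial)

# Odds ratios with Wald 95% intervals (exponentiated coefficients and bounds)
or_table <- exp(cbind(OR = coef(model), confint.default(model)))
round(or_table, 3)
```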

2.9 Conclusion

Write the conclusions for the binary model.

3 Multinomial Logistic Regression

3.1 Real-World Dataset

State the source of the dataset.

3.2 Data Exploration

# EDA

3.3 Assumption Testing

3.3.1 Multicollinearity

# vif()

3.3.2 Independence of Irrelevant Alternatives (IIA)

# Hausman-McFadden

3.4 Multinomial Model

# multinom()
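This sketch fits the multinomial model with `nnet::multinom()`. The built-in `HairEyeColor` table (eye colour in four nominal categories, by hair colour and sex) serves only as a stand-in real dataset, and its counts enter as case weights. `summary()` of a `multinom` fit gives standard errors but no p-values, so the Wald z tests are computed by hand.

```r
library(nnet)  # multinom()

hec <- as.data.frame(HairEyeColor)  # columns Hair, Eye, Sex, Freq
fit <- multinom(Eye ~ Hair + Sex, data = hec, weights = Freq,
                Hess = TRUE, trace = FALSE)

s <- summary(fit)
z <- s$coefficients / s$standard.errors  # Wald z statistics
p <- 2 * pnorm(-abs(z))                  # two-sided p-values
round(p, 4)  # one row per non-baseline eye colour; "Brown" is the baseline
```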

3.5 Goodness of Fit

# LR Test
# AIC
# BIC
# McFadden R2
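This sketch computes the goodness-of-fit quantities listed above for the same stand-in model. It relies on `multinom` storing the residual deviance (minus twice the log-likelihood) in `$deviance` and the number of parameters in `$edf`. Using the 592 individuals as n in BIC is an assumption, because the data are stored as weighted covariate patterns.

```r
library(nnet)  # multinom()

hec  <- as.data.frame(HairEyeColor)
fit  <- multinom(Eye ~ Hair + Sex, data = hec, weights = Freq, trace = FALSE)
null <- multinom(Eye ~ 1,          data = hec, weights = Freq, trace = FALSE)

n  <- sum(hec$Freq)                 # individuals, used as n in BIC (assumption)
G  <- null$deviance - fit$deviance  # LR statistic against the intercept-only model
df <- fit$edf - null$edf
gof <- c(G = G, df = df, p.value = pchisq(G, df, lower.tail = FALSE),
         AIC = fit$deviance + 2 * fit$edf,
         BIC = fit$deviance + log(n) * fit$edf,
         McFadden = 1 - fit$deviance / null$deviance)
round(gof, 4)
```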

3.6 Prediction

# confusion matrix
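This sketch builds the confusion matrix for the multinomial model. Each case is assigned the category with the highest fitted probability. Each row of the `HairEyeColor` stand-in is a covariate pattern, so the table is weighted by `Freq`.

```r
library(nnet)  # multinom()

hec <- as.data.frame(HairEyeColor)
fit <- multinom(Eye ~ Hair + Sex, data = hec, weights = Freq, trace = FALSE)

hec$pred <- predict(fit, newdata = hec, type = "class")  # most probable category
cm <- xtabs(Freq ~ Eye + pred, data = hec)               # weighted by Freq
cm
sum(diag(cm)) / sum(cm)  # overall accuracy
```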

3.7 Relative Risk Ratio

# exp(coef(model))
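Exponentiated multinomial coefficients are relative risk ratios. Each one compares the probability of a category with that of the baseline category, which here is the first level, `Brown`. The sketch below uses the same `HairEyeColor` stand-in model.

```r
library(nnet)  # multinom()

hec <- as.data.frame(HairEyeColor)
fit <- multinom(Eye ~ Hair + Sex, data = hec, weights = Freq, trace = FALSE)

# Relative risk ratios: exp(b) scales P(category) / P(baseline "Brown")
rrr <- exp(coef(fit))
round(rrr, 2)
# rrr["Blue", "HairBlond"]: change in the Blue-vs-Brown ratio for blond hair
# relative to the reference hair colour (Black), with sex held fixed
```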

3.8 Conclusion

4 Ordinal Logistic Regression

4.1 Real-World Dataset

State the source of the dataset.

4.2 Data Exploration

# EDA

4.3 Assumption Testing

4.3.1 Multicollinearity

# vif()

4.3.2 Proportional Odds Assumption

# brant()
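The outline references `brant::brant()` as the formal test. As a dependency-free complement, the sketch below runs an informal check that is not the Brant test itself. It fits one binary logit for each cumulative split P(Sat > j) and places the slopes side by side; under proportional odds they should be roughly equal. `MASS::housing` is a stand-in real dataset with the ordered response `Sat` (Low < Medium < High) and counts in `Freq`.

```r
library(MASS)  # housing data

lev  <- levels(housing$Sat)  # "Low" "Medium" "High"
cuts <- seq_len(length(lev) - 1)
slopes <- sapply(cuts, function(j) {
  above <- as.integer(as.integer(housing$Sat) > j)  # 1 if Sat is above level j
  coef(glm(above ~ Infl + Type + Cont, data = housing,
           weights = Freq, family = binomial))[-1]  # drop the intercept
})
colnames(slopes) <- paste("Sat >", lev[cuts])
round(slopes, 3)  # similar columns are consistent with proportional odds
```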

4.4 Ordinal Model

# polr()
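This sketch fits the proportional-odds (cumulative logit) model with `MASS::polr()` on the `MASS::housing` stand-in data. `Hess = TRUE` keeps the Hessian for the standard errors. `summary()` shows t values but no p-values, so two-sided Wald p-values are added by hand.

```r
library(MASS)  # polr(), housing

fit <- polr(Sat ~ Infl + Type + Cont, data = housing, weights = Freq, Hess = TRUE)

ctab <- coef(summary(fit))               # columns: Value, Std. Error, t value
p <- 2 * pnorm(-abs(ctab[, "t value"]))  # two-sided Wald p-values
cbind(ctab, "p value" = round(p, 4))     # slopes first, then the cutpoints
```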

4.5 Goodness of Fit

# LR test
# AIC
# BIC
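This sketch computes the LR test, AIC and BIC for the ordinal model. The intercept-only cumulative logit model simply reproduces the marginal proportions of `Sat`, so its deviance is computed directly from the category totals. Taking n as the number of households in BIC is an assumption.

```r
library(MASS)  # polr(), housing

fit <- polr(Sat ~ Infl + Type + Cont, data = housing, weights = Freq)

tab <- tapply(housing$Freq, housing$Sat, sum)  # households per satisfaction level
n   <- sum(tab)                                # used as n in BIC (assumption)
null_dev <- -2 * sum(tab * log(tab / n))       # intercept-only model: marginal proportions
G   <- null_dev - fit$deviance
df  <- fit$edf - (length(tab) - 1)             # slopes only; cutpoints are in both models
gof <- c(G = G, df = df, p.value = pchisq(G, df, lower.tail = FALSE),
         AIC = fit$deviance + 2 * fit$edf,
         BIC = fit$deviance + log(n) * fit$edf)
round(gof, 4)
```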

4.6 Prediction

# confusion matrix
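This sketch builds the ordinal confusion matrix. `predict(type = "class")` assigns each covariate pattern its most probable level, and the table is weighted by `Freq` in the `MASS::housing` stand-in.

```r
library(MASS)  # polr(), housing

fit <- polr(Sat ~ Infl + Type + Cont, data = housing, weights = Freq)

h <- housing
h$pred <- predict(fit, newdata = housing, type = "class")  # most probable level
cm <- xtabs(Freq ~ Sat + pred, data = h)                  # weighted by Freq
cm
sum(diag(cm)) / sum(cm)  # overall accuracy
```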

4.7 Odds Ratio

# exp(coef(model))
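This sketch reports cumulative odds ratios with Wald 95% intervals. `polr` uses the parameterisation logit P(Y <= k) = zeta_k - eta. Under that parameterisation, exp(beta) > 1 means higher odds of being in a higher (more satisfied) category. The cutpoints zeta are left out of the table.

```r
library(MASS)  # polr(), housing

fit <- polr(Sat ~ Infl + Type + Cont, data = housing, weights = Freq, Hess = TRUE)

b  <- coef(fit)                                   # slopes only (no cutpoints)
se <- coef(summary(fit))[names(b), "Std. Error"]
z  <- qnorm(0.975)
or_table <- exp(cbind(OR = b, lower = b - z * se, upper = b + z * se))
round(or_table, 3)
```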

4.8 Conclusion

5 Poisson Regression

5.1 Real-World Dataset

State the source of the dataset.

5.2 Data Exploration

# histogram

5.3 Assumption Testing

5.3.1 Equidispersion

# dispersiontest()
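This sketch implements the Cameron-Trivedi regression-based test. The alternative is Var(Y) = mu + alpha·mu², tested as H0: alpha = 0 against alpha > 0. It is meant to mirror `AER::dispersiontest(model, trafo = 2)`, but `dispersion_test_nb2()` is a hypothetical helper. The built-in `warpbreaks` count data (breaks per loom by wool type and tension) stand in for the real dataset.

```r
# Hypothetical helper: auxiliary regression for NB2-type overdispersion
dispersion_test_nb2 <- function(model) {
  y   <- model$y
  mu  <- fitted(model)
  aux <- ((y - mu)^2 - y) / mu  # expectation alpha * mu under the alternative
  reg <- lm(aux ~ 0 + mu)       # no intercept
  est <- coef(summary(reg))
  z   <- est[1, "t value"]
  c(alpha = est[1, "Estimate"], z = z,
    p.value = pnorm(z, lower.tail = FALSE))  # one-sided: H1 alpha > 0
}

model <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
dt <- dispersion_test_nb2(model)
dt
```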

5.3.2 Overdispersion

# check for overdispersion
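A quick overdispersion check divides the Pearson chi-square by the residual degrees of freedom. Values near 1 are consistent with the Poisson assumption, and values well above 1 indicate overdispersion. The sketch below runs this on the `warpbreaks` stand-in. A quasi-Poisson fit estimates the same ratio as its dispersion parameter.

```r
model <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Pearson chi-square divided by the residual degrees of freedom
phi <- sum(residuals(model, type = "pearson")^2) / df.residual(model)
phi

# The quasi-Poisson fit estimates the same ratio as its dispersion parameter
qp <- glm(breaks ~ wool + tension, data = warpbreaks, family = quasipoisson)
summary(qp)$dispersion
```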

5.4 Poisson Model

# glm(..., family = poisson)

5.5 Goodness of Fit

# Deviance
# Pearson chi-square
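This sketch runs the deviance and Pearson goodness-of-fit tests on the `warpbreaks` stand-in. Under an adequate Poisson model, both statistics are approximately chi-square with the residual degrees of freedom. This is a large-sample approximation and is rough when expected counts are small. Small p-values indicate lack of fit.

```r
model <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

df_res  <- df.residual(model)
dev     <- deviance(model)
pearson <- sum(residuals(model, type = "pearson")^2)
gof <- rbind(
  deviance = c(stat = dev,     df = df_res, p.value = pchisq(dev, df_res, lower.tail = FALSE)),
  pearson  = c(stat = pearson, df = df_res, p.value = pchisq(pearson, df_res, lower.tail = FALSE))
)
gof
```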

5.6 Negative Binomial (Comparison Model)

# glm.nb()
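This sketch fits the negative binomial comparison model with `MASS::glm.nb()`, which assumes Var(Y) = mu + mu²/theta. When overdispersion is present, a clearly lower AIC than the Poisson model supports the negative binomial. `warpbreaks` is a stand-in dataset.

```r
library(MASS)  # glm.nb()

pois <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)
nb   <- glm.nb(breaks ~ wool + tension, data = warpbreaks)

# theta is the negative binomial shape; smaller theta means more overdispersion
c(theta = nb$theta, AIC_poisson = AIC(pois), AIC_negbin = AIC(nb))
```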

5.7 Incidence Rate Ratio

# exp(coef(model))
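Exponentiated Poisson coefficients are incidence rate ratios. Each gives the multiplicative change in the expected count relative to the reference level, or per unit of a numeric predictor. The sketch below computes them with Wald 95% intervals on the `warpbreaks` stand-in.

```r
model <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Incidence rate ratios with Wald 95% intervals
irr <- exp(cbind(IRR = coef(model), confint.default(model)))
round(irr, 3)  # e.g. tensionH: expected breaks at high vs low tension
```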

5.8 Conclusion

6 Comparison of the Four Models

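Aspect           Binary               Multinomial               Ordinal                  Poisson
Response         2 categories         >2 nominal categories     >2 ordered categories    Count data
Link function    Logit                Baseline-category logit   Cumulative logit         Log
Interpretation   Odds ratio           Relative risk ratio       Cumulative odds ratio    Incidence rate ratio
Key assumption   Linearity of logit   IIA                       Proportional odds        Equidispersion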

7 Closing Remarks

A summary of all analysis results and their statistical implications.