OVERDISPERSI

Zero Inflated Poisson dan Zero Inflated Negative Binomial

Pendahuluan

Pembahasan kali ini mengenai Overdispersi. Kondisi Overdispersi dapat disebabkan oleh banyaknya data amatan bernilai nol pada peubah respon (excess zeros). Pemodelan dengan Regresi Poisson menjadi tidak baik jika data mengalami overdispersi, sehingga dapat diatasi dengan menggunakan Zero Inflated Poisson (ZIP) dan Zero Inflated Negative Binomial (ZINB). Selanjutnya akan dipilih model terbaik dari model ZIP dan model ZINB.

Data

Data yang digunakan yaitu Data Angka Kematian Bayi di Kota Bandung Tahun 2019.

Peubah

Peubah yang digunakan yaitu :

X1 = Persentase berat badan bayi lahir rendah

X2 = Persentase bayi mendapat vit A sebanyak dua kali

X3 = Persentase bayi yang diberikan ASI eksklusif

X4 = Persentase penduduk dengan akses sanitasi layak

library(readxl)
## Warning: package 'readxl' was built under R version 4.0.5
data_tugas_glm_3 <- read_excel("D:/1. Kuliah S2 (STATISTIKA DAN SAINS DATA)/Materi Kuliah/SEMESTER 3/2. Model Linier Terampat/1. Model Linier Terampat/TB GLM/Tugas akhir ujian GLM 3/data tugas glm 3.xlsx")
data_tugas_glm_3 <- data.frame(data_tugas_glm_3)
## Uji Multikolinieritas
library(car)
## Warning: package 'car' was built under R version 4.0.4
## Loading required package: carData
vif(lm(Y~X1+X2+X3+X4, data=data_tugas_glm_3 ))
##       X1       X2       X3       X4 
## 1.047438 1.012865 1.045982 1.010114

Nilai vif tersebut menunjukkan tidak terdapat multikolineritas.

Model Poisson

model_poisson <- glm(formula=Y~X1+X2+X3+X4, family=poisson(link="log"), data=data_tugas_glm_3 )
summary(model_poisson)
## 
## Call:
## glm(formula = Y ~ X1 + X2 + X3 + X4, family = poisson(link = "log"), 
##     data = data_tugas_glm_3)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.90494  -0.81374  -0.64609  -0.04052   2.90079  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -11.882605   5.364836  -2.215   0.0268 *
## X1            0.094465   0.052911   1.785   0.0742 .
## X2            0.101309   0.054530   1.858   0.0632 .
## X3            0.021960   0.010946   2.006   0.0448 *
## X4           -0.011632   0.007199  -1.616   0.1062  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 98.171  on 79  degrees of freedom
## Residual deviance: 79.401  on 75  degrees of freedom
## AIC: 131.56
## 
## Number of Fisher Scoring iterations: 6
# Uji Overdispersi
library(AER)
## Warning: package 'AER' was built under R version 4.0.5
## Loading required package: lmtest
## Warning: package 'lmtest' was built under R version 4.0.5
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
## Warning: package 'survival' was built under R version 4.0.5
dispersiontest(model_poisson)
## 
##  Overdispersion test
## 
## data:  model_poisson
## z = 1.7945, p-value = 0.03637
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion 
##   1.349184

Nilai dispersi tersebut menunjukkan data mengalami overdispersi, sehingga akan digunakan ZIP dan ZINB untuk mengatasinya.

Model ZIP

library(pscl)
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
model_zip <- zeroinfl(formula=Y~X1+X2+X3+X4, data=data_tugas_glm_3)
summary(model_zip)
## 
## Call:
## zeroinfl(formula = Y ~ X1 + X2 + X3 + X4, data = data_tugas_glm_3)
## 
## Pearson residuals:
##       Min        1Q    Median        3Q       Max 
## -0.786184 -0.535159 -0.389722 -0.002717  3.736573 
## 
## Count model coefficients (poisson with log link):
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -3.961567   8.729688  -0.454   0.6500  
## X1           0.137588   0.054096   2.543   0.0110 *
## X2           0.023215   0.086173   0.269   0.7876  
## X3           0.033066   0.013777   2.400   0.0164 *
## X4          -0.020667   0.008056  -2.565   0.0103 *
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 14.42553   13.20079   1.093    0.274
## X1           0.08902    0.16005   0.556    0.578
## X2          -0.15700    0.14038  -1.118    0.263
## X3           0.02645    0.04344   0.609    0.543
## X4          -0.02617    0.02438  -1.073    0.283
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 43 
## Log-likelihood: -55.1 on 10 Df
lrtest(model_zip)
## Likelihood ratio test
## 
## Model 1: Y ~ X1 + X2 + X3 + X4
## Model 2: Y ~ 1
##   #Df  LogLik Df Chisq Pr(>Chisq)  
## 1  10 -55.097                      
## 2   2 -63.042 -8 15.89    0.04398 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model_zip1 <- zeroinfl(Y~X1+X3+X4, data=data_tugas_glm_3)
summary(model_zip1)
## 
## Call:
## zeroinfl(formula = Y ~ X1 + X3 + X4, data = data_tugas_glm_3)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6947 -0.5084 -0.4258 -0.2746  3.5800 
## 
## Count model coefficients (poisson with log link):
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -1.661144   1.390239  -1.195   0.2321  
## X1           0.132537   0.057696   2.297   0.0216 *
## X3           0.032418   0.016701   1.941   0.0523 .
## X4          -0.019402   0.008746  -2.218   0.0265 *
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.45624    2.59998   0.175    0.861
## X1           0.02911    0.13814   0.211    0.833
## X3           0.01100    0.03430   0.321    0.749
## X4          -0.02025    0.01995  -1.015    0.310
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 16 
## Log-likelihood: -57.83 on 8 Df
AICzip <- -2*logLik(model_zip1 ) + 2*8
AICzip
## 'log Lik.' 131.6688 (df=8)

Model ZINB

library(pscl)
model_zinb<- zeroinfl(formula=Y~X1+X2+X3+X4, data=data_tugas_glm_3, dist="negbin")
summary(model_zinb)
## 
## Call:
## zeroinfl(formula = Y ~ X1 + X2 + X3 + X4, data = data_tugas_glm_3, dist = "negbin")
## 
## Pearson residuals:
##       Min        1Q    Median        3Q       Max 
## -0.786100 -0.535160 -0.389756 -0.002722  3.736775 
## 
## Count model coefficients (negbin with log link):
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -3.963948   8.731515  -0.454   0.6498  
## X1            0.137585   0.054100   2.543   0.0110 *
## X2            0.023239   0.086190   0.270   0.7874  
## X3            0.033065   0.013778   2.400   0.0164 *
## X4           -0.020666   0.008056  -2.565   0.0103 *
## Log(theta)   10.144399 107.250726   0.095   0.9246  
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 14.41745   13.20188   1.092    0.275
## X1           0.08899    0.16004   0.556    0.578
## X2          -0.15692    0.14039  -1.118    0.264
## X3           0.02645    0.04344   0.609    0.543
## X4          -0.02617    0.02438  -1.073    0.283
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 25448.1559 
## Number of iterations in BFGS optimization: 47 
## Log-likelihood: -55.1 on 11 Df
lrtest(model_zinb)
## Likelihood ratio test
## 
## Model 1: Y ~ X1 + X2 + X3 + X4
## Model 2: Y ~ 1
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1  11 -55.097                       
## 2   3 -62.788 -8 15.382    0.05213 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(pscl)
model_zinb<- zeroinfl(formula=Y~X1+X3+X4, data=data_tugas_glm_3, dist="negbin")
summary(model_zinb)
## 
## Call:
## zeroinfl(formula = Y ~ X1 + X3 + X4, data = data_tugas_glm_3, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6948 -0.5084 -0.4257 -0.2746  3.5801 
## 
## Count model coefficients (negbin with log link):
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -1.660470   1.390254  -1.194   0.2323  
## X1           0.132545   0.057698   2.297   0.0216 *
## X3           0.032409   0.016702   1.940   0.0523 .
## X4          -0.019402   0.008747  -2.218   0.0265 *
## Log(theta)   9.545452  80.239041   0.119   0.9053  
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.45877    2.59922   0.177    0.860
## X1           0.02910    0.13813   0.211    0.833
## X3           0.01097    0.03429   0.320    0.749
## X4          -0.02025    0.01995  -1.015    0.310
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 13980.9634 
## Number of iterations in BFGS optimization: 26 
## Log-likelihood: -57.83 on 9 Df
AICzinb <- -2*logLik(model_zinb) + 2*9
AICzinb
## 'log Lik.' 133.6691 (df=9)

Pemilihan Model Terbaik

AIC <- data.frame(AICzip,AICzinb)
AIC
##     AICzip  AICzinb
## 1 131.6688 133.6691

Model dengan nilai AIC terkecil merupakan model yang terbaik. Terlihat bahwasanya nilai AIC model ZIP lebih kecil dari nilai AIC model ZINB. Sehingga model ZIP lebih baik digunakan dalam mengatasi overdispersi pada data angka kematian bayi di Kota Bandung Tahun 2019.