OVERDISPERSI
Zero Inflated Poisson dan Zero Inflated Negative Binomial
Pendahuluan
Pembahasan kali ini mengenai Overdispersi. Kondisi Overdispersi dapat disebabkan oleh banyaknya data amatan bernilai nol pada peubah respon (excess zeros). Pemodelan dengan Regresi Poisson menjadi tidak baik jika data mengalami overdispersi, sehingga dapat diatasi dengan menggunakan Zero Inflated Poisson (ZIP) dan Zero Inflated Negative Binomial (ZINB). Selanjutnya akan dipilih model terbaik dari model ZIP dan model ZINB.
Data
Data yang digunakan yaitu Data Angka Kematian Bayi di Kota Bandung Tahun 2019.
Peubah
Peubah yang digunakan yaitu :
X1 = Persentase berat badan bayi lahir rendah
X2 = Persentase bayi mendapat vit A sebanyak dua kali
X3 = Persentase bayi yang diberikan ASI eksklusif
X4 = Persentase penduduk dengan akses sanitasi layak
## Warning: package 'readxl' was built under R version 4.0.5
data_tugas_glm_3 <- read_excel("D:/1. Kuliah S2 (STATISTIKA DAN SAINS DATA)/Materi Kuliah/SEMESTER 3/2. Model Linier Terampat/1. Model Linier Terampat/TB GLM/Tugas akhir ujian GLM 3/data tugas glm 3.xlsx")
data_tugas_glm_3 <- data.frame(data_tugas_glm_3)## Warning: package 'car' was built under R version 4.0.4
## Loading required package: carData
## X1 X2 X3 X4
## 1.047438 1.012865 1.045982 1.010114
Nilai vif tersebut menunjukkan tidak terdapat multikolineritas.
Model Poisson
model_poisson <- glm(formula=Y~X1+X2+X3+X4, family=poisson(link="log"), data=data_tugas_glm_3 )
summary(model_poisson)##
## Call:
## glm(formula = Y ~ X1 + X2 + X3 + X4, family = poisson(link = "log"),
## data = data_tugas_glm_3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.90494 -0.81374 -0.64609 -0.04052 2.90079
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -11.882605 5.364836 -2.215 0.0268 *
## X1 0.094465 0.052911 1.785 0.0742 .
## X2 0.101309 0.054530 1.858 0.0632 .
## X3 0.021960 0.010946 2.006 0.0448 *
## X4 -0.011632 0.007199 -1.616 0.1062
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 98.171 on 79 degrees of freedom
## Residual deviance: 79.401 on 75 degrees of freedom
## AIC: 131.56
##
## Number of Fisher Scoring iterations: 6
## Warning: package 'AER' was built under R version 4.0.5
## Loading required package: lmtest
## Warning: package 'lmtest' was built under R version 4.0.5
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
## Warning: package 'survival' was built under R version 4.0.5
##
## Overdispersion test
##
## data: model_poisson
## z = 1.7945, p-value = 0.03637
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 1.349184
Nilai dispersi tersebut menunjukkan data mengalami overdispersi, sehingga akan digunakan ZIP dan ZINB untuk mengatasinya.
Model ZIP
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
##
## Call:
## zeroinfl(formula = Y ~ X1 + X2 + X3 + X4, data = data_tugas_glm_3)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.786184 -0.535159 -0.389722 -0.002717 3.736573
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.961567 8.729688 -0.454 0.6500
## X1 0.137588 0.054096 2.543 0.0110 *
## X2 0.023215 0.086173 0.269 0.7876
## X3 0.033066 0.013777 2.400 0.0164 *
## X4 -0.020667 0.008056 -2.565 0.0103 *
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 14.42553 13.20079 1.093 0.274
## X1 0.08902 0.16005 0.556 0.578
## X2 -0.15700 0.14038 -1.118 0.263
## X3 0.02645 0.04344 0.609 0.543
## X4 -0.02617 0.02438 -1.073 0.283
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 43
## Log-likelihood: -55.1 on 10 Df
## Likelihood ratio test
##
## Model 1: Y ~ X1 + X2 + X3 + X4
## Model 2: Y ~ 1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 10 -55.097
## 2 2 -63.042 -8 15.89 0.04398 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Call:
## zeroinfl(formula = Y ~ X1 + X3 + X4, data = data_tugas_glm_3)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.6947 -0.5084 -0.4258 -0.2746 3.5800
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.661144 1.390239 -1.195 0.2321
## X1 0.132537 0.057696 2.297 0.0216 *
## X3 0.032418 0.016701 1.941 0.0523 .
## X4 -0.019402 0.008746 -2.218 0.0265 *
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.45624 2.59998 0.175 0.861
## X1 0.02911 0.13814 0.211 0.833
## X3 0.01100 0.03430 0.321 0.749
## X4 -0.02025 0.01995 -1.015 0.310
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 16
## Log-likelihood: -57.83 on 8 Df
## 'log Lik.' 131.6688 (df=8)
Model ZINB
library(pscl)
model_zinb<- zeroinfl(formula=Y~X1+X2+X3+X4, data=data_tugas_glm_3, dist="negbin")
summary(model_zinb)##
## Call:
## zeroinfl(formula = Y ~ X1 + X2 + X3 + X4, data = data_tugas_glm_3, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.786100 -0.535160 -0.389756 -0.002722 3.736775
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.963948 8.731515 -0.454 0.6498
## X1 0.137585 0.054100 2.543 0.0110 *
## X2 0.023239 0.086190 0.270 0.7874
## X3 0.033065 0.013778 2.400 0.0164 *
## X4 -0.020666 0.008056 -2.565 0.0103 *
## Log(theta) 10.144399 107.250726 0.095 0.9246
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 14.41745 13.20188 1.092 0.275
## X1 0.08899 0.16004 0.556 0.578
## X2 -0.15692 0.14039 -1.118 0.264
## X3 0.02645 0.04344 0.609 0.543
## X4 -0.02617 0.02438 -1.073 0.283
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 25448.1559
## Number of iterations in BFGS optimization: 47
## Log-likelihood: -55.1 on 11 Df
## Likelihood ratio test
##
## Model 1: Y ~ X1 + X2 + X3 + X4
## Model 2: Y ~ 1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 11 -55.097
## 2 3 -62.788 -8 15.382 0.05213 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(pscl)
model_zinb<- zeroinfl(formula=Y~X1+X3+X4, data=data_tugas_glm_3, dist="negbin")
summary(model_zinb)##
## Call:
## zeroinfl(formula = Y ~ X1 + X3 + X4, data = data_tugas_glm_3, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.6948 -0.5084 -0.4257 -0.2746 3.5801
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.660470 1.390254 -1.194 0.2323
## X1 0.132545 0.057698 2.297 0.0216 *
## X3 0.032409 0.016702 1.940 0.0523 .
## X4 -0.019402 0.008747 -2.218 0.0265 *
## Log(theta) 9.545452 80.239041 0.119 0.9053
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.45877 2.59922 0.177 0.860
## X1 0.02910 0.13813 0.211 0.833
## X3 0.01097 0.03429 0.320 0.749
## X4 -0.02025 0.01995 -1.015 0.310
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 13980.9634
## Number of iterations in BFGS optimization: 26
## Log-likelihood: -57.83 on 9 Df
## 'log Lik.' 133.6691 (df=9)
Pemilihan Model Terbaik
## AICzip AICzinb
## 1 131.6688 133.6691
Model dengan nilai AIC terkecil merupakan model yang terbaik. Terlihat bahwasanya nilai AIC model ZIP lebih kecil dari nilai AIC model ZINB. Sehingga model ZIP lebih baik digunakan dalam mengatasi overdispersi pada data angka kematian bayi di Kota Bandung Tahun 2019.
IPB University, nadirabelinda@apps.ipb.ac.id↩︎