Examples of Zero-Inflated Poisson regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Example 2. State wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish also caught nothing, so the data contain excess zeros: zeros from people who did not fish are mixed with zeros from people who fished without success.
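
The two sources of zeros in Example 2 can be illustrated with a small simulation. This is only a sketch in base R: the sample size, the share of non-fishers (p_none), and the mean catch (lambda) are made-up values, not estimates from the fishing data.

```r
## Simulate the two-process mechanism behind zero-inflated counts (base R only).
set.seed(1)        # arbitrary seed, for reproducibility
n      <- 1000     # hypothetical number of visitors
p_none <- 0.4      # assumed share of visitors who never fish
lambda <- 2        # assumed mean catch among visitors who do fish

fished <- rbinom(n, size = 1, prob = 1 - p_none)  # 1 = fished, 0 = did not fish
catch  <- fished * rpois(n, lambda)               # non-fishers always record 0

## Compare the observed share of zeros with the share a plain Poisson
## distribution with the same mean would predict.
obs_zero  <- mean(catch == 0)
pois_zero <- dpois(0, lambda = mean(catch))
c(observed = obs_zero, poisson = pois_zero)
```

Under these assumed values the expected share of zeros is about 0.48, while a Poisson distribution with the same mean (1.2) expects only about 0.30. A zero-inflated model is meant to absorb that gap.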

Import data

library(ggplot2)
fishing <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/fishing.csv")
head(fishing)
##   X nofish livebait camper persons child         xb         zg count
## 1 1      1        0      0       1     0 -0.8963146  3.0504048     0
## 2 2      0        1      1       1     0 -0.5583450  1.7461489     0
## 3 3      0        1      0       1     0 -0.4017310  0.2799389     0
## 4 4      0        1      1       2     1 -0.9562981 -0.6015257     0
## 5 5      0        1      0       1     0  0.4368910  0.5277091     1
## 6 6      0        1      1       4     2  1.3944855 -0.7075348     0
## histogram with x axis on the original scale
ggplot(fishing, aes(count)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## histogram with x axis in log10 scale
ggplot(fishing, aes(count)) + geom_histogram() + scale_x_log10()
## Warning: Transformation introduced infinite values in continuous x-axis
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 142 rows containing non-finite values (stat_bin).
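
## The 142 rows dropped above are the zero counts (log10(0) is -Inf).
## A quick check for excess zeros (a sketch; output not shown): compare the
## observed share of zeros with the share a Poisson distribution with the
## same mean would predict.
mean(fishing$count == 0)
dpois(0, lambda = mean(fishing$count))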

## Run a Zero-Inflated Poisson Regression Analysis to predict number of fish caught
library(pscl)
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
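## Build a grid of predictor values to predict over
newdata1 <- expand.grid(child = 0:3, camper = factor(0:1), persons = 1:4)
## Keep only feasible combinations: no more children than people in the group
newdata1 <- subset(newdata1, subset = (child <= persons))
#head(newdata1)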
## Create a Model: child and camper predict the count, persons predicts the excess zeros
m1 <- zeroinfl(count ~ child + camper | persons, data = fishing)
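
## Inspect the fit (a sketch; output not shown). summary() reports the count
## coefficients on the log scale and the zero-inflation coefficients on the
## logit scale.
summary(m1)
## Exponentiate to read them as rate ratios (count part) and as odds ratios
## for being an excess zero (zero part).
exp(coef(m1, model = "count"))
exp(coef(m1, model = "zero"))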

## Make Predictions
newdata1$phat <- predict(m1, newdata1)
ggplot(newdata1, aes(x = child, y = phat, colour = factor(persons))) +
  geom_point() +
  geom_line() +
  facet_wrap(~camper) +
  labs(x = "Number of Children", y = "Predicted Fish Caught")

Key

  • Left panel (camper = 0): people without camper vans
  • Right panel (camper = 1): people with camper vans
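
The predicted counts in the plot combine both parts of the model. As a minimal sketch, assuming the default zeroinfl() setup (a Poisson count part with a log link and a logit link for the zero part), the expected count is (1 - pi0) * lambda. The coefficient values and the example group below are hypothetical placeholders, not the estimates from m1.

```r
## Hypothetical coefficients (placeholders, NOT the fitted values of m1)
b_count <- c("(Intercept)" = 1.5, child = -1.0, camper = 0.8)  # count part, log scale
b_zero  <- c("(Intercept)" = 1.3, persons = -0.6)              # zero part, logit scale

## One hypothetical group: 1 child, with a camper, 3 persons
x_count <- c(1, 1, 1)  # intercept, child, camper
x_zero  <- c(1, 3)     # intercept, persons

lambda <- exp(sum(b_count * x_count))   # Poisson mean for groups that could catch fish
pi0    <- plogis(sum(b_zero * x_zero))  # probability of an excess (structural) zero
phat   <- (1 - pi0) * lambda            # expected number of fish caught
phat
```

With a fitted model, predict(m1, newdata1, type = "count") and predict(m1, newdata1, type = "zero") return the two pieces separately, and the default type = "response" returns their combination.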