Count Models

Poisson, Negative Binomial, Zero-Inflation, and Hurdle Models

Christopher Weber

2025-10-27

Introduction

  • This lecture follows Long (1997, Chapters 7-8).

  • Many variables in the social sciences consist of integer counts.

  • E.g., number of times a candidate runs for office, frequency of conflict, number of terror attacks, number of war casualties, number of positive statements about a candidate, number of homicides in a city, etc.

  • Least squares is inappropriate for the reasons we’ve already discussed: \(\textbf{Nonlinearity, nonadditivity and heteroskedasticity}\).

  • Instead, we need models that are designed to handle count data.

  • Starting with the most straightforward, let’s consider the Poisson Distribution and by extension, the Poisson Regression Model.

  • Then, we’ll extend the model to the Negative Binomial Regression Model to account for overdispersion.

The Poisson Distribution

  • The Poisson Distribution is governed by one parameter, \(\mu\), commonly called the “rate parameter.”

  • The variance of the Poisson Distribution is also governed by \(\mu\).

\[p(y|\mu)={{exp(-\mu)\mu^y}\over{y!}}\]

The Poisson Distribution

  • As \(\mu\) increases, the distribution becomes more symmetric and approaches a normal distribution.

  • As \(\mu\) increases, the variance increases. A model built from the Poisson Distribution is heteroskedastic.

  • As \(\mu\) decreases, the distribution becomes more right-skewed.

  • As \(\mu\) decreases, the variance decreases.
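These shape claims are easy to verify with dpois(); the rates below are illustrative, not from the lecture:

probs <- sapply(c(0.8, 3, 10), function(mu) dpois(0:20, lambda = mu))
colnames(probs) <- paste0("mu=", c(0.8, 3, 10))
round(probs, 3)   # right-skewed for small mu, near-symmetric by mu = 10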

Outline

  • The Poisson Regression Model (PRM) is built from the Poisson Distribution. The model allows us to capture the distribution of integer counts, which are typically right-skewed.

  • The PRM relies on a strong assumption: \(E(y)=var(y)\). If the data violate this assumption, the variance estimates will be biased, as will test statistics and hypothesis tests.

  • Overdispersion. Does the Poisson adequately capture the variance in the data?

  • Zero Inflation. Does the Poisson adequately capture the number of zeros in the data?

  • Key Extensions: Negative binomial and zero-inflated models.

The Poisson Regression Model

  • Absent covariates, the Poisson distribution is given by

\[p(y|\mu)={{exp(-\mu)\mu^y}\over{y!}}\]

  • where \(y\) takes on values 0, 1, 2, 3, etc. (Long 1997, p. 218)
  • In this density, the only parameter that governs the shape is \(\mu\), the “rate” parameter
  • A characteristic of the Poisson distribution is that \(E(y)=var(y)=\mu\), an assumption called equidispersion
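A quick simulation illustrates equidispersion (the rate of 3 is arbitrary):

set.seed(1)
y <- rpois(1e5, lambda = 3)
c(mean(y), var(y))   # both approximately 3: E(y) = var(y) = mu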

The Poisson Regression Model

  • If \(\mu\) is the rate parameter, we can then model \(\mu_i\) based on a set of covariates. That is,

\[\mu_i=E(y_i|x_i)=exp(\alpha+\beta x_i)\]

  • The rate parameter must be positive, hence the \(exp()\): \(\mu_i=exp(\alpha+\beta x_i)>0\) even when \(\alpha+\beta x_i\) is negative

  • Non-linear regression model with heteroskedasticity (Long 1997, p. 223), \(\mu_i=exp(\alpha+\beta x_i)\)

  • Note that by extension, the variance is also influenced by covariate(s), \(x_i\).

A Motivating Example

  • Assume \(\alpha=-0.25\) and \(\beta=0.13\), as Long does (p. 221). We can then calculate the expected value of \(y\) at various values of \(x\)

  • When \(x=1\), the expected number of counts – the rate parameter – is 0.89 (exp(-0.25 + 0.13* 1))

  • When \(x=10\), the expected number of counts is 2.86

  • \(E(y|x)=exp(\alpha+\beta x_i)=var(y|x)\)
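In R, both expected counts come from one line (using Long's illustrative values):

exp(-0.25 + 0.13 * c(1, 10))   # approximately 0.89 and 2.86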

Predicted Probabilities from the Model

  • Assume that mu <- exp(-0.25 + 0.13*1)

  • Then, what is the probability of observing 0, 1, or 2 counts when \(x=1\)?

mu <- exp(-0.25 + 0.13*1)   # the rate parameter when x = 1

(exp(-mu) * mu^0) / factorial(0)   # Pr(y = 0), computed by hand
[1] 0.4119223
(exp(-mu) * mu^1) / factorial(1)   # Pr(y = 1)
[1] 0.3653423
(exp(-mu) * mu^2) / factorial(2)   # Pr(y = 2)
[1] 0.1620148
dpois(0:2, lambda = mu)            # the same probabilities via dpois()
[1] 0.4119223 0.3653423 0.1620148

The Poisson Regression Model

  • With \(K\) predictors, then

\[\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i})\]

\[p(y_i|x_i)={{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]

  • The likelihood of the PRM with \(K\) predictors:

\[\prod_{i=1}^{N}p(y_i|\mu_i)=\prod_{i=1}^{N}{{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]

The PRM Likelihood

The log of the likelihood is then

\[\log\left(\prod_{i=1}^{N}p(y_i|\mu_i)\right) = \sum_{i=1}^{N}\log\left(\frac{\exp\left(-\exp\left(\alpha + \sum_{k=1}^{K} \beta_k x_{k,i}\right)\right) \left[\exp\left(\alpha + \sum_{k=1}^{K} \beta_k x_{k,i}\right)\right]^{y_i}}{y_i!}\right)\]

which simplifies to

\[\sum_{i=1}^{N}\left[y_i\left(\alpha + \sum_{k=1}^{K} \beta_k x_{k,i}\right) - \exp\left(\alpha + \sum_{k=1}^{K} \beta_k x_{k,i}\right) - \log(y_i!)\right]\]

with

\[E(y_i|x_i)=\mu_i = exp(\alpha+\sum_K \beta_k x_{k,i})\]
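Because the log-likelihood has this simple form, it can be maximized directly. A minimal sketch using optim() on simulated data (the data and starting values are illustrative, reusing Long's \(\alpha=-0.25\), \(\beta=0.13\)):

set.seed(123)
x <- rnorm(500)
y <- rpois(500, lambda = exp(-0.25 + 0.13 * x))   # simulate from the PRM

poisson_loglik <- function(par, y, x) {
  mu <- exp(par[1] + par[2] * x)
  sum(y * log(mu) - mu - lfactorial(y))   # the simplified log-likelihood above
}

optim(c(0, 0), poisson_loglik, y = y, x = x,
      control = list(fnscale = -1))$par   # maximize; compare with coef(glm(y ~ x, family = poisson))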

Interpretation

  • The partial derivative of \(E(y|x)\) with respect to \(x_k\)

  • Using the chain rule, let

\[u=\alpha+\sum_K \beta_k x_{k,i}\]

\[ \frac{\partial y}{\partial x} = \frac{\partial y}{\partial u} \cdot \frac{\partial u}{\partial x} \]

Interpretation

\[ \frac{\partial y}{\partial x} = \frac{\partial y}{\partial u} \cdot \frac{\partial u}{\partial x} \]

\[ {{\partial E(y|x)}\over{\partial x_k}}={{\partial exp(u)}\over{\partial u}}\cdot{{\partial u}\over{\partial x_k}}=exp(\alpha+\sum_K \beta_k x_{k,i})\beta_k=E(y|x)\beta_k \]

  • The effect of \(x_k\) on \(y\) is now a function of the rate parameter and of \(x_k\)’s effect on that rate.

  • It’s not a constant change in the expected count; instead, it depends on how \(x_k\) affects the rate and on how all the other variables relate to the rate.

Interpretation: Change in \(x\)

  • What effect does a \(d_k\) change in \(x_k\) have on the expected count? Take the ratio of the prediction including the change to the prediction absent the change (Long 1997, p. 225):

\[ {{E(y|X, x_k+d_k)}\over{E(y|X, x_k)}}\]

  • The numerator: \[ E(y|X, x_k+d_k)=exp(\beta_0)exp(\beta_1 x_1)exp(\beta_2 x_2)...exp(\beta_k x_k)exp(\beta_k d_k) \]

  • The denominator: \[ E(y|X, x_k)=exp(\beta_0)exp(\beta_1 x_1)exp(\beta_2 x_2)...exp(\beta_k x_k) \]

  • We’re left with:

\[ exp(\beta_k d_k) \]

  • So for a one-unit change in \(x_k\) (\(d_k=1\)), the expected count is multiplied by \(exp(\beta_k)\).
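Using Long's illustrative \(\beta=0.13\):

exp(0.13)       # ~1.14: a one-unit increase in x multiplies the expected count by 1.14
exp(0.13 * 2)   # ~1.30: the factor change for a two-unit increase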

Interpretation: Probabilities

  • The predicted probability of a count

\[ pr(y=m|x)={{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{m}}\over{m!}} \]

  • The partial derivative

\[ exp(\alpha+\sum_K \beta_k x_{k,i})\beta_k=E(Y|X)\beta_k\]

Summary

  • If \(y \sim Poisson(\mu)\), then \(E(y)=var(y)=\mu\)

  • \(log(\mu)=\alpha+\sum_K \beta_k x_{k,i}\)

  • We might encounter either under- or overdispersion, brought about by unobserved heterogeneity.

  • It is useful to rely on an alternative model that doesn’t treat \(\mu\) as fixed, but instead draws it from a distribution, i.e., \[\tilde{\mu}_i=exp(\alpha+\sum_K \beta_k x_{k,i}+\epsilon_i)\]

Poisson Regression: Application

  • Scholarly publications of biochemistry graduate students.

  • From Long (1997), the data consist of 915 biochemistry graduate students and the following variables.

  • The data are in pscl::bioChemists

  • art: Number of articles published

  • phd: Prestige of PhD granting institution

  • fem: Whether the student identifies as male or female (recoded below so that 1 = female)

  • mar: Marital status of the student

  • kid5: Number of children under age 5.

  • ment: Number of articles published by the mentor during past 3 years.

  • We can fit the model using glm(formula, data, family=poisson(link="log"))

  • Use the Poisson family of functions (dpois(), ppois()) for post-estimation, along with predict()

  • Follow the same workflow: Estimation \(\rightarrow\) Prediction \(\rightarrow\) (Graphical) Presentation.

Poisson Regression: Application

\[ \log E(y_{articles}) = \beta_0 + \beta_1 x_{kid5} + \beta_2 x_{fem} + \beta_3 x_{mar} + \beta_4 x_{phd} + \beta_5 x_{ment} \]

library(pscl)
data(bioChemists)
bioChemists$fem <- as.numeric(bioChemists$fem) - 1   # 1 = female
bioChemists$mar <- as.numeric(bioChemists$mar) - 1   # 1 = married
my_model <- glm(art ~ kid5 + fem + mar + phd + ment, 
            data = bioChemists,
            family = poisson(link = "log"))
summary(my_model)

Call:
glm(formula = art ~ kid5 + fem + mar + phd + ment, family = poisson(link = "log"), 
    data = bioChemists)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.304617   0.102981   2.958   0.0031 ** 
kid5        -0.184883   0.040127  -4.607 4.08e-06 ***
fem         -0.224594   0.054613  -4.112 3.92e-05 ***
mar          0.155243   0.061374   2.529   0.0114 *  
phd          0.012823   0.026397   0.486   0.6271    
ment         0.025543   0.002006  12.733  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1634.4  on 909  degrees of freedom
AIC: 3314.1

Number of Fisher Scoring iterations: 5
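Exponentiating the coefficients gives factor changes in the expected count, per the \(exp(\beta_k d_k)\) result above:

exp(coef(my_model))   # factor change for a one-unit increase in each predictor
# e.g., exp(-0.2246) ~ 0.80: women publish roughly 20% fewer articles,
# holding the other covariates constant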

Poisson Regression: Application

\[ \log E(y_{articles}) = \beta_0 + \beta_1 x_{kid5} + \beta_2 x_{fem} + \beta_3 x_{mar} + \beta_4 x_{phd} + \beta_5 x_{ment} \]

Generate Expected Counts

library(tidyverse)
# Note: the columns must follow the order of coef(my_model):
# intercept, kid5, fem, mar, phd, ment
design_matrix <- expand.grid(
  intercept = 1,
  kid5 = 0,
  fem = 1,
  mar = 1,
  phd = mean(bioChemists$phd),
  ment = mean(bioChemists$ment)
) 

cat("The expected number of articles published, assuming the above covariate values:\n",
   exp(as.matrix(design_matrix) %*% coef(my_model)) |> as.numeric() |> round(3),
   "\narticles"
    )
The expected number of articles published, assuming the above covariate values:
 1.647 
articles

Generate Expected Counts

library(tidyverse)
# Columns again follow the order of coef(my_model)
design_matrix <- expand.grid(
  intercept = 1,
  kid5 = 1,
  fem = 0,
  mar = 1,
  phd = mean(bioChemists$phd),
  ment = mean(bioChemists$ment)
) 

cat("The expected number of articles published, assuming the above covariate values:\n",
   exp(as.matrix(design_matrix) %*% coef(my_model)) |> as.numeric() |> round(3),
   "\narticles"
    )
The expected number of articles published, assuming the above covariate values:
 1.714 
articles

Simulate Uncertainty

library(tidyverse)
# Columns again follow the order of coef(my_model)
design_matrix <- expand.grid(
  intercept = 1,
  kid5 = 2,
  fem = 0,
  mar = 1,
  phd = mean(bioChemists$phd),
  ment = mean(bioChemists$ment)
) 

# Draw coefficient vectors from their asymptotic sampling distribution,
# then convert each draw into an expected count (the point estimate is
# ~1.42; the simulated interval varies from run to run)
parameter_draws <- MASS::mvrnorm(1000, mu = coef(my_model), Sigma = vcov(my_model))
sampled_values <- exp(as.matrix(design_matrix) %*% t(parameter_draws)) |> t() |> as_tibble()

sampled_values |>
  summarize(
    mean = mean(V1),
    lower = quantile(V1, 0.025),
    upper = quantile(V1, 0.975)
  )

Simulate Uncertainty

  • We can also vary multiple variables simultaneously, predicting the expected count under different combinations of covariate values.

  • The code is nearly identical; the exception is bind_cols(), which labels the prediction groups in the final output.

library(tidyverse)
# Columns again follow the order of coef(my_model)
design_matrix <- expand.grid(
  intercept = 1,
  kid5 = c(0,1,2),
  fem = c(0,1),
  mar = c(0,1),
  phd = mean(bioChemists$phd),
  ment = mean(bioChemists$ment)
) 

parameter_draws <- MASS::mvrnorm(1000, mu = coef(my_model), Sigma = vcov(my_model))
sampled_values <- exp(as.matrix(design_matrix) %*% t(parameter_draws)) |> as_tibble()

sampled_values_with_groups <- bind_cols(
  design_matrix |> select(fem, mar, kid5), 
  sampled_values
)

sampled_values_with_groups |>
  pivot_longer(cols = starts_with("V"), names_to = "draw", values_to = "predicted") |>
  group_by(fem, mar, kid5) |>
  summarize(
    mean = mean(predicted),
    lower = quantile(predicted, 0.025),
    upper = quantile(predicted, 0.975),
    .groups = "drop"
  )

Generating Predictions

  • Use the Poisson CDF in R to generate predicted probabilities of counts.

  • The code is nearly the same: keep exp() to recover \(\mu\), then pass \(\mu\) to ppois() as the rate, e.g. ppois(2, lambda = mu) for \(Pr(y \le 2)\).

library(tidyverse)
# Columns again follow the order of coef(my_model)
design_matrix <- expand.grid(
  intercept = 1,
  kid5 = c(0,1,2),
  fem = c(0,1),
  mar = c(0,1),
  phd = mean(bioChemists$phd),
  ment = mean(bioChemists$ment)
) 

parameter_draws <- MASS::mvrnorm(1000, mu = coef(my_model), Sigma = vcov(my_model))
mu_draws <- exp(as.matrix(design_matrix) %*% t(parameter_draws))
# Pr(y <= 2) at each draw; in ppois(q, lambda), the count q comes first
sampled_values <- as_tibble(matrix(ppois(2, lambda = mu_draws), nrow = nrow(mu_draws)))

sampled_values_groups <- bind_cols(
  design_matrix |> select(fem, mar, kid5), 
  sampled_values
)

sampled_values_groups |>
  pivot_longer(cols = starts_with("V"), names_to = "draw", values_to = "predicted") |>
  group_by(fem, mar, kid5) |>
  summarize(
    mean = mean(predicted),
    lower = quantile(predicted, 0.025),
    upper = quantile(predicted, 0.975),
    .groups = "drop"
  )

The Negative Binomial Regression

  • The NBRM stems from the negative binomial distribution.

  • Recall the binomial density is the PMF stemming from \(k\) independent Bernoulli trials. The \(\theta\) parameter will govern its shape.

  • We can modify the logic (and code) slightly to generate the probability of observing \(r\) successes, given \(n\) trials.

  • In negative binomial regression, count data are modeled with overdispersion:

  • Mean: \(\mu = \exp(X\beta)\)

  • Variance: \(\mu + \frac{\mu^2}{\theta}\) (larger than Poisson’s \(\mu\))

  • A quasi-Poisson adjustment can be fit with glm(formula, data = data, family = quasipoisson(link = "log")); the full NBRM is fit with MASS::glm.nb(formula, data = data).
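A quick numeric check of how \(\theta\) governs the variance (the values of \(\mu\) and \(\theta\) here are illustrative):

mu <- 2
theta <- c(0.5, 2, 10, 1e6)
mu + mu^2 / theta   # 10, 4, 2.4, ~2: approaches the Poisson variance as theta grows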

Quasi-Poisson Regression

glm(art~kid5 + fem + mar + phd + ment , 
            data=bioChemists,
            family=quasipoisson(link="log")) |>
  summary()

Call:
glm(formula = art ~ kid5 + fem + mar + phd + ment, family = quasipoisson(link = "log"), 
    data = bioChemists)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.304617   0.139273   2.187 0.028983 *  
kid5        -0.184883   0.054268  -3.407 0.000686 ***
fem         -0.224594   0.073860  -3.041 0.002427 ** 
mar          0.155243   0.083003   1.870 0.061759 .  
phd          0.012823   0.035700   0.359 0.719544    
ment         0.025543   0.002713   9.415  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 1.829006)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1634.4  on 909  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

Capturing Dispersion

  • If we have overdispersion (\(E(y)<var(y)\)), then standard errors will be too small and we will be overconfident in our results.
  • Instead, let’s assume that the rate parameter is itself subject to error.
  • That is, it follows some distribution. We’ll use the Gamma.
  • The Negative Binomial Distribution is a Poisson–Gamma mixture: a Poisson whose rate is drawn from a Gamma distribution.

Capturing Dispersion: The Gamma-Poisson Mixture

The Problem: If we have overdispersion (\(E(y) < \text{Var}(y)\)), standard errors will be too small and we will express too much confidence in our results.

  • The Negative Binomial: \(\text{Var}(Y_i) = \mu_i + \frac{\mu_i^2}{\theta}\) (variance > mean)

  • \(\theta\) = dispersion parameter

  • As \(\theta \to \infty\): variance approaches Poisson.

  • As \(\theta \to 0\): high overdispersion.

\[ \lambda_i \sim \text{Gamma}\left(\text{shape}=\theta,\ \text{rate}=\frac{\theta}{\mu_i}\right) \]

  • so that \(E(\lambda_i)=\mu_i\). Given this rate, \(Y_i\) just follows the Poisson:

\[Y_i|\lambda_i \sim \text{Poisson}(\lambda_i)\]

\[\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i})\]
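A minimal simulation sketch of this mixture (the values of \(\mu\) and \(\theta\) are illustrative): drawing the rate from a Gamma with mean \(\mu\) and then drawing counts from a Poisson yields draws whose variance exceeds their mean, matching \(\mu + \mu^2/\theta\).

set.seed(99)                                              # for reproducibility
mu <- 2; theta <- 4
lambda <- rgamma(1e5, shape = theta, rate = theta / mu)   # E(lambda) = mu
y <- rpois(1e5, lambda)                                   # Poisson draws at the sampled rates
c(mean(y), var(y), mu + mu^2 / theta)                     # ~2, ~3, and 3
# equivalently: rnbinom(1e5, mu = mu, size = theta)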

Application: Negative Binomial Regression

  • A simple adjustment for overdispersion is the quasi-Poisson model: use family=quasipoisson(link="log") in glm().

  • The coefficient estimates are identical to the Poisson regression.

  • The general approach to extracting meaningful estimates is also the same.

  • glm() estimates a dispersion parameter that scales the standard errors, so the standard errors change.

  • The full NBRM (MASS::glm.nb()) instead estimates \(\theta\) directly; a sketch follows.
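A minimal sketch of the full NBRM via MASS::glm.nb(), which estimates \(\theta\) by maximum likelihood; it assumes bioChemists has been loaded and recoded as above (nb_model is an illustrative name, not from the lecture):

nb_model <- MASS::glm.nb(art ~ kid5 + fem + mar + phd + ment,
                         data = bioChemists)
summary(nb_model)   # coefficients plus the estimated theta
nb_model$theta      # the dispersion parameter itself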

Poisson Regression Model

library(pscl)
data(bioChemists)
bioChemists$fem<-as.numeric(bioChemists$fem)-1
bioChemists$mar<-as.numeric(bioChemists$mar)-1
my_model = glm(art~kid5 + fem + mar + phd + ment , 
            data=bioChemists,
            family=poisson(link="log"))
summary(my_model)

Call:
glm(formula = art ~ kid5 + fem + mar + phd + ment, family = poisson(link = "log"), 
    data = bioChemists)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.304617   0.102981   2.958   0.0031 ** 
kid5        -0.184883   0.040127  -4.607 4.08e-06 ***
fem         -0.224594   0.054613  -4.112 3.92e-05 ***
mar          0.155243   0.061374   2.529   0.0114 *  
phd          0.012823   0.026397   0.486   0.6271    
ment         0.025543   0.002006  12.733  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1634.4  on 909  degrees of freedom
AIC: 3314.1

Number of Fisher Scoring iterations: 5

Quasi-Poisson Regression Model

library(pscl)
data(bioChemists)
bioChemists$fem<-as.numeric(bioChemists$fem)-1
bioChemists$mar<-as.numeric(bioChemists$mar)-1
my_model = glm(art~kid5 + fem + mar + phd + ment , 
            data=bioChemists,
            family=quasipoisson(link="log"))
summary(my_model)

Call:
glm(formula = art ~ kid5 + fem + mar + phd + ment, family = quasipoisson(link = "log"), 
    data = bioChemists)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.304617   0.139273   2.187 0.028983 *  
kid5        -0.184883   0.054268  -3.407 0.000686 ***
fem         -0.224594   0.073860  -3.041 0.002427 ** 
mar          0.155243   0.083003   1.870 0.061759 .  
phd          0.012823   0.035700   0.359 0.719544    
ment         0.025543   0.002713   9.415  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 1.829006)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1634.4  on 909  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5