Count Models

Christopher Weber

2024-11-08

Introduction

  • This lecture follows Long (1997, Chapters 7-8)

  • Many variables in the social sciences consist of integer counts

  • E.g., number of times a candidate runs for office, frequency of conflict, number of terror attacks, number of war casualties, number of positive statements about a candidate, number of homicides in a city, etc.

  • Least squares is inappropriate for the reasons we’ve already discussed: \(\textbf{Nonlinearity, nonadditivity and heteroskedasticity}\).


Models

  • The Poisson Regression Model (PRM)
  • A strong assumption: \(E(y)=var(y)=\mu\)
  • Overdispersion
  • Underdispersion
  • Extensions: Negative binomial and zero-inflated models.

The Poisson Regression Model

\[p(y|\mu)={{exp(-\mu)\mu^y}\over{y!}}\]

  • \(y\) takes on the non-negative integer values 0, 1, 2, 3, etc. (Long 1997, p. 218)
  • The only parameter governing the shape of this density is \(\mu\), the “rate” parameter
  • A characteristic of the Poisson distribution is that \(E(y)=var(y)=\mu\), an assumption called equidispersion
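
A quick simulation illustrates equidispersion; this is just a sketch, and the seed and rate below are arbitrary:

set.seed(42)
draws <- rpois(100000, lambda = 3)  # 100,000 Poisson draws with rate 3
mean(draws)  # approximately 3
var(draws)   # also approximately 3: the mean equals the variance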

The Poisson Distribution

Extensions

  • If \(\mu\) is the rate parameter, we can then model \(\mu_i\) based on a set of covariates. That is,

\[\mu_i=E(y_i|x_i)=exp(\alpha+\beta x_i)\]

  • The rate parameter must be positive; exponentiating \(\alpha+\beta x_i\) guarantees that \(\mu_i>0\) for any values of the coefficients.

  • The result is a non-linear regression model with heteroskedasticity (Long 1997, p. 223): \(\mu_i=exp(\alpha+\beta x_i)\)

Extensions

  • Let’s assume \(\alpha=-0.25\) and \(\beta=0.13\), as Long does (p. 221), and calculate the expected value of \(y\) at several values of \(x\)

  • When \(x=1\), the expected number of counts – the rate parameter – is 0.89 (exp(-0.25 + 0.13*1))

  • When \(x=10\), the expected number of counts is 2.86 (exp(-0.25 + 0.13*10))

  • \(E(y|x)=exp(\alpha+\beta x_i)=var(y|x)\)
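
These values are easy to verify in R:

exp(-0.25 + 0.13*1)   # expected count at x = 1; about 0.89
exp(-0.25 + 0.13*10)  # expected count at x = 10; about 2.86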

Extensions

# Poisson probabilities of y = 1, 2, 3 with rate 0.12,
# computed directly from the density formula
(exp(-0.12)*0.12^1)/factorial(1)
[1] 0.1064305
(exp(-0.12)*0.12^2)/factorial(2)
[1] 0.006385827
(exp(-0.12)*0.12^3)/factorial(3)
[1] 0.0002554331
# Or just use the built-in density function
dpois(1,0.12)
[1] 0.1064305
dpois(2,0.12)
[1] 0.006385827
dpois(3,0.12)
[1] 0.0002554331

Extensions

  • With \(k\) predictors, then

\[\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i})\]

\[p(y|x)={{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]

The PRM Likelihood

The likelihood of the PRM with \(k\) predictors is:

\[\prod_{i=1}^{N}p(y_i|\mu_i)=\prod_{i=1}^{N}{{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]
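
In practice glm() maximizes this for us, but a minimal sketch with simulated data shows the idea; the seed, sample size, and true coefficients below are arbitrary:

set.seed(123)
x <- rnorm(500)
y <- rpois(500, exp(0.5 + 0.3*x))   # counts generated with known coefficients
negLL <- function(par) {            # negative log-likelihood of the PRM
  mu <- exp(par[1] + par[2]*x)
  -sum(dpois(y, mu, log = TRUE))
}
optim(c(0, 0), negLL)$par           # estimates should land near 0.5 and 0.3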

The PRM Likelihood

The log of the likelihood is then,

\[\tiny log(\prod_{i=1}^{N}p(y_i|\mu_i))=\sum_{i=1}^{N}log{{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]

  • \(\tiny E(y|x)=exp(\alpha+\sum_K \beta_k x_{k,i})\).

  • The partial derivative of \(E(y|x)\) with respect to \(x_k\) can be found using the chain rule.

  • Let \(\tiny u=\alpha+\sum_K \beta_k x_{k,i}\).

  • Then apply \(\tiny {{\partial y}\over{\partial u}}{{\partial u}\over{\partial x_k}}\)

The PRM Likelihood

\[\tiny {{\partial E(Y|X)}\over{\partial x_k}}={{\partial exp(u)}\over{\partial u}}{{\partial u}\over{\partial x_k}}\]

\[\tiny exp(\alpha+\sum_K \beta_k x_{k,i})\beta_k=E(Y|X)\beta_k\]

  • The effect of \(x_k\) on \(y\) is now a function of the rate parameter and the expected effect of \(x_k\) on that rate parameter

  • It’s not a constant change in the expected count; instead, it’s a function of how \(x_k\) affects the rate as well as how all the other covariates relate to the rate!

The PRM Likelihood

  • What effect does a \(d_k\) change in \(x_k\) have on the expected count? Take the ratio of the prediction including the change over the prediction absent the change (Long 1997, p. 225):

\[\tiny {{E(y|X, x_k+d_k)}\over{E(y|X, x_k)}}\]

  • The numerator: \(\tiny E(y|X, x_k+d_k)=exp(\beta_0)exp(\beta_1 x_1)exp(\beta_2 x_2)...exp(\beta_k x_k)exp(\beta_k d_k)\)

  • The denominator: \(\tiny E(y|X, x_k)=exp(\beta_0)exp(\beta_1 x_1)exp(\beta_2 x_2)...exp(\beta_k x_k)\)

  • We’re left with: \(\tiny exp(\beta_k d_k)\)
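
For instance, with the earlier slope of \(\beta_k=0.13\), a one-unit increase in \(x_k\) multiplies the expected count by about 1.14, a 14 percent increase:

exp(0.13 * 1)  # factor change in E(y|x) for a one-unit increase; about 1.139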

Interpretation

  • The predicted probability of a count

\[\tiny pr(y=m|x)={{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{m}}\over{m!}}\]

  • The partial derivative

\[\tiny exp(\alpha+\sum_K \beta_k x_{k,i})\beta_k=E(Y|X)\beta_k\]
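
For example, with Long’s values \(\alpha=-0.25\) and \(\beta=0.13\), the predicted probability of a count of 2 when \(x=10\) is:

dpois(2, exp(-0.25 + 0.13*10))  # pr(y = 2 | x = 10); about 0.23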

Summary of PRM

  • We might encounter either under- or overdispersion brought about by unobserved heterogeneity.

  • It is useful to rely on an alternative model that doesn’t treat \(\mu\) as fixed, but instead draws it from a distribution, i.e., \[\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i}+\epsilon_i)\]

The Negative Binomial Regression Model

  • The NBRM stems from the negative binomial distribution

  • Recall that the binomial density is the PMF arising from \(k\) independent Bernoulli trials. The \(\theta\) parameter governs its shape.

  • We can modify the logic (and code) slightly to generate the probability of the number of trials required to observe \(r\) successes.

The Negative Binomial Regression Model

  • For instance, how many times would we need to flip a coin in order for three heads to appear, or four heads, and so forth? \(\binom{s+f-1}{f}\theta^{s}(1-\theta)^{f}\)

  • \(f=\)number of failures, \(s=\)number of successes. The total number of trials is just \(s+f\).
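
R’s dnbinom() uses this parameterization directly: the probability of \(f\) failures before the \(s\)-th success. A small sketch with a fair coin:

dnbinom(2, size = 3, prob = 0.5)        # pr of 2 tails before the 3rd head
choose(3 + 2 - 1, 2) * 0.5^3 * 0.5^2    # same value from the formula: 0.1875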

The Negative Binomial Regression Model

  • If \(\theta\) represents the probability of striking out, how many at bats are expected before a batter strikes out once, or twice, or 10 times?

  • Or, how many games must the Minnesota Vikings play in the 2022 NFL season before they lose three games?

The Negative Binomial Regression Model

  • Here, define “success” as striking out (that is the outcome we’re interested in) or losing a game; “failure” is the number of at-bats before striking out, or the number of games played before the third loss

The Negative Binomial Regression Model

  • It’s called the negative binomial because, if we expand and rearrange the binomial coefficient, it equals

\[(-1)^f\binom{-s}{f}\]

with the binomial coefficient defined through the gamma function,

\[\binom{n}{k}={{\Gamma(n+1)}\over {\Gamma(k+1)\Gamma(n-k+1)}}\]

Capturing Dispersion

  • If we have overdispersion (\(E(y)<var(y)\)), the standard errors will be too small, and we will be overconfident in our results

  • Instead, let’s assume that the rate parameter is subject to error

  • That is, it follows some distribution (we’ll use gamma)

Capturing Dispersion

\[\mu^*_i=exp(\alpha+\sum_K \beta_k x_{k,i}+\epsilon_i)\]

\[\mu^*_i=exp(\alpha)exp(\beta_1 x_{1,i})exp(\beta_2 x_{2,i})...exp(\beta_K x_{K,i})exp(\epsilon_i)\]

Writing \(d_i=exp(\epsilon_i)\), the rate becomes \(\mu_i d_i\):

\[ E(Y_i|X)=\mu_i d_i\]

If we assume \(E(d_i)=1\), the expected count is unchanged,

\[ E(Y_i|X)=\mu_i\]

and, conditional on \(d_i\), the count still follows a Poisson:

\[p(y|\mu_i d_i)={{exp(-\mu_i d_i)(\mu_i d_i)^y}\over{y!}}\]

Capturing Dispersion

  • To calculate \(p(y|\mu^*_i)\), we need to assume that \(d_i\) follows some density, and then integrate (i.e., average) over this unknown quantity to obtain the density of \(y\) given \(\mu_i\). Let’s assume that \(d_i\) follows a gamma density.

\[p(y_i|\mu_i)=\int_0^{\infty} pr(y_i|x_i, d_i)\,pr(d_i)\,\mathrm{d}d_i\]

where, \(d_i\) is distributed gamma,

Capturing Dispersion

\[pr(d_i)={{v_i^{v_i}}\over {\Gamma(v_i)}}d_i^{v_i-1}exp(-d_i v_i)\]

If we combine these two, the equation becomes

\[\tiny p(y|x_i)={{{\Gamma(y_i+v_i)}\over {y_i!\Gamma(v_i)}} ({{v_i}\over{v_i+\mu_i}})^{v_i} ({{\mu_i}\over{v_i+\mu_i}})^{y_i}}\]

Capturing Dispersion

  • Notice the similarity to the negative binomial above. Carrying out this integration (averaging the Poisson over the gamma density of \(d_i\)) shows that the negative binomial is nothing more than a Poisson mixed with a gamma.
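
This mixture can be verified numerically; dnbinom() accepts a mean parameterization, and the values of mu, v, and y below are arbitrary:

mu <- 2; v <- 1.5; y <- 3
# average the Poisson over the gamma mixing density
integrate(function(d) dpois(y, mu*d) * dgamma(d, shape = v, rate = v), 0, Inf)$value
dnbinom(y, size = v, mu = mu)  # matches the integral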

Capturing Dispersion

\[E(y|x)=\mu_i\]

  • But, the variance is no longer \(\mu_i\)

\[var(y_i|x)=\mu_i(1+({{\mu_i}\over{v_i}}))\]

  • The \(v_i\) parameter governs the shape of the gamma density. Another way to write the variance is to let \(\alpha=1/v_i\), so that \(var(y_i|x)=\mu_i(1+\alpha\mu_i)\).

  • The negative binomial model is an example of a mixture model with a closed-form maximum likelihood solution, where we maximize,

Capturing Dispersion

\[\prod_{i=1}^{N}p(y_i| x_i, \beta)=\prod_{i=1}^{N} {{\Gamma(y_i+\alpha^{-1})}\over{y_i!\,\Gamma(\alpha^{-1})}}({{\alpha^{-1}}\over{\alpha^{-1}+\mu_i}})^{\alpha^{-1}}({{\mu_i}\over{\alpha^{-1}+\mu_i}})^{y_i}\]

  • \(\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i})\).

Examples

Notice that the shape of the distribution shifts with different values of \(\mu\).
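
The original slide shows this with a plot; the underlying probabilities can be sketched over an arbitrary grid of rates:

# probabilities of counts 0 through 10 under four increasing rates
sapply(c(0.8, 1.5, 3, 5), function(mu) round(dpois(0:10, mu), 3))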

Graduate Productivity in Biochemistry
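
The data are Long’s biochemist productivity data; a sketch of loading them, assuming they come from the pscl package:

library(pscl)      # assuming pscl is installed; it ships the bioChemists data
data(bioChemists)
names(bioChemists)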

[1] "art"  "fem"  "mar"  "kid5" "phd"  "ment"

Fit a PRM to the data. Notice how the standard errors are quite small.
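
A sketch of the fit, reconstructed from the Call in the output below:

m1 <- glm(art ~ kid5, family = poisson(link = "log"), data = bioChemists)
summary(m1)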


Call:
glm(formula = art ~ kid5, family = poisson(link = "log"), data = bioChemists)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.55960    0.02988  18.728   <2e-16 ***
kid5        -0.06978    0.03450  -2.023   0.0431 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1813.2  on 913  degrees of freedom
AIC: 3485

Number of Fisher Scoring iterations: 5
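
Before trusting these standard errors, it is worth comparing the mean and variance of the outcome; a variance well above the mean casts doubt on equidispersion:

mean(bioChemists$art)
var(bioChemists$art)  # noticeably larger than the mean, consistent with the quasipoisson dispersion of about 2.08 below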

Generate some predictions.


Call:
glm(formula = art ~ female + kid5 + married, family = poisson(link = "log"), 
    data = bioChemists)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.63656    0.05457  11.665  < 2e-16 ***
female      -0.28549    0.05433  -5.255 1.48e-07 ***
kid5        -0.16118    0.03934  -4.097 4.19e-05 ***
married      0.13271    0.06092   2.179   0.0294 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1776.7  on 911  degrees of freedom
AIC: 3452.5

Number of Fisher Scoring iterations: 5
Expected Articles, 4 Kid (All Variables=Min): 0.7455263
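
The printed expected count can be recovered by hand from the coefficients; assuming the profile sets female = 1, kid5 = 4, and married = 0 (an assumption, but it reproduces the printed value):

exp(0.63656 - 0.28549*1 - 0.16118*4 + 0.13271*0)  # about 0.7455, matching the output above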

Compare the Poisson fit above with a quasipoisson model, which estimates the dispersion parameter rather than fixing it at 1:

Call:
glm(formula = art ~ female + kid5 + married, family = quasipoisson(link = "log"), 
    data = bioChemists)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.63656    0.07864   8.094 1.83e-15 ***
female      -0.28549    0.07830  -3.646 0.000281 ***
kid5        -0.16118    0.05670  -2.843 0.004572 ** 
married      0.13271    0.08779   1.512 0.130965    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 2.076853)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1776.7  on 911  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
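
Given the estimated dispersion of roughly 2.08, a negative binomial fit is the natural next step. A sketch using glm.nb() from the MASS package (this fit is not shown in the original output):

library(MASS)
m.nb <- glm.nb(art ~ female + kid5 + married, data = bioChemists)
summary(m.nb)  # theta is the estimated v (equivalently 1/alpha) parameter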