Count Models

Christopher Weber

2024-11-08

Introduction

  • This lecture follows Long (1997, Chapters 7-8)

  • Many variables in the social sciences consist of integer counts

  • E.g., number of times a candidate runs for office, frequency of conflict, number of terror attacks, number of war casualties, number of positive statements about a candidate, number of homicides in a city, etc.

  • Least squares is inappropriate for the reasons we’ve already discussed: \(\textbf{Nonlinearity, nonadditivity and heteroskedasticity}\).


Models

  • The Poisson Regression Model (PRM)
  • A strong assumption: \(E(y)=var(y)=\mu\)
  • Overdispersion
  • Underdispersion
  • Extensions: Negative binomial and zero-inflated models.

The Poisson Regression Model

\[p(y|\mu)={{exp(-\mu)\mu^y}\over{y!}}\]

  • \(y\) takes on the non-negative integer values 0, 1, 2, 3, etc. (Long 1997, p. 218)
  • The only parameter governing the shape of this density is \(\mu\), the “rate” parameter
  • A characteristic of the Poisson distribution is that \(E(y)=var(y)=\mu\), an assumption called equidispersion
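
A quick simulation illustrates equidispersion; this is just a sketch, and the seed and rate below are arbitrary:

set.seed(42)
draws <- rpois(100000, lambda = 3)  # 100,000 Poisson draws with rate 3
mean(draws)  # approximately 3
var(draws)   # also approximately 3: the mean equals the variance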

The Poisson Distribution

Extensions

  • If \(\mu\) is the rate parameter, we can then model \(\mu_i\) based on a set of covariates. That is,

\[\mu_i=E(y_i|x_i)=exp(\alpha+\beta x_i)\]

  • The rate parameter must be positive; exponentiating \(\alpha+\beta x_i\) guarantees that \(\mu_i>0\) for any values of the coefficients.

  • The result is a non-linear regression model with heteroskedasticity (Long 1997, p. 223): \(\mu_i=exp(\alpha+\beta x_i)\)

Extensions

  • Let’s assume \(\alpha=-0.25\) and \(\beta=0.13\), as Long does (p. 221), and calculate the expected value of \(y\) at several values of \(x\)

  • When \(x=1\), the expected number of counts – the rate parameter – is 0.89 (exp(-0.25 + 0.13*1))

  • When \(x=10\), the expected number of counts is 2.86 (exp(-0.25 + 0.13*10))

  • \(E(y|x)=exp(\alpha+\beta x_i)=var(y|x)\)
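
These values are easy to verify in R:

exp(-0.25 + 0.13*1)   # expected count at x = 1; about 0.89
exp(-0.25 + 0.13*10)  # expected count at x = 10; about 2.86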

Extensions

# Poisson probabilities of y = 1, 2, 3 with rate 0.12,
# computed directly from the density formula
(exp(-0.12)*0.12^1)/factorial(1)
[1] 0.1064305
(exp(-0.12)*0.12^2)/factorial(2)
[1] 0.006385827
(exp(-0.12)*0.12^3)/factorial(3)
[1] 0.0002554331
# Or just use the built-in density function
dpois(1,0.12)
[1] 0.1064305
dpois(2,0.12)
[1] 0.006385827
dpois(3,0.12)
[1] 0.0002554331

Extensions

  • With \(k\) predictors, then

\[\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i})\]

\[p(y|x)={{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]

The PRM Likelihood

The likelihood of the PRM with \(k\) predictors is:

\[\prod_{i=1}^{N}p(y_i|\mu_i)=\prod_{i=1}^{N}{{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]
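
In practice glm() maximizes this for us, but a minimal sketch with simulated data shows the idea; the seed, sample size, and true coefficients below are arbitrary:

set.seed(123)
x <- rnorm(500)
y <- rpois(500, exp(0.5 + 0.3*x))   # counts generated with known coefficients
negLL <- function(par) {            # negative log-likelihood of the PRM
  mu <- exp(par[1] + par[2]*x)
  -sum(dpois(y, mu, log = TRUE))
}
optim(c(0, 0), negLL)$par           # estimates should land near 0.5 and 0.3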

The PRM Likelihood

The log of the likelihood is then,

\[\tiny log(\prod_{i=1}^{N}p(y_i|\mu_i))=\sum_{i=1}^{N}log{{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{y_i}}\over{y_i!}}\]

  • \(\tiny E(y|x)=exp(\alpha+\sum_K \beta_k x_{k,i})\).

  • The partial derivative of \(E(y|x)\) with respect to \(x_k\) can be found using the chain rule.

  • Let \(\tiny u=\alpha+\sum_K \beta_k x_{k,i}\).

  • Then apply \(\tiny {{\partial y}\over{\partial u}}{{\partial u}\over{\partial x_k}}\)

The PRM Likelihood

\[\tiny {{\partial E(Y|X)}\over{\partial x_k}}={{\partial exp(u)}\over{\partial u}}{{\partial u}\over{\partial x_k}}\]

\[\tiny exp(\alpha+\sum_K \beta_k x_{k,i})\beta_k=E(Y|X)\beta_k\]

  • The effect of \(x_k\) on \(y\) is now a function of the rate parameter and the expected effect of \(x_k\) on that rate parameter

  • It’s not a constant change in the expected count; instead, it’s a function of how \(x_k\) affects the rate as well as how all the other covariates relate to the rate!

The PRM Likelihood

  • What effect does a \(d_k\) change in \(x_k\) have on the expected count? Take the ratio of the prediction including the change over the prediction absent the change (Long 1997, p. 225):

\[\tiny {{E(y|X, x_k+d_k)}\over{E(y|X, x_k)}}\]

  • The numerator: \(\tiny E(y|X, x_k+d_k)=exp(\beta_0)exp(\beta_1 x_1)exp(\beta_2 x_2)...exp(\beta_k x_k)exp(\beta_k d_k)\)

  • The denominator: \(\tiny E(y|X, x_k)=exp(\beta_0)exp(\beta_1 x_1)exp(\beta_2 x_2)...exp(\beta_k x_k)\)

  • We’re left with: \(\tiny exp(\beta_k d_k)\)
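
For instance, with the earlier slope of \(\beta_k=0.13\), a one-unit increase in \(x_k\) multiplies the expected count by about 1.14, a 14 percent increase:

exp(0.13 * 1)  # factor change in E(y|x) for a one-unit increase; about 1.139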

Interpretation

  • The predicted probability of a count

\[\tiny pr(y=m|x)={{exp(-exp(\alpha+\sum_K \beta_k x_{k,i}))exp(\alpha+\sum_K \beta_k x_{k,i})^{m}}\over{m!}}\]

  • The partial derivative

\[\tiny exp(\alpha+\sum_K \beta_k x_{k,i})\beta_k=E(Y|X)\beta_k\]
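
For example, with Long’s values \(\alpha=-0.25\) and \(\beta=0.13\), the predicted probability of a count of 2 when \(x=10\) is:

dpois(2, exp(-0.25 + 0.13*10))  # pr(y = 2 | x = 10); about 0.23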

Summary of PRM

  • We might encounter either under- or overdispersion brought about by unobserved heterogeneity.

  • It is useful to rely on an alternative model that doesn’t treat \(\mu\) as fixed, but instead draws it from a distribution, i.e., \[\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i}+\epsilon_i)\]

The Negative Binomial Regression Model

  • The NBRM stems from the negative binomial distribution

  • Recall that the binomial density is the PMF arising from \(k\) independent Bernoulli trials. The \(\theta\) parameter governs its shape.

  • We can modify the logic (and code) slightly to generate the probability of the number of trials required to observe \(r\) successes.

The Negative Binomial Regression Model

  • For instance, how many times would we need to flip a coin in order for three heads to appear, or four heads, and so forth? \(\binom{s+f-1}{f}\theta^{s}(1-\theta)^{f}\)

  • \(f=\)number of failures, \(s=\)number of successes. The total number of trials is just \(s+f\).
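
R’s dnbinom() uses this parameterization directly: the probability of \(f\) failures before the \(s\)-th success. A small sketch with a fair coin:

dnbinom(2, size = 3, prob = 0.5)        # pr of 2 tails before the 3rd head
choose(3 + 2 - 1, 2) * 0.5^3 * 0.5^2    # same value from the formula: 0.1875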

The Negative Binomial Regression Model

  • If \(\theta\) represents the probability of striking out, how many at bats are expected before a batter strikes out once, or twice, or 10 times?

  • Or, how many games must the Minnesota Vikings play in the 2022 NFL season before they lose three games?

The Negative Binomial Regression Model

  • Here, define “success” as striking out (that is the outcome we’re interested in) or losing a game; “failure” is the number of at-bats before striking out, or the number of games played before the third loss

The Negative Binomial Regression Model

  • It’s called the negative binomial because, if we expand and rearrange the binomial coefficient, it equals

\[(-1)^f\binom{-s}{f}\]

with the binomial coefficient defined through the gamma function,

\[\binom{n}{k}={{\Gamma(n+1)}\over {\Gamma(k+1)\Gamma(n-k+1)}}\]

Capturing Dispersion

  • If we have overdispersion (\(E(y)<var(y)\)), the standard errors will be too small, and we will be overconfident in our results

  • Instead, let’s assume that the rate parameter is subject to error

  • That is, it follows some distribution (we’ll use gamma)

Capturing Dispersion

\[\mu^*_i=exp(\alpha+\sum_K \beta_k x_{k,i}+\epsilon_i)\]

\[\mu^*_i=exp(\alpha)exp(\beta_1 x_{1,i})exp(\beta_2 x_{2,i})...exp(\beta_K x_{K,i})exp(\epsilon_i)\]

Writing \(d_i=exp(\epsilon_i)\), the rate becomes \(\mu_i d_i\):

\[ E(Y_i|X)=\mu_i d_i\]

If we assume \(E(d_i)=1\), the expected count is unchanged,

\[ E(Y_i|X)=\mu_i\]

and, conditional on \(d_i\), the count still follows a Poisson:

\[p(y|\mu_i d_i)={{exp(-\mu_i d_i)(\mu_i d_i)^y}\over{y!}}\]

Capturing Dispersion

  • To calculate \(p(y|\mu^*_i)\), we need to assume that \(d_i\) follows some density, and then integrate (i.e., average) over this unknown quantity to obtain the density of \(y\) given \(\mu_i\). Let’s assume that \(d_i\) follows a gamma density.

\[p(y_i|\mu_i)=\int_0^{\infty} pr(y_i|x_i, d_i)\,pr(d_i)\,\mathrm{d}d_i\]

where, \(d_i\) is distributed gamma,

Capturing Dispersion

\[pr(d_i)={{v_i^{v_i}}\over {\Gamma(v_i)}}d_i^{v_i-1}exp(-d_i v_i)\]

If we combine these two, the equation becomes

\[\tiny p(y|x_i)={{{\Gamma(y_i+v_i)}\over {y_i!\Gamma(v_i)}} ({{v_i}\over{v_i+\mu_i}})^{v_i} ({{\mu_i}\over{v_i+\mu_i}})^{y_i}}\]

Capturing Dispersion

  • Notice the similarity to the negative binomial above. Carrying out this integration (averaging the Poisson over the gamma density of \(d_i\)) shows that the negative binomial is nothing more than a Poisson mixed with a gamma.
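
This mixture can be verified numerically; dnbinom() accepts a mean parameterization, and the values of mu, v, and y below are arbitrary:

mu <- 2; v <- 1.5; y <- 3
# average the Poisson over the gamma mixing density
integrate(function(d) dpois(y, mu*d) * dgamma(d, shape = v, rate = v), 0, Inf)$value
dnbinom(y, size = v, mu = mu)  # matches the integral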

Capturing Dispersion

\[E(y|x)=\mu_i\]

  • But, the variance is no longer \(\mu_i\)

\[var(y_i|x)=\mu_i(1+({{\mu_i}\over{v_i}}))\]

  • The \(v_i\) parameter governs the shape of the gamma density. Another way to write the variance is to let \(\alpha=1/v_i\), so that \(var(y_i|x)=\mu_i(1+\alpha\mu_i)\).

  • The negative binomial model is an example of a mixture model with a closed-form maximum likelihood solution, where we maximize,

Capturing Dispersion

\[\prod_{i=1}^{N}p(y_i| x_i, \beta)=\prod_{i=1}^{N} {{\Gamma(y_i+\alpha^{-1})}\over{y_i!\,\Gamma(\alpha^{-1})}}({{\alpha^{-1}}\over{\alpha^{-1}+\mu_i}})^{\alpha^{-1}}({{\mu_i}\over{\alpha^{-1}+\mu_i}})^{y_i}\]

  • \(\mu_i=exp(\alpha+\sum_K \beta_k x_{k,i})\).

Examples

Notice that the shape of the distribution shifts with different values of \(\mu\).
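
The original slide shows this with a plot; the underlying probabilities can be sketched over an arbitrary grid of rates:

# probabilities of counts 0 through 10 under four increasing rates
sapply(c(0.8, 1.5, 3, 5), function(mu) round(dpois(0:10, mu), 3))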

Graduate Productivity in Biochemistry
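
The data are Long’s biochemist productivity data; a sketch of loading them, assuming they come from the pscl package:

library(pscl)      # assuming pscl is installed; it ships the bioChemists data
data(bioChemists)
names(bioChemists)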

[1] "art"  "fem"  "mar"  "kid5" "phd"  "ment"

Fit a PRM to the data. Notice how the standard errors are quite small.
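
A sketch of the fit, reconstructed from the Call in the output below:

m1 <- glm(art ~ kid5, family = poisson(link = "log"), data = bioChemists)
summary(m1)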


Call:
glm(formula = art ~ kid5, family = poisson(link = "log"), data = bioChemists)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.55960    0.02988  18.728   <2e-16 ***
kid5        -0.06978    0.03450  -2.023   0.0431 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1813.2  on 913  degrees of freedom
AIC: 3485

Number of Fisher Scoring iterations: 5
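
Before trusting these standard errors, it is worth comparing the mean and variance of the outcome; a variance well above the mean casts doubt on equidispersion:

mean(bioChemists$art)
var(bioChemists$art)  # noticeably larger than the mean, consistent with the quasipoisson dispersion of about 2.08 below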

Generate some predictions.


Call:
glm(formula = art ~ female + kid5 + married, family = poisson(link = "log"), 
    data = bioChemists)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.63656    0.05457  11.665  < 2e-16 ***
female      -0.28549    0.05433  -5.255 1.48e-07 ***
kid5        -0.16118    0.03934  -4.097 4.19e-05 ***
married      0.13271    0.06092   2.179   0.0294 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1776.7  on 911  degrees of freedom
AIC: 3452.5

Number of Fisher Scoring iterations: 5
Expected Articles, 4 Kid (All Variables=Min): 0.7455263
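
The printed expected count can be recovered by hand from the coefficients; assuming the profile sets female = 1, kid5 = 4, and married = 0 (an assumption, but it reproduces the printed value):

exp(0.63656 - 0.28549*1 - 0.16118*4 + 0.13271*0)  # about 0.7455, matching the output above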

Compare the Poisson fit above with a quasipoisson model, which estimates the dispersion parameter rather than fixing it at 1:

Call:
glm(formula = art ~ female + kid5 + married, family = quasipoisson(link = "log"), 
    data = bioChemists)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.63656    0.07864   8.094 1.83e-15 ***
female      -0.28549    0.07830  -3.646 0.000281 ***
kid5        -0.16118    0.05670  -2.843 0.004572 ** 
married      0.13271    0.08779   1.512 0.130965    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 2.076853)

    Null deviance: 1817.4  on 914  degrees of freedom
Residual deviance: 1776.7  on 911  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
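
Given the estimated dispersion of roughly 2.08, a negative binomial fit is the natural next step. A sketch using glm.nb() from the MASS package (this fit is not shown in the original output):

library(MASS)
m.nb <- glm.nb(art ~ female + kid5 + married, data = bioChemists)
summary(m.nb)  # theta is the estimated v (equivalently 1/alpha) parameter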