In these notes, we will discuss several types of count models
- Ordered Probit/Logit
- Poisson Model
- Negative Binomial
- Hurdle Model
- Zero Inflated Poisson Model
Suppose your dependent variable is categorical in nature and these categories have an ordinal property (i.e., you can order them)
Examples: Surveys including Likert Scales (strongly agree, agree, neither, disagree, strongly disagree)
Medalling in the Olympics (Gold, Silver, Bronze)
Soda size preference (small, medium, large, extra large)
In these cases of ordinal categorical variables, we use an ordered probit/logit model.
Notice that while these responses can be ordered, the distance between levels is not constant.
That is, we do not know how large an increase is needed to move someone from strongly disagree to disagree.
The math of the ordered probit/logit is very similar to that of a simple binary model.
There is just one difference: we need to estimate the level (cut-off) parameters.
Consider three observed outcomes: \(y = 0, 1, 2\). Consider the latent variable model without a constant: \[y^{*} = \beta_1 x_1 +...+\beta_k x_k +\epsilon\] where \(\epsilon \sim N(0,1)\). Define two cut-off points \(\alpha_1 < \alpha_2\). We do not observe \(y^{*}\), but we do observe choices according to the following rule:
Example (labor force participation): y = 0 if inactive, y = 1 if part-time, and y = 2 if full-time
\(y^{*} = \beta_e educ +\beta_k kids +\epsilon\), where \(\epsilon \sim N(0,1)\)
Then \(y = 0\) if \(\beta_e educ +\beta_k kids +\epsilon < \alpha_1\)
\(y = 1\) if \(\alpha_1 < \beta_e educ +\beta_k kids +\epsilon < \alpha_2\)
\(y = 2\) if \(\alpha_2 < \beta_e educ +\beta_k kids +\epsilon\)
We could alternatively introduce a constant \(\beta_0\) and assume that \(\alpha_1\) = 0.
\[\log L=\sum_{i=1}^{n} \sum_j Z_{ij}\log[F_{ij}-F_{i,j-1}]\]
where \(F_{ij}=F(\alpha_j-\beta_e educ -\beta_k kids)\) and \(Z_{ij}\) is an indicator that equals 1 if person \(i\) selects option \(j\) and zero otherwise.
We choose the values of \(\alpha\) and \(\beta\) that maximize the value of the log likelihood function.
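To make the estimation concrete, here is a minimal sketch of this log likelihood for the three-outcome case, written as an R function (hypothetical inputs X, y, and parameter vector par; the logistic CDF plays the role of F):
ologit_loglik <- function(par, X, y) {
  # par = (beta_1, ..., beta_k, alpha_1, alpha_2) with alpha_1 < alpha_2
  beta   <- par[1:ncol(X)]
  alpha1 <- par[ncol(X) + 1]
  alpha2 <- par[ncol(X) + 2]
  xb <- X %*% beta
  p0 <- plogis(alpha1 - xb)                       # Pr(y = 0)
  p1 <- plogis(alpha2 - xb) - plogis(alpha1 - xb) # Pr(y = 1)
  p2 <- 1 - plogis(alpha2 - xb)                   # Pr(y = 2)
  sum(log(ifelse(y == 0, p0, ifelse(y == 1, p1, p2))))
}
# maximizing this with optim(..., control = list(fnscale = -1))
# recovers the alpha and beta estimates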
A study looks at factors that influence the decision of whether to apply to graduate school.
College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school.
Hence, our outcome variable has three categories.
Data on parental educational status, whether the undergraduate institution is public or private, and current GPA are also collected.
library(foreign)
dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")
head(dat)
##             apply pared public  gpa
## 1     very likely     0      0 3.26
## 2 somewhat likely     1      0 3.21
## 3        unlikely     1      1 3.94
## 4 somewhat likely     0      0 2.81
## 5 somewhat likely     0      0 2.53
## 6        unlikely     0      1 2.59
## one at a time, table apply, pared, and public
lapply(dat[, c("apply", "pared", "public")], table)
## $apply
##
##        unlikely somewhat likely     very likely
##             220             140              40
##
## $pared
##
##   0   1
## 337  63
##
## $public
##
##   0   1
## 343  57
library(MASS)
## fit the ordered logit model and store the results in 'm'
m <- polr(apply ~ pared + public + gpa, data = dat, Hess = TRUE)
## view a summary of the model
summary(m)
## Call:
## polr(formula = apply ~ pared + public + gpa, data = dat, Hess = TRUE)
##
## Coefficients:
##           Value Std. Error t value
## pared   1.04769     0.2658  3.9418
## public -0.05879     0.2979 -0.1974
## gpa     0.61594     0.2606  2.3632
##
## Intercepts:
##                             Value  Std. Error t value
## unlikely|somewhat likely    2.2039 0.7795     2.8272
## somewhat likely|very likely 4.2994 0.8043     5.3453
##
## Residual Deviance: 717.0249
## AIC: 727.0249
Let's say we want to know the probability that someone whose parents are educated is very likely to go to graduate school.
We want to know \[Pr(y=2|pared=1, public=0)\] across a range of GPA values.
## First let's find the cumulative probability of being less than 2.
gpa<-as.vector(seq(0,4,.1))
pared.f<-rep(1,length(gpa))
public.f<-rep(0,length(gpa))
fakedata=as.matrix(rbind(pared.f,public.f,gpa))
## logistic CDF: Pr(y < 2) = F(alpha_2 - x'beta)
p2 = exp(m$zeta[2] - t(fakedata) %*% m$coefficients) /
  (1 + exp(m$zeta[2] - t(fakedata) %*% m$coefficients))
## but we want to know Pr(y=2)=1-Pr(y<2)
p2=1-p2
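As a sanity check, the same probabilities can be obtained from polr's own predict method (a sketch assuming the fitted object m and the gpa grid defined above):
## cross-check the hand calculation against polr's predicted probabilities
newdat <- data.frame(pared = 1, public = 0, gpa = gpa)
phat <- predict(m, newdata = newdat, type = "probs")
head(phat[, "very likely"])  # should match p2 computed above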
Count data are non-negative integer counts of the outcome variable and can include zero.
These data present problems for typical linear regression methods because the outcome variable is non-negative, while the regression error ranges over \((-\infty, \infty)\).
We cannot simply take the natural log of the outcome variable because zeros are present.
Also, the outcome is discrete rather than continuous.
Count data capture the number of occurrences over a fixed time interval, such as the number of physician office visits in a year (the example used below).
The Poisson density function is \[Pr(Y=y)=\frac{e^{-\lambda}\lambda^y}{y!}\]
where the \(E[y|X]=Var[y|X]=\lambda\)
For our purposes, we assume \(\lambda=\exp(X\beta)\)
\(\beta\) is interpreted as a semi-elasticity: a one-unit increase in \(X\) changes \(\lambda\) by approximately \(100\beta\) percent (exactly, by a factor of \(e^{\beta}\)).
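A quick simulation illustrates this interpretation (a sketch with made-up data; the coefficient 0.3 implies roughly a 30% increase in \(\lambda\) per unit of x):
set.seed(123)
x <- rnorm(1000)
y <- rpois(1000, exp(0.5 + 0.3 * x))  # true lambda = exp(0.5 + 0.3x)
coef(glm(y ~ x, family = poisson))    # slope estimate should be near 0.3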
The equidispersion property of the Poisson distribution, \(E[y|X]=Var[y|X]=\lambda\), is very often violated in practice.
When this property fails to hold, we state that there is overdispersion in the data. In this case, the negative binomial can be used.
Overdispersion causes the standard errors estimated using the Poisson distribution to be too small. An alternative approach is to estimate heteroskedasticity-robust standard errors.
Wooldridge (1999) demonstrated that using robust standard errors compensates for the overdispersion problem and often performs better than the negative binomial distribution.
The Negative Binomial distribution has a less restrictive relationship between the mean and variance of y. \[Var(y)=\lambda+\alpha \lambda^2\]
We can think of \(\alpha\) as the overdispersion parameter.
Step 1. Estimate the count model using the negative binomial distribution.
Step 2. Test \(H_0:\alpha=0\) vs \(H_a:\alpha\neq0\)
Step 3. If you reject \(H_0\), overdispersion is present and the negative binomial is preferred; otherwise the Poisson is adequate. (A sketch of this test appears after the models are estimated below.)
library(MASS); library(AER) # MASS contains the negative binomial model; AER provides the NMES1988 data
data("NMES1988")
plot(table(NMES1988$visits),xlab="Physician Office Visits", ylab="Counts")
First, notice the shape of Physician office visits. Does this look normally distributed to you?
Should we be concerned with the mass of zeros?
How about those outliers?
fm_pois <- glm(visits ~ ., data = NMES1988, family = poisson) # Standard Poisson Model
# Calculating robust standard errors
cov.m1 <- vcovHC(fm_pois, type="HC0")
std.err <- sqrt(diag(cov.m1))
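These robust standard errors can also be attached directly to the coefficient table; one way is coeftest() from the lmtest package (loaded with AER):
lmtest::coeftest(fm_pois, vcov. = cov.m1)  # z-tests using the robust variance matrix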
fm_qpois <- glm(visits ~ ., data = NMES1988, family = quasipoisson) # Quasi-Poisson
fm_nbin <- glm.nb(visits ~ ., data = NMES1988) # Negative Binomial
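With both models in hand, Step 2 above can be carried out as a likelihood-ratio test (a sketch; because \(\alpha = 0\) sits on the boundary of the parameter space, the usual chi-squared p-value is conventionally halved):
## LR test of H0: alpha = 0 (Poisson) vs. Ha: alpha > 0 (negative binomial)
lr <- 2 * (logLik(fm_nbin) - logLik(fm_pois))
pchisq(as.numeric(lr), df = 1, lower.tail = FALSE) / 2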
stargazer::stargazer(fm_pois, fm_pois, fm_qpois, fm_nbin, type = "html",
                     se = list(NULL, std.err, NULL, NULL),
                     column.labels = c("Poisson", "Poisson Robust",
                                       "Quasi Poisson", "Neg. Bin"))
Dependent variable: visits (physician office visits)

|                   | Poisson     | Poisson (robust SE) | Quasi-Poisson | Negative Binomial |
|-------------------|-------------|---------------------|---------------|-------------------|
|                   | (1)         | (2)                 | (3)           | (4)               |
| hospital          | 0.182***    | 0.182***            | 0.182***      | 0.240***          |
|                   | (0.006)     | (0.022)             | (0.015)       | (0.020)           |
| chronic           | 0.175***    | 0.175***            | 0.175***      | 0.204***          |
|                   | (0.004)     | (0.011)             | (0.011)       | (0.012)           |
| insuranceyes      | 0.183***    | 0.183***            | 0.183***      | 0.192***          |
|                   | (0.017)     | (0.043)             | (0.044)       | (0.040)           |
| gendermale        | -0.116***   | -0.116***           | -0.116***     | -0.136***         |
|                   | (0.013)     | (0.036)             | (0.034)       | (0.031)           |
| school            | 0.022***    | 0.022***            | 0.022***      | 0.021***          |
|                   | (0.002)     | (0.005)             | (0.005)       | (0.004)           |
| Constant          | 1.053***    | 1.053***            | 1.053***      | 0.986***          |
|                   | (0.023)     | (0.064)             | (0.061)       | (0.054)           |
| Observations      | 4,406       | 4,406               | 4,406         | 4,406             |
| Log Likelihood    | -18,151.350 | -18,151.350         |               | -12,209.240       |
| theta             |             |                     |               | 1.180*** (0.033)  |
| Akaike Inf. Crit. | 36,314.700  | 36,314.700          |               | 24,430.490        |

Note: *p<0.1; **p<0.05; ***p<0.01
A problem that plagues both the Poisson and negative binomial models is an excessive number of zeros in the data.
There are two approaches to address these zeros:
- Hurdle Model (also called the Two-Part Model)
- Zero-Inflated Model
The Hurdle Model treats the distribution of zeros separately from the distribution of positive values.
For example, your choice to start exercising may be different than your choice about how often to exercise.
The first part of the model is typically captured by a probit or logit model.
The second part is captured by a Poisson or negative binomial model that is truncated at zero (only counts of 1 or more).
Let the process generating the zeros be \(f_1(\cdot)\) and the process generating the positive responses be \(f_2(\cdot)\), then the two-part density is \[g(y)=\left\{\begin{matrix} f_1(0) & y=0 \\ \frac{1-f_1(0)}{1-f_2(0)}f_2(y) & y>0 \end{matrix}\right.\]
If \(f_1(0)=f_2(0)\), then the hurdle model reduces to the standard count model.
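A small sketch makes the construction concrete, with the positive part taken to be a zero-truncated Poisson (hypothetical parameter values p0 and lambda):
## hurdle density: p0 = f1(0) governs the zeros,
## the positive part is a Poisson truncated at zero
dhurdle <- function(y, p0, lambda) {
  ifelse(y == 0,
         p0,
         (1 - p0) * dpois(y, lambda) / (1 - dpois(0, lambda)))
}
sum(dhurdle(0:100, p0 = 0.4, lambda = 2))  # a valid density: sums to 1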
The zero inflated model treats zeros differently than the hurdle model.
Zeros are produced for two reasons:
- There is an exogenous mass of zeros that is independent of the count process.
- There is a set of zeros generated by the count process itself.
Example: You may drink alcohol or you may not. Even if you do drink alcohol, you may not drink alcohol every day. On the days you do drink, you may have 1, 2, 3 or more drinks.
Let the process generating the structural zeros be \(f_1(\cdot)\) and the process generating the random counts including zero be \(f_2(\cdot)\), then the two-part density is \[g(y)=\left\{\begin{matrix} f_1(0)+(1-f_1(0))f_2(0) & y=0 \\ (1-f_1(0))f_2(y) & y>0 \end{matrix}\right.\]
Think of this model as a finite mixture of a zero mass and the count model.
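The same kind of sketch works for the zero-inflated density, where pi0 plays the role of \(f_1(0)\) (hypothetical parameter values):
## zero-inflated Poisson density: a mixture of a point mass at zero
## and an ordinary Poisson count process
dzip <- function(y, pi0, lambda) {
  ifelse(y == 0,
         pi0 + (1 - pi0) * dpois(0, lambda),
         (1 - pi0) * dpois(y, lambda))
}
sum(dzip(0:100, pi0 = 0.4, lambda = 2))  # a valid density: sums to 1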
Next, we will use R to estimate these models.
The package you will need is pscl: zero-inflation and hurdle models via zeroinfl() and hurdle()
library(pscl)
dt_hurdle <- hurdle(visits ~ hospital + chronic + insurance + school + gender |
hospital + chronic + insurance + school + gender,
data = NMES1988, dist = "negbin")
dt_zinb <- zeroinfl(visits ~ hospital + chronic + insurance + school + gender |
                      hospital + chronic + insurance + school + gender,
                    data = NMES1988, dist = "negbin")
stargazer::stargazer(fm_pois, fm_pois, fm_qpois, fm_nbin, dt_hurdle, dt_zinb,
                     type = "html", se = list(NULL, std.err, NULL, NULL, NULL, NULL),
                     column.labels = c("Poisson", "Poisson Robust", "Quasi Poisson",
                                       "Neg. Bin", "Hurdle", "Zero-Inflated"))
Dependent variable: visits (physician office visits)

|                   | Poisson     | Poisson (robust SE) | Quasi-Poisson | Negative Binomial | Hurdle (neg. bin.) | Zero-Inflated (neg. bin.) |
|-------------------|-------------|---------------------|---------------|-------------------|--------------------|---------------------------|
|                   | (1)         | (2)                 | (3)           | (4)               | (5)                | (6)                       |
| hospital          | 0.182***    | 0.182***            | 0.182***      | 0.240***          | 0.235***           | 0.223***                  |
|                   | (0.006)     | (0.022)             | (0.015)       | (0.020)           | (0.022)            | (0.020)                   |
| chronic           | 0.175***    | 0.175***            | 0.175***      | 0.204***          | 0.154***           | 0.155***                  |
|                   | (0.004)     | (0.011)             | (0.011)       | (0.012)           | (0.012)            | (0.012)                   |
| insuranceyes      | 0.183***    | 0.183***            | 0.183***      | 0.192***          | 0.064              | 0.096**                   |
|                   | (0.017)     | (0.043)             | (0.044)       | (0.040)           | (0.043)            | (0.042)                   |
| gendermale        | -0.116***   | -0.116***           | -0.116***     | -0.136***         | -0.078**           | -0.090***                 |
|                   | (0.013)     | (0.036)             | (0.034)       | (0.031)           | (0.033)            | (0.031)                   |
| school            | 0.022***    | 0.022***            | 0.022***      | 0.021***          | 0.015***           | 0.016***                  |
|                   | (0.002)     | (0.005)             | (0.005)       | (0.004)           | (0.005)            | (0.004)                   |
| Constant          | 1.053***    | 1.053***            | 1.053***      | 0.986***          | 1.262***           | 1.251***                  |
|                   | (0.023)     | (0.064)             | (0.061)       | (0.054)           | (0.058)            | (0.056)                   |
| Observations      | 4,406       | 4,406               | 4,406         | 4,406             | 4,406              | 4,406                     |
| Log Likelihood    | -18,151.350 | -18,151.350         |               | -12,209.240       | -12,125.900        | -12,126.080               |
| theta             |             |                     |               | 1.180*** (0.033)  |                    |                           |
| Akaike Inf. Crit. | 36,314.700  | 36,314.700          |               | 24,430.490        |                    |                           |

Note: *p<0.1; **p<0.05; ***p<0.01
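With all of these models estimated, a quick way to compare fit is by log likelihood and AIC (a sketch assuming the model objects above; the quasi-Poisson fit has no genuine likelihood and is omitted):
## compare fit across the count models (higher logLik / lower AIC is better)
fits <- list(poisson = fm_pois, negbin = fm_nbin,
             hurdle = dt_hurdle, zinb = dt_zinb)
sapply(fits, function(f) c(logLik = as.numeric(logLik(f)), AIC = AIC(f)))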