Poisson regression models count outcomes. It resembles ordinary multiple regression, except that the dependent variable (Y) is an observed count assumed to follow the Poisson distribution, so its possible values are the nonnegative integers 0, 1, 2, 3, and so on. Large counts are assumed to be rare. Like logistic regression, Poisson regression has a discrete response variable; unlike logistic regression, the response is not restricted to a fixed set of values.
The assumptions are as follows:
There is a linear relationship between the log of the mean and the explanatory variables.
Changes in the rate from combined effects of different explanatory variables are multiplicative.
The variance of the response equals its mean (equidispersion).
Errors are independent of each other.
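To make these assumptions concrete, the following sketch simulates counts from a known log-linear model and refits it with glm. Everything in it (the sample size, the predictors x and g, and the coefficients 0.5, 0.3 and 0.7) is invented for illustration and is not part of the example data used below.

```r
# Simulated illustration only: the data and coefficients below are invented.
set.seed(1)
n <- 5000
x <- runif(n, 0, 2)          # continuous predictor
g <- rbinom(n, 1, 0.5)       # binary predictor

# The log of the mean is linear in the predictors,
# so effects on the mean itself are multiplicative.
mu <- exp(0.5 + 0.3 * x + 0.7 * g)
y  <- rpois(n, mu)           # Poisson counts: variance equals mean given x and g

fit <- glm(y ~ x + g, family = poisson)
coef(fit)                    # should be close to 0.5, 0.3, 0.7
exp(coef(fit))               # multiplicative effects on the expected count

# For a single Poisson rate, the sample mean and variance are close to each other.
y0 <- rpois(10000, 3)
c(mean = mean(y0), variance = var(y0))
```

If the sample variance of real counts is much larger than the mean within groups of similar observations, that is a sign of overdispersion and the plain Poisson model may understate the standard errors.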
In this example, num_awards is the outcome variable, math is a continuous predictor variable, and prog is a categorical predictor variable with three levels.
pois <- read.csv("https://stats.idre.ucla.edu/stat/data/poisson_sim.csv")
prog is coded as 1 = “General”, 2 = “Academic” and 3 = “Vocational”.
pois$prog <- factor(pois$prog, levels=1:3, labels=c("General", "Academic", "Vocational"))
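The recoding step can be tried on a small vector first. This is a minimal sketch using a made-up vector named codes (not the real data). It also shows that the first level, "General", becomes the reference category under R's default treatment contrasts, which is why a fitted model reports coefficients only for the other two programs.

```r
# Toy vector of program codes, invented for illustration.
codes <- c(1, 3, 2, 2, 1)
prog  <- factor(codes, levels = 1:3,
                labels = c("General", "Academic", "Vocational"))

levels(prog)   # "General" is listed first, so it is the reference level
table(prog)    # number of observations per program

# To compare against a different baseline, change the reference level.
prog2 <- relevel(prog, ref = "Academic")
levels(prog2)
```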
library(ggplot2)  # provides ggplot(), aes() and geom_histogram()
ggplot(pois, aes(num_awards, fill = prog)) + geom_histogram(position="dodge")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The Poisson model is fitted with the glm function.
summary(glm(num_awards ~ prog + math, family="poisson", data=pois))
##
## Call:
## glm(formula = num_awards ~ prog + math, family = "poisson", data = pois)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -2.2043  -0.8436  -0.5106   0.2558   2.6796
##
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -5.24712    0.65845  -7.969 1.60e-15 ***
## progAcademic    1.08386    0.35825   3.025  0.00248 **
## progVocational  0.36981    0.44107   0.838  0.40179
## math            0.07015    0.01060   6.619 3.63e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 287.67 on 199 degrees of freedom
## Residual deviance: 189.45 on 196 degrees of freedom
## AIC: 373.5
##
## Number of Fisher Scoring iterations: 6
In this model the deviance residuals show some skewness, since their median (-0.51) is not 0. If the model is specified correctly, the deviance residuals should be approximately normally distributed.
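The deviances printed in the summary also allow a rough check of overall fit and of the variance-equals-mean assumption. The sketch below hard-codes the values from that output, and the variable names are illustrative. With a fitted model object, deviance() and df.residual() would return the same quantities. The chi-square reference distribution is only an approximation for count data with many small values, so the p-values should be read as a guide.

```r
# Values copied from the summary output above.
res_dev  <- 189.45; res_df  <- 196
null_dev <- 287.67; null_df <- 199

# A ratio near 1 suggests no strong overdispersion.
dispersion <- res_dev / res_df

# Goodness of fit: a large p-value gives no evidence of lack of fit.
p_fit <- pchisq(res_dev, df = res_df, lower.tail = FALSE)

# Likelihood ratio test of the fitted model against the intercept-only model.
lr_stat <- null_dev - res_dev
p_lr    <- pchisq(lr_stat, df = null_df - res_df, lower.tail = FALSE)

c(dispersion = dispersion, p_fit = p_fit, p_lr = p_lr)
```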
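The coefficient for math is 0.07. This means that the expected log count of awards increases by 0.07 for each one-unit increase in math, holding prog constant. Equivalently, the expected count is multiplied by exp(0.07), which is about 1.07, so each additional point in math is associated with roughly a 7% increase in the expected number of awards.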
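Because the model is linear on the log scale, exponentiating the coefficients turns them into rate ratios, which are often easier to read. The following sketch does this with the estimates and standard errors copied from the summary output above, together with approximate 95% Wald intervals. The object names (est, se, irr) are illustrative. With a fitted model object, exp(coef(model)) gives the same point estimates.

```r
# Estimates and standard errors copied from the coefficient table above.
est <- c("(Intercept)"  = -5.24712,
         progAcademic   =  1.08386,
         progVocational =  0.36981,
         math           =  0.07015)
se  <- c(0.65845, 0.35825, 0.44107, 0.01060)

z     <- qnorm(0.975)            # critical value for a 95% Wald interval
irr   <- exp(est)                # rate ratios
lower <- exp(est - z * se)
upper <- exp(est + z * se)

round(cbind(IRR = irr, lower = lower, upper = upper), 3)
```

A rate ratio above 1 means a higher expected count than the reference (for prog, the "General" program; for math, a score one unit lower), and an interval that includes 1 corresponds to a coefficient that is not significant at the 5% level.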