Blog 2

Poisson Regression

Poisson regression is used to model count variables. It is similar to ordinary multiple regression except that the dependent (Y) variable is an observed count assumed to follow a Poisson distribution. Thus, the possible values of Y are the nonnegative integers: 0, 1, 2, 3, and so on, and large counts are assumed to be rare. In this respect Poisson regression resembles logistic regression, which also has a discrete response variable. However, the response is not limited to specific values (such as 0 and 1) as it is in logistic regression.
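
To make "large counts are rare" concrete, the Poisson probability mass function can be evaluated with base R's dpois. The mean lambda = 1.5 below is an arbitrary illustrative value, not something estimated from the data in this post. The sketch also confirms numerically that the distribution's mean and variance are equal.

```r
# Poisson probabilities P(Y = y) for y = 0, 1, ..., 8 with an illustrative mean of 1.5
lambda <- 1.5
y <- 0:8
p <- dpois(y, lambda = lambda)
round(setNames(p, y), 4)   # probabilities shrink quickly as y grows

# Mean and variance implied by the distribution (both equal lambda);
# truncating the support at 50 leaves a negligible tail for lambda = 1.5
ys <- 0:50
ps <- dpois(ys, lambda = lambda)
pois_mean <- sum(ys * ps)
pois_var <- sum((ys - pois_mean)^2 * ps)
c(mean = pois_mean, variance = pois_var)
```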

The assumptions are as follows:

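  1. There is a linear relationship between the log of the mean count and the explanatory variables.

  2. Changes in the rate from the combined effects of different explanatory variables are multiplicative.

  3. The variance of the response equals its mean (Variance = Mean), conditional on the explanatory variables.

  4. Observations, and hence their errors, are independent of each other.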
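
Written as equations, the first three assumptions correspond to the model below, where mu_i is the expected count for observation i and x_i1, ..., x_ip are its explanatory variables. The notation is standard GLM notation, not taken from the original text. The left form is assumption 1, the product form shows why effects combine multiplicatively (assumption 2), and the last equation is assumption 3.

```latex
\log \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}
\quad\Longleftrightarrow\quad
\mu_i = e^{\beta_0}\, e^{\beta_1 x_{i1}} \cdots e^{\beta_p x_{ip}},
\qquad
\operatorname{Var}(Y_i \mid x_i) = \operatorname{E}(Y_i \mid x_i) = \mu_i
```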

Data

In this example, num_awards is the outcome variable, math is a continuous predictor variable, and prog is a categorical predictor variable with three levels.

pois <- read.csv("https://stats.idre.ucla.edu/stat/data/poisson_sim.csv")

prog is coded as 1 = “General”, 2 = “Academic” and 3 = “Vocational”.

pois$prog <- factor(pois$prog, levels=1:3, labels=c("General", "Academic", "Vocational"))
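
Before fitting, a rough check of assumption 3 (Variance = Mean) is to compare the sample mean and variance of num_awards within each program. The helper name mean_var_by_group is hypothetical, made up for this sketch; it only uses base R's tapply, mean and var.

```r
# Hypothetical helper: sample mean and variance of a count outcome y
# within each level of a grouping factor g. Under the Poisson assumption
# the two columns should be of similar size.
mean_var_by_group <- function(y, g) {
  g <- as.factor(g)
  data.frame(
    group = levels(g),
    mean = as.vector(tapply(y, g, mean)),
    variance = as.vector(tapply(y, g, var)),
    row.names = NULL
  )
}
```

With the data loaded above, the call would be mean_var_by_group(pois$num_awards, pois$prog). Group variances well above the group means would hint at overdispersion, although this ignores the effect of math, so it is only a first look.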

Visualization

library(ggplot2)
ggplot(pois, aes(num_awards, fill = prog)) + geom_histogram(binwidth = 1, position = "dodge")

Poisson regression

We fit the Poisson model with the glm function, setting family = "poisson".

summary(glm(num_awards ~ prog + math, family="poisson", data=pois))
## 
## Call:
## glm(formula = num_awards ~ prog + math, family = "poisson", data = pois)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2043  -0.8436  -0.5106   0.2558   2.6796  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -5.24712    0.65845  -7.969 1.60e-15 ***
## progAcademic    1.08386    0.35825   3.025  0.00248 ** 
## progVocational  0.36981    0.44107   0.838  0.40179    
## math            0.07015    0.01060   6.619 3.63e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 287.67  on 199  degrees of freedom
## Residual deviance: 189.45  on 196  degrees of freedom
## AIC: 373.5
## 
## Number of Fisher Scoring iterations: 6
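
The output notes that the dispersion parameter is taken to be 1, which is assumption 3 again. A common rough check compares the Pearson chi-square statistic, or the residual deviance, with the residual degrees of freedom; from the output above, 189.45 / 196 is about 0.97, close to 1. The function poisson_fit_checks below is a hypothetical helper, not part of base R, and the simulated data exist only so the sketch runs on its own.

```r
# Hypothetical helper: rough overdispersion and goodness-of-fit checks for a Poisson glm
poisson_fit_checks <- function(fit) {
  pearson <- sum(residuals(fit, type = "pearson")^2)
  df <- df.residual(fit)
  c(
    pearson_dispersion = pearson / df,          # near 1 if Variance = Mean holds
    deviance_dispersion = deviance(fit) / df,   # same idea using the residual deviance
    deviance_gof_p = pchisq(deviance(fit), df = df, lower.tail = FALSE)  # small p suggests lack of fit
  )
}

# Runnable demo on simulated Poisson data (not the awards data)
set.seed(1)
x <- rnorm(200)
y <- rpois(200, lambda = exp(0.5 + 0.3 * x))
demo_fit <- glm(y ~ x, family = poisson)
poisson_fit_checks(demo_fit)
```

For the awards model, store the fit first, for example m1 <- glm(num_awards ~ prog + math, family = "poisson", data = pois) (the name m1 is arbitrary), and then call poisson_fit_checks(m1).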

Interpretation of the results

In this model the deviance residuals appear somewhat skewed, since the median (-0.51) is not 0. If the model is specified correctly, the deviance residuals should be approximately normally distributed.
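
The residual quantiles printed by summary() can be reproduced from a fitted model and inspected graphically with base R's residuals(), quantile(), qqnorm() and qqline(). In this sketch the predictor and coefficients are illustrative values used to simulate data so the code runs without the CSV file.

```r
# Simulated count data (illustrative only, not the awards data)
set.seed(2)
x <- runif(200, 30, 75)
y <- rpois(200, lambda = exp(-5 + 0.07 * x))
fit <- glm(y ~ x, family = poisson)

# Deviance residuals and their five-number summary (Min, 1Q, Median, 3Q, Max)
dev_res <- residuals(fit, type = "deviance")
quantile(dev_res)

# Normal Q-Q plot: points near the line suggest roughly normal residuals
qqnorm(dev_res, main = "Normal Q-Q plot of deviance residuals")
qqline(dev_res)
```

For the awards model, replace fit with the fitted glm object. With small counts, deviance residuals take only a few distinct values for each fitted value, so the Q-Q plot is a rough guide rather than a formal test.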

The coefficient for math is 0.07. This means that for a one-unit increase in math, the expected log count of awards increases by 0.07, holding prog constant.
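
Because the model is on the log scale, exponentiating a coefficient gives an incidence rate ratio: the factor by which the expected count is multiplied. The sketch below copies the estimates and standard errors from the summary output above and computes approximate 95% Wald intervals by hand; it is an illustration, not a replacement for working with the fitted object.

```r
# Estimates and standard errors copied from the summary(glm(...)) output above
est <- c(progAcademic = 1.08386, progVocational = 0.36981, math = 0.07015)
se  <- c(progAcademic = 0.35825, progVocational = 0.44107, math = 0.01060)

# Incidence rate ratios: multiplicative change in the expected count
irr <- exp(est)

# Approximate 95% Wald intervals on the log scale, then exponentiated
z <- qnorm(0.975)
irr_table <- cbind(IRR = irr, lower = exp(est - z * se), upper = exp(est + z * se))
round(irr_table, 3)
```

For math, exp(0.07015) is about 1.073, so each additional point multiplies the expected number of awards by about 1.07 (roughly a 7% increase), holding prog fixed. With a fitted glm object, exp(coef(fit)) and exp(confint.default(fit)) give the same Wald-based quantities directly.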