This week we worked on Poisson Regression.
Assumptions:
1. Response variable is a count per unit of time or space, described by a poisson distribution.
2. The observations must be independent of one another.
3. The mean of a poisson random variable must be equal to its variance.
4. The log of the mean rate lambda must be a linear function of x.
Here is what the regression formula looks like:
\[log(\lambda_i)=\beta_0+\beta_1X_i\]
R Code To do Poisson Regression in R, use the glm function similarily to the way we would use the lm function.
crabdat<-read.csv("http://www.cknudson.com/data/crabs.csv")
m1<-glm(satell~color,data = crabdat,family = poisson)
It is important to remember to specifiy the family, i.e., Poisson.
Lack of Model fit
Here are just a few ways to look for lack of model fit in the Poisson glm setting.
summary(m1)
##
## Call:
## glm(formula = satell ~ color, family = poisson, data = crabdat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8577 -2.1106 -0.1649 0.8721 4.7491
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.80078 0.10102 7.927 2.24e-15 ***
## colordarker -0.08516 0.18007 -0.473 0.636279
## colorlight 0.60614 0.17496 3.464 0.000532 ***
## colormedium 0.39155 0.11575 3.383 0.000718 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 632.79 on 172 degrees of freedom
## Residual deviance: 609.14 on 169 degrees of freedom
## AIC: 972.44
##
## Number of Fisher Scoring iterations: 6
disp<-sum(residuals(m1,type = "pearson")^2/m1$df.residual)
disp
## [1] 3.454472
So in this case the dispersion parameter is estimated to be about 3.45. Thus, we would want to multiply the standard errors of the estimated coefficients by the square root of this dispersion parameter.
Use the halfnorm() command to compare the absolute Poisson residuals against the quantiles of half a normal distribution. With Poisson residuals, we are looking for points that don’t follow the trend rather than looking to see if the points follow a straight line. This is because we don’t think that the residuals will be normally distributed.
library(faraway)
## Warning: package 'faraway' was built under R version 3.6.3
halfnorm(residuals(m1))
Interpreting Coefficients
- When interpreting estimated coefficients, exponentiate the estimated parameter to find out the multiplicative change in the mean number of the response variable. This is assuming a 1 unit increase in the corresponding explanatory variable while holding all else constant.
Drop in Deviance Tests
- This is used to compare nested models, assuming that the mean=variance.
- Test stat is drop in deviance between the two models.
- This follows a Chi-Square Distribution with df= difference in the # of parameters.
- Small p-values indicate that the more complicated model is better.
- This can be done with the anova() command, specifying the test=“Chisq”
- If overdispersion is present, divide the drop in deviance by the estimated overdispersion to get a different test stat.
- This test stat follows an F distribution with:
- df1 = # parameters dropped and df2= n -(# parameters in the bigger model)