Weekly Summary3

Poisson Regression

This week we worked on Poisson Regression.

Assumptions:
1. Response variable is a count per unit of time or space, described by a poisson distribution.
2. The observations must be independent of one another.
3. The mean of a poisson random variable must be equal to its variance.
4. The log of the mean rate lambda must be a linear function of x.

Here is what the regression formula looks like:
\[log(\lambda_i)=\beta_0+\beta_1X_i\]

R Code To do Poisson Regression in R, use the glm function similarily to the way we would use the lm function.

crabdat<-read.csv("http://www.cknudson.com/data/crabs.csv")
m1<-glm(satell~color,data = crabdat,family = poisson)

It is important to remember to specifiy the family, i.e., Poisson.

Lack of Model fit
Here are just a few ways to look for lack of model fit in the Poisson glm setting.

This first is through the Residual Deviance, which is a generalization of the Residual Sum of Squares. As such, we hope to see a small Residual Deviance to indicate good fit. We can find this near the bottom of the summary output. To test the Residual Deviance, use a Chi-Sqaured distribution with df equal to n-(# of parameters). A small p-value indicates poor fit.

summary(m1)

## 
## Call:
## glm(formula = satell ~ color, family = poisson, data = crabdat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8577  -2.1106  -0.1649   0.8721   4.7491  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.80078    0.10102   7.927 2.24e-15 ***
## colordarker -0.08516    0.18007  -0.473 0.636279    
## colorlight   0.60614    0.17496   3.464 0.000532 ***
## colormedium  0.39155    0.11575   3.383 0.000718 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 609.14  on 169  degrees of freedom
## AIC: 972.44
## 
## Number of Fisher Scoring iterations: 6

An assumption for Poisson Regression is the the variance is equal to the mean. If not, we may have overdispersion. Here is how to estimate overdispersion.

disp<-sum(residuals(m1,type = "pearson")^2/m1$df.residual)
disp

## [1] 3.454472

So in this case the dispersion parameter is estimated to be about 3.45. Thus, we would want to multiply the standard errors of the estimated coefficients by the square root of this dispersion parameter.

Use the halfnorm() command to compare the absolute Poisson residuals against the quantiles of half a normal distribution. With Poisson residuals, we are looking for points that don’t follow the trend rather than looking to see if the points follow a straight line. This is because we don’t think that the residuals will be normally distributed.

library(faraway)

## Warning: package 'faraway' was built under R version 3.6.3

halfnorm(residuals(m1))

Interpreting Coefficients
- When interpreting estimated coefficients, exponentiate the estimated parameter to find out the multiplicative change in the mean number of the response variable. This is assuming a 1 unit increase in the corresponding explanatory variable while holding all else constant.

Drop in Deviance Tests
- This is used to compare nested models, assuming that the mean=variance.
- Test stat is drop in deviance between the two models.
- This follows a Chi-Square Distribution with df= difference in the # of parameters.
- Small p-values indicate that the more complicated model is better.
- This can be done with the anova() command, specifying the test=“Chisq”
- If overdispersion is present, divide the drop in deviance by the estimated overdispersion to get a different test stat.
- This test stat follows an F distribution with:
- df1 = # parameters dropped and df2= n -(# parameters in the bigger model)

Weekly Summary3

Sam K

2/26/2021

Poisson Regression