Logistic and Poisson Regression

Key Concepts covered in the lecture include:
1. Logistic Regression Overview
2. Other link functions: for ordinal and count data (Poisson)
3. Characteristics of the Poisson Model (mean = variance)
4. Overdispersion
5. Zero-Inflation Poisson Regression
6. What analysis to use and when awesome slide!!!

The inclass exercise is about using a generalized linear model, specifically the logistic regression.

Part 1: Fit a Poisson Model and check for overdispersion

# install.packages('faraway')#only have to do this once per computer.
library(faraway)  #access the faraway library to access this week's dataset

data(gala)  #access the dataset stored in faraway... how does it know to get the right one?
names(gala)

## [1] "Species"   "Endemics"  "Area"      "Elevation" "Nearest"   "Scruz"    
## [7] "Adjacent"

hist(gala$Species)  #The data is skewed...  many low values, few high values.

plot of chunk unnamed-chunk-1



# Create a Poisson model.  Use family = poisson
pm1 <- glm(Species ~ Area + Elevation + Nearest + Adjacent, family = poisson, 
    data = gala)
summary(pm1)  #We cannot interpret these results because of the natural log link function

## 
## Call:
## glm(formula = Species ~ Area + Elevation + Nearest + Adjacent, 
##     family = poisson, data = gala)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -11.149   -4.035   -0.978    2.363    9.985  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.00e+00   4.98e-02   60.12   <2e-16 ***
## Area        -5.83e-04   2.58e-05  -22.64   <2e-16 ***
## Elevation    3.61e-03   8.60e-05   41.91   <2e-16 ***
## Nearest     -3.24e-03   1.44e-03   -2.24    0.025 *  
## Adjacent    -7.58e-04   2.78e-05  -27.22   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 3510.73  on 29  degrees of freedom
## Residual deviance:  813.62  on 25  degrees of freedom
## AIC: 984.5
## 
## Number of Fisher Scoring iterations: 5


# Overdispersion occurs when the variance of the fitted model is greater
# than the mean, which would indicate that the Poisson model may not be
# good.  Check for overdispersion:

# First, plot the output of the model to check this visually:
plot(log(fitted(pm1)), log((gala$Species - fitted(pm1))^2), xlab = expression(hat(mu)), 
    ylab = expression((y - hat(mu))^2))
abline(0, 1)

plot of chunk unnamed-chunk-1


hist(predict(pm1))

plot of chunk unnamed-chunk-1

mean(predict(pm1))  #The predict function allows us to calculate all of the estimations of the model.

## [1] 3.939

var(predict(pm1))

## [1] 0.8811


# The variance is lower than the mean... I don't know what this means... I
# don't know where to go from here.  I don't know how to find the mean and
# variance of the model output#########

# We can also use a quasipoisson glm, which takes into account
# overdispersion:
pm2 <- glm(Species ~ Area + Elevation + Nearest + Adjacent, family = quasipoisson, 
    data = gala)
summary(pm2)  #Note the change in the Dispersion parameter.  I don't know what this tells us...

## 
## Call:
## glm(formula = Species ~ Area + Elevation + Nearest + Adjacent, 
##     family = quasipoisson, data = gala)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -11.149   -4.035   -0.978    2.363    9.985  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.995521   0.277488   10.80  6.7e-11 ***
## Area        -0.000583   0.000143   -4.06  0.00042 ***
## Elevation    0.003606   0.000479    7.52  7.1e-08 ***
## Nearest     -0.003240   0.008046   -0.40  0.69056    
## Adjacent    -0.000758   0.000155   -4.89  5.0e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for quasipoisson family taken to be 31.02)
## 
##     Null deviance: 3510.73  on 29  degrees of freedom
## Residual deviance:  813.62  on 25  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

Part 2: Fit a negative binomial model

library(MASS)  #need the MASS library to do a negative binomail model

gala.nb <- glm.nb(glm(Species ~ Area + Elevation + Nearest + Adjacent, data = gala))  #Create a negative binomal model
summary(gala.nb)

## 
## Call:
## glm.nb(formula = glm(Species ~ Area + Elevation + Nearest + Adjacent, 
##     data = gala), init.theta = 1.651652744, link = log)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.127  -0.997  -0.121   0.541   1.671  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.817638   0.237855   11.85  < 2e-16 ***
## Area        -0.000646   0.000288   -2.24   0.0250 *  
## Elevation    0.003933   0.000693    5.68  1.4e-08 ***
## Nearest     -0.000313   0.010722   -0.03   0.9767    
## Adjacent    -0.000795   0.000225   -3.54   0.0004 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## (Dispersion parameter for Negative Binomial(1.652) family taken to be 1)
## 
##     Null deviance: 87.286  on 29  degrees of freedom
## Residual deviance: 33.156  on 25  degrees of freedom
## AIC: 302.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.652 
##           Std. Err.:  0.434 
## 
##  2 x log-likelihood:  -290.592