Key Concepts covered in the lecture include:
1. Logistic Regression Overview
2. Other link functions: for ordinal and count data (Poisson)
3. Characteristics of the Poisson Model (mean = variance)
4. Overdispersion
5. Zero-Inflated Poisson Regression
6. What analysis to use and when (awesome slide!)
The in-class exercise is about using a generalized linear model, specifically Poisson regression for count data.
Part 1: Fit a Poisson Model and check for overdispersion
# install.packages('faraway') # only needs to be done once per computer
library(faraway) # load the faraway package, which contains this week's dataset
data(gala) # load the gala dataset; data() looks the name up in the attached packages
names(gala)
## [1] "Species" "Endemics" "Area" "Elevation" "Nearest" "Scruz"
## [7] "Adjacent"
hist(gala$Species) # The counts are right-skewed: many low values, few high values.
# Create a Poisson model. Use family = poisson
pm1 <- glm(Species ~ Area + Elevation + Nearest + Adjacent, family = poisson,
data = gala)
summary(pm1) # Coefficients are on the log scale (log link); exponentiate them to read them as multiplicative effects on the expected count
##
## Call:
## glm(formula = Species ~ Area + Elevation + Nearest + Adjacent,
## family = poisson, data = gala)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -11.149 -4.035 -0.978 2.363 9.985
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.00e+00 4.98e-02 60.12 <2e-16 ***
## Area -5.83e-04 2.58e-05 -22.64 <2e-16 ***
## Elevation 3.61e-03 8.60e-05 41.91 <2e-16 ***
## Nearest -3.24e-03 1.44e-03 -2.24 0.025 *
## Adjacent -7.58e-04 2.78e-05 -27.22 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 3510.73 on 29 degrees of freedom
## Residual deviance: 813.62 on 25 degrees of freedom
## AIC: 984.5
##
## Number of Fisher Scoring iterations: 5
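Because the Poisson model uses a log link, each coefficient is a change in the log of the expected count, and exponentiating it gives a multiplicative effect. A minimal sketch that refits the same model; the object names `rate_ratios` and `pred_counts` are illustrative and not from the lecture:

```r
library(faraway)  # provides the gala dataset
data(gala)

# Same Poisson model as above
pm1 <- glm(Species ~ Area + Elevation + Nearest + Adjacent,
           family = poisson, data = gala)

# Exponentiated coefficients: multiplicative change in the expected
# number of species for a one-unit increase in each predictor
rate_ratios <- exp(coef(pm1))
print(rate_ratios)

# Predicted counts on the original scale (not the log scale)
pred_counts <- predict(pm1, type = "response")
head(pred_counts)
```

For example, the Elevation estimate of 3.61e-03 corresponds to exp(0.00361), about 1.0036, which is roughly a 0.36% increase in the expected species count per unit of elevation, holding the other predictors fixed.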
# Overdispersion occurs when the variance of the response is greater than
# its mean (the Poisson model assumes they are equal), which would indicate
# that the Poisson model may not be adequate. Check for overdispersion:
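# First, plot the squared residuals against the fitted values (both on the
# log scale) to check this visually:
plot(log(fitted(pm1)), log((gala$Species - fitted(pm1))^2), xlab = expression(log(hat(mu))),
ylab = expression(log((y - hat(mu))^2)))
abline(0, 1) # reference line where variance = mean; points above it suggest overdispersion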
hist(predict(pm1))
mean(predict(pm1)) # By default predict() returns the linear predictor (log scale); use type = "response" for predicted counts.
## [1] 3.939
var(predict(pm1))
## [1] 0.8811
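# These are the mean and variance of the linear predictor (log scale), not
# of the counts, so comparing them says nothing about overdispersion. A
# better check is the residual deviance relative to its degrees of freedom:
# 813.62 / 25 is about 32.5, far above the value near 1 expected under the
# Poisson assumption, so the counts are strongly overdispersed.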
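One common way to quantify overdispersion is to divide the Pearson chi-square statistic (or the residual deviance) by the residual degrees of freedom; values well above 1 suggest the Poisson variance assumption does not hold. A minimal sketch, refitting the same model; the names `pearson_disp`, `deviance_ratio` and `gof_p` are illustrative:

```r
library(faraway)  # provides the gala dataset
data(gala)

pm1 <- glm(Species ~ Area + Elevation + Nearest + Adjacent,
           family = poisson, data = gala)

# Pearson chi-square statistic divided by the residual degrees of freedom;
# values well above 1 point to overdispersion
pearson_disp <- sum(residuals(pm1, type = "pearson")^2) / df.residual(pm1)

# The same idea using the residual deviance
deviance_ratio <- deviance(pm1) / df.residual(pm1)

# Goodness-of-fit test: a very small p-value means the Poisson model fits poorly
gof_p <- pchisq(deviance(pm1), df.residual(pm1), lower.tail = FALSE)

print(c(pearson = pearson_disp, deviance = deviance_ratio, p_value = gof_p))
```

The Pearson-based ratio is the same quantity that the quasipoisson family reports as its dispersion parameter.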
# We can also use a quasipoisson glm, which takes into account
# overdispersion:
pm2 <- glm(Species ~ Area + Elevation + Nearest + Adjacent, family = quasipoisson,
data = gala)
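summary(pm2) # The dispersion parameter is now estimated (31.02) instead of fixed at 1. The coefficients are unchanged, but the standard errors are inflated by sqrt(31.02), about 5.6, so Nearest is no longer significant.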
##
## Call:
## glm(formula = Species ~ Area + Elevation + Nearest + Adjacent,
## family = quasipoisson, data = gala)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -11.149 -4.035 -0.978 2.363 9.985
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.995521 0.277488 10.80 6.7e-11 ***
## Area -0.000583 0.000143 -4.06 0.00042 ***
## Elevation 0.003606 0.000479 7.52 7.1e-08 ***
## Nearest -0.003240 0.008046 -0.40 0.69056
## Adjacent -0.000758 0.000155 -4.89 5.0e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 31.02)
##
## Null deviance: 3510.73 on 29 degrees of freedom
## Residual deviance: 813.62 on 25 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
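The two summaries can be compared directly: the quasipoisson fit keeps the Poisson point estimates and rescales the standard errors by the square root of the estimated dispersion. A short sketch of that check, with illustrative names (`f`, `phi`, `same_coefs`, `scaled_se`):

```r
library(faraway)  # provides the gala dataset
data(gala)

f <- Species ~ Area + Elevation + Nearest + Adjacent
pm1 <- glm(f, family = poisson, data = gala)
pm2 <- glm(f, family = quasipoisson, data = gala)

# Estimated dispersion from the quasipoisson fit
phi <- summary(pm2)$dispersion

# Standard errors from each fit
se_pois <- coef(summary(pm1))[, "Std. Error"]
se_quasi <- coef(summary(pm2))[, "Std. Error"]

# The coefficients agree between the two fits
same_coefs <- all.equal(coef(pm1), coef(pm2))

# The quasipoisson standard errors are the Poisson ones times sqrt(phi)
scaled_se <- all.equal(se_quasi, se_pois * sqrt(phi))

print(c(dispersion = phi, inflation = sqrt(phi)))
print(c(same_coefs = isTRUE(same_coefs), scaled_se = isTRUE(scaled_se)))
```

Because the test statistics shrink by the same factor, weak effects such as Nearest lose their apparent significance once overdispersion is accounted for.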
Part 2: Fit a negative binomial model
library(MASS) # the MASS package provides glm.nb() for negative binomial models
gala.nb <- glm.nb(Species ~ Area + Elevation + Nearest + Adjacent, data = gala) # fit a negative binomial model
summary(gala.nb)
##
## Call:
## glm.nb(formula = Species ~ Area + Elevation + Nearest + Adjacent,
## data = gala, init.theta = 1.651652744, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.127 -0.997 -0.121 0.541 1.671
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.817638 0.237855 11.85 < 2e-16 ***
## Area -0.000646 0.000288 -2.24 0.0250 *
## Elevation 0.003933 0.000693 5.68 1.4e-08 ***
## Nearest -0.000313 0.010722 -0.03 0.9767
## Adjacent -0.000795 0.000225 -3.54 0.0004 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(1.652) family taken to be 1)
##
## Null deviance: 87.286 on 29 degrees of freedom
## Residual deviance: 33.156 on 25 degrees of freedom
## AIC: 302.6
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 1.652
## Std. Err.: 0.434
##
## 2 x log-likelihood: -290.592
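To see what the negative binomial model buys over the Poisson model, the two fits can be compared by AIC, and the fitted variance can be computed from the negative binomial variance function, mu + mu^2 / theta. A minimal sketch under the same model formula; the names `f`, `aic_table`, `mu` and `nb_var` are illustrative:

```r
library(faraway)  # provides the gala dataset
library(MASS)     # provides glm.nb()
data(gala)

f <- Species ~ Area + Elevation + Nearest + Adjacent
pm1 <- glm(f, family = poisson, data = gala)
gala.nb <- glm.nb(f, data = gala)

# Lower AIC indicates a better trade-off between fit and complexity
aic_table <- AIC(pm1, gala.nb)
print(aic_table)

# Negative binomial variance function: mu + mu^2 / theta,
# which always exceeds the Poisson variance (mu)
mu <- fitted(gala.nb)
nb_var <- mu + mu^2 / gala.nb$theta
head(cbind(mean = mu, variance = nb_var))
```

The AIC values reported in the summaries above (984.5 for the Poisson fit, 302.6 for the negative binomial fit) point the same way: allowing the variance to grow with the mean fits these counts much better.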