The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of each other.
The probability that an event occurs in a given (short) time interval is proportional to the length of that interval and independent of the occurrence of other events.
The number of events in any specified time interval is then Poisson distributed.
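A minimal base-R sketch of both points: the Poisson probability mass function \(P(X=k) = \lambda^k e^{-\lambda}/k!\) checked against dpois(), and a simulated process with a constant rate and independent exponential gaps between events, whose counts per unit interval should have mean and variance close to \(\lambda\). The rate lambda = 3 and the number of intervals are arbitrary assumptions for illustration.

```r
# Sketch: Poisson pmf and a simulated constant-rate process.
# lambda and n_intervals are illustrative assumptions, not values from the text.
set.seed(1)

lambda <- 3                        # assumed mean number of events per interval
k <- 0:10
pmf_formula <- lambda^k * exp(-lambda) / factorial(k)
pmf_dpois <- dpois(k, lambda)      # base R's Poisson pmf gives the same values

# Events with independent exponential gaps at a constant rate
n_intervals <- 20000
gaps <- rexp(ceiling(1.2 * lambda * n_intervals), rate = lambda)
times <- cumsum(gaps)
times <- times[times < n_intervals]                          # keep events in [0, n_intervals)
counts <- tabulate(floor(times) + 1, nbins = n_intervals)    # events per unit interval

# Mean and variance of the counts should both be close to lambda
c(mean = mean(counts), var = var(counts))
```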
EX:
Modeling the number of incoming telephone calls to a service center, or the number of earthquakes.
The rate of incoming telephone calls is likely to vary with the time of day, and the timing of earthquakes is unlikely to be completely independent. However, the Poisson model may still be a good approximation.
For 30 Galápagos islands, we have a count of the number of species of tortoise found on each island and the number that are endemic to that island.
We also have 5 geographic variables for each island.
library(GGally)
library(faraway)
data(gala)
gala <- gala[, -2]  # drop column 2 (Endemics) so it is not used as a predictor of Species
Fit a Poisson regression of Species on all remaining variables:
modp<-glm(Species~., family=poisson, gala)
summary(modp)
##
## Call:
## glm(formula = Species ~ ., family = poisson, data = gala)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.2752 -4.4966 -0.9443 1.9168 10.1849
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.155e+00 5.175e-02 60.963 < 2e-16 ***
## Area -5.799e-04 2.627e-05 -22.074 < 2e-16 ***
## Elevation 3.541e-03 8.741e-05 40.507 < 2e-16 ***
## Nearest 8.826e-03 1.821e-03 4.846 1.26e-06 ***
## Scruz -5.709e-03 6.256e-04 -9.126 < 2e-16 ***
## Adjacent -6.630e-04 2.933e-05 -22.608 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 3510.73 on 29 degrees of freedom
## Residual deviance: 716.85 on 24 degrees of freedom
## AIC: 889.68
##
## Number of Fisher Scoring iterations: 5
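The Fisher scoring iterations reported above are carried out as iteratively reweighted least squares (IRLS). As a sketch of what glm() does for family = poisson with its default log link, the loop below fits simulated data (the coefficients, sample size, and starting values are assumptions for illustration) and compares the result with glm().

```r
# Sketch: IRLS (Fisher scoring) for a Poisson GLM with log link, checked
# against glm(). Data are simulated for illustration only.
set.seed(42)
n <- 200
x1 <- runif(n)
x2 <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 1.0 * x1 - 0.3 * x2))

X <- cbind(1, x1, x2)             # design matrix with an intercept column
beta <- rep(0, ncol(X))
eta <- log(y + 0.5)               # starting linear predictor (avoids log(0))
for (iter in 1:25) {
  mu <- exp(eta)                  # inverse of the log link
  z <- eta + (y - mu) / mu        # working response
  w <- mu                         # working weights for Poisson with log link
  beta_new <- solve(crossprod(X, w * X), crossprod(X, w * z))  # weighted LS step
  eta <- drop(X %*% beta_new)
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x1 + x2, family = poisson)
cbind(irls = drop(beta), glm = coef(fit))
```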
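Goodness of fit (deviance test for Poisson regression): compare the residual deviance with a \(\chi^2\) distribution on the residual degrees of freedom,
\[p = 1-\mathrm{pchisq}(residual~deviance,~degrees~of~freedom) = P\left(\chi^2_{df} > residual~deviance\right)\]
pchisq(..., lower.tail = FALSE) computes the same upper-tail probability directly, which stays accurate when p is tiny.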
pchisq(716.85,24,lower.tail = FALSE)
## [1] 7.058684e-136
This compares the fitted model with the saturated model (as in logistic regression); a well-fitting model should have a high p-value.
Here the p-value is about 7e-136, essentially zero, so the model is not a good fit for the data.
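A sketch of the deviance test on simulated data (the data-generating values are assumptions), followed by the modp numbers from above to show why lower.tail = FALSE is preferred when the p-value is extremely small.

```r
# Sketch: deviance goodness-of-fit test for a Poisson GLM.
# Simulated data; the true coefficients 1 and 0.8 are illustrative assumptions.
set.seed(7)
n <- 100
x <- runif(n)
y <- rpois(n, exp(1 + 0.8 * x))
fit <- glm(y ~ x, family = poisson)

D <- deviance(fit)        # residual deviance
rdf <- df.residual(fit)   # residual degrees of freedom
p_gof <- pchisq(D, rdf, lower.tail = FALSE)   # P(chi^2_rdf > D)
c(deviance = D, df = rdf, p = p_gof)

# With the modp numbers the lower-tail probability rounds to 1 in double
# precision, so 1 - pchisq() returns 0 while lower.tail = FALSE keeps the value.
1 - pchisq(716.85, 24)
pchisq(716.85, 24, lower.tail = FALSE)
```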
halfnorm(residuals(modp))
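There are no egregious outliers. In a half-normal plot, outliers would appear in the upper right as points sitting well above the line formed by the rest of the residuals; halfnorm() labels the most extreme points (by default the two largest) with their row index numbers.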
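A rough sketch of the coordinates behind a half-normal plot, on simulated data. It assumes the half-normal quantiles \(\Phi^{-1}\big((n+i)/(2n+1)\big)\), believed to be close to what faraway's halfnorm() uses, and labels the two largest absolute residuals with their row indices.

```r
# Sketch: half-normal plot of absolute deviance residuals, base R only.
# The quantile rule qnorm((n + i) / (2n + 1)) is an assumption about halfnorm().
set.seed(3)
n <- 50
x <- runif(n)
y <- rpois(n, exp(1 + x))
fit <- glm(y ~ x, family = poisson)

r <- abs(residuals(fit))                             # absolute deviance residuals
ord <- order(r)                                      # row indices, smallest to largest
hn_quant <- qnorm((n + seq_len(n)) / (2 * n + 1))    # positive half-normal quantiles
plot(hn_quant, r[ord], xlab = "Half-normal quantiles", ylab = "Sorted |residuals|")

# Label the two largest absolute residuals with their row indices
top <- tail(ord, 2)
text(tail(hn_quant, 2), r[top], labels = top, pos = 2)
```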
The proportion of the deviance explained by the model, similar to \(R^2\) in linear regression:
\(R^2_{dev} = 1-\frac{residual~deviance}{null~deviance}\)
For modp this is \(1 - 716.85/3510.73 \approx 0.80\).
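A short sketch computing \(R^2_{dev}\) from a fitted glm on simulated data (an assumption for illustration): deviance() gives the residual deviance and the null.deviance component of the fit gives the null deviance. An intercept-only model gives 0, and the last line repeats the modp calculation.

```r
# Sketch: proportion of deviance explained, R^2_dev = 1 - D_resid / D_null.
# Simulated data (assumed values, for illustration only).
set.seed(11)
x <- runif(80)
y <- rpois(80, exp(0.2 + 1.5 * x))
fit <- glm(y ~ x, family = poisson)

r2_dev <- 1 - deviance(fit) / fit$null.deviance
r2_dev

# An intercept-only model explains none of the deviance, so this is 0
fit0 <- glm(y ~ 1, family = poisson)
r2_null <- 1 - deviance(fit0) / fit0$null.deviance

# The same calculation with the modp numbers printed above
1 - 716.85 / 3510.73  # about 0.796
```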
One feature of the Poisson distribution is that the mean equals the variance. In Poisson models, however, overdispersion (variance larger than the mean) or underdispersion (variance smaller than the mean) can occur. In practice, overdispersion is far more common; it arises, for example, when the rate varies between observations in ways the predictors do not capture, or when events cluster.
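To see the difference, a small simulation (parameters are arbitrary assumptions) compares Poisson counts with negative binomial counts that have the same mean. For rnbinom() with mean mu and size k the variance is \(\mu + \mu^2/k\), so the variance-to-mean ratio is well above 1.

```r
# Sketch: equidispersed Poisson counts vs overdispersed negative binomial counts.
# mu = 5 and size = 2 are illustrative assumptions.
set.seed(5)
n <- 1e5
mu <- 5
pois <- rpois(n, mu)
nb <- rnbinom(n, size = 2, mu = mu)   # theoretical variance 5 + 25 / 2 = 17.5

# Variance-to-mean ratios: about 1 for Poisson, about 3.5 here for the NB
c(pois_ratio = var(pois) / mean(pois), nb_ratio = var(nb) / mean(nb))
```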
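Estimate the dispersion parameter as the sum of squared Pearson residuals divided by the residual degrees of freedom, \(\hat\phi = \frac{\sum_i r_{P,i}^2}{n-p}\), and pass it to summary() through its dispersion argument: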
(dp <- sum(residuals(modp, type = "pearson")^2) / modp$df.residual)  # Pearson estimate of the dispersion
## [1] 31.74914
summary(modp,dispersion=dp)
##
## Call:
## glm(formula = Species ~ ., family = poisson, data = gala)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.2752 -4.4966 -0.9443 1.9168 10.1849
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.1548079 0.2915897 10.819 < 2e-16 ***
## Area -0.0005799 0.0001480 -3.918 8.95e-05 ***
## Elevation 0.0035406 0.0004925 7.189 6.53e-13 ***
## Nearest 0.0088256 0.0102621 0.860 0.390
## Scruz -0.0057094 0.0035251 -1.620 0.105
## Adjacent -0.0006630 0.0001653 -4.012 6.01e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 31.74914)
##
## Null deviance: 3510.73 on 29 degrees of freedom
## Residual deviance: 716.85 on 24 degrees of freedom
## AIC: 889.68
##
## Number of Fisher Scoring iterations: 5
Note that the estimates are unchanged, but every standard error is multiplied by \(\sqrt{31.74914} \approx 5.63\), so the p-values change: Nearest and Scruz are no longer significant.
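A sketch on simulated overdispersed data (not the gala data; the negative binomial generator and its parameters are assumptions) checking two things: the Pearson estimate of the dispersion equals the value summary() estimates for a quasipoisson fit, and passing dispersion = phi to summary() leaves the estimates unchanged while multiplying every standard error by \(\sqrt{\phi}\).

```r
# Sketch: Pearson dispersion estimate, quasipoisson, and standard-error scaling.
# Simulated overdispersed data; size = 1.5 and the coefficients are assumptions.
set.seed(9)
n <- 150
x <- runif(n)
y <- rnbinom(n, size = 1.5, mu = exp(1 + x))   # variance exceeds the mean

fit_p <- glm(y ~ x, family = poisson)
fit_q <- glm(y ~ x, family = quasipoisson)     # same estimates, dispersion estimated

phi_hat <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)
c(manual = phi_hat, quasipoisson = summary(fit_q)$dispersion)

se_default <- summary(fit_p)$coefficients[, "Std. Error"]
se_scaled <- summary(fit_p, dispersion = phi_hat)$coefficients[, "Std. Error"]
se_scaled / se_default   # each ratio equals sqrt(phi_hat)

sqrt(31.74914)           # scale factor for modp, about 5.63
```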
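The drop1() function with test = "F" also identifies significant predictors by testing the removal of each term in turn. As the warning says, the F test assumes a quasipoisson family, so it allows for overdispersion.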
drop1(modp,test="F")
## Warning in drop1.glm(modp, test = "F"): F test assumes 'quasipoisson' family
## Single term deletions
##
## Model:
## Species ~ Area + Elevation + Nearest + Scruz + Adjacent
## Df Deviance AIC F value Pr(>F)
## <none> 716.85 889.68
## Area 1 1204.35 1375.18 16.3217 0.0004762 ***
## Elevation 1 2389.57 2560.40 56.0028 1.007e-07 ***
## Nearest 1 739.41 910.24 0.7555 0.3933572
## Scruz 1 813.62 984.45 3.2400 0.0844448 .
## Adjacent 1 1341.45 1512.29 20.9119 0.0001230 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
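The F values in this table can be reproduced by hand. The numbers above are consistent with drop1() scaling by the deviance-based dispersion (residual deviance / residual df = 716.85/24 ≈ 29.87) rather than the Pearson estimate dp: for Area, \((1204.35 - 716.85)/1 \,/\, 29.87 \approx 16.32\). A sketch checking this on simulated data (an assumption for illustration):

```r
# Sketch: reproduce drop1(..., test = "F") for one term of a Poisson GLM.
# Simulated overdispersed data; all parameter values are assumptions.
set.seed(2)
n <- 120
x1 <- runif(n)
x2 <- runif(n)
y <- rnbinom(n, size = 2, mu = exp(0.5 + x1 + 0.5 * x2))
fit <- glm(y ~ x1 + x2, family = poisson)

# drop1() warns that the F test assumes a quasipoisson family
tab <- suppressWarnings(drop1(fit, test = "F"))
scale_dev <- deviance(fit) / df.residual(fit)   # deviance-based dispersion
F_x1 <- (tab["x1", "Deviance"] - deviance(fit)) / tab["x1", "Df"] / scale_dev
c(manual = F_x1, drop1 = tab["x1", "F value"])

# Area row of the modp table
(1204.35 - 716.85) / 1 / (716.85 / 24)  # about 16.32
```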