To create a model about the number of times an event occurs in an interval of time or space (like number of people, number of cases, etcetera) we need a special distribution called Poisson.
This time we will use this kind of distribution to create a model in which we fit the total number of roadkills of amphibious along a roadway of Portugal (Zuur et al. 2013). We want to see the relationship between the distance to a local park and the total number of dead amphibious, being the distance to that park (D.PARK) the explanatory variable.
| Sector | X | Y | BufoCalamita | TOT.N | S.RICH | OPEN.L | OLIVE | MONT.S | MONT | POLIC | SHRUB | URBAN | WAT.RES | L.WAT.C | L.D.ROAD | L.P.ROAD | D.WAT.RES | D.WAT.COUR | D.PARK | N.PATCH | P.EDGE | L.SDI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 260181 | 256546 | 5 | 22 | 3 | 22.684 | 60.333 | 0.000 | 0.653 | 4.811 | 0.406 | 7.787 | 0.043 | 0.583 | 3330.189 | 1.975 | 252.113 | 735.000 | 250.214 | 122 | 553.936 | 1.801 |
| 2 | 259914 | 256124 | 1 | 14 | 4 | 24.657 | 40.832 | 0.000 | 0.161 | 2.224 | 0.735 | 27.150 | 0.182 | 1.419 | 2587.498 | 1.761 | 139.573 | 134.052 | 741.179 | 96 | 457.142 | 1.886 |
| 3 | 259672 | 255688 | 40 | 65 | 6 | 30.121 | 23.710 | 0.258 | 10.918 | 1.946 | 0.474 | 28.086 | 0.453 | 2.005 | 2149.651 | 1.250 | 59.168 | 269.029 | 1240.080 | 67 | 432.360 | 1.930 |
| 4 | 259454 | 255238 | 27 | 55 | 5 | 50.277 | 14.940 | 1.783 | 26.454 | 0.625 | 0.607 | 0.831 | 0.026 | 1.924 | 4222.983 | 0.666 | 277.842 | 48.751 | 1739.885 | 63 | 421.292 | 1.865 |
| 5 | 259307 | 254763 | 67 | 88 | 4 | 43.609 | 35.353 | 2.431 | 11.330 | 0.791 | 0.173 | 2.452 | 0.000 | 2.167 | 2219.302 | 0.653 | 967.808 | 126.102 | 2232.130 | 59 | 407.573 | 1.818 |
| 6 | 259189 | 254277 | 56 | 104 | 7 | 31.385 | 17.666 | 0.000 | 43.678 | 0.054 | 0.325 | 2.730 | 0.039 | 2.391 | 1005.629 | 1.309 | 560.000 | 344.444 | 2724.089 | 49 | 420.289 | 1.799 |
| 7 | 259092 | 253786 | 27 | 49 | 7 | 24.810 | 9.786 | 0.000 | 60.660 | 0.022 | 0.055 | 1.001 | 0.114 | 1.165 | 3735.189 | 0.685 | 93.147 | 95.133 | 3215.511 | 35 | 298.617 | 1.593 |
| 8 | 258993 | 253296 | 37 | 66 | 7 | 56.228 | 1.619 | 0.000 | 25.280 | 11.263 | 0.092 | 0.083 | 0.224 | 2.428 | 4891.466 | 0.677 | 88.971 | 243.230 | 3709.401 | 55 | 442.750 | 1.627 |
| 9 | 258880 | 252809 | 8 | 26 | 7 | 48.735 | 0.603 | 1.108 | 41.722 | 1.238 | 1.744 | 0.185 | 0.177 | 2.416 | 4010.961 | 0.664 | 684.887 | 187.084 | 4206.477 | 52 | 431.084 | 1.801 |
| 10 | 258767 | 252322 | 16 | 47 | 6 | 15.633 | 0.000 | 0.000 | 82.069 | 0.119 | 0.000 | 0.361 | 0.000 | 0.211 | 2885.246 | 0.654 | 794.000 | 236.004 | 4704.176 | 26 | 220.231 | 1.440 |
ggplot(RK, aes(x = D.PARK, y = TOT.N)) +
geom_point() + labs(title = "Roadkills vs. Distance to the park") + theme_minimal()
As we can see in the scatter plot, the data shows a Poisson distribution. This kind of distribution is hard to fit to a classical regression line, since the residuals will be too high. We need to create a GLM model.
M1 <- glm(TOT.N ~ D.PARK, family = poisson, data = RK)
summary(M1)
##
## Call:
## glm(formula = TOT.N ~ D.PARK, family = poisson, data = RK)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -8.1100 -1.6950 -0.4708 1.4206 7.3337
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.316e+00 4.322e-02 99.87 <2e-16 ***
## D.PARK -1.059e-04 4.387e-06 -24.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1071.4 on 51 degrees of freedom
## Residual deviance: 390.9 on 50 degrees of freedom
## AIC: 634.29
##
## Number of Fisher Scoring iterations: 4
# Pseudo R squared of the model
100 * (1071.4 - 390.9)/ 1071.4
## [1] 63.51503
The slope and the intercept of the model are statistically significants (p < 0.001). The AIC value is 634.29 (we’ll need it to compare it with differents models in the future). We can explain a 63% of the variance with our GLM model, which is a reasonable amount of information.
MyData <- data.frame(D.PARK = seq(from = 0, to = 25000, by = 1000))
G <- predict(M1, newdata = MyData, type = "link", se = TRUE)
Fi <- exp(G$fit)
FSEUP <- exp(G$fit + 1.96 * G$se.fit)
FSELOW <- exp(G$fit - 1.96 * G$se.fit)
plot(RK$D.PARK, RK$TOT.N,
xlab = "Distance to the park", ylab = "Killroads", main = "Killroads vs. Distance to the park")
lines(MyData$D.PARK, Fi,lty = 1)
lines(MyData$D.PARK, FSEUP, lty = 2, col = "red")
lines(MyData$D.PARK, FSELOW, lty = 2, col = "red")
First of all, we have to transform the variables which has outliers. We will use a square transformation. Then, we will create a GLM model with a Poisson distribution without interactions
RK$SQ.POLIC <- sqrt(RK$POLIC)
RK$SQ.WATRES <- sqrt(RK$WAT.RES)
RK$SQ.LPROAD <- sqrt(RK$L.P.ROAD)
RK$SQ.SHRUB <- sqrt(RK$SHRUB)
RK$SQ.DWATCOUR <- sqrt(RK$D.WAT.COUR)
M2 <- glm(TOT.N ~ OPEN.L + MONT.S + SQ.POLIC +
SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
SQ.DWATCOUR + D.PARK, family = poisson, data = RK)
summary(M2)
##
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB +
## SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, family = poisson,
## data = RK)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.8398 -1.3965 -0.1409 1.4641 4.3749
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.749e+00 1.567e-01 23.935 < 2e-16 ***
## OPEN.L -3.025e-03 1.580e-03 -1.915 0.055531 .
## MONT.S 8.697e-02 1.359e-02 6.398 1.57e-10 ***
## SQ.POLIC -1.787e-01 4.676e-02 -3.822 0.000133 ***
## SQ.SHRUB -6.112e-01 1.176e-01 -5.197 2.02e-07 ***
## SQ.WATRES 2.243e-01 7.050e-02 3.181 0.001468 **
## L.WAT.C 3.355e-01 4.127e-02 8.128 4.36e-16 ***
## SQ.LPROAD 4.517e-01 1.348e-01 3.351 0.000804 ***
## SQ.DWATCOUR 7.355e-03 4.879e-03 1.508 0.131629
## D.PARK -1.301e-04 5.936e-06 -21.923 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1071.44 on 51 degrees of freedom
## Residual deviance: 270.23 on 42 degrees of freedom
## AIC: 529.62
##
## Number of Fisher Scoring iterations: 5
There are no significant variables. In order to decide which variables we have to eliminate we will use two kinds of selections: 1. Step method using both forward and step methods 2. Manual analysis
summary(step(M2, trace = 0))
##
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB +
## SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, family = poisson,
## data = RK)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.8398 -1.3965 -0.1409 1.4641 4.3749
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.749e+00 1.567e-01 23.935 < 2e-16 ***
## OPEN.L -3.025e-03 1.580e-03 -1.915 0.055531 .
## MONT.S 8.697e-02 1.359e-02 6.398 1.57e-10 ***
## SQ.POLIC -1.787e-01 4.676e-02 -3.822 0.000133 ***
## SQ.SHRUB -6.112e-01 1.176e-01 -5.197 2.02e-07 ***
## SQ.WATRES 2.243e-01 7.050e-02 3.181 0.001468 **
## L.WAT.C 3.355e-01 4.127e-02 8.128 4.36e-16 ***
## SQ.LPROAD 4.517e-01 1.348e-01 3.351 0.000804 ***
## SQ.DWATCOUR 7.355e-03 4.879e-03 1.508 0.131629
## D.PARK -1.301e-04 5.936e-06 -21.923 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1071.44 on 51 degrees of freedom
## Residual deviance: 270.23 on 42 degrees of freedom
## AIC: 529.62
##
## Number of Fisher Scoring iterations: 5
There are some variables which are not significant. Eliminating them have to be considered.
drop1(M2, test = "Chi")
## Single term deletions
##
## Model:
## TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C +
## SQ.LPROAD + SQ.DWATCOUR + D.PARK
## Df Deviance AIC LRT Pr(>Chi)
## <none> 270.23 529.62
## OPEN.L 1 273.93 531.32 3.69 0.0546474 .
## MONT.S 1 306.89 564.28 36.66 1.410e-09 ***
## SQ.POLIC 1 285.53 542.92 15.30 9.181e-05 ***
## SQ.SHRUB 1 298.31 555.70 28.08 1.167e-07 ***
## SQ.WATRES 1 280.02 537.41 9.79 0.0017539 **
## L.WAT.C 1 335.47 592.86 65.23 6.648e-16 ***
## SQ.LPROAD 1 281.25 538.64 11.02 0.0009009 ***
## SQ.DWATCOUR 1 272.50 529.89 2.27 0.1319862
## D.PARK 1 838.09 1095.48 567.85 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We should eliminate the SQ.DWATCOUR variable at least. Now we gonna test our model without those variables and see how it improved.
M2b<-glm(TOT.N~OPEN.L + MONT.S + SQ.POLIC +
SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD+
D.PARK, family = poisson, data = RK)
summary(M2b)
##
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB +
## SQ.WATRES + L.WAT.C + SQ.LPROAD + D.PARK, family = poisson,
## data = RK)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.1443 -1.4118 -0.3607 1.3934 4.3559
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.852e+00 1.405e-01 27.405 < 2e-16 ***
## OPEN.L -3.464e-03 1.544e-03 -2.243 0.024890 *
## MONT.S 8.927e-02 1.344e-02 6.643 3.08e-11 ***
## SQ.POLIC -1.583e-01 4.492e-02 -3.525 0.000424 ***
## SQ.SHRUB -6.160e-01 1.185e-01 -5.200 2.00e-07 ***
## SQ.WATRES 2.080e-01 6.917e-02 3.006 0.002645 **
## L.WAT.C 3.113e-01 3.817e-02 8.155 3.50e-16 ***
## SQ.LPROAD 4.936e-01 1.315e-01 3.753 0.000175 ***
## D.PARK -1.286e-04 5.837e-06 -22.033 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 1071.4 on 51 degrees of freedom
## Residual deviance: 272.5 on 43 degrees of freedom
## AIC: 529.89
##
## Number of Fisher Scoring iterations: 5
Now all the variables are statiscally significants.
anova(M2, M2b, test = "Chi")
## Analysis of Deviance Table
##
## Model 1: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C +
## SQ.LPROAD + SQ.DWATCOUR + D.PARK
## Model 2: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C +
## SQ.LPROAD + D.PARK
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 42 270.23
## 2 43 272.50 -1 -2.269 0.132
There are no significant differences between the two models (p = 0.132, Dif deviance = 2.269). However, we will continue our analysis with the model without the SQ.DWATCOUR variable, since this is the most parsimonious model.
M3 <- glm(TOT.N ~ MONT.S + SQ.POLIC + D.PARK +
SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
SQ.DWATCOUR, family = poisson, data = RK)
anova(M2, M3, test = "Chi")
## Analysis of Deviance Table
##
## Model 1: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C +
## SQ.LPROAD + SQ.DWATCOUR + D.PARK
## Model 2: TOT.N ~ MONT.S + SQ.POLIC + D.PARK + SQ.SHRUB + SQ.WATRES + L.WAT.C +
## SQ.LPROAD + SQ.DWATCOUR
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 42 270.23
## 2 43 273.93 -1 -3.6928 0.05465 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This problem happens when the variance is greater than the mean. A value greater than 1 would mean a problem.
270.23/42
## [1] 6.434048
The model has a problem of overdispersion. We should check the assumptions of the model to see whether something is wrong or not.
op <- par(mfrow = c(2, 2))
plot(M2b)
EP <- resid(M2b, type = "pearson")
ED <- resid(M2b, type = "deviance")
par(op)
mu <- predict(M2b, type = "response")
E <- RK$TOT.N - mu
EP2 <- E/sqrt(6.337209 * mu)
op <- par(mfrow=c(2,2))
plot(x = mu, y = E, main = "Response residuals")
plot(x = mu, y = EP, main = "Pearson residuals")
plot(x = mu, y = EP2, main = "Pearson residuals scaled")
plot(x = mu, y = ED, main = "Deviance residuals")
par(op)
Here we observe an anormal ditribution of the residuals. The most likely reason is an overdispersion problem. We will try to fix this with another kind of distribution: the negative binomial distribution.
This is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a number of specified failures occur.
M6 <- glm.nb(TOT.N ~ OPEN.L + MONT.S + SQ.POLIC +
SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
SQ.DWATCOUR + D.PARK, link = "log", data = RK)
summary(M6, cor = FALSE)
##
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB +
## SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, data = RK,
## link = "log", init.theta = 5.517795385)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.97265 -0.64463 -0.03337 0.62007 1.48627
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.951e+00 4.145e-01 9.532 <2e-16 ***
## OPEN.L -9.419e-03 3.245e-03 -2.903 0.0037 **
## MONT.S 5.846e-02 3.481e-02 1.679 0.0931 .
## SQ.POLIC -4.618e-02 1.298e-01 -0.356 0.7221
## SQ.SHRUB -3.881e-01 2.883e-01 -1.346 0.1784
## SQ.WATRES 1.631e-01 1.675e-01 0.974 0.3301
## L.WAT.C 2.076e-01 9.636e-02 2.154 0.0312 *
## SQ.LPROAD 5.944e-01 3.214e-01 1.850 0.0644 .
## SQ.DWATCOUR -1.489e-05 1.139e-02 -0.001 0.9990
## D.PARK -1.235e-04 1.292e-05 -9.557 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(5.5178) family taken to be 1)
##
## Null deviance: 213.674 on 51 degrees of freedom
## Residual deviance: 51.803 on 42 degrees of freedom
## AIC: 390.11
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 5.52
## Std. Err.: 1.41
##
## 2 x log-likelihood: -368.107
summary(step(M6, trace = 0))
##
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK,
## data = RK, init.theta = 4.979895134, link = "log")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.08218 -0.70494 -0.09268 0.55575 1.67860
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.032e+00 3.363e-01 11.989 < 2e-16 ***
## OPEN.L -1.085e-02 3.122e-03 -3.475 0.00051 ***
## L.WAT.C 1.597e-01 7.852e-02 2.034 0.04195 *
## SQ.LPROAD 4.924e-01 3.101e-01 1.588 0.11231
## D.PARK -1.154e-04 1.061e-05 -10.878 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.9799) family taken to be 1)
##
## Null deviance: 197.574 on 51 degrees of freedom
## Residual deviance: 51.329 on 47 degrees of freedom
## AIC: 383.54
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.98
## Std. Err.: 1.22
##
## 2 x log-likelihood: -371.542
summary(stepAIC(M6, trace = 0))
##
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK,
## data = RK, init.theta = 4.979895134, link = "log")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.08218 -0.70494 -0.09268 0.55575 1.67860
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.032e+00 3.363e-01 11.989 < 2e-16 ***
## OPEN.L -1.085e-02 3.122e-03 -3.475 0.00051 ***
## L.WAT.C 1.597e-01 7.852e-02 2.034 0.04195 *
## SQ.LPROAD 4.924e-01 3.101e-01 1.588 0.11231
## D.PARK -1.154e-04 1.061e-05 -10.878 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.9799) family taken to be 1)
##
## Null deviance: 197.574 on 51 degrees of freedom
## Residual deviance: 51.329 on 47 degrees of freedom
## AIC: 383.54
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.98
## Std. Err.: 1.22
##
## 2 x log-likelihood: -371.542
Both methods have choosen the variables OPEN.L, L.WAT.C, SQ.LPROAD y D.PARK. The variable SQ.LPROAD is not significant and L.WAT.C is slightly significant but it can be included.
M7 <- glm.nb(TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK, link = "log", data = RK)
drop1(M7, test = "Chi")
## Single term deletions
##
## Model:
## TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK
## Df Deviance AIC LRT Pr(>Chi)
## <none> 51.329 381.54
## OPEN.L 1 63.331 391.54 12.002 0.0005316 ***
## L.WAT.C 1 55.384 383.60 4.055 0.0440383 *
## SQ.LPROAD 1 54.086 382.30 2.757 0.0968442 .
## D.PARK 1 172.847 501.06 121.518 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We should eliminate both variables to improve our model.
M7b <- glm.nb(TOT.N ~ OPEN.L + L.WAT.C + D.PARK,link = "log", data = RK)
M7c <- glm.nb(TOT.N ~ OPEN.L + D.PARK, link = "log", data = RK)
anova(M7b, M7c)
## Likelihood ratio tests of Negative Binomial Models
##
## Response: TOT.N
## Model theta Resid. df 2 x log-lik. Test
## 1 OPEN.L + D.PARK 4.132845 49 -379.4317
## 2 OPEN.L + L.WAT.C + D.PARK 4.725770 48 -374.2536 1 vs 2
## df LR stat. Pr(Chi)
## 1
## 2 1 5.178081 0.02287358
summary(M7c)
##
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + D.PARK, data = RK, link = "log",
## init.theta = 4.13284466)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5880 -0.7878 -0.1676 0.3721 2.4369
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.6717034 0.1641768 28.455 <2e-16 ***
## OPEN.L -0.0093591 0.0031952 -2.929 0.0034 **
## D.PARK -0.0001119 0.0000113 -9.901 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.1328) family taken to be 1)
##
## Null deviance: 170.661 on 51 degrees of freedom
## Residual deviance: 51.839 on 49 degrees of freedom
## AIC: 387.43
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.133
## Std. Err.: 0.980
##
## 2 x log-likelihood: -379.432
51.839/49
## [1] 1.057939
Close to 1, so it is not supposed to be a problem.
M8 <- glm.nb(TOT.N ~ OPEN.L + D.PARK, link = "log", data = RK)
summary(M8)
##
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + D.PARK, data = RK, link = "log",
## init.theta = 4.13284466)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5880 -0.7878 -0.1676 0.3721 2.4369
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.6717034 0.1641768 28.455 <2e-16 ***
## OPEN.L -0.0093591 0.0031952 -2.929 0.0034 **
## D.PARK -0.0001119 0.0000113 -9.901 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(4.1328) family taken to be 1)
##
## Null deviance: 170.661 on 51 degrees of freedom
## Residual deviance: 51.839 on 49 degrees of freedom
## AIC: 387.43
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 4.133
## Std. Err.: 0.980
##
## 2 x log-likelihood: -379.432
drop1(M8, test = "Chi")
## Single term deletions
##
## Model:
## TOT.N ~ OPEN.L + D.PARK
## Df Deviance AIC LRT Pr(>Chi)
## <none> 51.839 385.43
## OPEN.L 1 59.731 391.32 7.891 0.004967 **
## D.PARK 1 154.595 486.19 102.756 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Both variables are significant.
par(mfrow = c(2, 2))
plot(M8)
There is no problem with the model.
The farther from the park (D.PARK), the lesser the number of roadkills (TOT.N). In addition to this, the open spaces (OPEN.L) are significantly the safest.