Poisson distribution model

To create a model about the number of times an event occurs in an interval of time or space (like number of people, number of cases, etcetera) we need a special distribution called Poisson.

This time we will use this kind of distribution to create a model in which we fit the total number of roadkills of amphibious along a roadway of Portugal (Zuur et al. 2013). We want to see the relationship between the distance to a local park and the total number of dead amphibious, being the distance to that park (D.PARK) the explanatory variable.

Sector X Y BufoCalamita TOT.N S.RICH OPEN.L OLIVE MONT.S MONT POLIC SHRUB URBAN WAT.RES L.WAT.C L.D.ROAD L.P.ROAD D.WAT.RES D.WAT.COUR D.PARK N.PATCH P.EDGE L.SDI
1 260181 256546 5 22 3 22.684 60.333 0.000 0.653 4.811 0.406 7.787 0.043 0.583 3330.189 1.975 252.113 735.000 250.214 122 553.936 1.801
2 259914 256124 1 14 4 24.657 40.832 0.000 0.161 2.224 0.735 27.150 0.182 1.419 2587.498 1.761 139.573 134.052 741.179 96 457.142 1.886
3 259672 255688 40 65 6 30.121 23.710 0.258 10.918 1.946 0.474 28.086 0.453 2.005 2149.651 1.250 59.168 269.029 1240.080 67 432.360 1.930
4 259454 255238 27 55 5 50.277 14.940 1.783 26.454 0.625 0.607 0.831 0.026 1.924 4222.983 0.666 277.842 48.751 1739.885 63 421.292 1.865
5 259307 254763 67 88 4 43.609 35.353 2.431 11.330 0.791 0.173 2.452 0.000 2.167 2219.302 0.653 967.808 126.102 2232.130 59 407.573 1.818
6 259189 254277 56 104 7 31.385 17.666 0.000 43.678 0.054 0.325 2.730 0.039 2.391 1005.629 1.309 560.000 344.444 2724.089 49 420.289 1.799
7 259092 253786 27 49 7 24.810 9.786 0.000 60.660 0.022 0.055 1.001 0.114 1.165 3735.189 0.685 93.147 95.133 3215.511 35 298.617 1.593
8 258993 253296 37 66 7 56.228 1.619 0.000 25.280 11.263 0.092 0.083 0.224 2.428 4891.466 0.677 88.971 243.230 3709.401 55 442.750 1.627
9 258880 252809 8 26 7 48.735 0.603 1.108 41.722 1.238 1.744 0.185 0.177 2.416 4010.961 0.664 684.887 187.084 4206.477 52 431.084 1.801
10 258767 252322 16 47 6 15.633 0.000 0.000 82.069 0.119 0.000 0.361 0.000 0.211 2885.246 0.654 794.000 236.004 4704.176 26 220.231 1.440
ggplot(RK, aes(x = D.PARK, y = TOT.N)) +
  geom_point() + labs(title = "Roadkills vs. Distance to the park") + theme_minimal()

As we can see in the scatter plot, the data shows a Poisson distribution. This kind of distribution is hard to fit to a classical regression line, since the residuals will be too high. We need to create a GLM model.

1. Creation of the GLM model

M1 <- glm(TOT.N ~ D.PARK, family = poisson, data = RK) 
summary(M1)
## 
## Call:
## glm(formula = TOT.N ~ D.PARK, family = poisson, data = RK)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -8.1100  -1.6950  -0.4708   1.4206   7.3337  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.316e+00  4.322e-02   99.87   <2e-16 ***
## D.PARK      -1.059e-04  4.387e-06  -24.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1071.4  on 51  degrees of freedom
## Residual deviance:  390.9  on 50  degrees of freedom
## AIC: 634.29
## 
## Number of Fisher Scoring iterations: 4
# Pseudo R squared of the model
100 * (1071.4 - 390.9)/ 1071.4
## [1] 63.51503

The slope and the intercept of the model are statistically significants (p < 0.001). The AIC value is 634.29 (we’ll need it to compare it with differents models in the future). We can explain a 63% of the variance with our GLM model, which is a reasonable amount of information.

2. Plotting the model

MyData <- data.frame(D.PARK = seq(from = 0, to = 25000, by = 1000))
G <- predict(M1, newdata = MyData, type = "link", se = TRUE)
Fi <- exp(G$fit)

FSEUP <- exp(G$fit + 1.96 * G$se.fit)
FSELOW <- exp(G$fit - 1.96 * G$se.fit)
plot(RK$D.PARK, RK$TOT.N, 
     xlab = "Distance to the park", ylab = "Killroads", main = "Killroads vs. Distance to the park")
lines(MyData$D.PARK, Fi,lty = 1)
lines(MyData$D.PARK, FSEUP, lty = 2, col = "red")
lines(MyData$D.PARK, FSELOW, lty = 2, col = "red")

3. Variable selection

First of all, we have to transform the variables which has outliers. We will use a square transformation. Then, we will create a GLM model with a Poisson distribution without interactions

RK$SQ.POLIC <- sqrt(RK$POLIC)
RK$SQ.WATRES <- sqrt(RK$WAT.RES)
RK$SQ.LPROAD <- sqrt(RK$L.P.ROAD)
RK$SQ.SHRUB <- sqrt(RK$SHRUB)
RK$SQ.DWATCOUR <- sqrt(RK$D.WAT.COUR)

M2 <- glm(TOT.N ~ OPEN.L + MONT.S + SQ.POLIC +
          SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
          SQ.DWATCOUR + D.PARK, family = poisson, data = RK)
summary(M2)
## 
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + 
##     SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, family = poisson, 
##     data = RK)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.8398  -1.3965  -0.1409   1.4641   4.3749  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.749e+00  1.567e-01  23.935  < 2e-16 ***
## OPEN.L      -3.025e-03  1.580e-03  -1.915 0.055531 .  
## MONT.S       8.697e-02  1.359e-02   6.398 1.57e-10 ***
## SQ.POLIC    -1.787e-01  4.676e-02  -3.822 0.000133 ***
## SQ.SHRUB    -6.112e-01  1.176e-01  -5.197 2.02e-07 ***
## SQ.WATRES    2.243e-01  7.050e-02   3.181 0.001468 ** 
## L.WAT.C      3.355e-01  4.127e-02   8.128 4.36e-16 ***
## SQ.LPROAD    4.517e-01  1.348e-01   3.351 0.000804 ***
## SQ.DWATCOUR  7.355e-03  4.879e-03   1.508 0.131629    
## D.PARK      -1.301e-04  5.936e-06 -21.923  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1071.44  on 51  degrees of freedom
## Residual deviance:  270.23  on 42  degrees of freedom
## AIC: 529.62
## 
## Number of Fisher Scoring iterations: 5

There are no significant variables. In order to decide which variables we have to eliminate we will use two kinds of selections: 1. Step method using both forward and step methods 2. Manual analysis

3.1. Step method

  1. Forward
summary(step(M2, trace = 0))
## 
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + 
##     SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, family = poisson, 
##     data = RK)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.8398  -1.3965  -0.1409   1.4641   4.3749  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.749e+00  1.567e-01  23.935  < 2e-16 ***
## OPEN.L      -3.025e-03  1.580e-03  -1.915 0.055531 .  
## MONT.S       8.697e-02  1.359e-02   6.398 1.57e-10 ***
## SQ.POLIC    -1.787e-01  4.676e-02  -3.822 0.000133 ***
## SQ.SHRUB    -6.112e-01  1.176e-01  -5.197 2.02e-07 ***
## SQ.WATRES    2.243e-01  7.050e-02   3.181 0.001468 ** 
## L.WAT.C      3.355e-01  4.127e-02   8.128 4.36e-16 ***
## SQ.LPROAD    4.517e-01  1.348e-01   3.351 0.000804 ***
## SQ.DWATCOUR  7.355e-03  4.879e-03   1.508 0.131629    
## D.PARK      -1.301e-04  5.936e-06 -21.923  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1071.44  on 51  degrees of freedom
## Residual deviance:  270.23  on 42  degrees of freedom
## AIC: 529.62
## 
## Number of Fisher Scoring iterations: 5

There are some variables which are not significant. Eliminating them have to be considered.

  1. Drop
drop1(M2, test = "Chi")
## Single term deletions
## 
## Model:
## TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + SQ.DWATCOUR + D.PARK
##             Df Deviance     AIC    LRT  Pr(>Chi)    
## <none>           270.23  529.62                     
## OPEN.L       1   273.93  531.32   3.69 0.0546474 .  
## MONT.S       1   306.89  564.28  36.66 1.410e-09 ***
## SQ.POLIC     1   285.53  542.92  15.30 9.181e-05 ***
## SQ.SHRUB     1   298.31  555.70  28.08 1.167e-07 ***
## SQ.WATRES    1   280.02  537.41   9.79 0.0017539 ** 
## L.WAT.C      1   335.47  592.86  65.23 6.648e-16 ***
## SQ.LPROAD    1   281.25  538.64  11.02 0.0009009 ***
## SQ.DWATCOUR  1   272.50  529.89   2.27 0.1319862    
## D.PARK       1   838.09 1095.48 567.85 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We should eliminate the SQ.DWATCOUR variable at least. Now we gonna test our model without those variables and see how it improved.

M2b<-glm(TOT.N~OPEN.L + MONT.S + SQ.POLIC +
           SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD+
           D.PARK, family = poisson, data = RK)
summary(M2b)
## 
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + 
##     SQ.WATRES + L.WAT.C + SQ.LPROAD + D.PARK, family = poisson, 
##     data = RK)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -7.1443  -1.4118  -0.3607   1.3934   4.3559  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.852e+00  1.405e-01  27.405  < 2e-16 ***
## OPEN.L      -3.464e-03  1.544e-03  -2.243 0.024890 *  
## MONT.S       8.927e-02  1.344e-02   6.643 3.08e-11 ***
## SQ.POLIC    -1.583e-01  4.492e-02  -3.525 0.000424 ***
## SQ.SHRUB    -6.160e-01  1.185e-01  -5.200 2.00e-07 ***
## SQ.WATRES    2.080e-01  6.917e-02   3.006 0.002645 ** 
## L.WAT.C      3.113e-01  3.817e-02   8.155 3.50e-16 ***
## SQ.LPROAD    4.936e-01  1.315e-01   3.753 0.000175 ***
## D.PARK      -1.286e-04  5.837e-06 -22.033  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1071.4  on 51  degrees of freedom
## Residual deviance:  272.5  on 43  degrees of freedom
## AIC: 529.89
## 
## Number of Fisher Scoring iterations: 5

Now all the variables are statiscally significants.

3.2. Manual analysis

anova(M2, M2b, test = "Chi")
## Analysis of Deviance Table
## 
## Model 1: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + SQ.DWATCOUR + D.PARK
## Model 2: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + D.PARK
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        42     270.23                     
## 2        43     272.50 -1   -2.269    0.132

There are no significant differences between the two models (p = 0.132, Dif deviance = 2.269). However, we will continue our analysis with the model without the SQ.DWATCOUR variable, since this is the most parsimonious model.

M3 <- glm(TOT.N ~ MONT.S + SQ.POLIC + D.PARK +
            SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
            SQ.DWATCOUR, family = poisson, data = RK)
anova(M2, M3, test = "Chi")
## Analysis of Deviance Table
## 
## Model 1: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + SQ.DWATCOUR + D.PARK
## Model 2: TOT.N ~ MONT.S + SQ.POLIC + D.PARK + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + SQ.DWATCOUR
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        42     270.23                       
## 2        43     273.93 -1  -3.6928  0.05465 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

4- Overdispersion

This problem happens when the variance is greater than the mean. A value greater than 1 would mean a problem.

270.23/42
## [1] 6.434048

The model has a problem of overdispersion. We should check the assumptions of the model to see whether something is wrong or not.

5- Validation

op <- par(mfrow = c(2, 2))
plot(M2b)

EP <- resid(M2b, type = "pearson")
ED <- resid(M2b, type = "deviance")
par(op)

mu <- predict(M2b, type = "response")
E <- RK$TOT.N - mu
EP2 <- E/sqrt(6.337209 * mu)

op <- par(mfrow=c(2,2))
plot(x = mu, y = E, main = "Response residuals")
plot(x = mu, y = EP, main = "Pearson residuals")
plot(x = mu, y = EP2, main = "Pearson residuals scaled")
plot(x = mu, y = ED, main = "Deviance residuals")

par(op)

Here we observe an anormal ditribution of the residuals. The most likely reason is an overdispersion problem. We will try to fix this with another kind of distribution: the negative binomial distribution.

Negative binomial distribution

This is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a number of specified failures occur.

1- Model creation

M6 <- glm.nb(TOT.N ~ OPEN.L + MONT.S + SQ.POLIC +
             SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
             SQ.DWATCOUR + D.PARK, link = "log", data = RK)
summary(M6, cor = FALSE)
## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + 
##     SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, data = RK, 
##     link = "log", init.theta = 5.517795385)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.97265  -0.64463  -0.03337   0.62007   1.48627  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.951e+00  4.145e-01   9.532   <2e-16 ***
## OPEN.L      -9.419e-03  3.245e-03  -2.903   0.0037 ** 
## MONT.S       5.846e-02  3.481e-02   1.679   0.0931 .  
## SQ.POLIC    -4.618e-02  1.298e-01  -0.356   0.7221    
## SQ.SHRUB    -3.881e-01  2.883e-01  -1.346   0.1784    
## SQ.WATRES    1.631e-01  1.675e-01   0.974   0.3301    
## L.WAT.C      2.076e-01  9.636e-02   2.154   0.0312 *  
## SQ.LPROAD    5.944e-01  3.214e-01   1.850   0.0644 .  
## SQ.DWATCOUR -1.489e-05  1.139e-02  -0.001   0.9990    
## D.PARK      -1.235e-04  1.292e-05  -9.557   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(5.5178) family taken to be 1)
## 
##     Null deviance: 213.674  on 51  degrees of freedom
## Residual deviance:  51.803  on 42  degrees of freedom
## AIC: 390.11
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  5.52 
##           Std. Err.:  1.41 
## 
##  2 x log-likelihood:  -368.107

2- Variables selection

2.1. Step method

summary(step(M6, trace = 0))
## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK, 
##     data = RK, init.theta = 4.979895134, link = "log")
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -3.08218  -0.70494  -0.09268   0.55575   1.67860  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.032e+00  3.363e-01  11.989  < 2e-16 ***
## OPEN.L      -1.085e-02  3.122e-03  -3.475  0.00051 ***
## L.WAT.C      1.597e-01  7.852e-02   2.034  0.04195 *  
## SQ.LPROAD    4.924e-01  3.101e-01   1.588  0.11231    
## D.PARK      -1.154e-04  1.061e-05 -10.878  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.9799) family taken to be 1)
## 
##     Null deviance: 197.574  on 51  degrees of freedom
## Residual deviance:  51.329  on 47  degrees of freedom
## AIC: 383.54
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.98 
##           Std. Err.:  1.22 
## 
##  2 x log-likelihood:  -371.542
summary(stepAIC(M6, trace = 0))
## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK, 
##     data = RK, init.theta = 4.979895134, link = "log")
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -3.08218  -0.70494  -0.09268   0.55575   1.67860  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.032e+00  3.363e-01  11.989  < 2e-16 ***
## OPEN.L      -1.085e-02  3.122e-03  -3.475  0.00051 ***
## L.WAT.C      1.597e-01  7.852e-02   2.034  0.04195 *  
## SQ.LPROAD    4.924e-01  3.101e-01   1.588  0.11231    
## D.PARK      -1.154e-04  1.061e-05 -10.878  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.9799) family taken to be 1)
## 
##     Null deviance: 197.574  on 51  degrees of freedom
## Residual deviance:  51.329  on 47  degrees of freedom
## AIC: 383.54
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.98 
##           Std. Err.:  1.22 
## 
##  2 x log-likelihood:  -371.542

Both methods have choosen the variables OPEN.L, L.WAT.C, SQ.LPROAD y D.PARK. The variable SQ.LPROAD is not significant and L.WAT.C is slightly significant but it can be included.

2.2. Drop method

M7 <- glm.nb(TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK, link = "log", data = RK)
drop1(M7, test = "Chi")
## Single term deletions
## 
## Model:
## TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK
##           Df Deviance    AIC     LRT  Pr(>Chi)    
## <none>         51.329 381.54                      
## OPEN.L     1   63.331 391.54  12.002 0.0005316 ***
## L.WAT.C    1   55.384 383.60   4.055 0.0440383 *  
## SQ.LPROAD  1   54.086 382.30   2.757 0.0968442 .  
## D.PARK     1  172.847 501.06 121.518 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We should eliminate both variables to improve our model.

2.3. Manual analysis

M7b <- glm.nb(TOT.N ~ OPEN.L + L.WAT.C + D.PARK,link = "log", data = RK)
M7c <- glm.nb(TOT.N ~ OPEN.L + D.PARK, link = "log", data = RK)
anova(M7b, M7c)
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: TOT.N
##                       Model    theta Resid. df    2 x log-lik.   Test
## 1           OPEN.L + D.PARK 4.132845        49       -379.4317       
## 2 OPEN.L + L.WAT.C + D.PARK 4.725770        48       -374.2536 1 vs 2
##      df LR stat.    Pr(Chi)
## 1                          
## 2     1 5.178081 0.02287358

3- Overdispersion

summary(M7c)
## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + D.PARK, data = RK, link = "log", 
##     init.theta = 4.13284466)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5880  -0.7878  -0.1676   0.3721   2.4369  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.6717034  0.1641768  28.455   <2e-16 ***
## OPEN.L      -0.0093591  0.0031952  -2.929   0.0034 ** 
## D.PARK      -0.0001119  0.0000113  -9.901   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.1328) family taken to be 1)
## 
##     Null deviance: 170.661  on 51  degrees of freedom
## Residual deviance:  51.839  on 49  degrees of freedom
## AIC: 387.43
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.133 
##           Std. Err.:  0.980 
## 
##  2 x log-likelihood:  -379.432
51.839/49
## [1] 1.057939

Close to 1, so it is not supposed to be a problem.

4- Validation

M8 <- glm.nb(TOT.N ~ OPEN.L + D.PARK, link = "log", data = RK)
summary(M8)
## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + D.PARK, data = RK, link = "log", 
##     init.theta = 4.13284466)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5880  -0.7878  -0.1676   0.3721   2.4369  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.6717034  0.1641768  28.455   <2e-16 ***
## OPEN.L      -0.0093591  0.0031952  -2.929   0.0034 ** 
## D.PARK      -0.0001119  0.0000113  -9.901   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.1328) family taken to be 1)
## 
##     Null deviance: 170.661  on 51  degrees of freedom
## Residual deviance:  51.839  on 49  degrees of freedom
## AIC: 387.43
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.133 
##           Std. Err.:  0.980 
## 
##  2 x log-likelihood:  -379.432
drop1(M8, test = "Chi")
## Single term deletions
## 
## Model:
## TOT.N ~ OPEN.L + D.PARK
##        Df Deviance    AIC     LRT  Pr(>Chi)    
## <none>      51.839 385.43                      
## OPEN.L  1   59.731 391.32   7.891  0.004967 ** 
## D.PARK  1  154.595 486.19 102.756 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both variables are significant.

par(mfrow = c(2, 2))
plot(M8)

There is no problem with the model.

5- Interpretation

The farther from the park (D.PARK), the lesser the number of roadkills (TOT.N). In addition to this, the open spaces (OPEN.L) are significantly the safest.