Poisson distribution model

To create a model about the number of times an event occurs in an interval of time or space (like number of people, number of cases, etcetera) we need a special distribution called Poisson.

This time we will use this kind of distribution to create a model in which we fit the total number of roadkills of amphibious along a roadway of Portugal (Zuur et al. 2013). We want to see the relationship between the distance to a local park and the total number of dead amphibious, being the distance to that park (D.PARK) the explanatory variable.

Sector	X	Y	BufoCalamita	TOT.N	S.RICH	OPEN.L	OLIVE	MONT.S	MONT	POLIC	SHRUB	URBAN	WAT.RES	L.WAT.C	L.D.ROAD	L.P.ROAD	D.WAT.RES	D.WAT.COUR	D.PARK	N.PATCH	P.EDGE	L.SDI
1	260181	256546	5	22	3	22.684	60.333	0.000	0.653	4.811	0.406	7.787	0.043	0.583	3330.189	1.975	252.113	735.000	250.214	122	553.936	1.801
2	259914	256124	1	14	4	24.657	40.832	0.000	0.161	2.224	0.735	27.150	0.182	1.419	2587.498	1.761	139.573	134.052	741.179	96	457.142	1.886
3	259672	255688	40	65	6	30.121	23.710	0.258	10.918	1.946	0.474	28.086	0.453	2.005	2149.651	1.250	59.168	269.029	1240.080	67	432.360	1.930
4	259454	255238	27	55	5	50.277	14.940	1.783	26.454	0.625	0.607	0.831	0.026	1.924	4222.983	0.666	277.842	48.751	1739.885	63	421.292	1.865
5	259307	254763	67	88	4	43.609	35.353	2.431	11.330	0.791	0.173	2.452	0.000	2.167	2219.302	0.653	967.808	126.102	2232.130	59	407.573	1.818
6	259189	254277	56	104	7	31.385	17.666	0.000	43.678	0.054	0.325	2.730	0.039	2.391	1005.629	1.309	560.000	344.444	2724.089	49	420.289	1.799
7	259092	253786	27	49	7	24.810	9.786	0.000	60.660	0.022	0.055	1.001	0.114	1.165	3735.189	0.685	93.147	95.133	3215.511	35	298.617	1.593
8	258993	253296	37	66	7	56.228	1.619	0.000	25.280	11.263	0.092	0.083	0.224	2.428	4891.466	0.677	88.971	243.230	3709.401	55	442.750	1.627
9	258880	252809	8	26	7	48.735	0.603	1.108	41.722	1.238	1.744	0.185	0.177	2.416	4010.961	0.664	684.887	187.084	4206.477	52	431.084	1.801
10	258767	252322	16	47	6	15.633	0.000	0.000	82.069	0.119	0.000	0.361	0.000	0.211	2885.246	0.654	794.000	236.004	4704.176	26	220.231	1.440

ggplot(RK, aes(x = D.PARK, y = TOT.N)) +
  geom_point() + labs(title = "Roadkills vs. Distance to the park") + theme_minimal()

As we can see in the scatter plot, the data shows a Poisson distribution. This kind of distribution is hard to fit to a classical regression line, since the residuals will be too high. We need to create a GLM model.

1. Creation of the GLM model

M1 <- glm(TOT.N ~ D.PARK, family = poisson, data = RK) 
summary(M1)

## 
## Call:
## glm(formula = TOT.N ~ D.PARK, family = poisson, data = RK)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -8.1100  -1.6950  -0.4708   1.4206   7.3337  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.316e+00  4.322e-02   99.87   <2e-16 ***
## D.PARK      -1.059e-04  4.387e-06  -24.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1071.4  on 51  degrees of freedom
## Residual deviance:  390.9  on 50  degrees of freedom
## AIC: 634.29
## 
## Number of Fisher Scoring iterations: 4

# Pseudo R squared of the model
100 * (1071.4 - 390.9)/ 1071.4

## [1] 63.51503

The slope and the intercept of the model are statistically significants (p < 0.001). The AIC value is 634.29 (we’ll need it to compare it with differents models in the future). We can explain a 63% of the variance with our GLM model, which is a reasonable amount of information.

2. Plotting the model

MyData <- data.frame(D.PARK = seq(from = 0, to = 25000, by = 1000))
G <- predict(M1, newdata = MyData, type = "link", se = TRUE)
Fi <- exp(G$fit)

FSEUP <- exp(G$fit + 1.96 * G$se.fit)
FSELOW <- exp(G$fit - 1.96 * G$se.fit)
plot(RK$D.PARK, RK$TOT.N, 
     xlab = "Distance to the park", ylab = "Killroads", main = "Killroads vs. Distance to the park")
lines(MyData$D.PARK, Fi,lty = 1)
lines(MyData$D.PARK, FSEUP, lty = 2, col = "red")
lines(MyData$D.PARK, FSELOW, lty = 2, col = "red")

3. Variable selection

First of all, we have to transform the variables which has outliers. We will use a square transformation. Then, we will create a GLM model with a Poisson distribution without interactions

RK$SQ.POLIC <- sqrt(RK$POLIC)
RK$SQ.WATRES <- sqrt(RK$WAT.RES)
RK$SQ.LPROAD <- sqrt(RK$L.P.ROAD)
RK$SQ.SHRUB <- sqrt(RK$SHRUB)
RK$SQ.DWATCOUR <- sqrt(RK$D.WAT.COUR)

M2 <- glm(TOT.N ~ OPEN.L + MONT.S + SQ.POLIC +
          SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
          SQ.DWATCOUR + D.PARK, family = poisson, data = RK)
summary(M2)

## 
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + 
##     SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, family = poisson, 
##     data = RK)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.8398  -1.3965  -0.1409   1.4641   4.3749  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.749e+00  1.567e-01  23.935  < 2e-16 ***
## OPEN.L      -3.025e-03  1.580e-03  -1.915 0.055531 .  
## MONT.S       8.697e-02  1.359e-02   6.398 1.57e-10 ***
## SQ.POLIC    -1.787e-01  4.676e-02  -3.822 0.000133 ***
## SQ.SHRUB    -6.112e-01  1.176e-01  -5.197 2.02e-07 ***
## SQ.WATRES    2.243e-01  7.050e-02   3.181 0.001468 ** 
## L.WAT.C      3.355e-01  4.127e-02   8.128 4.36e-16 ***
## SQ.LPROAD    4.517e-01  1.348e-01   3.351 0.000804 ***
## SQ.DWATCOUR  7.355e-03  4.879e-03   1.508 0.131629    
## D.PARK      -1.301e-04  5.936e-06 -21.923  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1071.44  on 51  degrees of freedom
## Residual deviance:  270.23  on 42  degrees of freedom
## AIC: 529.62
## 
## Number of Fisher Scoring iterations: 5

There are no significant variables. In order to decide which variables we have to eliminate we will use two kinds of selections: 1. Step method using both forward and step methods 2. Manual analysis

3.1. Step method

Forward

summary(step(M2, trace = 0))

## 
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + 
##     SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, family = poisson, 
##     data = RK)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.8398  -1.3965  -0.1409   1.4641   4.3749  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.749e+00  1.567e-01  23.935  < 2e-16 ***
## OPEN.L      -3.025e-03  1.580e-03  -1.915 0.055531 .  
## MONT.S       8.697e-02  1.359e-02   6.398 1.57e-10 ***
## SQ.POLIC    -1.787e-01  4.676e-02  -3.822 0.000133 ***
## SQ.SHRUB    -6.112e-01  1.176e-01  -5.197 2.02e-07 ***
## SQ.WATRES    2.243e-01  7.050e-02   3.181 0.001468 ** 
## L.WAT.C      3.355e-01  4.127e-02   8.128 4.36e-16 ***
## SQ.LPROAD    4.517e-01  1.348e-01   3.351 0.000804 ***
## SQ.DWATCOUR  7.355e-03  4.879e-03   1.508 0.131629    
## D.PARK      -1.301e-04  5.936e-06 -21.923  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1071.44  on 51  degrees of freedom
## Residual deviance:  270.23  on 42  degrees of freedom
## AIC: 529.62
## 
## Number of Fisher Scoring iterations: 5

There are some variables which are not significant. Eliminating them have to be considered.

Drop

drop1(M2, test = "Chi")

## Single term deletions
## 
## Model:
## TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + SQ.DWATCOUR + D.PARK
##             Df Deviance     AIC    LRT  Pr(>Chi)    
## <none>           270.23  529.62                     
## OPEN.L       1   273.93  531.32   3.69 0.0546474 .  
## MONT.S       1   306.89  564.28  36.66 1.410e-09 ***
## SQ.POLIC     1   285.53  542.92  15.30 9.181e-05 ***
## SQ.SHRUB     1   298.31  555.70  28.08 1.167e-07 ***
## SQ.WATRES    1   280.02  537.41   9.79 0.0017539 ** 
## L.WAT.C      1   335.47  592.86  65.23 6.648e-16 ***
## SQ.LPROAD    1   281.25  538.64  11.02 0.0009009 ***
## SQ.DWATCOUR  1   272.50  529.89   2.27 0.1319862    
## D.PARK       1   838.09 1095.48 567.85 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We should eliminate the SQ.DWATCOUR variable at least. Now we gonna test our model without those variables and see how it improved.

M2b<-glm(TOT.N~OPEN.L + MONT.S + SQ.POLIC +
           SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD+
           D.PARK, family = poisson, data = RK)
summary(M2b)

## 
## Call:
## glm(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + 
##     SQ.WATRES + L.WAT.C + SQ.LPROAD + D.PARK, family = poisson, 
##     data = RK)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -7.1443  -1.4118  -0.3607   1.3934   4.3559  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.852e+00  1.405e-01  27.405  < 2e-16 ***
## OPEN.L      -3.464e-03  1.544e-03  -2.243 0.024890 *  
## MONT.S       8.927e-02  1.344e-02   6.643 3.08e-11 ***
## SQ.POLIC    -1.583e-01  4.492e-02  -3.525 0.000424 ***
## SQ.SHRUB    -6.160e-01  1.185e-01  -5.200 2.00e-07 ***
## SQ.WATRES    2.080e-01  6.917e-02   3.006 0.002645 ** 
## L.WAT.C      3.113e-01  3.817e-02   8.155 3.50e-16 ***
## SQ.LPROAD    4.936e-01  1.315e-01   3.753 0.000175 ***
## D.PARK      -1.286e-04  5.837e-06 -22.033  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1071.4  on 51  degrees of freedom
## Residual deviance:  272.5  on 43  degrees of freedom
## AIC: 529.89
## 
## Number of Fisher Scoring iterations: 5

Now all the variables are statiscally significants.

3.2. Manual analysis

anova(M2, M2b, test = "Chi")

## Analysis of Deviance Table
## 
## Model 1: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + SQ.DWATCOUR + D.PARK
## Model 2: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + D.PARK
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        42     270.23                     
## 2        43     272.50 -1   -2.269    0.132

There are no significant differences between the two models (p = 0.132, Dif deviance = 2.269). However, we will continue our analysis with the model without the SQ.DWATCOUR variable, since this is the most parsimonious model.

M3 <- glm(TOT.N ~ MONT.S + SQ.POLIC + D.PARK +
            SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
            SQ.DWATCOUR, family = poisson, data = RK)
anova(M2, M3, test = "Chi")

## Analysis of Deviance Table
## 
## Model 1: TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + SQ.DWATCOUR + D.PARK
## Model 2: TOT.N ~ MONT.S + SQ.POLIC + D.PARK + SQ.SHRUB + SQ.WATRES + L.WAT.C + 
##     SQ.LPROAD + SQ.DWATCOUR
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1        42     270.23                       
## 2        43     273.93 -1  -3.6928  0.05465 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

4- Overdispersion

This problem happens when the variance is greater than the mean. A value greater than 1 would mean a problem.

270.23/42

## [1] 6.434048

The model has a problem of overdispersion. We should check the assumptions of the model to see whether something is wrong or not.

5- Validation

op <- par(mfrow = c(2, 2))
plot(M2b)

EP <- resid(M2b, type = "pearson")
ED <- resid(M2b, type = "deviance")
par(op)

mu <- predict(M2b, type = "response")
E <- RK$TOT.N - mu
EP2 <- E/sqrt(6.337209 * mu)

op <- par(mfrow=c(2,2))
plot(x = mu, y = E, main = "Response residuals")
plot(x = mu, y = EP, main = "Pearson residuals")
plot(x = mu, y = EP2, main = "Pearson residuals scaled")
plot(x = mu, y = ED, main = "Deviance residuals")

par(op)

Here we observe an anormal ditribution of the residuals. The most likely reason is an overdispersion problem. We will try to fix this with another kind of distribution: the negative binomial distribution.

Negative binomial distribution

This is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a number of specified failures occur.

1- Model creation

M6 <- glm.nb(TOT.N ~ OPEN.L + MONT.S + SQ.POLIC +
             SQ.SHRUB + SQ.WATRES + L.WAT.C + SQ.LPROAD +
             SQ.DWATCOUR + D.PARK, link = "log", data = RK)
summary(M6, cor = FALSE)

## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + MONT.S + SQ.POLIC + SQ.SHRUB + 
##     SQ.WATRES + L.WAT.C + SQ.LPROAD + SQ.DWATCOUR + D.PARK, data = RK, 
##     link = "log", init.theta = 5.517795385)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.97265  -0.64463  -0.03337   0.62007   1.48627  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.951e+00  4.145e-01   9.532   <2e-16 ***
## OPEN.L      -9.419e-03  3.245e-03  -2.903   0.0037 ** 
## MONT.S       5.846e-02  3.481e-02   1.679   0.0931 .  
## SQ.POLIC    -4.618e-02  1.298e-01  -0.356   0.7221    
## SQ.SHRUB    -3.881e-01  2.883e-01  -1.346   0.1784    
## SQ.WATRES    1.631e-01  1.675e-01   0.974   0.3301    
## L.WAT.C      2.076e-01  9.636e-02   2.154   0.0312 *  
## SQ.LPROAD    5.944e-01  3.214e-01   1.850   0.0644 .  
## SQ.DWATCOUR -1.489e-05  1.139e-02  -0.001   0.9990    
## D.PARK      -1.235e-04  1.292e-05  -9.557   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(5.5178) family taken to be 1)
## 
##     Null deviance: 213.674  on 51  degrees of freedom
## Residual deviance:  51.803  on 42  degrees of freedom
## AIC: 390.11
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  5.52 
##           Std. Err.:  1.41 
## 
##  2 x log-likelihood:  -368.107

2- Variables selection

2.1. Step method

summary(step(M6, trace = 0))

## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK, 
##     data = RK, init.theta = 4.979895134, link = "log")
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -3.08218  -0.70494  -0.09268   0.55575   1.67860  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.032e+00  3.363e-01  11.989  < 2e-16 ***
## OPEN.L      -1.085e-02  3.122e-03  -3.475  0.00051 ***
## L.WAT.C      1.597e-01  7.852e-02   2.034  0.04195 *  
## SQ.LPROAD    4.924e-01  3.101e-01   1.588  0.11231    
## D.PARK      -1.154e-04  1.061e-05 -10.878  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.9799) family taken to be 1)
## 
##     Null deviance: 197.574  on 51  degrees of freedom
## Residual deviance:  51.329  on 47  degrees of freedom
## AIC: 383.54
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.98 
##           Std. Err.:  1.22 
## 
##  2 x log-likelihood:  -371.542

summary(stepAIC(M6, trace = 0))

## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK, 
##     data = RK, init.theta = 4.979895134, link = "log")
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -3.08218  -0.70494  -0.09268   0.55575   1.67860  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.032e+00  3.363e-01  11.989  < 2e-16 ***
## OPEN.L      -1.085e-02  3.122e-03  -3.475  0.00051 ***
## L.WAT.C      1.597e-01  7.852e-02   2.034  0.04195 *  
## SQ.LPROAD    4.924e-01  3.101e-01   1.588  0.11231    
## D.PARK      -1.154e-04  1.061e-05 -10.878  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.9799) family taken to be 1)
## 
##     Null deviance: 197.574  on 51  degrees of freedom
## Residual deviance:  51.329  on 47  degrees of freedom
## AIC: 383.54
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.98 
##           Std. Err.:  1.22 
## 
##  2 x log-likelihood:  -371.542

Both methods have choosen the variables OPEN.L, L.WAT.C, SQ.LPROAD y D.PARK. The variable SQ.LPROAD is not significant and L.WAT.C is slightly significant but it can be included.

2.2. Drop method

M7 <- glm.nb(TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK, link = "log", data = RK)
drop1(M7, test = "Chi")

## Single term deletions
## 
## Model:
## TOT.N ~ OPEN.L + L.WAT.C + SQ.LPROAD + D.PARK
##           Df Deviance    AIC     LRT  Pr(>Chi)    
## <none>         51.329 381.54                      
## OPEN.L     1   63.331 391.54  12.002 0.0005316 ***
## L.WAT.C    1   55.384 383.60   4.055 0.0440383 *  
## SQ.LPROAD  1   54.086 382.30   2.757 0.0968442 .  
## D.PARK     1  172.847 501.06 121.518 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We should eliminate both variables to improve our model.

2.3. Manual analysis

M7b <- glm.nb(TOT.N ~ OPEN.L + L.WAT.C + D.PARK,link = "log", data = RK)
M7c <- glm.nb(TOT.N ~ OPEN.L + D.PARK, link = "log", data = RK)
anova(M7b, M7c)

## Likelihood ratio tests of Negative Binomial Models
## 
## Response: TOT.N
##                       Model    theta Resid. df    2 x log-lik.   Test
## 1           OPEN.L + D.PARK 4.132845        49       -379.4317       
## 2 OPEN.L + L.WAT.C + D.PARK 4.725770        48       -374.2536 1 vs 2
##      df LR stat.    Pr(Chi)
## 1                          
## 2     1 5.178081 0.02287358

3- Overdispersion

summary(M7c)

## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + D.PARK, data = RK, link = "log", 
##     init.theta = 4.13284466)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5880  -0.7878  -0.1676   0.3721   2.4369  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.6717034  0.1641768  28.455   <2e-16 ***
## OPEN.L      -0.0093591  0.0031952  -2.929   0.0034 ** 
## D.PARK      -0.0001119  0.0000113  -9.901   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.1328) family taken to be 1)
## 
##     Null deviance: 170.661  on 51  degrees of freedom
## Residual deviance:  51.839  on 49  degrees of freedom
## AIC: 387.43
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.133 
##           Std. Err.:  0.980 
## 
##  2 x log-likelihood:  -379.432

51.839/49

## [1] 1.057939

Close to 1, so it is not supposed to be a problem.

4- Validation

M8 <- glm.nb(TOT.N ~ OPEN.L + D.PARK, link = "log", data = RK)
summary(M8)

## 
## Call:
## glm.nb(formula = TOT.N ~ OPEN.L + D.PARK, data = RK, link = "log", 
##     init.theta = 4.13284466)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.5880  -0.7878  -0.1676   0.3721   2.4369  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.6717034  0.1641768  28.455   <2e-16 ***
## OPEN.L      -0.0093591  0.0031952  -2.929   0.0034 ** 
## D.PARK      -0.0001119  0.0000113  -9.901   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(4.1328) family taken to be 1)
## 
##     Null deviance: 170.661  on 51  degrees of freedom
## Residual deviance:  51.839  on 49  degrees of freedom
## AIC: 387.43
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  4.133 
##           Std. Err.:  0.980 
## 
##  2 x log-likelihood:  -379.432

drop1(M8, test = "Chi")

## Single term deletions
## 
## Model:
## TOT.N ~ OPEN.L + D.PARK
##        Df Deviance    AIC     LRT  Pr(>Chi)    
## <none>      51.839 385.43                      
## OPEN.L  1   59.731 391.32   7.891  0.004967 ** 
## D.PARK  1  154.595 486.19 102.756 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both variables are significant.

par(mfrow = c(2, 2))
plot(M8)

There is no problem with the model.

5- Interpretation

The farther from the park (D.PARK), the lesser the number of roadkills (TOT.N). In addition to this, the open spaces (OPEN.L) are significantly the safest.

Generalized Linear Model (GLM)

David Valls Lanàquera