Exercise 1

Set working directory

setwd("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000")

Read in data

island2 = read.csv("../GEOG 6000/datafiles/island2.csv")

Build a model relating species presence to the island characteristics. Factor incidence from 0 and 1 to present and absent. 0 is absent and 1 is present

island2$incidence = factor(island2$incidence,
levels = c(0,1),
labels = c("absent", "present"))

island2
##    incidence  area isolation quality
## 1    present 7.928     3.317       3
## 2     absent 1.925     7.554       8
## 3    present 2.045     5.883       8
## 4     absent 4.781     5.932       2
## 5     absent 1.536     5.308       3
## 6    present 7.369     4.934       4
## 7    present 8.599     2.882       6
## 8     absent 2.422     8.771       7
## 9    present 6.403     6.092       3
## 10   present 7.199     6.977       3
## 11    absent 2.654     7.748       8
## 12   present 4.128     4.297       5
## 13    absent 4.175     8.516       3
## 14   present 7.098     3.318       2
## 15    absent 2.392     9.292       5
## 16   present 8.690     5.923       2
## 17   present 4.667     4.997       9
## 18   present 2.307     2.434       3
## 19   present 1.053     6.842       3
## 20    absent 0.657     6.452       2
## 21   present 5.632     2.506       7
## 22   present 5.895     6.613       6
## 23    absent 4.855     9.577       6
## 24   present 8.241     6.330       5
## 25   present 2.898     6.649       3
## 26   present 4.445     5.032       3
## 27   present 6.027     2.023       3
## 28    absent 1.574     5.737       1
## 29    absent 2.700     5.762       3
## 30    absent 3.106     6.924       9
## 31    absent 4.330     7.262       5
## 32   present 7.799     3.219       8
## 33    absent 4.630     9.229       2
## 34   present 6.951     5.841       7
## 35    absent 0.859     9.294       4
## 36   present 3.657     2.759       8
## 37    absent 2.696     8.342       8
## 38    absent 4.171     8.805       6
## 39   present 8.063     2.274       1
## 40    absent 0.153     5.650       7
## 41   present 1.948     5.447       3
## 42   present 2.006     5.480       1
## 43   present 6.508     4.007       3
## 44   present 1.501     5.400       9
## 45   present 9.269     5.272       5
## 46   present 4.170     4.786       5
## 47    absent 3.376     5.219       1
## 48    absent 2.228     7.990       1
## 49    absent 1.801     8.466       6
## 50   present 6.441     3.451       8

Center isolation and area

island2$isolation.cen = island2$isolation - mean(island2$isolation)
island2$area.cen = island2$area - mean(island2$area)

Make boxplots. Area vs Incidence

boxplot(area.cen ~ incidence, data = island2)

Isolation vs Incidence

boxplot(isolation.cen ~ incidence, data = island2)

Quality vs Incidence

boxplot(quality ~ incidence, data = island2)

Isolation and area are related to species incidence because there is high variation between absence and presence as shown in the boxplot. There is little to no variation between incidence and quality. Therefore, isolation and area are more related to the presence/absence of species than quality.

Build a generalized linear model of the presence of bird species as explained by island area and island isolation.

island2.glm1 = glm(incidence ~ area.cen + isolation.cen, data = island2, family = binomial(link = 'logit'))

Summarize

summary(island2.glm1)
## 
## Call:
## glm(formula = incidence ~ area.cen + isolation.cen, family = binomial(link = "logit"), 
##     data = island2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8189  -0.3089   0.0490   0.3635   2.1192  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)   
## (Intercept)     1.1154     0.5877   1.898  0.05770 . 
## area.cen        0.5807     0.2478   2.344  0.01909 * 
## isolation.cen  -1.3719     0.4769  -2.877  0.00401 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 68.029  on 49  degrees of freedom
## Residual deviance: 28.402  on 47  degrees of freedom
## AIC: 34.402
## 
## Number of Fisher Scoring iterations: 6

Get Coefficients

coef(island2.glm1)
##   (Intercept)      area.cen isolation.cen 
##     1.1154028     0.5807241    -1.3719411

The Coefficients are Intercept= 1.1154028, area = 0.5807241 and isolation = -1.3719411.

Their significance is as follows: Intercept = 0.05770, area = 0.01909, isolation = 0.00401.

The AIC score = 34.402

Back to odd

exp(coef(island2.glm1))
##   (Intercept)      area.cen isolation.cen 
##     3.0507967     1.7873322     0.2536142

Predict the probability of presence of the species on a new island with an area of 5 and an isolation distance of 6.

newisland2 = data.frame(isolation.cen = 6 - mean(island2$isolation), area.cen = 5 - mean(island2$area))

newislandpred = predict(island2.glm1, newdata = newisland2, type = 'response', se.fit = TRUE)

newislandpred
## $fit
##         1 
## 0.7881208 
## 
## $se.fit
##         1 
## 0.1125028 
## 
## $residual.scale
## [1] 1

predicted value = 0.7881208

standard error = 0.1125028

residual.scale = [1] 1

There is about a 78% probability of presence of the species on a new island with an area of 5 and an isolation distance of 6.

Exercise 2

Make a Poisson regression model of the abundance (‘cover’), using both distance to stream and elevation as explanatory variables.

Read in data

tsuga = read.csv("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/tsuga.csv")

Get column names

colnames(tsuga) ##use variables streamdist and elev
## [1] "plotID"     "date"       "plotsize"   "spcode"     "species"   
## [6] "cover"      "elev"       "tci"        "streamdist"

Make a Poisson regression model of the abundance (‘cover’), using both distance to stream and elevation as explanatory variables.

tsuga.glm = glm(cover ~ streamdist + elev, data = tsuga, family = poisson(link = 'log'))

Summarize model

summary(tsuga.glm)
## 
## Call:
## glm(formula = cover ~ streamdist + elev, family = poisson(link = "log"), 
##     data = tsuga)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.31395  -0.82155  -0.07929   0.71900   2.62316  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.622e+00  5.226e-02  31.047  < 2e-16 ***
## streamdist  -8.963e-04  1.173e-04  -7.641 2.15e-14 ***
## elev         8.901e-05  5.653e-05   1.575    0.115    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 748.23  on 744  degrees of freedom
## Residual deviance: 687.10  on 742  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 3150.2
## 
## Number of Fisher Scoring iterations: 4

Get coefficients

coef(tsuga.glm)
##   (Intercept)    streamdist          elev 
##  1.622411e+00 -8.963237e-04  8.901257e-05

Coefficients: Intercept= 1.622411e+00, streamdist= -8.963237e-04, elev = 8.901257e-05

Significance: Pr(>|z|) : Intercept = < 2e-16 ***, streamdist = 2.15e-14, elev = 0.115

AIC: The AIC = 3150.2

Transform coeff to non log scale

exp(coef(tsuga.glm))
## (Intercept)  streamdist        elev 
##   5.0652901   0.9991041   1.0000890

Intercept= 5.0652901, streamdist= 0.9991041, elev= 1.0000890

The explanatory variables are useful for the model as the sign of the coefficients tells us there is a positive relationship between hemlock abundance and elevation and a negative relationship between hemlock abundance and stream distance. So, as the elevation increases, abundance increases, and as stream distance increases, abundance decreases.