Exercise 1
Set working directory
setwd("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000")
Read in data
island2 = read.csv("../GEOG 6000/datafiles/island2.csv")
Build a model relating species presence to the island characteristics. Factor incidence from 0 and 1 to present and absent. 0 is absent and 1 is present
island2$incidence = factor(island2$incidence,
levels = c(0,1),
labels = c("absent", "present"))
island2
## incidence area isolation quality
## 1 present 7.928 3.317 3
## 2 absent 1.925 7.554 8
## 3 present 2.045 5.883 8
## 4 absent 4.781 5.932 2
## 5 absent 1.536 5.308 3
## 6 present 7.369 4.934 4
## 7 present 8.599 2.882 6
## 8 absent 2.422 8.771 7
## 9 present 6.403 6.092 3
## 10 present 7.199 6.977 3
## 11 absent 2.654 7.748 8
## 12 present 4.128 4.297 5
## 13 absent 4.175 8.516 3
## 14 present 7.098 3.318 2
## 15 absent 2.392 9.292 5
## 16 present 8.690 5.923 2
## 17 present 4.667 4.997 9
## 18 present 2.307 2.434 3
## 19 present 1.053 6.842 3
## 20 absent 0.657 6.452 2
## 21 present 5.632 2.506 7
## 22 present 5.895 6.613 6
## 23 absent 4.855 9.577 6
## 24 present 8.241 6.330 5
## 25 present 2.898 6.649 3
## 26 present 4.445 5.032 3
## 27 present 6.027 2.023 3
## 28 absent 1.574 5.737 1
## 29 absent 2.700 5.762 3
## 30 absent 3.106 6.924 9
## 31 absent 4.330 7.262 5
## 32 present 7.799 3.219 8
## 33 absent 4.630 9.229 2
## 34 present 6.951 5.841 7
## 35 absent 0.859 9.294 4
## 36 present 3.657 2.759 8
## 37 absent 2.696 8.342 8
## 38 absent 4.171 8.805 6
## 39 present 8.063 2.274 1
## 40 absent 0.153 5.650 7
## 41 present 1.948 5.447 3
## 42 present 2.006 5.480 1
## 43 present 6.508 4.007 3
## 44 present 1.501 5.400 9
## 45 present 9.269 5.272 5
## 46 present 4.170 4.786 5
## 47 absent 3.376 5.219 1
## 48 absent 2.228 7.990 1
## 49 absent 1.801 8.466 6
## 50 present 6.441 3.451 8
Center isolation and area
island2$isolation.cen = island2$isolation - mean(island2$isolation)
island2$area.cen = island2$area - mean(island2$area)
Make boxplots. Area vs Incidence
boxplot(area.cen ~ incidence, data = island2)
Isolation vs Incidence
boxplot(isolation.cen ~ incidence, data = island2)
Quality vs Incidence
boxplot(quality ~ incidence, data = island2)
Isolation and area are related to species incidence because there is high variation between absence and presence as shown in the boxplot. There is little to no variation between incidence and quality. Therefore, isolation and area are more related to the presence/absence of species than quality.
Build a generalized linear model of the presence of bird species as explained by island area and island isolation.
island2.glm1 = glm(incidence ~ area.cen + isolation.cen, data = island2, family = binomial(link = 'logit'))
Summarize
summary(island2.glm1)
##
## Call:
## glm(formula = incidence ~ area.cen + isolation.cen, family = binomial(link = "logit"),
## data = island2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8189 -0.3089 0.0490 0.3635 2.1192
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1154 0.5877 1.898 0.05770 .
## area.cen 0.5807 0.2478 2.344 0.01909 *
## isolation.cen -1.3719 0.4769 -2.877 0.00401 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.029 on 49 degrees of freedom
## Residual deviance: 28.402 on 47 degrees of freedom
## AIC: 34.402
##
## Number of Fisher Scoring iterations: 6
Get Coefficients
coef(island2.glm1)
## (Intercept) area.cen isolation.cen
## 1.1154028 0.5807241 -1.3719411
The Coefficients are Intercept= 1.1154028, area = 0.5807241 and isolation = -1.3719411.
Their significance is as follows: Intercept = 0.05770, area = 0.01909, isolation = 0.00401.
The AIC score = 34.402
Back to odd
exp(coef(island2.glm1))
## (Intercept) area.cen isolation.cen
## 3.0507967 1.7873322 0.2536142
Predict the probability of presence of the species on a new island with an area of 5 and an isolation distance of 6.
newisland2 = data.frame(isolation.cen = 6 - mean(island2$isolation), area.cen = 5 - mean(island2$area))
newislandpred = predict(island2.glm1, newdata = newisland2, type = 'response', se.fit = TRUE)
newislandpred
## $fit
## 1
## 0.7881208
##
## $se.fit
## 1
## 0.1125028
##
## $residual.scale
## [1] 1
predicted value = 0.7881208
standard error = 0.1125028
residual.scale = [1] 1
There is about a 78% probability of presence of the species on a new island with an area of 5 and an isolation distance of 6.
Exercise 2
Make a Poisson regression model of the abundance (‘cover’), using both distance to stream and elevation as explanatory variables.
Read in data
tsuga = read.csv("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/tsuga.csv")
Get column names
colnames(tsuga) ##use variables streamdist and elev
## [1] "plotID" "date" "plotsize" "spcode" "species"
## [6] "cover" "elev" "tci" "streamdist"
Make a Poisson regression model of the abundance (‘cover’), using both distance to stream and elevation as explanatory variables.
tsuga.glm = glm(cover ~ streamdist + elev, data = tsuga, family = poisson(link = 'log'))
Summarize model
summary(tsuga.glm)
##
## Call:
## glm(formula = cover ~ streamdist + elev, family = poisson(link = "log"),
## data = tsuga)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.31395 -0.82155 -0.07929 0.71900 2.62316
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.622e+00 5.226e-02 31.047 < 2e-16 ***
## streamdist -8.963e-04 1.173e-04 -7.641 2.15e-14 ***
## elev 8.901e-05 5.653e-05 1.575 0.115
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 748.23 on 744 degrees of freedom
## Residual deviance: 687.10 on 742 degrees of freedom
## (1 observation deleted due to missingness)
## AIC: 3150.2
##
## Number of Fisher Scoring iterations: 4
Get coefficients
coef(tsuga.glm)
## (Intercept) streamdist elev
## 1.622411e+00 -8.963237e-04 8.901257e-05
Coefficients: Intercept= 1.622411e+00, streamdist= -8.963237e-04, elev = 8.901257e-05
Significance: Pr(>|z|) : Intercept = < 2e-16 ***, streamdist = 2.15e-14, elev = 0.115
AIC: The AIC = 3150.2
Transform coeff to non log scale
exp(coef(tsuga.glm))
## (Intercept) streamdist elev
## 5.0652901 0.9991041 1.0000890
Intercept= 5.0652901, streamdist= 0.9991041, elev= 1.0000890
The explanatory variables are useful for the model as the sign of the coefficients tells us there is a positive relationship between hemlock abundance and elevation and a negative relationship between hemlock abundance and stream distance. So, as the elevation increases, abundance increases, and as stream distance increases, abundance decreases.