Exercise 1

Set working directory and read csv

setwd("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab05")
island = read.csv("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab05/island2.csv")

Code the incidence

island$incidence = factor(island$incidence,
                          levels = c(0,1),
                          labels = c("absence", "presence"))
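
A small sketch of why the level order matters: with a factor response, glm() with a binomial family treats the first level as "failure" and every other level as "success", so listing 0 before 1 means the model describes the probability of presence. The vector `toy` below is invented for illustration and is not the island data.

```r
# Hypothetical 0/1 values, for illustration only (not from island2.csv)
toy = c(0, 1, 1, 0, 1)
toy_f = factor(toy, levels = c(0, 1), labels = c("absence", "presence"))

levels(toy_f)        # first level is the reference ("failure") level
as.integer(toy_f)    # internal codes: absence = 1, presence = 2
table(toy_f)         # counts per label
```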

Prepare data for boxplots

incidence = island$incidence
area = island$area
isolation = island$isolation
quality = island$quality

Attach the boxplots to several panels using par(mfrow = ...) and print

Going from left to right: birds tend to be present on islands with larger area (a positive relationship) and absent from islands that are very isolated (a negative relationship). There is no notable difference in quality between islands where birds are present and islands where they are absent. If it’s the quality of the birds, I feel bad having them lumped into a quality category, haha!
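
The plotting code itself is not shown in this report, so here is a minimal sketch of how three side-by-side boxplots could be drawn. The data frame `toy` and all of its values are made up for illustration; with the real data, the vectors incidence, area, isolation, and quality prepared above would take their place.

```r
# Hypothetical stand-in for the island data (values invented for illustration)
toy = data.frame(
  incidence = factor(c(0, 0, 0, 1, 1, 1), levels = c(0, 1),
                     labels = c("absence", "presence")),
  area      = c(1.2, 2.5, 0.8, 6.1, 4.7, 7.3),
  isolation = c(8.0, 7.2, 9.1, 4.5, 5.3, 3.9),
  quality   = c(3, 5, 4, 4, 3, 5)
)

old_par = par(mfrow = c(1, 3))   # one row, three panels
boxplot(area ~ incidence, data = toy, main = "Area")
boxplot(isolation ~ incidence, data = toy, main = "Isolation")
boxplot(quality ~ incidence, data = toy, main = "Quality")
par(old_par)                     # restore the previous layout
```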

Center the data

c_area = area - mean(area)
c_isolation = isolation - mean(isolation)
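
As a quick check on what centering does, the sketch below uses three made-up numbers (not the island data) and compares manual centering with scale(). After centering, the mean of the variable is zero, which is why the model intercept later refers to an island of average area and isolation.

```r
# Hypothetical values, for illustration only
x = c(2, 4, 9)
c_x = x - mean(x)                                # manual centering
c_x_scale = as.numeric(scale(x, scale = FALSE))  # same result via scale()

mean(c_x)                   # zero (up to floating point)
all.equal(c_x, c_x_scale)   # TRUE
```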

Generalized linear model (glm) and summary

island_glm = glm(incidence ~ c_area + c_isolation,
                 family = binomial(link = "logit"))
summary(island_glm)
## 
## Call:
## glm(formula = incidence ~ c_area + c_isolation, family = binomial(link = "logit"))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8189  -0.3089   0.0490   0.3635   2.1192  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   1.1154     0.5877   1.898  0.05770 . 
## c_area        0.5807     0.2478   2.344  0.01909 * 
## c_isolation  -1.3719     0.4769  -2.877  0.00401 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 68.029  on 49  degrees of freedom
## Residual deviance: 28.402  on 47  degrees of freedom
## AIC: 34.402
## 
## Number of Fisher Scoring iterations: 6

The AIC is 34.402, and the p-values for area and isolation are 0.01909 and 0.00401, respectively, so both are significant at the 0.05 level. Based on the outline for the lab exercise, the coefficients are given in log-odds, so we can convert them to odds using the exponential function: exp(coef(island_glm)). Because the predictors are centered, the intercept describes an island of average area and isolation: exp(1.1154) is about 3.05, so the odds of presence there are roughly 3 to 1 (about three presences for every absence).
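
The conversion can be reproduced without refitting the model. The sketch below copies the three estimates from the summary output into a named vector (the name `log_odds` is just a label chosen here) and exponentiates them.

```r
# Coefficients copied from the summary output above (log-odds scale)
log_odds = c("(Intercept)" = 1.1154, c_area = 0.5807, c_isolation = -1.3719)

odds = exp(log_odds)   # odds (intercept) and odds ratios (slopes)
round(odds, 3)

# Probability of presence at mean area and mean isolation
p_mean = plogis(log_odds[["(Intercept)"]])
round(p_mean, 3)
```

The slopes read as odds ratios: each one-unit increase in area multiplies the odds of presence by about 1.79, and each one-unit increase in isolation multiplies them by about 0.25. Odds of about 3.05 at the means correspond to a probability of about 0.75.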

Create a new data frame for a new island with area 5 and isolation 6, centered with the same means used for the model's predictors

island_2 = data.frame(c_area = 5 - mean(area),
                      c_isolation = 6 - mean(isolation))

Then we can use predict() to make a prediction from the new data frame

predict(island_glm, newdata = island_2, type = "response", se.fit = TRUE)
## $fit
##         1 
## 0.7881208 
## 
## $se.fit
##         1 
## 0.1125028 
## 
## $residual.scale
## [1] 1

There is roughly a 79% chance (0.788, with a standard error of about 0.11) that the new island will have our bird.
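
To get a feel for the uncertainty around that prediction, here is a rough sketch built only from the two numbers printed by predict(). It assumes a normal approximation with a 1.96 multiplier (an assumption added here, not part of the lab), and it uses the fact that for a logit link the response-scale standard error equals the link-scale standard error times p * (1 - p).

```r
# Values copied from the predict() output above
p  = 0.7881208   # fitted probability
se = 0.1125028   # standard error on the response (probability) scale

# Naive interval on the probability scale; it can run past 1
naive = p + c(-1, 1) * 1.96 * se

# Move to the log-odds scale: for a logit link, d(p)/d(eta) = p * (1 - p)
eta    = qlogis(p)
se_eta = se / (p * (1 - p))

# Interval on the log-odds scale, mapped back to probabilities
ci = plogis(eta + c(-1, 1) * 1.96 * se_eta)

round(rbind(naive = naive, link_scale = ci), 3)
```

The naive interval reaches slightly above 1, which is not a valid probability, while the interval built on the log-odds scale stays between 0 and 1 (roughly 0.50 to 0.93). With the fitted model available, calling predict() with type = "link" and transforming with plogis() would give the same thing more directly.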


Exercise 2

Read in csv

trees = read.csv("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab05/tsuga.csv")

The headers for tsuga.csv are plotID, date, plotsize, spcode, species, cover, elev, tci, and streamdist. We can use this data to fit a generalized linear model with glm(). We’ll model cover as a function of elev + streamdist, where elev and streamdist are our explanatory variables.

trees_glm = glm(trees$cover ~ trees$elev + trees$streamdist,
                family = poisson(link = "log"))
summary(trees_glm)
## 
## Call:
## glm(formula = trees$cover ~ trees$elev + trees$streamdist, family = poisson(link = "log"))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.31395  -0.82155  -0.07929   0.71900   2.62316  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       1.622e+00  5.226e-02  31.047  < 2e-16 ***
## trees$elev        8.901e-05  5.653e-05   1.575    0.115    
## trees$streamdist -8.963e-04  1.173e-04  -7.641 2.15e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 748.23  on 744  degrees of freedom
## Residual deviance: 687.10  on 742  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 3150.2
## 
## Number of Fisher Scoring iterations: 4
exp(coef(trees_glm))
##      (Intercept)       trees$elev trees$streamdist 
##        5.0652901        1.0000890        0.9991041

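The explanatory variables are only partially useful. streamdist has a very low p-value (2.15e-14), suggesting that distance from a stream is a useful predictor of hemlock cover. Elevation, on the other hand, is not: its p-value (0.115) is above 0.05. Because the model uses a log link, the exponentiated coefficients act multiplicatively. The exponentiated intercept (5.07) is the expected cover when elev and streamdist are both 0. Each one-unit increase in streamdist multiplies the expected cover by 0.9991, a decrease of about 0.09% (for example, from 5.07 to 5.07 * 0.9991 = 5.06 at elev = 0). Each one-unit increase in elevation multiplies the expected cover by 1.00009, an effect that is not statistically distinguishable from no change.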
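
Since a one-unit change is tiny on this scale, it can help to look at a larger step. The sketch below copies the estimates from the summary output and works out the multiplicative effect of a 100-unit increase in stream distance; the choice of 100 units is arbitrary and only for illustration, and the units of streamdist are not stated in this report.

```r
# Coefficients copied from the summary output above (log scale)
b = c("(Intercept)" = 1.622, elev = 8.901e-05, streamdist = -8.963e-04)

# Rate ratio for a 1-unit and a 100-unit increase in stream distance
rr_1   = exp(b[["streamdist"]])
rr_100 = exp(100 * b[["streamdist"]])

# Expected cover at elev = 0, for streamdist = 0 and streamdist = 100
mu_0   = exp(b[["(Intercept)"]])
mu_100 = mu_0 * rr_100

round(c(rr_1 = rr_1, rr_100 = rr_100, mu_0 = mu_0, mu_100 = mu_100), 4)
```

Under these assumptions, a 100-unit increase in stream distance multiplies the expected cover by about 0.91, a reduction of roughly 9%. The intercept here is computed from the rounded estimate, so it differs slightly in the later decimals from the exp(coef(trees_glm)) output.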