Part 1: Build a model relating species presence to island characteristics

Build box plots with the ‘incidence’ variable
library(ggplot2)
island <- read.csv("island2.csv")
island$incidence = factor(island$incidence, levels = c(0,1), 
                          labels = c("absent", "present"))

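# Put the grouping factor on x and the continuous variable on y;
# this gives one vertical box per incidence class without coord_flip().
ggplot(island, aes(x=incidence, y=area)) + geom_boxplot()

ggplot(island, aes(x=incidence, y=isolation)) + geom_boxplot()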

Both area and isolation appear to be related to the presence or absence of the species.

Build a generalized linear model
island.glm = glm(incidence ~ area + isolation, data=island,
                 family=binomial(link='logit'))
summary(island.glm)
## 
## Call:
## glm(formula = incidence ~ area + isolation, family = binomial(link = "logit"), 
##     data = island)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8189  -0.3089   0.0490   0.3635   2.1192  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   6.6417     2.9218   2.273  0.02302 * 
## area          0.5807     0.2478   2.344  0.01909 * 
## isolation    -1.3719     0.4769  -2.877  0.00401 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 68.029  on 49  degrees of freedom
## Residual deviance: 28.402  on 47  degrees of freedom
## AIC: 34.402
## 
## Number of Fisher Scoring iterations: 6
exp(coef(island.glm))
## (Intercept)        area   isolation 
## 766.3669575   1.7873322   0.2536142
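For a logit model, the exponentiated coefficients are odds ratios. As a rough sketch of how to read them, the snippet below converts the two slopes into percent changes in the odds of presence. The vector `b` is a hypothetical name; its values are copied by hand from the summary output above so that the snippet runs without island2.csv.

```r
# Slope estimates copied from the summary(island.glm) output above
# (assumption: hard-coded here rather than taken from the fitted object).
b <- c(area = 0.5807, isolation = -1.3719)

# exp(b) is the odds ratio for a one-unit increase in the predictor,
# holding the other predictor fixed.
odds_ratio <- exp(b)

# Percent change in the odds of presence per one-unit increase.
pct_change <- (odds_ratio - 1) * 100
print(round(pct_change, 1))
```

Under these estimates, each additional unit of area raises the odds of presence by roughly 79%, and each additional unit of isolation lowers them by roughly 75%.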
Predict probability of presence with an area of 5 and isolation distance of 6
newisland = data.frame(area=5, isolation=6)
predict(island.glm, newdata=newisland, type='response', se.fit=TRUE)
## $fit
##         1 
## 0.7881208 
## 
## $se.fit
##         1 
## 0.1125028 
## 
## $residual.scale
## [1] 1

The predicted probability of presence is 0.788, with a standard error of 0.113.
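As a check, the same probability can be reproduced by hand from the fitted coefficients: compute the linear predictor on the logit scale, then apply the inverse logit. This is only a sketch; the coefficients are typed in from the summary output (rounded to four decimals), and the names `b`, `eta` and `p` are illustrative.

```r
# Coefficients copied from summary(island.glm) (assumption: rounded values,
# hard-coded so the snippet does not need the fitted model object).
b <- c(intercept = 6.6417, area = 0.5807, isolation = -1.3719)

# Linear predictor (log-odds) for an island with area = 5 and isolation = 6.
eta <- unname(b["intercept"] + b["area"] * 5 + b["isolation"] * 6)

# plogis() is the inverse logit: 1 / (1 + exp(-eta)).
p <- plogis(eta)
print(round(p, 3))
```

The result agrees with the `$fit` value from `predict()` to three decimal places.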

Part 2: Build a Poisson regression model of hemlock cover

hemlock = read.csv("tsuga.csv")
hemlock.glm = glm(cover ~ elev + streamdist, data=hemlock,
                  family=poisson(link='log'))
summary(hemlock.glm)
## 
## Call:
## glm(formula = cover ~ elev + streamdist, family = poisson(link = "log"), 
##     data = hemlock)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.31395  -0.82155  -0.07929   0.71900   2.62316  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.622e+00  5.226e-02  31.047  < 2e-16 ***
## elev         8.901e-05  5.653e-05   1.575    0.115    
## streamdist  -8.963e-04  1.173e-04  -7.641 2.15e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 748.23  on 744  degrees of freedom
## Residual deviance: 687.10  on 742  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 3150.2
## 
## Number of Fisher Scoring iterations: 4
Transform coefficients to original scale
exp(coef(hemlock.glm))
## (Intercept)        elev  streamdist 
##   5.0652901   1.0000890   0.9991041

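The effect of stream distance is statistically significant (p < 0.001) but small in magnitude. The exponentiated coefficient is a multiplicative factor: each one-unit increase in distance from a stream multiplies the expected hemlock cover by 0.9991, a decrease of about 0.09%. Elevation is not significant at the 0.05 level (p = 0.115).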
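With a log link, the exponentiated streamdist coefficient acts as a rate ratio, so its effect compounds multiplicatively over larger distances. The sketch below works this out for a one-unit and a 100-unit increase. The coefficient is copied from the summary output, the 100-unit step is an arbitrary illustration, and the units of streamdist are not stated in the source.

```r
# streamdist coefficient copied from summary(hemlock.glm) (assumption:
# hard-coded so the snippet runs without tsuga.csv).
b_streamdist <- -8.963e-04

# Multiplicative change in expected cover per 1 and per 100 units of distance.
rate_ratio_1   <- exp(b_streamdist)
rate_ratio_100 <- exp(b_streamdist * 100)

# The same changes expressed as percentages.
pct_1   <- (rate_ratio_1 - 1) * 100
pct_100 <- (rate_ratio_100 - 1) * 100
print(round(c(per_1_unit = pct_1, per_100_units = pct_100), 2))
```

A per-unit effect of about -0.09% therefore corresponds to a drop of roughly 8.6% in expected cover over 100 units of stream distance.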