Set working directory and read csv
setwd("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab05")
island = read.csv("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab05/island2.csv")
Code the incidence
island$incidence = factor(island$incidence,
levels = c(0,1),
labels = c("absence", "presence"))
Prepare data for boxplots
incidence = island$incidence
area = island$area
isolation = island$isolation
quality = island$quality
Arrange the boxplots in several panels using par(mfrow = ...) and print
Going from left to right: birds tend to be present on islands with
more area (a positive relationship) and absent on islands that are
very isolated (a negative relationship). There is no notable
difference in quality between islands where birds are present and
islands where they are absent. If it’s the quality of the birds, I feel
bad having them lumped into a quality category haha!
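The plotting code itself is not shown in this write-up. As a minimal sketch, three side-by-side boxplots could be drawn with par(mfrow = ...) roughly as follows. The data frame `toy` and the helper `plot_incidence_panels()` are hypothetical names, and the values are made up so the example runs on its own; with the lab data, `island` would take the place of `toy`.

```r
# Illustrative data only: these values are made up, not taken from island2.csv.
toy <- data.frame(
  incidence = factor(c(0, 0, 0, 1, 1, 1), levels = c(0, 1),
                     labels = c("absence", "presence")),
  area      = c(1.2, 2.0, 2.9, 4.1, 5.6, 7.3),
  isolation = c(8.1, 7.4, 6.9, 5.2, 4.8, 3.9),
  quality   = c(3, 5, 4, 4, 3, 5)
)

# Hypothetical helper: one boxplot per predictor, side by side in a single row.
plot_incidence_panels <- function(df, vars = c("area", "isolation", "quality")) {
  old_par <- par(mfrow = c(1, length(vars)))  # one row, one column per variable
  on.exit(par(old_par))                       # restore the previous layout on exit
  for (v in vars) {
    # split() groups the predictor by absence/presence before plotting
    boxplot(split(df[[v]], df$incidence), xlab = "incidence", ylab = v, main = v)
  }
  invisible(vars)
}

plot_incidence_panels(toy)
```

Restoring the old par() settings with on.exit() keeps later plots from inheriting the three-panel layout.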
Center the data
c_area = area - mean(area)
c_isolation = isolation - mean(isolation)
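Centering leaves the slopes unchanged and only shifts the intercept, so that it refers to an island of average area and isolation. A small sketch of what the subtraction does, using made-up values (`area_demo` is a hypothetical vector, not the lab data), and showing that base R's scale() with scale = FALSE gives the same result:

```r
# Illustrative values only, not taken from island2.csv.
area_demo <- c(1.5, 3.0, 4.5, 7.0)

# Manual centering, as in the lab code.
c_manual <- area_demo - mean(area_demo)

# Equivalent base R call: with scale = FALSE, scale() only subtracts the mean.
c_scaled <- as.numeric(scale(area_demo, center = TRUE, scale = FALSE))

mean(c_manual)                 # zero, up to floating-point error
all.equal(c_manual, c_scaled)  # TRUE
```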
Generalized linear model (glm) and summary
island_glm = glm(incidence ~ c_area + c_isolation,
family = binomial(link = "logit"))
summary(island_glm)
##
## Call:
## glm(formula = incidence ~ c_area + c_isolation, family = binomial(link = "logit"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8189 -0.3089 0.0490 0.3635 2.1192
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.1154 0.5877 1.898 0.05770 .
## c_area 0.5807 0.2478 2.344 0.01909 *
## c_isolation -1.3719 0.4769 -2.877 0.00401 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.029 on 49 degrees of freedom
## Residual deviance: 28.402 on 47 degrees of freedom
## AIC: 34.402
##
## Number of Fisher Scoring iterations: 6
The AIC is 34.402, and the p-values for the area and isolation
coefficients (from their z-values) are 0.01909 and 0.00401,
respectively, so both are significant at the 0.05 level. Following the
lab outline, the coefficients are given in log-odds, so we can convert
them to odds with the exponential function: exp(coef(island_glm)). The
exponentiated intercept is about 3. Because the predictors are centered,
this means that on an island of average area and average isolation the
odds of presence are roughly 3 to 1: about three presences for every
absence.
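As a rough check on that conversion, the sketch below exponentiates the coefficients copied by hand from the summary(island_glm) output (so the lab data are not needed). The names `log_odds`, `odds`, and `p_mean` are just illustrative.

```r
# Coefficients copied from the summary(island_glm) output (log-odds scale).
log_odds <- c("(Intercept)" = 1.1154, c_area = 0.5807, c_isolation = -1.3719)

# Exponentiating gives odds for the intercept and odds ratios for the slopes.
odds <- exp(log_odds)

# plogis() is the inverse logit: probability of presence at mean area and isolation.
p_mean <- plogis(log_odds[["(Intercept)"]])

round(odds, 2)
round(p_mean, 2)
```

Read this way, each one-unit increase in area multiplies the odds of presence by exp(0.5807), and each one-unit increase in isolation multiplies them by exp(-1.3719), which is less than 1 and so lowers the odds.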
Create a new data frame for a new island with area 5 and isolation 6,
centering both values on the original means
island_2 = data.frame(c_area = 5 - mean(area),
c_isolation = 6 - mean(isolation))
Then we can use predict() to make a prediction from the new data frame
predict(island_glm, newdata = island_2, type = "response", se.fit = TRUE)
## $fit
## 1
## 0.7881208
##
## $se.fit
## 1
## 0.1125028
##
## $residual.scale
## [1] 1
There is roughly a 79% chance (fitted probability 0.788, standard
error 0.113) that the new island will have our bird.
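The standard error can be turned into an approximate 95% interval, but adding and subtracting it directly on the probability scale can run past 1. A hedged sketch, using only the two numbers printed by predict(): for a logit link the response-scale standard error equals the link-scale one times p(1 - p), so the link-scale value can be recovered, the interval built in log-odds, and the result mapped back. This assumes a Wald (normal) interval on the link scale; the variable names are illustrative.

```r
# Values copied from the predict() output (response scale).
p_hat <- 0.7881208
se_p  <- 0.1125028

# Naive interval on the probability scale; nothing keeps it inside [0, 1].
naive <- p_hat + c(-1, 1) * qnorm(0.975) * se_p

# For the logit link: se(response) = se(link) * p * (1 - p), so invert that.
se_link <- se_p / (p_hat * (1 - p_hat))

# Build the interval on the log-odds scale, then map back to probabilities.
ci_link <- qlogis(p_hat) + c(-1, 1) * qnorm(0.975) * se_link
ci_prob <- plogis(ci_link)

round(naive, 3)
round(ci_prob, 3)
```

With the fitted model available, the same interval would normally come from calling predict() with type = "link" and se.fit = TRUE and applying plogis() to the limits.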
Read in csv
trees = read.csv("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab05/tsuga.csv")
The headers for tsuga.csv are plotID, date, plotsize, spcode, species, cover, elev, tci, and streamdist. We can use this data to fit a generalized linear model with glm(), here a Poisson model with a log link. We’ll model cover as a function of elev and streamdist, which are our explanatory variables.
trees_glm = glm(trees$cover ~ trees$elev + trees$streamdist,
family = poisson(link = "log"))
summary(trees_glm)
##
## Call:
## glm(formula = trees$cover ~ trees$elev + trees$streamdist, family = poisson(link = "log"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.31395 -0.82155 -0.07929 0.71900 2.62316
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.622e+00 5.226e-02 31.047 < 2e-16 ***
## trees$elev 8.901e-05 5.653e-05 1.575 0.115
## trees$streamdist -8.963e-04 1.173e-04 -7.641 2.15e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 748.23 on 744 degrees of freedom
## Residual deviance: 687.10 on 742 degrees of freedom
## (1 observation deleted due to missingness)
## AIC: 3150.2
##
## Number of Fisher Scoring iterations: 4
exp(coef(trees_glm))
## (Intercept) trees$elev trees$streamdist
## 5.0652901 1.0000890 0.9991041
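The explanatory variables are only partially useful. The streamdist term has a very low p-value (2.15 × 10^-14), suggesting that distance from a stream is a good predictor of hemlock cover; its negative coefficient means cover declines farther from water. Elevation, on the other hand, is not a significant predictor: its p-value (0.115) is above 0.05. Because the model uses a log link, the exponentiated coefficients act as multipliers rather than amounts added or subtracted. The intercept (5.07) is the expected cover when elev and streamdist are both zero, each one-unit increase in streamdist multiplies the expected cover by 0.9991 (a decrease of about 0.09%), and each one-unit increase in elev multiplies it by 1.00009.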
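To make the multiplicative reading concrete, the sketch below works from the coefficients copied by hand from the summary(trees_glm) output. The 100-unit step in stream distance is an arbitrary illustrative choice (the units of streamdist are not stated here), and the variable names are hypothetical.

```r
# Coefficients copied from the summary(trees_glm) output (log scale).
b <- c("(Intercept)" = 1.622, elev = 8.901e-05, streamdist = -8.963e-04)

# Multiplier on expected cover for a one-unit increase in each predictor.
rate_ratio <- exp(b[c("elev", "streamdist")])

# The same effect expressed as a percent change per unit.
pct_change <- 100 * (rate_ratio - 1)

# Multiplier for a 100-unit increase in stream distance (100 is illustrative).
ratio_100 <- exp(100 * b[["streamdist"]])

round(pct_change, 4)
round(ratio_100, 3)
```

Because per-unit effects are tiny when a predictor spans hundreds or thousands of units, reporting the effect over a larger step is often easier to read than the raw 0.9991 multiplier.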