Get to know the Fletcher data

This exercise has been modified from a similar one in the Fletcher and Fortin (2018) textbook.

Five-lined Skink The goal of this activity is to determine if forest cover is correlated with the occurrence of a reptile species, the five-lined skink, and if the relationship between forest cover and the occurrence of five-lined skinks changes with spatial extent. Please take a moment to read about this species: https://srelherp.uga.edu/lizards/eumine.htm

We will create different sized buffers (extents) around point locations where surveys for the reptile took place, quantify the amount of forest within those buffers, and use logistic regression (general linear models - glm) to test the hypothesis that forest cover predicts this species’ occurrence at different spatial extents.

We first need to get the data to do the activity:

Recall that all data for class exercises are in the shared Google Drive folder. Download the following five items to a folder on your computer and then create a new R project for this assignment in that same folder: 1. NLCD: “reptile_NLCD_2019.tif” and "reptile_NLCD_2019.tif.aux* + These two files are the NLCD raster and the auxilary data that contains information about the categories/land cover classes 2. States: “FL_GA_AL” (folder) + state lines as a shapefile 3. Points: “reptiledata” (folder) + point data for the location of reptile surveys as a shapefile 4. Picture: “flsk_pic” + picture of a five-lined skink for the introductory section of the Rmd 5. Occupancy data: reptiles_flsk.csv + spreadsheet of whether a five-lined skink was detected at each reptile survey point.

Pull in the NLCD data

Assess its CRS.

Name the CRS so that we may use it in the future to reproject other data, if needed

Plot it

Get basic information about the nlcd file such as its resolution, extent and number of unique land cover values

#add your code here
nlcd=rast("reptile_NLCD_2019.tif")
crs(nlcd)
## [1] "PROJCRS[\"Albers Conical Equal Area\",\n    BASEGEOGCRS[\"WGS 84\",\n        DATUM[\"World Geodetic System 1984\",\n            ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4326]],\n    CONVERSION[\"Albers Equal Area\",\n        METHOD[\"Albers Equal Area\",\n            ID[\"EPSG\",9822]],\n        PARAMETER[\"Latitude of false origin\",23,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8821]],\n        PARAMETER[\"Longitude of false origin\",-96,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8822]],\n        PARAMETER[\"Latitude of 1st standard parallel\",29.5,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8823]],\n        PARAMETER[\"Latitude of 2nd standard parallel\",45.5,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8824]],\n        PARAMETER[\"Easting at false origin\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8826]],\n        PARAMETER[\"Northing at false origin\",0,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8827]]],\n    CS[Cartesian,2],\n        AXIS[\"easting\",east,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]],\n        AXIS[\"northing\",north,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1,\n                ID[\"EPSG\",9001]]]]"
res(nlcd)
## [1] 30 30
ext(nlcd)
## SpatExtent : 702645, 1597755, 263955, 1404855 (xmin, xmax, ymin, ymax)
unique(nlcd)
##           NLCD Land Cover Class
## 1                  Unclassified
## 2                    Open Water
## 3         Developed, Open Space
## 4      Developed, Low Intensity
## 5   Developed, Medium Intensity
## 6     Developed, High Intensity
## 7                   Barren Land
## 8              Deciduous Forest
## 9              Evergreen Forest
## 10                 Mixed Forest
## 11                  Shrub/Scrub
## 12                   Herbaceous
## 13                  Hay/Pasture
## 14             Cultivated Crops
## 15               Woody Wetlands
## 16 Emergent Herbaceous Wetlands
plot(nlcd)

nlcdproj=crs(nlcd) ###naming projection for nlcd

Pull in the shapefiles (reptiledata and SE_states)

Use the st_read function to pull in both of these shapefiles - there are several files that make up the shapefile in each folders

** Is the reptile shapefile in the same crs as the nlcd raster? Does it have a crs at all?

You often have to look for the metadata or ask the person that collected the points to know the crs for sure. In this case, we can see from the dataframe that the coordinates are in lat/long, and I am telling you it is ok to define the crs as the same as the nlcd. Defining a crs is different than transforming a file with a crs to a different crs. To define a crs, we use st_crs()

The states shapefile does have a crs, but it differs from the nlcd and reptile shapefile so it will need to be transformed to a different crs using st_transform()

Use the subset or filter function to remove sites where management = Corn. Another way of thinking about it is to keep the sites that are not corn…where != is ‘not equal to’

Plot reptile data and state lines on top of nlcd. HINT: to only plot the state lines (without a fill) use plot(st_geometry(states), add=T)

#your code for the above here:
sites=st_read("reptiledata") 
## Reading layer `reptiledata' from data source 
##   `/Users/joshuasementelli/Downloads/Scale Exercise/reptiledata' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 85 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 812598.9 ymin: 786930.5 xmax: 1373597 ymax: 1014229
## CRS:           NA
sites1=subset(sites, management!="Corn")
crs(sites)
## CRS arguments: NA
st_crs(sites)=nlcdproj #changing sites projection to match nlcd file
crs(sites)
## CRS arguments:
##  +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0
## +datum=WGS84 +units=m +no_defs
SE_states=st_read("Fl_GA_AL")
## Reading layer `FL_GA_AL' from data source 
##   `/Users/joshuasementelli/Downloads/Scale Exercise/FL_GA_AL' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1074243 ymin: -2086803 xmax: 1998476 ymax: -956691.8
## Projected CRS: NAD27_US_National_Atlas_Equal_Area
crs(SE_states)
## CRS arguments:
##  +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +datum=NAD27 +units=m
## +no_defs
SE_states=st_transform(SE_states, nlcdproj)
crs(SE_states)
## CRS arguments:
##  +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0
## +datum=WGS84 +units=m +no_defs
plot(nlcd)
plot(st_geometry(SE_states, nlcdproj), add=TRUE)
plot(sites, pch=20, color="black", add=TRUE)
## Warning in plot.sf(sites, pch = 20, color = "black", add = TRUE): ignoring all
## but the first attribute

** How many survey points are we going to use in our analysis? across which states?

Manipulate the nlcd data

Because, we are only interested in forest cover for this exercise, we need to pull out only the forest cover classes. Use the unique() function to determine how many land cover classes there are in this particular raster. Note: 41, 42 and 43 are all forest. You can learn more about NLCD data and what the other levels mean here: https://www.mrlc.gov/data/legends/national-land-cover-database-2019-nlcd2019-legend

I have provided code for creating a new raster called ‘forest’ that has cell values for forest = 1, and all others = 0. To do this, we have to perform a reclassification.

#your code to determine how many landcover types there are:
tempdir()
## [1] "/var/folders/v1/8rk3bz_s2q7fj17y7p1043w00000gn/T//RtmpY4yfb0"
?removeTmpFiles
#merge all forest classes together 
#reclassification vector - keeps only the three forest classes which are 8-10 in freq(nlcd)
reclass <- c(rep(0,7), rep(1,3), rep(0,6))
reclass #look at it
##  [1] 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
lc_cats <- c(1,11,21,22,23,24,31,41,42,43,52,71,81,82,90,95)

reclass.mat <- cbind(lc_cats, reclass)
reclass.mat#look at it, think about what this reclassification matrix is showing
##       lc_cats reclass
##  [1,]       1       0
##  [2,]      11       0
##  [3,]      21       0
##  [4,]      22       0
##  [5,]      23       0
##  [6,]      24       0
##  [7,]      31       0
##  [8,]      41       1
##  [9,]      42       1
## [10,]      43       1
## [11,]      52       0
## [12,]      71       0
## [13,]      81       0
## [14,]      82       0
## [15,]      90       0
## [16,]      95       0
forest <- classify(nlcd, reclass.mat) #this will take a minute

plot(forest)
plot(sites, pch = 19, add = T)
## Warning in plot.sf(sites, pch = 19, add = T): ignoring all but the first
## attribute

** Write in your own words what this classify function is doing.

** Do you feel confident reclassifying a land cover raster in future assignments, even if the classification criteria were different (i.e., not just retaining forest)? If not, what is still confusing your about the above steps?

Create buffers around points

Use the st_buffer() function to create 100m, 500m, 1km, and 2km buffers around the reptile points and plot them on the forest map. Be sure to refer to the crs to make sure you know the units of the file - is it in meters? Also, look up the st_buffer() function to see what information you need to include.

Use mapview to interactively map your buffers to make sure they are all there and seem correct.

#your code here
?st_buffer
buffer1=st_buffer(sites1,100 )
buffer2=st_buffer(sites1, 500)
buffer3=st_buffer(sites1, 1000)
buffer4=st_buffer(sites1, 2000)
mapview(buffer1)+buffer2+buffer3+buffer4

Extract forest cover within buffers and merge into one dataframe

Because there is an extract function in tidyverse, we have to specify that we want to use the one in the raster package with raster::extract(). I have provided one line of code to do this, but you need to fill in the appropriate file name and replicate for the other buffer sizes.

#first convert sf buffers to spatVector b/c the terra extract function does not recognize sf files
buff1.v<-vect(buffer1)
buff2.v=vect(buffer2)
buff3.v=vect(buffer3)
buff4.v=vect(buffer4)

ext100 = terra::extract(forest, buff1.v, fun=sum)
ext500 = terra::extract(forest, buff2.v, fun=sum)
ext1km = terra::extract(forest, buff3.v, fun=sum)
ext2km = terra::extract(forest, buff4.v, fun=sum)
#values in dataframe are the number of cells....because forest=1 and no forest=0

#repeat this for the other spatial extents:
?merge
## Help on topic 'merge' was found in the following packages:
## 
##   Package               Library
##   sp                    /Library/Frameworks/R.framework/Versions/4.0/Resources/library
##   base                  /Library/Frameworks/R.framework/Resources/library
##   raster                /Library/Frameworks/R.framework/Versions/4.0/Resources/library
##   terra                 /Library/Frameworks/R.framework/Versions/4.0/Resources/library
## 
## 
## Using the first match ...
#merge extracted values. 
forest_cov <- merge(ext100, ext500,  by = 'ID')
names(forest_cov) <- c("ID", "For100m", "For500m")
forest_cov <- merge(forest_cov ,ext1km, by = 'ID')
names(forest_cov) <- c("ID", "For100m", "For500m", "For1km")
forest_cov <- merge(forest_cov, ext2km, by = 'ID' )

#feel free name your files differently  
#rename the columns to be more meaningful
names(forest_cov) <- c("ID", "For100m", "For500m", "For1km", "For2km")
?names
## Help on topic 'names' was found in the following packages:
## 
##   Package               Library
##   base                  /Library/Frameworks/R.framework/Resources/library
##   raster                /Library/Frameworks/R.framework/Versions/4.0/Resources/library
##   terra                 /Library/Frameworks/R.framework/Versions/4.0/Resources/library
##   satellite             /Library/Frameworks/R.framework/Versions/4.0/Resources/library
## 
## 
## Using the first match ...
#You can only merge two dataframes together at once, so need to do this two more times to get all four spatial extents in one dataframe

#make sure the site ID carries over by adding the ID column from the buff1km dataframe using cbind()
forest_cov<-cbind(forest_cov, buffer1[,1])
?cbind
#look at it
head(forest_cov)
##   ID For100m For500m For1km For2km site                       geometry
## 1  1      31     504   1661   8628  AL1 POLYGON ((846379.4 921444.9...
## 2  2       3     615   2788   8722 AL10 POLYGON ((899163.5 989168.9...
## 3  3      33     780   2661   9828 AL11 POLYGON ((898855.1 990398.2...
## 4  4       0     329   1826   8162 AL12 POLYGON ((867789.7 1007136,...
## 5  5      35     519   1833   9249 AL13 POLYGON ((868634.7 1001561,...
## 6  6       2     137    850   8243 AL14 POLYGON ((867388.2 1006418,...

Calculate forest area and proportion forest

#We can multiply the forest cell count by the grain to get the forest area 
#and divide by the buffer size to get the proportion of forest.

#area of each cell, in ha (ha = 10000 square meters)
res(forest)
## [1] 30 30
grainarea <- (30^2)/10000 

#area of buffers, in ha (ha = 10000 square meters)
bufferarea_100m <- (3.14159*100^2)/10000#pi*r^2
bufferarea_500m <- (3.14159*500^2)/10000
bufferarea_1km <- (3.14159*1000^2)/10000
bufferarea_2km <- (3.14159*2000^2)/10000

#Use the mutate function to create 3 new columns with forest area (number of forest cells times the grain area) within each buffer
forest_cov <- forest_cov %>% 
 mutate(For100m_area = For100m*grainarea, 
        For500m_area = For500m*grainarea,
        For1km_area = For1km*grainarea,
        For2km_area = For2km*grainarea)

# do this same thing to calculate the percent cover of forest within the buffer: 
# area of forest cells divided by the buffer area * 100

forest_cov <- forest_cov %>% 
 mutate(For100m_percover = (For100m_area/bufferarea_100m)*100, 
        For500m_percover = (For500m_area/bufferarea_500m)*100,
        For1km_percover = (For1km_area/bufferarea_1km)*100,
        For2km_percover = (For2km_area/bufferarea_2km)*100)

# To look at the variation in forest cover within a 1km area around survey points, use the following mapview code and change the base map to Esri.WorldImagery once the interactive map opens

library(RColorBrewer) # allows us to pic a color scheme that will show up over the basemap
#first convert forest_cov to sf so it can be recognized by mapview
forest_cov_sf<-st_as_sf(forest_cov)
mapview(forest_cov_sf, zcol = "For1km_percover", 
        col.regions = brewer.pal(9, "OrRd"), alpha.regions = 0.4)
## Warning: Found less unique colors (9) than unique zcol values (77)! 
## Interpolating color vector to match number of zcol values.

** Do you think forest area or proportion forest is a better predictor of five-lined skink presence/absence? why?

** Make a prediction about what scale/extent will be most important in predicting five-lined skink presence? Why did you select that extent over the others?

Get reptile presence/absence data and develop models

Create a new data frame that includes the site name, the x and y coordinates, and forest area and proportion forest. I have provided the code for this.

Read in the presence/absence data for the five-lined skink: “reptiles_flsk.csv” and merge with the data frame you just created. The previous shapefile just included the point locations and did not have these data. I have provided the code for this.

Now you have all the information you need to model species presence as a function of forest cover.

I have provided the code for one logistic regression model, but would like you to model species presence at three different scales (100m, 500m, 1km)

#make a data frame
reptile=read.csv("reptiles_flsk.csv", header = T)
forest.scale <- data.frame(site=sites1$site,
                           x=sites1$coords_x1, y=sites1$coords_x2,
                           f100m=forest_cov$For100m_area,
                           f500m=forest_cov$For500m_area,
                           f1km=forest_cov$For1km_area,
                           fp100m=forest_cov$For100m_percover,
                           fp500m=forest_cov$For500m_percover,
                           fp1km=forest_cov$For1km_percover)

#Read in herp data and merge with above data frame
reptile <- read.csv("reptiles_flsk.csv", header=T)
reptile <- merge(reptile, forest.scale, by="site", all=F)

#run logistic regression - family is binomial because the response variable, presence, is 0/1

pres.100m <- glm(pres ~ fp100m , family = "binomial", data = reptile)
pres.500m<- glm(pres ~ fp500m , family = "binomial", data = reptile)
pres.1km<- glm(pres ~ fp1km , family = "binomial", data = reptile)
#summary information
summary(pres.100m)
## 
## Call:
## glm(formula = pres ~ fp100m, family = "binomial", data = reptile)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8051  -0.7877  -0.7125  -0.6579   1.8094  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -1.420476   0.513598  -2.766  0.00568 **
## fp100m       0.004462   0.006632   0.673  0.50107   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 86.608  on 77  degrees of freedom
## Residual deviance: 86.143  on 76  degrees of freedom
## AIC: 90.143
## 
## Number of Fisher Scoring iterations: 4
summary(pres.500m)
## 
## Call:
## glm(formula = pres ~ fp500m, family = "binomial", data = reptile)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9950  -0.8208  -0.6145  -0.3906   2.1358  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -2.63560    0.87473  -3.013  0.00259 **
## fp500m       0.02432    0.01279   1.901  0.05724 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 86.608  on 77  degrees of freedom
## Residual deviance: 82.548  on 76  degrees of freedom
## AIC: 86.548
## 
## Number of Fisher Scoring iterations: 4
summary(pres.1km)
## 
## Call:
## glm(formula = pres ~ fp1km, family = "binomial", data = reptile)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1842  -0.7584  -0.5654  -0.3231   2.1919  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.39539    0.98549  -3.445  0.00057 ***
## fp1km        0.03931    0.01553   2.532  0.01135 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 86.608  on 77  degrees of freedom
## Residual deviance: 79.043  on 76  degrees of freedom
## AIC: 83.043
## 
## Number of Fisher Scoring iterations: 4
#In addition to comparing p-values for each model provided in the above summary, you can also compare the odds ratios and the Log-Likelihoods

#odds ratio can be computed by raising e to the power of the logistic coefficient.  Only need to do this if/when your model is significant
exp(coef(pres.100m))  
## (Intercept)      fp100m 
##   0.2415991   1.0044722
exp(coef(pres.500m))
## (Intercept)      fp500m 
##  0.07167594  1.02461498
exp(coef(pres.1km))
## (Intercept)       fp1km 
##  0.03352763  1.04009592
?exp
#To interpret the odds ratio, you have to think about it in the context of the x variable, and in our case that is a continuous variable.  When Odds ratio is equal to 1.02, for every 1% increase in forest cover, there is a 1.02 times increase (or a 2% increase) in the odds of finding a skink. That may not seem like a lot, but over a larger change in forest cover, it would add up!
#https://rstudio-pubs-static.s3.amazonaws.com/182726_aef0a3092d4240f3830c2a7a9546916a.html


#Calculate the log-Likelihood of each model - smaller LL values are best
logLik(pres.100m)
## 'log Lik.' -43.07137 (df=2)
logLik(pres.500m)
## 'log Lik.' -41.27417 (df=2)
logLik(pres.1km)
## 'log Lik.' -39.52152 (df=2)

** At what scale(s) is forest cover a significant predictor five lined skink presence? Is the relationship positive or negative? Use output from the models to justify your answers.

Forest cover within the 1km wide buffer is the most sifnificant predictor of skink presence while displaying a positive relationship

Plotting data

Once you have a model that performs better than the others, it is always a good idea to plot the data to help visualize the relationships. I have provided the code to plot one significant effect. Please be sure to include at least two plots (of two separate models that you think are most interesting/important) with your assignment that you turn in.

#there are several ways to go about plotting model results.
#here is a quick way that includes the raw data as points and the modeled relationship as a line with standard error. Be sure to change file and variable names to match yours
ggplot(reptile, aes(x=fp1km, y=pres))+geom_point()+
  stat_smooth(method="glm", method.args=list(family="binomial"), se=T)+
  labs(x="Proportion forest cover within 1km", y="Prob of skink presence")
## `geom_smooth()` using formula 'y ~ x'

ggplot(reptile, aes(x=fp100m, y=pres))+geom_point()+
  stat_smooth(method="glm", method.args=list(family="binomial"), se=T)+
  labs(x="Proportion forest cover within 100m", y="Prob of skink presence")
## `geom_smooth()` using formula 'y ~ x'

ggplot(reptile, aes(x=fp500m, y=pres))+geom_point()+
  stat_smooth(method="glm", method.args=list(family="binomial"), se=T)+
  labs(x="Proportion forest cover within 500m", y="Prob of skink presence")
## `geom_smooth()` using formula 'y ~ x'

ggplot(reptile, aes(x=f100m, y=pres))+geom_point()+
  stat_smooth(method="glm", method.args=list(family="binomial"), se=T)+
  labs(x="Forest area within 500m", y="Prob of skink presence")
## `geom_smooth()` using formula 'y ~ x'

** Interpret the figures in your own words and with the added detail they provide over the model output alone

There is a significant positive association between the probability of a skink’s presence and the proportion of forest cover.

** Does the probability of five lined skink presence vary with scale? Explain your answer. Why might one scale/extent be better than another for this species?

Landscapes with higher proportion of forest cover display a higher likelihood of a five linked skink being present, while also varying by spatial extent. Skinks may prefer more mature undisturbed forest patches with greater scale.

** Did the results support your prediction about the scale at which forest cover would best predict skink presence? If not, can you explain why this different extent may be more important?

** How might this study of scale and predicting five-lined skink presence be improved?

Either temperature or elevation as potential predictor variables could be useful in predicting skink presence. Perhaps certain tree species located within forest patches would be useful as well.