GEOG71921: Practical 2: looking at species occurrence and characteristic scale

In practical 2 of GEOG71921 we will cover a range of spatial techniques in R that are central to working with spatial data in ecology. Today you will learn how to: 1) convert tabulated data to spatial point distributions, 2) crop data quickly and neatly to a desired study extent, 3) reclassify a raster for more focussed analysis, 4) provide background (or pseudo-absence) points to your study area, 5) assign and transform spatial projections, 6) use for loops and 7) create your own functions in R.

The main aim of the practical will be to evaluate the characteristic scale that describes how the occurrence of Sciurus vulgaris (Eurasian Red Squirrel) responds to land cover. We will use occurrence data downloaded from the National Biodiversity Network Atlas [https://nbnatlas.org/] to model the distribution of Sciurus vulgaris (point data uploaded since 2015) and use the UK Land Cover Map (2015) to characterize associated habitat types. We will evaluate this relationship through a general linear model approach at a number of scales using a series of buffers around occurrence data points.

1. Read in and map the data

Copy the data from the “Sciurus” folder in the GEOG71921 dropbox (get the link from the GEOG71921 Blackboard site) to a folder for week two in your own GEOG71921 area. Open R Studio and create a new script for today’s practical (Prac2.R). Remember how to set the working directory?:

setwd("P:/GEOG70922/Week2")

Install the packages for today’s work:

#Install packages

install.packages(c("raster","dismo","sp","rgdal","mapview","rgeos"))

Load the libraries that we will need from the start of today’s practical (Note you may receive some information or warnings in the R console similar to the ones below as produced on my machine, you can ignore these- they just provide information about versions and compatibility):

library(raster)      #for working with raster data

## Loading required package: sp

library(rgdal)       #for working with coordinate systems

## Warning: package 'rgdal' was built under R version 4.0.5

## rgdal: version: 1.5-23, (SVN revision 1121)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.1, released 2020/12/29
## Path to GDAL shared files: C:/Users/mlibxmda/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 7.2.1, January 1st, 2021, [PJ_VERSION: 721]
## Path to PROJ shared files: C:/Users/mlibxmda/Documents/R/win-library/4.0/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-5
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## Overwritten PROJ_LIB was C:/Users/mlibxmda/Documents/R/win-library/4.0/rgdal/proj

library(sp)          #for working with vector data
library(rgeos)

## rgeos version: 0.5-5, (SVN revision 640)
##  GEOS runtime version: 3.8.0-CAPI-1.13.1 
##  Linking to sp version: 1.4-5 
##  Polygon checking: TRUE

Now read in the spreadsheet containing the coordinates for occurrence locations and some other miscellaneous data columns. Notice that when you read in a .csv file the object automatically takes the class of data.frame.

sciurus<- read.csv("Sciurus.csv")


head(sciurus) # view the first six rows of the data

##                    NBN.Atlas.record.ID  Scientific.name Name.qualifier
## 1 fd44c89f-dc54-4bc8-b242-b25f34714c67 Sciurus vulgaris             NA
## 2 fbcf21ec-d168-4b2b-9644-d798e0d296b8 Sciurus vulgaris             NA
## 3 fbc7523e-c291-497f-8c1f-821e7a715e7a Sciurus vulgaris             NA
## 4 f9dad20d-9ea8-4b58-aafd-cdeee990d703 Sciurus vulgaris             NA
## 5 f55a1c93-8b8d-4f9d-88be-b81561231bc9 Sciurus vulgaris             NA
## 6 e2d32bf2-d58d-4cfc-ae48-ac5668d7842f Sciurus vulgaris             NA
##             Common.name Species.ID..TVK. Taxon.Rank Occurrence.status
## 1 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 2 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 3 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 4 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 5 Eurasian Red Squirrel NBNSYS0000005108    species           present
## 6 Eurasian Red Squirrel NBNSYS0000005108    species           present
##   Start.date.year Locality  OSGR Latitude Longitude Coordinate.uncertainty_m
## 1            2015          NO64G 56.57747  -2.60386                     2000
## 2            2015          SD31R 53.60975  -2.95368                     2000
## 3            2015          NS99Z 56.17291  -3.62832                     2000
## 4            2015          SD31W 53.60999  -2.92346                     2000
## 5            2015          SD31V 53.59201  -2.92306                     2000
## 6            2015          SD30N 53.55558  -2.98265                     2000
##   Identification.verification.status Vitality
## 1                        Unconfirmed         
## 2      Accepted - considered correct         
## 3                        Unconfirmed         
## 4      Accepted - considered correct         
## 5      Accepted - considered correct         
## 6      Accepted - considered correct

#check the class of the loaded .csv file.
class(sciurus)

## [1] "data.frame"

We will do a small amount of data cleaning next, to ensure there are no NA values in our coordinate data (this would cause later steps to crash). Here we use the subset() function and !is.na() to select all rows with complete values for latitude (‘!’ means “is not” in the R language). In addition let’s remove all records with coordinate uncertainty greater than 100 m.

#subset the data to only include points with complete coordinates.
sciurus<-subset(sciurus,!is.na(sciurus$Latitude))

#remove all points with uncertainty > 100m
sciurus<-subset(sciurus, sciurus$Coordinate.uncertainty_m<100)

2. Create a spatialPoints layer from coordinates

Next we make a SpatialPoints layer from our sciurus object. For this we combine two objects, a list of cooordinates (in this case lat and long) and a Proj.4 string. The latter tells R what coordinate system our coordinates are derived from. In this case we are working (for now) in the WGS84 system so we create an object (crs.latlong) to hold a character string that contains the information for this coordinate system. The full string that describes the projection for WGS84 is “+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0”. Fortunately, R allows use to pass the epsg code for WGS84 instead. epsg codes are a kind of shorthand for projection strings and the code for WGS84 is “epsg:4326”. So all we have to do is assign this code to an object (“crs.latlong” below) using the crs() function and tell the SpataialPoints() function that this object contains our projection string.

Note the use of the ‘$’ symbol below for accessing colmuns of dataframes.

#make spatial points layer

#create crs object
sciurus.latlong<-data.frame(x=sciurus$Longitude,y=sciurus$Latitude)

head(sciurus.latlong) #inspect the first few rows

##          x        y
## 1 -4.60706 57.57929
## 2 -4.54146 57.53051
## 3 -4.48384 57.55013
## 4 -5.59317 54.55672
## 5 -1.96440 55.15190
## 6 -3.62640 57.31776

crs.latlong<-crs("+init=epsg:4326")

#Combine coordinates and crs object to create our spatial points object

sciurus.sp<-SpatialPoints(sciurus.latlong,proj4string = crs.latlong)

#check that the points now have our desired crs. 
crs(sciurus.sp)

## CRS arguments: +proj=longlat +datum=WGS84 +no_defs

Type ?crs() directly into the console and press enter to bring up information about this function in the help panel. The crs() function is very versatile and allows us to get or set the crs of any raster or Spatial object (i.e. any vector object associated with the ‘sp’ library) as well as creating character objects that hold projection strings that can then be applied to any number of objects. The latter is important when you have data without a formal projection (usually from a text or csv file, as in our case) and want to assign one from scratch. The crs() function uses the PROJ.4 library and relies on rgdal for projecting. You can find more information on epsg codes here [http://www.epsg-registry.org/] and a handy overview of using CRS in R here [https://tinyurl.com/y65j2o94]

Now plot our points:

plot(sciurus.sp)

Now let’s have a look at a handy tool in R that allows us to work interactively with spatial data in R - the ‘mapview’ library. This allows use to look at our data and pan around their extent against base maps similar to how you would using ArcGIS. Run the following code then have a play around with the options. When you’re ready, move on to the next step. We will focus our analysis around the Cairngorms area of Scotland, see if you can locate it on the map.

library(mapview)

## Warning: package 'mapview' was built under R version 4.0.5

mapview(sciurus.sp)#Note: you may receive a warning message about Feature IDs when you run this command but you can ignore this for now).

We have a lot of occurrence data and although it would be nice to consider all of these points, that would be computationally too intensive for our practical session so let’s cut them down to something we can work with by clipping them to a much smaller extent. To do this we use the crop() function and create a new extent:

#First set the exetent to something workable

studyExtent<-c(-4.2,-2.7,56.5,57.5) #list coordinates in the order: min Long, max Long, min Lat, max Lat

#now crop our points to this area

C<-crop(sciurus.sp,studyExtent)

3. Mapping land-cover

Now lets bring in our landcover data to see where our points fit on the map.Y ou will probably receive a warning message on loading the raster layer. This relates to some changes and updates to the Proj library which may not yet be fully implemented in certain packages (such as raster). These deprecated versions still work fine for now so this is nothing to worry about.

#read in our raster data

LCM<-raster("LCMUK.tif")

## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum OSGB 1936 in Proj4 definition

#check crs

crs(LCM)

## CRS arguments:
##  +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
## +y_0=-100000 +ellps=airy +units=m +no_defs

The raster is set to OSGB, so our two data sets have different projections. Let’s fix that with the spTransform() function. Here we create an object containing the crs information for LCM inside the spTransform() function and pass it to our points layer:

sciurusFin<-spTransform(C,crs(LCM))

#inspect
crs(sciurusFin)

## CRS arguments:
##  +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
## +y_0=-100000 +ellps=airy +units=m +no_defs

Note that above we did not need to use any character strings to define our projection. This is down to the versatility of the crs() function whereby, in this case, we can simply pass the crs of the LCM layer directly to the spTransform function.

Before we continue, let’s crop the land cover data to the extent of our points data (plus 5km to allow space for the buffers we will create in subsequent steps).

sciurusCoords<-coordinates(sciurusFin)




x.min <- min(sciurusCoords[,1]) - 5000
x.max <- max(sciurusCoords[,1]) + 5000
y.min <- min(sciurusCoords[,2]) - 5000
y.max <- max(sciurusCoords[,2]) + 5000

extent.new <- extent(x.min, x.max, y.min, y.max)

extent(extent.new)

## class      : Extent 
## xmin       : 265055.5 
## xmax       : 362642.6 
## ymin       : 731572.5 
## ymax       : 851775.9

LCM <- crop(LCM, extent.new)

plot(LCM)

plot(sciurusFin,add=TRUE)

Now we are ready to start characterizing the locations of our points with the LCM layer.

4. Generating (pseudo-)absence points

Our analysis will be based on attempting to predict the locations of our points relative to the underlying habitat. In order to do this however we want also to characterize areas where Sciurus vulgaris doesn’t occur. We could try without absence data (and some algorithms such as Maxent and Bioclim are built for this) but it is more robust to provide absence as well as presence data in our modelling. Since we don’t have formal absence point locations, we will simulate our own by creating random background points (also called pseudo-absence points). For this we will use the randomPoints() function from the ‘dismo’ package (this is a package in R developed specifically for carrying our species distribution analyses).This function creates a matrix of random coordinates which we can then convert to a SpatialPoints layer (try ?randomPoints in the console for more information on this function).

Note that in practice we might create many more background points to get a representative sample but we will settle for 1000 for this practical to reduce processing time later on. Note that we provide our sciurusFin layer as an argument so that the randomPoints function avoids cells containing our presence data. We also provide the extent so that our analysis is based on a comparable set of resources.

library(dismo)

set.seed(11)

back.xy <- randomPoints(LCM, p=sciurusFin, n=1000,ext = extent(sciurusFin)) 

#create a spatialPoints layer from the back.xy matrix

back.xy<-SpatialPoints(back.xy)

plot(LCM)

plot(sciurusFin,add=T) # add the presence points to the map with the "add=" argument

plot(back.xy,add=TRUE, col='red') # set the background points to red (see the hints page on Blackboard for more info on colours and plotting)

A<-SpatialPoints(back.xy,crs(LCM)) #create a spatial points layer from the background points using the crs of the LCM object.

crs(A) #check the crs

## CRS arguments:
##  +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
## +y_0=-100000 +ellps=airy +units=m +no_defs

#create a SptialPoints layer for our presence data containing only XY coordinates.
P<-sciurusFin

5. Characterizing point locations

Now we will use some functions that we learned last week to extract land cover values to our points and plot them to compare presence and absence locations.

eA<-extract(LCM,A) #this creates a vector of values extracted from the LCM layer to the background points

eP<-extract(LCM,P)  #this creates a vector of values extracted from the LCM layer to the presence points

A very useful function for cross-tabulating data (i.e. summing counts per category of interest) is the table() function. Let’s use this to tabulate counts per landcover type for our presence and absence points.

#calculate point frequency per landcover category (headings are landcover category codes - see LCM documentation for descriptions)
table(eA) # tabulate frequency of unique values (i.e. landcover types) at background points

## eA
##   0   1   2   3   4   7   8   9  10  11  12  14  20  21 
##   3  50 144  86 104 105   1 349  62  28  49   4   1  14

table(eP)  # tabulate frequency of unique values (i.e. landcover types) at presence points

## eP
##   1   2   3   4   7   9  10  11  14  21 
## 101  87  26  45  28   1   1   1   1  25

Use the range of values provided by table(eA) and table(eP) to set the x axes of two histograms comparing presence and absence locations. Note ylim is will be 1 on this occasion as we will set the argument ‘freq’ in the hist() function to FALSE so that the values on the y axis of our histograms represent proportions rather than counts (this is important as the number of points in our presence and absence data are markedly different).

par(mfrow=c(1,2)) # set the plotting pane up so that two plots are given side by side (i.e. rows=1 and columns =2) 

hist(eA,freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,1)) #plot a histogram of land cover at background point with 21 categories on the x axis and 0-1 scale on the y. Setting "freq=FALSE" tells R to give proportions rather than raw counts. 


hist(eP,freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,1))# do the same for the presence points

The max value on our y axis seems to be 0.4, so change ylim to ylim=c(0,0.5) for both histograms and run again (remember the c() function for combining elements - here min and max values for our axes).

hist(eA,freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,0.4))

hist(eP,freq=FALSE,breaks=c(0:21),xlim=c(0,21),ylim=c(0,0.5))

From these outputs we can see that the proportion of points occuring in broadleaf woodland (class 1 in the UK Landcover Map - see documentation) is much higher for presence relative to background points. The focus for the rest of the practical will be determining to what extent this response to broadleaf woodland differs as a function of scale. If it does, then what is the characteristic scale of this effect (so we can operationalize this optimum scale in future analyses, for example, for species distribution modelling)?

Next let’s create a single data frame to hold both our presence and absence data so that we can start to carry out our scale analysis on the one dataset.

Abs<-data.frame(A,Pres=0) # create a data frame from the A (the extracted background values create above)

head(Abs)

##          x        y Pres
## 1 352287.5 784412.5    0
## 2 345837.5 777687.5    0
## 3 275862.5 795087.5    0
## 4 297387.5 771237.5    0
## 5 306762.5 778462.5    0
## 6 353012.5 824137.5    0

Pres<-data.frame(P,Pres=1) # create a data frame from the A (the extracted presence values create above)


head(Pres) # check the first few rows

##          x        y Pres
## 1 302144.9 826414.9    1
## 2 348875.4 780665.7    1
## 3 313612.9 791182.1    1
## 4 302332.6 820623.6    1
## 5 291470.2 822623.9    1
## 6 293429.5 818942.5    1

sciurusData<-rbind(Pres,Abs) # bind the two data frames by row (both dataframes have the same column headings)

#inspect
head(sciurusData)#first few rows

##          x        y Pres
## 1 302144.9 826414.9    1
## 2 348875.4 780665.7    1
## 3 313612.9 791182.1    1
## 4 302332.6 820623.6    1
## 5 291470.2 822623.9    1
## 6 293429.5 818942.5    1

tail(sciurusData) #last few rows

##             x        y Pres
## 1311 347287.5 843287.5    0
## 1312 302887.5 770287.5    0
## 1313 346762.5 845637.5    0
## 1314 269137.5 765512.5    0
## 1315 314437.5 842387.5    0
## 1316 335437.5 755287.5    0

class(sciurusData)# check object class

## [1] "data.frame"

6. Re-classifying the raster

In order to explore how occurrence responds to broadleaf forest cover, we first need to extract a layer from our LCM raster consisting only of cells assigned to this land cover type in the raster (classification ‘1’). we can do this through a simple reclassification of the raster. First we convert the values of the raster layers to factors. In R factors represent categories (categorical data). You will hopefully have realized that our land cover map is a categorical rather than continuous raster, meaning that the values represent land cover types (rather than some continuous variable such as elevation or slope - we’ll look at these later in the course).

#access levels of the raster by treating them as categorical data ('factors' in R)

LCM<-as.factor(LCM)

levels(LCM)

## [[1]]
##    ID
## 1   0
## 2   1
## 3   2
## 4   3
## 5   4
## 6   5
## 7   7
## 8   8
## 9   9
## 10 10
## 11 11
## 12 12
## 13 13
## 14 14
## 15 15
## 16 16
## 17 17
## 18 18
## 19 19
## 20 20
## 21 21

Looking at this list we can create a binary raster (assigning 1 for broadleaf woodland - the second element in the list - and 0 for everything else).

To reclassify a raster we create a two-column matrix where the first column contains the original cell values and the second column contains our new values (this is similar to how the reclassify function works in ArcGIS only here we provide the matrix ourselves).

#create an vector object called reclass
reclass <- c(0,1,rep(0,19))

# combine with the LCM categories into a matrix of old and new values.
RCmatrix<- cbind(levels(LCM)[[1]],reclass)


#Use the reclassify() function to asssign new values to LCM with our reclassification matrix
broadleaf <- reclassify(LCM, RCmatrix)

Inspect the new data visually:

plot(broadleaf)

plot(sciurusFin,add=TRUE)

7. Buffer Analysis

Before we create a function to implement our analysis for all points, let’s first work with the first point (the first row of our sciurusFin SpatialPoints layer) to understand the buffering process and visualize the result.

buf5km<-5000 #create an object to hold the distance parameter for our buffer


buffer.site1.5km<-buffer(sciurusFin[1,],width=buf5km) # use the buffer() function from the raster package applied to the first item of sciurusFin

Now use the zoom() function for a close up look at our buffered area (this will open in a separate window).

zoom(broadleaf,buffer.site1.5km) # use the zoom() function for a close-up of the result.

plot(buffer.site1.5km,border="red",lwd=5,add=T) # add the buffer



dev.off() # enter this to clear the active graphics pane

Now that we have our buffer we’ll to calculate the percentage cover by broadleaf woodland inside it.

To do so for a buffer width of 5km around our first site as an example we need to create the following objects:

an raster object for the broadleaf woodland inside each buffer (bufferlandcover5km below)
a numeric object holding the area described by each cell in our raster(grainarea below),
a numeric to hold the buffer area for a given radius in ha (bufferArea)
a numeric to hold the total area of woodland inside the buffer (landcover5km)
an object containing the final result as a percentage of the buffered area (percentlandcover5km )

Note that we use the crop() and mask() functions here. The first will clip our data to the extent (i.e. a bounding box delineated by the min and max x,y coordinates) of another object and the second clips to the mask (think of this as the outline) of another object. We could just use mask() directly, but this would take more time so putting both one after the other speeds up the analysis a little.

Also note the use of the res() function to access the resolution (size) of the LCM raster cells (again, you can use ?function() in the console to get more information).

buffer5km <- crop(broadleaf, buffer.site1.5km)#crop the broadleaf layer to the extent of the buffer 
bufferlandcover5km <- mask(broadleaf, buffer.site1.5km)#clip the above again to the circle described by the buffer (doing this speeds up the process compared to using only the mask() function)


bufferArea <- (3.14159*buf5km^2) #calculate the area of the buffer according to the buffer width
landcover5km <- cellStats(bufferlandcover5km, 'sum')*625 #total area of woodland inside the buffer (625 is the area in metres of each cell of our 25m raster) 

percentlandcover5km <- landcover5km/bufferArea*100 #calculate percentage

#return the result
percentlandcover5km

## [1] 9.07741

Now that we are happy with the work flow above, let’s create a function to automate the application of these steps to all rows in our sciurusData data frame.

Creating your own function in R is easy. We simply need to assign a name to our function (we’ll call it ‘landBuffer’), use ‘function()’ to list the arguments that the function requires, then place the associated code (which will produce the objects we just created in the example above) inside two braces {}.

Let’s set an example (we haven’t looked at the abs() function yet,a common term found in many statistical packages - can you work out what it does?).

#Create two numerical vectors to work with
v1<-c(1,2,3,4,5,6)

v2<-c(6,7,8,9,10,11)

#create a function called meanDiff() that will take two objects (vectors), calculate the mean of both and reurn the difference between the mean values:

meanDiff<-function(A,B){
  c<-abs(mean(A)-mean(B))
  print(c)
}

meanDiff(v1,v2)

## [1] 5

A function to automate our buffer calculations:

#function for automating whole dataset. The function is set up take two arguments: a list of coordinates and a buffer width.

landBuffer <- function(coords, size){         
  
  sciurusPoints <- SpatialPoints(cbind(coords[i,1],coords[i,2]))    # spatial points from coordinates  
  sciurusBuffer <- buffer(sciurusPoints, width=size)                     #buffer each point
  bufferlandcover <- crop(woodland, sciurusBuffer)              #crop the woodland layer to the buffer extent
  masklandcover <- mask(bufferlandcover, sciurusBuffer)        # mask the above to the buffer
  landcoverArea <- cellStats(masklandcover, 'sum')*625    #use cellStats() to sum landcover area (625 is the area in metres of each cell of our 25m raster)
  percentcover <- landcoverArea/gArea(buffer)*100             # convert to precentage cover (we use the gArea() function from the rgeos package to get the area of our buffer)
  return(percentcover)                                        # return the result
}

Now we have our function it just remains to apply this function to every row in our sciurusData data frame for a series of buffer distances. We do this with a for loop that will iterate our function over each row in the SciurusData data frame and apply our function to each set of coordinates held in the row (remember the first two columns of our sciursData object are the XY coordinates).

First we create empty vectors for each distance parameter that will hold the result for the corresponding iteration of the landBuffer() function. We will experiment with buffer distances of 100-2000m.

Before we do that, let’s have a look at how for loops work in R. They are called in a similar way as calling any other function and take the format:

for(value in the sequence){ statement }

Try the below as an example:

#example for loop

#Create a vector of values
v1<-c(1,2,3,4,5,6,7,8,9,10,11)

#create a new empty vector to store the results of our for loop

x<-rep(NA,length=length(v1))

for (i in v1){
  x[i]<-(sum(v1)-i)
}

print(x)

##  [1] 65 64 63 62 61 60 59 58 57 56 55

In our case the sequence of values is the rows (‘nrow’) of our sciurusData data frame and the sequence of buffer radii we wish to try. The statement is the application of our function for each of the distances. Note: once run this loop may take several minutes to complete so please be patient!

We set the sequence of radii to provide by using the seq() function:

radii<-seq(100,2000,by=100)

Importantly, we need a place to hold the results for i rows of coordinate and j radii. We’ll just call this ‘res’

res <- matrix(NA, nrow = nrow(sciurusData), ncol = length(radii))
head(res)

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [2,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
## [6,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA    NA    NA
##      [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]    NA    NA    NA    NA    NA    NA
## [2,]    NA    NA    NA    NA    NA    NA
## [3,]    NA    NA    NA    NA    NA    NA
## [4,]    NA    NA    NA    NA    NA    NA
## [5,]    NA    NA    NA    NA    NA    NA
## [6,]    NA    NA    NA    NA    NA    NA

Now we modify the function we made earlier to make it specific to our data so that for each row i we apply the buffer j (i.e. each number in the object ‘radii’). This time the landBuffer function will only have one argument, i, as we will enter the chosen radii j directly into the main body of the landBuffer function (as the second argument of the buffer() function). I have left a line above return(percentcover)} below. If you type “print(i)” (without the speech marks) in this empty line then the function will print each iteration of i to give you some idea of progress (there are 1316 rows in sciurusData).

landBuffer <- function(i){         
  
  
  SciurusPoints<-SpatialPoints(cbind(sciurusData[i,1],sciurusData[i,2]))  
  sciurusBuffer <- buffer(SciurusPoints, width=radii[j])           #buffer each point
  bufferlandcover <- crop(broadleaf, sciurusBuffer)                 #crop the landcover layer to the buffer extent
  masklandcover <- mask(bufferlandcover, sciurusBuffer)            # mask the above to the buffer
  landcoverArea <- cellStats(masklandcover, 'sum')*625      #use cellStats() to sum landcover area
  percentcover <- landcoverArea/gArea(sciurusBuffer)*100           # convert to precentage cover
  
  return(percentcover)} #get the result

Now run a for loop that iterates over each row (coordinates) in sciurusData and applies each buffer size (by using the seq_along() function), returning the results to our ‘res’ matrix. The matrix should be set up so that there are enough rows for each iteration of j - so when each row in a column is filled R will start to fill the next column. This way the results from our different buffer sizes should be separated into unique columns.

for(i in 1:nrow(sciurusData)){

  for(j in seq_along(radii)){
    
    res[i,j]<-landBuffer(i)}
  
  }

Take a look at the result in the ‘res’ matrix object.

#inspect
head(res)

##          [,1]     [,2]      [,3]     [,4]        [,5]      [,6]      [,7]
## [1,] 51.72798 34.31953 25.200812 24.49618 21.16868214 20.282233 20.666831
## [2,]  0.00000  0.00000  0.000000  0.00000  0.07958151  1.160564  1.583510
## [3,] 37.80122 28.35091 21.221736 16.66238 15.99588387 13.097790 11.653007
## [4,] 23.87445 21.38753 13.042526 11.43984  8.19689572  8.897655  8.526591
## [5,]  0.00000  0.00000  4.421195  4.47646  3.26284198  2.542187  3.126417
## [6,]  0.00000  0.00000  0.000000  0.00000  0.00000000  0.000000  0.000000
##           [,8]      [,9]     [,10]     [,11]     [,12]     [,13]     [,14]
## [1,] 19.833205 19.969065 21.288054 22.460402 23.860637 25.440184 25.478265
## [2,]  1.865192  2.603593  3.143470  3.288492  3.163918  3.143234  3.278677
## [3,] 12.030486 11.961789 11.181202 10.309423  9.519386  9.241344  8.414933
## [4,]  7.491853  6.828290  6.724638  6.938719  6.369284  6.380648  6.425395
## [5,]  3.450605  5.600180  7.241918  8.500752  8.082497  7.522572  6.841574
## [6,]  0.000000  0.884239  2.526713  4.324367  7.750908 10.948344 13.307572
##          [,15]     [,16]     [,17]     [,18]     [,19]     [,20]
## [1,] 25.236182 25.110143 24.604180 23.800767 22.794261 22.183346
## [2,]  3.519271  3.590494  3.655518  3.672048  3.720050  3.566246
## [3,]  7.692879  7.010012  6.374782  5.802819  5.219092  4.754995
## [4,]  7.339184  7.530711  7.496909  7.829200  8.128998  7.823857
## [5,]  6.446102  6.419368  6.312824  6.054581  5.869412  5.545837
## [6,] 15.580292 18.224477 20.019294 21.166472 21.532200 21.228368

Now convert the values to a data frame so that we can do some analysis on these results.

#create data frame from the res matrix so we can do some statistics

glmData<-data.frame(res)
colnames(glmData)<-c("w100","w200","w300","w400","w500","w600","w700","w800","w900","w1000","w1100","w1200","w1300","w1400","w1500","w1600","w1700","w1800","w1900","w2000")


head(glmData)

##       w100     w200      w300     w400        w500      w600      w700
## 1 51.72798 34.31953 25.200812 24.49618 21.16868214 20.282233 20.666831
## 2  0.00000  0.00000  0.000000  0.00000  0.07958151  1.160564  1.583510
## 3 37.80122 28.35091 21.221736 16.66238 15.99588387 13.097790 11.653007
## 4 23.87445 21.38753 13.042526 11.43984  8.19689572  8.897655  8.526591
## 5  0.00000  0.00000  4.421195  4.47646  3.26284198  2.542187  3.126417
## 6  0.00000  0.00000  0.000000  0.00000  0.00000000  0.000000  0.000000
##        w800      w900     w1000     w1100     w1200     w1300     w1400
## 1 19.833205 19.969065 21.288054 22.460402 23.860637 25.440184 25.478265
## 2  1.865192  2.603593  3.143470  3.288492  3.163918  3.143234  3.278677
## 3 12.030486 11.961789 11.181202 10.309423  9.519386  9.241344  8.414933
## 4  7.491853  6.828290  6.724638  6.938719  6.369284  6.380648  6.425395
## 5  3.450605  5.600180  7.241918  8.500752  8.082497  7.522572  6.841574
## 6  0.000000  0.884239  2.526713  4.324367  7.750908 10.948344 13.307572
##       w1500     w1600     w1700     w1800     w1900     w2000
## 1 25.236182 25.110143 24.604180 23.800767 22.794261 22.183346
## 2  3.519271  3.590494  3.655518  3.672048  3.720050  3.566246
## 3  7.692879  7.010012  6.374782  5.802819  5.219092  4.754995
## 4  7.339184  7.530711  7.496909  7.829200  8.128998  7.823857
## 5  6.446102  6.419368  6.312824  6.054581  5.869412  5.545837
## 6 15.580292 18.224477 20.019294 21.166472 21.532200 21.228368

#add in the presences data
glmData$Pres<-sciurusData$Pres


head(glmData)

##       w100     w200      w300     w400        w500      w600      w700
## 1 51.72798 34.31953 25.200812 24.49618 21.16868214 20.282233 20.666831
## 2  0.00000  0.00000  0.000000  0.00000  0.07958151  1.160564  1.583510
## 3 37.80122 28.35091 21.221736 16.66238 15.99588387 13.097790 11.653007
## 4 23.87445 21.38753 13.042526 11.43984  8.19689572  8.897655  8.526591
## 5  0.00000  0.00000  4.421195  4.47646  3.26284198  2.542187  3.126417
## 6  0.00000  0.00000  0.000000  0.00000  0.00000000  0.000000  0.000000
##        w800      w900     w1000     w1100     w1200     w1300     w1400
## 1 19.833205 19.969065 21.288054 22.460402 23.860637 25.440184 25.478265
## 2  1.865192  2.603593  3.143470  3.288492  3.163918  3.143234  3.278677
## 3 12.030486 11.961789 11.181202 10.309423  9.519386  9.241344  8.414933
## 4  7.491853  6.828290  6.724638  6.938719  6.369284  6.380648  6.425395
## 5  3.450605  5.600180  7.241918  8.500752  8.082497  7.522572  6.841574
## 6  0.000000  0.884239  2.526713  4.324367  7.750908 10.948344 13.307572
##       w1500     w1600     w1700     w1800     w1900     w2000 Pres
## 1 25.236182 25.110143 24.604180 23.800767 22.794261 22.183346    1
## 2  3.519271  3.590494  3.655518  3.672048  3.720050  3.566246    1
## 3  7.692879  7.010012  6.374782  5.802819  5.219092  4.754995    1
## 4  7.339184  7.530711  7.496909  7.829200  8.128998  7.823857    1
## 5  6.446102  6.419368  6.312824  6.054581  5.869412  5.545837    1
## 6 15.580292 18.224477 20.019294 21.166472 21.532200 21.228368    1

Use these data to model Sciurus vulgaris occurrence as a function of forest cover at each scale with the general linear model function: glm(). In glm() we state the dependent variable as the first argument followed by tilda (~), we give “binomial” to indicate our dependent variable is binary (i.e. 0 and 1)

#make a list of glms (general linear models) to test the relationship between broadleaf cover and presence for the different buffer sizes

glm100<-glm(Pres~w100,family = "binomial",data = glmData)
glm200<-glm(Pres~w200,family = "binomial",data = glmData)
glm300<-glm(Pres~w300,family = "binomial",data = glmData)
glm400<-glm(Pres~w400,family = "binomial",data = glmData)
glm500<-glm(Pres~w500,family = "binomial",data = glmData)
glm600<-glm(Pres~w600,family = "binomial",data = glmData)
glm700<-glm(Pres~w700,family = "binomial",data = glmData)
glm800<-glm(Pres~w800,family = "binomial",data = glmData)
glm900<-glm(Pres~w900,family = "binomial",data = glmData)
glm1000<-glm(Pres~w1000,family = "binomial",data = glmData)
glm1100<-glm(Pres~w1100,family = "binomial",data = glmData)
glm1200<-glm(Pres~w1200,family = "binomial",data = glmData)
glm1300<-glm(Pres~w1300,family = "binomial",data = glmData)
glm1400<-glm(Pres~w1400,family = "binomial",data = glmData)
glm1500<-glm(Pres~w1500,family = "binomial",data = glmData)
glm1600<-glm(Pres~w1600,family = "binomial",data = glmData)
glm1700<-glm(Pres~w1700,family = "binomial",data = glmData)
glm1800<-glm(Pres~w1800,family = "binomial",data = glmData)
glm1900<-glm(Pres~w1900,family = "binomial",data = glmData)
glm2000<-glm(Pres~w2000,family = "binomial",data = glmData)

Now take all the logliklihoosd statistics and combine into a new data frame so we can plot the effect of scale on our modelling.

#make a list of loglikelihood values

ll100<-as.numeric(logLik(glm100))
ll200<-as.numeric(logLik(glm200))
ll300<-as.numeric(logLik(glm300))
ll400<-as.numeric(logLik(glm400))
ll500<-as.numeric(logLik(glm500))
ll600<-as.numeric(logLik(glm600))
ll700<-as.numeric(logLik(glm700))
ll800<-as.numeric(logLik(glm800))
ll900<-as.numeric(logLik(glm900))
ll1000<-as.numeric(logLik(glm1000))
ll1100<-as.numeric(logLik(glm1100))
ll1200<-as.numeric(logLik(glm1200))
ll1300<-as.numeric(logLik(glm1300))
ll1300<-as.numeric(logLik(glm1300))
ll1400<-as.numeric(logLik(glm1400))
ll1500<-as.numeric(logLik(glm1500))
ll1600<-as.numeric(logLik(glm1600))
ll1700<-as.numeric(logLik(glm1700))
ll1800<-as.numeric(logLik(glm1800))
ll1900<-as.numeric(logLik(glm1900))
ll2000<-as.numeric(logLik(glm2000))

#combine into a vector
ll<-c(ll100,ll200,ll300,ll400,ll500,ll600,ll700,ll800,ll900,ll1000,ll1100,ll1200,ll1300,ll1400,ll1500,ll1600,ll1700,ll1800,ll1900,ll2000)

#do the same for our distance values
Dist<-c(100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000)

#make a dataframe
glmRes<-data.frame(cbind(Dist,ll))


#and plot

plot(glmRes$Dist, glmRes$ll, type = "b", frame = FALSE, pch = 19, 
     col = "red", xlab = "buffer", ylab = "logLik")

#now, by finding the max loglikelihood we can determine the optimum buffer size

opt<-glmRes[which(glmRes$ll==max(glmRes$ll)),]

#print
opt

##    Dist        ll
## 18 1800 -535.4487

Our results give us some confidence in the characteristic scale that we need to consider for the response of red squirrel to broadleaf woodland. Our data suggest the strongest is at 1800m. Still, we cannot be sure about the variation that may occur within our buffer sizes. At 100m intervals, and a fairly smooth increase in loglikelihood after 300m we can be fairly confident that at 1800m the response is well explained. However we can perform an extra test as to whether the response to broadleaf woodland varies within this larger buffer size by applying a Gaussian filter. A Gaussian filter follows the bell shape of a normal distribution and, within a specified radius, applies more weight to locations closer to the focal cell than those further away. We can create a Gaussian filter using the focalWeight() function in the Raster package. The degree to which the filter “favours” cells near to the focal cell is determined by the sigma argument - the sigma is basically the standard deviation of a normal distribution which determines the shape of the filter - smaller sigma = more weight given to cell closer to the focal cell (the centre of the buffer). The other argument that the focalWeight function takes is the size of the buffer to be used (1800m in our case).

Once the focal weight is established we use the focal() functions to create a new raster for which each cell is the result of some statistic of the neighbourhood created by the focalWeight() function. Here we will ‘sum’ as the statistic such that all woodland cells (with a value of 1) will be multiplied by the corresponding values in the focalWeight object (the Gaussian filter) where nearby cells have higher values than those further away.

We’ll experiment with different sigma values to see which gives us the best model using the rationale that if the smaller sigma gives us a better result then there is evidence that, within our buffer extent, locations closer to the focal cell are more important than those further away. An advantage of using focal analysis is that we can also easily use a much greater number of background points as we can simply use the extract() function to get the data from the new focal raster for our modelling without the need to make buffers around each point as above (so it’s quicker). For now though we’ll use the same number of background points as above so that we can compare log likelihood values (adding more sampling points will affect the log likelihood statistic and prevent a fair comparison).

We can look at the relative weights created by the different sigma values simply by typing the name of the object we create. For a radius of 50 metres and a very low sigma of 1 (meaning cells closer to the focal cell have much more weight) we would specify the following (you can experiment with the sigma values to look at the effect this has):

fw.test <- focalWeight(broadleaf, c(1,50),type= 'Gauss')

#you can type ?focal() into the console to get information on the other options for creating focal weights.

Now let’s run several iterations for increasingly large sigma values to evaluate the effects on our 1800m buffer. This step can take several minutes to run so please be patient.

sigma<-c(100,200,300,400,500) #set up list of sigma values to test

loglikFocal<- rep(NA,length=length(sigma)) #create empty vector to hold results.


#loop through sigma values
for(i in sigma){
  print (i)
  
  
  fw.i <- focalWeight(broadleaf, c(i,1800),type= 'Gauss')#buffer around each cell of the broadleaf layer, applying our 1800m distance and the sigma
  
  
  focalBroadleaf<-focal(broadleaf,fw.i,sum) #the focal() function now creates the raster
  
  
  
  eA<-extract(focalBroadleaf,A)
  
  eP<-extract(focalBroadleaf,P)
  
  Abs<-data.frame(eA,Pres=0)
  
  Pres<-data.frame(eP,Pres=1)
  
  names(Abs)<-c("area","Pres")
  names(Pres)<-c("area","Pres")
  
  head(Pres)
  
  sciurusData<-rbind(Pres,Abs)
  
  
  head(sciurusData)
  
  
  pres.i <- glm(Pres ~ area, family = "binomial", data = sciurusData)
  
  LLi<-as.vector(logLik(pres.i))
  
  loglikFocal[i]<-as.numeric(LLi)
  
  
  
}

## [1] 100
## [1] 200
## [1] 300
## [1] 400
## [1] 500

#summary(loglikFocal)

loglikFocal1800<-loglikFocal[!is.na(loglikFocal)]#remove NA values


result<-data.frame(sigma,loglikFocal1800)

plot(result$sigma, result$loglikFocal1800, type = "b", frame = FALSE, pch = 19, 
     col = "red", xlab = "sigma", ylab = "logLik")

Our results show that the loglikelihood improves as we allow increasingly equal weight to a greater number of cells (i.e. at greater distance from the focal cell). This trend confirms what we see in our buffer approach where the loglikelihood steadily increasing as we include a greater number of cells (larger buffer). From this we can infer that further modelling of this species should indeed place emphasis on this “characteristic” scale.

You’ve reached the end of this week’s practical. Please continue to the exercises below.

Week Two independent exercises

Open a new script and execute the following in R.

1.Calculate the characteristic scale(s) of the response by the Eurasian badger (Meles meles) to deciduous tree cover for buffer distances of between 100 and 2000m in 100m intervals. Using the log likelihood statistic for model selection, calculate the optimum scale for modelling badger response to broadleaf tree cover for the same extent as that used in today’s practical. Use the occurrence data in the “Melesmeles.csv” file in the GEOG71921 dropbox. Remove any records that contain “unconfirmed” (under the “Identification verification status” field in the csv file) and any points with coordinate uncertainty > 1000m. Write R code that will produce plots showing the relationship between buffer radii and your model performance (loglikelihood statistic).

Write a 300 word summary of the importance of scale in modelling species-habitat relationships. Draw on ideas covered in the lectures in Weeks 1 and 2 and the suggested reading, relating these to your results (from the practical and exercise one).

GEOG71921: Practical 2: looking at species occurrence and characteristic scale

MD

06/09/2021